This article discusses how a migration from Oracle to Azure Synapse data warehouse can be validated from an End-to-End perspective.

Why migrate from Oracle to Azure Synapse Data Warehouse?

Many organizations continue to use Oracle databases for storing transactional and warehouse data in their on-premises data centers. However, with data growing exponentially, these environments struggle to handle the growing load, which requires more processing power and increased storage and adds to data costs. Migrating Oracle data to the cloud is therefore a very viable option for optimizing costs and performance, and migrating the Oracle database to Azure SQL Server (or Azure Synapse) is a popular alternative.

Migration from Oracle to Azure Synapse DWH is justified mainly by the following:

    • Reduced Licensing Costs (through server virtualization, database consolidation, and data compression, especially after exponential upscaling)
    • Superior Performance
    • Easier Implementation of In-Database Analytics

If you are looking to automate your ETL testing, try Datagaps ETL Validator, an ETL testing tool.

Request Free Trial

Primary Challenges in Migration and Points of Vulnerability

While Oracle and Azure Synapse are both RDBMSs with SQL as the main language, there are several differences that can cause issues during or after migration.
    • Oracle Specific Features
      • If any features such as Package Variables, Collections (Nested Tables, Objects, VARRAYs), System Packages, Wrapped Source Code, Row-level Triggers, and User-defined Types are used, they might not be properly translated in Azure Synapse.
    • Data Type Differences
      • Because the two systems interpret the ANSI SQL standard slightly differently, there are variations in data type conversions, especially for numeric and date data types and formats.
    • Index Migration
    • Performance Comparisons

Datatype Mapping

 

 
Azure Synapse | Oracle | Comment
BIGINT | NUMBER(19) | –
BINARY | RAW | –
BIT | NUMBER(3) | –
CHAR | CHAR | –
DATETIME | DATE | Fractional parts of a second are truncated
DECIMAL | NUMBER(p[,s]) | –
FLOAT | FLOAT(49) | –
IMAGE | LONG RAW | –
INTEGER | NUMBER(10) | NUMBER range is -2,147,483,647 to 2,147,483,647
MONEY | NUMBER(19,4) | –
NCHAR | NCHAR | –
NTEXT | LONG | –
NVARCHAR | NVARCHAR2 | –
NUMERIC | NUMBER(p[,s]) | –
REAL | FLOAT(23) | –
SMALLDATETIME | DATE | The value for seconds is returned as 0
SMALLMONEY | NUMBER(10,4) | –
SMALLINT | NUMBER(5) | NUMBER range is -32,767 to 32,767
TEXT | LONG | –
TIMESTAMP | RAW | –
TINYINT | NUMBER(3) | –
UNIQUEIDENTIFIER | CHAR(36) | –
VARBINARY | RAW | –
VARCHAR | VARCHAR2 | –

Pitfalls and Validation Techniques

Pitfalls/Drawbacks – Since retaining the accuracy of data is the utmost priority when loading data from a source to a target system, the data needs to be verified and kept consistent. If this is not taken care of, the overall ETL process can suffer from the following pitfalls, which can cost the business in different ways:

    • Cost of Migration
    • Time to Market
    • Poor Data Quality

Read: Pitfalls of Cloud Data Migration

Validation Techniques – Migrating data is not simply a process of moving data from the source system to the destination system; it is also the complex task of retaining data quality, which requires a dedicated validation team with quality assurance skills, trusted validation tools, and expertise in this domain.

From the validation point of view, this means that the data must be validated thoroughly end to end while being migrated from the existing system to the new destination system. Validation of already migrated data can be achieved by following the standard steps below.

Migration Validation Scope

Have a clear understanding and an accurate analysis of the scope of the migration test: what needs to be tested and when.

Understand the source system and its metadata.

Identify the subset of data to be validated. During validation, one must have a proper understanding of the volume of data that is moved from the source database to the destination database; this also helps in validating and analyzing performance issues. Key questions to ask (a simple profiling sketch follows the list):

    • What type of data is it?
    • What format is the data in? Are there multiple formats that you need to plan for?
    • What attributes does this data have?
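
To answer these questions, a quick profile of the source helps. The sketch below is run against an assumed Oracle source table named Customer; the table and column names are illustrative, not taken from any specific migration.

    -- Profile the assumed source table to understand volume and key attributes.
    SELECT COUNT(*)                     AS total_rows,
           COUNT(DISTINCT Customer_Key) AS distinct_keys,
           MIN(Customer_Key)            AS min_key,
           MAX(Customer_Key)            AS max_key
    FROM Customer;

    -- List the columns and data types of the source table from the Oracle data dictionary.
    SELECT column_name, data_type, data_length
    FROM user_tab_columns
    WHERE table_name = 'CUSTOMER';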

Data Transformation check

A data transformation check validates that the data transformation works in accordance with the mapping document and business rules, and that new tables and columns are created with the proper data types and lengths (as specified in the mapping document).
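
On the Azure Synapse side, the target metadata can be pulled and compared against the mapping document. The sketch below assumes an illustrative target table named dbo.Customer_Dim.

    -- List target column names, data types, lengths, precision, and scale
    -- for comparison against the mapping document.
    SELECT c.name AS column_name,
           t.name AS data_type,
           c.max_length,
           c.precision,
           c.scale
    FROM sys.columns c
    JOIN sys.types t ON t.user_type_id = c.user_type_id
    WHERE c.object_id = OBJECT_ID('dbo.Customer_Dim');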

Data Integrity and Uniqueness check

A. Data Integrity check: In data integrity validation, the key attributes are validated to check whether the referential integrity between parent and child tables is mapped correctly and whether any invalid data has been removed or loaded.

B. Data Uniqueness check: A data uniqueness check is essentially a duplicate check; it verifies the uniqueness of the data in a table/entity as per the data model rules (as specified in the mapping document).

Example: verify that the customer table holds authentic data and does not contain any duplicate records, based on ‘Customer_Key’ as the key column.
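
A minimal sketch of both checks, assuming an illustrative dimension table Customer_Dim keyed on Customer_Key and a child table Orders that references it:

    -- Uniqueness check: any Customer_Key returned here is duplicated.
    SELECT Customer_Key, COUNT(*) AS occurrences
    FROM Customer_Dim
    GROUP BY Customer_Key
    HAVING COUNT(*) > 1;

    -- Integrity check: child rows whose Customer_Key has no matching parent row.
    SELECT o.Customer_Key
    FROM Orders o
    LEFT JOIN Customer_Dim d ON d.Customer_Key = o.Customer_Key
    WHERE d.Customer_Key IS NULL;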

Data Completeness check

Traditionally, data completeness in a data warehouse is evaluated through ETL testing that uses aggregate functions (SUM, MAX, MIN, COUNT) to evaluate the completeness of a column or record. Throughout the evaluation process, it also needs to be verified that the data in the target system matches expectations after loading, which can be done through the checks below (a count-and-aggregate sketch follows the list):

  • Record count verification. 
  • Comparing source and target data sets. 
  • Aggregated records validation.  
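
The same query can be run on the Oracle source and on the Azure Synapse target and the two result sets compared. The table name Customer is illustrative.

    -- Record count and aggregate comparison: run on the source and on the target,
    -- then compare the outputs.
    SELECT COUNT(*)          AS row_count,
           SUM(Customer_Key) AS key_sum,
           MIN(Customer_Key) AS key_min,
           MAX(Customer_Key) AS key_max
    FROM Customer;
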
Data Quality Check

A data quality check is critical. In order to make sure that the data is correctly loaded into the destination tables/fields, it must be ensured that the application correctly rejects records, substitutes default values, corrects issues, ignores unwanted characters, and reports invalid records, if any.
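
A minimal sketch of simple quality probes on the target, again using the illustrative Customer_Dim table and its Name and City columns:

    -- NULL check on a mandatory column.
    SELECT COUNT(*) AS null_names
    FROM Customer_Dim
    WHERE Name IS NULL;

    -- Unwanted-character check: values with leading or trailing whitespace.
    SELECT Customer_Key, City
    FROM Customer_Dim
    WHERE City <> LTRIM(RTRIM(City));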

Thus, by following the above six steps, data migration validation can deliver error-free data without creating impediments to business decision-making.

Validation of Initial Data Migration

Before evaluating and checking the initial data migration, it is important to understand what initial data migration means. An initial load (also called a full load) truncates the target and loads the entire data set into the data warehouse whenever the ETL process/job runs. In this process, historical data is not maintained; only current data is kept in the data warehouse, and the old data is erased and replaced with new data on the next ETL run. This type of load is performed for Type 1 SCDs (Slowly Changing Dimensions).

Therefore, to validate the initial migration, follow the Validation Techniques as mentioned above.

Validation of ETL Changes

Since the data warehouse doesn’t contain any data initially, after the full/initial load, when the ETL process runs for the second, third, or fourth time, data is loaded incrementally: changes are appended to the data warehouse to keep historical/modified records.

A record is inserted only if it does not already exist in the data warehouse; otherwise, an update is applied only to the modified records.
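
This insert-or-update pattern is often expressed as a single MERGE statement, which Azure Synapse dedicated SQL pools support. The sketch below is illustrative only and assumes a staging table Customer_Stage and the target Customer_Dim keyed on Customer_Key; the actual load logic of a given ETL tool may differ.

    -- Upsert changed and new customers from staging into the dimension.
    MERGE INTO Customer_Dim AS tgt
    USING Customer_Stage AS src
        ON tgt.Customer_Key = src.Customer_Key
    WHEN MATCHED AND (tgt.Name <> src.Name OR tgt.City <> src.City) THEN
        UPDATE SET tgt.Name = src.Name,
                   tgt.City = src.City
    WHEN NOT MATCHED THEN
        INSERT (Customer_Key, Name, City)
        VALUES (src.Customer_Key, src.Name, src.City);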

Validate data sources – Perform a record count check and verify that the table and column data types meet the specifications of the data model. Make sure keys are in place and duplicate data is removed. If this is not done correctly, the aggregate report could be inaccurate or misleading.

Validate constraints – This involves validating the constraints and making sure they are applied to the expected tables, including checking for misuse of integrity constraints such as foreign keys.

Validate nulls – This involves validating the full data set in the source and target tables using a minus query. Perform both source-minus-target and target-minus-source; if the minus query returns any rows, they should be treated as mismatched rows.
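
A minimal sketch of the minus-query comparison. Oracle uses MINUS while Azure Synapse uses EXCEPT, and in practice the two result sets usually have to be extracted and compared by a validation tool because the source and target sit in different systems. Table names are illustrative.

    -- Source minus target (Oracle syntax): rows present in the source but missing in the target.
    SELECT Customer_Key, Name, City FROM Customer
    MINUS
    SELECT Customer_Key, Name, City FROM Customer_Dim;

    -- Target minus source (Azure Synapse / T-SQL syntax): rows present in the target but not in the source.
    SELECT Customer_Key, Name, City FROM Customer_Dim
    EXCEPT
    SELECT Customer_Key, Name, City FROM Customer;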

Validation approach for SCD types

What are SCDs – Slowly Changing Dimensions?

    • A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse.
    • This applies to cases where the attribute for a record varies over time.

There are, in general, a few standard ways to solve this type of problem, and they are categorized as follows.

SCD Type 1:

SCD Type 1 is used to keep only the latest data; the old data is overwritten. In other words, no historical data is maintained.

Advantage – This is the easiest way to handle the Slowly Changing Dimension problem, as there is no requirement to keep track of old/historical records (data).

Disadvantage – With this methodology, it is not possible to track back through the history of changes made to a record over time. For example, in the scenario below, the company/organization will not be able to know that James was in New York earlier.

Example:

Source Table: Customer

 
Customer_Key 🔑 | Name | City
10001 | James | New York
10002 | Luis | Arizona
10003 | Alexander | Texas

 

After Migration:  

Let’s assume customer James has moved from New York City to California. In this case, SCD Type 1 overwrites the existing record with the newly updated record, as shown in the dimension table below.

 Target Table:  Customer_Dim 

 
Customer_Key 🔑 | Name | City
10001 | James | California
10002 | Luis | Arizona
10003 | Alexander | Texas

Validation to be performed:

Count Check – Validate that the count of updated records in the target table matches the count of updated records in the source table.

Duplicate Check – Validate that the uniqueness of key columns in the target (dimension) table is maintained.

Transformation Check – Validate that the transformation logic applied to individual columns is correct and that the records are loaded as per the mapping document.

Referential Integrity Check – Validate that the parent table (Primary Key column) has the same count as the child table (Foreign Key column).
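
A minimal sketch of these checks for the Type 1 example above, with Customer as the source and Customer_Dim as the target (names are illustrative):

    -- Count check: run the same COUNT on the source and compare the results.
    SELECT COUNT(*) AS target_rows FROM Customer_Dim;

    -- Duplicate check: the dimension key must remain unique after the overwrite.
    SELECT Customer_Key, COUNT(*) AS occurrences
    FROM Customer_Dim
    GROUP BY Customer_Key
    HAVING COUNT(*) > 1;

    -- Overwrite/transformation check: the target must hold the latest source values.
    SELECT s.Customer_Key
    FROM Customer s
    JOIN Customer_Dim d ON d.Customer_Key = s.Customer_Key
    WHERE s.City <> d.City OR s.Name <> d.Name;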

When to use SCD Type 1:  

A Slowly Changing Dimension Type 1 should be used only when it is not necessary for the data warehouse to keep track of historical changes.

SCD Type 2:

SCD Type 2 is used to keep the history of old data by adding a new record to the target table whenever a record is updated in the source table. Therefore, both the original and the new record are present, and the new record gets its own primary key.

Advantages – SCD type 2 allows us to accurately keep all historical information.  

Disadvantage – This causes the size of the table to grow quickly. In cases where the number of rows is very high to start with, storage and performance can become a concern.

Example:

Source Table: Customer  

 
Customer_Key 🔑 | Name | City
10001 | James | New York
10002 | Luis | Arizona
10003 | Alexander | Texas

  

After Migration:  

Let’s assume customer James has moved from New York City to California, and then from California to Bloomberg. In this case, SCD Type 2 adds new records alongside the existing record, as shown in the dimension table below.

Target Table: Customer_Dim  

Customer_Key 🔑 | Name | City
10001 | James | New York
10002 | Luis | Arizona
10003 | Alexander | Texas
10004 | James | California
10005 | James | Bloomberg

Validation to be done: 

Delta Check – Verify that the historical data is preserved and that new records are created for changed source rows.

Duplicate Check – Verify the uniqueness of records (minus query).

Referential Integrity Check – Validate that the parent table (Primary Key column) has the same count as the child table (Foreign Key column).
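
A minimal sketch of the Type 2 checks for the example above. In this simplified schema the business key is approximated by Name; a real Type 2 dimension would normally also carry a natural key, effective/expiry dates, or a current-record flag.

    -- Delta check: customers that changed in the source should have more than one version.
    SELECT Name, COUNT(*) AS version_count
    FROM Customer_Dim
    GROUP BY Name
    HAVING COUNT(*) > 1;

    -- Duplicate check: the surrogate key must still be unique.
    SELECT Customer_Key, COUNT(*) AS occurrences
    FROM Customer_Dim
    GROUP BY Customer_Key
    HAVING COUNT(*) > 1;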

When to use SCD Type 2: 

SCD Type 2 should be used when it is necessary for the data warehouse to track the historical changes.

SCD Type 3:

In a Type 3 Slowly Changing Dimension, a few technical columns such as Original_City, Current_City, and Effective_Date are added as new attributes to store the changed values.

The attribute Original_City indicates the original city of the customer (before the change), while the attribute Current_City indicates the present city (after the latest change). There is also a column that indicates when the current value became active.

Advantages:

      • SCD Type 3 does not increase the size of the table, since new information is updated in place in individual attributes.
      • This allows us to keep some part of history.

Note: SCD Type 3 is minimally used in an organization.  

Disadvantage – Type 3 cannot keep all historical changes when an attribute is changed more than once. For example, if James later moves to Texas on December 31, 2010, the California information will be lost.

Example:  

Source Table: Customer  

 
Customer_Key 🔑 | Name | City
10001 | James | New York
10002 | Luis | Arizona
10003 | Alexander | Texas

  

If any updates or changes are made in the source table, then the SCD Type 3 table will capture the records as shown below.  

Let’s assume James moved from New York to California; the original information gets updated, and the records are populated as in the table below (assuming the effective date of the change is January 01, 2010).

 Target Table: Customer_Dim  

 
Customer_Key 🔑 | Name | Original_City | Current_City | Effective_Date
10001 | James | New York | California | 01-01-2010
10002 | Luis | Arizona | |
10003 | Alexander | Texas | |

Validation to be done:

Delta Check – Verify that when an attribute changes, the previous value is retained in Original_City, the new value is loaded into Current_City, and the Effective_Date is populated.

Duplicate Check – Verify the uniqueness of key columns in the target (dimension) table.

Referential Integrity Check – Validate that the parent table (Primary Key column) has the same count as the child table (Foreign Key column).
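
A minimal sketch of the Type 3 delta check, using the column names from the example above:

    -- Changed rows must carry the old value, the new value, and an effective date.
    SELECT Customer_Key
    FROM Customer_Dim
    WHERE Current_City IS NOT NULL
      AND (Original_City IS NULL OR Effective_Date IS NULL);

    -- Unchanged rows should not carry an effective date.
    SELECT Customer_Key
    FROM Customer_Dim
    WHERE Current_City IS NULL AND Effective_Date IS NOT NULL;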

When to use SCD Type 3:

A Type 3 Slowly Changing Dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will occur only a finite (limited) number of times.

 

Conclusion:

Data warehouse migration validation is as important as retiring old technologies and migrating to the cloud. During the post-migration testing phase, the count of records in the source is first compared against the target; then it must be confirmed that the records were transformed during migration as per the business requirements (referring to the mapping documents), since any deviation can potentially harm the business.

Then, data reconciliation, duplicate checks, NULL checks, and referential integrity checks must be performed to ensure the quality of the data.

Datagaps
Established in 2010 with the mission of building trust in enterprise data and reports, Datagaps provides software for ETL Data Automation, Data Synchronization, Data Quality, Data Transformation, Test Data Generation, and BI Test Automation. We are an innovative company focused on providing the highest customer satisfaction, and we are passionate about data-driven test automation. Our flagship solutions, ETL Validator, DataFlow, and BI Validator, are designed to help customers automate the testing of ETL, BI, Database, Data Lake, Flat File, and XML data sources. Our tools support Snowflake, Tableau, Amazon Redshift, Oracle Analytics, Salesforce, Microsoft Power BI, Azure Synapse, SAP BusinessObjects, IBM Cognos, and other data warehousing projects and BI platforms. www.datagaps.com

Queries: contact@datagaps.com