Oracle to Azure Synapse Migration: This article discusses how a migration from Oracle to an Azure Synapse data warehouse can be validated from an end-to-end perspective.
Why migrate from Oracle to Azure Synapse Data Warehouse?
Migration from Oracle to Azure Synapse DWH is justified mainly by the following:
- Reduced licensing costs (especially after exponential upscaling, achieved by virtualizing servers, consolidating databases, and compressing data)
- Superior Performance
- Easier Implementation of In-Database Analytics
If you are looking to automate your ETL testing, try the Datagaps ETL Validator ETL testing tool.
Primary Challenges in Migration and Points of Vulnerability
- Oracle Specific Features
- If features such as package variables, collections (nested tables, object types, VARRAYs), system packages, wrapped source code, row-level triggers, or user-defined types are used, they might not be translated properly in Azure Synapse.
- Data Type Differences
- Because the two systems interpret the ANSI SQL standard slightly differently, there are variations in data type conversions, especially for numeric and date data types and formats.
- Migrating Indexes
- Performance Comparisons
| Azure Synapse | Oracle | Comment |
| --- | --- | --- |
| BIGINT | NUMBER(19) | – |
| BINARY | RAW | – |
| BIT | NUMBER(3) | – |
| CHAR | CHAR | – |
| DATETIME | DATE | Fractional parts of a second are truncated |
| DECIMAL | NUMBER(p[,s]) | – |
| FLOAT | FLOAT(49) | – |
| IMAGE | LONG RAW | – |
| INTEGER | NUMBER(10) | INTEGER range is -2,147,483,648 to 2,147,483,647 |
| MONEY | NUMBER(19,4) | – |
| NCHAR | NCHAR | – |
| NTEXT | LONG | – |
| NVARCHAR | NVARCHAR2 | – |
| NUMERIC | NUMBER(p[,s]) | – |
| REAL | FLOAT(23) | – |
| SMALLDATETIME | DATE | The value for seconds is returned as 0 |
| SMALLMONEY | NUMBER(10,4) | – |
| SMALLINT | NUMBER(5) | SMALLINT range is -32,768 to 32,767 |
| TEXT | LONG | – |
| TIMESTAMP | RAW | – |
| TINYINT | NUMBER(3) | – |
| UNIQUEIDENTIFIER | CHAR(36) | – |
| VARBINARY | RAW | – |
| VARCHAR | VARCHAR2 | – |
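Numeric conversions in particular deserve an explicit check. The minimal Python sketch below (with made-up values) illustrates why exact NUMBER/DECIMAL values should not be validated through binary floats: high-precision values drift silently, while low-precision ones appear to match.

```python
from decimal import Decimal

# Hypothetical source (Oracle NUMBER) values kept as exact decimals,
# and target values stored as binary floats after a careless migration.
source = [Decimal("123456789.123456789"), Decimal("0.1")]
target = [float(v) for v in source]  # precision can be silently lost

# Validation: flag rows where the round-trip changes the value.
mismatches = [
    (s, t) for s, t in zip(source, target)
    if Decimal(str(t)) != s
]
```

Here only the high-precision value is flagged; `0.1` survives the string round-trip, which is exactly why spot checks on small values can miss this class of defect.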
Pitfalls and Validation Techniques

Cost Of Migration


Migration Validation Scope
A clear understanding and accurate analysis of the scope of the migration test is needed: when and what must be tested.
Understanding the source system and metadata.
The subset of data to be validated. During validation, one must have a proper understanding of the volume of data moved from the source database to the destination database.
This helps validate and analyze performance issues.
- What type of data is it?
- What format is the data in? Are there multiple formats that you need to plan for?
- What attributes does this data have?
Data Transformation check
A data transformation check validates that the data transformation works in accordance with the mapping logic document and business rules, and that new tables or columns are created with the proper data types and lengths (as specified in the mapping document).
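A transformation check can be sketched by re-applying the mapping rule to the source and diffing the result against the target. The snippet below uses an in-memory SQLite database as a stand-in; the table and column names, and the `UPPER(name)` rule, are purely illustrative.

```python
import sqlite3

# Stand-in source and target tables (hypothetical names and data).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_customer (customer_key INTEGER, name TEXT);
    CREATE TABLE tgt_customer (customer_key INTEGER, name_upper TEXT);
    INSERT INTO src_customer VALUES (10001, 'James'), (10002, 'Luis');
    INSERT INTO tgt_customer VALUES (10001, 'JAMES'), (10002, 'LUIS');
""")

# Re-apply the mapping rule (UPPER(name)) to the source and diff it
# against the target; an empty result means the rule was honored.
bad_rows = con.execute("""
    SELECT customer_key, UPPER(name) FROM src_customer
    EXCEPT
    SELECT customer_key, name_upper FROM tgt_customer
""").fetchall()
```

Any row returned points at a record where the transformation disagrees with the mapping document.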
Data Integrity and Uniqueness check
A. Data Integrity check: In data integrity validation, the key attributes are validated to check that the referential integrity between parent and child tables is mapped correctly and that invalid data is neither loaded nor valid data silently dropped.
B. Data Uniqueness check: A data uniqueness check is essentially a data duplication check; it verifies the uniqueness of the data in a table/entity as per the data model rule (as specified in the mapping document).
Example: verify that the customer table contains valid data and no duplicate records, using ‘Customer_Key’ as the key column.
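A duplicate check of this kind typically reduces to a `GROUP BY ... HAVING COUNT(*) > 1` query. A runnable sketch against an in-memory SQLite table (names and data are illustrative):

```python
import sqlite3

# Hypothetical customer table containing an accidental duplicate.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (customer_key INTEGER, name TEXT, city TEXT);
    INSERT INTO customer VALUES
        (10001, 'James', 'New York'),
        (10002, 'Luis', 'Arizona'),
        (10002, 'Luis', 'Arizona');
""")

# Any key appearing more than once violates the uniqueness rule.
duplicates = con.execute("""
    SELECT customer_key, COUNT(*) FROM customer
    GROUP BY customer_key HAVING COUNT(*) > 1
""").fetchall()
```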
Data Completeness check
Traditionally, in the data warehouse, data completeness is evaluated through ETL testing that uses aggregate functions (sum, max, min, count) to evaluate the average completeness of a column or record. Throughout the evaluation, it also needs to be verified that the data in the target system matches expectations after loading, which helps achieve data completeness.
- Record count verification.
- Comparing source and target data sets.
- Aggregated records validation.
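The aggregate-based completeness check above can be sketched by computing the same count/sum/min/max profile on both sides and comparing. Table and column names below are illustrative; SQLite stands in for the real source and target.

```python
import sqlite3

# Hypothetical source and target tables holding the same rows.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src_sales (amount REAL);
    CREATE TABLE tgt_sales (amount REAL);
    INSERT INTO src_sales VALUES (100.0), (250.5), (75.25);
    INSERT INTO tgt_sales VALUES (100.0), (250.5), (75.25);
""")

# Build one aggregate "profile" per table and compare them.
agg = "SELECT COUNT(*), SUM(amount), MIN(amount), MAX(amount) FROM {}"
src_profile = con.execute(agg.format("src_sales")).fetchone()
tgt_profile = con.execute(agg.format("tgt_sales")).fetchone()
```

Matching profiles do not prove row-level equality, which is why the record-count and source-vs-target comparisons listed above are still needed alongside the aggregates.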
Data Quality Check
A data quality check is critical: to make sure that the data is correctly loaded into the destination tables/fields, it must be ensured that the application correctly rejects records, substitutes default values, corrects issues, ignores unwanted characters, and reports invalid records, if any.
Validation of Initial Data Migration
Before evaluating and checking the initial data migration, it is important to understand what it means. In an initial load (also called a full load), the ETL process/job truncates the target and loads the entire data set into the data warehouse on every run. In this process, historical data is not maintained; only current data is kept in the data warehouse, and the old data is erased and replaced with new data on the next ETL job run. This type of load is performed in Type 1 SCDs (Slowly Changing Dimensions).
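The truncate-and-load pattern can be sketched as below, again using in-memory SQLite with hypothetical staging and warehouse tables. Note how the previous run's data disappears entirely, which is exactly why post-load count checks matter.

```python
import sqlite3

# Hypothetical staging (source) and warehouse (target) tables.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stg_customer (customer_key INTEGER, city TEXT);
    CREATE TABLE dwh_customer (customer_key INTEGER, city TEXT);
    INSERT INTO stg_customer VALUES (10001, 'California');
    INSERT INTO dwh_customer VALUES (10001, 'New York');
""")

def full_load(con):
    # Truncate the target, then reload it entirely from staging;
    # no history survives an initial/full load.
    con.execute("DELETE FROM dwh_customer")
    con.execute("INSERT INTO dwh_customer SELECT * FROM stg_customer")

full_load(con)
loaded = con.execute("SELECT city FROM dwh_customer").fetchall()
```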

Validation of ETL Changes
Validation approach for SCD types
What are SCDs – Slowly Changing Dimensions?
- A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse.
- This applies to cases where the attribute for a record varies over time.
SCD Type 1 is used to keep only the latest data; the old data is overwritten. In other words, no historical data is kept.
Advantage – This is the easiest way to handle the Slowly Changing Dimension problem as there is no requirement to keep track of the old/historical records (data).
Disadvantage – With this methodology it is not possible to trace the history of changes made to a record over time; in the example below, the company/organization will not be able to know that James was in New York earlier.
Example:
Source Table: Customer
| Customer_Key | Name | City |
| --- | --- | --- |
| 10001 | James | New York |
| 10002 | Luis | Arizona |
| 10003 | Alexander | Texas |
After Migration:
Let’s assume customer James has moved from New York City to California. In this case, SCD Type 1 overwrites the existing record with the newly updated record, as shown in the dimension table below.
Target Table: Customer_Dim
| Customer_Key | Name | City |
| --- | --- | --- |
| 10001 | James | California |
| 10002 | Luis | Arizona |
| 10003 | Alexander | Texas |
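The overwrite step can be sketched in a few lines; the table and column names follow the example above, and the in-memory SQLite database is only a stand-in for the real dimension.

```python
import sqlite3

# Dimension table seeded with the source rows from the example.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer_dim "
            "(customer_key INTEGER PRIMARY KEY, name TEXT, city TEXT)")
con.executemany("INSERT INTO customer_dim VALUES (?, ?, ?)", [
    (10001, 'James', 'New York'),
    (10002, 'Luis', 'Arizona'),
    (10003, 'Alexander', 'Texas'),
])

# James moves to California: SCD Type 1 overwrites in place,
# so no history is kept and the row count does not grow.
con.execute("UPDATE customer_dim SET city = ? WHERE customer_key = ?",
            ('California', 10001))
city = con.execute(
    "SELECT city FROM customer_dim WHERE customer_key = 10001").fetchone()[0]
```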
Validation to be performed:
Count Check – Validate that the count of updated records in the target table matches the count of updated records in the source table.
Duplicate Check – Validate that the uniqueness of key columns in the target (dimension) is maintained.
Transformation Check – Validate that the transformation logic applied to individual columns works correctly and that the records are loaded as per the mapping document.
Referential Integrity Check – Validate that the parent table (Primary Key column) has the same number of counts as the child table (Foreign Key column).
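Of these, the referential integrity check is often implemented as an orphan query: fact rows whose foreign key has no matching dimension row. A runnable sketch (table names are illustrative):

```python
import sqlite3

# Hypothetical dimension and fact tables; 10009 has no parent row.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY);
    CREATE TABLE sales_fact (sale_id INTEGER, customer_key INTEGER);
    INSERT INTO customer_dim VALUES (10001), (10002);
    INSERT INTO sales_fact VALUES (1, 10001), (2, 10002), (3, 10009);
""")

# Orphan check: fact rows whose key is absent from the dimension.
orphans = con.execute("""
    SELECT f.sale_id, f.customer_key
    FROM sales_fact f
    LEFT JOIN customer_dim d ON d.customer_key = f.customer_key
    WHERE d.customer_key IS NULL
""").fetchall()
```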
When to use SCD Type 1:
A Slowly Changing Dimension Type 1 should be used only when it is not necessary for the data warehouse to keep track of historical changes.
SCD Type 2 is used to keep the history of old data by adding a new record to the target table whenever a record is updated in the source table. Therefore, both the original and the new record are present, and the new record gets its own primary key.
Advantage – SCD type 2 allows us to accurately keep all historical information.
Disadvantages – This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.
Example:
Source Table: Customer
| Customer_Key | Name | City |
| --- | --- | --- |
| 10001 | James | New York |
| 10002 | Luis | Arizona |
| 10003 | Alexander | Texas |
After Migration:
Let’s assume customer James has moved from New York City to California, and then from California to Bloomberg. In this case, SCD Type 2 adds new records alongside the existing record, as shown in the dimension table below.
Target Table: Customer_Dim
| Customer_Key | Name | City |
| --- | --- | --- |
| 10001 | James | New York |
| 10002 | Luis | Arizona |
| 10003 | Alexander | Texas |
| 10004 | James | California |
| 10005 | James | Bloomberg |
Validation to be done:
Delta Check – Verify that the historical data is preserved and that new records are created for changed source rows.
Duplicate check – Verify the uniqueness of records (Minus Query).
Referential Integrity Check – Validate that the parent table (Primary Key column) has the same number of counts as the child table (Foreign Key column)
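The minus-query check mentioned above can be sketched with SQLite's `EXCEPT` operator: every current source row must appear somewhere among the target's (current plus historical) rows. Names and data follow the example; the tables themselves are stand-ins.

```python
import sqlite3

# Current source rows vs. the SCD2 target from the example above.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE src (name TEXT, city TEXT);
    CREATE TABLE tgt (customer_key INTEGER, name TEXT, city TEXT);
    INSERT INTO src VALUES ('James', 'Bloomberg'), ('Luis', 'Arizona');
    INSERT INTO tgt VALUES
        (10001, 'James', 'New York'),
        (10002, 'Luis',  'Arizona'),
        (10004, 'James', 'California'),
        (10005, 'James', 'Bloomberg');
""")

# Minus query: current source rows with no counterpart in the target.
missing = con.execute("""
    SELECT name, city FROM src
    EXCEPT
    SELECT name, city FROM tgt
""").fetchall()
```

An empty result means every current source value reached the dimension; a non-empty one flags rows the SCD2 load dropped.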
When to use SCD Type 2:
SCD Type 2 should be used when it is necessary for the data warehouse to track the historical changes.
In a Type 3 Slowly Changing Dimension, a few technical columns such as Original_City, Current_City, and Effective_Date are added as new attributes to store the changed values.
The attribute Original_City indicates the customer's original city (before the change), while Current_City indicates the customer's present city (the latest change). An additional column indicates when the current value became active.
Advantage:
- SCD Type 3 does not increase the size of the table, since new information is updated in an individual attribute.
- This allows us to keep some part of history.
Note: SCD Type 3 is minimally used in an organization.
Disadvantage – Type 3 cannot keep all historical changes when an attribute changes more than once. For example, if James later moves to Texas on December 31, 2010, the California information is lost.
Example:
Source Table: Customer
| Customer_Key | Name | City |
| --- | --- | --- |
| 10001 | James | New York |
| 10002 | Luis | Arizona |
| 10003 | Alexander | Texas |
If any updates or changes are made in the source table, the SCD Type 3 table captures the records as shown below.
Let’s assume James moved from New York to California. The original information is updated, and the records are populated as in the table below (assuming the effective date of the change is January 01, 2010).
Target Table: Customer_Dim
| Customer_Key | Name | Original_City | Current_City | Effective_Date |
| --- | --- | --- | --- | --- |
| 10001 | James | New York | California | 01-01-2010 |
| 10002 | Luis | Arizona | – | – |
| 10003 | Alexander | Texas | – | – |
Validation to be done:
Column Update Check – Validate that Current_City holds the latest value, that Original_City retains the prior value, and that Effective_Date is populated as expected.
Duplicate Check – Validate that the uniqueness of key columns in the target (dimension) is maintained.
Referential Integrity Check – Validate that the parent table (Primary Key column) has the same number of counts as the child table (Foreign Key column).
When to use SCD Type 3:
A Type 3 Slowly Changing Dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite (limited) number of times.
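The Type 3 update step, and its one-change limitation, can be sketched as follows. Column names follow the example tables above; the update helper itself is illustrative, not a prescribed implementation.

```python
import sqlite3

# Dimension table with the SCD3 technical columns from the example.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE customer_dim (
    customer_key INTEGER PRIMARY KEY, name TEXT,
    original_city TEXT, current_city TEXT, effective_date TEXT)""")
con.execute("INSERT INTO customer_dim VALUES "
            "(10001, 'James', 'New York', 'New York', NULL)")

def apply_scd3(con, key, new_city, effective_date):
    # Only the current value and its effective date are overwritten;
    # Original_City keeps the pre-change value, so intermediate
    # changes (California below) are lost -- the Type 3 limitation.
    con.execute(
        "UPDATE customer_dim SET current_city = ?, effective_date = ? "
        "WHERE customer_key = ?",
        (new_city, effective_date, key))

apply_scd3(con, 10001, 'California', '2010-01-01')
apply_scd3(con, 10001, 'Texas', '2010-12-31')
row = con.execute(
    "SELECT original_city, current_city FROM customer_dim").fetchone()
```

After the second move, the dimension knows only New York and Texas; California has vanished, exactly as described in the disadvantage above.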
Conclusion:
Data warehouse migration validation is as important as retiring old technologies and migrating to the cloud. During the post-migration testing phase, the count of records in source vs. target is compared first; then it must be confirmed that the records were transformed during the migration as per the business requirements (referring to the mapping documents), since untested transformations can potentially harm the business.







