Why migrate from Oracle to Azure Synapse Data Warehouse?
Many organizations continue to use Oracle databases for storing transactional and warehouse data in their on-premises data centers. However, as data grows exponentially, these environments struggle to handle the increasing load, which demands more processing power and storage and drives up costs. Migrating Oracle data to the cloud is therefore a highly viable option for optimizing cost and performance, and moving the Oracle database to Azure SQL Server (or Azure Synapse) is a popular choice.
Migration from Oracle to the Azure Synapse data warehouse is mainly justified by the following:
- Reduced Licensing Costs (especially after exponential upscaling, through server virtualization, database consolidation, and data compression)
- Superior Performance
- Easier Implementation of In-Database Analytics
If you are looking to automate your ETL testing, try the Datagaps ETL Validator testing tool.
Primary Challenges in Migration and Points of Vulnerability
- Oracle Specific Features
- If features such as Package Variables, Collections (Nested Tables, Object Types, VARRAYs), System Packages, Wrapped Source Code, Row-level Triggers, or User-defined Types are used, they might not be translated properly in Azure Synapse.
- Data Type Differences
- Because the two systems interpret the ANSI SQL standard slightly differently, data type conversions vary, especially for numeric and date data types and formats.
- Migrating Indexes
- Performance Comparisons

Datatype Mapping
Azure Synapse | Oracle | Comment |
--- | --- | --- |
BIGINT | NUMBER(19) | – |
BINARY | RAW | – |
BIT | NUMBER(3) | – |
CHAR | CHAR | – |
DATETIME | DATE | Fractional parts of a second are truncated |
DECIMAL | NUMBER(p[,s]) | – |
FLOAT | FLOAT(49) | – |
IMAGE | LONG RAW | – |
INTEGER | NUMBER(10) | INTEGER range is -2,147,483,648 to 2,147,483,647 |
MONEY | NUMBER(19,4) | – |
NCHAR | NCHAR | – |
NTEXT | LONG | – |
NVARCHAR | NVARCHAR2 | – |
NUMERIC | NUMBER(p[,s]) | – |
REAL | FLOAT(23) | – |
SMALLDATETIME | DATE | The value for seconds is returned as 0 |
SMALLMONEY | NUMBER(10,4) | – |
SMALLINT | NUMBER(5) | SMALLINT range is -32,768 to 32,767 |
TEXT | LONG | – |
TIMESTAMP | RAW | – |
TINYINT | NUMBER(3) | – |
UNIQUEIDENTIFIER | CHAR(36) | – |
VARBINARY | RAW | – |
VARCHAR | VARCHAR2 | – |
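As an illustration of how this mapping plays out in DDL, here is a minimal sketch translating a hypothetical Oracle table into its Azure Synapse equivalent. The table and column names are invented for the example; they are not part of any real mapping document.

```sql
-- Hypothetical Oracle source table (names invented for illustration)
CREATE TABLE customer_src (
    customer_key   NUMBER(10),        -- maps to INTEGER in Synapse
    customer_name  VARCHAR2(100),     -- maps to VARCHAR(100)
    signup_date    DATE,              -- maps to DATETIME (fractional seconds truncated)
    balance        NUMBER(19,4)       -- maps to MONEY
);

-- Equivalent Azure Synapse dedicated SQL pool table, using the mapping above
-- (distribution and index options omitted; Synapse applies its defaults)
CREATE TABLE customer_src (
    customer_key   INTEGER,
    customer_name  VARCHAR(100),
    signup_date    DATETIME,
    balance        MONEY
);
```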
Pitfalls and Validation Techniques
Pitfalls/Drawbacks – Retaining the accuracy of data is the utmost priority when loading data from a source to a target system, so the data must be verified and kept consistent. If this is not taken care of, the following pitfalls during the overall ETL process can cause the business to lose out in several ways:

- Cost of Migration
- Time to Market
- Poor Data Quality
Read: Pitfalls of Cloud Data Migration
Validation Techniques – Migrating data is not simply a matter of moving it from the source system to the destination system; it is also the complex task of preserving data quality, which requires a dedicated validation team with quality assurance skills, trusted validation tools, and expertise in the domain.
From a validation point of view, this means the data must be validated thoroughly end to end while it is migrated from the existing system to the new destination system. Validation of the migrated data can be achieved by following these standard steps.
Have a clear understanding and an accurate analysis of the scope of the migration test: what needs to be tested, and when.
Identify the subset of data to be validated. During validation, you must have a clear picture of the volume of data moved from the source database to the destination database; this helps in validating and analyzing performance issues.
- What type of data is it?
- What format is the data in? Are there multiple formats that you need to plan for?
- What attributes does this data have?
Data Transformation check
A Data Transformation check is a process used to validate that the data transformation works in accordance with the mapping document and business rules, and that new tables or columns are created with the proper data types and lengths (as specified in the mapping document).
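For instance, a transformation check can re-derive the expected value from the source and compare it with the target column. A minimal sketch, assuming a hypothetical rule that concatenates first and last name into a FULL_NAME column; the table and column names here are illustrative, not taken from any real mapping document.

```sql
-- Re-apply the assumed mapping rule on the source side and compare with the target.
-- customer_src, customer_dim, first_name, last_name, full_name are assumed names.
SELECT s.customer_key,
       s.first_name + ' ' + s.last_name AS expected_full_name,
       t.full_name                      AS actual_full_name
FROM   customer_src s
JOIN   customer_dim t
       ON t.customer_key = s.customer_key
WHERE  t.full_name <> s.first_name + ' ' + s.last_name;  -- any row returned is a transformation defect
```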
A. Data Integrity check: In data integrity validation, the key attributes are validated to check that the referential integrity between parent and child tables is mapped correctly and that no invalid or unexpected data is removed or loaded.
B. Data Uniqueness check: A data uniqueness check is essentially a duplicate check; it verifies the uniqueness of the data in a table/entity according to the data model rules (as specified in the mapping document).
Example: verify that the customer table contains valid data and no duplicate records, using ‘Customer_Key’ as the key column (see the sketch below).
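A minimal sketch of such a uniqueness check, grouping on Customer_Key and flagging any key that appears more than once; Customer_Dim is the target dimension table used in the examples later in this article.

```sql
-- Any row returned indicates a duplicate Customer_Key in the target table.
SELECT Customer_Key,
       COUNT(*) AS record_count
FROM   Customer_Dim
GROUP  BY Customer_Key
HAVING COUNT(*) > 1;
```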
Traditionally, in a data warehouse, data completeness is evaluated through ETL testing that uses aggregate functions (SUM, MAX, MIN, COUNT) to assess the completeness of a column or record. Throughout the evaluation it must also be verified that the data in the target system meets expectations after loading, which involves the checks below (a sketch follows the list).
- Record count verification.
- Comparing source and target data sets.
- Aggregated records validation.
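A sketch of the record count and aggregate validations, assuming the Customer source table and Customer_Dim target table used in the examples in this article; the order_amount column is an assumption added purely for the aggregate check.

```sql
-- 1. Record count verification: both queries should return the same number.
SELECT COUNT(*) AS source_count FROM Customer;      -- run against the Oracle source
SELECT COUNT(*) AS target_count FROM Customer_Dim;  -- run against Azure Synapse

-- 2. Aggregated records validation on a numeric column (order_amount is assumed).
SELECT SUM(order_amount) AS total_amt,
       MIN(order_amount) AS min_amt,
       MAX(order_amount) AS max_amt
FROM   Customer;   -- compare with the same aggregates computed on Customer_Dim
```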
Data Quality check – This check is critical. To make sure the data is loaded correctly into the destination tables/fields, it must be ensured that the application correctly rejects bad records, substitutes default values, corrects issues, ignores unwanted characters, and reports any invalid records.
Thus, by following the above six steps, data migration validation can deliver error-free data without creating impediments to business decision-making.
Validation of Initial Data Migration
Before evaluating and checking the initial data migration, it is important to understand what it means. An initial load (also called a full load) truncates the target and loads the entire data set into the data warehouse each time the ETL process/job runs. Historical data is not maintained; only current data is kept in the data warehouse, and the old data is erased and replaced with new data on the next ETL run. This type of load is typically used with Type 1 SCDs (Slowly Changing Dimensions). A sketch of such a load follows.
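A minimal sketch of a truncate-and-load (full load) job on the target, using the Customer_Dim table from the later examples; the stg_Customer staging table is an assumption standing in for the latest source extract.

```sql
-- Full load: wipe the dimension and reload everything from the staged source extract.
TRUNCATE TABLE Customer_Dim;

INSERT INTO Customer_Dim (Customer_Key, Name, City)
SELECT Customer_Key, Name, City
FROM   stg_Customer;   -- assumed staging table holding the latest source extract
```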

Therefore, to validate the initial migration, follow the Validation Techniques as mentioned above.
Validation of ETL Changes
Since the data warehouse doesn’t contain any data initially, after the full/initial load, when the ETL process runs for the second, third, or fourth time, data is loaded incrementally: changes are appended to the data warehouse so that historical/modified records are kept.
A record is inserted only if it does not already exist in the data warehouse; otherwise, only the modified records are updated (a sketch follows).
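A sketch of this insert-if-new / update-if-changed logic using a MERGE statement. MERGE is available in recent Azure Synapse dedicated SQL pool releases; on older versions the same logic can be written as a separate UPDATE plus INSERT. The stg_Customer staging table is an assumption.

```sql
-- Upsert: update matching records that changed, insert records that are new.
MERGE Customer_Dim AS tgt
USING stg_Customer AS src
      ON tgt.Customer_Key = src.Customer_Key
WHEN MATCHED AND (tgt.Name <> src.Name OR tgt.City <> src.City) THEN
     UPDATE SET tgt.Name = src.Name,
                tgt.City = src.City
WHEN NOT MATCHED BY TARGET THEN
     INSERT (Customer_Key, Name, City)
     VALUES (src.Customer_Key, src.Name, src.City);
```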

Validate data sources – Perform a record count check and verify that the table and column data types meet the specifications of the data model. Check that keys are in place and remove duplicate data; if this is not done correctly, aggregate reports could be inaccurate or misleading.
Validate constraints – This involves validating the constraints and making sure they are applied to the expected tables. It includes checking for misuse of integrity constraints such as foreign keys.
Validate nulls and mismatches – This involves validating the full data set in the source and target tables using minus queries. Run both source-minus-target and target-minus-source; any rows returned by either query should be treated as mismatched rows (a sketch follows).
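A sketch of the two minus queries. Oracle uses MINUS, while Azure Synapse (T-SQL) uses EXCEPT; in practice the two data sets are extracted to a common platform or compared by a validation tool, since the queries cannot span both systems directly.

```sql
-- Oracle side: rows present in the source but missing/different in the target copy.
SELECT Customer_Key, Name, City FROM Customer
MINUS
SELECT Customer_Key, Name, City FROM Customer_Dim;

-- Synapse (T-SQL) side: rows present in the target but missing/different in the source copy.
SELECT Customer_Key, Name, City FROM Customer_Dim
EXCEPT
SELECT Customer_Key, Name, City FROM Customer;
```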
Validation approach for SCD types
- A Slowly Changing Dimension (SCD) is a dimension that stores and manages both current and historical data over time in a data warehouse.
- This applies to cases where the attribute for a record varies over time.
In general, there are three ways to solve this type of problem, and they are categorized as follows.
SCD Type 1:
Advantage – This is the easiest way to handle the Slowly Changing Dimension problem, as there is no requirement to keep track of old/historical records.
Disadvantage – With this methodology it is not possible to look back at the history of changes made to a record over time; in the example below, the company/organization will not be able to tell that James was previously in New York.
Example:
Source Table: Customer
Customer_Key 🔑 | Name | City |
--- | --- | --- |
10001 | James | New York |
10002 | Luis | Arizona |
10003 | Alexander | Texas |
After Migration:
Let’s assume customer James has moved from New York City to California. In this case, SCD Type 1 overwrites the existing record with the newly updated record, as shown in the dimension table below (a sketch of the overwrite follows the table).
Target Table: Customer_Dim
Customer_Key 🔑 | Name | City |
--- | --- | --- |
10001 | James | California |
10002 | Luis | Arizona |
10003 | Alexander | Texas |
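A minimal sketch of the Type 1 overwrite for this example; in a real job the new value would come from a staging extract rather than a literal.

```sql
-- Type 1: overwrite the existing row in place; no history is kept.
UPDATE Customer_Dim
SET    City = 'California'
WHERE  Customer_Key = 10001;   -- James's record from the example above
```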
Validation to be performed:
Count Check – Validate that the count of updated records in the target table matches the count of updated records in the source table.
Duplicate Check – Validate that the uniqueness of the key columns in the target (dimension) table is maintained.
Transformation Check – Validate that the transformation logic applied to individual columns works correctly and that the records are loaded as per the mapping document.
Referential Integrity Check – Validate that the parent table (primary key column) and the child table (foreign key column) have matching counts, i.e., there are no orphan records (a sketch follows this list).
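The referential integrity check can be implemented as an orphan check: every foreign key value in the child table must exist in the parent dimension. A sketch, assuming a hypothetical Sales_Fact child table that references Customer_Dim.

```sql
-- Any row returned is an orphan: a fact record whose Customer_Key
-- has no matching row in the parent dimension.
SELECT f.Customer_Key
FROM   Sales_Fact f                 -- assumed child (fact) table
LEFT JOIN Customer_Dim d
       ON d.Customer_Key = f.Customer_Key
WHERE  d.Customer_Key IS NULL;
```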
When to use SCD Type 1:
A Slowly Changing Dimension Type 1 should be used only when it is not necessary for the data warehouse to keep track of historical changes.
SCD Type 2:
SCD Type 2 is used to keep the history of old data by adding a new record to the target table whenever a record is updated in the source table. Both the original and the new record are kept, and the new record gets its own primary (surrogate) key.
Advantage – SCD Type 2 allows us to keep all historical information accurately.
Disadvantage – It causes the table to grow quickly. In cases where the number of rows is very high to begin with, storage and performance can become a concern.
Example:
Source Table: Customer
Customer_Key 🔑 | Name | City |
--- | --- | --- |
10001 | James | New York |
10002 | Luis | Arizona |
10003 | Alexander | Texas |
After Migration:
Let’s assume customer James has moved from New York City to California, and then from California to Bloomberg. In this case, SCD Type 2 adds a new record for each change alongside the existing record, as shown in the dimension table below (a sketch of the inserts follows the table).
Target Table: Customer_Dim
Customer_Key 🔑 | Name | City |
--- | --- | --- |
10001 | James | New York |
10002 | Luis | Arizona |
10003 | Alexander | Texas |
10004 | James | California |
10005 | James | Bloomberg |
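A minimal sketch of how the Type 2 rows in this example are created: the existing record is left untouched and a new row with a new surrogate key is inserted for each change. Real implementations usually also carry effective dates or a current-record flag, which this simplified example omits.

```sql
-- Type 2: keep the old row and add a new version with its own surrogate key.
INSERT INTO Customer_Dim (Customer_Key, Name, City)
VALUES (10004, 'James', 'California');   -- first move, from the example

INSERT INTO Customer_Dim (Customer_Key, Name, City)
VALUES (10005, 'James', 'Bloomberg');    -- second move, from the example
```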
Validation to be done:
Delta Check – Verify that the historical data is preserved and that a new record is created for every change (see the sketch after this list).
Duplicate Check – Verify the uniqueness of records (minus query).
Referential Integrity Check – Validate that the parent table (primary key column) and the child table (foreign key column) have matching counts.
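A sketch of the delta check for this example: verify that historical rows are still present and that every change produced a new version. The query groups on the Name attribute only because the simplified example has no separate natural key column; in a real model the grouping would use the natural/business key.

```sql
-- James should show 3 versions (New York, California, Bloomberg),
-- and none of the older rows should disappear after the incremental load.
SELECT Name,
       COUNT(*)          AS version_count,
       MIN(Customer_Key) AS first_surrogate_key,
       MAX(Customer_Key) AS latest_surrogate_key
FROM   Customer_Dim
GROUP  BY Name;
```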
When to use SCD Type 2:
SCD Type 2 should be used when it is necessary for the data warehouse to track the historical changes.
SCD Type 3:
In a Type 3 Slowly Changing Dimension, a few additional columns, such as Original_City, Current_City, and Effective_Date, are added as new attributes to store the previous and current values.
The attribute Original_City holds the customer’s original city (before the change), while Current_City holds the customer’s present city (the latest change). There is also a column indicating when the current value became active.
Advantages:
- SCD Type 3 does not increase the size of the table, since new information is stored in additional attributes rather than new rows.
- It allows us to keep part of the history.
Note: SCD Type 3 is rarely used in organizations.
Disadvantage – Type 3 cannot keep all historical changes when an attribute changes more than once. For example, if James later moves to Texas on December 31, 2010, the California information will be lost.
Example:
Source Table: Customer
Customer_Key 🔑 | Name | City |
--- | --- | --- |
10001 | James | New York |
10002 | Luis | Arizona |
10003 | Alexander | Texas |
If any updates or changes are made in the source table, the SCD Type 3 table captures them as shown below.
Let’s assume James moved from New York to California. The original information is updated, and the records are populated as in the table below, assuming the effective date of the change is January 01, 2010 (a sketch of the update follows the table).
Target Table: Customer_Dim
Customer_Key 🔑 | Name | Original_City | Current_City | Effective_Date |
--- | --- | --- | --- | --- |
10001 | James | New York | California | 01-01-2010 |
10002 | Luis | Arizona | | |
10003 | Alexander | Texas | | |
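A minimal sketch of the Type 3 update for this example: the change is applied by updating the tracking columns in place rather than adding a new row.

```sql
-- Type 3: record the change in dedicated columns; only one level of history is kept.
UPDATE Customer_Dim
SET    Original_City  = 'New York',
       Current_City   = 'California',
       Effective_Date = '2010-01-01'
WHERE  Customer_Key = 10001;   -- James's record from the example above
```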
Validation to be done:
Follow the same checks used for SCD Types 1 and 2 (count, duplicate, transformation, and referential integrity checks), and additionally verify that Original_City, Current_City, and Effective_Date are populated correctly whenever a tracked attribute changes.
When to use SCD Type 3:
Type 3 Slowly Changing Dimension should be used only when it is necessary for the data warehouse to track historical changes and when such changes will occur only a finite (limited) number of times.
Conclusion:
Data warehouse migration validation is just as important as retiring the old technology and moving to the cloud. During the post-migration testing phase, the record counts of the source and target are compared first; then it must be confirmed that the records were transformed during migration as per the business requirements (referring to the mapping documents), since errors here can potentially harm the business.
Then data reconciliation, duplicate checks, NULL checks, and referential integrity checks must be performed to ensure the quality of the data.
Also, read – Oracle to Azure SQL Server Migration Validation

Queries: contact@datagaps.com