Data Validation for Snowflake


Migrating to Snowflake Cloud Data Warehouse

Migrating from a legacy data warehouse such as Netezza to a cloud-based Snowflake data warehouse involves multiple steps, and data validation is key to the success of such migration projects. Datagaps Data Flow can be used to validate data at each step of the migration as well as in end-to-end data validation scenarios.

Step 1: Extract data from Legacy data warehouse

Data is typically extracted into CSV or Parquet format and moved to a landing zone in AWS S3. Depending on data volumes, AWS offers multiple options for moving the files to S3. Once the data has been moved to S3, validations need to be performed to ensure that all of the data was properly extracted and migrated. Since there is little transformation in this step, these tests are typically one-to-one comparisons between the tables in the legacy data warehouse and the files in the S3 landing zone.

– Compare table to file row counts
– Compare data encoding
– Compare data completeness
– Compare data values
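The checks above can be sketched in plain Python. This is a minimal illustration, not the Data Flow implementation: the sample rows, file contents, and function names are assumptions, and in practice the source rows would come from a JDBC query while the CSV would be read from S3.

```python
import csv
import hashlib
import io

def row_fingerprints(rows):
    """Return a row count and a set of per-row hashes for a dataset.

    Hashing each row lets us compare completeness and values without
    assuming both sides return rows in the same order.
    """
    hashes = set()
    count = 0
    for row in rows:
        # Normalize each field to a stripped string before hashing so
        # trivial whitespace differences do not cause false mismatches.
        key = "|".join(str(field).strip() for field in row)
        hashes.add(hashlib.sha256(key.encode("utf-8")).hexdigest())
        count += 1
    return count, hashes

def compare_extract(source_rows, csv_text):
    """Compare rows fetched from the legacy warehouse against the
    contents of the CSV file landed in S3."""
    src_count, src_hashes = row_fingerprints(source_rows)
    tgt_count, tgt_hashes = row_fingerprints(csv.reader(io.StringIO(csv_text)))
    return {
        "row_counts_match": src_count == tgt_count,
        "missing_in_target": src_hashes - tgt_hashes,
        "unexpected_in_target": tgt_hashes - src_hashes,
    }

# Hypothetical data: two rows extracted from the warehouse vs. the landed file.
source = [(1, "Alice"), (2, "Bob")]
landed = "1,Alice\n2,Bob\n"
result = compare_extract(source, landed)
print(result["row_counts_match"])  # True when the extract is complete
```

Comparing hashes rather than raw rows keeps memory per row small and makes the check order-independent, which matters because extracts rarely preserve row order.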

A sample test case diagram is shown to the right. The JDBC component can be used to read data from the legacy data warehouse, and the File component can be used to read data from AWS S3. Finally, the Data Compare component can be used to compare the two datasets. Sample output for a Data Compare component is shown below.

Data comparison test case
Output of data comparison

Step 2: Transform data

Transformations such as data type conversions can be performed in this step. Data curation can also be done to improve data quality before the data is loaded into Snowflake. Before curating the data, it is important to profile it and run data quality tests to identify any quality issues. Data Flow can be used to perform these tasks.

– Compare data between landing zone and staging (curated) zone in S3
– Use Data Profile and Data Rules components to identify data quality issues
– Curate data and sync to the staging zone 
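To illustrate the kind of statistics a profiling step produces, here is a minimal Python sketch. The column names and the "empty string means null" convention are illustrative assumptions; the Data Profile component's actual metrics are richer than this.

```python
import csv
import io

def profile(csv_text):
    """Compute simple per-column statistics: null count, non-null count,
    and distinct-value count. Empty strings are treated as nulls, a
    common convention for CSV extracts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    stats = {}
    for row in reader:
        for col, val in row.items():
            s = stats.setdefault(col, {"nulls": 0, "non_nulls": 0, "distinct": set()})
            if val is None or val.strip() == "":
                s["nulls"] += 1
            else:
                s["non_nulls"] += 1
                s["distinct"].add(val)
    # Convert the distinct sets into counts for reporting.
    return {c: {"nulls": s["nulls"], "non_nulls": s["non_nulls"],
                "distinct": len(s["distinct"])}
            for c, s in stats.items()}

# Hypothetical landing-zone extract with one missing email value.
data = "id,email\n1,a@x.com\n2,\n3,b@x.com\n"
report = profile(data)
print(report["email"])  # {'nulls': 1, 'non_nulls': 2, 'distinct': 2}
```

Profiles like this make data quality issues (unexpected nulls, low cardinality, duplicates) visible before curation rules are written.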

Data Profile component
Data Rules component

Step 3: Copy data to Snowflake

Assuming that the Snowflake tables have been created, the last step is to copy the data to Snowflake. Use Snowflake's VALIDATE function to validate the loaded data files and identify any errors. Data Flow can be used to compare the data between the staging zone (S3) files and Snowflake after the load.

– Compare table to file row counts
– Compare data encoding
– Compare data completeness
– Compare data values
– End-to-end data validation (Legacy data warehouse to Snowflake)
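A cheap post-load completeness check compares row counts and a numeric column checksum between the staged file and the rows read back from the loaded table. The sketch below uses hypothetical data and names; in practice the `loaded_rows` side would come from a Snowflake query rather than an in-memory list.

```python
import csv
import io

def summarize(rows, amount_index):
    """Return (row_count, total_of_amount_column) for a row iterator."""
    count, total = 0, 0.0
    for row in rows:
        count += 1
        total += float(row[amount_index])
    return count, round(total, 2)

def check_load(staged_csv, loaded_rows, amount_index=1):
    """Compare aggregates of the staging-zone file against the rows
    read back from the target table after the load."""
    staged = summarize(csv.reader(io.StringIO(staged_csv)), amount_index)
    loaded = summarize(loaded_rows, amount_index)
    return {"staged": staged, "loaded": loaded, "match": staged == loaded}

# Hypothetical staged file and the rows read back after the load.
staged_file = "1001,19.99\n1002,5.00\n"
rows_in_snowflake = [("1001", "19.99"), ("1002", "5.00")]
print(check_load(staged_file, rows_in_snowflake)["match"])  # True
```

Aggregate checks like this are fast because they avoid moving full datasets; full value-level comparison can then be reserved for tables where the aggregates disagree.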

Data Flow can also be used to perform end-to-end data validation in a single test, as shown to the right. A single data flow can compare data between the legacy data warehouse and S3 as well as between the legacy data warehouse and Snowflake.

End-to-end test case

Step 4: Modify reports to use Snowflake

While Snowflake provides JDBC/ODBC drivers and supports most commonly used SQL functions, there are going to be some differences between the way reports are developed and executed in the legacy data warehouse and in Snowflake. Once these changes are made, thorough testing needs to be performed between the reports using the legacy data warehouse and the equivalent reports using Snowflake.

– Compare report data
– Compare report layout
– Compare report performance
– Stress test reports in the new environments by simulating concurrent user loads
– Compare security
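The report-data comparison can be sketched as a cell-by-cell diff of the two reports' tabular output. The report data below is a made-up example; a BI testing tool would also handle layout, formatting, and pagination, which this sketch ignores.

```python
def diff_reports(legacy_rows, snowflake_rows):
    """Return a list of (row_index, col_index, legacy_value, new_value)
    for every cell that differs; an empty list means the reports match."""
    diffs = []
    for i, (old, new) in enumerate(zip(legacy_rows, snowflake_rows)):
        for j, (a, b) in enumerate(zip(old, new)):
            if a != b:
                diffs.append((i, j, a, b))
    # Row-count drift is also a failure, reported as a sentinel entry.
    if len(legacy_rows) != len(snowflake_rows):
        diffs.append(("row_count", len(legacy_rows), len(snowflake_rows), None))
    return diffs

# Hypothetical report outputs: the migrated report differs by one value.
legacy = [("East", 120), ("West", 95)]
migrated = [("East", 120), ("West", 94)]
print(diff_reports(legacy, migrated))  # [(1, 1, 95, 94)]
```

Reporting the exact cell that changed (rather than a pass/fail flag) makes it much easier to trace a regression back to a SQL function or rounding difference introduced during the migration.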

Datagaps BI Validator is a no-code BI testing tool that can help automate all of these tests for the supported BI tools.
