Data Validation in Snowflake

Migrating to Snowflake Cloud Data Warehouse
Migrating from a legacy data warehouse such as Netezza to the cloud-based Snowflake data warehouse involves multiple steps, and data validation is key to the success of the migration project. Datagaps Data Flow can be used to validate data at each step of the migration as well as in end-to-end data validation scenarios. If you are looking for Snowflake testing tools – Try Data Flow free for 14 days
Step 1: Extract data from the legacy data warehouse
Data is typically extracted in CSV or Parquet format and moved to a landing zone in AWS S3. Depending on the data volumes, AWS offers multiple options for moving the files to S3. Once the data is in AWS S3, validations need to be performed to confirm that all of the data was extracted and migrated correctly. Because this step involves few transformations, these tests are typically one-to-one comparisons between the tables in the legacy data warehouse and the files in the AWS S3 landing zone:
– Compare table to file row counts
– Compare data encoding
– Compare data completeness
– Compare data values
A sample test case diagram is shown to the right. The JDBC component can be used to read data from the legacy data warehouse, the File component can be used to read data from AWS S3, and the Data Compare component can then compare the two datasets. Sample output from a Data Compare component is shown below.
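Outside of Data Flow, the same table-to-file checks can be scripted by hand. The following is a minimal sketch, assuming hypothetical connection details, table names, and bucket paths, that compares row counts and data values between a legacy table and its extracted CSV file in S3.

```python
# Minimal sketch of a table-to-file comparison between a legacy warehouse
# and an extracted CSV in S3. Connection strings, table names, and bucket
# paths are hypothetical placeholders.
import pandas as pd
import pyodbc  # assumes an ODBC driver/DSN for the legacy warehouse is configured

# Read the source table from the legacy data warehouse.
legacy_conn = pyodbc.connect("DSN=netezza_dsn;UID=etl_user;PWD=***")
source_df = pd.read_sql("SELECT * FROM sales.orders", legacy_conn)

# Read the extracted file from the S3 landing zone
# (pandas uses s3fs under the hood when given an s3:// URL).
target_df = pd.read_csv("s3://landing-zone/orders/orders.csv")

# 1. Compare row counts.
assert len(source_df) == len(target_df), "Row count mismatch"

# 2. Compare data values after aligning column order and sorting on the key.
source_sorted = source_df.sort_values("order_id").reset_index(drop=True)
target_sorted = target_df[source_df.columns].sort_values("order_id").reset_index(drop=True)
mismatches = source_sorted.compare(target_sorted)
print(f"Rows with mismatched values: {len(mismatches)}")
```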


Step 2: Transform data
Transformations such as data type conversions are performed in this step. Data curation can also be done to improve data quality before the data is loaded into Snowflake. Before curating the data, it is important to profile it and run data quality tests to identify issues. Data Flow can be used to perform these tasks (a hand-coded sketch of similar checks follows this list):
– Compare data between landing zone and staging (curated) zone in S3
– Use Data Profile and Data Rules components to identify data quality issues
– Curate data and sync to the staging zone
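As a rough illustration of the kinds of rules a Data Profile or Data Rules component applies, the sketch below profiles a curated file in the staging zone for null rates, duplicate keys, domain violations, and type-conversion failures. The column names, key field, and rules are hypothetical.

```python
# Minimal data-quality profile of a curated file in the staging zone.
# Column names, the key field, and the rules are hypothetical placeholders.
import pandas as pd

df = pd.read_csv("s3://staging-zone/orders/orders_curated.csv")

report = {
    # Null rate per column, to catch incomplete extracts or bad parsing.
    "null_rate": df.isna().mean().round(4).to_dict(),
    # Duplicate primary keys introduced by the transformation step.
    "duplicate_keys": int(df.duplicated(subset=["order_id"]).sum()),
    # Simple domain rule: order amounts should never be negative.
    "negative_amounts": int((df["order_amount"] < 0).sum()),
    # Data type conversion check: order_date should parse as a date.
    "unparseable_dates": int(pd.to_datetime(df["order_date"], errors="coerce").isna().sum()),
}

for rule, result in report.items():
    print(rule, "->", result)
```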


Step 3: Copy data to Snowflake
Assuming that the Snowflake tables have already been created, the next step is to copy the data into Snowflake. Use the VALIDATE function to check the loaded data files and identify any errors (a sketch of the load and validation statements appears at the end of this step). Data Flow can be used to compare the data between the staging zone (S3) files and Snowflake after the load:
– Compare table to file row counts
– Compare data encoding
– Compare data completeness
– Compare data values
– End-to-end data validation (Legacy data warehouse to Snowflake)
Data Flow can perform end-to-end data validation in a single test, as shown to the right. A single data flow can compare data between the legacy data warehouse and S3 as well as between the legacy data warehouse and Snowflake.
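For reference, here is a minimal sketch of the load and post-load validation executed from Python via the Snowflake connector. The account, stage, file format, and table names are hypothetical placeholders.

```python
# Minimal sketch of loading staged files into Snowflake and validating the load.
# Account, stage, file format, and table names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345", user="etl_user", password="***",
    warehouse="LOAD_WH", database="ANALYTICS", schema="SALES",
)
cur = conn.cursor()

# Copy the curated files from an external stage pointing at the S3 staging zone.
cur.execute("""
    COPY INTO orders
    FROM @staging_zone_stage/orders/
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    ON_ERROR = 'CONTINUE'
""")

# VALIDATE returns the rows that failed during the most recent COPY into the table.
cur.execute("SELECT * FROM TABLE(VALIDATE(orders, JOB_ID => '_last'))")
for error_row in cur.fetchall():
    print(error_row)

# Row count check against the table after the load.
cur.execute("SELECT COUNT(*) FROM orders")
print("Rows loaded:", cur.fetchone()[0])

cur.close()
conn.close()
```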

Step 4: Modify reports to use Snowflake
While Snowflake provides JDBC/ODBC drivers and supports most commonly used SQL functions, there will be some differences in how reports are developed and executed against the legacy data warehouse versus Snowflake. Once reports have been modified to use Snowflake, thorough testing needs to be performed to compare the reports running against the legacy data warehouse with their equivalents running against Snowflake (a hand-rolled report data comparison is sketched at the end of this step):
– Compare report data
– Compare report layout
– Compare report performance
– Stress test reports in the new environment by simulating concurrent user loads
– Compare security
Datagaps BI Validator is a no-code BI Testing Tool that can help automate all these tests for the supported BI tools.
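For teams scripting a quick ad-hoc check before adopting a dedicated tool, the sketch below runs the same report query against both warehouses and compares the result sets. The connection details and the report query are hypothetical.

```python
# Minimal sketch comparing the data behind a report between the legacy
# warehouse and Snowflake. Connection details and the report query are
# hypothetical placeholders.
import pandas as pd
import pyodbc
import snowflake.connector

REPORT_QUERY = "SELECT region, SUM(order_amount) AS revenue FROM sales.orders GROUP BY region"

# Run the report query against the legacy data warehouse.
legacy_conn = pyodbc.connect("DSN=netezza_dsn;UID=report_user;PWD=***")
legacy_df = pd.read_sql(REPORT_QUERY, legacy_conn)

# Run the same query against Snowflake.
sf_conn = snowflake.connector.connect(
    account="xy12345", user="report_user", password="***",
    warehouse="BI_WH", database="ANALYTICS", schema="SALES",
)
sf_df = pd.read_sql(REPORT_QUERY, sf_conn)

# Normalize column names and row order before comparing the result sets.
legacy_df.columns = [c.lower() for c in legacy_df.columns]
sf_df.columns = [c.lower() for c in sf_df.columns]
legacy_df = legacy_df.sort_values("region").reset_index(drop=True)
sf_df = sf_df.sort_values("region").reset_index(drop=True)

# An empty result means the report data matches between the two warehouses.
print(legacy_df.compare(sf_df))
```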
Try BI Validator free for 14 days for your Snowflake BI testing needs – Download Now