Testing inbound flat files presents unique challenges because the producer of the file is usually a different organization within the enterprise, or an external vendor. As a result, the format and content of the files can vary, since there is no easy way to enforce data type and data quality constraints on flat file data, and issues in that data can cause failures in the consuming process. While file processing requirements differ from project to project, the focus of this use case is to list some of the common checks that need to be performed when validating flat files.
When data is moved using flat files between enterprises, or between organizations within an enterprise, it is important to perform a set of file ingestion validations on the inbound files before consuming their data.
Files are FTPed or copied over to a specific folder for processing. These files usually follow a specific naming convention so that the process consuming them can determine their contents and date. From a testing standpoint, the file name pattern needs to be validated to verify that it meets the requirement.
Example: A government agency receives files from multiple vendors on a periodic basis. The arriving files should follow the naming convention 'CompanyCode_ContentType_DateTimestamp.csv'. However, the files coming in from one specific vendor do not have the correct company name.
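A file name check like the one in this example can be automated with a regular expression. The sketch below is a minimal illustration; the timestamp format (`YYYYMMDDTHHMM`) and the set of known company codes are assumptions, not part of the original requirement.

```python
import re
from datetime import datetime

# Assumed pattern: CompanyCode_ContentType_DateTimestamp.csv
# e.g. ACME_Claims_20240115T0930.csv (timestamp layout is an assumption)
FILENAME_RE = re.compile(
    r"^(?P<company>[A-Za-z0-9]+)_(?P<content>[A-Za-z0-9]+)_(?P<ts>\d{8}T\d{4})\.csv$"
)

def validate_filename(name, known_companies):
    """Return a list of validation errors for the given file name."""
    errors = []
    m = FILENAME_RE.match(name)
    if not m:
        errors.append(f"{name}: does not match naming convention")
        return errors
    # Catch the scenario above: a vendor sending an unrecognized company code
    if m.group("company") not in known_companies:
        errors.append(f"{name}: unknown company code {m.group('company')!r}")
    # Verify the timestamp is a real date/time, not just digits
    try:
        datetime.strptime(m.group("ts"), "%Y%m%dT%H%M")
    except ValueError:
        errors.append(f"{name}: invalid timestamp {m.group('ts')!r}")
    return errors
```

In practice the list of known company codes would come from a reference table rather than being hard-coded in the test.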
Although flat files are generally delimited or fixed width, it is common for them to have a header and a footer. Sometimes these headers include a row count that can be used to verify that the file contains all the expected data.
Some of the relevant checks are verifying that the header and footer are present and well formed, and reconciling the row count or summary values in the header with the detail records.
Example: A financial reporting company generates files whose header contains a summary amount, with the line items providing the detailed split. The sum of the amounts in the line items should match the summary amount in the header.
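The header reconciliation described above can be sketched as follows. The record layout here is hypothetical: a header row `H,<row_count>,<summary_amount>`, detail rows `D,<id>,<amount>`, and a trailer row `T`. Real layouts vary, but the reconciliation logic is the same.

```python
import csv
from decimal import Decimal  # avoid float rounding when summing amounts

def check_header_totals(path):
    """Reconcile detail rows against the header's count and summary amount.

    Assumed layout (hypothetical): first row H,<row_count>,<summary_amount>;
    detail rows D,<id>,<amount>; final trailer row T.
    """
    errors = []
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    header = rows[0]
    details = [r for r in rows[1:] if r and r[0] == "D"]
    expected_count, expected_total = int(header[1]), Decimal(header[2])
    actual_total = sum(Decimal(r[2]) for r in details)
    if len(details) != expected_count:
        errors.append(f"row count {len(details)} != header count {expected_count}")
    if actual_total != expected_total:
        errors.append(f"detail total {actual_total} != header total {expected_total}")
    return errors
```

For very large files, the same check can be done in a streaming fashion instead of loading all rows into memory.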
Files arrive periodically in a specific network folder or FTP location before being consumed by a process. Usually, there are specific requirements that need to be met regarding file arrival time, order of arrival, and retention.
Example: A pharma company gets a set of files from a vendor on a daily basis. The process consuming these files expects the complete set to be available before processing.
- A file that was supposed to arrive yesterday was delayed and came in after today’s file, causing issues because the files were processed out of order.
- After a file gets processed, it is supposed to be moved to a specific directory, retained there for a specified period of time, and then deleted. However, the file did not get copied over.
- Compare the transformed data in the target table with the expected values for the test data.
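Checks like the ones above can be scripted against the file system. The sketch below is illustrative only: the directory names, the required file list, and the 30-day retention period are assumptions standing in for project-specific requirements.

```python
import os
import time

def check_arrival_and_retention(inbox, archive, required_files, max_age_days=30):
    """Hypothetical post-processing checks:
    - every required file for the cycle has arrived in `inbox`
    - nothing in `archive` has outlived the retention period
    """
    errors = []
    present = set(os.listdir(inbox))
    # Completeness: the consuming process expects the full set before it runs
    for name in required_files:
        if name not in present:
            errors.append(f"missing expected file: {name}")
    # Retention: archived files past the retention period should have been deleted
    now = time.time()
    for name in os.listdir(archive):
        age_days = (now - os.path.getmtime(os.path.join(archive, name))) / 86400
        if age_days > max_age_days:
            errors.append(f"{name}: exceeded {max_age_days}-day retention")
    return errors
```

Arrival-order problems like the delayed file in the example are typically caught by comparing the timestamp embedded in each file name against the file's actual arrival time, which this sketch does not attempt.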
ETL Validator comes with Component Test Case and File Watcher which can be used to test Flat Files.
Reduce your data testing costs dramatically with ETL Validator. Download your 30-day free trial now.