
DataFlow is a powerful application that lets you automate a data migration process from end to end. DataFlow provides different kinds of components for different purposes; one of them is the Code component. It supports three languages: Spark SQL, Scala, and Python. Using the Code component, you can write queries or code on top of the datasets created in the current DataFlow, which gives you the flexibility to perform DML operations on the existing datasets. This document covers reading data from a REST API, converting it into a dataset, and comparing it with another dataset read from a file.
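As a quick illustration of what the Code component lets you do, the sketch below runs a Spark SQL query against a dataset that has already been registered in the DataFlow. It is only a sketch: code_ds is the dataset created later in this example, and the filter column is a placeholder.

// Minimal sketch: query a dataset registered earlier in the DataFlow.
// "code_ds" is created later in this example; "CUST_CITY" is a placeholder column name.
val filtered = spark.sql("SELECT * FROM code_ds WHERE CUST_CITY = 'Los Angeles'")
// Register the result so downstream components can use it as a new dataset.
filtered.createOrReplaceTempView("filtered_ds")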
Please go through the following steps:
- On the left menu, select ‘DataFlows’.
- Click the ‘New Dataflow’ button in the top right corner.
- In the ‘New Dataflow’ dialog, fill in the details and save.
- Name: Any name to identify the dataflow.
- Livy server: The application comes with a default Livy server; select any configured Livy server.
- During the first run, the full list of components is displayed (as shown in the image below). Select the Code component from the Processor bucket.
- A new Code component opens with the ‘Properties’ tab selected. Fill in the details and then go to the next step –
- Name: Any name.
- Dependency: Not required, as this is the first component.
- Description: Optional. You can give any useful information about the Code component.
- Dataset name: Name of the dataset you want to create using the code. You can give multiple names, separated by commas.
- In the Code step, select the kind. Scala is selected by default; the Code component supports Scala, Python, and SparkR. Clicking the ‘Sample API code’ button at the top right populates sample code that reads data from a REST API. A sample is provided below.
Code:
// Needed for the Seq(...).toDS() conversion below.
import spark.implicits._
// Fetch the REST API response as a JSON string.
val jsonStr = scala.io.Source.fromURL("http://192.168.6.42:9080/DataPrepRest/api/v1.0/templates/table?containerId=81&userName=sh&password=******&url=jdbc:oracle:thin:@192.168.6.76:1521:orcl&schema=sh&table=customers").mkString
// Parse the JSON string into a DataFrame.
val df = spark.read.json(Seq(jsonStr).toDS())
// Register the DataFrame as a temp view so it becomes the dataset "code_ds".
df.createOrReplaceTempView("code_ds")
// Cache the data, since downstream components will reuse it.
df.cache()
After executing the code above, a dataset named code_ds is created. You can write multiple such blocks of code to create multiple datasets; these datasets should be listed in the Properties tab, as mentioned in the 5th step.
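For example, a second block in the same Code component could register an additional dataset. This is only a sketch: the URL and dataset name below are hypothetical placeholders.

// Hypothetical second REST call; the URL and view name are placeholders.
val jsonStr2 = scala.io.Source.fromURL("http://example.com/api/v1.0/orders").mkString
val df2 = spark.read.json(Seq(jsonStr2).toDS())
// Register a second dataset; list both dataset names (comma-separated) in the Properties tab.
df2.createOrReplaceTempView("code_ds_orders")
df2.cache()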
- Create a new File component. Fill in the details and move to the next step –
- Name: Any name.
- Data Source: The list of file data sources is shown here. Select a data source.
- Dependency: In the present case, there is no need to give any dependency.
- Description: Optional. Write some basic information about the component.
- Dataset name: For File components, only one dataset is created. A default name is populated based on the component name; you can enter your desired dataset name.
- In the File step, fill in the details as shown below –
- File Name: The file you want to read. Enter the filename manually or select a file in the Files panel on the right.
- Encode: Optional (file encoding type).
- Options: Spark file read options. Some important options are pre-populated with default values; a rough sketch of the equivalent Spark read call follows this list. Please go through the following link for further information –
https://docs.databricks.com/data/data-sources/read-csv.html.
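The options in this step correspond to standard Spark CSV read options. The sketch below shows roughly what such a read looks like in Spark; it is only an illustration, and the file path, option values, and dataset name are placeholders, not the component's actual implementation.

// Illustrative only: the path, option values, and view name are placeholders.
val fileDs = spark.read
  .option("header", "true")        // first line contains column names
  .option("delimiter", ",")        // field separator
  .option("inferSchema", "false")  // keep all columns as string
  .csv("/data/customers.csv")
// Register it under the dataset name given in the Properties tab.
fileDs.createOrReplaceTempView("file_ds")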
- code_ds is the dataset created by reading the REST API data in the Code component. By default, all of its data types are treated as string. If you want to change the data types or column names of any dataset, you can use the Attribute component. Create a new Attribute component using Add component (a sketch of the equivalent Spark operation follows these steps).
Fill the details –
- Name: Any name.
- Source Dataset: The dataset whose data types and column names you want to change. In the current example, we select code_ds, the output of the Code component.
- Dependency: As this component can run only after the code_ds dataset is created by the Code component, you must add the Code component to the dependency list.
- Description: Optional (description of the component).
- Dataset Name: Output dataset name. After the data types and column names are converted, a new dataset is created as the output. The default is the component name.
- In the Rename step, enter the desired column names and data types.
- Save and Run the component.
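As a rough idea of what this conversion amounts to in Spark, the sketch below casts and renames columns of code_ds. It is only an illustration: the column names, target types, and output view name are placeholders, not the Attribute component's actual code.

// Illustrative sketch: cast string columns to the desired types and rename them.
import org.apache.spark.sql.functions.col
val typedDs = spark.table("code_ds")
  .withColumn("CUST_ID", col("CUST_ID").cast("long"))   // placeholder column and type
  .withColumnRenamed("CUST_FIRST_NAME", "first_name")   // placeholder rename
// Register the converted data under the output dataset name (placeholder here).
typedDs.createOrReplaceTempView("attribute_ds")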
- Now click Add component and select ‘Data Compare’ from the Data Quality bucket, as shown below –
- In the ‘Data Compare’ component, you need to give two datasets as input; the comparison is between these two datasets. Fill in the details –
- Name: Any name.
- Dataset A: In this example we select the output of the Attribute component.
- Dataset B: In this example we select the output of the File component.
- Dependency: In this example we consume datasets from the File component and the Attribute component, so give both of them as dependencies.
- Compare type: The comparison types you want to run.
- Description: Description of the component.
- Dataset Name: A default name is populated.
- In the Mapping step, dataset A and dataset B are mapped by column order by default. If required, you can click the ‘Remap by Name’ button to remap the columns by name. You can also select unique keys, in which case the comparison is based on those keys; multiple key columns are allowed. Move to the next step once the changes are done.
- Run the component. Each component executes as a set of statements; the Data Compare component contains a larger number of statements, and the progress of execution can be seen at the bottom in the ‘Run’ tab. After the run completes, you can see the component results as shown in the following images. At the bottom, you can see the failed and passed statements. These statements cover the duplicate calculation, Only in Dataset A, Only in Dataset B, Differences, and so on; there is one statement per calculation. Clicking a statement's link shows its details (a conceptual sketch of these calculations follows the next step).
- Clicking the difference count statement opens a pop-up window. Please check the following image.
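Conceptually, these statements correspond to standard set-style comparisons between the two datasets. The sketch below expresses them in Spark purely as an illustration; the dataset names match this example, but the key column is a placeholder and this is not the component's actual implementation.

// Conceptual sketch of the Data Compare calculations (not the product's implementation).
// "attribute_ds" and "file_ds" stand for dataset A and dataset B; "CUST_ID" is a placeholder key.
import org.apache.spark.sql.functions.col
val a = spark.table("attribute_ds")
val b = spark.table("file_ds")
// Duplicate calculation: rows that occur more than once within dataset A.
val duplicatesInA = a.groupBy(a.columns.map(col): _*).count().filter("count > 1")
// Only in Dataset A / Only in Dataset B.
val onlyInA = a.except(b)
val onlyInB = b.except(a)
// Differences: rows whose key matches but whose other columns differ.
val differences = a.alias("a")
  .join(b.alias("b"), Seq("CUST_ID"))
  .filter(a.columns.filter(_ != "CUST_ID")
    .map(c => s"a.`$c` <> b.`$c`")
    .mkString(" OR "))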
- The design of the dataflow is now complete. This is a one-time step; you can run the dataflow whenever you want just by clicking the ‘Run dataflow’ button at the top of the dataflow window. The following window then opens –
This image is built from the dependencies you defined and shows the execution order of the components. Each color indicates the progress of a component –
- Green: Successfully completed and the status is Passed.
- Blue: In queue.
- Yellow: Running.
- Red: Completed, but the status is Failure.

Datagaps was established in 2010 with the mission of building trust in enterprise data and reports. Datagaps provides software for ETL Data Automation, Data Synchronization, Data Quality, Data Transformation, Test Data Generation, and BI Test Automation. An innovative company focused on providing the highest customer satisfaction, we are passionate about data-driven test automation. Our flagship solutions, ETL Validator, DataFlow, and BI Validator, are designed to help customers automate the testing of ETL, BI, Database, Data Lake, Flat File, and XML data sources. Our tools support Snowflake, Tableau, Amazon Redshift, Oracle Analytics, Salesforce, Microsoft Power BI, Azure Synapse, SAP BusinessObjects, IBM Cognos, and other platforms used in data warehousing and BI projects.