What Are Data Quality Dimensions?

“A recent survey by TDWI found that 66% of organizations are looking for ways to improve data quality and trust. Data Validation using Data Quality testing tools such as Datagaps DataOps suite is essential to ensure trust in your data and analytics.”

 

Data Quality dimensions provide a way to categorize data validation rules and measure data quality. Seven dimensions are commonly used for this purpose.

 

Completeness refers to the existence of all required attributes in the population of data records. A data element is either:

  1. Always required, or
  2. Required based on the condition of another data element.

Example:

  1. Person record with a null First Name
  2. A person record is missing a value for marital status: the Married (Y/N) field should contain a non-null value of ‘Y’ or ‘N’ but is populated with null instead.
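As a minimal illustration (not DataOps syntax), a completeness rule like the ones above boils down to a null check. The column names below are hypothetical:

```python
import pandas as pd

# Hypothetical person records; first_name and married are assumed column names.
people = pd.DataFrame({
    "first_name": ["Alice", None, "Carol"],
    "married":    ["Y", "N", None],
})

# Completeness: required attributes must be populated.
bad = people[people["first_name"].isna() | people["married"].isna()]
print(f"{len(bad)} of {len(people)} records fail the completeness rule")
```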

Conformity means the data follows a set of standard data definitions such as data type, size, and format. All data values conform to the requirements of their respective fields.

Example:

  1. Date of Birth is listed as “26/05/1990” but should be in the format “mm/dd/yyyy”
  2. Zip Code contains letters but it should be numeric
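A conformity rule of this kind can be sketched as a simple format check; the sample values and patterns below are assumptions for illustration only:

```python
import re

# Hypothetical values pulled from a source system.
records = [
    {"dob": "26/05/1990", "zip": "9021A"},
    {"dob": "05/26/1990", "zip": "90210"},
]

dob_ok = re.compile(r"^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/\d{4}$")  # mm/dd/yyyy
zip_ok = re.compile(r"^\d{5}$")                                        # numeric, 5 digits

for r in records:
    errors = []
    if not dob_ok.match(r["dob"]):
        errors.append("dob not in mm/dd/yyyy format")
    if not zip_ok.match(r["zip"]):
        errors.append("zip is not a 5-digit number")
    print(r, errors or "conforms")
```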

Validity is determined by how closely data values correspond to reference tables, lists of golden values documented in metadata, value ranges, and so on. All data values are valid in relation to their reference data.

Example:

  1. Country Code should be a valid value from the reference data for countries
  2. Age for a Person should be less than 100 years old
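For illustration, a validity rule reduces to a lookup against reference data plus a range check; the reference set and field names below are made up:

```python
# Hypothetical reference data for valid country codes.
valid_country_codes = {"US", "CA", "GB", "IN"}

records = [
    {"country_code": "US", "age": 34},
    {"country_code": "XX", "age": 102},
]

for r in records:
    issues = []
    if r["country_code"] not in valid_country_codes:
        issues.append("country_code not in reference list")
    if not (0 <= r["age"] < 100):
        issues.append("age outside the valid range")
    print(r, issues or "valid")
```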

Accuracy refers to the degree to which information accurately reflects what’s being described. It can be measured against either original documents or authoritative sources and validated against defined business rules.

Example:

  1. US Zip Codes should match a list of legal US postal codes
  2. Person name is spelled incorrectly

Uniqueness refers to the singularity of records and/or attributes. The objective is a single (unique) recording of each piece of data: a data element is unique and has no duplicate values.

Example:

  1. Each Person should only have one record, but there are two instances of the same Person with different identifiers or spellings.
  2. SSN should be unique, but there are two Person records that have the same social security number.
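A uniqueness rule can be illustrated with a simple duplicate count over a hypothetical SSN column:

```python
from collections import Counter

# Hypothetical person records keyed by SSN.
ssns = ["123-45-6789", "987-65-4321", "123-45-6789"]

duplicates = [ssn for ssn, count in Counter(ssns).items() if count > 1]
print("duplicate SSNs:", duplicates)  # uniqueness rule fails if this list is non-empty
```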

Consistency means data across all systems reflects the same information and is in sync across the enterprise: the absence of difference when comparing two or more representations of a thing against its definition.

Example:

  1. Employee status is terminated but pay status is active
  2. Employee start date cannot be later than the Employee end date
  3. N number for a Person record must be the same across systems
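A consistency rule of this kind can be sketched as a cross-field (or cross-system) comparison; the field names and values below are illustrative:

```python
from datetime import date

# Hypothetical employee record combining fields from two systems.
employee = {
    "employment_status": "TERMINATED",  # from the HR system
    "pay_status": "ACTIVE",             # from the payroll system
    "start_date": date(2021, 4, 1),
    "end_date": date(2020, 12, 31),
}

issues = []
if employee["employment_status"] == "TERMINATED" and employee["pay_status"] == "ACTIVE":
    issues.append("terminated employee still has an active pay status")
if employee["end_date"] < employee["start_date"]:
    issues.append("end date precedes start date")
print(issues or "consistent")
```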

Timeliness references whether the information is available when it is expected and needed.

Example:

  1. For quarterly reporting, data must be up to date by the time the data is extracted
  2. Last Review Date for the policy must be within the last three years
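A timeliness rule such as the review-date example can be sketched as a recency check; the cut-off and field below are illustrative:

```python
from datetime import date, timedelta

# Hypothetical policy record; the rule is that the last review must fall within three years.
last_review_date = date(2019, 6, 30)
three_years = timedelta(days=3 * 365)

is_timely = date.today() - last_review_date <= three_years
print("policy review is up to date" if is_timely else "policy review is overdue")
```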

How To Measure Data Quality Using Data Quality Dimensions?

The Datagaps DataOps suite automatically computes a Data Quality Score for each rule based on the ratio of good to bad records. This score is rolled up by Data Quality Dimension at the table, data model, and system level.

A sample dashboard showing the Data Quality trend is shown below:

Figure: Data Quality Dimensions and their scores at the system level

DataOps Data Quality comes with the following rule types to make it easy to define Data Validation rules:

  • Customized SQL Query to identify good or bad records.
  • Use Query Builder to provide a SQL expression for identifying bad records.
  • Automatically identifies duplicate records. Useful for measuring the Uniqueness of data elements.
  • Checks for referential integrity of data. Useful for measuring Data Integrity.
  • Checks if data in a column conforms to a list of acceptable values. Useful for measuring the Validity of data attributes.
  • Configurable rules for identifying data quality issues without the need to write any SQL queries.
  • A rule similar to the SQL Query rule, but using a ‘with’ clause (CTE), which makes it more efficient to execute.
  • Compare data from the previous snapshot of the data to the latest values.
  • DataOps dataflows and test cases can be used to compare data across different systems in the enterprise, which is useful for ensuring data consistency.
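To make the SQL-based rule types above concrete, here is a rough, tool-agnostic sketch of a “bad records” query being executed and counted. It uses an in-memory SQLite table with made-up names and is not the DataOps rule syntax:

```python
import sqlite3

# Illustrative only: a "bad records" rule expressed as SQL, similar in spirit to the
# SQL Query rule type above. Table and column names (person, first_name) are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, first_name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Alice"), (2, None), (3, "Carol")])

total = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
bad = conn.execute("SELECT COUNT(*) FROM person WHERE first_name IS NULL").fetchone()[0]
print(f"rule 'first_name must not be null': {bad} bad of {total} records")
```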

How To Compute Data Quality Score?

The Data Quality Score allows us to quickly understand the current state of our data and more easily compare quality over time. The Data Quality Score is a percentage calculated as follows:

Data Quality Score = [1 – (# of bad records / # of total records)] x 100
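As a quick worked example of that formula (independent of any particular tool):

```python
def data_quality_score(bad_records: int, total_records: int) -> float:
    """Data Quality Score = [1 - (bad / total)] x 100, as defined above."""
    if total_records == 0:
        return 100.0  # assumption: an empty population is treated as fully passing
    return (1 - bad_records / total_records) * 100

print(data_quality_score(50, 1000))  # 95.0 -> 50 bad records out of 1,000 give a 95% score
```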

Figure: Trend of Data Quality Scores at the system level

Conclusion

Automated Data Quality testing can be done using Data Validation rules. The Data Quality Score provides a means to measure and track the quality of your enterprise data at rest and in motion. Data Quality dimensions help categorize the data validation rules into meaningful buckets. DataOps Data Quality is a simple data validation testing tool that can be used to automate the Data Quality testing process.


How Do You Automate Big Data Testing? Everything To Know

Big Data has passed the tipping point. It’s no longer a fringe interest practiced only by data scientists. The Big Data industry is on track to be worth $77 billion by 2023.

There’s a saying in programming and data analysis – “Garbage In, Garbage Out.” That is to say, your data analytics are only as good as the data that fuels them. This is why Big Data testing is so important.

Testing your Big Data by hand rather defeats the purpose of data-driven strategies in the first place. It might not even be possible to assess all of your data, depending on how much there is. That’s where Big Data testing tools come into play.

Guide To Big Data Testing

Back in 2018, 92% of businesses reported wanting to incorporate Big Data automation testing tools by 2020. Clearly, this is something that’s been on tech-savvy business owners’ minds for some time. Luckily, with today’s Big Data testing tools, this is more feasible than ever for businesses of all sizes.

Data testing is fairly simple and straightforward for routine data applications. Repetitive business practices like forms are highly predictable. A simple program would likely be enough to catch any potential errors in structured data.

Much business data, however, is unstructured or semi-structured. It is estimated that around 80% of the data collected by businesses is either unstructured or semi-structured, such as JSON.

Here are some steps you can take to incorporate an automated Cloud Big Data testing tool in your data pipeline.

Incorporate an ETL Testing Tool

At the beginning of your data pipeline, it’s highly recommended you incorporate an extract, transform, and load (ETL) testing tool. An ETL testing tool can be configured to monitor an incoming data stream for data relevant to your business.

Once this data is gathered, an ETL testing tool will transform the data into a format suitable for your Big Data cloud platform. Once it’s clean, it’s loaded into your data analytics environment.
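As a rough sketch of that extract-transform-load flow, with a small validation check built in (the CSV source, field names, and rejection rule are made up for illustration):

```python
import csv
import io

# A minimal, illustrative extract-transform-load step with a built-in validation check.
source = io.StringIO("order_id,amount\n1,10.50\n2,\n3,7.25\n")

extracted = list(csv.DictReader(source))                      # extract
transformed = [
    {"order_id": int(r["order_id"]), "amount": float(r["amount"])}
    for r in extracted if r["amount"]                         # transform + drop bad rows
]

# Validate before loading: row counts should reconcile, minus the rejected records.
rejected = len(extracted) - len(transformed)
assert len(transformed) + rejected == len(extracted)
print(f"loaded {len(transformed)} rows, rejected {rejected}")
```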

Implement A Big Data Testing Strategy

You’ll also want to put a solution in place to make sure your Big Data testing tools are functioning properly. This presents certain challenges when dealing with the monumental amount of data that Big Data involves.

A Big Data testing strategy usually involves putting conditions in place to make sure you’re getting the data you need. Some examples of common data constraints could include:

  • Conformity
  • Accuracy
  • Duplication
  • Consistency
  • Completeness

Trying to assess every byte of data could slow your Big Data analytics down to a crawl, however. You’ll also want to decide on a testing scope that is representative of the entire body of data. You might test every 10th entry, for instance, and have a subroutine in place if errors rise above a certain rate.
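A minimal sketch of that sampling idea, assuming a 10-record stride and a 5% error tolerance (both arbitrary choices for illustration):

```python
# Illustrative sampling strategy: validate every 10th record and raise a flag
# if the error rate in the sample crosses a threshold. The is_valid check is a stand-in.
def is_valid(record: dict) -> bool:
    return record.get("value") is not None

records = [{"value": None if i % 7 == 0 else i} for i in range(1000)]
sample = records[::10]                      # every 10th entry

errors = sum(1 for r in sample if not is_valid(r))
error_rate = errors / len(sample)
if error_rate > 0.05:                       # assumed 5% tolerance
    print(f"error rate {error_rate:.1%} exceeds threshold: trigger a deeper check")
else:
    print(f"sampled error rate {error_rate:.1%} is within tolerance")
```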


Structure Each Source

To get the most accurate Big Data testing, you should configure each data entry point to make sure the data is structured correctly. Say you wanted to collect data from your blog for analysis. Examples of data you might collect from blog posts might include:

  • Publication date
  • Wordcount
  • Time published
  • SEO metadata
  • Social shares

You should spend some time figuring out where you want to collect data from when you’re compiling your data testing strategy. Once you’ve got a list of where your data is coming from, you should then think about what data you want to harvest from that particular source.

Taking the time to answer these questions will help you set up your ETL tool properly. When all of these steps have been handled correctly, your Big Data pipeline can truly deliver automated insights!

Consolidate Data

Some of your data streams are likely to contain repeat data or monitor the same assets. Leaving all that data unstructured is going to bog down your data analytics platform significantly. You might want to implement an additional abstraction layer for further processing.

Say you’re analyzing temperature data from a list of cities. This data might be entered as a pair, as is often the case, with the name of the city acting as the key and the temperature as the value.

Depending on where this data is coming from, these values could be returned at a specified rate. Or, if it’s coming from a scientific sensor, it might arrive as a continuous stream of data. You’ll want to determine the scope of the data you want returned to your testing platform, for starters.

Setting up an additional layer for each city makes this problem relatively simple to solve. All of the data for that particular city would be returned to that city’s specific layer. You can put additional constraints in place to make sure the data is in a usable and useful form.

Say you put a filter in place to only return the highest and lowest temperature from a particular city. Now it doesn’t matter if it’s a continuous stream from a sensor or collected periodically. It ensures that all of your data will work nicely together.

It also makes it so that your Big Data testing platform can receive data from however many sources you like. This is essential for making your Big Data testing solution scalable and adaptable to any solution you apply it to!
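A small sketch of the per-city consolidation described above, keeping only the lowest and highest temperature per city regardless of how often readings arrive (the readings themselves are made up):

```python
from collections import defaultdict

# Hypothetical (city, temperature) pairs arriving from different sources at different rates.
readings = [("Austin", 31.2), ("Oslo", -4.0), ("Austin", 35.7), ("Oslo", 2.5), ("Austin", 29.9)]

# One "layer" per city that keeps only the lowest and highest value seen so far.
extremes = defaultdict(lambda: [float("inf"), float("-inf")])
for city, temp in readings:
    low, high = extremes[city]
    extremes[city] = [min(low, temp), max(high, temp)]

for city, (low, high) in extremes.items():
    print(f"{city}: low {low}, high {high}")
```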

These are just a few things to keep in mind that illustrate how the right data testing tools and the proper foresight set you and your data-driven business up for success. They ensure your data is formatted properly no matter how much there is or where it’s coming from.

Are You Looking For Big Data Testing Tools?

Big Data is quickly making science fiction become science fact. Disciplines like machine learning and artificial intelligence were still in the realm of sci-fi even 10 years ago. Now they’re available for anybody to benefit from!

If you’re ready to find out how data-driven tools like Big Data testing can empower you and your business, Sign Up for a Demo today!


Bookmarks Development and Testing

Tableau, Power BI, OBIEE, Cognos, QlikView, Looker, MicroStrategy, and other evolving BI visualization technologies have pushed business intelligence reporting to the next level in recent years. In earlier years, we mostly employed Excel and web pages for reporting; at that stage, the functionality and deployment options were limited.

In this new era of BI reporting, visualization tools such as Tableau and Power BI are exploring new possibilities of visualization, broadening the reach of management reporting and analysis to many levels.

In the process of adding many web-based functionalities to Power BI’s low-code environment, bookmarks play an important role in enabling attractive and user-friendly features.

Bookmarks are one of the most powerful features in Power BI. A bookmark captures the state of a report page: the state of filters and slicers, visual state (the visibility of visuals selected in the Selection pane), sorting applied to visuals, and Focus or Spotlight mode.

What is the Data option in Bookmarks?

The Data option in Bookmarks is enabled by default. It captures the data state applied to the filters, slicers, and any other visuals on the report page.

 

How do you uncheck the Data option in a Bookmark?

Click the three dots beside the bookmark; you will see the Data, Display, Current Page, and All Visuals options checked.

Data tip: When bookmarks are applied to a few visuals and filters on the same page, the visuals remain filtered from one bookmark to the next. To avoid this behavior, update the bookmarks with the Data option unchecked.

How Do ‘All Visuals’ and ‘Selected Visuals’ Work in Bookmarks for a Power BI Report?

When applying a bookmark in a report, ‘All Visuals’ is selected by default because the bookmark is expected to apply to all the visuals on the page.

Example:

Suppose a report has an option to show the ‘Last Refresh Date’, defined as MAX(Date_table[Posted_Date]), and there should also be a Reset Filters option on the page to reset all applied filters for a better user experience.

Here, the Reset Filters button is created and applied through a button action associated with a bookmark. The bookmark is updated at the default state of the filters, but there are other visuals that do not need to be captured in that state.

In this scenario, All Visuals should be avoided and only Selected Visuals should be used.

How does the Current Page option work in Bookmarks for a Power BI Report?

By default, bookmarks are updated to capture the state of the current page; if this option is unchecked, the bookmark can also capture changes on another page.

The Current Page option is a unique feature that allows a bookmark to be applied to another page, hidden or visible, which will eventually change that page based on the current selection.

Magic Combination of Power BI:

Buttons, bookmarks, and the Selection pane are the magic combination that enables many features in Power BI reports. Buttons, images, and shapes have six options for applying an action.

Button text can be modified and stored for four different states:

  1. Default State – Text assigned to the default state is always visible.
  2. On Hover – Text applied to the hover state is visible only on mouse hover.
  3. On Press – Text applied to the pressed state is visible when the button is clicked.
  4. Disabled – Text assigned to this state is visible when the button is disabled.

 Available actions: 

  1. Back
  2. Bookmark
  3. Page Navigation
  4. Drill Through
  5. Q&A
  6. Web URL

1. Back – The Back action is a powerful option that can be used to return to the previous page after navigating to any other page.

2. Bookmark – Linking a created bookmark to a button through this action applies the bookmark when the button is clicked.

3. Page Navigation – The Page Navigation action applied to a button navigates to another page, either a fixed page or one determined by a conditional expression. Navigating to a fixed page is straightforward, and using an expression to drive navigation is also a useful feature.

4. Drill Through – The drill-through feature can be applied through this action.

5. Q&A – The Q&A experience can be opened with this button action for a better user experience; for example, a button named ‘Queries’ can be used for this action.

6. Web URL – A given web URL can be used for navigating to an external site, or an expression based on a column can be used to supply the URL.

Bookmarks Slicer Panel: 

Creating a dynamic slicer pane in Power BI reports:

Step 1: Insert an image, button, or shape for the ‘On’ action.

Step 2: Select a shape to hold the filters within.

Step 3: Fix the filters inside the shape.

Step 4: Select the visuals in the Selection pane and set up the ‘On’ button and the ‘Off’ button.

Creating forms  and Sales funnel forms with Bookmarks:

With the help of bookmark features, a sales funnel form for capturing user requirements can be created with low code in Power BI reports. With the Current Page option in bookmarks, we can achieve many use cases that until now were only possible with WordPress, web applications, or forms built in coding environments.

Let’s discuss the Forms and Sales Funnel creation for user requirements in Power BI reports in detail in the next section.

Maintenance and testing of Bookmarks:

Bookmark maintenance and testing become tedious after multiple bookmarks have been created on each page of the report.

Common issues in bookmark maintenance and testing:

  1. During development, if the team decides to change the button or shape attached to a bookmark, the bookmark assigned behind it becomes idle and creates confusion later during maintenance.
  2. In regression testing, bookmark testing takes time and requires working through the reports and pages.
  3. It is difficult to check whether a bookmark action has been applied to a button and works as expected.
  4. When bookmarks are applied with data or filters, they need to be tested to confirm which filters were captured. Bookmarks applied to selected visuals need to be carefully checked to confirm which selection they apply to.

Bookmark testing:

Bookmark testing is not a straightforward task. The following steps should be followed:

  1. To test a bookmark’s state, the button, image, or shape assigned to the bookmark has to be checked to confirm that the action has been applied in the Power BI action settings.
  2. The action may be turned on, but the bookmark might not be assigned, or not assigned properly.
  3. If the bookmark is assigned to the button’s action, the bookmark state has to be played and tested.
  4. Some visuals, forms, buttons, and images in the Selection pane may need to be deleted as a result of bookmark development and revisions.
  5. Hidden slicers in a selected form have an impact on the data presented in the report. If the data displayed in a table contains thousands of entries, and slicers are applied to it, data presentation errors may arise in each visual.

If you’re looking for BI Testing Tools to automate your report testing, BI Validator can help. Please contact us for a demo.


Slicers Testing in Power BI Report

One of our clients decided to change their reporting platform to Power BI and started rebuilding their reports. However, they had a large number of reports. The implementation process began with the planning of the development phase. They embarked on the journey by creating eight sets of reports using multiple PBIX files. 

One of the eight reports had 60 pages with 4 to 10 slicers on each page and navigation between the pages. A closer look at the plan reveals that the number of days or hours spent on development is only one portion of the project. A small amount of time in the plan was allocated towards testing and fixing the issues. However, in reality, testing is more painful than development and the fixing of issues took longer than originally planned. Every round of testing may result in changes to the functionality of the report.

Why is testing a report more painful than Development?

Development is a wheeled voyage, but testing and verifying each aspect requires walking back and covering the entire journey. There are numerous parts to test, including functionality, regression, user interface, upgrading, migration, and so on. Why do we need to have a proper plan for testing the reports when the report developer has completed the development as per the requirement?

Power BI reports, like any other software development project, necessitate careful planning and testing. If reports are not tested, data issues can lead to a loss of trust in the data they display. Let’s discuss only the testing of slicers in this post, because it may appear that testing slicers is simple and won’t take up much time.

How to use Slicers in a Power BI report?

Case 1: Slicers used as a list with selection boxes or horizontal tabs. We may slice this data or report by Organization Name, and the elements in the list are easy to see.

Pros:

  • The user can inspect all of the Slicer’s elements.
  • Any element can be readily selected by the user.
  • Simple to put to the test

Cons: 

  • This Slicer takes up a lot of room in the report.
  • It is only user-friendly when the list contains a small number of items.

Case 2: Slicers used as the dropdown in Power BI

   

We can fit a larger number of items into a limited space using the dropdown type of slicer.

Pros: 

  1. The dropdown menu allows the user to simply navigate the list.
  2. It takes up very little space.

Cons:

  1. It’s difficult to tell whether it’s single or multiple selections.
  2. It’s difficult to see if the search option for multiple selections is activated.
  3. It’s difficult to tell if the list is in the correct sequence.
  4. It’s difficult to see if the Slicer has been assigned the correct field.

We must look at the header when creating dropdown slicers (or any other slicer) because the header provided in the slicer header option is left-aligned by default, which looks unusual; and even if we apply a name to the slicer header, the underlying field can only be confirmed by clicking on it.

We must check the dropdown to see if all of the fields in the slicer are available and not filtered at the visual, page, or report level.

Case 3: Slicer used for Navigation function:

Slicers are typically used with selected bookmarks or with a prepared navigation list. This navigation list is used to apply navigation from a button or image/icon that has an action assigned.

The slicer navigation should be tested against the initial list of required fields or column values, as well as the action applied to it.

Case 4: Date Slicers for the data in a date range 

A date slicer is used to define a range over the date fields in a dataset.

There are two ways to use a date slicer:

1.  A date field assigned to a slicer will display the Slicer’s calendar selection option. In this case, we can assign a single column and define the range using the between option in the Slicer’s header.

2. We should use two slicers to describe the data if we have two date columns, one that displays the “from date” and the other that displays the “to date.” The range is defined by setting the From date slicer to the ‘After’ option and the To date slicer to the ‘Before’ option to specify the date range.

Adjust the slider to define the range.

Case 5: Slicer with a DAX Query

A slicer can also be driven by a DAX query, such as producing the list of items in the slicer using the SWITCH function.

Testing of the Slicers

Slicer data validation

Slicer data should be validated individually to confirm that the list of items in the field is visible and matches the data source and that the data is not affected by any filter applied in the report.
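As a rough illustration of this comparison (outside of any particular tool such as BI Validator), the values shown in a slicer can be diffed against the distinct values in the source; both sets below are placeholders:

```python
# Illustrative check: the items displayed in a slicer should match the distinct
# values in the underlying data source. Both lists are placeholders here.
slicer_items = {"North", "South", "East"}
source_values = {"North", "South", "East", "West"}  # e.g. SELECT DISTINCT region FROM sales

missing_from_slicer = source_values - slicer_items
unexpected_in_slicer = slicer_items - source_values
print("missing from slicer:", missing_from_slicer or "none")
print("unexpected in slicer:", unexpected_in_slicer or "none")
```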

Slicer format validation 

Slicer layout and format should conform to the report requirements and report development standards of the organization. For example, the color, font, x, and y positions should be validated. 

RLS based Slicer validation

The values displayed in a slicer may change based on the role of the user viewing the report. If there is a requirement to show limited values in the slicer based on the RLS security, the data in the slicer should be validated for different roles. 

Slicer data sorting validation

Sorting applied to the slicer data is important for the end-user to be able to easily use the slicer. The sorting should conform to the requirements. 

Validating data in other visuals based on the Slicer selection

Based on the slicer selections, filters should be applied to the visuals on the page automatically. In the case of Sync Slicers, filters should be applied to the visuals in all selected pages. 

Slicer performance validation

Slicer rendering should be within the expected SLA for the report performance. As the slicer selections are changed, the report should be refreshed within the expected SLA. 

Regression testing of Slicers

Any change in the data model or report can result in a regression issue for the slicer over a period of time. Regression testing of the slicers should be performed to ensure that the slicers are working as expected. 

Interested in learning how to automate the slicer testing using BI Validator? Reach out to the Datagaps team.

Datagaps

Datagaps was established in 2010 with the mission of building trust in enterprise data and reports. Datagaps provides software for ETL Data Automation, Data Synchronization, Data Quality, Data Transformation, Test Data Generation, and BI Test Automation. We are an innovative company focused on providing the highest customer satisfaction, and we are passionate about data-driven test automation. Our flagship solutions, ETL Validator, Data Flow, and BI Validator, are designed to help customers automate the testing of ETL, BI, database, data lake, flat file, and XML data sources. Our tools support data warehousing projects and BI platforms such as Snowflake, Tableau, Amazon Redshift, Oracle Analytics, Salesforce, Microsoft Power BI, Azure Synapse, SAP BusinessObjects, and IBM Cognos.

 

Queries: contact@datagaps.com