March 08, 2024

How to Manage Test Data in Your Test Automation Project

Although “test data” is a commonly used term in test automation, how to manage that data is a subject often overlooked. As automation engineers, we typically use property files or Excel sheets to store the test data needed for automation, but this might not always be the best solution.

The importance of maintaining and managing test data cannot be overstated, as it is crucial for reducing maintenance overhead in automated software tests. Without proper planning, maintaining your test data can become quite challenging, regardless of the type of data you use in your tests, which could include URLs, credentials, or API keys.

By properly managing your test data, you ensure that you spend more time conducting the actual tests than maintaining the data.

In this article, I share some strategies you can implement as you define or revisit your test data management strategy.

Let’s dive right in!

Identify And Categorise Your Test Data

You can categorise your test data in two ways: based on the access level and the data type.

Levels of Access

Test data used in test automation projects can typically be categorised by how widely it needs to be accessible. The usual levels are global, suite, and test script.

Global level data: Data at this level is accessible from every part of the project. Login credentials are a typical example, since almost every script needs them to access the application under test.
Suite level data: This is data common to the tests in a test suite, typically created before the suite runs and cleaned up afterwards. For instance, if you need multiple users with different roles to run your tests, you would create these users before the execution. This comes in handy when you need to execute your tests across different environments: no manual intervention is needed, as all the data is created before execution, and once the execution is finished, you can clean up the generated data.
Test script level data: This is data that is only needed for a specific test. A fitting example would be the invalid credentials used to validate that users are unable to log in with them.
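The three levels above could be laid out in a Java project roughly as follows. This is only a sketch: the class names (GlobalTestData, UserSuiteData, InvalidLoginTestData), the URL, and the sample values are hypothetical, and the suite-level setup merely records user names instead of calling a real application.

```java
import java.util.ArrayList;
import java.util.List;

/** Global level: constants readable from every test in the project (hypothetical values). */
final class GlobalTestData {
    static final String BASE_URL = "https://staging.example.com";
    static final String DEFAULT_USER = "qa-user";

    private GlobalTestData() {}
}

/** Suite level: data created before the suite runs and removed afterwards. */
final class UserSuiteData {
    private final List<String> createdUsers = new ArrayList<>();

    // A real suite would create the users in the application; this sketch only records names.
    void setUp(String... roles) {
        for (String role : roles) {
            createdUsers.add(role + "-user");
        }
    }

    // Returns a copy so callers cannot modify the suite's own list.
    List<String> users() {
        return new ArrayList<>(createdUsers);
    }

    // Clean up everything the suite created.
    void tearDown() {
        createdUsers.clear();
    }
}

/** Test script level: data that only the invalid-login test needs. */
final class InvalidLoginTestData {
    static final String USERNAME = GlobalTestData.DEFAULT_USER;
    static final String INVALID_PASSWORD = "not-the-real-password";

    private InvalidLoginTestData() {}
}

class AccessLevelsDemo {
    public static void main(String[] args) {
        UserSuiteData suite = new UserSuiteData();
        suite.setUp("admin", "viewer");
        System.out.println(GlobalTestData.BASE_URL + " " + suite.users());
        suite.tearDown();
    }
}
```

In a framework such as JUnit or TestNG, the setUp and tearDown calls would normally sit in the framework's before-suite and after-suite hooks.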

Types of Data

Test data can be broadly categorised as static or dynamic, based on its type.

Static Data

Data fitting into this category typically remains unchanged regardless of the number of executions. Using static data can make test execution faster and more reliable since you’re running tests on a known data set.

Static data can come in two flavors: existing data in the system under test that can be reused across multiple test cases and predefined test data, which is reused to create new records in the system.

A good example of the former would be verifying the search functionality of an application or validating how data is presented on the user interface. The latter would be more suitable in a scenario where you need to create multiple purchase orders with the same list of items. In this case, you can use the same data set repeatedly, and it remains static relative to the automation suite.
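The purchase order scenario could be sketched like this. StaticOrderItems, the item names, and describeOrder are hypothetical; the point is only that one fixed list is reused for every order a test creates.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class StaticOrderItems {
    // Predefined, unchanging item list shared by every purchase order test.
    static final List<String> ITEMS =
            Collections.unmodifiableList(Arrays.asList("Item-A", "Item-B", "Item-C"));

    private StaticOrderItems() {}

    // Builds a simple description of an order that uses the shared static items.
    static String describeOrder(String poNumber) {
        return poNumber + ": " + String.join(", ", ITEMS);
    }

    public static void main(String[] args) {
        // Two different orders, same static item list.
        System.out.println(describeOrder("PO-0001"));
        System.out.println(describeOrder("PO-0002"));
    }
}
```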

Dynamic Data

If the application you’re testing requires a unique value for a certain scenario on every run, that scenario is a good candidate for dynamic data.

Dynamic data is data that’s generated during the test execution process. So, instead of running with the same data set every time, you can use dynamically generated values as inputs for your test.

For example, suppose you need to validate a user registration form and generate a unique username for each execution. You would simply dynamically generate the username during execution to ensure its uniqueness.
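One way to generate such a username is to append a random UUID to a fixed prefix. This is a minimal sketch; DynamicTestData and uniqueUsername are hypothetical names, and a timestamp or counter would work as well if your application limits username length.

```java
import java.util.UUID;

final class DynamicTestData {
    private DynamicTestData() {}

    /** Returns a username that is, for practical purposes, different on every call. */
    static String uniqueUsername(String prefix) {
        // A random UUID with the dashes removed gives a 32-character hex suffix.
        String suffix = UUID.randomUUID().toString().replace("-", "");
        return prefix + "-" + suffix;
    }

    public static void main(String[] args) {
        // Each run prints a different value.
        System.out.println(uniqueUsername("user"));
    }
}
```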

The main benefit of dynamic data over static data is that it simulates how users would interact with your software in the real world. In the real world, users provide new, unique data on each visit, and by generating new test data on the fly, tests gain more variability and validity.

Both static and dynamic data are known to the automation suite. However, the system under test may also generate data that your test suite isn’t aware of. Confusing?

Let’s consider an example.

Suppose you are working on a flight booking application, and part of your testing sequence is booking a flight. To validate the booking, you would need to search the application using the generated booking number, which your execution suite isn’t aware of because the application generated it during the test.

This results in a discrepancy between the state of the system under test and the awareness of the execution suite.

You can resolve this by programmatically reading the data from the system under test, either through the user interface or an API. In the context of our example, you could extract the booking reference, for instance by scraping it from the user interface, and then use it to search for and validate the booked flight.
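The flow can be sketched as below. FakeBookingService is a hypothetical in-memory stand-in for the flight booking application; in a real suite, bookFlight and findPassenger would be UI interactions or API calls, and the reference format would be whatever the application produces.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the flight booking application under test. */
final class FakeBookingService {
    private final Map<String, String> bookings = new HashMap<>();
    private int counter = 0;

    // The system, not the test, decides what the booking reference is.
    String bookFlight(String passenger) {
        String reference = "BK-" + (++counter);
        bookings.put(reference, passenger);
        return reference;
    }

    // Returns the passenger for a reference, or null if no such booking exists.
    String findPassenger(String reference) {
        return bookings.get(reference);
    }
}

class BookingFlowDemo {
    public static void main(String[] args) {
        FakeBookingService app = new FakeBookingService();
        // Capture the system-generated value at the moment it is produced.
        String reference = app.bookFlight("Ada Lovelace");
        // Reuse the captured value for the follow-up search and validation.
        String passenger = app.findPassenger(reference);
        System.out.println(reference + " -> " + passenger);
    }
}
```

The key design choice is that the test stores the reference in a variable as soon as the system returns it, so later steps never rely on a hard-coded value.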

By properly categorising test data into levels and types, you can maximize the efficiency and validity of your automated tests.

Deciding Where Each Data Set Fits in Your Suite

As a rule of thumb, I prefer to decouple test data from the scripts to enhance maintainability. When the data does need updating, which should be rare, there is only one place to change it.

One disadvantage of this approach, however, is that there is no direct way to figure out the test scripts that consume the data. To mitigate this, you can use the “find usage” feature in your Integrated Development Environment (IDE), such as IntelliJ or Eclipse.

On the other hand, if you want to see the mappings between data sets and test scripts directly in the data file, you can add a comment block listing the IDs of the test scripts that consume a particular data set. Since the compiler ignores comments, though, it is a good idea to define some guidelines or standard practices to keep the mappings accurate.

Here is an example. PurchaseOrderData is your data file containing data required to create a purchase order in the application under test.

/** Dependency: TC-1, TC-2, TC-3 */
class PurchaseOrderData {
    // Java identifiers cannot contain spaces, so the field is named poNumber.
    String poNumber = "PO-0001";
    String supplier = "Supplier-1";
    String buyer = "Buyer-1";
    double price = 12.5;
    int quantity = 1000;
}

Here, data is stored using Java, which is one way to store automation data. Other methods could include Excel sheets, JSON files, and other language data structures.
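As a point of comparison, the same purchase order values could live in a property file and be read with java.util.Properties. The sketch below parses the file contents from a string so that it stays self-contained; PurchaseOrderProperties and the key names are hypothetical, and a real project would load an actual file from disk or the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PurchaseOrderProperties {
    private PurchaseOrderProperties() {}

    // Parses property-file text (key=value lines) into a Properties object.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked if it does.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // Stand-in for the contents of a hypothetical purchase-order.properties file.
        String fileContents = String.join("\n",
                "poNumber=PO-0001",
                "supplier=Supplier-1",
                "buyer=Buyer-1",
                "price=12.5",
                "quantity=1000");
        Properties data = parse(fileContents);
        // Property values are always strings, so numeric fields need explicit parsing.
        int quantity = Integer.parseInt(data.getProperty("quantity"));
        System.out.println(data.getProperty("poNumber") + " x " + quantity);
    }
}
```

The trade-off compared with a Java class is that the file can be edited without recompiling, but every value arrives as a string and the compiler no longer catches a misspelled key.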

Takeaways

The optimal approach to planning your test data is to first identify the different levels and types of data you currently have or are planning to incorporate as test data.
Determine where to store the data based on your accessibility needs.
Derive a model for your data and articulate it in your test project.

MagicPod is a no-code AI-driven test automation platform for testing mobile and web applications designed to speed up release cycles. Unlike traditional “record & playback” tools, MagicPod uses an AI self-healing mechanism. This means your test scripts are automatically updated when the application’s UI changes, significantly reducing maintenance overhead and helping teams focus on development.
