
A Step-by-Step Guide to Debugging & Fixing Flaky Tests


One major step in the software development process is continuous integration and continuous delivery (CI/CD), which ensures that code moves smoothly from development to production. However, this vital process can be interrupted by failing tests and, sometimes more problematic, flaky tests.

Flaky tests are tests that produce inconsistent outcomes when run without any changes to the codebase. They might pass one moment and fail the next, disrupting the CI/CD pipeline.

So, in this article, you will learn the common causes of flaky tests, strategies to avoid them, and solutions for fixing them.

Let’s dive right in!

Common Causes of Flakiness

Below are some of the most common causes of flaky tests.

Concurrency Issues

This occurs when tests that utilize multiple threads do not behave consistently each time they're run. This unpredictability is often due to the test's dependency on the timing of specific events, such as the order in which threads execute.

Let’s take, for example, a shared logging service that is used by multiple components.


In the example below, an instance of Logger is used to log messages to a file. The log method appends messages to the specified file synchronously using Node.js's appendFileSync method.

However, when two instances of Logger (loggerA and loggerB) are created to log messages to the same file (transaction.log) and their log methods are called concurrently in the testLoggingConcurrency function, race conditions can arise.

class Logger {
  constructor(file) {
    this.file = file;
    this.fs = require("fs");
  }

  log(message) {
    this.fs.appendFileSync(this.file, message + "\n"); // Synchronously append to file
  }
}

function testLoggingConcurrency(logger1, logger2) {
  logger1.log("Start transaction A");
  logger2.log("Start transaction B");
  logger1.log("End transaction A");
  logger2.log("End transaction B");
}

const loggerA = new Logger("transaction.log");
const loggerB = new Logger("transaction.log");

// Running this in parallel could lead to mixed-up entries in transaction.log
testLoggingConcurrency(loggerA, loggerB);

Reliance on External Dependencies

Dependency on external services such as a database, web service, or third-party API is a common cause of flakiness in tests. Such tests rely on a component outside of the application being tested.

For instance, a test like the one below could result in flaky tests.

// userService.js
const fetch = require("node-fetch");

class UserService {
  async getUser(id) {
    const response = await fetch(`https://api.example.com/users/${id}`);
    const data = await response.json();
    return data;
  }
}

module.exports = UserService;

// userService.test.js
const UserService = require("./userService");

describe("UserService", () => {
  it("should fetch user data", async () => {
    // Arrange
    const userService = new UserService();
    const userId = 1;

    const user = await userService.getUser(userId);

    expect(user).toHaveProperty("id", userId);
    expect(user).toHaveProperty("name");
  });
});

The test is unreliable because it depends on the availability and consistency of https://api.example.com. If the external service goes down, the tests will suddenly start failing.

Timing Challenges

Tests that depend on a specific timing condition to pass may become flaky due to inconsistency in execution time, which is usually influenced by system load or the testing environment.

For instance, let’s consider a test that waits for an element to appear after a specific delay using a hard-coded timeout:

test("element appears after 1 second", (done) => {
setTimeout(() => {
const element = document.querySelector("#my-element");
expect(element).not.toBeNull();
done();
}, 1000); // Waits for 1 second assuming the element will appear within this time
});

The issue with the above code is that if the element takes longer than 1 second to appear for any reason, the test will fail. Additionally, the test always waits a full second, even if the element appears earlier.
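A more robust approach is to poll for the element and resolve as soon as it appears, with a generous upper bound. Below is a minimal sketch; waitForElement is a hypothetical helper written for illustration, and testing libraries such as Testing Library ship equivalent utilities.

// Hypothetical helper: polls for an element and resolves as soon as it appears,
// failing only after a generous upper bound instead of a fixed delay
function waitForElement(selector, timeout = 5000, interval = 50) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeout;
    const timer = setInterval(() => {
      const element = document.querySelector(selector);
      if (element) {
        clearInterval(timer);
        resolve(element);
      } else if (Date.now() > deadline) {
        clearInterval(timer);
        reject(new Error(`Timed out waiting for ${selector}`));
      }
    }, interval);
  });
}

test("element appears when ready", async () => {
  const element = await waitForElement("#my-element");
  expect(element).not.toBeNull();
});

The test now passes the moment the element shows up and only fails after the upper bound, making it both faster and less sensitive to slow environments.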

Resource Leaks and State Contamination

When the state of a system and its resources is not properly managed and isolated between tests, flaky tests are bound to occur. Subsequent tests then behave unpredictably based on state left behind by previous tests.
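As a simple illustration (a contrived example, not from a real codebase), consider two tests sharing a module-level array:

// A module-level value shared by every test in the file
const cache = [];

test("adds an item to the cache", () => {
  cache.push("item");
  expect(cache).toHaveLength(1);
});

test("starts with an empty cache", () => {
  // Fails whenever the previous test runs first: "item" is still in the cache
  expect(cache).toHaveLength(0);
});

The second test only passes when it runs first (for example, in isolation with test.only); run after the first, it always fails. That hidden coupling through shared state is exactly what produces flaky results.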

Diagnosing Flaky Tests

Rooting out flaky tests is necessary if you want a reliable CI/CD pipeline. To do that, you can use the following proven strategies to diagnose the reason behind their flakiness.

Analyzing Test Code and Environment

Analyzing your test code and environment should be one of the first things you do to diagnose flakiness in your tests. Carefully examine your codebase to ensure you're not implementing anti-patterns like duplicate code or tight coupling.

This is a good way to ensure that you don't push flaky tests into the continuous integration process.

Continuous Integration Tools

CI tools can make your tests unreliable if tests run simultaneously in the same environment and interfere with each other, producing unpredictable results. Verify your server settings, as inconsistent configuration and randomized test ordering in your continuous integration setup can also contribute to varying outcomes.

You can also monitor your CI/CD server workload. Tests will run slower or time out if your CI server is busy, making your tests come off as flaky.

It is also important to manage your cache properly. If the cache isn't managed correctly, your tests could be working with stale data, which can throw things off too.
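Assuming Jest is your test runner, both problems can be probed from its configuration. The sketch below forces serial execution and disables Jest's transform cache, which is a slower but useful way to check whether parallelism or stale cache is the culprit.

// jest.config.js
module.exports = {
  maxWorkers: 1, // run test files serially so they cannot interfere with each other
  cache: false,  // disable Jest's transform cache to rule out stale cached data
};

If the flakiness disappears under this configuration, you have strong evidence that the issue is parallel interference or caching rather than the test logic itself.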

Logging and Monitoring

If logging is too heavy, meaning your application produces a lot of log output, it might slow down your application during testing, which can cause timing issues or timeouts in tests that don't appear under normal conditions.

Monitoring tools that check system performance or resource usage can also interfere by consuming the CPU or memory that your tests need to run smoothly.

These are only some of the ways to diagnose flaky tests. Next, let’s look at some strategies to help us fix flaky tests.

Strategies for Fixing Flakiness

Isolating External Dependencies

When your tests rely on services and APIs beyond your control, such as a third-party service or a database, the outcome can be difficult to predict since you're not testing in a fully controlled environment.

However, by isolating these dependencies and replacing them with stubs and mocks, you can avoid testing in an uncontrolled environment, reducing the likelihood of flaky tests.

Let's refer back to the example we discussed in the section on causes of flakiness.

// userService.test.js
const UserService = require("./userService");
jest.mock("node-fetch");
const fetch = require("node-fetch");

describe("UserService", () => {
  it("should fetch user data correctly", async () => {
    const expectedUser = { id: 1, name: "John Doe" };
    fetch.mockResolvedValue({
      json: () => Promise.resolve(expectedUser),
    });

    const userService = new UserService();
    const user = await userService.getUser(1);

    expect(user).toEqual(expectedUser);
    expect(fetch).toHaveBeenCalledWith("https://api.example.com/users/1");
  });
});

In this example, we mock node-fetch to ensure that no HTTP requests are made during the tests. Jest intercepts the fetch call, and a resolved value is returned.

By controlling the output of the fetch function, you can test various scenarios, such as user found, user not found, server errors, and so on, without relying on the actual service.

This greatly improves reliability and reduces instances of flaky tests.
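For example, adding a test to the same describe block, the same mock can simulate a failing service, a scenario that would be awkward to reproduce against the real API:

it("should surface server errors", async () => {
  // Simulate the external service being down
  fetch.mockRejectedValue(new Error("Service unavailable"));

  const userService = new UserService();

  await expect(userService.getUser(1)).rejects.toThrow("Service unavailable");
});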

Ensure Test Independence

When writing your tests, try to make them as standalone as possible. Your tests should be able to run without third-party services or external dependencies. This practice will help you avoid flakiness and spot bugs that cause flaky tests early on.

One way to make your tests standalone is to ensure each test uses its own data, avoiding conflict with other tests. This can be achieved by using setup and teardown routines that create and then clean up test data before and after each test.

Here’s an example of a setup and teardown of a database before and after every test.

// example setup and teardown, using a hypothetical `database` helper
beforeEach(() => {
  // Set up test data
  database.create({ id: 1, name: "John Doe" });
});

afterEach(() => {
  // Clean up test data
  database.deleteAll();
});

Manage Timing and Concurrency

Since timing and concurrency issues are one of the major causes of flaky tests, an easy fix to prevent irregularities in your test outcomes is to use dynamic waits or mock mechanisms in your test scripts. This allows your tests to run accurately even in concurrent environments.

Let’s reference the shared logging service example we had earlier in the causes of flakiness section.

jest.mock("fs");

const fs = require("fs");
const Logger = require("./Logger"); // Assuming Logger class is exported from a module

describe("Logger tests with mocks", () => {
it("should log messages without interleaving", () => {
const loggerA = new Logger("transaction.log");
const loggerB = new Logger("transaction.log");

// Setup the mock for appendFileSync to simply track calls instead of writing to file
fs.appendFileSync.mockImplementation((file, message) => {
console.log(`Mock log to ${file}: ${message}`);
});

loggerA.log("Start transaction A");
loggerB.log("Start transaction B");
loggerA.log("End transaction A");
loggerB.log("End transaction B");

// Check that the mock function was called correctly
expect(fs.appendFileSync.mock.calls).toEqual([
["transaction.log", "Start transaction A\\n"],
["transaction.log", "Start transaction B\\n"],
["transaction.log", "End transaction A\\n"],
["transaction.log", "End transaction B\\n"],
]);
});
});

This is a much better way to write the test for the logging service. In this improved version, we mock the file writes, avoiding direct disk I/O and the concurrency issues that come with it.

Not only does this speed up your tests, it also prevents the test outcome from being affected by real-life system behavior. The function calls are also tracked using Jest’s mock functionality, allowing you to ensure the log messages are being written in the correct order without being affected by concurrency.

Utilize Reliable Data and Environment

To minimize the occurrence of flaky tests, design them to use predefined data within a controlled environment. Consistent datasets and configurations remove a whole class of run-to-run variation.
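For instance, prefer a fixed fixture over randomly generated values. The assertion below is deterministic because every run sees exactly the same input; formatDisplayName is a hypothetical function defined inline so the example is self-contained.

// Hypothetical helper, defined inline for illustration
function formatDisplayName(user) {
  return `${user.name} (#${user.id})`;
}

// Reliable: a fixed fixture that is identical on every run,
// unlike randomly generated data such as Math.floor(Math.random() * 1000)
const fixtureUser = { id: 1, name: "John Doe" };

test("formats the user's display name", () => {
  expect(formatDisplayName(fixtureUser)).toBe("John Doe (#1)");
});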

Leveraging Technology and Tools

Developer tools such as testing frameworks and CI/CD tools can be very helpful in debugging flaky tests. Using the right tools is key to fixing flaky tests because they make your testing process more consistent and controlled.

Automation tools run tests the same way every time, reducing human mistakes. Furthermore, the use of virtualization and containerization technologies gives each test its own clean, controlled environment, avoiding problems from different system setups.

This mitigates potential issues arising from different system configurations, contributing to a more reliable and stable testing process.
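As one concrete option, libraries such as testcontainers for Node.js can spin up a throwaway container per test suite. The sketch below assumes the testcontainers package is installed and Docker is available on the machine; treat it as an outline under those assumptions rather than a drop-in setup.

const { GenericContainer } = require("testcontainers");

describe("with a throwaway Redis instance", () => {
  let container;

  beforeAll(async () => {
    // Start a fresh Redis container just for this suite
    container = await new GenericContainer("redis:7")
      .withExposedPorts(6379)
      .start();
  });

  afterAll(async () => {
    await container.stop(); // tear the environment down so nothing leaks between suites
  });

  it("connects to a clean instance every run", () => {
    const host = container.getHost();
    const port = container.getMappedPort(6379);
    // Point your client at host:port; no state survives from previous runs
    expect(typeof port).toBe("number");
  });
});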

Conclusion

In this article, we explored what causes flaky tests, how to diagnose them with different tools, and how to fix them in our CI/CD process. With this knowledge, you will be able to identify flakiness in your tests, implement effective solutions to fix it, and improve the overall reliability of your testing processes.


Written by Victor Uma

Uma Victor is a software engineer, blockchain developer, and technical writer who loves learning, teaching, and building web tools and applications. He has over four years of experience in web development and has created content ranging from videos to articles to talks, publishing more than 143 pieces of content.