Selecting the right tests for a code change with Test Impact Analysis

Test Impact Analysis is a method of determining which tests are most important to run for a given code change. Launchable uses a branch of Test Impact Analysis known as predictive test selection, which lets users create dynamic subsets of the most important tests for a code change in real time. With this capability, you can run a few minutes of each test suite on every `git push` instead of waiting hours for feedback on your changes.

The traditional approach

Traditional Test Impact Analysis uses static code analysis to determine which tests are most likely to be affected by a code change: it analyzes the dependencies in the code to work out which tests should run. One downside of this approach is that it requires tools specific to particular programming languages or frameworks, which makes it harder and more expensive to apply in codebases that mix many languages and frameworks.
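To make the static approach concrete, here is a minimal Python sketch of dependency-based test selection. It assumes a language-specific analyzer has already produced an import graph; the `deps` mapping, the module names, and the `affected_tests` function are all hypothetical, and real tools build and traverse much richer graphs. The sketch reverses the graph and walks it outward from the changed files to find every test that transitively depends on them.

```python
from collections import deque


def affected_tests(changed_files, deps, tests):
    """Return the tests that transitively depend on any changed file.

    deps maps each module to the set of modules it imports. In practice a
    language-specific static analyzer would produce it; here it is hand-written.
    """
    # Reverse the import graph: module -> modules that import it.
    reverse = {}
    for module, imports in deps.items():
        for imported in imports:
            reverse.setdefault(imported, set()).add(module)

    # Breadth-first search outward from the changed files.
    seen = set(changed_files)
    queue = deque(changed_files)
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)

    # Only the test modules reached by the search need to run.
    return sorted(t for t in tests if t in seen)


# Hypothetical project: test_cart.py -> cart.py -> pricing.py
deps = {
    "test_cart.py": {"cart.py"},
    "test_auth.py": {"auth.py"},
    "cart.py": {"pricing.py"},
    "auth.py": set(),
    "pricing.py": set(),
}
tests = ["test_cart.py", "test_auth.py"]
print(affected_tests(["pricing.py"], deps, tests))  # ['test_cart.py']
```

Building `deps` is the language-specific part, and it is exactly the cost that grows when a codebase mixes many languages and frameworks.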

Predictive test selection

In contrast to the traditional approach, Launchable doesn’t use static code analysis at all. Instead, we use machine learning to learn the relationships between code and tests, which enables predictive test selection. The model is trained on data from many test runs: when tests fail, Launchable learns that those tests are more strongly related to the files that were changed. (How long does training take? On average, we can train a model on your code in about 3-4 weeks, though this varies with how often you run your tests.)
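The post doesn’t describe Launchable’s model, so the following is only a toy illustration of the idea that failures strengthen the link between changed files and tests. It assumes a simple per-(file, test) co-failure count with a weak smoothing prior; `CoFailureModel`, `observe`, `score`, and the prior values are hypothetical, and a production model would use far richer features.

```python
from collections import defaultdict

# Weak prior: treat every unseen (file, test) pair as 1 failure in 10 runs.
PRIOR_FAILURES = 1
PRIOR_RUNS = 10


class CoFailureModel:
    """Toy estimate of P(test fails | file changed) from past test runs.

    Illustrative only; not Launchable's actual model.
    """

    def __init__(self):
        self.runs = defaultdict(int)      # (file, test) -> runs where the file changed
        self.failures = defaultdict(int)  # (file, test) -> failures in those runs

    def observe(self, changed_files, results):
        """Record one run; results maps test name -> True if the test failed."""
        for f in changed_files:
            for test, failed in results.items():
                self.runs[(f, test)] += 1
                if failed:
                    self.failures[(f, test)] += 1

    def score(self, changed_files, test):
        """Highest smoothed failure rate for the test across the changed files."""
        best = 0.0
        for f in changed_files:
            key = (f, test)
            rate = (self.failures.get(key, 0) + PRIOR_FAILURES) / (
                self.runs.get(key, 0) + PRIOR_RUNS
            )
            best = max(best, rate)
        return best


model = CoFailureModel()
model.observe(["cart.py"], {"test_cart": True, "test_auth": False})
model.observe(["cart.py"], {"test_cart": True, "test_auth": False})
model.observe(["auth.py"], {"test_cart": False, "test_auth": True})
print(model.score(["cart.py"], "test_cart"))  # 0.25
print(model.score(["cart.py"], "test_auth"))
```

Each failure of `test_cart` after a change to `cart.py` raises that pair’s score, while `test_auth`, which kept passing, drifts toward zero for `cart.py` changes.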

Once the model is trained, it can provide lightning-fast test recommendations for individual code changes.

Figure: Launchable uses AI to determine which tests should be run for code changes.

Launchable is a test recommendation engine. It tells you the right tests to run for your code changes.
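Once a model produces scores, a recommendation engine still has to turn them into a subset that fits a time budget. One simple way to do that, shown here as a hypothetical sketch rather than Launchable’s actual selection logic, is a greedy pass that takes tests from most to least likely to fail while their combined duration stays within a chosen fraction of the full suite. The function name, scores, and durations are all made up for illustration.

```python
def select_subset(scores, durations, budget_fraction):
    """Greedily pick high-scoring tests whose total duration fits a budget.

    scores: test -> predicted likelihood of failure (e.g. from a trained model)
    durations: test -> typical runtime in seconds
    budget_fraction: share of the full suite's duration to spend, e.g. 0.2
    """
    budget = budget_fraction * sum(durations.values())
    chosen, used = [], 0
    # Visit tests from most to least likely to fail; skip any that would overflow.
    for test in sorted(scores, key=scores.get, reverse=True):
        if used + durations[test] <= budget:
            chosen.append(test)
            used += durations[test]
    return chosen


scores = {"test_cart": 0.25, "test_search": 0.12, "test_auth": 0.08}
durations = {"test_cart": 50, "test_search": 130, "test_auth": 120}
print(select_subset(scores, durations, 0.2))  # ['test_cart']
```

Greedy selection is easy to reason about but not optimal in general; it is one plausible design, not the only one.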

The impact of our approach

What’s amazing about predictive test selection is that many projects can get 80-90% of the value of running an entire test suite with just 10-20% of the tests. To quantify this, for each customer we estimate the odds of finding at least one failure in a given run when only a percentage of the tests is run. (Why only one failure? Because a single failure is enough for a developer to start investigating what went wrong. Finding more failures is better, but finding at least one is what matters.)
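The measurement described above can be sketched as follows, assuming past runs are recorded as a ranked test order plus the set of tests that actually failed. `detection_rate` and this data layout are hypothetical. Only runs with at least one failure count, and a run counts as caught if any failing test appears in the top `percent` of its ranked list.

```python
def detection_rate(history, percent):
    """Share of failing runs in which the top `percent` (an integer, 0-100)
    of ranked tests would have caught at least one failure.

    history: list of (ranked_tests, failed_tests) pairs, where ranked_tests is
    the order a model would run the tests in and failed_tests is the set of
    tests that actually failed in that run.
    """
    failing_runs = [(ranked, failed) for ranked, failed in history if failed]
    if not failing_runs:
        return 0.0
    caught = 0
    for ranked, failed in failing_runs:
        k = (percent * len(ranked) + 99) // 100  # ceiling of percent% of the tests
        # A single failure in the subset is enough to flag the run.
        if any(test in failed for test in ranked[:k]):
            caught += 1
    return caught / len(failing_runs)


history = [
    (["a", "b", "c", "d", "e"], {"a"}),
    (["b", "a", "c", "d", "e"], {"c"}),
    (["c", "a", "b", "d", "e"], set()),  # passing run: ignored
    (["d", "e", "a", "b", "c"], {"d", "e"}),
]
print(detection_rate(history, 20))  # 0.6666666666666666
print(detection_rate(history, 60))  # 1.0
```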

Once the model is trained, we can plot the odds of finding at least one failure on a chart that looks something like this:

Figure: With machine learning, you can get most of the value out of a test suite while running only the tests that matter most for a code change.

Readers familiar with machine learning will notice that this resembles a ROC curve, although its X-axis is test duration rather than false-positive rate. The Y-axis shows Launchable’s confidence that it will detect a failing run, and the X-axis shows the percentage of total test duration that needs to run to reach that confidence. The dotted line approximates baseline performance without Launchable, and the red line shows how Launchable performs.
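A curve of this shape could be computed along the following lines, assuming per-test durations and a ranked order for each past run. This hypothetical sketch records, for each failing run, the share of total duration that has elapsed when the first failing test finishes, then reports the share of failing runs flagged at each X value. Running the same function on a baseline ordering (for example, the suite’s default order) would produce the comparison line. All names and numbers are illustrative.

```python
def time_to_first_failure(ranked, failed, durations):
    """Fraction of a run's total duration elapsed when the first failing test
    finishes, if tests run in `ranked` order; None if nothing failed."""
    total = sum(durations[t] for t in ranked)
    elapsed = 0
    for t in ranked:
        elapsed += durations[t]
        if t in failed:
            return elapsed / total
    return None


def confidence_curve(runs, durations, percents):
    """(X, Y) points: X is percent of test duration run, Y is the share of
    failing runs that would have been flagged within that duration."""
    needed = [time_to_first_failure(r, f, durations) for r, f in runs]
    needed = [n for n in needed if n is not None]  # keep failing runs only
    points = []
    for p in percents:
        flagged = sum(1 for n in needed if n * 100 <= p)
        points.append((p, flagged / len(needed) if needed else 0.0))
    return points


durations = {"a": 1, "b": 1, "c": 2, "d": 4}  # hypothetical runtimes
runs = [
    (["a", "b", "c", "d"], {"a"}),
    (["b", "a", "c", "d"], {"c"}),
    (["c", "d", "a", "b"], {"d"}),
    (["a", "b", "c", "d"], set()),  # passing run
]
print(confidence_curve(runs, durations, [10, 20, 50, 100]))
```

With these toy numbers, the ranking flags a third of the failing runs within 20% of the duration and all of them by 100%.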

For this particular customer, Launchable reaches 90% confidence by running only 20% of the full test suite’s duration.

Imagine what this would look like with your own test suite:

  • Reduce a 30-minute pre-merge suite to 6 minutes
  • Reduce a 5-hour nightly suite to 1 hour on each pull request

Interested in trying this out with your own test suite? Sign up for the beta to get access to Launchable before anyone else!