As automated test suites grow in size, it may become impractical to run every test for every change. Because of this, many teams choose to run longer test suites less frequently, delaying crucial feedback for developers. This in turn drives up cycle times because developers only learn about the true ramifications of their changes later in the development cycle and must spend additional time reworking changes.
One way to combat this is to select a subset of tests from the larger test suite and run it more frequently, for example on each push to a source code repository. We call this manual test selection. While manual test selection does increase how often tests are run, the downside is that it uses a static subset of tests that isn’t necessarily the best set to run for a given change. In other words, you’re running the same tests for every change, not the set of tests that is most relevant to it.
Into this gap steps Predictive Test Selection: a way of using machine learning to choose the highest value tests to run for a specific change.
Predictive Test Selection is a branch of what is commonly known as Test Impact Analysis.
Test Impact Analysis is the practice of automating the selection of which tests to run for a given code change based on their expected value. Code changes come in all shapes and sizes: some are minor tweaks, whereas others are more cross-cutting. Test Impact Analysis tries to find the “needle in the haystack” for every change to minimize test execution time without sacrificing quality. Traditionally, Test Impact Analysis has used static code analysis to map dependencies and select the right tests to run. (For more information, see “The Rise of Test Impact Analysis” on MartinFowler.com.)
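To make the traditional, dependency-based approach concrete, here is a minimal sketch in Python. The dependency map, the file names, and the function select_tests_static are all hypothetical; a real tool would derive the map from static analysis rather than hard-coding it.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: test name -> source files it depends on.
# A static analysis tool would normally generate this; it is hard-coded here.
TEST_DEPENDENCIES: Dict[str, Set[str]] = {
    "test_login": {"auth.py", "session.py"},
    "test_checkout": {"cart.py", "payment.py"},
    "test_profile": {"auth.py", "profile.py"},
}


def select_tests_static(changed_files: Set[str]) -> List[str]:
    """Return the tests whose dependencies overlap the changed files."""
    return sorted(
        test
        for test, deps in TEST_DEPENDENCIES.items()
        if deps & changed_files  # non-empty intersection means "impacted"
    )


if __name__ == "__main__":
    print(select_tests_static({"auth.py"}))
```

A change to auth.py selects only the two tests that depend on it, while a change to a file that no test depends on selects nothing.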
Predictive Test Selection is a new approach to Test Impact Analysis that uses machine learning to dynamically select which tests to run based on the characteristics of a code change. To achieve this, historic test results and information about the changes that were tested are used to train a machine learning model. The model learns the relationships between the characteristics of code changes and which tests passed and failed, enabling a high-quality prediction of which tests to run.
Then, before running tests, the continuous integration server asks the model for a subset of tests specifically selected for the changes being tested. Finally, test results from every run are continually fed into the model to keep it up to date.
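The train, select, and feed-back loop described above can be sketched roughly as follows. This is not Launchable’s model: ToyTestSelector is a hypothetical, frequency-based stand-in for a real machine learning model, and it uses only one characteristic of a change (which files it touched). It is meant only to show the shape of the loop.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


class ToyTestSelector:
    """Frequency-based stand-in for a trained model (illustrative assumption)."""

    def __init__(self) -> None:
        # (changed file, test name) -> [failure count, run count]
        self._stats: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
        self._tests: Set[str] = set()

    def record_run(self, changed_files: Iterable[str], results: Dict[str, bool]) -> None:
        """Feed one test run back in; results maps test name -> True if it passed."""
        files = list(changed_files)
        for test, passed in results.items():
            self._tests.add(test)
            for path in files:
                stat = self._stats[(path, test)]
                stat[1] += 1
                if not passed:
                    stat[0] += 1

    def score(self, changed_files: Iterable[str], test: str) -> float:
        """Highest observed failure rate of the test across the changed files."""
        rates = []
        for path in changed_files:
            failures, runs = self._stats.get((path, test), (0, 0))
            if runs:
                rates.append(failures / runs)
        return max(rates, default=0.0)

    def select(self, changed_files: Iterable[str], budget: int) -> List[str]:
        """Return up to `budget` tests, most likely to fail first."""
        files = list(changed_files)
        ranked = sorted(self._tests, key=lambda t: (-self.score(files, t), t))
        return ranked[:budget]


# Historic runs train the selector; each later run would be recorded the same way.
selector = ToyTestSelector()
selector.record_run(["auth.py"], {"test_login": False, "test_checkout": True})
selector.record_run(["cart.py"], {"test_login": True, "test_checkout": False})
selector.record_run(["auth.py"], {"test_login": False, "test_checkout": True})

if __name__ == "__main__":
    # What a CI server might ask before testing a change that touches auth.py.
    print(selector.select(["auth.py"], budget=1))
```

In a real pipeline, record_run would be called after every test run so that the selector keeps learning from new results.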
Because the model is trained using historic metadata about code changes and the pass/fail status of tests (instead of static code analysis), Predictive Test Selection can scale across an entire organization regardless of the programming languages being used.
Both Google and Facebook have used this method to shrink massive test suites to only the most relevant tests. Predictive Test Selection can shorten build time dramatically (by up to 90%!), providing faster feedback for developers while reducing operational overhead. And because the subset of tests to run is intelligently selected for every change, that speed comes with only a minimal loss of confidence.
At Launchable, we are working to productize Predictive Test Selection so that it can be adopted by any team. As seen in the graph above, we’ve found that for many software projects you can run 20% of the tests and achieve 90% confidence that you’ve found a failure if one exists for a code change. (Read our case study on how a major auto manufacturer is using predictive test selection.)
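One plausible way to read a figure like “20% of the tests, 90% confidence” is as the share of historically failing runs in which the selected subset would have contained at least one failing test. The sketch below computes that share from past runs. The function confidence_at, the Run type, and the sample data are assumptions for illustration, not Launchable’s actual metric definition.

```python
import math
from typing import List, Sequence, Set, Tuple

# One historic run: (tests in the order a model ranked them, tests that failed).
Run = Tuple[Sequence[str], Set[str]]


def confidence_at(runs: List[Run], percent: int) -> float:
    """Share of failing runs where the top `percent`% of tests hit a failure."""
    failing_runs = [(ranked, failed) for ranked, failed in runs if failed]
    if not failing_runs:
        raise ValueError("no failing runs to evaluate against")
    caught = 0
    for ranked, failed in failing_runs:
        subset_size = math.ceil(len(ranked) * percent / 100)
        if failed & set(ranked[:subset_size]):
            caught += 1
    return caught / len(failing_runs)


# Made-up history of four runs over five tests; the third run had no failures.
history: List[Run] = [
    (["a", "b", "c", "d", "e"], {"a"}),
    (["b", "a", "c", "d", "e"], {"c"}),
    (["c", "d", "a", "b", "e"], set()),
    (["e", "a", "b", "c", "d"], {"e", "d"}),
]

if __name__ == "__main__":
    for pct in (20, 60, 100):
        print(pct, confidence_at(history, pct))
```

Plotting this value against the subset size for a project’s own history gives a curve from which a team can pick the trade-off it is comfortable with.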
Predictive Test Selection is a generic approach that can be applied in many different testing scenarios. However, we find that it especially resonates in these very common situations:
Tests that run on every pull request (or git push), like unit tests, integration tests, or others. Even if these tests only take 20 minutes, they run the most frequently, so reducing their execution time has a broad impact. Also, when a feature is in early development you want feedback as quickly as possible. For many customers, Predictive Test Selection can reduce a 20-minute run to only 2-4 minutes, enabling much faster iterations.
End-to-end, UI, or system tests, which can be disproportionately time-consuming due to the nature of what they are testing. Because they take a while to run, they tend to run less often, delaying crucial feedback. You can use Predictive Test Selection to choose a small subset of these tests to run on every push, providing feedback from these high-value tests earlier.
Lengthy regression tests that run less frequently than other suites, such as nightly. This covers any suite that takes 3-5 hours or more to run. Regression tests often bring the added burden of failure triage, since lots of changes are tested together. You can reduce this burden by scheduling a subset to run for 20-30 minutes before every merge, for example.
In the second and third scenarios, we can shift tests left by running a subset of the suite more frequently. In the first scenario, tests are already shifted quite far to the left, so Predictive Test Selection can reduce the overall in-place test execution time to get feedback faster. (Note that these approaches aren’t mutually exclusive: you could shift pull request tests left by running a subset on every push while still requiring the full suite to run before merging.)
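Running a subset “for 20-30 minutes” implies a time budget rather than a fixed test count. Here is a minimal sketch of budget-based selection, assuming the tests have already been ranked by a model and that typical durations are known. The function select_within_budget, the test names, and the durations are hypothetical.

```python
from typing import Dict, List


def select_within_budget(
    ranked_tests: List[str],
    durations_min: Dict[str, float],
    budget_min: float,
) -> List[str]:
    """Walk tests in priority order, keeping each one that still fits the budget."""
    selected: List[str] = []
    remaining = budget_min
    for test in ranked_tests:
        cost = durations_min[test]
        if cost <= remaining:
            selected.append(test)
            remaining -= cost
    return selected


# Hypothetical ranking (most valuable first) and durations in minutes.
ranked = ["test_checkout", "test_search", "test_login", "test_export"]
durations = {
    "test_checkout": 12,
    "test_search": 15,
    "test_login": 5,
    "test_export": 8,
}

if __name__ == "__main__":
    print(select_within_budget(ranked, durations, budget_min=25))
```

This greedy walk skips a test that does not fit and keeps going, so a long high-priority test can be passed over in favor of shorter ones. Whether that is desirable depends on the team, and stopping at the first test that does not fit is an equally reasonable policy.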
In any case, many teams find value in continuing to run the entire test suite at some regular interval to provide those last few percentage points of risk coverage and to continually train the model.
At Launchable, we believe the vast majority of teams running automated tests – which is a lot of teams! – stand to benefit from this approach. Whether you’re running tests on every PR, after merging, or even nightly, you can easily apply Predictive Test Selection to your pipeline to provide faster feedback and reduce operational costs.