How machine learning is enabling teams to predictively select tests, cutting test cycle times.
For teams struggling with slow tests, Test Impact Analysis is a way to accelerate test cycles by running only the tests that matter for a given set of source code changes. Using this approach, it’s possible to programmatically reduce a long test run (for example, 5 hours) to something more manageable (for example, 20 minutes).
For some teams, not running all of the tests may seem like a bad idea. However, when tests become a significant bottleneck, you often have no choice but to consider some form of subsetting (even test parallelization has its limits). It’s really up to each team to determine what level of risk they are comfortable with in each phase.
Many teams decide to run all of the tests later in the development cycle and use subsets earlier (for example on pull-requests) to accelerate the development process. This can be part of a shift-left approach to testing.
Approaches to impact analysis in software testing often rely on source code analysis to build dependency graphs between code and tests. In his article, The Rise of Test Impact Analysis, Paul Hammant talks about how you can use code coverage or instrumentation while tests are running to build a map of how tests exercise production code.
The idea is that you run a single test, observe which code that test exercised, and then push these references into a map. You then repeat the process for all tests in the suite. (Of course, you can also build or adjust the map manually if you prefer.) Once the map is built, you then write a small program that runs every time a developer pushes code that reviews modified files and looks up which tests should be run. This list is then passed to your test runner which executes the tests. Finally, you need to regularly update the map so that it has an accurate dependency graph as code changes.
The challenge of the code coverage-based approach is that the data gets very large very quickly. It’s also hard to reliably prove that a change in one place doesn’t change the code execution path in another place. Just because a test previously didn’t exercise some parts of your code does not mean it will be the same next time you run it.
Historically, most approaches to Test Impact Analysis rely on source code analysis to build dependency graphs between code and tests. But modern approaches rely on machine learning and historical test runs to build associations between tests and code. This is called predictive test selection. Facebook and Google have both famously architected systems that rely on machine learning and code analysis. At Launchable, we’ve productized predictive test selection, making it easy for any team to get up and running with this approach.
One benefit of a pure machine learning approach is that it can scale across languages and frameworks making it ideal for polyglot organizations or projects.
Another benefit of using machine learning is that it allows you to manage risk based on confidence. Code analysis produces a static list of associations for each test. But it doesn’t necessarily do a good job of identifying the most useful tests to run for a code change. This is something that machine learning can do well.
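To make the contrast concrete, here is a toy frequency model that ranks tests by how often they failed in past runs that touched similar files. This is only an illustration of the "learn from historical runs" idea, not Launchable's actual algorithm, and the data shapes are assumptions.

```python
# Toy predictive ranking: score each test by how often it failed in
# historical runs whose changed files overlap the current change.
# Not a real model -- just the intuition behind learning from history.

from collections import Counter

def rank_tests(history, changed_files):
    """history: list of (changed_files, failed_tests) from past runs.

    Returns tests ordered by how often they failed alongside changes
    to any of `changed_files`, most likely to fail first.
    """
    score = Counter()
    for past_files, failed_tests in history:
        if past_files & changed_files:  # past run touched similar files
            for test in failed_tests:
                score[test] += 1
    return [test for test, _ in score.most_common()]

history = [
    ({"auth.py"}, {"test_login"}),
    ({"auth.py", "cart.py"}, {"test_login", "test_cart"}),
    ({"docs.py"}, set()),
]
print(rank_tests(history, {"auth.py"}))  # ['test_login', 'test_cart']
```

Unlike a static dependency list, a ranking like this orders tests by expected usefulness, which is what makes confidence-based subsetting possible.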
What does a risk-based approach look like? Launchable can calculate the odds of a single failure being found in a run (if one exists) based on the percentage of tests selected. This can then be plotted on a graph:
Here, confidence is plotted on the Y axis and the percentage of tests required to achieve that confidence is plotted on the X axis. The dotted line represents baseline performance without Launchable. The red line is Launchable’s assessment of the risk for a particular project. In this case, the customer can create a dynamic subset of 20% of the tests to get 90% confidence that a failing run will be detected.
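The shape of that curve can be illustrated with a small sketch. Assume each ranked test has some probability of catching a given failure (the numbers below are made up for illustration); cumulative confidence then rises steeply for the first few tests and flattens out, which is why a 20% subset can reach 90% confidence.

```python
# Toy illustration of the confidence-vs-subset-size tradeoff. Given
# tests ranked best-first by an assumed per-test probability of
# detecting the failure, compute the confidence of each prefix subset.

def confidence_curve(fail_probs):
    """fail_probs: per-test detection probabilities, sorted best-first.

    Returns cumulative detection confidence after including each
    additional test, assuming the tests detect failures independently.
    """
    curve, miss = [], 1.0
    for p in fail_probs:
        miss *= (1.0 - p)        # probability every test so far missed it
        curve.append(1.0 - miss)
    return curve

probs = [0.5, 0.3, 0.1, 0.05, 0.05]  # illustrative, not real data
curve = confidence_curve(probs)
# Confidence climbs fast, then flattens: most of the value comes from
# the first few, best-ranked tests.
print([round(c, 3) for c in curve])
```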
This allows you to run at various levels of confidence and frequency in different phases of development by choosing a different size (and confidence target) for each subset. For example, on every push to a pull request you could run 20% of the tests, and post-merge you could run the full suite. Launchable allows you to choose the level of risk that you are comfortable with at each phase.
Impact analysis can be run on all tests from all levels of the Test Pyramid. Here are a few examples:
Tests that run on every pull request (or git push) - these include unit tests, integration tests, and more. Perhaps test run time has gotten so out of control that you can no longer do this. Test Impact Analysis can make it easy for you to run a smaller percentage of tests in just a few minutes.
End-to-end, UI, and system tests - these can require a lot of time to run compared to other types of tests and often run much later in the development cycle. In this case, you can shift left a subset of your tests generated by Test Impact Analysis and run it much earlier in the process, for example on every push to a pull request.
Smoke tests or regression tests - These tests are often manually selected from a much larger suite to validate that the software is running. Test Impact Analysis allows you to automate the selection process and run only the tests that matter for a particular code change.
In some ways, the approach that you take with Test Impact Analysis depends on your assessment of where the risk is for your project. Some teams may benefit from running a fraction of the tests on each code push (in pull requests) and all of the tests post-merge. More sophisticated teams may run a fraction of the tests before doing a rolling deploy (and automatically rolling back changes on an error).
One way to think about Test Impact Analysis is that it allows you to shift much of the risk earlier in the development process by running the most important tests for a code change first. Developers get feedback sooner, creating a flywheel effect where they can iterate on changes much faster.
What may not be intuitive is that faster dev cycles tend to produce higher quality code, because bad code can be detected and remedied much sooner. Test Impact Analysis pushes this approach to the next level, ensuring that you’re running the most valuable tests as early as possible for each change.