Today, many software projects have long-running test suites that run all of their tests, in no particular order, every time. When you are working on a small change in a large project, this is wasteful: you know that only a few tests are relevant, yet there’s no easy way to know exactly which ones to run.
The core of Launchable Predictive Test Selection is a machine learning algorithm that predicts the likelihood of failure for each test based on past runs and the source code changes being tested.
This lets you run a dynamically selected subset of tests that are likely to fail, reducing a long-running test suite to a few minutes.
How a machine learning model is trained
Every time tests run, your changes and test results are passed to Launchable to continuously train a model.
Model training looks at the changes associated with each build and which tests failed when tests ran. It builds associations between changed files and which tests tend to fail. It can be thought of as an advanced frequency counting algorithm that tracks associations between failures and source code.
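To make the "frequency counting" idea concrete, here is a minimal Python sketch. It is not Launchable's model: it assumes each build is reduced to a set of changed file paths plus the set of tests that failed, and it estimates, for each (file, test) pair, how often the test failed in builds that changed the file. The class `AssociationCounter` and its methods are hypothetical names chosen for illustration.

```python
from collections import Counter


class AssociationCounter:
    """Hypothetical frequency counter linking changed files to failing tests."""

    def __init__(self) -> None:
        self.file_changes = Counter()  # file -> number of builds that changed it
        self.co_failures = Counter()   # (file, test) -> failures seen when the file changed

    def record(self, changed_files: set[str], failed_tests: set[str]) -> None:
        """Update the counts from one build's changed files and failing tests."""
        for f in changed_files:
            self.file_changes[f] += 1
            for t in failed_tests:
                self.co_failures[(f, t)] += 1

    def failure_rate(self, changed_file: str, test: str) -> float:
        """Fraction of builds that changed `changed_file` in which `test` failed."""
        changes = self.file_changes[changed_file]
        if changes == 0:
            return 0.0  # no history for this file yet
        return self.co_failures[(changed_file, test)] / changes


model = AssociationCounter()
model.record({"src/cart.py"}, {"test_checkout"})
model.record({"src/cart.py", "README.md"}, set())
print(model.failure_rate("src/cart.py", "test_checkout"))  # 0.5
```

A production model would also weigh recency, flakiness, and many more signals; the point here is only that the associations come from counting co-occurrences across past runs.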
Model training generally takes 4 to 6 weeks, depending on how often tests run and how many tests fail in a typical run. More data improves the quality of the model, but useful predictions often emerge within the first few weeks.
Calculating test prioritization
One way to think about how Launchable prioritizes your tests is that each passing test increases Launchable's confidence that the entire run will pass. The ideal algorithm yields the highest confidence as early as possible.
Confidence gain and individual test run time are therefore the two primary factors that determine test prioritization.
Confidence is a function of the probability of failure for each individual test as tests run. Tests with a high probability of failure yield a higher confidence gain when successful. When tests with a low probability of failure pass, they yield smaller confidence gains.
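One simple way to model this relationship, assuming test failures are independent and each test has a predicted failure probability p (an illustrative assumption, not Launchable's documented formula): once every test run so far has passed, the confidence that the whole run passes is the product of (1 - p) over the tests not yet run. A passing test removes its own factor, so confidence is multiplied by 1 / (1 - p), which is larger for riskier tests. The function names below are hypothetical.

```python
import math


def run_confidence(remaining_p_fail: list[float]) -> float:
    """Confidence that the whole run passes, given every test so far passed.

    Assumes independent failures: the product of (1 - p) over tests not yet run.
    """
    return math.prod(1.0 - p for p in remaining_p_fail)


def confidence_gain(p_fail: float) -> float:
    """Factor by which confidence grows when a test with failure probability p_fail (< 1) passes."""
    return 1.0 / (1.0 - p_fail)


print(run_confidence([0.5, 0.1]))  # 0.45 before either test runs
print(run_confidence([0.1]))       # 0.9 after the p=0.5 test passes
print(confidence_gain(0.5))        # 2.0: a risky test passing doubles confidence
print(confidence_gain(0.1))        # about 1.11: a low-risk test passing adds little
```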
Since the goal is to deliver as much confidence as quickly as possible, it makes sense for Launchable to deprioritize a long-running test if the confidence gain from that single test is not high enough to offset the gain of running shorter tests during the same period of time. This is exactly what the Launchable algorithm does.
For example, if test T8 has a high probability of failure and takes 3 minutes to run, and test T4 has a slightly lower probability of failure but only takes 300 milliseconds, Launchable will prioritize the shorter test (T4) before the longer test (T8) because it yields a higher confidence gain in a shorter period of time.
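The trade-off in the T4/T8 example can be sketched as a greedy ordering: rank each test by confidence gain per second of run time and run the highest ratio first. This sketch assumes independent failures, measures the gain of a passing test as -log(1 - p) so that gains from consecutive passes add up, and uses made-up probabilities (0.30 for T8, 0.25 for T4). `CandidateTest` and `prioritize` are hypothetical names; this illustrates the idea rather than Launchable's actual algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateTest:
    name: str
    p_fail: float      # predicted probability of failure (assumed input, 0 <= p < 1)
    duration_s: float  # expected run time in seconds (assumed > 0)

    def gain_per_second(self) -> float:
        # Log-confidence gained if this test passes, assuming independent
        # failures, is -log(1 - p_fail); divide by run time to get a rate.
        return -math.log1p(-self.p_fail) / self.duration_s


def prioritize(tests: list[CandidateTest]) -> list[CandidateTest]:
    """Order tests so that the most confidence per second comes first."""
    return sorted(tests, key=lambda t: t.gain_per_second(), reverse=True)


order = prioritize([
    CandidateTest("T8", p_fail=0.30, duration_s=180.0),  # higher risk, 3 minutes
    CandidateTest("T4", p_fail=0.25, duration_s=0.3),    # slightly lower risk, 300 ms
])
print([t.name for t in order])  # ['T4', 'T8']
```

Sorting by gain divided by run time is Smith's ratio rule from scheduling. Under these assumptions it maximizes the log-confidence accumulated over the run, which is why a short, slightly less risky test can outrank a long, riskier one.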
If your tests take a very long time to run, you should consider running a subset of your tests earlier in your development cycle. We call this use case "shift-left." For example, if a test suite that runs after every pull request is merged takes 5 hours to run, a 30-minute version of the same suite could be run on every git push while the pull request is still open.
You could accomplish this by manually selecting a fixed set of tests, but then the tests most relevant to the changes in a given build may not run until much later in the development cycle.
Instead, Launchable can create a subset based on the changes in each build, every time you run tests. We call this a dynamic subset because the subset adapts to your changes in real time.
A dynamic subset prioritizes all of your tests and then returns the first part of the total sequence for your test runner to run. The cutoff point can be based on either the maximum length of time you specify (e.g. 30 minutes in the above example) or the minimum confidence level you wish to achieve.
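Given a prioritized sequence, the cutoff can be sketched as walking the list and stopping at a time budget or a confidence target. The function `cut_subset`, its parameters, and the confidence definition used here (the probability that every test left out of the subset would pass, assuming independent failures) are assumptions for illustration; Launchable's own confidence calculation may differ.

```python
import math
from typing import Optional


def cut_subset(
    ordered: list[tuple[str, float, float]],
    max_seconds: Optional[float] = None,
    min_confidence: Optional[float] = None,
) -> list[str]:
    """Return the leading part of an already-prioritized test sequence.

    `ordered` holds (name, p_fail, duration_s) tuples in priority order.
    With `max_seconds`, stop before the first test that would exceed the budget.
    With `min_confidence`, stop once the probability that every remaining
    (skipped) test would pass, assuming independent failures, reaches the target.
    """
    if (max_seconds is None) == (min_confidence is None):
        raise ValueError("specify exactly one of max_seconds or min_confidence")

    subset: list[str] = []
    elapsed = 0.0
    for i, (name, _p_fail, duration) in enumerate(ordered):
        if max_seconds is not None:
            if elapsed + duration > max_seconds:
                break
        elif math.prod(1.0 - p for _, p, _ in ordered[i:]) >= min_confidence:
            break
        subset.append(name)
        elapsed += duration
    return subset


# (name, predicted failure probability, run time in seconds), highest priority first
ordered = [("T4", 0.25, 0.3), ("T1", 0.20, 60.0), ("T8", 0.30, 180.0), ("T2", 0.01, 30.0)]
print(cut_subset(ordered, max_seconds=100.0))   # ['T4', 'T1']
print(cut_subset(ordered, min_confidence=0.9))  # ['T4', 'T1', 'T8']
```

Because the function only ever drops a tail of the list, the result is always a prefix of the full prioritized sequence.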
Launchable integrates with your test suite through your existing pipeline script and the Launchable CLI. When a build is ready for testing:
1. You use the CLI to send the changes in the build, the full list of tests to run, and a target for the test run (e.g. 20 minutes or 90% confidence) to Launchable.
2. Launchable responds with which tests to run based on the changes in the build and the target.
3. Your CI tool runs the subset of tests using your build tool or test runner.
4. When the run ends, your CI tool sends the test results to Launchable to continuously train your machine learning model.
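The four steps above form a feedback loop, sketched below as one CI iteration in Python. Everything here is hypothetical: `SubsetService` is an in-memory stand-in for Launchable, not its CLI or API, its ranking is a placeholder that puts tests that previously failed alongside the same changed files first, and the target is simplified to a maximum number of tests rather than a time or confidence target.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SubsetService:
    """Hypothetical in-memory stand-in for Launchable; not its real CLI or API."""

    # (changed_file, test) -> number of failures seen when that file changed
    failures: dict[tuple[str, str], int] = field(default_factory=dict)

    def request_subset(self, changed_files: list[str], all_tests: list[str], max_tests: int) -> list[str]:
        """Steps 1-2: take the changes, the full test list, and a target; return a subset."""
        def score(test: str) -> int:
            return sum(self.failures.get((f, test), 0) for f in changed_files)

        # Placeholder ranking: tests that failed alongside these files come first
        # (sorted() is stable, so ties keep their original order).
        ranked = sorted(all_tests, key=score, reverse=True)
        return ranked[:max_tests]

    def record_results(self, changed_files: list[str], results: dict[str, bool]) -> None:
        """Step 4: feed pass/fail results back so later subsets improve."""
        for test, passed in results.items():
            if not passed:
                for f in changed_files:
                    self.failures[(f, test)] = self.failures.get((f, test), 0) + 1


def ci_iteration(
    service: SubsetService,
    changed_files: list[str],
    all_tests: list[str],
    max_tests: int,
    run_tests: Callable[[list[str]], dict[str, bool]],
) -> dict[str, bool]:
    subset = service.request_subset(changed_files, all_tests, max_tests)  # steps 1-2
    results = run_tests(subset)                                           # step 3: your test runner
    service.record_results(changed_files, results)                        # step 4
    return results


def fake_runner(subset: list[str]) -> dict[str, bool]:
    """Stand-in test runner in which test_b always fails."""
    return {t: t != "test_b" for t in subset}


svc = SubsetService()
tests = ["test_a", "test_b", "test_c"]
ci_iteration(svc, ["src/b.py"], tests, 3, fake_runner)         # no history yet: runs all three
print(ci_iteration(svc, ["src/b.py"], tests, 1, fake_runner))  # {'test_b': False}
```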
What data is sent to Launchable?
Launchable’s machine learning algorithm learns the relationship between code changes and the tests impacted by those changes from metadata that is sent to the Launchable API as part of your test runs.
We do not currently employ static code analysis, so the full contents of your source code do not need to be sent to our servers. The data sent currently includes:
Metadata about the code changes being tested:
the names and paths of files added/removed/modified in the change
number of modified lines in files in the change
the location of modified lines in the files in the change
Git commit hashes associated with the change
Git author details associated with those commits
Metadata about the test cases that were run:
the names and paths of test cases and test files
pass/fail/skipped status of each test case
the duration (run time) of each test case
test case associations to test suites (e.g. ‘unit tests,’ ‘integration tests,’ etc.)
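The metadata listed above could be modeled as the following Python dataclasses. The class and field names are hypothetical and do not describe Launchable's actual API schema; the sketch only shows that every field is metadata (paths, counts, line ranges, hashes, author details, statuses, durations, suite names) and that no field holds file contents.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FileChange:
    path: str                               # name and path of the changed file
    status: str                             # "added", "removed", or "modified"
    lines_modified: int                     # number of modified lines
    modified_ranges: list[tuple[int, int]]  # location of modified lines as (start, end)


@dataclass
class Commit:
    sha: str           # Git commit hash
    author_name: str   # Git author details
    author_email: str


@dataclass
class CaseResult:
    path: str          # test file path
    name: str          # test case name
    status: str        # "passed", "failed", or "skipped"
    duration_s: float  # run time of the test case
    suite: str         # e.g. "unit tests", "integration tests"


@dataclass
class RunMetadata:
    """Hypothetical payload combining change metadata and test results."""

    commits: list[Commit]
    changes: list[FileChange]
    results: list[CaseResult]

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses; tuples serialize as JSON arrays.
        return json.dumps(asdict(self))


meta = RunMetadata(
    commits=[Commit("abc123", "Ada", "ada@example.com")],
    changes=[FileChange("src/cart.py", "modified", 3, [(10, 12)])],
    results=[CaseResult("tests/test_cart.py", "test_checkout", "failed", 0.42, "unit tests")],
)
print(meta.to_json())
```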