Once your workspace's model is trained, you can start requesting dynamic subsets of tests for your builds.
Every time your CI system requests a subset of tests to run for a build, it's essentially asking the PTS service the following question. Given:
1) the tests in our test suite,
2) the changes in the build we're testing, and
3) the flavors (environments) we're testing under,
which tests from the test suite (1) should we run, and which can we skip, to find out whether the changes (2) will cause a test failure?
To answer this, the PTS service completes a two-step process for each subset request:
1) Prioritizing all tests based on the request's inputs
2) Creating a subset of tests from the full prioritized test list
The following sections explore this process in more detail.
The inputs for a subset request are:
Full test suite - The list of tests in your test suite that you would normally run in a full test execution. This is the list that gets prioritized and subsetted.
Changes - The changes in the build being tested.
Flavors - The environment(s) in which the tests are running.
Optimization target - The goal that determines the size of the subset (measured in estimated test duration), expressed as an aggregate target such as confidence or duration.
The first three inputs (Full test suite, Changes, Flavors) are fed into the test prioritization step.
In this step, we're asking the model to prioritize the list of tests for us. The model prioritizes tests based on the factors described above in Model training.
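The way the four inputs flow through the two steps can be sketched as follows. This is a minimal illustration, not the PTS service's implementation: `SubsetRequest`, `handle_subset_request`, and the `prioritize`/`cut` callables are hypothetical names, and the field types (changed file paths, a flavor dictionary, a fractional target) are assumptions chosen for readability.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class SubsetRequest:
    """Hypothetical container for the four inputs of a subset request."""
    full_test_suite: Sequence[str]  # every test you would run in a full execution
    changes: Sequence[str]  # assumption: changed file paths in the build under test
    flavors: dict[str, str]  # assumption: e.g. {"os": "linux", "browser": "chrome"}
    optimization_target: float  # assumption: e.g. 0.2 for a 20% duration target


# Step 1 sees only the first three inputs; step 2 sees the ordered list plus the target.
Prioritizer = Callable[[Sequence[str], Sequence[str], dict[str, str]], list[str]]
Cutter = Callable[[list[str], float], tuple[list[str], list[str]]]


def handle_subset_request(
    req: SubsetRequest, prioritize: Prioritizer, cut: Cutter
) -> tuple[list[str], list[str]]:
    """Run the two-step process and return (subset, remainder)."""
    # Step 1: the model orders the full suite using tests, changes, and flavors.
    prioritized = prioritize(req.full_test_suite, req.changes, req.flavors)
    # Step 2: the optimization target decides where the ordered list is cut.
    return cut(prioritized, req.optimization_target)
```

Passing the two steps in as callables keeps the sketch honest about what is known: the optimization target plays no part in prioritization and is only used when the prioritized list is cut.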
Now let's cover a few common questions about test prioritization.
A model is not a simple mapping of files to tests. Although file paths and test paths are compared for similarity, Launchable extracts characteristics from changes in a way that makes each change more generally useful for training and inference. The historical behavior of the tests themselves (independent of any changes) is also an important factor.
After all, if a model were just a mapping of files to tests, it could not make predictions for file changes it had never seen before. Combining many extracted characteristics with historical test data solves this problem.
Because changes are represented by these generalizable characteristics rather than by raw file names, your workspace's model can make predictions for changes made in logical areas of your codebase that it hasn't "seen" yet. This is a massive benefit!
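To make the difference concrete, the sketch below contrasts a pure file-to-test lookup table with a path-similarity feature. It only illustrates why a lookup table cannot generalize; it is not Launchable's feature extraction. The token-based Jaccard similarity, the `STOP_TOKENS` set, and all names and paths are assumptions for illustration, and in the real model path similarity is just one signal among many (including test history).

```python
import re

# Hypothetical lookup table "learned" from past builds: changed file -> failing tests.
FILE_TO_TESTS = {"src/payments/refund.py": {"tests/payments/test_refund.py"}}

# Generic path parts that carry no signal about the logical area (an assumption).
STOP_TOKENS = {"src", "tests", "test"}


def lookup_predict(changed_file: str) -> set[str]:
    """Pure file-to-test mapping: has no answer for a file it has never seen."""
    return FILE_TO_TESTS.get(changed_file, set())


def path_tokens(path: str) -> set[str]:
    """Lowercase path tokens with the file extension and generic parts removed."""
    stem = re.sub(r"\.[a-z0-9]+$", "", path.lower())
    return {t for t in re.split(r"[/_.\-]+", stem) if t and t not in STOP_TOKENS}


def path_similarity(changed_file: str, test_path: str) -> float:
    """Jaccard similarity of path tokens; defined even for never-seen files."""
    a, b = path_tokens(changed_file), path_tokens(test_path)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

For a file the table has never seen, such as `src/payments/capture.py`, `lookup_predict` returns an empty set, while `path_similarity` still scores `tests/payments/test_capture.py` at 1.0 and `tests/search/test_query.py` at 0.0.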
Sometimes a model may prioritize tests that, on the surface, don't appear to relate to the logical area being changed.
In this case, it's important to remember that:
the model learns from much more than just the relationship between changed files and tests, including the tests' execution history and the other factors described above, which may outweigh that logical relationship
the model's goal is to prioritize tests that are likely to fail, so tests that don't fail usually don't get prioritized
given two tests with the same likelihood of failure, the model prioritizes the shorter test over the longer one
because of test runner constraints, tests often must be prioritized at a coarser granularity (e.g. test class instead of individual test case), which can affect prioritization
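The last two points can be illustrated with a small ordering sketch. It assumes each test already carries a model-assigned failure likelihood (`fail_prob`, hypothetical) and an estimated duration; `rank` orders by higher likelihood first and breaks ties with the shorter test, as described above. `roll_up_to_class` is a hypothetical helper showing how prioritizing at class level can change the order: the `Class#case` naming and the choice to take the maximum likelihood and the summed duration per class are assumptions, not documented behavior.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredTest:
    name: str
    fail_prob: float  # hypothetical model output: likelihood of failure
    duration_s: float  # estimated duration in seconds


def rank(tests: list[ScoredTest]) -> list[ScoredTest]:
    """Higher failure likelihood first; on a tie, the shorter test first."""
    return sorted(tests, key=lambda t: (-t.fail_prob, t.duration_s))


def roll_up_to_class(cases: list[ScoredTest]) -> list[ScoredTest]:
    """Merge test cases into their classes (assumes "Class#case" names).

    Assumption: a class takes its riskiest case's likelihood and the sum of
    its cases' durations, because the runner can only run the whole class.
    """
    groups: dict[str, list[ScoredTest]] = defaultdict(list)
    for case in cases:
        groups[case.name.split("#", 1)[0]].append(case)
    return [
        ScoredTest(
            cls,
            max(c.fail_prob for c in members),
            sum(c.duration_s for c in members),
        )
        for cls, members in groups.items()
    ]
```

With cases `A#fast` (0.3, 5 s), `A#slow` (0.0, 600 s), and `B#one` (0.3, 60 s), case-level ranking puts `A#fast` first, but once the runner forces class-level selection, class `A` costs 605 s and ranks behind class `B`.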
Then, the prioritized list of tests is combined with the subset request's Optimization target to create a subset of tests. This process essentially cuts the prioritized list into two chunks: the subset, and the remainder.
For example, if your optimization target is 20% duration, and the estimated duration of the prioritized full test suite is 100 minutes, then the subset will include the top tests from the prioritized list until those tests add up to 20 minutes of estimated duration.
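The cut described in this example can be sketched as a walk down the prioritized list that accumulates estimated durations. This is an illustration under stated assumptions, not the service's code: `cut_by_duration` is a hypothetical name, the target is expressed as a fraction (0.2 for 20%), and the sketch stops before the test that would push the total past the budget. The actual service may treat that boundary test differently.

```python
Test = tuple[str, float]  # (test name, estimated duration in minutes)


def cut_by_duration(
    prioritized: list[Test], target_fraction: float
) -> tuple[list[Test], list[Test]]:
    """Split a prioritized list into (subset, remainder) by estimated duration."""
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    # e.g. 20% of a 100-minute suite gives a 20-minute budget
    budget = target_fraction * sum(minutes for _, minutes in prioritized)
    used = 0.0
    for i, (_, minutes) in enumerate(prioritized):
        # Assumption: stop before the first test that would exceed the budget.
        if used + minutes > budget:
            return prioritized[:i], prioritized[i:]
        used += minutes
    return prioritized, []
```

For a 100-minute prioritized suite of `t1` (8 min), `t2` (7 min), `t3` (10 min), and `t4` (75 min) with a 20% target, the budget is 20 minutes: `t1` and `t2` (15 minutes) form the subset, and `t3` and `t4` are the remainder because adding `t3` would reach 25 minutes. Because the cut is a prefix of the prioritized list, the subset always contains the highest-priority tests.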
Here are some common questions about subsetting:
Assuming the same 1) full test suite, 2) optimization target, and 3) model, the subsets returned for two requests should take about the same amount of time to run.
Models are regularly retrained with the latest data, and the model stays the same between retrainings. In practice, this means that a given day's subsets should all be about the same length, regardless of the changes being tested. The subset duration is informed by the Confidence curve.
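One way to picture how a confidence curve can inform subset duration is to treat it as a list of (duration, confidence) points and find the smallest duration that reaches a confidence target. This is a hedged sketch: it assumes the curve maps subset duration (as a fraction of the full suite) to a non-decreasing confidence value and uses linear interpolation between points. `duration_for_confidence` is a hypothetical name, and any curve values you feed it here are illustrative, not real measurements.

```python
from bisect import bisect_left


def duration_for_confidence(curve: list[tuple[float, float]], target: float) -> float:
    """Smallest duration fraction whose interpolated confidence reaches `target`.

    `curve` is [(duration_fraction, confidence), ...] sorted by duration, with
    non-decreasing confidence (an assumed shape for a confidence curve).
    """
    confidences = [c for _, c in curve]
    i = bisect_left(confidences, target)  # first point with confidence >= target
    if i == len(curve):
        raise ValueError("target confidence is not reachable on this curve")
    if i == 0:
        return curve[0][0]
    (d0, c0), (d1, c1) = curve[i - 1], curve[i]
    # Linear interpolation between the two points that bracket the target;
    # c1 > c0 here because c0 < target <= c1.
    return d0 + (target - c0) * (d1 - d0) / (c1 - c0)
```

With made-up points `[(0.0, 0.0), (0.1, 0.6), (0.2, 0.8), (0.5, 0.95), (1.0, 1.0)]`, a 90% confidence target maps to about 0.4 of the full suite's duration. Since the curve only changes when the model is retrained, every request made with the same target before the next retraining maps to the same duration, which matches the behavior described above.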