Michael works for an American multinational technology company that provides workforce management and human resource management services; let's call it Northstar Recruiting International (NRI). NRI is one of the largest cloud computing companies in the US, with around 13,000 employees.
Michael is a senior engineer for a DevOps/QA team at NRI and is primarily responsible for maintaining their CI tooling and certain aspects of testing.
The RecruitingX application is one of their key SaaS applications.
NRI needed their developers and development teams to iterate faster on new releases. They needed faster critical feedback to understand if their code was deployable. NRI also needed to free up time to write more tests, without slowing their pipeline down. Additionally, NRI needed to reduce cloud costs for test execution while running additional test suites.
RecruitingX is a multi-repo project owned by the Quality Growth team. It has a fairly complex build process: the build comes together post-merge, integrating code from 7 separate repositories.
They refer to the entire build/test process as “continuous regression” to show their desire to constantly run a build-test loop. However, the long test cycle gets in the way: today the process kicks off every two hours, yet it takes 6-7 hours to compile, build, deploy, and test the application.
The problem is that they have reached the limit of what parallelizing tests can achieve. Testing is already split across 5 sub-domains inside the product, which all test simultaneously. Even factoring in this 5x parallelization of their testing process, the test times were roughly 5 hours long.
A layout of their current post-merge process looks like this:
We asked Michael to fundamentally rethink how he could improve the efficiency of his system by bringing ML to testing. Launchable’s Predictive Test Selection identifies the tests that are most likely to fail and runs them first. (See the “replacing static parallel suites with dynamic subsets” documentation for more.)
|Parallelize → throw more hardware → run out of money or run out of hardware||Parallelize → optimize the tests that land in each parallel resource → Optimized resource utilization → Faster delivery|
Michael used their code and tests to train Launchable’s ML model. The team chose a 90% confidence threshold, which means that the model is expected to identify 90% of the failing builds while running only around 30 minutes of tests for this particular test suite.
With Launchable, they run a subset of these domain tests for much faster feedback. Essentially, their entire layout is the same, except that the tests run for a shorter period of time. The RecruitingX application has another test suite that is run just before deployment to production to catch any missed failures. In this case, they have optimized for faster turnaround time for the development team.
Since each domain requires a unique environment and has its own unique tests, the tests are subset individually by domain.
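To make the idea of subsetting by domain concrete, here is a minimal Python sketch. It is not Launchable's algorithm or API: the `DomainTest` type, the `fail_probability` score, the domain names, and the greedy "highest risk first within a time budget" rule are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainTest:
    """Hypothetical record for one test; all fields are illustrative."""
    name: str
    domain: str
    fail_probability: float  # assumed score from some prediction model
    duration_min: float      # assumed historical run time in minutes


def subset_by_domain(tests, budget_min):
    """Pick tests for each domain separately.

    Within a domain, tests are ranked by predicted failure probability and
    added greedily as long as they still fit in the per-domain time budget.
    """
    by_domain = {}
    for test in tests:
        by_domain.setdefault(test.domain, []).append(test)

    subsets = {}
    for domain, group in by_domain.items():
        ranked = sorted(group, key=lambda t: t.fail_probability, reverse=True)
        chosen, used = [], 0.0
        for test in ranked:
            if used + test.duration_min <= budget_min:
                chosen.append(test.name)
                used += test.duration_min
        subsets[domain] = chosen
    return subsets


if __name__ == "__main__":
    # Made-up tests in two made-up domains, each with a 30-minute budget.
    sample = [
        DomainTest("a1", "billing", 0.9, 20),
        DomainTest("a2", "billing", 0.5, 20),
        DomainTest("a3", "billing", 0.1, 10),
        DomainTest("b1", "search", 0.2, 30),
        DomainTest("b2", "search", 0.8, 15),
    ]
    print(subset_by_domain(sample, budget_min=30))
```

Each domain gets its own ranked list and its own budget, which mirrors the constraint that a test from one domain cannot run in another domain's environment.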
A layout of this process can be seen here:
This has significantly sped up their deployment process, with testing no longer being the bottleneck in this pipeline. The team now has additional internal resources to work on other build pipeline improvements to use in conjunction with Launchable.
NRI was able to optimize their costs and send feedback to their developers earlier. By using Predictive Test Selection, NRI reduced their 5-hour test runs to only 30 minutes, a 90% reduction in test time. Shorter test times mean faster feedback, resulting in a better developer experience.
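The 90% figure follows directly from the two run times quoted above. A short Python check, using only those two numbers (the function name is ours, chosen for illustration):

```python
def reduction_percent(before_min, after_min):
    """Percentage of run time saved when going from before_min to after_min."""
    return 100 * (before_min - after_min) / before_min


full_run_min = 5 * 60   # roughly 5 hours for the full domain test run
subset_run_min = 30     # around 30 minutes for the selected subset

saved = reduction_percent(full_run_min, subset_run_min)
speedup = full_run_min / subset_run_min

print(f"Time saved: {saved:.0f}%")      # Time saved: 90%
print(f"Speedup:    {speedup:.0f}x")    # Speedup:    10x
```

Put another way, the subset run gives developers feedback about ten times sooner than the full run did.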
Michael achieved his goal of drastically improving their testing turnaround times, allowing developers to find bugs faster and get code out the door. His developers are seeing massive reductions in testing time, greatly improving dev cycles.
After seeing the success Michael had with this project, other teams in NRI are onboarding themselves to also harness the power of Launchable ML to reduce their testing times.