One member of Launchable’s beta program is a leading car manufacturer. We'll call them RocketCar. The central tooling team at RocketCar serves approximately 150 developers.
Key challenges: slow software delivery. Slow test feedback is a major component of their cycle time challenges, and without Launchable, their only path forward is to add expensive testing hardware.
How does Launchable help? Launchable applies machine learning to test automation, identifying the tests that matter for each change a developer makes. Running only those tests speeds up testing and reduces the team's cycle time.
The bottom line? Launchable can help RocketCar save around $1.1M per year by slashing feedback time by 40% (17.4k hours) and reducing the need for costly hardware.
RocketCar builds some of the world's top luxury vehicles: sedans, SUVs, and sports cars. The team at RocketCar is a central tooling team that supports around a thousand developers (internal + vendors). For the beta program, they chose a project that supports 100+ developers.
The team builds software for in-car dashboard systems, so the test automation platform runs builds on a custom hardware device that costs a few thousand dollars. They maintain an internal lab to house the hardware.
At its most basic, the division at RocketCar is about delivering software on embedded devices. The challenges of slow feedback and optimal hardware utilization are applicable to any company delivering software on embedded devices. RocketCar is similar to several Launchable beta partners who fit this mode of development.
Their primary challenge is slow feedback caused by long build times. Thus, the end-to-end cycle time for delivering features is very high. The tools team faces a couple of executive escalations a month from their stakeholder development teams. Worse yet, this team cannot quantify the organizational cost of slow feedback.
> "Cycle time was just brought up in an escalation in a meeting. Developers are frustrated that things are not moving fast enough… There is not enough hardware, and we can't ramp up fast enough. Not enough hardware drives up execution time. This is the biggest pain in our team."
Developers have to wait for the hardware to be available before new workloads can be tested. This is where the lack of testing hardware hits the team hard.
The current mitigation path is to add new hardware, but they can only do so during the annual budget process. The annual process means that it is hard to course-correct during the year.
An interesting side effect of this process is that new projects are starved for resources until new hardware is ready. Therefore, newer projects suffer a higher wait time than older projects.
Maintaining the hardware and the corresponding lab is a non-trivial cost. A dedicated team manages the lab. Additional hardware comes with additional maintenance overhead.
We can break RocketCar's test run time into two components: a fixed overhead that we cannot influence and an optimizable test execution portion, split roughly evenly. Each build takes about two hours to complete, so feedback time for developers is also about two hours per change.
Launchable brought insight into the aggregate time spent waiting across the division. The team runs between 316 and 476 builds per week. Combining this with the build time data, we computed the daily and yearly hours all developers spend waiting and the corresponding dollar impact: approximately 35k-50k hours per year for this team of 100-200 developers.
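The yearly figure can be sanity-checked with back-of-the-envelope arithmetic. Here is a short sketch; the build counts and the roughly 127-minute build time come from the data above, while the 52-week year is our assumption:

```python
# Rough check of the yearly wait cost: builds/week x build hours x weeks.
BUILD_MINUTES = 127      # per-build feedback time from the text
WEEKS_PER_YEAR = 52      # assumption: builds run year-round

def yearly_wait_hours(builds_per_week: int) -> float:
    """Total developer-hours spent waiting on builds per year."""
    return builds_per_week * BUILD_MINUTES / 60 * WEEKS_PER_YEAR

low = yearly_wait_hours(316)    # ~34,800 hours
high = yearly_wait_hours(476)   # ~52,400 hours
print(f"{low:,.0f} - {high:,.0f} hours/year")
```

This lands in the same ballpark as the 35k-50k hours quoted above.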
The machine learning-based testing algorithm analyzed builds, code changes, and test results to create what we call a "Confidence Curve" for RocketCar (shown below). This curve represents how quickly a developer finds that her changes have a problem. It shows the percentage of tests that Launchable needs to run on the x-axis to achieve the confidence level on the y-axis. You can think of confidence as the likelihood that a regression will be detected.
The dotted line shows the pre-Launchable performance of the system. The key number on this curve is 90% confidence: without Launchable, 75% of the tests must run to reach 90% confidence (dotted line). With Launchable, only 20% of the tests must run to reach the same 90% confidence (red line).
By using Launchable to run only the pertinent 20% subset of tests, test execution time drops by 50 minutes. The build time falls from 127 minutes to 77 minutes, which includes the 65 minutes of overhead that we cannot influence.
This is a 39% reduction in build time for developers!
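The arithmetic behind that figure is easy to verify, assuming the 65-minute fixed overhead and 127-minute total described above:

```python
# Build-time reduction from running only the 20% test subset.
TOTAL_MINUTES = 127    # pre-Launchable build time
OVERHEAD = 65          # fixed minutes we cannot influence

tests_before = TOTAL_MINUTES - OVERHEAD    # ~62 min of test execution
tests_after = round(0.20 * tests_before)   # 20% subset -> ~12 min
new_total = OVERHEAD + tests_after         # 77 min

reduction = 1 - new_total / TOTAL_MINUTES
print(f"{TOTAL_MINUTES} min -> {new_total} min ({reduction:.0%} reduction)")
# prints: 127 min -> 77 min (39% reduction)
```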
Furthermore, Launchable allows RocketCar to choose any number on the red curve to optimize between feedback time and confidence. For example:
| Test Execution Time | Confidence |
| --- | --- |
| 1.5 minutes (2.5%) | 80% |
| 6 minutes (10%) | 87.5% |
| 11 minutes (20%) | 90% |
| 29 minutes (50%) | 95% |
By testing only what matters, the pressure on hardware units is reduced, effectively increasing capacity by 2.7x. The team can defer adding more hardware, and getting more juice out of the same hardware means the associated maintenance costs don't grow. It is also worth calling out that smaller projects no longer starve for resources.
Here are the numbers after choosing to optimize for 20% tests at 90% confidence (on the red curve from the earlier section).
Every test session occupies a single hardware unit. First, about 15 minutes are spent resetting the hardware and loading the software onto it. Running the full test suite then takes about 57 minutes.
But by only testing what matters, that 57 minutes is cut down to 11 minutes. Now, a test session only occupies a hardware unit for 26 minutes instead of 72! This means that the same hardware pool can now accept 2.7x the previous workload, resulting in shorter cycle time and faster time to market. This stretching of the capacity comes in really handy early in the development cycle when hardware units are expensive and time consuming to manufacture.
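The capacity multiplier follows directly from the session lengths; a quick sketch using the numbers above:

```python
# Effective capacity gain per hardware unit after test selection.
RESET_AND_LOAD = 15   # minutes to reset hardware and load software
FULL_SUITE = 57       # minutes to run all tests
SUBSET = 11           # minutes to run the 20% subset at 90% confidence

before = RESET_AND_LOAD + FULL_SUITE   # 72-minute session
after = RESET_AND_LOAD + SUBSET        # 26-minute session
print(f"capacity multiplier: {before / after:.2f}x")
# prints: capacity multiplier: 2.77x
```

That 2.77 is the ~2.7x quoted above.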
A developer must wait for test hardware to become available before her tests can run. Today, RocketCar over-provisions hardware to keep the queue wait time reasonable for most cases. 50% of test sessions get the hardware unit they need within 1 minute.
But you might be surprised to hear that the average wait time is much worse: 5 minutes. This is because when things get busy, the queue grows much longer, and the worst-case wait goes north of 60 minutes.
Yet busy periods are precisely when developers need test hardware the most!
The issue for the team is that most people remember the one time that they had a bad experience, and that takes away from all the excellent work this team is doing.
By reducing the time a test session occupies a hardware unit, we can reduce the queue wait time by at least 63%. The impact is even bigger at crunch time, when the queue starts to back up.
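The nonlinear effect of shorter sessions on queue waits can be seen with a toy simulation. Everything here is hypothetical (the Poisson arrival rate, the random seed, and the 11-unit fleet are assumptions for illustration); it shows the shape of the effect, not RocketCar's real traffic:

```python
import heapq
import random

def avg_wait(service_min: float, servers: int = 11,
             jobs: int = 50_000, arrival_per_min: float = 0.12) -> float:
    """Mean queue wait in minutes: Poisson arrivals, fixed service time."""
    random.seed(7)                       # reproducible toy run
    free_at = [0.0] * servers            # when each hardware unit frees up
    heapq.heapify(free_at)
    t = total_wait = 0.0
    for _ in range(jobs):
        t += random.expovariate(arrival_per_min)   # next session arrives
        unit_free = heapq.heappop(free_at)
        start = max(t, unit_free)        # wait only if all units are busy
        total_wait += start - t
        heapq.heappush(free_at, start + service_min)
    return total_wait / jobs

before = avg_wait(service_min=72)   # full-suite sessions
after = avg_wait(service_min=26)    # subset sessions
print(f"avg queue wait: {before:.1f} min -> {after:.1f} min")
```

In this toy model, cutting the session from 72 to 26 minutes shrinks the average wait far more than proportionally, because the same fleet spends much less time saturated.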
There are two ways to think about the benefits to RocketCar – bottom and top-line impact.
Here, we’ve focused on the bottom-line impact, which is cost savings in terms of time spent waiting and hardware reduction, because it is easier to quantify. However, we posit that the bigger impact is on the top-line because faster cycle times with higher quality shippable code translate to more benefits delivered by RocketCar to its customers.
The bottom line impact is $1.1M saved each year.
We have a two-fold impact here:
A 40% drop in a test run drives a corresponding drop in the yearly wait time. This was a staggering 17.4k (mean) hours saved (range 12.9k - 22.4k) for a group of 100-200 engineers. The potential for further savings is enormous because this team serves about 500-1000 engineers across all projects in the division.
Saving 40% of the time implies a 40% dollar savings, or roughly $1M. This easily exceeds the goal the team stated when they began their engagement with us.
The workload that used to require 11 hardware units can now be served by only 4. At $2K per unit, this translates into a $14K CapEx saving as well as a $3K/yr OpEx saving, even assuming a conservative $35 per unit per month. (OpEx pricing is based on Mac Mini colocation hosting.)
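Those hardware savings are straightforward to reproduce from the stated unit costs:

```python
# Hardware cost savings from the figures above.
UNITS_BEFORE = 11
UNITS_AFTER = 4              # same workload after the ~2.7x capacity gain
UNIT_COST = 2_000            # dollars of CapEx per hardware unit
OPEX_PER_UNIT_MONTH = 35     # dollars; conservative colocation estimate

freed_units = UNITS_BEFORE - UNITS_AFTER
capex_saving = freed_units * UNIT_COST                 # $14,000
opex_saving = freed_units * OPEX_PER_UNIT_MONTH * 12   # $2,940/yr (~$3K)
print(f"CapEx saved: ${capex_saving:,}; OpEx saved: ${opex_saving:,}/yr")
```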
The team’s test suite grows at the rate of 9% per month. This implies that the team is in a race to add more hardware. More hardware requires more lab space, cooling units, and personnel to manage this lab. By reducing hardware, the team can delay these costs.
RocketCar isn’t a company that uses dated software delivery practices. However, the limitations imposed by the underlying hardware mean they cannot fully utilize modern development practices, which recommend pushing smaller changes through the pipeline more often. In their environment, that is hard.
This is where Launchable shines. We reduce software delivery cycle times because we can identify the right tests to verify changes. Developers getting faster feedback means their eventual changes are higher quality. Consequently, developers can push through smaller changes faster and more often. The ultimate winner is the customer, who now gets value delivered to her earlier.