Measuring Developer Experience Requires Empathy and AI

Real metrics to measure and improve developer experience

Key Takeaways

  • DevEx makes or breaks software development teams, but it's a difficult concept to actually quantify.

  • A foundational premise to keep in mind when measuring developer experience: it directly drives software delivery velocity.

  • Metrics provide a window into developer experience, but they can't give a full picture of what it's like to be in the shoes of your developers.

  • Understanding the pain points caused by testing bottlenecks and measuring the health of your test suite over time is the key to fixing your developer experience.

Nowadays, more and more businesses are emphasizing Developer Experience. There are several reasons why it's top-of-mind for development teams, but it's mainly because happy developers mean better products, higher retention rates, and overall greater ROI for businesses. With teams more globally distributed, and developers more self-motivated and willing to leave if a company isn't the right fit for them, organizations are seeing that DevEx makes or breaks software development teams.

But, Developer Experience is a difficult concept to actually quantify. After all, it's based on how developers feel about their experiences with the various technologies, people, and processes that they come into contact with every day. More specifically, it relies on understanding factors such as developers' perception of infrastructure, their feelings toward work, and the value of their contribution.

Obviously, we can’t put microchips into developers’ brains and track how they feel about their experiences at work and about their organizations’ infrastructure (First, because that’d probably be very expensive. And second, because that’s going into “Black Mirror” territory).

But, it turns out that there are some practical ways to measure developer experience that work well for today’s development processes. In fact, many of these metrics are similar to the ways that world-class developer experience engineers assess the processes, technologies, and people at their respective organizations.

A foundational premise to keep in mind when measuring developer experience: developer experience directly drives software delivery velocity. Because of this, taking steps to measure software delivery velocity will lead to a better understanding of how to improve the developer experience. But, this doesn't mean that typical performance metrics will cut it. Developers are very productive people working on very hard problems, and the common measurement defaults often don't take this into account when measuring "success".

Some examples of common performance metrics that DON’T measure developer experience:

  • Hours worked (doesn’t indicate if those hours went smoothly for the developer)

  • Bugs fixed (doesn’t show how many times the developer wanted to pull their hair out while fixing these bugs, or how many of the bugs shouldn’t have even been in there in the first place)

  • Features shipped (doesn’t give an accurate picture of how easy/difficult it was to ship these features with the processes in place) 

Developer Experience Metrics

While traditional metrics have a place in measuring other aspects of the development process, they don't truly gauge whether or not your organization is providing a positive developer experience. Instead, there are other metrics that teams can use to see how their DX is doing:

Deployment Frequency

Deployment Frequency measures how often a team can successfully release software into production. If this number is high, it’s a strong indication that the process is running smoothly.

But, Deployment Frequency misses the possibility that while software might be released frequently, it could just be flat-out bad software. After all, speed doesn’t always equate to quality, and something that looks great in production could actually be absolute chaos under the surface. And if developers need to patch and fix something that’s already in production, that doesn’t bode well for developer experience. 
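For illustration, here's a minimal sketch of how Deployment Frequency could be computed from a log of production deployment timestamps (the data below is hypothetical; substitute an export from your own CI/CD tool):

```python
from datetime import datetime

# Hypothetical production deployment timestamps, e.g. exported from a CI/CD tool.
deployments = [
    datetime(2023, 5, 1, 9, 30),
    datetime(2023, 5, 1, 15, 10),
    datetime(2023, 5, 3, 11, 0),
    datetime(2023, 5, 4, 16, 45),
]

# Deployment Frequency: successful releases per day over the observed window.
window_days = (max(deployments) - min(deployments)).days + 1
frequency = len(deployments) / window_days
print(f"{frequency:.2f} deployments per day")
```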

Lead Time and Cycle Time

Similarly, lead time and cycle time are quality indicators of how long the software development process takes. Lead time measures the time from the moment work is first requested to the moment it becomes available to the customer. Cycle time measures how long the work itself takes, from the moment development begins to the moment it's delivered.

While these are both great indicators of efficiency, they cannot gauge what happened behind the scenes to get these projects released. Are they actually quality products? Was the work done well the first time, to prevent developers from needing to overhaul parts of the product again in the future? 
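As a rough sketch (with hypothetical work-item records), both metrics reduce to simple timestamp arithmetic:

```python
from datetime import datetime
from statistics import mean

# Hypothetical work items carrying the three timestamps these metrics need.
work_items = [
    {"requested": datetime(2023, 5, 1), "started": datetime(2023, 5, 3),
     "delivered": datetime(2023, 5, 10)},
    {"requested": datetime(2023, 5, 2), "started": datetime(2023, 5, 8),
     "delivered": datetime(2023, 5, 12)},
]

# Lead time: request to delivery. Cycle time: start of work to delivery.
lead_times = [(w["delivered"] - w["requested"]).days for w in work_items]
cycle_times = [(w["delivered"] - w["started"]).days for w in work_items]

print(f"average lead time:  {mean(lead_times):.1f} days")
print(f"average cycle time: {mean(cycle_times):.1f} days")
```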

Velocity

Velocity is a metric specifically meant for agile teams. It indicates how much work a team completes in a given sprint. However, velocity does not show the effort it took to deliver that work. For example, what if the developers kept needing to work after hours because they kept hitting bottlenecks throughout the workday? They might still hit the sprint's target, meaning that the velocity metric would look positive. But, this still wouldn't be considered a great experience, by any means.
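Velocity is usually reported as story points completed per sprint. A minimal sketch with made-up numbers; the sprint-to-sprint spread can hint at strain that the average hides:

```python
from statistics import mean, stdev

# Hypothetical story points completed over the last five sprints.
sprint_points = [34, 31, 36, 12, 33]

# The average is the headline number; a wide spread can hint that the team
# is straining (or being blocked) to hit it.
print(f"average velocity: {mean(sprint_points):.1f} points/sprint")
print(f"spread (stdev):   {stdev(sprint_points):.1f} points")
```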

Work in Progress

Work in progress (WIP) measures the number of tasks in progress. If this measurement is high, it’s a good indicator of bottlenecks somewhere in the software development lifecycle. 
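Counting WIP from a task-board export is straightforward; comparing it against an agreed limit is what makes it actionable. A minimal sketch, with hypothetical task data and an assumed per-team limit:

```python
# Hypothetical task-board export: each task's current status.
tasks = [
    {"id": "T-101", "status": "in_progress"},
    {"id": "T-102", "status": "in_progress"},
    {"id": "T-103", "status": "done"},
    {"id": "T-104", "status": "in_progress"},
    {"id": "T-105", "status": "todo"},
]

wip = sum(1 for t in tasks if t["status"] == "in_progress")
wip_limit = 2  # an assumed limit; tune it to your team's size

print(f"WIP: {wip} (limit: {wip_limit})")
if wip > wip_limit:
    print("WIP over limit: look for a bottleneck downstream")
```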

Simply knowing that a project has high WIP is not enough to fix bottlenecks, though. In fact, measuring WIP and realizing that it's too high can feel like opening a can of worms. Many organizations implement a CI/CD pipeline to clear up bottlenecks. This helps because Continuous Integration includes build automation, which automatically compiles the changes written by various developers into builds. Plus, Continuous Delivery and Continuous Deployment move builds out of the beginning stages of development and into either a test environment or, in the case of Continuous Deployment, straight into production. All of these practices mean fewer bottlenecks, as the code gets moved to wherever it needs to go without human intervention.

But, CI/CD pipelines can be slowed down by inefficient testing, which can create bottlenecks just as bad as, if not worse than, the bottlenecks caused by a lack of CI/CD practices.

Change Failure Rate

Change failure rate measures how many deployments lead to a degradation in service that must be addressed. This measurement focuses on some of the issues that other speed-related metrics don’t. It brings up the possibility that even though the software might be deployed at a fast pace, it might not be sustainable or great quality. 

Change Failure Rate is an effective indicator of a positive developer experience (because who wants to go back and fix a domino effect of degrading services and software?). But, it leaves out the efficiency piece when used on its own. Good deployments alone don't indicate happy developers, because the process of getting to that picture-perfect deployment could have been a long, tedious journey.
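The calculation itself is simple; the hard part is agreeing on what counts as a "failure". A minimal sketch, assuming hypothetical deployment records flagged when they later required remediation:

```python
# Hypothetical deployment records: "failed" marks deployments that later
# required a hotfix, rollback, or other remediation.
deployments = [
    {"id": "d1", "failed": False},
    {"id": "d2", "failed": True},
    {"id": "d3", "failed": False},
    {"id": "d4", "failed": False},
]

# Change Failure Rate: fraction of deployments causing a degradation in service.
failure_rate = sum(d["failed"] for d in deployments) / len(deployments)
print(f"change failure rate: {failure_rate:.0%}")
```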

Customer Satisfaction Score

Happy customers mean that a great software product was delivered. A high customer satisfaction score is a huge win because as developers know, customers test a software product’s limits in a way that no one else can. 

This still doesn't give a complete picture of developer experience, though. It doesn't show how the process went for the developers behind the scenes in order to produce this great software.

Team Health

Team Health can be a great way to measure developer experience because it shows how well the work is divided between team members. It means that Team Member #1 isn't doing extra work that should be Team Member #2's and #3's responsibility, so it's more likely that Team Member #1 will have a good developer experience.

While Team Health is a great measurement, it only considers the “people” part of the developer experience, and not the “process” and “technology” components. Teams might be equally wrangling terrible development technology and sharing the burden together, but that won’t make them enjoy this process any more. 

Time to Restore Service

This metric gauges how long it takes to recover from a failure in production. This can be helpful because it shows how in sync the people, processes, and technologies are when faced with an emergent situation and a fast-paced timeline. 

But, it misses the obvious: it doesn't measure why that failure happened in the first place, or how many failures end up needing to be fixed in production from month to month!
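Mean time to restore is the usual way to aggregate this metric. A minimal sketch, assuming hypothetical incident records with detection and resolution timestamps:

```python
from datetime import datetime
from statistics import mean

# Hypothetical production incidents with detection and resolution times.
incidents = [
    {"detected": datetime(2023, 5, 2, 14, 0),
     "resolved": datetime(2023, 5, 2, 15, 30)},
    {"detected": datetime(2023, 5, 9, 9, 15),
     "resolved": datetime(2023, 5, 9, 13, 45)},
]

# Mean time to restore: average detection-to-resolution duration.
restore_hours = [
    (i["resolved"] - i["detected"]).total_seconds() / 3600 for i in incidents
]
print(f"mean time to restore: {mean(restore_hours):.1f} hours")
# Pair this with a monthly incident count to see how often you're failing,
# not just how quickly you recover.
```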

Fully Measuring Developer Experience Requires Empathy and AI

Each of the above metrics provides a window into developer experience, but none can give a full picture of what it's like to be in the shoes of your developers. Truly measuring developer experience requires both empathy and in-depth tooling. One of the biggest components of developer experience that these metrics leave out is the testing piece.

Modern-day development cycles rely on testing at every point in order to succeed. It's one of the foundational pieces of DevOps methodology (i.e. "testing early and often"), and it's seen throughout the CI/CD pipeline. Continuous Integration relies on automated unit tests, and Continuous Delivery and Continuous Deployment rely on tests to make sure that a given build is ready to proceed to the next stage of development.

But all too often, bad testing practices are shoehorned into the process, at the expense of the metrics we mentioned earlier. Testing creates bottlenecks when developers have to wait on tests for days, leading to low deployment frequency, low velocity, high cycle and lead times, and high levels of work in progress. Plus, running low-quality or irrelevant tests not only takes up extra time but also means that errors will slip through the cracks, leading to a high Change Failure Rate and a high Time to Restore Service.

While each of these metrics is helpful, they only measure outputs, not the inputs causing the issues. Understanding the pain points caused by testing bottlenecks and measuring the health of your test suite over time is the key to fixing your developer experience. It's all about tracking the sources of your teams' bottlenecks, beyond the outputs. We're helping teams measure developer experience with AI and testing insights.

We use machine learning to measure test suite entropy and improve developer experience through deep insights into your tests:

  1. Determine whether your tests are actually returning accurate results, or whether they are sending back false negatives and positives, with Flaky Tests Insights.

  2. Identify whether test session time has increased with Test Session Duration Insights. A rising session duration could imply that developer cycle time is trending up.

  3. Flag which tests are being run less often with Test Session Frequency Insights. Declining frequency can signal negative health trends like increased cycle time, reduced quality, or fewer changes flowing through the pipeline.

  4. Track which tests fail most often with Test Session Failure Ratio Insights. An increase in failed tests means that a release is becoming unstable, especially if the bump is in post-merge tests.

Test Suite Insights empower development teams to connect the dots between failed tests and how they are affecting the development lifecycle, and developer experience, as a whole.  
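To make the first of those signals concrete: a common flakiness heuristic (not necessarily the one our models use) is to flag any test that both passes and fails against the same code revision. A minimal sketch with hypothetical results:

```python
from collections import defaultdict

# Hypothetical test results as (test name, code revision, outcome) records.
results = [
    ("test_login", "abc123", "pass"),
    ("test_login", "abc123", "fail"),
    ("test_checkout", "abc123", "pass"),
    ("test_checkout", "def456", "pass"),
]

# A test that produced different verdicts on the same revision changed its
# answer without the code changing: a classic flakiness signal.
verdicts = defaultdict(set)
for test, revision, outcome in results:
    verdicts[(test, revision)].add(outcome)

flaky = {test for (test, _), seen in verdicts.items() if len(seen) > 1}
print(f"suspected flaky tests: {sorted(flaky)}")
```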

By measuring developer experience through the lens of testing insights, you can track the health of your test suite and understand what your developers are facing. Knowing thy enemy is half the battle; tackling bloated test suites is the other half of improving your developer experience metrics.

Identify which tests within your suite are the most critical to run, and stop bloated test cycles, with Predictive Test Selection. Run faster, smarter testing cycles. Focus on further improving developer experience by identifying and running the tests with the highest probability of failing, based on code and test metadata. While Test Suite Insights measure the quality of your process, Predictive Test Selection improves your SDLC's speed and agility.
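The production feature trains machine learning models on code and test metadata. As a heavily simplified illustration of the idea only, here's a naive heuristic that prioritizes tests which historically failed when the currently changed files were touched (all names and data are hypothetical):

```python
from collections import defaultdict

# Hypothetical history: (changed file, test that subsequently failed).
failure_history = [
    ("payments.py", "test_checkout"),
    ("payments.py", "test_refund"),
    ("auth.py", "test_login"),
    ("payments.py", "test_checkout"),
]

changed_files = {"payments.py"}  # files touched in the current change

# Score each test by how often it failed alongside the files now changing.
scores = defaultdict(int)
for changed_file, test in failure_history:
    if changed_file in changed_files:
        scores[test] += 1

# Run the most failure-prone tests first, e.g. within a fixed time budget.
prioritized = sorted(scores, key=scores.get, reverse=True)
print(f"run first: {prioritized}")
```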

Seeking Your Expert Feedback on Our AI-Driven Solution

Quality a focus? Working with nightly, integration, or UI tests?
Our AI can help.