Learn how Airbnb built the Page Performance Score, a 0–100 score that measures multiple performance metrics from real users on any platform.
Performance is important at Airbnb and part of our Commitment to Craft. A fast experience is good for business and critical to our mission to “create a world where anyone can belong anywhere”.
Before we can create a fast experience we need to agree on what “fast” means. Web, iOS, and Android each have different platform-specific performance metrics. For product engineers it can be challenging to understand which of these metrics to prioritize, and for management it’s difficult to compare platforms and keep progress reports succinct.
We’ve developed a new performance measurement system called the Page Performance Score that allows us to track multiple performance metrics from real customers across different platforms with ease. This post describes that system, and in the following weeks we’ll be publishing deep dives into the specifics for Web, iOS, and Android.
Early Performance Measurement Efforts
When Airbnb first started measuring performance, we used a single metric called “Time To Airbnb Interactive” (TTAI) that measured the time from page start to when content became visible and interactive. This approach had many positive outcomes. We built performance tracking architecture, fixed latency issues, and cultivated a company culture that valued performance.
However, TTAI also had shortcomings. Different platforms had different baselines and goals. Page comparisons were difficult because the “interactive” definition could change between similar pages. In some situations TTAI improved but engagement metrics did not. Most importantly, TTAI was a single metric and a single metric cannot capture the full spectrum of our customers’ performance expectations. Our definition of “fast” was incomplete and limited our overall performance efforts.
Introducing the Page Performance Score
We needed a nuanced view of performance while maintaining the simplicity of tracking a single number, so we created the Page Performance Score (PPS).
- Page: The entire customer journey on Airbnb is divided into different pages.
- Performance: A page contains multiple performance metrics.
- Score: Every day, on each platform, we formulate a given page’s performance data into a 0–100 score.
PPS allows us to combine multiple input metrics into an easily comparable score. PPS is a step-function improvement over our prior single-metric approach.
The metrics that we measure differ by platform, but the general approach of measuring multiple metrics and formulating a 0–100 score is the same. All of the metrics are user-centric and fall into two general categories:
- Initial Load Metrics measure the time from “page start” to content visible.
- Post Load Metrics measure page responsiveness after the initial load.
Initial Load Metrics
Time To First Contentful Paint (Web) and Time To First Layout (Native) measure the time from “page start” until the first piece of content is visible, which is commonly a loader.
Time To First Meaningful Paint (Web) and Time To Initial Load (Native) measure the time from “page start” until the meaningful content is displayed.
Initial Load Metrics are visualized on the left.
Post Load Metrics
First Input Delay (Web) measures the delay between user interaction and when the browser begins to respond. Delays of 50ms or longer are perceptible to the user.
Total Blocking Time (Web) and Thread Hangs (Native) measure how long the main thread is blocked, which causes the app to lag during layout, animations, and scrolling.
Additional Load Time (Native) measures the average time that additional loaders are displayed within a page, such as during pagination.
Rich Content Load Time (Native) measures the average time for images and videos to load.
Cumulative Layout Shift (Web) measures layout instability weighted by the size and distance of the element shift.
After measuring the metrics we distill that information into a single number using the PPS Formula, which was forked from the Lighthouse Formula. For each metric we identified Good, Moderate, and Poor thresholds based on internal and industry data. We created a scoring curve by assigning the Good range a score above 0.7, the Poor range below 0.5, and the Moderate range in between.
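To make the scoring curve more concrete, here is a minimal Python sketch. It assumes a Lighthouse-style log-normal curve pinned so that the Good threshold scores exactly 0.7 and the Poor threshold scores exactly 0.5. The function name `make_curve` and the example thresholds are hypothetical illustrations, not Airbnb's actual implementation or values.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, stdev 1)


def make_curve(good: float, poor: float):
    """Build a scoring curve for a lower-is-better metric.

    Assumption: a log-normal curve (as in Lighthouse scoring) that passes
    through (good, 0.7) and (poor, 0.5). Thresholds are hypothetical inputs.
    """
    if not 0 < good < poor:
        raise ValueError("expected 0 < good < poor")

    # score(x) = 1 - Phi((ln x - mu) / sigma)
    # score(poor) = 0.5  ->  mu = ln(poor)
    mu = math.log(poor)
    # score(good) = 0.7  ->  (ln(good) - mu) / sigma = Phi^-1(0.3)
    sigma = (math.log(good) - mu) / _STD_NORMAL.inv_cdf(1 - 0.7)

    def curve(value: float) -> float:
        if value <= 0:
            return 1.0  # nothing to penalize
        return 1.0 - _STD_NORMAL.cdf((math.log(value) - mu) / sigma)

    return curve


# Hypothetical thresholds, in milliseconds, for an initial load metric.
curve = make_curve(good=1000.0, poor=2500.0)
print(curve(1000.0))  # approximately 0.7 by construction
print(curve(2500.0))  # approximately 0.5 by construction
```

Values faster than the Good threshold land above 0.7, values slower than the Poor threshold land below 0.5, and the Moderate range falls in between, which matches the ranges described above.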
Every day we calculate each metric’s capped average value for a given page from millions of real-user page loads. We map that capped average against the metric’s curve to get a 0–1 score. We then combine the metric scores into a composite PPS by multiplying each metric score by its weight and summing the results. We chose the weights by examining our performance-focused A/B tests and ensuring that the weights are maximally aligned with Airbnb’s internal engagement metrics.
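The post does not define "capped average" precisely. One plausible reading, sketched below in Python, is that each sample is clamped to a maximum value before averaging so that a few extreme outliers cannot dominate the daily number. The function name `capped_average` and the cap value are assumptions for illustration only.

```python
def capped_average(samples, cap):
    """Average the samples after clamping each one to at most `cap`.

    Assumption: "capped" means per-sample clamping. The real definition
    may differ (for example, trimming outliers instead of clamping them).
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sample")
    return sum(min(value, cap) for value in samples) / len(samples)


# Three hypothetical page-load timings in milliseconds; one is an outlier.
timings_ms = [100, 200, 10_000]
print(round(capped_average(timings_ms, cap=1000), 1))  # 433.3
```

Without the cap the plain mean of these samples would be about 3,433 ms, which shows how a single slow load can distort an uncapped average.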
Web Metric Weights
Native Metric Weights
The resulting PPS formula can be expressed as:
PPS = curve(metric_1) * weight_1 + curve(metric_2) * weight_2 …
For example, on Web:
PPS = curve(TTFCP) * 35% + curve(TTFMP) * 15% + curve(FID) * 30% + curve(TBT) * 15% + curve(CLS) * 5%
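The Web formula above translates almost directly into code. The Python sketch below uses the Web weights from the formula and takes already-curved 0–1 metric scores as input. The function name, the example scores, and the final multiplication by 100 (to land on the 0–100 scale) are assumptions made for illustration.

```python
# Web metric weights, taken from the Web formula above.
WEB_WEIGHTS = {
    "TTFCP": 0.35,
    "TTFMP": 0.15,
    "FID": 0.30,
    "TBT": 0.15,
    "CLS": 0.05,
}


def page_performance_score(metric_scores, weights=WEB_WEIGHTS):
    """Combine 0-1 metric scores into a single 0-100 score.

    `metric_scores` maps metric name -> curve(metric), each in [0, 1].
    Assumption: the weighted sum is scaled by 100 to give the 0-100 PPS.
    """
    if set(metric_scores) != set(weights):
        raise ValueError("metric_scores and weights must cover the same metrics")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    weighted_sum = sum(metric_scores[name] * w for name, w in weights.items())
    return 100.0 * weighted_sum


# Hypothetical curved scores for one page on one day.
example_scores = {"TTFCP": 0.8, "TTFMP": 0.7, "FID": 0.9, "TBT": 0.6, "CLS": 1.0}
print(round(page_performance_score(example_scores), 1))  # 79.5
```

Because the weights sum to 100%, a page that scores 1.0 on every metric gets a PPS of 100, and swapping in a new metric only requires changing the weights table.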
Migrating the company from a single metric to PPS was organizationally challenging. We had to train the company to stop viewing performance as a single seconds-based number, which is a paradigm shift that requires cross-functional alignment. To help ease the transition we mapped the old TTAI ranges to the new PPS ranges.
Once the company understood PPS, improving on it was comparatively easy. We simply add or replace metrics as our understanding of performance improves and the 0–100 score remains constant. PPS was designed to evolve. For example, in 2019 the Chrome team introduced Cumulative Layout Shift, which was a perfect candidate for Web PPS. It was a user-centric metric, had good browser coverage, and could be measured on direct and client-routed page loads. We instrumented the metric, validated the data, and then incorporated it into the next version of PPS. Easy!
Weighted Average Score
In addition to tracking individual pages’ PPS scores we track the entire organization’s overall performance progress by creating a Weighted Average Score (WAS). Consider these example PPS scores and traffic for three common pages:
(73 * 5,000,000 + 84 * 20,000,000 + 75 * 10,000,000) / 35,000,000 = ~80
If these were the only pages at Airbnb, our WAS would be ~80. Airbnb has hundreds of pages, so a WAS helps us prioritize by weighting each page in proportion to its traffic.
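The same calculation can be sketched in a few lines of Python. The scores and traffic figures are the ones from the example above. The function name `weighted_average_score` is a hypothetical label for illustration.

```python
def weighted_average_score(pages):
    """Traffic-weighted average of per-page PPS values.

    `pages` is an iterable of (pps, traffic) pairs.
    """
    pages = list(pages)
    total_traffic = sum(traffic for _, traffic in pages)
    if total_traffic <= 0:
        raise ValueError("total traffic must be positive")
    return sum(pps * traffic for pps, traffic in pages) / total_traffic


# (PPS, traffic) for the three example pages above.
example_pages = [
    (73, 5_000_000),
    (84, 20_000_000),
    (75, 10_000_000),
]
print(round(weighted_average_score(example_pages), 1))  # 79.9
```

The page with 20,000,000 loads pulls the result toward its score of 84, which is why the WAS (~80) sits above the unweighted mean of the three scores (about 77).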
With PPS our engineers and data scientists now have a multitude of user-centric performance metrics to understand and improve their products. We can clearly compare the performance progress of different pages, different organizations, and even different platforms. PPS allows teams to set simple goals and determine which individual metrics to prioritize. PPS can evolve: metrics can be replaced, weights can change, targets can tighten, and yet the 0–100 score remains constant.
Changing our definition of “fast” has been well worth the effort. The company has evolved from viewing performance as a single metric to a 0–100 score that represents the rich, complex realities of performance. We have leveled up our performance measurement system and hope that you apply these learnings in your organization as well.
Thank you to everyone who has helped build PPS over the years: Aditya Punjani, Alper Kokmen, Antonio Niñirola, Ben Weiher, Charles Xue, Egor Pakhomov, Eli Hart, Elliot Sachs, Gabe Lyons, Guy Rittger, Jean-Nicolas Vollmer, Josh Nelson, Josh Polsky, Luping Lin, Mark Giangreco, Matt Schreiner, Nick Miller, Nick Reynolds, Noah Martin, Xiaokang Xin, and everyone else who helped along the way.