The benefits of prioritizing and measuring performance in WordPress 6.2

Based on lab benchmarks, WordPress 6.2 loads 14-18% faster overall for block themes and 2-5% faster overall for classic themes (measured via Largest Contentful Paint / LCP). Server-side performance in particular (measured via Time to First Byte / TTFB) sees a major boost of 17-23% for block themes and 3-5% for classic themes, which directly contributes to the overall load time.

This post provides more information and a retrospective on how those performance wins were achieved in the WordPress 6.2 release cycle.

Learning from previous WordPress core releases

During the beta/RC testing phase of WordPress 6.1 in November 2022, it became evident that a few notable performance regressions compared to WordPress 6.0 had made it into the release. The single most impactful regression was addressed in one of the release candidates before the 6.1 stable release, but overall performance still regressed slightly when using a block theme. Previous WordPress core releases, especially 6.0, saw even more notable performance regressions. Although other performance enhancements landed in those releases, the regressions effectively canceled them out. WordPress 6.2 is significantly different in this regard, with all key metrics improving over the previous release, as highlighted in these performance benchmarks shared by @oandregal.

For WordPress 6.1, the regressions were discussed in a general assessment of WordPress 6.1 RC performance. Contributors from different core teams quickly prioritized and investigated the issues, but it was too late in the release cycle to address them all. This was a great learning experience: as much as we work on performance enhancements, it is just as important to continuously monitor the performance of existing core features to avoid regressions. The more regressions there are, the less impact any performance enhancements have overall. While this seems logical and simple, it sometimes still takes an actual learning experience to get things right.

It is great to see that we have put these learnings into practice, and the performance wins in WordPress 6.2 demonstrate that. So what changed in the 6.2 release cycle?

Increased focus on performance measurement

If the change in performance work between the 6.1 and 6.2 cycles had to be summarized in a single word, that word would be measurement: an increased focus on measuring performance proved to be the deciding factor. The following sections elaborate on the nuances behind that simplified statement.

Identifying performance bottlenecks and opportunities for improvement

It was clearly visible that contributors were keen to rectify the 6.1 regressions and learn from that release. This was already apparent in the WordPress 6.1.1 follow-up release, which contained several performance-related fixes: 12 of the 30 tickets fixed in that release were focused on performance.

Numerous contributors from different core teams actively worked on identifying and addressing performance bottlenecks by both benchmarking and profiling WordPress core.

Profiling WordPress core has been tremendously helpful in identifying server-side performance issues, which led to the notably improved TTFB we are seeing in WordPress 6.2. The tools contributors commonly used for server-side profiling were the open source tools Xdebug and XHProf and the SaaS tool Blackfire. Contributors have started to define and document standardized ways of using these tools, which are intended to be published in the Make Performance Handbook soon.

For benchmarking WordPress core, on the other hand, there was less clarity and there were fewer known tools that could be used as is. Contributors initially came up with individual tooling of their choice, but those benchmarking approaches had varying degrees of accuracy and ease of use, so it soon became clear that a more consistent approach was needed. This was one of the key topics discussed in the first performance-focused hallway hangout in January. From there, several contributors started more coordinated efforts for measuring performance, both through manual benchmarks run locally and through automated benchmarks via continuous integration.

The Performance Lead role

Another change in the WordPress 6.2 cycle, one that has supported everything mentioned so far, was the introduction of a new Performance Lead role as part of the release squad. The role came out of the aforementioned performance-focused hallway hangout, and I stepped into it for the 6.2 release. It allowed me to collaborate closely with and support the other contributors and to coordinate our performance measurement approaches. As mentioned before, I would like to emphasize that the performance wins in this release are the result of excellent work from several contributors on identifying performance weaknesses. The Performance Lead role merely gave performance better representation alongside the other members of the release squad.

I hope the role of the Performance Lead is here to stay, and I am excited to see additional contributors step into this role in the future.

Assessing performance on individual WordPress core patches / pull requests

As mentioned before, profiling is the recommended approach to identify performance bottlenecks in WordPress core. However, once a pull request with a potential fix has been implemented, it is also crucial to measure its actual performance impact and validate that the outcome is as expected. Profiling gives us an idea of the potential performance impact, but it comes with caveats. The profiling tools add their own overhead to the WordPress site. A profile also captures only a single request, and performance varies considerably from one request to the next.
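The following minimal Python simulation shows why a single request is a shaky basis for conclusions. The request timings are synthetic: a hypothetical fixed cost plus random noise and occasional slow outliers. They are not real WordPress data. The simulation only shows that the median of many requests varies far less between repeated experiments than a single request does.

```python
import random
import statistics


def simulated_request_ms(rng: random.Random, base_ms: float = 100.0) -> float:
    """Hypothetical stand-in for timing one request: a fixed cost, Gaussian
    noise, and an occasional slow outlier (e.g. a busy machine)."""
    outlier_ms = 60.0 if rng.random() < 0.1 else 0.0
    return max(0.0, base_ms + rng.gauss(0, 10) + outlier_ms)


def spread_of_estimates(rng: random.Random, runs_per_estimate: int, trials: int = 200) -> float:
    """Standard deviation of the estimate across repeated experiments, where
    each experiment takes the median of `runs_per_estimate` requests."""
    estimates = [
        statistics.median(simulated_request_ms(rng) for _ in range(runs_per_estimate))
        for _ in range(trials)
    ]
    return statistics.stdev(estimates)


rng = random.Random(42)  # fixed seed so the simulation is repeatable
single = spread_of_estimates(rng, runs_per_estimate=1)
median_of_20 = spread_of_estimates(rng, runs_per_estimate=20)
print(f"spread with 1 request per estimate:   {single:.1f}ms")
print(f"spread with 20 requests per estimate: {median_of_20:.1f}ms")
```

The same reasoning applies to real benchmarks. Repeating a request many times and reporting the median keeps any single noisy run from deciding whether a change looks like a win or a regression.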

Various contributors in the WordPress 6.2 cycle benchmarked performance on individual pull requests, either to prove a positive impact on performance or to ensure that no regression was introduced. Different tools were used to assess the impact, often CLI commands like “benchmark-server-timing” or “benchmark-web-vitals”. The “benchmark-server-timing” command has been most helpful for individual PRs with a server-side performance impact. For example, it was used in the #57502 ticket (see this comment), which is possibly the largest single performance enhancement in the 6.2 release. The “benchmark-web-vitals” command, on the other hand, has been useful in a few situations where performance decisions came down exclusively to client-side performance, for example in the #56990 ticket (see this comment), which explores the performance impact of classic-themes.css.

Launching an automated performance testing workflow

In the WordPress 6.2 cycle, most of these benchmarks were conducted manually. That is sometimes necessary due to the nature of the pull request, but at other times it is rather inefficient. Furthermore, it would not be feasible to benchmark performance manually for every WordPress core change, and that is precisely how a performance regression may be merged unnoticed. Several contributors have been collaborating on introducing an automated performance measurement CI workflow to WordPress core, and a first MVP was committed in [55459]. With this CI workflow, WordPress core performance metrics are now recorded for every single commit and are available in this dashboard. This makes it easy to spot a potential regression that previously would have gone unnoticed. At this point, there is still a sizable amount of variance in the data points and only a limited number of metrics is available, but the team will iterate on it in the coming weeks and months. This is only the starting point, and additional features like Core Web Vitals (CWV) support are already being planned. Needless to say, this is a major milestone for monitoring performance in WordPress core, and it will already reduce some of the measuring workload in the upcoming 6.3 cycle.
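Per-commit data makes simple automated checks possible. The Python sketch below is hypothetical and is not how the core CI workflow or its dashboard is implemented. It assumes each commit has a single median value for a metric such as TTFB, and it flags a new commit that is more than an arbitrarily chosen 5% slower than the median of recent commits. The history values are made up for illustration.

```python
import statistics
from typing import Sequence


def is_potential_regression(
    history_ms: Sequence[float], new_ms: float, threshold_pct: float = 5.0
) -> bool:
    """Flag a commit whose metric is more than `threshold_pct` percent slower
    than the median of recent commits. Using the median as the baseline keeps
    one noisy data point in the history from skewing the comparison."""
    if not history_ms:
        return False  # nothing to compare against yet
    baseline = statistics.median(history_ms)
    change_pct = (new_ms - baseline) / baseline * 100
    return change_pct > threshold_pct


# Hypothetical TTFB medians (ms) for the last five commits; one is a noisy outlier.
recent_ttfb_ms = [140.1, 138.4, 139.0, 152.3, 137.9]
print(is_potential_regression(recent_ttfb_ms, 141.0))  # False (~1.4% slower)
print(is_potential_regression(recent_ttfb_ms, 150.0))  # True (~7.9% slower)
```

Given the variance mentioned above, a fixed threshold like this would need tuning against real data before it could be trusted to catch regressions without too many false alarms.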

Assessing performance of WordPress core holistically

While assessing performance on every individual WordPress core change (pull request / commit) is very important to ensure continuous monitoring of performance and avoid regressions, it is also important to keep track of overall performance in WordPress core. This is particularly true during the Beta and RC stages of a release cycle.

At this stage, it is advisable to use the production ZIP version of WordPress core (e.g. a particular Beta or RC release) instead of measuring in the WordPress core development environment. The “benchmark-web-vitals” command mentioned in the previous section is well suited for this use case, as it provides high-level performance metrics that capture both server-side and client-side performance. The resulting data can then be compared with the same metrics from, e.g., the previous stable release to get an idea of how the performance of WordPress core has changed (hopefully improved!) in the new release. The numbers I shared at the beginning of this post are based on this approach.

Performance benchmarks for WordPress 6.2

I would like to share a few more detailed numbers on the WordPress 6.2 performance improvements. I have mainly been benchmarking two scenarios, which particularly showcase the server-side performance enhancements (with #57502 contributing the most to them):

  • Home page using a block theme (Twenty Twenty-Three) with the default content (“Hello World!” post)
  • Home page using a classic theme (Twenty Twenty-One) with the default content (“Hello World!” post)

Since WordPress 6.2 included one notable client-side performance enhancement that affects only sites using images (see #56930), I included two more scenarios to assess that impact too:

  • Home page using a block theme (Twenty Twenty-Three) with the default content (“Hello World!” post) and a featured image on that post
  • Home page using a classic theme (Twenty Twenty-One) with the default content (“Hello World!” post) and a featured image on that post

For each of these scenarios, I loaded the URL 20 times using the “benchmark-web-vitals” command and recorded the metrics. The full results include more granular percentiles, but by far the most important values are the medians (p50). Here is the data for the two scenarios with only the default “Hello World!” post (no featured image):

| Scenario | Metric | WP 6.1.1 median | WP 6.2 median | Diff % |
|---|---|---|---|---|
| Block Theme: Twenty Twenty-Three | LCP | 281.7ms | 241.15ms | -14.39% |
| Classic Theme: Twenty Twenty-One | LCP | 209.65ms | 203.65ms | -2.86% |

For comparison, here is the same data for the two alternative scenarios where the post has a featured image. Note how the LCP improvement for block themes is even more pronounced in this scenario:

| Scenario | Metric | WP 6.1.1 median | WP 6.2 median | Diff % |
|---|---|---|---|---|
| Block Theme: Twenty Twenty-Three | LCP | 292.8ms | 241.4ms | -17.55% |
| Classic Theme: Twenty Twenty-One | LCP | 217.65ms | 206.95ms | -4.92% |
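The "Diff %" column is the relative change of the median from WP 6.1.1 to WP 6.2. The short Python sketch below reproduces it from the LCP values in the two tables. The helper name and scenario labels are illustrative and are not part of any benchmarking tool.

```python
def percent_change(old_ms: float, new_ms: float) -> float:
    """Relative change from old to new, in percent; negative means faster."""
    return (new_ms - old_ms) / old_ms * 100


# (WP 6.1.1 median, WP 6.2 median) LCP values in ms, copied from the tables.
LCP_MEDIANS_MS = {
    "Block theme": (281.7, 241.15),
    "Classic theme": (209.65, 203.65),
    "Block theme + featured image": (292.8, 241.4),
    "Classic theme + featured image": (217.65, 206.95),
}

for scenario, (wp_611, wp_62) in LCP_MEDIANS_MS.items():
    print(f"{scenario}: {percent_change(wp_611, wp_62):.2f}%")
# Block theme: -14.39%
# Classic theme: -2.86%
# Block theme + featured image: -17.55%
# Classic theme + featured image: -4.92%
```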

The full data can be inspected in this spreadsheet.
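The tables report the median of 20 runs per scenario. The minimal Python sketch below shows why the median is a better headline number than the mean when a run is unusually slow. The samples are synthetic, and the exact set of percentiles that “benchmark-web-vitals” reports is not reproduced here. The p90 value is included only to illustrate a more granular percentile.

```python
import statistics
from typing import Dict, Sequence


def summarize(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize repeated measurements of one metric, e.g. 20 LCP values in ms.

    The median (p50) ignores a few unusually slow or fast runs; the mean is
    included only to show how far a single outlier can pull it away.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # Nine cut points that split the data into ten groups; the last one is p90.
    deciles = statistics.quantiles(samples_ms, n=10, method="inclusive")
    return {
        "p50": statistics.median(samples_ms),
        "p90": deciles[-1],
        "mean": statistics.mean(samples_ms),
    }


# Synthetic example: 19 "normal" runs (200-218ms) plus one very slow run.
samples = [float(ms) for ms in range(200, 219)] + [900.0]
print(summarize(samples))
```

Here the single 900ms run moves the mean to about 243.55ms. The median stays at 209.5ms, in line with the typical run.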

To close this section, it should be noted that the 4 scenarios above are of course not representative of what most actual WordPress sites look like. Some other benchmarks used the theme unit test data, and while that includes more content, it is not necessarily more accurate either. Lab analyses inevitably have limitations, and we will never be able to capture the “average” or “realistic” WordPress site synthetically. However, the Core Performance Team is exploring a few good baseline scenarios as part of enhancing the aforementioned automated core performance testing CI workflow. It would be great if in the 6.3 cycle we could align benchmarks on similar scenarios rather than every contributor, including myself, using their own.

For some of the individual ticket highlights that contributed to the performance improvements in WordPress 6.2, please see the recent core editor improvements post.

Comparing performance between block themes and classic themes

You may have noticed in the data above that the classic theme apparently loads faster than the block theme. While that is technically true based on my benchmarks, the data may be deceiving at first glance, so let me clarify.

Let’s take another look at the WordPress 6.2 data for the first two scenarios. For the LCP metric, the classic theme already loads notably faster (203.65ms vs 241.15ms), but for TTFB it is almost twice as fast (72.6ms vs 137.8ms). Server-side performance is better in classic themes mainly because block themes have to handle more data. Much of what a classic theme’s code is responsible for is now handled through more dynamic features, which additionally rely on extra database queries. However, it also needs to be acknowledged that block theme support in WordPress core is still relatively new (a little more than one year old), while the logic for classic themes has seen more than a decade of refinements. In other words, there is likely still a lot of headroom for improving the server-side performance of block themes, and probably less for classic themes. The performance improvements in WordPress 6.2 illustrate this well: TTFB with a block theme improved far more than TTFB with a classic theme.

Now comes the most important part, though: in client-side performance, block themes are much faster than classic themes. The way to spot this is to look at the difference between the LCP metric and the TTFB metric. LCP can be considered representative of the overall load time, while TTFB is the server response time, so “LCP minus TTFB” is representative of the client-side load time. If you make this calculation for the WordPress 6.2 data points above, you can clearly see the benefit of the block theme when it comes to client-side performance (103.35ms vs 131.05ms). This benefit stems from the more dynamic logic that block themes apply, e.g. to load scripts and stylesheets. Rather than following the common pattern of enqueuing one big stylesheet and one big script in the theme, block themes load assets more granularly and dynamically, only for what is actually needed on the current page.
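The “LCP minus TTFB” calculation is simple, but a short Python sketch makes it explicit. It uses the WordPress 6.2 medians quoted in the previous paragraphs. The function name is illustrative, and treating the difference as client-side time is the same approximation the text makes.

```python
def client_side_ms(lcp_ms: float, ttfb_ms: float) -> float:
    """Approximate client-side load time: everything after the server response
    (TTFB) until the largest content is painted (LCP)."""
    return lcp_ms - ttfb_ms


# WordPress 6.2 medians for the two default-content scenarios.
block = client_side_ms(lcp_ms=241.15, ttfb_ms=137.8)
classic = client_side_ms(lcp_ms=203.65, ttfb_ms=72.6)
print(f"block theme:   {block:.2f}ms")    # 103.35ms
print(f"classic theme: {classic:.2f}ms")  # 131.05ms
print(f"block theme is {classic - block:.2f}ms faster client-side")  # 27.70ms
```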

You may argue that it doesn’t matter that client-side performance is ~30ms faster in a block theme if server-side performance is ~65ms slower at the same time. Indeed, in this benchmark the classic theme is overall ~35ms faster than the block theme, but that is without any caching. It is a common best practice for WordPress sites to use a full page cache. This avoids running all the WordPress server-side logic on every page load or, even better, avoids the request hitting the WordPress site at all in favor of serving a cached response. Far from all WordPress sites use a full page cache, but many do, and for those sites server-side performance potentially becomes less relevant. Keep in mind, however, that improving server-side performance is still important even then. As mentioned, many sites do not use a full page cache, and even for those that do, certain dynamic content is almost impossible to cache reliably. Still, if your WordPress site primarily serves static content through a full page cache, block themes are already faster than classic themes today, because they are faster client-side. You can “cache” away server-side performance problems, but you cannot do that on the client-side. In other words: despite block themes being slower on the server-side, they provide a better foundation for building performant sites in the long run.
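The caching argument can be expressed as a deliberately simplified model. The Python sketch below assumes that a full page cache only changes TTFB, sets it to the same hypothetical 20ms for both themes, and leaves client-side work untouched. The 20ms figure is an assumption, not a measurement, and the model ignores everything else a real cache setup affects. The client-side times are the “LCP minus TTFB” values derived from the WordPress 6.2 medians above.

```python
CACHED_TTFB_MS = 20.0  # hypothetical: same cached response time for both themes

# Client-side time (LCP minus TTFB) from the WordPress 6.2 medians.
CLIENT_SIDE_MS = {
    "block theme": 241.15 - 137.8,
    "classic theme": 203.65 - 72.6,
}


def estimated_lcp_ms(client_side_ms: float, ttfb_ms: float = CACHED_TTFB_MS) -> float:
    """Simplified model: LCP is approximately TTFB plus client-side time,
    assuming the page cache changes only TTFB."""
    return ttfb_ms + client_side_ms


for theme, client_ms in CLIENT_SIDE_MS.items():
    print(f"{theme}: ~{estimated_lcp_ms(client_ms):.2f}ms LCP with a full page cache")
```

Under these assumptions the block theme comes out ahead by the same ~28ms it gains client-side. The server-side gap disappears because both themes are served from the cache.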

Appendix: Tools to measure and profile performance

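To finish this post, here is a list of the tools mentioned:

  • Xdebug, XHProf, and Blackfire for server-side profiling
  • The “benchmark-server-timing” and “benchmark-web-vitals” CLI commands for benchmarking
  • The automated performance testing CI workflow and its dashboard for per-commit metrics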

Props @annezazu @tweetythierry @desrosj @joemcgill @hellofromtonya @spacedmonkey for extensive review and proofreading.

#6-2, #performance, #retrospective
