During the 6.2 release cycle, members of the WordPress Performance Team conducted a performance analysis to identify the biggest opportunities for future performance enhancements. To do so, the team first created a repeatable methodology so that results could be tested and confirmed. The full methodology and analysis can be found in this document.
Analysis scope and methodology
To start, the team identified a number of key use cases to test. These were meant to cover the primary out-of-the-box functionality of a simple, default WordPress site, and included testing both a classic theme (Twenty Twenty-One) and a block theme (Twenty Twenty-Three), each configured with the same content from the Theme Unit Test data. Five specific scenarios were tested for both themes:
- A homepage showing the latest posts: A common use case that includes fetching data for a list of several posts in a single request.
- A basic page that is text only: A minimal use case that can serve as a baseline against a more complex post.
- A post page including a large set of images and default blocks: This use case allowed us to observe the effect of extra database queries and PHP routines required to render a complex page that is more realistic.
- The same homepage, with translation: This allowed us to see the performance impact of WordPress's translation functionality compared to a non-translated site.
- The same basic page, with translation: Same as above.
Each profiling test was run in a wp-env Docker environment running PHP 7.4 (the recommended version for WordPress when the analysis was conducted) with XHProf installed for profiling. Other use cases and configurations were considered but not included in this initial analysis, including testing with a persistent object cache active, a multisite setup, and additional supported PHP versions.
Observations from the analysis
Below are the biggest opportunities identified for potential performance improvements, based on the profiling data collected against WordPress 6.2. An overview of the raw data for these observations is in the full results spreadsheet. When possible, relevant existing tickets for each improvement area are included for reference. This is not meant to be an exhaustive list of everything that will be addressed, and additional tickets should be created as needed.
Improve template loading and rendering for classic themes
In the classic theme tested, the most expensive process is related to locating and rendering template parts. This starts with get_template_part(), includes the process of locating the template part files with locate_template(), and ends with rendering the content for each template part. This whole process accounted for approximately 30–60% of the entire server response in the test results, with much of that time spent on filesystem checks (e.g., file_exists() is responsible for 4–9% of all time measured and can likely be optimized with a cache), rendering widget blocks, and similar work. Given that many of these filesystem checks are unlikely to produce different outcomes between requests, there are likely opportunities to find substantial improvements here.
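To make the caching idea for repeated filesystem checks more concrete, here is a minimal, language-neutral sketch written in Python rather than PHP. It is not WordPress code: the `TemplateLocator` class, its `locate` method, and the `fs_checks` counter are hypothetical names introduced only for illustration. It assumes that a template lookup tries several candidate file names across several theme directories and that the result of each existence check can be remembered for the lifetime of the object.

```python
import os
from typing import Dict, List, Optional


class TemplateLocator:
    """Hypothetical stand-in for a template lookup; not WordPress code."""

    def __init__(self, search_dirs: List[str]) -> None:
        self.search_dirs = search_dirs
        # Remembers the result of each existence check, keyed by full path.
        self._exists_cache: Dict[str, bool] = {}
        # Counts real filesystem checks so the effect of the cache is visible.
        self.fs_checks = 0

    def _exists(self, path: str) -> bool:
        if path not in self._exists_cache:
            self.fs_checks += 1
            self._exists_cache[path] = os.path.isfile(path)
        return self._exists_cache[path]

    def locate(self, names: List[str]) -> Optional[str]:
        """Return the first existing candidate, trying names in priority order."""
        for name in names:
            for directory in self.search_dirs:
                candidate = os.path.join(directory, name)
                if self._exists(candidate):
                    return candidate
        return None
```

In this sketch, a second lookup for the same template names performs no new filesystem checks. A cache that lives longer than a single request would also need some form of invalidation (for example when a theme is updated), which this example does not attempt to model.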
Improve rendering of block widgets
Related to template loading, loading and rendering block widgets (i.e., as measured by profiling WP_Widget::display_callback) took ~8–24% of the response time. This may be partially due to the inclusion of widgets in the theme test data and may not be representative of all themes. Even so, we have identified that much of this code runs even when a theme is not using widgets, making it a good candidate for further exploration. It’s possible that some widgets have a larger impact than others. For example, the core categories block used in the category list widget was responsible for 5–10% of the response time in our tests.
Improve registration of blocks from metadata
In all test scenarios, register_block_type_from_metadata() is called 91 times and takes 3–24% of total response time across use cases, with a larger impact on block themes than on classic themes. It is heaviest on the homepage, and always worse on the first request than on subsequent ones. Again, much of this time is due to file operations that could be optimized. We could also consider techniques like lazy loading block registration based on whether blocks are in use on a page, or caching registered blocks to avoid duplicate file read operations for blocks that are unlikely to change. For block themes, block registration accounts for 15–25% of the total response time, with register_block_style_handle() specifically accounting for most of that time (184 calls, 13–21% of inclusive wall time, or iwt).
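The lazy registration and caching ideas mentioned above can be sketched roughly as follows. This is an illustration in Python, not a proposal for the actual WordPress implementation: `LazyBlockRegistry`, `declare`, `get`, and the `file_reads` counter are hypothetical names, and the sketch assumes that each block's metadata is a JSON file that only needs to be read and parsed when the block is first used.

```python
import json
from typing import Callable, Dict


class LazyBlockRegistry:
    """Hypothetical registry that defers metadata reads until a block is used."""

    def __init__(self, read_file: Callable[[str], str]) -> None:
        # read_file is injected so the sketch does not depend on real files.
        self._read_file = read_file
        self._metadata_paths: Dict[str, str] = {}
        self._registered: Dict[str, dict] = {}
        # Counts metadata file reads so the saving is observable.
        self.file_reads = 0

    def declare(self, block_name: str, metadata_path: str) -> None:
        """Record where a block's metadata lives without reading it."""
        self._metadata_paths[block_name] = metadata_path

    def get(self, block_name: str) -> dict:
        """Read and parse metadata on first use, then serve it from memory."""
        if block_name not in self._registered:
            raw = self._read_file(self._metadata_paths[block_name])
            self.file_reads += 1
            self._registered[block_name] = json.loads(raw)
        return self._registered[block_name]
```

With this shape, declaring many blocks costs no file reads at all, and a page that uses only a few of them pays for just those few. Whether that trade-off holds in WordPress depends on how often code needs the full list of registered blocks, which the sketch does not address.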
Improve loading translations
In both theme types, the load_textdomain() function is called twice when language packs are in use (once for core, and once for the theme), and is responsible for 9–26% of the total response time (most of which is the MO::import_from_file method). This took up a higher percentage of the execution time in the classic theme tests (17–26%) than in the block theme tests (9–16%), which may point to opportunities for improvement in classic themes.
- MO::import_from_reader (1 call, 9–25% iwt)
- MO::make_entry (4058 and 3977 calls in our classic and block use cases, respectively, 2–7.5% iwt)
Improve resolving block templates
For block themes, resolving block templates from the file system takes a large amount of time. This is likely due to the need for both database and file system reads during this process. Example function paths:
- get_block_templates (12–21% iwt)
- get_block_theme_folders (3–6% iwt)
- build_template_part_block_instance_variations (4–6% iwt)
- #57756 (fixed in trunk)
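Because resolved templates are unlikely to change unless site templates are edited or the theme is updated, one way to think about caching this work is to key the cached result on exactly those two things. The following Python sketch illustrates that idea only. `TemplateResolutionCache`, `get_templates`, `theme_version`, and `edits_revision` are hypothetical names, and the sketch assumes that a theme version string and a counter of template edits are available to build the cache key.

```python
from typing import Callable, Dict, List, Tuple


class TemplateResolutionCache:
    """Hypothetical cache for an expensive template resolution step."""

    def __init__(self, resolve: Callable[[str], List[str]]) -> None:
        # resolve stands in for the expensive database and file system work.
        self._resolve = resolve
        self._cache: Dict[Tuple[str, str, int], List[str]] = {}
        # Counts how often the expensive resolution actually runs.
        self.misses = 0

    def get_templates(
        self, theme: str, theme_version: str, edits_revision: int
    ) -> List[str]:
        # A theme update or a template edit changes the key, so stale
        # results are never returned. Old entries are simply left behind
        # here; a real cache would also need to evict them.
        key = (theme, theme_version, edits_revision)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._resolve(theme)
        return self._cache[key]
```

Repeated requests with the same theme version and edit revision reuse the stored result, while a theme update or a template edit triggers exactly one fresh resolution.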
Improve term field sanitization
Term field sanitization (sanitize_term_field) is called ~3,000–5,000 times during a page load in our classic theme tests, which alone adds 1–6% to the total response time. A recent change that contributed to this problem has already been identified, and a fix has been committed. A deeper look into ways of reducing unnecessary calls to this function could result in additional improvements. Interestingly, block themes don’t exhibit the same problem with sanitize_term_field that classic themes do, so it would be helpful to understand why, to see whether the same improvements could be applied to classic themes.
Additional block theme improvements
Other notable functions that take up a lot of time in block themes, related to parsing and using data from theme.json and the block registry, include:
- WP_Theme_JSON::compute_style_properties (65 calls, 2–8% iwt)
  - WP_Theme_JSON::get_property_value (3453 calls, 1–2% iwt)
  - Note: this has since been fixed (Related PR)
- WP_Block_Type_Registry::get_registered (homepages only: 3422 calls, 1–2% iwt)
  - Performance improvements can likely be made to the WP_Block_Type_Registry::is_registered method.
- The function wp_maybe_inline_styles is particularly slow on the homepage, but always accounts for a large portion of the request time for block themes (2 calls, 7–18% iwt).
  - https://github.com/WordPress/gutenberg/pull/47833 (merged)
Proposed priorities from the research
Of all the opportunities identified during this analysis, the ones that seem likely to produce the largest impact are the following:
- Improve template loading for classic themes – A majority of websites (based on an April 2023 search of the HTTPArchive) still use the classic theme architecture, so improvements made here could have the largest horizontal impact.
- Improve translation loading – The translation process has a large performance impact when in use. Given that 56%+ of all WordPress websites are using translations, performance improvements to the translations system should have a large horizontal impact as well.
- Improve handling of block registration from metadata – Block registration requires expensive file reading and parsing, which could be cached. Additionally, not every block is needed for every request, so more intelligent registration logic could eliminate much of this effort.
- Improve resolving block templates – This is a heavy operation for block themes whose result is unlikely to change often unless the site templates are edited or the theme is updated. Adding caching mechanisms could meaningfully improve time to first byte (TTFB). Note that some improvements to this system have already been implemented during the 6.3 release cycle.
- Improve rendering of block widgets – This is a lower priority than the previous items because further research is needed to determine the horizontal impact of these changes. Specifically, it’s possible that these test cases are not representative of real-world uses of block widgets.
These efforts will likely require additional research and architectural design before engineering begins. All other items identified could be worked on directly through individual Trac tickets as capacity allows.
Future efforts worth consideration
- Finish making the XHProf/XHGui tooling available more broadly via wp-env and a core environment integration so more people can verify and extend the work we’ve done here.
- Reach out to hosting companies to get various platforms to run analysis on their infrastructure as well.
- Do additional analysis on use cases not covered in this initial effort (e.g., PHP versions, object caching setups, etc.).
- Review and improve the approach used in this analysis to make it easier for the same type of research to be conducted in the future.