Lighthouse, Web Performance, Architecture, And You
In my most recent project, I’ve been using Google’s Lighthouse audit tool to understand the performance profile of what I’m building. I wound up getting our target pages to around a 97 performance without the use of many of the advanced performance techniques usually required, and without Service Workers. We accomplished this by keeping performance as a key audit in all the work we delivered, and heavily leveraging knowledge of Lighthouse and Web Performance from the inception of our project, influencing everything down to our architecture. Sharing this high-level knowledge will hopefully make it easier for others to do the same.
What is Lighthouse, and How It Works
Lighthouse is a performance audit tool available as a Chrome extension, Node module, and under the Audit tab in Chrome’s Dev Tools built by Google’s Chrome team (world-recognized as leading experts on web performance in the world). Lighthouse performs the following 4 audits, all on a scale from 1-100 (100 being the best):
- Progressive Web App - Percentage of items from Google’s Progressive Web App Checklist that are complete
- Performance - Grade based on the user-centered RAIL Performance Model and real-world performance work the Chrome team, and other experts on web performance around the world, have done, as well as how browsers handle the rendering of a page. The Focus on the user section of perception of performance delays comes from direct research they and others have done on the limits of human perception. The recommendations in RAIL stem from those perceptions.
- Accessibility - Percentage of the provided selection of automated accessibility provided by the aXe accessibility testing engine that pass. Importantly, these are not all of the accessibility tests that can be run by aXe, and they only cover things that can be tested through automation; many aspects of accessibility testing cannot be tested automatically.
- Best Practices - Roughly the percentage of the provided modern best practices the Chrome team recommends websites implement.
When running tests, Lighthouse by default runs from a cold cache (no files in cache), emulating 3G networks speeds and average mobile device hardware by throttling the network and slowing down the CPU by 4x from the machine’s default speed. The reason this is done is to attempt to emulate real-world browsing conditions for users on average mobile hardware (slow networks, slow hardware, small and likely flushed caches); the fastest growing, and in many areas, largest demographic of Internet users. Worldwide performance experts emphasize this group both for that reason and, much like mobile-first responsive web design, making a website perform under those constraints will result in excellent experiences on other combinations, where the reverse is not strictly true. Globally, expert performance recommendations and expectations are built on this model, as well as the performance perception concepts presented in RAIL.
When including a Progressive Web App audit, Lighthouse will run multiple tests to see how a site performs offline with a warm cache.
Any Chrome extensions you may have in place that may affect what is loaded on a page (such as ad or tracking blockers) will apply to Lighthouse tests, so be cognizant of that when running lighthouse tests. For the most accurate audit, I recommend running Lighthouse from a live website in a state with no chrome extensions (except Lighthouse) enabled; incognito mode is good for this.
Within Lighthouse’s Performance metrics, there are a handful of specific numbers that may not make sense immediately:
- First Meaningful Paint - The number of
msit takes for the primary content of a page to be rendered
- First Contentful Paint - The number of
msit takes for any content defined in the DOM (text, images, a canvas render, etc…) to be rendered. This is on track to be added to Lighthouse but not in what’s presented below
- First Interactive - The number of
msit takes (to the start of the timeframe)for the main JS thread to be idle enough to handle user input. In an upcoming version of Lighthouse, this will be likely be renamed to something like First Idle
- Consistently Interactive - The number of
msit takes to the start of 5 full seconds of network and main thread idle time (see the Time to Interactive definition from the original PR for how this is calculated). In an upcoming version of Lighthouse, this will likely be renamed to Time to Interactive (or TTI)
- Perceptual Speed Index - A score based on then the perception of a complete page render is calculated, as well as a weighted score compared to an ideal Speed Index based on RAIL. See the Official WebPagetest Speed Index Documentation for how the metric is calculated works.
- Estimated Input Latency - The estimated number of
msit takes for an user input request to trigger a response, as well as a score compared to an ideal input latency based on RAIL. The value provided describes is the 90th Percentile, so it’s estimating that 90% of inputs will have the provided latency or less, while 10% will have this latency or more. It describes the availability of the main thread as a proxy metric for input latency.
Understanding Web Performance and Where to Optimize
One of my primary uses for Lighthouse is using it to understand the performance characteristics of a page, so understanding how different resources affect performance is important in understanding why Lighthouse scores and makes the recommendations it does, as well as helping us understand how we can improve. When talking performance bottlenecks, while every KB costs the same over-the-wire (while being downloaded), not every KB has the same effect once it hits the browser. 1KB JS > 1KB Images > 1KB CSS > 1KB HTML.
- Images are next-most expensive as they’re less able to be compressed and you tend to need more of them to “get your point across” than other bytes. In addition, some formats are better than others at different things, so knowing when to use one type of image over another is important in optimizing performance. All else equal, JPEG usually produces smaller file sizes for photograph-style images than PNGs, but at the cost of image quality loss (which often is OK for graphic images, less OK for fine details or line art). PNGs are usually lossless (good for fine-details) and support transparency, but generally produce a larger file size when used for graphic images. WebP (supported in Chromium browsers with Webkit and Geko experimenting) are lossless like PNGs but very compressed, resulting in images that are usually much smaller than JPEGs or PNGs with the advantages of PNGs, including animation support. SVGs are great for iconography and other non-graphic images (like logos), can be styled with CSS, and are text under-the-hood so they can be inlined in to HTML and compressed. Knowing which image format to use, optimizing them before sending, and migrating to inline SVGs for icons and logos can have a large impact on overall file size, as can loading only the large graphical images that are needed in the current display (lazy loading).
The number of files sent over-the-wire is also important as browsers allow only a limited number (<10) parallel connections from a single domain; the more things being requested at once from the same domain, the higher the likelihood that an asset will need to wait to be downloaded. HTTP2/Push can help with this by returning more than one item per request, but it’s not a panacea as it ignores browser cache, which is likely to cause a problem if not coupled with something like a Service Worker to more precisely control how a user’s cache works to prevent extra downloads. Needing to draw resources from multiple domains also incurs performance problems as each new domain needs to go through the networking handshake the first time it’s used on a page load, adding overhead. A balance should be struck, with a good rule of thumb being loading about 6 items per domain, and trying to only include a new domain only if more than one resource is coming from it.
Finally, some things are discovered really late in the render process (custom fonts are a good example, not being found until CSS has been downloaded, parsed, and a selector matches that requires the custom font). Preloading/Prefetching can help get these resources available for the browser to use before they’re actually discovered for use, reducing the time it takes to get it active.
Over-the-wire costs for all resources can be drastically reduced for warm caches (2nd+ page load) by introducing a [Service Worker] to control the browser’s cache, allowing you to go so far as provide full offline support for assets. Be careful with Service Workers, though, as they’re still a little bit difficult to get right, cache invalidation being one of the two hard things in Computer Science. That said, Jeremy Keith has a blog post on a minimal viable service worker that’s a good first view of a useful, functional service worker to start from.
Lighthouse in Architectural Discussions
As part of my recent work, I worked with another team to discuss potential architectural changes through the lens of Lighthouse’s audit and the web performance overview above. They knew they had a performance problem, but seeing just how stark it was through Lighthouse was eye-opening, almost shocking. One of the tricky things about performance (much like mobile experience back in the early days of Responsive Web Design) is it’s hard to quantify how bad the problem is if the site is abandoned before web analytics kicks in. From Google to Amazon to Walmart we know that even a 100ms improvement in web performance has a direct correlation to conversions, revenue, and SEO, so we agreed upon a hypothesis that improving their performance would improve their analytics, and started to discuss how we could improve.
The first thing we did was talk about how different user experiences should affect the architectural choices and why some trade-offs in performance make more sense depending on these user experiences. We generally boiled down web experiences in to three categories:
- Static - These experiences mostly have content that is displayed to the user with very little interaction between different elements on a page. The full content and URL of a page usually changes with an interaction instead of causing a small change elsewhere on the page. Blogs, tutorials, and news and marketing sites are usually examples of static experiences.
- Dynamic - These experiences mostly have a single view with many moving parts that change often. Sometimes these change with user input, sometimes they change with new incoming information, sometimes both. Changes in the display aren’t easily correlated with a logical stand-alone URL. Dashboards, games, and apps are all examples of usually dynamic experiences.
- Stynamic (because I’m funny) - These experiences are somewhere in the middle; I tend to find they are mostly static experiences with dynamic parts scattered throughout. Many sites have part of the experience that falls in between, like a real-time news feed or comment system on an article.
I generally encourage always sending some form of meaningful HTML over-the-wire regardless of major experience category. This can be a fully-rendered page, a partially-rendered page, or even an App Shell. I encourage this to improve perceived performance and given the browser a leg-up in rendering the final experience. We saw in Lighthouse (and confirmed through other means) that no meaningful HTML was being served over-the-wire to a user, and that was a big cause of the slow First Meaningful Paint score in Lighthouse.
With this insight, we looked at the Opportunities section in Lighthouse’s Performance section and used that as a discussion point to start to talk about what low-hanging fruit we could tackle to improve performance, and what was going to require some deeper architectural changes. Our goal is to use Lighthouse to chew down this performance debt little by little, testing and either confirming or nullifying our assumptions as we go.