Lighthouse, Web Performance, Architecture, And You

3/7/2018

In my most recent project, I’ve been using Google’s Lighthouse audit tool to understand the performance profile of what I’m building. I wound up getting our target pages to a performance score of around 97 without many of the advanced performance techniques usually required, and without Service Workers. We accomplished this by keeping performance as a key audit in all the work we delivered, and by leaning heavily on knowledge of Lighthouse and web performance from the inception of our project, influencing everything down to our architecture. Sharing this high-level knowledge will hopefully make it easier for others to do the same.

What is Lighthouse, and How It Works

Lighthouse is a performance audit tool built by Google’s Chrome team (recognized worldwide as leading experts on web performance). It is available as a Chrome extension, as a Node module, and under the Audits tab in Chrome’s Dev Tools. Lighthouse performs the following 4 audits, each scored on a scale from 0-100 (100 being the best):

Progressive Web App - Percentage of items from Google’s Progressive Web App Checklist that are complete
Performance - Grade based on the user-centered RAIL Performance Model, on real-world performance work the Chrome team and other web performance experts around the world have done, and on how browsers handle the rendering of a page. The perception-of-delay figures in RAIL’s “Focus on the user” section come from direct research they and others have done on the limits of human perception, and the recommendations in RAIL stem from those perceptions.
Accessibility - Percentage of the included automated accessibility tests, provided by the aXe accessibility testing engine, that pass. Importantly, these are not all of the accessibility tests that can be run by aXe, and they only cover things that can be tested through automation; many aspects of accessibility cannot be tested automatically.
Best Practices - Roughly the percentage of the provided modern best practices the Chrome team recommends websites implement.

When running tests, Lighthouse by default runs from a cold cache (no files in cache), emulating 3G network speeds and average mobile device hardware by throttling the network and slowing down the CPU by 4x from the machine’s default speed. This is done to emulate real-world browsing conditions for users on average mobile hardware (slow networks, slow hardware, small and likely flushed caches), who are the fastest growing and, in many areas, largest demographic of Internet users. Performance experts worldwide emphasize this group both for that reason and because, much like mobile-first responsive web design, making a website perform well under those constraints will result in excellent experiences on other combinations, whereas the reverse is not necessarily true. Expert performance recommendations and expectations around the world are built on this model, as well as on the performance perception concepts presented in RAIL.

When including a Progressive Web App audit, Lighthouse will run multiple tests to see how a site performs offline with a warm cache.

Any Chrome extensions you have enabled that affect what is loaded on a page (such as ad or tracking blockers) will apply to Lighthouse tests, so be cognizant of that when running them. For the most accurate audit, I recommend running Lighthouse against a live website with no Chrome extensions (except Lighthouse) enabled; incognito mode is good for this.

Within Lighthouse’s Performance metrics, there are a handful of specific numbers that may not make sense immediately:

First Meaningful Paint - The number of ms it takes for the primary content of a page to be rendered
First Contentful Paint - The number of ms it takes for any content defined in the DOM (text, images, a canvas render, etc…) to be rendered. This is on track to be added to Lighthouse but not in what’s presented below
First Interactive - The number of ms it takes (to the start of the timeframe) for the main JS thread to be idle enough to handle user input. In an upcoming version of Lighthouse, this will likely be renamed to something like First Idle
Consistently Interactive - The number of ms it takes to the start of 5 full seconds of network and main thread idle time (see the Time to Interactive definition from the original PR for how this is calculated). In an upcoming version of Lighthouse, this will likely be renamed to Time to Interactive (or TTI)
Perceptual Speed Index - A score based on when a page render is perceived to be complete, as well as a weighted score compared to an ideal Speed Index based on RAIL. See the Official WebPagetest Speed Index Documentation for how the metric is calculated.
Estimated Input Latency - The estimated number of ms it takes for a user input to trigger a response, as well as a score compared to an ideal input latency based on RAIL. The value provided is the 90th percentile, so it’s estimating that 90% of inputs will have the provided latency or less, while 10% will have this latency or more. It describes the availability of the main thread as a proxy metric for input latency.
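To make the 90th percentile idea concrete, here is a minimal sketch using the nearest-rank method. This is not how Lighthouse derives Estimated Input Latency; the `percentile` helper and the sample latencies are made up purely to show how to read a 90th percentile figure.

```typescript
// Hypothetical helper (not part of Lighthouse): nearest-rank percentile.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) {
    throw new Error("need at least one sample");
  }
  if (p <= 0 || p > 100) {
    throw new Error("p must be in the range (0, 100]");
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  // Nearest rank: the smallest sample with at least p% of samples at or below it.
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[rank - 1];
}

// Made-up input latencies (ms) for ten user inputs.
const latenciesMs = [12, 16, 18, 20, 22, 25, 30, 41, 58, 140];
const p90 = percentile(latenciesMs, 90);
console.log(`90th percentile input latency: ${p90}ms`);
```

With these samples, nine of the ten inputs respond in 58ms or less, and the one slow outlier (140ms) is the remaining 10%.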

Understanding Web Performance and Where to Optimize

One of my primary uses for Lighthouse is understanding the performance characteristics of a page. Knowing how different resources affect performance explains why Lighthouse scores the way it does and makes the recommendations it does, and helps us understand how we can improve. When talking about performance bottlenecks, every KB costs the same over-the-wire (while being downloaded), but not every KB has the same effect once it hits the browser. In rough order of cost: 1KB JS > 1KB Images > 1KB CSS > 1KB HTML.

JavaScript has a cost not only over-the-wire, but also a significant parsing and execution overhead per KB, much more so than other types of bytes. It also blocks the main thread when being parsed and executed, which means the rest of the browser stops moving while JavaScript is “thinking”. Finally, while the browser can optimize JavaScript, it’s just-in-time optimization, and dependent upon how the developer writes their code (unlike other bytes, which the browser can optimize the handling of much more easily). Code splitting, route-based loading, the PRPL Pattern, and progressive enhancement are all strategies for providing JavaScript-powered functionality while reducing the overall impact of JavaScript on the user, but because of its cost, the most effective method is reducing the overall JavaScript footprint. Remember! JavaScript parsing and execution is especially slow on non-developer-laptop devices, often taking 2-5 times longer on phones than on desktops.
Images are the next most expensive as they’re less able to be compressed and you tend to need more of them to “get your point across” than other bytes. In addition, some formats are better than others at different things, so knowing when to use one type of image over another is important in optimizing performance. All else equal, JPEGs usually produce smaller file sizes for photograph-style images than PNGs, but at the cost of image quality loss (which is often OK for photographs, less OK for fine details or line art). PNGs are lossless (good for fine details) and support transparency, but generally produce a larger file size when used for photographs. WebP (supported in Chromium browsers, with WebKit and Gecko experimenting) supports both lossy and lossless compression along with transparency and animation, usually resulting in images that are smaller than equivalent JPEGs or PNGs. SVGs are great for iconography and other non-photographic images (like logos), can be styled with CSS, and are text under-the-hood so they can be inlined into HTML and compressed. Knowing which image format to use, optimizing images before sending them, and migrating to inline SVGs for icons and logos can have a large impact on overall file size, as can loading only the large images that are needed in the current display (lazy loading).
CSS has an over-the-wire cost and blocks the main thread like JavaScript does, but instead of being executed like JavaScript it is transformed into a fast, easy to work with tree-like data structure called the CSS Object Model (CSSOM). This makes applying CSS blazing fast (to the point where it’s usually the last thing that needs optimization), and you generally need less of it to “get your point across” (with well-structured CSS). Using strong CSS naming conventions (like BEM), optimizing and reducing CSS selector specificity, inlining the CSS needed for the initial view and lazy-loading the rest, and removing unused CSS selectors are all ways of improving both the cost of CSS and perceived performance related to CSS.
HTML is the basis of a website and browsers are, as with CSS, blazing fast at dealing with it. You can even stream HTML to a browser so it can start progressively rendering before the full document is complete. HTML is converted into a tree-like data structure called the Document Object Model (DOM), which the browser uses to both render the HTML and, combined with the CSSOM, style the rendered HTML. JavaScript can interact with the DOM to manipulate it, but if you’re not careful, it can also block the parsing of HTML. From Lighthouse: “A large DOM can increase memory usage, cause longer style calculations, and produce costly layout reflows”, which is why Lighthouse recommends <1500 total nodes (items) in the DOM, fewer than 32 nested items deep (DOM Depth), and fewer than 60 child nodes per parent.
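The DOM size recommendations can be checked against any tree-like structure. The sketch below uses a hypothetical `TreeNode` type as a stand-in for real DOM elements (in a browser you would walk `Element.children` instead), and the `measureTree` and `withinRecommendations` names are my own, not anything Lighthouse exposes.

```typescript
// Minimal stand-in for a DOM element; hypothetical, for illustration only.
interface TreeNode {
  tag: string;
  children: TreeNode[];
}

interface DomStats {
  totalNodes: number;
  maxDepth: number;
  maxChildren: number;
}

function measureTree(root: TreeNode): DomStats {
  const stats: DomStats = { totalNodes: 0, maxDepth: 0, maxChildren: 0 };
  // Iterative walk with an explicit stack so very deep trees don't overflow the call stack.
  const stack: Array<{ node: TreeNode; depth: number }> = [{ node: root, depth: 1 }];
  while (stack.length > 0) {
    const { node, depth } = stack.pop()!;
    stats.totalNodes += 1;
    stats.maxDepth = Math.max(stats.maxDepth, depth);
    stats.maxChildren = Math.max(stats.maxChildren, node.children.length);
    for (const child of node.children) {
      stack.push({ node: child, depth: depth + 1 });
    }
  }
  return stats;
}

// Thresholds taken from the recommendations quoted above.
function withinRecommendations(stats: DomStats): boolean {
  return stats.totalNodes < 1500 && stats.maxDepth < 32 && stats.maxChildren < 60;
}

const samplePage: TreeNode = {
  tag: "body",
  children: [
    { tag: "header", children: [{ tag: "h1", children: [] }] },
    { tag: "main", children: [{ tag: "p", children: [] }, { tag: "p", children: [] }] },
  ],
};
const samplePageStats = measureTree(samplePage);
console.log(samplePageStats, withinRecommendations(samplePageStats));
```

The root is counted as depth 1 here; whether Lighthouse counts depth from the root or from `<body>` is not something this sketch tries to match.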

The number of files sent over-the-wire is also important as browsers allow only a limited number (<10) of parallel connections to a single domain; the more things being requested at once from the same domain, the higher the likelihood that an asset will need to wait to be downloaded. HTTP/2 Push can help with this by returning more than one item per request, but it’s not a panacea as it ignores the browser cache, which is likely to cause a problem if not coupled with something like a Service Worker to more precisely control how a user’s cache works and prevent extra downloads. Needing to draw resources from multiple domains also incurs a performance cost, as each new domain needs to go through the networking handshake the first time it’s used on a page load, adding overhead. A balance should be struck, with a good rule of thumb being to load about 6 items per domain and to include a new domain only if more than one resource is coming from it.
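A rough way to reason about the per-domain connection limit is to count how many assets each domain serves and how many of them would have to queue. The sketch below is a deliberately simplified model: it assumes every download takes one equal-length “wave”, ignores HTTP/2 multiplexing entirely, and uses hypothetical domains and a hypothetical `planDownloads` helper.

```typescript
interface Asset {
  domain: string;
  path: string;
}

interface DomainLoad {
  requests: number; // assets requested from this domain
  waves: number;    // rounds of downloads needed at the connection limit
  queued: number;   // assets that cannot start in the first wave
}

function planDownloads(assets: Asset[], maxParallelPerDomain: number): Record<string, DomainLoad> {
  if (maxParallelPerDomain < 1) {
    throw new Error("need at least one connection per domain");
  }
  const plan: Record<string, DomainLoad> = {};
  for (const asset of assets) {
    const load = plan[asset.domain] || { requests: 0, waves: 0, queued: 0 };
    load.requests += 1;
    load.waves = Math.ceil(load.requests / maxParallelPerDomain);
    load.queued = Math.max(0, load.requests - maxParallelPerDomain);
    plan[asset.domain] = load;
  }
  return plan;
}

// Eight images from one (hypothetical) domain, one font from another.
const pageAssets: Asset[] = [];
for (let i = 0; i < 8; i++) {
  pageAssets.push({ domain: "www.example.com", path: `/img/photo-${i}.jpg` });
}
pageAssets.push({ domain: "fonts.example.net", path: "/body.woff2" });

const downloadPlan = planDownloads(pageAssets, 6);
console.log(downloadPlan);
```

With a limit of 6, two of the eight images wait for a second wave, while the second domain pays for a full connection handshake to deliver a single file, which is the trade-off the rule of thumb above is trying to balance.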

Finally, some things are discovered really late in the render process (custom fonts are a good example, not being found until CSS has been downloaded and parsed, and a selector that requires the custom font matches). Preloading/Prefetching can help make these resources available to the browser before they’re actually discovered for use, reducing the time it takes to get them active.
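As an illustration, a server-side template could emit preload hints for late-discovered resources like fonts. The sketch below builds `<link rel="preload">` tags as strings; the `preloadTag` helper, the `PreloadHint` type, and the font path are all hypothetical.

```typescript
interface PreloadHint {
  href: string;
  as: "font" | "style" | "script" | "image";
  type?: string; // MIME type, e.g. "font/woff2"
}

// Escape characters that would break out of a double-quoted HTML attribute.
function escapeAttribute(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

function preloadTag(hint: PreloadHint): string {
  const attrs = [
    'rel="preload"',
    `href="${escapeAttribute(hint.href)}"`,
    `as="${hint.as}"`,
  ];
  if (hint.type) {
    attrs.push(`type="${escapeAttribute(hint.type)}"`);
  }
  // Fonts are fetched in anonymous CORS mode, so the preload needs
  // `crossorigin` for the browser to reuse the preloaded response.
  if (hint.as === "font") {
    attrs.push("crossorigin");
  }
  return `<link ${attrs.join(" ")}>`;
}

const fontPreload = preloadTag({ href: "/fonts/body.woff2", as: "font", type: "font/woff2" });
console.log(fontPreload);
// <link rel="preload" href="/fonts/body.woff2" as="font" type="font/woff2" crossorigin>
```

Placing a tag like this in the document `<head>` lets the browser start fetching the font while it is still downloading and parsing the CSS that references it.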

Over-the-wire costs for all resources can be drastically reduced for warm caches (2nd+ page load) by introducing a Service Worker to control the browser’s cache, allowing you to go so far as to provide full offline support for assets. Be careful with Service Workers, though, as they’re still a little bit difficult to get right, cache invalidation being one of the two hard things in Computer Science. That said, Jeremy Keith has a blog post on a minimal viable service worker that’s a good first look at a useful, functional service worker to start from.

Lighthouse in Architectural Discussions

As part of my recent work, I worked with another team to discuss potential architectural changes through the lens of Lighthouse’s audit and the web performance overview above. They knew they had a performance problem, but seeing just how stark it was through Lighthouse was eye-opening, almost shocking. One of the tricky things about performance (much like mobile experience back in the early days of Responsive Web Design) is it’s hard to quantify how bad the problem is if the site is abandoned before web analytics kicks in. From Google to Amazon to Walmart we know that even a 100ms improvement in web performance has a direct correlation to conversions, revenue, and SEO, so we agreed upon a hypothesis that improving their performance would improve their analytics, and started to discuss how we could improve.

The first thing we did was talk about how different user experiences should affect the architectural choices and why some trade-offs in performance make more sense depending on these user experiences. We generally boiled down web experiences into three categories:

Static - These experiences mostly have content that is displayed to the user with very little interaction between different elements on a page. The full content and URL of a page usually changes with an interaction instead of causing a small change elsewhere on the page. Blogs, tutorials, and news and marketing sites are usually examples of static experiences.
Dynamic - These experiences mostly have a single view with many moving parts that change often. Sometimes these change with user input, sometimes they change with new incoming information, sometimes both. Changes in the display aren’t easily correlated with a logical stand-alone URL. Dashboards, games, and apps are usually examples of dynamic experiences.
Stynamic (because I’m funny) - These experiences are somewhere in the middle; I tend to find they are mostly static experiences with dynamic parts scattered throughout. Many sites have part of the experience that falls in between, like a real-time news feed or comment system on an article.

I generally encourage always sending some form of meaningful HTML over-the-wire regardless of major experience category. This can be a fully-rendered page, a partially-rendered page, or even an App Shell. I encourage this to improve perceived performance and give the browser a leg up in rendering the final experience. We saw in Lighthouse (and confirmed through other means) that no meaningful HTML was being served over-the-wire to a user, and that was a big cause of the slow First Meaningful Paint score in Lighthouse.

We then discussed rendering patterns for the different kinds of experiences. I find that static and stynamic experiences tend to benefit greatly from almost exclusively full server-side rendering, with progressive enhancement on the client-side for any small dynamic pieces of the experience that exist. I bias towards using the least amount of JavaScript possible for this progressive enhancement, using as few dependencies as possible, ensuring those dependencies are light-weight, and not blocking the DOM while the enhancement is being set up. An optimized HTML page sent from the server can often be similar in size to the JSON payload that would be required to render it client-side, and through streaming and a warm cache can be as fast as or faster than rendering client-side, so I also recommend delivering subsequent page loads from the server.
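To sketch what “meaningful HTML from the server, enhanced on the client” can look like, here is a minimal server-side render of an article as a plain string. It is not the project’s actual code: the `Article` shape, the `renderArticlePage` function, and the `/enhance.js` script path are all hypothetical.

```typescript
interface Article {
  title: string;
  paragraphs: string[];
}

// Escape text so user-supplied content cannot inject markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderArticlePage(article: Article): string {
  const paragraphs = article.paragraphs
    .map((text) => `    <p>${escapeHtml(text)}</p>`)
    .join("\n");
  return [
    "<!doctype html>",
    '<html lang="en">',
    "<head>",
    `  <title>${escapeHtml(article.title)}</title>`,
    // `defer` lets the (hypothetical) enhancement script download
    // without blocking HTML parsing; the page is readable without it.
    '  <script src="/enhance.js" defer></script>',
    "</head>",
    "<body>",
    "  <article>",
    `    <h1>${escapeHtml(article.title)}</h1>`,
    paragraphs,
    "  </article>",
    "</body>",
    "</html>",
  ].join("\n");
}

const articleHtml = renderArticlePage({
  title: "Lighthouse & You",
  paragraphs: ["Content arrives as HTML.", "Scripts like <script> are escaped."],
});
console.log(articleHtml);
```

Everything the user needs to read the article is in the first response; the deferred script only adds behavior on top of it.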

When it comes to dynamic experiences, it’s a little harder. You still want to send down meaningful content, but what to send will really vary by need. For instance, even with the JavaScript overhead of the fairly large dependency of D3, I found that client-rendering a complex graph was better for performance client-side (in many circumstances) than server-side as the generated SVG was very large. So, finding the right balance of what should be server-rendered to kick off the dynamic experience and what should be client-rendered is something that is going to need to be experimented with. Lighthouse is a good tool to use to check your assumptions here! No matter the first render, I find that most teams I work with are OK with the tradeoffs of a larger JavaScript payload to bring in client-rendering functionality for dynamic experiences.

What we found, looking through Lighthouse and examining their project, was that all of the functionality we saw fell under either a static experience or a stynamic experience, yet they had architected it as a dynamic experience! This meant a very large upfront cost to users (a ~2.8MB main JavaScript file, not including other assets or JavaScript files!) to set up client rendering and then, of course, render the actual page (remember that you need the JavaScript parsed and running, the HTML parsed, and the model for the content to be rendered before you can start client rendering, and then you need to rely on the user’s varied devices to do the actual rendering and write the new HTML before a user can see it!).
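Some back-of-the-envelope arithmetic shows why a file that size hurts. The sketch below assumes a throttled 3G throughput of 1.6 Mbps, which is my assumption rather than a figure from this project, and it treats the ~2.8MB as the number of bytes actually sent over-the-wire. It ignores latency, connection setup, and the parse and execution time that comes after the download.

```typescript
// Assumed throttled 3G throughput in megabits per second (an assumption, not a measured value).
const ASSUMED_3G_MBPS = 1.6;

function transferSeconds(sizeMegabytes: number, throughputMbps: number): number {
  if (throughputMbps <= 0) {
    throw new Error("throughput must be positive");
  }
  // 1 byte = 8 bits, using decimal megabytes and megabits.
  const megabits = sizeMegabytes * 8;
  return megabits / throughputMbps;
}

const mainBundleSeconds = transferSeconds(2.8, ASSUMED_3G_MBPS);
console.log(`~${mainBundleSeconds.toFixed(1)}s just to download the main bundle`);
```

Under those assumptions the download alone takes roughly 14 seconds, before any of the parsing, execution, and client rendering described above can begin.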

With this insight, we looked at the Opportunities section in Lighthouse’s Performance section and used that as a discussion point to start to talk about what low-hanging fruit we could tackle to improve performance, and what was going to require some deeper architectural changes. Our goal is to use Lighthouse to chew down this performance debt little by little, testing and either confirming or nullifying our assumptions as we go.