How Google’s AMP project speeds up the Web—by sandblasting HTML

[This story originally appeared on Ars Technica. To comment and enjoy the full reading experience with images (including a TRS-80 browsing the Web), read it over there.]

There’s a story going around today that the Web is too slow, especially over mobile networks. It’s a pretty good story—and it’s a perpetual story. The Web, while certainly improved from the days of 14.4k modems, has never been as fast as we want it to be, which is to say that the Web has never been instantaneous.

Curiously, rather than a focus on possible cures, like increasing network speeds, finding ways to decrease network latency, or even speeding up Web browsers, the latest version of the “Web is too slow” story pins the blame on the Web itself. And, perhaps more pointedly, this blame falls directly on the people who make it.

The average webpage has increased in size at a terrific rate. In January 2012, the average page tracked by HTTPArchive transferred 1,239kB and made 86 requests. Fast forward to September 2015, and the average page loads 2,162kB of data and makes 103 requests. These numbers don’t directly correlate to longer page load-and-render times, of course, especially if download speeds are also increasing. But these figures are one indicator of how quickly webpages are bulking up.
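To put those HTTPArchive figures in perspective, the growth can be worked out directly. The sketch below uses only the four numbers quoted above; the function name is purely illustrative.

```python
def percent_growth(before, after):
    """Return the percentage increase from `before` to `after`."""
    return (after - before) / before * 100


# Figures quoted above from HTTPArchive (January 2012 vs. September 2015).
size_growth = percent_growth(1239, 2162)    # average transfer size in kB
request_growth = percent_growth(86, 103)    # average number of requests

print(f"Page weight grew by about {size_growth:.1f} percent")
print(f"Request count grew by about {request_growth:.1f} percent")
```

Page weight grew far faster than the request count, which suggests that individual resources are getting heavier, not just more numerous.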

Native mobile applications, on the other hand, are getting faster. Mobile devices get more powerful with every release cycle, and native apps take better advantage of that power.

So as the story goes, apps get faster, the Web gets slower. This is allegedly why Facebook must invent Facebook Instant Articles, why Apple News must be built, and why Google must now create Accelerated Mobile Pages (AMP). Google is late to the game, but AMP has the same goal as Facebook’s and Apple’s efforts—making the Web feel like a native application on mobile devices. (It’s worth noting that all three solutions focus exclusively on mobile content.)

For AMP, two things in particular stand in the way of a lean, mean browsing experience: JavaScript… and advertisements that use JavaScript. The AMP story is compelling. It has good guys (Google) and bad guys (everyone not using Google Ads), and it’s true to most of our experiences. But this narrative has some fundamental problems. For example, Google owns the largest ad server network on the Web. If ads are such a problem, why doesn’t Google get to work speeding up the ads?

There are other potential issues looming with the AMP initiative as well, some as big as the state of the open Web itself. But to think through the possible ramifications of AMP, first you need to understand Google’s new offering itself.

What is AMP?

To understand AMP, you first need to understand Facebook’s Instant Articles. Instant Articles use RSS and standard HTML tags to create an optimized, slightly stripped-down version of an article. Facebook then allows for some extra rich content like auto-playing video or audio clips. Despite this, Facebook claims that Instant Articles are up to 10 times faster than their siblings on the open Web. Some of that speed comes from stripping things out, while some likely comes from aggressive caching.

But the key is that Instant Articles are only available via Facebook’s mobile apps—and only to established publishers who sign a deal with Facebook. That means reading articles from Facebook’s Instant Article partners like National Geographic, BBC, and Buzzfeed is a faster, richer experience than reading those same articles when they appear on the publisher’s site. Apple News appears to work roughly the same way, taking RSS feeds from publishers and then optimizing the content for delivery within Apple’s application.

All this app-based content delivery cuts out the Web. That’s a problem for the Web and, by extension, for Google, which leads us to Google’s Accelerated Mobile Pages project.

Unlike Facebook Instant Articles and Apple News, AMP eschews standards like RSS and HTML in favor of its own little modified subset of HTML. AMP HTML looks a lot like HTML without the bells and whistles. In fact, if you head over to the AMP project announcement, you’ll see an AMP page rendered in your browser. It looks like any other page on the Web.

AMP markup uses an extremely limited set of tags. Form tags? Nope. Audio or video tags? Nope. Embed tags? Certainly not. Script tags? Nope. There’s a very short list of the HTML tags allowed in AMP documents available over on the project page. There’s also no JavaScript allowed. Those ads and tracking scripts will never be part of AMP documents (but don’t worry, Google will still be tracking you).
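To make the restriction concrete, here is a minimal sketch of a checker that flags the tags named above. It is not the AMP project's validator: the real rules are an allowlist published on the project page, while this example uses a hypothetical blocklist built only from the five tags mentioned in this article. The names DISALLOWED_TAGS, TagAuditor, and find_disallowed_tags are invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical blocklist: only the tags this article names as off-limits.
# The real AMP rules are an allowlist published on the project page.
DISALLOWED_TAGS = {"form", "audio", "video", "embed", "script"}


class TagAuditor(HTMLParser):
    """Collect every disallowed tag that appears in a document."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in DISALLOWED_TAGS:
            self.violations.append(tag)


def find_disallowed_tags(markup):
    """Return the disallowed tags found in `markup`, in document order."""
    auditor = TagAuditor()
    auditor.feed(markup)
    auditor.close()
    return auditor.violations


page = "<article><p>Hello</p><video src='clip.mp4'></video><form></form></article>"
print(find_disallowed_tags(page))
```

A page made only of paragraphs, headings, and links would come back with an empty list; the sample page above trips on its video and form elements.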

AMP defines several of its own tags, things like amp-youtube, amp-ad, or amp-pixel. The extra tags are part of what’s known as Web components, which will likely become a Web standard (or it might turn out to be “ActiveX part 2,” only the future knows for sure).

So far AMP probably sounds like a pretty good idea—faster pages, no tracking scripts, no JavaScript at all (and so no overlay ads about signing up for newsletters). However, there are some problematic design choices in AMP. (At least, they’re problematic if you like the open Web and current HTML standards.)

AMP re-invents the wheel for images by using the custom component amp-img instead of HTML’s img tag, and it does the same thing with amp-audio and amp-video rather than use the HTML standard audio and video. AMP developers argue that this allows AMP to serve images only when required, which isn’t possible with the HTML img tag. That, however, is a limitation of Web browsers, not HTML itself. AMP has also very clearly treated accessibility as an afterthought. You lose more than just a few HTML tags with AMP.

In other words, AMP is technically half-baked at best. (There are dozens of open issues calling out some of the most egregious decisions in AMP’s technical design.) The good news is that AMP developers are listening. One of the worst things about AMP’s initial code was the decision to disable pinch-and-zoom on articles, and thankfully, Google has reversed course and eliminated the tag that prevented pinch-and-zoom.

But AMP’s markup language is really just one part of the picture. After all, if all AMP really wanted to do was strip out all the enhancements and just present the content of a page, there are existing ways to do that. Speeding things up for users is a nice side benefit, but the point of AMP, as with Facebook Instant Articles, looks to be more about locking in users to a particular site/format/service. In this case, though, the “users” aren’t you and I as readers; the “users” are the publishers putting content on the Web.

It’s the ads, stupid

The goal of Facebook Instant Articles is to keep you on Facebook. No need to explore the larger Web when it’s all right there in Facebook, especially when it loads so much faster in the Facebook app than it does in a browser.

Google seems to have recognized what a threat Facebook Instant Articles could be to Google’s ability to serve ads. This is why Google’s project is called Accelerated Mobile Pages. Sorry, desktop users, Google already knows how to get ads to you.

If you watch the AMP demo, which shows how AMP might work when it’s integrated into search results next year, you’ll notice that the viewer effectively never leaves Google. AMP pages are laid over the Google search page in much the same way that outside webpages are loaded in native applications on most mobile platforms. The experience from the user’s point of view is just like the experience of using a mobile app.

Google needs the Web to be on par with the speeds in mobile apps. And to its credit, the company has some of the smartest engineers working on the problem. Google has made one of the fastest Web browsers (if not the fastest) by building Chrome, and in doing so the company has pushed other vendors to speed up their browsers as well. Since Chrome debuted, browsers have become faster and better at an astonishing rate. Score one for Google.

The company has also been touting the benefits of mobile-friendly pages, first by labeling them as such in search results on mobile devices and then later by ranking mobile-friendly pages above not-so-friendly ones when other factors are equal. Google has been quick to adopt speed-improving new HTML standards like the responsive images effort, which was first supported by Chrome. Score another one for Google.

But pages keep growing faster than network speeds, and the Web slows down. In other words, Google has tried just about everything within its considerable power as a search behemoth to get Web developers and publishers large and small to speed up their pages. It just isn’t working.

One increasingly popular reaction to slow webpages has been the use of content blockers, typically browser add-ons that stop pages from loading anything but the primary content of the page. Content blockers have been around for over a decade now (NoScript first appeared for Firefox in 2005), but their use has largely been limited to the desktop. That changed in Apple’s iOS 9, which for the first time put simple content-blocking tools in the hands of millions of mobile users.

Combine all the eyeballs that are using iOS with content blockers, reading Facebook Instant Articles, and perusing Apple News, and you suddenly have a whole lot of eyeballs that will never see any Google ads. That’s a problem for Google, one that AMP is designed to fix.

Static pages that require Google’s JavaScript

The most basic thing you can do on the Web is create a flat HTML file that sits on a server and contains some basic tags. This type of page will always be lightning fast. It’s also insanely simple. This is literally all you need to do to put information on the Web. There’s no need for JavaScript, no need even for CSS.

This is more or less the sort of page AMP wants you to create. (AMP doesn’t care whether your pages are actually static or, more likely, generated from a database. The point is that what’s rendered is static.) But then AMP turns around and requires that each page include a third-party script in order to load. AMP deliberately sets the opacity of the entire page to 0 until this script loads. Only then is the page revealed.

This is a little odd; as developer Justin Avery writes, “Surely the document itself is going to be faster than loading a library to try and make it load faster.”

Pinboard.in creator Maciej Cegłowski tested that idea, putting together a demo page that duplicates the AMP-based AMP homepage without the AMP JavaScript. Over a 3G connection, Cegłowski’s page fills the viewport in 1.9 seconds. The AMP homepage takes 9.2 seconds. JavaScript slows down page load times, even when that JavaScript is part of Google’s plan to speed up the Web.
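The size of that gap is easy to quantify. This short calculation uses only the two timings quoted above; the variable names are illustrative.

```python
# Time to fill the viewport over 3G, as quoted above (seconds).
plain_html_seconds = 1.9   # Cegłowski's script-free copy
amp_seconds = 9.2          # the AMP homepage

slowdown = amp_seconds / plain_html_seconds
extra_wait = amp_seconds - plain_html_seconds

print(f"AMP version is {slowdown:.1f}x slower ({extra_wait:.1f} s longer)")
```

On these numbers, the page that loads the AMP library takes nearly five times as long to appear as the copy without it.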

Ironically, for something that is ostensibly trying to encourage better behavior from developers and publishers, this means that pages using progressive enhancement, keeping scripts to a minimum and aggressively caching content—in other words sites following best practices and trying to do things right—may be slower in AMP.

In the end, developers and publishers who have been following best practices for Web development and don’t rely on dozens of tracking networks and ads have little to gain from AMP. Unfortunately, the publishers building their sites like that right now are few and far between. Most publishers have much to gain from generating AMP pages, at least in terms of speed. Google says that AMP can improve page speed index scores by 15 to 85 percent. That huge range is likely a direct result of how many third-party scripts are being loaded on some sites.

The dependency on JavaScript has another detrimental effect. AMP documents depend on JavaScript, which is to say that if their (albeit small) script fails to load for some reason—say, you’re going through a tunnel on a train or only have a flaky one-bar connection at the beach—the AMP page is completely blank. When an AMP page fails, it fails spectacularly.

Google knows better than this. Even Gmail still offers a pure HTML-based fallback version of itself.

AMP for publishers

Under the AMP bargain, all big media has to do is give up its ad networks. And interactive maps. And data visualizations. And comment systems.

Your WordPress blog can get in on the stripped-down AMP action as well. Given that WordPress powers roughly 24 percent of all sites on the Web, having an easy way to generate AMP documents from WordPress means a huge boost in adoption for AMP. It’s certainly possible to build fast websites using WordPress, but it’s also easy to do the opposite. WordPress plugins often have a dramatic (negative) impact on load times. It isn’t uncommon to see a WordPress site loading not just one but several external JavaScript libraries because the user installed three plugins that each use a different library. AMP neatly solves that problem by stripping everything out.

So why would publishers want to use AMP? Google, while its influence has dipped a tad across industries (as Facebook and Twitter continue to drive more traffic), remains a powerful driver of traffic. When Google promises more eyeballs on their stories, big media listens.

AMP isn’t trying to get rid of the Web as we know it; it just wants to create a parallel one. Under this system, publishers would not stop generating regular pages, but they would also start generating AMP files, usually (judging by the early adopter examples) by appending /amp to the end of the URL. The AMP page and the canonical page would reference each other through standard HTML tags. User agents could then pick and choose between them. That is, Google’s Web crawler might grab the AMP page, but desktop Firefox might hit the AMP page and redirect to the canonical URL.
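The cross-referencing described above can be sketched with a small parser that reads link elements and lets a client pick the version it prefers. This is an illustration under assumptions, not AMP's specification: rel="canonical" is a standard link relation, but the "amphtml" relation name, the example.com URLs, and the LinkRelFinder and choose_url names are all assumed here for the sake of the example.

```python
from html.parser import HTMLParser


class LinkRelFinder(HTMLParser):
    """Record the href of each <link rel="..."> element, keyed by rel value."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel, href = attributes.get("rel"), attributes.get("href")
        if rel and href:
            self.links[rel.lower()] = href


def choose_url(markup, wants_amp):
    """Return the AMP or canonical URL a page advertises, or None if absent."""
    finder = LinkRelFinder()
    finder.feed(markup)
    finder.close()
    wanted = "amphtml" if wants_amp else "canonical"  # "amphtml" is assumed
    return finder.links.get(wanted)


# Hypothetical pages on example.com that point at each other.
canonical_page = '<head><link rel="amphtml" href="https://example.com/story/amp"></head>'
amp_page = '<head><link rel="canonical" href="https://example.com/story"></head>'

print(choose_url(canonical_page, wants_amp=True))   # a crawler that prefers AMP
print(choose_url(amp_page, wants_amp=False))        # a desktop browser
```

The publisher serves both versions, and each user agent follows the link to the one it wants.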

On one hand, what this amounts to is that after years of telling the Web to stop making m. mobile-specific websites, Google is telling the Web to make /amp-specific mobile pages. On the other hand, this nudges publishers toward an idea that’s big in the IndieWeb movement: Publish (on your) Own Site, Syndicate Elsewhere (or POSSE for short).

The idea is to own the canonical copy of the content on your own site but then to send that content everywhere you can. Or rather, everywhere you want to reach your readers. Facebook Instant Articles? Sure, hook up the RSS feed. Apple News? Send the feed over there, too. AMP? Sure, generate an AMP page. No need to stop there: tap the new Medium API and half a dozen others as well.

Reading is a fragmented experience. Some people will love reading on the Web, some via RSS in their favorite reader, some in Facebook Instant Articles, some via AMP pages on Twitter, some via Lynx in their terminal running on a restored TRS-80 (seriously, it can be done). The beauty of the POSSE approach is that you can reach them all from a single, canonical source.

AMP and the open Web

While AMP has problems and just might be designed to lock publishers into a Google-controlled format, so far it does seem friendlier to the open Web than Facebook Instant Articles.

In fact, if you want to be optimistic, you could look at AMP as the carrot that Google has been looking for in its effort to speed up the Web. As noted Web developer (and AMP optimist) Jeremy Keith writes in a piece on AMP, “My hope is that the current will flow in both directions. As well as publishers creating AMP versions of their pages in order to appease Google, perhaps they will start to ask ‘Why can’t our regular pages be this fast?’ By showing that there is life beyond big bloated invasive webpages, perhaps the AMP project will work as a demo of what the whole Web could be.”

Not everyone is that optimistic about AMP, though. Developer and author Tim Kadlec writes, “[AMP] doesn’t feel like something helping the open Web so much as it feels like something bringing a little bit of the walled garden mentality of native development onto the Web… Using a very specific tool to build a tailored version of my page in order to ‘reach everyone’ doesn’t fit any definition of the ‘open Web’ that I’ve ever heard.”

There’s one other important aspect of AMP that helps speed up its pages: Google will cache your pages on its CDN for free. “AMP is caching… You can use their caching if you conform to certain rules,” writes Dave Winer, developer and creator of RSS, in a post on AMP. “If you don’t, you can use your own caching. I can’t imagine there’s a lot of difference unless Google weighs search results based on whether you use their code.”