Work out a strategy on Google's AMP
Closed, Resolved · Public

Description

Google's AMP (Accelerated Mobile Pages) initiative seeks to improve the performance of the mobile web by creating an efficient standard format for web content. The standard comprises a small library of custom HTML templates and tags, as well as JavaScript code for fetching and rendering this HTML and the assets it references (images, fonts, etc.).

As I understand it, Google is giving publishers two reasons to adopt this format. The first reason is that it has been carefully designed by their engineers to load quickly on mobile devices and connections, and is likely to outperform most existing mobile web sites. The second is that Google is offering to cache AMP content and serve it using their CDN, which is fast and has good geographical distribution. AMP is designed to fit the business model of for-profit publishers: the specification provides means for publishers to put up paywalls, deliver advertising, and collect analytic data, even when the content is served from Google's CDN.

It is not entirely clear to me how AMP will influence search engine result pages (SERPs). Most people seem to think that Google intends to raise AMP content to the top of SERPs and push everything else down. Google may also provide some visual indication that non-AMP pages are liable to be slow on mobile connections. AMP is expected to debut on Google SERPs in late February.

Google has not (to my knowledge) explicitly threatened publishers with the prospect of declining traffic due to fewer search engine referrals, but external commentators seem to agree that these are the stakes. Googlers have reached out to us to evangelize AMP, gauge our interest, and answer any questions we may have.

My personal views on AMP are mostly negative, but I am (for the moment) keeping them to myself. I have attempted above to present AMP sympathetically, because I think it deserves a public hearing, and because I would like there to be a coordinated response to AMP from the Wikimedia movement that reflects the consensus of the community.

More specifically, I would like to see us answer the following questions:

  • Should we adopt AMP?
  • If "yes": how should we resource and prioritize the work that would be needed?
  • If "no": do we have any feedback to give Google?
  • What voice (if any) do we want to have in the ongoing design and standardization process for AMP?

Finally, there is also the issue of how to actually go about canvassing people's opinions. Should there be an RfC on Meta-Wiki, or a public discussion on IRC (or neither / both)?

I am tagging this with TechCom-RFC because I think TechCom has some good processes in place for announcing and conducting public discussions, and because there is a technical component to the discussion that would benefit from their input. But it is clear that the scope of the discussion goes beyond software architecture, and would therefore not be appropriate for the weekly RfC discussions.

Event Timeline

ori created this task. Jan 20 2016, 10:54 PM
ori raised the priority of this task from to Normal.
ori updated the task description.
ori added a subscriber: ori.
Restricted Application added a subscriber: Aklapper. Jan 20 2016, 10:54 PM
ori updated the task description. Jan 20 2016, 10:59 PM
ori set Security to None.

There are three components:

  • AMP HTML
  • AMP JS
  • AMP CDN

At least two of these, AMP HTML and AMP CDN, are incompatible with our goals for user privacy. The AMP HTML spec says:

AMP HTML documents MUST
...

So AMP HTML requires the use of the AMP CDN. Any browser viewing an AMP page will send a request to the AMP CDN, with a Referer header. That server is controlled by Google and can send any JavaScript it wants, without that code being reviewed by us for security or privacy.

Gilles added a subscriber: Gilles. Edited · Jan 20 2016, 11:49 PM

I vote no on AMP because:

  • It looks unappealing as a standard (from a programming perspective, I mean, there's nothing elegant about it).
  • It shouldn't provide any speed benefit that we couldn't get by doing our performance homework as an organization.
  • It simply forces a set of best practices through a standard. We could just apply the best practices...
  • Google's intention as a corporation is not our performance, but keeping people captive in their ad-ridden playground. The faster you go back to their interface, the faster you see your next ad. I find that Facebook's implementation of fast load, which is the only one I've experienced as an end-user, seems to focus heavily on an "in-and-out" approach. It's not just about getting into the link's content faster, it's also about getting out of it faster, because you're still seeing it in Facebook's chrome. We're on the contrary looking for people to take a deep dive on wikis and spend time on them. Not return to Facebook/Twitter/Google as fast as possible. If my theory holds true, AMP could actually be detrimental to our traffic.
  • We have a CDN of our own.
  • We're high profile enough that we can probably get a free pass on any search result demotion that might result from non-AMPishness.
  • I dislike this role reversal: from Google having to adapt to the wide variety of websites and the forms they can take, to us having to adapt to the subset of functionality they've deemed to be reasonable on mobile at this point in time.

RobLa-WMF added a comment. Edited · Jan 21 2016, 12:35 AM

I think the reason for saying "no" to this is the same as the reason for saying "no" to using Google Analytics for our analytics needs. In 2010, the Wikimedia Analytics Task Force determined that it didn't meet our privacy requirements, and per Tim's and Gilles' prior comments, I can't see how we could come to a different conclusion today (either about Google Analytics or about AMP), short of rethinking our larger position on privacy, which I doubt we would want to do.

greg added a subscriber: greg. Jan 21 2016, 12:48 AM

  • You can opt-out of the cache via robots.txt rules.
  • The AMP Team would probably permit same-origin hosting of the v0.js if requested

(Full Disclosure: I am a Google Developer Advocate)

Some thoughts I had on AMP about a month ago on a non-public mailing list:

Some background I hadn't seen before on AMP at highscalability [0]. This article claims that time to load rather than AMP usage will be the thing that is counted in link rank, but if Google can scare^Wconvince enough sites to provide AMP versions that Google can cache at the edge it will be hard to beat that kind of speed. I love some good Google bashing and I distrust the amount of power they have over North American web usage, but I also grudgingly agree with one premise from the article: that an open Internet is in Google's corporate interest. Walled gardens hurt their business model as much as they hurt personal freedom. Dropping heavy javascript, graphics for the sake of graphics and focusing on content also resonates with my 80x25 terminal mindset. I don't love the idea of actually relying on a layer of javascript to make it all happen, but at least that javascript layer is being developed in public under an OSI approved license [1].
Heading towards AMP probably isn't as flashily exciting as service workers, but it would be an interesting experiment to tweak the Parsoid/RESTBase transformation layer that the group has been working on to spit out AMP flavored content. Possible hackathon project?
[0]: http://highscalability.com/blog/2015/12/14/does-amp-counter-an-existential-threat-to-google.html
[1]: https://github.com/ampproject/amphtml

I'd be happier to see the Wikipedia projects pursue a view layer abstraction that allows easily transitioning content from high bandwidth, large screen richness to high latency, small form factor usability. That of course would require us to solve a wide variety of problems that may be more difficult in the short term than some HTML munging and style stripping.

The prohibition against <form> and all <input> tags in the current AMP spec however really kills the idea that we could use it to deliver a wiki in a meaningful way. A wiki that can't be edited is just another silo of other people's opinions.

ori added a comment. Jan 21 2016, 2:15 AM

The prohibition against <form> and all <input> tags in the current AMP spec however really kills the idea that we could use it to deliver a wiki in a meaningful way. A wiki that can't be edited is just another silo of other people's opinions.

The read view is mostly non-interactive anyway. AMP pages can contain an edit link and it can point to the full-fledged (non-AMP) variant of the page.

MaxSem added a subscriber: MaxSem. Jan 21 2016, 2:34 AM

Quoting https://www.ampproject.org/docs/get_started/technical_overview.html

AMP pages can’t include any author-written JavaScript

That's a blocking issue for us. Not only do we use a ton of JS to improve user experience, but we also use it to facilitate contribution. And there is also site, user and gadget JS that can differ from user to user. And splitting stuff sharply on AMP/non-AMP borders would only make non-AMP features slower. A simple example: search bar suggestions. Without them, we might as well get rid of the whole thingie, sending people to Google for searches instead. Good for Google, not good for us. My impression is that AMP is for small static sites that don't provide much dynamic user experience. For others, it's just not sufficient.

I share @MaxSem's concerns. Discovery is putting a lot of effort into improving search on all platforms, and it'd be a shame if that were completely lost.

That said, I agree with @bd808's comments here; AMP seems worth investigating in more detail, so that we can make a more detailed determination on the tradeoffs involved, rather than relying on our gut instincts and quick reactions.

With work, AMP could potentially be a good fit for Wikipedia Zero.

I personally don't like using the description field of a Phabricator Maniphest task for drafting a document of this nature, so I created https://www.mediawiki.org/wiki/Requests_for_comment/Accelerated_Mobile_Pages.

ori added a comment. Jan 21 2016, 7:06 AM

Quoting https://www.ampproject.org/docs/get_started/technical_overview.html

AMP pages can’t include any author-written JavaScript

That's a blocking issue for us. Not only do we use a ton of JS to improve user experience, but we also use it to facilitate contribution.

To improve user experience for people on fast connections, maybe. In many places around the world, "a ton of JS" translates to "takes forever to load".

faidon added a subscriber: faidon. Jan 21 2016, 10:11 AM
bmansurov added a subscriber: bmansurov. Edited · Jan 21 2016, 1:34 PM

Any reason why we can't do the things listed on [1] ourselves? I think we are already doing some of them. We should just implement the rest (prioritizing resource loading, for example) and some more so that we stay independent from external forces and deliver the best user experience to our users ourselves.

Also why should we only care about mobile pages? In developing countries desktop speeds aren't that great either. This is especially important when you consider that schools and libraries usually offer desktop computers to their users, not mobile phones.

[1] https://www.ampproject.org/docs/get_started/technical_overview.html

Peter added a subscriber: Peter. Jan 21 2016, 5:28 PM
Krinkle added a subscriber: Krinkle. Edited · Jan 21 2016, 5:43 PM

Remember we're not talking about changing our default experience. We're evaluating whether we should implement and provide a separate AMP-based experience to view our content.

Aside from the business considerations, there are also a few technical considerations to be had. For one, we've historically not been very good at separating presentation from content. As such, it's not trivial to create a simplified viewing experience.

We've encountered the same problems as part of the "light" version of Wikipedia that is being worked on. It would use as basis the Parsoid HTML. The main issue I see is with the various stylesheets and scripts (MediaWiki core, site customisations, MediaWiki skin, MediaWiki extensions). For various reasons, content doesn't look right without all of the stylesheets loaded. In part due to use of non-semantic html that pretty much requires styling to make sense (and requires knowledge of the specific component to style, no "generic" styling option). In part due to site customisations, and authors assuming presence of those when writing content. In part due to various "core" Wikipedia features not being software features at all (e.g. Navigation box and Infobox are "just" wikitext markup templates with custom CSS.)

This makes it very hard to create a standalone viewing experience with < 50Kb of CSS, no javascript, and no inline style attributes.
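
The three limits named above (CSS size, JavaScript, inline style attributes) can be measured on any rendered page. What follows is only a minimal sketch in Python, assuming the 50 KB budget applies to the combined contents of `<style>` elements and that a KB is 1024 bytes. The names `ConstraintAudit` and `audit` are made up for illustration and are not part of MediaWiki, Parsoid, or any AMP tooling.

```python
from html.parser import HTMLParser


class ConstraintAudit(HTMLParser):
    """Collects the three numbers the constraints above care about."""

    def __init__(self):
        super().__init__()
        self.css_bytes = 0        # total size of all <style> contents
        self.script_tags = 0      # number of <script> elements seen
        self.inline_styles = 0    # number of elements carrying style="..."
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
        elif tag == "script":
            self.script_tags += 1
        # Any element may carry an inline style attribute.
        if any(name == "style" for name, _ in attrs):
            self.inline_styles += 1

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Only text inside <style> counts toward the CSS budget.
        if self._in_style:
            self.css_bytes += len(data.encode("utf-8"))


def audit(html, css_budget=50 * 1024):
    """Return a summary dict; css_budget is an assumed reading of '50Kb'."""
    parser = ConstraintAudit()
    parser.feed(html)
    parser.close()
    return {
        "css_bytes": parser.css_bytes,
        "css_within_budget": parser.css_bytes <= css_budget,
        "script_tags": parser.script_tags,
        "inline_styles": parser.inline_styles,
    }
```

Running `audit()` over the HTML of a few template-heavy articles would give a rough idea of how far current output is from such a target. Externally linked stylesheets are not counted by this sketch.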

Quoting https://www.ampproject.org/docs/get_started/technical_overview.html

AMP pages can’t include any author-written JavaScript

That's a blocking issue for us. Not only do we use a ton of JS to improve user experience, but we also use it to facilitate contribution. And there is also site, user and gadget JS that can differ from user to user.

I don't see how that's a blocking issue. Logged-in users don't have to be affected. And we don't need A/B testing on AMP pages either. There can be a plain link to "Edit" and same for other features. Any interaction or browser detection we can deal with on the "other" side. I'd like to think of AMP more as a special device or app (e.g. TV screen app, or RSS reader) that takes a simplified set of HTML intended for providing pure content only. As a bonus we even get to provide our own stylesheet and a custom interface around the content. It allows providing buttons like "Edit", "History", "Search" etc. The fact that we currently heavily rely on JavaScript to provide an experience that feels "right" is a problem, not a design requirement.

Take Jake Archibald's https://wiki-offline.jakearchibald.com/ for example.

Aside from the business considerations, there are also a few technical considerations to be had. For one, we've historically not been very good at separating presentation from content. As such, it's not trivial to create a simplified viewing experience.
We've encountered the same problems as part of the "light" version of Wikipedia that is being worked on. It would use as basis the Parsoid HTML. The main issue I see is with the various stylesheets and scripts (MediaWiki core, site customisations, MediaWiki skin, MediaWiki extensions). For various reasons, content doesn't look right without all of the stylesheets loaded. In part due to use of non-semantic html that pretty much requires styling to make sense (and requires knowledge of the specific component to style, no "generic" styling option). In part due to site customisations, and authors assuming presence of those when writing content. In part due to various "core" Wikipedia features not being software features at all (e.g. Navigation box and Infobox are "just" wikitext markup templates with custom CSS.)

This is a better elaboration of my statement about "a wide variety of problems" in T124243#1951048. If work on AMP (or any other alternate view layer for the Wikimedia projects) gets us closer to clean separation of presentation and content I'm all for it.

The read view is mostly non-interactive anyway. AMP pages can contain an edit link and it can point to the full-fledged (non-AMP) variant of the page.

But how do you make a search box?

Something is missing from the task description. This is not a kind of lightweight web, a reincarnation of WAP, this is a way to embed web pages within Google search result pages. Read their blog post on the subject closely, and watch the demo video. The idea is that if you want to search, you'll just dismiss the content popup. They are probably not planning to link to AMP pages at all, so concerns about cross-site requests are moot.

You see the kind of vision Google's search engineers have for the mobile web. Built-in support for a Brightcove video player, but no way to make a search box.

This is not a kind of lightweight web, a reincarnation of WAP,

This is an interesting point. Is there anyone that is working on a "reincarnation of WAP" that is compelling?

More to the point, what is it that we can/should do in a search engine agnostic way that will make accessing Wikimedia content more appealing and useful? As @bd808 says "If work on AMP (or any other alternate view layer for the Wikimedia projects) gets us closer to clean separation of presentation and content I'm all for it."

My main worry is that we did a terrible job of supporting the separate code path needed for our WAP project when we had it and it was one of the main reasons for us disbanding the WAP experience. The overhead of a small group of people maintaining and hosting 2 versions of the same content alone makes me uneasy. Our support of Zero hasn't been ideal but maybe if AMP can replace that, this would make it a more worthy approach.

But that aside, given our shortcomings in other areas that we've only really now started addressing (beginning with asynchronous JS) and limited resources I'd prefer we invested our efforts in improving our main site for the time being (we've already identified that through lazy loading references and nav boxes we can reduce HTML size by up to 60%). I'd personally like to see us do that first and then re-evaluate when we have reached a better stage and we can see what the rest of the web is doing. This doesn't seem to be something that we need to be early adopters of.

AMP specific: I worry about degrading user experience for performance. From what I've seen of googleweblight's bare minimum CSS, I question whether this kind of approach is a bit too extreme and does a disservice to our users (e.g. by making it harder to find other content outside links).

This got published today:

Before today, Google selected those stories based on criteria for ranking news stories in search such as relevance and speed. Starting today, however, stories that have AMP versions will get priority. “AMP is a requirement,” says David Besbris, Google’s vice president of engineering for search.

http://www.wired.com/2016/02/google-will-now-favor-pages-use-fast-loading-tech/

Interesting (but very early): http://yoavweiss.github.io/ContentPerformancePolicy/

I'd like to understand this quote better:

“Links are a core part of Twitter,” Michael Ducker, a Twitter product manager working on AMP, says. He won’t say when Twitter expects all links to be converted to AMP but says he supports the idea behind it. “The open web is a core part of Twitter.”

What does AMP do to links?

Peter added a comment. Mar 18 2016, 8:18 AM

@RobLa-WMF I think he meant to say that they will check if a page has a corresponding AMP page (you can add a rel link in the head to signal that you have an AMP version of a page) and link to that AMP page instead of the link that the user added?
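
To make the discovery step described above concrete: a consumer would fetch the page the user linked to and look in its markup for the link element pointing at the AMP variant. The following is a rough Python sketch, assuming the link uses `rel="amphtml"` (the value the AMP project documents for this purpose). `AmpLinkFinder` and `find_amp_version` are illustrative names, and nothing here reflects how Twitter or Google actually implement it.

```python
from html.parser import HTMLParser


class AmpLinkFinder(HTMLParser):
    """Remembers the href of the first <link rel="amphtml"> element."""

    def __init__(self):
        super().__init__()
        self.amp_url = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.amp_url is not None:
            return
        attr = dict(attrs)
        # rel is a space-separated token list, so split before comparing.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "amphtml" in rel_tokens:
            self.amp_url = attr.get("href")


def find_amp_version(html):
    """Return the advertised AMP URL, or None if the page has none."""
    finder = AmpLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.amp_url
```

A service that rewrites links would then substitute the returned URL for the original one whenever the result is not `None`.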

AMP has nothing to do with the open web though :/

Krinkle claimed this task. Mar 23 2016, 8:20 PM
Krinkle removed a project: Proposal.
daniel moved this task from Inbox to Under discussion on the TechCom-RFC board. Mar 23 2016, 8:21 PM

What's under discussion?

@Krinkle mentioned in our team meeting that some folks have already seen Google search results for news events where the new AMP carousel was used in the wild.

It seems to me that news has always been the biggest use case for AMP. News sites were the first to implement it, etc.

Assuming that this is only going to be used for news for now (until we see other uses in the wild), it might be interesting to know what share of our traffic news-related articles represent. If this can be tracked, when Google rolls the AMP carousel out for all news-related search results, we'll be able to estimate the traffic impact we might be looking at, should this be released for all search results.

Speaking of news sites, this Nieman Lab article provides a good overview of how various publishers are approaching AMP and which technical challenges they have been facing in the implementation: http://www.niemanlab.org/2016/02/diving-all-in-or-dipping-a-toe-how-publishers-are-approaching-googles-accelerated-mobile-pages-initiative/

Donors' money should be invested in performance improvements that benefit all users rather than supporting a vendor's EEE strategy.

Nuria added a subscriber: Nuria. Edited · Jul 1 2016, 5:08 PM

Two comments:

Regarding performance:
It has come to my attention, after talking to Google folks, that Google tested on their own the improvements that would result from applying the AMP standard to Wikipedia, and it actually did not work so well; the difference was minimal.

I am not sure if these results are published or ever will be.

I also agree with @Gilles's comments: given that we control our stack, CDNs and code, we can (and should) work on performance ourselves.

Regarding privacy:
Given that AMP and similar approaches require Google (or FB or Amazon) proxying every single one of our pages after a first request in their CDN infrastructure (did I understand that correctly?), I am pretty sure our community will never buy into it; we have had countless discussions about us sending data to third parties and how we never do that.
Abiding by AMP and using Google's CDNs will mean giving all our request history for all users for all projects forever to Google for free.

Recent announcements

Reddit + AMP
https://amphtml.wordpress.com/2016/09/20/a-faster-reddit-with-accelerated-mobile-pages/

Ebay + AMP
https://amphtml.wordpress.com/2016/09/21/experience-the-lightning-bolt/

Shopify + AMP
https://amphtml.wordpress.com/2016/09/20/shopify-merchants-will-soon-get-ampd/

Given that AMP and similar approaches require Google (or FB or Amazon) proxying every single one of our pages after a first request in their CDN infrastructure (did I understand that correctly?), I am pretty sure our community will never buy into it; we have had countless discussions about us sending data to third parties and how we never do that.

Can anyone confirm if this is correct?

Are Reddit, Shopify, and eBay actually OK with Google caching and redirecting?

This project here seems very standalone:
https://github.com/ampproject/amphtml

As the description points out, is the google caching part _still_ a mandatory step?

Also, I agree it's a very bad idea if that's the case. But if not, I think Wikipedia is a perfect candidate for it.

ori added a comment. Sep 30 2016, 12:44 AM

As the description points out, is the google caching part _still_ a mandatory step?

You can opt out of the caching, but you forfeit much of the benefits that way. But the bigger issue is this: per the spec, AMP pages must load the AMP runtime via a script tag that references a Google-controlled server, cdn.ampproject.org. This means we'd be handing over site security and traffic logs to Google.
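
The concern above boils down to which hosts a page instructs the browser to fetch executable code from. Here is a small Python sketch that lists script hosts outside a given set of first-party domains. The helper names and the choice of `wikipedia.org` as the first-party suffix in the usage note are assumptions for illustration only. The sample URL combines the host named above with the `v0.js` file name mentioned earlier in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ScriptHostCollector(HTMLParser):
    """Collects the host of every <script src="..."> with an absolute URL."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Relative URLs have no hostname and are same-origin, so skip them.
            host = urlparse(src).hostname
            if host:
                self.hosts.append(host)


def third_party_script_hosts(html, first_party_suffixes):
    """Return sorted script hosts that match none of the first-party suffixes."""
    collector = ScriptHostCollector()
    collector.feed(html)
    collector.close()
    return sorted({
        host for host in collector.hosts
        if not any(host == s or host.endswith("." + s)
                   for s in first_party_suffixes)
    })
```

For a page containing `<script async src="https://cdn.ampproject.org/v0.js"></script>`, calling `third_party_script_hosts(html, ["wikipedia.org"])` reports `cdn.ampproject.org`. Every host in that list receives a request, and can serve code, on each page view.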

You can opt out of the caching, but you forfeit much of the benefits that way. But the bigger issue is this: per the spec, AMP pages must load the AMP runtime via a script tag that references a Google-controlled server, cdn.ampproject.org. This means we'd be handing over site security and traffic logs to Google.

That's a bummer

Krinkle closed this task as Resolved. Nov 16 2016, 11:06 PM

This task was created to bring AMP to the attention of developers, operations, and the Architecture Committee. Its main objective was to figure out our strategy.

I believe that after a few months of discussion and considerations, our strategy is that we will take some of the lessons from AMP ("best practices") and work on applying them directly to the MediaWiki software and other Wikimedia infrastructure.

There is opposition to loading AMP javascript and/or AMP'ed Wikipedia page views from a third-party CDN.

An insightful first-hand experience report, strengthening and adding to the objections above (by a small site owner):

  • Google caches & serves content from their own cache, resulting in links like https://www.google.com/amp/www.bbc.co.uk/news/amp/39130072 – trapping users in
  • Scrolling sucks due to override of default browser scrolling
  • Sharing links is hard
  • Maintenance overhead in the sense of providing an extra version, kind of like the WAP site. Ahem, we're still in a similar situation anyway, just with more control.
  • AMP is not optional for users
  • No non-JS support, results in eating up big part of speed improvements over internal performance measurements
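
The first point in the list above shows a viewer link that embeds the publisher's host and path after an `/amp/` prefix. As a rough illustration of how a user or tool might get back to the publisher's own URL, here is a Python sketch that handles only the exact URL shape shown in that example. The function name is made up, the `https` scheme of the result is an assumption (the viewer link in the example does not state it), and other viewer URL variants are deliberately not handled.

```python
from urllib.parse import urlparse


def publisher_url_from_viewer_link(url, scheme="https"):
    """Strip the Google viewer prefix from links shaped like the example above.

    Returns None for anything that does not match that shape.
    The scheme of the result is an assumption, not derived from the link.
    """
    parts = urlparse(url)
    prefix = "/amp/"
    if parts.hostname != "www.google.com" or not parts.path.startswith(prefix):
        return None
    remainder = parts.path[len(prefix):]   # "<publisher host>/<path>"
    if not remainder:
        return None
    return f"{scheme}://{remainder}"
```

That such a step is needed at all is the point of the objection: the address the user sees and shares is Google's, not the publisher's.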

Hi all,

I have authored an extension to convert wiki pages to AMP. I understand that AMP is not so useful for WMF, but if anyone wants to try it out they can. It's currently in beta and may need some tweaking for your site.
https://www.mediawiki.org/wiki/Extension:Amp