Work out a strategy on Yandex's Turbo Pages
Closed, Resolved (Public)

Description

Yandex has a technology called Turbo Pages that hosts "lite" versions of webpages on their servers when they detect low-bandwidth connections. They apply it automatically to Wikipedia articles in their search results (based on the user's connectivity).
Example: https://en-wikipedia-org.turbopages.org/en.wikipedia.org/s/wiki/Yandex

I see a few issues with this that I think should push us towards seeking to stop the practice:

  • This poses brand/privacy issues: the result appears as a normal Wikipedia result in Yandex search results but then instead goes to a Yandex-hosted page
  • This poses analytics issues: as far as I can tell, we don't get any pageview information when someone ends up on these pages.
  • Our general approach to these sorts of technologies in the past seems to have been to opt out of them, both because of the above issues and because Wikipedia articles are relatively lightweight to begin with.

This ticket is to gather folks' thoughts and figure out how to handle these turbo pages (and ideally help us deal with similar technologies in the future). A few options as far as I can tell:

  • Do nothing (we decide we're okay with them).
  • Add the no-transform header (the Cache-Control: no-transform directive) brought up in a related ticket (T218618) and hope this deals with Yandex too (there is no documentation that supports this, but it's possible as no-transform is a web standard).
  • Seek to opt out via Yandex's webmaster tools. I have no idea how to get access to this but presumably we could work it out.
  • Reach out to Yandex to see what's going on.

Note: this is all based on my own exploration -- please comment if I seem to be misunderstanding etc. what's going on.
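
To make the no-transform option above concrete, here is a minimal sketch of what adding the directive to responses could look like. It is illustrative only: the helper and middleware names are hypothetical, it assumes a generic Python WSGI application rather than Wikimedia's actual serving stack, and (as noted above) nothing documents that Yandex honors the directive.

```python
def add_no_transform(headers):
    """Return a copy of (name, value) header pairs whose Cache-Control
    includes the standard no-transform directive."""
    result = []
    seen_cache_control = False
    for name, value in headers:
        if name.lower() == "cache-control":
            seen_cache_control = True
            directives = [d.strip() for d in value.split(",") if d.strip()]
            # Only append the directive if it is not already present.
            if "no-transform" not in (d.lower() for d in directives):
                directives.append("no-transform")
            value = ", ".join(directives)
        result.append((name, value))
    if not seen_cache_control:
        result.append(("Cache-Control", "no-transform"))
    return result


class NoTransformMiddleware:
    """Hypothetical WSGI middleware applying add_no_transform to every response."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            return start_response(status, add_no_transform(headers), exc_info)

        return self.app(environ, patched_start_response)
```

Existing directives such as max-age are preserved, so a change like this would not alter caching behavior for intermediaries that respect the header.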

Scale

It's hard to know how many pageviews are going to these Turbo Pages because I don't think we get any pageview data from them. However, we do see referrals from these pages: anecdotally, about half of the links on Turbo Pages point to other Turbo Page-hosted articles and half point to the standard pages. These referrals are pretty high (on the order of the number of referrals we get from Facebook, YouTube, or Reddit). In short, this is well worth addressing for the Yandex-specific case (and ideally it helps us form a more general policy around these sites long-term).

Background

Yandex's Turbo Pages are not novel. My attempt to summarize a few related technologies and the similarities / differences:

| Technology | Decision | Webmaster opt-in or automatic | User opt-in or automatic | Transparent in search results | Transparent on page | WMF sees pageviews |
| Yandex Turbo Pages | This ticket :) | Automatic (can be turned off) | Automatic (low connection speed) | No | Barely (example) | No (hosted on their own domain) |
| Google AMP | Reject (T124243) | Opt-in | Automatic | No | ? | No |
| Google Weblight | Stalled discussion (T218618) | Automatic (can be turned off) | Automatic (low connection speed) | No | Yes | Yes (though it isn't perfect for our analytics) |
| Chrome Lite Pages | Stalled discussion (T218618) | Automatic (can be turned off) | Opt-in (data saver mode) | No | Not sure | Not sure |

Event Timeline

Based on a quick analysis, I suspect this Yandex feature is unlike the others in the table. It appears to be tailor-made specifically for Wikipedia, with a level of changes, theming, and tuning that I find unlikely to be the result of an automatic process that would work similarly for other websites.

It's quite possible that Yandex does something more "Chrome Lite"-like for other websites in this mode, but for Wikipedia at least I suspect they're doing something quite custom. That suggests they might be somewhat invested in doing this, but nevertheless I agree experimenting with no-transform and/or reaching out to them is worthwhile.

> Based on a quick analysis, I suspect this Yandex feature is unlike the others in the table. It appears to be tailor-made specifically for Wikipedia, with a level of changes, theming, and tuning that I find unlikely to be the result of an automatic process that would work similarly for other websites.

Thanks @Krinkle for looking into this! Good point. Their documentation suggests it's far from a simple automated process too so that would not be surprising.

> That suggests they might be somewhat invested in doing this, but nevertheless I agree experimenting with no-transform and/or reaching out to them is worthwhile.

Yeah, I want to make some space to hear from others too, but my general feeling is that this isn't so urgent that we need to contact them immediately. Ideally we would have a general solution like no-transform rather than having to reach out to every search engine that goes down this road. I'll raise the issue on the other ticket that has discussed no-transform (T218618) to see if there are any major blockers to rolling that out.

I like the idea of using no-transform, but I'm not sure how many users it will impact, so that would be interesting to know first. I added a comment about it in T218618#7610396 for Google Lite. It could also potentially affect Opera Mini and other proxy browsers.

@ovasileva @jwang wanted to make sure this was on the Web team's radar.

@Maryana looks like this is already on Partnership's radar?

@kzimmerman Yep, Isaac has kept us in the loop – happy to reach out to Yandex if/when we decide that more upstream technical changes aren't feasible.

> Seek to opt out via Yandex's webmaster tools. I have no idea how to get access to this but presumably we could work it out.

Check out T302617 which explicitly tracks this (among other search consoles).

Hi all, I reached out to Yandex's search team last week to opt out of their Turbo Pages experience, and they've confirmed they have stopped using Turbo Pages for Wikipedia on their search result pages. They said that "all other instances" of Turbo Pages being used for Wikipedia will be turned off in the near future, and their team is working on making that happen.

Do we have any way of monitoring changes in traffic or Turbo usage to give us visibility into when this change is complete in the other instances they've referenced?

> Hi all, I reached out to Yandex's search team last week to opt out of their Turbo Pages experience, and they've confirmed they have stopped using Turbo Pages for Wikipedia on their search result pages. They said that "all other instances" of Turbo Pages being used for Wikipedia will be turned off in the near future, and their team is working on making that happen.

Many thanks @Nicholas_Perry!!

> Do we have any way of monitoring changes in traffic or Turbo usage to give us visibility into when this change is complete in the other instances they've referenced?

Nothing public, but I can run analyses on how many referrals we get from the Turbo Pages. That's only a fraction of the traffic that goes to these pages, both because it requires someone to click on a link from the article and because roughly half of the links on Turbo Pages just point to other Turbo Pages instead of back to Wikipedia. A drop in referrals could also mean they just increased their coverage of Turbo Pages, but in the short term it's probably an effective way to check, along with spot-checks. I can hopefully check tomorrow and report back, and then again in a few weeks.
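
As a rough illustration of the referral check described above, the sketch below counts daily referrals whose referrer host falls under turbopages.org (the domain seen in the example link in the description). The function names and the (date, referrer URL) row format are assumptions made for the example, not the actual Wikimedia analytics schema.

```python
from collections import Counter
from urllib.parse import urlsplit

# Domain taken from the example Turbo Page URL in the task description.
TURBO_DOMAIN = "turbopages.org"


def is_turbo_referrer(referrer):
    """True if the referrer URL's host is turbopages.org or a subdomain of it."""
    host = (urlsplit(referrer).hostname or "").lower()
    return host == TURBO_DOMAIN or host.endswith("." + TURBO_DOMAIN)


def daily_turbo_referrals(rows):
    """Count Turbo Page referrals per day.

    rows: iterable of (date_string, referrer_url) pairs (hypothetical format).
    """
    counts = Counter()
    for day, referrer in rows:
        if is_turbo_referrer(referrer):
            counts[day] += 1
    return dict(counts)
```

Comparing these daily counts before and after Yandex's change gives a rough signal only, for the reasons given above, so it would need to be paired with manual spot-checks.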

Isaac claimed this task.

Yeah, just writing to confirm a few things. The link given in the description now redirects to Wikipedia (along with a few others I tried). We're seeing very little traffic being referred from turbopages (down to a few thousand referrals per day, as opposed to the over 100K that initially triggered this discussion). And I haven't been able to trigger the Turbo Page behavior natively via Yandex Search. Looking through the history, it seems that this drop in referrals happened earlier this year, but I feel comfortable that we've done what we could here. I'll resolve this task, but folks should feel free to reopen it if it still feels like there are open questions.