Yandex has a technology called Turbo Pages that hosts "lite" versions of webpages on their servers when they detect low-bandwidth connections. They apply it automatically to Wikipedia articles in their search results (based on the user's connectivity).
Example: https://en-wikipedia-org.turbopages.org/en.wikipedia.org/s/wiki/Yandex
I see a few issues with this practice that I think should push us towards trying to stop it:
- This poses brand/privacy issues: the result appears as a normal Wikipedia result in Yandex search results but then instead goes to a Yandex-hosted page
- This poses analytics issues: as far as I can tell, we don't get any pageview information when someone ends up on these pages.
- Our general approach to these sorts of technologies in the past seems to have been to opt out of them, both because of the issues above and because Wikipedia articles are relatively lightweight to begin with.
This ticket is to gather folks' thoughts and figure out how to handle these turbo pages (and ideally help us deal with similar technologies in the future). A few options as far as I can tell:
- Do nothing (we decide we're okay with them).
- Add the no-transform header (i.e. the Cache-Control: no-transform directive) brought up in a related ticket (T218618) and hope this deals with Yandex too. I found no documentation confirming that Yandex honors it, but it's possible since no-transform is a web standard.
- Seek to opt out via Yandex's webmaster tools. I have no idea how to get access to these, but presumably we could work it out.
- Reach out to Yandex to see what's going on.
Note: this is all based on my own exploration -- please comment if I seem to be misunderstanding what's going on.
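To make the no-transform option above a bit more concrete, here is a minimal sketch of appending the directive to an existing Cache-Control value. The function name `add_no_transform` and the header values in the usage note are hypothetical and not taken from our actual configuration; whether Yandex honors the directive remains unverified.

```python
from typing import Optional


def add_no_transform(cache_control: Optional[str]) -> str:
    """Return a Cache-Control value that includes the no-transform directive.

    Hypothetical helper for illustration only. It splits on commas, so it
    does not handle directives whose quoted values contain commas.
    """
    # Break the existing header into trimmed, non-empty directives.
    directives = [d.strip() for d in (cache_control or "").split(",") if d.strip()]
    # Directive names are case-insensitive, so compare in lowercase.
    if not any(d.lower() == "no-transform" for d in directives):
        directives.append("no-transform")
    return ", ".join(directives)
```

For example, `add_no_transform("max-age=0, must-revalidate")` would yield `max-age=0, must-revalidate, no-transform`, and a header that already carries the directive is returned unchanged.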
Scale
It's hard to know how many pageviews are going to these Turbo Pages because I don't think we get any pageview data from them. However, we do see referrals from these pages (anecdotally, about half of the links on Turbo Pages point to other Turbo Pages-hosted articles and half point to the standard pages), and these referrals are pretty high (on the order of the number of referrals we get from Facebook, YouTube, or Reddit). In short, this is well worth addressing for the Yandex-specific case (and ideally it helps us form a more general policy around these sites long-term).
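As a rough illustration of how such referrals could be picked out, here is a sketch that flags Turbo Pages referrers and maps a Turbo Pages URL back to the article it mirrors. The URL layout is inferred solely from the single example URL in this ticket, and reading the `s` path segment as "https" is an assumption; the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumption: all Turbo Pages are served from subdomains of turbopages.org,
# as in the one example URL in this ticket.
TURBO_SUFFIX = ".turbopages.org"


def is_turbo_referrer(referrer: str) -> bool:
    """Return True if the referrer's hostname is a turbopages.org subdomain."""
    host = urlsplit(referrer).hostname or ""
    return host.endswith(TURBO_SUFFIX)


def original_url(turbo_url: str) -> Optional[str]:
    """Map a Turbo Pages URL to the page it mirrors, or None if it doesn't fit.

    Assumed layout: https://<slug>.turbopages.org/<origin-host>/s/<path>
    """
    parts = urlsplit(turbo_url)
    if not (parts.hostname or "").endswith(TURBO_SUFFIX):
        return None
    # Expect exactly: origin host, the "s" marker, then the rest of the path.
    segments = parts.path.lstrip("/").split("/", 2)
    if len(segments) != 3 or segments[1] != "s":
        return None
    return f"https://{segments[0]}/{segments[2]}"
```

Applied to the example URL above, `original_url` would return `https://en.wikipedia.org/wiki/Yandex`. Counting referrers for which `is_turbo_referrer` is true only gives a lower bound on Turbo Pages traffic, since it sees only readers who click through to a standard page.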
Background
Yandex's Turbo Pages are not novel. Here is my attempt to summarize a few related technologies and their similarities and differences:
Technology | Decision | Webmaster opt-in or automatic | User opt-in or automatic | Transparent in search results | Transparent on page | WMF sees pageviews |
---|---|---|---|---|---|---|
Yandex Turbo Pages | This ticket :) | Automatic (can be turned off) | Automatic (low connection speed) | No | Barely (example) | No (hosted on their own domain) |
Google AMP | Reject (T124243) | Opt-in | Automatic | No | ? | No |
Google Weblight | Stalled discussion (T218618) | Automatic (can be turned off) | Automatic (low connection speed) | No | Yes | Yes (though it isn't perfect for our analytics) |
Chrome Lite Pages | Stalled discussion (T218618) | Automatic (can be turned off) | Opt-in (data saver mode) | No | Not sure | Not sure |