Serve Main Page of WMF wikis from a consistent URL
Open, Normal, Public

Description

Objective

  • Serve the main page of WMF wikis from a consistent URL, one that does not vary by wiki configuration, site language, or local interface message overrides.

Secondary objective:

  • Let the domain root serve actual content instead of a redirect.

Stakeholders:

  • Traffic team. (assess potential routing impact)
  • Reading Web team. (about SEO, and reader user experience)
  • Performance team. (believed to improve performance)
  • Core Platform Team. (core behaviour being utilised that previously has only been used by low-traffic wikis and third-parties)
  • Wikimedia communities via Tech News and Community Engagement team. (identify potential impact on technical workflows we may not be aware of, so that we may help accommodate those)

Status quo

The URL of a WMF wiki's main page varies by wiki configuration (site language, or hooks) and by interface message overrides local to the wiki. For example:

The following are HTTP 301 redirects to https://en.wikipedia.org/wiki/Main_Page:

The following are HTTP 301 redirects to https://fixcopyright.wikimedia.org/:

Examples of affected links:

  • Portals, such as https://www.wikipedia.org and https://www.wikimedia.org.
  • Language links in the sidebar of the main pages themselves.
  • Interwiki links, such as [[mw:]], or [[wikitech:]].
  • Browsing directly by entering the hostname of a wiki project.
  • Browsing by changing homepage address of one project into another (usually leads to a 404 Not Found, as "Wikipedia:Hauptseite" would not exist on nl.wikipedia.org).

Current issues

  • Accessing wiki projects by domain results in a redirect. (Subpar performance)
  • Address bars, URLs and search results for our projects prominently expose the inconsistent naming conventions of each wiki. (Subpar user experience)
  • SEO. "Avoid Landing Page Redirects", Google PageSpeed, https://developers.google.com/speed/docs/insights/AvoidRedirects.
  • Difficulties with tooling. Performance tests are difficult to write in a way that targets a normal view of a main page without a redirect, because the URL is not deterministic or consistent. (Current workaround: using a ?whatever query string, which serves the Main Page as the default title without a redirect.)
  • Monitoring such as "Is the Main Page for all projects up and responding with content?" is not trivial, as simplistic tools either do not follow redirects or consider a 301 a success, even if the actual page, at its unpredictable URL, is returning an error. On some wikis, the Main_Page redirect is sometimes deleted, leading to false alarms.
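
The monitoring problem in the last point can be illustrated with a small sketch. It is not an existing WMF tool: the function names, the example.org URLs and the in-memory response table are all hypothetical, and no real HTTP requests are made. It contrasts a check that accepts a 301 as success with one that follows the redirect and judges the final response.

```python
from typing import Callable, Dict, Optional, Tuple

# (status code, Location header or None)
Response = Tuple[int, Optional[str]]

def naive_check(fetch: Callable[[str], Response], url: str) -> bool:
    """Treat any 200 or 301 on the first request as 'up'."""
    status, _ = fetch(url)
    return status in (200, 301)

def following_check(fetch: Callable[[str], Response], url: str,
                    max_hops: int = 5) -> bool:
    """Follow redirects and report 'up' only if the final response is a 200."""
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            return False  # redirect loop
        seen.add(url)
        status, location = fetch(url)
        if status in (301, 302) and location:
            url = location
            continue
        return status == 200
    return False  # too many redirects

# Hypothetical wiki whose root redirects to a main page that is failing.
FAKE_RESPONSES: Dict[str, Response] = {
    "https://wiki.example.org/": (301, "https://wiki.example.org/wiki/Main_Page"),
    "https://wiki.example.org/wiki/Main_Page": (500, None),
}

def fake_fetch(url: str) -> Response:
    return FAKE_RESPONSES[url]

if __name__ == "__main__":
    root = "https://wiki.example.org/"
    print("naive check:", naive_check(fake_fetch, root))
    print("following check:", following_check(fake_fetch, root))
```

With a consistent main page URL that serves content directly, even the naive single-request check would observe the real status of the page.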

Performance data

From Navigation Timing, over February 2019:

stat1007/hive
-- sampled views to enwiki/Main_Page
SELECT COUNT(*),SUM(event.redirecting) FROM event.NavigationTiming WHERE year=2019 AND month=2 AND wiki="enwiki" AND event.revId=870437359 AND event.action="view" AND event.isOversample=false;
-- sampled views to enwiki/Main_Page that involved a redirect
SELECT COUNT(*),SUM(event.redirecting) FROM event.NavigationTiming WHERE year=2019 AND month=2 AND wiki="enwiki" AND event.revId=870437359 AND event.action="view" AND event.isOversample=false AND event.redirecting != 0;
Sampled views    Sampled views (redirected)    Time spent redirecting
31,734           9,703                         840,039 ms

This is from a 1:1000 sampling. This means that in February 2019, the Main Page had an estimated 31.7 million views from Grade A web browsers that completed their page load. Of these, over 9.7 million page views (30.6%) experienced a redirect. Scaled up from the sample, they cumulatively spent about 840,000 seconds (roughly 233 hours, or close to 10 days) waiting for a redirect (about 0.09 s each, on average).
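
As a sanity check, the extrapolation can be recomputed from the sampled row above. This is a minimal sketch that assumes the stated 1:1000 sampling rate and that event.redirecting is recorded in milliseconds, as the unit in the table suggests; the variable names are illustrative only.

```python
SAMPLING_FACTOR = 1000            # 1:1000 sampling, as stated above

sampled_views = 31_734            # sampled Main Page views
sampled_redirected = 9_703        # sampled views that involved a redirect
sampled_redirect_ms = 840_039     # summed redirect time in the sample (ms)

# Scale the sampled counts up to estimated totals.
estimated_views = sampled_views * SAMPLING_FACTOR
estimated_redirected = sampled_redirected * SAMPLING_FACTOR

# Share of views that were redirected (independent of the sampling factor).
redirect_share = sampled_redirected / sampled_views

# Total time spent redirecting: scale up, then convert ms -> s -> hours.
total_redirect_seconds = sampled_redirect_ms * SAMPLING_FACTOR / 1000
total_redirect_hours = total_redirect_seconds / 3600

# Mean redirect time per redirected view (ms).
mean_redirect_ms = sampled_redirect_ms / sampled_redirected

if __name__ == "__main__":
    print(f"estimated views:       {estimated_views:,}")
    print(f"estimated redirected:  {estimated_redirected:,}")
    print(f"redirect share:        {redirect_share:.1%}")
    print(f"total redirect time:   {total_redirect_hours:.0f} hours")
    print(f"mean redirect time:    {mean_redirect_ms:.0f} ms")
```

Under these assumptions the total comes to roughly 233 hours, at a little under 90 ms per redirected view.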

Proposal 1:

I'd like us to consider changing the canonical URL of the main page of Wikimedia wikis to be the domain root. This means https://www.wikidata.org/ would serve what we currently see at https://www.wikidata.org/wiki/Wikidata:Main_Page, for example.

MediaWiki provides a hook that allows the canonical URL for a given title to be customised. This has been in use at translatewiki.net since 2015 (written about on Niklas' blog, source code), and was also used at WMF for the Fix Copyright campaign in 2018.

Once configured, all canonical access to the main page is automatically reflected accordingly by MediaWiki.

  • The link to the main page in the sidebar and on the logo will point to this.
  • When browsing from the talk page, what links here, history page, contributions, search results, it points to the canonical url.
  • When creating an internal link to it in wikitext like [[Main_Page]] this results in the correct HTML for an anchor link to the canonical url (e.g. <a href="/" title="Main Page">Main Page</a>).
  • When editing the main page, the purges sent to the CDN layer will be for the canonical url, as expected.
  • When manually browsing to /wiki/Main_Page, MediaWiki's router normalises this to the canonical URL in the form of an HTTP 301 redirect.
  • Configuration variables in JavaScript like wgIsMainPage and server-side checks like Title::isMainPage() all work as expected.
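
The behaviour listed above can be modelled in a few lines. The sketch below is Python rather than MediaWiki's PHP, and it is not MediaWiki code: the names `local_url`, `route`, `MAIN_PAGE_TITLE` and `ARTICLE_PATH` are hypothetical. It only illustrates the idea of a hook-like override giving the main page the domain root as its local URL, and of a router that redirects non-canonical paths to the canonical one.

```python
from typing import Optional, Tuple

MAIN_PAGE_TITLE = "Main_Page"      # hypothetical; the real title varies per wiki
ARTICLE_PATH = "/wiki/{title}"     # hypothetical article path pattern

def local_url(title: str, query: str = "") -> str:
    """Return the canonical local URL for a title."""
    url = ARTICLE_PATH.format(title=title)
    # Hook-like override: a plain view of the main page lives at the root.
    if title == MAIN_PAGE_TITLE and query == "":
        url = "/"
    if query:
        url += "?" + query
    return url

def route(path: str) -> Tuple[int, Optional[str]]:
    """Return (status, Location) for a requested path."""
    if path == "/":
        title = MAIN_PAGE_TITLE
    elif path.startswith("/wiki/"):
        title = path[len("/wiki/"):]
    else:
        return 404, None
    canonical = local_url(title)
    if path != canonical:
        # Normalise to the canonical URL with a permanent redirect.
        return 301, canonical
    return 200, None

if __name__ == "__main__":
    for requested in ("/", "/wiki/Main_Page", "/wiki/Some_Article"):
        print(requested, "->", route(requested))
```

In this model, views with a query string (for example a history view) keep the regular article path, so only the plain view of the main page moves to the root.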

Event Timeline

Krinkle created this task. · Dec 2 2015, 1:08 PM
Krinkle raised the priority of this task from to Needs Triage.
Krinkle updated the task description.
Krinkle added a subscriber: Krinkle.
Restricted Application added subscribers: StudiesWorld, Matanya, Aklapper. · Dec 2 2015, 1:08 PM
Krinkle set Security to None. · Dec 2 2015, 1:08 PM
Krinkle added subscribers: Nikerabbit, ori, bd808 and 2 others.
Kghbln added a subscriber: Kghbln. · Dec 2 2015, 5:53 PM
Restricted Application added a subscriber: JEumerus. · Jan 14 2016, 4:15 AM
TTO awarded a token. · Jan 31 2016, 1:30 AM
Krinkle renamed this task from Change canonical URL of the main page to domain root. to Change canonical URL of the main page to domain root. · Feb 1 2016, 4:37 PM

Given T106793#2786334, the implementation needs to be rethought.

Honestly, I don't see how that is related. This is working well for all of my wikis where I am using it.

demon removed a subscriber: demon. · Jan 27 2017, 2:42 PM

Honestly, I don't see how that is related. This is working well for all of my wikis where I am using it.

Yeah, but since part of that was reverted, while https://translatewiki.net/ still works (with canonical self-reference), https://translatewiki.net/wiki/Special:MainPage now works as well, and it has its own rel-canonical reference which, strangely, does not point to /.

I expected the redirect to no longer work and rel-canonical on the special page to be set to /, since that's what your hook configures on translatewiki, right?

Okay, I got it now. I haven't changed my configuration, so indeed the redirect to the canonical name is no longer happening.

Legoktm added a subscriber: Legoktm. · Sep 6 2018, 4:16 AM

We implemented this as a one-off thing for fixcopyrightwiki (https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/458345). Doing this for the rest of the wikis sounds like a good idea to me, but I think we want to implement this in MediaWiki core first as a configuration option somehow instead of relying on two hooks to make it work.

Krinkle renamed this task from Change canonical URL of the main page to domain root to Serve Main Page of WMF wikis from a consistent URL. · Mar 6 2019, 11:13 PM
Krinkle updated the task description.
Restricted Application added a project: Operations. · Mar 6 2019, 11:13 PM
Krinkle updated the task description. · Mar 6 2019, 11:16 PM
ema moved this task from Triage to General on the Traffic board. · Mar 7 2019, 9:30 AM
jbond triaged this task as Normal priority. · Mar 7 2019, 1:21 PM
Tgr added a subscriber: Tgr. · Mar 15 2019, 8:36 PM

Change 520139 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/core@master] Add config for serving Main Page from the domain root

https://gerrit.wikimedia.org/r/520139

Ladsgroup moved this task from Under discussion to Inbox on the TechCom-RFC board. · Jul 1 2019, 11:28 PM
Ladsgroup added a subscriber: Ladsgroup.

Given that the last comment on this ticket was around a year ago, I don't think it falls in the category of "under discussion". It seems straightforward though.

@Ladsgroup "Under discussion" here merely means that it has a problem statement or objective that has been triaged and understood by TechCom, and that the author has signalled they are ready for wider feedback (e.g. to help with finding ways to solve it, and/or to get feedback on their own proposal). The "Backlog" on the other hand is used when the author is still working on the proposal and/or if there is not yet an objective that has been triaged by TechCom.

In that sense, this is "Under discussion". Feedback for a better name welcome at T216308 :).

Next steps are (mainly as note to myself):

  • reach out to relevant product owners and get their input and approval.
  • reach out to stakeholders and get their input on the overall objective and my current (so far, only) proposal, take the input and amend the proposal as needed.

(Once approved by TechCom):

  • estimate the amount of engineering work (probably quite small amount, less than a week of 1 or 2 people in total).
  • figure out where resourcing would come from (could be done within perf, perhaps other teams would be interested as well and might be able to prioritise it earlier).
  • ask the teams that implementers would depend on during roll out, and find a common quarter in which we're comfortable seeing this rolled out.
Ladsgroup moved this task from Inbox to Under discussion on the TechCom-RFC board. · Jul 2 2019, 4:58 PM

Thanks. The naming confused me and sorry for the mess.

IMO, there's three different questions that we should answer:

  • Should we define it as a config variable?
    • The answer seems to be yes, I don't think there's any objection towards it
  • Should we set the default to serve from root in WMF?
    • This should be done gradually. It's easy to undo, and I don't personally think it's a big deal. Communication is needed for sure, but maybe several emails to wikitech-ambassadors, wikitech and some messages in places like WP:VPT should be enough (I can do it if you're too busy)
  • Should the default for mediawiki be true?
    • TechCom can answer this but also it looks straightforward as it's easy to undo in case issues arise.
jcrespo updated the task description. · Jul 2 2019, 5:07 PM
Krinkle added a comment. · Edited · Jul 2 2019, 6:02 PM

[..] there's three different questions that we should answer:

  • Should we define it as a config variable? [..]

This is not required for the current RFC. MediaWiki supports the required functionality in core already. It can currently be enabled with a 1-line hook callback, which is how translatewiki.net and FixcopyrightWiki do it already.

I consider it in-scope for code review (and not for TechCom/RFC, unless +2'ers disagree) to decide whether we want another configuration variable and the added maintenance (but also, testability) of maintaining the callback logic in core instead (e.g. inside Title::getLocalURL).

  • Should we set the default to serve from root in WMF? [..]

That is in essence what this RFC is about.

  • Should the default for mediawiki be true? [..]

This is orthogonal to this RFC, which is about WMF sites. See T216791#5013185.

This is not required for the current RFC. MediaWiki supports the required functionality in core already. It can currently be enabled with a 1-line hook callback, which is how translatewiki.net and FixcopyrightWiki do it already.
I consider it in-scope for code review (and not for TechCom/RFC, unless +2'ers disagree) to decide whether we want another configuration variable and the added maintenance (but also, testability) of maintaining the callback logic in core instead (e.g. inside Title::getLocalURL).

ICYMI: https://gerrit.wikimedia.org/r/520139

BBlack added a subscriber: BBlack. · Edited · Jul 18 2019, 12:17 PM

I like the end result here, and I don't think it's problematic from the Traffic perspective in the long view, but I think the initial rollout isn't so trivial:

  1. We do need to review our VCL with this in mind, in case it does interfere in trivial ways with existing rewrites and/or redirects, etc. There are some other areas SRE might need to review in general as well (e.g. internal loadbalancer/cache-driven healthchecks and general monitoring queries that are hitting either the root or Main_Page URIs of various wikis and expecting a certain status).
  2. While MediaWiki's config might see this as a single flip of a switch, there are multiple conflicting changes being rolled out here which change the direction of the redirect arrow between two high-traffic URIs. All such changes are effectively asynchronous (even with a manual purge), and therefore the blurry time domain of flipping a switch for this would result in 301 loops for / -> Main_Page -> / -> Main_Page -> ... for at least some caches and/or end-users for at least a brief window of time. Because these are such high-traffic URIs, the redirect loops might cause an outage on our end as well. We might need to control for this with some temporary custom VCL that breaks the loops (e.g. when Varnish sees any wiki GET response to a root URI request which is a 301, it replaces that with a direct internal rewrite to the destination URI instead). We could deploy such a hack first, then flip the switch on the MediaWiki end, purge all the relevant URIs from caches, test things, and then remove the hack quickly afterwards.

[edit: fixed the proposed hack above, I had it backwards at first]

Oh, one more thing that should've been (3) on that list:

I'm pretty sure UAs cache 301s "Permanently" as indicated, so there's another layer to the redirect-loop onion: even if we serve non-looping URIs from our edge, the UAs' caching of historical 301s will still cause the looping to happen. That angle needs some more digging as well: which UAs do this, under what conditions do they stop doing it, how can it be prevented in this scenario, etc.
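
The loop scenario described in points 2 and 3 can be reproduced with a toy model. This is only a sketch under simplifying assumptions: a single cache (standing in for either an edge cache or a user agent's 301 cache) holds a stale redirect from before the switch, while the backend already answers with the new behaviour. All names and the response tables are hypothetical, and the temporary VCL workaround itself is not modelled.

```python
from typing import Dict, Optional, Tuple

# (status code, Location header or None)
Response = Tuple[int, Optional[str]]

# Behaviour before the switch: root redirects to the main page.
OLD_BACKEND: Dict[str, Response] = {
    "/": (301, "/wiki/Main_Page"),
    "/wiki/Main_Page": (200, None),
}
# Behaviour after the switch: main page redirects to the root.
NEW_BACKEND: Dict[str, Response] = {
    "/": (200, None),
    "/wiki/Main_Page": (301, "/"),
}

def fetch(path: str, cache: Dict[str, Response],
          backend: Dict[str, Response]) -> Response:
    """Serve from the cache if present, otherwise ask the backend and cache it."""
    if path not in cache:
        cache[path] = backend[path]
    return cache[path]

def follow(path: str, cache: Dict[str, Response],
           backend: Dict[str, Response], max_hops: int = 10) -> str:
    """Follow redirects and return 'ok', 'loop' or 'error'."""
    visited = set()
    for _ in range(max_hops):
        if path in visited:
            return "loop"
        visited.add(path)
        status, location = fetch(path, cache, backend)
        if status == 301 and location is not None:
            path = location
            continue
        return "ok" if status == 200 else "error"
    return "loop"

if __name__ == "__main__":
    # Cache still holds the pre-switch 301 for the root URI.
    stale_cache = {"/": OLD_BACKEND["/"]}
    print("stale cache:", follow("/", stale_cache, NEW_BACKEND))

    # Once the stale entry is purged (or expires), the loop disappears.
    stale_cache.pop("/")
    print("after purge:", follow("/", stale_cache, NEW_BACKEND))
```

The model also suggests why purging alone may not be enough: any cache outside our control that still holds the old 301, such as a browser's, keeps the first entry of the loop alive.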

We talked about this with @Tgr at the hackathon. One easy way to bypass the redirect loop issue is to serve the main page through both endpoints for at least a couple of months (we could even keep it forever, just as /w/index.php/Main_Page is also served without a redirect). That seems sensible to me, but I'm not sure how to do it.

kchapman added subscribers: CCicalese_WMF, kchapman.

@CCicalese_WMF could you review this from a product perspective and determine if it is something we want to do?