
RFC: Serve Main Page of Wikimedia wikis from a consistent URL
Open, Normal, Public

Description

Objective

  • Serve the main page of WMF wikis from a consistent URL, one that does not vary by wiki configuration, site language, or local interface message overrides.

Secondary objective:

  • Let the domain root serve actual content instead of a redirect.

Stakeholders:

  • Traffic team. (assess potential routing impact)
  • Reading Web team. (about SEO, and reader user experience)
  • Performance team. (believed to improve performance)
  • Core Platform Team. (core behaviour being utilised that previously has only been used by low-traffic wikis and third-parties)
  • Wikimedia communities via Tech News and Community Engagement team. (identify potential impact on technical workflows we may not be aware of, so that we may help accommodate those)

Status quo

The URL of a WMF wiki's main page varies by wiki configuration (site language, or hooks) and by interface message overrides local to the wiki. For example:

The following are HTTP 301 redirects to https://en.wikipedia.org/wiki/Main_Page:

The following are HTTP 301 redirects to https://fixcopyright.wikimedia.org/:

Examples of affected links:

  • Portals, such as https://www.wikipedia.org and https://www.wikimedia.org.
  • Language links in the sidebar of the main pages themselves.
  • Interwiki links, such as [[mw:]], or [[wikitech:]].
  • Browsing directly by entering the hostname of a wiki project.
  • Browsing by changing homepage address of one project into another (usually leads to a 404 Not Found, as "Wikipedia:Hauptseite" would not exist on nl.wikipedia.org).

Current issues

  • Accessing wiki projects by domain results in a redirect. (Subpar performance)
  • Address bars, urls and search results for our projects prominently expose the inconsistent naming conventions of each wiki. (Subpar user experience)
  • SEO. "Avoid Landing Page Redirects", Google PageSpeed, https://developers.google.com/speed/docs/insights/AvoidRedirects.
  • Difficulties with tooling. Performance tests are difficult to write in a way that targets a normal view of a main page without a redirect, due to the url not being deterministic or consistent. (Current workarounds: Using a ?whatever query string, which will serve the Main Page as the default title without redirect).
  • Monitoring such as "Is the Main Page for all projects up and responding with content?" is not trivial, as simplistic tools either do not follow redirects or consider a 301 a success, even if the actual page at its wiki-specific url is returning an error. On some wikis, the Main_Page redirect is sometimes deleted, leading to false alarms.

Performance data

From Navigation Timing, over February 2019:

stat1007/hive
-- sampled views to enwiki/Main_Page
SELECT COUNT(*), SUM(event.redirecting)
FROM event.NavigationTiming
WHERE year=2019 AND month=2 AND wiki="enwiki" AND event.revId=870437359
  AND event.action="view" AND event.isOversample=false;
-- sampled views to enwiki/Main_Page that involved a redirect
SELECT COUNT(*), SUM(event.redirecting)
FROM event.NavigationTiming
WHERE year=2019 AND month=2 AND wiki="enwiki" AND event.revId=870437359
  AND event.action="view" AND event.isOversample=false AND event.redirecting != 0;
Sampled views | Sampled views (redirected) | Time spent redirecting
31,734 | 9,703 | 840.039 s

This is from a 1:1000 sampling. This means that in February 2019, the Main Page had an estimated 31 million views from Grade A web browsers that completed their page load. Of these, over 9.7 million page views (30.5%) experienced a redirect. They cumulatively spent about 233 hours (840,039 seconds, or nearly 10 days) waiting for a redirect (about 0.1 s each, on average).
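
The extrapolation from the sample can be re-derived with a few lines of Python. This is only a sketch of the arithmetic: the function and variable names are made up for illustration, and the inputs are the three sampled figures reported above together with the 1:1000 sampling rate.

```python
# Sampled Navigation Timing figures for enwiki's Main Page, February 2019.
SAMPLING_FACTOR = 1000            # 1:1000 sampling
SAMPLED_VIEWS = 31_734
SAMPLED_REDIRECTED = 9_703
SAMPLED_REDIRECT_SECONDS = 840.039


def extrapolate(views, redirected, redirect_seconds, factor):
    """Scale the sampled counts up to estimated monthly totals."""
    return {
        "views": views * factor,
        "redirected_views": redirected * factor,
        "redirected_share": redirected / views,
        "seconds_per_redirect": redirect_seconds / redirected,
        "total_redirect_hours": redirect_seconds * factor / 3600,
    }


estimate = extrapolate(
    SAMPLED_VIEWS, SAMPLED_REDIRECTED, SAMPLED_REDIRECT_SECONDS, SAMPLING_FACTOR
)
for name, value in estimate.items():
    print(f"{name}: {value}")
```

With these inputs the share of redirected views is a little over 30%, each redirect costs a little under 0.1 s, and the cumulative waiting time is on the order of a few hundred hours for the month.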

Proposal 1:

I'd like us to consider changing the canonical URL of the main page of Wikimedia wikis to be the domain root. This means https://www.wikidata.org/ would serve what we currently see at https://www.wikidata.org/wiki/Wikidata:Main_Page, for example.

MediaWiki provides a hook that allows the canonical url for a given title to be customised. This has been in use at translatewiki.net since 2015 (written about on Niklas' blog, source code), and was also used at WMF for the Fix Copyright campaign in 2018.

Once configured, all canonical access to the main page is automatically reflected accordingly by MediaWiki.

  • The link to the main page in the sidebar and on the logo will point to this.
  • When browsing from the talk page, what links here, history page, contributions, search results, it points to the canonical url.
  • When creating an internal link to it in wikitext like [[Main_Page]] this results in the correct HTML for an anchor link to the canonical url (e.g. <a href="/" title="Main Page">Main Page</a>).
  • When editing the main page, the purges sent to the CDN layer will be for the canonical url, as expected.
  • When manually browsing to /wiki/Main_Page, MediaWiki will serve the page as usual without redirect, and with <link rel=canonical> set to the canonical url, the same way we do for other non-canonical urls and article redirects. (An earlier revision of this proposal had MediaWiki's router normalise this to the canonical url in the form of a HTTP 301 redirect; that was dropped per T120085#5345448.)
  • Configuration variables in JavaScript like wgIsMainPage and server-side checks like Title::isMainPage() all work as expected.
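
The behaviour described in the list above can be modelled roughly as follows. This is a simplified sketch in Python, not MediaWiki code: `MAIN_PAGE_TITLE`, `local_url`, `Response` and `handle` are hypothetical names, the article path is assumed to be `/wiki/<title>`, and query handling is reduced to the bare minimum.

```python
from dataclasses import dataclass

MAIN_PAGE_TITLE = "Main_Page"     # assumption: the wiki's configured main page
ARTICLE_PREFIX = "/wiki/"         # assumption: a typical article path


def local_url(title: str, query: str = "") -> str:
    """Return the link target emitted for a title (simplified model)."""
    # Only the plain view of the main page maps to the domain root.
    if title == MAIN_PAGE_TITLE and query == "":
        return "/"
    url = ARTICLE_PREFIX + title
    return f"{url}?{query}" if query else url


@dataclass
class Response:
    status: int
    title: str
    canonical: str  # value that would go into <link rel=canonical>


def handle(path: str) -> Response:
    """Serve the domain root and the old main page path alike, both as 200 OK."""
    if path == "/":
        title = MAIN_PAGE_TITLE
    elif path.startswith(ARTICLE_PREFIX):
        title = path[len(ARTICLE_PREFIX):]
    else:
        return Response(404, "", "")
    return Response(200, title, local_url(title))


print(local_url("Main_Page"))
print(handle("/wiki/Main_Page"))
```

In this model the old path keeps working and is never redirected; it only advertises the domain root as its canonical url, which is what lets links, purges and search engines converge on a single address.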

Event Timeline


Change 520139 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/core@master] Add config for serving main Page from the domain root

https://gerrit.wikimedia.org/r/520139

Ladsgroup moved this task from Under discussion to Inbox on the TechCom-RFC board. Jul 1 2019, 11:28 PM
Ladsgroup added a subscriber: Ladsgroup.

Given that the last comment on this ticket was around a year ago, I don't think it falls in the category of "under discussion". It seems straightforward though.

@Ladsgroup "Under discussion" here merely means that it has a problem statement or objective that has been triaged and understood by TechCom, and that the author has signalled they are ready for wider feedback (e.g. to help with finding ways to solve it, and/or to get feedback on their own proposal). The "Backlog" on the other hand is used when the author is still working on the proposal and/or if there is not yet an objective that has been triaged by TechCom.

In that sense, this is "Under discussion". Feedback for a better name welcome at T216308 :).

Next steps are (mainly as note to myself):

  • reach out to relevant product owners and get their input and approval.
  • reach out to stakeholders and get their input on the overall objective and my current (so far, only) proposal, take the input and amend the proposal as needed.

(Once approved by TechCom):

  • estimate the amount of engineering work (probably quite small amount, less than a week of 1 or 2 people in total).
  • figure out where resourcing would come from (could be done within perf, perhaps other teams would be interested as well and might be able to prioritise it earlier).
  • ask the teams that implementers would depend on during roll out, and find a common quarter in which we're comfortable seeing this rolled out.
Ladsgroup moved this task from Inbox to Under discussion on the TechCom-RFC board. Jul 2 2019, 4:58 PM

Thanks. The naming confused me and sorry for the mess.

IMO, there's three different questions that we should answer:

  • Should we define it as a config variable?
    • The answer seems to be yes, I don't think there's any objection towards it
  • Should we set the default to serve from root in WMF?
    • This should be done gradually, it's easy to undo, I don't personally think it's a big deal. Communication needed for sure but maybe several emails to wikitech-ambassadors, wikitech and some messages in places like WP:VPT should be enough (I can do it if you're too busy)
  • Should the default for mediawiki be true?
    • TechCom can answer this but also it looks straightforward as it's easy to undo in case issues arise.
jcrespo updated the task description. Jul 2 2019, 5:07 PM
Krinkle added a comment. Edited Jul 2 2019, 6:02 PM

[..] there's three different questions that we should answer:

  • Should we define it as a config variable? [..]

This is not required for the current RFC. MediaWiki supports the required functionality in core already. It can currently be enabled with a 1-line hook callback, which is how translatewiki.net and FixcopyrightWiki do it already.

I consider it in-scope for code review (and not for TechCom/RFC, unless +2'ers disagree) to decide whether we want another configuration variable and the added maintenance (but also, testability) of maintaining the callback logic in core instead (e.g. inside Title::getLocalURL).

  • Should we set the default to serve from root in WMF? [..]

That is in essence what this RFC is about.

  • Should the default for mediawiki be true? [..]

This is orthogonal to this RFC, which is about WMF sites. See T216791#5013185.

This is not required for the current RFC. MediaWiki supports the required functionality in core already. It can currently be enabled with a 1-line hook callback, which is how translatewiki.net and FixcopyrightWiki do it already.
I consider it in-scope for code review (and not for TechCom/RFC, unless +2'ers disagree) to decide whether we want another configuration variable and the added maintenance (but also, testability) of maintaining the callback logic in core instead (e.g. inside Title::getLocalURL).

ICYMI: https://gerrit.wikimedia.org/r/520139

BBlack added a subscriber: BBlack. Edited Jul 18 2019, 12:17 PM

I like the end result here, and I don't think it's problematic from the Traffic perspective in the long view, but I think the initial rollout isn't so trivial:

  1. We do need to review our VCL with this in mind, in case it does interfere in trivial ways with existing rewrites and/or redirects, etc. There are some other areas SRE might need to review in general as well (e.g. internal loadbalancer/cache-driven healthchecks and general monitoring queries that are hitting either the root or Main_Page URIs of various wikis and expecting certain status).
  2. While MediaWiki's config might see this as a single flip of a switch, there are multiple conflicting changes being rolled out here which change the direction of the redirect arrow between two high-traffic URIs. All such changes are effectively asynchronous (even with a manual purge), and therefore the blurry time domain of flipping a switch for this would result in 301 loops for / -> Main_Page -> / -> Main_Page -> ... for at least some caches and/or end-users for at least a brief window of time, and being such high-traffic URIs the redirect loops might cause an outage on our end as well. We might need to control for this with some temporary custom VCL that breaks the loops (e.g. when Varnish sees any wiki GET response to a root URI request which is a 301, it replaces that with a direct internal rewrite to the destination URI instead). We could deploy such a hack first, then flip the switch on the MediaWiki end, purge all the relevant URIs from caches, test things, and then remove the hack quickly afterwards.

[edit: fixed the proposed hack above, I had it backwards at first]

Oh one more thing that should've been (3) on that list:

I'm pretty sure UAs cache 301s "Permanently" as indicated, so there's another layer to the redirect-loop onion: even if we serve non-looping URIs from our edge, the UAs' caching of historical 301s will still cause the looping to happen. That angle needs some more digging as well: which UAs do this, under what conditions do they stop doing it, and/or how can it be prevented in this scenario, etc.
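
The loop scenario can be illustrated with a toy model. The sketch below is Python written for this illustration only; `follow`, `STALE_301` and the two `after_switch_*` functions are hypothetical names, and a single stale 301 stands in for whatever a CDN node or browser cached before the switch. It compares an origin that redirects the old URL to the root with one that simply keeps serving the old URL.

```python
def follow(start, next_hop, max_hops=10):
    """Follow redirects from `start`.

    `next_hop(path)` returns the redirect target, or None when content is
    served. Returns (visited_paths, looped).
    """
    visited = [start]
    path = start
    for _ in range(max_hops):
        target = next_hop(path)
        if target is None:
            return visited, False
        visited.append(target)
        path = target
    return visited, True  # gave up after max_hops: treat as a redirect loop


# A 301 for "/" that was cached before the switch and is still honoured.
STALE_301 = {"/": "/wiki/Main_Page"}


def after_switch_with_redirect(path):
    """Origin now redirects the old URL to the root; the stale 301 remains."""
    if path in STALE_301:
        return STALE_301[path]
    return "/" if path == "/wiki/Main_Page" else None


def after_switch_serving_both(path):
    """Origin serves the old URL as content; only the stale 301 remains."""
    return STALE_301.get(path)


looping_hops, looping = follow("/", after_switch_with_redirect)
settled_hops, settled_looping = follow("/", after_switch_serving_both)
print(looping, looping_hops[:4])
print(settled_looping, settled_hops)
```

In the first variant the stale 301 and the new redirect point at each other, so the client never reaches content. In the second the stale 301 costs one extra hop but still ends on a page that is served.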

We talked about this with @Tgr in the hackathon and one easy way to bypass the issue of the redirect loop is to serve the main page through both endpoints for at least a couple of months (and we can even keep it forever, like /w/index.php/Main_Page is also being served without redirect) which seems sensible to me but I'm not sure how to do it though.

kchapman added subscribers: CCicalese_WMF, kchapman.

@CCicalese_WMF could you review this from a product perspective and determine if it is something we want to do?

Change 520139 merged by jenkins-bot:
[mediawiki/core@master] Add config for serving main Page from the domain root

https://gerrit.wikimedia.org/r/520139

Izno added a subscriber: Izno.

This one will probably require a user notice before WMF rollout and maybe even a "do you guys want us to do this" question to the communities.

It looks like the above patch just adds the config option, so that it can be an option for 1.34 users. Does the old way to do this need deprecation notices?

Krinkle moved this task from To Triage to Not ready to announce on the User-notice board. Edited Mon, Sep 23, 7:29 PM

This one will probably require a user notice before WMF rollout [..]

This is still an open RFC. Consultation with the community will be part of this RFC, including asking for input and feedback through Tech News before anything is approved, implemented or rolled out.

It looks like the above patch just adds the config option, so that it can be an option for 1.34 users. Does the old way to do this need deprecation notices?

There is no old way.

The old way was basically to write this manually through custom PHP code. This (experimental) configuration variable provides that same code as part of core now.

Izno added a comment. Mon, Sep 23, 10:51 PM

This is still an open RFC. [snip]

Totally missed this was in the RFCs bucket. (Probably just used to seeing RFC in the name of the task as with recent RFCs.)

Nikerabbit renamed this task from Serve Main Page of WMF wikis from a consistent URL to RFC: Serve Main Page of WMF wikis from a consistent URL. Tue, Sep 24, 6:41 AM
awight renamed this task from RFC: Serve Main Page of WMF wikis from a consistent URL to RFC: Serve Main Page of Wikimedia wikis from a consistent URL. Fri, Sep 27, 6:52 AM
Krinkle updated the task description. Tue, Oct 1, 10:56 PM
Krinkle updated the task description. Tue, Oct 1, 11:00 PM

I like the end result here, and I don't think it's problematic from the Traffic perspective in the long view, but I think the initial rollout isn't so trivial: [redirect loops]

We talked about this with @Tgr in the hackathon and one easy way to bypass the issue of the redirect loop is to serve the main page through both endpoints […]

Thanks, excellent point. I've adjusted the proposal to not redirect the old URL, but to keep it as-is, the same way we do with article redirects and other non-canonical URL representations. E.g. serve normally as 200 OK, but with <link rel=canonical> set to the canonical url, and with JS-rewrite of the address bar to the canonical variant as well.

From task description

Stakeholders:

  • Traffic team. (assess potential routing impact)
  • Reading Web team. (about SEO, and reader user experience)
  • Performance team. (believed to improve performance)
  • Core Platform Team. (core behaviour being utilised that previously has only been used by low-traffic wikis and third-parties)
  • Wikimedia communities via Tech News and Community Engagement team. (identify potential impact on technical workflows we may not be aware of, so that we may help accommodate those)

@BBlack has commented from Traffic. They raised no blocking concerns, and their feedback has resulted in a change to the proposal to not redirect the old URL (T120085#5345448, T120085#5539830).

Myself on behalf of Performance have already provided data to support the change and have no concerns either.

I've reached out to Community Engagement by e-mail to ask for their feedback and outreach.

I've tagged Reading-Web and CPT on the task here for their feedback from product and technical perspective.

Johan added a subscriber: Johan. Tue, Oct 1, 11:18 PM

How would you phrase this for inclusion in Tech News?

I've tagged Reading-Web and CPT on the task here for their feedback from product and technical perspective.

If I understand correctly we'll be choosing en.m.wikipedia.org as the canonical link for the main page. Given this is the most likely URL a user will enter and currently it redirects to /wiki/Main_Page it seems like this would reduce the amount of indirection to visitors to the main page which seems a good thing in terms of experience. In terms of SEO, I'm not sure how we could measure any impact here technically and whether it's worth it. Did you have any specific thoughts/concerns?

Keegan added a subscriber: Keegan. Thu, Oct 3, 7:22 PM

I've tagged Reading-Web and CPT on the task here for their feedback from product and technical perspective.

If I understand correctly we'll be choosing en.m.wikipedia.org as the canonical link for the main page.

@Jdlrobson clarifying question: en.m.wikipedia.org as the canonical link for the main page *on mobile*, correct? I'm pretty sure that's what you meant, but I don't want to assume.

Yup. en.wikipedia.org for desktop (which redirects to mobile en.m.wikipedia.org).

Change 540678 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Set $wgMainPageIsDomainRoot true for fixcopyrightwiki

https://gerrit.wikimedia.org/r/540678

Change 540679 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[operations/mediawiki-config@master] Get rid of main page hack for fixcopyrightwiki

https://gerrit.wikimedia.org/r/540679

Krinkle added subscribers: Esanders, ssastry.

@ssastry, @Esanders Hi - could you review this RFC for potential impact on Parsoid and VisualEditor?

The current proposal would make the canonical url for [[Main Page]] on most wikis result in <a href="/"> instead of <a href="/wiki/Main_Page">. I imagine this might impact Parsoid and/or VisualEditor if there are assumptions made about being able to reverse-engineer urls based on wgArticlePath (instead of the API deciding what urls are). Note that compliance is entirely optional, in that the old URL will continue to work, and it will continue to be valid to create URLs based on wgArticlePath. What changes is that canonical URLs created elsewhere (e.g. by the API) may be different for the Main Page.

So the question is whether it would be a problem if API responses start advertising this url alongside page titles. For example, from prefix search.

cscott added a subscriber: cscott. Edited Thu, Oct 3, 9:13 PM

From Parsoid's perspective:

  1. is $wgMainPageIsDomainRoot available in SiteInfo? Parsoid/JS and the non-integrated mode of Parsoid/PHP would need this.
  2. currently [[Main Page]] yields <a href="./Main Page">. It sounds like that's fine for initial deployment, but if we eventually want this to yield <a href="../"> or <a href="/"> it complicates the task of recreating the title of a link from the A tag. We'd have to audit all the places that do that and ensure they all handle this case correctly, and only after doing so we could deploy a change that uses $wgMainPageIsDomainRoot from site info to emit <a href="../"> in the appropriate circumstances.

Johan added a comment. Thu, Oct 3, 11:43 PM

Something like this for Tech News? (Plus links and clearer handling of URLs.)

The URL of the main page of the Wikimedia wikis could be changed. This is because the way it is done now leads to several problems. For example https://www.wikidata.org/wiki/Wikidata:Main_Page would be https://www.wikidata.org instead. You can tell the developers if this would cause problems for your wiki.

Johan added a comment. Fri, Oct 4, 1:33 PM

This has now been added to Tech News.

Dcljr added a subscriber: Dcljr. Edited Fri, Oct 4, 8:29 PM

[...] For example https://www.wikidata.org/wiki/Wikidata:Main_Page would be https://www.wikidata.org instead. You can tell the developers if this would cause problems for your wiki.

@Johan: Technically, given the discussion above, wouldn't it be more accurate to say

would be https://www.wikidata.org/ instead.

with a trailing slash?

Krinkle added a comment. Edited Fri, Oct 4, 8:34 PM

Yes. (done)

Change 540971 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/core@master] Export $wgMainPageIsDomainRoot in siteinfo API

https://gerrit.wikimedia.org/r/540971

bd808 removed a subscriber: bd808. Fri, Oct 4, 9:21 PM
Bawolff added a subscriber: Bawolff. Edited Fri, Oct 4, 11:35 PM

[...] For example https://www.wikidata.org/wiki/Wikidata:Main_Page would be https://www.wikidata.org instead. You can tell the developers if this would cause problems for your wiki.

@Johan: Technically, given the discussion above, wouldn't it be more accurate to say

would be https://www.wikidata.org/ instead.

with a trailing slash?

Not really. https://www.wikidata.org/ and https://www.wikidata.org are two different ways of writing the same URL. I think most web-browsers will normalize to https://www.wikidata.org without the trailing /. Firefox and Chrome do on www.wikipedia.org

Dcljr added a comment. Sat, Oct 5, 4:01 AM

Not really. https://www.wikidata.org/ and https://www.wikidata.org are two different ways of writing the same URL.

Yes, but what is the software actually doing?

Bawolff added a comment. Edited Sat, Oct 5, 4:52 AM

Not really. https://www.wikidata.org/ and https://www.wikidata.org are two different ways of writing the same URL.

Yes, but what is the software actually doing?

I'm not sure what you mean. I don't think it's possible to distinguish in MediaWiki between these 2 urls. They are both different ways to say the same thing: the path part of the url is empty. I'm not sure about HTTP/2 off the top of my head (I expect it's the same), but in HTTP 1.1 it's impossible to distinguish between the 2 from MediaWiki, as both result in GET / HTTP/1.1. (I.e. when you visit a website, the url is split at the / and the part before the / is transmitted separately from the part after the / and before the #.)

So it's basically up to the browser what to display. Firefox and Chrome seem to choose to remove the trailing /, as can be seen at https://www.wikipedia.org/ or https://translatewiki.net/
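
A quick way to see this is to build the HTTP/1.1 request line for both spellings. The sketch below uses Python's standard `urllib.parse`; `request_line` is a hypothetical helper written for this illustration, and it only models the origin-form request target (path plus query, with an empty path sent as "/").

```python
from urllib.parse import urlsplit


def request_line(url: str, method: str = "GET") -> str:
    """Build the HTTP/1.1 request line a client would send for `url`."""
    parts = urlsplit(url)
    # An empty path is sent as "/"; the fragment is never sent to the server.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"


without_slash = request_line("https://www.wikidata.org")
with_slash = request_line("https://www.wikidata.org/")
print(without_slash)
print(with_slash)
```

Both calls produce the same request line, so the server has no way to tell which spelling the user typed or clicked.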

Dcljr added a comment. Sat, Oct 5, 7:37 AM

Yes, but what is the software actually doing?

I'm not sure what you mean.

Change 520139, merged on Sep 23, adds this to MediaWiki.php:

if ( $this->config->get( 'MainPageIsDomainRoot' ) && $request->getRequestURL() === '/' ) {
  return false;
}

and this to Title.php:

if ( $wgMainPageIsDomainRoot && $this->isMainPage() && $query === '' ) {
  return '/';
}

I am not a developer, but this looks to me like the software is using the single slash to indicate the root document.

Yes, but when you visit the site it will get removed (in the interface). To put it another way, the / is used behind the scenes, but anything the user sees will not use the /.

Yair_rand added a subscriber: Yair_rand. Edited Mon, Oct 7, 6:00 AM

How will this work for projects with a different main page for each language, eg Commons? The main page depends on the user's interface language. Normally, if you're a French-language user and you navigate to https://commons.wikimedia.org/ , you get redirected to https://commons.wikimedia.org/wiki/Accueil . Will https://commons.wikimedia.org/ still show the correct content to each user?

How will this work for projects with a different main page for each language, eg Commons? The main page depends on the user's interface language. Normally, if you're a French-language user and you navigate to https://commons.wikimedia.org/ , you get redirected to https://commons.wikimedia.org/wiki/Accueil . Will https://commons.wikimedia.org/ still show the correct content to each user?

That's a good point. I think most of this will work fine, however we may have cache pollution issues if someone with their language set to 'fr' writes [[Accueil]] on a page (If the page doesn't have an {{int: on it or otherwise is marked as varying by user language.)

This should definitely be tested with $wgForceUIMsgAsContentMsg = ['mainpage']; set

Very valid point, I personally would be okay with not turning on the config on wikis that set $wgForceUIMsgAsContentMsg = ['mainpage']; (like commons, wikidata, etc.)
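
One possible reading of the cache pollution concern is sketched below. It is a toy model in Python and does not reflect how MediaWiki's parser cache is actually keyed; `MAINPAGE_MESSAGE`, `render_link` and `LinkCache` are hypothetical names, and the per-language titles are taken from the Commons example above.

```python
# Assumption: the main page title depends on the user's interface language.
MAINPAGE_MESSAGE = {"en": "Main_Page", "fr": "Accueil"}


def is_main_page(title: str, user_lang: str) -> bool:
    return MAINPAGE_MESSAGE.get(user_lang, MAINPAGE_MESSAGE["en"]) == title


def render_link(title: str, user_lang: str) -> str:
    """Emit "/" for whatever counts as the main page for this user."""
    return "/" if is_main_page(title, user_lang) else f"/wiki/{title}"


class LinkCache:
    """Caches the rendered link per page, optionally varied by user language."""

    def __init__(self, vary_by_language: bool):
        self.vary_by_language = vary_by_language
        self.store = {}

    def get_link(self, page: str, target: str, user_lang: str) -> str:
        key = (page, user_lang) if self.vary_by_language else (page,)
        if key not in self.store:
            self.store[key] = render_link(target, user_lang)
        return self.store[key]


shared = LinkCache(vary_by_language=False)
first = shared.get_link("Some_page", "Accueil", "fr")   # rendered for a French user
second = shared.get_link("Some_page", "Accueil", "en")  # English user gets the cached copy

varied = LinkCache(vary_by_language=True)
varied.get_link("Some_page", "Accueil", "fr")
separate = varied.get_link("Some_page", "Accueil", "en")
print(first, second, separate)
```

In the shared cache the English-language reader receives a link to the domain root, which for them is a different page than the one the author linked to. Varying the cache by language, or leaving the setting off on such wikis, avoids that in this model.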

Change 540678 merged by jenkins-bot:
[operations/mediawiki-config@master] Set $wgMainPageIsDomainRoot true for fixcopyrightwiki

https://gerrit.wikimedia.org/r/540678

Change 540679 merged by jenkins-bot:
[operations/mediawiki-config@master] Get rid of main page hack for fixcopyrightwiki

https://gerrit.wikimedia.org/r/540679

Mentioned in SAL (#wikimedia-operations) [2019-10-07T11:42:45Z] <lucaswerkmeister-wmde@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:540678|Set $wgMainPageIsDomainRoot true for fixcopyrightwiki (T120085)]] (duration: 00m 52s)

Mentioned in SAL (#wikimedia-operations) [2019-10-07T11:44:18Z] <lucaswerkmeister-wmde@deploy1001> Synchronized wmf-config/CommonSettings.php: SWAT: [[gerrit:540679|Get rid of main page hack for fixcopyrightwiki (T120085)]] (duration: 00m 52s)

Just want to emphasise that this config variable, in its current state, redirects /wiki/Main_Page to / and will cause redirect loops if we just turn it on. We need to make the config not redirect to the canonical place before moving forward.

MediaWiki does not HTTP redirect (at least not in translatewiki.net). Wikimedia has rewrites outside MediaWiki for this, right?

MediaWiki does not HTTP redirect (at least not in translatewiki.net). Wikimedia has rewrites outside MediaWiki for this, right?

Yes, I think they are Apache redirects (see T120085#5345448). Maybe we can make the redirects internal (turn them into rewrites). @BBlack knows better and explained some details in T120085#5345448, but maybe he can explain more.

Very valid point, I personally would be okay with not turning on the config on wikis that set $wgForceUIMsgAsContentMsg = ['mainpage']; (like commons, wikidata, etc.)

To make it even more complicated, Wikidata redirects (or at least wants to redirect) all main pages to https://www.wikidata.org/wiki/Wikidata:Main_Page using wiki redirects (see https://www.wikidata.org/w/index.php?title=Wikidata:Hauptseite&action=edit for example) and uses in-page i18n, so this change would be safe to implement on WD (but not on Commons, MediaWiki.org etc.).

Cwek added a subscriber: Cwek. Tue, Oct 8, 1:05 AM
Pcoombe added a subscriber: DStrine. Tue, Oct 8, 2:44 PM

We're in peak fundraising season now, and I'm worried this might affect links to https://donate.wikimedia.org.

@DStrine Can someone from Fundraising Tech investigate this to see if it would cause any problems on donate or payments?

@Pcoombe I don't think this will go live before January, but if it helps, let's just exclude any and all changes from donatewiki!

I'd still very much like feedback from FR-Tech as the unique set up of donatewiki could expose additional compatibility concerns we need to consider, but I'd be fine with hearing those (and incorporating them) after January, possibly after it has gone live on other wikis already. We can keep iterating. It's also equally likely that in January we'll find there are no concerns unique to donatewiki, in which case we'll flip the switch there later at that time.

So if worried about prioritisation, feel free to push this back within FR-Tech :)

Change 540971 merged by jenkins-bot:
[mediawiki/core@master] Export $wgMainPageIsDomainRoot in siteinfo API

https://gerrit.wikimedia.org/r/540971