
Decide how to configure $wgMobileUrlCallback during mobile domain sunset
Closed, ResolvedPublic

Description

This task represents Prep #3 of RFC: Mobile domain sunsetting, and is a blocker to WE6.4.4 (FY25-26 Q1), as tracked by the parent task T214998: RFC: Serve mobile and desktop variants through the same URL (unified mobile routing).

  1. Disable MobileFrontendHooks::onTitleSquidURLs in wmf-config, in favour of a local hook. This allows us to disable $wgMobileUrlCallback during the rollout (to promote the standard URL), whilst still keeping purges intact for old URLs during the transition. Otherwise, MobileFrontend would stop sending purges during the transition.

Background info

The $wgMobileUrlCallback config variable is set in production wmf-config (Codesearch: MobileUrlCallback) to a function that inserts the m. subdomain in the appropriate position. This is used in MobileFrontend for two purposes:

  • A: MobileContext::hasMobileDomain() returns true whenever this variable is set (instead of the default null) and returns a modified value. When wgMobileUrlCallback is not set, or when the function returns the input unchanged (as it does e.g. for wikitech.wikimedia.org and login.wikimedia.org today), then hasMobileDomain returns false.
    • This in turn powers dozens of side-effects in other MediaWiki features, but inside MobileFrontend itself it primarily does one thing: it controls whether the "Mobile view" link, in the site footer below articles, moves you to the m-dot domain, or merely sets a cookie on the current domain. (Codesearch: hasMobileDomain)
    • Similarly, it also controls the addition of the discovery link in the HTML head on desktop responses: <link rel="alternate" media="only screen and (max-width: 720px)" href="//en.m.wikipedia.org/wiki/Pagename"/>
  • B: MobileFrontendHooks::onTitleSquidURLs adds the mobile version of all given URLs to the list of URLs to purge in Varnish.
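To make purpose A concrete, here is a minimal sketch in Python (not the actual PHP) of the rule described above: hasMobileDomain is true only when a callback is set and it returns something other than its input. The function names, the host-based signature, and the "insert m. after the first label" rule are illustrative assumptions, not the real wmf-config implementation.

```python
from typing import Callable, Optional

# Hypothetical stand-in for $wgMobileUrlCallback: maps a canonical host to its
# mobile host, or returns the input unchanged when there is no m-dot domain.
MobileUrlCallback = Callable[[str], str]

# Hosts for which the callback returns its input unchanged (per the task text).
NO_MOBILE_DOMAIN = {"wikitech.wikimedia.org", "login.wikimedia.org"}


def example_mobile_url_callback(host: str) -> str:
    """Assumed rule: insert an "m." label after the first label of the host."""
    if host in NO_MOBILE_DOMAIN:
        return host
    first, sep, rest = host.partition(".")
    if not sep:
        return host
    return f"{first}.m.{rest}"


def has_mobile_domain(host: str, callback: Optional[MobileUrlCallback]) -> bool:
    """True only when a callback is set and it actually changes the host."""
    if callback is None:
        return False
    return callback(host) != host
```

Under these assumptions, setting the callback to None (as proposed for pilot wikis) makes has_mobile_domain return False for every host, which is what turns the URL-rewriting code paths inert.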

In the early phases of the rollout, we will make the canonical domain vary between serving desktop HTML and serving mobile HTML (rather than today, where it varies between desktop HTML and mobile redirect), while the mobile domain remains unchanged.

Note that from the MediaWiki perspective, both today and after the rollout, the mechanism by which MobileFrontend is activated is not the URL, but the MFMobileHeader HTTP header (X-Subdomain). Today, the Varnish layer removes the m-dot from the URL, and then sends a regular pageview to MediaWiki with the MF HTTP header set. After the rollout, Varnish will keep doing that, and will also set the same header on the canonical domain for mobile user agents.
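As a rough illustration of the edge behaviour described in this paragraph, the following Python sketch strips the m-dot label from the host and marks the request with the header instead. It is a model of the idea only, not the production VCL; the function name and the label-matching rule are assumptions, and the header value "M" follows the VCL description later in this task.

```python
from typing import Dict, Tuple


def normalize_request(host: str, headers: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Rewrite an m-dot host to its canonical host and set X-Subdomain.

    Assumption: the mobile label is a literal "m" somewhere before the last
    two labels of the host (e.g. en.m.wikipedia.org).
    """
    labels = host.split(".")
    out = dict(headers)  # copy, so the caller's headers are not mutated
    if "m" in labels[:-2]:
        del labels[labels.index("m")]
        out["X-Subdomain"] = "M"
    return ".".join(labels), out
```

For example, normalize_request("en.m.wikipedia.org", {}) yields the host en.wikipedia.org plus the X-Subdomain header, so MediaWiki sees a regular canonical pageview and only the header tells it to render the mobile version.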

Scope

  • Decide how to configure MobileFrontend to act like there is no mobile domain (so that hasMobileDomain returns false, and thus any code generating or modifying URLs becomes inert, and thus the new experience on the canonical domain stays within the canonical domain). It is important at this stage that we still purge both URLs, for compatibility and to avoid stale caches.

Sign off:

  • Consult and validate technical approach from Varnish side (SRE Traffic)

Proposal

  1. Add a local onTitleSquidURLs hook in wmf-config that is effectively a standalone copy of MobileFrontendHooks::onTitleSquidURLs and that, when wgMobileUrlCallback is null, will add URLs by applying today's local wmfMobileUrlCallback function directly.
  2. Apply Varnish changes to the pilot domain, which serve MF pageviews directly on the canonical domain for mobile user agents.
  3. Set wgMobileUrlCallback to null for pilot wikis. This will disable purpose A above, and thanks to the local hook in wmf-config will not disable purpose B so that purges keep working.
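The intent of steps 1 and 3 can be sketched as follows, in Python rather than the PHP that wmf-config would use. This is a sketch under stated assumptions: wmf_mobile_host stands in for today's wmfMobileUrlCallback, the host-based callback signature and the "insert m. after the first label" rule are guesses for illustration, and local_title_squid_urls is a hypothetical name for the local hook.

```python
from typing import Callable, List, Optional
from urllib.parse import urlsplit, urlunsplit


def wmf_mobile_host(host: str) -> str:
    """Hypothetical stand-in for wmfMobileUrlCallback (assumed rule)."""
    first, sep, rest = host.partition(".")
    return f"{first}.m.{rest}" if sep else host


def to_mobile_url(url: str, host_callback: Callable[[str], str]) -> str:
    """Apply a host-level callback to the host part of a full URL."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=host_callback(parts.netloc)))


def local_title_squid_urls(
    urls: List[str],
    configured_callback: Optional[Callable[[str], str]],
) -> List[str]:
    """Return the purge list with the mobile variant of each URL appended.

    When the configured callback is None (as on pilot wikis), fall back to
    the local function so that m-dot purges keep being sent.
    """
    callback = configured_callback or wmf_mobile_host
    out = list(urls)
    for url in urls:
        mobile = to_mobile_url(url, callback)
        if mobile != url and mobile not in out:
            out.append(mobile)
    return out
```

With the configured callback set to None, local_title_squid_urls(["https://en.wikipedia.org/wiki/Foo"], None) still includes https://en.m.wikipedia.org/wiki/Foo, which is the property the proposal relies on: purpose A is switched off while purpose B keeps working.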

The execution of these steps is not in scope for this task. This task is about deciding how we configure the MobileFrontend side, and what order we need to do things in during the rollout.

Event Timeline

In reviewing text-frontend.inc.vcl.erb just now, I noticed that it excludes thankyou.wikipedia.org, but MediaWiki/MobileFrontend are not aware of this. This has little to no impact because it has a custom stylesheet that hides the site footer, and it receives very few edits. But:

  1. if you access a page without custom styling on that domain (e.g. https://thankyou.wikipedia.org/w/index.php?title=Special:BlankPage&safemode=on&useskin=vector), the footer has a "Mobile view" link pointing to thankyou.m.wikipedia.org, which doesn't exist in DNS.
  2. when Fundraising staff edit pages on this wiki, MediaWiki sends extra purges to Varnish for thankyou.m.wikipedia.org, despite there being no such domain name and thus no such cache entries.

Ref T259002#6364656, T321520, T152882.

Change #1174578 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/mediawiki-config@master] MobileUrlCallback: Disable for thankyou.wikipedia.org

https://gerrit.wikimedia.org/r/1174578

Krinkle updated the task description.

In reviewing text-frontend.inc.vcl.erb just now, I noticed that it excludes thankyou.wikipedia.org, but MediaWiki/MobileFrontend are not aware of this. […]

[…] e.g. https://thankyou.wikipedia.org/w/index.php?title=Special:BlankPage&safemode=on&useskin=vector, the footer has a "Mobile view" link pointing to thankyou.m.wikipedia.org, which doesn't exist in DNS. […]

The same is true for nostalgia.wikipedia.org and donate.wikipedia.org as well.

https://nostalgia.wikipedia.org/wiki/Philosophy responses include:

<link rel="alternate" media="only screen and (max-width: 640px)" href="//nostalgia.m.wikipedia.org/wiki/Philosophy">

But https://nostalgia.m.wikipedia.org/wiki/Philosophy doesn't load because nostalgia.m.wikipedia.org isn't defined in DNS.

Change #1176725 had a related patch set uploaded (by Krinkle; author: Krinkle):

[operations/dns@master] wikipedia.org: Fix grouping of wikis and non-wikis

https://gerrit.wikimedia.org/r/1176725

Change #1176725 merged by Ssingh:

[operations/dns@master] wikipedia.org: Fix grouping of wikis and non-wikis

https://gerrit.wikimedia.org/r/1176725

@ssingh Could you look over the approach in the task description for anything of concern to your team and the CDN stack that I may have missed? This is the same as in the decision brief that Brandon read back in March, but with more detail.

Once signed off, I will start implementing over at T401595: [Pilot Rollout] Implement unified mobile routing and enable on wikitech.wikimedia.org.

Krinkle triaged this task as Medium priority. Aug 11 2025, 4:02 PM

Change #1174578 merged by jenkins-bot:

[operations/mediawiki-config@master] Disable MobileFrontend on thankyou.wikipedia.org and nostalgia.wikipedia.org

https://gerrit.wikimedia.org/r/1174578

Mentioned in SAL (#wikimedia-operations) [2025-08-11T16:19:39Z] <krinkle@deploy1003> Started scap sync-world: Backport for [[gerrit:1174578|Disable MobileFrontend on thankyou.wikipedia.org and nostalgia.wikipedia.org (T400855 T152882)]]

Mentioned in SAL (#wikimedia-operations) [2025-08-11T16:21:28Z] <krinkle@deploy1003> krinkle: Backport for [[gerrit:1174578|Disable MobileFrontend on thankyou.wikipedia.org and nostalgia.wikipedia.org (T400855 T152882)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-08-11T16:29:31Z] <krinkle@deploy1003> Finished scap sync-world: Backport for [[gerrit:1174578|Disable MobileFrontend on thankyou.wikipedia.org and nostalgia.wikipedia.org (T400855 T152882)]] (duration: 09m 52s)

@ssingh Could you look over the approach in the task description for anything of concern to your team and the CDN stack that I may have missed? This is the same as in the decision brief that Brandon read back in March, but with more detail.

Once signed off, I will start implementing over at T401595: [Pilot Rollout] Implement unified mobile routing and enable on wikitech.wikimedia.org.

Thanks, we will be discussing this today in the Traffic meeting and I will follow up after that.

@Krinkle - Diving into the specifics a bit (I think all of this will be clearer if you make a prototype VCL patch for phase 1 maybe), and re-stating/asking things for clarity:

  • The VCL plan is more or less this?
    • The current mobile_redirect VCL code should, instead of actually redirecting to m-dot, just set the X-Subdomain header and carry on with the direct request.
    • Edge caches should vary on the X-Subdomain set above
    • Edge caches should still rewrite the m-dot domain to the desktop domain before talking to MediaWiki
    • When the host header of the incoming domain was already m-dot, should the VCL run the mobile_redirect code anyways and dynamically set X-Subdomain, or should it just force X-Subdomain: m because the request was for m-dot? (I suspect the former)
    • Inside the mobile redirect conditional logic, what do we do with the special conditions like:
      • && req.http.Cookie !~ "(stopMobileRedirect=true|mf_useformat=desktop)"
      • && req.url !~ "[?&]mobileaction=toggle_view_desktop(&|$)"
    • ^ It seems like maybe we need to add an inverse condition for at least ?mf_useformat=mobile, maybe something else with toggle_view, etc?

For this task, I mainly want to review the purge strategy. Today, the MobileFrontend extension uses a single mechanism to decide whether a wiki uses an m-dot domain or the same domain. This controls output (where the "Mobile view" link points, and where the <link rel=alternate> tag points) and controls whether MediaWiki emits purges for m-dot URLs.

@BBlack That sounds correct, yes.

Today: Canonical requests sometimes redirect to m-dot, and m-dot requests always render a mobile version.

Future: Canonical requests sometimes redirect to m-dot, canonical requests sometimes render a mobile version, and m-dot requests always render a mobile version.

I believe the following is how this works today:

  • In vcl_recv > cluster_fe_recv_pre_purge, we rewrite m-dot requests to add http.X-Subdomain = "M"; and turn m-dot domains back into canonical domains.
  • In vcl_recv > cluster_fe_recv > mobile_redirect, we return a synthetic redirect to an m-dot domain if
    • 1) request method is GET/HEAD,
    • 2) and, request has a mobile user agent,
    • 3) and, the request does not opt-out (e.g. opt-out cookies, or mobile-ish UA that prefers desktop, or a Googlebot-for-Commons request as of T397267).
    • 4) and, the request is not a toggle request (mobileaction). These are POST-like in that MobileFrontend responds to such a request by saving an opt-out cookie on the current domain, and then redirecting to the "other" domain (mobile to desktop, or desktop to mobile).
    • 5) and, the request is for a mobile-enabled domain (i.e. the canonical domain of a MW-powered site with MobileFrontend enabled and an m-dot DNS entry).
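The five conditions above can be modelled as a single predicate. The following Python sketch is only an approximation of the mobile_redirect logic, not the VCL itself: the Request type and its field names are invented for illustration, the two regular expressions are the ones quoted earlier in this thread, and the other opt-outs (mobile-ish user agents that prefer desktop, the Googlebot-for-Commons case) are deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Set

# Patterns quoted from the VCL conditions discussed in this task.
OPT_OUT_COOKIE = re.compile(r"(stopMobileRedirect=true|mf_useformat=desktop)")
TOGGLE_DESKTOP = re.compile(r"[?&]mobileaction=toggle_view_desktop(&|$)")


@dataclass
class Request:
    """Hypothetical, simplified view of an incoming edge request."""
    method: str
    host: str
    url: str
    cookie: str = ""
    is_mobile_ua: bool = False


def should_redirect_to_mdot(req: Request, mobile_enabled_hosts: Set[str]) -> bool:
    """Model of conditions 1-5: redirect only if every clause holds."""
    return (
        req.method in ("GET", "HEAD")                # 1) safe request method
        and req.is_mobile_ua                         # 2) mobile user agent
        and not OPT_OUT_COOKIE.search(req.cookie)    # 3) no opt-out cookie
        and not TOGGLE_DESKTOP.search(req.url)       # 4) not a toggle request
        and req.host in mobile_enabled_hosts         # 5) mobile-enabled domain
    )
```

Writing it this way makes the later discussion easier to follow: excluding a host from mobile_enabled_hosts (as the VCL does for thankyou.wikipedia.org) suppresses the redirect regardless of the other clauses.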

The first change is effectively:

  • change the m-dot clause that sets X-Subdomain = "M" to do so for these pilot domains as well.
  • change the opt-out condition to include a sub clause with a list of pilot domains that we won't redirect. (This may be redundant in implementation, since we already exclude X-Subdomain from redirects. While that should be an impossible condition today, it's a safeguard we could start relying on explicitly to avoid needing to duplicate or store the list.)
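A minimal Python sketch of this first change, assuming a hypothetical pilot host list and a simplified routing function (opt-out cookies, toggle requests, and the request-method check are omitted for brevity). It illustrates the point in the parenthetical: once X-Subdomain is set for pilot domains, the existing "skip redirect when X-Subdomain is present" safeguard is enough, and no second copy of the pilot list is needed in the redirect condition.

```python
from typing import Dict, Set

# Hypothetical pilot list; the real list would live in the VCL configuration.
PILOT_HOSTS: Set[str] = {"pilot.example.org"}


def route(host: str, is_mobile_ua: bool, headers: Dict[str, str],
          pilot_hosts: Set[str]) -> str:
    """Return "redirect" (to m-dot) or "pass" (serve from this domain).

    Mutates headers to add X-Subdomain when the mobile variant should be
    rendered directly on the canonical domain.
    """
    # Change 1: pilot domains get the mobile header on the canonical host.
    if is_mobile_ua and host in pilot_hosts:
        headers["X-Subdomain"] = "M"
    # Existing safeguard: never redirect a request that already carries the header.
    if "X-Subdomain" in headers:
        return "pass"
    # Non-pilot behaviour stays as today: mobile user agents are redirected.
    return "redirect" if is_mobile_ua else "pass"
```

In this model a mobile request to a pilot host passes through with the header set, while the same request to a non-pilot host is still redirected to m-dot.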

A later second change will be to redirect m-dot to canonical, but that is several months out and not in play right now.

This is why the purge strategy is important: during the transition period we want MobileFrontend to primarily act as if there is no m-dot domain, but still support m-dot traffic. For pageviews that's fine because we'll use the same X-header to enable it. It won't know the difference. But... purges. Unless we develop something new, we'd serve stale content on m-dot URLs, right?

In reviewing the cluster_fe_recv_pre_purge code today, I was surprised to see the m-dot translation is "pre purge" without any (obvious) guard from happening on purge requests. This suggests a PURGE for https://en.m.wikipedia.org/wiki/Main_Page is translated into a purge for https://en.wikipedia.org/wiki/Main_Page (with an X-header that Varnish presumably ignores), which in turn suggests that we don't really need to be emitting these duplicate purges in the first place. Is that so?

[Just as an FYI, @BCornwall will be working from Traffic on this.]

Krinkle claimed this task.
Krinkle removed a project: Reader Experience Team.
Krinkle updated the task description.

Continuing at T401595.