Provide canonical data on mobile domains
Closed, Resolved, Public

Description

Each Wikimedia wiki has a canonical domain (e.g. meta.wikimedia.org). Virtually all have a modified domain for mobile traffic (e.g. meta.m.wikimedia.org). Matching the two is important for tasks like combining unique devices data, but it is difficult to do correctly.

One query-time strategy that works correctly in almost all cases is this:

-- Strip mobile subdomains so mobile and desktop sites are combined. 
REGEXP_REPLACE(
    REGEXP_REPLACE(
        -- The canonical domains for Wikidata and MediaWiki.org start with `www`, which
        -- gets _replaced_ by the mobile subdomain. Combine the two possibilities for each site.
        REGEXP_REPLACE(
            REGEXP_REPLACE(domain, "^m\\\\.wikidata", "www.wikidata"),
        "^m\\\\.mediawiki", "www.mediawiki"),
    "^m\\\\.", ""),
"\\\\.m\\\\.", ".")

However, this is clumsy (and will get clumsier after we add Wikifunctions alongside Wikidata and MediaWiki.org).
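For readers who want to test the rule order outside SQL, here is a minimal Python sketch of the same nested replacements. The function name `strip_mobile_subdomain` is made up for illustration and is not part of any existing library. It mirrors only the four rules shown above.

```python
import re


def strip_mobile_subdomain(domain: str) -> str:
    """Map a mobile domain to its desktop counterpart (hypothetical helper)."""
    # Wikidata and MediaWiki.org: the mobile subdomain replaces a leading `www`,
    # so these two must be handled before the generic leading-`m.` rule.
    domain = re.sub(r"^m\.wikidata", "www.wikidata", domain)
    domain = re.sub(r"^m\.mediawiki", "www.mediawiki", domain)
    # Any other leading `m.` is simply dropped.
    domain = re.sub(r"^m\.", "", domain)
    # An interior `.m.` collapses to a single dot.
    domain = re.sub(r"\.m\.", ".", domain)
    return domain


print(strip_mobile_subdomain("meta.m.wikimedia.org"))
print(strip_mobile_subdomain("m.wikidata.org"))
```

The order matters: if the generic `^m\.` rule ran first, `m.wikidata.org` would lose its `www` prefix. Every additional site whose canonical domain starts with `www` needs another special case placed ahead of the generic rule.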

A much better strategy would be to provide canonical translations as part of our canonical data.
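As a rough illustration of what such a lookup could look like from the consumer's side, the sketch below builds a mapping from a small table of domain pairs. The table layout and the names `DOMAIN_PAIRS` and `canonicalize` are assumptions made for this example. They do not describe the actual canonical data schema.

```python
# Hypothetical rows of (canonical domain, mobile domain); the real dataset
# may use different column names and would cover every wiki.
DOMAIN_PAIRS = [
    ("meta.wikimedia.org", "meta.m.wikimedia.org"),
    ("www.wikidata.org", "m.wikidata.org"),
]

# Map every known domain, mobile or canonical, to its canonical form.
TO_CANONICAL = {}
for canonical, mobile in DOMAIN_PAIRS:
    TO_CANONICAL[canonical] = canonical
    TO_CANONICAL[mobile] = canonical


def canonicalize(domain: str) -> str:
    """Return the canonical domain, passing unknown domains through unchanged."""
    return TO_CANONICAL.get(domain, domain)


print(canonicalize("m.wikidata.org"))
```

With a table like this, special cases such as the `www` prefix live in the data instead of in every query.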

What's the canonical version?

As with so many things, there is no nice canonical dataset lying around. The best we can do is find the canonical set of rules and mirror them in our canonical data code.

But what are those rules?

There is the Varnish frontend code that redirects mobile devices to the mobile sites: https://github.com/wikimedia/operations-puppet/blob/production/modules/varnish/templates/text-frontend.inc.vcl.erb#L134

However, this doesn't catch all the cases (my best guess is that these are relatively minor cases where the lack of automatic mobile redirection has gone unnoticed; I've filed this as T344175). Misses include:

  • vrt-wiki.wikimedia.org
  • api.wikimedia.org
  • www.wikifunctions.org
  • wikimania.wikimedia.org
  • hi.wikimedia.org
  • foundation.wikimedia.org

But the real source of truth is the MobileFrontend configuration (the wgMobileUrlTemplate setting), which specifies where the mobile site is actually served: https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/InitialiseSettings.php#L6903
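To make the template idea concrete, here is a small Python sketch of how a host template with `%h0`, `%h1`, ... placeholders (the style MobileFrontend documents for wgMobileUrlTemplate, where `%hN` stands for the Nth dot-separated part of the host) could be expanded against a canonical domain. The function name `mobile_domain` is hypothetical, the template strings are illustrative rather than copied from the linked configuration, and treating an empty template as "same domain" is an assumption.

```python
import re


def mobile_domain(canonical_domain: str, template: str) -> str:
    """Expand a MobileFrontend-style host template (hypothetical helper)."""
    if not template:
        # Assumption: an empty template means the mobile site is served
        # from the canonical domain itself.
        return canonical_domain
    parts = canonical_domain.split(".")

    def substitute(match: re.Match) -> str:
        # `%hN` is replaced by the Nth dot-separated part of the host.
        return parts[int(match.group(1))]

    return re.sub(r"%h(\d+)", substitute, template)


print(mobile_domain("meta.wikimedia.org", "%h0.m.%h1.%h2"))
print(mobile_domain("www.wikidata.org", "m.%h1.%h2"))
```

Generating the mobile domain from the canonical one this way and storing both avoids having to reverse-engineer the mapping with regular expressions at query time.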

Notes

  • "Virtually all sites have a modified domain": at least one site (wikitech.wikimedia.org) uses the same domain for its mobile site. A handful of tiny, usually long-closed sites (e.g. transitionteam.wikimedia.org) seem to have no mobile site at all (the "mobile view" link is broken).

Event Timeline

I figured out how to apply the MobileFrontend rules in our context and I've put up a pull request: https://github.com/wikimedia-research/canonical-data/pull/8

Now I need some code review!

nshahquinn-wmf triaged this task as Medium priority.
nshahquinn-wmf moved this task from Incoming to Needs sign-off on the Movement-Insights board.

The first PR was merged, but I noticed a few errors that still need to be fixed: https://github.com/wikimedia-research/canonical-data/pull/9

The second PR has been merged and the updated data pushed. I still need to update the documentation and publicize the new feature.

I've added documentation to DataHub and posted an announcement in the working-with-data Slack channel. Now I just need to do T344185.

I just realized that I missed applying the "wikitech" rule to "labtestwiki". An extremely minor point, but still one worth correcting.