Enable page schemas on the beta cluster
Closed, Resolved · Public · 2 Estimated Story Points

Description

This task encompasses whatever work is necessary to enable page schema QA on the beta cluster.

Something like the following config will be needed:

// --- Some form of these settings is hopefully already enabled. ---
require_once "$IP/extensions/Wikibase/vendor/autoload.php";
require_once "$IP/extensions/Wikibase/lib/WikibaseLib.php";
require_once "$IP/extensions/Wikibase/repo/Wikibase.php";
require_once "$IP/extensions/Wikibase/repo/ExampleSettings.php";
require_once "$IP/extensions/Wikibase/client/WikibaseClient.php";
require_once "$IP/extensions/Wikibase/client/ExampleSettings.php";
$wgEnableWikibaseRepo = true;
$wgEnableWikibaseClient = true;
// --- /Some form of these settings is hopefully already enabled. ---

// I hope this terrible workaround won't be needed but it was necessary to get my local Wikibase repo
// instance to actually link data.
$wgWBClientSettings[ 'siteGlobalID' ] = 'enwiki';

// Enable page schemas in the main namespace, a non-main namespace (File), and the Item and Property
// namespaces. We really just need main, but the others might be useful.
$wgWBClientSettings[ 'pageSchemaNamespaces' ] = [ 0, 6, 120, 122 ];

// Enable 50% sampling.
$wgWBClientSettings[ 'pageSchemaSplitTestSamplingRatio' ] = 0.5;
// Split bucketed pages into control and treatment groups. Only the latter will receive the schema changes.
// Given 50% sampling, 25% of pages will receive the new treatment and 75% will be unchanged.
$wgWBClientSettings[ 'pageSchemaSplitTestBuckets' ] = [ 'control', 'treatment' ];

^That's about what I use in development. The bare minimum of changes needed is:

$wgWBClientSettings[ 'pageSchemaNamespaces' ] = [ 0 ];
$wgWBClientSettings[ 'pageSchemaSplitTestSamplingRatio' ] = 0.5;
$wgWBClientSettings[ 'pageSchemaSplitTestBuckets' ] = [ 'control', 'treatment' ];

But I'm hoping it'll kind of just work with the beta Wikidata instance.

Note: the order of the $wgWBClientSettings[ 'pageSchemaSplitTestBuckets' ] buckets matters. Please copy it as described, [ 'control', 'treatment' ]. The reason is that we'll later use this configuration in production, and we want to identify pages in the new treatment beforehand using a database query on page_random. In this order, the new treatment bucket occupies the upper half of the range; if flipped, it would occupy the lower half. It's not a big deal if we botch it, but being mindful and consistent about it will save requerying later.
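To make the note about bucket order concrete, here is a minimal sketch of how page_random could map to buckets. It is not the extension's code: it assumes that page_random lies in [0, 1), that the range is divided evenly among the buckets in the order listed, and that sampling is applied within each bucket's slice. The function names are hypothetical.

```python
import math


def scaled_random(page_random, buckets):
    """Scale page_random so that its integer part selects a bucket.

    Assumption: the [0, 1) range is split evenly among the buckets, in order.
    """
    return page_random * max(1, len(buckets))


def get_bucket(page_random, buckets):
    """Return the bucket for a page (assumes non-empty buckets, page_random < 1)."""
    return buckets[int(scaled_random(page_random, buckets))]


def is_sampled(page_random, sampling_ratio, buckets):
    """Assumption: a page is sampled when its position inside its bucket's
    slice (the fractional part of the scaled value) is below the ratio."""
    return math.fmod(scaled_random(page_random, buckets), 1.0) < sampling_ratio


def receives_treatment(page_random, sampling_ratio=0.5,
                       buckets=("control", "treatment")):
    """True when the page is both sampled and in the treatment bucket."""
    return (is_sampled(page_random, sampling_ratio, buckets)
            and get_bucket(page_random, buckets) == "treatment")


# Share of an evenly spaced grid of page_random values that gets the treatment.
treated_share = sum(receives_treatment(i / 1000) for i in range(1000)) / 1000
```

Under these assumptions, [ 'control', 'treatment' ] with 50% sampling puts treated pages at 0.5 <= page_random < 0.75, which is 25% of pages. Flipping the order would move them to 0 <= page_random < 0.25, which is why a query prepared for one order would need rerunning for the other.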

Developer notes

Event Timeline

Restricted Application changed the subtype of this task from "Deadline" to "Task". Nov 5 2018, 7:17 PM
Restricted Application added a subscriber: Aklapper.

@Jdlrobson, @phuedx, any help you can provide would be much appreciated. The bare minimum of changes needed is:

$wgWBClientSettings[ 'pageSchemaNamespaces' ] = [ 0 ];
$wgWBClientSettings[ 'pageSchemaSplitTestSamplingRatio' ] = 0.5;
$wgWBClientSettings[ 'pageSchemaSplitTestBuckets' ] = [ 'control', 'treatment' ];

I'm hoping it'll kind of just work with the beta Wikidata instance but additional settings are described in the task in the event that it doesn't.

Jdlrobson moved this task from Needs Prioritization to Upcoming on the Web-Team-Backlog board.
Jdlrobson added a subscriber: pmiazga.

With respect to the last note, maybe @phuedx can help us out in the absence of @pmiazga.

ovasileva raised the priority of this task from Medium to High. Nov 5 2018, 10:32 PM

You'll need to link pages on the beta cluster with the wikidata beta cluster

I'm not sure whether this functionality is currently broken. It does not work in development (see T205922, T206567). A link seems to be made automatically for pages that have the same name as the Q item, like "kitten", when I override $wgWBClientSettings[ 'siteGlobalID' ] = 'enwiki';. The Beta Cluster (BC) is currently down, but hopefully there's a kitten entry there, possibly Q147, and Kitten will link to it automatically when the cluster comes back, without changes to siteGlobalID. This might require some fiddling, though, and perhaps shouldn't be attempted until T208809 is resolved.

☝️

  • A SWAT will be needed to turn it on. Given this is a beta cluster only change, if you find a willing deployer, you can do it outside the normal SWAT windows.

The config is automatically deployed to the Beta Cluster when a config change is merged by the beta-mediawiki-config-update-eqiad Jenkins job.

Yes, but someone still has to keep the repository's files in sync in production, or the next deployer in production can throw the change out.

Change 471948 had a related patch set uploaded (by Phuedx; owner: Phuedx):
[operations/mediawiki-config@master] BC: Enable Schema.org page split test

https://gerrit.wikimedia.org/r/471948

Please review ☝️, especially for precision of language.

@Niedzielski: You'll note that I've put (what I consider to be) safe defaults in wmf-config/Wikibase.php. These defaults will be set for all wikis. If they aren't safe, then please LMK!

ovasileva set the point value for this task to 2. Nov 6 2018, 5:36 PM

I've scheduled rOMWC61479ef8e135: BC: Enable Schema.org page split test for deployment during the Morning SWAT window as I might not be available for the entirety of the European Mid-day SWAT window.

I've rescheduled the change to be deployed in the European Mid-day SWAT window.

Change 471948 merged by jenkins-bot:
[operations/mediawiki-config@master] BC: Enable Schema.org page split test

https://gerrit.wikimedia.org/r/471948

Mentioned in SAL (#wikimedia-operations) [2018-11-07T12:36:36Z] <zfilipin@deploy1001> Synchronized wmf-config: SWAT: [[gerrit:471948|BC: Enable Schema.org page split test (T208763)]] (duration: 00m 54s)

This took a little while to test! However, after linking Q390678 to the Foooooo article on the Beta Cluster I see the following at the bottom of that article:

{
  "@context": "https://schema.org",
  "@type": "Article",
  "name": "Foooooo",
  "url": "https://en.wikipedia.beta.wmflabs.org/wiki/Foooooo",
  "sameAs": "http://www.wikidata.org/entity/Q390678",
  "mainEntity": "http://www.wikidata.org/entity/Q390678",
  "author": {
    "@type": "Organization",
    "name": ""
  },
  "publisher": {
    "@type": "Organization",
    "name": "",
    "logo": {
      "@type": "ImageObject",
      "url": ""
    }
  },
  "datePublished": "2014-01-02T23:05:11Z",
  "dateModified": "2014-01-02T23:05:11Z",
  "image": "https://upload.beta.wmflabs.org/wikipedia/commons/4/4e/Www_2014-01-02_15-04.png",
  "headline": "gcdXUtUqFgyFrCpkeZYw"
}

Are the odd values in the author and publisher fields a consequence of not having set some system messages on the Beta Cluster?

I can also confirm that the above type of output does not appear at the bottom of every article with a linked Wikidata item (again, on the Beta Cluster).
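For QA, the empty author and publisher values in the output above can be flagged mechanically. The following is a rough sketch only: it walks a parsed JSON-LD object and lists the paths whose value is an empty string. The helper name empty_string_fields is hypothetical, and the sample is a trimmed copy of the output quoted in this comment.

```python
import json


def empty_string_fields(node, path=""):
    """Return dotted paths of all values in node that are empty strings."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            found.extend(empty_string_fields(value, child))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found.extend(empty_string_fields(value, f"{path}[{index}]"))
    elif node == "":
        found.append(path)
    return found


# Trimmed copy of the schema output seen on the Foooooo article.
SAMPLE = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "Article",
  "name": "Foooooo",
  "author": {"@type": "Organization", "name": ""},
  "publisher": {
    "@type": "Organization",
    "name": "",
    "logo": {"@type": "ImageObject", "url": ""}
  }
}
""")

for field in empty_string_fields(SAMPLE):
    print(f"empty value: {field}")
```

Run against the sample, this reports author.name, publisher.name, and publisher.logo.url, the three fields that came back blank.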

Verified on the beta cluster; works as expected.

pmiazga removed pmiazga as the assignee of this task.
pmiazga removed a project: Patch-For-Review.