
Deploy regular running of wikidata constraint checks using the job queue
Open, Normal · Public · 3 Story Points

Description

Once T204022 is done and merged and deployed we can think about deploying the feature:

The feature should have a staged rollout, slowly ramping up the percentage of edits that the jobs are run for.
Throughout the rollout, metrics such as the number of jobs in the queue should be monitored.
Throughout the rollout we also need to check the cache status and cache eviction rate, as we probably can't fit results for all entities in the cache.
The job will only really take full effect once T204024 (persistently storing the data) is also done.

The config option was introduced in https://gerrit.wikimedia.org/r/#/c/463950/
The config option is wgWBQualityConstraintsEnableConstraintsCheckJobsRatio
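As a rough illustration of how a ratio setting like this can gate job creation, here is a minimal Python sketch. It assumes the value is a percentage (0 to 100) applied as an independent random sample per edit. The function name `should_enqueue_check_job` is hypothetical, and this is not the extension's actual code.

```python
import random


def should_enqueue_check_job(ratio_percent, rng=random):
    """Return True for roughly ratio_percent% of calls (hypothetical helper)."""
    if not 0 <= ratio_percent <= 100:
        raise ValueError("ratio_percent must be between 0 and 100")
    # rng.random() is uniform in [0.0, 1.0), so 0 never fires and 100 always does.
    return rng.random() * 100 < ratio_percent


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    sampled = sum(should_enqueue_check_job(5, rng) for _ in range(10_000))
    print(f"{sampled} of 10000 simulated edits would enqueue a job at 5%")
```

Sampling per edit like this means that, at low ratios, two quick edits to the same entity will rarely both produce a job, which matters when judging deduplication during the early stages.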

Beta and test can probably be deployed quickly / on the same day (if everything goes fine).
wikidata.org should probably not proceed any faster than one increase per day.

Deployment List:

  • beta wikidata 100% - 16th Jan
  • testwikidata 50% - 16th Jan
  • testwikidata 100% - 16th Jan
  • wikidata.org 1% - 16th Jan
  • wikidata.org 5% - 17th Jan
  • wikidata.org 10% - 22nd Jan, EU morning
  • wikidata.org 25% - 22nd Jan, EU morning
  • wikidata.org 50%
  • wikidata.org 100%

When deploying a stage that is expected to produce a rate of around 10 jobs per second, ping Services.

Dashboards:

Event Timeline

Addshore moved this task from Ready to go to Needs Work on the Wikidata-Campsite board. Edited · Nov 13 2018, 2:36 PM
Addshore changed the task status from Open to Stalled.

This needs constraint checks to be set up on the beta cluster and testwikidata before we can proceed.

This is in itself an interesting requirement: without a query service for testwikidatawiki or beta, testing outside of production will not exercise the "full" functionality.

TODO @Addshore create tasks

This one is still blocked on T209957 and T209922 being complete first.

Krinkle moved this task from Untriaged to Meta on the WMF-JobQueue board. Dec 20 2018, 7:58 PM
Addshore changed the task status from Stalled to Open. Jan 15 2019, 1:34 PM
Addshore updated the task description. Jan 15 2019, 2:05 PM
Addshore updated the task description. Jan 15 2019, 2:06 PM
Restricted Application added a project: User-Addshore. Jan 16 2019, 9:32 AM

Change 484621 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] BETA wikidata: post edit constraint jobs on 100% of edits

https://gerrit.wikimedia.org/r/484621

Change 484622 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] testwikidata: post edit constraint jobs on 50% of edits

https://gerrit.wikimedia.org/r/484622

Change 484623 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] testwikidata: post edit constraint jobs on 100% of edits

https://gerrit.wikimedia.org/r/484623

Change 484624 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 1% of edits

https://gerrit.wikimedia.org/r/484624

Change 484625 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 5% of edits

https://gerrit.wikimedia.org/r/484625

Change 484629 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 10% of edits

https://gerrit.wikimedia.org/r/484629

Change 484630 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 25% of edits

https://gerrit.wikimedia.org/r/484630

Change 484633 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 50% of edits

https://gerrit.wikimedia.org/r/484633

Change 484635 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 100% of edits

https://gerrit.wikimedia.org/r/484635

Patches for the stages described in the ticket have all been created.

Change 484621 merged by jenkins-bot:
[operations/mediawiki-config@master] BETA wikidata: post edit constraint jobs on 100% of edits

https://gerrit.wikimedia.org/r/484621

Change 484622 merged by jenkins-bot:
[operations/mediawiki-config@master] testwikidata: post edit constraint jobs on 50% of edits

https://gerrit.wikimedia.org/r/484622

Mentioned in SAL (#wikimedia-operations) [2019-01-16T10:13:14Z] <addshore@deploy1001> sync-file aborted: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 50 T204031 [[gerrit:484621]] (duration: 00m 00s)

Mentioned in SAL (#wikimedia-operations) [2019-01-16T10:14:13Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 50 T204031 [[gerrit:484621]] (duration: 00m 52s)

Change 484623 merged by jenkins-bot:
[operations/mediawiki-config@master] testwikidata: post edit constraint jobs on 100% of edits

https://gerrit.wikimedia.org/r/484623

Mentioned in SAL (#wikimedia-operations) [2019-01-16T10:18:41Z] <addshore@deploy1001> sync-file aborted: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 100 T204031 [[gerrit:484621]] (duration: 00m 02s)

Mentioned in SAL (#wikimedia-operations) [2019-01-16T10:19:41Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: testwikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 100 T204031 [[gerrit:484621]] (duration: 00m 52s)

Change 484624 merged by jenkins-bot:
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 1% of edits

https://gerrit.wikimedia.org/r/484624

Mentioned in SAL (#wikimedia-operations) [2019-01-16T10:38:25Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: wikidatawiki, wgWBQualityConstraintsEnableConstraintsCheckJobsRatio 1% T204031 [[gerrit:484621]] (duration: 00m 52s)

Change 484652 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] wgWBQualityConstraintsEnableConstraintsCheckJobs false

https://gerrit.wikimedia.org/r/484652

Change 484652 merged by jenkins-bot:
[operations/mediawiki-config@master] wgWBQualityConstraintsEnableConstraintsCheckJobs false

https://gerrit.wikimedia.org/r/484652

Need to investigate issues that happened. (meeting time now though)...

Change 484654 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/WikibaseQualityConstraints@master] Fix constraintsRunCheck Job class

https://gerrit.wikimedia.org/r/484654

Change 484655 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/WikibaseQualityConstraints@wmf/1.33.0-wmf.12] Fix constraintsRunCheck Job class & test

https://gerrit.wikimedia.org/r/484655

Change 484656 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/WikibaseQualityConstraints@wmf/1.33.0-wmf.13] Fix constraintsRunCheck Job class & test

https://gerrit.wikimedia.org/r/484656

Change 484654 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@master] Fix constraintsRunCheck Job class & test

https://gerrit.wikimedia.org/r/484654

Change 484655 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@wmf/1.33.0-wmf.12] Fix constraintsRunCheck Job class & test

https://gerrit.wikimedia.org/r/484655

Change 484656 merged by jenkins-bot:
[mediawiki/extensions/WikibaseQualityConstraints@wmf/1.33.0-wmf.13] Fix constraintsRunCheck Job class & test

https://gerrit.wikimedia.org/r/484656

Mentioned in SAL (#wikimedia-operations) [2019-01-16T12:39:58Z] <addshore@deploy1001> Synchronized php-1.33.0-wmf.13/extensions/WikibaseQualityConstraints: [[gerrit:484654]] T204031 T204022 Fix constraintsRunCheck Job class & test (duration: 00m 57s)

Mentioned in SAL (#wikimedia-operations) [2019-01-16T12:40:59Z] <addshore@deploy1001> Synchronized php-1.33.0-wmf.12/extensions/WikibaseQualityConstraints: [[gerrit:484654]] T204031 T204022 Fix constraintsRunCheck Job class & test (duration: 00m 54s)

I've noticed some errors in the JobQueue with a message like 'Cannot instantiate job 'constraintsRunCheck': bad spec!'

Here's the job:

{
    "database": "wikidatawiki",
    "mediawiki_signature": "5de8d10b6415ec7910a231f5234dd00683ca2839cfff23f2c333755640f487af",
    "meta": {
      "domain": "www.wikidata.org",
      "dt": "2019-01-16T10:59:29+00:00",
      "id": "cba98c55-197d-11e9-9906-1418776139a6",
      "request_id": "XD8OkApAAE4AAHV3BbkAAACX",
      "schema_uri": "mediawiki/job/2",
      "topic": "mediawiki.job.constraintsRunCheck",
      "uri": "https://www.wikidata.org/wiki/Special:Badtitle/JobSpecification"
    },
    "page_namespace": -1,
    "page_title": "Special:Badtitle/JobSpecification",
    "params": {
      "entityId": "Q40282624"
    },
    "type": "constraintsRunCheck"
}
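
For triage, the fields that matter in an event like this are the wiki, the job type, the entity ID and the time it was queued. A minimal sketch of pulling them out in Python follows; `summarize_job` is a hypothetical helper, and the field names are taken only from the event shown above.

```python
import json


def summarize_job(raw_event):
    """Extract the fields most useful for triage from a job event (JSON string)."""
    event = json.loads(raw_event)
    return {
        "wiki": event.get("database"),
        "type": event.get("type"),
        # "params" and "meta" are nested objects; default to {} if absent.
        "entity_id": event.get("params", {}).get("entityId"),
        "queued_at": event.get("meta", {}).get("dt"),
    }


if __name__ == "__main__":
    sample = json.dumps({
        "database": "wikidatawiki",
        "meta": {"dt": "2019-01-16T10:59:29+00:00"},
        "params": {"entityId": "Q40282624"},
        "type": "constraintsRunCheck",
    })
    print(summarize_job(sample))
```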

> I've noticed some errors in the JobQueue with a message like 'Cannot instantiate job 'constraintsRunCheck': bad spec!'

Yep, that was patched with the above backports.
Going to put it back on test in the next hour and, if all looks good, on 1% of edits on wikidata.org.

Change 484719 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] ConstraintsCheckJobs enabled on testwikidatawiki

https://gerrit.wikimedia.org/r/484719

Change 484719 merged by jenkins-bot:
[operations/mediawiki-config@master] ConstraintsCheckJobs enabled on testwikidatawiki

https://gerrit.wikimedia.org/r/484719

Mentioned in SAL (#wikimedia-operations) [2019-01-16T18:03:11Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: ConstraintsCheckJobs enabled on testwikidatawiki T204031 (duration: 00m 52s)

Change 484723 had a related patch set uploaded (by Addshore; owner: Addshore):
[operations/mediawiki-config@master] ConstraintsCheckJobs enabled on testwikidatawiki (1% of edits)

https://gerrit.wikimedia.org/r/484723

Change 484723 merged by jenkins-bot:
[operations/mediawiki-config@master] ConstraintsCheckJobs enabled on testwikidatawiki (1% of edits)

https://gerrit.wikimedia.org/r/484723

Mentioned in SAL (#wikimedia-operations) [2019-01-16T18:13:56Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: ConstraintsCheckJobs enabled on wikidatawiki (1% of edits) T204031 (duration: 00m 51s)

Addshore updated the task description. Jan 16 2019, 6:18 PM

Change 484773 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add the constraintsRunCheck job definition

https://gerrit.wikimedia.org/r/484773

Addshore updated the task description. Jan 17 2019, 4:24 PM

Change 484625 merged by jenkins-bot:
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 5% of edits

https://gerrit.wikimedia.org/r/484625

Mentioned in SAL (#wikimedia-operations) [2019-01-17T17:38:02Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: ConstraintsCheckJobs on wikidatawiki (5% of edits) T204031 (duration: 00m 52s)

Addshore updated the task description. Jan 17 2019, 5:38 PM
Addshore updated the task description.

5% gives us around 0.6 - 0.8 jobs per second.

Let's hold it there for now and move forward to 10% tomorrow.

Change 484629 merged by jenkins-bot:
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 10% of edits

https://gerrit.wikimedia.org/r/484629

Mentioned in SAL (#wikimedia-operations) [2019-01-22T10:08:42Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T204031 wikidata: post edit constraint jobs on 10% of edits (duration: 00m 47s)

Change 484630 merged by jenkins-bot:
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 25% of edits

https://gerrit.wikimedia.org/r/484630

Mentioned in SAL (#wikimedia-operations) [2019-01-22T10:20:51Z] <addshore@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T204031 wikidata: post edit constraint jobs on 25% of edits (duration: 00m 45s)

Addshore added a comment. Edited · Jan 22 2019, 10:39 AM

So, the increase to 25% of edits triggering jobs seems to be adding between 2 and 2.5 jobs per second to the queue (the prior estimate was 1.6-4.1).
Job concurrency is now at about 6.

Default low volume queue concurrency is 50 per https://github.com/wikimedia/mediawiki-services-change-propagation-jobqueue-deploy/blob/master/scap/vars.yaml#L101
So getting this to 100% without hitting the limit should be no problem.
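
The reasoning above can be sketched as simple arithmetic. This assumes the job rate and the concurrency both scale linearly with the percentage of edits, which is a simplification (deduplication and variable job duration could change it). `scale_linearly` is a hypothetical helper.

```python
def scale_linearly(observed, current_percent, target_percent):
    """Extrapolate an observed value to another rollout percentage (linear assumption)."""
    if current_percent <= 0:
        raise ValueError("current_percent must be positive")
    return observed * target_percent / current_percent


if __name__ == "__main__":
    # Observed at 25%: 2 to 2.5 jobs per second, concurrency about 6.
    low = scale_linearly(2.0, 25, 100)
    high = scale_linearly(2.5, 25, 100)
    concurrency = scale_linearly(6, 25, 100)
    print(f"projected rate at 100%: {low} to {high} jobs per second")
    print(f"projected concurrency at 100%: {concurrency} (limit quoted above: 50)")
```

Under that assumption the projection is 8 to 10 jobs per second and a concurrency of about 24 at 100%.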

It appears there is no deduplication happening yet, probably because we are only at 25%, so edits to the same entities in rapid succession may not currently be getting jobs triggered. The queue is also staying empty and jobs are being processed in 1-2s.

The QPS on the wdqs-internal cluster has increased from 50ops to 150ops due to the increased number of constraint checks.

Regarding deduplication, it may be worth using jobReleaseTimestamp from the MediaWiki job definition to slightly delay job execution, giving consecutive edits more chance of being deduplicated.
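
To illustrate why a release delay helps, here is a toy Python model. It is not how MediaWiki or Change-Prop implement deduplication: it simply assumes that an edit arriving while a job for the same entity is still waiting to be released does not create a second job. `count_executed_jobs` and the sample edits are hypothetical.

```python
def count_executed_jobs(edits, delay_seconds):
    """Count jobs that run when duplicates of a still-pending job are dropped.

    edits: list of (timestamp_seconds, entity_id), sorted by timestamp.
    """
    release_at = {}  # entity_id -> time at which its pending job is released
    executed = 0
    for timestamp, entity_id in edits:
        pending_release = release_at.get(entity_id)
        if pending_release is not None and timestamp < pending_release:
            continue  # a job for this entity is still waiting; treat as duplicate
        release_at[entity_id] = timestamp + delay_seconds
        executed += 1
    return executed


if __name__ == "__main__":
    edits = [(0, "Q1"), (1, "Q1"), (2, "Q1"), (2, "Q2"), (30, "Q1")]
    print("no delay:", count_executed_jobs(edits, 0))
    print("10s delay:", count_executed_jobs(edits, 10))
```

With no delay every edit runs its own job; with a delay, a burst of edits to one entity collapses into a single job in this model.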

After checking with @Gehel, the QPS on the wdqs is not too important; instead we should just be looking at the load.
Load didn't increase much, which is a good indication that we are fine to continue from the wdqs side.
If the wdqs is struggling it will return 429s and quality constraints will throttle, so that's all we need to keep an eye out for.
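
A minimal sketch of the kind of client-side throttling described here, in Python. The thread only says that WDQS returns 429s and that quality constraints throttles in response; the `ThrottleGate` class, the use of a retry-after value and the 60 second default are assumptions for illustration.

```python
class ThrottleGate:
    """Remembers a 'do not query before' time after an HTTP 429 (hypothetical helper)."""

    def __init__(self, default_backoff_seconds=60.0):
        self.default_backoff_seconds = default_backoff_seconds
        self.blocked_until = 0.0

    def record_response(self, status_code, now, retry_after_seconds=None):
        """Update the gate from a response; only 429 changes the state."""
        if status_code == 429:
            wait = (
                retry_after_seconds
                if retry_after_seconds is not None
                else self.default_backoff_seconds
            )
            # Never shorten an existing block.
            self.blocked_until = max(self.blocked_until, now + wait)

    def may_query(self, now):
        """True once the back-off window has passed."""
        return now >= self.blocked_until


if __name__ == "__main__":
    gate = ThrottleGate()
    gate.record_response(429, now=100.0, retry_after_seconds=30)
    print(gate.may_query(120.0), gate.may_query(130.0))
```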

> Default low volume queue concurrency is 50 per https://github.com/wikimedia/mediawiki-services-change-propagation-jobqueue-deploy/blob/master/scap/vars.yaml#L101
>
> So getting this to 100% without hitting the limit should be no problem.

However, those 50 slots are shared between many job types. That's why we created https://gerrit.wikimedia.org/r/#/c/mediawiki/services/change-propagation/jobqueue-deploy/+/484773/ - to be deployed today.

Change 484773 merged by Ppchelko:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add the constraintsRunCheck job definition

https://gerrit.wikimedia.org/r/484773

Mentioned in SAL (#wikimedia-operations) [2019-01-22T17:29:29Z] <ppchelko@deploy1001> Started deploy [cpjobqueue/deploy@afca813]: Add the constraintsRunCheck job definition T204031

Mentioned in SAL (#wikimedia-operations) [2019-01-22T17:30:25Z] <ppchelko@deploy1001> Finished deploy [cpjobqueue/deploy@afca813]: Add the constraintsRunCheck job definition T204031 (duration: 00m 55s)

Change-Prop now has a separate rule for this job, so it doesn't interfere with low-traffic jobs. The concurrency is set to 30, which should be enough, however after deploying 100% we need to monitor the lag and increase concurrency if we see the job is steadily lagging.

> However, those 50 slots are shared between many job types. That's why we created https://gerrit.wikimedia.org/r/#/c/mediawiki/services/change-propagation/jobqueue-deploy/+/484773/ - to be deployed today.

Aaah, now that I didn't know!

> Change-Prop now has a separate rule for this job, so it doesn't interfere with low-traffic jobs. The concurrency is set to 30, which should be enough, however after deploying 100% we need to monitor the lag and increase concurrency if we see the job is steadily lagging.

Great, I'll look at increasing from 25% to 50% at some point today (or tomorrow)

So, checking back on how this has affected WDQS, we started getting throttled queries.

So we will pause and assess the situation and see if we can reduce the number of queries.

It looks like the rate of throttled requests might have got slightly better in the past 2 days

Going to move this to stalled on the campsite board for now as we can't move forward with this directly.

Addshore removed Addshore as the assignee of this task.Mar 14 2019, 11:01 AM

Going to unassign myself from this for now but leave it in the stalled column of the camp board; we need to put some more tickets through the camp before we can turn this up a bit more.