
Deploy regular running of wikidata constraint checks using the job queue
Open, Medium, Public, 3 Estimated Story Points

Description

Once T204022 is done, merged, and deployed, we can think about deploying the feature:

The feature should have a staged rollout, slowly ramping up the share of edits that the jobs are run for.
Throughout the rollout, metrics such as the number of jobs in the queue should be monitored.
Throughout the rollout we also need to check the cache status and cache eviction rate; we probably can't fit results for all entities in the cache.
The job will only take full effect once T204024 (persistently storing the data) is also done.

The config option was introduced in https://gerrit.wikimedia.org/r/#/c/463950/
The config option is wgWBQualityConstraintsEnableConstraintsCheckJobsRatio
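
For illustration, a minimal sketch of how the per-wiki rollout value might look in wmf-config/InitialiseSettings.php, assuming the setting is interpreted as a percentage of edits (0-100) and is keyed by database name; the concrete values here are hypothetical:

```php
// Hypothetical wmf-config/InitialiseSettings.php entry (values for illustration only).
'wgWBQualityConstraintsEnableConstraintsCheckJobsRatio' => [
	'default' => 0,            // jobs disabled unless a wiki opts in (assumption)
	'testwikidatawiki' => 100, // test wiki fully enabled
	'wikidatawiki' => 25,      // example rollout stage on wikidata.org
],
```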

Beta and test can probably be deployed quickly / on the same day (if everything goes fine).
wikidata.org should probably not proceed any faster than one increase per day.

Deployment List:

  • beta wikidata 100% - 16th Jan 2019
  • testwikidata 50% - 16th Jan 2019
  • testwikidata 100% - 16th Jan 2019
  • wikidata.org 1% - 16th Jan 2019
  • wikidata.org 5% - 17th Jan 2019
  • wikidata.org 10% - 22nd Jan 2019 (EU morning)
  • wikidata.org 25% - 22nd Jan 2019 (EU morning)
  • wikidata.org 40% - 1st Feb 2021
  • wikidata.org 50% - 11th Feb 2021
  • wikidata.org 60% - 12th Apr 2021
  • wikidata.org 70% - 3rd May 2021

...

  • wikidata.org 100%

When deploying and expecting a rate of around 10 jobs per second, ping Services.

Dashboards:

Details

Project | Branch | Lines +/-
operations/mediawiki-config | master | +1 -1
operations/mediawiki-config | master | +1 -1
operations/mediawiki-config | master | +1 -1
operations/mediawiki-config | master | +1 -1
operations/mediawiki-config | master | +1 -1
operations/mediawiki-config | master | +1 -1
operations/mediawiki-config | master | +1 -0
mediawiki/services/change-propagation/jobqueue-deploy | master | +2 -0
operations/mediawiki-config | master | +1 -1
operations/mediawiki-config | master | +1 -1
operations/mediawiki-config | master | +1 -1
operations/mediawiki-config | master | +1 -1
operations/mediawiki-config | master | +1 -0
mediawiki/extensions/WikibaseQualityConstraints | wmf/1.33.0-wmf.13 | +11 -1
mediawiki/extensions/WikibaseQualityConstraints | wmf/1.33.0-wmf.12 | +11 -1
mediawiki/extensions/WikibaseQualityConstraints | master | +11 -1
operations/mediawiki-config | master | +0 -1
operations/mediawiki-config | master | +1 -0
operations/mediawiki-config | master | +1 -1
operations/mediawiki-config | master | +5 -0
operations/mediawiki-config | master | +4 -0

Related Objects

Event Timeline


So, the increase to 25% of edits triggering jobs seems to be adding between 2 and 2.5 jobs per second to the queue (the prior estimate was 1.6-4.1).
Job concurrency is now at about 6.

Default low volume queue concurrency is 50 per https://github.com/wikimedia/mediawiki-services-change-propagation-jobqueue-deploy/blob/master/scap/vars.yaml#L101
So getting this to 100% without hitting the limit should be no problem: with job concurrency at about 6 for 25% of edits, 100% should need roughly four times that, around 24 concurrent executions, still well under 50.

It appears there is no deduplication happening yet, probably because we are only at 25%, so edits to the same entities in rapid succession may not currently both be triggering jobs. The queue is also staying empty and jobs are being processed in 1-2 s.

The QPS on the wdqs-internal cluster has increased from 50 ops to 150 ops due to the increased number of constraint checks.

Regarding deduplication, it may be worth adding a delay to the jobs using jobReleaseTimestamp from the MediaWiki job definition, slightly delaying job execution to give consecutive edits a better chance of being deduplicated.
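
A minimal sketch of what that could look like when enqueuing the job (not the extension's actual code; the job parameter names and the 10-second delay window are assumptions for illustration):

```php
// Sketch only: enqueue the constraint-check job with a short delay so that
// consecutive edits to the same entity can be deduplicated into one run.
$delay = 10; // hypothetical delay window in seconds

$job = new JobSpecification(
	'constraintsRunCheck',
	[
		'entityId' => $entityId->getSerialization(), // assumed parameter name
		// jobReleaseTimestamp tells queues that support delayed jobs not to
		// release this job to runners before the given Unix timestamp.
		'jobReleaseTimestamp' => time() + $delay,
	],
	[ 'removeDuplicates' => true ], // opt in to deduplication of identical jobs
	$title
);
JobQueueGroup::singleton()->push( $job );
```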

After checking with @Gehel, the QPS on the wdqs is not too important; instead we should just be looking at the load.
Load didn't increase much, so that is a good indication that we are fine to continue from the wdqs side.
If the wdqs is struggling it will return 429s and quality constraints will throttle, so that's all we need to keep an eye on.
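
The back-off behaviour described above roughly follows this pattern (an illustrative sketch only, not the extension's actual implementation; the helper and the request callback are hypothetical):

```php
// Illustrative only: skip constraint-check queries while the query service is
// telling us to back off via HTTP 429 + Retry-After.
function runQueryWithBackoff( string $sparql, callable $doRequest, ?int &$throttledUntil ) {
	if ( $throttledUntil !== null && time() < $throttledUntil ) {
		return null; // still throttled; don't add more load to wdqs
	}
	// $doRequest is assumed to return [ 'status' => int, 'headers' => array, 'body' => string ]
	$response = $doRequest( $sparql );
	if ( $response['status'] === 429 ) {
		$retryAfter = (int)( $response['headers']['retry-after'] ?? 60 );
		$throttledUntil = time() + $retryAfter; // honour the server's requested pause
		return null;
	}
	return $response['body'];
}
```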

> Default low volume queue concurrency is 50 per https://github.com/wikimedia/mediawiki-services-change-propagation-jobqueue-deploy/blob/master/scap/vars.yaml#L101
> So getting this to 100% without hitting the limit should be no problem.

However, those 50 slots are shared between many job types. That's why we created https://gerrit.wikimedia.org/r/#/c/mediawiki/services/change-propagation/jobqueue-deploy/+/484773/ - to be deployed today.

Change 484773 merged by Ppchelko:
[mediawiki/services/change-propagation/jobqueue-deploy@master] Add the constraintsRunCheck job definition

https://gerrit.wikimedia.org/r/484773

Mentioned in SAL (#wikimedia-operations) [2019-01-22T17:29:29Z] <ppchelko@deploy1001> Started deploy [cpjobqueue/deploy@afca813]: Add the constraintsRunCheck job definition T204031

Mentioned in SAL (#wikimedia-operations) [2019-01-22T17:30:25Z] <ppchelko@deploy1001> Finished deploy [cpjobqueue/deploy@afca813]: Add the constraintsRunCheck job definition T204031 (duration: 00m 55s)

Change-Prop now has a separate rule for this job, so it doesn't interfere with low-traffic jobs. The concurrency is set to 30, which should be enough, however after deploying 100% we need to monitor the lag and increase concurrency if we see the job is steadily lagging.

> However, those 50 slots are shared between many job types. That's why we created https://gerrit.wikimedia.org/r/#/c/mediawiki/services/change-propagation/jobqueue-deploy/+/484773/ - to be deployed today.

Aaah, now that I didn't know!

> Change-Prop now has a separate rule for this job, so it doesn't interfere with low-traffic jobs. The concurrency is set to 30, which should be enough, however after deploying 100% we need to monitor the lag and increase concurrency if we see the job is steadily lagging.

Great, I'll look at increasing from 25% to 50% at some point today (or tomorrow)

So, checking back on how this has affected WDQS: we have started getting throttled queries.

So we will pause and assess the situation and see if we can reduce the number of queries.

It looks like the rate of throttled requests might have improved slightly in the past 2 days.

Going to move this to stalled on the campsite board for now as we can't move forward with this directly.

Going to unassign myself from this for now but leave it in the camp board's stalled column; we need to put some more tickets through the camp before we can turn this up a bit more.

Addshore changed the task status from Open to Stalled.Jun 22 2019, 10:48 PM

And marking as stalled, as this is obviously currently stalled.

Change 471001 abandoned by Lucas Werkmeister (WMDE):
wgWBQualityConstraintsCacheCheckConstraintsResults true on testwikidata

Reason:
no longer needed since Ie3f04197d6 (cache constraint checks by default – explicit wikidatawiki config was removed in I01b56209a3)

https://gerrit.wikimedia.org/r/471001

Figure out the status and what/when to do next

So, constraint checks are currently running after 25% of edits.
The goal of course being 100%.
At 25%, some of the constraint-check queries to the wdqs cluster started being rate limited, so we stopped increasing the ratio.

One ticket that could help us here is T176312

Also it seems that T214362 is slightly stalled again, probably waiting on T227776, which is also ultimately needed for us to have a stream of these results fed back into wdqs.

Looking at https://grafana.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&var-cluster_name=wdqs-internal&from=now-90d&to=now I'm not seeing any throttled or banned requests for the internal cluster in the last 3 months.
Maybe the work on improving the updater etc. means we can bump this up further?

This month T240884: RFC: How to evaluate user-provided regular expressions was also discussed, and we will try to implement a regex-checking service so that we can complete T176312: Don’t check format constraint via SPARQL (safely evaluating user-provided regular expressions).

Will ping the query service team before we think about increasing this at all.

Change 484633 had a related patch set uploaded (by Jforrester; owner: Addshore):
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 50% of edits

https://gerrit.wikimedia.org/r/484633

Change 484635 had a related patch set uploaded (by Jforrester; owner: Addshore):
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 100% of edits

https://gerrit.wikimedia.org/r/484635

Addshore changed the task status from Stalled to Open.Feb 17 2020, 8:28 AM
Addshore lowered the priority of this task from Medium to Low.

Why is this priority low? From the product side I'd put it at least at Medium, since it's a prerequisite for regular constraint runs and querying.

Change 484633 abandoned by Addshore:
wikidata: post edit constraint jobs on 50% of edits

https://gerrit.wikimedia.org/r/484633

Change 484635 abandoned by Addshore:
wikidata: post edit constraint jobs on 100% of edits

https://gerrit.wikimedia.org/r/484635

Per Adam, the next step could be 40% for Wikidata.org, in case we want to be extra cautious.

Change 660774 had a related patch set uploaded (by Rosalie Perside (WMDE); owner: Rosalie Perside (WMDE)):
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 50% of edits

https://gerrit.wikimedia.org/r/660774

Change 660774 merged by jenkins-bot:
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 40% of edits

https://gerrit.wikimedia.org/r/660774

Mentioned in SAL (#wikimedia-operations) [2021-02-01T12:19:03Z] <lucaswerkmeister-wmde@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:660774|wikidata: post edit constraint jobs on 40% of edits (T204031)]] (duration: 01m 03s)

According to JobQueue EventBus, the insertion rate of this job went up a bit yesterday, but it's not too extreme (in the dashboard graph, the cursor is on the approximate timestamp of the latest deployment).

I haven’t found any errors related to the job (constraintsRunCheck / CheckConstraintsJob) in Logstash, so everything seems to be fine so far.

Change 662967 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 50% of edits

https://gerrit.wikimedia.org/r/662967

Change 662967 merged by jenkins-bot:
[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 50% of edits

https://gerrit.wikimedia.org/r/662967

Mentioned in SAL (#wikimedia-operations) [2021-02-11T12:18:01Z] <lucaswerkmeister-wmde@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:662967|wikidata: post edit constraint jobs on 50% of edits (T204031)]] (up from 40%) (duration: 01m 08s)

Change 677928 had a related patch set uploaded (by Tonina Zhelyazkova; author: Tonina Zhelyazkova):

[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 60% of edits

https://gerrit.wikimedia.org/r/677928

Restricted Application added a subscriber: Zabe. · Apr 8 2021, 2:59 PM

Change 677928 merged by jenkins-bot:

[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 60% of edits

https://gerrit.wikimedia.org/r/677928

Mentioned in SAL (#wikimedia-operations) [2021-04-12T11:13:42Z] <lucaswerkmeister-wmde@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:677928|wikidata: post edit constraint jobs on 60% of edits (T204031)]] (duration: 01m 13s)

Looking good. I'll put it in waiting until the camp feels ready for it to be back in TODO for the next jump.

Change 682608 had a related patch set uploaded (by Tonina Zhelyazkova; author: Tonina Zhelyazkova):

[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 70% of edits

https://gerrit.wikimedia.org/r/682608

Change 682608 merged by jenkins-bot:

[operations/mediawiki-config@master] wikidata: post edit constraint jobs on 70% of edits

https://gerrit.wikimedia.org/r/682608

Mentioned in SAL (#wikimedia-operations) [2021-05-03T11:04:41Z] <urbanecm@deploy1002> Synchronized wmf-config/InitialiseSettings.php: f1a5ef0116c77b86b1abfb7bfa7d4ed363c69f61: wikidata: post edit constraint jobs on 70% of edits (T204031) (duration: 00m 57s)