Page MenuHomePhabricator

Limit concurrency of DPL queries
Open, MediumPublic

Description

Follow-up for T262391 and it's non-public counterpart T262240.

In T262391#6449590 et seq, @Bawolff proposed to limit concurrency of DPL queries by two ways: a query cache (which makes DPL to not do the same work XY times) and a pool counter that makes DPL to not work at more than X things at once.

Both should make DPL somehow safer (but not faster), which is always a bonus.

Event Timeline

Urbanecm created this task.Sep 18 2020, 2:49 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 18 2020, 2:49 AM

Change 626796 had a related patch set uploaded (by Urbanecm; owner: Brian Wolff):
[mediawiki/extensions/intersection@master] Add an option to limit DPL concurrency with pool counter

https://gerrit.wikimedia.org/r/626796

Change 626268 had a related patch set uploaded (by Urbanecm; owner: Brian Wolff):
[mediawiki/extensions/intersection@master] Attempt to add a query cache to DPL

https://gerrit.wikimedia.org/r/626268

Peachey88 updated the task description. (Show Details)Sep 19 2020, 12:40 PM
Peachey88 added a subscriber: Bawolff.

More specificly its meant to prevent query pileups. The common case of a query pileup in DPL being a template with a dpl on it being used on tens of thousands of pages, all being refreshed at once, which the short cache will eliminate. The pool counter solution as a fall back should eliminate any other situation of DPL query pileup.

I have closed the non public task that tracked the initial impact of DPL on ruwikinews (T262240).
As far as I know the patches that @Bawolff put together are still pending to be merged (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/intersection/+/626268/ https://gerrit.wikimedia.org/r/c/mediawiki/extensions/intersection/+/626796/)
Can we get someone to review and merge them? I guess that'd be Platform Engineering or Performance-Team?

Marostegui triaged this task as High priority.Sep 29 2020, 7:53 AM

This seems to be happening again: T266775

jcrespo raised the priority of this task from High to Unbreak Now!.Thu, Oct 29, 12:00 PM
jcrespo added a subscriber: jcrespo.

Because this is actively causing outages for 800+ wikis on s3.

Should we (again) disable DPL somewhere?

Should we (again) disable DPL somewhere?

Maybe we can double check if reviewing, merging and deploying https://gerrit.wikimedia.org/r/626796 is realistically possible before the weekend? And maybe also https://gerrit.wikimedia.org/r/c/mediawiki/extensions/intersection/+/626268/ which I guess is more complex?

Platform Engineering can you bump this on your list of reviews priorities if possible?

Note. The former one depends on the latter one, so we would need to review 626268 first anyway :/.

To help with reviewing 626268, I enabled memcached caching as MainCacheType and set wgDLPQueryCacheTime to 30. Then, I created Category:A with handful of pages via for i in {0..45}; do echo -e "Testing page $i\n\n[[Category:A]]" | php /home/urbanecm/unsynced/gerrit/mediawiki/core/maintenance/edit.php --wiki=bwiki Test_page_$i --user Martin_Urbanec; done. Then, I set $wgDebugLogFile = "/var/www/wikis/logs/debug-{$wgDBname}.log"; to see all possible MediaWiki logs.

Then, I opened tail -f /var/www/wikis/logs/debug-bwiki.log | grep -i DBQuery | grep -i DynamicPageListHooks in my terminal, and opened Template:DPL_query for editing in my browser. Then, I pasted the following to the template:

<DynamicPageList>
category             = A
count                = 5
order                = descending
addfirstcategorydate = true
</DynamicPageList>

Then, I previewed it once, and confirmed that [DBQuery] DynamicPageListHooks::processQuery - Template:DPL_query [0s] localhost: SELECT page_namespace,page_title,c1.cl_timestamp FROM page` INNER JOIN categorylinks c1 ON ((page_id = c1.cl_from) AND (c1.cl_to='A')) WHERE page_is_redirect = 0 ORDER BY c1.cl_timestamp DESC LIMIT 5` appears only once in the logs I opened previously in my terminal. Then, I previewed it several times, and it didn't generate any more queries. Then, I waited 30 seconds (wgDLPQueryCacheTime on my machine), and it did query the DB again.

Marked with V+1 on Gerrit, as it seems to do what it is advertised to :).

Change 637499 had a related patch set uploaded (by Urbanecm; owner: Urbanecm):
[operations/mediawiki-config@master] Set wgDLPQueryCacheTime to 120 at all wikis

https://gerrit.wikimedia.org/r/637499

Change 626268 merged by jenkins-bot:
[mediawiki/extensions/intersection@master] Attempt to add a query cache to DPL

https://gerrit.wikimedia.org/r/626268

Change 637059 had a related patch set uploaded (by Urbanecm; owner: Brian Wolff):
[mediawiki/extensions/intersection@wmf/1.36.0-wmf.14] Attempt to add a query cache to DPL

https://gerrit.wikimedia.org/r/637059

Change 637499 merged by jenkins-bot:
[operations/mediawiki-config@master] Set wgDLPQueryCacheTime to 120 at all wikis

https://gerrit.wikimedia.org/r/637499

Mentioned in SAL (#wikimedia-operations) [2020-10-29T15:15:02Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 19c5aff02c20812c56b8abdcc0ed530393010193: Set wgDLPQueryCacheTime to 120 at all wikis (T263220) (duration: 00m 59s)

Change 637059 merged by jenkins-bot:
[mediawiki/extensions/intersection@wmf/1.36.0-wmf.14] Attempt to add a query cache to DPL

https://gerrit.wikimedia.org/r/637059

Mentioned in SAL (#wikimedia-operations) [2020-10-29T15:22:42Z] <urbanecm@deploy1001> Synchronized php-1.36.0-wmf.14/extensions/intersection/: 483c3bceb926ac6a2cfc40112fb9b4f0671fef72: Attempt to add a query cache to DPL (T263220) (duration: 00m 58s)

jcrespo lowered the priority of this task from Unbreak Now! to Medium.Thu, Oct 29, 3:47 PM

This is no longer UBN, please feel free to consider it resolved/declined based on my comments on the subtask.

I think we should still review and merge https://gerrit.wikimedia.org/r/c/mediawiki/extensions/intersection/+/626796, so this should stay open IMO.

Change 626796 merged by jenkins-bot:
[mediawiki/extensions/intersection@master] Add an option to limit DPL concurrency with pool counter

https://gerrit.wikimedia.org/r/626796

Note: A config change is needed in wmf-config before the patch will take affect.

Note: A config change is needed in wmf-config before the patch will take affect.

Good point. Do you mind uploading a patch please? I can deploy in the nearest window :).