Page MenuHomePhabricator

Larger Wikidata entity dump queries
Closed, ResolvedPublic

Description

Are queries like the following wikidatawiki ok (run from a maintenance script, not in a web request):

SELECT page_id,page_title  FROM `page` LEFT JOIN `redirect` ON ((page_id = rd_from))   WHERE (page_id > 86) AND page_namespace IN ('0','120')  AND (rd_from IS NULL)  ORDER BY page_id ASC LIMIT 2500

(this used to run with LIMIT 100)

SELECT rev_id,rev_content_format,rev_timestamp,page_latest,page_is_redirect,old_id,old_text,old_flags,page_title  FROM `page` INNER JOIN `revision` ON ((page_latest=rev_id)) INNER JOIN `text` ON ((old_id=rev_text_id))   WHERE (('Q2'=page_title) AND (0=page_namespace)) OR (('P2'=page_title) AND (120=page_namespace)) …

(with 500 such page_title/ page_namespace pairs; this used to run with 20-25 such pairs)

I tested them yesterday and they look ok to me, but @ArielGlenn advised to better be safe and ask.

There will be up to five instances running which occasionally fire of these queries (but fewer than now, as the result sets are larger).

Event Timeline

hoo created this task.Sep 26 2017, 3:23 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 26 2017, 3:23 PM
hoo updated the task description. (Show Details)Sep 26 2017, 3:29 PM

This is in the context of the wikidatawiki weekly dumps job (json and other formats), which runs separately from the usual xml/sql dumps. I expect that it mostly runs on the vslow/dumps db server.

hoo added a comment.Sep 26 2017, 3:37 PM

I expect that it mostly runs on the vslow/dumps db server.

Sadly not, due to T147169: Make sure Wikibase dump maintenance scripts solely use the "dump" db group :/

jcrespo moved this task from Triage to Next on the DBA board.Sep 26 2017, 3:55 PM

A quick run of those two queries on the slowest slaves of s5 doesn't look too dangerous to me (as in they are pretty fast).
I will check a bit more tomorrow

Marostegui closed this task as Resolved.Sep 27 2017, 8:52 AM
Marostegui claimed this task.
Marostegui triaged this task as Normal priority.

So, those two queries are pretty fast (0.01 sec) in the slowest slave of wikdata, if you even mention there will be even less queries. That should be fine.
Both page_title and page_namespace are fast.
I would say it is fine to proceed.

hoo added a comment.Sep 27 2017, 8:52 AM

Great, thanks for checking!

Great, thanks for checking!

Let's keep an eye on the first iterations to check that they are indeed fine. Otherwise, let's revisit them

Changeset merged and deployed, I guess we'll see the impact next Monday, let's have it in mind then.

Changeset merged and deployed, I guess we'll see the impact next Monday, let's have it in mind then.

Thanks for the heads up!