
Dumps generation without prefetch causes disruption to the production environment
Open, In Progress, High, Public

Description

Today, June 20th 2024, we got paged because of MariaDB replica lag on the s1 cluster, on db1206.

The configured threshold for paging is 300 seconds and the actual value was 300.70 seconds. Ever so slightly above the limit.

Shortly after, the issue resolved itself without manual intervention.

A first investigation by the DBAs showed:

18:50 < Amir1> db1206 starts lagging a lot when dumps start, something in the query is either wrong or it bombard the replica. Either way, it needs to be investigated.
18:55 < Amir1> > | 236130644 | wikiadmin2023   | 10.64.0.157:37742    | enwiki | Query     |       1 | Creating sort index                                    | SELECT /* WikiExporter::dumpPages  */  /*!  .. STRAIGHT_JOIN */ re

Notification Type: PROBLEM

Service: MariaDB Replica Lag: s1 #page
Host: db1206 #page
Address: 10.64.16.89
State: CRITICAL

Date/Time: Thu Jun 20 18:15:09 UTC 2024

Notes URLs: https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica

Acknowledged by :

Additional Info:

CRITICAL slave_sql_lag Replication lag: 300.70 seconds


The query:

SELECT /* WikiExporter::dumpPages  */  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_actor,actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model  FROM `revision` JOIN `page` ON ((rev_page=page_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = rev_actor)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = rev_comment_id)) JOIN `slots` ON ((slot_revision_id = rev_id)) JOIN `content` ON ((content_id = slot_content_id))   WHERE (page_id >= 673734 AND page_id < 673816) AND ((rev_page > 0 OR (rev_page = 0 AND rev_id > 0)))  ORDER BY rev_page ASC,rev_id ASC LIMIT 50000

In addition, dumps generation for English Wikipedia also caused network saturation in eqiad:

 <Amir1>	yup it's dumps: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=snapshot1012&var-datasource=thanos&var-cluster=dumps&from=1719089686711&to=1719101545535
<Amir1>	it seems to be only snapshot1012 and that host has enwiki dump running

This had severe consequences, including a full outage for editing that persisted for more than half an hour.

Event Timeline


I think @Ladsgroup's proposal seems like the easier solution in the short term. @xcollazo do you see any complication with reducing parallelism of the enwiki dumps?

enwiki finished dumping at 2024-06-22 19:52:29, which doesn't correlate with the network event, which started around 2024-06-22 22:46:00.

However snapshot1012 is currently dumping commonswiki, so I suspect this was the one running when the event happened.

Additionally, we are running both the 20240620 (i.e. partial) dump and the 20240601 (i.e. full) dump of wikidatawiki. The full dump of wikidatawiki did have a network spike around the event time.

All these backed up runs are unfortunate, so sorry for the outage folks.

@xcollazo do you see any complication with reducing parallelism of the enwiki dumps?

enwiki is now done, but I will monitor the run of commonswiki while I look into whether we can temporarily lower the parallelism.

That would fit with the observed outage of the S4 DBs (which host commons) at the start of the incident.

Change #1049182 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mediawiki: Introduce rsyslog udp2log rate limiting

https://gerrit.wikimedia.org/r/1049182

Ok, I want to revise my previous assessment.

Nominal network usage with bursts of up to ~90 MB/s is common from the snapshot10* servers. This can be seen in the plots over the last 30 days:

snapshot1010: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=snapshot1010&var-datasource=thanos&var-cluster=dumps&from=1716652461762&to=1719244461762&viewPanel=8
snapshot1011: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=snapshot1011&var-datasource=thanos&var-cluster=dumps&from=1716652461762&to=1719244461762&viewPanel=8
snapshot1012: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=snapshot1012&var-datasource=thanos&var-cluster=dumps&from=1716652461762&to=1719244461762&viewPanel=8
snapshot1013: https://grafana.wikimedia.org/d/000000377/host-overview?orgId=1&var-server=snapshot1013&var-datasource=thanos&var-cluster=dumps&from=1716652461762&to=1719244461762&viewPanel=8

After observing the current commonswiki dump, I can see that there are ~50 MB/s bursts attributable to MD5 and SHA1 calculations against the NFS share that hosts the dump artifacts. Similarly, every 4 hours the rsync process that copies the artifacts to the clouddumps servers kicks in, which also causes high network usage for a relatively short time span.

So it seems from a network usage perspective the dumps are working normally.

Coming back to:

I think @Ladsgroup's proposal seems like the easier solution in the short term. @xcollazo do you see any complication with reducing parallelism of the enwiki dumps?

After perusing the code, the only existing mechanism to reduce parallelism in the current dumps infrastructure seems to be the slots mechanism. For each snapshot server, we can control how many concurrent jobs are allowed by modifying this variable. The current maxslots is set to 28 slots per host in the profile::dumps::generation::worker::dumper::maxjobs Puppet config.

However, this config is global, meaning all jobs from the dumps will be affected. Dumping already takes around 18 days to finish a full dump and around 8 days to finish a partial dump, which leaves us with just a couple more days in the month to do reruns or restarts when needed.

I want to offer a solution, though, in the next comment.

Ok after observing the dumps for a bit now, here are some conclusions:

  • I still believe the extra pressure against the databases is attributable to not having prefetch, as discussed in T368098#9913625.
  • I do not think network usage bursts coming from the snapshot servers are an indication of a bug, as discussed in T368098#9918320.
  • There is an existing mechanism to control parallelism in dumps, called slots (unrelated to MCR slots), however this is currently a global setting that I would rather not lower. More discussion in T368098#9918849.

The hypothesis I have is that we will not see this issue once we are done with this particular run of the dumps, which doesn't have prefetch. And in fact, up until now in my tenure as (reluctant) dumps maintainer, this is the first time I have seen a run affecting a database so heavily.

Here is a plan of action, considering this is an unbreak now:

  1. I will continue monitoring the remaining two wikis related to the offending 20240620 run: commonswiki and wikidatawiki.
  2. In the event that another SRE gets paged, I will kill the remaining 20240620 dump runs.
  3. In a couple of days, we will start the next run of the dumps, namely the 20240701 run. I do not expect that run to be problematic, as it will leverage prefetching. However, in the event that we do see similar issues, I will work with @BTullis to lower the dumps slots and/or pause the dumps as necessary.

Let me know if this sounds reasonable.

Mentioned in SAL (#wikimedia-operations) [2024-06-24T20:21:44Z] <mutante> snapsho1017 - systemctl mask commonsrdf-dump ; systemctl mask commonsjson-dump T368098

This happened again today, starting at 20:05, and resulted in basically the same issues we observed over the weekend and in the previous incident. I have not identified the query and I'll leave that to people who know better, but the steps we took to resolve this were:

taavi@snapshot1017 ~ $ sudo systemctl stop commons*.service
<mutante>	snapsho1017 - systemctl mask commonsrdf-dump ; systemctl mask commonsjson-dump T368098

(Please note: Puppet is still disabled there so that it doesn't start the service.)

And then subsequently, a patch by @Ladsgroup to reduce the log severity https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1049261.

This took basically everything down. Wikis returned errors, Phabricator was down, Toolforge was down. Large user impact. We had to interrupt the dumps and ensure they don't start again this time.

The working theory is that DBs get overwhelmed by the expensive query and then MW starts _massively_ logging which saturates the interface on mwlog*.

I just spot-checked some of the runPrimaryTransactionIdleCallbacks logs that made it through the 99% throttle in logstash during the "precursor" event starting shortly after 19:30 UTC (logs during the "main" event starting shortly after 20:00 are pretty incomplete, at least at the moment).

At least during that earlier event, this looks quite similar to what we saw in T365655#9823954: every stack trace I looked at was handling EmergencyTimeoutException by way of MWExceptionHandler::rollbackPrimaryChangesAndLog.

That first event is also clearly visible in the traffic graphs for cr1-eqiad <> asw2-c-eqiad (i.e., toward mwlog1002), but much smaller than the later event - ~ 10 Gbps / no out-discards vs. nearly 40 Gbps / 520k pps out-discards.

Edit: sflow data that shows both events is now available: https://w.wiki/AUec (aggregate volume toward mwlog1002).

Ok after observing the dumps for a bit now, here are some conclusions:

  • I still believe the extra pressure against the databases is attributable to not having prefetch, as discussed in T368098#9913625.
  • I do not think network usage bursts coming from the snapshot servers are an indication of a bug, as discussed in T368098#9918320.
  • There is an existing mechanism to control parallelism in dumps, called slots (unrelated to MCR slots), however this is currently a global setting that I would rather not lower. More discussion in T368098#9918849.

The hypothesis I have is that we will not see this issue once we are done with this particular run of the dumps, which doesn't have prefetch. And in fact, up until now in my tenure as (reluctant) dumps maintainer, this is the first time I have seen a run affecting a database so heavily.

Here is a plan of action, considering this is an unbreak now:

  1. I will continue monitoring the remaining two wikis related to the offending 20240620 run: commonswiki and wikidatawiki.
  2. In the event that another SRE gets paged, I will kill the remaining 20240620 dump runs.
  3. In a couple of days, we will start the next run of the dumps, namely the 20240701 run. I do not expect that run to be problematic, as it will leverage prefetching. However, in the event that we do see similar issues, I will work with @BTullis to lower the dumps slots and/or pause the dumps as necessary.

Let me know if this sounds reasonable.

Overall the plan sounds reasonable, although we did cause another outage last night, and thus the Commons dumps are disabled at the moment. Let's wait for the SRE mitigations to be in place before restarting them.

The trigger seems to be the duress imposed on s4:
https://grafana.wikimedia.org/d/000000278/mysql-aggregated?orgId=1&from=1719253892503&to=1719262567130&viewPanel=9&var-site=eqiad&var-group=core&var-shard=s4&var-role=All

image.png (836×1 px, 74 KB)

This caused a general slowdown, which led MediaWiki to start spitting out 5 billion logs, which congested the network and made everything slow, sustaining the downward cycle.

I need to check what kind of query to commons caused this.


Some notes:
I think the current dumps have logical errors that make them really inefficient. Let me give you an example: here are the hits to the WAN cache during the outage, at the peak of the commons dumper: https://grafana.wikimedia.org/d/2Zx07tGZz/wanobjectcache?orgId=1&from=1719088287268&to=1719102615017

image.png (780×1 px, 200 KB)

That's around 100M hits per minute. For comparison, Commons only has 100M images in total. It basically goes through the whole of Commons in one minute, for twenty minutes. Even if we assume it's going through all revisions of Commons, that means it should have been able to dump all of Commons with all of its history in the time it took to cause the outage. The fact that it takes days or weeks is quite scary to me.

Another inefficiency brought up by @daniel is that it uses memcached for getting revisions of all pages. That fills memcached with entries that would never be accessed (see miss rate in the graph above) and reduces overall efficiency of our memcached by forcing eviction of useful entries.

The other note is that the dumps are really big hammers, e.g. they are making queries with a limit of 50,000. This is way too high; the highest we go in production is 10,000. Daniel tracked it down to T207628#4690924, but I think that needs revisiting. That benchmark was done in isolation, and only once, on a small wiki. The dumps are currently doing ten to fifteen of these queries at the same time for wikis like enwiki and commons. This large hammer wasn't a big issue until we actually had to change something; it's exposing a problem that was masked by tuning in other places. I highly encourage you to lower that limit to 10,000. It won't make the dumps that much slower (certainly not five times).

I will dig even deeper but I have to go afk right now.

Change #1049182 merged by jenkins-bot:

[operations/deployment-charts@master] mediawiki: Introduce rsyslog udp2log rate limiting

https://gerrit.wikimedia.org/r/1049182

Change #1049532 had a related patch set uploaded (by Clément Goubert; author: Clément Goubert):

[operations/deployment-charts@master] mw-on-k8s: Rate limit udp2log

https://gerrit.wikimedia.org/r/1049532

Change #1049532 merged by jenkins-bot:

[operations/deployment-charts@master] mw-on-k8s: Rate limit udp2log

https://gerrit.wikimedia.org/r/1049532

Mentioned in SAL (#wikimedia-operations) [2024-06-25T12:46:28Z] <cgoubert@deploy1002> Started scap: Deploy udp2log rate-limiting - T365655 - T368098

Mentioned in SAL (#wikimedia-operations) [2024-06-25T12:51:40Z] <cgoubert@deploy1002> Finished scap: Deploy udp2log rate-limiting - T365655 - T368098 (duration: 05m 49s)

Question: what went wrong with the dump replica groups? I believe (or at least it used to be the case) that dumps only use databases in the dump replica group. Was it overloaded so that it jumped onto other DBs? If yes, can that be prevented or used to help throttle the dumps?

Question: what went wrong with the dump replica groups? I believe (or at least it used to be the case) that dumps only use databases in the dump replica group. Was it overloaded so that it jumped onto other DBs? If yes, can that be prevented or used to help throttle the dumps?

We still have a dumps group in each section, but:

  • We have had many bugs where the dumpers, for at least a portion of their queries, went to the general group.
  • Replicas in the dump group also serve general traffic (usually with lower weight), and a dump replica becoming slow can slow down a certain portion of normal requests as well. It should be a low percentage, but I assume that, combined with bugs in MW, it can trigger a flood.

...
That's around 100M hits per minute. For comparison, Commons only has 100M images in total. It basically goes through the whole of Commons in one minute, for twenty minutes. Even if we assume it's going through all revisions of Commons, that means it should have been able to dump all of Commons with all of its history in the time it took to cause the outage. The fact that it takes days or weeks is quite scary to me.

Another inefficiency brought up by @daniel is that it uses memcached for getting revisions of all pages. That fills memcached with entries that would never be accessed (see miss rate in the graph above) and reduces overall efficiency of our memcached by forcing eviction of useful entries.

The current dump implementation is certainly not ideal, but at least from my side there are no cycles to modify its general behavior. It is in maintenance mode. We are working on a new implementation that will not be coupled to MediaWiki and thus avoid having a bespoke batch system hitting the same databases that production workloads run against.

...
The other note is that the dumps are really big hammers, e.g. they are making queries with a limit of 50,000. This is way too high; the highest we go in production is 10,000. Daniel tracked it down to T207628#4690924, but I think that needs revisiting. That benchmark was done in isolation, and only once, on a small wiki. The dumps are currently doing ten to fifteen of these queries at the same time for wikis like enwiki and commons. This large hammer wasn't a big issue until we actually had to change something; it's exposing a problem that was masked by tuning in other places. I highly encourage you to lower that limit to 10,000. It won't make the dumps that much slower (certainly not five times).

Thank you for finding this. I agree this should not make dumps significantly slower. I will go ahead and change it.

I wonder what changed, though, because that 50k setting has not changed since Oct 2018, so ~6 years. I want to understand better why it is that the databases can't take the load from this particular run. In T368098#9919385 we disabled the Commons RDF dumps, but that is not explained by my prefetch theory, as the Commons RDF dumps do not rely on that mechanism.

...

  • Replicas in the dump group also serve general traffic

Can we modify this behavior? Do we need to couple dumps with general production traffic?

Change #1049617 had a related patch set uploaded (by Xcollazo; author: Xcollazo):

[mediawiki/core@master] On T207628, as part of performance tuning, WikiExporter's BATCH_SIZE was set to 50000. There was no decisive benchmarking done to support that size. We have recently seen a lot of database pressure coming from this configuration.

https://gerrit.wikimedia.org/r/1049617

Change #1049623 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Dumps: suppress blob cache while generating dumps

https://gerrit.wikimedia.org/r/1049623

Change #1049617 merged by jenkins-bot:

[mediawiki/core@master] Modify WikiExporter's BATCH_SIZE from 50000 to 10000

https://gerrit.wikimedia.org/r/1049617

Change #1049982 had a related patch set uploaded (by Ladsgroup; author: Xcollazo):

[mediawiki/core@wmf/1.43.0-wmf.11] Modify WikiExporter's BATCH_SIZE from 50000 to 10000

https://gerrit.wikimedia.org/r/1049982

Change #1049984 had a related patch set uploaded (by Ladsgroup; author: Xcollazo):

[mediawiki/core@wmf/1.43.0-wmf.10] Modify WikiExporter's BATCH_SIZE from 50000 to 10000

https://gerrit.wikimedia.org/r/1049984

WDoranWMF lowered the priority of this task from Unbreak Now! to High. Wed, Jun 26, 4:19 PM

Change #1049984 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.10] Modify WikiExporter's BATCH_SIZE from 50000 to 10000

https://gerrit.wikimedia.org/r/1049984

Change #1049982 merged by jenkins-bot:

[mediawiki/core@wmf/1.43.0-wmf.11] Modify WikiExporter's BATCH_SIZE from 50000 to 10000

https://gerrit.wikimedia.org/r/1049982

Mentioned in SAL (#wikimedia-operations) [2024-06-26T17:05:56Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:rMW1049982acf77|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]], [[gerrit:1049989|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049988|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049984|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]]

Mentioned in SAL (#wikimedia-operations) [2024-06-26T17:08:53Z] <ladsgroup@deploy1002> ladsgroup: Backport for [[gerrit:rMW1049982acf77|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]], [[gerrit:1049989|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049988|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049984|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwd

Mentioned in SAL (#wikimedia-operations) [2024-06-26T17:14:49Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:rMW1049982acf77|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]], [[gerrit:1049989|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049988|Skip failing ForeignResourceStructureTest (T362425)]], [[gerrit:1049984|Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)]] (duration: 08m 52s)

https://gerrit.wikimedia.org/r/1049617, which was pointed to as the main issue on this ticket, has been merged and backported to all prod instances.

The offending 20240620 runs for commonswiki and wikidatawiki have finished successfully.

I'd like to revert T368098#9919385 to get the Commons RDF/JSON dumps back in service. If there is no opposition, I will schedule time with @BTullis to do so early next week.

As someone involved in disabling the dumps services and Puppet on snapshot1017, this sounds good to me. I like that it's early next week and not over the weekend.

I would like to monitor the databases when the dumps start. Please ping me when next XML or RDF dumps for any of enwiki, commonswiki or wikidatawiki start. Thanks!

Great, tentatively I've scheduled time with @BTullis on Wednesday Jul 3 to enable RDF/JSON dumps of commons.

More generally, the 20240701 full dump (i.e. pages-meta-history) of all wikis is scheduled to start on Monday July 1 as well. But as mentioned before:

In a couple of days, we will start the next run of the dumps, namely the 20240701 run. I do not expect that run to be problematic, as it will leverage prefetching. However, in the event that we do see similar issues, I will work with @BTullis to lower the dumps slots and/or pause the dumps as necessary.

I would like to monitor the databases when the dumps start. Please ping me when next XML or RDF dumps for any of enwiki, commonswiki or wikidatawiki start. Thanks!

XML Full dumps of enwiki and wikidatawiki will both start on July 1 at 00:00 UTC. enwiki will run out of snapshot1012, wikidatawiki out of snapshot1011.

Today and yesterday we had another event of critical alarms firing for that reason.

Today and yesterday we had another event of critical alarms firing for that reason.

Was there a specific SQL statement associated with these events?

20240701 run update:

Almost all wikis are now done with the "First-pass for page XML data dumps" (i.e. stub-meta-history) job. commonswiki, enwiki and wikidatawiki continue running that first pass.

I still intend to re-enable the Commons RDF/JSON dumps tomorrow around 1:30 PM UTC. If you'd like me not to, please let me know.

CC @Ladsgroup

Today and yesterday we had another event of critical alarms firing for that reason.

Was there a specific SQL statement associated with these events?

Yes, the usual dumps ones:

| 282749125 | wikiadmin2023   | 10.64.0.157:53238    | enwiki | Query     |       3 | Creating sort index                                    | SELECT /* WikiExporter::dumpPages  */  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_actor,actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model  FROM `revision` JOIN `page` ON ((rev_page=page_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = rev_actor)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = rev_comment_id)) JOIN `slots` ON ((slot_revision_id = rev_id)) JOIN `content` ON ((content_id = slot_content_id))   WHERE (page_id >= 10979258 AND page_id < 10979762) AND ((rev_page > 0 OR (rev_page = 0 AND rev_id > 0)))  ORDER BY rev_page ASC,rev_id ASC LIMIT 10000 |    0.000 |

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
| 282749151 | wikiadmin2023   | 10.64.0.157:46466    | enwiki | Query     |       1 | Creating sort index                                    | SELECT /* WikiExporter::dumpPages  */  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_actor,actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model  FROM `revision` JOIN `page` ON ((rev_page=page_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = rev_actor)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = rev_comment_id)) JOIN `slots` ON ((slot_revision_id = rev_id)) JOIN `content` ON ((content_id = slot_content_id))   WHERE (page_id >= 22738801 AND page_id < 22739787) AND ((rev_page > 0 OR (rev_page = 0 AND rev_id > 0)))  ORDER BY rev_page ASC,rev_id ASC LIMIT 10000 |    0.000 |

| 282748763 | wikiadmin2023   | 10.64.0.157:36340    | enwiki | Query     |      13 | Creating sort index                                    | SELECT /* WikiExporter::dumpPages  */  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_actor,actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model  FROM `revision` JOIN `page` ON ((rev_page=page_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = rev_actor)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = rev_comment_id)) JOIN `slots` ON ((slot_revision_id = rev_id)) JOIN `content` ON ((content_id = slot_content_id))   WHERE (page_id >= 6505200 AND page_id < 6506201) AND ((rev_page > 0 OR (rev_page = 0 AND rev_id > 0)))  ORDER BY rev_page ASC,rev_id ASC LIMIT 10000 |    0.000 |
| 282748808 | wikiadmin2023   | 10.64.0.157:36358    | enwiki | Sleep     |      12 |                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |    0.000 |
| 282748809 | wikiadmin2023   | 10.64.0.157:36370    | enwiki | Query     |      12 | Creating sort index                                    | SELECT /* WikiExporter::dumpPages  */  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_actor,actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model  FROM `revision` JOIN `page` ON ((rev_page=page_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = rev_actor)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = rev_comment_id)) JOIN `slots` ON ((slot_revision_id = rev_id)) JOIN `content` ON ((content_id = slot_content_id))   WHERE (page_id >= 4980972 AND page_id < 4981707) AND ((rev_page > 0 OR (rev_page = 0 AND rev_id > 0)))  ORDER BY rev_page ASC,rev_id ASC LIMIT 10000 |    0.000 |

 282746607 | wikiadmin2023   | 10.64.0.157:35144    | enwiki | Query     |       1 | Creating sort index                                    | SELECT /* WikiExporter::dumpPages  */  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_actor,actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model  FROM `revision` JOIN `page` ON ((rev_page=page_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = rev_actor)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = rev_comment_id)) JOIN `slots` ON ((slot_revision_id = rev_id)) JOIN `content` ON ((content_id = slot_content_id))   WHERE (page_id >= 8536201 AND page_id < 8536202) AND ((rev_page > 8536201 OR (rev_page = 8536201 AND rev_id > 269566474)))  ORDER BY rev_page ASC,rev_id ASC LIMIT 10000 |    0.000 |
| 282748666 | wikiadmin2023   | 10.64.0.157:36212    | enwiki | Sleep     |       0 |                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |    0.000 |
| 282748667 | wikiadmin2023   | 10.64.0.157:36218    | enwiki | Query     |       0 | Creating sort index                                    | SELECT /* WikiExporter::dumpPages  */  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_actor,actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model  FROM `revision` JOIN `page` ON ((rev_page=page_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = rev_actor)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = rev_comment_id)) JOIN `slots` ON ((slot_revision_id = rev_id)) JOIN `content` ON ((content_id = slot_content_id))   WHERE (page_id >= 13509439 AND page_id < 13510775) AND ((rev_page > 13510300 OR (rev_page = 13510300 AND rev_id > 161409703)))  ORDER BY rev_page ASC,rev_id ASC LIMIT 10000 |    0.000 |
| 282748700 | wikiadmin2023   | 10.64.0.157:36256    | enwiki | Sleep     |       0 |                                                        | NULL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |    0.000 |
| 282748701 | wikiadmin2023   | 10.64.0.157:36270    | enwiki | Query     |       0 | Creating sort index                                    | SELECT /* WikiExporter::dumpPages  */  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_actor,actor_rev_user.actor_user AS `rev_user`,actor_rev_user.actor_name AS `rev_user_text`,rev_timestamp,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,comment_rev_comment.comment_text AS `rev_comment_text`,comment_rev_comment.comment_data AS `rev_comment_data`,comment_rev_comment.comment_id AS `rev_comment_cid`,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,slot_revision_id,slot_content_id,slot_origin,slot_role_id,content_size,content_sha1,content_address,content_model  FROM `revision` JOIN `page` ON ((rev_page=page_id)) JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = rev_actor)) JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = rev_comment_id)) JOIN `slots` ON ((slot_revision_id = rev_id)) JOIN `content` ON ((content_id = slot_content_id))   WHERE (page_id >= 25980219 AND page_id < 25981031) AND ((rev_page > 25980636 OR (rev_page = 25980636 AND rev_id > 476895481)))  ORDER BY rev_page ASC,rev_id ASC LIMIT 10000 |    0.000 |

The explain:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: range
possible_keys: PRIMARY,rev_actor_timestamp,rev_page_actor_timestamp,rev_page_timestamp
          key: rev_page_actor_timestamp
      key_len: 4
          ref: NULL
         rows: 19610
        Extra: Using index condition; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: enwiki.revision.rev_page
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor_rev_user
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: enwiki.revision.rev_actor
         rows: 1
        Extra: 
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: comment_rev_comment
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: enwiki.revision.rev_comment_id
         rows: 1
        Extra: 
*************************** 5. row ***************************
           id: 1
  select_type: SIMPLE

It might be that it's picking the wrong index and needs an index hint to pick rev_page_timestamp instead of rev_page_actor_timestamp, but that needs double-checking.

Also, the fact that most of the time is being spent on "Creating sort index" suggests we should probably rethink the ORDER BY clause. rev_id is not a good ORDER BY column; rev_timestamp is usually better (but since rev_timestamp is a string, maybe that'd be slower).

The prefetch has been done now, so these are causing issues even with prefetch being accessible. Correct?

The prefetch has been done now, so these are causing issues even with prefetch being accessible. Correct?

Correct.

Ok I am going to postpone re-enabling the Commons RDF/JSON dumps until we have a chance to discuss this issue better.

I played with the offending SQL statements from T368098#9947893 by simulating the reads that Dumps does in 1000-page increments. I ran each query 3 times on an idle Analytics enwiki replica.

Here are the numbers I got:

Original query:

10000 rows in set (2 min 12.108 sec)
10000 rows in set (2 min 12.710 sec)
10000 rows in set (1 min 32.748 sec)

Change (1) USE INDEX(rev_page_timestamp):

10000 rows in set (1 min 41.896 sec)
10000 rows in set (1 min 31.514 sec)
10000 rows in set (1 min 10.011 sec)

Change (2) ORDER BY rev_page ASC, rev_timestamp ASC:

10000 rows in set (7.030 sec)
10000 rows in set (7.387 sec)
10000 rows in set (7.202 sec)

Changing both (1) and (2):

10000 rows in set (6.587 sec)
10000 rows in set (6.590 sec)
10000 rows in set (6.656 sec)
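
For clarity, here is a condensed sketch of what changes (1) and (2) look like when applied to the dumpPages query. The column list is abbreviated and the page_id range is just the one from the processlist example above, so treat this as an illustration of the query shape rather than the exact statement WikiExporter emits:

SELECT /* WikiExporter::dumpPages */ /*! STRAIGHT_JOIN */
    rev_id, rev_page, rev_timestamp -- ...remaining revision/page/comment/slots/content columns elided...
FROM `revision` USE INDEX (rev_page_timestamp)   -- change (1): index hint
JOIN `page` ON ((rev_page = page_id))
JOIN `actor` `actor_rev_user` ON ((actor_rev_user.actor_id = rev_actor))
JOIN `comment` `comment_rev_comment` ON ((comment_rev_comment.comment_id = rev_comment_id))
JOIN `slots` ON ((slot_revision_id = rev_id))
JOIN `content` ON ((content_id = slot_content_id))
WHERE (page_id >= 10979258 AND page_id < 10979762)
  AND ((rev_page > 0 OR (rev_page = 0 AND rev_id > 0)))
ORDER BY rev_page ASC, rev_timestamp ASC         -- change (2): sort by rev_timestamp instead of rev_id
LIMIT 10000;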

Original plan:

+------+-------------+---------------------+--------+-------------------------------------------------------------------------+--------------------------+---------+--------------------------------+---------+---------------------------------------+
| id   | select_type | table               | type   | possible_keys                                                           | key                      | key_len | ref                            | rows    | Extra                                 |
+------+-------------+---------------------+--------+-------------------------------------------------------------------------+--------------------------+---------+--------------------------------+---------+---------------------------------------+
|    1 | SIMPLE      | revision            | range  | PRIMARY,rev_actor_timestamp,rev_page_actor_timestamp,rev_page_timestamp | rev_page_actor_timestamp | 4       | NULL                           | 1590902 | Using index condition; Using filesort |
|    1 | SIMPLE      | page                | eq_ref | PRIMARY                                                                 | PRIMARY                  | 4       | enwiki.revision.rev_page       | 1       |                                       |
|    1 | SIMPLE      | actor_rev_user      | eq_ref | PRIMARY                                                                 | PRIMARY                  | 8       | enwiki.revision.rev_actor      | 1       |                                       |
|    1 | SIMPLE      | comment_rev_comment | eq_ref | PRIMARY                                                                 | PRIMARY                  | 8       | enwiki.revision.rev_comment_id | 1       |                                       |
|    1 | SIMPLE      | slots               | ref    | PRIMARY,slot_revision_origin_role                                       | PRIMARY                  | 8       | enwiki.revision.rev_id         | 1       | Using where                           |
|    1 | SIMPLE      | content             | eq_ref | PRIMARY                                                                 | PRIMARY                  | 8       | enwiki.slots.slot_content_id   | 1       |                                       |
+------+-------------+---------------------+--------+-------------------------------------------------------------------------+--------------------------+---------+--------------------------------+---------+---------------------------------------+
6 rows in set (0.004 sec)

How the plan looks with changes (1) and (2):

+------+-------------+---------------------+--------+-----------------------------------+--------------------+---------+--------------------------------+---------+-----------------------+
| id   | select_type | table               | type   | possible_keys                     | key                | key_len | ref                            | rows    | Extra                 |
+------+-------------+---------------------+--------+-----------------------------------+--------------------+---------+--------------------------------+---------+-----------------------+
|    1 | SIMPLE      | revision            | range  | rev_page_timestamp                | rev_page_timestamp | 4       | NULL                           | 1653850 | Using index condition |
|    1 | SIMPLE      | page                | eq_ref | PRIMARY                           | PRIMARY            | 4       | enwiki.revision.rev_page       | 1       |                       |
|    1 | SIMPLE      | actor_rev_user      | eq_ref | PRIMARY                           | PRIMARY            | 8       | enwiki.revision.rev_actor      | 1       |                       |
|    1 | SIMPLE      | comment_rev_comment | eq_ref | PRIMARY                           | PRIMARY            | 8       | enwiki.revision.rev_comment_id | 1       |                       |
|    1 | SIMPLE      | slots               | ref    | PRIMARY,slot_revision_origin_role | PRIMARY            | 8       | enwiki.revision.rev_id         | 1       | Using where           |
|    1 | SIMPLE      | content             | eq_ref | PRIMARY                           | PRIMARY            | 8       | enwiki.slots.slot_content_id   | 1       |                       |
+------+-------------+---------------------+--------+-----------------------------------+--------------------+---------+--------------------------------+---------+-----------------------+
6 rows in set (0.004 sec)

So the biggest win here is avoiding that filesort when using rev_timestamp ASC instead of rev_id ASC as suggested by @Ladsgroup.

This is great, but I am now going to inspect the code to check whether the semantics would change given the ordering change.

In T29112: Select of revisions for stub history files does not explicitly order revisions, they modified the code to ORDER BY page_id ASC, rev_id ASC explicitly, instead of just ORDER BY page_id ASC.

The rationale from that ticket is that 1) we'd like to have dumps where the order of the entries doesn't change between dumps and 2) they thought this ordering was necessary for the prefetching mechanism to work correctly (T29112#313040).

Looking at prefetch code, I concur that the way prefetch works today requires ordering by rev_id. Here is an example:

ORDER BY page_id ASC, rev_id ASC:

+----------+----------+----------------+
| rev_id   | rev_page | rev_timestamp  |
+----------+----------+----------------+
| 15917023 |    20002 | 20020225154311 |
| 67754089 |    20002 | 20060805022803 |
|    95189 |    20003 | 20020225155115 |
|    95191 |    20003 | 20020615034657 |
|  2316646 |    20003 | 20020615034917 |
|  2321294 |    20003 | 20040206113455 |
|  3823213 |    20003 | 20040206222306 |
|  3823219 |    20003 | 20040601134621 |
|  4057845 |    20003 | 20040601134713 |
|  6598000 |    20003 | 20040612204203 |
+----------+----------+----------------+
10 rows in set (4.989 sec)

ORDER BY page_id ASC, rev_timestamp ASC:

+-----------+----------+----------------+
| rev_id    | rev_page | rev_timestamp  |
+-----------+----------+----------------+
|  15917023 |    20002 | 20020225154311 |
|  67754089 |    20002 | 20060805022803 |
| 621927320 |    20003 | 20010925011251 | <<<<<<<<<<<<<<<
|     95189 |    20003 | 20020225155115 |
|     95191 |    20003 | 20020615034657 |
|   2316646 |    20003 | 20020615034917 |
|   2321294 |    20003 | 20040206113455 |
|   3823213 |    20003 | 20040206222306 |
|   3823219 |    20003 | 20040601134621 |
|   4057845 |    20003 | 20040601134713 |
+-----------+----------+----------------+

That single row change would make the current prefetch() code miss almost all the revisions from rev_page = 20003, because the code has to walk the prefetched XML all the way to rev_id = 621927320 and it can't walk back. Changing this behavior would require prefetch() surgery to make it work via rev_timestamp instead, and we'd also have to suffer through another month or two of no prefetch being available. Not fun.

It seems, however, that in T29112, they were able to make this query (well the version of this query from 13 years ago) execute without the filesort. If anyone can see what I am missing from the T29112 context to make this so, please LMK.
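
For reference, here is a minimal sketch of queries along the lines of the comparison above (assuming the enwiki revision table; the page range and LIMIT are illustrative):

-- current dump ordering, which the forward-only prefetch() walk relies on
SELECT rev_id, rev_page, rev_timestamp
FROM revision
WHERE rev_page IN (20002, 20003)
ORDER BY rev_page ASC, rev_id ASC
LIMIT 10;

-- candidate ordering: the restored revision 621927320 (old timestamp, new rev_id)
-- sorts to the top of rev_page = 20003, which prefetch() cannot walk back to
SELECT rev_id, rev_page, rev_timestamp
FROM revision
WHERE rev_page IN (20002, 20003)
ORDER BY rev_page ASC, rev_timestamp ASC
LIMIT 10;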

That single row change would make the current prefetch() code miss almost all the revisions from rev_page = 20003, because the code has to walk the prefetched XML all the way to rev_id = 621927320 and it can't walk back. Changing this behavior would require prefetch() surgery to make it work via rev_timestamp instead, and we'd also have to suffer through another month or two of no prefetch being available. Not fun.

Here is the thing: rev_timestamp is the better way to order revisions, because when a page is deleted its rev_ids are removed, and when it's undeleted the revisions get reinserted with the correct timestamps but newer rev_ids. So a proper dump should order revisions based on timestamp, not rev_id. I wonder if there is a way to keep missing prefetches to a minimum.

It seems, however, that in T29112, they were able to make this query (well the version of this query from 13 years ago) execute without the filesort. If anyone can see what I am missing from the T29112 context to make this so, please LMK.

Table stats can be out of sync, newer versions of MariaDB can have bugs, we have probably changed the schema of the revision table since then, etc.

It seems, however, that in T29112, they were able to make this query (well the version of this query from 13 years ago) execute without the filesort. If anyone can see what I am missing from the T29112 context to make this so, please LMK.

Thirteen years ago is an eternity in terms of how the MariaDB optimizer behaves, not to mention the changes in InnoDB and table statistics over time. Even between minor versions, there can be differences in the query plans the optimizer decides to choose based on its own statistics. We've filed many bugs with MariaDB regarding this, but it is very hard to control - probably even for them - due to the vast number of possibilities, mostly because it depends on the table statistics.

Sometimes these issues can be addressed by refreshing the statistics, but this is not a guarantee that they won't arise again (they most likely will)

Folks, today I found snapshot1017 with Puppet disabled for more than a week, and a generic outage message. I ran Puppet again since the host needed to get updates, and then I stopped all the timers and services with "dumps" in their title.

Let's try to figure out what state the host should be in; it is fine to keep it stopped for a while, but we should puppetize the state. Lemme know :)

I don't think we should be leaving a host with Puppet disabled for such a long time, but instead disable the jobs we consider problematic or still under investigation.
Thanks Luca!

Change #1052752 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Puppetize the disabling of the misc dumps on snapshot1017

https://gerrit.wikimedia.org/r/1052752

Folks, today I found snapshot1017 with Puppet disabled for more than a week, and a generic outage message. I ran Puppet again since the host needed to get updates, and then I stopped all the timers and services with "dumps" in their title.

Let's try to figure out what state the host should be in; it is fine to keep it stopped for a while, but we should puppetize the state. Lemme know :)

I don't think we should be leaving a host with Puppet disabled for such a long time, but instead disable the jobs we consider problematic or still under investigation.
Thanks Luca!

This happened as an emergency measure back in T368098#9919385.

I intend to revert it this Wednesday at ~1:30PM UTC jointly with @BTullis.

Change #1052752 merged by Btullis:

[operations/puppet@production] Puppetize the disabling of the misc dumps on snapshot1017

https://gerrit.wikimedia.org/r/1052752

That single row change would make the current prefetch() code miss almost all the revisions from rev_page = 20003, because the code has to walk the prefetched XML all the way to rev_id = 621927320 and it can't walk back. Changing this behavior would require prefetch() surgery to make it work via rev_timestamp instead, and we'd also have to suffer through another month or two of no prefetch being available. Not fun.

Here is the thing: rev_timestamp is the better way to order revisions, because when a page is deleted its rev_ids are removed, and when it's undeleted the revisions get reinserted with the correct timestamps but newer rev_ids. So a proper dump should order revisions based on timestamp, not rev_id. I wonder if there is a way to keep missing prefetches to a minimum.

Agreed, rev_timestamp is best, both for correctness as you mention and for performance as per the analysis above. The issue is the surgery required in prefetch() and its callers. I would rather invest time in continuing Dumps 2.0 than redesign this prefetch mechanism.

Change #1053315 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Revert "Puppetize the disabling of the misc dumps on snapshot1017"

https://gerrit.wikimedia.org/r/1053315

Change #1053315 merged by Btullis:

[operations/puppet@production] Revert "Puppetize the disabling of the misc dumps on snapshot1017"

https://gerrit.wikimedia.org/r/1053315

These timers were inactive and are going to be activated by this change:

n/a                         n/a                 Mon 2024-07-08 05:00:00 UTC 2 days ago         categoriesrdf-dump-daily.timer                    categoriesrdf-dump-daily.service
n/a                         n/a                 Sat 2024-07-06 20:00:00 UTC 3 days ago         categoriesrdf-dump.timer                          categoriesrdf-dump.service
n/a                         n/a                 Mon 2024-07-01 16:15:00 UTC 1 weeks 1 days ago cirrussearch-dump-s1.timer                        cirrussearch-dump-s1.service
n/a                         n/a                 Mon 2024-07-01 16:15:00 UTC 1 weeks 1 days ago cirrussearch-dump-s11.timer                       cirrussearch-dump-s11.service
n/a                         n/a                 Mon 2024-07-01 16:15:00 UTC 1 weeks 1 days ago cirrussearch-dump-s2.timer                        cirrussearch-dump-s2.service
n/a                         n/a                 Mon 2024-07-01 16:15:00 UTC 1 weeks 1 days ago cirrussearch-dump-s3.timer                        cirrussearch-dump-s3.service
n/a                         n/a                 Mon 2024-07-01 16:15:00 UTC 1 weeks 1 days ago cirrussearch-dump-s4.timer                        cirrussearch-dump-s4.service
n/a                         n/a                 Mon 2024-07-01 16:15:00 UTC 1 weeks 1 days ago cirrussearch-dump-s5.timer                        cirrussearch-dump-s5.service
n/a                         n/a                 Mon 2024-07-01 16:15:00 UTC 1 weeks 1 days ago cirrussearch-dump-s6.timer                        cirrussearch-dump-s6.service
n/a                         n/a                 Mon 2024-07-01 16:15:00 UTC 1 weeks 1 days ago cirrussearch-dump-s7.timer                        cirrussearch-dump-s7.service
n/a                         n/a                 Mon 2024-07-01 16:15:00 UTC 1 weeks 1 days ago cirrussearch-dump-s8.timer                        cirrussearch-dump-s8.service

This is the natural schedule of when the dumps that were disabled will run next.

btullis@snapshot1017:~$ systemctl list-timers | grep n/a
Wed 2024-07-10 23:00:00 UTC 9h left             n/a                         n/a                wikidatardf-truthy-dumps.timer                    wikidatardf-truthy-dumps.service
Thu 2024-07-11 05:00:00 UTC 15h left            n/a                         n/a                categoriesrdf-dump-daily.timer                    categoriesrdf-dump-daily.service
Fri 2024-07-12 09:10:00 UTC 1 day 19h left      n/a                         n/a                xlation-dumps.timer                               xlation-dumps.service
Fri 2024-07-12 23:00:00 UTC 2 days left         n/a                         n/a                wikidatardf-lexemes-dumps.timer                   wikidatardf-lexemes-dumps.service
Sat 2024-07-13 08:15:00 UTC 2 days left         n/a                         n/a                global_blocks_dump.timer                          global_blocks_dump.service
Sat 2024-07-13 08:15:00 UTC 2 days left         n/a                         n/a                growth_mentorship_dump.timer                      growth_mentorship_dump.service
Sat 2024-07-13 20:00:00 UTC 3 days left         n/a                         n/a                categoriesrdf-dump.timer                          categoriesrdf-dump.service
Sun 2024-07-14 19:00:00 UTC 4 days left         n/a                         n/a                commonsrdf-dump.timer                             commonsrdf-dump.service
Mon 2024-07-15 03:15:00 UTC 4 days left         n/a                         n/a                commonsjson-dump.timer                            commonsjson-dump.service
Mon 2024-07-15 03:15:00 UTC 4 days left         n/a                         n/a                wikidatajson-dump.timer                           wikidatajson-dump.service
Mon 2024-07-15 16:15:00 UTC 5 days left         n/a                         n/a                cirrussearch-dump-s1.timer                        cirrussearch-dump-s1.service
Mon 2024-07-15 16:15:00 UTC 5 days left         n/a                         n/a                cirrussearch-dump-s11.timer                       cirrussearch-dump-s11.service
Mon 2024-07-15 16:15:00 UTC 5 days left         n/a                         n/a                cirrussearch-dump-s2.timer                        cirrussearch-dump-s2.service
Mon 2024-07-15 16:15:00 UTC 5 days left         n/a                         n/a                cirrussearch-dump-s3.timer                        cirrussearch-dump-s3.service
Mon 2024-07-15 16:15:00 UTC 5 days left         n/a                         n/a                cirrussearch-dump-s4.timer                        cirrussearch-dump-s4.service
Mon 2024-07-15 16:15:00 UTC 5 days left         n/a                         n/a                cirrussearch-dump-s5.timer                        cirrussearch-dump-s5.service
Mon 2024-07-15 16:15:00 UTC 5 days left         n/a                         n/a                cirrussearch-dump-s6.timer                        cirrussearch-dump-s6.service
Mon 2024-07-15 16:15:00 UTC 5 days left         n/a                         n/a                cirrussearch-dump-s7.timer                        cirrussearch-dump-s7.service
Mon 2024-07-15 16:15:00 UTC 5 days left         n/a                         n/a                cirrussearch-dump-s8.timer                        cirrussearch-dump-s8.service
Mon 2024-07-15 23:00:00 UTC 5 days left         n/a                         n/a                wikidatardf-all-dumps.timer                       wikidatardf-all-dumps.service
Wed 2024-07-17 03:15:00 UTC 6 days left         n/a                         n/a                wikidatajson-lexemes-dump.timer                   wikidatajson-lexemes-dump.service

We could see if any of them should be triggered manually, or we could just wait for them to start by themselves.