
Some Wikidata + MediaInfo dumps missing for week of 2024-07-08
Closed, Resolved (Public)

Description

Normally the dumps from last week should be available by now, but on https://dumps.wikimedia.org/other/wikibase/wikidatawiki/, the latest Wikidata “all” entities dumps are from 4 and 5 July (i.e. the week of 2024-07-01). “Lexeme” and “truthy” dumps are mostly available from last week (12 and 13 July), except JSON Lexeme dumps which are also from the week before (3 July).

https://dumps.wikimedia.org/other/wikibase/commonswiki/ is even more outdated, with the “latest” dumps all dating from mid-June 2024. Some later directories (e.g. 20240715) exist but are empty.
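The "week of" reasoning above can be checked with a small date calculation. This is only a sketch: it assumes dump directories are named YYYYMMDD (as in the 20240715 example) and that a week starts on Monday; the helper names `week_start` and `dump_week` are made up for illustration.

```python
from datetime import date, datetime, timedelta


def week_start(d: date) -> date:
    """Return the Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def dump_week(dirname: str) -> date:
    """Parse a YYYYMMDD directory name and return the Monday of its week."""
    return week_start(datetime.strptime(dirname, "%Y%m%d").date())


# Directory names derived from the dates mentioned in the report (assumed format).
for name in ("20240703", "20240704", "20240705", "20240712", "20240713"):
    print(name, "-> week of", dump_week(name).isoformat())
```

Under these assumptions, the 3, 4 and 5 July dumps fall in the week of 2024-07-01, and the 12 and 13 July dumps in the week of 2024-07-08.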

I haven’t found any relevant-looking errors in Logstash yet.

Previously (no idea if related): T366043: Some dumps are not available since mid may 2024

Event Timeline

Apparently the dumps were recently disabled (T368098)? And then more recently re-enabled. So I guess we just wait for the next dump run to kick off (in a couple of hours or days, depending on the dump) and hope they work again?

I can give a status update here, which I hope will be useful.

You are correct, the other dumps that run on snapshot1017 were stopped and subsequently disabled last month because they were implicated (possibly tangentially) in a database-related site outage.

Some of them were re-enabled on July 8th during a manual puppet run (T368098#9959951). The remainder were re-enabled on July 10th when a puppet patch was merged (T368098#9969342).

Their current state is:

btullis@snapshot1017:~$ systemctl list-timers --all $(for t in $(cat other-timers.txt); do echo $t.timer;done)
NEXT                        LEFT           LAST                        PASSED        UNIT                            ACTIVATES
Mon 2024-07-15 20:50:00 UTC 3h 52min left  Sun 2024-07-14 20:50:00 UTC 20h ago       adds-changes.timer              adds-changes.service
Mon 2024-07-15 23:00:00 UTC 6h left        n/a                         n/a           wikidatardf-all-dumps.timer     wikidatardf-all-dumps.service
Tue 2024-07-16 05:00:00 UTC 12h left       Mon 2024-07-15 05:00:00 UTC 11h ago       categoriesrdf-dump-daily.timer  categoriesrdf-dump-daily.service
Tue 2024-07-16 08:10:00 UTC 15h left       Mon 2024-07-15 08:10:00 UTC 8h ago        pagetitles-ns0.timer            pagetitles-ns0.service
Tue 2024-07-16 08:50:00 UTC 15h left       Mon 2024-07-15 08:50:00 UTC 8h ago        pagetitles-ns6.timer            pagetitles-ns6.service
Wed 2024-07-17 03:15:00 UTC 1 day 10h left n/a                         n/a           wikidatajson-lexemes-dump.timer wikidatajson-lexemes-dump.service
Wed 2024-07-17 23:00:00 UTC 2 days left    Wed 2024-07-10 23:00:00 UTC 4 days ago    wikidatardf-truthy-dumps.timer  wikidatardf-truthy-dumps.service
Fri 2024-07-19 09:10:00 UTC 3 days left    Fri 2024-07-12 09:10:00 UTC 3 days ago    xlation-dumps.timer             xlation-dumps.service
Fri 2024-07-19 23:00:00 UTC 4 days left    Fri 2024-07-12 23:00:00 UTC 2 days ago    wikidatardf-lexemes-dumps.timer wikidatardf-lexemes-dumps.service
Sat 2024-07-20 08:15:00 UTC 4 days left    Sat 2024-07-13 08:15:00 UTC 2 days ago    global_blocks_dump.timer        global_blocks_dump.service
Sat 2024-07-20 08:15:00 UTC 4 days left    Sat 2024-07-13 08:15:00 UTC 2 days ago    growth_mentorship_dump.timer    growth_mentorship_dump.service
Sat 2024-07-20 20:00:00 UTC 5 days left    Sat 2024-07-13 20:00:01 UTC 1 day 20h ago categoriesrdf-dump.timer        categoriesrdf-dump.service
Sun 2024-07-21 07:10:00 UTC 5 days left    Sun 2024-07-14 07:10:01 UTC 1 day 9h ago  list-media-per-project.timer    list-media-per-project.service
Mon 2024-07-22 08:05:00 UTC 6 days left    Mon 2024-07-15 08:05:00 UTC 8h ago        shorturls.timer                 shorturls.service
Mon 2024-07-22 16:15:00 UTC 6 days left    Mon 2024-07-15 16:15:00 UTC 42min ago     cirrussearch-dump-s11.timer     cirrussearch-dump-s11.service
n/a                         n/a            Mon 2024-07-15 16:15:00 UTC 42min ago     cirrussearch-dump-s1.timer      cirrussearch-dump-s1.service
n/a                         n/a            Mon 2024-07-15 16:15:00 UTC 42min ago     cirrussearch-dump-s2.timer      cirrussearch-dump-s2.service
n/a                         n/a            Mon 2024-07-15 16:15:00 UTC 42min ago     cirrussearch-dump-s3.timer      cirrussearch-dump-s3.service
n/a                         n/a            Mon 2024-07-15 16:15:00 UTC 42min ago     cirrussearch-dump-s4.timer      cirrussearch-dump-s4.service
n/a                         n/a            Mon 2024-07-15 16:15:00 UTC 42min ago     cirrussearch-dump-s5.timer      cirrussearch-dump-s5.service
n/a                         n/a            Mon 2024-07-15 16:15:00 UTC 42min ago     cirrussearch-dump-s6.timer      cirrussearch-dump-s6.service
n/a                         n/a            Mon 2024-07-15 16:15:00 UTC 42min ago     cirrussearch-dump-s7.timer      cirrussearch-dump-s7.service
n/a                         n/a            Mon 2024-07-15 16:15:00 UTC 42min ago     cirrussearch-dump-s8.timer      cirrussearch-dump-s8.service
n/a                         n/a            Mon 2024-07-15 03:15:00 UTC 13h ago       commonsjson-dump.timer          commonsjson-dump.service
n/a                         n/a            Sun 2024-07-14 19:00:01 UTC 21h ago       commonsrdf-dump.timer           commonsrdf-dump.service
n/a                         n/a            Mon 2024-07-15 03:15:00 UTC 13h ago       wikidatajson-dump.timer         wikidatajson-dump.service

26 timers listed.
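To make a listing like the one above easier to scan, the rows can be grouped by whether a timer has a LAST or a NEXT value. The following is a rough sketch, not a tool that exists on the snapshot hosts: the regular expression assumes the six-column layout shown above, and the state names (`never-run`, `no-next-elapse`, `scheduled`) are labels invented here. In this listing, the rows with NEXT `n/a` appear to belong to services that are still running, but that reading is an assumption.

```python
import re

# A timestamp column is either "n/a" or e.g. "Mon 2024-07-15 23:00:00 UTC".
TS = r"(?:n/a|\w{3} \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w+)"
ROW = re.compile(
    rf"^(?P<next>{TS})\s+(?P<left>n/a|.+? left)\s+"
    rf"(?P<last>{TS})\s+(?P<passed>n/a|.+? ago)\s+"
    rf"(?P<unit>\S+)\s+(?P<activates>\S+)\s*$"
)


def classify(line: str):
    """Return (unit, state) for one data row, or None for headers and footers."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    if m["last"] == "n/a":
        state = "never-run"        # no recorded run yet
    elif m["next"] == "n/a":
        state = "no-next-elapse"   # has run, nothing scheduled right now
    else:
        state = "scheduled"
    return m["unit"], state


# Three rows copied from the listing above.
sample = """\
Mon 2024-07-15 23:00:00 UTC 6h left        n/a                         n/a           wikidatardf-all-dumps.timer     wikidatardf-all-dumps.service
Wed 2024-07-17 23:00:00 UTC 2 days left    Wed 2024-07-10 23:00:00 UTC 4 days ago    wikidatardf-truthy-dumps.timer  wikidatardf-truthy-dumps.service
n/a                         n/a            Sun 2024-07-14 19:00:01 UTC 21h ago       commonsrdf-dump.timer           commonsrdf-dump.service
"""

for row in sample.splitlines():
    result = classify(row)
    if result is not None:
        print(*result)
```

The matching is done per column rather than by splitting on whitespace, because values such as "1 day 10h left" contain spaces themselves.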

So some of the larger dumps, like commonsrdf-dump, are running for the first time since being re-enabled. For example:

btullis@snapshot1017:~$ systemctl status commonsrdf-dump.service
● commonsrdf-dump.service - Regular jobs to build rdf snapshot of commons structured data
     Loaded: loaded (/lib/systemd/system/commonsrdf-dump.service; static)
     Active: activating (start) since Sun 2024-07-14 19:00:01 UTC; 22h ago
TriggeredBy: ● commonsrdf-dump.timer
       Docs: https://wikitech.wikimedia.org/wiki/Monitoring/systemd_unit_state
   Main PID: 3647836 (systemd-timer-m)
      Tasks: 26 (limit: 76561)
     Memory: 8.7G
     CGroup: /system.slice/commonsrdf-dump.service
             ├─ 283248 /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --shard 0 --sharding-factor 8 --batch-size 2000 --format ttl ->
             ├─ 283249 gzip -9
             ├─ 283263 /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --shard 3 --sharding-factor 8 --batch-size 2000 --format ttl ->
             ├─ 283264 gzip -9
             ├─ 283382 /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --shard 2 --sharding-factor 8 --batch-size 2000 --format ttl ->
             ├─ 283383 gzip -9
             ├─ 283439 /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --shard 1 --sharding-factor 8 --batch-size 2000 --format ttl ->
             ├─ 283440 gzip -9
             ├─ 283865 /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --shard 4 --sharding-factor 8 --batch-size 2000 --format ttl ->
             ├─ 283866 gzip -9
             ├─ 283947 /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --shard 7 --sharding-factor 8 --batch-size 2000 --format ttl ->
             ├─ 283948 gzip -9
             ├─ 283964 /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --shard 6 --sharding-factor 8 --batch-size 2000 --format ttl ->
             ├─ 283965 gzip -9
             ├─ 283977 /usr/bin/php7.4 /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --shard 5 --sharding-factor 8 --batch-size 2000 --format ttl ->
             ├─ 283978 gzip -9
             ├─3647836 /usr/bin/python3 /usr/local/bin/systemd-timer-mail-wrapper --subject commonsrdf-dump --mail-to root@snapshot1017.eqiad.wmnet --only-on-error /usr/local/bin/dumpwikibaserdf.sh -p commons ->
             ├─3647845 /bin/bash /usr/local/bin/dumpwikibaserdf.sh -p commons -d mediainfo -f ttl -e nt
             ├─3647878 /bin/bash /usr/local/bin/dumpwikibaserdf.sh -p commons -d mediainfo -f ttl -e nt
             ├─3647879 /bin/bash /usr/local/bin/dumpwikibaserdf.sh -p commons -d mediainfo -f ttl -e nt
             ├─3647880 /bin/bash /usr/local/bin/dumpwikibaserdf.sh -p commons -d mediainfo -f ttl -e nt
             ├─3647882 /bin/bash /usr/local/bin/dumpwikibaserdf.sh -p commons -d mediainfo -f ttl -e nt
             ├─3647884 /bin/bash /usr/local/bin/dumpwikibaserdf.sh -p commons -d mediainfo -f ttl -e nt
             ├─3647885 /bin/bash /usr/local/bin/dumpwikibaserdf.sh -p commons -d mediainfo -f ttl -e nt
             ├─3647888 /bin/bash /usr/local/bin/dumpwikibaserdf.sh -p commons -d mediainfo -f ttl -e nt
             └─3647889 /bin/bash /usr/local/bin/dumpwikibaserdf.sh -p commons -d mediainfo -f ttl -e nt

Jul 14 19:00:01 snapshot1017 systemd[1]: Starting Regular jobs to build rdf snapshot of commons structured data...

I hope that's of some help. We are working hard to get back to an even keel with these dumps, so apologies for any inconvenience that their being delayed may have caused you.