No wikidata dumps this week (20250630)
Closed, Resolved | Public | Bug Report

Description

Hello,
In https://dumps.wikimedia.org/wikidatawiki/entities/ , this week's directory (20250630) is empty.
Could you check why, so that next week's dumps have a chance of being generated?

Thanks

Event Timeline

Aklapper changed the subtype of this task from "Task" to "Bug Report". (Jul 5 2025, 4:05 PM)
Aklapper subscribed.

@Melderick: Thanks for reporting this. For future reference, please use the bug report form (linked from the top of the task creation page) to create a bug report. Thanks!

This is on the Data Platform SRE team radar.

Hello,
Apologies for this. The dumps have been created, but they have been inadvertently published to the wrong location.
This is related to our recent work on T352650: WE 5.4 KR - Hypothesis 5.4.4 - Q3 FY24/25 - Migrate current-generation dumps to run on kubernetes. The wikibase dumps were handled as part of T394389: Migrate the additional dump types from snapshot1016 to Airflow.

I believe that the wrong path was configured here: https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/blob/main/test_k8s/dags/dumps/mediawiki_wikibase_dumps.py?ref_type=heads#L233
...and the result is that instead of your files appearing where you expected: https://dumps.wikimedia.org/wikidatawiki/entities/
...they are instead here: https://dumps.wikimedia.org/other/wikidatawiki/

My next step will be to validate this assumption and correct the destination path, then move the dump files to their correct location.

(Turns out, image suggestions, and thus Growth and Apps, don't actually depend on Wikidata dumps anymore since T394757? Removing our tag again.)

The wikidatawiki/entities link works because of this symlink:

btullis@clouddumps1002:/srv/dumps/xmldatadumps/public$ ls -l wikidatawiki/
total 7700
drwxr-xr-x 2 dumpsgen dumpsgen 1236992 May 20 09:52 20250401
drwxr-xr-x 2 dumpsgen dumpsgen  208896 Jun  1 09:39 20250420
drwxr-xr-x 2 dumpsgen dumpsgen 1728512 Jun 21 09:34 20250501
drwxr-xr-x 2 dumpsgen dumpsgen  204800 May 26 02:35 20250520
drwxr-xr-x 2 dumpsgen dumpsgen 1777664 Jun 20 10:27 20250601
drwxr-xr-x 2 dumpsgen dumpsgen  217088 Jun 30 15:15 20250620
drwxr-xr-x 2 dumpsgen dumpsgen  180224 Jul  7 09:32 20250701
lrwxrwxrwx 1 root     root          30 Sep 22  2015 entities -> ../other/wikibase/wikidatawiki
drwxrwxr-x 2 dumpsgen dumpsgen 2306048 Jul  7 00:17 latest

We can see that the link points to ../other/wikibase/wikidatawiki and this supports my previous assumption that the path for generated files is incorrect. We need to add that missing wikibase/ path element.
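
To make the path mismatch concrete, here is a minimal sketch that resolves the symlink target from the listing above and compares it with the directory behind https://dumps.wikimedia.org/other/wikidatawiki/. It assumes that the public web root maps to /srv/dumps/xmldatadumps/public (as the shell prompt suggests) and that no other symlinks sit on the path; the helper name `resolve_symlink` is illustrative, not taken from the dumps code.

```python
import posixpath

# Assumption: this directory is served as https://dumps.wikimedia.org/
PUBLIC_ROOT = "/srv/dumps/xmldatadumps/public"


def resolve_symlink(link_path: str, target: str) -> str:
    """Resolve a relative symlink target against the directory holding the link.

    This is purely lexical: it does not touch the filesystem and ignores
    any further symlinks along the way.
    """
    return posixpath.normpath(posixpath.join(posixpath.dirname(link_path), target))


# The symlink shown in the listing: entities -> ../other/wikibase/wikidatawiki
link = posixpath.join(PUBLIC_ROOT, "wikidatawiki/entities")
expected_dir = resolve_symlink(link, "../other/wikibase/wikidatawiki")

# Where the files actually appeared (https://dumps.wikimedia.org/other/wikidatawiki/)
actual_dir = posixpath.join(PUBLIC_ROOT, "other/wikidatawiki")

print("expected:", expected_dir)
print("actual:  ", actual_dir)
print("match:   ", expected_dir == actual_dir)
```

The two paths differ only by the wikibase/ element, which is the piece missing from the configured destination.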

(Turns out, image suggestions, and thus Growth and Apps, don't actually depend on Wikidata dumps anymore since T394757? Removing our tag again.)

Nice! That's good to know. Thanks @Michael.

OK, the code is fixed.
I have manually triggered a sync of the commonswiki wikibase dumps, which were affected by the same issue.

If this works as expected and fills up https://dumps.wikimedia.org/commonswiki/entities/20250630/ then I will do the same for the wikidatawiki entities.

That seems fine now.

I will manually trigger the wikibase/wikidatawiki sync job.

The manual sync run has now completed.

I think that we can now mark this as resolved.

@BTullis Thank you for the quick fix.
I will keep an eye on this week's dumps. They usually start showing up by Wednesday evening.

Confirming that this week's dumps are present in https://dumps.wikimedia.org/wikidatawiki/entities/
The bzip2 one looks way too short, so I created a new bug: T399119

It's the same thing again this week: https://dumps.wikimedia.org/wikidatawiki/entities

The latest JSON dumps are there (latest-all.json.bz2 and latest-all.json.gz), but the NT and TTL dumps are still those from a week ago.