Wikidata N-Triples RDF dumps empty, broken since at least 25 July 2025
Closed, Resolved | Public | BUG REPORT

Description

The last working truthy dump I could download was wikidata-20250723-truthy-BETA.nt.bz2, so this issue has existed for a while.
By now, all working truthy dumps have been evicted.

See this listing of RDF dumps at https://dumps.wikimedia.org/wikidatawiki/entities/ (retrieved 2025-09-06):

20250727/                                          27-Jul-2025 16:14                   -
20250728/                                          31-Jul-2025 01:07                   -
20250730/                                          30-Jul-2025 23:58                   -
20250801/                                          02-Aug-2025 00:10                   -
20250804/                                          07-Aug-2025 03:45                   -
20250806/                                          06-Aug-2025 23:35                   -
20250808/                                          08-Aug-2025 23:35                   -
20250811/                                          14-Aug-2025 06:33                   -
20250813/                                          13-Aug-2025 23:39                   -
20250815/                                          15-Aug-2025 23:52                   -
20250818/                                          18-Aug-2025 03:16                   -
20250819/                                          22-Aug-2025 02:25                   -
20250820/                                          22-Aug-2025 02:07                   -
20250822/                                          22-Aug-2025 23:37                   -
20250825/                                          28-Aug-2025 03:03                   -
20250827/                                          27-Aug-2025 23:36                   -
20250829/                                          29-Aug-2025 23:35                   -
20250901/                                          04-Sep-2025 02:32                   -
20250903/                                          03-Sep-2025 23:40                   -
20250905/                                          05-Sep-2025 23:34                   -
dcatap.rdf                                         27-Jun-2025 23:38               89497
latest-all.json.bz2                                03-Sep-2025 13:46         97958601385
latest-all.json.gz                                 03-Sep-2025 07:18        148808805062
latest-all.ttl.bz2                                 04-Sep-2025 02:22        119462651605
latest-all.ttl.gz                                  03-Sep-2025 23:29        146691916197
latest-lexemes.json.bz2                            03-Sep-2025 03:51           391420984
latest-lexemes.json.gz                             03-Sep-2025 03:50           531484359
latest-lexemes.ttl.bz2                             05-Sep-2025 23:34           572604384
latest-lexemes.ttl.gz                              05-Sep-2025 23:33           717836061

# Broken files according to size:
latest-truthy.nt.bz2                               25-Jul-2025 16:42                  45
latest-truthy.nt.gz                                25-Jul-2025 16:42                  44
latest-lexemes.nt.bz2                              05-Sep-2025 23:34                  14
latest-lexemes.nt.gz                               05-Sep-2025 23:33               39840
latest-all.nt.bz2                                  04-Sep-2025 02:32                  14
latest-all.nt.gz                                   03-Sep-2025 23:30               39840
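
The size check applied to the listing above can be sketched in Python as follows. This is a minimal sketch, not the monitoring used on the dumps server: the function name `suspicious_dumps`, the 1 MB threshold `MIN_DUMP_BYTES`, and the shortened `LISTING` sample are assumptions made for illustration. It only looks at compressed files (.gz, .bz2), so small legitimate files such as dcatap.rdf are not flagged.

```python
import bz2
import re

# Assumed threshold: a compressed dump below 1 MB is treated as suspicious.
MIN_DUMP_BYTES = 1_000_000

# Matches file rows of the listing: name, date, time, size in bytes.
ROW = re.compile(r"^(\S+)\s+(\d{2}-\w{3}-\d{4})\s+(\d{2}:\d{2})\s+(\d+)\s*$")


def suspicious_dumps(listing, min_bytes=MIN_DUMP_BYTES):
    """Return (name, size) pairs for compressed dumps smaller than min_bytes."""
    found = []
    for line in listing.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # directory rows (size "-"), comment lines, blank lines
        name, size = match.group(1), int(match.group(4))
        if name.endswith((".gz", ".bz2")) and size < min_bytes:
            found.append((name, size))
    return found


# Shortened sample of the listing quoted in this task.
LISTING = """\
20250905/                                          05-Sep-2025 23:34                   -
dcatap.rdf                                         27-Jun-2025 23:38               89497
latest-all.ttl.bz2                                 04-Sep-2025 02:22        119462651605
latest-truthy.nt.bz2                               25-Jul-2025 16:42                  45
latest-lexemes.nt.bz2                              05-Sep-2025 23:34                  14
latest-all.nt.gz                                   03-Sep-2025 23:30               39840
"""

for name, size in suspicious_dumps(LISTING):
    print(f"{name}: {size} bytes")

# Size of an empty bzip2 stream, for comparison with the 14-byte .nt.bz2 files.
EMPTY_BZ2_BYTES = len(bz2.compress(b""))
```

Note that `EMPTY_BZ2_BYTES` evaluates to 14, the same size as latest-lexemes.nt.bz2 and latest-all.nt.bz2, which would be consistent with those files being compressed from empty input; that reading is an inference and is not confirmed in this task.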

There is issue https://phabricator.wikimedia.org/T162346 which might indicate that these dumps are not monitored yet.

Event Timeline

Aklakan updated the task description.

We found that all the N-Triples dumps have been missing since the dump generation infrastructure moved to Airflow, because the tool that converts the dumps from .ttl to .nt has been missing. This is being looked into now and should hopefully be resolved soon.

Mahir256 renamed this task from "Some published WikiData RDF Dumps are empty" to "Wikidata N-Triples RDF dumps empty, broken since at least 25 July 2025". Sep 15 2025, 7:56 PM
Mahir256 added subscribers: Tpt, ProgVal.

Mentioned in SAL (#wikimedia-operations) [2025-09-16T14:49:35Z] <dancy@deploy1003> Started scap sync-world: Testing for T403882

Mentioned in SAL (#wikimedia-operations) [2025-09-16T15:01:36Z] <dancy@deploy1003> Finished scap sync-world: Testing for T403882 (duration: 12m 01s)

When will the fixed dumps be available on the server?

The latest truthy dump is already available here: https://dumps.wikimedia.org/other/wikibase/wikidatawiki/20250916/

The Lexeme N-Triples dump should be available tomorrow, and the full N-Triples dump will be available in the middle of next week, once the dump generation job starting on Monday has completed.

Thanks! I see that something happened on the 24th, but it looks like the full N-Triples dump didn't finish.

They're there now: https://dumps.wikimedia.org/wikidatawiki/entities/20250922/

They were generated successfully on the 22nd, but something went wrong while copying them over to the dumps server. Everything should be running smoothly again from now on.