
request: data transfer: wikidata main-20260209 snapshot to wdqs2009
Closed, Resolved, Public

Description

Hi,

We need to build a QLever index on wdqs2009, but there is currently no Wikidata snapshot on that host.
Could you transfer main-20260209 to wdqs2009.codfw.wmnet? We need that specific date for parity with the other test nodes.

A copy of the snapshot is available locally on an-master1004 at /srv/tmp/main-20260209/. I lack the privileges to run the transfer.py script myself:

sudo transfer.py an-master1004.eqiad.wmnet:/srv/tmp/main-20260209 wdqs2009.codfw.wmnet:/srv/tmp
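Once transfer.py reports success, one way to double-check the copy is to compare per-file SHA-256 checksums on both sides. The sketch below is an assumption and not part of the requested procedure: the helper names make_manifest and same_snapshot are hypothetical, and it only relies on standard bash and coreutils/findutils (find, sort, xargs, sha256sum, diff).

```shell
#!/usr/bin/env bash
# Sketch: compare SHA-256 manifests of two copies of a snapshot directory.
# make_manifest / same_snapshot are hypothetical helper names.
set -euo pipefail

# Print "<sha256>  ./<relative path>" for every regular file under $1,
# sorted by path so that two identical trees produce identical output.
make_manifest() {
  (cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum)
}

# Return 0 if both directory trees have identical file contents, 1 otherwise.
same_snapshot() {
  diff <(make_manifest "$1") <(make_manifest "$2") > /dev/null
}
```

Because the two copies live on different hosts, in practice one would run `make_manifest /srv/tmp/main-20260209 > manifest.txt` on an-master1004 and on wdqs2009 and diff the two files; same_snapshot only covers the case where both trees are visible from one machine.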

See T415492: Request: make main and scholarly graphs available on WDQS test nodes.

Event Timeline

atsuko changed the task status from Open to In Progress. Apr 28 2026, 1:00 PM
atsuko claimed this task.

@gmodena I started the transfer, but you might not have access to the files (at least not to remove them). Do you want me to chown them to a specific user (qlever, maybe)?

cc @BTullis

atsuko changed the task status from In Progress to Stalled. Apr 28 2026, 1:33 PM

Transferred

Thanks @atsuko!

Other WDP engineers and I have sudo rights on that node and can read the files. I can confirm that the transfer looks all right, and I've been able to kick off indexing:

gmodena@wdqs2009:/srv/wdqs/qlever/index$ time sudo zcat /srv/tmp/main-20260209/wikidata_main.* | qlever-index -m 40G -F nt -f - -i wikidata -s ../conf/wikidata.settings.json
2026-04-29 09:18:13.984 - INFO: QLever index builder, compiled on Mon Feb 23 10:41:57 UTC 2026 using git hash 36ed96
2026-04-29 09:18:13.985 - INFO: You specified "locale = en_US" and "ignore-punctuation = 1"
2026-04-29 09:18:13.986 - INFO: You specified "ascii-prefixes-only = true", which enables faster parsing for well-behaved TTL files
2026-04-29 09:18:13.986 - INFO: You specified "parallel-parsing = true", which enables faster parsing for TTL files with a well-behaved use of newlines
2026-04-29 09:18:13.986 - INFO: You specified "num-triples-per-batch = 10,000,000", choose a lower value if the index builder runs out of memory
2026-04-29 09:18:13.986 - INFO: By default, integers that cannot be represented by QLever will throw an exception
2026-04-29 09:18:13.986 - WARN: Parallel parsing set in the `.settings.json` file; this is deprecated, please use the command-line option --parse-parallel or -p
2026-04-29 09:18:13.986 - INFO: Processing triples from single input stream /dev/stdin (parallel = true) ...
2026-04-29 09:18:13.989 - INFO: Parsing input triples and creating partial vocabularies, one per batch ...
2026-04-29 09:20:33.888 - INFO: Triples parsed: 150,000,000 [average speed 1.1 M/s, last batch 1.1 M/s, fastest 1.2 M/s, slowest 1.0 M/s]
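As a sanity check on the log above, the average rate can be recomputed from two timestamps: parsing starts at 09:18:13.989 and 150,000,000 triples are reported at 09:20:33.888, about 139.9 s later, which works out to roughly 1.07 M triples/s, in line with the reported 1.1 M/s. A small bash/awk sketch of that arithmetic, assuming both timestamps fall on the same day and treating the "Parsing input triples" line as the start of parsing:

```shell
#!/usr/bin/env bash
# Sketch: recompute the average parse rate from two QLever log timestamps.
# Assumes both timestamps are on the same day (no midnight rollover).
set -euo pipefail

# Convert an "HH:MM:SS.mmm" timestamp to seconds since midnight.
to_seconds() {
  awk -F: '{ printf "%.3f\n", $1 * 3600 + $2 * 60 + $3 }' <<< "$1"
}

start=$(to_seconds "09:18:13.989")  # "Parsing input triples ..." line
end=$(to_seconds "09:20:33.888")    # "Triples parsed: 150,000,000" line
triples=150000000

# Elapsed wall-clock seconds and average rate in millions of triples per second.
elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
rate=$(awk -v n="$triples" -v t="$elapsed" 'BEGIN { printf "%.2f", n / t / 1e6 }')
echo "elapsed ${elapsed}s, average ${rate} M triples/s"
# prints: elapsed 139.899s, average 1.07 M triples/s
```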

Thanks for reporting back; marking this as resolved.