Load Wikidata split graphs into test servers
Closed, Resolved, Public

Description

To enable testing, we need the split graph to be loaded into test servers. See parent task for details.

AC:

  • split graphs (scholarly articles AND Wikidata core graph) are loaded into test servers

Event Timeline

Gehel triaged this task as High priority. Nov 3 2023, 10:26 AM
Gehel moved this task from Incoming to Blocked/Waiting on the Discovery-Search (Current work) board.
Gehel moved this task from Incoming to Quarterly Goals on the Data-Platform-SRE board.

@Gehel Is this a duplicate of T347504?

No, it's not. T347504 is about loading the full data set; T350465 is about loading the split data.

Progress:

  • wdqs1024 (wikidata main): 6.6B triples loaded, processing chunk 885/1023
  • wdqs1023 (scholarly articles): 6.3B triples loaded, processing chunk 851/1023

Load seems to have completed:

  • wdqs1023: 7.6B triples, load time: 5d,21h
  • wdqs1024: 7.6B triples, load time: 6d,21h

At a glance the number of triples looks sane and the Blazegraph logs don't show anything suspect. For extra safety we might have to run a couple of queries before calling this a success.

Numbers look correct:

host      graph               # entities    # triples
wdqs1022  full                111,514,880   15,320,277,615
wdqs1023  scholarly articles   41,333,875    7,643,858,078
wdqs1024  wikidata main        70,181,005    7,676,622,674
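
One way to cross-check the figures above is to compare the two split graphs against the full graph. The sketch below is only an illustration: the counts are copied from the table, while the variable names are made up for this example and nothing here queries the servers.

```python
# Counts copied from the table above: host -> (graph, entities, triples).
counts = {
    "wdqs1022": ("full", 111_514_880, 15_320_277_615),
    "wdqs1023": ("scholarly articles", 41_333_875, 7_643_858_078),
    "wdqs1024": ("wikidata main", 70_181_005, 7_676_622_674),
}

_, full_entities, full_triples = counts["wdqs1022"]

# Sum the two split graphs.
split_entities = counts["wdqs1023"][1] + counts["wdqs1024"][1]
split_triples = counts["wdqs1023"][2] + counts["wdqs1024"][2]

# Difference between the split total and the full graph.
entity_gap = split_entities - full_entities
triple_gap = split_triples - full_triples

print(f"entities: split={split_entities:,} full={full_entities:,} gap={entity_gap:,}")
print(f"triples:  split={split_triples:,} full={full_triples:,} gap={triple_gap:,}")
```

With these numbers the entity counts of the two split graphs add up exactly to the full graph, and the split graphs together hold 203,137 more triples than the full graph. That surplus might be triples present in both subgraphs, but that is an assumption and not something stated in this task.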