
Improve data-reload cookbook based on graph split needs
Closed, Resolved, Public


As we prepare the graph split hosts in T347504, I've run into a few issues that need addressing:

  • When selecting an "-all" dump file, the "lexemes" dump that precedes it must be used, e.g. if wikidata-20230925-all-BETA.ttl.bz2 is selected, wikidata-20230922-lexemes-BETA.ttl.bz2 must be used with it.
  • The cookbook tries to stop the wdqs-updater, but that fails because that service does not exist on the graph split hosts.
  • If you forget to add the "--no-depool" option, the cookbook doesn't fail until the end of the munging step (which takes several days); instead, the cookbook should never try to pool/depool a host that is not behind LVS.
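
The first point can be sketched as follows. This is only an illustration, not the cookbook's actual code: the helper names (dump_date, preceding_lexemes_dump) are hypothetical, the file name pattern is inferred from the two example names above, and "preceding" is assumed to mean strictly older than the "-all" dump.

```python
import re
from datetime import date

# Pattern inferred from the example file names in this task (an assumption).
DUMP_RE = re.compile(r"^wikidata-(\d{4})(\d{2})(\d{2})-(all|lexemes)-BETA\.ttl\.bz2$")


def dump_date(filename, kind):
    """Return the date embedded in a dump file name, or None if the name
    does not match the expected pattern or is not of the requested kind."""
    match = DUMP_RE.match(filename)
    if match is None or match.group(4) != kind:
        return None
    return date(int(match.group(1)), int(match.group(2)), int(match.group(3)))


def preceding_lexemes_dump(all_dump, available):
    """Pick the newest lexemes dump that is strictly older than all_dump."""
    all_date = dump_date(all_dump, "all")
    if all_date is None:
        raise ValueError(f"not an '-all' dump file name: {all_dump}")
    candidates = []
    for name in available:
        lexemes_date = dump_date(name, "lexemes")
        if lexemes_date is not None and lexemes_date < all_date:
            candidates.append((lexemes_date, name))
    if not candidates:
        raise LookupError(f"no lexemes dump older than {all_dump}")
    # Tuples sort by date first, so max() yields the most recent candidate.
    return max(candidates)[1]
```

Given a listing that contains lexemes dumps both older and newer than the chosen "-all" dump, this returns the closest older one rather than the latest, which is the behaviour asked for above.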

Event Timeline

Change 966303 had a related patch set uploaded (by Bking; author: Bking):

[operations/cookbooks@master] Add logic for graph_split hosts

@dcausse We're working on the cookbook change, but this made me realize that the current reload cookbook always takes the latest lexemes dump, which isn't in sync with the wikidata dump (usually it's newer).

Do we always need the lexemes dump file to be older than the wikidata dump? Just trying to figure out whether this is a general problem, or specific to the graph split experiment.

Gehel triaged this task as Medium priority. Oct 18 2023, 8:33 AM

This is not a general problem. We need this particular order only when the loaded data must be aligned with what we have in hdfs, which is required in the following scenarios:

  • initial bootstrap of the whole system (flink bootstrap state+initial data-reload)
  • the graph split evaluation where we want to be sure that the work we base on top of the hdfs data is the same as the data loaded into one blazegraph

It might just be enough to add two new args to the cookbook to allow specifying the files to download instead of picking latest.
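
A rough sketch of what such arguments could look like, assuming an argparse-based parser. The option names (--wikidata-dump, --lexemes-dump) and the resolve_dumps helper are hypothetical and are not taken from the merged change:

```python
import argparse


def build_parser():
    """Build a parser with two optional, hypothetical dump-file arguments."""
    parser = argparse.ArgumentParser(description="data-reload sketch")
    parser.add_argument(
        "--wikidata-dump",
        default=None,
        help="wikidata dump file to download (default: pick the latest)",
    )
    parser.add_argument(
        "--lexemes-dump",
        default=None,
        help="lexemes dump file to download (default: pick the latest)",
    )
    return parser


def resolve_dumps(args, latest_wikidata, latest_lexemes):
    """Prefer explicitly requested files; fall back to the latest ones."""
    return (
        args.wikidata_dump or latest_wikidata,
        args.lexemes_dump or latest_lexemes,
    )
```

Keeping both options optional preserves the current "pick latest" behaviour for regular reloads, while the bootstrap and graph split scenarios can pin the exact pair of files that matches the hdfs data.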

Change 966303 abandoned by Ryan Kemper:

[operations/cookbooks@master] Add logic for graph_split hosts


abandon this approach; we'll go with a simpler one

Change 968346 had a related patch set uploaded (by Bking; author: Bking):

[operations/cookbooks@master] add logic for graph_split hosts

Change 968346 merged by Bking:

[operations/cookbooks@master] add logic for graph_split hosts

bking moved this task from In Progress to Done on the Data-Platform-SRE board.

With the above merge, we believe this issue has been addressed. Closing...

Mentioned in SAL (#wikimedia-operations) [2023-11-15T17:52:39Z] <inflatador> bking@wdqs1024 reboot host to hopefully reduce data reload failures T349011