The split graph updater has been running and populating its Kafka topics since 2024-07-18T09:00.
The earliest dumps usable for a data reload should be tagged with the snapshot date 20240722, which should instruct the data-reload cookbook to position the Kafka offsets at 2024-07-19T23:00:00Z.
Based on previous runs of the Airflow DAG that imports dumps into HDFS, these snapshots should be available around 2024-07-26T10:00:00.
Update from above: the latest snapshots available are now tagged with 20240729.
Given the above, the data-reload arguments should be:
- main graph:
  cookbook sre.wdqs.data-reload \
    --task-id T370754 \
    --reason "WDQS main subgraph" \
    --reload-data wikidata_main \
    --from-hdfs hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ \
    --stat-host stat1009.eqiad.wmnet \
    wdqs_host_main
- scholarly graph:
  cookbook sre.wdqs.data-reload \
    --task-id T370754 \
    --reason "WDQS scholarly subgraph" \
    --reload-data scholarly_articles \
    --from-hdfs hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20240729/ \
    --stat-host stat1009.eqiad.wmnet \
    wdqs_host_scholarly
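The two invocations differ only in the subgraph name, the reload profile and the target host. A minimal sketch that prints (but does not run) them from those parameters, assuming bash; the function name `reload_cmd` and the variables are hypothetical helpers, and only flags that appear in the commands above are used:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the data-reload command for one subgraph.
# It does not execute the cookbook; review the output before running it.
set -euo pipefail

SNAPSHOT="20240729"   # latest snapshot tag mentioned above
DUMP_ROOT="hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata"

reload_cmd() {
  local subgraph="$1" host="$2" profile
  # Map the subgraph name to the --reload-data profile used above.
  case "$subgraph" in
    main)      profile="wikidata_main" ;;
    scholarly) profile="scholarly_articles" ;;
    *) echo "unknown subgraph: $subgraph" >&2; return 1 ;;
  esac
  printf 'cookbook sre.wdqs.data-reload --task-id T370754 --reason "WDQS %s subgraph" --reload-data %s --from-hdfs %s/%s/%s/ --stat-host stat1009.eqiad.wmnet %s\n' \
    "$subgraph" "$profile" "$DUMP_ROOT" "$subgraph" "$SNAPSHOT" "$host"
}

# wdqs_host_main / wdqs_host_scholarly are the same placeholders as above.
reload_cmd main wdqs_host_main
reload_cmd scholarly wdqs_host_scholarly
```

Printing rather than executing keeps the sketch safe to run anywhere; the output can be compared against the commands listed above.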
Pre-requisites:
- The target WDQS node must have its Kafka topic properly configured in puppet (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1060049):
  - profile::query_service::streaming_updater::kafka_topic: eqiad.rdf-streaming-updater.mutation-main for an eqiad node imported with the profile wikidata_main
  - profile::query_service::streaming_updater::kafka_topic: eqiad.rdf-streaming-updater.mutation-scholarly for an eqiad node imported with the profile scholarly_articles
  - profile::query_service::streaming_updater::kafka_topic: codfw.rdf-streaming-updater.mutation-main for a codfw node imported with the profile wikidata_main
  - profile::query_service::streaming_updater::kafka_topic: codfw.rdf-streaming-updater.mutation-scholarly for a codfw node imported with the profile scholarly_articles
- WDQS version 0.3.145 must be deployed (https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/1056125)
- The partition hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240722/ is available
- The partition hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20240722/ is available
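
Some of the pre-requisites above can be checked mechanically before starting a reload. The following is a rough pre-flight sketch, assuming bash and, for the partition check, an `hdfs` client on the PATH with suitable credentials. The function names (`expected_topic`, `partition_path`, `partition_available`) are hypothetical. The topic naming rule is taken from the four hiera values listed above, and the snapshot defaults to 20240729 (the tag used in the reload commands) but can be overridden through the `SNAPSHOT` environment variable:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of the data-reload cookbook.
set -euo pipefail

SNAPSHOT="${SNAPSHOT:-20240729}"
DUMP_ROOT="hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata"

# Expected kafka_topic hiera value for a datacenter and reload profile.
expected_topic() {
  local dc="$1" profile="$2" suffix
  case "$dc" in
    eqiad|codfw) ;;
    *) echo "unknown datacenter: $dc" >&2; return 1 ;;
  esac
  case "$profile" in
    wikidata_main)      suffix="main" ;;
    scholarly_articles) suffix="scholarly" ;;
    *) echo "unknown profile: $profile" >&2; return 1 ;;
  esac
  echo "${dc}.rdf-streaming-updater.mutation-${suffix}"
}

# HDFS partition path for a subgraph (main|scholarly) and snapshot tag.
partition_path() { echo "${DUMP_ROOT}/$1/$2/"; }

# Succeeds if the partition directory exists (uses `hdfs dfs -test -d`).
partition_available() { hdfs dfs -test -d "$(partition_path "$1" "$2")"; }

expected_topic eqiad wikidata_main        # eqiad.rdf-streaming-updater.mutation-main
expected_topic codfw scholarly_articles   # codfw.rdf-streaming-updater.mutation-scholarly

# Only attempt the HDFS check where the client is installed.
if command -v hdfs >/dev/null 2>&1; then
  for subgraph in main scholarly; do
    if partition_available "$subgraph" "$SNAPSHOT"; then
      echo "OK: $(partition_path "$subgraph" "$SNAPSHOT")"
    else
      echo "MISSING: $(partition_path "$subgraph" "$SNAPSHOT")"
    fi
  done
fi
```

The value printed by `expected_topic` is meant to be compared by hand with the node's hiera configuration; the sketch does not read puppet data itself.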