
Initialize WCQS production servers
Closed, Resolved, Public

Description

This ticket tracks the final deployment steps for WCQS production. It does not cover exposing the service to the general public, only the steps needed to ensure an up-to-date WCQS with a fully working Streaming Updater.

Start date: Jan 11th 2022
prereqs:

Dump: https://dumps.wikimedia.your.org/other/wikibase/commonswiki/20220109/commons-20220109-mediainfo.ttl.gz

savepoint(eqiad): swift://rdf-streaming-updater-eqiad.thanos-swift/commons/savepoints/bootstrap_20220109
savepoint(codfw): swift://rdf-streaming-updater-codfw.thanos-swift/commons/savepoints/bootstrap_20220109
Start time (for EventGate Flink consumers in Kafka): 2022-01-09T19:00:04Z
Start time of the Flink pipeline (for streaming updater consumers on Blazegraph hosts): TBD
Consumer group for the Flink pipeline: wcqs_streaming_updater
Steps:

  • clear the journals on all instances
  • Reload WCQS instances with the newest dump (can be done in parallel); streaming updater consumers should be turned off
    • wcqs1001
    • wcqs1002
    • wcqs1003
    • wcqs2001
    • wcqs2002
    • wcqs2003
  • Source rev_map.csv from HDFS (generated automatically every week by Airflow)
  • Set offsets for the recent changes event topics based on the timestamps for the dump (for both steps, see https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service/Streaming_Updater#First_run_(bootstrap) )
  • deploy streaming updater consumers to eqiad and codfw (merge puppet patch for streaming updater role)
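
Setting offsets from a timestamp generally means expressing that timestamp in epoch milliseconds, since that is the unit Kafka uses for record timestamps. The sketch below only illustrates that conversion for the start time listed in the prereqs; the helper name `iso_to_epoch_ms` is hypothetical, and the wikitech page linked above remains the reference for the actual bootstrap procedure.

```python
from datetime import datetime, timezone


def iso_to_epoch_ms(stamp: str) -> int:
    """Convert a UTC timestamp such as 2022-01-09T19:00:04Z to epoch milliseconds.

    Hypothetical helper for illustration; not part of the WCQS tooling.
    """
    # The trailing "Z" marks UTC, so attach the UTC timezone explicitly
    # instead of relying on the local timezone of the machine.
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp()) * 1000


if __name__ == "__main__":
    # Start time for the EventGate Flink consumers, taken from the prereqs.
    start_time = "2022-01-09T19:00:04Z"
    print(iso_to_epoch_ms(start_time))
```

Attaching the timezone explicitly keeps the result identical no matter which host runs the conversion.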

After the process, it should take a few hours at most for all the instances to catch up on the lag (dashboard)

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2022-01-07T15:08:19Z] <ottomata> creeating mediainfo-streaming-updater.mutation topics on kafka main-eqiad and main-codfw and setting retention to 30 days - T296470

Ran on main-eqiad and main-codfw kafka:

kafka topics --create --topic eqiad.mediainfo-streaming-updater.mutation --replication-factor 3 --partitions 1
kafka configs --alter --entity-type topics --entity-name eqiad.mediainfo-streaming-updater.mutation --add-config retention.ms=2592000000

kafka topics --create --topic codfw.mediainfo-streaming-updater.mutation --replication-factor 3 --partitions 1
kafka configs --alter --entity-type topics --entity-name codfw.mediainfo-streaming-updater.mutation --add-config retention.ms=2592000000
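
As a quick sanity check, the `retention.ms` value used above can be derived from the 30-day retention mentioned in the SAL entry. This is a minimal sketch of the arithmetic only; the function name `retention_ms` is hypothetical.

```python
from datetime import timedelta


def retention_ms(days: int) -> int:
    """Return the number of milliseconds in the given number of days."""
    # Dividing one timedelta by another with // yields an exact integer.
    return timedelta(days=days) // timedelta(milliseconds=1)


if __name__ == "__main__":
    # 30 days, as stated in the SAL entry for the mutation topics.
    print(retention_ms(30))
```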

Mentioned in SAL (#wikimedia-operations) [2022-01-11T18:57:30Z] <ebernhardson> clear wcqs.jnl and aliases.map for all wcqs instances T296470

Started the data load in a tmux session on cumin1001 at approximately Tue Jan 11 16:53:46 2022. It is expected to take at least 24 hours. Tagging @RKemper for awareness.

Gehel claimed this task.