
Recover lexemes on wdqs1009
Closed, Resolved · Public


Lexemes were not imported on wdqs1009, and when the streaming-updater-consumer started consuming lexeme updates from the Flink output streams, the triple store did not match what was expected, causing the diff process to generate a lot of inconsistencies.

The full Flink output is still retained in Kafka (1-month retention), so we could try to recover the lexemes without having to do a full import again.

  1. Add an option to the streaming-updater-consumer to ingest only lexemes.
  2. Import the lexemes from /srv/wdqs/lex-munged/.
  3. Start a manual streaming-updater-consumer re-reading the whole mutation stream, filtered on lexemes, from offset 12080891.
  4. Stop it when it catches up with the normal consumer.
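The entity-ID filter from step 1 can be sketched as a simple predicate over the mutation stream. This is a minimal illustration, not the actual streaming-updater-consumer code: it assumes mutations carry an entity ID string, and relies on the fact that Wikidata lexeme IDs use the `L<digits>` pattern (items are `Q<digits>`, properties `P<digits>`). The helper names are hypothetical.

```python
import re

# Wikidata lexeme entity IDs look like "L99"; items are "Q42", properties "P31".
# Hypothetical predicate mirroring the "filter input messages based on their
# entity ID" patch, restricted here to lexemes.
LEXEME_ID = re.compile(r"^L\d+$")

def is_lexeme(entity_id: str) -> bool:
    return bool(LEXEME_ID.match(entity_id))

def filter_lexeme_mutations(mutations):
    """Yield only the mutations whose entity ID is a lexeme."""
    return (m for m in mutations if is_lexeme(m["entity_id"]))

# Example batch: only the L-prefixed entity survives the filter.
batch = [{"entity_id": "Q42"}, {"entity_id": "L99"}, {"entity_id": "P31"}]
print([m["entity_id"] for m in filter_lexeme_mutations(batch)])  # ['L99']
```

In the real recovery, this filter would be applied while re-reading the mutation topic from offset 12080891, so non-lexeme updates (already correct on the host) are skipped.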

Event Timeline


Change 670090 had a related patch set uploaded (by DCausse; owner: DCausse):
[wikidata/query/rdf@master] Add a way to filter input messages based on their entity ID

Mentioned in SAL (#wikimedia-operations) [2021-03-09T10:53:36Z] <dcausse> started to import lexemes on wdqs1009 (T276784)

Reprocessed all updates related to lexemes on wdqs1009 using a custom build with change 670090 applied.

Change 670090 merged by jenkins-bot:
[wikidata/query/rdf@master] Add a way to filter input messages based on their entity ID