
The streaming updater consumer should log information when divergences are detected
Closed, Resolved · Public · 1 Estimated Story Points

Description

As a maintainer of the rdf-streaming-updater, I want information to be logged when divergences are detected on patch application, so that I can more easily debug the cause of these divergences.

When applying an RDF patch to the triple store (Blazegraph), divergences may occur for the following reasons:

  • actual divergences: the state of the store is not what the Flink pipeline expects
  • false positives: some triples/literals are modified on the fly by Blazegraph (Unicode normalization, cutoff of large values, numeric precision). These should amount to a couple to a dozen triples per hour.
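
To illustrate how a false positive of the Unicode-normalization kind can arise, here is a minimal Python sketch. It assumes the store rewrites literals to NFC; the exact normalization Blazegraph applies is not stated in this task, and the helper name `normalized_differs` is hypothetical.

```python
import unicodedata


def normalized_differs(literal: str, form: str = "NFC") -> bool:
    """Return True if normalizing the literal changes it.

    Assumption: the store normalizes literals to `form` on write, so a
    literal for which this returns True would no longer match the value
    the pipeline sent, and would be counted as a divergence.
    """
    return unicodedata.normalize(form, literal) != literal


# 'e' followed by a combining acute accent (decomposed form)
decomposed = "e\u0301"
# the single precomposed code point for the same character
composed = "\u00e9"

# The decomposed literal would be altered by NFC normalization,
# the precomposed one would be stored unchanged.
print(normalized_differs(decomposed))
print(normalized_differs(composed))
```

A literal that changes under normalization looks like a missing or unexpected triple when compared byte for byte, even though the store holds an equivalent value.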

Finding the cause of a non-negligible bump in the number of divergences is not straightforward today. Adding more logs to the streaming consumer will help such investigations.
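
The kind of logging asked for above could look like the following Python sketch. It is not the consumer's actual code: the `Batch` fields, the function names, and the per-batch threshold of 12 are all assumptions made for illustration. The idea is to stay quiet for the small number of expected false positives and to log enough context (batch and entity identifiers) when a batch diverges more than that.

```python
import logging
from dataclasses import dataclass
from typing import Tuple

logger = logging.getLogger("divergence_sketch")


@dataclass(frozen=True)
class Batch:
    """Hypothetical summary of one applied RDF patch batch."""
    batch_id: str
    entity_ids: Tuple[str, ...]
    expected_mutations: int  # triples the pipeline expected to add or delete
    actual_mutations: int    # triples the store reported as changed


def divergence_count(batch: Batch) -> int:
    """Number of triples by which the store's result differs from expectation."""
    return abs(batch.expected_mutations - batch.actual_mutations)


def log_if_divergent(batch: Batch, threshold: int = 12) -> bool:
    """Log a warning when a batch diverges by more than `threshold` triples.

    The threshold is an assumed value, chosen only so that a handful of
    false positives does not produce log noise. Returns True if a
    warning was logged.
    """
    count = divergence_count(batch)
    if count <= threshold:
        return False
    logger.warning(
        "Batch %s diverged by %d triples (expected %d, actual %d); entities: %s",
        batch.batch_id,
        count,
        batch.expected_mutations,
        batch.actual_mutations,
        ", ".join(batch.entity_ids),
    )
    return True
```

For example, `log_if_divergent(Batch("b2", ("Q2", "Q3"), 100, 50))` logs a warning naming the batch and both entities, while a batch that is off by a single triple is ignored.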

AC:

  • meaningful logs that make it possible to trace which changes are involved in a bump of divergences

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 701331 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/rdf@master] Log batches that cause too many inconsistencies

https://gerrit.wikimedia.org/r/701331

Change 701331 merged by jenkins-bot:

[wikidata/query/rdf@master] Log batches that cause too many inconsistencies

https://gerrit.wikimedia.org/r/701331