
WDQS lag is usually longer than one minute
Closed, Resolved · Public · BUG REPORT

Description

The old updater: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&orgId=1&from=1629504000000&to=1629507600000 (usually the lag is less than one minute)
The new updater: https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=8&orgId=1&from=1634774400000&to=1634778000000 (usually the lag is more than one minute)

This indicates the new updater is not an improvement.

Event Timeline

Thanks for bringing this up; we haven't explained that part very well yet.

The goal of the new updater wasn't to achieve a lower best-case latency than the old updater. The goal was to eliminate a few issues of the old one, like:

The new updater was designed around those (and other) issues. A few things about it:

  • The higher best-case lag is a result of data reconciliation within the pipeline, which helps with data completeness. We would rather have a complete data set than a faster, incomplete one.
  • Data is reconciled within the pipeline, which has a dramatically lower impact on Blazegraph. This should help with updates, which was the goal, and should also improve query engine stability.
  • The maximum throughput of the current deployment is about 10x that of the old one, which means much faster catch-up and more room for Wikidata to grow. The new updater can be scaled further if we really need it to.
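
As a rough illustration of the catch-up point, here is a minimal sketch of how long it takes to drain a backlog while new edits keep arriving. All numbers (backlog size, edit rate, capacities) and the function name `catch_up_seconds` are hypothetical and chosen only for the arithmetic; they are not measurements from WDQS.

```python
def catch_up_seconds(backlog, incoming_rate, max_throughput):
    """Seconds needed to drain `backlog` updates while new ones keep arriving.

    Only the headroom (max_throughput - incoming_rate) works on the backlog.
    """
    headroom = max_throughput - incoming_rate
    if headroom <= 0:
        # The updater cannot keep up, so the backlog never shrinks.
        return float("inf")
    return backlog / headroom


# Hypothetical figures, for illustration only.
backlog = 36_000                  # updates waiting to be applied
incoming = 10.0                   # new updates per second
old_capacity = 15.0               # assumed max throughput of the old updater
new_capacity = 10 * old_capacity  # "about 10x" the old throughput

old_time = catch_up_seconds(backlog, incoming, old_capacity)
new_time = catch_up_seconds(backlog, incoming, new_capacity)

print(f"old updater catch-up: {old_time:.0f} s")
print(f"new updater catch-up: {new_time:.0f} s")
```

Under these assumptions the gain in catch-up time is larger than 10x, because the incoming edit rate consumes most of the old updater's capacity but only a small share of the new one's.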

We will provide more communication on this soon, along with more background on the Streaming updater itself and its development process.