
wdqs updater processing events but not finding anything useful
Closed, Resolved · Public

Description

Since 10:30 UTC, the wdqs updater has not been applying any updates. The logs (see below) show that events are processed, but no useful events are found.

11:37:39.427 [main] INFO  org.wikidata.query.rdf.tool.Updater - Polled up to 2019-08-13T10:28:52Z at (4.9, 4.5, 4.2) updates per second and (0.0, 0.0, 0.0) milliseconds per second 
11:37:39.518 [main] INFO  o.w.q.rdf.tool.change.KafkaPoller - Did not find anything useful in this batch, returning existing data

Event Timeline

Gehel created this task. Aug 13 2019, 11:40 AM
Restricted Application added a project: Wikidata. Aug 13 2019, 11:40 AM
Restricted Application added a subscriber: Aklapper.
Gehel added a comment. Aug 13 2019, 2:50 PM

eventgate eqiad was depooled from 10:30 UTC to 12:20 UTC, which matches the period during which no updates were applied.

Note that in Grafana I see the number of tuples still increasing, so we might have a reporting issue.

Topics are prefixed with the DC name, so during that time, events went to the codfw-prefixed topic and not the usual eqiad topic.
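To make the naming scheme concrete, here is a minimal sketch of how DC-prefixed topic names could be built and inspected. The helper names and the datacenter tuple are assumptions for illustration only, not taken from the WDQS code; only the `eqiad`/`codfw` prefixes and the `mediawiki.revision-create` topic come from this task.

```python
# Hypothetical helpers; names are illustrative, not the updater's actual API.
DATACENTERS = ("eqiad", "codfw")


def prefixed_topics(base_topics, datacenters=DATACENTERS):
    """Return every '<dc>.<topic>' combination, grouped by datacenter."""
    return [f"{dc}.{topic}" for dc in datacenters for topic in base_topics]


def datacenter_of(topic):
    """Return the datacenter prefix of a DC-prefixed topic name."""
    return topic.split(".", 1)[0]


topics = prefixed_topics(["mediawiki.revision-create"])
```

A consumer that subscribes only to the `eqiad.`-prefixed names would see nothing while eqiad is depooled, whereas one subscribed to the full list would keep receiving events under the `codfw.` prefix.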

jijiki added a subscriber: jijiki. Aug 13 2019, 2:51 PM

This is pretty weird: the updater should be able to consume from both eqiad and codfw. Maybe something between the brokers did not work, or we're not connecting to the right endpoint? The messages and the situation definitely look like it stopped getting events. In general, "Did not find anything useful in this batch, returning existing data" is normal if it happens occasionally (it means no new events), but if it happens all the time, there's trouble, since we're not getting events. So we need to check whether we're missing something in our Kafka setup.
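The distinction between an occasional empty batch and a continuous run of them could be checked with something like the sketch below. The class name, the `record` method and the threshold are all hypothetical; nothing like this is confirmed to exist in the updater.

```python
class EmptyBatchMonitor:
    """Hypothetical monitor: flags a long unbroken run of empty batches."""

    def __init__(self, threshold=100):
        # threshold is an assumed value; a sensible number depends on batch rate.
        self.threshold = threshold
        self.consecutive_empty = 0

    def record(self, useful_events):
        """Record one batch; return True once the empty run reaches the threshold."""
        if useful_events > 0:
            # Any batch with useful events resets the run.
            self.consecutive_empty = 0
        else:
            self.consecutive_empty += 1
        return self.consecutive_empty >= self.threshold
```

With a small threshold, a sequence of batches such as `0, 0, 5, 0, 0, 0` would only trip the monitor on the last batch, because the batch with five useful events resets the count.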

We'll also probably need to manually update the items edited in that timeframe, just in case. I'll do that a bit later (and will also add docs for doing this).

Smalyshev moved this task from Backlog to Next on the User-Smalyshev board. Aug 13 2019, 8:42 PM
Smalyshev added a comment. Edited Aug 14 2019, 12:06 AM

Another point: while WDQS does fetch data from both clusters (or at least is supposed to), it tracks its timestamp by only one topic: reportingTopic, which is currently eqiad.mediawiki.revision-create. Otherwise we would get weird jumps in the timestamps, since different topics can receive different events in a different order. So if eqiad does not get any events, the updater appears to lag even though it is processing events from codfw. I am not sure whether the "nothing useful" message is related at all.
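A rough sketch of this behaviour, under the assumption that the reported position only advances on events from the reporting topic: the function name, the `(topic, timestamp)` event shape and the return value are hypothetical and do not mirror the actual updater code.

```python
from datetime import datetime, timezone

# Topic name taken from the comment above; everything else is illustrative.
REPORTING_TOPIC = "eqiad.mediawiki.revision-create"


def polled_up_to(events, last_reported, reporting_topic=REPORTING_TOPIC):
    """Process (topic, timestamp) events from all topics.

    Every event counts as applied, but the reported timestamp only
    advances for events that arrive on the reporting topic.
    """
    applied = 0
    for topic, ts in events:
        applied += 1
        if topic == reporting_topic and ts > last_reported:
            last_reported = ts
    return last_reported, applied
```

If every event in a batch arrives on `codfw.mediawiki.revision-create`, the applied count grows while the reported timestamp stays where it was, which would look like lag on a dashboard that only reads the timestamp.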

Smalyshev triaged this task as Normal priority. Aug 14 2019, 12:06 AM
Gehel added a comment. Aug 14 2019, 1:15 PM

Another point: while WDQS does fetch data from both clusters (or at least is supposed to), it tracks its timestamp by only one topic: reportingTopic, which is currently eqiad.mediawiki.revision-create. Otherwise we would get weird jumps in the timestamps, since different topics can receive different events in a different order. So if eqiad does not get any events, the updater appears to lag even though it is processing events from codfw. I am not sure whether the "nothing useful" message is related at all.

That would explain why the number of triples was still increasing even when the updater was reporting no updates. The "Did not find anything useful" message might just be a red herring.

Smalyshev closed this task as Resolved. Wed, Aug 21, 7:26 AM
Smalyshev claimed this task.