
wdqs-updater fails when tail-poller queue is full
Closed, Resolved (Public)

Description

During a recent period of high edit rate on Wikidata, the Wikidata Query Service stopped processing updates. The updater logs show:

May 26 10:14:15 wdqs1001 bash[28413]: 10:14:15.070 [TailPoller] INFO  o.w.q.r.t.c.TailingChangesPoller - Caught 477 missing updates, adding to the queue
May 26 10:14:15 wdqs1001 bash[28413]: Exception in thread "TailPoller" java.lang.IllegalStateException: Queue full
May 26 10:14:15 wdqs1001 bash[28413]: at java.util.AbstractQueue.add(AbstractQueue.java:98)
May 26 10:14:15 wdqs1001 bash[28413]: at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:312)
May 26 10:14:15 wdqs1001 bash[28413]: at org.wikidata.query.rdf.tool.change.TailingChangesPoller.run(TailingChangesPoller.java:81)

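Looking at TailingChangesPoller, it looks like the case where the internal poller queue is full is not handled correctly: per the stack trace, the poller calls ArrayBlockingQueue.add(), which throws IllegalStateException when the queue is at capacity, and the uncaught exception ends the TailPoller thread. It might make sense to block in that case and let the queue apply back pressure on the producer (see BlockingQueue.put()).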
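
To illustrate the difference between the two calls, here is a minimal, standalone sketch. It is not the actual TailingChangesPoller code or the patch; the class name, method names, thread name, and queue capacity are all hypothetical and chosen only for the demonstration. It shows add() throwing on a full ArrayBlockingQueue, and put() making the producer wait until the consumer frees a slot.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of wikidata/query/rdf.
public class QueueBackPressureSketch {

    /** Returns true if add() throws IllegalStateException on a full queue. */
    public static boolean addFailsWhenFull() {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
        queue.add("batch-1"); // fills the only slot
        try {
            queue.add("batch-2"); // no room left: throws "Queue full"
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    /**
     * Producer uses put(), which blocks while the queue is full.
     * Returns how many batches the consumer received.
     */
    public static int putWaitsForSpace(int batches) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < batches; i++) {
                    queue.put(i); // waits here instead of throwing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "TailPollerSketch");
        producer.start();

        int received = 0;
        try {
            for (int i = 0; i < batches; i++) {
                // Bounded wait so the demo cannot hang forever.
                Integer batch = queue.poll(5, TimeUnit.SECONDS);
                if (batch == null) {
                    break;
                }
                received++;
            }
            producer.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println("add() fails when full: " + addFailsWhenFull());
        System.out.println("batches received via put(): " + putWaitsForSpace(5));
    }
}
```

With put(), a slow consumer slows the producer down rather than killing its thread. The trade-off is that the producer thread must handle InterruptedException, and a consumer that never drains the queue leaves the producer blocked indefinitely.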

Event Timeline

Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2017-05-26T10:56:29Z] <gehel> restart wdqs-updater on all wdqs nodes - T166378

Change 355761 had a related patch set uploaded (by Gehel; owner: Gehel):
[wikidata/query/rdf@master] Make updater wait when change queue is full.

https://gerrit.wikimedia.org/r/355761

Mentioned in SAL (#wikimedia-operations) [2017-05-26T13:19:56Z] <gehel> restart wdqs-updater on all wdqs nodes - T166378

Change 355761 merged by jenkins-bot:
[wikidata/query/rdf@master] Make updater wait when change queue is full.

https://gerrit.wikimedia.org/r/355761