
WDQS should retry when getting 404s
Closed, Resolved · Public

Description

As a user, when there is a failed update, I want quick automatic retries instead of manual retries.

As a maintainer of the WDQS streaming updater, I want requests to Special:EntityData that receive a 404 response to be retried so that there are fewer items to reconcile (T279541).

There is a race between the events flowing through Kafka and MySQL replication. Because of this race, an event can be processed before the data it points to is available on the MySQL replica that serves the request.

One simple way to work around the problem is to retry on 404. The retry could be guarded by a check on the difference between the processing time and the event time: if the difference is less than, e.g., 10 seconds, a retry is performed.
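A minimal sketch of that guard, written in Python purely for illustration. The names `should_retry`, `fetch_with_retry`, and `MAX_EVENT_AGE`, and the injected `fetch`, `now`, and `sleep` callables, are hypothetical and not taken from the updater's code. The pause between attempts is an assumed value.

```python
from datetime import datetime, timedelta
from typing import Callable

# Assumed threshold: only retry while the event is younger than this.
MAX_EVENT_AGE = timedelta(seconds=10)


def should_retry(status: int, event_time: datetime, processing_time: datetime,
                 max_age: timedelta = MAX_EVENT_AGE) -> bool:
    """True when a 404 is recent enough to be plausibly caused by replication lag."""
    return status == 404 and (processing_time - event_time) < max_age


def fetch_with_retry(fetch: Callable[[], int], event_time: datetime,
                     now: Callable[[], datetime], sleep: Callable[[float], None],
                     pause_seconds: float = 1.0) -> int:
    """Call fetch() until it stops returning 404 or the event is too old to retry.

    fetch is a hypothetical callable that returns an HTTP status code.
    now and sleep are injected so the loop can be driven by a fake clock.
    """
    status = fetch()
    while should_retry(status, event_time, now()):
        sleep(pause_seconds)  # wait before the next attempt
        status = fetch()
    return status
```

Injecting the clock and the sleep function keeps the time-based guard easy to exercise without real waiting. Once the event is older than the window, a 404 is returned to the caller as is.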

Looking at the side output data of the streaming updater for the first seven days of April, we see the following (range is the delta between the ingestion time and the event time):

+-------+------+
|range  |events|
+-------+------+
|0: 0-1s|65    |
|1: 1-3s|137   |
|2: 3-5s|38    |
|3: 5-7s|9     |
+-------+------+

which translates to: over the 8 days of Wikidata edits, 249 events failed with a 404 even though the data is actually available now (most probably because of replication lag). All of these events were ingested between 0 and 7 seconds after their event time.
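For reference, the range labels used in this task's tables can be reproduced with a small bucketing helper. This is a sketch only: the function name `delay_bucket` is hypothetical, and whether the original query treated the bounds as inclusive or exclusive is not stated, so exclusive upper bounds are an assumption here.

```python
# Upper bounds in seconds (assumed exclusive) with labels mirroring the tables.
BUCKETS = [
    (1.0, "0: 0-1s"),
    (3.0, "1: 1-3s"),
    (5.0, "2: 3-5s"),
    (7.0, "3: 5-7s"),
    (10.0, "4: 7-10s"),
]
OVERFLOW = "5: >10s"


def delay_bucket(event_time_s: float, ingestion_time_s: float) -> str:
    """Map the delay between event time and ingestion time to a range label."""
    delta = ingestion_time_s - event_time_s
    for upper, label in BUCKETS:
        if delta < upper:
            return label
    return OVERFLOW  # anything at or beyond the last bound


print(delay_bucket(100.0, 102.5))  # prints "1: 1-3s"
```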

There are 141 events for which we received a 404 that is still a 404 now:

+--------+------+
|range   |events|
+--------+------+
|1: 1-3s |4     |
|3: 5-7s |1     |
|4: 7-10s|2     |
|5: >10s |134   |
+--------+------+

So retrying 404s for events with processing_time - event_time < 10 seconds seems to be the right threshold: it adds extra latency for only a few hundred events per week.
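The two tables can be tallied to check this estimate. The counts below are copied from the tables; the variable names are illustrative only.

```python
# 404s whose data later became available, by delay range (first table).
RECOVERED = {"0: 0-1s": 65, "1: 1-3s": 137, "2: 3-5s": 38, "3: 5-7s": 9}
# 404s that are still 404s, by delay range (second table).
STILL_404 = {"1: 1-3s": 4, "3: 5-7s": 1, "4: 7-10s": 2, "5: >10s": 134}

# Ranges that fall under the proposed 10-second threshold.
UNDER_10S = {"0: 0-1s", "1: 1-3s", "2: 3-5s", "3: 5-7s", "4: 7-10s"}

recovered_total = sum(RECOVERED.values())
still_404_total = sum(STILL_404.values())

# Retries that would succeed: recoverable 404s inside the window.
useful_retries = sum(n for label, n in RECOVERED.items() if label in UNDER_10S)
# Retries that cannot succeed: permanent 404s inside the window.
wasted_retries = sum(n for label, n in STILL_404.items() if label in UNDER_10S)
# Permanent 404s outside the window, which are never retried.
skipped = still_404_total - wasted_retries

print(recovered_total, still_404_total)  # prints "249 141"
print(useful_retries, wasted_retries, skipped)  # prints "249 7 134"
print(useful_retries + wasted_retries)  # prints "256"
```

Under a 10-second window, all 249 recoverable events would be retried, only 7 permanent 404s would incur the extra wait, and the 134 events older than 10 seconds would fail immediately as before.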

AC:

  • retry 404s until the event time is 10 seconds older than the processing time

Event Timeline

Restricted Application added a subscriber: Aklapper.
MPhamWMF moved this task from Incoming to Scaling on the Wikidata-Query-Service board.

Change 737804 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[wikidata/query/rdf@master] producer: Wait between retry attempts when fetching entity

https://gerrit.wikimedia.org/r/737804

Change 737807 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[wikidata/query/rdf@master] producer: Make more entity fetch attempts for recently added content

https://gerrit.wikimedia.org/r/737807

Change 737804 merged by jenkins-bot:

[wikidata/query/rdf@master] producer: Wait between retry attempts when fetching entity

https://gerrit.wikimedia.org/r/737804

Change 737807 merged by jenkins-bot:

[wikidata/query/rdf@master] producer: Retry 404s for new content near creation time

https://gerrit.wikimedia.org/r/737807