Page MenuHomePhabricator

EventLogging dies when fetching a schema over HTTP that does not exist. {oryx}
Closed, ResolvedPublic8 Estimated Story Points

Description

eventlogging_consumer-mysql-m4-master-00.log:jsonschema.exceptions.SchemaError: Schema fetch failure: HTTP Error 404: Not Found

This exception was not caught and caused the process to die.

We should catch and log this exception in case of a 404. Other HTTP errors should probably make EventLogging die.

Event Timeline

Ottomata raised the priority of this task from to Medium.
Ottomata updated the task description. (Show Details)
Ottomata added subscribers: Ottomata, elukey, mforns.
Milimetric renamed this task from EventLogging dies when fetching a schema over HTTP that does not exist. to EventLogging dies when fetching a schema over HTTP that does not exist. {oryx}.Jan 28 2016, 6:16 PM
Milimetric moved this task from Analytics Query Service to Event Platform on the Analytics board.
Milimetric set Security to None.
Milimetric moved this task from Event Platform to Analytics Query Service on the Analytics board.

I tried reproducing the described error and could not:


When sending an event with a non-existing revision,
the MediaWiki API returns a code 400 (not 404) with the following contents:

{
    servedby: "mw1224",
    error: {
        code: "nosuchrevid",
        info: "There is no revision with ID 469803370",
        docref: "See https://meta.wikimedia.org/w/api.php for API usage"
    }
}

This happens at the processor level, the error is logged like:

2016-04-14 20:56:08,320 (MainThread) Unable to process: ?%7B%22wiki%22%3A%20%22vu%22%2C%20%22uuid%22%3A%20%223c%22%2C%20%22timestamp%22%3A%201460667368%2C%20%22recvFrom%22%3A%20%22kj%22%2C%20%22schema%22%3A%20%22Edit%22%2C%20%22event%22%3A%20%7B%22action.saveFailure.timing%22%3A%2010%2C%20%22mediawiki.version%22%3A%20%220u%22%2C%20%22integration%22%3A%20%22page%22%2C%20%22editingSessionId%22%3A%20%22FM%22%2C%20%22user.editCount%22%3A%2019%2C%20%22platform%22%3A%20%22other%22%2C%20%22action.init.mechanism%22%3A%20%22url%22%2C%20%22version%22%3A%2084%2C%20%22editor%22%3A%20%22visualeditor%22%2C%20%22action%22%3A%20%22init%22%2C%20%22user.id%22%3A%202%7D%2C%20%22revision%22%3A%20469803370%7D;      deployment-cache-text04.deployment-prep.eqiad.wmflabs   271     2016-04-14T20:56:08     -       "-" (Schema fetch from https://meta.wikimedia.org/w/api.php?action=jsonschema&title=Edit&revid=469803370&formatversion=2 failed: HTTP Error 400: Bad Request)

The event does not pass validation (so does not make it to the mysql consumer), but EL does not die and continues to work properly.


When sending an event with a non-existing schema name "EditBlah",
the MediaWiki API also returns a code 400 with:

{
    servedby: "mw1234",
    error: {
        code: "revwrongpage",
        info: "r13457736 is not a revision of "EditBlah"",
        docref: "See https://meta.wikimedia.org/w/api.php for API usage"
    }
}

This happens also at the processor level, with a very similar error log. The event does not pass validation but EL continues doing fine.


When forcing the call to MW API to return a code 404 or an ERR_NAME_NOT_RESOLVED,
EL remains alive and working fine. The processor generates a log like:

2016-04-14 21:10:49,845 (MainThread) Unable to process: ?%7B%22wiki%22%3A%20%224c%22%2C%20%22topic%22%3A%20%2261%22%2C%20%22uuid%22%3A%20%22kZ%22%2C%20%22webHost%22%3A%20%22hu%22%2C%20%22timestamp%22%3A%201460668249%2C%20%22recvFrom%22%3A%20%22vJ%22%2C%20%22schema%22%3A%20%22Edit%22%2C%20%22event%22%3A%20%7B%22action.saveAttempt.timing%22%3A%2028%2C%20%22platform%22%3A%20%22other%22%2C%20%22mediawiki.version%22%3A%20%22IE%22%2C%20%22integration%22%3A%20%22flow%22%2C%20%22editingSessionId%22%3A%20%22YI%22%2C%20%22user.editCount%22%3A%2039%2C%20%22page.id%22%3A%2071%2C%20%22version%22%3A%2036%2C%20%22editor%22%3A%20%22other%22%2C%20%22user.class%22%3A%20%22bot%22%2C%20%22action%22%3A%20%22abort%22%2C%20%22user.id%22%3A%2045%2C%20%22action.saveFailure.message%22%3A%20%22BO%22%2C%20%22page.revid%22%3A%2017%7D%2C%20%22revision%22%3A%20469803370%7D;	deployment-cache-text04.deployment-prep.eqiad.wmflabs	361	2016-04-14T21:10:49	-	"-" (Schema fetch from https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/en.wikipedia/all-access/user/Cat/daily/2017032500/2017041300 failed: HTTP Error 404: Not Found)

and the event is discarded by the validation.
(Note that the only URL returning 404s I could think of at the time was the pageview api :P)


So I think something changed since this task was created in Jan 26th. I can not imagine how the error could make it to the mysql consumer, because if the event arrived there, it's because it was validated by the processor. If the schema or revision didn't exist, that error would have been caught by the processor. I guess, there was an issue in the MW API that was eventually returning 404s, and that broke the consumer. If that's the case, I guess making EL die is legit.

It seems to me that what we are asking here is already supported by the current code. I'd suggest to mark this task as resolved or invalid.

OHHHH it's the MySQL consumer that died. Right, I think this might make more sense.

I think this will happen if EventLogging tries to fetch a schema in order to insert a record into a mysql for which the schema does not exist. See the get_schema() call in jrm.py in declare_table().

If you pipe data to a MySQL consumer for which the schema doesn't exist, I betcha it will die.

Nuria set the point value for this task to 8.Apr 19 2016, 5:40 PM

Change 284351 had a related patch set uploaded (by Mforns):
Avoid that fetch-schema errors break the mysql writer

https://gerrit.wikimedia.org/r/284351

Change 284351 merged by Ottomata:
Avoid that fetch-schema errors break the mysql writer

https://gerrit.wikimedia.org/r/284351