
[Investigate] ORES spike of errored requests every hour
Closed, ResolvedPublic

Description

https://grafana.wikimedia.org/dashboard/db/ores

Every hour we have a spike of around 100 requests that error out.
Why?

Event Timeline

Almost all of the errors registered in the logs are happening because of unsupported wikis, above all Commonswiki and nowiki. It would be nice to support those (we are actually very close to supporting nowiki: T131855: Language assets for Norwegian), but I think the main cause is the way RTRC behaves: it tries to connect to ores.wmflabs.org/scores/wiki and, if it gets a 404, it doesn't continue. Which makes sense, but it puts an extra error in our log.
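For illustration, the probe-and-stop behavior described above could look something like this. This is a hypothetical sketch, not RTRC's actual code (RTRC is a JavaScript gadget); the function names and error handling are mine, only the endpoint shape comes from this task.

```python
# Sketch of a client probing ORES for wiki support, as RTRC is
# described as doing. Names here are illustrative.

ORES_BASE = "https://ores.wmflabs.org/scores"

def scores_url(wiki):
    """Build the per-wiki scores endpoint URL, e.g. .../scores/enwiki/"""
    return "{}/{}/".format(ORES_BASE, wiki)

def is_supported(status_code):
    """Interpret the probe result: 200 means the wiki has models,
    404 means unsupported, so the client stops rather than polling."""
    if status_code == 200:
        return True
    if status_code == 404:
        # Note: this 404 still lands in ORES's error log, which is
        # exactly the hourly spike discussed in this task.
        return False
    raise RuntimeError("unexpected status: {}".format(status_code))
```

The point is that the client-side logic is reasonable; the problem is only that the 404 is counted as an error on the server side.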

More worrying is that when you put ScoredRevision in your global.js, it checks everything the hard way whenever you visit a page on any wiki. I have no idea how to fix that.

On second thought, these errors should not end up in our Grafana dashboard, so I want to tune the error logging a little: https://github.com/wiki-ai/ores/pull/141
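The general idea behind that kind of logging change can be sketched as follows: route expected client mistakes (like 404s for unsupported wikis) below the error level so they stay off the error dashboard. This is an illustrative sketch, not ORES's actual code from the PR above.

```python
import logging

logger = logging.getLogger("ores")

def log_request_failure(status_code, message):
    """Hypothetical helper: expected client errors (404 for an
    unsupported wiki) are logged at INFO so they don't show up as
    errors on the dashboard; everything else stays at ERROR."""
    if status_code == 404:
        logger.info("client error %d: %s", status_code, message)
    else:
        logger.error("server error %d: %s", status_code, message)
```

With this split, the hourly RTRC/ScoredRevision probes would no longer register as error spikes in Grafana.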

Thanks to lots of changes and some luck, I found out what is causing the problem:

It seems someone with the user agent "Ruby" is requesting scores of deleted revisions on English Wikipedia for the wp10 model.
Here is one of the requests:

May 25 06:10:05 ores-web-03 uwsgi[21979]: [pid: 1406] 10.68.18.74 (-) {36 vars in 1762 bytes} [Wed May 25 06:10:04 2016] GET /v1/scores/enwiki/wp10/?revids=694870591%7C694799445%7C694893140%7C694925610%7C695004939%7C712438034%7C712207231%7C700871681%7C712438607%7C706368989%7C706373642%7C712439107%7C703463017%7C702524525%7C712438819%7C712239844%7C712437535%7C706245143%7C708009469%7C708015177%7C712874824%7C711554487%7C711573172%7C711574792%7C711575320%7C711576195%7C711576415%7C711576742%7C711577502%7C711579843%7C711580716%7C710420501%7C711530936%7C711537476%7C711579847%7C711595665%7C711597825%7C711598613%7C711598939%7C712207232%7C712239845%7C712437881%7C712438719%7C712711843%7C712715697%7C712715761%7C713139017%7C714738347%7C715787136%7C715899393 => generated 9114 bytes in 356 msecs (HTTP/1.0 200) 2 headers in 73 bytes (1 switches on core 0) user agent "Ruby"
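For reference, the revids parameter in that request is pipe-separated and percent-encoded (%7C is |), so a batch like that can be decoded with the standard library (shortened to the first few revids here):

```python
from urllib.parse import urlparse, parse_qs

# The start of the request path from the log line above.
raw = "/v1/scores/enwiki/wp10/?revids=694870591%7C694799445%7C694893140"

query = parse_qs(urlparse(raw).query)
revids = query["revids"][0].split("|")  # parse_qs already percent-decodes
# revids == ["694870591", "694799445", "694893140"]
```

Decoding the full batch shows 50 revids per request, which is why a single misbehaving client can produce a sizable spike.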

Let's see who is using Ruby and wp10 models.

That looks like something that @Ragesoss's scripts could be doing. Sage, do you know if these could be your requests? Are you getting a lot of errors back?

@Nettrom, I thought this could be SuggestBot too. How long ago did SuggestBot start using ORES for article quality modeling?

Yeah, pretty sure that's from dashboard.wikiedu.org.

I'll adjust the import routine to stop requesting revisions that are deleted.
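The kind of pre-filtering that fix implies can be sketched like this: ask the MediaWiki API (action=query) which revids still exist before sending them to ORES. In the standard query response, missing or deleted revisions come back under query["badrevids"]. The function name and the surrounding code are illustrative, not dashboard.wikiedu.org's actual import routine.

```python
def live_revids(revids, api_response):
    """Drop revids that the wiki reports as bad (deleted or missing).

    `api_response` is a parsed JSON response from a MediaWiki
    action=query&revids=... request; badrevids keys are strings.
    """
    bad = set(int(r) for r in api_response.get("query", {}).get("badrevids", {}))
    return [r for r in revids if r not in bad]
```

Only the surviving revids would then be batched into the ORES scores request, so deleted revisions never reach the error log.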

Fix is in, and it should be deployed this week.

Sounds like this got resolved, awesome!

For future reference, if you're looking for SuggestBot: it should be making HTTP requests with the appropriate User-Agent and From headers set to identify it. The User-Agent should be "SuggestBot/1.0" (or maybe, at some future point, a higher version number). The code's here: https://github.com/nettrom/suggestbot/blob/master/suggestbot/utilities/page.py
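Setting those identifying headers is straightforward; here is a minimal stdlib sketch. The User-Agent value comes from the comment above, but the URL and the From address are placeholders, and this is not SuggestBot's actual code (which uses its own request utilities).

```python
import urllib.request

# Identify the bot to the server via User-Agent and a contact
# address via From, so operators can tell who is calling.
req = urllib.request.Request(
    "https://ores.wmflabs.org/v1/scores/enwiki/wp10/?revids=694870591",
    headers={
        "User-Agent": "SuggestBot/1.0",
        "From": "operator@example.org",  # hypothetical contact address
    },
)
# urllib normalizes header names to capitalized form ("User-agent"),
# so inspect them that way before sending with urlopen(req).
```

Had the "Ruby" client above set headers like these, tracking down the source of the errored requests would have taken minutes instead of a log investigation.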