
ORES requests for wikidatawiki models=damaging end up with HTTP request timed out
Closed, Resolved · Public

Description

Since promoting group2 to 1.32.0-wmf.22

message.
[{exception_id}] {exception_url} RuntimeException from line 95 of /srv/mediawiki/php-1.32.0-wmf.22/extensions/ORES/includes/ORESService.php: Failed to make ORES request to [http://ores.discovery.wmnet:8081/v3/scores/wikidatawiki/?models=damaging%7Cgoodf

With HTTP request timed out.

stacktrace
#0 /srv/mediawiki/php-1.32.0-wmf.22/extensions/ORES/includes/ScoreFetcher.php(55): ORES\ORESService->request(array, NULL)
#1 /srv/mediawiki/php-1.32.0-wmf.22/extensions/ORES/includes/Hooks/ApiHooksHandler.php(273): ORES\ScoreFetcher->getScores(array)
#2 /srv/mediawiki/php-1.32.0-wmf.22/extensions/ORES/includes/Hooks/ApiHooksHandler.php(227): ORES\Hooks\ApiHooksHandler::loadScoresForRevisions(array)
#3 /srv/mediawiki/php-1.32.0-wmf.22/includes/Hooks.php(174): ORES\Hooks\ApiHooksHandler::onApiQueryBaseAfterQuery(ApiQueryRecentChanges, Wikimedia\Rdbms\ResultWrapper, array)
#4 /srv/mediawiki/php-1.32.0-wmf.22/includes/Hooks.php(202): Hooks::callHook(string, array, array, NULL)
#5 /srv/mediawiki/php-1.32.0-wmf.22/includes/api/ApiQueryBase.php(381): Hooks::run(string, array)
#6 /srv/mediawiki/php-1.32.0-wmf.22/includes/api/ApiQueryRecentChanges.php(426): ApiQueryBase->select(string, array, array)
#7 /srv/mediawiki/php-1.32.0-wmf.22/includes/api/ApiQueryRecentChanges.php(133): ApiQueryRecentChanges->run()
#8 /srv/mediawiki/php-1.32.0-wmf.22/includes/api/ApiQuery.php(249): ApiQueryRecentChanges->execute()
#9 /srv/mediawiki/php-1.32.0-wmf.22/includes/api/ApiMain.php(1587): ApiQuery->execute()
#10 /srv/mediawiki/php-1.32.0-wmf.22/includes/api/ApiMain.php(531): ApiMain->executeAction()
#11 /srv/mediawiki/php-1.32.0-wmf.22/includes/api/ApiMain.php(502): ApiMain->executeActionWithErrorHandling()
#12 /srv/mediawiki/php-1.32.0-wmf.22/api.php(87): ApiMain->execute()
#13 /srv/mediawiki/w/api.php(3): include(string)
#14 {main}

Event Timeline

Restricted Application added a project: Scoring-platform-team. · Sep 20 2018, 2:53 PM
Restricted Application added a subscriber: Aklapper.

However, when I try one of the reported URLs manually, it seems to work. Tried on deploy1001, deploy2001 and mw2219.

Based on Grafana, it seems some ORES servers got overloaded and some requests timed out. It was a short outage, though, and apparently everything works fine now. Still to be investigated.

Yes, basically for each deployment we get an overload error spike: https://grafana.wikimedia.org/dashboard/db/ores?refresh=1m&panelId=9&fullscreen&orgId=1&from=now-2d&to=now-1m
The reason is that all restarts basically happen at the same time. Is there a way around that in scap?
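One general way to avoid restarting every worker at once is a rolling restart in small batches with a pause in between. The sketch below is only an illustration of that idea in Python; it is not scap's API, and `rolling_restart`, the `restart` callback, and the host names are hypothetical.

```python
import time
from typing import Callable, Iterable, List


def rolling_restart(
    hosts: Iterable[str],
    restart: Callable[[str], None],
    batch_size: int = 1,
    pause_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[List[str]]:
    """Restart hosts batch by batch so at most `batch_size` restart together.

    `restart` is a hypothetical callback that restarts the service on one host.
    Returns the batches in the order they were processed.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    host_list = list(hosts)
    batches: List[List[str]] = []
    for start in range(0, len(host_list), batch_size):
        batch = host_list[start:start + batch_size]
        for host in batch:
            restart(host)
        batches.append(batch)
        # Pause between batches (not after the last one) so the hosts that
        # were just restarted can warm up before the next batch goes down.
        if start + batch_size < len(host_list):
            sleep(pause_seconds)
    return batches


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    example_hosts = ["ores-host-1", "ores-host-2", "ores-host-3"]
    rolling_restart(
        example_hosts,
        restart=lambda host: print("restarting", host),
        batch_size=1,
    )
```

With a batch size well below the number of hosts, the remaining hosts keep serving requests while a batch restarts, at the cost of a longer deployment.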

I don't think it should be a blocker to the train as it's due to changes happening to the service (T160692: Use poolcounter to limit number of connections to ores uwsgi) but it should be pretty high priority (and I will get it fixed)

Restricted Application added a project: User-Ladsgroup. · Sep 20 2018, 8:00 PM

That is not a blocker to the train per se. I have filed it as a subtask because the train triggers the issue. It seems to be transient and resolves itself after a short amount of time.

So surely, we should get it fixed, but that is not a blocker per se :]

greg triaged this task as High priority. · Sep 24 2018, 4:18 PM
greg added a subscriber: greg.

I don't think it should be a blocker to the train as it's due to changes happening to the service (T160692: Use poolcounter to limit number of connections to ores uwsgi) but it should be pretty high priority (and I will get it fixed)

ETA?

There are three ways to handle this, and I'm planning to do all three to make sure it won't happen again. Each of them might take a day or two to implement and deploy.

Change 462779 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/ORES@master] Catch and gracefully handle when service is not responding properly to ApiHooksHandler

https://gerrit.wikimedia.org/r/462779

Change 462779 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Catch and gracefully handle when service is not responding properly to ApiHooksHandler

https://gerrit.wikimedia.org/r/462779
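The patch title describes catching the service failure in the API hook instead of letting the exception propagate. As a rough illustration of that pattern only (this is not the merged PHP change; `ScoreServiceError`, `load_scores_for_revisions`, and `annotate_changes` are hypothetical names), a Python sketch might look like this:

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger("ores_sketch")


class ScoreServiceError(RuntimeError):
    """Hypothetical error raised when the scoring service fails or times out."""


def load_scores_for_revisions(
    rev_ids: List[int],
    fetch_scores: Callable[[List[int]], Dict[int, dict]],
) -> Optional[Dict[int, dict]]:
    """Return scores keyed by revision ID, or None if the service failed."""
    try:
        return fetch_scores(rev_ids)
    except ScoreServiceError as exc:
        # Log a warning and degrade instead of failing the whole API request.
        logger.warning(
            "Score request failed for %d revisions: %s", len(rev_ids), exc
        )
        return None


def annotate_changes(
    changes: List[dict],
    fetch_scores: Callable[[List[int]], Dict[int, dict]],
) -> List[dict]:
    """Attach scores to each change; return changes unmodified on failure."""
    rev_ids = [change["revid"] for change in changes]
    scores = load_scores_for_revisions(rev_ids, fetch_scores)
    if scores is None:
        return changes
    return [{**change, "scores": scores.get(change["revid"])} for change in changes]
```

The design choice here is that a recent-changes style query still returns its rows when scoring is briefly unavailable, and the failure is recorded as a warning rather than an uncaught exception.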

Ladsgroup moved this task from Incoming to In progress on the User-Ladsgroup board. · Oct 2 2018, 8:19 PM
Ladsgroup closed this task as Resolved. · Nov 1 2018, 8:00 PM

Thank you @Ladsgroup for following up on that incident task. Any log spam we can drop makes deployment of MediaWiki easier!