
Watchlist and RecentChanges failure due to ORES on frwiki and ruwiki
Closed, Resolved (Public)

Description

[WhNY-gpAAEAAAI@AlH0AAACN] 2017-11-20 22:36:46: Fatal exception of type "RuntimeException"

Appears for all readers and logged-in users on Russian Wikipedia.

Logstash

RuntimeException: Unable to parse threshold: [..]
 at /srv/mediawiki/php-1.31.0-wmf.7/extensions/ORES/includes/Stats.php on line 277

https://grafana.wikimedia.org/dashboard/file/varnish-http-errors.json

Server Admin Log

21:37 <awight@tin> Started deploy [ores/deploy@5084251]: Updating ORES to revscoring 2.0.10, T179711
22:04 <demon@tin> rebuilt wikiversions.php and synchronized wikiversions files: group2 to wmf.8
22:10 Sharp rise in HTTP 500 errors
22:27 <awight@tin> Finished deploy [ores/deploy@5084251]: Updating ORES to revscoring 2.0.10 (duration: 49m 54s)
22:54 <awight@tin> Started deploy [ores/deploy@5084251]: Rollback ORES; T179711
22:55 <awight> rolling back ORES to fix T181006
22:55 <demon@tin> rebuilt wikiversions.php and synchronized wikiversions files: no wmf.8 for group2. i hate my life
22:55 <awight@tin> Finished deploy [ores/deploy@5084251]: Rollback ORES (duration: 01m 05s)
23:11 <awight> purged memcache key 'ruwiki:ORES:threshold_statistics:goodfaith:1’,
23:25 <awight@tin> Started deploy [ores/deploy@82a13ae]: Rollback ORES (take 3); T181006

Event Timeline

Restricted Application added subscribers: Base, Aklapper. Nov 20 2017, 10:39 PM
MaxBioHazard triaged this task as Unbreak Now! priority. Nov 20 2017, 10:41 PM
Restricted Application added subscribers: Liuxinyu970226, Jay8g, TerraCodes. Nov 20 2017, 10:41 PM
[WhNY-gpAAEAAAI@AlH0AAACN] /wiki/%D0%A1%D0%BB%D1%83%D0%B6%D0%B5%D0%B1%D0%BD%D0%B0%D1%8F:%D0%A1%D0%BF%D0%B8%D1%81%D0%BE%D0%BA_%D0%BD%D0%B0%D0%B1%D0%BB%D1%8E%D0%B4%D0%B5%D0%BD%D0%B8%D1%8F?days=0.004001782407407407   RuntimeException from line 277 of /srv/mediawiki/php-1.31.0-wmf.8/extensions/ORES/includes/Stats.php: Unable to parse threshold: {"levelName":"verylikelybad","levelConfig":"maximum recall @ precision >= 0.75","bound":"max","statsData":{"false":{"maximum recall @ precision >= 0.15":{"!f1":0.923,"!precision":0.995,"!recall":0.861,"accuracy":0.86,"f1":0.256,"filter_rate":0.841,"fpr":0.139,"match_rate":0.159,"precision":0.151,"recall":0.842,"threshold":0.252},"maximum recall @ precision >= 0.45":{"!f1":0.985,"!precision":0.977,"!recall":0.993,"accuracy":0.97,"f1":0.269,"filter_rate":0.988,"fpr":0.007,"match_rate":0.012,"precision":0.452,"recall":0.192,"threshold":0.797},"maximum recall @ precision >= 0.75":null},"true":{"maximum recall @ precision >= 0.995":{"!f1":0.254,"!precision":0.149,"!recall":0.854,"accuracy":0.856,"f1":0.921,"filter_rate":0.164,"fpr":0.146,"match_rate":0.836,"precision":0.995,"recall":0.856,"threshold":0.766}}}}
	#0 /srv/mediawiki/php-1.31.0-wmf.8/extensions/ORES/includes/Stats.php(241): ORES\Stats->extractBoundValue(string, string, string, array)
#1 /srv/mediawiki/php-1.31.0-wmf.8/extensions/ORES/includes/Stats.php(44): ORES\Stats->parseThresholds(array, string)
#2 /srv/mediawiki/php-1.31.0-wmf.8/extensions/ORES/includes/Hooks.php(316): ORES\Stats->getThresholds(string)
#3 /srv/mediawiki/php-1.31.0-wmf.8/includes/Hooks.php(177): ORES\Hooks::onChangesListSpecialPageStructuredFilters(SpecialWatchlist)
#4 /srv/mediawiki/php-1.31.0-wmf.8/includes/Hooks.php(205): Hooks::callHook(string, array, array, NULL)
#5 /srv/mediawiki/php-1.31.0-wmf.8/includes/specialpage/ChangesListSpecialPage.php(882): Hooks::run(string, array)
#6 /srv/mediawiki/php-1.31.0-wmf.8/includes/specials/SpecialWatchlist.php(152): ChangesListSpecialPage->registerFilters()
#7 /srv/mediawiki/php-1.31.0-wmf.8/includes/specialpage/ChangesListSpecialPage.php(1023): SpecialWatchlist->registerFilters()
#8 /srv/mediawiki/php-1.31.0-wmf.8/includes/specialpage/ChangesListSpecialPage.php(843): ChangesListSpecialPage->setup(NULL)
#9 /srv/mediawiki/php-1.31.0-wmf.8/includes/specials/SpecialWatchlist.php(85): ChangesListSpecialPage->getOptions()
#10 /srv/mediawiki/php-1.31.0-wmf.8/includes/specialpage/SpecialPage.php(522): SpecialWatchlist->execute(NULL)
#11 /srv/mediawiki/php-1.31.0-wmf.8/includes/specialpage/SpecialPageFactory.php(578): SpecialPage->run(NULL)
#12 /srv/mediawiki/php-1.31.0-wmf.8/includes/MediaWiki.php(287): SpecialPageFactory::executePath(Title, RequestContext)
#13 /srv/mediawiki/php-1.31.0-wmf.8/includes/MediaWiki.php(851): MediaWiki->performRequest()
#14 /srv/mediawiki/php-1.31.0-wmf.8/includes/MediaWiki.php(523): MediaWiki->main()
#15 /srv/mediawiki/php-1.31.0-wmf.8/index.php(43): MediaWiki->run()
#16 /srv/mediawiki/w/index.php(3): include(string)
#17 {main}

Looks like a typo in ORES configuration?
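
One way to read the payload in the exception above: the expression named in `levelConfig` is present in `statsData`, but its value is `null`. The following is a minimal Python sketch of that check, not code from the ORES extension (which is PHP). The payload is abbreviated to the `threshold` fields, and `find_null_stats` is a hypothetical helper name.

```python
import json

# Abbreviated copy of the payload from the exception message
# (per-entry metrics trimmed down to "threshold" only).
payload = json.loads("""
{"levelName": "verylikelybad",
 "levelConfig": "maximum recall @ precision >= 0.75",
 "bound": "max",
 "statsData": {"false": {
     "maximum recall @ precision >= 0.15": {"threshold": 0.252},
     "maximum recall @ precision >= 0.45": {"threshold": 0.797},
     "maximum recall @ precision >= 0.75": null}}}
""")


def find_null_stats(stats_data):
    """Return (outcome, expression) pairs whose statistics came back as null."""
    return [
        (outcome, expression)
        for outcome, expressions in stats_data.items()
        for expression, stats in expressions.items()
        if stats is None
    ]


missing = find_null_stats(payload["statsData"])
print(missing)  # [('false', 'maximum recall @ precision >= 0.75')]
```

The only null entry is the one the `verylikelybad` level asks for, which matches the point where `extractBoundValue` throws in the trace.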

Restricted Application added a project: Scoring-platform-team. Nov 20 2017, 10:43 PM

Looks like this deploy probably started it: 22:27 awight@tin: Finished deploy [ores/deploy@5084251]: Updating ORES to revscoring 2.0.10, T179711 (duration: 49m 54s)

stjn updated the task description. Nov 20 2017, 10:46 PM
stjn added a subscriber: stjn.
awight added a subscriber: awight. Nov 20 2017, 10:47 PM

@Catrope Definitely caused by my deployment. The strange thing is that what we deployed was a fix for T179711, which should only have added that "null" value for requests that were already failing due to impossible config.

The fix (ideally) is to tweak the ruwiki thresholds config until it's within the possible range.
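
To illustrate what "within the possible range" means here, the sketch below evaluates an expression of the form "maximum recall @ precision >= X" over a set of operating points and returns `None` when no point reaches the requested precision. This is an assumption-laden illustration, not how revscoring computes its statistics: the function name is hypothetical, and the two points are simply the populated entries visible in the exception payload, standing in for the model's full curve.

```python
def max_recall_at_precision(points, min_precision):
    """Return the point with the highest recall among those whose
    precision is at least min_precision, or None if none qualifies."""
    eligible = [p for p in points if p["precision"] >= min_precision]
    if not eligible:
        return None  # the requested precision is out of range
    return max(eligible, key=lambda p: p["recall"])


# Stand-in data: the two populated entries from the exception payload.
points = [
    {"threshold": 0.252, "precision": 0.151, "recall": 0.842},
    {"threshold": 0.797, "precision": 0.452, "recall": 0.192},
]

for target in (0.15, 0.45, 0.75):
    best = max_recall_at_precision(points, target)
    print(target, best["threshold"] if best else None)
```

With this stand-in data the 0.75 target has no eligible point, so lowering the configured precision until a point qualifies is the kind of tweak described above.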

Krinkle updated the task description. Nov 20 2017, 10:50 PM

Mentioned in SAL (#wikimedia-operations) [2017-11-20T22:55:11Z] <awight> rolling back ORES to fix T181006

Zppix added a subscriber: Zppix. Nov 20 2017, 10:58 PM
Krinkle updated the task description. Nov 20 2017, 10:59 PM

Still not working. When will it be fixed?

Mentioned in SAL (#wikimedia-operations) [2017-11-20T23:11:47Z] <awight> purged memcache key 'ruwiki:ORES:threshold_statistics:goodfaith:1’, T181006

Change 392535 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] Ores: Emergency disable on frwiki and ruwiki

https://gerrit.wikimedia.org/r/392535

Krinkle renamed this task from "Watchlist and RecentChanges don't work on ruwiki" to "Watchlist and RecentChanges failure due to ORES on frwiki and ruwiki". Nov 20 2017, 11:28 PM
Krinkle updated the task description.
Krinkle updated the task description. Nov 20 2017, 11:30 PM

Change 392535 merged by jenkins-bot:
[operations/mediawiki-config@master] Ores: Emergency disable on frwiki and ruwiki

https://gerrit.wikimedia.org/r/392535

Mentioned in SAL (#wikimedia-operations) [2017-11-20T23:35:47Z] <legoktm@tin> Synchronized wmf-config/InitialiseSettings.php: emergency disable ORES on frwp/ruwp T181006 (duration: 00m 49s)

Halfak added a subscriber: Halfak. Nov 20 2017, 11:37 PM

I've confirmed that both wikis have recovered.

So, for clarity, it seems that in this case, ORES began to work as documented and that caused a failure in Watchlist/RecentChanges. It seems that the next step WRT completing this reverted deployment is to fix the way that Watchlist/RecentChanges degrade.

greg added a subscriber: greg.
awight lowered the priority of this task from Unbreak Now! to High. Nov 21 2017, 12:03 AM

Reducing the priority. We need to reenable ORES on these wikis very carefully. The ORES server code is rolled back, so this should *theoretically* be a smooth re-enablement.

Shawn added a subscriber: Shawn. Nov 21 2017, 1:16 PM

When will ORES be reenabled?

Change 392845 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/ORES@master] Disable the filter if ORES says the threshold doesn't exist

https://gerrit.wikimedia.org/r/392845
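
The idea in this patch title can be sketched as follows: instead of throwing when a threshold is missing, skip that filter level and keep the rest of the page working. This is a Python illustration only, not the merged PHP change. `build_filters` is a hypothetical name, and the `likelybad` level and its mapping to the 0.45 expression are assumptions made for the example; only `verylikelybad` and its expression come from the exception payload.

```python
def build_filters(level_configs, stats):
    """Map each filter level to its threshold, disabling levels whose
    threshold expression has no statistics (missing or None)."""
    enabled, disabled = {}, []
    for level, expression in level_configs.items():
        entry = stats.get(expression)
        if entry is None or "threshold" not in entry:
            disabled.append(level)  # degrade: drop this filter only
            continue
        enabled[level] = entry["threshold"]
    return enabled, disabled


# "likelybad" and its expression are assumed for illustration.
level_configs = {
    "likelybad": "maximum recall @ precision >= 0.45",
    "verylikelybad": "maximum recall @ precision >= 0.75",
}
stats = {
    "maximum recall @ precision >= 0.45": {"threshold": 0.797},
    "maximum recall @ precision >= 0.75": None,
}

enabled, disabled = build_filters(level_configs, stats)
print(enabled, disabled)  # {'likelybad': 0.797} ['verylikelybad']
```

The design choice is that one unavailable threshold costs one filter rather than the whole Watchlist or RecentChanges page.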

Zppix added a comment. Nov 22 2017, 4:14 PM

When will ORES be reenabled?

Unfortunately, as we're still dealing with the fallout of the events, we do not plan on reenabling until next week, and even then we aren't entirely sure. Sorry for the inconvenience.

@MaxBioHazard it seems like Release-Engineering-Team would like us to wait until next week -- after the US holiday. I wish we could have it re-enabled sooner. Thanks for your patience and sorry for the inconvenience.

OK, but can you explain why you can't just undo the change that caused this crash? And is the neural network for the Russian language damaged?

The machine predictor for Russian is intact. It's an incompatibility with MediaWiki that caused the problem. We can't just switch the configuration back because it may cause an outage again.

I've just asked in #wikimedia-releng what the chances are of getting a config change through today and will report back.

Arbnos added a subscriber: Arbnos. Nov 22 2017, 5:51 PM
greg added a comment. Nov 22 2017, 8:07 PM

I've just asked in #wikimedia-releng what the chances are of getting a config change through today and will report back.

It is the Wednesday before a long weekend during which all of Release Engineering and many in Ops will not be working. There is a rule of "No deploys on Friday." Today is this week's Friday. No :)

This new feature can wait until next week.

Thanks for chiming in @greg. The good news for @MaxBioHazard is that we've narrowed in on the issue so we know what caused it and can move forward with confidence next week. See T181168.

Change 392845 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Disable the filter if ORES says the threshold doesn't exist

https://gerrit.wikimedia.org/r/392845

matmarex removed a subscriber: matmarex. Nov 23 2017, 5:51 PM

"Next week" is here, so we are waiting for ORES reenabling.

My patch is merged and I will backport it today. Then we will reenable ORES on one wiki first to be sure it's not causing a problem.

Change 393659 had a related patch set uploaded (by Awight; owner: Amir Sarabadani):
[mediawiki/extensions/ORES@wmf/1.31.0-wmf.8] Disable the filter if ORES says the threshold doesn't exist

https://gerrit.wikimedia.org/r/393659

Change 393659 merged by jenkins-bot:
[mediawiki/extensions/ORES@wmf/1.31.0-wmf.8] Disable the filter if ORES says the threshold doesn't exist

https://gerrit.wikimedia.org/r/393659

Change 393667 had a related patch set uploaded (by Awight; owner: Awight):
[operations/mediawiki-config@master] Reenable ORES on frwiki, ruwiki, and wikidatawiki

https://gerrit.wikimedia.org/r/393667

Change 393667 merged by jenkins-bot:
[operations/mediawiki-config@master] Reenable ORES on frwiki, ruwiki, and wikidatawiki

https://gerrit.wikimedia.org/r/393667

Mentioned in SAL (#wikimedia-operations) [2017-11-27T21:59:37Z] <awight@tin> Synchronized wmf-config/InitialiseSettings.php: Reenable ORES on frwiki, ruwiki, and wikidata; T181006 (duration: 00m 45s)

awight closed this task as Resolved. Nov 27 2017, 10:10 PM
awight claimed this task.

ORES is reenabled on these wikis.

I've left a message on Russian Wikipedia and French Wikipedia to inform them.

awight moved this task from Active to Done on the Scoring-platform-team (Current) board.