Error
MediaWiki version: 1.35.0-wmf.31
PHP Warning: A non-numeric value encountered
Impact
Unclear.
Notes
40+ of these since 1.35.0-wmf.31.
#0 /srv/mediawiki/php-1.35.0-wmf.31/extensions/Wikidata.org/src/QueryServiceLag/WikimediaPrometheusQueryServiceLagProvider.php(110): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.35.0-wmf.31/extensions/Wikidata.org/src/QueryServiceLag/WikimediaPrometheusQueryServiceLagProvider.php(59): WikidataOrg\QueryServiceLag\WikimediaPrometheusQueryServiceLagProvider->getLags()
#2 /srv/mediawiki/php-1.35.0-wmf.31/extensions/Wikidata.org/maintenance/updateQueryServiceLag.php(84): WikidataOrg\QueryServiceLag\WikimediaPrometheusQueryServiceLagProvider->getLag()
#3 /srv/mediawiki/php-1.35.0-wmf.31/maintenance/doMaintenance.php(105): WikidataOrg\UpdateQueryServiceLag->execute()
#4 /srv/mediawiki/php-1.35.0-wmf.31/extensions/Wikidata.org/maintenance/updateQueryServiceLag.php(107): require_once(string)
#5 /srv/mediawiki/multiversion/MWScript.php(101): require_once(string)
#6 {main}
Subject | Repo | Branch | Lines +/-
---|---|---|---
WikimediaPrometheusQueryServiceLagProvider Nan values | mediawiki/extensions/Wikidata.org | master | +28 -6
Just realized this comes from running /srv/mediawiki-staging/multiversion/MWScript.php extensions/Wikidata.org/maintenance/updateQueryServiceLag.php --wiki wikidatawiki --cluster wdqs --prometheus prometheus.svc.eqiad.wmnet --prometheus prometheus.svc.codfw.wmnet.
Presumably this runs from cron? Unclear whether it warrants a rollback.
After discussion in #wikimedia-operations, unblocking train on the assumption that this is likely to be bad data from Prometheus.
16:25 <brennen> meanwhile: does T252077 warrant a rollback?
16:26 <Reedy> https://github.com/wikimedia/puppet/blob/6b0dc71f153b6f052eb117c72ed365aaedc12a4d/modules/profile/manifests/mediawiki/maintenance/wikidata.pp#L73
16:26 <Reedy> (it is a cron, yeah)
16:27 * Reedy looks
16:27 <brennen> thx.
16:30 <Reedy> brennen: I'm presuming time() isn't broken in PHP...
16:30 <brennen> well one can hope.
16:30 <Reedy> :)
16:30 <Reedy> So I'm guessing it's bad data from the prometheus service
16:30 <Reedy> No recent changes to the code
16:31 <brennen> yeah, makes sense. in that case i'll unblock.
Looking at P11165, Prometheus seems to be returning a NaN value, which is not accounted for in the code.
{ "metric": { "__name__": "blazegraph_lastupdated", "cluster": "wdqs", "instance": "wdqs2001:9193", "job": "blazegraph", "site": "codfw" }, "value": [ 1588807140.848, "NaN" ] },
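For context: the Prometheus HTTP API sends sample values as JSON strings, because JSON has no representation for NaN or infinities, so the second element of "value" above is the string "NaN" rather than a number. In PHP 7 "NaN" is a non-numeric string, which would presumably explain the warning if line 110 does arithmetic on it (for example subtracting it from time(); the provider source isn't quoted in this task, so that is an assumption). A minimal Python sketch of spotting such samples; sample_value is a hypothetical helper, not part of the extension:

```python
import json
import math
from typing import Optional

# The result entry from P11165, as returned by a Prometheus instant query
# for blazegraph_lastupdated.
RAW = """{ "metric": { "__name__": "blazegraph_lastupdated", "cluster": "wdqs",
  "instance": "wdqs2001:9193", "job": "blazegraph", "site": "codfw" },
  "value": [ 1588807140.848, "NaN" ] }"""


def sample_value(entry: dict) -> Optional[float]:
    """Return the sample value as a float, or None if it is NaN or +/-Inf."""
    _eval_time, raw_value = entry["value"]  # [evaluation timestamp, "value"]
    value = float(raw_value)  # float("NaN") parses fine and yields nan
    return value if math.isfinite(value) else None


entry = json.loads(RAW)
print(entry["metric"]["instance"], sample_value(entry))  # wdqs2001:9193 None
```

Python's float() happily accepts "NaN" and "+Inf", so the explicit finiteness check is what keeps such a host out of later arithmetic. In PHP the analogous guard would be an is_numeric() check, which returns false for "NaN", before doing any arithmetic.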
Looks like that host (wdqs2001) is doing something else now.
I think the below is related.
21:07 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
21:05 ryankemper@cumin1001: END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99)
21:04 ryankemper@cumin1001: START - Cookbook sre.wdqs.data-transfer
I guess the code needs to account for NaN values and treat the servers reporting them as outside the pool of servers it looks at.
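A rough sketch of that idea, assuming each host's lag is "now minus blazegraph_lastupdated" (the time()-based calculation hinted at in the IRC log): skip hosts whose value isn't finite and aggregate over the rest. The function names, the use of max() as the aggregate, and the second host in the usage lines are all hypothetical; this task doesn't say how the provider combines per-host lags, and this is not the code from change 597474.

```python
import math
import time
from typing import Dict, List, Optional


def lags_by_instance(results: List[dict], now: float) -> Dict[str, float]:
    """Map instance -> lag in seconds, leaving out hosts with non-finite samples.

    Each entry is one element of a Prometheus instant-query "result" array
    for blazegraph_lastupdated (a Unix timestamp encoded as a string).
    """
    lags: Dict[str, float] = {}
    for entry in results:
        last_updated = float(entry["value"][1])
        if not math.isfinite(last_updated):
            continue  # NaN/Inf: treat the host as out of the pool
        lags[entry["metric"]["instance"]] = now - last_updated
    return lags


def max_lag(results: List[dict], now: Optional[float] = None) -> Optional[float]:
    """Largest lag among usable hosts (hypothetical aggregate), or None if none remain."""
    lags = lags_by_instance(results, time.time() if now is None else now)
    return max(lags.values()) if lags else None


# Usage: the wdqs2001 entry from P11165 plus a made-up healthy host.
results = [
    {"metric": {"instance": "wdqs2001:9193"}, "value": [1588807140.848, "NaN"]},
    {"metric": {"instance": "wdqs-hypothetical:9193"}, "value": [1588807140.848, "1588807080"]},
]
print(max_lag(results, now=1588807140.0))  # 60.0
```

Returning None when every host is excluded lets the caller decide what to do (for example, skip the update) instead of reporting a lag computed from bad data.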
Change 597474 had a related patch set uploaded (by Addshore; owner: Addshore):
[mediawiki/extensions/Wikidata.org@master] WikimediaPrometheusQueryServiceLagProvider Nan values
Change 597474 merged by jenkins-bot:
[mediawiki/extensions/Wikidata.org@master] WikimediaPrometheusQueryServiceLagProvider Nan values