Mobileapps endpoint checks have been timing out frequently since 2019-11-26 00:00 UTC. Grafana doesn't show anything out of the ordinary, and unlike in past cases of this behavior (e.g., T229286), there is no sign of worker deaths in logstash. This needs further investigation and remediation.
Looks like the timeouts are occurring on requests to https://api-rw.discovery.wmnet/w/api.php (note the https://).
Change 553359 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps/deploy@master] Use http, not https, in mwapi_uri
Change 553359 merged by jenkins-bot:
[mediawiki/services/mobileapps/deploy@master] Use http, not https, in mwapi_uri
Upon further investigation, I believe this is being caused by T238832: PCS internal request rates tripled on 2019-11-19.
Change 554557 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps@master] Add logging for MW API request timeouts
Change 554557 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Add logging for MW API request timeouts
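For readers unfamiliar with what "logging for MW API request timeouts" amounts to, here is a minimal sketch of the general idea in Python. It is not the code from change 554557 (mobileapps is a Node.js service, and the patch itself is not reproduced in this task). The names `call_with_timeout_logging` and `do_request` and the log message format are hypothetical, chosen only for illustration.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("mwapi")


def call_with_timeout_logging(
    uri: str,
    do_request: Callable[[str, float], bytes],
    timeout_s: float = 5.0,
) -> bytes:
    """Run do_request(uri, timeout_s) and log a warning if it times out.

    do_request is a hypothetical callable supplied by the caller; it is
    expected to raise TimeoutError when the request exceeds timeout_s.
    """
    start = time.monotonic()
    try:
        return do_request(uri, timeout_s)
    except TimeoutError:
        elapsed = time.monotonic() - start
        # Record which URI timed out and how long we waited, so that
        # timeouts can be found in the logs and correlated with alerts.
        logger.warning(
            "MW API request timed out: uri=%s timeout=%.1fs elapsed=%.3fs",
            uri, timeout_s, elapsed,
        )
        # Re-raise so the caller's existing error handling is unchanged.
        raise
```

The wrapper re-raises the exception after logging, so it only adds visibility and does not change how the failure is handled upstream.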
We have been seeing instances of this issue (soft mobileapps endpoint timeout alerts) on codfw specifically over the last several weeks. I can file a new task if this is not the right place to follow up on this.
@jcrespo If it's still the case that you're seeing soft mobileapps endpoint timeout alerts on codfw, IMO a new task would be best.
Per T238832#5736879, this round of instability seems to have been resolved when Parsoid/JS linting was turned off.