PCS errors on media endpoint
Closed, Resolved (Public)

Description

Example:

curl https://appservice.wmflabs.org/th.wikipedia.org/v1/page/media/%E0%B8%9E%E0%B8%A3%E0%B8%B0%E0%B8%9A%E0%B8%B2%E0%B8%97%E0%B8%AA%E0%B8%A1%E0%B9%80%E0%B8%94%E0%B9%87%E0%B8%88%E0%B8%9E%E0%B8%A3%E0%B8%B0%E0%B8%A7%E0%B8%8A%E0%B8%B4%E0%B8%A3%E0%B9%80%E0%B8%81%E0%B8%A5%E0%B9%89%E0%B8%B2%E0%B9%80%E0%B8%88%E0%B9%89%E0%B8%B2%E0%B8%AD%E0%B8%A2%E0%B8%B9%E0%B9%88%E0%B8%AB%E0%B8%B1%E0%B8%A7
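For readers who want to reproduce the request with another title, here is a minimal sketch of how the URL in the curl example can be rebuilt from a decoded page title. The helper name `mediaUrl` is hypothetical and not part of the service. The host and path are copied from the example above. The sketch assumes the title contains no spaces or other characters that need MediaWiki-specific normalization.

```javascript
// Percent-encoded title, copied verbatim from the curl example above.
const encodedTitle = '%E0%B8%9E%E0%B8%A3%E0%B8%B0%E0%B8%9A%E0%B8%B2%E0%B8%97%E0%B8%AA%E0%B8%A1%E0%B9%80%E0%B8%94%E0%B9%87%E0%B8%88%E0%B8%9E%E0%B8%A3%E0%B8%B0%E0%B8%A7%E0%B8%8A%E0%B8%B4%E0%B8%A3%E0%B9%80%E0%B8%81%E0%B8%A5%E0%B9%89%E0%B8%B2%E0%B9%80%E0%B8%88%E0%B9%89%E0%B8%B2%E0%B8%AD%E0%B8%A2%E0%B8%B9%E0%B9%88%E0%B8%AB%E0%B8%B1%E0%B8%A7';

// Decode to the human-readable Thai page title.
const title = decodeURIComponent(encodedTitle);

// Hypothetical helper: builds the /page/media URL for a given wiki domain
// and page title, following the URL layout of the curl example.
function mediaUrl(domain, pageTitle) {
    return `https://appservice.wmflabs.org/${domain}/v1/page/media/${encodeURIComponent(pageTitle)}`;
}

const url = mediaUrl('th.wikipedia.org', title);
console.log(title);
console.log(url);
```

The rebuilt URL should match the one passed to curl, since `encodeURIComponent` emits uppercase hex escapes for every UTF-8 byte of the Thai characters.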

Some logs related to MCS crashing in the media endpoint are reported in RESTBase: https://logstash.wikimedia.org/goto/6cd37aafeb65693b2ed662b1aae0fb24

A much bigger problem is that these logs are not visible on the mobileapps kibana dashboard.

Event Timeline

Change 511698 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps@master] Media: Filter out items missing imageinfo when building results

https://gerrit.wikimedia.org/r/511698

Change 511715 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps@master] Media: Improve codec parsing from type attributes

https://gerrit.wikimedia.org/r/511715

Should crashes on appservice.wmflabs.org even be reported in Logstash?

FWIW, the error response is "Cannot read property '0' of undefined", but @Mholloway has already figured out a solution.
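To illustrate how this class of error arises and how the "filter out items missing imageinfo" approach named in change 511698 can avoid it, here is a minimal sketch. It is not the actual patch: the function name `buildMediaResults`, the item shape, and every field other than `imageinfo` are assumptions made for illustration.

```javascript
// Hypothetical builder: keeps only items that carry a non-empty imageinfo
// array, then reads the first entry. Without the filter, item.imageinfo[0]
// throws a TypeError ("Cannot read property '0' of undefined" on older
// Node.js versions) for items where imageinfo is absent.
function buildMediaResults(items) {
    return items
        .filter((item) => Array.isArray(item.imageinfo) && item.imageinfo.length > 0)
        .map((item) => ({
            title: item.title,
            url: item.imageinfo[0].url
        }));
}

// Sample input (made up): the second item has no imageinfo.
const sample = [
    { title: 'File:A.jpg', imageinfo: [{ url: 'https://example.org/A.jpg' }] },
    { title: 'File:Missing.jpg' }
];

const results = buildMediaResults(sample);
console.log(results);
```

Filtering before mapping drops the incomplete item from the response instead of failing the whole request, which appears to be the trade-off the patch title describes.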

Change 511736 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps@master] Filter out false positives when querying page elements for media items

https://gerrit.wikimedia.org/r/511736

I've tagged a few patches with this task, fixing the issue identified in the description and a couple of others that are showing up on the restbase dashboard.

Should crashes on appservice.wmflabs.org even be reported in Logstash?

Probably not, but I think the issue is that most errors in production aren't showing up on the mobileapps dashboard, either.

https://logstash.wikimedia.org/app/kibana#/dashboard/1b5adc90-016e-11e8-bc95-517a9b9d585c?_g=(refreshInterval%3A(display%3AOff%2Cpause%3A!f%2Cvalue%3A0)%2Ctime%3A(from%3Anow-24h%2Cmode%3Aquick%2Cto%3Anow))

Apart from an occasional "no query pages in response" error, the only thing shown on this dashboard is "cannot find heading for section" warnings. But as we can see from the restbase dashboard, the /page/media endpoint has multiple bugs causing production errors.

Mholloway renamed this task from MCS crasher on media endpoint to PCS errors on media endpoint. May 21 2019, 5:33 PM

A much bigger problem is that these logs are not visible on the mobileapps kibana dashboard.

Created a new ticket for this: T224052: Improve mobileapps kibana dashboard

Change 511962 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps@master] Media: Bump patch version to 1.4.3

https://gerrit.wikimedia.org/r/511962

Change 511736 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Filter out false positives when querying page elements for media items

https://gerrit.wikimedia.org/r/511736

Change 511715 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Media: Improve codec parsing from type attributes

https://gerrit.wikimedia.org/r/511715

Change 511698 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Media: Filter out items missing imageinfo when building results

https://gerrit.wikimedia.org/r/511698

Change 511962 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Media: Bump patch version to 1.4.3

https://gerrit.wikimedia.org/r/511962

Just noting here that this is still to be deployed, since the deployment attempts were rolled back on Wednesday.

Looks like the /page/media bugs are all squashed.