Return 415 Media Type not Supported errors for pdfs and other types of unsupported formats
Open, Needs Triage, Public

Description

  • Use zotero to reject content-type
  • Detect content-type in citoid first, not just zotero
  • Return specific content-type in error, e.g. 'application/pdf'
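
The second and third bullets could look roughly like the sketch below. This is only an illustration, not citoid's actual code: the names `parseMediaType`, `checkContentType`, `SUPPORTED_TYPES` and the shape of the error object are all assumptions made for the example.

```typescript
// Hypothetical shape of a 415 response body; not citoid's actual schema.
interface UnsupportedMediaError {
  status: 415;
  error: string;
  contentType: string;
}

// Media types treated as citable in this sketch (an assumption).
const SUPPORTED_TYPES = new Set(["text/html", "application/xhtml+xml"]);

// Strip parameters such as "; charset=utf-8" and normalise case.
function parseMediaType(header: string | undefined): string | null {
  if (!header) return null;
  const mediaType = (header.split(";")[0] ?? "").trim().toLowerCase();
  return mediaType.length > 0 ? mediaType : null;
}

// Returns an error object for unsupported types, or null when scraping may proceed.
function checkContentType(header: string | undefined): UnsupportedMediaError | null {
  const mediaType = parseMediaType(header);
  // A missing header is let through here; a real service may decide otherwise.
  if (mediaType === null || SUPPORTED_TYPES.has(mediaType)) return null;
  return {
    status: 415,
    error: `The remote document is not in a supported format: ${mediaType}`,
    contentType: mediaType,
  };
}
```

Running the check on the response headers before downloading the body is what would let the service reject a PDF without fetching it.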

Event Timeline

I deployed a quick change today that reports a 415 if Zotero reports an unsupported media type (T365583). It is also supposed to prevent us from re-scraping the page, so that we avoid downloading the PDF a second time once Zotero fails.
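
As a rough sketch of that control flow (not the deployed patch), the idea is to short-circuit before the fallback scrape. The `ZoteroResult` type, the `resolve` function and the injected `zotero` and `scrape` callbacks are hypothetical, and the real service is asynchronous; this version is synchronous to keep the example small.

```typescript
type Outcome =
  | { status: 200; citation: string }
  | { status: 415; contentType: string }
  | { status: 404 };

// Hypothetical result of asking Zotero to translate a URL.
type ZoteroResult =
  | { kind: "ok"; citation: string }
  | { kind: "unsupported"; contentType: string }
  | { kind: "failed" };

function resolve(
  url: string,
  zotero: (url: string) => ZoteroResult,
  scrape: (url: string) => string | null,
): Outcome {
  const result = zotero(url);
  if (result.kind === "ok") {
    return { status: 200, citation: result.citation };
  }
  if (result.kind === "unsupported") {
    // Stop here: scraping would download the same unsupported file again.
    return { status: 415, contentType: result.contentType };
  }
  // Any other Zotero failure still falls back to scraping the page.
  const citation = scrape(url);
  return citation === null ? { status: 404 } : { status: 200, citation };
}
```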

On the plus side, the saturation panels show that it immediately reduced our network transmission as well as CPU and memory usage (CPU most dramatically), which is nice: https://grafana.wikimedia.org/d/NJkCVermz/citoid?orgId=1&refresh=5m&from=1716376689141&to=1716380289141

Weirdly, the total request volume jumped, seemingly in direct response: https://grafana.wikimedia.org/d/NJkCVermz/citoid?orgId=1&refresh=5m&from=1716376689141&to=1716380289141&viewPanel=13

We have far fewer 500s (good!), but I was expecting 404s to drop and 415s to rise in equal measure, because what is now a 415 would previously have been a 404. Instead, 404s rose. I also expected 200s to remain roughly stable, but those have jumped too. I can't really explain this.

My two theories are:

  • Metrics are broken and we're now double counting some incoming requests.
  • A third party that suddenly gets 415s doesn't know how to handle them and makes the request again. But that wouldn't explain why we're getting more 200s.

Change #1034917 had a related patch set uploaded (by Mvolz; author: Mvolz):

[mediawiki/services/citoid@master] Add documentation of 415 error to spec

https://gerrit.wikimedia.org/r/1034917

Change #1034917 merged by jenkins-bot:

[mediawiki/services/citoid@master] Add documentation of 415 error to spec

https://gerrit.wikimedia.org/r/1034917

I'm not sure how long it will take to fix the hyperswitch issue. We might need to switch from restbase to the API gateway first.

At this point we could definitely return a generic unsupported-format error to the user in response to 415s, though.

In T365583#9946857, @Mvolz wrote...

@Mvolz, three questions in response to this update...

1) What work would be involved to, as you described, "...detect content-type in citoid first."?

2) How – if at all – does allowing, "...restbase/hyperswitch to correctly pass through the error itself." impact our ability to detect the content-type that caused the error to be activated and subsequently, offer people feedback specific to the content-type they're trying to cite?

3) To be doubly sure I'm understanding the state of this work, can you please share what – if anything – about the Current state section below is missing/inaccurate?

Current state

A. We are now logging when Citoid returns a 415 error in response to people attempting to generate a citation for a media format/type Citoid does not currently support.

B. For each 415 error, we are NOT yet able to detect the content-type that caused the error to be activated

  • See question "1)" above for what work needs to be done to offer this support.

C. Because of "B.", we are NOT yet able to offer people feedback that is specific to the content-type they are trying to use Citoid to cite (T364594).

  • Note: as Marielle described in T365583#9946857, we could offer people, "...a generic unsupported format error..." in the time between now and when "B." is addressed. I've filed T369547 for this interim work.

1) What work would be involved to, as you described, "...detect content-type in citoid first."?

A patch in the backend, but I would do this after 2) is resolved.

2) How – if at all – does allowing, "...restbase/hyperswitch to correctly pass through the error itself." impact our ability to detect the content-type that caused the error to be activated and subsequently, offer people feedback specific to the content-type they're trying to cite?

Blocked by T361576. Basically, restbase's handling of 415s is broken, but we need to switch from restbase to the API gateway anyway (and imminently), so we shouldn't invest time in fixing restbase and should put it into the switch instead.

Change #1056571 had a related patch set uploaded (by Mvolz; author: Mvolz):

[mediawiki/services/citoid@master] Check content-type

https://gerrit.wikimedia.org/r/1056571

Change #1056571 merged by jenkins-bot:

[mediawiki/services/citoid@master] Check content-type

https://gerrit.wikimedia.org/r/1056571

Next step(s)

  • @dchan to follow up with Marielle to learn what the current state of this work is.

This was blocked by T361576 but per T361576#10632561 is now unblocked.

Change #1134186 had a related patch set uploaded (by Mvolz; author: Mvolz):

[mediawiki/services/citoid@master] [WIP] Return detailed information about content type

https://gerrit.wikimedia.org/r/1134186

Change #1134186 merged by Mvolz:

[mediawiki/services/citoid@master] Return detailed information about content type

https://gerrit.wikimedia.org/r/1134186

Mvolz renamed this task from Return 415 Media Type not Supported errors for pdfs and other types of unsupported formats in the citoid back end. to Return 415 Media Type not Supported errors for pdfs and other types of unsupported formats. Jun 5 2025, 9:37 AM
Mvolz updated the task description.

Change #1156311 had a related patch set uploaded (by Mvolz; author: Mvolz):

[mediawiki/extensions/Citoid@master] [WIP] Add more specific error messages.

https://gerrit.wikimedia.org/r/1156311

How specific do we want to be in the front end?

We could distinguish image, video, and pdf, and return a generic message for "other", or we could just report the MIME type back directly. We could also log the exact MIME type as a console warning and show a more user-friendly message in the inspector.
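
To make the options concrete, here is a minimal sketch of the bucketed approach combined with the console warning. The message keys and the `messageKeyFor` and `report` functions are invented for illustration and are not the extension's real message names.

```typescript
// Hypothetical message keys; the extension's real keys may differ.
type MessageKey =
  | "unsupported-pdf"
  | "unsupported-image"
  | "unsupported-video"
  | "unsupported-other";

// Bucket a MIME type into one of the four user-facing categories.
function messageKeyFor(contentType: string): MessageKey {
  const mediaType = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  if (mediaType === "application/pdf") return "unsupported-pdf";
  if (mediaType.startsWith("image/")) return "unsupported-image";
  if (mediaType.startsWith("video/")) return "unsupported-video";
  return "unsupported-other";
}

// Log the exact MIME type for developers and return the friendlier key for the UI.
function report(contentType: string, warn: (message: string) => void): MessageKey {
  warn(`Unsupported media type: ${contentType}`);
  return messageKeyFor(contentType);
}
```

Passing `console.warn` as `warn` would give the console output, while the returned key selects the text shown in the inspector.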

Moved to design review because I think this might come under design for feedback? How detailed should these messages be?

Change #1156311 merged by jenkins-bot:

[mediawiki/extensions/Citoid@master] Add more specific error message for pdf and generic message for other types of unsupported media types.

https://gerrit.wikimedia.org/r/1156311

@Mvolz: could you please share a screenshot of the experience that you think (and I agree!) could benefit from design review?