Page MenuHomePhabricator

[Bug] Requesting a nonexistent page generates a PDF instead of returning an HTTP response
Closed, ResolvedPublic3 Estimated Story Points

Description

Requesting a render for a nonexisting article, such as http://localhost:3030/en.wikipedia.org/v1/pdf/Book13242/letter, generates a PDF with the JSON error codes.

Actual

The content is clipped but contains:

{"type":"https://mediawiki.org/wiki/HyperSwitch/errors/not_found","title":"Not found.","method":"get","detail":"Page or revision not found

Expected

HTTP status 404 is returned with an optional text/html payload.

Developer Notes

The missing part is response validation before calling page.pdf(). Previous promise (page.goto(url, ...)) returns a Response object.
Before calling page.pdf(pdfOptions) we have to verify that response is successful. We can achieve that by checking response.ok() or response.status() === 200.

With that approach, we will be able to handle all errors - including the 5xx returned by MediaWiki.

There is one pitfall: you have to handle 404 errors differently than 400|5xx, this has to be implemented in queue.js

QA Steps

Please try to generate a PDF for non-existing article, for example: http://chromium-pdf.wmflabs.org/en.wikipedia.org/v1/pdf/ArticleThatDoesntExist/letter

Service should return application/json with HTTP_404 error instead of PDF containing Article not found text

Event Timeline

Jdlrobson renamed this task from Requesting a nonexistent page generates a PDF instead of returning an HTTP response to [Bug] Requesting a nonexistent page generates a PDF instead of returning an HTTP response.Jan 31 2018, 5:29 PM
Jdlrobson set the point value for this task to 3.Jan 31 2018, 5:35 PM
Jdlrobson subscribed.

We do not ask for URLs, Chromium does, so we'll need a way to detect a 404 before rendering and change behaviour relating to it.

Niedzielski removed the point value for this task.
Jdlrobson set the point value for this task to 3.

@pmiazga also mentioned we'll want to handle 500 and other non-success errors. He'll update this task.

Can we request the HTML to local file and then ask Chromium to visit a file:// URL instead of an https:// URL?

Change 408819 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/services/chromium-render@master] Improve client-side error handling

https://gerrit.wikimedia.org/r/408819

pmiazga moved this task from Doing to Needs Code Review on the Readers-Web-Kanbanana-Board-Old board.

@pmiazga do we have http://proton.wmflabs.org/ or similar for QAing this? Can you add some instructions?

@Jdlrobson we can test it locally, download the repo, do npm install, then npm run start and then verify that localhost:3030/en.wikipedia.org/v1/pdf/ArticleThatDoesntExist/letter returns an HTTP_404 error with application_json.
You can also verify that on Proton staging server: http://chromium-pdf.wmflabs.org/en.wikipedia.org/v1/pdf/ArticleThatDoesntExist/letter

Change 408819 merged by Pmiazga:
[mediawiki/services/chromium-render@master] Improve client-side error handling

https://gerrit.wikimedia.org/r/408819

This appears correct: {"name":"HTTPError","message":"404","status":404,"details":"Article 'Book13242' not found"}. As discussed in code review, it is known that the message should probably replace the redundant details and bad parameters incorrectly produce a 404 response as well. e.g., http://localhost:3030/en.wikipedia.org/v1/pdf/Book/letterq1341e. Moving to sign off and resolving directly.