Page MenuHomePhabricator

Proton should reject erroneous requests straightaway
Closed, ResolvedPublic

Description

Currently, all requests are queued as soon as they are received and then evaluated once they enter the execution phase. However, requests that are erroneous (like bad/missing titles) should be outright rejected and not even enter the queue (in this way freeing up the queuing slots for requests that can be successfully completed). This can be achieved either by asking the MW API information about the page via the mwApiGet() utility function or by asking RESTBase via the restApiGet() function from the same module (faster).

Going a bit further, the info returned can be used to correct a possible inconsistency in the rendering mechanism. Namely, if there is a new revision of an article between the time the user requests a PDF and the actual call to puppeteer to render it, the user will receive the new version, not the one they asked the PDF for. By requesting info about the page as soon as the request is received, Proton can retrieve the HTML for the specific revision the user is asking the PDF for.

Related Objects

StatusSubtypeAssignedTask
Resolvedovasileva
ResolvedNone
ResolvedBawolff
Resolvedphuedx
Resolved mobrovac
Resolved mobrovac
Resolvedphuedx
ResolvedJdrewniak
Resolvedphuedx
Resolvedphuedx
Resolvedphuedx
Resolvedphuedx
DeclinedNone
Resolved bmansurov
Resolved mobrovac
Resolvedovasileva
InvalidNone
ResolvedJdlrobson
Resolvedphuedx
Resolvedphuedx
Resolved holger.knust
ResolvedTgr
Resolvedjijiki
ResolvedMSantos
Resolved mobrovac
Resolvedovasileva
Resolvedphuedx
Declinedpmiazga
ResolvedDzahn
Resolvedpmiazga
Duplicate holger.knust
ResolvedMSantos
ResolvedTgr
ResolvedJohan
OpenNone
Resolvedovasileva
InvalidNone
Resolved mobrovac
Resolvedpmiazga

Event Timeline

mobrovac created this task.

Currently, all requests are queued as soon as they are received and then evaluated once they enter the execution phase. However, requests that are erroneous (like bad/missing titles) should be outright rejected and not even enter the queue (in this way freeing up the queuing slots for requests that can be successfully completed). This can be achieved either by asking the MW API information about the page via the mwApiGet() utility function or by asking RESTBase via the restApiGet() function from the same module (faster).

Handling 404 pages was done in https://phabricator.wikimedia.org/T186127. I agree, that probably the better approach is to verify the article existence before it's queue. I'll prepare a patch for that.

Going a bit further, the info returned can be used to correct a possible inconsistency in the rendering mechanism. Namely, if there is a new revision of an article between the time the user requests a PDF and the actual call to puppeteer to render it, the user will receive the new version, not the one they asked the PDF for. By requesting info about the page as soon as the request is received, Proton can retrieve the HTML for the specific revision the user is asking the PDF for.

We are aware of this situation, but AFAIK we planned that as a something to work on in the future. We can introduce the second endpoint /pdf/revision/{ID}/{SIZE}/{TYPE} and render pages using revision ID.

/cc @ovasileva

Handling 404 pages was done in https://phabricator.wikimedia.org/T186127. I agree, that probably the better approach is to verify the article existence before it's queue. I'll prepare a patch for that.

Great, thnx! It would good to get that sorted out before proceeding to the next step of rolling the service out to production.

We are aware of this situation, but AFAIK we planned that as a something to work on in the future. We can introduce the second endpoint /pdf/revision/{ID}/{SIZE}/{TYPE} and render pages using revision ID.

No need for a new end point. If the 404 check is implemented in one of the ways suggested in the task description, then either method would give you the latest revision ID, which you can then just add to the data you pass to the job in the queue. One more idea that has just occurred to me is that somebody might also want the PDF of an older revision of an article, and in that case the revision ID can just be appended as the last, but optional, path parameter. But, this is something we can tackle later on.

Change 442152 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/services/chromium-render@master] Check the article existence before adding it to the queue

https://gerrit.wikimedia.org/r/442152

Change 442152 merged by Mobrovac:
[mediawiki/services/chromium-render@master] Check the article existence before adding it to the queue

https://gerrit.wikimedia.org/r/442152

Change 442352 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/chromium-render/deploy@master] Config: Add the RESTBase request template

https://gerrit.wikimedia.org/r/442352

Change 442352 merged by Mobrovac:
[mediawiki/services/chromium-render/deploy@master] Config: Add the RESTBase request template

https://gerrit.wikimedia.org/r/442352

Stashbot subscribed.

Mentioned in SAL (#wikimedia-operations) [2018-06-27T17:52:33Z] <mobrovac@deploy1001> Started deploy [proton/deploy@cd6ed94]: Update proton to 491e966 - T186748 T197856

Mentioned in SAL (#wikimedia-operations) [2018-06-27T17:53:08Z] <mobrovac@deploy1001> Finished deploy [proton/deploy@cd6ed94]: Update proton to 491e966 - T186748 T197856 (duration: 00m 35s)

Vvjjkkii renamed this task from Proton should reject erroneous requests straightaway to bjaaaaaaaa.Jul 1 2018, 1:02 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed pmiazga as the assignee of this task.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
mobrovac renamed this task from bjaaaaaaaa to Proton should reject erroneous requests straightaway.Jul 1 2018, 10:43 AM
mobrovac closed this task as Resolved.
mobrovac assigned this task to pmiazga.
mobrovac updated the task description. (Show Details)
mobrovac removed a subscriber: pmiazga.