Proton cannot assume the requests are for {lang}.wikipedia.org
Closed, Resolved, Public

Description

The Proton service assumes that the requests it receives are for Wikipedia pages only. However, its aim is to replace the Electron rendering service, which is enabled for all projects. Therefore, Proton cannot assume that requests for PDFs will be limited to Wikipedia.

Event Timeline

mobrovac created this task.

Maybe instead of doing a RESTBase check (calling /page/title/{TITLE}), we could first do a HEAD request to the requested URL and check the HTTP response: if it's 200, proceed with the queue; otherwise, reject the job immediately? With that approach, we would be able to handle all possible projects.
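A minimal sketch of the HEAD-check idea proposed here, written in Python for illustration only (Proton itself is a Node.js service). The names head_status and should_enqueue are hypothetical and do not come from the Proton code base; the probe is injectable so the decision logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable


def head_status(url: str, timeout: float = 5.0) -> int:
    """Issue a HEAD request and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the code is still available.
        return err.code


def should_enqueue(url: str, probe: Callable[[str], int] = head_status) -> bool:
    """Accept the render job only if the page answers the probe with 200."""
    try:
        return probe(url) == 200
    except OSError:
        # DNS failures, refused connections and timeouts: reject the job.
        return False
```

Because the check only looks at the HTTP status of the requested URL, it does not depend on the project or on the shape of the domain.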

The RESTBase call is not Wikipedia-specific; it supports all the wikis where RESTBase is enabled. The problem is in this fragment, which assumes a specific format for the domain.

That part is there to handle the mobile domains (by adding the .m part). There is no nice way to retrieve the mobile domain for a given wiki. We already did some research on this case, please check:

There is also a short discussion in the Gerrit patches.

That part is there to handle the mobile domains (by adding the .m part). There is no nice way to retrieve the mobile domain for a given wiki.

Have it defined in the config?

As for how to build the mobile domain in the Proton service: yes, the template is stored in the config: https://github.com/wikimedia/mediawiki-services-chromium-render/blob/master/config.dev.yaml#L66

mobrovac edited projects, added Services (doing); removed Services (blocked).

We actually also need to do this because of security implications: we want to restrict the service to only be allowed to talk to our production MW servers directly.

That part is there to handle the mobile domains (by adding the .m part). There is no nice way to retrieve the mobile domain for a given wiki. We already did some research on this case, please check:

Actually, there is. As far as our production servers are concerned, .m. domains do not exist, only their desktop counterparts. In other words, both en.wp.org/wiki/Title and en.m.wp.org/wiki/Title get interpreted in the same way (as being for the en.wp.org domain). I will try to add the logic to Proton and change the MW request template to reflect that.
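To illustrate the point about .m. domains, here is a rough Python sketch that reduces a requested host to its desktop counterpart while remembering whether the mobile view was asked for. The function name split_mobile is hypothetical, and the sketch assumes the mobile marker is always the second label of the host (as in en.m.wikipedia.org); hosts that follow another pattern are returned unchanged.

```python
from typing import Tuple


def split_mobile(host: str) -> Tuple[str, bool]:
    """Return (desktop_host, wants_mobile) for a requested host name."""
    labels = host.lower().split(".")
    # Assumption: a mobile host has "m" as its second label,
    # e.g. en.m.wikipedia.org or commons.m.wikimedia.org.
    if len(labels) >= 3 and labels[1] == "m":
        del labels[1]
        return ".".join(labels), True
    return ".".join(labels), False
```

With this, en.m.wikipedia.org and en.wikipedia.org both resolve to the same desktop domain, and the boolean can be used separately to decide which view to request.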

Change 443444 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[operations/puppet@production] service::node: Expose the MW appservers' host to modules

https://gerrit.wikimedia.org/r/443444

Change 443465 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/chromium-render@master] Tighten the MW request

https://gerrit.wikimedia.org/r/443465

Change 443468 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/chromium-render/deploy@master] Config: Improve the MW request template

https://gerrit.wikimedia.org/r/443468

The three patches above should get us what we want, as they allow us to restrict the hosts we issue requests to in production (these being only the MW app servers), all the while allowing us to fetch both desktop and mobile views of pages.
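One way to picture the host restriction, as a hedged Python sketch rather than the actual request template: every request is sent to a single fixed app server address, and the wiki is selected through the Host header. The appserver URL below is a placeholder, build_mw_request is a hypothetical helper, and the assumption that the wiki is chosen via the Host header is an inference from the thread, not something the patches are quoted as doing. How the mobile view is selected is not described here, so it is left out.

```python
import re
from urllib.parse import quote

# Placeholder only: in production the real value would come from the
# puppet-provided configuration, not from a hard-coded constant.
MW_APPSERVER = "http://mw-appservers.example.internal"

# Only plain host names are accepted, so the domain cannot smuggle in
# a path, a port, or header-breaking characters.
_DOMAIN_RE = re.compile(r"[a-z0-9-]+(\.[a-z0-9-]+)+")


def build_mw_request(domain: str, title: str, appserver: str = MW_APPSERVER) -> dict:
    """Describe a page request that can only ever reach `appserver`."""
    if not _DOMAIN_RE.fullmatch(domain):
        raise ValueError(f"not a valid domain: {domain!r}")
    return {
        "method": "get",
        # The connection target is fixed; only the path varies.
        "uri": f"{appserver}/wiki/{quote(title, safe='')}",
        # The wiki is selected by the Host header (assumption).
        "headers": {"host": domain},
    }
```

Since the URI host never depends on user input, a request for an arbitrary external domain cannot make the service connect anywhere other than the configured app servers.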

Change 443444 merged by Giuseppe Lavagetto:
[operations/puppet@production] service::node: Expose the MW appservers' host to modules

https://gerrit.wikimedia.org/r/443444

Change 443465 merged by Mobrovac:
[mediawiki/services/chromium-render@master] Tighten the MW request

https://gerrit.wikimedia.org/r/443465

Change 443468 merged by Mobrovac:
[mediawiki/services/chromium-render/deploy@master] Config: Improve the MW request template

https://gerrit.wikimedia.org/r/443468

Mentioned in SAL (#wikimedia-operations) [2018-07-26T19:58:28Z] <mobrovac@deploy1001> Started deploy [proton/deploy@883cacd]: Use a more secure MW API template - T198461

Mentioned in SAL (#wikimedia-operations) [2018-07-26T19:59:02Z] <mobrovac@deploy1001> Finished deploy [proton/deploy@883cacd]: Use a more secure MW API template - T198461 (duration: 00m 33s)