Expose the PDF rendering service via RESTBase
Closed, Resolved, Public

Description

To use the PDF rendering service from T143129, we need to create an endpoint, preferably /{domain}/v1/page/pdf/{title}, that calls the service and returns the result to the caller (proxy-only mode for the time being).
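As a rough illustration of the proposed route, here is a minimal sketch that fills in the template for a given wiki and page. The helper name pdf_route and the title encoding (spaces to underscores, everything else percent-encoded) are assumptions made for the example, not something this task specifies.

```python
from urllib.parse import quote

# Route template quoted from the task description.
ROUTE_TEMPLATE = "/{domain}/v1/page/pdf/{title}"


def pdf_route(domain: str, title: str) -> str:
    """Fill in the proposed route for one page (hypothetical helper).

    Assumption: spaces become underscores (the usual MediaWiki title form)
    and every reserved character, including "/", is percent-encoded so the
    title stays a single path segment.
    """
    encoded_title = quote(title.replace(" ", "_"), safe="")
    return ROUTE_TEMPLATE.format(domain=domain, title=encoded_title)


if __name__ == "__main__":
    print(pdf_route("en.wikipedia.org", "Main Page"))
```

In proxy-only mode the endpoint would simply forward such a request to the rendering service and stream the PDF back unchanged.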

Event Timeline

@Addshore, this depends on the electron service being deployed, which in turn depends on ops. We are shooting for all this to be resolved before November 30th.

Great! :)

Please note that exposing the service via RESTBase doesn't mean it's a good idea to call it via RESTBase from a MediaWiki extension; actually, there are very, very good reasons why that is a very bad idea in the general case.

The service is now exposed via RESTBase so that the public REST API can access it, but I don't think we should access it that way from MediaWiki.

Also, the RESTBase configuration currently caches the content (a PDF) on Varnish, on the text cluster, for 5 minutes.

That should be discussed as well with the appropriate people (Traffic).

My first point, about calling the service via RESTBase from MediaWiki, is being discussed in T150185 and it is mostly OK. It does mean, however, that all requests to the PDF service will go through the publicly exposed RESTBase URLs, and thus via Varnish.

That makes my other point (the RESTBase configuration currently caches the PDF on Varnish for 5 minutes, which should be discussed with Traffic) even more critical, as the expected via-Varnish traffic is of course all of the service's traffic.

For comparison, I just confirmed that when using OCG, MediaWiki issues Cache-Control: no-cache; that's because OCG caches content on disk.

@Joe: The traffic we are talking about here is very low. OCG currently sees about 2 req/s.

The PR is now merged, and I also checked with @BBlack about object sizes and Varnish cache times. With the expected volume and sizes (< 100 MB) he does not see issues, but he recommended looking into indicating the size (via Content-Length or some other header) if we end up serving PDFs of 100 MB or larger. In that case, disabling caching for large responses would also be worth considering.
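To make the suggestion concrete, here is a minimal sketch of how response headers could be chosen by size. It is not the RESTBase configuration: the function name, the exact Cache-Control values, and reading "100mb" as 100 MiB are all assumptions made for illustration; only the 5-minute cache time and the 100 MB threshold come from this thread.

```python
CACHE_SECONDS = 5 * 60  # 5-minute cache time mentioned in this thread
LARGE_RESPONSE_BYTES = 100 * 1024 * 1024  # assumption: "100mb" read as 100 MiB


def pdf_response_headers(pdf_size_bytes: int) -> dict:
    """Pick headers for a rendered PDF of the given size (hypothetical helper)."""
    headers = {
        "Content-Type": "application/pdf",
        # Always state the size so caches can see it up front.
        "Content-Length": str(pdf_size_bytes),
    }
    if pdf_size_bytes >= LARGE_RESPONSE_BYTES:
        # Large responses: opt out of caching, mirroring the header OCG uses.
        headers["Cache-Control"] = "no-cache"
    else:
        # Assumed header form for a 5-minute shared and private cache lifetime.
        headers["Cache-Control"] = f"s-maxage={CACHE_SECONDS}, max-age={CACHE_SECONDS}"
    return headers


if __name__ == "__main__":
    print(pdf_response_headers(2 * 1024 * 1024))
```

Whether the threshold check belongs in RESTBase or in Varnish itself is a separate question for Traffic; the sketch only shows the decision, not where it would live.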

The Electron render time limit is set to 60s in production. Based on experiments in labs, this likely limits the returned PDF sizes to significantly less than 100 MB in practice.

The REST API endpoint is tentatively scheduled for deployment tomorrow.

Could someone be so kind as to document on mediawiki.org how to configure this? Many people there are interested in running Electron on their own, but it's totally confusing. People think they just have to install the ElectronPdfService extension, not realising that they also need their own RESTBase and Electron service, and there are no configuration steps specific to this service linked from the extension page.

I added some hints, and linked to the upstream service repository. Functionally, the electron render service is all that is needed to render arbitrary web pages to PDFs. The extension, RESTBase, and Varnish caching are all just nice-to-haves.