Page MenuHomePhabricator

Allow rendering PDFs for mobile
Closed, ResolvedPublic5 Estimated Story Points

Description

NOTE: this will not be deployed with the first version of the new PDF renderer

Currently the chromium-render-service renders PDFs in A4 and Letter sizes without mobile print styles. In order to test the service against mobile styles we need to make the following to changes:

  1. Allow the service to render PDFs in Legal size (per @Nirzar at T181513#3793601);
  2. Add an option in the service to load mobile print styles before rendering.

Design requirement

two-styles.png (268×491 px, 19 KB)

We have two intents for producing PDFs and we have two platforms for doing so. That can be put into a 2x2 for number of styles we need.

out of the 4 required styles,

  • We can safely dismiss the use case to create hard copies from small screens
  • We can use the same styles for soft and hard copy on large screens, as the visual scale of large screen matches printing on paper. i.e. font size equivalent to 10pt, and 10-15 words per line
  • for soft copy on small screens, we need larger visual scale. i.e. font size equivalent of 16pt and around ~5 words per line. this will mimic the regular article view

Developer notes

The below are suggestions. Maybe you'll find a better way of accomplishing the task while working on it.

Acceptance criteria

  • Nirzar to explain the different types of PDF

@phuedx: Done in T181680#3890789.

  • Before coding anything having a meeting to discuss and agree what the change will look like. Meeting creator should be driving the change and have a strawman proposal of what to do. Should discuss how clients will determine what PDF mode to use.
  • Should be possible to generate 2 types of PDFs via 2 different URLs
  • Ensure that the headless Chromium UA is sending a well-known, fixed header (see T181680#3946988)

Adding a new page size:

  1. Allow the PDF end point to accept legal here.

Applying mobile print styles

  1. puppeteer allows adding style tags to the page that's being rendered. Use it to add the mobile print styles to the the page object before rendering the PDF;
  2. The mobile print styles is part of the "skins.minerva.base.styles" module and can be seen here.
  3. Since we don't want to hard-code URLs, we'll have to create a config variable that points to the module URL. Something similar to this.

QA Steps

Please verify that both desktop and mobile rendered PDFs match our print styles.
To test that:

  • go to the article and print it to the PDF using system print (CTRL+P, or Print from browser menu), save the PDF
  • build an URL to retrieve the PDF from chromium-renderer [explained below] and save the PDF generated by our service
  • compare that PDF generated by chromium-renderer looks the same as the PDF generated by the browser using print method. PDF don't have to be identical (different browsers can handle styles differently) but the WMF look&feel has to be present. If unsure please refer to the PDF generated by browser print method.

Please verify that for both desktop and mobile articles. Pay attention to the images and graphics in the PDF generated by the chromium-renderer service, things like images, math/chemistry formulas has to be present.
Please verify that artices from non english wikipedia (preferably non-latin charsets) and RTL languages are generated correctly on mobile PDFs.

How to generate URL to the chromium service

URL schema is http://chromium-pdf.wmflabs.org/{wiki}/v1/pdf/{article}/{size}/{format}
Where:

  • {wiki} -> wiki domain, ex 'en.wikipedia.pl`
  • {article} -> Title of article we want to generate pdf for, ex: Poland
  • {size} -> PDF size, accepted sizes are letter, a4, legal
  • {format} -> optional paramter, PDF format, available formats are desktop and mobile. If {format}is not specified system will generate desktop version of PDF
Examples:
Poland article from english wikipedia, desktop version, a4 size

http://chromium-pdf.wmflabs.org/en.wikipedia.org/v1/pdf/Poland/a4
or
http://chromium-pdf.wmflabs.org/en.wikipedia.org/v1/pdf/Poland/a4/desktop

Both URLs are valid as {format} parameter is optional.

Planet article from english wikipedia, mobile version, legal size

http://chromium-pdf.wmflabs.org/en.wikipedia.org/v1/pdf/Planet/legal/mobile

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 405062 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/services/chromium-render@master] WIP: Allow injecting extra styles

https://gerrit.wikimedia.org/r/405062

There is a problem with existing Minerva printstyles. We use the restbase /page/html/{$TITLE} endpoint which outputs bit different HTML than the MobileFormatter. Because of that we cannot use the print styles we have in Minerva as they do not match.

There are two possibilities:

  • create new print styles that match the restbase output
  • for mobile pages use the MobileFormatter (MediaWIki API??) as a content provider

We should review and merge https://gerrit.wikimedia.org/r/405062 as this provides us a possibility to inject additional styles into the PDF. Then in a follow-up patch we should find a way how to or match styles to the content, or get a content that matches styles we have.

@pmiazga (+ Readers Web): I think it'd be prudent to evaluate whether injecting styles is a reasonable solution – here I'm thinking about what I said in T181680#3850935.

I left some comments on https://gerrit.wikimedia.org/r/#/c/405062/3
Given the trouble, I'm wondering if we should instead try to print the non-Parsoid HTML?
Am also seeing the following problem in the existing endpoint (squashed images):

Screen Shot 2018-01-30 at 10.31.36 AM.png (339×774 px, 166 KB)

As we discussed in standup:

We have to be careful about not accidentally recording a pageview when the headless Chromium UA navigates to the site that we're "printing". Since we're not entirely sure that we want those navigations recorded as pageviews, the most flexible change that we can make is to tell Chromium to send a custom User-Agent header and inform Analytics Engineering so that they can classify it as a bot or not.

A possible value for the header is Baha's Proton Renderer v${VERSION} ${CHROMIUM_UA}.

Change 405062 abandoned by Pmiazga:
Allow injecting extra styles

Reason:
Instead of injecting styles we will render different site

https://gerrit.wikimedia.org/r/405062

Change 408573 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/services/chromium-render@master] Allow rendering mobile PDFs by rendering the wiki page instead of Parsoid output

https://gerrit.wikimedia.org/r/408573

Ready for review, @Jdlrobson @mobrovac could you take a look? Please also check different patchsets (1, 2 and 4) are totally different approaches. I think that the last approach is the best (least code) but there might be some reasons we want to go with different version

I pushed the https://gerrit.wikimedia.org/r/#/c/408573/5 which brings back images on mobile PDF's (disable lazyloading by disabling Javascript in chromium)

@pmiazga encountered an issue here with regards to lazy loading images. Similarly to the issue we had with the download button they are not showing in the downloaded PDF.

We talked about several options for fixing this:

  1. Disable JavaScript in the headless Chromium
  2. Change format of the pdf service so that it can be passed the current page HTML for printing (but we discarded this idea as people would likely abuse this service)
  3. Add a HTTP header to request that disables lazy loading images in the MobileFormatter. (but this will require varnish special casing so that we don't accidentally cache this HTML for other people)
  4. Add a query string parameter that disables lazy loading images in the MobileFormatter (but this will require varnish special casing and could easily be abused)

it turns out #1 is possible and given that was the lowest risk we tried this out. We'll want to check this in QA that nothing unexpected is showing/not showing (for example do Math equations show).

I pushed the https://gerrit.wikimedia.org/r/#/c/408573/5 which brings back images on mobile PDF's (disable lazyloading by disabling Javascript in chromium)

Nice work, @pmiazga and @Jdlrobson!

pmiazga removed pmiazga as the assignee of this task.
Pchelolo edited projects, added Services (watching); removed Services.

Per gerrit comment @Pchelolo at Feb 7 10:36 PM

So how do we intend to expose this to the public in RESTBase? Since RESTBase is not exposed on mobile domains (technically it is exposed under https://en.m.wikipedia.org/api/rest_v1/ but due to vanish handling of mobile domains that is actually the same RESTBase as for en.wikipedia.org domain ), we would need to port that mobile domain handling code to RESTBase now? Or would we expose under some different path like /v1/page/mobile-pdf/{title}?

As I wrote in the comment (https://phabricator.wikimedia.org/T181680#3950130) I wasn't sure which approach is the best, and also I didn't know is it possible to host RESTBase on mobile domains.
I wanted to avoid is magical mobile_domain creation here, just depend on the caller to pass us correct domain (mobile or desktop).

Could you check the approach created in https://gerrit.wikimedia.org/r/#/c/408573/2 and https://gerrit.wikimedia.org/r/#/c/408573/1? Here, instead of depending on domain I expect an additional parameter type where the user can pass desktop or mobile (or anything else in the future),
and depends on type I build correct domain. Using that logic this service will be listening only on desktop domains.

is it possible to host RESTBase on mobile domains.

Since mobile domain handling is done in Varnish, RESTBase is not even aware of the existence of the mobile domains, so RB on mobile and desktop domain is abold textbsolutely the same.

Here, instead of depending on the domain I expect an additional parameter type where the user can pass desktop or mobile (or anything else in the future), and depends on type I build correct domain. Using that logic this service will be listening only on desktop domains.

I think the question here is how do we want to expose the endpoint to the public. we have several options here:

  • We can expose a query/path/whatever parameter out the same way as your approach with type=mobile|desktop and rely on the client-side code to set it appropriately and transform the domain for the request in RESTBase
  • We can expose a query/path/whatever parameter out the same way as your approach with type=mobile|desktop and rely on the client-side code to set it appropriately and pass the parameter to the service and transform the domain in the service
  • We can make some Varnish magic to add a specific header when the request domain was for mobile and handle the header in RESTBase and transform the domain
  • We can make some Varnish magic to add a specific header when the request domain was for mobile and pass the header to the service.

The latter 2 are no-go cause that's too complicated and there's no need for that for any endpoint other than PDF. Out of the first 2 I'd prefer the service to make the domain transformation cause if you ever make a choice to switch to Parsoid HTML or whatever you'd have the sole responsibility to make that change without coordination with RESTBase code.

To sum up I'd prefer your second approach when the service takes in the type parameter.

Ok, I'll handle the type parameter in the service and for type=mobile I'll modify domain to add .m part to it.

Change 408573 merged by Jdlrobson:
[mediawiki/services/chromium-render@master] Allow rendering mobile PDFs by using wiki page instead of Parsoid output

https://gerrit.wikimedia.org/r/408573

Note my follow up suggestion, but let's get this in front of Anthony for testing. @pmiazga can you provide some testing instructions and update http://chromium-pdf.wmflabs.org/ ?

Chrome
DownloadBrowser - LetterBrowser-LegalBrowser-A4Chromium Renderer - LetterChromium Renderer - LegalChromium Renderer - A4
Screen Shot 2018-02-15 at 1.52.56 PM.png (1×2 px, 1 MB)
Screen Shot 2018-02-15 at 1.52.59 PM.png (1×2 px, 1 MB)
Screen Shot 2018-02-15 at 1.53.01 PM.png (1×2 px, 1 MB)
Screen Shot 2018-02-15 at 1.53.03 PM.png (1×2 px, 993 KB)
Screen Shot 2018-02-15 at 1.53.05 PM.png (1×2 px, 1015 KB)
Screen Shot 2018-02-15 at 1.53.09 PM.png (1×2 px, 936 KB)
Screen Shot 2018-02-15 at 1.53.11 PM.png (1×2 px, 930 KB)

Noticed some difference in the infobox colors and spacing near the top.

Working on mobile now.

@ABorbaWMF - did you get a change to check these out a mobile device? @pmiazga - what should we expect in terms of styles? I checked the planet article on my phone and it seems to be giving me desktop styles (font is too small for mobile)

@ovasileva this task adds "print the mobile page" feature to the Chromium-Render service. I did not implement print styles, chromium-render sends requests to the production wikipedias and uses styles available there. If something is incorrect we should fix the print styles.

@ABorbaWMF - we do not print table background colors (infoxes are rendered as tables), it's known limitation - https://www.mediawiki.org/wiki/Reading/Web/Projects/Print_Styles, see the Limitations section.

@ovasileva I am still working thru the mobile stuff. I do see some differences between chromium renderer and download. Here are some shots of the Planet article on Pixel. Ignore the image quality. It's just the test tool.

Download LetterDownload LegalDownload A4
Download - Letter.jpg (559×314 px, 33 KB)
Download - Legal.jpg (519×292 px, 27 KB)
Download - A4.jpg (519×292 px, 29 KB)
Chromium LetterChromium LegalChromium A4
Chromium - Letter.jpg (519×292 px, 17 KB)
Chromium - Legal.jpg (519×292 px, 36 KB)
Chromium - A4.jpg (519×292 px, 17 KB)

@pmiazga - I'm not sure I'm following 100% We have two sets of print styles - mobile and desktop. Where do our current mobile print styles live?

There is a fix I have to apply, the Mobile styles are not applied on chromium (they're present but not applied) due to device width size.

Change 413238 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/services/chromium-render@master] Define window height&width when opening new page

https://gerrit.wikimedia.org/r/413238

Change 413238 merged by jenkins-bot:
[mediawiki/services/chromium-render@master] Define window height&width when opening new page

https://gerrit.wikimedia.org/r/413238

I updated the http://chromium-pdf.wmflabs.org server, ready to test. @ABorbaWMF could re-check the mobile PDFs?

Looks almost identical to me:

Download LetterDownload LegalDownload A4
Screen Shot 2018-02-26 at 12.33.29 PM.png (1×2 px, 1 MB)
Screen Shot 2018-02-26 at 12.34.03 PM.png (1×2 px, 1 MB)
Screen Shot 2018-02-26 at 12.34.29 PM.png (1×2 px, 1 MB)
Chromium LetterChromium LegalChromium A4
Screen Shot 2018-02-26 at 12.35.20 PM.png (1×2 px, 984 KB)
Screen Shot 2018-02-26 at 12.35.41 PM.png (1×2 px, 986 KB)
Screen Shot 2018-02-26 at 12.36.00 PM.png (1×2 px, 1020 KB)