Page MenuHomePhabricator

Deploy and test new book rendering (Remex + Electron)
Closed, InvalidPublic

Description

Deploy and test the new rendering code added in T171838: Build out article concatenation according to requirements for books.

  • configure $wgVirtualRestConfig['electron'] (https://gerrit.wikimedia.org/r/#/c/377928/)
    • set the secret key in the private repo
  • test functionality on beta
  • find some way to prevent clever users who discover the non-public test URL from using it (https://gerrit.wikimedia.org/r/#/c/377929/)
  • test with some large books (e.g. from T142226#2537844)
    • check MediaWiki memory use
    • check Electron memory use (run top on scb1001-1004)
    • check service latency
  • make sure the test did not cause problems for Electron / restart if needed

Rendering workflow is as follows:

  • read book definition from session
  • fetch HTML for pages from RESTBase via MultiHttpClient; fetch metadata (sections, contributors etc) from MediaWiki API via internal pseduo-request
  • compose all those into a single HTML page with Remex
  • store book data in WAN cache with a short TTL
  • fetch PDF from Electron
    • Electron will send a request to MediaWiki, MW will load the book from cache and render it.
  • output the PDF response with the appropriate headers (the current code does not try to stream the response; that should probably be added)

Restarting Electron:

for i in 1 2 3 4; do echo $i; ssh scb100$i.eqiad.wmnet "sudo service pdfrender restart"; done

There is an Icinga alert when it goes down. (scb1001, scb1002, scb1003, scb1004)

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Tgr updated the task description. (Show Details)
ovasileva triaged this task as High priority.EditedSep 14 2017, 1:07 PM

Pulling this into the web board for visibility. Not sure what the best column would be, so throwing it into blocked.

Mentioned in SAL (#wikimedia-operations) [2017-09-14T18:57:04Z] <tgr@tin> Synchronized private/PrivateSettings.php: set Electron secret for T175868 (duration: 00m 49s)

Mentioned in SAL (#wikimedia-operations) [2017-09-14T18:58:17Z] <tgr@tin> Synchronized wmf-config/PrivateSettings.php: set Electron secret for T175868 (duration: 00m 48s)

Mentioned in SAL (#wikimedia-releng) [2017-09-14T19:37:08Z] <tgr> updated PrivateSettings.php for T175868

Mentioned in SAL (#wikimedia-cloud) [2017-09-14T19:37:08Z] <tgr> updated PrivateSettings.php for T175868

Change 377928 merged by jenkins-bot:
[operations/mediawiki-config@master] Add VirtualRestService config for Electron

https://gerrit.wikimedia.org/r/377928

Mentioned in SAL (#wikimedia-operations) [2017-09-14T23:37:45Z] <catrope@tin> Synchronized wmf-config/: VRS config for Electron (T175868) (duration: 00m 47s)

Now live on beta, and it's not working. The PDF body is sent correctly but the connection errors out at the end (curl says transfer closed with outstanding read data remaining) so the browser refuses to save it. Probably a wrong Content-Length header?

It gets chewed up by Varnish:

$ curl -vso/dev/null https://en.wikipedia.beta.wmflabs.org/wiki/Special:RenderBook/electron/global:7d07971a22569eb4dc8d3c4a67abd8fb
*   Trying 208.80.155.135...
* Connected to en.wikipedia.beta.wmflabs.org (208.80.155.135) port 443 (#0)
* found 173 certificates in /etc/ssl/certs/ca-certificates.crt
* found 697 certificates in /etc/ssl/certs
* ALPN, offering http/1.1
* SSL connection using TLS1.2 / ECDHE_RSA_AES_256_GCM_SHA384
* 	 server certificate verification OK
* 	 server certificate status verification SKIPPED
* 	 common name: aa.m.wikipedia.beta.wmflabs.org (matched)
* 	 server certificate expiration date OK
* 	 server certificate activation date OK
* 	 certificate public key: RSA
* 	 certificate version: #3
* 	 subject: CN=aa.m.wikipedia.beta.wmflabs.org
* 	 start date: Sun, 10 Sep 2017 08:22:00 GMT
* 	 expire date: Sat, 09 Dec 2017 08:22:00 GMT
* 	 issuer: C=US,O=Let's Encrypt,CN=Let's Encrypt Authority X3
* 	 compression: NULL
* ALPN, server accepted to use http/1.1
> GET /wiki/Special:RenderBook/electron/global:7d07971a22569eb4dc8d3c4a67abd8fb HTTP/1.1
> Host: en.wikipedia.beta.wmflabs.org
> User-Agent: curl/7.47.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Date: Fri, 15 Sep 2017 01:36:55 GMT
< Content-Type: application/pdf
< Transfer-Encoding: chunked
< Connection: keep-alive
< Server: deployment-mediawiki04.deployment-prep.eqiad.wmflabs
< X-Powered-By: HHVM/3.18.2
< X-Content-Type-Options: nosniff
< P3P: CP="This is not a P3P policy! See https://en.wikipedia.beta.wmflabs.org/wiki/Special:CentralAutoLogin/P3P for more info."
< Content-Disposition: attachment; filename='book.pdf'
< Vary: Accept-Encoding
< Backend-Timing: D=826112 t=1505439414661737
< X-Varnish: 169808214, 58318700
< Via: 1.1 varnish-v4, 1.1 varnish-v4
< Accept-Ranges: bytes
< Age: 0
< X-Cache: deployment-cache-text04 pass, deployment-cache-text04 pass
< X-Cache-Status: pass
< Set-Cookie: WMF-Last-Access=15-Sep-2017;Path=/;HttpOnly;secure;Expires=Tue, 17 Oct 2017 00:00:00 GMT
< Set-Cookie: WMF-Last-Access-Global=15-Sep-2017;Path=/;Domain=.wikipedia.beta.wmflabs.org;HttpOnly;secure;Expires=Tue, 17 Oct 2017 00:00:00 GMT
< X-Analytics: https=1;nocookies=1
< X-Client-IP: 184.250.32.42
< Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
< Set-Cookie: GeoIP=US:::38.00:-97.00:v4; Path=/; secure; Domain=.beta.wmflabs.org
< 
{ [145 bytes data]
* transfer closed with outstanding read data remaining
* Closing connection 0


$ curl -H 'Host: en.wikipedia.beta.wmflabs.orgo/dev/null http://deployment-mediawiki05.deployment-prep.eqiad.wmflabs/wiki/Special:RenderBook/electron/global:7d07971a22569eb4dc8d3c4a67abd8fb
* Hostname was NOT found in DNS cache
*   Trying 10.68.22.21...
* Connected to deployment-mediawiki05.deployment-prep.eqiad.wmflabs (10.68.22.21) port 80 (#0)
> GET /wiki/Special:RenderBook/electron/global:7d07971a22569eb4dc8d3c4a67abd8fb HTTP/1.1
> User-Agent: curl/7.38.0
> Accept: */*
> Host: en.wikipedia.beta.wmflabs.org
> 
< HTTP/1.1 200 OK
< Date: Fri, 15 Sep 2017 01:38:12 GMT
* Server deployment-mediawiki05.deployment-prep.eqiad.wmflabs is not blacklisted
< Server: deployment-mediawiki05.deployment-prep.eqiad.wmflabs
< X-Powered-By: HHVM/3.18.2
< X-Content-Type-Options: nosniff
< Cache-control: no-cache
< P3P: CP="This is not a P3P policy! See https://en.wikipedia.beta.wmflabs.org/wiki/Special:CentralAutoLogin/P3P for more info."
< Vary: Accept-Encoding
< Content-Disposition: attachment; filename='book.pdf'
< Content-Length: 51490
< Backend-Timing: D=787390 t=1505439492881585
< Content-Type: application/pdf
< 
{ [data not shown]
* Connection #0 to host deployment-mediawiki05.deployment-prep.eqiad.wmflabs left intact

Cannot reproduce locally, even with the varnish role enabled.

Change 378375 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/Collection@master] Remove Content-Length from PDF response

https://gerrit.wikimedia.org/r/378375

Change 378375 merged by jenkins-bot:
[mediawiki/extensions/Collection@master] Remove Content-Length from PDF response

https://gerrit.wikimedia.org/r/378375

Change 378825 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/Collection@wmf/1.30.0-wmf.18] Remove Content-Length from PDF response

https://gerrit.wikimedia.org/r/378825

Works fine on beta after the fix.

Change 378825 abandoned by Gergő Tisza:
Remove Content-Length from PDF response

Reason:
already going out with the train

https://gerrit.wikimedia.org/r/378825

This task is ready, by ready I mean:

  • the HTML concatenation is ready and in T175856 we're using the code provided by Gergo
  • ElectronPDF is deployed and we can test it on beta cluster

Only missing part is to block non-wmf users from accessing the test page because the ElectronPDF testing page might break the electron rendering service. The link is not publicly available, a user would have to build the link by himself (check the code how to build it).

ElectronPDF is a technology we want to abandon as it's causing us lots of pain (see T176627). We shouldn't reject this task as it brought an HTML concatenation support in PHP. I'm calling to resolve this task and remove it from kanban board as we do not work on it anymore.

@ovasileva @phuedx - thoughts?

@pmiazga: SYN ACK

AFAICT the backing code has ridden the train and @Tgr's config changes have also been deployed, which means that this is both available to test on the beta cluster and in production (as you note, only rOMWC235449e34144: Temporarily prevent users from accessing Special:RenderBook/test hasn't been deployed).

I think it's important to understand as much as possible about the Electron renderer service before we go about replacing it. With that in mind, I'd suggest that we still do the testing in this task /cc @ovasileva.

I agree on proceeding with the testing. Would we consider it a part of this task, or should we set a testing task up separately?

@phuedx: ACK

(sorry, I had to. You cannot toss a TCP request and get nothing)

@ovasileva lets test that in the current task. I would wait for T175856 to get merged (it's almost there) as it adds images and license info to the book (sends more requests do MW API and outputs a bit more HTML).
Then I'll test it myself.

Moving to Needs QA and assigning it to myself.

Reassigning it to @Jdlrobson. Before testing ElectronPDF rendering on beta cluster we need to merge all related patches, which didn't happen this week. There are problems with contributors and images as there is a hard limit on API response, first we need to tackle that, then once everything gets merged we can test it on Beta cluster.

When visiting https://en.wikipedia.beta.wmflabs.org/wiki/Special:RenderBook/test for a book that only contains 3 articles, it takes around 10 seconds to generate the page (clearing cache gives an idea of the time)

Screen Shot 2017-10-09 at 4.03.30 PM.png (233×598 px, 37 KB)

Questions

Apologies if these have been answered elsewhere)

  • Have we benchmarked performance with respect to total size of articles/number of articles?
  • Do we need an upper limit on the size of a book?
  • When will(?) PHP concatenatation time out? If so, under what circumstances?

I don't think tests on beta (a Labs cluster, basically) say much. You can test on production as long as you don't test the PDF rendering part (which might bork Electron requiring manual intervention).

The timeout for PHP web requests is 30 seconds IIRC.

And some way to prevent clever users who discover the non-public test URL from using it

Shouldn't we have done this before putting on production? A user permission seems sensible but I'm not sure how wgPermissions would/could be overriden by the REST service. We should work out what to do for that soon.

I don't think tests on beta (a Labs cluster, basically) say much

Sure, but have did we get an idea before putting this on production?
https://en.wikipedia.org/wiki/Book:Solar_Eclipses for instance takes a minute to load in https://en.wikipedia.org/wiki/Special:RenderBook/test and then when I try to print to PDF I get book not in cache which suggests its timing out (see T177993)

I've captured the issues I've found via testing as a subtask of T178095

check MediaWiki memory use
check Electron memory use (run top on scb1001-1004)
check service latency

@Tgr this hasn't happened right? Were you planning to do this or were you expecting us to do this. If the latter I'd like to capture this in a new subtask.

find some way to prevent clever users who discover the non-public test URL from using it (https://gerrit.wikimedia.org/r/#/c/377929/)

Seems important, especially given all the errors in logstash. We should probably disable this in production until we have time to do that, right? I'd prefer to have a feature flag in the collections extension that registers the special page as https://gerrit.wikimedia.org/r/#/c/377929/ feels a little hacky. What do you think?

@Tgr this hasn't happened right? Were you planning to do this or were you expecting us to do this. If the latter I'd like to capture this in a new subtask.

I was planning to but then the Electron deprecation happened so I abandoned this. Not sure if there is a point in doing the tests now (the MediaWiki part still makes sense, but you could just wait until the new PDF service is up and save some time testing them together).

find some way to prevent clever users who discover the non-public test URL from using it (https://gerrit.wikimedia.org/r/#/c/377929/)

Seems important, especially given all the errors in logstash.

Errors on the MediaWiki side are harmless. The problematic part is Electron which can get stuck in a failure state and need manual recovery, and testing large PDFs might or might not make that more likely to happen. (Although right now you can't trigger Electron for large pages due to the memcached issue, so this is only a problem after that gets fixed.)
I think some sort of access limit should be there but no one else seemed to care so the patch got stuck in code review and I moved on to other things.

We should probably disable this in production until we have time to do that, right? I'd prefer to have a feature flag in the collections extension that registers the special page as https://gerrit.wikimedia.org/r/#/c/377929/ feels a little hacky. What do you think?

Or just get rid of the special page. It was just meant for testing.
That said, I still think that patch is the easiest way to disable it for normal users while not blocking testing. MediaWiki doesn't have much in the way of feature flag management so you'd have to deploy a config or code change every time you want to start/end testing.

Or just get rid of the special page. It was just meant for testing.

What about making use of wgPermissions and limiting it to logged in users? Or do you think that's overkill?
https://gerrit.wikimedia.org/r/#/c/377929/4/wmf-config/CommonSettings.php doesn't help thinks if a username is registered with the WMF prefix :)

Looks like the groups 'unittesters', 'testwriters' exist. We could maybe reuse that?

What about making use of wgPermissions and limiting it to logged in users? Or do you think that's overkill?

It's underkill, if someone accidentally trips this of, it will be a power user who has seen the special page in the code / in some phab/irc/wiki discussion and is curious about it.

https://gerrit.wikimedia.org/r/#/c/377929/4/wmf-config/CommonSettings.php doesn't help thinks if a username is registered with the WMF prefix :)

You need to be an admin at least to register usernames with WMF in them.

Looks like the groups 'unittesters', 'testwriters' exist. We could maybe reuse that?

Where do you see those? I can't find anything similar.

I see those groups in core. Maybe they are only enabled for developers?

The username thing sounds like it will work (and right now is the only option we've got) but also limits testing to staff. Is it possible to create a new group with limited access? If not I guess we'll need to go with that approach.

We could get rid of the special page but we've not finished testing it so I'd like to leave that open a little bit longer.

I see those groups in core. Maybe they are only enabled for developers?

As far as I can see they are only used as mock permissions in user group related unit tests.

The username thing sounds like it will work (and right now is the only option we've got) but also limits testing to staff. Is it possible to create a new group with limited access?

It's certainly possible, just more work than worth IMO. You can add something like [ 'pdftester' => [ 'testpdf' => true ] ] to $wgGroupPermissions, check for $user->isAllowed('testpdf') in the code, add testpdf to $wgAvailableRights, create the group-pdftester, group-pdftester-member and grouppage-pdftester messages (probably in the WikimediaMessages extension), create the group description page, and have someone with steward or staff rights add the right people to that group (also on beta if you want to test there). Might be slightly different if you want to use a global (cross-wiki) group, I haven't done that yet.

Jdlrobson changed the task status from Open to Stalled.Oct 30 2017, 11:26 PM

Until we have time to talk through T178095

@Jdlrobson Hey Jon, is this for Readers Web to work on, or Reading Infrastructure? We're unclear if this is still valid. Thanks!

@LGoto it's unclear right now while this is stalled. It depends when we want to do it. See https://phabricator.wikimedia.org/T175868#3721912

Closing as per T184772#4116906. Pediapress will be taking on rendering of PDF books.

Change 377929 abandoned by Gergő Tisza:
Temporarily prevent users from accessing Special:RenderBook/test

https://gerrit.wikimedia.org/r/377929