
[spike] Temporarily allow pushing large objects
Closed, Resolved · Public

Description

I've created the deploy commit for https://gerrit.wikimedia.org/r/#/admin/projects/mediawiki/services/chromium-render/deploy, but it's too big to be pushed to gerrit. This is the error I'm getting:

error: Object too large (186,009,832 bytes), rejecting the pack. Max object size limit is 104,857,600 bytes.
error: remote unpack failed: error Object too large (186,009,832 bytes), rejecting the pack. Max object size limit is 104,857,600 bytes.

I've also split the commit into pre- and post-puppeteer changes, but still no luck because the service template node itself is too big. Changing the project settings didn't seem to help, as they cannot override the global limit when the value is bigger than the global limit. Can we increase the global limit for pushing the initial commit?
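For context on why the per-project override failed: Gerrit has a global `receive.maxObjectSizeLimit` in `gerrit.config`, and a per-project value in `project.config` (on `refs/meta/config`) that is only honored if it does not exceed the global cap. A sketch with illustrative values:

```ini
# gerrit.config (global): the hard cap that produces the error above
[receive]
  maxObjectSizeLimit = 100m

# project.config on refs/meta/config (per project): honored only when it
# is at or below the global value, which is why a 200m override here has
# no effect on its own
[receive]
  maxObjectSizeLimit = 200m
```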

These are the files that go over the 100m limit:

100755 blob 89138522074bf80e7b3685565cdb312daf18883d 186009832  node_modules/puppeteer/.local-chromium/linux-499413/chrome-linux/chrome
100644 blob 3048b30f0d38be95c6877236ae1472ca1158260f 13766582   node_modules/puppeteer/.local-chromium/linux-499413/chrome-linux/resources.pak
100644 blob 58300d81c556dd9088653bb656e379609015de8d 13670308   node_modules/clarinet/test/twitter.js
100644 blob 55da7b29a3bb42a9cbd7e4cf1e09636c7e5b377b 13670294   node_modules/clarinet/samples/twitter.json
100644 blob 6685d3b2e84fb5ab57e4e35a1daf69d51a033346 11542238   node_modules/clarinet/samples/npm.json
100644 blob a9c427fbf11d1bcc1d5184305fbdb22150d38501 10196592   node_modules/puppeteer/.local-chromium/linux-499413/chrome-linux/icudtl.dat
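For anyone reproducing this, a listing like the one above can be produced with git plumbing commands; a small sketch (run from the repository root):

```shell
# List the repository's largest blobs, biggest first: walk every object
# reachable from any ref, keep only blobs, and sort by size.
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
    awk '$1 == "blob" {print $3, $4}' |
    sort -rn |
    head -n "${1:-10}"
}

# usage: largest_blobs 6   # the six entries shown in the description
```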

Details

Related Gerrit Patches:
mediawiki/services/chromium-render/deploy : refs/meta/config : Increase object size to 200m
operations/puppet : production : Gerrit: Increase receive.maxObjectSizeLimit to 200m temporarily

Related Objects

Event Timeline

Restricted Application added a subscriber: Aklapper. · Oct 13 2017, 6:05 PM

Change 384083 had a related patch set uploaded (by Paladox; owner: Paladox):
[mediawiki/services/chromium-render/deploy@refs/meta/config] Increase object size to 190m

https://gerrit.wikimedia.org/r/384083

Change 384085 had a related patch set (by Paladox) published:
[operations/puppet@production] Gerrit: Increase receive.maxObjectSizeLimit to 200m temporarily

https://gerrit.wikimedia.org/r/384085

bmansurov updated the task description. · Oct 13 2017, 6:48 PM

Pulled the task into the sprint for visibility.

Jdlrobson added a subscriber: Jdlrobson.

Joining late to the party, but if we solve this temporarily won't this still be a problem for future deploys?

Joining late to the party, but if we solve this temporarily won't this still be a problem for future deploys?

Hmm, maybe. I will leave this up to @demon to decide whether to keep the limit. We should probably set it to something high in gerrit.config but lower it per project.

Maybe we only need to lift the limit for the initial push; subsequent pushes won't need it because all the big files will already have been accepted into the remote repo. It wasn't clear from the documentation.

bmansurov updated the task description. · Oct 13 2017, 10:11 PM

Just to clarify, are you trying to commit a binary of chromium into git? I think we normally use git-fat for large files? (there's other tasks about this somewhere...)

Temporary solutions have a terrible habit of becoming permanent around here.

The problem is that we cannot override this on a local repo without raising the global limits (all you can do is customize it lower). I'm very very very scared of raising this globally. 100MB objects are f'ing huge and perform very poorly. They also don't diff well, so it will burden this repo forever.

Let's do something with git-fat. As long as it's accessible over rsync, we should be able to fetch it from basically anywhere (cf: T171758#3475334). If git-fat isn't viable, there's got to be a better solution here than stuffing 100mb binaries into git. As the linked task states, I want to get git-lfs support going as well through gerrit, but I get the impression that this is slightly more urgent than waiting on that...
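For concreteness, a git-fat setup is two small files in the repository; the rsync host and path below are placeholders, not real infrastructure:

```
# .gitfat (repository root): where the real file contents are stored
[rsync]
remote = deploy-host.example.org::git-fat

# .gitattributes: route the large binaries through the git-fat filter
node_modules/puppeteer/.local-chromium/** filter=fat -text
```

After `git fat init`, `git add` stores only small placeholder stubs in git, and `git fat push` / `git fat pull` move the real files over rsync.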

Just to clarify, are you trying to commit a binary of chromium into git? I think we normally use git-fat for large files? (there's other tasks about this somewhere...)

Yes, that's right. I want to commit all the dependencies of the chromium-render service to its corresponding deploy repo.

bmansurov added a comment. (Edited) · Oct 14 2017, 2:41 AM

@demon, ok, let's try git-fat first.

Change 384083 abandoned by Paladox:
Increase object size to 200m

https://gerrit.wikimedia.org/r/384083

Change 384085 abandoned by Paladox:
Gerrit: Increase receive.maxObjectSizeLimit to 200m temporarily

https://gerrit.wikimedia.org/r/384085

ovasileva triaged this task as High priority. · Oct 16 2017, 10:20 AM
ovasileva moved this task from Incoming to 2017-18 Q2 on the Readers-Web-Backlog board.

Just to clarify, are you trying to commit a binary of chromium into git? I think we normally use git-fat for large files? (there's other tasks about this somewhere...)

Yes, that's right. I want to commit all the dependencies of the chromium-render service to its corresponding deploy repo.

Is there a reason we're not using the Debian package of Chromium? It seems like it would be a huge burden to constantly track and follow chromium's security issues...

Is there a reason we're not using the Debian package of Chromium? It seems like it would be a huge burden to constantly track and follow chromium's security issues...

One reason may be that the version of puppeteer we'll be using may need a specific version of Chromium. I'm not sure if a mismatch will cause any issues. If security is a concern we can skip downloading chromium.
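For the record, skipping the download is an install-time switch; AIUI puppeteer honors an environment variable for it, after which the service would have to point puppeteer at a system Chromium via the `executablePath` launch option:

```sh
# Skip fetching the ~180 MB bundled Chromium when installing puppeteer.
PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true npm install puppeteer
```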

phuedx added a subscriber: phuedx. (Edited) · Oct 17 2017, 4:36 PM
100644 blob 58300d81c556dd9088653bb656e379609015de8d 13670308   node_modules/clarinet/test/twitter.js
100644 blob 55da7b29a3bb42a9cbd7e4cf1e09636c7e5b377b 13670294   node_modules/clarinet/samples/twitter.json
100644 blob 6685d3b2e84fb5ab57e4e35a1daf69d51a033346 11542238   node_modules/clarinet/samples/npm.json

These look like test fixtures/example data for the clarinet NPM module. AFAICT they could safely be added to .gitignore.
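If that route is taken, the ignore rules are small (paths taken from the listing in the description):

```
# .gitignore additions: clarinet test fixtures and sample data,
# not needed at runtime
node_modules/clarinet/test/
node_modules/clarinet/samples/
```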

Is there a reason we're not using the Debian package of Chromium? It seems like it would be a huge burden to constantly track and follow chromium's security issues...

One reason may be that the version of puppeteer we'll be using may need a specific version of Chromium. I'm not sure if a mismatch will cause any issues. If security is a concern we can skip downloading chromium.

Is there a plan for constantly keeping it up to date? I think this is something that should be discussed with Ops.

bmansurov added a subscriber: Joe. · Oct 17 2017, 8:12 PM

Is there a plan for constantly keeping it up to date? I think this is something that should be discussed with Ops.

Given how puppeteer is constantly changing and hasn't reached version 1 yet, I hope we constantly update it.

@Joe what do you think about this issue? Should we go with using a Debian package for chromium or let puppeteer use its own version?

Joe added a comment. · Oct 18 2017, 6:20 AM

Hi! I'm not sure I understand the details or the requirements. In fact, the last time I looked at your project you were planning to work with Python, while I see puppeteer is JavaScript, so I assume any information I have about the project is now outdated.

AIUI, you want to swap electron out of pdfrenderer in favour of headless chromium. Given the high-risk nature of what we're doing here (parsing almost-arbitrary input in a browser environment) I think security should be a focal point, and in fact was one of our big grievances with electron which wasn't updated often.

In the interest of security, it would be preferable to do as follows:

  • Use the chromium package from Debian. The one from jessie is already reasonably modern and should receive security updates for the whole lifecycle of the distribution.
  • The developers should keep puppeteer up to date - it's a library and it is part of your deployment.
  • You should probably use Docker to run your program in a container interacting with the right version of chromium.

also, you stated earlier that

One reason may be that the version of puppeteer we'll be using may need a specific version of Chromium

I doubt that's the case (they probably only test against a specific version, though), and if you have decent test coverage, that should guarantee compatibility.

Thanks for the reply, @Joe. To briefly update you, in T175853#3610941 we found out that interacting with headless Chromium in Python wasn't the best idea. So we decided to use the official Javascript package for that.

bmansurov closed this task as Declined. · Oct 18 2017, 1:40 PM

I think we have the information we need in order to proceed. I've updated the description of T178166: Create and initialise the main and deploy repositories. Since we won't be increasing the gerrit limit, I'll decline this task. Thanks everyone for your comments.

bmansurov added a comment. (Edited) · Oct 18 2017, 8:08 PM

Puppeteer's documentation warns against using versions of Chromium that don't come bundled with puppeteer:

NOTE Puppeteer works best with the version of Chromium it is bundled with. There is no guarantee it will work with any other version. Use executablePath option with extreme caution. If Google Chrome (rather than Chromium) is preferred, a Chrome Canary or Dev Channel build is suggested.

https://github.com/GoogleChrome/puppeteer/blob/v0.11.0/docs/api.md#puppeteerlaunchoptions

I wonder whether this is a good reason not to use the Debian version of Chromium.

Also, the latest Debian Jessie has Chromium version 57.0.2987.98-1~deb8u1, and headless Chromium first appeared in version 59. Does that mean we should compile our own version of Chromium? Wouldn't that defeat the purpose of getting free security fixes from the Debian package maintainers?

Also, I created a proof of concept patch that uses the distribution's Chromium, except the patch doesn't work and puppeteer warns against using non-bundled Chromium: https://gerrit.wikimedia.org/r/385044.

Given the above, would it make sense to stick to the version of Chromium provided by puppeteer?

bmansurov reopened this task as Open. · Oct 18 2017, 8:27 PM

Reopening given T178189#3695008. What do you guys think?

I would still advise distributing such a large binary (and the corresponding libraries) as a deb package, or as an archive of some kind. We do have an artifacts repository in Archiva, but I fear that only works for jars.

I can look into this a bit and get back to you. @MoritzMuehlenhoff might want to chime in too.

Joe added a comment. · Oct 18 2017, 8:56 PM

Puppeteer's documentation warns against using versions of Chromium that don't come bundled with puppeteer:

NOTE Puppeteer works best with the version of Chromium it is bundled with. There is no guarantee it will work with any other version. Use executablePath option with extreme caution. If Google Chrome (rather than Chromium) is preferred, a Chrome Canary or Dev Channel build is suggested.

https://github.com/GoogleChrome/puppeteer/blob/v0.11.0/docs/api.md#puppeteerlaunchoptions
I wonder whether this is a good reason not to use the Debian version of Chromium.
Also, the latest Debian Jessie has Chromium version 57.0.2987.98-1~deb8u1, and headless Chromium first appeared in version 59. Does that mean we should compile our own version of Chromium? Wouldn't that defeat the purpose of getting free security fixes from the Debian package maintainers?
Also, I created a proof of concept patch that uses the distribution's Chromium, except the patch doesn't work and puppeteer warns against using non-bundled Chromium: https://gerrit.wikimedia.org/r/385044.
Given the above, would it make sense to stick to the version of Chromium provided by puppeteer?

Ouch, this is pretty annoying for deployments!

It surely makes sense to stick to the version of chromium they support, given what you found out; we should look into how to reliably and securely distribute it on our cluster though.

Joe added a comment. · Oct 19 2017, 6:36 AM

I took a look at how puppeteer downloads chromium and it's underwhelming, to be honest: they plainly download a zip file and unpack it. As far as I can see, no verification of the downloaded package is done; what is worse, I cannot see any page on the chromium downloads website reporting checksums for the archives we'd need to download.

So it seems we really can't allow puppeteer to do whatever it wants; we need to add a chromium build somewhere and find a convenient means of distribution. Being able to just download the zip file and unpack it during scap deployments should probably be enough. I think we need some advice from Release Engineering on how to make this work.
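The missing verification step could be as small as recording a checksum when pinning a build and refusing to unpack on a mismatch; a sketch (the hash would be one we record ourselves, since upstream publishes none, and the fetch URL is a placeholder):

```shell
# Verify a fetched archive against a checksum recorded when the build was
# pinned; refuse to unpack on mismatch.
verify_archive() {
  file=$1
  expected=$2
  echo "$expected  $file" | sha256sum -c - >/dev/null 2>&1 || {
    echo "checksum mismatch for $file, refusing to unpack" >&2
    return 1
  }
}

# Usage during a scap fetch step (placeholders):
#   curl -fsSL -o chrome-linux.zip "$PINNED_CHROMIUM_URL"
#   verify_archive chrome-linux.zip "$PINNED_SHA256" && unzip -q chrome-linux.zip
```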

Should the releng tag be added to get their advice?

Joe added a comment. · Oct 19 2017, 6:43 AM

@Paladox, actually I'd open a new ticket describing the problem instead of requesting a configuration change to gerrit.

Ok. Thanks. I will leave that for @bmansurov to do :)

demon added a comment. · Oct 19 2017, 5:02 PM

I'm not sure why we need another task or to tag RelEng since I've been following along since the beginning.

Is there a reason gerrit + git-lfs is insufficient? We're already working on implementation as it is.

bmansurov added a comment. (Edited) · Oct 19 2017, 5:37 PM

@demon, the other task is about finding out whether we need to submit the Chromium binary into gerrit. If we do, then I guess we'll use the method you suggested.

demon added a comment. · Oct 19 2017, 5:53 PM

Well, I'm implementing it anyway; this isn't the only use case.

@demon, that's great. Any idea when it'll be ready?

demon added a comment. · Oct 19 2017, 6:06 PM

Hopefully next week; we've already got some patches in flight to start configuring it.

bmansurov renamed this task from Temporarily allow pushing large objects to [subtask] Temporarily allow pushing large objects. · Oct 19 2017, 7:00 PM

Change 385239 had a related patch set uploaded (by Paladox; owner: Paladox):
[All-Projects@refs/meta/config] Enable lfs for mediawiki/services/chromium-render/deploy

https://gerrit.wikimedia.org/r/385239

Joe added a comment. · Oct 20 2017, 5:52 AM

I'm not sure why we need another task or to tag RelEng since I've been following along since the beginning.

I think the description of this task does not reflect the actual problem we're trying to solve, and we also moved away from the original request. This task is clearly an XY problem.

Also, the latest Debian Jessie has Chromium version 57.0.2987.98-1~deb8u1, and headless Chromium first appeared in version 59. Does that mean we should compile our own version of Chromium? Wouldn't that defeat the purpose of getting free security fixes from the Debian package maintainers?

There are currently no Chromium packages for jessie. The maintainer has asked for a volunteer to build/test the stretch packages for jessie and, while there was a volunteer, this apparently hasn't happened so far:
https://lists.debian.org/debian-security/2017/08/msg00010.html

It's a fairly complex endeavor to follow Chromium for more than the usual two years of lifetime of a Debian stable release (before the next one is released), since they update build dependencies pretty quickly (e.g. on wheezy, Chromium needed to be end-of-lifed early as well, since they started to use C++ features which were not yet supported by the GCC C++ compiler in Debian wheezy).

If necessary we could chime in for the jessie builds, though. (But the proper fix would be to run this service on stretch).

Also, I created a proof of concept patch that uses the distribution's Chromium, except the patch doesn't work and puppeteer warns against using non-bundled Chromium: https://gerrit.wikimedia.org/r/385044.
Given the above, would it make sense to stick to the version of Chromium provided by puppeteer?

If we stick to the Chromium version bundled by puppeteer, we'll end up in the same lockstep problem we had with Electron; we need to rely on puppeteer to follow Chrome security releases every few weeks. If they don't, that's not the end of the world (since many of the typical browser threats are reduced in our use case and we'll hopefully also deploy this thing with firejail again), but it's still fairly ugly.

phuedx added a comment. · Nov 7 2017, 6:20 AM

But the proper fix would be to run this service on stretch <snip />

@MoritzMuehlenhoff: Apologies for the late follow up. Is this an option from an Ops perspective? @bmansurov has already identified that we can't use the version of Chromium packaged for Debian Jessie as it can't run in headless mode. However, the version for Debian Stretch (62.0.3202.75-1) is up to date AFAICT.

Our scb* cluster currently runs jessie. I don't know the time frame for the new setup, but running the electron replacement on stretch should be doable. Depending on the ETA and available resources in the Services team we can either migrate scb* in general to stretch or alternatively we could create a stretch-based scc* cluster based on Ganeti instances.

While migrating SCB nodes to stretch will need to happen, I don't think we will have the bandwidth to do so soon-ish. Since the headless Chrome/puppeteer approach is experimental at this point, how about setting it up temporarily in Ganeti for evaluation purposes and then migrating it to SCB once we move to stretch?

(Also, this is highly off-topic for this ticket, we should probably create a new one for this issue).

phuedx added a comment. · Nov 7 2017, 9:16 AM

(Also, this is highly off-topic for this ticket, we should probably create a new one for this issue).

Agreed. I think that this ticket can actually be closed in favor of tracking the effort to deploy git-lfs in T171758: Support git-lfs files in gerrit.

I'll summarize our options for deploying and using Chromium in a comment on T178570: How should we get Chromium for use in puppeteer? and we can continue the conversation there.

Our scb* cluster currently runs jessie. I don't know the time frame for the new setup

There isn't one, at least not yet. We haven't had the need to upgrade to stretch yet. Given that it is also going to be a slow and arduous process, since all services will need to be tested, we've been postponing it, hoping our kubernetes infrastructure will be ready before the need arises.

but running the electron replacement on stretch should be doable.

Yes

Depending on the ETA and available resources in the Services team we can either migrate scb* in general to stretch or alternatively we could create a stretch-based scc* cluster based on Ganeti instances.

Do we have a timeline for when we can (and want to) have this working in production? That would help us make a more informed decision. AFAICT this is still in early stages, right?

Do we have a timeline for when we can (and want to) have this working in production ? That would help us make a more informed decision. AFAICT this is still in early stages, right?

Right. Currently, Readers Web are working on adding rate limiting and timeout mechanisms to the new service prior to performance testing it (see T178501 and T178278, respectively). There's also a security review going on in parallel. My finger-in-the-air estimate says that we'll be done with the performance testing (including seeking review from Services, Ops, and Readers Infra) by the beginning of December (circa 3 weeks).

After that's done we'd like to deploy the new service alongside the old service so that we can see how it works in production. After we've evaluated the service, and we're all happy, then we'll switch out the services and decommission the old one.

Since deployment will be slowed around Thanksgiving and by the December deployment freeze, my guess is that we're actually looking at deploying in early January, which will also be affected by Christmas holidays and All Hands travel/prep.

One thing that we (Readers Web) need to do to help inform the decision would be to make sure that @bmansurov's POC change (to make the service run with the Debian packaged Chromium) works on Stretch.

Do we have a timeline for when we can (and want to) have this working in production ? That would help us make a more informed decision. AFAICT this is still in early stages, right?

Right. Currently, Readers Web are working on adding rate limiting and timeout mechanisms to the new service prior to performance testing it (see T178501 and T178278, respectively). There's also a security review going on in parallel. My finger-in-the-air estimate says that we'll be done with the performance testing (including seeking review from Services, Ops, and Readers Infra) by the beginning of December (circa 3 weeks).

OK

After that's done we'd like to deploy the new service alongside the old service so that we can see how it works in production. After we've evaluated the service, and we're all happy, then we'll switch out the services and decommission the old one.

Sounds reasonable. There is the open question of how exactly we are going to do the transition (all in one go? in % increments? something else entirely?) but that's for later, I guess.

Since deployment will be slowed around Thanksgiving and by the December deployment freeze, my guess is that we're actually looking at deploying in early January, which will also be affected by Christmas holidays and All Hands travel/prep.

Aha. I'd add there's a significant risk this could be pushed back to February. Unfortunately, for the same reasons (Thanksgiving/December freezes) that doesn't really buy us any significant time as far as scb* goes. During the Thanksgiving and December freezes it's improbable we would be able to make the transition to stretch, which leaves us with just 2 weeks in January. At the same time, I doubt the Services team has the testing of services on stretch high in its priorities (the biggest problem being services that ship binary node/python modules). That being said, if we end up targeting February we just might be able to use kubernetes (but don't count on it).

One thing that we (Readers Web) need to do to help inform the decision would be to make sure that @bmansurov's POC change (to make the service run with the Debian packaged Chromium) works on Stretch.

Ok, so that's a blocker. Assuming it works and the service is demoed working fine in labs, we could pave a way to production via a couple of stretch VMs to power the service at first. Later on we can reevaluate.

In T178189#3740029, @akosiaris wrote:

In T178189#3740805, @phuedx wrote:
One thing that we (Readers Web) need to do to help inform the decision would be to make sure that @bmansurov's POC change (to make the service run with the Debian packaged Chromium) works on Stretch.

Ok, so that's a blocker. Assuming it works and the service is demoed working fine in labs, we could pave a way to production via a couple of stretch VMs to power the service at first. Later on we can reevaluate.

In T180037: [Spike] Can the new render service run on Debian Stretch?, @bmansurov and @pmiazga (both Readers Web) confirmed that v0.11.0 of the puppeteer package can successfully drive a packaged version of the Chromium binary (62.0.3202.89-1~deb9u1) on Debian Stretch on labs.

If we're all agreed with this as a way forward, then this task and T178570: How should we get Chromium for use in puppeteer? can be resolved.

bmansurov removed bmansurov as the assignee of this task. · Nov 15 2017, 6:20 PM
phuedx renamed this task from [subtask] Temporarily allow pushing large objects to [spike] Temporarily allow pushing large objects. · Nov 15 2017, 6:22 PM
phuedx claimed this task.
phuedx added a project: Spike.

If we're all agreed with this as a way forward, then this task and T178570: How should we get Chromium for use in puppeteer? can be resolved.

Agreed on my part.

phuedx closed this task as Resolved. · Nov 21 2017, 8:49 AM

Being bold.

I'll be creating a higher-level "Deploy the service" task that summarises the outcome of this task and T178570: How should we get Chromium for use in puppeteer? later.