
Decide what to do with size of parsoid and its dependencies (langconv) vendor/tarballs
Closed, Resolved (Public)

Description

The Parsoid tests in vendor are large enough that we should consider whether and how we distribute them, especially for non-dev environments, including MediaWiki-Vendor and tarball releases.

We know disk space is an issue for some people, and keeping the tests will increase both the compressed size of the tarballs and the space they take on disk when expanded.

Event Timeline

LGoto triaged this task as Medium priority. May 21 2020, 6:10 PM
LGoto moved this task from Needs Triage to Needs Investigation on the Parsoid board.

Can we ignore them with .gitattributes like we do in other composer packages?
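
For context, the `export-ignore` attribute in `.gitattributes` tells `git archive` to leave matching paths out of generated archives. The following is a minimal Python sketch of that filtering idea, not Git's actual matching logic. The `.gitattributes` content and every file name in it are hypothetical, and the pattern handling is deliberately simplified (root-anchored directories and basename globs only).

```python
from fnmatch import fnmatch

# Hypothetical .gitattributes content; these paths are illustrative only.
GITATTRIBUTES = """\
# keep test and config fixtures out of release archives
/tests export-ignore
/baseconfig export-ignore
.phpcs.xml export-ignore
"""


def export_ignored_patterns(text):
    """Return the path patterns that carry the export-ignore attribute."""
    patterns = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue  # blank line, comment, or pattern without attributes
        if "export-ignore" in parts[1:]:
            patterns.append(parts[0])
    return patterns


def is_exported(path, patterns):
    """Simplified matching (an assumption, not Git's full rules):
    a pattern with a leading slash matches that root entry and everything
    beneath it; any other pattern is globbed against the file's basename."""
    for pat in patterns:
        if pat.startswith("/"):
            root = pat.strip("/")
            if path == root or path.startswith(root + "/"):
                return False
        elif fnmatch(path.rsplit("/", 1)[-1], pat):
            return False
    return True


# Hypothetical repository listing.
files = [
    "src/Example.php",
    "tests/ExampleTest.php",
    "baseconfig/example.json",
    ".phpcs.xml",
    "composer.json",
]
patterns = export_ignored_patterns(GITATTRIBUTES)
exported = [f for f in files if is_exported(f, patterns)]
print(exported)
```

In this toy listing only the source file and `composer.json` survive. The catch raised in the reply below is that anything excluded this way is also unavailable to consumers who need it from the installed package.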

The issue is that we eventually need to run parser tests on extensions using Parsoid, so things like the ParserTest runner (currently in tests) need to be distributed. Similarly, I'd like to reduce the duplication of parserTests between core and Parsoid, so eventually I'd like to see core using the copy of parserTests from parsoid directly.

Which is a long way of saying that *some* of the code in tests needs to be distributed. But there are certainly other bits of code in there which don't need to be distributed; it will take a bit of finesse to separate these out. (And maybe some reorg of directories.)

user@dev ~/g/m/c/vendor> du -hs *
4.0K	autoload.php
4.0K	bin
372K	composer
72K	cssjanus
820K	guzzlehttp
4.0K	jakub-onderka
216K	liuggio
1.3M	oojs
676K	pear
36K	pleonasm
208K	psr
28K	ralouphie
68K	symfony
340M	wikimedia
300K	zordius

The size is really coming from wikimedia/langconv which is 322M. For reference, the most recent MediaWiki tarball is 39M.
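
A per-entry size breakdown like the `du -hs *` listing above can also be produced programmatically, which helps when drilling down into one directory. This is a rough sketch: the helper names are made up for illustration, the demo tree is a throwaway temporary directory with a hypothetical `example.att` file, and the function sums apparent file sizes in bytes rather than the disk blocks that `du` reports.

```python
import os
import tempfile


def tree_size(path):
    """Sum the apparent size in bytes of all regular files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):  # skip symlinks to avoid double counting
                total += os.path.getsize(full)
    return total


def sizes_by_entry(root):
    """Map each direct child of root to its total size in bytes."""
    result = {}
    for entry in sorted(os.listdir(root)):
        full = os.path.join(root, entry)
        result[entry] = tree_size(full) if os.path.isdir(full) else os.path.getsize(full)
    return result


# Demo on a temporary tree (hypothetical file names and sizes).
with tempfile.TemporaryDirectory() as vendor:
    fst_dir = os.path.join(vendor, "wikimedia", "langconv", "fst")
    os.makedirs(fst_dir)
    with open(os.path.join(fst_dir, "example.att"), "wb") as fh:
        fh.write(b"x" * 5000)
    with open(os.path.join(vendor, "autoload.php"), "wb") as fh:
        fh.write(b"x" * 100)
    demo_sizes = sizes_by_entry(vendor)

for name, size in sorted(demo_sizes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{size:>8} {name}")
```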

Do we really need all the files in vendor/wikimedia/langconv/fst at runtime?

I don't think the Parsoid tests are that significant; they're only 2M. We can probably stop shipping baseconfig/ (8M), which would bring Parsoid down to just 4M. But langconv is the real problem...

Change 612431 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[mediawiki/services/parsoid@master] Don't include baseconfig/ in packagist releases

https://gerrit.wikimedia.org/r/612431

Legoktm renamed this task from Decide what to do with parsoid tests in vendor/tarballs to Decide what to do with size of parsoid and its dependencies (langconv) vendor/tarballs. Jul 13 2020, 10:31 PM

I'd rather not strip baseconfig at the present time; we're still integrating CI with core, and that means being able to run parsertests from parsoid-as-installed-as-a-package.

There's no way wikimedia/langconv should be 322M.

Change 612436 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/libs/LangConv@master] Don't distribute the (large) .att files in the composer library

https://gerrit.wikimedia.org/r/612436

Change 612436 merged by jenkins-bot:
[mediawiki/libs/LangConv@master] Don't distribute the (large) .att files in the composer library

https://gerrit.wikimedia.org/r/612436

Change 612438 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Make wikimedia/langconv optional

https://gerrit.wikimedia.org/r/612438

Change 612440 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/vendor@master] Ensure production uses wikimedia/langconv

https://gerrit.wikimedia.org/r/612440

Change 612438 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Make wikimedia/langconv optional

https://gerrit.wikimedia.org/r/612438

Change 612440 merged by jenkins-bot:
[mediawiki/vendor@master] Ensure production uses wikimedia/langconv

https://gerrit.wikimedia.org/r/612440

Notes from IRC:

  • wikimedia/langconv, with the .att files removed, is 80M. It's only required for Parsoid read views, which aren't ready for 1.35, so it will not be in the tarball (though it may take a few days for us to remove it, so it might slip into rc0 depending on when that happens). It is expected to end up in the tarball eventually, though.
  • wikimedia/parsoid is 12M, and needs the baseconfig/ files to enable some CI testing. Removing those to get the library down to 4M may be possible in the future but not right now. There will be other savings too when parserTests are unified and eventually when the old parser is removed. Eventually.

I think we can close this task once the necessary bumps have happened to remove wikimedia/langconv from the tarball for 1.35.

Also I now realize that I've been comparing uncompressed filesizes to the compressed tarball. The new langconv is ~40MB compressed, which would only about double the tarball size.
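
The arithmetic behind that correction can be sketched as follows, using only the figures quoted in this thread (39M for the most recent tarball, 80M for langconv uncompressed without the .att files, roughly 40MB for langconv compressed). The variable names are illustrative, and the result is a rough estimate since the compressed figure is approximate.

```python
# Sizes in megabytes, as quoted in this task.
tarball_compressed = 39       # most recent MediaWiki tarball
langconv_uncompressed = 80    # langconv on disk, .att files removed
langconv_compressed = 40      # approximate compressed size of the new langconv

# Mistaken comparison: adding an uncompressed size to a compressed tarball.
mixed_ratio = (tarball_compressed + langconv_uncompressed) / tarball_compressed

# Like-for-like comparison: compressed added to compressed.
combined_compressed = tarball_compressed + langconv_compressed
fair_ratio = combined_compressed / tarball_compressed

print(f"mixed units: {mixed_ratio:.2f}x, compressed only: {fair_ratio:.2f}x")
```

Comparing compressed with compressed gives a combined size of about 79M, roughly double the current tarball, whereas the mixed-unit comparison suggests about triple.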

Change 612634 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/vendor@wmf/1.35.0-wmf.41] Bump Parsoid to v0.12.0-a22

https://gerrit.wikimedia.org/r/612634

Change 612635 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/vendor@master] Bump Parsoid to v0.12.0-a22

https://gerrit.wikimedia.org/r/612635

Change 612635 merged by jenkins-bot:
[mediawiki/vendor@master] Bump Parsoid to v0.12.0-a22

https://gerrit.wikimedia.org/r/612635

Change 612498 had a related patch set uploaded (by Jforrester; owner: C. Scott Ananian):
[mediawiki/vendor@REL1_35] Bump Parsoid to v0.12.0-a22

https://gerrit.wikimedia.org/r/612498

Change 612634 merged by jenkins-bot:
[mediawiki/vendor@wmf/1.35.0-wmf.41] Bump Parsoid to v0.12.0-a22

https://gerrit.wikimedia.org/r/612634

Change 612498 merged by Jforrester:
[mediawiki/vendor@REL1_35] Bump Parsoid to v0.12.0-a22

https://gerrit.wikimedia.org/r/612498

Mentioned in SAL (#wikimedia-operations) [2020-07-14T19:52:46Z] <jforrester@deploy1001> Synchronized php-1.35.0-wmf.41/vendor/wikimedia/parsoid/: T252448 T255190 Bump Parsoid to v0.12.0-a23 (duration: 01m 06s)

Legoktm assigned this task to cscott.

We could consider removing the siteconfig and parsertest information from the parsoid 0.12.0 "release" tag (since right now our 1.35 RC is using an 0.12.0-aX tag). That would slim things down a little for 1.35 w/o impacting the current work being done to unify parser tests (on master, which is parsoid 0.13.x).

Change 900393 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] baseconfig: Remove formatversion=1 configurations

https://gerrit.wikimedia.org/r/900393

Change 900393 merged by jenkins-bot:

[mediawiki/services/parsoid@master] baseconfig: Remove formatversion=1 configurations

https://gerrit.wikimedia.org/r/900393

Change 901245 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a2

https://gerrit.wikimedia.org/r/901245

Change 901245 merged by jenkins-bot:

[mediawiki/vendor@master] Bump parsoid to 0.18.0-a2

https://gerrit.wikimedia.org/r/901245

Change 612431 abandoned by C. Scott Ananian:

[mediawiki/services/parsoid@master] Don't include baseconfig/ in packagist releases

Reason:

Blocked on T287419

https://gerrit.wikimedia.org/r/612431