
test/mediawiki/core2 has some big objects
Closed, Resolved (Public)

Description

The test/mediawiki/core2 repository has some big objects that were uploaded to svn by mistake. There is at least an OGG video and a parserTests file of a few MB.

We might want to drop those objects to shrink the repo size.

One can find the object SHA-1s using verify-pack:

git verify-pack -v .git/objects/pack/pack-4d812fd3351b2f9b9814dd4c4370554c5bf3bc8a.idx | sort -k3n | tail -n 10

deacc7523dc11f0aa72fabf09c9eb142ea501b6d blob 574106 145627 85947251
a34d599e7baee042336dd9d7ba5823a338e6d568 blob 596528 563665 127474685
c105546577035b0c140885e7784c6aa8c1bd6e3c blob 696872 696627 85017232
d41cf97eda958e808e47a1b26b4e1faf57b1872d blob 831473 731654 128085310
7547e5d52614b85bc569f319e6c90ffae0d22e74 blob 859358 561624 68971517
d9f76fe3685a08d233d57f79ed09458cb89b9ee8 blob 867490 234202 70979864
6474bca35d96def65b5bd5c4057570feacd97b0e blob 1039405 283187 60074558
f73456d5912da3406fcf7a1d557253c8cd7d0130 blob 1375400 384009 23574243
442cecf1c118e06e76baffd8c16e75a5fa65b953 blob 2152337 2145045 94611352
f0742f706d752466bb9ba782abce5db2e52642c4 blob 8001431 3756081 73016860

The 3rd field is the uncompressed size and the 4th is the compressed size in the pack. So the last two objects take up about 2.1 MB and 3.7 MB of the pack.
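To make the columns easier to read, the verify-pack output can be piped through a small filter. This is only a sketch: the function name pack_sizes_mb is made up for illustration, and it assumes 1 MB = 1,000,000 bytes.

```shell
# Hypothetical helper (not part of git): reads `git verify-pack -v` lines on
# stdin and prints, for blobs only, the SHA-1, the uncompressed size and the
# in-pack (compressed) size, both in MB (1 MB = 1,000,000 bytes).
pack_sizes_mb() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk '$2 == "blob" {
        printf "%s %.2f %.2f\n", $1, $3 / 1000000, $4 / 1000000
    }'
}

# Usage, with the pack file from above:
#   git verify-pack -v .git/objects/pack/pack-4d812fd3351b2f9b9814dd4c4370554c5bf3bc8a.idx \
#       | sort -k3n | tail -n 10 | pack_sizes_mb
```

Summary lines and non-blob objects in the verify-pack output are skipped because their second field is not "blob".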

$ git show f0742f7 | head -n1

This is another parserTest file.

It comes from a commit to parserTests.php that inserted an 8 MB test case :(

$ git show 442cecf | head -c 4
OggS

That is a Theora-encoded video of some fish (run git show 442cecf > /tmp/fishes.ogg, then open it in VLC).
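git show only reveals the content of a blob, not where it lived in the tree. One way to get the path is to search the output of git rev-list --objects --all, which lists each reachable object followed by its path. A minimal sketch, where the helper name path_of_blob is an assumption made for illustration:

```shell
# Hypothetical helper (not part of git): reads `git rev-list --objects --all`
# output on stdin and prints the path of every object whose SHA-1 starts with
# the given prefix. Commits have no path column, so they are skipped.
path_of_blob() {
    awk -v prefix="$1" '
        index($1, prefix) == 1 && NF > 1 {
            # Print everything after "<sha> ", keeping spaces in the path.
            print substr($0, length($1) + 2)
        }'
}

# Usage, with the abbreviated SHA-1 of the OGG blob from above:
#   git rev-list --objects --all | path_of_blob 442cecf
```

The path found this way is what a later filtering command would need to target.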


Version: unspecified
Severity: normal

Details

Reference
bz34472

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 12:09 AM
bzimport added a project: Gerrit.
bzimport set Reference to bz34472.
bzimport added a subscriber: Unknown Object (MLST).

If you can find a way to drop these from the repo properly, I'm all for it.

It looks like we can drop the .ogg video by using the following filtering command:

git filter-branch --prune-empty \
    --index-filter \
    'git rm -rf --cached --ignore-unmatch js2/mwEmbed/example_usage/media/*.ogg' \
    --tag-name-filter cat \
    -- --all
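Note that filter-branch alone does not shrink the repository on disk: the old objects stay reachable through the refs/original/ backups and the reflogs until those are cleared. The cleanup described in the git filter-branch documentation can be sketched as follows. The function name purge_filtered_objects is made up for illustration, and the sketch assumes it is run inside the already filtered clone.

```shell
# Hypothetical helper (not part of git): run inside a repository that has
# already been rewritten with `git filter-branch`.
purge_filtered_objects() {
    # Delete the refs/original/* backups that filter-branch leaves behind.
    git for-each-ref --format='%(refname)' refs/original/ |
        while read -r ref; do
            git update-ref -d "$ref"
        done

    # Expire all reflog entries so they no longer keep old objects alive.
    git reflog expire --expire=now --all

    # Repack and drop every object that is now unreachable.
    git gc --prune=now --quiet
}
```

This is destructive: once the backups and reflogs are gone the old history cannot be recovered from that clone, so it is safer to try it on a copy first.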

The huge ExtraParserTests.txt file is mentioned in bug 23715. Probably added by r67014. It was removed by r67091.

The easiest approach would probably be to use 'git rebase', drop that huge addition, and amend the commit message with a nice note such as: "Year 2012: there used to be a huge file there that was replaced by a nicer str_repeat() call :]"

The ExtraParserTests blob f0742f706d752466bb9ba782abce5db2e52642c4 is still in :-/ I tested rebasing on Friday and tried to fix it today, but hit a blocker at the final step: an unrelated path conflict later on during the rebase :-/

Being pragmatic, we skipped filtering that one.

The ogg files have been filtered out though! So that is almost fixed.