Compress phabricator dump
Open, Low | Public | Feature

Description

Currently, the dump is 1.4GB. That's not much, but compressing it reduces its size drastically, to 180MB, which would save a lot of network bandwidth (almost every request we make in a browser gets gzip-compressed too). Also, it should be renamed from .dump to .json IMO.

Details

Reference | Source Branch | Dest Branch | Author | Title
repos/phabricator/deployment!18 | work/brennen/update-tools-compression | wmf/stable | brennen | update tools submodule for dump compression
repos/phabricator/tools!1 | compression | wmf/stable | ladsgroup | Make the dump compressed

Event Timeline

Aklapper changed the subtype of this task from "Task" to "Feature Request".
Aklapper added a subscriber: Aklapper.

If anyone who knows Python wants to look into this: https://gitlab.wikimedia.org/repos/phabricator/tools/-/blob/wmf/stable/public_task_dump.py 's

with open('/srv/dumps/phabricator_public.dump', 'w') as f:
    f.write(json.dumps(data))

probably needs changes, plus an import tarfile line at the top of that file.

Edit: Uhm, maybe this code is not executed at all, and things are done in modules/phabricator/manifests/tools.pp in the operations/puppet Gerrit repository instead? Sorry for the wrong pointer!
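For reference, a minimal sketch of what a compressed version of the open()/json.dumps() snippet quoted above could look like, assuming gzip is the chosen format and using only the standard library. The function names and the idea of passing the output path as a parameter are illustrative assumptions, not the actual change that was merged.

```python
import gzip
import json


def write_compressed_dump(data, path):
    """Serialize `data` as JSON and write it gzip-compressed to `path`."""
    # "wt" opens the gzip stream in text mode, so json.dump can write str
    # chunks directly instead of building one large string first.
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(data, f)


def read_compressed_dump(path):
    """Read a dump written by write_compressed_dump back into Python objects."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

Consumers would then read the file with gzip.open() in the same way, or decompress it with the gzip command line tool first.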

I'll do it if you tell me which compression method to use. The options are gzip, bzip2, or, if we want to be super fancy, zstd.
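To get a rough feel for the trade-off between the options, something like the following sketch could be run against a sample of the data. It only covers gzip and bzip2, since both are in the Python standard library; zstd is left out on the assumption that it would need an extra package on the host. The sample payload is made up, so real ratios will differ.

```python
import bz2
import gzip
import json


def compare_compressors(data):
    """Return the JSON size of `data` in bytes, raw and per compressor."""
    raw = json.dumps(data).encode("utf-8")
    return {
        "raw": len(raw),
        "gzip": len(gzip.compress(raw)),
        "bz2": len(bz2.compress(raw)),
    }


if __name__ == "__main__":
    # Made-up, highly repetitive sample; not representative of the real dump.
    sample = {
        "task": {
            str(i): {"title": f"Example task {i}", "status": "open"}
            for i in range(5000)
        }
    }
    for name, size in compare_compressors(sample).items():
        print(f"{name}: {size} bytes")
```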

voting against super fancy and for simple .gz :)

An alternative to doing it in Python is to run the actual gzip command on it right after upload.
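The "compress it afterwards" approach could also be sketched without shelling out, by gzipping the already written file in chunks. This is only an illustration; the gzip_file helper and its keep_original parameter are hypothetical names, and the real setup might just call the gzip binary instead.

```python
import gzip
import os
import shutil


def gzip_file(path, keep_original=False):
    """Compress `path` to `path + '.gz'`, roughly like the gzip command."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        # Copy in chunks so a multi-gigabyte dump is never fully in memory.
        shutil.copyfileobj(src, dst)
    if not keep_original:
        # The gzip command removes the input file by default; mirror that.
        os.remove(path)
    return gz_path
```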

In Puppet it's $dump_script = "${directory}/public_task_dump.py". We could just make this $dump_script = "${directory}/public_task_dump.py | gzip > ..." and not even need a Phabricator deployment?

That doesn't write to stdout. The script directly writes to the file (see https://gitlab.wikimedia.org/repos/phabricator/tools/-/blob/wmf/stable/public_task_dump.py) so we can't pipe it.
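For the pipe idea to work, the script would have to emit the JSON on standard output instead of opening the file itself. A hypothetical sketch of that shape follows; write_dump and the placeholder payload are assumptions for illustration and not how public_task_dump.py is actually structured.

```python
import json
import sys


def write_dump(data, stream=None):
    """Write `data` as JSON to `stream` (defaults to standard output)."""
    if stream is None:
        stream = sys.stdout
    json.dump(data, stream)
    stream.write("\n")


if __name__ == "__main__":
    # Placeholder payload; the real script builds `data` from Phabricator.
    write_dump({"task": {}})
```

With output going to stdout, the caller decides where it ends up, so a shell pipeline through gzip becomes possible without touching the script again.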

https://gitlab.wikimedia.org/repos/phabricator/tools/-/merge_requests/1

I had to fork the whole repo into ladsgroup/tools. Can we make it possible to create a local branch for all repos for at least trusted contributors?

I added you at Maintainer level to https://gitlab.wikimedia.org/repos/phabricator/. We'll have to work out how to handle this for all trusted contributors later.

Hmm, the patch got merged, but https://dumps.wikimedia.org/other/misc/ still only has an uncompressed dump.

I think we need to deploy Phabricator too? I'm not sure tbh.

I think a regular scap deployment should do the trick here.

@brennen can you confirm? I can take care of it if that's the case.

That'd do the trick, assuming that the tools submodule is updated on wmf/stable in the deployment repo.

The current plan is to deploy this change during a Phab deployment window: https://phabricator.wikimedia.org/T346266

This is ready to deploy, but there was some ongoing post-datacenter-switch maintenance during today's SRE Collab office hours. We'll defer 'til the next one.