Scraper: compress samples and intermediate files with gzip
Closed, Resolved (Public)

Description

Using a library like stream_gzip, compress sample and intermediate output files with gzip as they are written, and decompress them when they are read back.

I seem to remember that this library uses slightly odd typing for its input or output, so we might need a small wrapper to adapt it to Stream and Flow pipelines.

Smaller, final output files with a single object such as "-summary.json" shouldn't be compressed.
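
For illustration only, the policy described above could look roughly like the following. This is a language-neutral sketch in Python, not the project's Elixir code and not the stream_gzip API. The helper names (should_compress, write_records, read_records), the ".gz" suffix convention, and the one-JSON-object-per-line layout for intermediate files are all assumptions made for the example.

```python
import gzip
import json


def should_compress(path):
    """Hypothetical rule: gzip everything except final "-summary.json" files."""
    return not path.endswith("-summary.json")


def write_records(path, records):
    """Write one JSON object per line and return the path actually written.

    Compressed outputs get a ".gz" suffix (an assumed convention) so that
    readers can tell from the name alone whether to decompress.
    """
    if should_compress(path):
        path += ".gz"
        opener = gzip.open
    else:
        opener = open
    with opener(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    return path


def read_records(path):
    """Lazily yield records, decompressing transparently for ".gz" files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)
```

Keeping the decision in one place (should_compress) means the writer and the reader cannot disagree about which files are compressed. A wrapper around the real library would play the same role for Stream and Flow pipelines.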

Code to review:
https://gitlab.com/wmde/technical-wishes/scrape-wiki-html-dump/-/merge_requests/49

Event Timeline

awight renamed this task from "Scraper: compress samples, intermediate, and output files with gzip" to "Scraper: compress samples and intermediate files with gzip". Apr 20 2023, 10:03 AM
awight updated the task description.
awight moved this task from Doing to Tech Review on the WMDE-TechWish-Sprint-2023-04-19 board.
awight moved this task from Tech Review to Done on the WMDE-TechWish-Sprint-2023-04-19 board.