Stop trying to avoid rsyncing l10n CDB files
Closed, Declined | Public | 5 Estimated Story Points

Description

scap has a lot of machinery for converting CDB files to JSON, excluding the CDB files when rsyncing between hosts, and reconstituting the CDB files from the JSON files on target hosts. I wonder whether this is still useful or necessary.

Related context:

rsync stats comparison between rsync_cdbs:True and rsync_cdbs:False when altering a set of l10n files (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/749100). The commit results in 13 of 449 CDB files being updated. The following are the stats reported by rsync when pulling to a target host during scap sync-world:

rsync_cdbs:True
Number of files: 263,633 (reg: 243,594, dir: 19,871, link: 168)
Number of created files: 1 (reg: 1)
Number of deleted files: 0
Number of regular files transferred: 27
Total file size: 6,855,911,866 bytes
Total transferred file size: 68,003,603 bytes
Literal data: 7,253,928 bytes   <---
Matched data: 60,749,675 bytes  <---
File list size: 6,750,842
File list generation time: 1.037 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 199,009
Total bytes received: 9,751,864

rsync_cdbs:False
Number of files: 264,530 (reg: 244,489, dir: 19,873, link: 168)
Number of created files: 1 (reg: 1)
Number of deleted files: 0
Number of regular files transferred: 40
Total file size: 7,141,639,029 bytes
Total transferred file size: 69,132,414 bytes
Literal data: 2,187,974 bytes     <---
Matched data: 66,944,440 bytes    <---
File list size: 6,779,014
File list generation time: 1.033 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 200,277
Total bytes received: 7,225,108

Note: Each of these transfers took about the same amount of time (~3 seconds) on my machine (train-dev environment).

The stats confirm that modified JSON l10n files transfer more efficiently than their CDB counterparts. The open question is whether the better rsync efficiency outweighs the added code and operational complexity. I don't think it does, but I am interested in input. I will also note that using rsync_cdbs:True results in a faster scap sync-world when all l10n files have been freshly generated or when no l10n files have changed.
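To make the comparison concrete, here is a minimal Python sketch that parses the relevant lines of the rsync stats quoted above and computes what share of the transferred file size had to be sent as literal (unmatched) data in each mode. The function names (`parse_rsync_stats`, `literal_fraction`) are made up for illustration and are not part of scap; the numbers are copied from the two runs in this description.

```python
import re

# Excerpts of the rsync stats quoted in this task description.
STATS_RSYNC_CDBS_TRUE = """\
Total transferred file size: 68,003,603 bytes
Literal data: 7,253,928 bytes
Matched data: 60,749,675 bytes
Total bytes received: 9,751,864
"""

STATS_RSYNC_CDBS_FALSE = """\
Total transferred file size: 69,132,414 bytes
Literal data: 2,187,974 bytes
Matched data: 66,944,440 bytes
Total bytes received: 7,225,108
"""

# Matches "Label: 1,234" where the number is a whole integer
# (lines such as "1.037 seconds" are skipped on purpose).
_STAT_LINE = re.compile(r"^\s*([A-Za-z ]+):\s*([\d,]+)(?:\s|$)")


def parse_rsync_stats(text):
    """Hypothetical helper: map stat labels to integer values."""
    stats = {}
    for line in text.splitlines():
        m = _STAT_LINE.match(line)
        if m:
            stats[m.group(1).strip()] = int(m.group(2).replace(",", ""))
    return stats


def literal_fraction(stats):
    """Share of the transferred file size that was sent as literal data."""
    return stats["Literal data"] / stats["Total transferred file size"]


if __name__ == "__main__":
    for label, text in [
        ("rsync_cdbs:True", STATS_RSYNC_CDBS_TRUE),
        ("rsync_cdbs:False", STATS_RSYNC_CDBS_FALSE),
    ]:
        stats = parse_rsync_stats(text)
        print(f"{label}: literal share {literal_fraction(stats):.1%}, "
              f"received {stats['Total bytes received']:,} bytes")
```

With these inputs the literal share comes out to roughly 10.7% for rsync_cdbs:True and 3.2% for rsync_cdbs:False, and the difference in total bytes received is about 2.5 MB for this particular commit.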

Event Timeline

Change 745572 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] Allow scap/files/scap-master-sync to include CDB files

https://gerrit.wikimedia.org/r/745572

Change 745572 merged by Dzahn:

[operations/puppet@production] Allow scap/files/scap-master-sync to include CDB files

https://gerrit.wikimedia.org/r/745572

dancy set the point value for this task to 5. (Dec 9 2021, 10:41 PM)

Change 746941 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[mediawiki/tools/scap@master] rsync_cdbs stuff

https://gerrit.wikimedia.org/r/746941

Change 746941 merged by jenkins-bot:

[mediawiki/tools/scap@master] Add rsync_cdbs configuration parameter

https://gerrit.wikimedia.org/r/746941

Change 747643 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] scap.cfg: Enable rsync_cdbs in beta

https://gerrit.wikimedia.org/r/747643

Change 747643 merged by Dzahn:

[operations/puppet@production] scap.cfg: Enable rsync_cdbs in beta

https://gerrit.wikimedia.org/r/747643

Change 751763 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[mediawiki/tools/scap@master] rsync_cdbs fix

https://gerrit.wikimedia.org/r/751763

Change 751763 merged by jenkins-bot:

[mediawiki/tools/scap@master] rsync_cdbs fix

https://gerrit.wikimedia.org/r/751763

Collected data is in the description. T99740 is really the best bet going forward.

Change 766170 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[mediawiki/tools/scap@master] Rip out rsync_cdbs-related code

https://gerrit.wikimedia.org/r/766170

Change 766170 merged by jenkins-bot:

[mediawiki/tools/scap@master] Rip out rsync_cdbs-related code

https://gerrit.wikimedia.org/r/766170

Change 801746 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] Revert "scap.cfg: Enable rsync_cdbs in beta"

https://gerrit.wikimedia.org/r/801746

Change 801746 merged by Dzahn:

[operations/puppet@production] Revert "scap.cfg: Enable rsync_cdbs in beta"

https://gerrit.wikimedia.org/r/801746