scap has a lot of machinery for converting CDB files to JSON, excluding the CDB files when rsyncing between hosts, and reconstituting the CDB files from the JSON files on target hosts. I wonder whether this is still useful or necessary.
Related context:
- T221428: Scap should only sync built CDB files to production appserver hosts, not the build files as well
- T99740: Use static php array files for l10n cache at WMF (instead of CDB)
Below is a comparison of rsync stats between rsync_cdbs:True and rsync_cdbs:False after altering a set of l10n files (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/749100). The commit results in 13 of 449 CDB files being updated. The stats are from rsync pulling to a target host during scap sync-world.
rsync_cdbs:True
Number of files: 263,633 (reg: 243,594, dir: 19,871, link: 168)
Number of created files: 1 (reg: 1)
Number of deleted files: 0
Number of regular files transferred: 27
Total file size: 6,855,911,866 bytes
Total transferred file size: 68,003,603 bytes
Literal data: 7,253,928 bytes <---
Matched data: 60,749,675 bytes <---
File list size: 6,750,842
File list generation time: 1.037 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 199,009
Total bytes received: 9,751,864
rsync_cdbs:False
Number of files: 264,530 (reg: 244,489, dir: 19,873, link: 168)
Number of created files: 1 (reg: 1)
Number of deleted files: 0
Number of regular files transferred: 40
Total file size: 7,141,639,029 bytes
Total transferred file size: 69,132,414 bytes
Literal data: 2,187,974 bytes <---
Matched data: 66,944,440 bytes <---
File list size: 6,779,014
File list generation time: 1.033 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 200,277
Total bytes received: 7,225,108
Note: Each of these transfers took about the same amount of time (~3 seconds) on my machine (train-dev environment).
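To make the two marked lines easier to compare, here is a rough calculation of how much of each transfer had to be sent as literal data. It is only a sketch over the numbers already quoted above, and it assumes the first stats block is the rsync_cdbs:True run and the second is the rsync_cdbs:False run, in the order the comparison names them. The helper name literal_share is made up for this illustration.

```python
# Figures copied from the two rsync stats blocks above.
# Assumption: first block = rsync_cdbs:True, second block = rsync_cdbs:False.
runs = {
    "rsync_cdbs:True": {
        "literal": 7_253_928,
        "matched": 60_749_675,
        "received": 9_751_864,
    },
    "rsync_cdbs:False": {
        "literal": 2_187_974,
        "matched": 66_944_440,
        "received": 7_225_108,
    },
}


def literal_share(stats):
    """Fraction of the transferred file size that was sent as literal data.

    Literal data plus matched data adds up to rsync's
    "Total transferred file size" for both runs above.
    """
    return stats["literal"] / (stats["literal"] + stats["matched"])


for name, stats in runs.items():
    print(
        f"{name}: literal share {literal_share(stats):.1%}, "
        f"received {stats['received']:,} bytes"
    )

# Difference in bytes received by the target between the two runs.
saved = runs["rsync_cdbs:True"]["received"] - runs["rsync_cdbs:False"]["received"]
print(f"difference in bytes received: {saved:,}")
```

Under that assumption, roughly 10.7% of the transferred file size is literal data in the CDB run versus roughly 3.2% in the JSON run, and the target receives about 2.5 MB less in the JSON run.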
The stats confirm that modified JSON l10n files transfer more efficiently than their CDB counterparts. An open question is whether the better rsync efficiency outweighs the added code and operational complexity. I don't think it does, but I am interested in input. I will also note that rsync_cdbs:True results in a faster scap sync-world when all l10n files have been freshly generated or when no l10n files have changed.