
refreshCdbJsonFiles should be rewritten in python
Closed, Resolved, Public


Since HHVM is missing the dba_* functions, the bin/refreshCdbJsonFiles shebang had to be hacked to explicitly use #!/usr/bin/env php5. Rather than depend on php5, we should rewrite this script in Python, since it is part of scap.


Event Timeline

thcipriani raised the priority of this task to Needs Triage.
thcipriani updated the task description.
thcipriani added projects: Scap, Deployments.

I don't know how relevant it is, but there is a longer-term plan to use plain PHP files instead of CDB files (T99740). This use case has been specifically optimized by the HHVM team.

That would mean we wouldn't have to rewrite it, but we should still do so regardless.

Using PHP files for l10n cache is blocked on T103886: Translation cache exhaustion caused by changes to PHP code in file scope which is in turn blocked on T119637: Update HHVM package to recent release.

The refreshCdbJsonFiles script is fairly simple at its heart. It generates JSON dumps and MD5 checksums for all the l10n cache CDB files by forking N child processes in parallel. The choice of PHP as the implementation language is mostly a historical accident: it was written by @aaron before scap was converted to Python. Since the script worked and was fairly sane code, I never bothered to reimplement it in Python. Scap has the inverse operation, updating all the l10n CDB files from the JSON dumps, implemented in the scap.tasks.merge_cdb_updates() function, which shows how to accomplish the parallel processing using multiprocessing.Pool.
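A Python port could look roughly like the minimal sketch below. It is not the actual scap code. It assumes the cache files match `*.cdb` in a single directory and that the outputs go to a hypothetical `upstream/` subdirectory as `<name>.cdb.json` and `<name>.cdb.MD5`. The exact names and layout the PHP script uses may differ. Because the Python standard library has no CDB module, it also reads the standard CDB format directly: a 2048-byte header of 256 (position, length) pairs, followed by length-prefixed records, followed by the hash tables. The helper names `read_cdb`, `dump_one`, and `refresh_cdb_json_files` are illustrative only.

```python
import glob
import hashlib
import json
import os
import struct
from multiprocessing import Pool


def read_cdb(path):
    """Yield (key, value) byte pairs by scanning a CDB file's records in order.

    Standard CDB layout: a 2048-byte header of 256 little-endian uint32
    (position, length) pairs, then records <klen><dlen><key><data>, then the
    hash tables. The records end where the earliest hash table begins.
    """
    with open(path, "rb") as f:
        data = f.read()
    header = struct.unpack("<512L", data[:2048])
    end = min(header[0::2])  # start offset of the earliest hash table
    pos = 2048
    while pos < end:
        klen, dlen = struct.unpack_from("<LL", data, pos)
        pos += 8
        key = data[pos:pos + klen]
        pos += klen
        yield key, data[pos:pos + dlen]
        pos += dlen


def dump_one(cdb_path, out_dir):
    """Write a JSON dump and an MD5 checksum for one CDB file; return the JSON path."""
    with open(cdb_path, "rb") as f:
        md5 = hashlib.md5(f.read()).hexdigest()
    # Assumption: keys and values (serialized PHP strings) are valid UTF-8.
    entries = {k.decode("utf-8"): v.decode("utf-8") for k, v in read_cdb(cdb_path)}
    name = os.path.basename(cdb_path)
    # Hypothetical output naming: <name>.json and <name>.MD5 in out_dir.
    json_path = os.path.join(out_dir, name + ".json")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(entries, f, sort_keys=True)
    with open(os.path.join(out_dir, name + ".MD5"), "w") as f:
        f.write(md5)
    return json_path


def refresh_cdb_json_files(cache_dir, processes=4):
    """Dump every *.cdb in cache_dir, using a process pool when processes > 1."""
    out_dir = os.path.join(cache_dir, "upstream")  # hypothetical location
    os.makedirs(out_dir, exist_ok=True)
    jobs = [(p, out_dir) for p in sorted(glob.glob(os.path.join(cache_dir, "*.cdb")))]
    if processes <= 1:
        return [dump_one(p, d) for p, d in jobs]
    with Pool(processes) as pool:
        return pool.starmap(dump_one, jobs)
```

The work is split per file because each CDB file is independent, so `Pool.starmap` can hand whole files to the workers without any coordination. When the pooled path is invoked from a script, the call should sit behind an `if __name__ == "__main__":` guard. That way it also works with the spawn and forkserver start methods.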

thcipriani moved this task from Needs triage to Services improvements on the Scap board.