17:15:42 Started sync-masters
sync-masters: 100% (ok: 1; fail: 0; left: 0)
17:17:31 Finished sync-masters (duration: 01m 49s)
Description
Revisions and Commits
rMSCA Scap
Restricted Differential Revision | rMSCAa593285f44aa Compare cdb mtimes less granularly |
Related Objects
- Mentioned Here
- rOPUP3d3cd0b17b3d: Update scap to v.3.0.3-1
Event Timeline
Whereas on tin:
17:27:49 Started sync-masters
sync-masters: 100% (ok: 1; fail: 0; left: 0)
17:28:09 Finished sync-masters (duration: 00m 20s)
The first thing I would do to debug (if I had the root powers to do it) would be to run this from tin:
$ sudo /usr/bin/rsync \
    --verbose \
    --debug=CONNECT2 \
    --archive --delete-delay --delay-updates --compress --delete \
    --exclude="**/cache/l10n/*.cdb" \
    --exclude="*.swp" \
    "${MASTER}::common" /srv/mediawiki-staging
That's the guts of /usr/local/bin/scap-master-sync with some logging enabled that might help track down the source of the slowness. Random guesses about the problem from me include asymmetric routing (seems unlikely) and some sort of version incompatibility between the rsync client on tin (3.0.9) and the server on mira (3.1.0).
Possibly related: https://bugs.launchpad.net/ubuntu/+source/rsync/+bug/1300367 (rsync 3.1.0 and 3.0.9 incompatibility)
@ori realized a few days ago that one source of slowness in the sync-master process is the rebuild of CDB files from their json counterparts. We should examine if this step is actually needed and remove it if it is not.
If the CDBs aren't rebuilt from the JSON on the co-master, then the mediawiki-staging directory won't actually be in sync. l10nupdate or a scap run from the co-master would make the JSON files from the stale CDBs that were lying around.
The bigger question here is why the timestamps aren't enough to tell that the CDBs and JSON blobs are actually in sync.
We could drop the JSON->CDB step if we remove --exclude="**/cache/l10n/*.cdb" from modules/scap/files/scap-master-sync in operations/puppet.git and let rsync move the CDBs for us.
This would be slower (I think) when the CDB files actually change (e.g. during a scap or l10nupdate run), but it might be faster in the more common sync-file/sync-dir case.
Would be nice to figure out why the scap-rebuild-cdb mtimes are weird:
>>> import os
>>> os.path.getmtime('/srv/mediawiki-staging/php-1.27.0-wmf.14/cache/l10n/l10n_cache-en.cdb')
1455664545.517631
>>> os.path.getmtime('/srv/mediawiki-staging/php-1.27.0-wmf.14/cache/l10n/upstream/l10n_cache-en.cdb.json')
1455664545.5176318
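The two mtimes above differ only below the microsecond, yet a plain "is the JSON newer than the CDB?" check would still call the CDB stale and rebuild it. A minimal sketch of a coarser comparison follows, in the spirit of "Compare cdb mtimes less granularly". This is not the actual scap patch: the function name cdb_needs_rebuild and the choice of whole-second truncation are assumptions made for illustration.

```python
import os


def cdb_needs_rebuild(json_path, cdb_path):
    """Return True if the CDB is missing or older than its JSON source.

    Hypothetical helper, not scap's real code. mtimes are truncated to
    whole seconds so that sub-second differences between the two files
    do not trigger a rebuild.
    """
    if not os.path.exists(cdb_path):
        return True
    json_mtime = int(os.path.getmtime(json_path))
    cdb_mtime = int(os.path.getmtime(cdb_path))
    return json_mtime > cdb_mtime
```

With the values shown above, both mtimes truncate to 1455664545, so this check would report the CDB as up to date. The trade-off is that a JSON file rewritten within the same second as its CDB would not be detected as newer.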
@thcipriani: Do you want to tag a new version of scap?
We need a process for releasing a test build of the .deb to the beta cluster and then promoting that to production once it's proven stable.
Do we have a pre-production debian package repo?
The patch for scap to compare json and cdb mtimes correctly merged yesterday evening: rOPUP3d3cd0b17b3de1cc34b4018e489650739293deaa
Since then, sync-masters times have fallen as expected:
Anecdotally, searching the SAL for 'synchronized' and comparing results from noon today onward against historical results, it seems that the time for sync-file has dropped quite a bit.