sync-masters slow on mira
Closed, Resolved; Public

Description

17:15:42 Started sync-masters
sync-masters: 100% (ok: 1; fail: 0; left: 0)                                    
17:17:31 Finished sync-masters (duration: 01m 49s)

Revisions and Commits

rMSCA Scap
AuditedRestricted Differential Revision

Event Timeline

greg raised the priority of this task from to Needs Triage.
greg updated the task description. (Show Details)
greg added projects: Deployments, Scap.
greg subscribed.

Whereas on tin:

17:27:49 Started sync-masters
sync-masters: 100% (ok: 1; fail: 0; left: 0)                                    
17:28:09 Finished sync-masters (duration: 00m 20s)
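
To put the two runs side by side, the durations can be recomputed from the log timestamps. This is only a minimal sketch: it assumes the `HH:MM:SS` format shown in this task and that both stamps of a run fall on the same day. `elapsed_seconds` is a hypothetical helper, not part of scap.

```python
from datetime import datetime

def elapsed_seconds(started, finished):
    """Seconds between two HH:MM:SS wall-clock stamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(finished, fmt) - datetime.strptime(started, fmt)
    return int(delta.total_seconds())

# Timestamps copied from the two sync-masters runs quoted in this task.
mira = elapsed_seconds("17:15:42", "17:17:31")
tin = elapsed_seconds("17:27:49", "17:28:09")
print(f"mira: {mira}s, tin: {tin}s, ratio: {mira / tin:.1f}x")
```

That works out to 109 seconds on mira against 20 seconds on tin, so the run on mira took roughly five times as long.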
greg renamed this task from sync-masters slow (on mira?) to sync-masters slow on mira. Jan 28 2016, 5:31 PM
greg set Security to None.

The first thing I would do to debug (if I had the root powers to do it) would be to run this from tin:

$ sudo /usr/bin/rsync \
    --verbose \
    --debug=CONNECT2 \
    --archive --delete-delay --delay-updates --compress --delete \
    --exclude="**/cache/l10n/*.cdb" \
    --exclude="*.swp" \
    "${MASTER}::common" /srv/mediawiki-staging

That's the guts of /usr/local/bin/scap-master-sync with some extra logging enabled, which might help track down the source of the slowness. My random guesses about the problem include asymmetric routing (seems unlikely) and some sort of version incompatibility between the rsync client on tin (3.0.9) and the server on mira (3.1.0).
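
If someone with root wants to repeat that run and time it, something like the following might do. It is a rough sketch, not how scap invokes rsync: `build_rsync_cmd` and `timed_run` are hypothetical helpers, and the master host name is whatever `${MASTER}` would have expanded to. One caveat: as far as I know the `--debug` option only arrived in rsync 3.1.0, so the 3.0.9 client on tin may reject it. Passing `debug=False` drops the flag.

```python
import subprocess
import time

def build_rsync_cmd(master, dest="/srv/mediawiki-staging", debug=True):
    """Assemble the rsync argv quoted above for a given master host."""
    cmd = ["sudo", "/usr/bin/rsync", "--verbose"]
    if debug:
        # Assumption: needs an rsync client that understands --debug.
        cmd.append("--debug=CONNECT2")
    cmd += [
        "--archive", "--delete-delay", "--delay-updates", "--compress", "--delete",
        "--exclude=**/cache/l10n/*.cdb",
        "--exclude=*.swp",
        f"{master}::common", dest,
    ]
    return cmd

def timed_run(cmd):
    """Run cmd and return (exit code, wall-clock seconds)."""
    start = time.monotonic()
    proc = subprocess.run(cmd)
    return proc.returncode, time.monotonic() - start
```

Calling `timed_run(build_rsync_cmd(master))` on tin and again on mira would give two directly comparable wall-clock numbers, alongside whatever rsync itself prints.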

@ori realized a few days ago that one source of slowness in the sync-master process is the rebuild of CDB files from their json counterparts. We should examine if this step is actually needed and remove it if it is not.

If the CDBs aren't rebuilt from the json on the co-master then the mediawiki-staging directory won't actually be in sync. l10nupdate or a scap run from the co-master would make the json files from the stale CDBs that were lying around.

The bigger question here is why the timestamps aren't working to know that the CDBs and JSON blobs are actually in sync.

We could drop the JSON->CDB step if we remove --exclude="**/cache/l10n/*.cdb" from modules/scap/files/scap-master-sync in operations/puppet.git and let rsync move the CDBs for us.

This would be slower (I think) when the CDB files actually change (e.g. during a scap or l10nupdate run), but it might be faster in the more common sync-file/sync-dir case.

Would be nice to solve why scap-rebuild-cdb mtimes are weird:

>>> import os
>>> os.path.getmtime('/srv/mediawiki-staging/php-1.27.0-wmf.14/cache/l10n/l10n_cache-en.cdb')
1455664545.517631
>>> os.path.getmtime('/srv/mediawiki-staging/php-1.27.0-wmf.14/cache/l10n/upstream/l10n_cache-en.cdb.json')
1455664545.5176318
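
The two values above differ by less than a microsecond, yet the JSON file still comes out "newer" than the CDB. If the rebuild check uses a strict comparison, that alone would be enough to trigger a rebuild every time. One way to guard against it is to compare with a small tolerance. This is a sketch under that assumption, not the patch that was eventually merged. `json_is_newer`, `cdb_needs_rebuild`, and the 1 ms tolerance are all made up for illustration.

```python
import os

def json_is_newer(cdb_mtime, json_mtime, tolerance=0.001):
    """True only if the JSON mtime leads the CDB mtime by more than tolerance seconds."""
    return (json_mtime - cdb_mtime) > tolerance

def cdb_needs_rebuild(cdb_path, json_path, tolerance=0.001):
    """Rebuild if the CDB is missing or the JSON is meaningfully newer."""
    if not os.path.exists(cdb_path):
        return True
    return json_is_newer(
        os.path.getmtime(cdb_path), os.path.getmtime(json_path), tolerance
    )

# The two mtimes from the session above.
cdb, js = 1455664545.517631, 1455664545.5176318
print(js > cdb)                # True: a strict comparison says "stale"
print(json_is_newer(cdb, js))  # False: within tolerance, treated as in sync
```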
thcipriani added a revision: Restricted Differential Revision. Feb 19 2016, 12:45 AM

Patch has merged, but reopening until we get this live on the deployment machines.

@thcipriani: Do you want to tag a new version of scap?

We need a process for releasing a test build of the .deb to the beta cluster and then promoting that to production once it's proven stable.

Do we have a pre-production debian package repo?

The patch for scap to compare json and cdb mtimes correctly merged yesterday evening: rOPUP3d3cd0b17b3de1cc34b4018e489650739293deaa

Since then, sync-masters times have fallen as expected:

sync-masters.png (308×586 px, 29 KB)

Anecdotally, searching the SAL for 'synchronized' and comparing results from noon today onward against historical results, it seems that the time for sync-file has dropped quite a bit.

mmodell awarded a token.

credit where credit is due?