
scap-purge-l10n-cache hanging
Closed, Invalid (Public)

Description

I think this is due to T122005, but scap-purge-l10n-cache seems to be hanging. I don't know if it's due to some timeout or something...

I got bored of waiting

scap-purge-l10n-cache

reedy@tin:/srv/mediawiki-staging$ time scap-purge-l10n-cache --version=php-1.27.0-wmf.6 --verbose
^C16:01:41 scap-purge-l10n-cache aborted: 1)                                    

real  6m26.006s
user  2m45.562s
sys   3m31.321s

vs

reedy@tin:/srv/mediawiki-staging$ sync-file README noop
           ___ ____
         ⎛   ⎛ ,----
          \  //==--'
     _//|,.·//==--'    ____________________________
    _OO≣=-  ︶ ᴹw ⎞_§ ______  ___\ ___\ ,\__ \/ __ \
   (∞)_, )  (     |  ______/__  \/ /__ / /_/ / /_/ /
     ¨--¨|| |- (  / ______\____/ \___/ \__^_/  .__/
         ««_/  «_/ jgs/bd808                /_/

15:53:17 Started sync-masters
sync-masters: 100% (ok: 1; fail: 0; left: 0)                                    
15:53:31 Finished sync-masters (duration: 00m 13s)
15:53:31 Started sync-proxies
sync-proxies: 100% (ok: 12; fail: 0; left: 0)                                   
15:53:33 Finished sync-proxies (duration: 00m 02s)
15:53:33 Started sync-apaches
15:53:42 ['/srv/deployment/scap/scap/bin/sync-common', '--no-update-l10n', '--include', 'README', 'mw1010.eqiad.wmnet', 'mw1033.eqiad.wmnet', 'mw1070.eqiad.wmnet', 'mw1097.eqiad.wmnet', 'mw1216.eqiad.wmnet', 'mw1161.eqiad.wmnet', 'mw1201.eqiad.wmnet', 'mw2001.codfw.wmnet', 'mw2041.codfw.wmnet', 'mw2080.codfw.wmnet', 'mw2119.codfw.wmnet', 'mw2187.codfw.wmnet'] on mw1228.eqiad.wmnet returned [70]: 15:53:42 Copying to mw1228.eqiad.wmnet from mw1201.eqiad.wmnet
15:53:42 Started rsync common
15:53:42 Finished rsync common (duration: 00m 00s)
/usr/bin/touch: cannot touch ‘/srv/mediawiki/wmf-config/InitialiseSettings.php’: Read-only file system
15:53:42 Unhandled error:
Traceback (most recent call last):
  File "/srv/deployment/scap/scap/scap/cli.py", line 276, in run
    exit_status = app.main(extra_args)
  File "/srv/deployment/scap/scap/scap/main.py", line 401, in main
    verbose=self.verbose
  File "/srv/deployment/scap/scap/scap/utils.py", line 348, in context_wrapper
    return func(*args, **kwargs)
  File "/srv/deployment/scap/scap/scap/tasks.py", line 342, in sync_common
    '/usr/bin/touch', settings_path))
  File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '('sudo', '-u', 'mwdeploy', '-n', '--', '/usr/bin/touch', '/srv/mediawiki/wmf-config/InitialiseSettings.php')' returned non-zero exit status 1
15:53:42 sync-common failed: <CalledProcessError> Command '('sudo', '-u', 'mwdeploy', '-n', '--', '/usr/bin/touch', '/srv/mediawiki/wmf-config/InitialiseSettings.php')' returned non-zero exit status 1

sync-common: 100% (ok: 467; fail: 1; left: 0)                                   
15:53:50 1 apaches had sync errors
15:53:50 Finished sync-apaches (duration: 00m 16s)
15:53:50 Synchronized README: noop (duration: 00m 32s)
reedy@tin:/srv/mediawiki-staging$

Event Timeline

Reedy raised the priority of this task to Low.
Reedy updated the task description.
Reedy subscribed.
Reedy renamed this task from "scap-purge-l10n-cache" to "scap-purge-l10n-cache hanging". Dec 20 2015, 4:04 PM
Reedy set Security to None.

--verbose doesn't show very much

Yup, so after it was depooled, scap-purge-l10n-cache runs fine, so I presume there's some excessive timeout being set on the dsh call by this module?
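
To illustrate the timeout concern above, here is a minimal sketch of bounding how long a per-host command may run, so that one unresponsive host cannot stall the whole run. It is not scap's actual code: `run_with_timeout`, the status strings, and the stand-in "hung" command are all hypothetical, and it assumes Python 3 (`subprocess.run` with `timeout`), whereas the traceback above shows Python 2.7.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, detail); never block longer than timeout_s.

    Hypothetical helper for illustration only, not part of scap.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout_s,  # the child is killed if this expires
        )
    except subprocess.TimeoutExpired:
        return ("timeout", "gave up after %s seconds" % timeout_s)
    if proc.returncode != 0:
        # Report the failure instead of raising, so other hosts still run.
        return ("fail", proc.stderr.decode(errors="replace").strip())
    return ("ok", "")


if __name__ == "__main__":
    # Stand-in for a remote command that never returns (assumption).
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_timeout(hung, timeout_s=1))
```

Returning a status tuple rather than raising keeps a single bad host from aborting or blocking the rest of the batch; the caller can tally ok, fail, and timeout counts in the style of the `sync-common` summary line.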

hashar claimed this task.
hashar subscribed.

/usr/bin/touch: cannot touch ‘/srv/mediawiki/wmf-config/InitialiseSettings.php’: Read-only file system

/srv is a symlink to /mnt which is /dev/vda2. Looks like some labs issue occurred.
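
A read-only mount like the one reported here can be detected before attempting a write. The following is a rough sketch, assuming a Unix host and Python 3; `is_read_only` and `read_only_paths` are hypothetical names and not something scap provides.

```python
import os


def is_read_only(path):
    """Return True if the filesystem holding path is mounted read-only."""
    # os.statvfs follows symlinks, so a symlinked directory reports the
    # flags of the mount it actually points to.
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def read_only_paths(paths):
    """Return the subset of paths that sit on read-only filesystems."""
    return [p for p in paths if is_read_only(p)]


if __name__ == "__main__":
    # Check the current directory; on the affected host one would pass
    # the deployment directory instead.
    print(read_only_paths([os.getcwd()]))
```

A check of this kind only reports the mount flag; it does not explain why the filesystem was remounted read-only, which still needs investigation on the host itself.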

Why've you resolved it? :P

It's only "fixed" because the mw app server with the r/o filesystem was depooled yesterday

Don't think it's a labs issue either...

Sorry, I thought it was on beta for some reason.

Maybe the task should be rephrased as "mw1228.eqiad.wmnet has read-only /srv/mediawiki-staging"?

T122005 is the ops ticket

This task is about scap-purge-l10n-cache handling it badly and apparently hanging ad infinitum!

Peter601980 renamed this task from "scap-purge-l10n-cache hanging" to "h". Dec 28 2015, 8:09 AM
Peter601980 closed this task as Invalid.
Peter601980 removed hashar as the assignee of this task.
Peter601980 updated the task description.
Peter601980 removed subscribers: hashar, Aklapper, Reedy.
Aklapper renamed this task from "h" to "scap-purge-l10n-cache hanging". Dec 28 2015, 9:33 AM
Aklapper reopened this task as Open.
Aklapper assigned this task to hashar.
Aklapper updated the task description.
Aklapper added subscribers: hashar, Aklapper, Reedy.
demon subscribed.

I killed l10n-purge in rMSCA537f2ab