
Parsoid deploy failed
Closed, Resolved (Public)

Description

{"name": "target.wtp1022.eqiad.wmnet.checks", "created": 1494275638.324786, "args": [], "msecs": 324.7859477996826, "filename": "checks.py", "levelno": 30, "msg": "Check 'depool' failed: Traceback (most recent call last):
  File \"/usr/bin/confctl\", line 11, in <module>
    load_entry_point('conftool==0.4.1', 'console_scripts', 'confctl')()
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 323, in main
    if not cli.run_action(unit):
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 202, in run_action
    return self._run_action()
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 112, in _run_action
    for obj in self.host_list():
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 192, in host_list
    objects = [obj for obj in self.entity.query(self.selectors)]
  File \"/usr/lib/python2.7/dist-packages/conftool/kvobject.py\", line 39, in query
    for labels in cls.backend.driver.all_keys(cls.base_path()):
  File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 104, in all_keys
    for el in self._ls(path, recursive=True) if not el.dir]
  File \"/usr/lib/python2.7/dist-packages/conftool/drivers/__init__.py\", line 81, in _wrapper
    return fn(*args, **kwdargs)
  File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 118, in _ls
    raise ValueError(\"{} is not a directory\".format(key))
ValueError: /conftool/v1/pools is not a directory
Traceback (most recent call last):
  File \"/usr/bin/confctl\", line 11, in <module>
    load_entry_point('conftool==0.4.1', 'console_scripts', 'confctl')()
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 323, in main
    if not cli.run_action(unit):
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 202, in run_action
    return self._run_action()
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 112, in _run_action
    for obj in self.host_list():
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 192, in host_list
    objects = [obj for obj in self.entity.query(self.selectors)]
  File \"/usr/lib/python2.7/dist-packages/conftool/kvobject.py\", line 39, in query
    for labels in cls.backend.driver.all_keys(cls.base_path()):
  File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 104, in all_keys
    for el in self._ls(path, recursive=True) if not el.dir]
  File \"/usr/lib/python2.7/dist-packages/conftool/drivers/__init__.py\", line 81, in _wrapper
    return fn(*args, **kwdargs)
  File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 118, in _ls
    raise ValueError(\"{} is not a directory\".format(key))
ValueError: /conftool/v1/pools is not a directory
Traceback (most recent call last):
  File \"/usr/bin/confctl\", line 11, in <module>
    load_entry_point('conftool==0.4.1', 'console_scripts', 'confctl')()
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 323, in main
    if not cli.run_action(unit):
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 202, in run_action
    return self._run_action()
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 112, in _run_action
    for obj in self.host_list():
  File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 192, in host_list
    objects = [obj for obj in self.entity.query(self.selectors)]
  File \"/usr/lib/python2.7/dist-packages/conftool/kvobject.py\", line 39, in query
    for labels in cls.backend.driver.all_keys(cls.base_path()):
  File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 104, in all_keys
    for el in self._ls(path, recursive=True) if not el.dir]
  File \"/usr/lib/python2.7/dist-packages/conftool/drivers/__init__.py\", line 81, in _wrapper
    return fn(*args, **kwdargs)
  File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 118, in _ls
    raise ValueError(\"{} is not a directory\".format(key))
ValueError: /conftool/v1/pools is not a directory
", "host": "wtp1022.eqiad.wmnet", "lineno": 86, "exc_text": null, "funcName": "handle_failure", "relativeCreated": 4839.227914810181}
{"name": "ssh.job", "relativeCreated": 234924.07989501953, "args": [["/usr/bin/scap", "deploy-local", "-v", "--repo", "parsoid/deploy", "-g", "default", "promote", "--refresh-config"], "wtp1022.eqiad.wmnet", 1, ""], "msecs": 349.0018844604492, "funcName": "run_with_status", "levelno": 30, "exc_text": null, "host": "tin", "created": 1494275638.349002, "lineno": 205, "asctime": "20:33:58", "msg": "%s on %s returned [%d]: %s", "message": "['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'parsoid/deploy', '-g', 'default', 'promote', '--refresh-config'] on wtp1022.eqiad.wmnet returned [1]: ", "filename": "ssh.py"

Event Timeline

mobrovac added a subscriber: Joe.

This seems to be an etcd hiccup and/or failure. Some nodes failed to be depooled, while others failed to be repooled, always with the same stack trace (ValueError: /conftool/v1/pools is not a directory). Any hints/ideas, @Joe?

Interestingly enough, I have just deployed RB and encountered no problems, even though @Arlolra tried force-deploying Parsoid a second time without luck.

This is annoying: it is now preventing me from deploying Graphoid...

Qse24h closed this task as a duplicate of T164723: New git repository: <repo name>.

So, after some digging, I found out that conf2002.codfw.wmnet had, for some reason, auth enabled on etcd (whereas we now just proxy through nginx) and, moreover, only had the root user available. The most probable cause is that I did something wrong when disabling auth in eqiad during the conversion of that cluster.

So when calls were made from deploy-service, they were performed as user conftool, which did not exist on that machine; this resulted in a 401 error that, on a recursive read, produced the error you found.

The difference in outcomes depended on which server was returned first to confctl in the SRV record, so it has nothing to do with the different clusters and everything to do with timing.
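To illustrate the timing argument, here is a minimal Python sketch. It assumes the client talks only to the first host in the SRV answer; the function name `confctl_outcome` and the host `etcd-other.example` are placeholders made up for illustration, not conftool's actual code or a real server.

```python
def confctl_outcome(srv_order, auth_enabled_hosts):
    """Return 'ok' or 'failed' for one simulated confctl run.

    Assumption: only the first host in the SRV answer is contacted.
    """
    if not srv_order:
        raise ValueError("empty SRV answer")
    first = srv_order[0]
    # A host with auth enabled rejects the request (401), so the run fails.
    return "failed" if first in auth_enabled_hosts else "ok"


# conf2002.codfw.wmnet is from this task; the other host is a placeholder.
broken = {"conf2002.codfw.wmnet"}
print(confctl_outcome(["conf2002.codfw.wmnet", "etcd-other.example"], broken))  # failed
print(confctl_outcome(["etcd-other.example", "conf2002.codfw.wmnet"], broken))  # ok
```

The same set of servers gives a different result depending only on the order of the answer, which would explain why one deploy went through while another failed.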

I just confirmed that no server has auth enabled anymore, using: sudo cumin 'R:class = role::configcluster' 'curl -L http://127.0.0.1:2378/v2/auth/enable 2>/dev/null'
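For anyone who wants to run the same check on a single host without cumin, a rough Python 3 sketch follows. It assumes the etcd v2 endpoint /v2/auth/enable answers with a JSON body of the form {"enabled": true} or {"enabled": false}; the function names are hypothetical and the port is taken from the command above.

```python
import json
import urllib.request


def auth_enabled(body):
    """Parse the JSON body of etcd's v2 /v2/auth/enable response."""
    return bool(json.loads(body).get("enabled", False))


def check_host(base_url="http://127.0.0.1:2378", timeout=5.0):
    """Ask one etcd endpoint whether auth is enabled (redirects are followed)."""
    url = base_url + "/v2/auth/enable"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return auth_enabled(resp.read().decode("utf-8"))
```

Calling check_host() on a config cluster host should return False once auth is disabled there.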

mobrovac triaged this task as Medium priority.

Parsoid has been successfully deployed now, so I'm declaring this resolved. Thanks @Joe for looking into it and fixing it!