{"name": "target.wtp1022.eqiad.wmnet.checks", "created": 1494275638.324786, "args": [], "msecs": 324.7859477996826, "filename": "checks.py", "levelno": 30, "msg": "Check 'depool' failed: Traceback (most recent call last): File \"/usr/bin/confctl\", line 11, in <module> load_entry_point('conftool==0.4.1', 'console_scripts', 'confctl')() File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 323, in main if not cli.run_action(unit): File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 202, in run_action return self._run_action() File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 112, in _run_action for obj in self.host_list(): File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 192, in host_list objects = [obj for obj in self.entity.query(self.selectors)] File \"/usr/lib/python2.7/dist-packages/conftool/kvobject.py\", line 39, in query for labels in cls.backend.driver.all_keys(cls.base_path()): File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 104, in all_keys for el in self._ls(path, recursive=True) if not el.dir] File \"/usr/lib/python2.7/dist-packages/conftool/drivers/__init__.py\", line 81, in _wrapper return fn(*args, **kwdargs) File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 118, in _ls raise ValueError(\"{} is not a directory\".format(key)) ValueError: /conftool/v1/pools is not a directory Traceback (most recent call last): File \"/usr/bin/confctl\", line 11, in <module> load_entry_point('conftool==0.4.1', 'console_scripts', 'confctl')() File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 323, in main if not cli.run_action(unit): File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 202, in run_action return self._run_action() File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 112, in _run_action for obj in self.host_list(): File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 192, in host_list objects = [obj for obj in self.entity.query(self.selectors)] File \"/usr/lib/python2.7/dist-packages/conftool/kvobject.py\", line 39, in query for labels in cls.backend.driver.all_keys(cls.base_path()): File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 104, in all_keys for el in self._ls(path, recursive=True) if not el.dir] File \"/usr/lib/python2.7/dist-packages/conftool/drivers/__init__.py\", line 81, in _wrapper return fn(*args, **kwdargs) File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 118, in _ls raise ValueError(\"{} is not a directory\".format(key)) ValueError: /conftool/v1/pools is not a directory Traceback (most recent call last): File \"/usr/bin/confctl\", line 11, in <module> load_entry_point('conftool==0.4.1', 'console_scripts', 'confctl')() File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 323, in main if not cli.run_action(unit): File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 202, in run_action return self._run_action() File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 112, in _run_action for obj in self.host_list(): File \"/usr/lib/python2.7/dist-packages/conftool/cli/tool.py\", line 192, in host_list objects = [obj for obj in self.entity.query(self.selectors)] File \"/usr/lib/python2.7/dist-packages/conftool/kvobject.py\", line 39, in query for labels in cls.backend.driver.all_keys(cls.base_path()): File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 104, in all_keys for el in self._ls(path, recursive=True) if not el.dir] File \"/usr/lib/python2.7/dist-packages/conftool/drivers/__init__.py\", line 81, in _wrapper return fn(*args, **kwdargs) File \"/usr/lib/python2.7/dist-packages/conftool/drivers/etcd.py\", line 118, in _ls raise ValueError(\"{} is not a directory\".format(key)) ValueError: /conftool/v1/pools is not a directory ", "host": "wtp1022.eqiad.wmnet", "lineno": 86, "exc_text": null, "funcName": "handle_failure", "relativeCreated": 4839.227914810181}
{"name": "ssh.job", "relativeCreated": 234924.07989501953, "args": [["/usr/bin/scap", "deploy-local", "-v", "--repo", "parsoid/deploy", "-g", "default", "promote", "--refresh-config"], "wtp1022.eqiad.wmnet", 1, ""], "msecs": 349.0018844604492, "funcName": "run_with_status", "levelno": 30, "exc_text": null, "host": "tin", "created": 1494275638.349002, "lineno": 205, "asctime": "20:33:58", "msg": "%s on %s returned [%d]: %s", "message": "['/usr/bin/scap', 'deploy-local', '-v', '--repo', 'parsoid/deploy', '-g', 'default', 'promote', '--refresh-config'] on wtp1022.eqiad.wmnet returned [1]: ", "filename": "ssh.py"
This seems to be an etcd hiccup and/or failure. Some nodes failed to be depooled, while others failed to be repooled, always with the same stack trace (ValueError: /conftool/v1/pools is not a directory). Any hints/ideas, @Joe?
Interestingly enough, I have just deployed RB (RESTBase) without any problems, whereas @Arlolra tried force-deploying Parsoid a second time with no luck.
So, after some digging, I found out that conf2002.codfw.wmnet had, for some reason, auth enabled on etcd (whereas we now just proxy requests through nginx), and moreover only had the root user available. The most probable cause is me doing something wrong when disabling auth in eqiad during the conversion of that cluster.
So when deploy-service made its calls, they were performed as the conftool user, which did not exist on that machine. This resulted in a 401 error which, on a recursive read, surfaced as the error you found.
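A hypothetical, simplified sketch of that failure mode (not conftool's actual driver code): if the code that lists keys only inspects the shape of the etcd v2 response, the error body returned with a 401 contains no directory node, so it gets reported as "not a directory". The names ls_dir, ls_dir_naive and EtcdAuthError, and the error body used below, are invented for illustration; only the key and the error message come from the traceback.

```python
import json


class EtcdAuthError(Exception):
    """Hypothetical error for a request rejected with HTTP 401."""


def ls_dir_naive(key, body):
    """Structure-only check (hypothetical): any body without a
    directory node is rejected, whatever the real cause was."""
    node = json.loads(body).get("node", {})
    if not node.get("dir", False):
        # A 401 error body lands here too, hiding the auth failure.
        raise ValueError("{} is not a directory".format(key))
    return node.get("nodes", [])


def ls_dir(key, status, body):
    """Same check, but look at the HTTP status before the body."""
    if status == 401:
        raise EtcdAuthError("{}: authentication failed (401)".format(key))
    return ls_dir_naive(key, body)


if __name__ == "__main__":
    # Hypothetical error body; the real one may differ.
    denied = '{"message": "hypothetical 401 error body"}'
    try:
        ls_dir_naive("/conftool/v1/pools", denied)
    except ValueError as exc:
        print(exc)  # /conftool/v1/pools is not a directory
    try:
        ls_dir("/conftool/v1/pools", 401, denied)
    except EtcdAuthError as exc:
        print(exc)
```

Checking the status first is what turns a misleading structural error into an explicit authentication error.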
The difference in outcomes depended on which server in the SRV record was handed to confctl first, so the difference between clusters had nothing to do with the clusters themselves and everything to do with timing.
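To illustrate why the outcome looked random, here is a minimal sketch of the client-side SRV ordering described in RFC 2782 (lowest priority first, then a weighted random pick among records of equal priority). The comment above does not say whether the varying order came from the DNS answer or from client-side selection; this sketch models the RFC 2782 selection as one possible illustration, not as confctl's actual behaviour. The records, weights and port are hypothetical; only conf2002.codfw.wmnet comes from this task.

```python
import random
from collections import Counter, namedtuple

# Field names follow the SRV record fields in RFC 2782.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def order_srv(records, rng=random):
    """Order SRV records per RFC 2782: lowest priority first, then a
    weighted random selection among records of equal priority."""
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # Zero-weight records go first, as the RFC prescribes.
        group = sorted((r for r in records if r.priority == prio),
                       key=lambda rec: rec.weight != 0)
        while group:
            # Uniform pick in [0, total weight], inclusive.
            pick = rng.randint(0, sum(r.weight for r in group))
            running = 0
            for r in group:
                running += r.weight
                if running >= pick:
                    break
            ordered.append(r)
            group.remove(r)
    return ordered


if __name__ == "__main__":
    # conf2002.codfw.wmnet is the host from this task; the other hosts,
    # the port and the weights are hypothetical.
    records = [
        SrvRecord(0, 10, 2379, "conf2001.codfw.wmnet"),
        SrvRecord(0, 10, 2379, "conf2002.codfw.wmnet"),
        SrvRecord(0, 10, 2379, "conf2003.codfw.wmnet"),
    ]
    rng = random.Random(42)
    firsts = Counter(order_srv(records, rng)[0].target for _ in range(1000))
    # Different runs start from different servers, so a single broken
    # server makes only some runs fail.
    print(firsts)
```

With one misconfigured member, only the runs that happen to reach it first fail, which matches the intermittent depool/repool failures reported above.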
I just confirmed that no server has auth enabled anymore, using: sudo cumin 'R:class = role::configcluster' 'curl -L http://127.0.0.1:2378/v2/auth/enable 2>/dev/null'
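The same check can be scripted. This is a minimal Python 3 sketch that assumes the etcd v2 auth endpoint answers GET /v2/auth/enable with a JSON body such as {"enabled": false}. The helper names and the sample data are hypothetical, and the sketch complements rather than replaces running the check on every host with cumin.

```python
import json
import urllib.request

# Port and path taken from the cumin check above.
AUTH_URL = "http://127.0.0.1:2378/v2/auth/enable"


def auth_enabled(body):
    """Parse a /v2/auth/enable response body, assumed to look like
    {"enabled": false}."""
    return bool(json.loads(body).get("enabled", False))


def check_local(url=AUTH_URL, timeout=5):
    """Query the local etcd member (requires network access to it)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return auth_enabled(resp.read().decode("utf-8"))


def hosts_with_auth(bodies):
    """Given {host: response body}, return hosts with auth still on."""
    return sorted(host for host, body in bodies.items() if auth_enabled(body))


if __name__ == "__main__":
    # Hypothetical collected output (e.g. gathered per host via cumin);
    # conf2001 and both bodies are made up for illustration.
    sample = {
        "conf2001.codfw.wmnet": '{"enabled": false}',
        "conf2002.codfw.wmnet": '{"enabled": true}',
    }
    print(hosts_with_auth(sample))  # ['conf2002.codfw.wmnet']
```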
Parsoid has now been deployed successfully, so I'm declaring this resolved. Thanks @Joe for looking into it and fixing it!