
Homer: commit timeout on MX104 and SRXs
Closed, Resolved, Public

Description

The commit times out and produces the stack trace below.

In addition to fixing the issue, we should make the error message more user-friendly. Note that Homer continues to the next host as expected.

Type "yes" to commit, "no" to abort.
> yes
INFO:homer.transports.junos:Committing the configuration on cr1-eqsin.wikimedia.org
ERROR:homer:Failed to commit on cr1-eqsin.wikimedia.org
Traceback (most recent call last):
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/junos_eznc-2.2.1-py3.7.egg/jnpr/junos/device.py", line 777, in execute
    ignore_warning=ignore_warning)
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/junos_eznc-2.2.1-py3.7.egg/jnpr/junos/decorators.py", line 116, in wrapper
    rsp = function(self, *args, **kwargs)
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/junos_eznc-2.2.1-py3.7.egg/jnpr/junos/device.py", line 1339, in _rpc_reply
    return self._conn.rpc(rpc_cmd_e)._NCElement__doc
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/ncclient-0.6.6-py3.7.egg/ncclient/manager.py", line 236, in execute
    huge_tree=self._huge_tree).request(*args, **kwds)
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/ncclient-0.6.6-py3.7.egg/ncclient/operations/third_party/juniper/rpc.py", line 49, in request
    return self._request(rpc)
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/ncclient-0.6.6-py3.7.egg/ncclient/operations/rpc.py", line 355, in _request
    raise TimeoutExpiredError('ncclient timed out while waiting for an rpc reply.')
ncclient.operations.errors.TimeoutExpiredError: ncclient timed out while waiting for an rpc reply.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/homer-0.1.2.dev2+g9bd9c7f-py3.7.egg/homer/transports/junos.py", line 83, in commit
    self._device.cu.commit(confirm=2, comment=message)
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/junos_eznc-2.2.1-py3.7.egg/jnpr/junos/utils/config.py", line 149, in commit
    rsp = self.rpc.commit_configuration(*rpc_varg, **rpc_args)
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/junos_eznc-2.2.1-py3.7.egg/jnpr/junos/rpcmeta.py", line 345, in _exec_rpc
    return self._junos.execute(rpc, **dec_args)
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/junos_eznc-2.2.1-py3.7.egg/jnpr/junos/decorators.py", line 76, in wrapper
    return function(*args, **kwargs)
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/junos_eznc-2.2.1-py3.7.egg/jnpr/junos/decorators.py", line 31, in wrapper
    return function(*args, **kwargs)
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/junos_eznc-2.2.1-py3.7.egg/jnpr/junos/device.py", line 781, in execute
    raise EzErrors.RpcTimeoutError(self, rpc_cmd_e.tag, self.timeout)
jnpr.junos.exception.RpcTimeoutError: RpcTimeoutError(host: cr1-eqsin.wikimedia.org, cmd: commit-configuration, timeout: 30)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/homer-0.1.2.dev2+g9bd9c7f-py3.7.egg/homer/__init__.py", line 198, in _device_commit
    connection.commit(device_config, message, callback, self._ignore_warning)
  File "/home/xionox/Documents/Projects/Wikimedia-repos/homer/env/lib/python3.7/site-packages/homer-0.1.2.dev2+g9bd9c7f-py3.7.egg/homer/transports/junos.py", line 89, in commit
    raise HomerError('Failed to commit configuration on {fqdn}'.format(fqdn=self._fqdn)) from e
homer.exceptions.HomerError: Failed to commit configuration on cr1-eqsin.wikimedia.org
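A minimal sketch of the error handling requested in the description: catch timeouts separately from other commit failures, log one readable line instead of the three chained tracebacks above, and carry on with the next host. `CommitTimeout`, `FakeDevice` and `commit_all` are hypothetical stand-ins, not Homer's code; in PyEZ the real exception raised here is `jnpr.junos.exception.RpcTimeoutError`.

```python
import logging

logger = logging.getLogger("homer-sketch")


class CommitTimeout(Exception):
    """Hypothetical stand-in for jnpr.junos.exception.RpcTimeoutError."""

    def __init__(self, host: str, timeout: int) -> None:
        super().__init__(f"timeout after {timeout}s on {host}")
        self.host = host
        self.timeout = timeout


class FakeDevice:
    """Hypothetical device whose commit either succeeds or times out."""

    def __init__(self, fqdn: str, times_out: bool = False) -> None:
        self.fqdn = fqdn
        self.times_out = times_out

    def commit(self, message: str) -> None:
        if self.times_out:
            raise CommitTimeout(self.fqdn, 30)


def commit_all(devices, message):
    """Commit on every device and return the FQDNs that failed, in order."""
    failed = []
    for device in devices:
        try:
            device.commit(message)
        except CommitTimeout as e:
            # Expected failure mode: one concise line, no traceback.
            logger.error("Commit on %s timed out after %ds, moving to the next host",
                         e.host, e.timeout)
            failed.append(device.fqdn)
        except Exception:
            # Unexpected errors keep the full traceback for debugging.
            logger.exception("Failed to commit on %s", device.fqdn)
            failed.append(device.fqdn)
    return failed
```

Keeping the generic `except Exception` branch means only the well-understood timeout case loses its traceback.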

Event Timeline

ayounsi triaged this task as Medium priority. Feb 5 2020, 2:39 PM
ayounsi created this task.
Volans moved this task from Backlog to In Progress on the SRE-tools board.

Change 570510 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/homer@master] junos: handle timeouts separately

https://gerrit.wikimedia.org/r/570510

Change 570510 merged by jenkins-bot:
[operations/software/homer@master] junos: handle timeouts separately

https://gerrit.wikimedia.org/r/570510

Change 584689 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/homer@master] junos: retry when a timeout occurs during commits

https://gerrit.wikimedia.org/r/584689

Change 584689 merged by jenkins-bot:
[operations/software/homer@master] junos: retry when a timeout occurs during commits

https://gerrit.wikimedia.org/r/584689

Volans closed this task as Resolved. Mar 31 2020, 10:31 AM

With the automatic retry added to Homer, this problem has been worked around: it is now possible to commit to those devices via Homer.
The first attempt commits but fails to commit_check; Homer then automatically retries the commit_check operation, which succeeds.
Resolving, as deploying configuration to those devices via Homer is now possible.
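The workaround above can be sketched as a retry loop around the whole per-host commit, assuming (as the traceback suggests) that the commit RPC times out on the client while the device still applies the change. The retry then finds no diff left to commit and only issues `commit_check`, which in Junos confirms a pending `commit confirmed`. `RpcTimeout`, `SlowDevice`, `commit_once`, `retry_on_timeout` and the attempt count are all hypothetical, not Homer's actual implementation.

```python
import time


class RpcTimeout(Exception):
    """Hypothetical stand-in for jnpr.junos.exception.RpcTimeoutError."""


class SlowDevice:
    """Hypothetical device: the first commit RPC times out client-side,
    but the device applies the change anyway."""

    def __init__(self):
        self.pending = True      # candidate config differs from the running one
        self.timed_out = False
        self.calls = []

    def has_diff(self):
        return self.pending

    def commit(self):
        self.calls.append("commit")
        self.pending = False     # the change lands even though the reply is lost
        if not self.timed_out:
            self.timed_out = True
            raise RpcTimeout("timed out while waiting for an rpc reply")

    def commit_check(self):
        self.calls.append("commit_check")  # confirms the commit confirmed


def commit_once(dev):
    """One commit attempt: commit only if there is a diff, then confirm."""
    if dev.has_diff():
        dev.commit()
    dev.commit_check()


def retry_on_timeout(func, *args, attempts=3, delay=0.0):
    """Call func(*args), retrying only on RpcTimeout; re-raise on the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except RpcTimeout:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

With `SlowDevice()`, the first attempt records `commit` and times out, and the retry records only `commit_check`. Retrying the whole routine, rather than a single RPC, lets the diff check skip work the timed-out attempt already did.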

Change 585510 had a related patch set uploaded (by Volans; owner: Volans):
[operations/software/homer@master] commit: do not commit_check on initial empty diff

https://gerrit.wikimedia.org/r/585510

Change 585510 merged by jenkins-bot:
[operations/software/homer@master] commit: do not commit_check on initial empty diff

https://gerrit.wikimedia.org/r/585510
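A minimal sketch of the guard named in the last change, assuming the commit routine knows which attempt it is on. An empty diff on the first attempt means there is nothing to commit and nothing pending, so no `commit_check` is sent. An empty diff on a retry is assumed to mean an earlier, timed-out attempt already applied a change that still needs confirming. `RecordingDevice` and `commit_attempt` are hypothetical names, not Homer's code.

```python
class RecordingDevice:
    """Hypothetical device that records the operations issued against it."""

    def __init__(self, has_diff):
        self._has_diff = has_diff
        self.calls = []

    def has_diff(self):
        return self._has_diff

    def commit(self):
        self.calls.append("commit")

    def commit_check(self):
        self.calls.append("commit_check")


def commit_attempt(dev, attempt):
    """Run one commit attempt; return True if anything was sent to the device."""
    if not dev.has_diff():
        if attempt == 1:
            # Initial empty diff: nothing to commit or confirm, skip commit_check.
            return False
        # Empty diff on a retry: assumed to be an unconfirmed earlier commit.
        dev.commit_check()
        return True
    dev.commit()
    dev.commit_check()
    return True
```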