It would be useful to be able to run commands in parallel on network devices.
At least for diffs, show (in T250413).
And maybe later down the road for commits.
I'm wondering if we could look at prioritizing this work. With new network devices arriving in codfw, we're reaching the limit of configuring network devices serially, one after the other.
FYI, this limitation is becoming more and more problematic for deploying a change to the whole infra.
I had a quick look to understand our options in terms of parallelization. Keeping in mind the usual 3 possible approaches: multi-process, multi-thread, async.
The ncclient library seems to support "async" RPC calls in a very simple and non-pythonic way, basically just returning immediately and letting the client implement a polling logic to check when the answer is there.
But the py-junos-eznc library doesn't seem to support it. In their 1.0 release they clearly stated:
* Command execution is synchronous and blocking. The underlying NETCONF transport library is the ncclient module. If your application requires asynchronous or nonblocking execution logic, you should investigate other libraries to wrap around the PyEZ framework such as Twisted or Python Threads.
They are now at v2.7.2, but there has been no mention of async since then, so it's unlikely to be supported. Even if there was a way, we couldn't just inject the async_mode parameter into the underlying ncclient, because it would need to be handled differently by the Junos library.
It's probably worth another look, just to be sure.
Of course we could make some async stuff in homer to wrap the juniper calls and "make them look like they are async".
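A minimal sketch of what "making them look async" could mean: run the blocking call in a thread pool and await it from asyncio. Note that blocking_device_call and run_on_all are hypothetical names, not homer or PyEZ APIs, and the sleep only simulates network latency. It also assumes the underlying libraries tolerate being called from several threads, which still needs to be checked.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_device_call(hostname: str) -> str:
    """Hypothetical stand-in for a synchronous, blocking device call."""
    time.sleep(0.1)  # simulates network latency
    return f"diff for {hostname}"


async def run_on_all(hostnames, max_workers=4):
    """Run the blocking call for every host concurrently via a thread pool."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # run_in_executor wraps each blocking call in an awaitable future.
        futures = [
            loop.run_in_executor(pool, blocking_device_call, host)
            for host in hostnames
        ]
        # gather() returns results in the same order as the futures.
        results = await asyncio.gather(*futures)
    return dict(zip(hostnames, results))


if __name__ == "__main__":
    print(asyncio.run(run_on_all(["cr1-codfw", "cr2-codfw", "cr1-ulsfo"])))
```

This is still threads under the hood, so it shares the thread-safety question of the multi-thread approach; the only gain is an async-looking interface.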
We should check whether the various underlying libraries are advertised as thread-safe, and see how feasible it would be. If chosen, we could use the higher-level concurrent.futures framework [1] to ease the work.
At first I would discard this option for the overhead of the communication between the processes and the multiple python
[1] https://docs.python.org/3/library/concurrent.futures.html
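For illustration, a rough sketch of the multi-thread option with concurrent.futures, collecting a diff per device and keeping failures separate so one unreachable device doesn't abort the run. fetch_diff and diff_all are hypothetical names and the diff text is made up; a real version would open one connection per worker and depends on the thread-safety check mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_diff(hostname: str) -> str:
    """Hypothetical per-device operation; a real one would open its own connection."""
    if hostname.startswith("bad"):
        raise ConnectionError(f"cannot reach {hostname}")
    return f"+ set system login user foo  # on {hostname}"


def diff_all(hostnames, max_workers=8):
    """Return (diffs, errors), two dicts keyed by hostname."""
    diffs, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_host = {pool.submit(fetch_diff, h): h for h in hostnames}
        # as_completed yields futures as they finish, in no particular order.
        for future in as_completed(future_to_host):
            host = future_to_host[future]
            try:
                diffs[host] = future.result()
            except Exception as exc:  # keep going if one device fails
                errors[host] = str(exc)
    return diffs, errors


if __name__ == "__main__":
    ok, failed = diff_all(["cr1-codfw", "bad-device", "cr2-codfw"])
    print(sorted(ok), failed)
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor keeps the same interface, which is one reason to prefer this module if we want to compare the two approaches.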
One observation is that the config generation could be parallelized separately from the router transport.
I.e. once the globbing on hostnames is done, spawn separate threads to build all the conf files, then push them out to the devices as we do currently.
Obviously not the end-game state, but it would probably improve things significantly.
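A small sketch of that two-phase idea, with the build in parallel and the push unchanged. build_config, push_config and run are hypothetical placeholders, not homer functions, and the config line is invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def build_config(hostname: str) -> str:
    """Hypothetical config builder (data lookup and templating in a real tool)."""
    return f"set system host-name {hostname}"


def push_config(hostname: str, config: str, log: list) -> None:
    """Hypothetical serial push; here it only records the call order."""
    log.append((hostname, config))


def run(hostnames):
    # Phase 1: build all configs in parallel; map() preserves input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        configs = dict(zip(hostnames, pool.map(build_config, hostnames)))
    # Phase 2: push one device at a time, as is done currently.
    log = []
    for host in hostnames:
        push_config(host, configs[host], log)
    return log


if __name__ == "__main__":
    for host, config in run(["cr1-ulsfo", "cr2-ulsfo"]):
        print(host, "->", config)
```

Threads only help here if the generation spends its time waiting on I/O (e.g. fetching data from remote sources); if it is mostly CPU-bound templating, the GIL would limit the gain and processes would be the better fit.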
Sorry for not mentioning it, the parallelization of the configuration generation was implicit to me, and it's also easier. Ideally we should parallelize both, though, so we should find a common parallelization approach for both if possible, to avoid having N different ways of parallelizing things in the same tool ;)
I had a chat with Riccardo about a possible first change (a sort of version-0 of the final solution) that could help with one of the use cases mentioned, would (hopefully) be simple to implement, and would give some relief for tedious tasks. IIUC, at the moment if an admin needs to add a config to multiple devices, if not all (like adding a new user/ssh-key), then they need to input "Y" 90-ish times, once for each device, even if the diff is the same. One idea could be to ask Y the first time and "cache" the (diff/Yes) combination, to avoid re-asking if the diff is the same on the next device. Ideally we could collect all the diffs at the beginning, ask the admin to input Y only a few times in a row, and then let them forget about it until homer finishes. That is more complicated, and something we could do for version 1 later on.
Possible issue:
To avoid this consistency problem we could do the diff every time for all devices, and apply only the "cached" config if the diff doesn't report anything strange.
Lemme know if you like the version-0 idea, if so I can start working on a patch :)
It's necessary to do the diff on all target devices anyway, so that behavior is fine.
For example, if we run homer "*ulsfo*" commit "foo" to change an SSH key,
it will iterate over all the devices: the first device will display a diff, the SRE types "yes" to commit, and that diff is saved. If the diff for any other device is similar, it will be committed automatically. If it's different, it will ask whether (yes/no) it should be committed, and cache it for any other possible similar case.
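The flow above could be sketched roughly like this. It assumes "similar" means exact text equality of the diff, and that only "yes" answers are cached (a rejected diff is asked again on the next device); both are assumptions, not decisions taken in this task. commit_with_cache and the ask callback are hypothetical names, and a real version would run the actual commit where the hostname is appended.

```python
def commit_with_cache(device_diffs, ask):
    """Commit diffs, prompting only once per distinct approved diff.

    device_diffs: dict of hostname -> diff text, computed for every device.
    ask: callable(hostname, diff) -> bool, standing in for the yes/no prompt.
    Returns the list of hostnames that would be committed.
    """
    approved = set()  # diff texts the operator already said "yes" to
    committed = []
    for host, diff in device_diffs.items():
        if not diff:
            continue  # empty diff, nothing to commit
        if diff in approved or ask(host, diff):
            approved.add(diff)
            committed.append(host)  # a real tool would commit here
    return committed


if __name__ == "__main__":
    diffs = {
        "cr1-ulsfo": "+ ssh-rsa AAAA... foo",
        "cr2-ulsfo": "+ ssh-rsa AAAA... foo",
        "cr3-ulsfo": "+ ssh-rsa AAAA... foo\n- some local change",
    }

    def prompt(host, diff):
        return input(f"{host}:\n{diff}\nCommit? [yes/no] ").strip() == "yes"

    print(commit_with_cache(diffs, prompt))
```

With the sample data, the operator is asked for cr1-ulsfo and cr3-ulsfo only; cr2-ulsfo reuses the first answer if it was "yes".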
In the implementation, I could see 3 options:
We can also decide that batch means to silently skip any device that has a different diff, to not risk blocking the run in the middle of it if a device has local changes.
I like the last proposal but I was thinking that there is an additional case:
While 1 and 2 are clearly yes and no, I'm not sure about the naming of the remaining two, batch-skip and batch-ask don't seem good choices ;)
Yeah I think it's what I tried to mean with
We can also decide that batch means to silently skip any device that has a different diff, to not risk blocking the run in the middle of it if a device has local changes.
Basically decide if the batch behavior is (3) or (4) and then stick to it. 4 options seem a bit too many.
I tend to prefer (3), and would be ok to not support (4), especially as in a good state there should be no local changes.
For my part I like “3” as set out by Volans above.
@ayounsi is your proposal that “batch” would be a valid answer (in addition to yes/no) when presented with a diff, indicating “yes, and yes to any others the same”? Interesting idea, could work well.