
cumin could use randomization/splay options
Closed, Declined, Public

Description

I was just thinking that it would be handy to have a flag asking cumin to change the order in which it executes on the selected hosts.

Use case: I want to execute command "foo", which is only slightly disruptive, across all ~100 cache nodes. If cumin steps through them in linear node order, it works through each site's nodes back to back, which is not ideal for minimizing the disruption within each site and/or cluster.

The first and probably easiest option would be a "random" one: just shuffle the execution order randomly.
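A minimal sketch of the "random" option in Python (the language cumin is written in), assuming the selected hosts are available as a plain list of names. `shuffled_order` is a hypothetical helper, not part of cumin:

```python
import random


def shuffled_order(hosts, seed=None):
    """Return a copy of `hosts` in random order; the input list is left untouched.

    A fixed `seed` makes the order reproducible, e.g. to re-run the same plan.
    """
    order = list(hosts)
    random.Random(seed).shuffle(order)
    return order


hosts = ["cp1045", "cp1046", "cp2007", "cp2008", "cp3003", "cp4014"]
print(shuffled_order(hosts, seed=1))
```

Using a private `random.Random` instance rather than the module-level functions keeps the seeding local to this call instead of affecting any other code that uses `random`.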

Another interesting tactic would be a mode that tries to be smart about interleaving sites based on our standard NNNN pattern: split the list into sublists by the first digit of the NNNN part (the site), shuffle each sublist randomly, then zip them back together, so the list ends up like cp1045,cp2007,cp3003,cp4014,cp1071,...
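The interleaving mode can be sketched as below. This assumes host names follow a letters-then-NNNN pattern with the site encoded in the first digit; `interleave_by_site` and the regular expression are illustrative, not existing cumin code. Names that don't match the pattern end up in a bucket of their own.

```python
import random
import re
from collections import defaultdict
from itertools import chain, zip_longest

# Assumption: names look like <letters><NNNN>, e.g. cp1045, where the first
# digit of NNNN identifies the site.
SITE_RE = re.compile(r"^[a-z]+(\d)\d{3}$")


def interleave_by_site(hosts, seed=None):
    """Group hosts by site digit, shuffle each group, then take one host per site in turn."""
    rng = random.Random(seed)
    by_site = defaultdict(list)
    for host in hosts:
        match = SITE_RE.match(host)
        by_site[match.group(1) if match else ""].append(host)  # "" = unrecognized names
    groups = []
    for site in sorted(by_site):
        group = by_site[site]
        rng.shuffle(group)
        groups.append(group)
    # zip_longest pads the shorter groups with None once they run out; drop the padding.
    rounds = zip_longest(*groups)
    return [host for host in chain.from_iterable(rounds) if host is not None]


print(interleave_by_site(["cp1045", "cp1071", "cp2007", "cp3003", "cp4014"]))
```

Using `itertools.zip_longest` rather than `zip` matters here: sites have different numbers of hosts, and plain `zip` would stop at the smallest site and silently drop the remaining hosts.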

Event Timeline

@BBlack Thanks for opening this feature request. Right now the execution order is entirely implementation-dependent, and I realized that this is neither made clear nor explained in the docs/README.

The TL;DR is that right now it depends on whether batches (-b) are used or not.

  • With batches: the order is somewhat randomized because the hosts are accessed through a Python dictionary (see the Python 2 implementation note); see the example at the bottom.
  • Without batches: the selection is passed as-is to ClusterShell and the execution is mostly in order. The "mostly" is because ClusterShell in turn applies a fanout limit (the maximum number of child processes forked at any given time), which is currently left at its default value of 64, and when the number of hosts exceeds it the order may change slightly. Over ~100 hosts I've seen the first two hosts in the order actually being picked up at the end, while all the others were executed in order.

I'm leaning towards forcing randomness in all cases and adding an --ordered (or similar) option to force in-order execution (although I need to check how to do that in the case without batches).
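One way the proposed behaviour could look, sketched with `argparse`: random order unless `--ordered` is passed. The `--ordered` flag is only the proposal from this comment, and the parser below is a hypothetical stand-in rather than cumin's real CLI:

```python
import argparse
import random


def build_parser():
    # Hypothetical CLI: only the proposed --ordered flag is modelled here.
    parser = argparse.ArgumentParser(prog="cumin-order-sketch")
    parser.add_argument("hosts", nargs="+", help="host names to run on")
    parser.add_argument(
        "--ordered",
        action="store_true",
        help="run in sorted order instead of the default random order",
    )
    return parser


def execution_order(hosts, ordered=False, rng=None):
    """Random order by default; sorted order when `ordered` is set."""
    if ordered:
        return sorted(hosts)
    order = list(hosts)
    (rng or random.Random()).shuffle(order)
    return order
```

For example, parsing `["--ordered", "cp2007", "cp1045"]` and passing the result to `execution_order` yields `["cp1045", "cp2007"]`, while omitting `--ordered` yields the same hosts in a random order.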

Regarding the NNNN-specific implementation, given the generic nature of Cumin I'd rather not add it to the tool itself, but we could consider allowing custom filters to be specified, where we could provide a custom implementation of the sorting and shuffling algorithms.
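The "custom filters" idea could take the form of a registry of ordering strategies: generic ones (sorted, random) built in, site-specific ones plugged in by the caller. This is a hypothetical design sketch; neither the registry nor the strategy names exist in cumin, and the `site_descending` key assumes a two-letter prefix such as `cp`:

```python
import random
from typing import Callable, Dict, List

# An ordering strategy takes the selected hosts and returns them in execution order.
OrderStrategy = Callable[[List[str]], List[str]]

# Hypothetical registry: generic strategies would ship with the tool itself...
STRATEGIES: Dict[str, OrderStrategy] = {
    "sorted": sorted,
    "random": lambda hosts: random.sample(hosts, len(hosts)),
}


def register_strategy(name: str, func: OrderStrategy) -> None:
    """Let callers plug in site-specific ordering without changing the tool."""
    STRATEGIES[name] = func


def apply_strategy(name: str, hosts: List[str]) -> List[str]:
    return STRATEGIES[name](list(hosts))


# ...while a WMF-specific one (here: highest site digit first, i.e. cp5, cp4, ...,
# cp1) would be registered from outside. sorted() is stable even with
# reverse=True, so hosts within a site keep their input order.
register_strategy(
    "site_descending",
    lambda hosts: sorted(hosts, key=lambda h: h[2], reverse=True),
)
```

Keeping the site-specific logic in a callable supplied by the caller is what lets the tool itself stay generic.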

Thoughts?

To give a practical example, here is the order of execution I got on all cp* hosts with -b 1:

cp3047
cp4019
cp4007
cp4002
cp2015
cp2016
cp2023
cp4017
cp1045
cp1064
cp1051
cp2019
cp3048
cp2010
cp1066
cp2024
cp4014
cp2006
cp2005
cp1063
cp4012
cp3008
cp2004
cp3049
cp1052
cp2022
cp4009
cp4020
cp2003
cp4021
cp4001
cp3035
cp4003
cp2021
cp2013
cp4004
cp4005
cp3040
cp3031
cp1068
cp3004
cp1062
cp4010
cp4011
cp3043
cp2026
cp1008
cp3039
cp1048
cp2018
cp1067
cp3006
cp2014
cp1047
cp1054
cp4006
cp2020
cp1060
cp1061
cp3005
cp1074
cp1065
cp1053
cp2025
cp3003
cp2012
cp2017
cp3033
cp1046
cp2007
cp4008
cp1050
cp3034
cp3045
cp4016
cp1073
cp1072
cp1049
cp4013
cp3010
cp2001
cp3041
cp3036
cp2011
cp3037
cp1099
cp2009
cp3038
cp3032
cp3046
cp3007
cp4015
cp2002
cp4018
cp3042
cp1055
cp1059
cp1058
cp2008
cp3044
cp1071
cp3030

Honorable mention in the ugly-but-useful category: here's a command to get the list of cp-text nodes in decreasing DC order (cp5, cp4, cp3, cp2, cp1):

paste -d '\n' \
  <(nodeset -e -S '\n' $(sudo cumin 'A:cp-text_eqsin' 2>&1 | grep ^cp)) \
  <(nodeset -e -S '\n' $(sudo cumin 'A:cp-text_ulsfo' 2>&1 | grep ^cp)) \
  <(nodeset -e -S '\n' $(sudo cumin 'A:cp-text_esams' 2>&1 | grep ^cp)) \
  <(nodeset -e -S '\n' $(sudo cumin 'A:cp-text_codfw' 2>&1 | grep ^cp)) \
  <(nodeset -e -S '\n' $(sudo cumin 'A:cp-text_eqiad' 2>&1 | grep ^cp)) \
  | grep ^cp > /tmp/cache-text.nodes

Repeat with s/text/upload/g to get the list of upload nodes.

Then run two while loops in parallel (with enough sleep between iterations, of course!) to perform disruptive work, such as reboots, on the text and upload nodes concurrently.
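The two parallel loops could also be sketched in Python with one thread per node list. This assumes the node files produced by the command above (`/tmp/cache-text.nodes` and, after the substitution, `/tmp/cache-upload.nodes`); `read_nodes`, `rolling` and `run_in_parallel` are hypothetical helpers, and the disruptive action itself (e.g. a reboot) is left to the caller:

```python
import threading
import time
from typing import Callable, List, Sequence, Tuple


def read_nodes(path: str) -> List[str]:
    """Read a node list such as /tmp/cache-text.nodes, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def rolling(hosts: Sequence[str], action: Callable[[str], None], pause: float) -> None:
    """Act on one host at a time, sleeping `pause` seconds between hosts."""
    for i, host in enumerate(hosts):
        if i:
            time.sleep(pause)
        action(host)


def run_in_parallel(plans: Sequence[Tuple[Sequence[str], Callable[[str], None]]],
                    pause: float) -> None:
    """Run one rolling loop per (hosts, action) pair concurrently and wait for all.

    An exception raised by `action` ends only the loop of the thread it occurred in.
    """
    threads = [threading.Thread(target=rolling, args=(hosts, action, pause))
               for hosts, action in plans]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Usage would be along the lines of `run_in_parallel([(read_nodes("/tmp/cache-text.nodes"), reboot), (read_nodes("/tmp/cache-upload.nodes"), reboot)], pause=600)`, where `reboot` is whatever callable performs the work and 600 seconds is only a placeholder for "enough sleep".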

It seems like it'd be relatively straightforward to make the order random by default, have --ordered sort the list, and add the ability to pass a lambda to --sort in order to sort by various aspects of the hostname. For example, one could use --ordered --sort 'lambda x: x[3:]' to sort by the post-datacenter part of the hostname, so that the cross-datacenter order would be maximized.
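The effect of such a sort key can be checked directly in Python. The host names below are illustrative and `cross_dc_order` is a hypothetical helper; slicing off three characters drops both the `cp` prefix and the site digit, whereas slicing off only two (`h[2:]`) leaves the site digit as the leading sort character and groups hosts by site:

```python
def cross_dc_order(hosts, key=lambda h: h[3:]):
    """Sort hosts by the part of the name after the 'cp' prefix and site digit."""
    return sorted(hosts, key=key)


hosts = ["cp1045", "cp1046", "cp2045", "cp2046", "cp3045", "cp4046"]
print(cross_dc_order(hosts))
# ['cp1045', 'cp2045', 'cp3045', 'cp1046', 'cp2046', 'cp4046']
print(sorted(hosts, key=lambda h: h[2:]))
# ['cp1045', 'cp1046', 'cp2045', 'cp2046', 'cp3045', 'cp4046']
```

Note that the numeric suffixes in the real list above do not line up across sites (most cp1 hosts are in cp1045-cp1074, while the cp4 hosts are cp4001-cp4021), so a suffix-based key would interleave sites only where those ranges overlap.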

After looking into this a bit, I found the details of how this would be done to be somewhat involved: internally cumin uses a NodeSet from ClusterShell, which acts like a set(), so the order is 'unspecified' (semi-random). If we want it to be more random, I think we'd have to convert it into a list and randomize it before batching; the same is true if we want to apply sorting. I am told this is a relatively unimportant change, but it doesn't seem super complicated to implement if there is demand or if it would reduce toil.
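The "convert it into a list and randomize it before batching" step might look like this minimal sketch. `batches` is a hypothetical helper, and a plain `set` stands in for the NodeSet, assuming the selection can be iterated as host-name strings:

```python
import random
from typing import Iterable, Iterator, List, Optional


def batches(hosts: Iterable[str], size: int, ordered: bool = False,
            seed: Optional[int] = None) -> Iterator[List[str]]:
    """Materialize the (set-like) selection as a list, fix its order, then batch it."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    # Sorting first gives a canonical starting point that does not depend on
    # the set's iteration order.
    order = sorted(hosts)
    if not ordered:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), size):
        yield order[start:start + size]
```

Sorting before shuffling is deliberate: Python randomizes string hashing per process by default, so iterating a set of names can yield a different order on each run, and without the sort the same `seed` would not reproduce the same plan.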

See also T224097 for a similar use case.

Given the lack of interest in the last few years, I'm resolving this as declined. With the flexibility currently available in the cookbooks, it is very easy to write custom logic, specific to each cluster, that does the right thing, whereas implementing this in cumin in a WMF-agnostic way would require a much bigger abstraction effort and would make it harder for clients to use.