The Toolforge grid migration from Stretch to Buster implies that we also migrate away from the tools.eqiad.wmflabs domain to tools.eqiad1.wikimedia.cloud. In toolsbeta, the whole grid already uses toolsbeta.eqiad1.wikimedia.cloud.
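As a minimal sketch, assuming the rename is a pure suffix swap (host and project labels unchanged, as in the tools-sgegrid-* names in this task), the old-to-new mapping can be written as a small bash helper. The function name `to_new_fqdn` is hypothetical and not part of any existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map a legacy FQDN to the new Cloud VPS domain.
# Assumption: only the ".eqiad.wmflabs" suffix changes, e.g.
#   tools-sgegrid-master.tools.eqiad.wmflabs
#   -> tools-sgegrid-master.tools.eqiad1.wikimedia.cloud
to_new_fqdn() {
  local fqdn="$1"
  case "$fqdn" in
    *.eqiad.wmflabs)
      # Strip the legacy suffix and append the new one.
      printf '%s\n' "${fqdn%.eqiad.wmflabs}.eqiad1.wikimedia.cloud"
      ;;
    *)
      # Names already in the new domain (or unrelated names) pass through.
      printf '%s\n' "$fqdn"
      ;;
  esac
}
```

For example, `to_new_fqdn tools-sgegrid-shadow.tools.eqiad.wmflabs` prints `tools-sgegrid-shadow.tools.eqiad1.wikimedia.cloud`, while a name that is already migrated is printed unchanged.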
CURRENT STATUS AS OF TASK FILING:
- grid master == stretch
- grid shadow == buster
Some issues detected:
- the hiera key that stores the master will need to be updated when the grid master is migrated to buster, for example:
  -sonofgridengine::gridmaster: tools-sgegrid-master.tools.eqiad.wmflabs
  +sonofgridengine::gridmaster: tools-sgegrid-master.tools.eqiad1.wikimedia.cloud
- the grid master somehow stores the domain in its configuration:
  aborrero@tools-sgegrid-master:~$ sudo qconf -ss | grep sgegrid
  tools-sgegrid-master.tools.eqiad.wmflabs
  tools-sgegrid-shadow.tools.eqiad.wmflabs
  aborrero@tools-sgegrid-master:~$ sudo qconf -dh tools-sgegrid-shadow.tools.eqiad.wmflabs
  can't resolve hostname "tools-sgegrid-shadow.tools.eqiad.wmflabs"
  aborrero@tools-sgegrid-master:~$ sudo qconf -ah tools-sgegrid-shadow.tools.eqiad1.wikimedia.cloud
  tools-sgegrid-shadow.tools.eqiad1.wikimedia.cloud added to administrative host list
  aborrero@tools-sgegrid-master:~$ sudo qconf -ss | grep sgegrid
  tools-sgegrid-master.tools.eqiad.wmflabs
  tools-sgegrid-shadow.tools.eqiad.wmflabs
  aborrero@tools-sgegrid-shadow:~$ qstat -f
  error: commlib error: access denied (server host resolves rdata host "tools-sgegrid-shadow.tools.eqiad1.wikimedia.cloud" as "tools-sgegrid-shadow.tools.eqiad.wmflabs")
  error: unable to contact qmaster using port 6444 on host "tools-sgegrid-master.tools.eqiad.wmflabs"
- I've detected a few places where the old domain might be hardcoded, for example:
  aborrero@tools-sgegrid-master:/var/lib/gridengine$ cat default/common/shadow_masters
  tools-sgegrid-master.tools.eqiad.wmflabs
  tools-sgegrid-shadow.tools.eqiad1.wikimedia.cloud
  tools-sgegrid-shadow.tools.eqiad.wmflabs
- the master daemon insists on the old domain:
  Mar 23 12:46:07 tools-sgegrid-master sge_qmaster[3764]: can't resolve host name "tools-sgegrid-shadow.tools.eqiad.wmflabs": undefined commlib error code
  Mar 23 12:46:07 tools-sgegrid-master sge_qmaster[3764]: can't resolve host name "tools-sgegrid-shadow.tools.eqiad.wmflabs": undefined commlib error code
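For the hiera key change, a sketch of the edit as a sed filter, assuming the key sits on a single `key: value` line in a plain YAML copy of the project hiera (in practice it may be edited through Horizon instead). The function name `update_gridmaster_key` is hypothetical; it reads stdin and writes stdout so the result can be reviewed before anything is applied.

```bash
#!/usr/bin/env bash
# Hypothetical filter: rewrite only the sonofgridengine::gridmaster value
# from the legacy domain to the new one; every other line passes through.
update_gridmaster_key() {
  sed 's/^\(sonofgridengine::gridmaster:.*\)\.eqiad\.wmflabs$/\1.eqiad1.wikimedia.cloud/'
}
```

Usage might look like `update_gridmaster_key < hiera.yaml > hiera.new.yaml && diff -u hiera.yaml hiera.new.yaml`, where both file names are placeholders.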
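The qconf failures above come down to host names that no longer resolve on the master. A quick way to spot such entries, sketched here under the assumption that `getent hosts` sees the same resolver configuration as the grid daemons, is to feed a host list (for example the output of `sudo qconf -ss`) through a check. `report_unresolvable` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical check: read one hostname per line on stdin and report
# every name that getent cannot resolve via the local NSS configuration.
report_unresolvable() {
  local host
  # "|| [ -n ... ]" also handles a last line without a trailing newline.
  while IFS= read -r host || [ -n "$host" ]; do
    if [ -z "$host" ]; then
      continue  # skip blank lines
    fi
    if ! getent hosts "$host" >/dev/null 2>&1; then
      printf 'unresolvable: %s\n' "$host"
    fi
  done
}
```

Run as `sudo qconf -ss | report_unresolvable` on the master; given that `qconf -dh` could not resolve the legacy shadow name, that entry would presumably be flagged.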
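For the shadow_masters file, a dry-run sketch of what a cleaned-up list might look like once both hosts live in the new domain: each entry is mapped to the new suffix and duplicates are dropped while keeping the original order, so the master entry stays first. This assumes the file is meant to be rewritten only after the master itself has moved; `normalize_shadow_masters` is a hypothetical name and the function only prints to stdout.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run filter for default/common/shadow_masters:
#  1. rewrite the legacy ".eqiad.wmflabs" suffix to ".eqiad1.wikimedia.cloud"
#  2. drop blank lines and duplicates, keeping the first occurrence so the
#     original order (master entry first) is preserved.
normalize_shadow_masters() {
  sed 's/\.eqiad\.wmflabs$/.eqiad1.wikimedia.cloud/' | awk 'NF && !seen[$0]++'
}
```

Applied to the file contents shown above (`normalize_shadow_masters < /var/lib/gridengine/default/common/shadow_masters`), it yields one line each for tools-sgegrid-master.tools.eqiad1.wikimedia.cloud and tools-sgegrid-shadow.tools.eqiad1.wikimedia.cloud.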
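To see which names the master daemon keeps failing on, the host names can be pulled out of its log lines. A minimal sketch, assuming the message format shown above; `unresolved_hosts` is a hypothetical name, and reading the messages with `journalctl -t sge_qmaster` assumes they reach the journal under that syslog identifier.

```bash
#!/usr/bin/env bash
# Hypothetical filter: extract the quoted host name from sge_qmaster
# "can't resolve host name" messages and list each name once.
unresolved_hosts() {
  sed -n 's/.*resolve host name "\([^"]*\)".*/\1/p' | sort -u
}
```

For example, `sudo journalctl -t sge_qmaster --since today | unresolved_hosts`; on the two log lines above it would print tools-sgegrid-shadow.tools.eqiad.wmflabs once.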