Migrate zuul-server behind systemd service
Closed, Resolved (Public)

Description

I failed to restart Zuul (T167833). There seems to be some bad interaction between systemd and the legacy init script.

zuul-merger has been migrated already, so I guess it is time to move zuul-server behind systemd :-}

Event Timeline

Change 359016 had a related patch set uploaded (by Paladox; owner: Paladox):
[operations/puppet@production] Zuul: Add systemd script for zuul

https://gerrit.wikimedia.org/r/359016

greg subscribed.

(meta: moving to Release-Engineering-Team (Watching / External) as this is not being worked on by a RelEng team member, which is what the kanban board is for (tracking RelEng team work). thanks for your work on this, @Paladox )

Change 359016 merged by Dzahn:
[operations/puppet@production] Zuul: Add systemd script for zuul

https://gerrit.wikimedia.org/r/359016

Dzahn closed this task as Resolved. Sep 15 2017, 11:27 PM

17:17 <+wikibugs> (PS24) Paladox: Zuul: Add systemd script for zuul [puppet] - https://gerrit.wikimedia.org/r/359016 (https://phabricator.wikimedia.org/T167833)
17:25 <+wikibugs> (CR) Dzahn: "http://puppet-compiler.wmflabs.org/7900/" [puppet] - https://gerrit.wikimedia.org/r/359016 (https://phabricator.wikimedia.org/T167833) (owner: Paladox)
17:46 <+wikibugs> (CR) Dzahn: [C: 2] "already had a +1 from Hashar earlier, compiler output looks good now, going to test on contint2001 first, where zuul-service is currently " [puppet] - https://gerrit.wikimedia.org/r/359016 (https://phabricator.wikimedia.org/T167833) (owner: Paladox)
17:47 < mutante> !log contint1001 - tmp disable puppet - contint2001 - test zuul unit file (gerrit 359016)
17:47 < hashar> mutante: yeah paladox kind of aced that change :]
17:48 < mutante> hashar: heh there you are, so i wanted to let you know zuul.service was already dead on contint2001
17:48 < mutante> but not zuul-merger
17:48 < hashar> mutante: so yeah contint2001 has the zuul-merger active so we got two of them
17:48 < hashar> but zuul-server is masked/inactive or whatever . That is the spare
17:49 < hashar> in case contint1001 explodes
17:49 < mutante> hashar: ok, so then it is as it should be :) but i am also not breaking anything if zuul starts on 2001, right
17:49 < hashar> I am not sure what will happen :]
17:49 < hashar> maybe we will end up with two zuul
17:50 < paladox> there's always a chance it will start but then fail as a test is started.
17:50 < paladox> though if it stays running, that's a good sign it will work.
17:50 < mutante> !log contint2001 - systemctl start zuul
17:50 < mutante> just to confirm the unit file works, right
17:52 < mutante> it's active (running) on both, but i can stop it again
17:52 < mutante> but yea, it does work
17:52 < paladox> :)
17:53 < hashar> when you migrate the one to cont1001 , make sure it still emits to statsd
17:53 < mutante> enables puppet again on contint1001 and lets the change apply there
17:53 < hashar> after a few minutes there should still be graph at the bottom https://integration.wikimedia.org/zuul/ ( after refreshing the page)
17:53 < mutante> hashar: ok!
17:54 < mutante> hashar: would you prefer i stop it on 2001 again for now?
17:54 < hashar> yup
17:54 < hashar> though puppet might stop it for you
17:54 < mutante> done. stopped
17:55 < mutante> runs puppet to confirm it stays stopped. yes it does
18:02 < mutante> the graph still looks fine
18:02 < mutante> the status is fine... all good afaict
18:05 < hashar> paladox: mutante: congratulations :]

Change 378664 had a related patch set uploaded (by Hashar; owner: Hashar):
[operations/puppet@production] Allow contint-admins to interact with zuul service

https://gerrit.wikimedia.org/r/378664

contint-admins cannot interact with the zuul service anymore, since that now requires sudo/root.
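
For illustration only, here is a rough sketch of how a helper script might drive the service through sudo once it is managed by systemd. This is not the actual fab code from integration/config: the function names `systemctl_command` and `run_systemctl` are hypothetical, the unit name `zuul` is taken from the `systemctl start zuul` log entry above, and it assumes the sudo rule permits non-interactive (`sudo -n`) calls to `systemctl` for that unit.

```python
import subprocess

# Assumption: the unit is called "zuul", as in the `systemctl start zuul` log entry.
ZUUL_UNIT = "zuul"

# Only plain systemctl verbs that make sense for this service.
ALLOWED_ACTIONS = {"start", "stop", "restart", "reload", "status", "is-active"}


def systemctl_command(action, unit=ZUUL_UNIT, use_sudo=True):
    """Build the argv for a systemctl call, optionally prefixed with sudo.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError("unsupported action: %s" % action)
    argv = ["systemctl", action, unit]
    # `sudo -n` fails immediately instead of prompting for a password.
    return ["sudo", "-n"] + argv if use_sudo else argv


def run_systemctl(action, unit=ZUUL_UNIT, use_sudo=True):
    """Run the command and return the CompletedProcess without raising."""
    return subprocess.run(
        systemctl_command(action, unit=unit, use_sudo=use_sudo),
        capture_output=True,
        text=True,
        check=False,
    )
```

A caller would inspect `run_systemctl("reload").returncode` to see whether the reload was accepted. Keeping command construction separate from execution makes it easy to change the sudo prefix in one place.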

Change 378665 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] fab: reload zuul via systemd

https://gerrit.wikimedia.org/r/378665

Change 378664 merged by Dzahn:
[operations/puppet@production] Allow contint-admins to interact with zuul service

https://gerrit.wikimedia.org/r/378664

Change 379186 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] fab: change sudo prefix

https://gerrit.wikimedia.org/r/379186

Change 379186 merged by jenkins-bot:
[integration/config@master] fab: change sudo prefix

https://gerrit.wikimedia.org/r/379186

Change 378665 merged by jenkins-bot:
[integration/config@master] fab: reload zuul via systemd

https://gerrit.wikimedia.org/r/378665

Some follow-up patches were required, but that work is now complete. Thank you @Paladox