Support restarting grrrit-wm automatically when we restart production gerrit
Closed, Resolved (Public)

Description

I have managed to figure out how we can restart grrrit-wm.

if( text.indexOf('!grrrit-wm-restart') !== -1 && whitelist.indexOf(from) >= 0 ||
  text.indexOf('grrrit-wm: restart') !== -1 && whitelist.indexOf(from) >= 0
) {
    console.log(from + ' => ' + to  + ' ' + text);
    logging.info('Connecting to gerrit..');
    ircClient.say(to, "re-connecting to gerrit");

    conns.end();
  
    logging.info('reconnected to gerrit');

    ircClient.say(to, "reconnected to gerrit");
}

if(text.indexOf('!grrrit-wm-force-restart') !== -1 && whitelist.indexOf(from) >= 0 ||
    text.indexOf('grrrit-wm: force-restart') !== -1 && whitelist.indexOf(from) >= 0
) {
    console.log(from + ' => ' + to  + ' ' + text);
    logging.info('Connecting to gerrit..');
    ircClient.say(to, "re-connecting to gerrit");

    conns.end();
    startRelay();
  
    logging.info('reconnected to gerrit');

    ircClient.say(to, "reconnected to gerrit");
}

^^ That's a test message; it will be updated once we have finished testing.

For this to happen automatically, production gerrit will need to send a message through logmsgbot that grrrit-wm picks up as its cue to restart.

Event Timeline

Easy fix: have lolrrrit send a message in -labs saying "!log grrrit-wm restarting for maintenance" (or whatever), then have it wait 5-10 seconds before actually disconnecting.
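A minimal sketch of the "announce, then wait before disconnecting" idea. Everything here is hypothetical: `say` and `disconnect` stand in for whatever the bot really uses to post to IRC and to drop its connection, and the channel and delay are placeholders.

```javascript
// Hypothetical helper: announce the restart, then disconnect after a grace
// period. `say(channel, message)` and `disconnect()` are assumed callbacks,
// not functions from the actual bot.
function announceThenDisconnect(say, disconnect, options) {
    const channel = options.channel;          // assumed: channel to announce in
    const delayMs = options.delayMs || 5000;  // 5 seconds unless overridden

    say(channel, '!log grrrit-wm restarting for maintenance');

    // Return the timer so the caller can cancel it with clearTimeout().
    return setTimeout(disconnect, delayMs);
}
```

The delay only gives the message time to reach the channel; it does not confirm that the message was delivered.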

Yes, I asked for this ticket to implement exactly that. The fix isn't as trivial as you make it sound, though. First of all, there need to be ferm changes to allow logmsgbot messages coming from the gerrit servers.

Dzahn, couldn't we just do what I suggested? It does the same thing, no?

Change 318976 had a related patch set (by Paladox) published:
Adds a grrrit-wm restarting command for you to type in irc

https://gerrit.wikimedia.org/r/318976

Yes, we have the same idea here. But I'm talking about the part where a gerrit service restart triggers the log bot message to appear on the channel. One requirement for that is that the firewall rules allow log messages from the gerrit server to tcpircbot (also see https://gerrit.wikimedia.org/r/#/c/316497/), which opens another dependency on https://gerrit.wikimedia.org/r/#/c/317192/ etc.

grrrit-wm can post a !log message on its own, like wikibugs does. That would not require firewall changes for logmsgbot.

And how would that be triggered from gerrit, when the whole point of needing to restart the bot is that it doesn't receive updates from gerrit anymore?

If statements exist for a reason: if the bot doesn't get any response or any feed data from gerrit within X amount of time, it will automatically alert operations and/or auto-restart gerrit, and send a log command for the bot to post to SAL.
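A rough sketch of such an inactivity check, not taken from the bot's code. `createWatchdog`, `touch` and `onStale` are made-up names: the bot would call `touch()` for every event it receives from gerrit, and `onStale` would do the alerting or reconnecting.

```javascript
// Hypothetical inactivity watchdog: calls `onStale` once when `touch()` has
// not been called for `timeoutMs` milliseconds. All names are illustrative.
function createWatchdog(timeoutMs, onStale) {
    let timer = null;

    function touch() {
        // Call this for every event received from the gerrit stream.
        clearTimeout(timer);
        timer = setTimeout(onStale, timeoutMs);
    }

    function stop() {
        clearTimeout(timer);
        timer = null;
    }

    touch(); // start counting immediately
    return { touch, stop };
}
```

One caveat with this approach: a quiet period with no gerrit activity looks the same as a dead connection, so the timeout would have to be chosen generously.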

To be clear, we need to restart the stream events gerrit listener, NOT the IRC portion of the bot.

@Legoktm I've figured out how to do that. I found an npm library that can restart the whole script.

@Legoktm OK, I've fixed everything in https://gerrit.wikimedia.org/r/318976 now. It's ready to be merged but needs to be reviewed by someone who is not me :).

All tested on a separate bot too.

@Legoktm ok restarting only the ssh connection is supported now.

You will need to be in the whitelist to run the following commands from irc.

!grrrit-wm-restart

grrrit-wm: restart
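For illustration only, the whitelist check and the two command spellings could be factored into one helper along these lines. `isRestartRequest` and `RESTART_COMMANDS` are hypothetical names, not what the merged patch uses.

```javascript
// Hypothetical helper, not the code from the patch: returns true when `from`
// is on the whitelist and `text` contains one of the restart commands.
const RESTART_COMMANDS = ['!grrrit-wm-restart', 'grrrit-wm: restart'];

function isRestartRequest(text, from, whitelist) {
    if (!whitelist.includes(from)) {
        return false;
    }
    return RESTART_COMMANDS.some((command) => text.includes(command));
}
```

Checking the whitelist first means messages from non-whitelisted nicks are rejected before any command matching happens.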

@Dzahn helped me complete this ;)

I'm also currently working on whitelist code to let certain users add people to the whitelist for admin commands and such (T149689).

Why are you seeking to restart the bot entirely and/or add a user-facing command to manually restart it? The bot connects to Gerrit over ssh and runs the gerrit stream-events command, which emits JSON events.

The ssh2 lib must be raising an exception of some sort whenever the connection is terminated when Gerrit is restarted, hence one can catch it and restart the relay/subscribe again. Quickly looking at the code, it just aborts whenever some error occurs, which is really not resilient.

Auto reconnection directly in the bot would be way easier to manage compared to adding some user command, handling whitelisting and having some human actually issue the command on irc. Let's make the bot stronger instead!?
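A sketch of what auto reconnection could look like, under stated assumptions rather than as the bot's actual implementation. `keepConnected` is a made-up name, and `connect` is assumed to be a function that opens the gerrit stream and returns an EventEmitter emitting 'ready', 'error' and 'close'. Whether the real connection object behaves exactly like that is not verified here.

```javascript
// Hypothetical sketch: reopen the connection with exponential backoff
// whenever it closes. `connect` must return an EventEmitter-like object.
function keepConnected(connect, options = {}) {
    const baseDelayMs = options.baseDelayMs || 1000;
    const maxDelayMs = options.maxDelayMs || 60000;
    let failures = 0;
    let timer = null;
    let stopped = false;

    function open() {
        const conn = connect();

        // Assumption: 'ready' is emitted once the stream is usable.
        conn.once('ready', () => { failures = 0; });

        // Without a listener, an 'error' event would crash the process.
        conn.on('error', (err) => {
            console.error('gerrit connection error:', err && err.message);
        });

        // Assumption: 'close' follows every disconnect, including errors.
        conn.once('close', () => {
            if (stopped) return;
            const delay = Math.min(baseDelayMs * 2 ** failures, maxDelayMs);
            failures += 1;
            timer = setTimeout(open, delay);
        });
    }

    open();
    return {
        stop() {
            stopped = true;
            clearTimeout(timer);
        },
    };
}
```

The backoff keeps the bot from hammering gerrit while it is still starting up, and `stop()` lets the caller shut the loop down cleanly.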

@hashar As you may already know we are actually doing not that we are attempting (still WIP) to get ssh to gerrit stream to reconnect via command.

I read this sentence three times and don't understand it. :( Can you rephrase?

@hashar OK, it can now automatically restart the ssh connection if it detects that it was dropped, i.e. if gerrit restarts and drops the ssh connection.

I have kept the irc command as a backup: if the ssh connection gets stuck, it won't automatically get restarted, so issuing the irc command will do this for us.

(Please consider proofreading comments before adding them. I have no idea what "Soniasuing" is.)

Change 318976 merged by jenkins-bot:
Adds a grrrit-wm restarting command for you to type in irc

https://gerrit.wikimedia.org/r/318976

The bot should now automatically try to reconnect to ssh. We will see if this works with prod gerrit when it is next restarted. In my testing this works.

But there is an irc command just in case it doesn't reconnect to ssh:

grrrit-wm: restart

I have no clue why this isn't marked resolved.