itWiki's orfanizzabot interrupted after Mon Mar 15 12:20:11 2021 for no reason
Closed, Resolved, Public

Description

For the record: today @Parma1983 reported that OrfanizzaBot, the itwiki bot hosted on Toolforge, was not running.

Here are the latest log lines:

$ become itwiki
$ tail -f /data/project/itwiki/orphanizerbot/log.err
[Mon Mar 15 12:20:11 2021] there is a job named 'itwiki-orphanizerbot-gridnamefix2' already active
[Mon Mar 15 12:22:01 2021] there is a job named 'itwiki-orphanizerbot-gridnamefix2' already active
[Thu Mar 18 01:00:12 2021] there is a job named 'itwiki-orphanizerbot-gridnamefix2' already active
[Thu Mar 18 01:02:02 2021] there is a job named 'itwiki-orphanizerbot-gridnamefix2' already active
[Thu Mar 18 01:04:02 2021] there is a job named 'itwiki-orphanizerbot-gridnamefix2' already active
[Thu Mar 18 01:06:03 2021] there is a job named 'itwiki-orphanizerbot-gridnamefix2' already active
[Thu Mar 18 01:08:01 2021] there is a job named 'itwiki-orphanizerbot-gridnamefix2' already active
[Thu Mar 18 01:10:10 2021] there is a job named 'itwiki-orphanizerbot-gridnamefix2' already active

The script was stuck for ~20 days for no apparent reason. This is probably related to T151603 (Grid jobs often stuck after Tool Labs maintenance).

I've fixed it with a quick and dirty workaround: calling jstop before jstart -once, to be really sure that the script never stays pending because of service interruptions in the underlying NFS filesystem (where the logs are saved).
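
The workaround can be sketched roughly as follows. This is only an illustration, not the actual script used by the tool: the helper name restart_job and the bot command in the usage comment are hypothetical, and the sketch assumes that jstop has finished removing the stale job by the time jstart runs.

```shell
#!/bin/bash
# Sketch of the "stop first, then start once" workaround.
# restart_job is a hypothetical helper, not part of Toolforge.

restart_job() {
    local job_name="$1"
    shift
    # Remove any stale job with this name. Failures are ignored on purpose,
    # so a missing job does not prevent the new submission.
    jstop "$job_name" || true
    # Submit the job again. With -once, the grid refuses the submission
    # if a job with the same name is still active.
    jstart -N "$job_name" -once "$@"
}

# Example call (the script path is an assumption, not the real one):
#   restart_job itwiki-orphanizerbot-gridnamefix2 "$HOME/orphanizerbot/run.sh"
```

The trade-off is that a healthy run which is still in progress would also be stopped, so this only fits jobs that are safe to interrupt and restart.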

Event Timeline

valerio.bozzolan triaged this task as Unbreak Now! priority.
valerio.bozzolan moved this task from Backlog to OrfanizzaBot on the Tool-itwiki board.