@ReleaseTaggerBot down?
Closed, ResolvedPublic

Description

@ReleaseTaggerBot has not been tagging tasks since 17:00 UTC this morning, and has skipped tasks like T128553, whose patch was merged at 12:21 (so it should have been tagged in the 13:00 or 14:00 run). Only 7 tasks have been tagged today, and its behaviour looks erratic (just one task tagged per run). Restart? Thanks.

Event Timeline

Restricted Application added a subscriber: Aklapper. · Nov 24 2017, 7:47 PM
Traceback (most recent call last):
  File "/data/project/forrestbot/forrestbot/forrestbot.py", line 213, in <module>
    add_PHIDs.add(get_slug_PHID(slug))
  File "/mnt/nfs/labstore-secondary-tools-project/forrestbot/venv/lib/python3.4/functools.py", line 472, in wrapper
    result = user_function(*args, **kwds)
  File "/data/project/forrestbot/forrestbot/forrestbot.py", line 82, in get_slug_PHID
    raise Exception("No PHID found for slug #%s!" % slug)
Exception: No PHID found for slug #mw1.31.0-wmf6!
Traceback (most recent call last):
  File "/data/project/forrestbot/forrestbot/forrestbot.py", line 77, in get_slug_PHID
    .values()
AttributeError: 'list' object has no attribute 'values'

But... #mw1.31.0-wmf6 -> MW-1.31-release-notes (WMF-deploy-2017-10-31 (1.31.0-wmf.6))

@Legoktm @Jdforrester-WMF The bot is still behaving erratically. Is this T151725 or a new, different bug? Thanks.

@Legoktm @Jdforrester-WMF The bot is still behaving erratically.

Is it? It seemed to run 46 minutes ago as expected.

I think there's a broken email or repo somewhere, and it's able to process anything up to that point. https://tools.wmflabs.org/forrestbot/log.txt shows the constant exceptions.

Hmm, I can't decipher which thing is broken. Maybe code-in to skip the failing task and report the failures with complete data elsewhere for proper debugging? The bot complains that there's no PHID for MW-1.31-release-notes (WMF-deploy-2017-10-31 (1.31.0-wmf.6)), but the project does have tasks in it, so the bot could add it to some: https://phabricator.wikimedia.org/maniphest/query/fo2QQu3BOQyu/#R.

Could the reason it is failing be a private task that the bot does not have access to? If we don't find the task, I suggest that ReleaseTaggerBot should: a) not die on this kind of error, and instead continue tagging other tasks (reporting in the log which task(s) couldn't be tagged), and b) skip after x errors to avoid flooding the logs. Should I open new tasks for these proposals?

valhallasw added a comment. · Edited · Nov 28 2017, 6:30 PM

AttributeError: 'list' object has no attribute 'values' suggests the API returns a list rather than a dict -- but that might just be PHP for 'an empty array'. It's possible that this is due to a task not being visible -- we'd need to improve logging to see that.
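The failure mode described here can be sketched in a few lines. This is only an illustration, not forrestbot's actual code: the helper name `extract_phid`, the `data` key, and the response shape are assumptions made for the example. The one thing it relies on is the behaviour mentioned above, that PHP serialises an empty array as the JSON list `[]` rather than as an object.

```python
import json


def extract_phid(response, slug):
    """Return the PHID for `slug` from a decoded API response.

    Assumption: a successful lookup yields a non-empty mapping under
    "data", while an empty PHP array arrives as the JSON list [].
    """
    data = response.get("data")
    if isinstance(data, dict) and data:
        # Hypothetical shape: {"PHID-PROJ-...": {...}}; take the first key.
        return next(iter(data))
    # Covers both [] (PHP's empty array) and {}. The raw payload is put in
    # the message so the log shows what the API actually returned.
    raise LookupError(
        "No PHID found for slug #%s! Raw data: %r" % (slug, data)
    )


found = json.loads('{"data": {"PHID-PROJ-example": {"name": "demo"}}}')
empty = json.loads('{"data": []}')

print(extract_phid(found, "demo"))
try:
    extract_phid(empty, "mw1.31.0-wmf6")
except LookupError as exc:
    print(exc)
```

Checking the type before calling `.values()` (or iterating) avoids the secondary `AttributeError`, and logging the raw payload would help tell "project not visible to the bot" apart from "project does not exist".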

I strongly disagree with the suggestion to continue and/or skip -- the reason to crash-and-halt is that otherwise errors are effectively silent, and we would never be forced to resolve them.

The skip option I suggested was to be bundled with an exception log, but I can understand your concerns.
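For concreteness, one possible reading of the "skip, but log the exception" idea is sketched below. It is not taken from forrestbot: `process_all`, `handle`, and `max_errors` are hypothetical names, and the cap is an assumption about what "skip after x errors" could mean. Each failure is logged with its full traceback, and the run still halts once the cap is reached, so a persistent fault is not silent.

```python
import logging

logger = logging.getLogger("forrestbot-sketch")


def process_all(items, handle, max_errors=3):
    """Apply `handle` to each item, logging failures instead of dying.

    Returns (processed, failed). Re-raises the current exception once
    `max_errors` failures have accumulated, so the run still halts.
    """
    processed, failed = [], []
    for item in items:
        try:
            handle(item)
        except Exception:
            # logger.exception records the message plus the full traceback.
            logger.exception("Could not process %r; skipping", item)
            failed.append(item)
            if len(failed) >= max_errors:
                raise
        else:
            processed.append(item)
    return processed, failed


def demo_handle(item):
    # Stand-in for the real per-mail work; fails on one known input.
    if item == "bad":
        raise ValueError("cannot tag %s" % item)


done, skipped = process_all(["a", "bad", "b"], demo_handle)
print(done, skipped)
```

Whether this is preferable to crash-and-halt is exactly the disagreement above: the sketch only shows that skipping does not have to mean discarding the error.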

Suggest looking for Security tasks resolved in wmf.6 and see which is missing the tags.

I also suggest to perform a security-review of @gerritbot and @ReleaseTaggerBot and allow them to subscribe to restricted tasks so they can comment/tag stuff and not fail on them.

Looking into this, I think the bot is functioning? Part of the confusion is the split between error and regular log files - now exposed as https://tools.wmflabs.org/forrestbot/forrestbot.err.txt vs https://tools.wmflabs.org/forrestbot/forrestbot.log.txt.

I added a bunch of logging to https://gerrit.wikimedia.org/r/#/c/394133/ which should also help.

As RTB seems to be processing emails, I suggest to close this for now.

https://gerrit.wikimedia.org/r/#/c/393746/ didn't receive any tag @valhallasw so I guess the bot is still buggy on the release in which that change went?

Update: As you can see in @ReleaseTaggerBot's activity feed, the bot has been down since January 11 at 02:00; its last edit was T184631#3891909.

2018-01-13 08:00:29,748 - forrestbot - ERROR - Releasetaggerbot crashed while processing messages
Traceback (most recent call last):
  File "/data/project/forrestbot/forrestbot/forrestbot.py", line 245, in <module>
    main()
  File "/data/project/forrestbot/forrestbot/forrestbot.py", line 191, in main
    action = process_mail(mail)
  File "/data/project/forrestbot/forrestbot/forrestbot.py", line 131, in process_mail
    if proj not in get_repos_to_watch():
  File "/mnt/nfs/labstore-secondary-tools-project/forrestbot/venv/lib/python3.4/functools.py", line 472, in wrapper
    result = user_function(*args, **kwds)
  File "/data/project/forrestbot/forrestbot/forrestbot.py", line 66, in get_repos_to_watch
    for skin in conf['skins']:
KeyError: 'skins'

Thank you.

Change 404092 had a related patch set uploaded (by Merlijn van Deen; owner: Merlijn van Deen):
[labs/tools/forrestbot@master] Update for new make-wmf-branch logic

https://gerrit.wikimedia.org/r/404092

Change 404092 merged by jenkins-bot:
[labs/tools/forrestbot@master] Update for new make-wmf-branch logic

https://gerrit.wikimedia.org/r/404092

Huji added a subscriber: Huji. · Edited · Mar 6 2018, 2:32 AM

This issue seems to have occurred again. Tasks like T71492, T58784, T180194 and T187169 which have their patches merged are not tagged by the bot. Patches like https://gerrit.wikimedia.org/r/#/c/373119/ which were merged on Feb 27 are not in WMF prod as a result of their corresponding task (in this case, T170014) not being tagged.

This issue seems to have occurred again. Tasks like T71492, T58784, T180194 and T187169 which have their patches merged are not tagged by the bot.

I've created them now, sorry.

Patches like https://gerrit.wikimedia.org/r/#/c/373119/ which were merged on Feb 27 are not in WMF prod as a result of their corresponding task (in this case, T170014) not being tagged.

The cut will happen in some hours' time at which point that commit will get deployed. ReleaseTaggerBot has no impact on production, just on Phabricator tracking.

Jdforrester-WMF closed this task as Resolved. · Mar 6 2018, 10:04 AM

To the extent this was open, it's fixed now. For future issues please open a new task.