Page MenuHomePhabricator

all morebots stopped listening to !log lines
Closed, ResolvedPublic

Description

#wikimedia-operations

17:09 < twentyafterfour> !log Preparing to deploy phabricator update. Tagged release/2016-06-08/1
17:11 < twentyafterfour> wtf logmsgbot

17:20 -!- morebots [tools.more@208.80.155.220] has joined #wikimedia-operations
17:20 < mutante> !log restarted the log bot

#wikimedia-releng

17:26 < mutante> !log logging is also broken here, test ..pling
...

Cloud-Services

17:23 < mutante> !log wikistats test log
17:23 < mutante> morebots stopped logging things
17:23 < mutante> i already restarted the production copy but to no avail
17:23 < mutante> !log no logging
17:23 <+labs-morebots> no is not a valid project.
17:24 < mutante> what
17:24 < mutante> !log wikistats but this is
17:24 < mutante> ...
17:24 < mutante> !log morebots !log morebots
17:24 < stashbot> Did you mean tools.morebots instead of morebots?
17:24 <+labs-morebots> morebots is not a valid project.
17:24 < mutante> !log git how about this
17:25 < mutante> !log tools is a valid project

I already restarted the instance in the production channel.. It did not fix it.

jobs are running on toollabs

job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID 
-----------------------------------------------------------------------------------------------------------------
5491062 0.40223 analytics- tools.morebo Rr    05/31/2016 13:49:22 continuous@tools-exec-1214.eqi     1        
6455446 0.34368 qamorebot  tools.morebo r     05/18/2016 11:36:08 continuous@tools-exec-1206.eqi     1        
6701870 0.33105 labs-logbo tools.morebo Rr    05/31/2016 13:49:23 continuous@tools-exec-1208.eqi     1        
7081110 0.31037 cvnmorebot tools.morebo r     06/03/2016 21:34:19 continuous@tools-exec-1217.eqi     1        
7263586 0.30000 production tools.morebo r     06/09/2016 00:15:07 continuous@tools-exec-1211.eqi     1

https://wikitech.wikimedia.org/wiki/Morebots

Event Timeline

The last time I see it sucessfully logging was:

[15:44:06] <logmsgbot> !log tgr@tin Synchronized wmf-config/InitialiseSettings.php: enable AuthManager on group1 for reals T135504 (duration: 00m 25s)
[15:44:07] <stashbot> T135504: Enable AuthManager in WMF production - https://phabricator.wikimedia.org/T135504
[15:44:12] <morebots> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master

Which brings up the question...was morebots updated for AuthManager changes?

production-logbot.err says:

[…]
2016-06-09 00:15:27,273 ERROR: Failed to log message: LoginError(<Site object '('https', 'wikitech.wikimedia.org')/w/'>, {u'reason': u'You have made too many recent login attempts.\nPlease wait 5 minutes before trying again.', u'result': u'Failed'})
Traceback (most recent call last):
  File "/usr/lib/adminbot/adminlogbot.py", line 266, in on_pubmsg
    pageurl = adminlog.log(self.config, message, project, author)
  File "/usr/lib/adminbot/adminlog.py", line 16, in log
    site.login(config.wiki_user, config.wiki_pass, domain=config.wiki_domain)
  File "/usr/lib/python2.7/dist-packages/mwclient/client.py", line 401, in login
    raise errors.LoginError(self, login['login'])
LoginError: (<Site object '('https', 'wikitech.wikimedia.org')/w/'>, {u'reason': u'You have made too many recent login attempts.\nPlease wait 5 minutes before trying again.', u'result': u'Failed'})
2016-06-09 00:16:25,602 DEBUG: 'production-logbot' got '!log updated payments from de11bfe2273d0bcaa0e713389b2d91e8b3567a1d to c6ea2ca699aa78a90f64c77d860f69421e9a7782'; Attempting to log.
2016-06-09 00:16:25,610 INFO: Starting new HTTPS connection (1): wikitech.wikimedia.org
2016-06-09 00:16:25,637 DEBUG: Setting read timeout to None
2016-06-09 00:16:25,998 DEBUG: "POST /w/api.php HTTP/1.1" 200 5505
2016-06-09 00:16:26,003 DEBUG: Setting read timeout to None
2016-06-09 00:16:26,259 DEBUG: "POST /w/api.php HTTP/1.1" 200 336
2016-06-09 00:16:27,983 DEBUG: Setting read timeout to None
2016-06-09 00:16:28,373 DEBUG: "POST /w/api.php HTTP/1.1" 200 193
2016-06-09 00:16:28,374 ERROR: Failed to log message: LoginError(<Site object '('https', 'wikitech.wikimedia.org')/w/'>, {u'reason': u'You have made too many recent login attempts.\nPlease wait 5 minutes before trying again.', u'result': u'Failed'})
Traceback (most recent call last):
  File "/usr/lib/adminbot/adminlogbot.py", line 266, in on_pubmsg
    pageurl = adminlog.log(self.config, message, project, author)
  File "/usr/lib/adminbot/adminlog.py", line 16, in log
    site.login(config.wiki_user, config.wiki_pass, domain=config.wiki_domain)
  File "/usr/lib/python2.7/dist-packages/mwclient/client.py", line 401, in login
    raise errors.LoginError(self, login['login'])
LoginError: (<Site object '('https', 'wikitech.wikimedia.org')/w/'>, {u'reason': u'You have made too many recent login attempts.\nPlease wait 5 minutes before trying again.', u'result': u'Failed'})

So its really possible that the AuthManager change broke login for the bot?

Change 293459 had a related patch set uploaded (by Anomie):
Set domain even if there's only one

https://gerrit.wikimedia.org/r/293459

2016-06-09 02:08:27 [f4333b421af5d3f5880e6372] silver labswiki 1.28.0-wmf.5 exception ERROR: [f4333b421af5d3f5880e6372] /w/api.php   DBQueryError from line 934 of /srv/mediawiki/php-1.28.0-wmf.5/includes/db/Database.php: A database error has occurred. Did you forget to run maintenance/update.php after upgrading?  See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script
Query: INSERT  INTO `ldap_domains` (domain,user_id) VALUES (NULL,'55')
Function: LdapAuthenticationPlugin::saveDomain
Error: 1048 Column 'domain' cannot be null (208.80.154.136)
 {"exception_id":"f4333b421af5d3f5880e6372"}
[Exception DBQueryError] (/srv/mediawiki/php-1.28.0-wmf.5/includes/db/Database.php:934) A database error has occurred. Did you forget to run maintenance/update.php after upgrading?  See: https://www.mediawiki.org/wiki/Manual:Upgrading#Run_the_update_script
Query: INSERT  INTO `ldap_domains` (domain,user_id) VALUES (NULL,'55')
Function: LdapAuthenticationPlugin::saveDomain
Error: 1048 Column 'domain' cannot be null (208.80.154.136)

  #0 /srv/mediawiki/php-1.28.0-wmf.5/includes/db/Database.php(901): DatabaseBase->reportQueryError(string, integer, string, string, boolean)
  #1 /srv/mediawiki/php-1.28.0-wmf.5/includes/db/Database.php(1464): DatabaseBase->query(string, string)
  #2 /srv/mediawiki/php-1.28.0-wmf.5/extensions/LdapAuthentication/LdapAuthenticationPlugin.php(2000): DatabaseBase->insert(string, array, string)
  #3 /srv/mediawiki/php-1.28.0-wmf.5/extensions/LdapAuthentication/LdapAuthenticationPlugin.php(1165): LdapAuthenticationPlugin::saveDomain(User, NULL)
  #4 /srv/mediawiki/php-1.28.0-wmf.5/extensions/LdapAuthentication/LdapPrimaryAuthenticationProvider.php(139): LdapAuthenticationPlugin->updateUser(User)
  #5 [internal function]: LdapPrimaryAuthenticationProvider->onUserLoggedIn(User)
  #6 /srv/mediawiki/php-1.28.0-wmf.5/includes/Hooks.php(195): call_user_func_array(array, array)
  #7 /srv/mediawiki/php-1.28.0-wmf.5/includes/auth/AuthManager.php(2329): Hooks::run(string, array)
  #8 /srv/mediawiki/php-1.28.0-wmf.5/includes/auth/AuthManager.php(655): MediaWiki\Auth\AuthManager->setSessionDataForUser(User, boolean)
  #9 /srv/mediawiki/php-1.28.0-wmf.5/includes/auth/AuthManager.php(349): MediaWiki\Auth\AuthManager->continueAuthentication(array)
  #10 /srv/mediawiki/php-1.28.0-wmf.5/includes/api/ApiLogin.php(164): MediaWiki\Auth\AuthManager->beginAuthentication(array, string)
  #11 /srv/mediawiki/php-1.28.0-wmf.5/includes/api/ApiMain.php(1373): ApiLogin->execute()
  #12 /srv/mediawiki/php-1.28.0-wmf.5/includes/api/ApiMain.php(469): ApiMain->executeAction()
  #13 /srv/mediawiki/php-1.28.0-wmf.5/includes/api/ApiMain.php(440): ApiMain->executeActionWithErrorHandling()
  #14 /srv/mediawiki/php-1.28.0-wmf.5/api.php(83): ApiMain->execute()
  #15 /srv/mediawiki/w/api.php(3): require(string)
  #16 {main}

Change 293459 merged by jenkins-bot:
Set domain even if there's only one

https://gerrit.wikimedia.org/r/293459

Change 293488 had a related patch set uploaded (by Paladox):
Set domain even if there's only one

https://gerrit.wikimedia.org/r/293488

Mentioned in SAL [2016-06-09T12:19:21Z] <tgr> deploying [[gerrit:293459]] to fix morebots (T137377)

Change 293488 merged by jenkins-bot:
Set domain even if there's only one

https://gerrit.wikimedia.org/r/293488

Mentioned in SAL [2016-06-09T12:33:15Z] <tgr@tin> Synchronized php-1.28.0-wmf.5/extensions/LdapAuthentication/LdapPrimaryAuthenticationProvider.php: deploy [[gerrit:293459]] to fix wikitech API login / morebots (T137377) (duration: 00m 47s)

Tgr assigned this task to Anomie.
Tgr subscribed.
08:33 < logmsgbot> !log tgr@tin Synchronized php-1.28.0-wmf.5/extensions/LdapAuthentication/LdapPrimaryAuthenticationProvider.php: deploy [[gerrit:293459]] to fix 
                   wikitech API login / morebots (T137377) (duration: 00m 47s)
08:33 < stashbot> T137377: all morebots stopped listening to !log lines - https://phabricator.wikimedia.org/T137377
08:33 < morebots> Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master

Change 296926 had a related patch set uploaded (by Gergő Tisza):
Set domain even if there's only one

https://gerrit.wikimedia.org/r/296926

Change 296926 merged by jenkins-bot:
Set domain even if there's only one

https://gerrit.wikimedia.org/r/296926