RedisBagOStuff is broken on beta
Closed, Resolved (Public)

Description

MediaWiki object storage (and session storage) is broken on deployment-prep: it is apparently still trying to connect to redis.

This manifests with the following error:

ErrorException from line 0 of : PHP Notice: fwrite(): send of 41 bytes failed with errno=32 Broken pipe

Apparently deployment-redis0[45] were deleted recently. I tried recreating them, but puppet fails on the first run, which prevents me from logging in to the instances for further diagnosis.
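
A quick way to confirm from an app server whether anything is listening on the redis hosts is a plain TCP probe. This is a minimal sketch, not part of any existing tooling: the function name `can_connect` is made up for illustration, and port 6379 is assumed to be the redis port in use.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration. It only checks that something
    accepts the connection; it does not speak the redis protocol or
    authenticate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

For example, `can_connect("deployment-redis05.deployment-prep.eqiad.wmflabs", 6379)` would be expected to return False while the instance is gone or unprovisioned.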

The general consensus on IRC was that redis isn't needed in beta; however, it is still configured. I'm not sure how to reconfigure MediaWiki's session manager so that it stops trying to connect to redis. The configuration for this is not at all obvious or intuitive.

Stack Trace:

#0 [internal function]: MWExceptionHandler::handleError(integer, string, string, integer, array, array)
#1 [internal function]: Redis->processArrayCommand(string, array)
#2 [internal function]: Redis->processCommand(string, string)
#3 [internal function]: Redis->auth(string)
#4 [internal function]: Redis->checkConnection()
#5 [internal function]: Redis->processArrayCommand(string, array)
#6 [internal function]: Redis->processCommand(string, string)
#7 [internal function]: Redis->auth(string)
#8 [internal function]: Redis->checkConnection()
#9 [internal function]: Redis->processArrayCommand(string, array)
#10 [internal function]: Redis->processCommand(string, string)
#11 [internal function]: Redis->auth(string)
#12 [internal function]: Redis->checkConnection()
#13 [internal function]: Redis->processArrayCommand(string, array)
#14 [internal function]: Redis->processCommand(string, string)
#15 [internal function]: Redis->auth(string)
#16 [internal function]: Redis->checkConnection()
#17 [internal function]: Redis->processArrayCommand(string, array)
#18 [internal function]: Redis->processCommand(string, string)
#19 [internal function]: Redis->auth(string)
#20 [internal function]: Redis->checkConnection()
#21 [internal function]: Redis->processArrayCommand(string, array)
#22 [internal function]: Redis->processCommand(string, string)
#23 [internal function]: Redis->auth(string)
#24 [internal function]: Redis->checkConnection()
#25 [internal function]: Redis->processArrayCommand(string, array)
#26 [internal function]: Redis->processCommand(string, string)
#27 [internal function]: Redis->auth(string)
#28 [internal function]: Redis->checkConnection()
#29 [internal function]: Redis->processArrayCommand(string, array)
#30 [internal function]: Redis->processCommand(string, string)
#31 [internal function]: Redis->auth(string)
#32 [internal function]: Redis->checkConnection()
#33 [internal function]: Redis->processArrayCommand(string, array)
#34 [internal function]: Redis->processCommand(string, string)
#35 [internal function]: Redis->auth(string)
#36 [internal function]: Redis->checkConnection()
#37 [internal function]: Redis->processArrayCommand(string, array)
#38 [internal function]: Redis->processCommand(string, string)
#39 [internal function]: Redis->auth(string)
#40 [internal function]: Redis->checkConnection()
#41 [internal function]: Redis->processArrayCommand(string, array)
#42 [internal function]: Redis->processCommand(string, string)
#43 [internal function]: Redis->auth(string)
#44 [internal function]: Redis->checkConnection()
#45 [internal function]: Redis->processArrayCommand(string, array)
#46 [internal function]: Redis->processCommand(string, string)
#47 [internal function]: Redis->auth(string)
#48 [internal function]: Redis->checkConnection()
#49 [internal function]: Redis->processArrayCommand(string, array)
#50 [internal function]: Redis->processCommand(string, string)
#51 [internal function]: Redis->auth(string)
#52 [internal function]: Redis->checkConnection()
#53 [internal function]: Redis->processArrayCommand(string, array)
#54 [internal function]: Redis->processCommand(string, string)
#55 [internal function]: Redis->auth(string)
#56 [internal function]: Redis->checkConnection()
#57 [internal function]: Redis->processArrayCommand(string, array)
#58 [internal function]: Redis->processCommand(string, string)
#59 [internal function]: Redis->auth(string)
#60 [internal function]: Redis->checkConnection()
#61 [internal function]: Redis->processArrayCommand(string, array)
#62 [internal function]: Redis->processCommand(string, string)
#63 [internal function]: Redis->auth(string)
#64 [internal function]: Redis->checkConnection()
#65 [internal function]: Redis->processArrayCommand(string, array)
#66 [internal function]: Redis->processCommand(string, string)
#67 [internal function]: Redis->auth(string)
#68 [internal function]: Redis->checkConnection()
#69 [internal function]: Redis->processArrayCommand(string, array)
#70 [internal function]: Redis->processCommand(string, string)
#71 [internal function]: Redis->auth(string)
#72 [internal function]: Redis->checkConnection()
#73 [internal function]: Redis->processArrayCommand(string, array)
#74 [internal function]: Redis->processCommand(string, string)
#75 [internal function]: Redis->auth(string)
#76 [internal function]: Redis->checkConnection()
#77 [internal function]: Redis->processArrayCommand(string, array)
#78 [internal function]: Redis->processCommand(string, string)
#79 [internal function]: Redis->auth(string)
#80 [internal function]: Redis->checkConnection()
#81 [internal function]: Redis->processArrayCommand(string, array)
#82 [internal function]: Redis->processCommand(string, string)
#83 [internal function]: Redis->auth(string)
#84 [internal function]: Redis->checkConnection()
#85 [internal function]: Redis->processArrayCommand(string, array)
#86 [internal function]: Redis->processCommand(string, string)
#87 [internal function]: Redis->auth(string)
#88 [internal function]: Redis->checkConnection()
#89 [internal function]: Redis->processArrayCommand(string, array)
#90 [internal function]: Redis->processCommand(string, string)
#91 [internal function]: Redis->auth(string)
#92 [internal function]: Redis->checkConnection()
#93 [internal function]: Redis->processArrayCommand(string, array)
#94 [internal function]: Redis->processCommand(string, string)
#95 [internal function]: Redis->auth(string)
#96 [internal function]: Redis->checkConnection()
#97 [internal function]: Redis->processArrayCommand(string, array)
#98 [internal function]: Redis->processCommand(string, string)
#99 [internal function]: Redis->auth(string)
#100 [internal function]: Redis->checkConnection()
#101 [internal function]: Redis->processArrayCommand(string, array)
#102 [internal function]: Redis->processCommand(string, string)
#103 [internal function]: Redis->auth(string)
#104 [internal function]: Redis->checkConnection()
#105 [internal function]: Redis->processArrayCommand(string, array)
#106 [internal function]: Redis->processCommand(string, string)
#107 [internal function]: Redis->auth(string)
#108 [internal function]: Redis->checkConnection()
#109 [internal function]: Redis->processArrayCommand(string, array)
#110 [internal function]: Redis->processCommand(string, string)
#111 [internal function]: Redis->auth(string)
#112 [internal function]: Redis->checkConnection()
#113 [internal function]: Redis->sockReadLine()
#114 [internal function]: Redis->sockReadData(NULL)
#115 [internal function]: Redis->processBooleanResponse()
#116 [internal function]: Redis->auth(string)
#117 [internal function]: Redis->checkConnection()
#118 [internal function]: Redis->processArrayCommand(string, array)
#119 [internal function]: Redis->processCommand(string, string)
#120 [internal function]: Redis->auth(string)
#121 [internal function]: Redis->checkConnection()
#122 [internal function]: Redis->processArrayCommand(string, array)
#123 [internal function]: Redis->processCommand(string, string)
#124 [internal function]: Redis->auth(string)
#125 [internal function]: Redis->checkConnection()
#126 [internal function]: Redis->processArrayCommand(string, array)
#127 [internal function]: Redis->processCommand(string, string)
#128 [internal function]: Redis->auth(string)
#129 [internal function]: Redis->checkConnection()
#130 [internal function]: Redis->processArrayCommand(string, array)
#131 [internal function]: Redis->processCommand(string, string)
#132 [internal function]: Redis->auth(string)
#133 [internal function]: Redis->checkConnection()
#134 [internal function]: Redis->processArrayCommand(string, array)
#135 [internal function]: Redis->processCommand(string, string)
#136 [internal function]: Redis->auth(string)
#137 [internal function]: Redis->checkConnection()
#138 [internal function]: Redis->processArrayCommand(string, array)
#139 [internal function]: Redis->processCommand(string, string)
#140 [internal function]: Redis->auth(string)
#141 [internal function]: Redis->checkConnection()
#142 [internal function]: Redis->sockReadLine()
#143 [internal function]: Redis->sockReadData(NULL)
#144 [internal function]: Redis->processBooleanResponse()
#145 /srv/mediawiki/php-master/includes/libs/redis/RedisConnectionPool.php(249): Redis->auth(string)
#146 /srv/mediawiki/php-master/includes/libs/objectcache/RedisBagOStuff.php(364): RedisConnectionPool->getConnection(string, Monolog\Logger)
#147 /srv/mediawiki/php-master/includes/libs/objectcache/RedisBagOStuff.php(96): RedisBagOStuff->getConnection(string)
#148 /srv/mediawiki/php-master/includes/libs/objectcache/RedisBagOStuff.php(92): RedisBagOStuff->getWithToken(string, NULL, integer)
#149 /srv/mediawiki/php-master/includes/libs/objectcache/CachedBagOStuff.php(56): RedisBagOStuff->doGet(string, integer)
#150 /srv/mediawiki/php-master/includes/libs/objectcache/BagOStuff.php(193): CachedBagOStuff->doGet(string, integer)
#151 /srv/mediawiki/php-master/includes/session/SessionManager.php(939): BagOStuff->get(string)
#152 /srv/mediawiki/php-master/includes/session/SessionInfo.php(150): MediaWiki\Session\SessionManager->generateSessionId()
#153 /srv/mediawiki/php-master/includes/session/SessionProvider.php(176): MediaWiki\Session\SessionInfo->__construct(integer, array)
#154 /srv/mediawiki/php-master/includes/session/SessionManager.php(270): MediaWiki\Session\SessionProvider->newSessionInfo(NULL)
#155 /srv/mediawiki/php-master/includes/session/SessionManager.php(244): MediaWiki\Session\SessionManager->getEmptySessionInternal(WebRequest)
#156 /srv/mediawiki/php-master/includes/session/SessionManager.php(194): MediaWiki\Session\SessionManager->getEmptySession(WebRequest)
#157 /srv/mediawiki/php-master/includes/WebRequest.php(750): MediaWiki\Session\SessionManager->getSessionForRequest(WebRequest)
#158 /srv/mediawiki/php-master/includes/session/SessionManager.php(130): WebRequest->getSession()
#159 /srv/mediawiki/php-master/includes/Setup.php(851): MediaWiki\Session\SessionManager::getGlobalSession()
#160 /srv/mediawiki/php-master/includes/WebStart.php(77): include(string)
#161 /srv/mediawiki/w/favicon.php(3): include(string)
#162 {main}
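
The trace above is long mostly because one cycle repeats: checkConnection calls auth, which goes through processCommand and processArrayCommand and lands back in checkConnection. A trace like this is easier to read as a count per callee. The following is a rough sketch under the assumption that each frame has the shape `#N location: call(args)`; the names `FRAME_RE` and `summarize_trace` are made up for illustration.

```python
import re
from collections import Counter

# Matches "#12 <location>: <call>", where location is either
# "[internal function]" or "file.php(line)".
FRAME_RE = re.compile(r"^#\d+ (?P<where>.*?): (?P<call>.+)$")


def summarize_trace(trace: str) -> Counter:
    """Count how often each callee appears in a PHP stack trace."""
    counts = Counter()
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue  # e.g. "#162 {main}" has no "location: call" part
        # Drop the argument list so identical calls group together.
        callee = match.group("call").split("(", 1)[0]
        counts[callee] += 1
    return counts
```

Calling `summarize_trace(trace).most_common(5)` on the pasted trace should put the four Redis methods of the cycle at the top, which makes the reconnect-and-reauthenticate loop obvious at a glance.
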
mmodell created this task. Tue, Nov 20, 11:23 PM
mmodell triaged this task as Normal priority.
ayounsi removed a subscriber: ayounsi. Tue, Nov 20, 11:25 PM

Change 475025 had a related patch set uploaded (by Alex Monk; owner: Alex Monk):
[operations/mediawiki-config@master] deployment-prep: Try changing redis_lock entries to memc hosts

https://gerrit.wikimedia.org/r/475025

To move sessions out of redis, this will need to change in wmf-config/InitialiseSettings.php:

'wgSessionCacheType' => [
    'default' => 'redis_local',  // declared in redis.php
    'wikitech' => 'memcached-pecl',
],

but I really think that life will be easier in the long run if beta cluster continues to use redis for session storage as long as prod does.
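
To make the effect of that array concrete: a wiki with its own key gets its own value, and every other wiki falls back to 'default'. The sketch below models only that fallback. It is an assumption-laden simplification, not how mediawiki-config actually resolves settings (the real logic also handles things like tags and suffixes), and `resolve_setting` is a made-up name.

```python
# Mirrors the snippet above; values are copied from it.
SETTINGS = {
    "wgSessionCacheType": {
        "default": "redis_local",
        "wikitech": "memcached-pecl",
    },
}


def resolve_setting(settings: dict, name: str, wiki: str) -> str:
    """Return the per-wiki value of a setting, else its 'default'.

    Hypothetical helper: keys are treated as plain wiki identifiers.
    """
    per_wiki = settings[name]
    return per_wiki.get(wiki, per_wiki["default"])
```

Under this model, changing the 'default' entry is what would move sessions for the beta wikis off redis, while the 'wikitech' override stays untouched.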

MGChecker added a subscriber: MGChecker.
[00:39] bd808	tries to load the prefix puppet settings for these hosts
[00:48]  <    bd808>	ok, in prod the mc* hosts are apparently both redis and memcached instances
[00:49] bd808	looks at deployment-memc prefix puppet next
[00:51]  <andrewbogott>	hm, I can't log in to deployment-redis05.deployment-prep.eqiad.wmflabs either
[00:51]  <    bd808>	Krenair: I think you were on the right track with profile::redis::multidc. It looks to me like the deployment-prep memc boxes were never switched to be memcached+redis
[00:52]  <    bd808>	and instead deployment-prep was using the redis* boxes as session+job queue+whatever redis
[00:53]  <    bd808>	so ... we need to change deployment-memc* to use role::mediawiki::memcached instead of just role::memcached
[00:54]  <    bd808>	that should add redis to them... and then we need to figure out how to point mcrouter in the right way

Mentioned in SAL (#wikimedia-releng) [2018-11-21T01:22:47Z] <bd808> Applied role::mediawiki::memcached on deployment-memc05.deployment-prep.eqiad.wmflabs to provision redis T210030

bd808 added a comment. Edited Wed, Nov 21, 2:27 AM

After applying role::mediawiki::memcached on deployment-memc05 I added some missing hiera config to the deployment-memc prefix-puppet:

redis::shards:
  sessions:
    eqiad:
      shard01:
        host: 172.16.5.17
        port: 6379

This may not turn out to be the correct place at all for this, but I needed to put it somewhere. With that hiera data the puppet manifest compiled, but did not apply cleanly. The redis-instance-tcp_6379 service unit is not starting. The failure is:

[2868] 21 Nov 01:52:26.560 # Fatal error, can't open config file '/etc/redis/replica/6379-state.conf'

The /etc/redis/replica directory exists, but it is empty. This is supposed to be populated by confd. We are using profile::redis::multidc::discovery: appservers-rw so the data is expected to be under the /discovery/appservers-rw prefix in etcd.

After poking around a bit I convinced myself that the fix here would actually be setting profile::redis::multidc::discovery to false in hiera instead of appservers-rw. The following Puppet run made a placeholder file at /etc/redis/replica/6379-state.conf and the service started.
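
For anyone hitting the same startup failure, the precondition can be checked before restarting the unit. This is only a sketch: it assumes the `<port>-state.conf` naming seen in the error message above applies to every instance, and `missing_state_files` is a made-up helper name.

```python
from pathlib import Path


def missing_state_files(replica_dir: str, ports: list[int]) -> list[int]:
    """Return the ports whose '<port>-state.conf' file is absent.

    Hypothetical helper; the file naming is inferred from the redis
    error message, not from the Puppet module itself.
    """
    base = Path(replica_dir)
    return [port for port in ports if not (base / f"{port}-state.conf").is_file()]
```

On an affected host, `missing_state_files("/etc/redis/replica", [6379])` would list 6379 until either confd or the Puppet run with discovery disabled has written the file.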

Mentioned in SAL (#wikimedia-releng) [2018-11-21T03:02:11Z] <bd808> Repeated application of role::mediawiki::memcached on deployment-memc04, deployment-memc06, and deployment-memc07 for T210030

bd808 added a comment. Wed, Nov 21, 3:24 AM

After applying role::mediawiki::memcached on deployment-memc05 I added some missing hiera config to the deployment-memc prefix-puppet:
[snip]
This may not turn out to be the correct place at all for this, but I needed to put it somewhere.

It was not the right place because the deployment-mediawiki-* hosts also need this data to configure their /etc/nutcracker/nutcracker.yml settings. I moved the config to the project puppet hiera settings (and fixed the host order):

redis::shards:
  sessions:
    eqiad:
      shard01:
        host: 172.16.5.76
        port: 6379
      shard02:
        host: 172.16.5.17
        port: 6379
      shard03:
        host: 172.16.5.12
        port: 6379
      shard04:
        host: 172.16.5.2
        port: 6379

This unfortunately uses IP addresses rather than hostnames. The redis_get_instances custom puppet function expects to be able to do lookups into this hash by the value of the ipaddress fact.
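
Why hostnames would not work here can be shown with a small model of that lookup. This is not the real redis_get_instances (which is a Puppet function and is not reproduced here); it is a sketch that assumes the function walks the hash and keeps the instances whose host equals the node's own IP. The name `ports_for_ip` is made up.

```python
# Same structure and values as the hiera data above.
SHARDS = {
    "sessions": {
        "eqiad": {
            "shard01": {"host": "172.16.5.76", "port": 6379},
            "shard02": {"host": "172.16.5.17", "port": 6379},
            "shard03": {"host": "172.16.5.12", "port": 6379},
            "shard04": {"host": "172.16.5.2", "port": 6379},
        },
    },
}


def ports_for_ip(shards: dict, ip: str) -> list[int]:
    """Return the sorted redis ports configured for the given IP.

    Assumed behaviour: an exact string comparison against 'host', so an
    entry written as a hostname never matches a node's ipaddress fact.
    """
    ports = set()
    for datacenters in shards.values():
        for instances in datacenters.values():
            for instance in instances.values():
                if instance["host"] == ip:
                    ports.add(instance["port"])
    return sorted(ports)
```

With this model, a node whose ipaddress fact is 172.16.5.17 finds its instance on port 6379, while the same entry written with a hostname in the host field would yield no instances for that node.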

Mentioned in SAL (#wikimedia-releng) [2018-11-21T03:25:22Z] <bd808> Forced puppet run on deployment-mediawiki-0[79] to pick up new redis::shards settings T210030

Change 475038 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] deployment-prep: remove stale redis config

https://gerrit.wikimedia.org/r/475038

bd808 added a comment. Wed, Nov 21, 3:39 AM

The wikis are alive. :)

[03:35]  <shinken-wm>	RECOVERY - English Wikipedia Mobile Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 36936 bytes in 1.825 second response time
[03:37]  <shinken-wm>	RECOVERY - English Wikipedia Main page on beta-cluster is OK: HTTP OK: HTTP/1.1 200 OK - 47950 bytes in 1.321 second response time

Change 475038 merged by Andrew Bogott:
[operations/puppet@production] deployment-prep: remove stale redis config

https://gerrit.wikimedia.org/r/475038

mmodell assigned this task to bd808. Wed, Nov 21, 8:13 PM
mmodell closed this task as Resolved.

Thanks for the help, everyone!

PerfektesChaos reopened this task as Open. Thu, Nov 22, 6:07 PM
PerfektesChaos added a subscriber: PerfektesChaos.

Reopening: Unfortunately not yet fully recovered.

enwiki@BETA is fine for me.

dewiki@BETA has been in READONLY MODE for several days.

  • On attempting page edit: (readonlytext: The database has been automatically locked while the replica database servers catch up to the master.) or (readonlywarning: The database has been automatically locked while the replica database servers catch up to the master.)
  • login / logout is working.
Krenair closed this task as Resolved. Thu, Nov 22, 8:17 PM

That does not seem to be this task @PerfektesChaos

Change 475025 merged by jenkins-bot:
[operations/mediawiki-config@master] deployment-prep: Try changing redis_lock entries to memc hosts

https://gerrit.wikimedia.org/r/475025