Seems memcached / nutcracker is dead on labtest2001. That eventually causes log spam.
Description
Details
Project | Branch | Lines +/- | Subject
---|---|---|---
operations/mediawiki-config | master | +1 -1 | Fix condition for using nutcracker instead of mcrouter on wikitech
Event Timeline
This may be a duplicate of T201082. Even if it is a different issue, it is part of the brokenness of the labtestweb setup.
@jcrespo The error message looks a bit confusing, but it's actually reporting a problem with a memcached server, not a database server. It is reporting that MediaWiki (on labtest2001) is unable to access the Memcached key WANCache:m:global:Wikimedia\Rdbms\LoadBalancer:server-read-only:db2037 from 127.0.0.1:11213 (mcrouter).
This seems like a genuine issue. Which means one of two things:
- a memcached server in codfw that mcrouter is routing to is down.
- mcrouter itself is down on labtest2001.
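A quick way to tell the two cases apart is to check whether anything accepts connections on the mcrouter port at all. The following is only a minimal sketch: the helper name `is_listening` is made up for illustration, and 127.0.0.1:11213 is the address taken from the error message above.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False


if __name__ == "__main__":
    # Address from the error message; add more (host, port) pairs as needed.
    host, port = "127.0.0.1", 11213
    state = "listening" if is_listening(host, port) else "NOT listening"
    print(f"mcrouter ({host}:{port}): {state}")
```

If nothing is listening locally, the second case applies; if the port is open, the problem is more likely further down the chain.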
Magically I have access to the machine!
memcached is running and listening on port 11000
There is a process listening on 127.0.0.1:11212, which is presumably nutcracker. Is that used by openstack_dashboard?
$ grep -R 11212 /etc
/etc/openstack-dashboard/local_settings.py: 'LOCATION' : '127.0.0.1:11212',
/etc/nagios/nrpe.d/check_nutcracker_port.cfg:command[check_nutcracker_port]=/usr/lib/nagios/plugins/check_tcp -H 127.0.0.1 -p 11212 --timeout=2
/etc/nutcracker/nutcracker.yml: listen: 127.0.0.1:11212
In nutcracker, the memcached bucket listens on port 11212 and points to memcached on 11000.
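To verify that each hop in that chain actually answers, one could send a plain `get` over the memcached text protocol to the proxy port and to the backend port and compare the results. This is a rough sketch, not a tool that exists on the host: the function names and the key `probe-key` are hypothetical, and it handles only the single-key `get` reply format.

```python
import socket
from typing import Optional

ERROR_PREFIXES = (b"ERROR", b"SERVER_ERROR", b"CLIENT_ERROR")


def parse_get_response(raw: bytes) -> Optional[bytes]:
    """Parse the reply to a single-key 'get'; return the value or None on a miss."""
    if raw == b"END\r\n":
        return None  # key not found
    header, sep, rest = raw.partition(b"\r\n")
    fields = header.split()
    # Expected header: VALUE <key> <flags> <bytes> [<cas unique>]
    if not sep or len(fields) < 4 or fields[0] != b"VALUE":
        raise ValueError(f"unexpected reply: {raw[:60]!r}")
    size = int(fields[3])
    if rest[size:] != b"\r\nEND\r\n":
        raise ValueError("truncated or malformed value block")
    return rest[:size]


def memcached_get(host: str, port: int, key: str, timeout: float = 2.0) -> Optional[bytes]:
    """Send 'get <key>' to host:port and return the parsed value."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"get " + key.encode("ascii") + b"\r\n")
        buf = b""
        # Read until the END terminator, an error line, or connection close.
        while not buf.endswith(b"END\r\n") and not buf.startswith(ERROR_PREFIXES):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_get_response(buf)


if __name__ == "__main__":
    # 11212 is the nutcracker pool, 11000 is memcached itself (both from this task).
    for port in (11212, 11000):
        try:
            print(port, memcached_get("127.0.0.1", port, "probe-key"))
        except (OSError, ValueError) as exc:
            print(port, "failed:", exc)
```

A miss (`None`) on both ports still shows that both services respond; a connection error on only one of them narrows down which hop is broken.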
MediaWiki has:
$ mwscript shell.php --wiki=labtestwiki
>>> $wgObjectCaches['memcached-pecl']['servers']
=> [
     "/var/run/nutcracker/nutcracker.sock:0",
   ]
Under HHVM we use a socket instead of port 11212.
Then:
>>> $wgObjectCaches['mcrouter']['servers']
=> [
     "127.0.0.1:11213",
   ]
>>>
In puppet:
hieradata/common/mcrouter.yaml:mcrouter::port: 11213
modules/profile/manifests/mediawiki/mcrouter_wancache.pp: Integer $port = hiera('mcrouter::port'),
labtestweb2001.wikimedia.org has puppet roles:
role(wmcs::openstack::labtest::labweb)
include ::role::mariadb::labtestwikitech
From the wmcs::openstack::labtest::labweb role:
include ::profile::openstack::labtest::nutcracker
# Wikitech:
include ::profile::openstack::labtest::wikitech::web
So I guess the role should include one of profile::mediawiki::mcrouter_wancache or role::mediawiki::common. I have not looked at how it is handled for the production wikitech site.
Change 458457 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/mediawiki-config@master] Fix condition for using nutcracker instead of mcrouter on wikitech
The problem is that the labtestweb machines are configured to use labtestwiki, and we configured that wiki to use the global mcrouter instead of its local nutcracker, which doesn't make any sense.
The patch I uploaded fixes the issue. We should *not* install mcrouter here.
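For readers who have not opened the change, the idea can be illustrated roughly as follows. This is not the content of wmf-config/mc.php (which is PHP and is not quoted in this task); it is a hypothetical sketch of the selection logic. The two server strings come from the shell output above, while the set of wikitech database names and the function name are assumptions.

```python
from typing import List

# Server lists as shown by mwscript shell.php earlier in this task.
NUTCRACKER_SERVERS = ["/var/run/nutcracker/nutcracker.sock:0"]
MCROUTER_SERVERS = ["127.0.0.1:11213"]

# Assumption: the wikis served from the wikitech hosts. The bug described above
# would correspond to labtestwiki being missing from a condition like this one.
WIKITECH_DBNAMES = frozenset({"labswiki", "labtestwiki"})


def wancache_servers(dbname: str) -> List[str]:
    """Pick the cache endpoint for a wiki (hypothetical helper)."""
    if dbname in WIKITECH_DBNAMES:
        # Wikitech hosts have a local nutcracker and no mcrouter installed.
        return NUTCRACKER_SERVERS
    return MCROUTER_SERVERS


if __name__ == "__main__":
    for name in ("labtestwiki", "enwiki"):
        print(name, "->", wancache_servers(name))
```

With a condition of this shape, labtestwiki resolves to the nutcracker socket, so no mcrouter is needed on the host.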
Change 458457 merged by jenkins-bot:
[operations/mediawiki-config@master] Fix condition for using nutcracker instead of mcrouter on wikitech
Mentioned in SAL (#wikimedia-operations) [2018-09-06T09:11:15Z] <oblivian@deploy1001> Synchronized wmf-config/mc.php: Fixing memcached configuration for labstestwiki T203479 (duration: 00m 56s)