Seems memcached / nutcracker is dead on labtest2001. That eventually causes log spam.
|operations/mediawiki-config||master||+1 -1||Fix condition for using nutcracker instead of mcrouter on wikitech|
@jcrespo The error message looks a bit confusing, but it's actually reporting a problem with a memcached server, not a database server. It is reporting that MediaWiki (on labtest2001) is unable to access the Memcached key WANCache:m:global:Wikimedia\Rdbms\LoadBalancer:server-read-only:db2037 from 127.0.0.1:11213 (mcrouter).
This seems like a genuine issue, which means one of two things:
- a memcached server in codfw is down, one that mcrouter is routing to.
- mcrouter itself is down on labtest2001.
Magically I have access to the machine!
memcached is running and listening on port 11000
There is a process listening on 127.0.0.1:11212, which is supposedly nutcracker. Is that used by openstack_dashboard?
$ grep -R 11212 /etc
/etc/openstack-dashboard/local_settings.py: 'LOCATION' : '127.0.0.1:11212',
/etc/nagios/nrpe.d/check_nutcracker_port.cfg:command[check_nutcracker_port]=/usr/lib/nagios/plugins/check_tcp -H 127.0.0.1 -p 11212 --timeout=2
/etc/nutcracker/nutcracker.yml: listen: 127.0.0.1:11212
In nutcracker, the memcached bucket listens on port 11212 and points to memcached on 11000.
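To repeat the port checks above without shelling into each service, a minimal sketch along these lines could be used. The ports are the ones reported in this thread (11000 for memcached, 11212 for nutcracker, 11213 for mcrouter). The names `ENDPOINTS`, `is_listening` and `probe` are made up for illustration and are not part of any existing tooling.

```python
import socket

# Ports as reported in this thread; the labels are only for display.
ENDPOINTS = {
    "memcached": ("127.0.0.1", 11000),
    "nutcracker": ("127.0.0.1", 11212),
    "mcrouter": ("127.0.0.1", 11213),
}


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def probe(endpoints):
    """Map each label to whether something accepts connections there."""
    return {name: is_listening(host, port) for name, (host, port) in endpoints.items()}


if __name__ == "__main__":
    for name, up in probe(ENDPOINTS).items():
        print(f"{name}: {'listening' if up else 'not listening'}")
```

This only shows that something accepts TCP connections on a port. It does not confirm which daemon owns the port or that it answers memcached commands correctly.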
$ mwscript shell.php --wiki=labtestwiki
>>> $wgObjectCaches['memcached-pecl']['servers']
=> [
     "/var/run/nutcracker/nutcracker.sock:0",
   ]
Under HHVM we use a socket instead of port 11212.
>>> $wgObjectCaches['mcrouter']['servers']
=> [
     "127.0.0.1:11213",
   ]
>>>
hieradata/common/mcrouter.yaml:mcrouter::port: 11213
modules/profile/manifests/mediawiki/mcrouter_wancache.pp: Integer $port = hiera('mcrouter::port'),
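The two `servers` entries in the shell output take different shapes: a Unix socket path with a trailing `:0`, and a plain `host:port` pair. A rough sketch of telling them apart is below. It assumes that an entry starting with `/` is a socket path and that its port field is ignored; `parse_server` is a hypothetical helper, not a MediaWiki or memcached API.

```python
def parse_server(entry):
    """Split a cache server entry into ("unix", path) or ("tcp", (host, port)).

    Assumption: entries beginning with "/" are Unix socket paths whose
    trailing ":0" carries no meaning; everything else is host:port.
    """
    target, sep, port = entry.rpartition(":")
    if not sep:
        raise ValueError(f"expected 'target:port', got {entry!r}")
    if target.startswith("/"):
        return ("unix", target)
    return ("tcp", (target, int(port)))


if __name__ == "__main__":
    for entry in ("/var/run/nutcracker/nutcracker.sock:0", "127.0.0.1:11213"):
        print(entry, "->", parse_server(entry))
```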
labtestweb2001.wikimedia.org has puppet roles:
role(wmcs::openstack::labtest::labweb)
include ::role::mariadb::labtestwikitech
From the wmcs::openstack::labtest::labweb role:
include ::profile::openstack::labtest::nutcracker
# Wikitech:
include ::profile::openstack::labtest::wikitech::web
So I guess the role should include one of profile::mediawiki::mcrouter_wancache or role::mediawiki::common. I have not looked at how it is handled for the production wikitech site.
The problem is that the labtestweb machines are configured to use labtestwiki, and we configured that wiki to use the global mcrouter rather than the machines' local nutcracker, which doesn't make any sense.
The patch I uploaded fixes the issue. We should *not* install mcrouter here.