
tin doesn't have access to same memcached as terbium and app servers
Closed, Resolved, Public

Description

I noticed that tin does not have access to the same memcached servers/sharding as terbium and the app servers. So the same request may fail against it, which could make debugging from eval.php trickier (since you have to know that tin won't get the right results and that you should go elsewhere).
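To illustrate why the same request can give different results on tin, here is a minimal Python sketch of ketama-style consistent hashing (the `distribution: ketama` setting used by both configs quoted in this task). It is simplified: 160 MD5-derived points per server and an MD5 hash of the key. It also ASSUMES the ring is keyed on a server's optional name (such as "shard07") when one is given and on "ip:port" otherwise; the key names are made up. Because a key's owner depends on all the points on the ring, two hosts with different server lists send many of the same keys to different memcached servers.

```python
import bisect
import hashlib

# Server labels taken from the two configs in this task. ASSUMPTION: the ring is keyed
# on the optional server name ("shard07") when present, otherwise on "ip:port".
TIN_SERVERS = [f"10.64.0.{n}:11211" for n in range(180, 196)]
MW_SERVERS = ([f"10.64.0.{n}:11211" for n in range(180, 186)]
              + [f"shard{n:02d}" for n in range(7, 19)])


def build_ring(servers, points_per_server=160):
    """Simplified ketama ring: sorted point positions plus the server owning each point."""
    points = []
    for label in servers:
        # Each 16-byte MD5 digest is split into four 32-bit little-endian points.
        for i in range(points_per_server // 4):
            digest = hashlib.md5(f"{label}-{i}".encode()).digest()
            for j in range(4):
                points.append((int.from_bytes(digest[4 * j:4 * j + 4], "little"), label))
    points.sort()
    return [p for p, _ in points], [s for _, s in points]


def lookup(ring, key):
    """Return the server owning `key`: the first point at or after the key's hash (wrapping)."""
    positions, owners = ring
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "little")
    return owners[bisect.bisect_left(positions, h) % len(positions)]


def share_same_server(servers_a, servers_b, keys):
    """Fraction of `keys` that the two rings send to the same server."""
    ring_a, ring_b = build_ring(servers_a), build_ring(servers_b)
    same = sum(lookup(ring_a, k) == lookup(ring_b, k) for k in keys)
    return same / len(keys)


if __name__ == "__main__":
    keys = [f"hypothetical:key:{i}" for i in range(10_000)]  # made-up key names
    print(f"tin vs mw*: {share_same_server(TIN_SERVERS, MW_SERVERS, keys):.0%} of keys agree")
```

In this model a key can only resolve the same way on both hosts if both rings place it on one of the six servers the lists share (10.64.0.180 to .185), so most lookups made from tin would not see values written by the app servers.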

Event Timeline

Mattflaschen-WMF set the priority of this task to Needs Triage.
Mattflaschen-WMF updated the task description.

tin config:

memcached:
   auto_eject_hosts: true
   distribution: ketama
   hash: md5
   listen: 127.0.0.1:11212
   preconnect: true
   redis: false
   server_connections: 2
   server_failure_limit: 3
   server_retry_timeout: 30000
   timeout: 250
   servers:
   - 10.64.0.180:11211:1
   - 10.64.0.181:11211:1
   - 10.64.0.182:11211:1
   - 10.64.0.183:11211:1
   - 10.64.0.184:11211:1
   - 10.64.0.185:11211:1
   - 10.64.0.186:11211:1
   - 10.64.0.187:11211:1
   - 10.64.0.188:11211:1
   - 10.64.0.189:11211:1
   - 10.64.0.190:11211:1
   - 10.64.0.191:11211:1
   - 10.64.0.192:11211:1
   - 10.64.0.193:11211:1
   - 10.64.0.194:11211:1
   - 10.64.0.195:11211:1

terbium and mw* config:

mc-unix:
  auto_eject_hosts: true
  distribution: ketama
  hash: md5
  listen: /var/run/nutcracker/nutcracker.sock 0666
  preconnect: true
  server_connections: 2
  server_failure_limit: 3
  servers:
    - 10.64.0.180:11211:1
    - 10.64.0.181:11211:1
    - 10.64.0.182:11211:1
    - 10.64.0.183:11211:1
    - 10.64.0.184:11211:1
    - 10.64.0.185:11211:1
    - 10.64.32.161:11211:1 "shard07"
    - 10.64.32.162:11211:1 "shard08"
    - 10.64.32.163:11211:1 "shard09"
    - 10.64.32.164:11211:1 "shard10"
    - 10.64.32.165:11211:1 "shard11"
    - 10.64.32.166:11211:1 "shard12"
    - 10.64.48.101:11211:1 "shard13"
    - 10.64.48.102:11211:1 "shard14"
    - 10.64.48.103:11211:1 "shard15"
    - 10.64.48.104:11211:1 "shard16"
    - 10.64.48.95:11211:1 "shard17"
    - 10.64.48.96:11211:1 "shard18"
  timeout: 250
memcached:
  auto_eject_hosts: true
  distribution: ketama
  hash: md5
  listen: 127.0.0.1:11212
  preconnect: true
  server_connections: 2
  server_failure_limit: 3
  servers:
    - 10.64.0.180:11211:1
    - 10.64.0.181:11211:1
    - 10.64.0.182:11211:1
    - 10.64.0.183:11211:1
    - 10.64.0.184:11211:1
    - 10.64.0.185:11211:1
    - 10.64.32.161:11211:1 "shard07"
    - 10.64.32.162:11211:1 "shard08"
    - 10.64.32.163:11211:1 "shard09"
    - 10.64.32.164:11211:1 "shard10"
    - 10.64.32.165:11211:1 "shard11"
    - 10.64.32.166:11211:1 "shard12"
    - 10.64.48.101:11211:1 "shard13"
    - 10.64.48.102:11211:1 "shard14"
    - 10.64.48.103:11211:1 "shard15"
    - 10.64.48.104:11211:1 "shard16"
    - 10.64.48.95:11211:1 "shard17"
    - 10.64.48.96:11211:1 "shard18"
  timeout: 250

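Most notably, the server lists aren't the same. Only the first six entries (10.64.0.180 to .185) match: tin lists 16 servers, all on 10.64.0.x, while terbium and the mw* hosts list 18, with shard07 through shard18 on 10.64.32.x and 10.64.48.x.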
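Spotting this kind of drift by eye is error-prone, so here is a minimal Python sketch that diffs the pools and server entries of two nutcracker.yml files. It is a hypothetical helper, not an existing tool: a line-based reader for the layout quoted in this task rather than a general YAML parser, and the two embedded configs are trimmed excerpts for illustration only.

```python
def parse_pools(text):
    """Map each top-level pool name to its list of server entries ("ip:port:weight [name]").

    Minimal line-based reader for the nutcracker.yml layout quoted in this task;
    it is NOT a general YAML parser.
    """
    pools, current, in_servers = {}, None, False
    for raw in text.splitlines():
        line = raw.rstrip()
        stripped = line.strip()
        if not stripped:
            continue
        if not line[0].isspace():  # top-level key such as "memcached:" or "mc-unix:"
            current = stripped.rstrip(":")
            pools[current] = []
            in_servers = False
        elif stripped == "servers:":
            in_servers = True
        elif in_servers and stripped.startswith("- "):
            pools[current].append(stripped[2:].strip())
        else:  # any other key ends the servers list
            in_servers = False
    return pools


def diff_pools(a, b, a_name="a", b_name="b"):
    """List pools present on only one side, then servers missing (-) or extra (+) in b."""
    out = []
    for pool in sorted(set(a) | set(b)):
        if pool not in a or pool not in b:
            out.append(f"{pool}: only on {a_name if pool in a else b_name}")
            continue
        out += [f"{pool}: - {s}" for s in a[pool] if s not in b[pool]]
        out += [f"{pool}: + {s}" for s in b[pool] if s not in a[pool]]
    return out


# Trimmed excerpts of the two configs quoted in this task (illustrative, not the full files).
TIN_YML = """\
memcached:
   distribution: ketama
   servers:
   - 10.64.0.185:11211:1
   - 10.64.0.186:11211:1
"""
MW_YML = """\
mc-unix:
  servers:
    - 10.64.0.185:11211:1
memcached:
  distribution: ketama
  servers:
    - 10.64.0.185:11211:1
    - 10.64.32.161:11211:1 "shard07"
  timeout: 250
"""

if __name__ == "__main__":
    for line in diff_pools(parse_pools(TIN_YML), parse_pools(MW_YML), "tin", "mw"):
        print(line)
    # mc-unix: only on mw
    # memcached: - 10.64.0.186:11211:1
    # memcached: + 10.64.32.161:11211:1 "shard07"
```

Fed the full files (for example, saved from each host with `ssh <host> cat /etc/nutcracker/nutcracker.yml`), it would list every shard that differs between the two hosts.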

The fact that nutcracker is on tin at all doesn't seem intentional. Most likely tin used to get some MediaWiki roles (and thus nutcracker), but that stopped a year ago, as the file timestamps show:

$ ssh tin.eqiad.wmnet ls -la /etc/nutcracker/nutcracker.yml
-r--r--r-- 1 root root 648 Jul 15  2014 /etc/nutcracker/nutcracker.yml
$ ssh mw1010.eqiad.wmnet ls -la /etc/nutcracker/nutcracker.yml
-r--r--r-- 1 root root 1603 Mar 12 18:12 /etc/nutcracker/nutcracker.yml

I think we should remove nutcracker from tin and let people use terbium instead, unless there's a good reason not to.

If tin is not meant to be able to do MW-y things, mwscript should be disabled there. That would solve a lot of the confusion.

I agree that's confusing, though I'm not sure if mwscript (part of scap) is of any real use on tin other than convenience? (cc @bd808 @ori @greg)

mwscript needs to be installed on tin for scripts like updateinterwikicache to work.

During a full scap run, mwscript is needed to execute the rebuildLocalisationCache.php and mergeMessageFileList.php maintenance scripts on tin. It is also needed for the nightly l10nupdate cron jobs which run on tin.

If we have rearranged puppet so that tin no longer gets set up as a full MW host, I think we should fix that.

I think the problem comes from the fact that tin doesn't have role::mediawiki::common, which is where the expected nutcracker pools are configured. Instead, role::deployment::server includes mediawiki (as tin used to have at top level). One solution could be to include role::mediawiki::common for tin at top level, as terbium does. cc @Joe

Change 233751 had a related patch set uploaded (by BryanDavis):
deployment::server: re-puppetize nutcracker config

https://gerrit.wikimedia.org/r/233751

Change 233751 merged by Giuseppe Lavagetto:
deployment::server: re-puppetize nutcracker config

https://gerrit.wikimedia.org/r/233751

Joe claimed this task.