setup replacements for maintenance_server (terbium, wasat) on Stretch
Closed, ResolvedPublic

Description

setup a new maintenance server to replace terbium

use stretch

use the name "mwmaint1001" instead of picking another element name (see T192185#4152332)

Change 430524 merged by Marostegui:
[operations/puppet@production] mariadb: add mwmaint1001 to grants for production-m5

https://gerrit.wikimedia.org/r/430524

hoo added a subscriber: hoo.May 30 2018, 2:37 PM

Change 440070 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] rm mwmaint1001.yaml - activate mariadb::maintenance

https://gerrit.wikimedia.org/r/440070

Change 440070 merged by Dzahn:
[operations/puppet@production] rm mwmaint1001.yaml - activate mariadb::maintenance

https://gerrit.wikimedia.org/r/440070

Change 440099 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mw-maintenance: rsync home dirs from terbium to mwmaint1001

https://gerrit.wikimedia.org/r/440099

Change 440099 merged by Dzahn:
[operations/puppet@production] mw-maintenance: rsync home dirs from terbium to mwmaint1001

https://gerrit.wikimedia.org/r/440099

Change 440139 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mw-maintenance: require GNU time from time package

https://gerrit.wikimedia.org/r/440139

Change 440139 merged by Dzahn:
[operations/puppet@production] mw-maintenance: require GNU time from time package

https://gerrit.wikimedia.org/r/440139

Change 440142 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/puppet@production] mediawiki: Stop Wikidata dispatching

https://gerrit.wikimedia.org/r/440142

Mentioned in SAL (#wikimedia-operations) [2018-06-13T16:14:21Z] <mutante> rsyncing /home dirs from terbium to mwmaint1001, they will appear later in a subdir "home-terbium" like it was done for tin->deploy1001 (T192092)
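
For readers unfamiliar with this kind of copy, here is a minimal sketch of what syncing the old home directories into a separate subdirectory could look like. The actual copy was set up through Puppet (change 440099), so everything below is an assumption for illustration: the function name `home_sync_cmd`, the use of rsync over SSH, and the destination path passed in by the caller. The function only prints the command and does not run it.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an rsync command that copies the old
# host's /home into a separate directory, so existing homes on the new host
# are not overwritten.
home_sync_cmd() {
    old_host="$1"   # server being replaced
    dest_dir="$2"   # target directory chosen by the caller (an assumption here)
    # -a preserves owners, modes and timestamps. The trailing slash on the
    # source copies the contents of /home rather than the directory itself.
    # ${dest_dir%/} drops one trailing slash so the output is normalised.
    printf 'rsync -a %s:/home/ %s/\n' "$old_host" "${dest_dir%/}"
}
```

Calling `home_sync_cmd terbium.eqiad.wmnet /home/home-terbium` prints the command for review; the destination path in that call is a guess based on the "home-terbium" subdirectory name mentioned above, not a confirmed location.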

Change 440267 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] cache::misc: switch backend for dbtree from terbium to mwmaint1001

https://gerrit.wikimedia.org/r/440267

Change 440268 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] tendril: add grants for tendril_web from mwmaint1001

https://gerrit.wikimedia.org/r/440268

Change 440268 merged by Dzahn:
[operations/puppet@production] tendril: add grants for tendril_web from mwmaint1001

https://gerrit.wikimedia.org/r/440268

Change 440267 merged by Dzahn:
[operations/puppet@production] cache::misc: switch backend for dbtree from terbium to mwmaint1001

https://gerrit.wikimedia.org/r/440267

Mentioned in SAL (#wikimedia-operations) [2018-06-14T08:08:54Z] <mutante> switch backend for dbtree.wikimedia.org away from terbium to mwmaint1001 (T192092)

Change 440328 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mw-maintenance: run wikidata maint jobs on old and new server

https://gerrit.wikimedia.org/r/440328

Change 440142 abandoned by Ladsgroup:
mediawiki: Stop Wikidata dispatching

Reason:
In favor of Daniel's patch

https://gerrit.wikimedia.org/r/440142

Change 440328 merged by Dzahn:
[operations/puppet@production] mw-maintenance: switch only wikidata maint jobs to mwmaint1001

https://gerrit.wikimedia.org/r/440328

Mentioned in SAL (#wikimedia-operations) [2018-06-14T14:40:09Z] <mutante> moving wikidata query dispatcher from terbium to mwmaint1001 - scheduled downtime - check turned into a WARN - disabling puppet on mwmaint1001, removing crons on terbium, waiting a couple minutes for them to finish, re-enabling puppet on mwmaint1001 (T192092)
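
The ordering in this log entry matters: the new host must not install the crons before the old host has dropped them, otherwise both would dispatch at once. A dry-run sketch of that order follows. It is not the set of commands that was actually run. The `run` helper, the `DRY_RUN` variable, the use of plain `ssh`, and the fixed 120-second wait (standing in for "a couple minutes") are all assumptions. It also assumes the Puppet change that moves the crons has already been merged.

```shell
#!/bin/sh
# Hypothetical sketch of the cron hand-over order. Defaults to dry-run.
DRY_RUN="${DRY_RUN:-1}"

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

move_crons() {
    old_host="$1"
    new_host="$2"
    # 1. New host: block puppet so the crons are not installed yet.
    run ssh "$new_host" puppet agent --disable T192092
    # 2. Old host: apply the merged change, which removes the crons.
    #    (--test exits with 2 when changes were applied; that is not a failure.)
    run ssh "$old_host" puppet agent --test
    # 3. Give jobs that are already running time to finish.
    run sleep 120
    # 4. New host: allow puppet again; its next run installs the crons.
    run ssh "$new_host" puppet agent --enable
}
```

With the default `DRY_RUN=1`, `move_crons terbium.eqiad.wmnet mwmaint1001.eqiad.wmnet` only prints the four steps in order, which is a cheap way to review the sequence before doing anything on the hosts.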

Change 440542 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] noc/dbtree: require libapache-mod-php

https://gerrit.wikimedia.org/r/440542

Change 440542 merged by Dzahn:
[operations/puppet@production] noc/dbtree: require libapache-mod-php

https://gerrit.wikimedia.org/r/440542

Change 430527 merged by Dzahn:
[operations/puppet@production] cache::misc: switch noc.wm backend to mwmaint1001

https://gerrit.wikimedia.org/r/430527

Mentioned in SAL (#wikimedia-operations) [2018-06-15T15:37:35Z] <mutante> switching noc.wikimedia.org site from terbium to mwmaint1001 backend, running puppet on all cache::misc cp servers (T192092)

Dzahn removed Dzahn as the assignee of this task.Jun 21 2018, 8:13 AM

Unassigning this ticket from me temporarily while I'm on vacation. I will take it back once I return, but I also want to make clear that it's up for grabs by anyone while I'm gone; if you want to and can continue on it, that would be appreciated.

Dzahn added a comment.Jun 21 2018, 8:14 AM

status: wikidata related crons are moved, other mw crons are still to be moved (by switching maintenance server with https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/441346)

Change 441346 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] switch mw_maintenance server to mwmaint1001

https://gerrit.wikimedia.org/r/441346

Dzahn added a comment.Jun 21 2018, 8:17 AM

other pending changes, mostly to decom terbium once switch is complete:

https://gerrit.wikimedia.org/r/#/q/topic:terbium+(status:open)

Change 441381 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mw_maintenace: remove temp change for wikidata crons

https://gerrit.wikimedia.org/r/441381

Change 443792 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] mw-maintenance: switch to mwmaint1001

https://gerrit.wikimedia.org/r/443792

Change 443801 had a related patch set uploaded (by Giuseppe Lavagetto; owner: Giuseppe Lavagetto):
[operations/puppet@production] terbium: Add a decommission notice.

https://gerrit.wikimedia.org/r/443801

Change 443792 merged by Giuseppe Lavagetto:
[operations/puppet@production] mw-maintenance: switch to mwmaint1001.eqiad.wmnet

https://gerrit.wikimedia.org/r/443792

Mentioned in SAL (#wikimedia-operations) [2018-07-04T09:44:31Z] <_joe_> stopping all cronjobs via a puppet run on terbium, T192092

Change 443801 merged by Giuseppe Lavagetto:
[operations/puppet@production] terbium: Add a decommission notice.

https://gerrit.wikimedia.org/r/443801

Krinkle renamed this task from setup replacement for terbium (maintenance_server) on stretch to setup replacements for maintenance_server (terbium, wasat) on Stretch.Jul 6 2018, 11:53 PM

Change 431039 abandoned by Muehlenhoff:
switch mw-maintenance server from terbium to mwmaint1001

Reason:
Superseded/replaced by a033370fbcd

https://gerrit.wikimedia.org/r/431039

Change 441346 abandoned by Muehlenhoff:
switch mw_maintenance server to mwmaint1001

Reason:
Superseded/replaced by a033370fdcb

https://gerrit.wikimedia.org/r/441346

Change 441381 abandoned by Muehlenhoff:
mw_maintenace: remove temp change for wikidata crons

Reason:
Replaced/superseded by a033370fdcb

https://gerrit.wikimedia.org/r/441381

Change 445118 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Remove terbium from allowed hosts/ferm rules

https://gerrit.wikimedia.org/r/445118

Change 445118 merged by Muehlenhoff:
[operations/puppet@production] Remove terbium from allowed hosts/ferm rules

https://gerrit.wikimedia.org/r/445118

Change 445149 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Reimage wasat with stretch and rename to mwmaint2001

https://gerrit.wikimedia.org/r/445149

Change 445149 merged by Muehlenhoff:
[operations/puppet@production] Reimage wasat with stretch and rename to mwmaint2001

https://gerrit.wikimedia.org/r/445149

Change 430530 abandoned by Muehlenhoff:
tcpircbot: remove terbium from ferm rules

Reason:
Obsoleted by 1e4e64dc67

https://gerrit.wikimedia.org/r/430530

Change 445421 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Update grants for terbium->mwmaint1001 migration and wasat rename

https://gerrit.wikimedia.org/r/445421

I have seen this error (one in the last 8 hours):

cli_argv    /srv/mediawiki/multiversion/MWScript.php maintenance/cleanupUploadStash.php --wiki=labtestwiki
db_name     labtestwiki
db_server   10.64.16.79
db_user     wikiadmin
error       Access denied for user 'wikiadmin'@'%' to database 'labtestwiki'
host        mwmaint1001
level       ERROR
message     Error connecting to 10.64.16.79: Access denied for user 'wikiadmin'@'%' to database 'labtestwiki'

So I assume this will be fixed once https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/445421/ is applied, right?

Change 445423 had a related patch set uploaded (by Muehlenhoff; owner: Muehlenhoff):
[operations/puppet@production] Decommission terbium

https://gerrit.wikimedia.org/r/445423

Change 445421 merged by Muehlenhoff:
[operations/puppet@production] Update grants for terbium->mwmaint1001 migration and wasat rename

https://gerrit.wikimedia.org/r/445421

There is an undocumented grant from californium.wikimedia.org to striker (@bd808). I will delete it if it is not puppetized, and I will create a separate ticket if this is off-topic here.

Let's wait for confirmation by Bryan, but californium is up for decom (replaced by the labweb* hosts), so 99.9% sure this can go away.

I have created T199518.

No more grants on m5 referencing 10.64.32.13 (terbium):

$ ./software/dbtools/section m5 | while read host port; do mysql.py -BN -h$host:$port -e "select user, host from mysql.user WHERE host='10.64.32.13';"; done
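
The same check can be repeated for any other decommissioned address. A minimal sketch follows, assuming the stock `mysql` client instead of the internal `section` and `mysql.py` wrappers used above. The function names are made up for illustration, and the input is expected as "host port" pairs on stdin, as in the loop above. It prints the commands rather than running them, and it does not validate the address argument.

```shell
#!/bin/sh
# Hypothetical helpers; names and the use of the plain mysql client are assumptions.

# Build the query that lists accounts bound to one client address.
grants_query() {
    printf "SELECT user, host FROM mysql.user WHERE host='%s';" "$1"
}

# Read "host port" pairs on stdin and print one mysql invocation per instance.
# -B gives tab-separated batch output, -N suppresses the column header.
print_grant_checks() {
    ip="$1"
    while read -r host port; do
        printf 'mysql -BN -h %s -P %s -e "%s"\n' \
            "$host" "$port" "$(grants_query "$ip")"
    done
}
```

An empty result from every instance in the section is what "no more grants" means here; any row returned is an account that still references the old address.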

Change 445597 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbtree: move dbtree outside of mwmaint hosts

https://gerrit.wikimedia.org/r/445597

Dzahn added a comment.Jul 25 2018, 6:35 PM

> There is an undocumented grant from californium.wikimedia.org to striker @bd808
> No more grants on m5 referencing 10.64.32.13 (terbium):

Can we now merge https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/431042/ ?

Change 431042 abandoned by Dzahn:
mariadb: remove grants for terbium (do not merge)

https://gerrit.wikimedia.org/r/431042

Change 448545 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] terbium: disable cross-validate-accounts cron job spam

https://gerrit.wikimedia.org/r/448545

Change 448545 merged by Dzahn:
[operations/puppet@production] terbium: disable cross-validate-accounts cron job spam

https://gerrit.wikimedia.org/r/448545

Change 448594 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mediawiki::maintenance: stop including php5 packages on jessie

https://gerrit.wikimedia.org/r/448594

Change 448594 merged by Dzahn:
[operations/puppet@production] mediawiki::maintenance: stop including php5 packages on jessie

https://gerrit.wikimedia.org/r/448594

Change 448608 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] mw::maint: allow home dir rsync between mwmaint, not terbium

https://gerrit.wikimedia.org/r/448608

Change 448608 merged by Dzahn:
[operations/puppet@production] mw::maint: allow home dir rsync between mwmaint, not terbium

https://gerrit.wikimedia.org/r/448608

Mentioned in SAL (#wikimedia-operations) [2018-07-27T18:15:47Z] <mutante> syncing home dirs from mwmaint1001 to mwmaint2001 (once, manually, not currently set to auto-sync, but to keep another copy of former terbium homes in case we failover) (T192092)

Change 448617 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] scap: remove terbium from dsh groups

https://gerrit.wikimedia.org/r/448617

Change 448617 merged by Dzahn:
[operations/puppet@production] scap: remove terbium from dsh groups

https://gerrit.wikimedia.org/r/448617

Change 448625 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] network::constants: remove terbium.eqiad.wmnet

https://gerrit.wikimedia.org/r/448625

Change 448625 merged by Dzahn:
[operations/puppet@production] network::constants: remove terbium.eqiad.wmnet

https://gerrit.wikimedia.org/r/448625

Mentioned in SAL (#wikimedia-operations) [2018-07-27T19:04:53Z] <mutante> terbium is being removed from ferm rules on elasticsearch/relforge, logstash/collector, mariadb/labtestwikitech and mw-maintenance itself (T192092)

Change 431041 abandoned by Dzahn:
decom terbium: rm from scap,site,dhcp,network constants

Reason:
done in several other changes or WIP

https://gerrit.wikimedia.org/r/431041

Change 431041 restored by Dzahn:
decom terbium: rm from scap,site,dhcp,network constants

https://gerrit.wikimedia.org/r/431041

Change 431041 abandoned by Dzahn:
decom terbium: rm from scap,site,dhcp,network constants

https://gerrit.wikimedia.org/r/431041

Change 448816 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] switch terbium to a spare system

https://gerrit.wikimedia.org/r/448816

Dzahn claimed this task.Jul 28 2018, 12:04 AM
Peachey88 updated the task description.Jul 28 2018, 12:28 AM

Change 445597 merged by Jcrespo:
[operations/puppet@production] dbtree: move dbtree outside of mwmaint hosts

https://gerrit.wikimedia.org/r/445597

Change 449136 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] mariadb: Fix mwdeploy user on tendril/dbtree

https://gerrit.wikimedia.org/r/449136

Change 449136 merged by Jcrespo:
[operations/puppet@production] mariadb: Fix mwdeploy user on tendril/dbtree

https://gerrit.wikimedia.org/r/449136

Change 449142 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/puppet@production] dbtree: Make dbtree work again on debmonitor active host

https://gerrit.wikimedia.org/r/449142

Change 449142 merged by Jcrespo:
[operations/puppet@production] dbtree: Make dbtree work again on dbmonitor1001

https://gerrit.wikimedia.org/r/449142

Change 445423 merged by Muehlenhoff:
[operations/puppet@production] Decommission terbium

https://gerrit.wikimedia.org/r/445423

Change 448816 abandoned by Dzahn:
switch terbium to a spare system

Reason:
already done by Moritz

https://gerrit.wikimedia.org/r/448816

MoritzMuehlenhoff closed this task as Resolved.Jul 31 2018, 8:31 AM

Replacements using stretch are up and running. The decom task for terbium is T200763; closing this task.