Implement parsercache service on pc[12]0(07|08|09|10) and replace leased pc[12]00[456]
Open, Stalled, High, Public

Description

Parsercaches on codfw hw install: T207259
Parsercaches on eqiad hw install: T207258

Four new parsercache hosts per main datacenter are due to be put into service to replace the current leased hosts. The pc2* hosts are ready to be implemented, and the pc1* hosts will be soon. Set up puppet, monitoring and replication, and provision the new hosts so they are ready to replace the older pc1004, pc1005, pc1006 and pc2004, pc2005, pc2006.

While rearchitecting would be nice (T133523), it is out of scope because of the hard deadline (except for minor, trivial changes that can be done easily). A 1:1 replacement is the more likely outcome until more time is available.

Thanks to the additional disk space available, the emergency measures taken at T167784 can now be reverted (https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/361656/).

Change 473153 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool pc2004

https://gerrit.wikimedia.org/r/473153

Change 473153 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool pc2004

https://gerrit.wikimedia.org/r/473153

Mentioned in SAL (#wikimedia-operations) [2018-11-13T05:42:57Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Depool pc2004 - T208383 (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2018-11-13T05:43:27Z] <marostegui> Stop MySQL on pc2004 to transfer its data to pc2007 - T208383

Change 473176 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] pc2007: Enable notifications

https://gerrit.wikimedia.org/r/473176

Change 473194 had a related patch set uploaded (by Banyek; owner: Banyek):
[operations/puppet@production] mariadb: parsercache different basedirs

https://gerrit.wikimedia.org/r/473194

Change 473176 merged by Marostegui:
[operations/puppet@production] pc2007: Enable notifications

https://gerrit.wikimedia.org/r/473176

Change 473206 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Pool pc2007 in pc1

https://gerrit.wikimedia.org/r/473206

Change 473206 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Pool pc2007 in pc1

https://gerrit.wikimedia.org/r/473206

Marostegui updated the task description. Nov 13 2018, 1:59 PM

pc2007 is now in production replacing pc2004

Marostegui updated the task description. Nov 13 2018, 2:00 PM
Marostegui claimed this task.

Assigning this to myself to reflect the current status

Change 473194 merged by Banyek:
[operations/puppet@production] mariadb: parsercache different basedirs

https://gerrit.wikimedia.org/r/473194

Change 473332 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool pc2005

https://gerrit.wikimedia.org/r/473332

Change 473333 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] pc2010: Enable notifications

https://gerrit.wikimedia.org/r/473333

Change 473333 merged by Marostegui:
[operations/puppet@production] pc2010: Enable notifications

https://gerrit.wikimedia.org/r/473333

Change 473332 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool pc2005

https://gerrit.wikimedia.org/r/473332

Mentioned in SAL (#wikimedia-operations) [2018-11-14T06:27:33Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Depool pc2005 - T208383 (duration: 01m 04s)

Mentioned in SAL (#wikimedia-operations) [2018-11-14T06:31:59Z] <marostegui> Stop MySQL on pc2005 to clone it to pc2008 - T208383

Marostegui moved this task from Next to In progress on the DBA board. Nov 14 2018, 8:50 AM

Change 473444 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Add pc2010 as spare

https://gerrit.wikimedia.org/r/473444

Change 473444 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Add pc2010 as spare

https://gerrit.wikimedia.org/r/473444

Mentioned in SAL (#wikimedia-operations) [2018-11-14T13:55:51Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Add pc2010 as spare - T208383 (duration: 00m 53s)

I have added pc2010 as a spare host with the following line in db-codfw.php. We can change it later if we want it set up differently, but at least it is there for now:
https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/473444/

$wmgParserCacheDBs = [

<snip>
    '10.64.48.128' => '10.192.48.39',  # pc2006, D5 2.4TB 256GB
    # 'spare' => '10.192.48.14',  # pc2010, D3 4.4TB 256GB # spare host. Use it to replace any of the above if needed
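
A minimal sketch, not the change actually made in production: if pc2006 ever had to be taken out of service, the commented-out spare line could be promoted by keeping pc2006's key and pointing it at pc2010's address. This assumes the array key identifies the cache slot (so keeping it stable avoids reshuffling which cache entries land on which host) and that the value is the address of the host actually serving that slot; both are inferred from the snippet above, not confirmed.

```php
$wmgParserCacheDBs = [
    // ... other entries unchanged ...
    // Hypothetical: pc2006 taken out of rotation; its slot key now points at the former spare.
    # '10.64.48.128' => '10.192.48.39',  # pc2006, D5 2.4TB 256GB
    '10.64.48.128' => '10.192.48.14',  # pc2010, D3 4.4TB 256GB (formerly the spare)
];
```

Keeping the key and swapping only the value is the design choice that makes the spare a drop-in replacement: only the host behind one slot changes.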

Marostegui updated the task description. Nov 14 2018, 2:59 PM
Marostegui updated the task description. Nov 14 2018, 3:04 PM

Change 473662 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] pc2008.yaml: Enable notifications

https://gerrit.wikimedia.org/r/473662

Change 473665 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Add pc2008 to pc2

https://gerrit.wikimedia.org/r/473665

Change 473662 merged by Marostegui:
[operations/puppet@production] pc2008.yaml: Enable notifications

https://gerrit.wikimedia.org/r/473662

Change 473665 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Add pc2008 to pc2

https://gerrit.wikimedia.org/r/473665

Mentioned in SAL (#wikimedia-operations) [2018-11-15T05:59:07Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Pool pc2008 in pc2 - T208383 (duration: 00m 56s)

pc2008 is now online for pc2.

Marostegui updated the task description. Nov 15 2018, 5:59 AM

Change 473666 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Depool pc2006

https://gerrit.wikimedia.org/r/473666

Change 473666 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Depool pc2006

https://gerrit.wikimedia.org/r/473666

Mentioned in SAL (#wikimedia-operations) [2018-11-15T06:05:03Z] <marostegui@deploy1001> sync-file aborted: Dool pc2006 - T208383 (duration: 00m 00s)

Mentioned in SAL (#wikimedia-operations) [2018-11-15T06:06:04Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Depool pc2006 - T208383 (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2018-11-15T06:06:31Z] <marostegui> Stop MySQL on pc2006 to clone pc2009 - T208383

Change 473732 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] site.pp: Adjust regex for parsercache

https://gerrit.wikimedia.org/r/473732

Change 473732 merged by Marostegui:
[operations/puppet@production] site.pp: Adjust regex for parsercache

https://gerrit.wikimedia.org/r/473732

Change 474073 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-codfw.php: Pool pc2009 on pc3

https://gerrit.wikimedia.org/r/474073

Change 474074 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] pc2009.yaml: Enable notifications

https://gerrit.wikimedia.org/r/474074

Change 474074 merged by Marostegui:
[operations/puppet@production] pc2009.yaml: Enable notifications

https://gerrit.wikimedia.org/r/474074

Change 474073 merged by jenkins-bot:
[operations/mediawiki-config@master] db-codfw.php: Pool pc2009 on pc3

https://gerrit.wikimedia.org/r/474073

Mentioned in SAL (#wikimedia-operations) [2018-11-16T06:36:43Z] <marostegui@deploy1001> Synchronized wmf-config/db-codfw.php: Pool pc2009 in pc3 - T208383 (duration: 00m 56s)

pc2009 has been pooled in.

I am going to let the weekend go by before starting the decommissioning process for pc2004, pc2005 and pc2006, to make sure everything is OK hardware-wise.

Marostegui updated the task description. Nov 16 2018, 6:37 AM
RobH added a subscriber: RobH. Mon, Nov 26, 7:33 PM

Please note that pc1008, pc1009 and pc1010 are ready for the DBA team to take over. The OS is installed and they are running role:spare via T207258.

Change 475951 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] mariadb: Provision pc1007-pc1010

https://gerrit.wikimedia.org/r/475951

Change 475951 merged by Marostegui:
[operations/puppet@production] mariadb: Provision pc1007-pc1010

https://gerrit.wikimedia.org/r/475951

Change 475971 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool pc1004

https://gerrit.wikimedia.org/r/475971

Change 475971 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool pc1004

https://gerrit.wikimedia.org/r/475971

Mentioned in SAL (#wikimedia-operations) [2018-11-27T09:33:40Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Depool pc1004 - T208383 (duration: 00m 46s)

Mentioned in SAL (#wikimedia-operations) [2018-11-27T09:35:19Z] <marostegui> Stop MySQL on pc1004 to clone pc1010 - T208383

Mentioned in SAL (#wikimedia-operations) [2018-11-27T13:34:38Z] <marostegui> Change pc2007 and pc2010 to replicate from pc1010 instead of from pc1004 - T208383

Change 476014 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] pc1010: Enable notifications

https://gerrit.wikimedia.org/r/476014

Change 476015 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Pool pc1010 in pc1

https://gerrit.wikimedia.org/r/476015

Change 476014 merged by Marostegui:
[operations/puppet@production] pc1010: Enable notifications

https://gerrit.wikimedia.org/r/476014

Change 476015 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Pool pc1010 in pc1

https://gerrit.wikimedia.org/r/476015

Mentioned in SAL (#wikimedia-operations) [2018-11-27T13:59:05Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Pool pc1010 in pc1 - T208383 (duration: 00m 46s)

pc1010 has been pooled into pc1 - T208383#4777571

Marostegui updated the task description. Tue, Nov 27, 2:10 PM

Change 476200 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool pc1005

https://gerrit.wikimedia.org/r/476200

Change 476200 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool pc1005

https://gerrit.wikimedia.org/r/476200

Mentioned in SAL (#wikimedia-operations) [2018-11-28T06:13:45Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Depool pc1005 - T208383 (duration: 01m 04s)

Mentioned in SAL (#wikimedia-operations) [2018-11-28T06:14:20Z] <marostegui> Stop MySQL on pc1005 to clone pc1008 - T208383

Change 476222 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Pool pc1008 in pc2

https://gerrit.wikimedia.org/r/476222

Mentioned in SAL (#wikimedia-operations) [2018-11-28T09:51:05Z] <marostegui> Change pc2008 to replicate from pc1008 instead of pc1005 - T208383

Mentioned in SAL (#wikimedia-operations) [2018-11-28T09:55:56Z] <marostegui> Update tendril topology for pc1 - T208383

Change 476225 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] pc1008: Enable notifications

https://gerrit.wikimedia.org/r/476225

Change 476225 merged by Marostegui:
[operations/puppet@production] pc1008: Enable notifications

https://gerrit.wikimedia.org/r/476225

Change 476222 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Pool pc1008 in pc2

https://gerrit.wikimedia.org/r/476222

Mentioned in SAL (#wikimedia-operations) [2018-11-28T10:26:45Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Pool pc1008 - T208383 (duration: 00m 50s)

pc1008 has been pooled in pc2 - T208383#4780657

Marostegui updated the task description. Wed, Nov 28, 10:30 AM

Change 476234 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] install_server: Only allow reimage pc1007

https://gerrit.wikimedia.org/r/476234

Change 476234 merged by Marostegui:
[operations/puppet@production] install_server: Only allow reimage pc1007

https://gerrit.wikimedia.org/r/476234

Change 476454 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool pc1006

https://gerrit.wikimedia.org/r/476454

Change 476454 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool pc1006

https://gerrit.wikimedia.org/r/476454

Mentioned in SAL (#wikimedia-operations) [2018-11-29T06:59:29Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Depool pc1006 - T208383 (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2018-11-29T06:59:42Z] <marostegui> Stop MySQL on pc1006 to clone pc1009 - T208383

Change 476469 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] pc1009: Enable notifications

https://gerrit.wikimedia.org/r/476469

Marostegui updated the task description. Thu, Nov 29, 10:45 AM

Change 476469 merged by Marostegui:
[operations/puppet@production] pc1009: Enable notifications

https://gerrit.wikimedia.org/r/476469

Change 476495 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Pool pc1009

https://gerrit.wikimedia.org/r/476495

Mentioned in SAL (#wikimedia-operations) [2018-11-29T13:05:26Z] <marostegui> Upgrade pc3 tendril topology - T208383

Change 476495 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Pool pc1009

https://gerrit.wikimedia.org/r/476495

Mentioned in SAL (#wikimedia-operations) [2018-11-29T13:12:27Z] <marostegui@deploy1001> Synchronized wmf-config/db-eqiad.php: Pool pc1009 in pc3 - T208383 (duration: 00m 53s)

pc1009 is now serving on pc3

Marostegui updated the task description. Thu, Nov 29, 3:27 PM
Marostegui changed the task status from Open to Stalled. Fri, Nov 30, 8:50 AM

I am stalling this until pc1007, which arrived broken, is fixed (T207258#4774505).
Once it is fixed we should:

  • Clone pc1007
  • Make it the master for pc1 and set pc1010 as the spare (for consistency with the codfw setup)
  • Move replication on pc2007 and pc2010 from pc1010 to pc1007

Banyek moved this task from In progress to FYI on the User-Banyek board. Fri, Dec 7, 3:18 PM