move 20 new codfw parsoid servers (parse2*) into production
Closed, ResolvedPublic

Description

parse2001 through parse2020 have been racked in the parent task.

This is the ticket to get them into production.

  • service implementer changes status from 'staged' to 'active' in netbox

Event Timeline

Change 579026 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] add new codfw parsoid servers into production

https://gerrit.wikimedia.org/r/579026

Dzahn triaged this task as Medium priority.Jul 8 2020, 12:07 AM

Change 626496 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add parsoid role to parsoid2020.codfw.wmnet

https://gerrit.wikimedia.org/r/626496

Change 626496 merged by Dzahn:
[operations/puppet@production] site: add parsoid role to parsoid2020.codfw.wmnet

https://gerrit.wikimedia.org/r/626496

Change 626521 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add parsoid role to all new parsoid hardware

https://gerrit.wikimedia.org/r/626521

Mentioned in SAL (#wikimedia-operations) [2020-09-11T00:38:46Z] <mutante> generating mcrouter certs for parse2001 - parse2019 - mcrouter_generate_certs on puppetmaster1001 (T247441)

Change 626523 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[labs/private@master] add fake keys for new parsoid hosts

https://gerrit.wikimedia.org/r/626523

Change 626523 merged by Dzahn:
[labs/private@master] add fake keys for new parsoid hosts

https://gerrit.wikimedia.org/r/626523

Change 626521 merged by Dzahn:
[operations/puppet@production] site: add parsoid role to all new parsoid hardware

https://gerrit.wikimedia.org/r/626521

Mentioned in SAL (#wikimedia-operations) [2020-09-11T01:33:20Z] <mutante> initial puppet runs on parse2001 - parse2010, staggered, not in production yet, new hardware, setup WIP (T247441)

Mentioned in SAL (#wikimedia-operations) [2020-09-11T01:53:31Z] <mutante> initial puppet runs on parse2010 - parse2020, staggered, not in production yet, new hardware, setup WIP (T247441)

Change 626719 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/mediawiki-config@master] add new parse* servers to $wgLinterSubmitterWhitelist

https://gerrit.wikimedia.org/r/626719

Change 626721 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] add new parsoid servers to conftool-data

https://gerrit.wikimedia.org/r/626721

Dzahn raised the priority of this task from Medium to High.Sep 11 2020, 5:31 PM

The DSH group alert fired on Saturday; I acked it :)

Thank you. I thought I had set 48-hour downtimes on everything, but evidently that failed somehow.

Change 626721 merged by Dzahn:
[operations/puppet@production] add new parsoid servers to conftool-data

https://gerrit.wikimedia.org/r/626721

Mentioned in SAL (#wikimedia-operations) [2020-09-14T16:36:05Z] <mutante> pooled the first of the new parsoid servers - parse2001 (T247441)

Mentioned in SAL (#wikimedia-operations) [2020-09-14T17:51:01Z] <mutante> all new parse* parsoid hardware pooled now and set to active in netbox, deploy in 10 min will add to $wgLinterSubmitterWhitelist (T247441)

Change 626719 merged by jenkins-bot:
[operations/mediawiki-config@master] add new parse* servers to $wgLinterSubmitterWhitelist

https://gerrit.wikimedia.org/r/626719

Mentioned in SAL (#wikimedia-operations) [2020-09-14T18:21:21Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: 27ba5a1da1fb00e721cfa82dd4cd1fbac2541053: add new parse* servers to $wgLinterSubmitterWhitelist (T247441) (duration: 00m 56s)

  • servers had OS installed
  • servers had puppet role applied
  • icinga checks confirmed all green
  • added to conftool data
  • set weight for all to 10
  • changed pooled state from 'inactive' to 'no' so the servers were added to the pybal config
  • pooled first one, then 10, then all
  • set to active in netbox
  • added to $wgLinterSubmitterWhitelist in MediaWiki config and deployed to prod
  • confirmed in grafana new servers are getting traffic, CPU/memory load going down on old servers
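
The staged pooling in the checklist above (first one server, then 10, then all) can be sketched roughly as follows. This is only an illustration, not the tooling that was actually used: the function names `host_names` and `pooling_stages` are hypothetical, and it assumes "then 10" means a cumulative total of 10 pooled hosts rather than 10 additional ones. It only computes which hosts belong to each stage; the actual pooling command is deliberately left out.

```python
from typing import Iterator, List


def host_names(prefix: str = "parse", first: int = 2001, last: int = 2020) -> List[str]:
    """Return short hostnames parse2001 .. parse2020 (range taken from the task description)."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]


def pooling_stages(hosts: List[str], cumulative_sizes: List[int]) -> Iterator[List[str]]:
    """Yield the hosts that are newly pooled at each stage.

    cumulative_sizes is the total number of pooled hosts after each stage
    (an assumption about how "first one, then 10, then all" was meant).
    """
    done = 0
    for size in cumulative_sizes:
        size = min(size, len(hosts))
        if size > done:
            yield hosts[done:size]
            done = size


if __name__ == "__main__":
    hosts = host_names()
    stages = pooling_stages(hosts, [1, 10, len(hosts)])
    for i, batch in enumerate(stages, start=1):
        # A real rollout would pool the batch here and check monitoring
        # (e.g. icinga, grafana) before moving on to the next stage.
        print(f"stage {i}: pool {len(batch)} host(s): {batch[0]} .. {batch[-1]}")
```

Pooling a single host first limits the impact if the new hardware misbehaves; the later, larger batches are only worth starting once that host is confirmed to be serving traffic normally.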

Change 579026 abandoned by Dzahn:
[operations/puppet@production] move 20 new codfw parsoid servers into production

Reason:
duplicate - already done in other patches

https://gerrit.wikimedia.org/r/579026

There was a request to decom the wtp2* servers. There is a new ticket for that at T265558.