
migrate maps servers to stretch with the current style
Closed, Resolved, Public

Description

Since the new maps style is delayed, we need to start working on the migration to stretch with the current style.

Migration plan:

  • prepare puppet change to isolate maps2004 from current cluster
  • reduce cassandra replication to allow running on a 3-node cluster
  • depool maps2004
  • install new disks in maps2004
  • reimage maps2004 as stretch, with new partition scheme
  • initial OSM import on maps2004
  • regenerate tiles on maps2004
  • wait for tile generation (probably a long, long time)
  • validate that everything looks good
  • depool maps2003
  • reimage maps2003 as a maps slave
  • increase cassandra replication on the new cluster
  • validate
  • depool maps200[12], pool maps200[34]
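The replication steps in the plan can be illustrated with a little quorum arithmetic. This is a simplified model, not something taken from the task: it assumes requests use Cassandra's QUORUM consistency level (floor(RF/2) + 1 replicas) and that each replica sits on a distinct node. The replication factors in `SCENARIOS` are hypothetical, since the task does not state the real values.

```python
def quorum(rf: int) -> int:
    """Replica acknowledgements a QUORUM request needs for replication factor rf."""
    return rf // 2 + 1


def placed_replicas(nodes: int, rf: int) -> int:
    """Replicas that can actually exist when each one must sit on a distinct node."""
    return min(nodes, rf)


def tolerated_failures(nodes: int, rf: int) -> int:
    """Nodes that can be down while QUORUM requests still succeed (never negative)."""
    return max(placed_replicas(nodes, rf) - quorum(rf), 0)


# Assumed, illustrative states; the task does not give the real replication factors.
SCENARIOS = [
    ("old cluster, maps200[1-4]", 4, 4),
    ("maps2004 isolated, replication unchanged", 3, 4),
    ("maps2004 isolated, replication reduced", 3, 3),
]

if __name__ == "__main__":
    for label, nodes, rf in SCENARIOS:
        print(f"{label}: nodes={nodes} rf={rf} quorum={quorum(rf)} "
              f"tolerates={tolerated_failures(nodes, rf)} node(s) down")
```

Under these assumptions, isolating one node without lowering replication leaves no room for a further failure, while reducing the replication factor to match the three remaining nodes restores one node of tolerance.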

Notes:

  • Since this upgrade does not bring a new style, tiles generated by the new and old installs should be equivalent, so serving tiles from both stretch and jessie, whether from the same DC or from different DCs, should not be an issue.

Details

Related Gerrit Patches:
operations/puppet (production): maps: migrate maps2004 to stretch
operations/puppet (production): maps: check OSM replication lag on all nodes in codfw
operations/puppet (production): maps: maps2001 is now a slave after migrating to stretch
operations/puppet (production): migrate maps2001 to stretch
operations/puppet (production): maps migrate maps2002 to stretch
maps/kartotherian/deploy (master): Insert maps2003 into stretch environment
operations/puppet (production): maps: migrate maps2003 to stretch
operations/puppet (production): maps: migrate maps2004 to stretch
operations/puppet (production): maps: re-enable OSM lag check
operations/puppet (production): maps: migrate maps1001 to stretch
operations/puppet (production): maps: migrate maps1002 to stretch
operations/puppet (production): maps: increase wal_size for postgres 9.6 on stretch
operations/puppet (production): kartotherian: install nodejs-legacy module
operations/puppet (production): maps: migrate maps1003 to stretch
operations/puppet (production): maps: change cassandra version
operations/puppet (production): maps: migrate maps1004 to stretch
operations/puppet (production): maps: migrate maps1004 to stretch

Event Timeline


Now that the Beta Cluster is back in good working order with both Jessie (deployment-maps03) and Stretch (deployment-maps04) instances, I think we're ready to go forward with this. @Gehel, how soon do you think this could happen?

@MSantos is currently working on finishing the dependency updates to use Mapnik 3.7.2 (T188674) and on merging the 'stretch' branches of the -package and -deploy repos back into master. That means our next production deployment will be after the move to Stretch.

All that remains to be done for T172090 is to decom the maps-test cluster. That can be done anytime. Should it wait until after the migration to stretch?

Mholloway moved this task from In progress to To-do on the Maps-Sprint board on Aug 2 2018, 7:37 PM.
Gehel updated the task description on Aug 20 2018, 12:13 PM.

Change 457408 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: migrate maps2004 to stretch

https://gerrit.wikimedia.org/r/457408

Change 459535 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: migrate maps1004 to stretch

https://gerrit.wikimedia.org/r/459535

MSantos moved this task from To-do to In progress on the Maps-Sprint board on Sep 12 2018, 5:11 PM.
Mholloway assigned this task to Gehel on Sep 24 2018, 5:53 AM.

Change 459535 merged by Gehel:
[operations/puppet@production] maps: migrate maps1004 to stretch

https://gerrit.wikimedia.org/r/459535

Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps1004.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201809251159_gehel_3263.log.

Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps1004.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201809251159_gehel_3447.log.

Change 462702 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: migrate maps1004 to stretch

https://gerrit.wikimedia.org/r/462702

Change 462702 merged by Gehel:
[operations/puppet@production] maps: migrate maps1004 to stretch

https://gerrit.wikimedia.org/r/462702

Script wmf-auto-reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps1004.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201809251330_gehel_24458.log.

Completed auto-reimage of hosts:

['maps1004.eqiad.wmnet']

and were ALL successful.

Change 483798 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] maps: migrate maps1003 to stretch

https://gerrit.wikimedia.org/r/483798

Mentioned in SAL (#wikimedia-operations) [2019-01-17T13:38:28Z] <gehel> starting upgrade to stretch for maps1003 - T198622

Change 483798 merged by Gehel:
[operations/puppet@production] maps: migrate maps1003 to stretch

https://gerrit.wikimedia.org/r/483798

Mentioned in SAL (#wikimedia-operations) [2019-01-17T14:01:11Z] <gehel> pooling maps1004 (first time after stretch upgrade) - T198622

Script wmf-auto-reimage was launched by gehel on cumin1001.eqiad.wmnet for hosts:

['maps1003.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901171418_gehel_162898.log.

Completed auto-reimage of hosts:

['maps1003.eqiad.wmnet']

and were ALL successful.

Change 485072 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] maps: change cassandra version

https://gerrit.wikimedia.org/r/485072

Change 485072 merged by Gehel:
[operations/puppet@production] maps: change cassandra version

https://gerrit.wikimedia.org/r/485072

Change 485164 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] kartotherian: install nodejs-legacy module

https://gerrit.wikimedia.org/r/485164

Change 485164 merged by Gehel:
[operations/puppet@production] kartotherian: install nodejs-legacy module

https://gerrit.wikimedia.org/r/485164

Change 485192 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] maps: increase wal_size for postgres 9.6 on stretch

https://gerrit.wikimedia.org/r/485192

Change 485192 merged by Gehel:
[operations/puppet@production] maps: increase wal_size for postgres 9.6 on stretch

https://gerrit.wikimedia.org/r/485192

Mentioned in SAL (#wikimedia-operations) [2019-01-19T12:34:43Z] <onimisionipe> pool maps1003 - stretch migration is complete T198622

Change 485584 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] maps: migrate maps1002 to stretch

https://gerrit.wikimedia.org/r/485584

Mentioned in SAL (#wikimedia-operations) [2019-01-22T09:55:04Z] <gehel> repooling maps1003 after upgrade to stretch - T198622

Mentioned in SAL (#wikimedia-operations) [2019-01-22T12:39:58Z] <gehel> start stretch upgrade for maps1002 - T198622

Change 485584 merged by Gehel:
[operations/puppet@production] maps: migrate maps1002 to stretch

https://gerrit.wikimedia.org/r/485584

Script wmf-auto-reimage was launched by gehel on cumin1001.eqiad.wmnet for hosts:

['maps1002.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901221306_gehel_241795.log.

Completed auto-reimage of hosts:

['maps1002.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-01-22T14:45:48Z] <onimisionipe> starting init of postgres replication on maps1002 - T198622

Change 486062 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] maps: migrate maps1001 to stretch

https://gerrit.wikimedia.org/r/486062

Mentioned in SAL (#wikimedia-operations) [2019-01-24T10:37:23Z] <gehel> starting stretch upgrade on maps1001 - T198622

Change 486062 merged by Gehel:
[operations/puppet@production] maps: migrate maps1001 to stretch

https://gerrit.wikimedia.org/r/486062

Script wmf-auto-reimage was launched by gehel on cumin1001.eqiad.wmnet for hosts:

['maps1001.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201901241047_gehel_58889.log.

Completed auto-reimage of hosts:

['maps1001.eqiad.wmnet']

and were ALL successful.

Mentioned in SAL (#wikimedia-operations) [2019-01-24T18:37:14Z] <onimisionipe> pooling maps1003 - stretch migration is complete. T198622

Change 486436 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: re-enable OSM lag check

https://gerrit.wikimedia.org/r/486436

Change 486436 merged by Gehel:
[operations/puppet@production] maps: re-enable OSM lag check

https://gerrit.wikimedia.org/r/486436

Change 487360 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] maps: migrate maps2004 to stretch

https://gerrit.wikimedia.org/r/487360

Mentioned in SAL (#wikimedia-operations) [2019-02-07T15:55:05Z] <gehel> starting reimage of maps2004 - T198622

Change 487360 merged by Gehel:
[operations/puppet@production] maps: migrate maps2004 to stretch

https://gerrit.wikimedia.org/r/487360

Script wmf-auto-reimage was launched by gehel on cumin2001.codfw.wmnet for hosts:

['maps2004.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201902071606_gehel_23617.log.

Mentioned in SAL (#wikimedia-operations) [2019-02-08T13:39:53Z] <onimisionipe> starting osm-initial-import for maps2004 which is the newly migrated to stretch master - T198622

Change 491191 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] maps: migrate maps2003 to stretch

https://gerrit.wikimedia.org/r/491191

Change 491191 had a related patch set uploaded (by Gehel; owner: Mathew.onipe):
[operations/puppet@production] maps: migrate maps2003 to stretch

https://gerrit.wikimedia.org/r/491191

Change 491191 merged by Gehel:
[operations/puppet@production] maps: migrate maps2003 to stretch

https://gerrit.wikimedia.org/r/491191

Script wmf-auto-reimage was launched by gehel on cumin2001.codfw.wmnet for hosts:

['maps2003.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201904090925_gehel_30748.log.

Change 502514 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[maps/kartotherian/deploy@master] Insert maps2003 into stretch environment

https://gerrit.wikimedia.org/r/502514

Mentioned in SAL (#wikimedia-operations) [2019-04-10T07:12:21Z] <onimisionipe> depooling maps200[34] to increase cassandra replication factor - T198622

Change 502514 merged by Gehel:
[maps/kartotherian/deploy@master] Insert maps2003 into stretch environment

https://gerrit.wikimedia.org/r/502514

Change 502768 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] maps migrate maps2002 to stretch

https://gerrit.wikimedia.org/r/502768

Change 502768 merged by Gehel:
[operations/puppet@production] maps migrate maps2002 to stretch

https://gerrit.wikimedia.org/r/502768

Script wmf-auto-reimage was launched by gehel on cumin2001.codfw.wmnet for hosts:

['maps2002.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201904110941_gehel_28422.log.

Change 502977 had a related patch set uploaded (by Mathew.onipe; owner: Mathew.onipe):
[operations/puppet@production] migrate maps2001 to stretch

https://gerrit.wikimedia.org/r/502977

Completed auto-reimage of hosts:

['maps2002.codfw.wmnet']

Of which those FAILED:

['maps2002.codfw.wmnet']

Change 502977 merged by Gehel:
[operations/puppet@production] migrate maps2001 to stretch

https://gerrit.wikimedia.org/r/502977

Change 502988 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: maps2001 is now a slave after migrating to stretch

https://gerrit.wikimedia.org/r/502988

Change 502988 merged by Gehel:
[operations/puppet@production] maps: maps2001 is now a slave after migrating to stretch

https://gerrit.wikimedia.org/r/502988

Script wmf-auto-reimage was launched by gehel on cumin2001.codfw.wmnet for hosts:

['maps2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201904111300_gehel_10390.log.

Script wmf-auto-reimage was launched by gehel on cumin2001.codfw.wmnet for hosts:

['maps2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201904111308_gehel_11769.log.

This task is now complete and the lessons learnt have been documented here: https://wikitech.wikimedia.org/wiki/Maps-migration

Mathew.onipe closed this task as Resolved on Apr 12 2019, 10:00 AM.

Change 503932 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] maps: check OSM replication lag on all nodes in codfw

https://gerrit.wikimedia.org/r/503932

Change 503932 merged by Gehel:
[operations/puppet@production] maps: check OSM replication lag on all nodes in codfw

https://gerrit.wikimedia.org/r/503932

Change 457408 abandoned by Gehel:
maps: migrate maps2004 to stretch

Reason:
already done!

https://gerrit.wikimedia.org/r/457408