reimage maps-test* servers
Closed, Resolved, Public

Description

Disks have been filled on maps-test* servers (see T146848). This most probably resulted in data corruption for Cassandra and/or PostgreSQL. These servers need to be reimaged anyway to align them with the other maps servers and with the improved automation that has been put in place on those.

Gehel created this task. Oct 3 2016, 1:05 PM
Restricted Application added a subscriber: Aklapper. Oct 3 2016, 1:05 PM

Mentioned in SAL (#wikimedia-operations) [2016-10-03T13:09:17Z] <gehel> shutting down services on maps-test* servers prior to reimage -T147194

Mentioned in SAL (#wikimedia-operations) [2016-10-03T13:13:41Z] <gehel> reimage of maps-test2001 - T147194

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610031410_gehel_29303.log.

Completed auto-reimage of hosts:

['maps-test2001.codfw.wmnet']

Those hosts were successful:

['maps-test2001.codfw.wmnet']

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2002.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610051449_gehel_18275.log.

Gehel moved this task from Backlog to In progress on the Maps-Sprint board. Oct 5 2016, 2:57 PM

Completed auto-reimage of hosts:

['maps-test2002.codfw.wmnet']

Those hosts were successful:

['maps-test2002.codfw.wmnet']

Gehel triaged this task as High priority. Oct 5 2016, 8:43 PM

Mentioned in SAL (#wikimedia-operations) [2016-10-07T08:53:48Z] <gehel> reimaging maps-test2003 - T147194

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2003.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610070855_gehel_10452.log.

Completed auto-reimage of hosts:

['maps-test2003.codfw.wmnet']

Those hosts were successful:

[]

Mentioned in SAL (#wikimedia-operations) [2016-10-07T09:20:08Z] <gehel> reimaging maps-test2004 - T147194

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2004.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610070920_gehel_14871.log.

Completed auto-reimage of hosts:

['maps-test2004.codfw.wmnet']

Those hosts were successful:

[]

Mentioned in SAL (#wikimedia-operations) [2016-10-07T16:25:04Z] <gehel> reimage maps-test2001 - T147194

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610071625_gehel_13919.log.

Completed auto-reimage of hosts:

['maps-test2001.codfw.wmnet']

Those hosts were successful:

['maps-test2001.codfw.wmnet']

Mentioned in SAL (#wikimedia-operations) [2016-10-10T13:12:26Z] <gehel> reimage maps-test2002 - T147194

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2002.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610101311_gehel_19322.log.

Completed auto-reimage of hosts:

['maps-test2002.codfw.wmnet']

Those hosts were successful:

[]

Mentioned in SAL (#wikimedia-operations) [2016-10-10T14:03:31Z] <gehel> reimage maps-test200[34] - T147194

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2003.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610101403_gehel_27871.log.

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2004.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610101403_gehel_27863.log.

Completed auto-reimage of hosts:

['maps-test2004.codfw.wmnet']

Those hosts were successful:

[]

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2004.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610101416_gehel_29045.log.

Completed auto-reimage of hosts:

['maps-test2003.codfw.wmnet']

Those hosts were successful:

['maps-test2003.codfw.wmnet']

Completed auto-reimage of hosts:

['maps-test2004.codfw.wmnet']

Those hosts were successful:

['maps-test2004.codfw.wmnet']

Change 315271 had a related patch set uploaded (by Gehel):
Maps - cleanup postgres user creation

https://gerrit.wikimedia.org/r/315271

Change 315959 had a related patch set uploaded (by Gehel):
maps - adding dummy passwords for postgresql monitoring and replication

https://gerrit.wikimedia.org/r/315959

Change 315959 merged by Gehel:
maps - adding dummy passwords for postgresql monitoring and replication

https://gerrit.wikimedia.org/r/315959

Change 316536 had a related patch set uploaded (by Gehel):
maps - change structure of slaves according to https://gerrit.wikimedia.org/r/#/c/315271/

https://gerrit.wikimedia.org/r/316536

Change 316536 merged by Gehel:
maps - change structure of slaves according to https://gerrit.wikimedia.org/r/#/c/315271/

https://gerrit.wikimedia.org/r/316536

Change 316549 had a related patch set uploaded (by Gehel):
maps - adding dummy monitoring password for postgresql

https://gerrit.wikimedia.org/r/316549

Change 316549 merged by Gehel:
maps - adding dummy monitoring password for postgresql

https://gerrit.wikimedia.org/r/316549

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610211601_gehel_23846.log.

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2001.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610211601_gehel_24043.log.

Completed auto-reimage of hosts:

['maps-test2001.codfw.wmnet']

Of which those FAILED:

set(['maps-test2001.codfw.wmnet'])

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2002.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610211642_gehel_529.log.

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2003.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610211642_gehel_662.log.

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2004.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610211642_gehel_723.log.

Completed auto-reimage of hosts:

['maps-test2002.codfw.wmnet']

Of which those FAILED:

set(['maps-test2002.codfw.wmnet'])

Completed auto-reimage of hosts:

['maps-test2003.codfw.wmnet']

Of which those FAILED:

set(['maps-test2003.codfw.wmnet'])

Completed auto-reimage of hosts:

['maps-test2004.codfw.wmnet']

Of which those FAILED:

set(['maps-test2004.codfw.wmnet'])

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2002.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610211709_gehel_7440.log.

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2003.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610211713_gehel_8323.log.

Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:

['maps-test2004.codfw.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201610211720_gehel_9640.log.

Completed auto-reimage of hosts:

['maps-test2003.codfw.wmnet']

Of which those FAILED:

set(['maps-test2003.codfw.wmnet'])

Completed auto-reimage of hosts:

['maps-test2002.codfw.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['maps-test2004.codfw.wmnet']

Of which those FAILED:

set(['maps-test2004.codfw.wmnet'])

Change 315271 merged by Gehel:
Maps - cleanup postgres user creation

https://gerrit.wikimedia.org/r/315271

Change 318282 had a related patch set uploaded (by Gehel):
maps / postgresql: new configuration format for slaves

https://gerrit.wikimedia.org/r/318282

Change 318282 merged by Gehel:
maps / postgresql: new configuration format for slaves

https://gerrit.wikimedia.org/r/318282

Change 318283 had a related patch set uploaded (by Gehel):
maps / postgresql: corrected hiera key for replication password

https://gerrit.wikimedia.org/r/318283

Change 318283 merged by Gehel:
maps / postgresql: corrected hiera key for replication password

https://gerrit.wikimedia.org/r/318283

Change 318286 had a related patch set uploaded (by Gehel):
maps / postgresql: corrected hiera key for replication password

https://gerrit.wikimedia.org/r/318286

Change 318286 merged by Gehel:
maps / postgresql: corrected hiera key for replication password

https://gerrit.wikimedia.org/r/318286

Mentioned in SAL (#wikimedia-operations) [2016-10-27T13:33:39Z] <gehel> postgres replication checks in error after deployment of https://gerrit.wikimedia.org/r/#/c/315271/ (T147194) - replication is working, only check is failing - icinga is silenced

Mentioned in SAL (#wikimedia-operations) [2016-10-27T13:34:05Z] <gehel> maps / postgres replication checks in error after deployment of https://gerrit.wikimedia.org/r/#/c/315271/ (T147194) - replication is working, only check is failing - icinga is silenced

Change 318511 had a related patch set uploaded (by Gehel):
maps / postgresql: use replication user for monitoring

https://gerrit.wikimedia.org/r/318511

Change 318511 merged by Gehel:
maps / postgresql: use replication user for monitoring

https://gerrit.wikimedia.org/r/318511

Gehel closed this task as Resolved. Oct 31 2016, 1:41 PM

The re-image is complete. Initial tile generation is in progress and working fine, but we are going to switch it to Cassandra. I am marking this task as resolved.