Disks filled up on the maps-test* servers (see T146848). This most probably resulted in data corruption for Cassandra and/or PostgreSQL. Those servers need to be reimaged anyway to align them with the other maps servers and the improved automation that has been put in place on those.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | RKemper | T137939 Increase frequency of OSM replication
Resolved | | Gehel | T147194 reimage maps-test* servers
Resolved | | Gehel | T148031 Maps - error when doing initial tiles generation: "Error: could not create converter for SQL_ASCII"
Resolved | | Gehel | T148114 Maps-test was created with incorrect initial encoding
Event Timeline
Mentioned in SAL (#wikimedia-operations) [2016-10-03T13:09:17Z] <gehel> shutting down services on maps-test* servers prior to reimage -T147194
Mentioned in SAL (#wikimedia-operations) [2016-10-03T13:13:41Z] <gehel> reimage of maps-test2001 - T147194
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2001.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610031410_gehel_29303.log.
Completed auto-reimage of hosts:
['maps-test2001.codfw.wmnet']
Those hosts were successful:
['maps-test2001.codfw.wmnet']
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2002.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610051449_gehel_18275.log.
Completed auto-reimage of hosts:
['maps-test2002.codfw.wmnet']
Those hosts were successful:
['maps-test2002.codfw.wmnet']
Mentioned in SAL (#wikimedia-operations) [2016-10-07T08:53:48Z] <gehel> reimaging maps-test2003 - T147194
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2003.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610070855_gehel_10452.log.
Completed auto-reimage of hosts:
['maps-test2003.codfw.wmnet']
Those hosts were successful:
[]
Mentioned in SAL (#wikimedia-operations) [2016-10-07T09:20:08Z] <gehel> reimaging maps-test2004 - T147194
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2004.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610070920_gehel_14871.log.
Completed auto-reimage of hosts:
['maps-test2004.codfw.wmnet']
Those hosts were successful:
[]
Mentioned in SAL (#wikimedia-operations) [2016-10-07T16:25:04Z] <gehel> reimage maps-test2001 - T147194
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2001.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610071625_gehel_13919.log.
Completed auto-reimage of hosts:
['maps-test2001.codfw.wmnet']
Those hosts were successful:
['maps-test2001.codfw.wmnet']
Mentioned in SAL (#wikimedia-operations) [2016-10-10T13:12:26Z] <gehel> reimage maps-test2002 - T147194
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2002.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610101311_gehel_19322.log.
Completed auto-reimage of hosts:
['maps-test2002.codfw.wmnet']
Those hosts were successful:
[]
Mentioned in SAL (#wikimedia-operations) [2016-10-10T14:03:31Z] <gehel> reimage maps-test200[34] - T147194
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2003.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610101403_gehel_27871.log.
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2004.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610101403_gehel_27863.log.
Completed auto-reimage of hosts:
['maps-test2004.codfw.wmnet']
Those hosts were successful:
[]
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2004.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610101416_gehel_29045.log.
Completed auto-reimage of hosts:
['maps-test2003.codfw.wmnet']
Those hosts were successful:
['maps-test2003.codfw.wmnet']
Completed auto-reimage of hosts:
['maps-test2004.codfw.wmnet']
Those hosts were successful:
['maps-test2004.codfw.wmnet']
Change 315271 had a related patch set uploaded (by Gehel):
Maps - cleanup postgres user creation
Change 315959 had a related patch set uploaded (by Gehel):
maps - adding dummy passwords for postgresql monitoring and replication
Change 315959 merged by Gehel:
maps - adding dummy passwords for postgresql monitoring and replication
Change 316536 had a related patch set uploaded (by Gehel):
maps - change structure of slaves according to https://gerrit.wikimedia.org/r/#/c/315271/
Change 316536 merged by Gehel:
maps - change structure of slaves according to https://gerrit.wikimedia.org/r/#/c/315271/
Change 316549 had a related patch set uploaded (by Gehel):
maps - adding dummy monitoring password for postgresql
Change 316549 merged by Gehel:
maps - adding dummy monitoring password for postgresql
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2001.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610211601_gehel_23846.log.
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2001.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610211601_gehel_24043.log.
Completed auto-reimage of hosts:
['maps-test2001.codfw.wmnet']
Of which those FAILED:
set(['maps-test2001.codfw.wmnet'])
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2002.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610211642_gehel_529.log.
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2003.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610211642_gehel_662.log.
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2004.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610211642_gehel_723.log.
Completed auto-reimage of hosts:
['maps-test2002.codfw.wmnet']
Of which those FAILED:
set(['maps-test2002.codfw.wmnet'])
Completed auto-reimage of hosts:
['maps-test2003.codfw.wmnet']
Of which those FAILED:
set(['maps-test2003.codfw.wmnet'])
Completed auto-reimage of hosts:
['maps-test2004.codfw.wmnet']
Of which those FAILED:
set(['maps-test2004.codfw.wmnet'])
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2002.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610211709_gehel_7440.log.
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2003.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610211713_gehel_8323.log.
Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts:
['maps-test2004.codfw.wmnet']
The log can be found in /var/log/wmf-auto-reimage/201610211720_gehel_9640.log.
Completed auto-reimage of hosts:
['maps-test2003.codfw.wmnet']
Of which those FAILED:
set(['maps-test2003.codfw.wmnet'])
Completed auto-reimage of hosts:
['maps-test2002.codfw.wmnet']
and were ALL successful.
Completed auto-reimage of hosts:
['maps-test2004.codfw.wmnet']
Of which those FAILED:
set(['maps-test2004.codfw.wmnet'])
Change 318282 had a related patch set uploaded (by Gehel):
maps / postgresql: new configuration format for slaves
Change 318282 merged by Gehel:
maps / postgresql: new configuration format for slaves
Change 318283 had a related patch set uploaded (by Gehel):
maps / postgresql: corrected hiera key for replication password
Change 318283 merged by Gehel:
maps / postgresql: corrected hiera key for replication password
Change 318286 had a related patch set uploaded (by Gehel):
maps / postgresql: corrected hiera key for replication password
Change 318286 merged by Gehel:
maps / postgresql: corrected hiera key for replication password
Mentioned in SAL (#wikimedia-operations) [2016-10-27T13:33:39Z] <gehel> postgres replication checks in error after deployment of https://gerrit.wikimedia.org/r/#/c/315271/ (T147194) - replication is working, only check is failing - icinga is silenced
Mentioned in SAL (#wikimedia-operations) [2016-10-27T13:34:05Z] <gehel> maps / postgres replication checks in error after deployment of https://gerrit.wikimedia.org/r/#/c/315271/ (T147194) - replication is working, only check is failing - icinga is silenced
Change 318511 had a related patch set uploaded (by Gehel):
maps / postgresql: use replication user for monitoring
Change 318511 merged by Gehel:
maps / postgresql: use replication user for monitoring
The reimage is complete. Initial tile generation is in progress and working fine, but we are going to switch it to Cassandra. I am marking this task as resolved already.