
Make partman/custom/no-srv-format.cfg work
Closed, Resolved · Public

Description

Currently the partman/custom/no-srv-format.cfg recipe is deliberately broken, to prevent accidental reinstalls of db hosts. See T251392 and T251416 for context.

We should instead make it work and do what it describes: leave /srv untouched. That would remove the need to connect to the console and manually step through the partitioning installer.
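The intended behaviour (reformat the OS partitions, leave /srv and its data alone) can be modelled as a small planning step. This is a conceptual sketch in Python, not partman recipe syntax; the `Partition` type, the `plan_reimage` helper, the device names and the assumption that /srv is the only preserved mountpoint are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical model of "reuse the existing layout, but never touch /srv".
# Device names and mountpoints are illustrative, not taken from a real host.
@dataclass(frozen=True)
class Partition:
    device: str      # e.g. an md array or an LVM logical volume path
    mountpoint: str  # where the installer will mount it


# Mountpoints whose contents must survive a reimage (assumption: only /srv).
PRESERVE = frozenset({"/srv"})


def plan_reimage(partitions):
    """Return (to_format, to_keep): reformat OS partitions, keep preserved ones as-is."""
    to_format = [p for p in partitions if p.mountpoint not in PRESERVE]
    to_keep = [p for p in partitions if p.mountpoint in PRESERVE]
    return to_format, to_keep


if __name__ == "__main__":
    layout = [
        Partition("/dev/md0", "/"),
        Partition("/dev/mapper/vg-srv", "/srv"),
    ]
    fmt, keep = plan_reimage(layout)
    print("format:", [p.mountpoint for p in fmt])
    print("keep:", [p.mountpoint for p in keep])
```

The key design point is that the decision is made per mountpoint rather than per disk, so the existing layout is reused and only the data-bearing volume is exempted from formatting.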

Event Timeline

Kormat created this task. May 4 2020, 1:48 PM
Restricted Application added a subscriber: Aklapper. May 4 2020, 1:48 PM
Kormat moved this task from Triage to Next on the DBA board. May 4 2020, 1:53 PM

N.B.: the case in netboot.cfg must be changed in the same CR that fixes no-srv-format.cfg, as otherwise all db hosts will reimage by default.
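netboot.cfg picks a partman recipe by matching the hostname against patterns in a shell `case` statement, so every host matching a pattern shares one recipe. Changing what that recipe does therefore changes the install behaviour of all those hosts at once, which is presumably why the recipe fix and the netboot.cfg change have to land together. A minimal Python model of first-match selection follows; the pattern table, the recipe names other than no-srv-format.cfg, and the fallback are hypothetical and not the real netboot.cfg contents.

```python
from fnmatch import fnmatchcase

# Hypothetical hostname-pattern -> partman recipe table, modelled on the way a
# shell `case` statement selects a recipe. Patterns, the dbprov recipe name and
# the fallback are illustrative assumptions, not the real netboot.cfg contents.
RECIPES = [
    ("dbprov*", "partman/custom/hypothetical-dbprov.cfg"),  # more specific first
    ("db*", "partman/custom/no-srv-format.cfg"),
    ("es*", "partman/custom/no-srv-format.cfg"),
]
FALLBACK = "partman/hypothetical-default.cfg"


def recipe_for(fqdn: str) -> str:
    """Return the recipe for the first pattern matching the short hostname.

    Like shell `case`, evaluation stops at the first match, so more specific
    patterns must come before broader ones.
    """
    short = fqdn.split(".", 1)[0]
    for pattern, recipe in RECIPES:
        if fnmatchcase(short, pattern):
            return recipe
    return FALLBACK


if __name__ == "__main__":
    for host in ("db1077.eqiad.wmnet", "es1025.eqiad.wmnet", "dbprov1001.eqiad.wmnet"):
        print(host, "->", recipe_for(host))
```

Ordering matters here: if `db*` were listed before `dbprov*`, the dbprov hosts would silently pick up the generic db recipe.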

Change 594494 had a related patch set uploaded (by Kormat; owner: Kormat):
[operations/puppet@production] install_server: switch d-i-test to buster

https://gerrit.wikimedia.org/r/594494

Change 594494 merged by Kormat:
[operations/puppet@production] install_server: switch d-i-test to buster

https://gerrit.wikimedia.org/r/594494

Change 594679 had a related patch set uploaded (by Kormat; owner: Kormat):
[operations/puppet@production] install_server: Add no-srv-format-testing.cfg

https://gerrit.wikimedia.org/r/594679

Change 594679 merged by Kormat:
[operations/puppet@production] install_server: Add no-srv-format-testing.cfg

https://gerrit.wikimedia.org/r/594679

Change 594683 had a related patch set uploaded (by Kormat; owner: Kormat):
[operations/puppet@production] install_server: no-srv-format-testing.cfg v2

https://gerrit.wikimedia.org/r/594683

Change 594683 merged by Kormat:
[operations/puppet@production] install_server: no-srv-format-testing.cfg v2

https://gerrit.wikimedia.org/r/594683

Kormat changed the task status from Open to Stalled. (Edited) May 6 2020, 3:31 PM

Blocked by T252027

I have the recipe dumpsdata100X-no-data-format.cfg, which does less than it should (but at least doesn't format the array). I'd love a fully functional solution.

Change 601761 had a related patch set uploaded (by Kormat; owner: Kormat):
[operations/puppet@production] install_server: Allow reuse of partitions during reimage. [WIP]

https://gerrit.wikimedia.org/r/601761

Kormat added a comment. Jun 3 2020, 7:21 AM

@ArielGlenn: I had a look at your use case and made a more general solution that should cover your needs as well: https://gerrit.wikimedia.org/r/c/operations/puppet/+/601761

This can be closed when we change the default recipe for db hosts to reuse-parts (T252027: debian-installer: partman doesn't allow lvm LVs to be reused when reimaging), once we've been using it for another week or so.


I am planning one last test of this on an es host; if it goes fine, I think we can change puppet and consider this a victory. All the other hosts have worked fine.

Marostegui changed the task status from Stalled to Open. Thu, Jun 18, 9:31 AM

@Kormat I think we can proceed with this, es1025 and es2022 reimages worked fine.

Change 606401 had a related patch set uploaded (by Kormat; owner: Kormat):
[operations/puppet@production] install_server: Use reuse-db.cfg by default for db machines.

https://gerrit.wikimedia.org/r/606401

https://gerrit.wikimedia.org/r/606401 is out to make reuse-parts the default for most of the db fleet. We still need recipes for the tendril, zarcillo and dbprov hosts, however.

Change 606401 merged by Kormat:
[operations/puppet@production] install_server: Use reuse-db.cfg by default for db machines.

https://gerrit.wikimedia.org/r/606401

Mentioned in SAL (#wikimedia-operations) [2020-06-18T12:04:51Z] <kormat> reimaging db1077 for final test T251768

Script wmf-auto-reimage was launched by kormat on cumin1001.eqiad.wmnet for hosts:

['db1077.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202006181208_kormat_184185.log.

Completed auto-reimage of hosts:

['db1077.eqiad.wmnet']

and were ALL successful.

Kormat closed this task as Resolved. Thu, Jun 18, 12:37 PM
Kormat claimed this task.

This is done :)

Change 608306 had a related patch set uploaded (by Kormat; owner: Kormat):
[operations/puppet@production] install_server: Remove no-srv-format.cfg

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608306

Change 608306 merged by Kormat:
[operations/puppet@production] install_server: Remove no-srv-format.cfg

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608306