Automate rebuild and rebuild toolsbeta-sgewebgrid-generic-0901
Closed, Resolved (Public)

Description

fsck of root volume failed, system is inaccessible.

Checking in progress on 1 disk (0.0% complete)
Checking in progress on 0 disks (100.0% complete)
[FAILED] Failed to start File System Check o[...]4970c-1b27-4544-b346-adbfe686feb0.

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2021-10-06T10:05:27Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-06T10:07:58Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-06T10:08:53Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-06T10:13:23Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-06T10:36:40Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Change 726894 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/cookbooks@wmcs] toolforge: new add_grid_webgrid_generic_node recipe

https://gerrit.wikimedia.org/r/726894

Mentioned in SAL (#wikimedia-cloud) [2021-10-07T07:58:03Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-07T08:04:44Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-07T12:50:10Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-07T12:50:46Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-07T12:55:09Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-07T13:31:42Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-07T14:06:33Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-07T14:21:08Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Change 729926 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] cinderutils::ensure: Check falsey instead of empty string

https://gerrit.wikimedia.org/r/729926

Mentioned in SAL (#wikimedia-cloud) [2021-10-11T10:32:00Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Change 729990 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/software/spicerack@master] puppet.PuppetHost.get_ca_server: use only the last line

https://gerrit.wikimedia.org/r/729990
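
The patch title suggests that the command output can carry extra lines (for example warnings) ahead of the value that is actually wanted. A minimal sketch of the "use only the last line" idea, assuming plain text output; the helper name and the sample output are hypothetical and are not taken from spicerack itself:

```python
def last_nonempty_line(output: str) -> str:
    """Return the last non-blank line of `output`, stripped, or "" if there is none."""
    for line in reversed(output.splitlines()):
        if line.strip():
            return line.strip()
    return ""


# Hypothetical output: a warning line followed by the value of interest.
sample = "Warning: some unrelated notice\nca.example.org\n"
print(last_nonempty_line(sample))
```

Walking the lines in reverse tolerates trailing blank lines, which a plain `output.splitlines()[-1]` would not.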

Change 729994 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] wmcs::instance: use path to allow different systemctl paths

https://gerrit.wikimedia.org/r/729994

Change 729994 merged by David Caro:

[operations/puppet@production] wmcs::instance: use path to allow different systemctl paths

https://gerrit.wikimedia.org/r/729994

Change 729926 abandoned by David Caro:

[operations/puppet@production] cinderutils::ensure: Check falsey instead of empty string

Reason:

This was not the cause; the size limits were. I will add extra verbosity to the messages to avoid falling for this again.

https://gerrit.wikimedia.org/r/729926

Change 730001 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] wmcs-prepare-cinder-volume.py: chown also after mounting

https://gerrit.wikimedia.org/r/730001

Change 730001 merged by David Caro:

[operations/puppet@production] wmcs-prepare-cinder-volume.py: chown also after mounting

https://gerrit.wikimedia.org/r/730001

Mentioned in SAL (#wikimedia-cloud) [2021-10-11T15:00:35Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Change 730005 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] cinderutils::ensure: give more info when no device found

https://gerrit.wikimedia.org/r/730005

Mentioned in SAL (#wikimedia-cloud) [2021-10-11T15:24:57Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Change 730025 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] p:base: Move the reboot-host script there

https://gerrit.wikimedia.org/r/730025

Change 729990 merged by David Caro:

[operations/software/spicerack@master] puppet.PuppetHost.get_ca_server: use only the last line

https://gerrit.wikimedia.org/r/729990

Change 730025 merged by David Caro:

[operations/puppet@production] cumin::target: Move the reboot-host script there

https://gerrit.wikimedia.org/r/730025

Mentioned in SAL (#wikimedia-cloud) [2021-10-12T14:46:18Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-12T14:51:59Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-12T16:10:55Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Change 730270 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/software/spicerack@master] remote: use only the last line for the uptime

https://gerrit.wikimedia.org/r/730270

After a manual reboot, the new VM was able to run puppet correctly, but now it seems to have issues with the long hostname xd

root@toolsbeta-sgewebgrid-generic-09-2:~# qconf -sel
reresolve hostname failed: hostname exceeds hostname length(MAXHOSTNAMELEN) on this system

root@toolsbeta-sgewebgrid-generic-09-2:~# getconf HOST_NAME_MAX
64

root@toolsbeta-sgewebgrid-generic-09-2:~# hostname -f | wc -c
67

That seems to be hardcoded in the kernel :S
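
The numbers in the session above can be checked directly: `wc -c` also counts the trailing newline, so 67 bytes means a 66-character FQDN, two over the limit of 64. A minimal Python sketch of that check follows. The function names are hypothetical, and the old node's FQDN is assumed to use the same domain suffix as the node that was later added to the pool.

```python
import os


def host_name_max(default: int = 64) -> int:
    """Return the system hostname limit, falling back to `default` if unavailable."""
    try:
        return os.sysconf("SC_HOST_NAME_MAX")
    except (ValueError, OSError, AttributeError):
        # Name unknown to this platform, or os.sysconf missing entirely.
        return default


def fqdn_fits(fqdn: str, limit: int) -> bool:
    """True if the FQDN (without trailing newline) is within `limit` characters."""
    return len(fqdn) <= limit


# Assumed FQDNs: hostnames from this task plus the domain seen in the SAL entry.
old_name = "toolsbeta-sgewebgrid-generic-09-2.toolsbeta.eqiad1.wikimedia.cloud"
new_name = "toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud"

for name in (old_name, new_name):
    print(name, len(name), fqdn_fits(name, 64))
```

Running a check like this before creating the VM would catch an over-long prefix early, instead of at `qconf` time.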

Mentioned in SAL (#wikimedia-cloud) [2021-10-13T10:19:16Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-13T10:19:46Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus

Mentioned in SAL (#wikimedia-cloud) [2021-10-13T11:18:56Z] <wm-bot> Added a new grid webgrid generic node toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud to the pool (T292465) - cookbook ran by dcaro@vulcanus

dcaro renamed this task from Rebuild toolsbeta-sgewebgrid-generic-0901 to Automate rebuild and rebuild toolsbeta-sgewebgrid-generic-0901. Oct 14 2021, 7:34 AM
dcaro triaged this task as High priority.
dcaro moved this task from Doing to Today on the User-dcaro board.

Change 730005 merged by Andrew Bogott:

[operations/puppet@production] cinderutils::ensure: give more info when no device found

https://gerrit.wikimedia.org/r/730005

Change 731111 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] grid_configurator: Added new naming schemes

https://gerrit.wikimedia.org/r/731111

Change 731113 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] wmcs-srpeadcheck-tools: add new shorter webgrid names

https://gerrit.wikimedia.org/r/731113

Change 731114 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/puppet@production] tools-clush-generator: add the shorter webgrid names

https://gerrit.wikimedia.org/r/731114

Change 731111 merged by David Caro:

[operations/puppet@production] grid_configurator: Added new naming schemes

https://gerrit.wikimedia.org/r/731111

Change 730270 abandoned by David Caro:

[operations/software/spicerack@master] remote: use only the last line for the uptime

Reason:

I'll unblock myself by working around this shortcoming in the cookbook code instead.

https://gerrit.wikimedia.org/r/730270

Change 731911 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/cookbooks@wmcs] start_instance_with_prefix: fix next instance counter

https://gerrit.wikimedia.org/r/731911
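
As an illustration of what a "next instance counter" has to do, here is a minimal sketch that picks the next free index for a given prefix. It assumes a `<prefix>-<n>` naming scheme, as the instance names in this task suggest; the function name and logic are hypothetical and are not the cookbook's actual implementation.

```python
import re
from typing import Iterable


def next_instance_name(prefix: str, existing: Iterable[str]) -> str:
    """Return `<prefix>-<n>`, where n is one more than the highest index in use."""
    pattern = re.compile(rf"^{re.escape(prefix)}-(\d+)$")
    used = []
    for name in existing:
        match = pattern.match(name)
        if match:
            used.append(int(match.group(1)))
    # With no matching instances, numbering starts at 1.
    return f"{prefix}-{max(used, default=0) + 1}"


existing = ["toolsbeta-sgewebgen-09-1", "toolsbeta-sgewebgen-09-3", "unrelated-7"]
print(next_instance_name("toolsbeta-sgewebgen-09", existing))
```

Taking the maximum rather than the count avoids reusing an index when earlier instances have been deleted.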

Change 731912 had a related patch set uploaded (by David Caro; author: David Caro):

[operations/cookbooks@wmcs] start_instance_with_prefix: add tries parameter

https://gerrit.wikimedia.org/r/731912

dcaro removed dcaro as the assignee of this task. Oct 19 2021, 1:47 PM
dcaro removed a project: User-dcaro.

Change 731911 abandoned by David Caro:

[operations/cookbooks@wmcs] start_instance_with_prefix: fix next instance counter

Reason:

Merged into https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/731885

https://gerrit.wikimedia.org/r/731911

Change 731912 merged by jenkins-bot:

[operations/cookbooks@wmcs] start_instance_with_prefix: add tries parameter

https://gerrit.wikimedia.org/r/731912

Change 731113 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] wmcs-srpeadcheck-tools: add new shorter webgrid names

https://gerrit.wikimedia.org/r/731113

Change 726894 merged by Arturo Borrero Gonzalez:

[operations/cookbooks@wmcs] toolforge: new add_grid_webgrid_generic_node recipe

https://gerrit.wikimedia.org/r/726894

Change 731114 abandoned by David Caro:

[operations/puppet@production] tools-clush-generator: add the shorter webgrid names

Reason:

This was redone in some other patch

https://gerrit.wikimedia.org/r/731114

taavi claimed this task.