fsck of root volume failed, system is inaccessible.
Checking in progress on 1 disk (0.0% complete)
Checking in progress on 0 disks (100.0% complete)
[FAILED] Failed to start File System Check o…4970c-1b27-4544-b346-adbfe686feb0.
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | dcaro | T290970 File System corruption on cloud-vps instances
Resolved | | taavi | T292465 Automate rebuild and rebuild toolsbeta-sgewebgrid-generic-0901
Using https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/New_grid_engine_exec_host as a guideline to create a
cookbook.
Mentioned in SAL (#wikimedia-cloud) [2021-10-06T10:05:27Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-06T10:07:58Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-06T10:08:53Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-06T10:13:23Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-06T10:36:40Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Change 726894 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/cookbooks@wmcs] toolforge: new add_grid_webgrid_generic_node recipe
Mentioned in SAL (#wikimedia-cloud) [2021-10-07T07:58:03Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-07T08:04:44Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-07T12:50:10Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-07T12:50:46Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-07T12:55:09Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-07T13:31:42Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-07T14:06:33Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-07T14:21:08Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Change 729926 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] cinderutils::ensure: Check falsey instead of empty string
Mentioned in SAL (#wikimedia-cloud) [2021-10-11T10:32:00Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Change 729990 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/software/spicerack@master] puppet.PuppetHost.get_ca_server: use only the last line
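As a rough illustration of the idea named in the patch title above (this is not the spicerack code): when a command prints extra lines before its result, keeping only the last non-empty line isolates the value. The function name and the sample output below are hypothetical.

```python
def last_nonempty_line(output: str) -> str:
    """Return the last non-blank line of a command's output, stripped."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("command produced no output")
    return lines[-1]


# Hypothetical output: an unrelated notice followed by the value we want.
sample = "warning: some unrelated notice\npuppetmaster.example.org\n"
print(last_nonempty_line(sample))
```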
Change 729994 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] wmcs::instance: use path to allow different systemctl paths
Change 729994 merged by David Caro:
[operations/puppet@production] wmcs::instance: use path to allow different systemctl paths
Change 729926 abandoned by David Caro:
[operations/puppet@production] cinderutils::ensure: Check falsey instead of empty string
Reason:
This was not the cause, but the size limits, will add extra verbosity to the messages to avoid falling for this again.
Change 730001 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] wmcs-prepare-cinder-volume.py: chown also after mounting
Change 730001 merged by David Caro:
[operations/puppet@production] wmcs-prepare-cinder-volume.py: chown also after mounting
Mentioned in SAL (#wikimedia-cloud) [2021-10-11T15:00:35Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Change 730005 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] cinderutils::ensure: give more info when no device found
Mentioned in SAL (#wikimedia-cloud) [2021-10-11T15:24:57Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Change 730025 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] p:base: Move the reboot-host script there
Change 729990 merged by David Caro:
[operations/software/spicerack@master] puppet.PuppetHost.get_ca_server: use only the last line
Change 730025 merged by David Caro:
[operations/puppet@production] cumin::target: Move the reboot-host script there
Mentioned in SAL (#wikimedia-cloud) [2021-10-12T14:46:18Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-12T14:51:59Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-12T16:10:55Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Change 730270 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/software/spicerack@master] remote: use only the last line for the uptime
After a manual reboot, the new VM was able to run puppet correctly, but now it seems to have issues with its long hostname xd
root@toolsbeta-sgewebgrid-generic-09-2:~# qconf -sel
reresolve hostname failed: hostname exceeds hostname length(MAXHOSTNAMELEN) on this system
root@toolsbeta-sgewebgrid-generic-09-2:~# getconf HOST_NAME_MAX
64
root@toolsbeta-sgewebgrid-generic-09-2:~# hostname -f | wc -c
67
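The numbers in the session above can be checked offline. This is a minimal sketch, assuming the limit that matters is the 64 reported by `getconf HOST_NAME_MAX` (grid engine's own MAXHOSTNAMELEN may be defined differently). Note that `wc -c` also counts the trailing newline, so 67 bytes corresponds to a 66-character FQDN. The function name `fqdn_fits` is hypothetical, and the old FQDN is reconstructed from the short hostname in the prompt plus the domain seen in the SAL entry for the new node.

```python
HOST_NAME_MAX = 64  # value printed by `getconf HOST_NAME_MAX` in the session


def fqdn_fits(fqdn: str, limit: int = HOST_NAME_MAX) -> bool:
    """Return True if the FQDN is no longer than the limit, in bytes."""
    return len(fqdn.encode("ascii")) <= limit


# Assumed domain, taken from the FQDN of the node that was eventually pooled.
DOMAIN = "toolsbeta.eqiad1.wikimedia.cloud"

old_fqdn = f"toolsbeta-sgewebgrid-generic-09-2.{DOMAIN}"
new_fqdn = f"toolsbeta-sgewebgen-09-1.{DOMAIN}"

for fqdn in (old_fqdn, new_fqdn):
    print(len(fqdn), fqdn_fits(fqdn), fqdn)
# 66 False toolsbeta-sgewebgrid-generic-09-2.toolsbeta.eqiad1.wikimedia.cloud
# 57 True toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud
```

Under these assumptions the longer `sgewebgrid-generic` prefix overshoots the limit by two characters, while the shorter `sgewebgen` prefix leaves some headroom for larger instance counters.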
Mentioned in SAL (#wikimedia-cloud) [2021-10-13T10:19:16Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-13T10:19:46Z] <wm-bot> Adding a new grid webgrid generic node (T292465) - cookbook ran by dcaro@vulcanus
Mentioned in SAL (#wikimedia-cloud) [2021-10-13T11:18:56Z] <wm-bot> Added a new grid webgrid generic node toolsbeta-sgewebgen-09-1.toolsbeta.eqiad1.wikimedia.cloud to the pool (T292465) - cookbook ran by dcaro@vulcanus
Change 730005 merged by Andrew Bogott:
[operations/puppet@production] cinderutils::ensure: give more info when no device found
Change 731111 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] grid_configurator: Added new naming schemes
Change 731113 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] wmcs-srpeadcheck-tools: add new shorter webgrid names
Change 731114 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/puppet@production] tools-clush-generator: add the shorter webgrid names
Change 731111 merged by David Caro:
[operations/puppet@production] grid_configurator: Added new naming schemes
Change 730270 abandoned by David Caro:
[operations/software/spicerack@master] remote: use only the last line for the uptime
Reason:
I'll unblock myself working around this shortcoming on the cookbook code instead.
Change 731911 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/cookbooks@wmcs] start_instance_with_prefix: fix next instance counter
Change 731912 had a related patch set uploaded (by David Caro; author: David Caro):
[operations/cookbooks@wmcs] start_instance_with_prefix: add tries parameter
Change 731911 abandoned by David Caro:
[operations/cookbooks@wmcs] start_instance_with_prefix: fix next instance counter
Reason:
Merged into https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/731885
Change 731912 merged by jenkins-bot:
[operations/cookbooks@wmcs] start_instance_with_prefix: add tries parameter
Change 731113 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] wmcs-srpeadcheck-tools: add new shorter webgrid names
Change 726894 merged by Arturo Borrero Gonzalez:
[operations/cookbooks@wmcs] toolforge: new add_grid_webgrid_generic_node recipe
Change 731114 abandoned by David Caro:
[operations/puppet@production] tools-clush-generator: add the shorter webgrid names
Reason:
This was redone in some other patch