
Codesearch down/unreachable (2025-12-03)
Closed, Resolved (Public)

Event Timeline

Well, I can't even ssh into the host to check what's going on 😢

Mentioned in SAL (#wikimedia-operations) [2025-12-03T23:08:20Z] <Amir1> hard rebooting codesearch9.codesearch.eqiad1.wikimedia.cloud (T411728)

Please leave it for the moment. The timing is actually good: I wanted to try extending the disk anyway, which would have meant announcing downtime, and now it is already down.

Ah okay, I'll leave it now. FWIW, it's inode exhaustion:

ladsgroup@codesearch9:~$ df -i | grep -i srv
/dev/sdb       5242880 5242879       1  100% /srv
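
For anyone wanting to catch this earlier, here is a minimal sketch of a filter over `df -i` output that lists mount points at or above an inode-usage threshold. The function name `inode_pressure` and the default threshold of 90 are made up for illustration; it assumes the usual six-column `df -i` layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `df -i` style output on stdin and prints
# "<mountpoint> <IUse%> free=<IFree>" for rows at or above the threshold.
inode_pressure() {
  local threshold="${1:-90}"
  awk -v t="$threshold" '
    # Only rows whose 5th column looks like "NN%"; this skips the header
    # and filesystems that report "-" for inode usage.
    $5 ~ /^[0-9]+%$/ {
      pct = $5
      sub(/%/, "", pct)
      if (pct + 0 >= t + 0) print $6, $5, "free=" $4
    }'
}
```

Usage would be something like `df -i | inode_pressure 90`; fed the `/srv` line above, it should report `/srv 100% free=1`.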

Yes, this is still T411047, and the follow-up now that we got more quota (this task is linked from there).

Shutting the instance down to attempt resizing the volume; in progress.

Mentioned in SAL (#wikimedia-cloud) [2025-12-03T23:22:17Z] <mutante> - shut down instance codesearch9; extending volume "data2" to double its size T411728 T411047

Dzahn claimed this task.

successfully resized /dev/sda to double its size (80 GB -> 160 GB) in Horizon (possible now that we have the project quota)

remounted volume and ran resize2fs

confirmed there are plenty of inodes again
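
A rough back-of-the-envelope check of why growing the volume also helps with inodes: ext2/3/4 filesystems allocate inodes per block group, so growing with resize2fs adds inodes in proportion to the added space. The sketch below uses the numbers from this task and assumes "GB" here means GiB and that the bytes-per-inode ratio stays constant across the resize.

```shell
#!/usr/bin/env bash
# Figures from this task; treating "GB" as GiB is an assumption.
old_size_gib=80
new_size_gib=160
old_inodes=5242880

# Bytes of volume per inode before the resize.
bytes_per_inode=$(( old_size_gib * 1024 * 1024 * 1024 / old_inodes ))

# Projected inode count if the same ratio holds on the larger volume.
projected_inodes=$(( new_size_gib * 1024 * 1024 * 1024 / bytes_per_inode ))

echo "bytes per inode: ${bytes_per_inode}"
echo "projected inodes after resize: ${projected_inodes}"
```

Under those assumptions this works out to 16384 bytes per inode and roughly 10.5 million inodes after the resize, i.e. double the previous count.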

https://codesearch.wmcloud.org/_health/ is coming up

  • shutdown -h now
  • click "resize volume" in web UI
  • start instance
  • volume gets mounted automatically
  • resize2fs /dev/sda
  • mount -o remount /dev/sda

https://wikitech.wikimedia.org/wiki/Help:Adding_disk_space_to_Cloud_VPS_instances#Extend_a_volume
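
The in-guest part of the steps above could be sketched roughly like this. It is a dry run by default and only prints the commands; the `RUN=1` switch, the function name `grow_volume_fs`, and `/srv` as the mount point of the resized device are assumptions for illustration. The Horizon resize itself still happens in the web UI while the instance is shut down.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the post-resize steps from this task.
# Prints the commands unless RUN=1 is set; running them needs root.
grow_volume_fs() {
  local device="${1:-/dev/sda}"
  local mountpoint="${2:-/srv}"   # assumed mount point
  local run="${RUN:-0}"
  local cmd
  for cmd in \
    "resize2fs ${device}" \
    "mount -o remount ${device}" \
    "df -h ${mountpoint}" \
    "df -i ${mountpoint}"; do
    if [ "$run" = "1" ]; then
      # Simple word splitting is enough for these argument-only commands.
      $cmd
    else
      echo "would run: ${cmd}"
    fi
  done
}
```

Calling `grow_volume_fs /dev/sda /srv` without `RUN=1` just lists the four commands, which is handy for double-checking the device name first, since it can differ between boots (`/dev/sdb` before the reboot, `/dev/sda` after in this task).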

Mentioned in SAL (#wikimedia-cloud) [2025-12-05T18:35:18Z] <mutante> re-enabled puppet on codesearch9 - service up; has double disk space vs before incident - T411047 T411728