wikitech-static down
Closed, Resolved, Public

Description

There are several alerts on IRC mentioning wikitech-static being up/down/up/down (ping does work):

05:52:17 <+icinga-wm> PROBLEM - Wikitech-static main page has content on cloudweb2001-dev is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
05:55:57 <+icinga-wm> PROBLEM - Wikitech-static main page has content on labweb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
05:57:23 <+icinga-wm> PROBLEM - Wikitech-static main page has content on labweb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
05:59:21 <+icinga-wm> RECOVERY - Wikitech-static main page has content on labweb1001 is OK: OK - Certificate wikitech-static.wikimedia.org will expire on Fri 14 Jan 2022 02:46:45 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Wikitech-static
06:02:01 <+icinga-wm> RECOVERY - Wikitech-static main page has content on labweb1002 is OK: OK - Certificate wikitech-static.wikimedia.org will expire on Fri 14 Jan 2022 02:46:45 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Wikitech-static
06:02:39 <+icinga-wm> RECOVERY - Wikitech-static main page has content on cloudweb2001-dev is OK: OK - Certificate wikitech-static.wikimedia.org will expire on Fri 14 Jan 2022 02:46:45 PM GMT +0000. https://wikitech.wikimedia.org/wiki/Wikitech-static
06:15:11 <+icinga-wm> PROBLEM - Wikitech-static main page has content on cloudweb2001-dev is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
06:16:07 <+icinga-wm> PROBLEM - Wikitech-static main page has content on labweb1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
06:16:47 <+icinga-wm> PROBLEM - Wikitech-static main page has content on labweb1002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Wikitech-static
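To make the flapping easier to read, the PROBLEM/RECOVERY lines above can be paired per host. The following is only a rough sketch: the regular expression is inferred from the lines quoted here, the function name `outage_windows` is made up for illustration, and it assumes all timestamps fall on the same day.

```python
import re
from datetime import datetime

# Inferred from the icinga-wm lines quoted above: time, state, then the host
# that sits between " on " and " is ". This is an assumption, not a spec.
LINE_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}) <\+icinga-wm> (PROBLEM|RECOVERY) - .*? on (\S+) is "
)


def outage_windows(lines):
    """Pair each PROBLEM with the next RECOVERY for the same host.

    Returns (closed, still_open):
      closed     -- list of (host, start, end, duration_in_seconds)
      still_open -- dict of host -> start time for unrecovered PROBLEMs
    """
    open_since = {}
    closed = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not an alert line
        stamp, state, host = match.groups()
        when = datetime.strptime(stamp, "%H:%M:%S")
        if state == "PROBLEM":
            # Keep the earliest PROBLEM if several arrive before a RECOVERY.
            open_since.setdefault(host, when)
        elif host in open_since:
            start = open_since.pop(host)
            seconds = int((when - start).total_seconds())
            closed.append((host, start.strftime("%H:%M:%S"), stamp, seconds))
    still_open = {h: t.strftime("%H:%M:%S") for h, t in open_since.items()}
    return closed, still_open
```

Feeding it the pasted IRC lines gives one closed window per host for the first outage, plus a still-open entry per host for the second round of alerts.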

Event Timeline

Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2021-11-08T05:52:59Z] <rzl> rebooted wikitech-static via rackspace web UI - T295266

^ that reboot brought wikitech-static back (ping always worked, HTTP didn't).
https://wikitech.wikimedia.org/wiki/Rackspace_Cloud -> this needs updating as those instructions didn't work.

Here's /var/log/syslog from wikitech-static:

A few suspicious excerpts, not sure how much of this is normal background noise:

Nov  8 04:20:50 wikitech-static import-wikitech.sh[26942]: MWException from line 974 of /srv/mediawiki/w/includes/import/WikiImporter.php: Missing text field in import.
Nov  8 04:20:50 wikitech-static import-wikitech.sh[26942]: #0 /srv/mediawiki/w/includes/import/WikiImporter.php(1134): WikiImporter->makeContent(Object(Title), '830', Array)
Nov  8 04:20:50 wikitech-static import-wikitech.sh[26942]: #1 /srv/mediawiki/w/includes/import/WikiImporter.php(1111): WikiImporter->processUpload(Array, Array)
Nov  8 04:20:50 wikitech-static import-wikitech.sh[26942]: #2 /srv/mediawiki/w/includes/import/WikiImporter.php(863): WikiImporter->handleUpload(Array)
Nov  8 04:20:50 wikitech-static import-wikitech.sh[26942]: #3 /srv/mediawiki/w/includes/import/WikiImporter.php(678): WikiImporter->handlePage()
Nov  8 04:20:50 wikitech-static import-wikitech.sh[26942]: #4 /srv/mediawiki/w/maintenance/importDump.php(353): WikiImporter->doImport()
Nov  8 04:20:50 wikitech-static import-wikitech.sh[26942]: #5 /srv/mediawiki/w/maintenance/importDump.php(286): BackupReader->importFromHandle(Resource id #741)
Nov  8 04:20:50 wikitech-static import-wikitech.sh[26942]: #6 /srv/mediawiki/w/maintenance/importDump.php(130): BackupReader->importFromFile('compress.zlib:/...')
Nov  8 04:20:50 wikitech-static import-wikitech.sh[26942]: #7 /srv/mediawiki/w/maintenance/doMaintenance.php(112): BackupReader->execute()
Nov  8 04:20:50 wikitech-static import-wikitech.sh[26942]: #8 /srv/mediawiki/w/maintenance/importDump.php(358): require_once('/srv/mediawiki/...')
Nov  8 04:20:50 wikitech-static import-wikitech.sh[26942]: #9 {main}
Nov  8 04:58:53 wikitech-static cron[669]: (CRON) error (can't fork)
Nov  8 04:58:55 wikitech-static kernel: [60933412.339095] check_icinga[29461]: segfault at 1dcb0 ip 000000000001dcb0 sp 00007ffcd0dbd6c8 error 14 in python3.5[558e772b1000+3f0000]
Nov  8 05:51:12 wikitech-static kernel: [60936548.020519] systemd invoked oom-killer: gfp_mask=0x24201ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD), nodemask=0, order=0, oom_score_adj=0
[...]
Nov  8 05:51:14 wikitech-static kernel: [60936548.021054] Out of memory: Kill process 3273 (mysqld) score 124 or sacrifice child
Nov  8 05:51:14 wikitech-static kernel: [60936548.021179] Killed process 3273 (mysqld) total-vm:708564kB, anon-rss:36004kB, file-rss:0kB, shmem-rss:0kB
Nov  8 05:51:14 wikitech-static kernel: [60936548.046071] oom_reaper: reaped process 3273 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

mysqld getting oom-killed would certainly explain why we couldn't load wiki pages (!) but 05:51 is too late for that to be the full story. Whatever was causing that memory pressure is probably our root cause, though.
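For digging through the rest of the syslog, something like the following could pull out just the OOM-related kernel lines. It is a minimal sketch: the patterns are inferred from the excerpt above rather than from any kernel documentation, `oom_events` is an illustrative name, and it assumes the classic 15-character syslog timestamp prefix.

```python
import re

# Patterns inferred from the kernel lines in the excerpt above (assumption).
INVOKED_RE = re.compile(r"kernel: \[[\d.]+\] (\S+) invoked oom-killer")
KILLED_RE = re.compile(
    r"kernel: \[[\d.]+\] Killed process (\d+) \(([^)]+)\) "
    r"total-vm:(\d+)kB, anon-rss:(\d+)kB"
)


def oom_events(lines):
    """Return a list of dicts describing oom-killer invocations and kills."""
    events = []
    for line in lines:
        stamp = line[:15]  # classic syslog prefix, e.g. "Nov  8 05:51:14"
        match = INVOKED_RE.search(line)
        if match:
            events.append(
                {"time": stamp, "kind": "invoked", "by": match.group(1)}
            )
            continue
        match = KILLED_RE.search(line)
        if match:
            pid, name, total_vm, anon_rss = match.groups()
            events.append(
                {
                    "time": stamp,
                    "kind": "killed",
                    "pid": int(pid),
                    "process": name,
                    "total_vm_kb": int(total_vm),
                    "anon_rss_kb": int(anon_rss),
                }
            )
    return events
```

Running it over the whole file should show whether 05:51 was the first OOM kill or whether earlier ones happened before the alerts started.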

colewhite triaged this task as Medium priority. Nov 8 2021, 10:32 PM

I've seen that host struggle with memory issues in the past, so we may just be seeing organic growth of mediawiki resource needs. It's probably worth figuring out what it would cost to move to a bigger host; likely cheaper than spending time optimizing things.

Already filed as T292348: MWException from line 974 of WikiImporter.php: Missing text field in import

The host has only got 1GB of memory currently. If it's a VM, I can't imagine doubling or quadrupling it will be too expensive... And presumably it's just a change in the control panel along with a reboot (or a power off and on again).
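Before and after any resize, it might help to record how much headroom the host really has. A small sketch, assuming a Linux host whose `/proc/meminfo` exposes `MemTotal` and `MemAvailable` (the latter is absent on old kernels); the function names are illustrative only.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> kB.

    Lines without a trailing "kB" unit (e.g. HugePages counters) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB":
            fields[name.strip()] = int(parts[0])
    return fields


def available_fraction(fields):
    """Fraction of total memory the kernel estimates is still available."""
    return fields["MemAvailable"] / fields["MemTotal"]
```

On the host itself this would be called as `parse_meminfo(open("/proc/meminfo").read())`; logging `available_fraction` from cron would give a rough trend to justify (or not) the bigger instance.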

I hadn't thought of that -- it might indeed be possible to dynamically resize the VM. In either case we'd need to figure out how to get approval for the higher monthly charge, though.

Andrew mentioned this in Unknown Object (Task). Dec 20 2021, 10:17 PM

I created a tentative (and private) procurement ticket about this issue here: T298052

LSobanski claimed this task.
LSobanski subscribed.

The related action item has been resolved so I'll resolve this one as well. Please reopen if you think otherwise.