
Investigate recent (2023) downtime of WS Export
Closed, Resolved · Public · BUG REPORT

Description

The tool has had more downtime recently than is usual:

[Screenshot: WS Export - Community Tech tools, 2023-04-28 08:16:54]

Most recently it was down for 18 hours and 38 minutes from 2023-04-27 05:23:01.

Event Timeline

Restricted Application added a subscriber: Aklapper.

This is still happening. I've not had much of a chance to look into it, but it does seem that the PDF conversion process is routinely using 100% CPU. I'm not sure what the best fix is (a better conversion queue would be ideal, so that the web server isn't responsible for doing all the converting!), but a quick fix might be to limit the number of pages allowed in a PDF conversion. (The Epub is always available for large books, and people can convert them locally with Calibre or whatever they like.)
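A minimal sketch of the page-limit idea, in Python for illustration only (WS Export itself is a PHP/Symfony application, and this is not its code). The cap `MAX_PDF_PAGES`, the `page_count` argument, and `BookTooLargeError` are hypothetical; the PDF format names are the ones listed in the rewrite config later in this task. The check runs before any conversion starts, so rejecting an oversized PDF request costs almost nothing.

```python
# Hypothetical cap; this task does not say what limit (if any) would be used.
MAX_PDF_PAGES = 500

# PDF format names as listed in the rewrite config in this task.
PDF_FORMATS = frozenset({"pdf-a4", "pdf-a5", "pdf-a6", "pdf-letter"})


class BookTooLargeError(Exception):
    """Hypothetical error raised when a PDF export exceeds the page cap."""


def check_export_allowed(fmt: str, page_count: int) -> None:
    """Reject oversized PDF exports before any conversion work starts.

    Non-PDF formats are never rejected here, so the EPUB route
    stays open for large books.
    """
    if fmt in PDF_FORMATS and page_count > MAX_PDF_PAGES:
        raise BookTooLargeError(
            f"{page_count} pages is over the {MAX_PDF_PAGES}-page PDF limit; "
            "download the EPUB and convert it locally (e.g. with Calibre)."
        )


# Usage: an EPUB of any size passes; an oversized PDF is refused up front.
check_export_allowed("epub", 10_000)
try:
    check_export_allowed("pdf-a4", 2_000)
except BookTooLargeError as err:
    print(err)
```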

There are also a number of failures caused by trying to show the current version number in the footer, which can be fixed via T334454 (I've now created a patch for that).

Aklapper renamed this task from "Investigate recent downtime (April 2023)" to "Investigate recent downtime (April 2023) of WS Export". · Jul 14 2023, 9:37 AM
Samwilson renamed this task from "Investigate recent downtime (April 2023) of WS Export" to "Investigate recent (2023) downtime of WS Export". · Jul 14 2023, 9:37 AM

Hello, @Samwilson -- I'm having this issue (T345025) when trying to export any file type from Wikisource, including epub files.

I'm going to work on T345406 to try to fix this issue.

As of today, stats show 80% uptime over the last 90 days (i.e. 20% downtime). Since December 6th, cumulative uptime has been something like 20%, with whole days of 0% uptime in the mix.

Stats are well and good, but is there any alerting when it falls down so someone with a pager can go kick it?

Alternatively, at this level of (lack of) availability, it needs to actually be disabled (the "Download" button removed from the UI on the Wikisources).

I've been wondering if we should just turn off PDF generation as an emergency measure. That would at least keep it online for Epub generation, and anyone who wants a PDF (or any other format) can use Calibre to convert it. I suspect people would be fairly disappointed, but maybe there's no other option for the time being.

As for being alerted, I use the emails from UptimeRobot to some extent, but I get so many from so many different tools that I've become rather blind to them.

I've been looking at ws-export a bit lately, and have been hoping to a) upgrade to the latest Symfony, so we can use some of its newer job-queue features, and b) upgrade the VPS (which I've started; the test site is now done).
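For context on the job-queue idea (and the "better conversion queue" mentioned earlier), here is the general shape in a Python sketch: the web request only records a job and returns a job ID, while a fixed pool of workers does the conversions, so at most `workers` conversions run at once no matter how many requests arrive. This is in-process and illustrative only; the class, its methods and the `convert` callback are hypothetical. In a Symfony app the equivalent would presumably be a persistent queue consumed by separate worker processes (for example Messenger consumers), not threads inside the web server.

```python
import queue
import threading
import uuid
from typing import Callable, Dict, Optional


class ConversionQueue:
    """Requests enqueue jobs and return at once; a fixed worker pool converts."""

    def __init__(self, convert: Callable[[str, str], str], workers: int = 2):
        self._convert = convert        # hypothetical converter: (title, fmt) -> output
        self._jobs = queue.Queue()     # pending (job_id, title, fmt) tuples
        self._status: Dict[str, str] = {}
        self._results: Dict[str, str] = {}
        self._lock = threading.Lock()
        for _ in range(workers):
            threading.Thread(target=self._work, daemon=True).start()

    def submit(self, title: str, fmt: str) -> str:
        """Called from the web request: record the job and return immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._status[job_id] = "queued"
        self._jobs.put((job_id, title, fmt))
        return job_id

    def status(self, job_id: str) -> str:
        with self._lock:
            return self._status[job_id]

    def result(self, job_id: str) -> Optional[str]:
        with self._lock:
            return self._results.get(job_id)

    def wait(self) -> None:
        """Block until every submitted job has been processed."""
        self._jobs.join()

    def _work(self) -> None:
        while True:
            job_id, title, fmt = self._jobs.get()
            with self._lock:
                self._status[job_id] = "running"
            try:
                output = self._convert(title, fmt)
            except Exception:
                with self._lock:
                    self._status[job_id] = "failed"
            else:
                with self._lock:
                    self._results[job_id] = output
                    self._status[job_id] = "done"
            finally:
                self._jobs.task_done()


# Usage with a stand-in converter that just names the output file.
q = ConversionQueue(lambda title, fmt: f"{title}.{fmt}", workers=2)
job = q.submit("Example_Book", "epub")
q.wait()
print(q.status(job), q.result(job))
```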

A config like this would do it, I think:

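# The PDF formats are `pdf-a4`, `pdf-a5`, `pdf-a6`, and `pdf-letter`.
RewriteEngine On
# Match format=pdf-* only as a whole query parameter.
RewriteCond %{QUERY_STRING} (^|&)format=pdf [NC]
# "-" means no substitution; with a non-3xx R= code, Apache returns that status and stops rewriting.
RewriteRule ^ - [R=451,L]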
ErrorDocument 451 "Ebooks in formats other than EPUB are not available at the moment. See <a href='https://phabricator.wikimedia.org/T335553'>T335553</a> for details."

(I don't suppose it's an appropriate use of HTTP 451 Unavailable For Legal Reasons, but you get the idea.)

As of right now, ws-export has been down for 32 hours straight. Recorded uptime over the last 30 days is 60%, which means 40% downtime.
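For scale, the uptime percentages quoted in this thread (80% over 90 days, and 60% over 30 days) translate into absolute downtime as follows. The helper name is just for illustration; the arithmetic is downtime = window × (100 − uptime%) / 100.

```python
def downtime(uptime_percent: float, window_days: float):
    """Convert an uptime percentage over a window into (hours, days) of downtime."""
    hours = window_days * 24 * (100 - uptime_percent) / 100
    return hours, hours / 24


# Figures quoted in this thread.
for uptime, days in [(80, 90), (60, 30)]:
    hours, down_days = downtime(uptime, days)
    print(f"{uptime}% uptime over {days} days -> {hours:.0f} h ({down_days:.0f} days) down")
```

So 80% over 90 days is about 18 days offline, and 60% over 30 days is about 12 days offline.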

I think it's past time to disable PDF exports.

Okay, I've disabled PDFs for now. Let's give it a few days and see whether it improves uptime.

You might also consider disabling HTMLZ. I've noticed a massive number of HTMLZ downloads from ws.en and ws.fr over the past two months, and I don't think this format is very useful for individuals. Adding --disable-font-rescaling could also help with PDFs.

I've merged and deployed @Tpt's semaphore patch and removed the block on PDFs, so ebook-convert calls should now be limited to at most four at once.
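@Tpt's patch is PHP and isn't reproduced here. As a Python sketch of the same "at most four at once" guarantee: because each web request may be handled by a different process, an in-process lock is not enough, so this version uses four lock files, each taken with a non-blocking `flock`, as a counting semaphore shared by every process on the machine. The slot directory, timeout, and function names are hypothetical; `--disable-font-rescaling` is the Calibre `ebook-convert` option suggested above for PDFs. Unix-only, since it relies on `fcntl`.

```python
import fcntl
import os
import subprocess
import time
from contextlib import contextmanager

# Hypothetical values; the deployed patch is PHP and is not shown in this task.
SLOT_DIR = "/tmp/ebook-convert-slots"
MAX_CONVERSIONS = 4


@contextmanager
def conversion_slot(slot_dir=SLOT_DIR, slots=MAX_CONVERSIONS, timeout=300.0, poll=0.2):
    """Hold one of `slots` lock files; blocks (polling) until one is free."""
    os.makedirs(slot_dir, exist_ok=True)
    deadline = time.monotonic() + timeout
    while True:
        for i in range(slots):
            path = os.path.join(slot_dir, f"slot-{i}.lock")
            fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
            try:
                # Exclusive, non-blocking: fails if any other holder has this slot.
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                os.close(fd)
                continue
            try:
                yield i
            finally:
                # Closing the only descriptor for this open file releases the flock.
                os.close(fd)
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no free conversion slot after {timeout} s")
        time.sleep(poll)


def convert(input_path, output_path, pdf=False):
    """Run Calibre's ebook-convert (format taken from the output extension) in a slot."""
    cmd = ["ebook-convert", input_path, output_path]
    if pdf:
        cmd.append("--disable-font-rescaling")
    with conversion_slot():
        subprocess.run(cmd, check=True)
```

A fifth caller waits until one of the four running conversions finishes (or times out) instead of starting another CPU-heavy process, which also covers the HTMLZ conversions discussed below.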

Good point about HTMLZ, @Denis_Gagne52. I was also wondering why that format had become so popular. Do you know what might be causing it? HTMLZ conversions will now be throttled too.

The instance is not responding over SSH, and prometheus-alerts says it's been down for a day. I dumped its console log, and it shows some failures during boot:

Begin: Running /scripts/local-bottom ... GROWROOT: /sbin/growpart: 824: /sbin/growpart: grep: not found
/sbin/growpart: 853: /sbin/growpart: sed: not found
WARN: unknown label 
/sbin/growpart: 354: /sbin/growpart: sed: not found
FAILED: sed failed on dump output
/sbin/growpart: 83: /sbin/growpart: rm: not found
done.
FAILED Failed to listen on SSSD NSS Service responder socket.
FAILED Failed to listen on Service responder private socket.
FAILED Failed to listen on SSD Sudo Service responder socket.
FAILED Failed to listen on SSSD SSH Service responder socket.

Then, later:

[422452.260395] Out of memory: Killed process 1445204 (apache2) total-vm:2053832kB, anon-rss:878592kB, file-rss:4kB, shmem-rss:15864kB, UID:33 pgtables:2024kB oom_score_adj:0

One thing I noticed is that Google is crawling the site, despite the efforts to stop it from doing so.

I used Google Search Console to confirm that we're blocking access to robots.txt itself, while the actual crawl (user agent "GoogleOther") is not blocked. A crawler that can't read robots.txt can't follow its rules, so I think we should allow access to robots.txt.
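To illustrate the robots.txt point, Python's standard `urllib.robotparser` can evaluate rules for a given user agent. The rules and the URL below are hypothetical (the tool's actual robots.txt is not shown in this task). When the file can be read, a catch-all `User-agent: *` group applies to GoogleOther too; when no rules can be read, modelled here as parsing an empty file, nothing is disallowed. Real crawlers have their own policies for fetch errors, so this only shows why the file needs to be reachable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents that block every crawler from every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# Illustrative URL; under "Disallow: /" every path on the host behaves the same.
URL = "https://ws-export.wmcloud.org/?page=Example&format=epub"


def allowed(robots_lines, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Readable file: the catch-all group blocks GoogleOther.
print(allowed(ROBOTS_TXT.splitlines(), "GoogleOther", URL))
# No readable rules: nothing is disallowed.
print(allowed([], "GoogleOther", URL))
```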

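Data from Grafana and the syslog indicate that the downtime was primarily due to an out-of-memory condition. It was in swapdeath, or whatever you call the panic swapping that Linux does these days. Look at the timestamps on these log messages: the kernel timestamps in brackets show the whole task dump was generated within half a millisecond, but the syslog timestamps for the same lines are spread over almost six minutes (01:06:40 to 01:12:35), roughly two seconds per line.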

2024-02-21T01:06:40.793764+00:00 wsexport-prod02 kernel: [505035.522390] Tasks state (memory values in pages):
2024-02-21T01:06:43.444104+00:00 wsexport-prod02 kernel: [505035.522391] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
2024-02-21T01:06:44.820534+00:00 wsexport-prod02 kernel: [505035.522397] [    269]     0   269     5983      334    69632        0         -1000 systemd-udevd
2024-02-21T01:06:46.505921+00:00 wsexport-prod02 kernel: [505035.522400] [    417]   998   417     4500      299    77824        0             0 systemd-network
2024-02-21T01:06:48.730643+00:00 wsexport-prod02 kernel: [505035.522402] [    495]   997   495    22514      235    77824        0             0 systemd-timesyn
2024-02-21T01:06:50.617373+00:00 wsexport-prod02 kernel: [505035.522404] [    501]   101   501     7530      530    69632        0          -900 dbus-daemon
2024-02-21T01:06:52.081834+00:00 wsexport-prod02 kernel: [505035.522406] [    504]   107   504     4958      293    77824        0             0 lldpd
2024-02-21T01:06:53.551169+00:00 wsexport-prod02 kernel: [505035.522408] [    505]   106   505   439663     6913   442368        0             0 prometheus-node
2024-02-21T01:06:55.545466+00:00 wsexport-prod02 kernel: [505035.522410] [    508]     0   508   111125     2731   131072        0             0 rsyslogd
2024-02-21T01:06:57.412726+00:00 wsexport-prod02 kernel: [505035.522412] [    515]     0   515     8515      555   106496        0             0 sssd
2024-02-21T01:06:58.599984+00:00 wsexport-prod02 kernel: [505035.522413] [    542]   107   542     4958      292    65536        0             0 lldpd
2024-02-21T01:07:00.592946+00:00 wsexport-prod02 kernel: [505035.522415] [    548]     0   548     3883      329    69632        0         -1000 sshd
2024-02-21T01:07:02.299532+00:00 wsexport-prod02 kernel: [505035.522417] [    551]   995   551    79902      847   114688        0             0 polkitd
2024-02-21T01:07:03.644227+00:00 wsexport-prod02 kernel: [505035.522419] [    559]     0   559    56183     3253   184320        0             0 apache2
2024-02-21T01:07:04.934097+00:00 wsexport-prod02 kernel: [505035.522420] [    590]     0   590     9500     1210   110592        0             0 sssd_pam
2024-02-21T01:07:05.945739+00:00 wsexport-prod02 kernel: [505035.522422] [    591]     0   591     8293      441    98304        0             0 sssd_ssh
2024-02-21T01:07:07.375760+00:00 wsexport-prod02 kernel: [505035.522424] [    592]     0   592     8675      840   102400        0             0 sssd_sudo
2024-02-21T01:07:08.674805+00:00 wsexport-prod02 kernel: [505035.522425] [    595]     0   595     6475      274    77824        0             0 systemd-logind
2024-02-21T01:07:10.785148+00:00 wsexport-prod02 kernel: [505035.522427] [    602]     0   602      723       22    40960        0             0 agetty
2024-02-21T01:07:11.974763+00:00 wsexport-prod02 kernel: [505035.522429] [    603]     0   603     4458      147    57344        0             0 login
2024-02-21T01:07:13.829025+00:00 wsexport-prod02 kernel: [505035.522431] [    609]     0   609     7257     2076    94208        0             0 unattended-upgr
2024-02-21T01:07:15.336010+00:00 wsexport-prod02 kernel: [505035.522432] [    658]     0   658     4683      379    77824        0           100 systemd
2024-02-21T01:07:16.637250+00:00 wsexport-prod02 kernel: [505035.522434] [    674]     0   674     1414      391    57344        0             0 bash
2024-02-21T01:07:18.129040+00:00 wsexport-prod02 kernel: [505035.522436] [1495723]     0 1495723    10359      288   102400        0          -250 systemd-journal
2024-02-21T01:07:19.942141+00:00 wsexport-prod02 kernel: [505035.522438] [1496337]     0 1496337   344422     3402   204800        0             0 prometheus-rsys
2024-02-21T01:07:21.789299+00:00 wsexport-prod02 kernel: [505035.522442] [1497644]    33 1497644   261636    52877   675840        0             0 apache2
2024-02-21T01:07:22.951738+00:00 wsexport-prod02 kernel: [505035.522444] [1498124]    33 1498124   244740    52335   667648        0             0 apache2
2024-02-21T01:07:24.091859+00:00 wsexport-prod02 kernel: [505035.522447] [1498292]    33 1498292   244741    52630   667648        0             0 apache2
2024-02-21T01:07:25.792442+00:00 wsexport-prod02 kernel: [505035.522449] [1499087]    33 1499087   228356    52638   663552        0             0 apache2
2024-02-21T01:07:27.611843+00:00 wsexport-prod02 kernel: [505035.522451] [1499319]    33 1499319   260921    52731   671744        0             0 apache2
2024-02-21T01:07:28.860919+00:00 wsexport-prod02 kernel: [505035.522452] [1499630]    33 1499630   243728    51907   659456        0             0 apache2
2024-02-21T01:07:30.861453+00:00 wsexport-prod02 kernel: [505035.522456] [1499782]    33 1499782   326148    52656   688128        0             0 apache2
2024-02-21T01:07:32.102094+00:00 wsexport-prod02 kernel: [505035.522460] [1499941]    33 1499941   228868    53139   667648        0             0 apache2
2024-02-21T01:07:33.715219+00:00 wsexport-prod02 kernel: [505035.522462] [1500441]    33 1500441   277509    52640   684032        0             0 apache2
2024-02-21T01:07:35.785600+00:00 wsexport-prod02 kernel: [505035.522466] [1500829]    33 1500829   228356    52627   663552        0             0 apache2
2024-02-21T01:07:38.237898+00:00 wsexport-prod02 kernel: [505035.522470] [1501168]    33 1501168   261113    52635   675840        0             0 apache2
2024-02-21T01:07:39.959260+00:00 wsexport-prod02 kernel: [505035.522472] [1501322]    33 1501322   228869    53141   667648        0             0 apache2
2024-02-21T01:07:41.712915+00:00 wsexport-prod02 kernel: [505035.522474] [1501563]    33 1501563   227848    52313   659456        0             0 apache2
2024-02-21T01:07:42.960290+00:00 wsexport-prod02 kernel: [505035.522476] [1501740]    33 1501740   228356    52629   663552        0             0 apache2
2024-02-21T01:07:44.530280+00:00 wsexport-prod02 kernel: [505035.522480] [1502218]    33 1502218   228868    53138   667648        0             0 apache2
2024-02-21T01:07:46.870584+00:00 wsexport-prod02 kernel: [505035.522481] [1502477]    33 1502477   276997    52656   679936        0             0 apache2
2024-02-21T01:07:48.549924+00:00 wsexport-prod02 kernel: [505035.522484] [1502712]    33 1502712   261124    52631   675840        0             0 apache2
2024-02-21T01:07:50.528512+00:00 wsexport-prod02 kernel: [505035.522486] [1502858]    33 1502858   244740    52640   667648        0             0 apache2
2024-02-21T01:07:51.778460+00:00 wsexport-prod02 kernel: [505035.522488] [1503026]    33 1503026   261124    52655   692224        0             0 apache2
2024-02-21T01:07:53.812760+00:00 wsexport-prod02 kernel: [505035.522490] [1503518]    33 1503518   228868    53146   667648        0             0 apache2
2024-02-21T01:07:55.522776+00:00 wsexport-prod02 kernel: [505035.522492] [1503858]    33 1503858   260612    52129   671744        0             0 apache2
2024-02-21T01:07:57.726343+00:00 wsexport-prod02 kernel: [505035.522493] [1504003]    33 1504003   278021    53156   684032        0             0 apache2
2024-02-21T01:07:59.849291+00:00 wsexport-prod02 kernel: [505035.522495] [1504146]    33 1504146   310777    53357   692224        0             0 apache2
2024-02-21T01:08:02.065346+00:00 wsexport-prod02 kernel: [505035.522498] [1504313]    33 1504313   228869    52879   667648        0             0 apache2
2024-02-21T01:08:03.865581+00:00 wsexport-prod02 kernel: [505035.522500] [1504770]    33 1504770   244740    52339   667648        0             0 apache2
2024-02-21T01:08:05.924014+00:00 wsexport-prod02 kernel: [505035.522502] [1505206]    33 1505206   227653    51635   663552        0             0 apache2
2024-02-21T01:08:07.831493+00:00 wsexport-prod02 kernel: [505035.522503] [1505540]    33 1505540   211972    52321   659456        0             0 apache2
2024-02-21T01:08:09.636051+00:00 wsexport-prod02 kernel: [505035.522505] [1505684]    33 1505684   228868    52845   667648        0             0 apache2
2024-02-21T01:08:11.428323+00:00 wsexport-prod02 kernel: [505035.522506] [1505838]    33 1505838   228868    52844   667648        0             0 apache2
2024-02-21T01:08:13.301684+00:00 wsexport-prod02 kernel: [505035.522511] [1506529]    33 1506529   228356    52330   663552        0             0 apache2
2024-02-21T01:08:15.692589+00:00 wsexport-prod02 kernel: [505035.522513] [1506671]    33 1506671   228165    52447   663552        0             0 apache2
2024-02-21T01:08:17.408311+00:00 wsexport-prod02 kernel: [505035.522514] [1506809]    33 1506809   211781    52009   659456        0             0 apache2
2024-02-21T01:08:19.747583+00:00 wsexport-prod02 kernel: [505035.522516] [1507140]    33 1507140   228356    52506   663552        0             0 apache2
2024-02-21T01:08:22.550883+00:00 wsexport-prod02 kernel: [505035.522520] [1507285]    33 1507285   260926    52340   671744        0             0 apache2
2024-02-21T01:08:24.848754+00:00 wsexport-prod02 kernel: [505035.522522] [1507582]    33 1507582   244740    52208   667648        0             0 apache2
2024-02-21T01:08:27.052923+00:00 wsexport-prod02 kernel: [505035.522524] [1508267]    33 1508267   228869    52716   667648        0             0 apache2
2024-02-21T01:08:28.836714+00:00 wsexport-prod02 kernel: [505035.522525] [1508612]    33 1508612   228356    52196   667648        0             0 apache2
2024-02-21T01:08:31.604261+00:00 wsexport-prod02 kernel: [505035.522527] [1509049]    33 1509049   261124    52202   671744        0             0 apache2
2024-02-21T01:08:33.079865+00:00 wsexport-prod02 kernel: [505035.522530] [1509259]    33 1509259   103570    31936   442368        0             0 apache2
2024-02-21T01:08:36.015444+00:00 wsexport-prod02 kernel: [505035.522532] [1509691]    33 1509691    81135     9555   286720        0             0 apache2
2024-02-21T01:08:38.416952+00:00 wsexport-prod02 kernel: [505035.522533] [1511413]    33 1511413    77502     5688   229376        0             0 apache2
2024-02-21T01:08:40.143473+00:00 wsexport-prod02 kernel: [505035.522535] [1511840]    33 1511840    77500     5071   217088        0             0 apache2
2024-02-21T01:08:42.577363+00:00 wsexport-prod02 kernel: [505035.522537] [1513752]    33 1513752    77500     5684   229376        0             0 apache2
2024-02-21T01:08:44.090978+00:00 wsexport-prod02 kernel: [505035.522539] [1513960]    33 1513960    77500     5282   225280        0             0 apache2
2024-02-21T01:08:45.608731+00:00 wsexport-prod02 kernel: [505035.522544] [1514176]    33 1514176    77500     5113   225280        0             0 apache2
2024-02-21T01:08:47.183089+00:00 wsexport-prod02 kernel: [505035.522546] [1514395]    33 1514395    59063     4969   204800        0             0 apache2
2024-02-21T01:08:49.801012+00:00 wsexport-prod02 kernel: [505035.522547] [1514602]    33 1514602    59063     4405   204800        0             0 apache2
2024-02-21T01:08:51.990212+00:00 wsexport-prod02 kernel: [505035.522549] [1514819]    33 1514819    59063     4406   204800        0             0 apache2
2024-02-21T01:08:54.091577+00:00 wsexport-prod02 kernel: [505035.522550] [1515595]    33 1515595    59063     4406   204800        0             0 apache2
2024-02-21T01:08:56.232447+00:00 wsexport-prod02 kernel: [505035.522552] [1515814]    33 1515814    59063     4406   204800        0             0 apache2
2024-02-21T01:08:58.869767+00:00 wsexport-prod02 kernel: [505035.522556] [1516027]    33 1516027    59063     4406   204800        0             0 apache2
2024-02-21T01:09:00.229466+00:00 wsexport-prod02 kernel: [505035.522557] [1516244]    33 1516244    59063     4405   204800        0             0 apache2
2024-02-21T01:09:02.071304+00:00 wsexport-prod02 kernel: [505035.522559] [1516454]    33 1516454    59063     4405   204800        0             0 apache2
2024-02-21T01:09:04.476558+00:00 wsexport-prod02 kernel: [505035.522561] [1516682]    33 1516682    59063     4405   204800        0             0 apache2
2024-02-21T01:09:06.408662+00:00 wsexport-prod02 kernel: [505035.522563] [1516888]    33 1516888    59065     4407   204800        0             0 apache2
2024-02-21T01:09:08.114316+00:00 wsexport-prod02 kernel: [505035.522564] [1517103]    33 1517103    59063     4406   204800        0             0 apache2
2024-02-21T01:09:09.826114+00:00 wsexport-prod02 kernel: [505035.522568] [1517321]    33 1517321    59109     5802   212992        0             0 apache2
2024-02-21T01:09:11.859289+00:00 wsexport-prod02 kernel: [505035.522570] [1517533]    33 1517533    59065     4406   204800        0             0 apache2
2024-02-21T01:09:13.533954+00:00 wsexport-prod02 kernel: [505035.522571] [1517943]    33 1517943    59063     4405   204800        0             0 apache2
2024-02-21T01:09:14.897073+00:00 wsexport-prod02 kernel: [505035.522573] [1518159]    33 1518159    59063     4406   204800        0             0 apache2
2024-02-21T01:09:17.185731+00:00 wsexport-prod02 kernel: [505035.522575] [1518368]    33 1518368    59065     4406   204800        0             0 apache2
2024-02-21T01:09:19.315331+00:00 wsexport-prod02 kernel: [505035.522576] [1518594]    33 1518594    59063     4406   204800        0             0 apache2
2024-02-21T01:09:22.117960+00:00 wsexport-prod02 kernel: [505035.522578] [1518810]    33 1518810    59063     4405   204800        0             0 apache2
2024-02-21T01:09:23.819382+00:00 wsexport-prod02 kernel: [505035.522580] [1519018]    33 1519018    59065     4406   204800        0             0 apache2
2024-02-21T01:09:26.101424+00:00 wsexport-prod02 kernel: [505035.522581] [1519235]    33 1519235    59065     4409   204800        0             0 apache2
2024-02-21T01:09:27.956194+00:00 wsexport-prod02 kernel: [505035.522583] [1519855]    33 1519855    59065     4408   204800        0             0 apache2
2024-02-21T01:09:29.843908+00:00 wsexport-prod02 kernel: [505035.522585] [1520223]    33 1520223    59063     4405   204800        0             0 apache2
2024-02-21T01:09:31.768784+00:00 wsexport-prod02 kernel: [505035.522586] [1520546]    33 1520546    59065     4407   204800        0             0 apache2
2024-02-21T01:09:33.322841+00:00 wsexport-prod02 kernel: [505035.522588] [1520762]    33 1520762    59065     4409   204800        0             0 apache2
2024-02-21T01:09:36.085902+00:00 wsexport-prod02 kernel: [505035.522592] [1520969]    33 1520969    59063     4405   204800        0             0 apache2
2024-02-21T01:09:37.990519+00:00 wsexport-prod02 kernel: [505035.522594] [1521188]    33 1521188    59063     4405   204800        0             0 apache2
2024-02-21T01:09:40.187977+00:00 wsexport-prod02 kernel: [505035.522595] [1521404]    33 1521404    59063     4405   204800        0             0 apache2
2024-02-21T01:09:42.987700+00:00 wsexport-prod02 kernel: [505035.522597] [1521615]    33 1521615    59063     4405   204800        0             0 apache2
2024-02-21T01:09:44.670900+00:00 wsexport-prod02 kernel: [505035.522598] [1521832]    33 1521832    59063     4406   204800        0             0 apache2
2024-02-21T01:09:46.369386+00:00 wsexport-prod02 kernel: [505035.522600] [1522047]    33 1522047    59063     4406   204800        0             0 apache2
2024-02-21T01:09:48.005094+00:00 wsexport-prod02 kernel: [505035.522602] [1522459]    33 1522459    77736     5959   229376        0             0 apache2
2024-02-21T01:09:50.008466+00:00 wsexport-prod02 kernel: [505035.522604] [1522671]    33 1522671    59063     4405   204800        0             0 apache2
2024-02-21T01:09:52.129037+00:00 wsexport-prod02 kernel: [505035.522605] [1522888]    33 1522888    59063     4406   204800        0             0 apache2
2024-02-21T01:09:54.278183+00:00 wsexport-prod02 kernel: [505035.522607] [1523102]    33 1523102    59065     4406   204800        0             0 apache2
2024-02-21T01:09:56.013956+00:00 wsexport-prod02 kernel: [505035.522608] [1523313]    33 1523313    59063     4405   204800        0             0 apache2
2024-02-21T01:09:57.534805+00:00 wsexport-prod02 kernel: [505035.522610] [1523530]    33 1523530    59063     4405   204800        0             0 apache2
2024-02-21T01:10:00.450857+00:00 wsexport-prod02 kernel: [505035.522611] [1523746]    33 1523746    59063     4406   204800        0             0 apache2
2024-02-21T01:10:02.324503+00:00 wsexport-prod02 kernel: [505035.522613] [1523777]    33 1523777    59063     4405   204800        0             0 apache2
2024-02-21T01:10:02.979753+00:00 wsexport-prod02 kernel: [505035.522615] [1524563]    33 1524563    59065     4406   204800        0             0 apache2
2024-02-21T01:10:04.798130+00:00 wsexport-prod02 kernel: [505035.522619] [1524770]    33 1524770    59063     4406   204800        0             0 apache2
2024-02-21T01:10:07.131847+00:00 wsexport-prod02 kernel: [505035.522621] [1524988]    33 1524988    59063     4406   204800        0             0 apache2
2024-02-21T01:10:08.625830+00:00 wsexport-prod02 kernel: [505035.522623] [1525204]    33 1525204    59063     4406   204800        0             0 apache2
2024-02-21T01:10:10.775688+00:00 wsexport-prod02 kernel: [505035.522624] [1525414]    33 1525414    77767     6475   229376        0             0 apache2
2024-02-21T01:10:12.406934+00:00 wsexport-prod02 kernel: [505035.522626] [1525630]    33 1525630    59063     4405   204800        0             0 apache2
2024-02-21T01:10:14.348056+00:00 wsexport-prod02 kernel: [505035.522627] [1525851]    33 1525851    59065     4407   204800        0             0 apache2
2024-02-21T01:10:16.432095+00:00 wsexport-prod02 kernel: [505035.522629] [1526059]    33 1526059    59065     4407   204800        0             0 apache2
2024-02-21T01:10:18.038547+00:00 wsexport-prod02 kernel: [505035.522630] [1526277]    33 1526277    59063     4406   204800        0             0 apache2
2024-02-21T01:10:19.392591+00:00 wsexport-prod02 kernel: [505035.522632] [1526491]    33 1526491    59065     4407   204800        0             0 apache2
2024-02-21T01:10:22.004137+00:00 wsexport-prod02 kernel: [505035.522634] [1526901]    33 1526901    59063     4405   204800        0             0 apache2
2024-02-21T01:10:23.565845+00:00 wsexport-prod02 kernel: [505035.522635] [1527111]    33 1527111    59063     4406   204800        0             0 apache2
2024-02-21T01:10:25.422057+00:00 wsexport-prod02 kernel: [505035.522637] [1527325]    33 1527325    77736     5967   229376        0             0 apache2
2024-02-21T01:10:27.445012+00:00 wsexport-prod02 kernel: [505035.522638] [1527543]    33 1527543    59063     4405   204800        0             0 apache2
2024-02-21T01:10:28.541553+00:00 wsexport-prod02 kernel: [505035.522640] [1527752]    33 1527752    59063     4406   204800        0             0 apache2
2024-02-21T01:10:31.289039+00:00 wsexport-prod02 kernel: [505035.522641] [1527969]    33 1527969    59063     4406   204800        0             0 apache2
2024-02-21T01:10:33.246344+00:00 wsexport-prod02 kernel: [505035.522642] [1528182]    33 1528182    59063     4405   204800        0             0 apache2
2024-02-21T01:10:34.746986+00:00 wsexport-prod02 kernel: [505035.522644] [1528700]    33 1528700    59063     4406   204800        0             0 apache2
2024-02-21T01:10:36.511488+00:00 wsexport-prod02 kernel: [505035.522645] [1529168]    33 1529168    59063     4405   204800        0             0 apache2
2024-02-21T01:10:38.370518+00:00 wsexport-prod02 kernel: [505035.522647] [1529383]    33 1529383    78926     8367   262144        0             0 apache2
2024-02-21T01:10:40.165795+00:00 wsexport-prod02 kernel: [505035.522648] [1529604]    33 1529604    59063     4405   204800        0             0 apache2
2024-02-21T01:10:42.092809+00:00 wsexport-prod02 kernel: [505035.522650] [1529821]    33 1529821    59063     4406   204800        0             0 apache2
2024-02-21T01:10:43.377917+00:00 wsexport-prod02 kernel: [505035.522652] [1530047]    33 1530047   136102     9989   303104        0             0 apache2
2024-02-21T01:10:45.575691+00:00 wsexport-prod02 kernel: [505035.522654] [1530267]    33 1530267    59063     4405   204800        0             0 apache2
2024-02-21T01:10:47.258033+00:00 wsexport-prod02 kernel: [505035.522662] [1530482]    33 1530482    59063     4406   204800        0             0 apache2
2024-02-21T01:10:50.412334+00:00 wsexport-prod02 kernel: [505035.522664] [1530698]    33 1530698    59063     4406   204800        0             0 apache2
2024-02-21T01:10:51.785146+00:00 wsexport-prod02 kernel: [505035.522665] [1530909]    33 1530909    59063     4405   204800        0             0 apache2
2024-02-21T01:10:55.205901+00:00 wsexport-prod02 kernel: [505035.522667] [1531318]    33 1531318    59065     4406   204800        0             0 apache2
2024-02-21T01:10:56.926600+00:00 wsexport-prod02 kernel: [505035.522669] [1531534]    33 1531534    59063     4405   204800        0             0 apache2
2024-02-21T01:10:59.039554+00:00 wsexport-prod02 kernel: [505035.522670] [1531742]    33 1531742    78887     8040   262144        0             0 apache2
2024-02-21T01:11:01.252741+00:00 wsexport-prod02 kernel: [505035.522672] [1531962]    33 1531962    78887     7582   262144        0             0 apache2
2024-02-21T01:11:03.045591+00:00 wsexport-prod02 kernel: [505035.522673] [1532192]    33 1532192    59063     4405   204800        0             0 apache2
2024-02-21T01:11:05.074182+00:00 wsexport-prod02 kernel: [505035.522675] [1532399]    33 1532399    59063     4406   204800        0             0 apache2
2024-02-21T01:11:07.280543+00:00 wsexport-prod02 kernel: [505035.522678] [1532623]    33 1532623    78887     7582   262144        0             0 apache2
2024-02-21T01:11:09.655732+00:00 wsexport-prod02 kernel: [505035.522680] [1532903]    33 1532903    59065     4406   204800        0             0 apache2
2024-02-21T01:11:11.160238+00:00 wsexport-prod02 kernel: [505035.522681] [1533672]    33 1533672    59063     4406   204800        0             0 apache2
2024-02-21T01:11:12.823341+00:00 wsexport-prod02 kernel: [505035.522683] [1533893]    33 1533893    77816     6220   221184        0             0 apache2
2024-02-21T01:11:14.342816+00:00 wsexport-prod02 kernel: [505035.522685] [1534105]    33 1534105    59063     4406   204800        0             0 apache2
2024-02-21T01:11:15.718870+00:00 wsexport-prod02 kernel: [505035.522686] [1534322]    33 1534322    77736     6433   229376        0             0 apache2
2024-02-21T01:11:17.096069+00:00 wsexport-prod02 kernel: [505035.522688] [1534540]    33 1534540    59065     4406   204800        0             0 apache2
2024-02-21T01:11:18.239012+00:00 wsexport-prod02 kernel: [505035.522689] [1534748]    33 1534748    59063     4406   204800        0             0 apache2
2024-02-21T01:11:20.296357+00:00 wsexport-prod02 kernel: [505035.522691] [1534964]    33 1534964    59063     4406   204800        0             0 apache2
2024-02-21T01:11:22.532454+00:00 wsexport-prod02 kernel: [505035.522693] [1535180]    33 1535180    59063     4405   204800        0             0 apache2
2024-02-21T01:11:24.570047+00:00 wsexport-prod02 kernel: [505035.522694] [1535389]    33 1535389    59063     4406   204800        0             0 apache2
2024-02-21T01:11:26.118457+00:00 wsexport-prod02 kernel: [505035.522696] [1535797]    33 1535797    78887     8040   262144        0             0 apache2
2024-02-21T01:11:28.430256+00:00 wsexport-prod02 kernel: [505035.522697] [1536020]    33 1536020    59063     4406   204800        0             0 apache2
2024-02-21T01:11:30.830034+00:00 wsexport-prod02 kernel: [505035.522699] [1536233]    33 1536233    59063     4406   204800        0             0 apache2
2024-02-21T01:11:33.232261+00:00 wsexport-prod02 kernel: [505035.522700] [1536449]    33 1536449    59063     4405   204800        0             0 apache2
2024-02-21T01:11:35.181393+00:00 wsexport-prod02 kernel: [505035.522702] [1536663]    33 1536663    59063     4406   204800        0             0 apache2
2024-02-21T01:11:37.667488+00:00 wsexport-prod02 kernel: [505035.522704] [1536871]    33 1536871    59065     4406   204800        0             0 apache2
2024-02-21T01:11:39.336359+00:00 wsexport-prod02 kernel: [505035.522705] [1537096]    33 1537096    59063     4405   204800        0             0 apache2
2024-02-21T01:11:40.896462+00:00 wsexport-prod02 kernel: [505035.522707] [1537312]    33 1537312    78588     7253   266240        0             0 apache2
2024-02-21T01:11:42.633230+00:00 wsexport-prod02 kernel: [505035.522708] [1538148]    33 1538148    59065     4409   204800        0             0 apache2
2024-02-21T01:11:44.499293+00:00 wsexport-prod02 kernel: [505035.522710] [1538373]    33 1538373    59063     4405   204800        0             0 apache2
2024-02-21T01:11:46.039475+00:00 wsexport-prod02 kernel: [505035.522711] [1541268]   105 1541268     9266     2057   102400        0             0 exim4
2024-02-21T01:11:48.255432+00:00 wsexport-prod02 kernel: [505035.522714] [1549391]     0 1549391      644       23    40960        0             0 sessionclean
2024-02-21T01:11:50.378310+00:00 wsexport-prod02 kernel: [505035.522715] [1549398]     0 1549398      644       33    40960        0             0 sessionclean
2024-02-21T01:11:53.634047+00:00 wsexport-prod02 kernel: [505035.522718] [1549400]     0 1549400     3849       25    45056        0             0 sort
2024-02-21T01:11:56.071626+00:00 wsexport-prod02 kernel: [505035.522722] [1549401]     0 1549401     3849       26    49152        0             0 sort
2024-02-21T01:11:58.387673+00:00 wsexport-prod02 kernel: [505035.522724] [1549402]     0 1549402      644       29    40960        0             0 sessionclean
2024-02-21T01:12:01.528141+00:00 wsexport-prod02 kernel: [505035.522725] [1549528]     0 1549528     1086       55    45056        0             0 puppet-run
2024-02-21T01:12:03.522898+00:00 wsexport-prod02 kernel: [505035.522727] [1549529]     0 1549529    19470     2717   184320        0             0 puppet
2024-02-21T01:12:05.923978+00:00 wsexport-prod02 kernel: [505035.522729] [1549567]    33 1549567    59063     4855   204800        0             0 apache2
2024-02-21T01:12:08.023849+00:00 wsexport-prod02 kernel: [505035.522730] [1549742]    33 1549742    59067     5076   212992        0             0 apache2
2024-02-21T01:12:09.683414+00:00 wsexport-prod02 kernel: [505035.522732] [1549744]    33 1549744    59067     4979   212992        0             0 apache2
2024-02-21T01:12:12.501534+00:00 wsexport-prod02 kernel: [505035.522734] [1549776]    33 1549776    59063     4855   204800        0             0 apache2
2024-02-21T01:12:15.619380+00:00 wsexport-prod02 kernel: [505035.522735] [1549783]    33 1549783    59063     4855   204800        0             0 apache2
2024-02-21T01:12:16.971651+00:00 wsexport-prod02 kernel: [505035.522737] [1549784]    33 1549784    59063     4855   204800        0             0 apache2
2024-02-21T01:12:18.884153+00:00 wsexport-prod02 kernel: [505035.522739] [1549823]    33 1549823    59067     5077   212992        0             0 apache2
2024-02-21T01:12:22.812117+00:00 wsexport-prod02 kernel: [505035.522741] [1549862]    33 1549862    59067     4316   208896        0             0 apache2
2024-02-21T01:12:24.322130+00:00 wsexport-prod02 kernel: [505035.522742] [1549883]    33 1549883    59067     4859   204800        0             0 apache2
2024-02-21T01:12:26.757584+00:00 wsexport-prod02 kernel: [505035.522744] [1549896]    33 1549896    59069     4323   208896        0             0 apache2
2024-02-21T01:12:30.755541+00:00 wsexport-prod02 kernel: [505035.522746] [1549918]    33 1549918    59067     4356   208896        0             0 apache2
2024-02-21T01:12:32.256911+00:00 wsexport-prod02 kernel: [505035.522748] [1549928]     0 1549928     5493      349    81920        0             0 sssd_be
2024-02-21T01:12:33.687008+00:00 wsexport-prod02 kernel: [505035.522749] [1549949]    33 1549949    59069     4260   204800        0             0 apache2
2024-02-21T01:12:35.269794+00:00 wsexport-prod02 kernel: [505035.522751] [1549970]    33 1549970    59067     3928   200704        0             0 apache2
2024-02-21T01:12:36.625806+00:00 wsexport-prod02 kernel: [505035.522752] [1549987]     0 1549987     1193       38    45056        0             0 find
2024-02-21T01:12:39.154564+00:00 wsexport-prod02 kernel: [505035.522754] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-0.slice/user@0.service/init.scope,task=systemd,pid=658,uid=0
2024-02-21T01:12:41.503955+00:00 wsexport-prod02 kernel: [505035.522791] Out of memory: Killed process 658 (systemd) total-vm:18732kB, anon-rss:1516kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:76kB oom_score_adj:100

The messages hit the kernel ring buffer almost immediately (their kernel timestamps all fall within a fraction of a millisecond of 505035.52 s), but it took 6 minutes for rsyslogd to receive them and put timestamps on them. And then does it kill some of the 150 Apache workers? No, it kills systemd. Good job, Linux.
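Why systemd? The killed process was the root user's systemd instance (user@0.service) with oom_score_adj:100, while every apache2 row above shows 0. A simplified Python model of the kernel's oom_badness() heuristic, as found in mm/oom_kill.c of 5.x kernels, suggests the adjustment alone decides it. The 4 KiB page size and the 8 GiB of RAM with no swap are assumptions, because the log excerpt does not show them:

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on typical x86-64 kernels

# One row of the kernel's OOM task dump (5.x layout), e.g.
# "[1537312]    33 1537312    78588     7253   266240        0             0 apache2"
# Columns: pid, uid, tgid, total_vm, rss (pages), pgtables_bytes, swapents,
# oom_score_adj, name. A "[505035.522707]" timestamp is skipped because it
# contains a dot and so cannot match the leading "[pid]" pattern.
TASK_ROW = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)\s*$"
)


def parse_task_row(line):
    """Return the fields of one task-dump row, or None if the line is not one."""
    m = TASK_ROW.search(line)
    if m is None:
        return None
    pid, _uid, _tgid, _total_vm, rss, pgtables, swapents, adj = map(int, m.groups()[:8])
    return {"pid": pid, "name": m.group(9), "rss": rss,
            "pgtables_bytes": pgtables, "swapents": swapents, "oom_score_adj": adj}


def oom_badness(rss_pages, swapents, pgtables_bytes, oom_score_adj, total_pages):
    """Simplified model of oom_badness(); the highest score is killed first."""
    if oom_score_adj == -1000:  # OOM_SCORE_ADJ_MIN: never selected
        return None
    points = rss_pages + swapents + pgtables_bytes // PAGE_SIZE
    # oom_score_adj is scaled by total RAM + swap: +100 adds 10% of all pages.
    points += oom_score_adj * (total_pages // 1000)
    return points


total_pages = 8 * 1024**3 // PAGE_SIZE  # hypothetical: 8 GiB RAM, no swap
row = parse_task_row(
    "[1537312]    33 1537312    78588     7253   266240        0             0 apache2"
)
apache = oom_badness(row["rss"], row["swapents"], row["pgtables_bytes"],
                     row["oom_score_adj"], total_pages)
# The killed systemd: anon-rss 1516 kB, pgtables 76 kB, oom_score_adj 100.
systemd = oom_badness(1516 * 1024 // PAGE_SIZE, 0, 76 * 1024, 100, total_pages)
```

Under those assumptions the largest worker shown scores 7318 against systemd's 210098: the +100 adjustment is worth about 10% of all memory, far more than any single ~28 MiB Apache worker.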

Mentioned in SAL (#wikimedia-cloud) [2024-02-22T07:20:46Z] <TimStarling> on wsexport-prod02 reduced MaxRequestWorkers to 20 T335553

Despite the reduced MaxRequestWorkers, there were three more instances of downtime in the last 4 hours, each with a drop in available memory to near zero, a load spike, and missed Prometheus reports. The OOM killer did not fire, and I haven't been able to find any useful related logs.
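The common signal in these episodes is MemAvailable collapsing towards zero. A minimal, hypothetical check in Python that parses /proc/meminfo-format text and flags low available memory; the 5% threshold and the sample figures are made up for illustration:

```python
def meminfo_kb(meminfo_text):
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def low_memory(meminfo_text, threshold=0.05):
    """True when MemAvailable is below `threshold` of MemTotal (hypothetical 5%)."""
    info = meminfo_kb(meminfo_text)
    return info["MemAvailable"] < threshold * info["MemTotal"]


# On the VM itself this would be: low_memory(open("/proc/meminfo").read())
sample = """MemTotal:        8000000 kB
MemFree:          120000 kB
MemAvailable:     200000 kB
"""
```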

I found an Apache process with 1.4GB of RSS remaining after its request had finished. Several other Apache processes had RSS of over 500MB. The tool downloads the Parsoid HTML of a page and loads it into a DOMDocument; that allows the process to exceed PHP's memory limit, and the memory is not returned to the OS when it is freed. I therefore set MaxConnectionsPerChild to 1 so that workers exit after serving each request.
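A toy model of why recycling workers helps, assuming (as a simplification) that a worker's RSS stays at the highest peak it has reached until the worker exits, and that requests are dispatched round-robin. The request sizes are hypothetical, with one 1400 MB request echoing the 1.4GB process above:

```python
def pool_rss_mb(request_peaks_mb, workers, max_requests_per_child, baseline_mb):
    """Toy model: total pool RSS (MB) after each request completes.

    Assumes freed memory is never returned to the OS, so each worker's RSS
    stays at its high-water mark until it exits. max_requests_per_child = 0
    means workers are never recycled.
    """
    rss = [baseline_mb] * workers
    served = [0] * workers
    totals = []
    for i, peak in enumerate(request_peaks_mb):
        w = i % workers  # round-robin dispatch (an assumption)
        rss[w] = max(rss[w], peak)
        served[w] += 1
        if max_requests_per_child and served[w] >= max_requests_per_child:
            # The worker exits and is replaced by a fresh one at baseline size.
            rss[w] = baseline_mb
            served[w] = 0
        totals.append(sum(rss))
    return totals


peaks = [1400, 50, 50, 50]  # hypothetical per-request peaks in MB
never_recycled = pool_rss_mb(peaks, workers=2, max_requests_per_child=0, baseline_mb=20)
recycled = pool_rss_mb(peaks, workers=2, max_requests_per_child=1, baseline_mb=20)
```

Without recycling, one large export ratchets the pool's footprint up for good; with recycling it drops back after every request, at the cost of starting a new process each time. Strictly speaking, Apache's MaxConnectionsPerChild counts connections rather than requests, so with HTTP keep-alive a worker may serve several requests before it exits.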

Reducing MaxConnectionsPerChild was definitely very helpful, turning the available-RAM graph from a scary sawtooth into a more peaceful-looking kelp forest. However, available memory dropped by 1.6GB 36 hours ago and did not recover.

[Image: wsexport memory.png]

With no wall-clock time limit, requests can run for as long as they like while making HTTP client requests. Here is a backtrace of one such request:

(gdb) source /root/php8.2-8.2.7/.gdbinit
(gdb) zbacktrace
[0x7f2f53619e60] curl_multi_select(object[0x7f2f53619eb0], 1) [internal function]
[0x7f2f53619db0] GuzzleHttp\Handler\CurlMultiHandler->tick() /var/www/tool/vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php:165 
[0x7f2f53619d30] GuzzleHttp\Handler\CurlMultiHandler->execute(object[0x7f2f53619d80]) /var/www/tool/vendor/guzzlehttp/guzzle/src/Handler/CurlMultiHandler.php:189 
[0x7f2f53619c90] GuzzleHttp\Promise\Promise->invokeWaitFn() /var/www/tool/vendor/guzzlehttp/promises/src/Promise.php:251 
[0x7f2f53619c10] GuzzleHttp\Promise\Promise->waitIfPending() /var/www/tool/vendor/guzzlehttp/promises/src/Promise.php:227 
[0x7f2f53619b80] GuzzleHttp\Promise\Promise->invokeWaitList() /var/www/tool/vendor/guzzlehttp/promises/src/Promise.php:272 
[0x7f2f53619b00] GuzzleHttp\Promise\Promise->waitIfPending() /var/www/tool/vendor/guzzlehttp/promises/src/Promise.php:229 
[0x7f2f53619a70] GuzzleHttp\Promise\Promise->wait() /var/www/tool/vendor/guzzlehttp/promises/src/Promise.php:69 
[0x7f2f53619950] App\BookProvider->getPages(array(1233)[0x7f2f536199a0]) /var/www/tool/src/BookProvider.php:203 
[0x7f2f53619780] App\BookProvider->getMetadata("\37777777730\37777777652\37777777730\37777777660\37777777731\37777777603\37777777730\37777777661\37777777730\37777777651_\37777777730\37777777647\37777777731\37777777604\37777777730\37777777655\37777777731\37777777601\37777777730\37777777647\37777777730\37777777670", false, object[0x7f2f536197f0]) /var/www/tool/src/BookProvider.php:131
[0x7f2f536196f0] App\BookProvider->get("\37777777730\37777777652\37777777730\37777777660\37777777731\37777777603\37777777730\37777777661\37777777730\37777777651_\37777777730\37777777647\37777777731\37777777604\37777777730\37777777655\37777777731\37777777601\37777777730\37777777647\37777777730\37777777670") /var/www/tool/src/BookProvider.php:48
[0x7f2f53619670] App\BookCreator->create("\37777777730\37777777652\37777777730\37777777660\37777777731\37777777603\37777777730\37777777661\37777777730\37777777651 \37777777730\37777777647\37777777731\37777777604\37777777730\37777777655\37777777731\37777777601\37777777730\37777777647\37777777730\37777777670") /var/www/tool/src/BookCreator.php:49
[0x7f2f53619500] App\Controller\ExportController->export(object[0x7f2f53619550], object[0x7f2f53619560], object[0x7f2f53619570], object[0x7f2f53619580], object[0x7f2f53619590], object[0x7f2f536195a0]) /var/www/tool/src/Controller/ExportController.php:157
[0x7f2f53619360] App\Controller\ExportController->home(object[0x7f2f536193b0], object[0x7f2f536193c0], object[0x7f2f536193d0], object[0x7f2f536193e0], object[0x7f2f536193f0], object[0x7f2f53619400]) /var/www/tool/src/Controller/ExportController.php:105
[0x7f2f53619280] Symfony\Component\HttpKernel\HttpKernel->handleRaw(object[0x7f2f536192d0], 1) /var/www/tool/vendor/symfony/http-kernel/HttpKernel.php:163 
[0x7f2f536191b0] Symfony\Component\HttpKernel\HttpKernel->handle(object[0x7f2f53619200], 1, true) /var/www/tool/vendor/symfony/http-kernel/HttpKernel.php:75 
[0x7f2f536190f0] Symfony\Component\HttpKernel\Kernel->handle(object[0x7f2f53619140]) /var/www/tool/vendor/symfony/http-kernel/Kernel.php:202 
[0x7f2f53619020] (main) /var/www/tool/public/index.php:33 
(gdb) print sapi_globals
$1 = {server_context = 0x7f2f519a7030, request_info = {request_method = 0x7f2f52be25e8 "GET", 
    query_string = 0x7f2f519a7f00 "lang=ar&page=%D8%AA%D8%B0%D9%83%D8%B1%D8%A9+%D8%A7%D9%84%D8%AD%D9%81%D8%A7%D8%B8&format=pdf-a5", cookie_data = 0x0, content_length = 0, 
...

It's patiently downloading a list of 1233 chapters from Wikisource. If it successfully builds this book, it will cache the result for 1 day.
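The title arguments in the backtrace name the same page as query_string, but gdb prints each non-ASCII byte as a sign-extended 32-bit octal escape. A small Python helper, written for illustration and assuming the bytes are UTF-8, turns them back into text; for the getMetadata() argument it gives تذكرة_الحفاظ, the same title as the URL-encoded page parameter:

```python
import re

# gdb shows bytes >= 0x80 of a signed char buffer as sign-extended 32-bit
# octal, e.g. 0xD8 -> 0xFFFFFFD8 -> "\37777777730". Only the low 8 bits matter.
# Assumption: runs of 11 octal digits are such escapes, 1-3 digits ordinary ones.
ESCAPE = re.compile(r"\\([0-7]{11}|[0-7]{1,3})")


def decode_gdb_string(s, encoding="utf-8"):
    """Turn a gdb-printed string with octal escapes back into text."""
    out = bytearray()
    pos = 0
    for m in ESCAPE.finditer(s):
        out += s[pos:m.start()].encode("latin-1")  # literal characters as-is
        out.append(int(m.group(1), 8) & 0xFF)      # keep the low byte only
        pos = m.end()
    out += s[pos:].encode("latin-1")
    return out.decode(encoding)


title = decode_gdb_string(
    r"\37777777730\37777777652\37777777730\37777777660\37777777731\37777777603"
    r"\37777777730\37777777661\37777777730\37777777651_\37777777730\37777777647"
    r"\37777777731\37777777604\37777777730\37777777655\37777777731\37777777601"
    r"\37777777730\37777777647\37777777730\37777777670"
)
```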

I migrated the tool to php-fpm and set a wall-clock time limit of 120s.
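A wall-clock limit matters because, on Linux, PHP's max_execution_time is measured in CPU time rather than elapsed time, and a worker blocked in curl_multi_select() uses almost none. The difference is easy to see in Python, where time.process_time() plays the role of a CPU-time budget and time.monotonic() that of a wall clock; time.sleep() stands in for waiting on slow HTTP responses:

```python
import time


def cpu_and_wall_seconds(blocking_call):
    """Run blocking_call() and return the (CPU seconds, wall seconds) it took."""
    cpu0, wall0 = time.process_time(), time.monotonic()
    blocking_call()
    return time.process_time() - cpu0, time.monotonic() - wall0


# Half a second of waiting barely registers as CPU time.
cpu, wall = cpu_and_wall_seconds(lambda: time.sleep(0.5))
```

A CPU-time limit would let such a request wait almost indefinitely, whereas a wall-clock limit cuts it off regardless of what it is waiting for.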

Now some of my previous test cases are hitting the limit of 1024 open files. I strace'd a request and found that it is indeed trying to open over a thousand concurrent connections to text-lb.eqiad.wikimedia.org. This will need a PR.
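Capping the number of requests in flight keeps the number of open sockets, and therefore file descriptors, bounded however many chapters a book has. A minimal asyncio sketch of the idea, where fetch is a hypothetical coroutine standing in for one HTTP request and the cap of 8 is arbitrary. In the PHP tool the corresponding piece would presumably be Guzzle's Pool with its concurrency option, though that is a guess about what the PR would do:

```python
import asyncio


async def fetch_all(urls, fetch, max_concurrency=8):
    """Fetch every URL with at most max_concurrency requests in flight.

    `fetch` is any coroutine function taking a URL (hypothetical here).
    Results are returned in the same order as `urls`.
    """
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(url):
        async with sem:  # waits while max_concurrency fetches are running
            return await fetch(url)

    return await asyncio.gather(*(bounded(u) for u in urls))


# Usage with a fake fetch that records how many calls overlap.
in_flight = 0
peak = 0


async def fake_fetch(url):
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0)  # yield to the event loop, as a network wait would
    in_flight -= 1
    return url.upper()


pages = [f"Chapter_{i}" for i in range(1233)]
results = asyncio.run(fetch_all(pages, fake_fetch))
```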

The service went down twice more in the last few hours due to OOMs, so I'm going to try setting a memory limit for the whole of php-fpm using a systemd unit override file.

I was able to induce an OOM with concurrent requests in order to confirm that my solution is working. Hopefully this will mean the service will stay up now.

After two whole days of uninterrupted uptime, I am declaring this done.

Note followup T358634.