https://lists.wikimedia.org is often slow to load
Open, Medium, Public

Description

It often takes aaaages to load anything :(

Event Timeline

I see the problem in two areas only:

  • Opening the main page
  • Opening the page for a mailing list with a lot of members.

For the first one, it might be the DB. While I was waiting for it to load, I ran SHOW PROCESSLIST and this (truncated) query showed up: SELECT count(*) AS count_1 FROM (SELECT member.id AS member_id, member._member_id AS member__member. Unfortunately, I can't find where the Mailman code issues it, so I can't get the full query and try adding an index or something.

The second one, I'm sure, is not a slow query: SHOW PROCESSLIST doesn't return anything while the page is trying to load. It might be doing thousands of queries, but each one is fast.
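
To illustrate the second case: a page can issue one cheap query per member, so no single statement stays in the process list for long, yet the total still adds up. The sketch below is only an illustration under stated assumptions: it uses an in-memory SQLite database with a made-up `member`/`address` schema (not Mailman's real schema) and a hypothetical `CountingConnection` wrapper to compare the per-row pattern with a single join.

```python
import sqlite3


class CountingConnection:
    """Hypothetical wrapper that counts how many statements are sent."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.count = 0

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        self.count += 1
        return self.conn.execute(sql, params)


def build_db(n_members: int) -> sqlite3.Connection:
    """Create a toy database with n_members members, one address each."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE address (id INTEGER PRIMARY KEY, email TEXT)")
    conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, address_id INTEGER)")
    conn.executemany(
        "INSERT INTO address VALUES (?, ?)",
        [(i, f"user{i}@example.org") for i in range(n_members)],
    )
    conn.executemany(
        "INSERT INTO member VALUES (?, ?)",
        [(i, i) for i in range(n_members)],
    )
    conn.commit()
    return conn


def emails_per_row(db: CountingConnection) -> list[str]:
    """One query for the member list, then one extra query per member."""
    member_rows = db.execute("SELECT address_id FROM member ORDER BY id").fetchall()
    emails = []
    for (address_id,) in member_rows:
        row = db.execute(
            "SELECT email FROM address WHERE id = ?", (address_id,)
        ).fetchone()
        emails.append(row[0])
    return emails


def emails_joined(db: CountingConnection) -> list[str]:
    """Same result fetched with a single joined query."""
    rows = db.execute(
        "SELECT address.email FROM member "
        "JOIN address ON address.id = member.address_id "
        "ORDER BY member.id"
    ).fetchall()
    return [email for (email,) in rows]
```

With N members the first function sends N + 1 statements and the second sends one, which is why counting statements per request can reveal a problem that the process list does not.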

I'll dig deeper later.

@Reedy what did you see as slow back then? Right now doing:

seems relatively responsive, both logged in and logged out.

I wonder if it was during a heavy scraping session or in a different timezone? I haven't checked HyperKitty, though.

It's nearly every time I use it.

Similarly, clicking "Manage this list" on https://lists.wikimedia.org/hyperkitty/list/mediawiki-announce@lists.wikimedia.org/...

Screenshot 2024-03-27 at 17.20.10.png (418×2 px, 136 KB)

160 seconds? Really?

Thank you Reedy, I trust you; it was just that the title wasn't descriptive enough (exact URL, logged in or logged out, etc.). The 500 is indeed a symptom of the same issue (HTTP timeouts from Varnish). Now I have more data to work with :-D. For example, the first one was from HyperKitty, not Postorius, so the title was misleading to me.

It's very slow for me as well. I hadn't opened it in a while, but it was barely usable both yesterday and today.

~ $ curl -o /dev/null -s -w 'Total: %{time_total}s\n' https://lists.wikimedia.org/postorius/lists/
Total: 234.281773s
Reedy renamed this task from "https://lists.wikimedia.org/postorius is sloooow" to "https://lists.wikimedia.org is often slow to load". Apr 3 2024, 1:32 PM

now:

 curl -o /dev/null -s -w 'Total: %{time_total}s\n' https://lists.wikimedia.org/postorius/lists/
Total: 1.020937s

I can't really confirm that from my side right now. When I click on that it's fast.

Maybe it's only slow sometimes while something else is happening or it's related to location or browser?

curl -o /dev/null -s -w 'Total: %{time_total}s\n' https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/
Total: 0.561476s
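
Since single curl runs in this thread range from under a second to several minutes, repeated sampling may describe the problem better than one measurement. The following is a minimal sketch, assuming plain anonymous GET requests are representative; `fetch`, `sample_latency` and `summarize` are hypothetical helper names, and the 300-second timeout is an arbitrary choice.

```python
import statistics
import time
import urllib.request


def fetch(url: str, timeout: float = 300.0) -> None:
    """Download the page body and discard it (anonymous request, no cookies)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def sample_latency(url: str, runs: int = 5, pause: float = 0.0, fetch_fn=fetch) -> list[float]:
    """Time `runs` sequential requests and return the durations in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch_fn(url)
        samples.append(time.perf_counter() - start)
        time.sleep(pause)
    return samples


def summarize(samples: list[float]) -> dict[str, float]:
    """Reduce the samples to min, median and max, which expose intermittent spikes."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

For example, `summarize(sample_latency("https://lists.wikimedia.org/postorius/lists/", runs=10, pause=30))` would spread ten requests over about five minutes; a large gap between median and max would point to intermittent slowness rather than a consistently slow page.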

With the curl command you are not logged in, are you?

That's correct, I am not logged in.

I stumbled over this today myself. I am currently seeing a page like https://lists.wikimedia.org/postorius/lists/cloud-admin-feed.lists.wikimedia.org/ load in less than 1 second (good). I am also seeing a page like https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/ load in 2 minutes and 45 seconds (very, very bad).

The MW installer has a feature allowing the user to check a box to subscribe to mediawiki-announce. I tried to test it, since I'm doing maintenance on the client code. It timed out in the client, but the Apache logs on lists1004 show it completing successfully after 56 seconds. I got the confirmation email. I did another two requests with the same email and they each took about 57 seconds.

That's too slow. The feature will have to be removed if this can't be fixed.

The slow request was

curl https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/anonymous_subscribe -F email=$EMAIL -H'X-CSRFToken:aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' -H'Cookie: csrftoken=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' -H'Referer: https://lists.wikimedia.org/postorius/lists/mediawiki-announce.lists.wikimedia.org/anonymous_subscribe'

Can we have tracebacker?

Please unbreak now.

This seems to be a different issue. The cases reported in this task are two specific pages but lists.wikimedia.org is slow on every page right now.

LSobanski triaged this task as Medium priority. Jan 28 2025, 9:57 AM

mailman-web has been restarted; it seems to be a bit faster now.

Again. Please unbreak this permanently.

Given the number of scrapers hitting our Mailman, if you can burst the AI bubble, it'd help us a lot.

To SRE-Collab: Maybe this can somehow use the CDN blocks?

My suspicion is that it's slow because it's inefficient. That's why I asked for stack traces. Profiling would also do the job.
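
One way to get that kind of data is to profile only the slow requests. The sketch below is not how lists.wikimedia.org is set up; it is a generic WSGI middleware built on the standard library's `cProfile`, and the class name `SlowRequestProfiler`, the threshold and the `report` callback are all assumptions made for illustration.

```python
import cProfile
import io
import pstats
import time


class SlowRequestProfiler:
    """WSGI middleware that reports a cProfile summary for slow requests."""

    def __init__(self, app, threshold_seconds=1.0, report=print, top=15):
        self.app = app
        self.threshold_seconds = threshold_seconds
        self.report = report  # callable receiving the report text
        self.top = top        # number of functions to list

    def __call__(self, environ, start_response):
        profiler = cProfile.Profile()
        started = time.perf_counter()
        profiler.enable()
        try:
            result = self.app(environ, start_response)
            try:
                # Consume the body here so lazily generated output is profiled too.
                chunks = list(result)
            finally:
                close = getattr(result, "close", None)
                if close is not None:
                    close()
        finally:
            profiler.disable()
        elapsed = time.perf_counter() - started

        if elapsed >= self.threshold_seconds:
            out = io.StringIO()
            stats = pstats.Stats(profiler, stream=out)
            stats.sort_stats("cumulative").print_stats(self.top)
            path = environ.get("PATH_INFO", "?")
            self.report(f"{path} took {elapsed:.2f}s\n{out.getvalue()}")
        return chunks
```

Wrapping the application object, for example `application = SlowRequestProfiler(application, threshold_seconds=5.0)`, would log the most expensive call paths for any request slower than five seconds. Profiling adds overhead and buffers the whole response, so this is suited to a short investigation rather than permanent use.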

Change #1188288 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/puppet@production] lists: Bump number of uwsgi processes to 12 (from 4)

https://gerrit.wikimedia.org/r/1188288

Change #1188288 merged by Ladsgroup:

[operations/puppet@production] lists: Bump number of uwsgi processes to 12 (from 4)

https://gerrit.wikimedia.org/r/1188288
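
As a rough back-of-the-envelope check on what the bump buys, the sketch below estimates the request ceiling for a pool of workers. It assumes each uwsgi process serves one request at a time (no threads), which is an assumption and not something confirmed in this task, and it reuses the roughly 57 seconds per request observed above for anonymous_subscribe as the worst case. The function name is hypothetical.

```python
def max_requests_per_minute(workers: int, seconds_per_request: float) -> float:
    """Upper bound on throughput when each worker handles one request at a time."""
    if workers <= 0 or seconds_per_request <= 0:
        raise ValueError("workers and seconds_per_request must be positive")
    return workers * 60.0 / seconds_per_request


# Worst case from this task: about 57 seconds per request.
before = max_requests_per_minute(4, 57.0)   # 4 processes
after = max_requests_per_minute(12, 57.0)   # 12 processes
```

Under these assumptions the ceiling moves from roughly 4 to roughly 13 slow requests per minute, so more processes add headroom but do not make an individual slow request any faster.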

Change #1188294 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] mailman: Update monitoring to 13 mailman processes

https://gerrit.wikimedia.org/r/1188294

Change #1188294 merged by Jcrespo:

[operations/puppet@production] mailman: Update monitoring to 13 mailman processes

https://gerrit.wikimedia.org/r/1188294

Change #1188320 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] mailman: add a local disk cache

https://gerrit.wikimedia.org/r/1188320

Change #1188320 merged by Arnaudb:

[operations/puppet@production] mailman: add a local disk cache

https://gerrit.wikimedia.org/r/1188320

Change #1188708 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] Revert^2 "mailman: add a local disk cache"

https://gerrit.wikimedia.org/r/1188708

Change #1188708 merged by Arnaudb:

[operations/puppet@production] Revert^2 "mailman: add a local disk cache"

https://gerrit.wikimedia.org/r/1188708

Change #1188796 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] Revert^3 "mailman: add a local disk cache"

https://gerrit.wikimedia.org/r/1188796

Change #1188796 abandoned by Arnaudb:

[operations/puppet@production] Revert^3 "mailman: add a local disk cache"

Reason:

wrong patch

https://gerrit.wikimedia.org/r/1188796

Change #1188798 had a related patch set uploaded (by Arnaudb; author: Arnaudb):

[operations/puppet@production] Revert^4 "mailman: add a local disk cache"

https://gerrit.wikimedia.org/r/1188798

Change #1188798 merged by Arnaudb:

[operations/puppet@production] Revert^4 "mailman: add a local disk cache"

https://gerrit.wikimedia.org/r/1188798

https://gerrit.wikimedia.org/r/1188798 creates a 1GB local disk cache that should help with those performance issues.

The cache is around 100 MB and the UI is slowing down again.
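
To track how full the cache is over time, something like the following could be run periodically. It is a generic sketch: the cache directory is whatever the puppet change configures (not stated in this task), so the path must be supplied by the caller, and both function names are hypothetical.

```python
import os


def directory_size_bytes(path: str) -> int:
    """Sum the sizes of all regular files below `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except FileNotFoundError:
                # Cache entries can be evicted while the walk is in progress.
                continue
    return total


def fill_ratio(used_bytes: int, limit_bytes: int) -> float:
    """Fraction of the configured cache limit currently in use."""
    if limit_bytes <= 0:
        raise ValueError("limit_bytes must be positive")
    return used_bytes / limit_bytes
```

With the figures mentioned above, `fill_ratio(100 * 1024**2, 1024**3)` is a little under 0.1, that is, about a tenth of the 1 GB limit.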