Wikifunctions is down
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?: An error is displayed, see screenshot

What should have happened instead?: Page should be displayed

Software version (on Special:Version page; skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Screenshot_20240908_124149_Chrome.jpg (1×1 px, 177 KB)

Event Timeline

mw-wikifunctions seems to be down in eqiad at the moment:

vgutierrez@cp6016:~$ nc -zv mw-wikifunctions.discovery.wmnet 4451
nc: connect to mw-wikifunctions.discovery.wmnet (10.2.2.88) port 4451 (tcp) failed: Connection refused
vgutierrez@cp6016:~$ nc -zv mw-wikifunctions.svc.eqiad.wmnet 4451
nc: connect to mw-wikifunctions.svc.eqiad.wmnet (10.2.2.88) port 4451 (tcp) failed: Connection refused
vgutierrez@cp6016:~$ nc -zv mw-wikifunctions.svc.codfw.wmnet 4451
Connection to mw-wikifunctions.svc.codfw.wmnet (10.2.1.88) 4451 port [tcp/*] succeeded!
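
The same reachability check can be scripted. The following is only a minimal sketch of a TCP connect test comparable to `nc -z`; the function names `port_open` and `check_hosts` are made up for illustration, and the hostnames from the transcript above only resolve inside the production network.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds, False otherwise."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


def check_hosts(hosts, port, timeout=3.0):
    """Map each hostname to the result of port_open (hypothetical helper)."""
    return {host: port_open(host, port, timeout) for host in hosts}
```

Calling `check_hosts(["mw-wikifunctions.svc.eqiad.wmnet", "mw-wikifunctions.svc.codfw.wmnet"], 4451)` from a host with access to those names would report each data center separately, which is what distinguishes a single-site outage from a global one.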
Vgutierrez claimed this task.

Service should be restored now.

Joe reopened this task as In Progress. (Sep 9 2024, 1:08 PM)
Joe subscribed.

For the record, the cause was a relatively aggressive crawler exhausting all available resources. While we've rate-limited this bot, I think we should use robots.txt to ban crawling of most pages.

Probably just banning crawling of /view/ would be more than enough. Once we have better backend performance and organic traffic increases, we can think about lifting that block.
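
To illustrate what a /view/-only block would mean for a well-behaved crawler, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt body, the agent name `ExampleBot` and the sample URLs are assumptions for illustration, not the rules actually deployed on the wiki.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body; the real MediaWiki:Robots.txt is not quoted in this task.
ROBOTS_TXT = """\
User-agent: *
Disallow: /view/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Illustrative URLs: one under /view/, one under /wiki/.
view_allowed = parser.can_fetch("ExampleBot", "https://www.wikifunctions.org/view/en/Z801")
wiki_allowed = parser.can_fetch("ExampleBot", "https://www.wikifunctions.org/wiki/Z801")
```

With these assumed rules, `view_allowed` is false and `wiki_allowed` is true. Note that robots.txt is advisory: it only helps against crawlers that choose to honor it.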

Jdforrester-WMF triaged this task as Unbreak Now! priority.
Jdforrester-WMF subscribed.

I've added a general ban of ClaudeBot for all pages to https://www.wikifunctions.org/wiki/MediaWiki:Robots.txt for now. Re-resolving, but we'll want to follow up to see if we can support higher request rates at lower load.

Can/have we set a general rate limit so that when the next bot tries it, we don't go down again?
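
For reference, a general per-client limit is often built as a token bucket. The sketch below is a generic illustration of that technique, not a description of how rate limiting is configured for this service; the class names and the `rate` and `capacity` parameters are hypothetical.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._now = now  # injectable clock, which makes the limiter testable
        self.tokens = float(capacity)
        self.updated = now()

    def allow(self):
        current = self._now()
        # Refill according to elapsed time, never exceeding the bucket capacity.
        elapsed = current - self.updated
        self.updated = current
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keep one bucket per client key (for example an IP address or user agent)."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._now = now
        self._buckets = {}

    def allow(self, client_id):
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self._now)
            self._buckets[client_id] = bucket
        return bucket.allow()
```

Because each client has its own bucket, one aggressive crawler runs out of tokens without affecting other traffic. A production version would also need to expire idle buckets so the dictionary does not grow without bound.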

I wonder if we can lift that ban again? I think it was the crawler in combination with T374241 that caused the site instability. I would suggest that we unban the bot and see if it causes issues again, because I don't think the crawling by itself would cause the issues described here.

My bot (which runs the featured tool from the newsletter) continuously tries to locally update every Z2 and has sent, by my really rough estimate, 100,000 requests to Wikifunctions in the past week. It has not caused any significant harm, so I don't think that ClaudeBot could possibly have been worse.
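
To put that estimate in perspective, the quick calculation below converts 100,000 requests per week into an average rate. It assumes the requests were spread evenly over the week, which the comment does not state, so actual peaks may have been higher.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def average_rate(requests, seconds):
    """Average requests per second over the given interval."""
    return requests / seconds


weekly_requests = 100_000  # rough estimate quoted in the comment above
per_second = average_rate(weekly_requests, SECONDS_PER_WEEK)
per_minute = per_second * 60

print(f"{per_second:.3f} requests/second, {per_minute:.1f} requests/minute")
```

That works out to roughly 0.17 requests per second on average, or about ten per minute. The crawler's request rate is not given in this task, so the two cannot be compared directly from these numbers alone.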