
Expand phabricator's robots.txt blacklist
Closed, Resolved (Public)

Description

Context: Phabricator is still getting hammered, to some extent, and it's putting stress on the database (T109279: Phabricator creates MySQL connection spikes: Attempt to connect to phuser@m3-master.eqiad.wmnet failed with error #1040: Too many connections.)

I watched the access logs for a while and noticed quite a few URLs that have no value to search engines. I also suspect that some of the spikes are coming from the sprint extension (T107197: Sprint extension doesn't scale to thousands of tasks in a single sprint: burndown page exceeds max execution timeout on visual editor project). @chasemp also mentioned to me that phragile does similarly expensive things to get the data it needs to compute burndown charts.

So I've made a paste (below) and will compile a list of URLs that should be excluded from search spiders. Once the list looks good, we will apply the change to Phabricator's robots.txt and hopefully lessen the impact of search engines on our poor overworked MySQL servers.

The list:

/project/sprint
/policy/explain
/auth
/login
/maniphest/transaction
/tag
/search/query/all
/conduit
/api
/project
/applications
/token
/pholio
/dashboard
/calendar
/herald
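
For illustration, the list above could be turned into robots.txt rules and sanity-checked roughly as follows. This is only a sketch: it assumes each entry is used as a plain `Disallow` prefix under `User-agent: *`, which may differ from what the real patch does, and the helper names (`build_robots_txt`, `is_crawlable`) are made up for this example. It uses Python's standard `urllib.robotparser`, which does simple prefix matching.

```python
from urllib.robotparser import RobotFileParser

# Path prefixes copied from the list above.
EXCLUDED_PREFIXES = [
    "/project/sprint", "/policy/explain", "/auth", "/login",
    "/maniphest/transaction", "/tag", "/search/query/all", "/conduit",
    "/api", "/project", "/applications", "/token", "/pholio",
    "/dashboard", "/calendar", "/herald",
]


def build_robots_txt(prefixes):
    """Render a robots.txt body that disallows each prefix for all crawlers."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}" for prefix in prefixes]
    return "\n".join(lines) + "\n"


def is_crawlable(robots_txt, path, agent="*"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


robots_txt = build_robots_txt(EXCLUDED_PREFIXES)
print(robots_txt)
# Task pages stay crawlable; anything under an excluded prefix does not.
print(is_crawlable(robots_txt, "/T109279"))
print(is_crawlable(robots_txt, "/project/sprint/view/1/"))
```

Because the rules are prefixes, `Disallow: /project` also covers every URL below `/project/`, not just the sprint pages.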

Event Timeline

mmodell claimed this task.
mmodell raised the priority of this task to Medium.
mmodell updated the task description.
mmodell added a project: Phabricator.
mmodell added subscribers: mmodell, chasemp.

Hmm, I'm not sure whether we should exclude pastes.

Change 234601 had a related patch set uploaded (by 20after4):
Add a bunch of apps and expensive URLs to robots.txt exclusion

https://gerrit.wikimedia.org/r/234601

Change 234601 merged by 20after4:
Add a bunch of apps and expensive URLs to robots.txt exclusion

https://gerrit.wikimedia.org/r/234601

Blocking /project sounds quite disastrous (also /tag for the unlucky projects which got a workboard created). Can at least /project/profile/ be allowed?
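
One possible way to do that, sketched under assumptions: keep the broad `Disallow: /project` but add an `Allow: /project/profile/` exception. The fragment below is hypothetical and is not the content of change 234601; `can_crawl` is a made-up helper and the `/project/board/1/` path is only an illustrative workboard URL. The check uses Python's `urllib.robotparser`, which applies the first matching rule, so the `Allow` line is placed before the `Disallow` it overrides. Crawlers that follow the longest-match rule of RFC 9309 should reach the same result regardless of order.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical fragment: Allow is listed before the broader Disallow because
# urllib.robotparser uses the first rule whose prefix matches the path.
ROBOTS_TXT = """\
User-agent: *
Allow: /project/profile/
Disallow: /project
Disallow: /tag
"""


def can_crawl(path, robots_txt=ROBOTS_TXT):
    """Return True if a generic crawler may fetch `path` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", path)


# Profile pages are allowed again; other /project and /tag URLs stay blocked.
print(can_crawl("/project/profile/1/"))
print(can_crawl("/project/board/1/"))
print(can_crawl("/tag/phabricator/"))
```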