
Toolforge outage: toolsdb out of space
Closed, Resolved · Public · Bug Report

Description

Original title: Could not connect to database: User s51080 already has more than 'max_user_connections' active connections

Toolforge project: checkwiki
host: tools.db.svc.wikimedia.cloud
user: s51080
database: s51080__checkwiki_p
web server url: Check Wikipedia

No software changes for weeks.
Attempted fix: stopped the web server and the continuous service (cw-live-scan) for 5 minutes, then restarted them. Both services still return the same error.

Event Timeline

Restricted Application added a subscriber: Aklapper.

It seems that toolsdb is down in general. Here is a ticket about montage db connections timing out: https://github.com/hatnote/montage/issues/353#issuecomment-3488646411

Zache triaged this task as High priority. Nov 5 2025, 6:56 AM

I triaged this as High because the outage is still ongoing and affects multiple tools.

The tools-db-4 storage volume is out of space. I'll use this task for tracking.

disk space free trend for tools-db-4 over the last 30d

2025-11-05-082453_813x632_scrot.png (632×813 px, 45 KB)

and over the last couple of days

2025-11-05-082559_807x619_scrot.png (619×807 px, 43 KB)

Doing a comparison with the replica on tools-db-6, there's ~800G free there:

root@tools-db-6:/srv/labsdb/data# df -h /srv/labsdb
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        3.9T  2.8T  885G  77% /srv/labsdb

Most or all of the discrepancy in space seems to come from ibdata1:

root@tools-db-4:~# du -cs /srv/labsdb/data/* | sort -nr | head -10
3098616996	total
1032818748	/srv/labsdb/data/ibdata1
294125620	/srv/labsdb/data/s53220__quickstatements_p
258368168	/srv/labsdb/data/s51434__mixnmatch_p
191399380	/srv/labsdb/data/s53685__editgroups
98341700	/srv/labsdb/data/s51698__yetkin
86048076	/srv/labsdb/data/s51412__data
72501176	/srv/labsdb/data/s51114__enwp10
60054592	/srv/labsdb/data/s53952__freebase_p
58990172	/srv/labsdb/data/s51499__wikiminiatlas
root@tools-db-6:~# du -cs /srv/labsdb/data/* | sort -nr | head -10
2174533936	total
294125620	/srv/labsdb/data/s53220__quickstatements_p
258122072	/srv/labsdb/data/s51434__mixnmatch_p
191378900	/srv/labsdb/data/s53685__editgroups
120139788	/srv/labsdb/data/ibdata1
97220368	/srv/labsdb/data/s51698__yetkin
86057432	/srv/labsdb/data/s51412__data
72501176	/srv/labsdb/data/s51114__enwp10
60054592	/srv/labsdb/data/s53952__freebase_p
58990160	/srv/labsdb/data/s51499__wikiminiatlas
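
To put a number on the ibdata1 discrepancy, here is a quick back-of-the-envelope check using the figures from the two `du` outputs above. It is only a sketch: it assumes `du` reported sizes in 1 KiB blocks (the GNU default), and the variable names are made up for illustration. Under that assumption, ibdata1 accounts for roughly 870 GiB of the roughly 881 GiB gap between the two hosts.

```python
# Sizes copied from the du output above.
# Assumption: du reported 1 KiB blocks (the GNU coreutils default).
KIB_PER_GIB = 1024 ** 2

db4 = {"total": 3098616996, "ibdata1": 1032818748}  # tools-db-4 (primary)
db6 = {"total": 2174533936, "ibdata1": 120139788}   # tools-db-6 (replica)

# How much larger the data directory is on tools-db-4 than on tools-db-6.
total_gap = db4["total"] - db6["total"]

# How much of that difference is explained by ibdata1 alone.
ibdata1_gap = db4["ibdata1"] - db6["ibdata1"]
share = ibdata1_gap / total_gap

print(f"total gap:            {total_gap / KIB_PER_GIB:.1f} GiB")
print(f"ibdata1 gap:          {ibdata1_gap / KIB_PER_GIB:.1f} GiB")
print(f"ibdata1 share of gap: {share:.1%}")
```

The remaining difference is spread across the per-tool databases, which differ only slightly between the two hosts in the listings above.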
Chlod renamed this task from Could not connect to database: User s51080 already has more than 'max_user_connections' active connections to Toolforge outage: toolsdb out of space. Nov 5 2025, 8:23 AM
Chlod updated the task description.

We added some disk space to tools-db-4 and toolsdb is working again.

We are still investigating what caused the big increase in size for ibdata1.

This also impacted ClueBot NG. The alert for the report interface being down was sent at 23:55 UTC (the alerting rule fires after 2 minutes), followed by the alert for the bot not editing at 00:39 UTC (that rule fires after 1 hour 5 minutes). Recovery was at 08:45 UTC. I checked manually around 1am (CET), and it was reporting max connections used, as described above.
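
For anyone reconstructing the timeline, the alert delays can be subtracted from the alert times to estimate when the impact began. This is a rough sketch, not an authoritative timeline: the calendar dates (alerts overnight from Nov 4 to Nov 5 2025) are an assumption inferred from the task's other timestamps, and the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc

# Alert and recovery times from the comment above.
# Assumption: the 23:55 alert fired on Nov 4 2025, the others on Nov 5 2025.
report_alert = datetime(2025, 11, 4, 23, 55, tzinfo=UTC)
edit_alert = datetime(2025, 11, 5, 0, 39, tzinfo=UTC)
recovery = datetime(2025, 11, 5, 8, 45, tzinfo=UTC)

# Each rule fires only after the condition has held for its delay,
# so the condition itself began that much earlier.
report_down_since = report_alert - timedelta(minutes=2)
last_edit_before = edit_alert - timedelta(hours=1, minutes=5)

print("report interface down since:", report_down_since.strftime("%H:%M UTC"))
print("last bot edit no later than:", last_edit_before.strftime("%H:%M UTC"))
print("report interface downtime:  ", recovery - report_down_since)
```

Under these assumptions the report interface was down from about 23:53 UTC, and the bot had stopped editing by about 23:34 UTC.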