I'm tagging DBAs on this after a conversation with @Marostegui in -cloud. After some more digging and logging, I think the wikimetrics user is restricted to 10 concurrent connections across all wikis, not 10 per wiki. I ran show processlist fast enough to catch the moment it ran out of connections:
Id | User | Host | db | Command | Time | State | Info | Progress |
168140821 | s52261 | 10.68.23.232:32886 | ilowiki_p | Query | 0 | init | show processlist | 0.000 |
168141098 | s52261 | 10.68.23.232:32912 | maiwiki_p | Sleep | 3 | | NULL | 0.000 |
168141099 | s52261 | 10.68.23.232:32914 | fiwiki_p | Sleep | 3 | | NULL | 0.000 |
168141100 | s52261 | 10.68.23.232:32916 | napwiki_p | Sleep | 3 | | NULL | 0.000 |
168141103 | s52261 | 10.68.23.232:32918 | dsbwiki_p | Sleep | 3 | | NULL | 0.000 |
168141104 | s52261 | 10.68.23.232:32920 | cebwiki_p | Sleep | 3 | | NULL | 0.000 |
168141106 | s52261 | 10.68.23.232:32922 | pmswiki_p | Sleep | 3 | | NULL | 0.000 |
168141107 | s52261 | 10.68.23.232:32924 | fawiki_p | Sleep | 3 | | NULL | 0.000 |
168141108 | s52261 | 10.68.23.232:32926 | ugwiki_p | Sleep | 3 | | NULL | 0.000 |
168141109 | s52261 | 10.68.23.232:32928 | snwiki_p | Sleep | 3 | | NULL | 0.000 |
And in the log I have errors from all the other wikis it tries to access. I'll try to solve this by changing the code: it's not running in parallel like I thought, so I should be able to close the connection to one wiki before moving on to the next one. I'm taking care of this, just cc-ing DBAs so they're in the loop, since I'll be on paternity leave any moment.
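For anyone following along, here is a rough sketch of the "close one wiki's connection before opening the next" idea. It is not the actual wikimetrics code: `run_per_wiki`, `connect`, and `work` are made-up names for illustration, and I'm assuming the connection object exposes a `close()` method (as DB-API connections do).

```python
from contextlib import closing


def run_per_wiki(wikis, connect, work):
    """Run work(conn, wiki) for each wiki, one connection at a time.

    `connect` and `work` are hypothetical callables supplied by the caller:
    connect(wiki) returns an object with a close() method, and
    work(conn, wiki) does the per-wiki queries and returns a result.
    """
    results = {}
    for wiki in wikis:
        # closing() calls conn.close() when the block exits, even if work()
        # raises, so the connection is released before the next one opens.
        with closing(connect(wiki)) as conn:
            results[wiki] = work(conn, wiki)
    return results
```

With this shape the process should hold at most one replica connection at any time, well under the limit of 10. One caveat: if the connections come from a pool, `close()` may only return them to the pool rather than drop them on the server, so the pool settings would need checking too.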