Page MenuHomePhabricator

Recommendation API exceeds max_user_connections in MySQL
Closed, ResolvedPublic

Description

Recommendation API checks fail because of the following error:

[2018-12-17T19:47:00.210Z] ERROR: recommendation_api/232 on scb1002: ER_USER_LIMIT_REACHED: User 'recommendationapiservice' has exceeded the 'max_user_connections' resource (current value: 10) (err.code=ER_USER_LIMIT_REACHED, err.errno=1226, err.sqlState=42000, err.fatal=true, err.levelPath=error/article.creation.morelike)
    Error: ER_USER_LIMIT_REACHED: User 'recommendationapiservice' has exceeded the 'max_user_connections' resource (current value: 10)

We need to either increase the connection limit in MySQL or decrease the limits in the service.

Event Timeline

Pchelolo created this task.Dec 17 2018, 7:56 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 17 2018, 7:56 PM
mobrovac added a subscriber: mobrovac.

The important thing to note here is that each worker tries to connect to the DB, so at the very minimum we will need 192 connections (24*4 in eqiad and 16*6 in codfw).

The important thing to note here is that each worker tries to connect to the DB, so at the very minimum we will need 192 connections (24*4 in eqiad and 16*6 in codfw).

Even more than that, each worker uses a connection pool and the current limit in the pool is 30 connections, which, I believe is way too high and should be decreased. Given the probably low amount of traffic, the service will receive and the number of workers, we could set the individual worker connection pool limit to something much lower than 30.

The important thing to note here is that each worker tries to connect to the DB, so at the very minimum we will need 192 connections (24*4 in eqiad and 16*6 in codfw).

Even more than that, each worker uses a connection pool and the current limit in the pool is 30 connections, which, I believe is way too high and should be decreased. Given the probably low amount of traffic, the service will receive and the number of workers, we could set the individual worker connection pool limit to something much lower than 30.

Agreed, 30 is too high. However, we do have to keep in mind that we have the automatic checks which get executed every 60 seconds on all of the nodes + LVS. That's 4 + 6 + 2 == 12 connections just for the checks. So I would say that the bare minimum is at least three times that on the MariaDB side, and at least 5 on the client side.

30 running connections? That is a lot.
Does it do connection pooling?

30 running connections? That is a lot.
Does it do connection pooling?

No. A new connection is instantiated on every request. I agree that an external connection pooling mechanism could be enacted here to mitigate possible problems.

30 running connections? That is a lot.
Does it do connection pooling?

No. A new connection is instantiated on every request. I agree that an external connection pooling mechanism could be enacted here to mitigate possible problems.

It actually does: https://github.com/wikimedia/mediawiki-services-recommendation-api/blob/master/routes/article.creation.morelike.js#L73

But given that each pool is per worker process, we end up with so many connections overall.

bmansurov moved this task from Staged to In Progress on the Research board.
bmansurov moved this task from Backlog to In Progress on the Recommendation-API board.

Change 481871 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[operations/puppet@production] Recommendation API: increase mysql connection limit for service

https://gerrit.wikimedia.org/r/481871

@Banyek I've increased the MySQL connection limit to 400 in the above patch. Does that sound reasonable?

Change 481872 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[mediawiki/services/recommendation-api@master] Limit the number of MySQL connections

https://gerrit.wikimedia.org/r/481872

Change 481873 had a related patch set uploaded (by Bmansurov; owner: Bmansurov):
[mediawiki/services/recommendation-api/deploy@master] Limit the number of MySQL connections

https://gerrit.wikimedia.org/r/481873

Banyek added a subscriber: jcrespo.Jan 2 2019, 4:06 PM

@bmansurov as I see the memory usage on the m2 hosts, this could be good, but I am not sure if it is a good idea to increase it to a 40x size of the original.
I mean, as I've see the numbers this shouldn't cause problems, but I'd like to ask @Marostegui or @jcrespo about why the original number was set to 10?

@Banyek thanks! For the context I requested the initial number to be 50 per T205294: Request to create database and account for recommendation API, which was an arbitrary number. Didn't realize it was created as 10.

Banyek added a comment.Jan 2 2019, 4:21 PM

@Banyek thanks! For the context I requested the initial number to be 50 per T205294: Request to create database and account for recommendation API, which was an arbitrary number. Didn't realize it was created as 10.

and the 400 is arbitrary too or it was measured somehow?

@Banyek it's coming from T212154#4828960. Given we're having two connections per worker, I multiplied 192 by 2 and rounded it up to 400.

Banyek added a comment.Jan 2 2019, 4:25 PM

That makes sense, then I'll add a +1 and and LGTM to the patchm, but I won't mind, if you'd ask for a second opinion from the more experienced wiki dba's, because there might be something I am not aware of

@Banyek thanks. I'll wait for a comment from @Marostegui or @jcrespo.

400 is way too much.
The host itself has a 500 connections limit for everything.
I don't think increasing the number of connections is the right approach, I think we need to investigate how we can make sure not to have so many opened at the same time.
Here T212154#4828960 we were talking about 192, which is still quite a lot I would say.
How many running connections it actually has? Note: running, not idle.

bmansurov added a comment.EditedJan 2 2019, 4:41 PM

Thanks, @Marostegui.

@mobrovac, @Pchelolo can we reduce 192 to something smaller? Maybe disable the service in 3/4 of workers? Also what do you think about the suggestion and question in T212154#4849515.

As a absolute minimum we'd need at least 4 + 6 + 2 == 12 connections just for the automatic checks. I don't think we would ever actually reach 400 running connections, but that is the theoretical absolute maximum. Realistically, I would guess something like 50 would be enough, but it would be quite strange to have a lower limit on MySQL side then on client side.

@Marostegui does 50 sound good to you?

Change 481873 merged by Ppchelko:
[mediawiki/services/recommendation-api/deploy@master] Limit the number of MySQL connections

https://gerrit.wikimedia.org/r/481873

Change 481872 merged by Ppchelko:
[mediawiki/services/recommendation-api@master] Limit the number of MySQL connections

https://gerrit.wikimedia.org/r/481872

@Marostegui does 50 sound good to you?

That is a lot better yeah. However, how many running connections do you normally expect to have?

@Marostegui OK, could you review the patch then.

Here T203039#4557848 I did some rough estimates on the number of requests. 50K over one month, which amounts to about one connection per minute. But as pointed out at T212154#4849578, we'll need at least 12 connections for the service to even be deployed.

Thanks!
So, we need 12 connections for the service to be even deployed, and what are the other 38 connections "buffer" for you think we need?

bmansurov added a comment.EditedJan 4 2019, 4:01 PM

@Marostegui apparently we have 192 service workers according to T212154#4828960 and each service worker is allowed to have 2 MySQL connections according to this. And that's the amount we want because my estimates of 50K maybe wrong and we may end up getting more requests. In order to keep all service workers happy we need about 400 connections (192 *2). Since you mentioned that the database server cannot handle those many requests in T212154#4849515, we've reduced this number to 50. That 50 is not a buffer, because 12 connections are need for periodic service checks, which means for users to be able to query the service, we need the remaining 38 connections.

Marostegui moved this task from Triage to In progress on the DBA board.Jan 4 2019, 5:16 PM

Change 481871 merged by Marostegui:
[operations/puppet@production] Recommendation API: increase mysql connection limit for service

https://gerrit.wikimedia.org/r/481871

Mentioned in SAL (#wikimedia-operations) [2019-01-08T15:17:31Z] <marostegui> Increase connections from 10 to 50 for recommendationapiservice on m2 - T212154

Marostegui closed this task as Resolved.Jan 8 2019, 3:22 PM

I have deployed and changed the mysql config:

| GRANT USAGE ON *.* TO 'recommendationapi'@'10.64.0.166' IDENTIFIED BY PASSWORD 'xx' WITH MAX_USER_CONNECTIONS 50 |
| GRANT USAGE ON *.* TO 'recommendationapi'@'10.64.32.156' IDENTIFIED BY PASSWORD 'xx' WITH MAX_USER_CONNECTIONS 50 |
Pchelolo reopened this task as Open.Jan 16 2019, 11:29 PM

Hm, trying to deploy the service again I see the same issue:

ER_USER_LIMIT_REACHED: User 'recommendationapiservice' has exceeded the 'max_user_connections' resource (current value: 10)

Note that the current limit is still reported as '10'. @Marostegui has the change been actually applied to the user?

Reopening.

Marostegui added a comment.EditedJan 17 2019, 6:00 AM

Yes, it was, as shown at: T212154#4862734

root@db1065.eqiad.wmnet[mysql]> select user,host,max_user_connections from user where user like 'recommendationapi';
+-------------------+--------------+----------------------+
| user              | host         | max_user_connections |
+-------------------+--------------+----------------------+
| recommendationapi | 10.64.32.156 |                   50 |
| recommendationapi | 10.64.0.166  |                   50 |
+-------------------+--------------+----------------------+
2 rows in set (0.00 sec)

Did you restart the service once this got changed just in change?

Just to confirm, the slave (which I think you don't use) also has the change applied:

root@db1117.eqiad.wmnet[mysql]> select user,host,max_user_connections from user where user like 'recommendationapi';
+-------------------+--------------+----------------------+
| user              | host         | max_user_connections |
+-------------------+--------------+----------------------+
| recommendationapi | 10.64.32.156 |                   50 |
| recommendationapi | 10.64.0.166  |                   50 |
+-------------------+--------------+----------------------+
2 rows in set (0.00 sec)

Hm, trying to deploy the service again I see the same issue:

ER_USER_LIMIT_REACHED: User 'recommendationapiservice' has exceeded the 'max_user_connections' resource (current value: 10)

Note that the current limit is still reported as '10'. @Marostegui has the change been actually applied to the user?
Reopening.

Now that I see that, it is a different user. It is not recommendationapi but recommendationapiservice. I will get that fixed now

Marostegui closed this task as Resolved.Jan 17 2019, 2:30 PM

It now matches https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/481871/6/modules/role/templates/mariadb/grants/production-m2.sql.erb

root@db1065.eqiad.wmnet[mysql]> select user,host,max_user_connections from user where user like 'recommendationapi%';
+--------------------------+--------------+----------------------+
| user                     | host         | max_user_connections |
+--------------------------+--------------+----------------------+
| recommendationapi        | 10.64.32.156 |                   10 |
| recommendationapi        | 10.64.0.166  |                   10 |
| recommendationapiservice | 10.64.32.156 |                   50 |
| recommendationapiservice | 10.64.0.166  |                   50 |
+--------------------------+--------------+----------------------+
4 rows in set (0.00 sec)

Sorry for the confusion