Request increased quota for petscan Toolforge tool for database access
Closed, Resolved, Public

Description

Toolforge Tool Name: catscan2 (petscan)

Type and amount of quota increase requested: Wikireplicas connections to 40

Reason: The petscan tool needs a high level of parallelism to provide its services in a reasonable period of time for the community. It is currently using a number of other tools' connections and pooling them together, which is not good for tracking and administration of the wikireplicas service. Therefore, this will be a much better solution for the consumers and developer of petscan and the community at large.

Petscan is hosted on its own VPS project here https://petscan.wmflabs.org/

Event Timeline

@Magnus please review and let me know if that connection limit is suitable (it's what Quarry has now) and feel free to provide more context or details.

Bstorm renamed this task from Request increased quota for <Replace Me> Toolforge tool for database access to Request increased quota for petscan Toolforge tool for database access. Jun 18 2020, 12:04 AM
Reedy renamed this task from Request increased quota for petscan Toolforge tool for database access to Request increased quota for petscan Toolforge tool for database access. Jun 18 2020, 12:05 AM

Thanks, that seems perfect!

Just let me know when it's done, and I'll switch my configuration over to petscan-DB only.

Approved during weekly wmcs meeting

I would like to know how big these queries are. This is a big increase in connections (4x the default) - are these queries heavy? Do you have an estimate of how long the queries run (on average)? I am worried we'd be increasing the (already huge) load on our hosts.

The reason I use that many connections is precisely to avoid long-running ones, as I had in a previous version; they would time out or lose database connection, so I re-wrote the code to use more but shorter queries.

I have no recent data (should be in WMF DB stats for the petscan tool, no?), but I'd say 30 sec max, vast majority <10 sec.

Just to add, if some queries take too long I can likely just change a parameter to fix that. Someone with access to the stats, let me know.
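To illustrate the trade-off described above (many short queries instead of a few long-running ones, with parallelism capped at the connection limit), here is a minimal sketch. It is not PetScan's actual code: `split_into_batches`, `run_batches`, `ConnectionCounter`, and `fake_query` are hypothetical names, and `fake_query` merely stands in for a short SQL query against a replica.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

MAX_CONNECTIONS = 40  # the per-user limit requested in this task


def split_into_batches(items, batch_size):
    """Split one large job into small batches so each query stays short."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


class ConnectionCounter:
    """Tracks how many simulated connections are open at the same time."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def __enter__(self):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.current -= 1
        return False


def run_batches(batches, run_query, max_connections=MAX_CONNECTIONS):
    """Run one short query per batch, never exceeding max_connections at once."""
    counter = ConnectionCounter()

    def worker(batch):
        # Each worker holds one (simulated) connection for the query's duration.
        with counter:
            return run_query(batch)

    # The pool size is what bounds the number of simultaneous connections.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        results = list(pool.map(worker, batches))  # map() preserves batch order
    return results, counter.peak


def fake_query(batch):
    """Hypothetical stand-in for a short SQL query over a batch of page IDs."""
    return [page_id * 2 for page_id in batch]
```

Calling `run_batches(split_into_batches(list(range(1000)), 10), fake_query)` returns the per-batch results together with the peak number of simultaneous workers, which stays at or below `MAX_CONNECTIONS`. If individual queries run too long, lowering the batch size is the kind of single-parameter change mentioned above.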

> The reason I use that many connections is precisely to avoid long-running ones, as I had in a previous version; they would time out or lose database connection, so I re-wrote the code to use more but shorter queries.
>
> I have no recent data (should be in WMF DB stats for the petscan tool, no?), but I'd say 30 sec max, vast majority <10 sec.

Sounds good +1 then

> Just to add, if some queries take too long I can likely just change a parameter to fix that. Someone with access to the stats, let me know.

This is useful too, let's keep an eye on this in case it is needed.

Thanks for the clarification

I'm hoping to run this one on Monday so I'm not executing the new process right before the weekend.

Change 608441 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: record connection increase for petscan (catscan2)

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608441

This is done on all four replica hosts:

mysql:root@localhost [(none)]> SHOW GRANTS FOR 's51156';
+------------------------------------------------------------------------------------------------------------------------------------+
| Grants for s51156@%                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------+
| GRANT labsdbuser TO 's51156'@'%'                                                                                                   |
| GRANT USAGE ON *.* TO 's51156'@'%' IDENTIFIED BY PASSWORD '<snip>' WITH MAX_USER_CONNECTIONS 10 |
+------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql:root@localhost [(none)]> GRANT USAGE ON *.* TO 's51156'@'%' WITH MAX_USER_CONNECTIONS 40;
Query OK, 0 rows affected (0.03 sec)

mysql:root@localhost [(none)]> SHOW GRANTS FOR 's51156';
+------------------------------------------------------------------------------------------------------------------------------------+
| Grants for s51156@%                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------+
| GRANT labsdbuser TO 's51156'@'%'                                                                                                   |
| GRANT USAGE ON *.* TO 's51156'@'%' IDENTIFIED BY PASSWORD '<snip>' WITH MAX_USER_CONNECTIONS 40 |
+------------------------------------------------------------------------------------------------------------------------------------+
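The before/after check shown in the console output can also be scripted when the same limit has to be confirmed on several hosts. The sketch below only parses rows as printed by `SHOW GRANTS`; it does not connect to any database. The helper names and the host labels in the usage note are hypothetical.

```python
import re

# Matches the limit clause as it appears in SHOW GRANTS output.
_LIMIT_RE = re.compile(r"MAX_USER_CONNECTIONS\s+(\d+)", re.IGNORECASE)


def max_user_connections(grant_rows):
    """Return the MAX_USER_CONNECTIONS value found in the rows, or None."""
    for row in grant_rows:
        match = _LIMIT_RE.search(row)
        if match:
            return int(match.group(1))
    return None


def hosts_missing_limit(grants_by_host, expected):
    """Return the sorted host labels whose grants do not carry the expected limit."""
    return sorted(
        host
        for host, rows in grants_by_host.items()
        if max_user_connections(rows) != expected
    )
```

For example, passing a dictionary that maps made-up labels such as `"replica-a"` and `"replica-b"` to their `SHOW GRANTS` rows, with `expected=40`, returns the labels of any hosts still on the old limit.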

Change 608441 merged by Bstorm:
[operations/puppet@production] wikireplicas: record connection increase for petscan (catscan2)

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608441

@Magnus, you should be all set now to switch petscan over to using connections for s51156 instead of the pooled connections from other tools. It's all recorded and set up. I applied the setting on all four replicas, so if any of them are re-cloned the grant should persist.

I have switched the configuration to use the petscan db connections only.

Thanks for setting this up!