
Request increased quota for 'wikiscan' Toolforge user for database access
Closed, Resolved (Public)

Description

Toolforge Tool Name: wikiscan (database user u12903, not a tool account)
Type and amount of quota increase requested: 15 concurrent database connections (max_user_connections)
Reason:
Wikiscan is a very large site: it computes statistics for all 900+ Wikimedia wikis, providing:
*Instant detailed statistics for any user, even those with several million edits (for example, a user with 3.5M edits or a bot with 68M).
*A global table of all users with many statistics https://en.wikiscan.org/users
*The most active pages and users over the last 24 hours https://en.wikiscan.org/hours
*The same stats for every day and month since the wiki was created https://en.wikiscan.org/calendar
*Global metrics https://en.wikiscan.org/

To achieve this, it needs to constantly refresh new data from the replicas. I have made several optimizations to remove long queries on the replicas, and updates are much smoother now. I log every query to track its execution time: the longest query since 2020-05-02 took 810 seconds, only 305 queries ran longer than 5 minutes, and most queries are very fast (99% finish in under 1 minute).
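The query logging described above can be sketched as a small wrapper around query execution. Everything below (the function name, the threshold constant, and the in-memory SQLite table standing in for a replica connection) is a hypothetical illustration, not Wikiscan's actual code:

```python
import logging
import sqlite3
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("wikiscan.queries")

SLOW_THRESHOLD = 300  # seconds; roughly the "5 minutes" cutoff mentioned above

def timed_query(cursor, sql, params=()):
    """Execute a query, log its wall-clock duration, and return all rows."""
    start = time.monotonic()
    cursor.execute(sql, params)
    rows = cursor.fetchall()
    elapsed = time.monotonic() - start
    if elapsed > SLOW_THRESHOLD:
        log.warning("slow query (%.1fs): %s", elapsed, sql)
    else:
        log.info("query took %.3fs", elapsed)
    return rows

# Tiny demonstration against an in-memory SQLite table standing in for a replica.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE revision (rev_id INTEGER)")
cur.executemany("INSERT INTO revision VALUES (?)", [(i,) for i in range(100)])
count = timed_query(cur, "SELECT COUNT(*) FROM revision")[0][0]
```

Logging every duration (not just the slow ones) is what makes claims like "99% finish in under 1 minute" verifiable after the fact.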

Current refresh rates:
Very large wikis (en, wikidata, commons): 6 hours for the last-24h/daily stats, 12 hours for user stats.
Big wikis (de, en...): 1 hour for the last-24h/daily stats, 3 hours for user stats.
Others: 1 hour for the last-24h/daily stats, 2 hours for user stats.

There are also other, slower update processes, and an integrated tool to query blocks by IP range for frwiki.

I would like more connections so that statistics refresh faster. I think 15 would be enough; the tool won't always use all of them, but it will help handle some peaks. Thanks.

Event Timeline

Aklapper changed the task status from Open to Stalled. Jul 8 2019, 12:00 PM

Hi @Wikiscan, thanks for taking the time to report this and welcome to Wikimedia Phabricator!

Please provide clearer steps to reproduce the problem. I don't know what Wikiscan does, how it does it, what it connects to, or how it connects. Please see https://www.mediawiki.org/wiki/How_to_report_a_bug - thanks.

Hi, yes, it is used to access the Wikimedia SQL replicas.

Aklapper changed the task status from Stalled to Open. Jul 8 2019, 12:48 PM

@Wikiscan can you explain a bit more why you need to be able to run more than 10 concurrent database queries against the Wiki Replica servers?

https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Connection_handling_policy helps explain a bit why we have limits in the first place.

Wikiscan uses several workers to update all wikis https://wikiscan.org/?menu=allsites&submenu=workers; current needs are between 5 and 12 connections. I plan to extend coverage from 380 wikis to more than 800 (https://meta.wikimedia.org/wiki/Grants:Project/Rapid/Akeron/Wikiscan), so I think 15 connections could be required at peaks (some of them are only needed for a few seconds).
There is also a secondary usage: displaying contributions and block logs for IP ranges, like https://fr.wikiscan.org/plage-ip/90.0.0.0/16?submenu=edits; this depends on visiting users and web crawlers.

Wikiscan uses several workers to update all wikis https://wikiscan.org/?menu=allsites&submenu=workers; current needs are between 5 and 12 connections.

This is great information, but it does not provide any evidence or data to explain why more than 10 concurrent workers are needed.

Wikiscan needs to refresh recent stats every 30 minutes (ideally) for each wiki. It also needs to refresh old stats periodically, a process which can take several days per wiki, as well as other data like user groups.
All of this for 380 wikis, and probably 800 wikis in the coming months. Also, any visitor looking at contributions/blocks for IP ranges needs a connection.

It is hard to keep everything up to date with so much data, and this year's schema changes to the revision table (actor and comment joins) are slowing down all updates.

Wikiscan needs to refresh recent stats every 30 minutes (ideally) for each wiki.

I know from looking at past slow query logs that often this account makes queries which run for 3+ hours, so I am a bit skeptical about the idea of fresh stats every 30 minutes for each wiki. I am even a little confused about the business case for that rate of updates. Is there a tangible goal of the project that requires such frequency or was this number arbitrarily chosen at some point?

My meta concern is that we have implemented per-tool concurrency limits to make usage of the Wiki Replicas more "fair". Our database servers can only sustain a finite number of concurrent users, and we have found that this number can easily be exhausted by a relatively small number of tools without system-level enforcement. If the only barrier to increasing the rate limit for a tool is functionally "create a Phabricator ticket", then I fear it will not take long for such requests to exceed the total concurrent connections the servers can support. "Buy more servers" is an excellent idea for how we could correct that problem, but unfortunately such things require a large amount of lead time, as the general budget for new hardware is set once per fiscal year and we did not plan for new database server capacity for the Wiki Replicas in FY2019-2020 (July 2019-June 2020).

Long queries have only happened in recent months, probably related to https://phabricator.wikimedia.org/T226050 and the revision table changes.
I have disabled the comment join for several wikis which were slower: enwiki, commonswiki, wikidatawiki, viwiki, itwiki, cywiki, arwiki, ruwiki, eswiki, ukwiki, plwiktionary. This means no more revert detection for users on those wikis.

As for the 30-minute update time, it only refreshes recent changes, so you can see what is currently happening on each wiki https://en.wikiscan.org/hours/6/pages, and because contributors want up-to-date stats when they look at a user edit counter like https://en.wikiscan.org/user/Ser_Amantio_di_Nicolao.
The full update for each wiki is much slower (1 or 2 months). Wikiscan is a complex tool, and it is very difficult to explain in detail with my limited English skills.

The biggest mistake, I think, is using the same SQL user for the integrated IP-range web tool, which consumes connections at random times whenever visits occur. I will try to use a dedicated user for this tool, or disable it; maybe the 10 connections would then be enough for the workers alone.
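Short of a separate SQL user, one way to keep the update workers from colliding with the web tool's ad-hoc connections is to gate worker access behind a counting semaphore, reserving a few slots of the quota for the web tool. This is a minimal sketch of that idea; the constants and the 12/3 split are assumptions, not values from Wikiscan:

```python
import threading

# Hypothetical split: cap worker connections below the granted
# max_user_connections so the IP-range web tool always has headroom.
MAX_USER_CONNECTIONS = 15
WEB_TOOL_RESERVE = 3
worker_slots = threading.BoundedSemaphore(MAX_USER_CONNECTIONS - WEB_TOOL_RESERVE)

def refresh_wiki(wiki, results):
    """Refresh one wiki while holding a connection slot."""
    with worker_slots:  # blocks if 12 workers already hold a replica connection
        # a real worker would open a replica connection and run its queries here
        results.append(f"{wiki} refreshed")

results = []
threads = [threading.Thread(target=refresh_wiki, args=(w, results))
           for w in ("enwiki", "frwiki", "dewiki")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The semaphore bounds worker concurrency regardless of how many wikis are queued, so a visitor spike on the web tool cannot push the combined total past the grant.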

I am going to mark this as declined for now. Please do reopen if you find that you cannot stay within the global limit, and we can discuss how we might help your project fulfill its goals.

@bd808

Hi, I have made several optimizations to remove long queries on the replicas; updates are much smoother now.

I log every query to track its execution time: the longest query since 2020-05-02 took 810 seconds, only 305 queries ran longer than 5 minutes, and most queries are very fast (99% finish in under 1 minute).

Wikiscan is a very large site; it needs to regularly update stats for 900+ wikis https://wikiscan.org/?menu=allsites&submenu=status, covering all users plus hourly/daily stats.
There is also an integrated web tool which queries block logs https://fr.wikiscan.org/plage-ip/90.0.0.0/16?submenu=blocks.

Is it possible to increase max_user_connections to 15?

Thanks.

Wikiscan updated the task description.

@Wikiscan I retagged this request to the new Data-Services (Quota-requests) project and filled in the easy parts of the new template for these requests. I will leave it for you to summarize your reason and provide enough context to convince the review committee that it is justified.

bd808 renamed this task from Increase max_user_connections for Wikiscan to Request increased quota for 'wikiscan' Toolforge user for database access. Jun 12 2020, 9:41 PM

+1 from team meeting, moving to DBA approval

@Marostegui This would be the first customer for this process. Let us know what you think!

+1, especially because of this: "I log every query to track its execution time: the longest query since 2020-05-02 took 810 seconds, only 305 queries ran longer than 5 minutes, and most queries are very fast (99% under 1 minute)."
Let's please not increase the connections today, as it is Friday and there will be less availability over the weekend in case this turns into something unexpected. Better to do it on Monday, or anytime next week, just in case we see issues.

Unfortunately, I didn't get to this today. Since it's the first customer for the process, I'll do it on Monday instead of tomorrow (Friday).

Change 608438 had a related patch set uploaded (by Bstorm; owner: Bstorm):
[operations/puppet@production] wikireplicas: record grant for wikiscan

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608438

Change 608438 merged by Bstorm:
[operations/puppet@production] wikireplicas: record grant for wikiscan

https://gerrit.wikimedia.org/r/c/operations/puppet/+/608438

Grant is present on labsdb1009/10/11/12.

Example with the hashes removed:

mysql:root@localhost [(none)]> SHOW GRANTS FOR 'u12903';
+------------------------------------------------------------------------------------------------------------------------------------+
| Grants for u12903@%                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------+
| GRANT labsdbuser TO 'u12903'@'%'                                                                                                   |
| GRANT USAGE ON *.* TO 'u12903'@'%' IDENTIFIED BY PASSWORD '<snip>' WITH MAX_USER_CONNECTIONS 10 |
+------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

mysql:root@localhost [(none)]> GRANT USAGE ON *.* TO 'u12903'@'%' WITH MAX_USER_CONNECTIONS 15;
Query OK, 0 rows affected (0.07 sec)

mysql:root@localhost [(none)]> SHOW GRANTS FOR 'u12903';
+------------------------------------------------------------------------------------------------------------------------------------+
| Grants for u12903@%                                                                                                                |
+------------------------------------------------------------------------------------------------------------------------------------+
| GRANT labsdbuser TO 'u12903'@'%'                                                                                                   |
| GRANT USAGE ON *.* TO 'u12903'@'%' IDENTIFIED BY PASSWORD '<snip>' WITH MAX_USER_CONNECTIONS 15 |
+------------------------------------------------------------------------------------------------------------------------------------+