
Wiki replicas accessed from toolforge are very slow
Closed, Invalid · Public

Description

SQL queries against the wiki replicas run from Toolforge are much slower than the same queries run through Quarry. It used to be the other way around: wiki replicas accessed from Toolforge were faster than Quarry, and this changed at some point in 2019.

How to replicate:

ssh dev.tools.wmflabs.org

sql fiwiki_p;
MariaDB [fiwiki_p]> select count(distinct(page_id)) as c from page;
+---------+
| c       |
+---------+
| 1278406 |
+---------+
1 row in set (52.10 sec)

The same query takes 2.36 seconds in Quarry (example: query 42216).

This is worse with more complex queries that use multiple or large tables. For example, counting the number of articles takes 14 min 46.53 sec in Toolforge and 29.98 seconds in Quarry (query 42220):

SELECT count(distinct(page_id))
FROM pagelinks
INNER JOIN page ON pl_from = page_id
WHERE page_namespace = 0
AND page_is_redirect = 0;

Event Timeline

Quarry is basically a client of the replica databases, just like your own Toolforge tool can be, so there is no inherent reason for Quarry to return results faster than your tool.
See https://wikitech.wikimedia.org/wiki/Help:Toolforge/Database#Naming_conventions to learn about the two types of replica servers available.
Quarry uses the *.web.db.svc.eqiad.wmflabs database servers. I suspect that the sql command on Toolforge maps to the analytics hosts, which are slower by design. As Quarry is a web service, it uses the appropriate server. If you run sql for other kinds of usage, you should probably keep using the analytics hosts.

Yes, sql fiwiki_p connects to fiwiki.analytics.db.svc.eqiad.wmflabs.
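
For reference, a minimal sketch of connecting to the web replica directly from a Toolforge bastion instead of going through the sql wrapper; this assumes the standard per-tool ~/replica.my.cnf credentials file, and the host name follows the pattern mentioned above:

# connect to the web (interactive) replica rather than the analytics one
mysql --defaults-file=$HOME/replica.my.cnf \
      -h fiwiki.web.db.svc.eqiad.wmflabs fiwiki_p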

Anyway, my point was that it was not this slow before. I used to use Toolforge for writing example queries (which I then shared on Quarry) because it was faster to run and edit them there. It was also possible to run queries in Toolforge that were timing out in Quarry.

Another thing is that I am running non-interactive cron jobs on Toolforge, and I have needed to rewrite queries because they have become too slow. Part of the reason has been the actor change (T215445), but queries against tables with no links to the comment or actor tables, such as select count(distinct(page_id)) as c from page, should not be directly affected by that.
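
For illustration, a hedged sketch of the kind of rewrite the actor migration forces; table and column names follow the current MediaWiki schema and the user name is purely hypothetical:

-- Before the actor migration, a per-user edit count could read the user
-- name straight from the revision table:
--   SELECT count(*) FROM revision WHERE rev_user_text = 'ExampleUser';
-- After the migration, the same question needs an extra join through actor:
SELECT count(*)
FROM revision
JOIN actor ON rev_actor = actor_id
WHERE actor_name = 'ExampleUser';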

bd808 subscribed.

I'm closing this as invalid not because I think @Zache is making anything up here, but because as usage of the finite shared Wiki Replica resources increases, things get slower for everyone. Queries that do not directly touch the actor tables still run on the same servers as other queries that now need slower joins against more tables. The WMCS and DBA teams are aware of the increased load on the 3 Wiki Replica servers and are proposing new hardware purchases in the FY2020/2021 budget.

So, if I understand correctly, the problem is that the views currently used in Tool Labs are fucked by design, and because of that everything is unusably slow?

Kind of, yes. The data normalization that was done to serve MediaWiki's runtime OLTP needs makes the typical OLAP workloads of a Tool do a lot more work. This in turn consumes more of the finite CPU/RAM/IOPS resources of the servers doing that work. Also, in the ~4 years since the current Wiki Replicas were designed and deployed, a lot more folks have found reasons to dig around in the Wiki Replica data. Expecting the performance of each query to remain constant when more queries are running per unit time on the same hardware is not realistic.
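
To make the OLTP/OLAP contrast concrete, a small sketch: the first query is the kind of indexed point lookup MediaWiki itself issues (with a hypothetical page title), while the second is the aggregate query from the task description, which scans and joins millions of rows on the shared servers.

-- OLTP-style lookup: uses the (page_namespace, page_title) index and returns one row.
SELECT page_id, page_latest
FROM page
WHERE page_namespace = 0
  AND page_title = 'Example';

-- OLAP-style aggregate: walks pagelinks and page for the whole wiki and
-- competes for the finite CPU/RAM/IOPS of the replica servers.
SELECT count(DISTINCT page_id)
FROM pagelinks
INNER JOIN page ON pl_from = page_id
WHERE page_namespace = 0
  AND page_is_redirect = 0;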