See e.g. https://guc.toolforge.org/?by=date&user=Count+Count
Maybe this is related to the replica migration?
Subject | Repo | Branch | Lines +/- |
---|---|---|---|
Query `meta_p` data explicitly from meta.labsdb instead of from s1 | labs/tools/guc | master | +1 -1 |
Most recent entry in the tool's error.log:
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP Warning: PDOStatement::execute(): Error reading result set's header in /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/Main.php on line 283
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP Stack trace:
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP 1. {main}() /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/index.php:0
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP 2. Guc\Main->__construct() /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/index.php:41
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP 3. Guc\Main->reduceWikis() /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/Main.php:70
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP 4. Guc\Main->doBigUnionReduce() /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/Main.php:223
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP 5. PDOStatement->execute() /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/Main.php:283
It does appear that reduceWikis (and the rest of the database code) is not designed to work with the Wiki Replicas 2020 redesign: https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign
Adding @Krinkle as a GUC maintainer. Hopefully the fix is relatively simple. I remember GUC being updated to work by slice quite a while ago.
@bd808 From scanning the news, I took away:
With that, I don't understand what I need to change. For performance, GUC needs to reuse connections to be responsive. This is the same as what we do in prod: the news page reads as if prod opens 900 connections, but it doesn't. Prod opens 8-9 connections and switches or prefixes queries as needed; GUC likewise opens 8-9 and switches as needed.
I haven't debugged this yet; I'll try to do that this weekend.
This all sounds correct.
With that, I don't understand what I need to change.
I'm not sure either.
For performance, GUC needs to reuse connections to be responsive. This is the same as what we do in prod: the news page reads as if prod opens 900 connections, but it doesn't. Prod opens 8-9 connections and switches or prefixes queries as needed; GUC likewise opens 8-9 and switches as needed.
This should still work as hoped. The new replicas run a separate MariaDB instance for each slice/section. The per-wiki service names like enwiki.web.db.svc.wikimedia.cloud are actually CNAME records pointing to the s{N}.web.db.svc.wikimedia.cloud slice/section name. The IP returned for each s{N} service name lookup points to an HAProxy service, which forwards the traffic to a pooled backend server on the specific port where that slice/section's MariaDB instance is running.
$ host enwiki.web.db.svc.wikimedia.cloud
enwiki.web.db.svc.wikimedia.cloud is an alias for s1.web.db.svc.wikimedia.cloud.
s1.web.db.svc.wikimedia.cloud has address 172.16.2.36
$ host enwiki.web.db.svc.eqiad.wmflabs
enwiki.web.db.svc.eqiad.wmflabs is an alias for s1.web.db.svc.wikimedia.cloud.
s1.web.db.svc.wikimedia.cloud has address 172.16.2.36
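To illustrate why connection reuse should keep working, here is a minimal Python sketch (not GUC's actual PHP code) that groups wikis by the slice their CNAME points to, so that a tool needs only one connection per slice. The function names are hypothetical, and the alias table is hard-coded sample data: only the enwiki to s1 entry is taken from the `host` output above, while `wiki_a`, `wiki_b` and their slices are made up.

```python
from collections import defaultdict
from typing import Dict, List


def section_of(canonical_host: str) -> str:
    """Return the slice label (e.g. 's1') from a canonical service name."""
    return canonical_host.rstrip(".").split(".", 1)[0]


def group_by_section(aliases: Dict[str, str]) -> Dict[str, List[str]]:
    """Group wiki database names by the slice their CNAME points to."""
    groups = defaultdict(list)
    for wiki_host, canonical_host in aliases.items():
        dbname = wiki_host.split(".", 1)[0]
        groups[section_of(canonical_host)].append(dbname)
    return {section: sorted(dbs) for section, dbs in groups.items()}


# Sample alias table. Only the enwiki entry comes from the thread;
# wiki_a and wiki_b (and their slices) are hypothetical placeholders.
SAMPLE_ALIASES = {
    "enwiki.web.db.svc.wikimedia.cloud": "s1.web.db.svc.wikimedia.cloud.",
    "wiki_a.web.db.svc.wikimedia.cloud": "s3.web.db.svc.wikimedia.cloud.",
    "wiki_b.web.db.svc.wikimedia.cloud": "s3.web.db.svc.wikimedia.cloud.",
}

if __name__ == "__main__":
    for section, dbs in sorted(group_by_section(SAMPLE_ALIASES).items()):
        print(section, dbs)
```

In a real tool the alias table would presumably come from DNS lookups or from `meta_p` rather than a literal dictionary; the grouping step stays the same either way.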
Cool. I was worried for a minute I'd have to split up all the union queries, even within the same section (not technically joins, but essentially the same issue).
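For readers unfamiliar with the pattern, a per-section union of this kind can be sketched as below. This is an illustration in Python rather than GUC's own code; the `contribs` table and the `ts` and `username` columns are hypothetical stand-ins, and only the `_p` database suffix reflects the naming seen elsewhere in this thread (`meta_p`).

```python
import re
from typing import List

# Database names cannot be bound as query parameters, so validate them.
_DBNAME_RE = re.compile(r"[a-z0-9_]+")


def build_union_query(dbnames: List[str], limit: int = 20) -> str:
    """Build one UNION ALL query across databases on the same section.

    Each sub-select prefixes the table with its database name, so a single
    connection to the section can serve all listed wikis.
    """
    if not dbnames:
        raise ValueError("at least one database name is required")
    parts = []
    for db in dbnames:
        if not _DBNAME_RE.fullmatch(db):
            raise ValueError("unexpected database name: %r" % db)
        # `contribs`, `ts` and `username` are hypothetical names.
        parts.append(
            "(SELECT '%s' AS wiki, ts FROM `%s_p`.`contribs` "
            "WHERE username = :user ORDER BY ts DESC LIMIT %d)"
            % (db, db, int(limit))
        )
    return "\nUNION ALL\n".join(parts)


if __name__ == "__main__":
    print(build_union_query(["wiki_a", "wiki_b"], limit=5))
```

The `:user` placeholder is left for the database driver to bind, so only the validated database names are interpolated into the SQL text.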
With that, I don't understand what I need to change.
I'm not sure either.
Perfect, no problem. I consider this a good thing, insofar as no obvious big change stood out as being required. It's probably something small then. I'll get to the bottom of it this weekend.
As a note, Xtools has a functional global contribs lookup that mostly does the same things (not IP ranges?).
Change 692417 had a related patch set uploaded (by Krinkle; author: Krinkle):
[labs/tools/guc@master] Query `meta_p` data explicitly from meta.labsdb instead of from s1
Change 692417 merged by jenkins-bot:
[labs/tools/guc@master] Query `meta_p` data explicitly from meta.labsdb instead of from s1