
Global user contributions (GUC) shows no entries
Closed, Resolved (Public)

Description

See e.g. https://guc.toolforge.org/?by=date&user=Count+Count

Maybe this is related to the replica migration?

Event Timeline

Most recent entry in the tool's error.log:

2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP Warning:  PDOStatement::execute(): Error reading result set's header in /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/Main.php on line 283
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP Stack trace:
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP   1. {main}() /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/index.php:0
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP   2. Guc\Main->__construct() /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/index.php:41
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP   3. Guc\Main->reduceWikis() /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/Main.php:70
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP   4. Guc\Main->doBigUnionReduce() /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/Main.php:223
2021-05-10 16:50:44: (mod_fastcgi.c.2543) FastCGI-stderr: PHP   5. PDOStatement->execute() /mnt/nfs/labstore-secondary-tools-project/guc/labs-tools-guc/src/Main.php:283

It does appear that reduceWikis (and the rest of the database code) is not designed to work with the Wiki Replicas 2020 Redesign: https://wikitech.wikimedia.org/wiki/News/Wiki_Replicas_2020_Redesign

Adding @Krinkle as a GUC maintainer. Hopefully the fix is relatively simple. I remember GUC being updated to work by slice quite a while ago.

Majavah added a subscriber: Majavah.

(not an issue with Toolforge infrastructure itself)

@bd808 From scanning the news, I took away:

  • Old dns names now alias the new ones.
  • Old hardware is gone.
  • New hardware groups by production shard instead of all together.
  • sNUM dns still exist in new setup.

With that, I don't understand what I need to change. For performance, GUC needs to reuse connections to be responsive, same as we do in prod. (My reading of the news page is that it suggests we open 900 conns in prod, but we don't: prod opens 8-9 and switches or prefixes queries as needed; GUC opens 8-9 and switches as needed.)

I haven't debugged this yet; I'll try to do that this weekend.

@bd808 From scanning the news, I took away:

  • Old dns names now alias the new ones.
  • Old hardware is gone.
  • New hardware groups by production shard instead of all together.
  • sNUM dns still exist in new setup.

This all sounds correct.

With that, I don't understand what I need to change.

I'm not sure either.

For performance, GUC needs to reuse connections to be responsive, same as we do in prod. (My reading of the news page is that it suggests we open 900 conns in prod, but we don't: prod opens 8-9 and switches or prefixes queries as needed; GUC opens 8-9 and switches as needed.)

This should still work as hoped. The new replicas run a separate MariaDB instance for each slice/section. The per-wiki service names like enwiki.web.db.svc.wikimedia.cloud are actually CNAME records pointing to the s{N}.web.db.svc.wikimedia.cloud slice/section name. The IP returned for each s{N} service name lookup points to an HAProxy service, which forwards the traffic to a pooled backend server on the specific port where that slice/section's MariaDB instance runs.

$ host enwiki.web.db.svc.wikimedia.cloud
enwiki.web.db.svc.wikimedia.cloud is an alias for s1.web.db.svc.wikimedia.cloud.
s1.web.db.svc.wikimedia.cloud has address 172.16.2.36
$ host enwiki.web.db.svc.eqiad.wmflabs
enwiki.web.db.svc.eqiad.wmflabs is an alias for s1.web.db.svc.wikimedia.cloud.
s1.web.db.svc.wikimedia.cloud has address 172.16.2.36
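To illustrate why connection reuse still works, here is a minimal Python sketch (not GUC's actual code, which is PHP) that reads `host` output like the above and groups per-wiki names by the slice they alias. The function names `parse_aliases` and `group_by_slice` are hypothetical, made up for this example.

```python
import re
from collections import defaultdict

# Matches lines such as:
# "enwiki.web.db.svc.wikimedia.cloud is an alias for s1.web.db.svc.wikimedia.cloud."
ALIAS_RE = re.compile(r"^(\S+) is an alias for (\S+?)\.?$")


def parse_aliases(host_output):
    """Map each aliased name found in `host` output to its CNAME target."""
    aliases = {}
    for line in host_output.splitlines():
        match = ALIAS_RE.match(line.strip())
        if match:
            aliases[match.group(1)] = match.group(2)
    return aliases


def group_by_slice(aliases):
    """Group per-wiki names by CNAME target (hypothetical helper).

    Every name in one group resolves to the same slice, so a client
    could share a single connection for all of them.
    """
    groups = defaultdict(list)
    for name, target in aliases.items():
        groups[target].append(name)
    return dict(groups)
```

Fed the two lookups above, both the new and the legacy enwiki names land in the same s1 group, which is the property that lets a tool keep one connection per slice rather than one per wiki.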

@bd808 From scanning the news, I took away:

  • Old dns names now alias the new ones.
  • Old hardware is gone.
  • New hardware groups by production shard instead of all together.
  • sNUM dns still exist in new setup.

This all sounds correct.

For performance, GUC needs […]

This should still work as hoped. The new replicas run a separate MariaDB instance for each slice/section. The per-wiki service names like enwiki.web.db.svc.wikimedia.cloud are actually CNAME records […]

Cool. I was worried for a minute I'd have to split up all the union queries, even within the same section (not technically joins, but essentially the same issue).
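For readers unfamiliar with the union approach, the following is a rough Python sketch of a union query across wikis on the same section. It is not the query GUC runs: the function name `build_union_query` is hypothetical, the table and column names are illustrative assumptions, and the `%(user)s` placeholder assumes a driver with pyformat-style parameters.

```python
def build_union_query(dbnames):
    """Build one UNION ALL statement over several wikis on the same slice.

    Each subquery uses a database-qualified table name, so the whole
    statement can run over a single shared connection to that slice.
    """
    if not dbnames:
        raise ValueError("need at least one database name")
    parts = []
    for dbname in dbnames:
        # Database names cannot be bound as query parameters, so reject
        # anything that is not a plain identifier before interpolating.
        if not dbname.isidentifier():
            raise ValueError(f"unexpected database name: {dbname!r}")
        parts.append(
            # Table and column names below are assumptions for illustration.
            f"SELECT '{dbname}' AS dbname, COUNT(*) AS counter "
            f"FROM {dbname}_p.revision_userindex "
            f"JOIN {dbname}_p.actor_revision ON actor_id = rev_actor "
            f"WHERE actor_name = %(user)s"
        )
    return "\nUNION ALL\n".join(parts)
```

Such a statement only works when every listed database lives on the MariaDB instance being queried, which is why the per-section grouping matters.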

With that, I don't understand what I need to change.

I'm not sure either.

Perfect, no problem. I consider this a good thing, insofar as there isn't an obvious big thing that stood out as having to change. It's probably something small then. I'll get to the bottom of it this weekend.

Krinkle triaged this task as High priority.
Krinkle moved this task from Inbox to Confirmed Problem on the Tool-Global-user-contributions board.

As a note, XTools has a functional global contribs lookup that mostly does the same things (though perhaps not IP ranges?).

In case it helps, MusikAnimal posted this in a different task; it may be useful for GUC:

For PHP/Symfony users, the ToolforgeBundle has been updated to include a replicas connection manager. This goes by the dblists at noc.wikimedia.org to ensure your app has no more open connections than it needs to. It also has a simple command (php bin/console toolforge:ssh) to open an SSH tunnel for easier development on local environments. See docs at https://github.com/wikimedia/ToolforgeBundle#replicas-connection-manager

Change 692417 had a related patch set uploaded (by Krinkle; author: Krinkle):

[labs/tools/guc@master] Query `meta_p` data explicitly from meta.labsdb instead of from s1

https://gerrit.wikimedia.org/r/692417

Change 692417 merged by jenkins-bot:

[labs/tools/guc@master] Query `meta_p` data explicitly from meta.labsdb instead of from s1

https://gerrit.wikimedia.org/r/692417
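
Going only by the patch title, the idea can be sketched in Python as follows. This is not the merged change (GUC is PHP); the function `host_for` and the `slice_of` mapping are hypothetical, and the slice host pattern is taken from the lookups earlier in this task.

```python
META_HOST = "meta.labsdb"  # host named in the patch title


def host_for(dbname, slice_of):
    """Pick the database host to query for a given database (hypothetical).

    `slice_of` is an assumed mapping such as {"enwiki": "s1"}.
    meta_p is routed to its own host instead of being read via s1.
    """
    if dbname == "meta_p":
        return META_HOST
    return f"{slice_of[dbname]}.web.db.svc.wikimedia.cloud"
```

With this routing, a lookup for meta_p never depends on which slice the caller happens to be connected to.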