Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
Closed, ResolvedPublic

Description

Having long-running connections to all hosts is a huge issue for availability, and prevents me from depooling servers for regular maintenance. The wikiadmin user is the most problematic in particular, as it is not subject to resource checks (maximum query duration, connection limits, etc.).

These threads run /* FetchText::doGetText */ and then sit idle for seconds at a time, with all the issues that can create.

If the dump hosts are too slow, please say so and we can look at options (especially now that we have new servers), but creating random connections to any host is a problem. Let's see why this is happening and propose a proper solution:

If these create only light-weight queries, let's disconnect and reconnect after some number of seconds. If the dump hosts are too slow, let's give them better resources. If there is a problem with the connection framework, let's fix it with the addition of a proxy/persistent connection manager.
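The first option above (disconnect and reconnect after some number of seconds) can be sketched generically. This is an illustrative Python sketch, not MediaWiki code; all names (`ReconnectingConnection`, the `connect` factory) are hypothetical:

```python
import time

class ReconnectingConnection:
    """Wrap a DB connection factory and transparently reconnect once
    the connection is older than max_age_seconds, so a long-running
    batch script never pins one replica for hours.
    (Hypothetical sketch; these names are not MediaWiki APIs.)"""

    def __init__(self, connect, max_age_seconds=60.0, clock=time.monotonic):
        self._connect = connect          # callable returning a fresh connection
        self._max_age = max_age_seconds
        self._clock = clock
        self._conn = None
        self._opened_at = None

    def get(self):
        """Return a connection, replacing it if it has grown too old."""
        now = self._clock()
        if self._conn is None or now - self._opened_at > self._max_age:
            if self._conn is not None:
                self._conn.close()       # give the server its thread back
            self._conn = self._connect()
            self._opened_at = now
        return self._conn
```

With this pattern a depooled host drains within one reconnect interval instead of holding connections for the lifetime of the script.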

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

@jcrespo if you do not in fact see this on any other shards, it's probably not anywhere in the code I write/run, which is why I ask if you've seen it anywhere other than wikidata.

Also, I'm happy to look at the Wikidata-specific code and help make sure that the right db is used for these jobs.

Beyond that, I would like the ability to tell LB to drop the current connection AND config and re-read. This would be very handy in general.

I depooled db1109 two days ago, and I still cannot put it under maintenance.

I still see it on sections other than s8, see T143870.

If the code doesn't work, special mediawiki configuration should be set up so that dump hosts only know about dump dbs; I can see that working fine.

I would like the ability to tell LB to drop the current connection AND config and re-read. This would be very handy in general.

That functionality was added when we had issues with commons transcoding leaving db connections open, AFAIK

@jcrespo I can only talk about the Wikidata side of things, we are working on this in two ways:

  • We are changing the script invocation so that they don't run for several days anymore (the first part of that is already awaiting review) - T190513
  • We plan to implement T147169#3660704 shortly.

For db1109, I guess our scripts will take up to one more day before finishing and closing the connection (checked the progress on snapshot1007). If this issue is very very pressing on your end, kill the connections, our scripts will recover (by restarting from the beginning, which is awful, but T190513 will also address this).

@Ariel just told me that we should not restart the dumps this week, so that they don't run into the weekend, to give room for planned maintenance.

From another thread, from TimS:

It sounds like the snapshot hosts were in fact needing and using the host. It's not LoadBalancer's fault if some maintenance script calls wfGetDB() without specifying a query group. Throwing an exception is the correct thing to do if the MW configuration is incorrect, since it allows the maintenance script to terminate and be restarted with different configuration.

Given we set $wgDBDefaultGroup* to "dump" for these scripts, LoadBalancer should only ever use the default group in case the dump host(s) are not available. Given this (probably) didn't happen here, we're almost certainly facing some other problem.

* $wgDBDefaultGroup configures the DB group to use by default (if no other group is explicitly given), so wfGetDB() (and others) will connect to hosts in that group rather than the default hosts.
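The fallback behaviour described here (use the configured group's hosts, fall back to the default group when the requested group has none) can be sketched as follows. This is an illustrative Python sketch with hypothetical names, not LoadBalancer's actual implementation:

```python
def pick_replica(group_loads, group, default_group="default"):
    """Pick a replica host for the requested query group, falling back
    to the default group when the requested group has no hosts
    configured. Illustrative sketch only; real load-balancer selection
    is weighted-random, here we just take the highest-weight host."""
    hosts = group_loads.get(group) or group_loads[default_group]
    return max(hosts, key=hosts.get)
```

The key point for this bug: the default-group hosts should only ever be chosen when the requested group is genuinely empty.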

One thing we could possibly do next: add a hook in getConnection (or somewhere close) that lets us kill the connection attempt (or the entire script) in case an unwanted replica is selected. This is not very nice, though… :S
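A minimal sketch of what such a guard hook could look like, in illustrative Python (the hook shape and error type are hypothetical; MediaWiki has no such hook today):

```python
def make_replica_guard(allowed_hosts):
    """Return a hook that aborts a connection attempt when the chosen
    replica is outside the allowed set. Hypothetical sketch of the
    'kill the connection attempt' idea described above."""
    def guard(host):
        if host not in allowed_hosts:
            raise RuntimeError("refusing connection to non-dump replica %s" % host)
    return guard
```

Failing fast like this would at least turn a silent lingering connection into a visible, restartable error.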

Because of this issue, or T143870, and/or long-running connections due to the mw connection handler, there was a connection issue at https://logstash.wikimedia.org/goto/286304e84262d2fe3335acd5eed135bb and there is likely to be another one soon. This may or may not create issues on wikidata dumps/exports, depending on whether retries are done against the latest configuration.

Because of hw maintenance, we cannot wait long to depool the servers: at most a few minutes, until all webrequests finish.

Retries of wikidata entity dumps rerun the particular batch from a MediaWiki maintenance script; it will set up configuration from scratch.

Addshore added a subscriber: Addshore.

Trying to work out whether there are any further concrete actionables here, and who needs to action them.

Handing this over to @Marostegui to comment, as he will be the person who knows whether this continues happening or not.

I depooled db1111 today (https://phabricator.wikimedia.org/P14229), which is not in the vslow,dump group, and I still see two things:

  • The host is being accessed by wikiadmin from snapshot1006:
  • The host keeps being accessed even though it's been depooled for more than 1 hour:
root@cumin1001:/home/marostegui# mysql.py -hdb1111 -e "show processlist" | grep wikiadmin
2017398773	wikiadmin	10.64.32.149:60598	wikidatawiki	Sleep	803		NULL	0.000
2018035580	wikiadmin	10.64.32.149:60960	wikidatawiki	Sleep	2		NULL	0.000
2018117400	wikiadmin	10.64.32.149:32822	wikidatawiki	Sleep	19		NULL	0.000
2018253068	wikiadmin	10.64.32.149:33008	wikidatawiki	Sleep	1		NULL	0.000
2018279747	wikiadmin	10.64.32.149:33028	wikidatawiki	Sleep	403		NULL	0.000
2018352112	wikiadmin	10.64.32.149:33082	wikidatawiki	Sleep	9		NULL	0.000
2018442606	wikiadmin	10.64.32.149:33152	wikidatawiki	Sleep	318		NULL	0.000
2019017411	wikiadmin	10.64.32.149:33300	wikidatawiki	Sleep	767		NULL	0.000
root@cumin1001:/home/marostegui# host 10.64.32.149
149.32.64.10.in-addr.arpa domain name pointer snapshot1006.eqiad.wmnet.
root@snapshot1006:~# host db1111
db1111.eqiad.wmnet has address 10.64.0.128
root@snapshot1006:~# netstat -putan | grep "10.64.0.128"
tcp        0      0 10.64.32.149:60598      10.64.0.128:3306        ESTABLISHED 25933/php7.2
tcp        0      0 10.64.32.149:33300      10.64.0.128:3306        ESTABLISHED 25711/php7.2
tcp        0      0 10.64.32.149:33028      10.64.0.128:3306        ESTABLISHED 25704/php7.2
tcp        0      0 10.64.32.149:60960      10.64.0.128:3306        ESTABLISHED 29537/php7.2
tcp        0      0 10.64.32.149:33152      10.64.0.128:3306        ESTABLISHED 30709/php7.2
tcp        0      0 10.64.32.149:33082      10.64.0.128:3306        ESTABLISHED 30461/php7.2
tcp        0      0 10.64.32.149:33008      10.64.0.128:3306        ESTABLISHED 30226/php7.2
tcp        0      0 10.64.32.149:32822      10.64.0.128:3306        ESTABLISHED 29707/php7.2
root@snapshot1006:~# ps aux | grep 25933
dumpsgen 25933  0.0  0.0 457784 63076 ?        S    06:38   0:03 /usr/bin/php7.2 /srv/mediawiki/php-1.36.0-wmf.27/../multiversion/MWScript.php fetchText.php --wiki wikidatawiki
root@snapshot1006:~# ps aux | grep 30709
dumpsgen 30709  0.0  0.1 461880 66992 ?        S    06:58   0:02 /usr/bin/php7.2 /srv/mediawiki/php-1.36.0-wmf.27/../multiversion/MWScript.php fetchText.php --wiki wikidatawiki

This is because the maintenance scripts that do "small" page ranges take several hours to complete. I will keep this in mind for when we can go to multiple bz2 streams in the page content history dumps; I'll be able to dump much smaller ranges then and concat them together. The other thing I should do is check how often we respawn fetchText; that is something I might be able to change sooner rather than later.
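The "dump much smaller ranges and concat them together" idea works because bzip2 readers accept back-to-back streams; Python's bz2 module demonstrates the property:

```python
import bz2

# Compress each page range as its own bz2 stream...
part1 = bz2.compress(b"<page>p1</page>\n")
part2 = bz2.compress(b"<page>p2</page>\n")

# ...then simply concatenate the streams. bzip2 decompressors process
# back-to-back streams, so the combined file decompresses to the full
# text without any re-compression step.
combined = part1 + part2
```

This is what makes per-range jobs short-lived: each one can open a connection, dump its small range, close, and the outputs still join into one valid .bz2 file.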

From the sounds of things I can leave this ticket on your plate then @ArielGlenn ? :)

Sadly, yes :-P

mysql.php, used for the wikidata entity dumps, apparently does not handle the --group flag correctly. It's unclear to me what it actually does; I need to look into this sometime later. The queries it runs are extremely short, so the impact is minimal, but it still needs to be checked.

This keeps happening, and it is painful when we want to run automated schema changes (T288235). I tried to deploy a change on s8 without manual intervention, but it got stuck on:
db1111
db1114
db1126

All of them had the wikiadmin user connecting from snapshot hosts (snapshot1011), and that prevents the script from continuing, as it waits for all the connections to be drained.
None of those hosts are part of the vslow,dump group. They are part of the main and api traffic groups.
Could this be given some priority?
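The drain-wait behaviour described above (the schema-change automation waiting for all connections to go away before proceeding) can be sketched as a simple poll loop. Illustrative Python only, with a hypothetical `count_connections` callback (e.g. wrapping a SHOW PROCESSLIST query):

```python
import time

def wait_for_drain(count_connections, timeout=3600, poll=10,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll until no client connections remain on a depooled host, or
    give up after `timeout` seconds. count_connections is a
    hypothetical callback returning the current connection count."""
    deadline = clock() + timeout
    while clock() < deadline:
        if count_connections() == 0:
            return True
        sleep(poll)
    return False
```

A long-lived dump connection keeps this loop spinning until its timeout, which is exactly why the automated schema change got stuck.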

Now that I look into it, the wikiadmin user is connected to all the pooled hosts, regardless of their config group. Connections are coming from snapshot1008 and snapshot1011.
This would make it impossible to run an automated schema change on any of those wikidata hosts until those processes have finished.

I am going to run the schema change manually for now, but I would appreciate it if some research could be done on how to overcome this.

Thanks!

The reason only those two snapshot hosts are involved is undoubtedly because dumps on the others have finished for this run.

Following my chat with @ArielGlenn on IRC, I got this query coming from snapshot1011

2690453130    wikiadmin    10.64.0.156:38912    wikidatawiki    Query    0    Statistics    SELECT /* MediaWiki\\Storage\\SqlBlobStore::fetchBlobs dumpsgen@snapsh... */  old_id,old_text,old_flags  FROM `text`    WHERE old_id = 1542930643    0.000

The above is happening from pages-meta-history dumps, and I will look into it later today. The snapshot1008 (wikidata entity) dumps will be harder.

As I feared, fetchText.php calls MediaWikiServices::getInstance()->getBlobStore()->getBlob() which gets a db replica connection on its own, with no opportunity for us to ask that it be in the vslow/dump group. We might be able to use the --dbgroupdefault=dump option to this script; I will have to do some testing to see if that has any effect and what happens when that group is suddenly not available.

Thanks for looking into this issue :-)

As I feared, fetchText.php calls MediaWikiServices::getInstance()->getBlobStore()->getBlob() which gets a db replica connection on its own, with no opportunity for us to ask that it be in the vslow/dump group.

This can be fixed easily enough. We can just add a parameter to getBlob(), and pass it on to lower layers as appropriate. I'd be happy to work on that with you if you like.

The real solution would be to not hit the text table at all any more, but to move the external store reference into content_address. We would cut out the middle table, so to speak. There is no good reason to use the text table when external store is enabled; the only thing holding us back is that the migration takes some effort.

Thanks for this thought, Daniel. I think it's better if I can pass the dbgroupdefault parameter to the maintenance script itself, instead of hacking something into getBlob(). But I do need to check if that's going to work ok. The longer term fix you mentioned, is there a task for that, so I can follow along?

Change 747455 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn):

[mediawiki/core@master] try to use 'dump' group for db connections for dumps of page content

https://gerrit.wikimedia.org/r/747455

Any ETA on when this will be merged? Thanks!

Not yet; I need to talk with someone more knowledgeable than me about whether this approach is reasonable, before moving forward. I'll bring it up at our next meeting (tomorrow).

Can someone explain how the dumpers work? Any link to documentation would be appreciated. I need it to understand this patch and also to find a way forward for T298485.

I don't know of any documentation specifically for the MW maintenance scripts for dumps or the modules used for import/export. There are general Manual pages for importing and exporting (maintained by volunteers, I think), but I don't think they have the level of detail you are looking for. I have plenty of documentation for the python scripts, the formats, the content, and the various servers and how they are set up, but I guess that won't be so helpful here. Should we meet? Should I try to write something? If so, how in depth does it need to be?

More on the operations part of it: how are they being run? Is it a cron/systemd timer? If so, where? What is the maint script or bash script that is being run, etc.?

There is a complicated set of python scripts that coordinate the dump jobs for each wiki during the two monthly runs. https://wikitech.wikimedia.org/wiki/Dumps/Current_Architecture gives an overview. https://www.mediawiki.org/wiki/SQL/XML_Dumps#Becoming_a_dumps_co-maintainer gives rather a lot more. In general for testing you will run the python worker.py script, supplying it with the config file, the job name, the run date and the wiki; we test in deployment-prep, although I am working on a docker container testbed.

Thanks, that was the missing piece. My suggestion would be to set an environment variable when calling the mw scripts (if it's not set already); phpunit does a similar thing. Then the LB code in mw, when trying to get a connection, should check the env variable and override the groups. That would be the cleanest way, if you ask me.

We still have the problem of config reload. How long does each stream job usually take?
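A minimal sketch of the environment-variable override idea, in illustrative Python (the variable name `MW_DB_GROUP_OVERRIDE` is hypothetical, not an existing MediaWiki setting):

```python
import os

def effective_group(requested_group, env=os.environ):
    """If the (hypothetical) MW_DB_GROUP_OVERRIDE variable is set, it
    wins over whatever query group the calling code asked for, the way
    the suggestion above describes LB checking an env variable."""
    return env.get("MW_DB_GROUP_OVERRIDE", requested_group)
```

The notable property, versus a default group, is that the override also beats groups explicitly requested by callers such as wikibase's "from-client"/"from-repo".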

An environment variable would work, but I think I prefer the idea of setting the group per script. Wikidata's DumpEntities does this via Maintenance::finalSetup, and it makes more sense to me semantically: defaulting to a specific server group is specific to the script's task, not to the environment it runs in.

Having a way to tell getBlob to use a different db group, as I suggested earlier, would be nice, but it's rather brittle: we would have to remember to allow such a flag to be passed into all storage layer services. That doesn't seem great.

Note that finalSetup() runs after all settings have been loaded, but before MediaWikiServices becomes available. This allows us to safely modify settings, but it also prevents us from making use of services.
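The per-script approach can be sketched like this: an illustrative Python analogue of the Maintenance::finalSetup pattern described above, with hypothetical names (MediaWiki's actual classes are PHP):

```python
class Maintenance:
    """Minimal stand-in for MediaWiki's Maintenance base class."""
    def final_setup(self, settings):
        # Runs after settings are loaded but before services are wired,
        # matching the timing constraint described above: safe to edit
        # settings, no services available yet.
        pass

class DumpEntities(Maintenance):
    """The script itself, not the environment, declares its default
    DB group, mirroring what Wikidata's DumpEntities does."""
    def final_setup(self, settings):
        settings["DBDefaultGroup"] = "dump"
```

Because the override happens before any service is constructed, every later connection request picks up the dump group by default.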

I think we have different perspectives, and it might be because I'm coming from SRE? I personally think dumps are actually the environment, not the code. Besides the maint. script itself, they basically share the same code as Special:Export, and that is being used from the appservers. And given that dumps have a dedicated group in the dbs (similar to api), it feels more environment-y than code-y. (And we are removing code-y groups such as watchlist from the dbs.) Another reason is that third parties can be using the dump scripts without needing a dedicated dumps group. And last but not least, that global variable doesn't override anything when a method asks for a different group: e.g. currently basically all db queries called directly from wikibase have a dedicated group such as "from-client" or "from-repo". We need something to override that for the sake of reliability.

I would be open to any idea that would be easy to implement though.

The patch at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/747455/ is tested and ready to go, and in line with the way existing dumps scripts work. So I'd like to go ahead with it.

We can talk about the best general mechanism later, taking into account also the desire to be able to reload the LB config when a connection fails, rather than reusing it (T298485).

Change 747455 merged by jenkins-bot:

[mediawiki/core@master] try to use 'dump' group for db connections for dumps of page content

https://gerrit.wikimedia.org/r/747455

I think we have different perspectives, and it might be because I'm coming from SRE? I personally think dumps are actually the environment, not the code.

I see that point, but I don't want the fix for this issue to be blocked on that discussion. Setting the group from an env variable would be a new approach. Also, forcing the group (rather than changing the default) would be new. I suggest opening a separate ticket for that idea.

Besides the maint. script itself, they basically share the same code as Special:Export, and that is being used from the appservers.

It shares code, but the access pattern is vastly different.

Currently basically all db queries called directly from wikibase have a dedicated group such as "from-client" or "from-repo". We need something to override that for the sake of reliability.

Hm... this is starting to sound like we want to consider a set of hints when choosing the replica, rather than specifying a group. Interesting idea.

The above patch was deployed everywhere with the train, so that specific set of queries should no longer be directed to non-vslow/dump db servers. If that's the case, we are now back to the harder issue of what to do when a db server is depooled, and I think that discussion is happening elsewhere.

So I am still seeing a wikiadmin connection from snapshot1011.eqiad.wmnet to an s8 API host (db1172).

I hate to ask but can we capture any queries?

I tried but I wasn't able to capture any :(

Next time I will try a different approach to capturing them, which I think will give us the exact queries.

Thanks. I was pretty careful with my testing for the last fix, making sure that in production the patch redirected to a vslow/dump server. But I may have overlooked something. :-(

Today I was trying to upgrade s8 to bullseye, and I can't depool any host; they all end up with lingering connections from snapshot1011.eqiad.wmnet.

For the whole hour on those hosts, it was just Sleep. Looking at snapshot1011, the dumper is /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpTextPass.php --wiki=wikidatawiki --stub=gzip:/mnt/dumpsdata/xmldatadumps/temp/w/wikidatawiki/wikidatawiki-20220220-stub-articles26.xml-p83298894p84798893.gz --prefetch=bzip2:/mnt/dumpsdata/xmldatadumps/public/wikidatawiki/20220201/wikidatawiki-20220201-pages-articles26.xml-p83298894p84798893.bz2 --report=1000 --spawn=/usr/bin/php7.2 --output=bzip2:/mnt/dumpsdata/xmldatadumps/public/wikidatawiki/20220220/wikidatawiki-20220220-pages-articles26.xml-p83298894p84798893.bz2.inprog --current

Most notably, the top process on the snapshot host is not the dumper, it's bzip2. Does it keep the connection open while compressing?

Isn't it compressing the stream on the fly?

Possibly, but is it also keeping the connection open? Maybe it needs to buffer, close the connection, and then compress, given that compression is CPU-intensive and slow?

WikiExporter writes each chunk of xml to a DumpOutput. In the above case, that would be a DumpBZip2Output, which is a DumpPipeOutput, which uses proc_open to start bzip2 and then writes each chunk to the child process's stdin. The output is far too big to buffer in memory. Writing to disk uncompressed may be an option, but it would require an order of magnitude more disk space (and I suppose we might hit file system limits on file size). The wrapper scripts would also need to be changed significantly.
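The streaming approach described above (start the compressor as a child process and write chunks to its stdin) can be sketched in Python with subprocess, assuming a bzip2 binary on PATH; the function name is illustrative, not MediaWiki's:

```python
import subprocess

def open_bzip2_sink(path):
    """Start a bzip2 child process whose stdout goes to `path`; the
    caller streams uncompressed chunks into proc.stdin, so nothing is
    buffered in memory (mirrors DumpPipeOutput's proc_open approach).
    Sketch only; assumes a bzip2 binary on PATH."""
    out = open(path, "wb")
    proc = subprocess.Popen(["bzip2", "-c"], stdin=subprocess.PIPE, stdout=out)
    proc.output_file = out  # keep a handle so the caller can close it
    return proc
```

Because the exporter feeds the compressor chunk by chunk, the DB connection stays open for the duration of the export, which is why the slow compression shows up as long Sleep times on the replica.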

At least it can avoid connecting to non-dump hosts.

I am aware of and following this discussion but right now, my responsiveness on this task will be slow, most of my time needs to go to getting my teammate who will be dumps co-maintainer up to speed. Please bear with us.

At least it can avoid connecting to non-dump hosts.

I thought we did that... maybe @ArielGlenn has an idea.

[Edit: when they have time]

As I feared, fetchText.php calls MediaWikiServices::getInstance()->getBlobStore()->getBlob() which gets a db replica connection on its own, with no opportunity for us to ask that it be in the vslow/dump group. We might be able to use the --dbgroupdefault=dump option to this script; I will have to do some testing to see if that has any effect and what happens when that group is suddenly not available.

I just tested it: it won't break, it simply ignores it.

Change 767477 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[operations/dumps@master] Add --dbgroupdefault=dump to every major dump run

https://gerrit.wikimedia.org/r/767477

I'm surprised --dbgroupdefault is not set in any dump run; it's a built-in mechanism in mediawiki to provide support for exactly cases like this.

Change 767477 merged by jenkins-bot:

[operations/dumps@master] Add --dbgroupdefault=dump to every major dump run

https://gerrit.wikimedia.org/r/767477

It's a bit hard to measure but it's probably fixed.

That would be wonderful if true. Let's leave this open for a while yet just in case...

Ladsgroup claimed this task.
Ladsgroup added a project: DBA.

One month has passed. We haven't had major issues like we used to (dumps running making it impossible to depool any host in s8), so I am calling this done. We can reopen or create new tickets if it happens again on a smaller scale or with smaller dumps.