
DispatchChanges: Avoid long-lasting connections to the master DB
Closed, Resolved (Public)

Description

Status Quo:

  • We have several (about 5 to 10) instances of dispatchChanges.php running as cron jobs concurrently.
  • dispatchChanges.php picks the most lagged client by looking at wb_changes_dispatch, grabs a lock representing that client, and pushes updates to it. On success, wb_changes_dispatch is updated and the lock is released.
  • Locking is done using the Database::lock method, which translates to MySQL's GET_LOCK. The lock is bound to the DB connection (see the sketch below this list).
  • Each run of dispatchChanges.php may make several dispatch passes, picking and locking a client wiki each time.
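
For readers not familiar with the script, one dispatch pass looks roughly like this. This is a simplified sketch, not the actual dispatchChanges.php code: selectMostLaggedClient(), dispatchChangesTo() and updateDispatchState() are hypothetical helpers, the lock name is made up, and $dbw stands for the Database handle on the repo master.

```php
// Simplified sketch of one dispatch pass (hypothetical helpers, not real code).
$clientWiki = selectMostLaggedClient( $dbw ); // reads wb_changes_dispatch

// Database::lock() issues GET_LOCK() on MySQL. The lock only exists as long as
// this connection does, so the connection must stay open for the whole pass.
if ( $dbw->lock( "wikibase.dispatch.$clientWiki", __METHOD__, 10 ) ) {
	try {
		dispatchChangesTo( $clientWiki );
		updateDispatchState( $dbw, $clientWiki ); // update wb_changes_dispatch on success
	} finally {
		$dbw->unlock( "wikibase.dispatch.$clientWiki", __METHOD__ ); // RELEASE_LOCK()
	}
}
```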

The Problem:

  • Each of the (roughly 10) dispatchChanges.php instances holds a connection to the master DB for several minutes. (How much of a problem is this, really?)
  • We cannot close the connection and re-connect, since we would lose the lock.

Ideas:

  • Rely on information written to wb_changes_dispatch. But how do we prevent starvation due to stale locks left by a failed dispatch pass or a dead process? High timeout values cause starvation, low timeout values cause false negatives (see the sketch after this list).
  • Use some other locking mechanism. We have the LockManager base class with several implementations, including Memc, Redis, file system, and MySQL.
  • Put wb_changes_dispatch on a separate DB system. But running a separate server for a single DB table with 1000 rows and 10 concurrent connections seems overkill.
NOTE: Switching from one locking mechanism to another requires careful coordination. Execution of dispatchChanges.php must be halted and then re-started with the new configuration, without causing more than a few minutes of lag for client wikis.
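
To make the timeout trade-off from the first idea concrete, a purely row-based lock would look roughly like the sketch below. This is hypothetical code, not something that exists today; the column names follow wb_changes_dispatch, while $dbw, $clientWiki and $ownerToken are assumed to come from the surrounding script.

```php
// Hypothetical row-based locking via wb_changes_dispatch (illustration only).
$timeout = 900; // seconds. Too high: a stale lock starves the client for that long.
                // Too low: a slow but healthy pass is falsely declared dead.

$dbw->startAtomic( __METHOD__ );
$row = $dbw->selectRow(
	'wb_changes_dispatch',
	[ 'chd_lock', 'chd_touched' ],
	[ 'chd_site' => $clientWiki ],
	__METHOD__,
	[ 'FOR UPDATE' ]
);
$age    = time() - (int)wfTimestamp( TS_UNIX, $row->chd_touched );
$isFree = $row->chd_lock === null || $age > $timeout;
if ( $isFree ) {
	$dbw->update(
		'wb_changes_dispatch',
		[ 'chd_lock' => $ownerToken, 'chd_touched' => $dbw->timestamp() ],
		[ 'chd_site' => $clientWiki ],
		__METHOD__
	);
}
$dbw->endAtomic( __METHOD__ );
```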


Event Timeline

Holding connections on the master: if there are 5-10 jobs running, it shouldn't be a big deal, as I assume only 10 connections (max) will be just connected (but not doing anything, right?).
It can be a problem if the master is under heavy stress; however, if the lock isn't doing anything but keeping the connection open, it isn't a massive deal.

@Manuel, @daniel Actually it is a problem, because masters have a limit of #CPUs or 32 active threads on the connection pool, which means half of those connections are reserved but doing nothing, so you are limiting the master's throughput to 50% of its real capacity. Also, keeping connections open for 15 minutes means that in case of a master problem, those connections do not pick up new configuration (like a master failover) or a new MediaWiki version, which limits high availability: if there is a problem with the master, it cannot be easily failed over. All connections should be short-lived, or at least easy to kill.

The wiki master is a SPOF, and it should not be used for coordination. For example, as @aaron will be able to tell you, the real master will not be easily accessible from a remote datacenter, which means this functionality, as it is now, limits cross-DC scalability.

There is absolutely no reason to do this on a master, when it could easily be done on a separate server that doesn't interact with other functionality.

I would favor doing the locking using an alternative mechanism for which we already have infrastructure. Setting up a separate Maria server just to serve 10 mostly idle connections seems overkill.

However, I'm not sure the available locking mechanisms based on Memc and Redis have sufficient protection against stale (orphan) locks. Perhaps @aaron can shed some light on this.

Using memcached is probably not feasible, since locks may be dropped at any time. If Redis turns out not to be an option, we may consider using a more powerful distributed locking service. Running another service just for this seems to be a bit of overkill, but if the alternative is to set up a separate Maria host for a single 1000-row table, it may still be the better option.
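
For illustration, a TTL-based Redis lock at least bounds how long a stale lock can linger. This is a minimal sketch using the phpredis client directly, not MediaWiki's LockManager; the host, key name, token and TTL are made up.

```php
// Minimal TTL-based Redis lock sketch (phpredis client, not MediaWiki's LockManager).
$redis = new Redis();
$redis->connect( 'redis.example.internal', 6379 ); // made-up host

$key   = "wikibase:dispatch:lock:$clientWiki";
$token = uniqid( '', true ); // identifies this dispatcher instance

// NX: only set if not already held; EX: expire after 300 s. If a dispatcher dies,
// the lock disappears after the TTL instead of lingering forever, but the TTL must
// be longer than the longest legitimate dispatch pass, or we get false negatives.
if ( $redis->set( $key, $token, [ 'nx', 'ex' => 300 ] ) ) {
	try {
		dispatchChangesTo( $clientWiki ); // hypothetical helper
	} finally {
		// Release only if we still own the lock. Note: this check-then-delete is not
		// atomic; a real implementation would use a small Lua script for it.
		if ( $redis->get( $key ) === $token ) {
			$redis->del( $key );
		}
	}
}
```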

Two options that come to mind:

@Marostegui, @daniel Actually it is a problem, because masters have a limit of #CPUs or 32 active threads on the connection pool, which means half of those connections are reserved but doing nothing, so you are limiting the master's throughput to 50% of its real capacity.

I don't quite understand - why would an idle connection be hogging CPU threads? These connections are not active for 15 minutes, just open... Sure, they count towards the connection limit, but that's higher than just 32, no?

This comment was removed by daniel.

max_connections is 5000; the maximum of active threads enforced on the connection pool is 32. No idle connections should be kept open, and a typical connection should take less than 1 second, otherwise it risks getting killed by the watchdog looking for idle connections (and we are not going to make an exception for Wikidata). Extra connections make other connections take longer to connect, which increases the timeout error rate for regular connections.

Even if your connections don't count towards the limit, they hold an open transaction (the lock), which creates issues with metadata locking on things like ALTER TABLE and causes purge issues.

How often would locks be dropped? Using ScopedLock would handle exceptions in non-lock code. The shutdown handler usually catches SIGINT. I guess there are still fatal errors, though I'd hope that sort of thing would be rare. In that case, the redis lock manager used by our FileBackend instances could be reused.

There is also a flip side to automatically dropping the lock on connection loss: the loss can happen (possibly due to the wait_timeout or net_*_timeout options) while the connections to the DBs actually being updated stay alive. In that case, multiple threads could run against the same client wiki. Not sure if that actually happens, though; there would be reconnection events in Logstash if it did.
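
If the FileBackend locking infrastructure were reused, the call site could look roughly like the sketch below, assuming MediaWiki's LockManagerGroup/ScopedLock API; the lock manager name and the lock "path" are made up.

```php
// Rough sketch of reusing the FileBackend locking infrastructure. The lock manager
// name 'redisLockManager' is whatever $wgLockManagers happens to define.
$lockManager = LockManagerGroup::singleton()->get( 'redisLockManager' );

$status = Status::newGood();
$lock = ScopedLock::factory(
	$lockManager,
	[ "wikibase-dispatch:$clientWiki" ], // one "path" per client wiki
	LockManager::LOCK_EX,
	$status
);

if ( $lock === null ) {
	return; // held by another dispatcher, or the lock backend is unavailable; skip
}

dispatchChangesTo( $clientWiki ); // hypothetical helper

// The lock is released when $lock goes out of scope, including when an exception
// is thrown, which makes stale locks much less likely than with GET_LOCK.
ScopedLock::release( $lock );
```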

How often would locks be dropped? Using ScopedLock would handle exceptions in non-lock code. The shutdown handler usually catches SIGINT. I guess there are still fatal errors, though I'd hope that sort of thing would be rare. In that case, the redis lock manager used by our FileBackend instances could be reused.

Reusing the code from FileBackend sounds like it's worth looking into, thanks @aaron!

Another example of why long-running connections are a problem: I am depooling es1017 for important maintenance. I have depooled it, so I expect connections to finish within a few seconds, with the exception of wikiadmin's known long-running queries, but I just see 2 sleeping connections:

| 2188112525 | wikiuser        | 10.64.32.39:47222 | wikidatawiki       | Sleep   |     1138 |                                                                             | NULL             |    0.000 |
| 2188171306 | wikiuser        | 10.64.16.64:43004 | wikidatawiki       | Sleep   |      584 |                                                                             | NULL             |    0.000 |

Should I kill them? Should I not be able to depool the server, even if I was in an emergency? I do not mind long running connections, if I know I can kill them at any time.

I also do not want to make you work more than necessary. If you only need 1000 rows and there is no private data, I can give you access to a misc server shared with other resources; no need to have a dedicated server.

@jcrespo a misc server would be fine, no private data there.

We'll need to add a config variable to allow wb_changes_dispatch to live on a separate server, though. I should file a ticket for that later.
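
A rough idea of what such a setting could look like; the name 'dispatchingExternalCluster' and the cluster 'extension1' are placeholders, not existing configuration.

```php
// Hypothetical config knob: let wb_changes_dispatch live on a separate (misc) cluster.
$wgWBRepoSettings['dispatchingExternalCluster'] = 'extension1'; // placeholder name

// The dispatch coordinator would then take its connection from that external
// cluster instead of the wiki's own master:
$lbFactory = wfGetLBFactory();
$lb  = $lbFactory->getExternalLB( $wgWBRepoSettings['dispatchingExternalCluster'] );
$dbw = $lb->getConnection( DB_MASTER, [], 'wikidatawiki' );
```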

Anyway, @Ladsgroup will be exploring Redis-based locks; see T151993: Implement ChangeDispatchCoordinator based on RedisLockManager. With that, we'd no longer need the MySQL locks at all.

Another example of why long-running connections are a problem: I am depooling es1017 for important maintenance. I have depooled it, so I expect connections to finish within a few seconds, with the exception of wikiadmin's known long-running queries, but I just see 2 sleeping connections:

| 2188112525 | wikiuser        | 10.64.32.39:47222 | wikidatawiki       | Sleep   |     1138 |                                                                             | NULL             |    0.000 |
| 2188171306 | wikiuser        | 10.64.16.64:43004 | wikidatawiki       | Sleep   |      584 |                                                                             | NULL             |    0.000 |

Should I kill them? Should I not be able to depool the server, even if I was in an emergency? I do not mind long running connections, if I know I can kill them at any time.

Hm, these are both job runners, jobs (probably) shouldn't run for so long. I wonder what's causing this.

Hm, these are both job runners, jobs (probably) shouldn't run for so long. I wonder what's causing this.

Separate issue then, but consider this a heads-up.

daniel moved this task from Revisit to Project on the User-Daniel board.

Okay, it's up and running. What's next?

For the historical record: this chart shows the number of errors from the master going read-only due to lagged replicas:

[chart: errors of the master going read-only, over time; image.png, 91 KB]

Guess when Redis dispatching was deployed.
On ordinary days we had around 1,700 errors of the master going read-only. Now it's zero.

So do you think this had something to do with reports like T123867 and T164191? This is highly surprising: I was expecting low to no impact on master or replication performance, but zero is highly suspicious. Was this expected? Couldn't this be related to a monitoring bug due to T161908#3245451 instead?

Yes. Strangely, I removed all selectors in Logstash, added them back one by one, and made some changes.
Then it worked properly. This is the correct result:

[chart: corrected counts of read-only errors after fixing the Logstash selectors; image.png, 143 KB]

Since the deployment we still get read-only errors, but they have dropped to about 80% of the original volume, comparing the last 24 hours with the same period last week. That's not super bad. I will continue monitoring this over a full week to see whether it's having a real impact or not.

With the current state, we still have the same number of connections to the master DBs, but we don't use GET_LOCK etc. on them anymore. The queries are also unchanged (apart from the GET_LOCK etc.).

Because of this, it would surprise me if this change alone improved the situation (it was mostly a prerequisite for that).

With the current state, we still have the same number of connections to the master DBs, but we don't use GET_LOCK etc. on them anymore.

And that alone is a huge win for me. Sadly, we do not yet have metrics on connection time performance, where I would expect the biggest win.

With the deployment of the dispatching changes, in the last 24 hours we had around 1,800 cases of going read-only, while on the exact same day last week it was 3,482. This is completely tangible.

1978Gage2001 moved this task from Triage to In progress on the DBA board.

I don't see huge issues on the current master. I would mark this as resolved and at some point think about the epic parent (T108944) to improve the infrastructure.

That's another topic, I think we can call this resolved.

Marostegui assigned this task to Ladsgroup.