Wikibase\Repo\Store\WikiPageEntityStore::updateWatchlist: Automatic transaction with writes in progress (from DatabaseBase::query (LinkCache::addLinkObj)), performing implicit commit!
Closed, ResolvedPublicPRODUCTION ERROR
Actions

Assigned To

Authored By

	greg
	Jul 20 2016, 10:05 PM

Description

This is new in wmf.11: https://logstash.wikimedia.org/goto/17a8c23e6242da9e8b783b7a434e197b (last 2 days, relative, for the error)

Setting to UBN! not because I know it's causing problems for users (I don't), but because it is new logspam that should be dealt with.

Details

	Subject	Repo	Branch	Lines +/-
	Avoid use of deadlockLoop() in EntityPerPageTable	mediawiki/extensions/Wikibase	master	+29 -30
	Remove deadlockLoop() from places with few errors	mediawiki/extensions/Wikibase	master	+4 -6

Customize query in gerrit

Related Objects

Mentioned In: T140968: Wikibase refreshLinks hook breaks implicit transactions
Mentioned Here: T140968: Wikibase refreshLinks hook breaks implicit transactions
rMWd484555db6b7: Make onTransactionIdle() safer for multi-DB commits
T119736: Could not find local user data for {Username}@{wiki}

Event Timeline

greg created this task.Jul 20 2016, 10:05 PM

Restricted Application added subscribers: Luke081515, TerraCodes, Aklapper. · View Herald TranscriptJul 20 2016, 10:05 PM

Similar errors also showing up frequently now:

DatabaseBase::deadlockLoop: Automatic transaction with writes in progress (from DatabaseBase::query (LinksUpdate::acquirePageLock)), performing implicit commit!

and

DatabaseBase::deadlockLoop: Automatic transaction with writes in progress (from DatabaseBase::query (Wikibase\Lib\Store\SiteLinkTable::deleteLinksOfItem)), performing implicit commit!

and

DatabaseBase::deadlockLoop: Automatic transaction with writes in progress (from DatabaseBase::query (LinkCache::addLinkObj)), performing implicit commit!

The four of these messages account for 633 (the one in the title), 283 (the first in this comment), 194 (the middle one), and 84 (the last one) log messages in the past 4 hours, which, according to logstash, are the top 3 and #6 log message culprits.

jcrespo subscribed.Jul 21 2016, 3:46 PM

Since 10:30 the number of log errors has increased to 2000/hour, probably related to Wikidata API requests by bots.

This (just the one in the task title) now caused 3,643 fatals in the last hour. Please evaluate ASAP. cc @aude @daniel @Lydia_Pintscher

Adding to current sprint for investigation.

Last hour:

What changed at ~19:26?

Sadly, deadlockLoop doesn't tell us what called it or what it is trying to do. Knowing which transaction was still open is nice, but knowing what tried to open another one would be helpfull.

Oh! This warning is logged when an automatic transaction is flushed because an explicit transaction is started. This used to be a debug log entry, not a production warning. Aaron changed this to a warning in https://gerrit.wikimedia.org/r/#/c/298240/1 which got merged a week ago, on July 15.

So the situation isn't new, the logging is.

I'm not sure why Aaron changed this. As far as I understand, it is in the nature of automatic transactions to be flushed in such a situation. This doesn't usually indicate a problem.

Lowering priority per @daniel . Namely the message has been around for a while, it is not user facing issue and the priority of the message to recently promoted from debug to warnings.

@greg why do you think this caused fatal errors? I don't see how it could.

It was on the MediaWiki fatals dashboard in logstash :)

Premature commits like this make multi-DB transactions less safe. Only Job/DeferrableUpdate/Maintenance code should be flushing transactions. If something needs its own transaction it should use some sort of deferred update.

@aaron If I interpret what you say correctly, that means that we should not use automatic transactions in web requests at all, at least none that write.

I suggest to have a larger discussion about the appropriate use of transactions, on wikitech-l or as an RFC. In the past, automatic transactions were (ab)used as a kind of write buffer, while use of transactions to achieve atomic updates was rare. It seems like you want to change that. I'm all for saner use of transactions, but it seems we should examine the current usage patterns more closely before deprecating them.

In T140955#2487188, @greg wrote:

It was on the MediaWiki fatals dashboard in logstash :)

Interesting - why, if these were warnings? Or are DB warnings considered fatal for some reason?

I was inspired by T119736. I'm not sure what the "write buffer" thing is, but premature commits are just broken and make the code less rollback-safe.

Current use is mostly fine. For the last 24hr, <<+channel:wfLogDBError +message:*implicit* -message:*Wikibase* -message:*deadlockLoop*>> just shows a few Banner class notices. So there is little to fix here.

In T140955#2487962, @daniel wrote:

Interesting - why, if these were warnings? Or are DB warnings considered fatal for some reason?

See the dashboard for what it has/doesn't have.

The watchlist one is likely fixed by d484555db6b734ef56edf2d521dbcfb54170c7a6 in core . The extension code looks OK.

The other one is caused by using deadlockLoop(), which I'd suggest removing.

aaron merged a task: T140967: WikiPageEntityStore::updateWatchlist breaks implicit transactions.Jul 27 2016, 5:25 AM

This (although it is not the only one) is causing problems with huge sizes on logs. I would suggest to either disable the logging or sample them or all logs will be broken!

Then you can continue fixing it in the same way before put it back to error level.

• Jonas moved this task from Backlog to Proposed on the Wikidata-Sprint-2016-07-19 board.Jul 27 2016, 4:04 PM

Change 301538 had a related patch set uploaded (by Aaron Schulz):
Remove deadlockLoop() from places with few errors

https://gerrit.wikimedia.org/r/301538

Change 301551 had a related patch set uploaded (by Aaron Schulz):
Avoid use of deadlockLoop() in EntityPerPageTable

https://gerrit.wikimedia.org/r/301551

Change 301538 merged by jenkins-bot:
Remove deadlockLoop() from places with few errors

https://gerrit.wikimedia.org/r/301538

Lydia_Pintscher moved this task from incoming to in progress on the Wikidata board.Aug 2 2016, 8:59 AM

Change 301551 merged by jenkins-bot:
Avoid use of deadlockLoop() in EntityPerPageTable

https://gerrit.wikimedia.org/r/301551

All of @aaron's patches have been merged, can we consider this resolved?

I will check the logs.

In T140955#2516051, @jcrespo wrote:

I will check the logs.

We will deploy the fixes this week (with wmf13). The improvements should take effect once the branch hits Wikidata (tomorrow).

Indeed, it continues happening on WMF-servers. As we have the tag Wikimedia-production-error (a WMF issue rather than a mediawiki or wikidata-only), I would like to keep this open if you are ok with that to check, for example, it does not happen by other unrelated reasons (both me and Greg can do that).

What I do in such cases is to move it to "Done" on my own board if my part is done, if that is an alternative for cross-team issues? I hope that is ok for you.

I cannot see this anymore since 23 UTC yesterday. If you can confirm its successful deployment, I will mark this as resolved.

I see this and T140968 resolved since Aug 4.

• mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:11 PM

	F4295127: Selection_130.png
	Jul 21 2016, 7:35 PM

Wikibase\Repo\Store\WikiPageEntityStore::updateWatchlist: Automatic transaction with writes in progress (from DatabaseBase::query (LinkCache::addLinkObj)), performing implicit commit!Closed, ResolvedPublicPRODUCTION ERRORActions

Description

Details

Related Objects

Event Timeline

Wikibase\Repo\Store\WikiPageEntityStore::updateWatchlist: Automatic transaction with writes in progress (from DatabaseBase::query (LinkCache::addLinkObj)), performing implicit commit!
Closed, ResolvedPublicPRODUCTION ERROR
Actions