
Translate extension makes huge batch INSERTS
Closed, Resolved · Public · 1 Estimated Story Points

Description

From email:
2014-12-18 07:47:11 mw1246 mediawikiwiki: Query affected 19034 rows:
INSERT INTO translate_messageindex (tmi_key,tmi_value) VALUES ('1198:project:About/Page_display_ti...

Note that the same transactions also DELETE about that many rows first.

Can we please get this both batched and moved into the job queue? This kind of insert can cause slave lag. This issue has been there for some time after being pointed out (partial fixes were made, granted). Can this get prioritized a bit more highly?
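For illustration only, here is what "batched" means in practice: split the row set into fixed-size chunks and write each chunk in its own small transaction, so no single statement touches tens of thousands of rows. This is a Python sketch with sqlite3 as a stand-in (the extension itself is PHP and goes through MediaWiki's database layer); the table and columns are taken from the log entry above, while the key/value contents and the batch size of 500 are invented.

```python
import sqlite3

# Stand-in schema for the table from the log entry above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE translate_messageindex (tmi_key TEXT PRIMARY KEY, tmi_value TEXT)"
)

def insert_in_batches(conn, rows, batch_size=500):
    """Insert rows in fixed-size chunks instead of one huge statement.

    Each chunk gets its own transaction, so replicas only ever have to
    apply small writes (batch_size=500 is an arbitrary example value).
    """
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        with conn:  # one transaction per chunk
            conn.executemany(
                "INSERT INTO translate_messageindex (tmi_key, tmi_value) VALUES (?, ?)",
                chunk,
            )

# Roughly the row count from the log entry; the keys here are invented.
rows = [("key-%d" % i, "value-%d" % i) for i in range(19034)]
insert_in_batches(conn, rows)
```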

Event Timeline

aaron raised the priority of this task to Needs Triage.
aaron updated the task description. (Show Details)
aaron added subscribers: aaron, Nikerabbit.

My reply to that email on 2014-12-19:

Aaron, I recommend you bring this up in SoS to convey the urgency. I am on vacation for weeks still.

In fact, it was made to work this way long ago due to an unreliable job queue: T38274. There is a global to move it back to the JobQueue, but that does not do batching. For that we need some kind of locking system to avoid two updates running simultaneously and creating a mess.

This is the first time I have heard about the issue with messageindex. The previous reports, including the link above, have been about messagegroupstats.

Anyone who wants to understand messageindex better should read http://laxstrom.name/blag/2013/12/02/pet-project-optimizing-message-index-to-the-last-byte/

aaron renamed this task from Translate extension make huge batch INSERTS to Translate extension makes huge batch INSERTS. Oct 1 2015, 8:53 AM
aaron updated the task description. (Show Details)

Also, from metawiki:

Query affected 44263 row(s):
query-m: DELETE FROM translate_messageindex [TRX#2d00942ca3e3]

Query affected 44263 row(s):
query-m: INSERT INTO translate_messageindex (tmi_key,tmi_value) VALUES ('X') [TRX#2d00942ca3e3]

Do all the rows actually need to change? Can the DELETE/UPDATE/INSERT of the actual changes be done by the jobs?

Some non-exclusive ideas:

  • Use poolcounter to allow only one message index refresh at a time (so that they don't mess up the data)
  • Do an in-memory diff of old/new and insert/delete only what is needed (see the sketch after this list)
  • Bucketize the index and process one bucket at a time (for smaller inserts and hopefully reduced memory usage; but how to determine the buckets?)
  • Instead of creating message index from scratch every time, do incremental changes (but complex, easy to get out of sync)
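To make the in-memory diff idea concrete, here is a minimal Python sketch that compares the old and new key-to-value maps and returns only the keys to delete plus the rows to insert or update. The real index is keyed by tmi_key/tmi_value as in the queries above; the group keys and values below are invented.

```python
def diff_message_index(old, new):
    """Compare old and new index mappings (tmi_key -> tmi_value).

    Returns the keys that disappeared and the (key, value) pairs that are
    new or whose value changed; rows that did not change are left alone.
    """
    deletions = [key for key in old if key not in new]
    upserts = [(key, value) for key, value in new.items() if old.get(key) != value]
    return deletions, upserts

old = {"group:a": "1", "group:b": "2", "group:c": "3"}
new = {"group:a": "1", "group:b": "9", "group:d": "4"}
print(diff_message_index(old, new))
# (['group:c'], [('group:b', '9'), ('group:d', '4')])
```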

An up-to-date message index is required for page translation to work (translations for new units won't be allowed until they are in the message index).

You can also use GET_LOCK(). Just make sure that you do BEGIN/COMMIT inside the lock calls.
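A sketch of that ordering (GET_LOCK before BEGIN, RELEASE_LOCK after COMMIT), using PyMySQL purely for illustration; the lock name, timeout, connection details and row values are all invented, and actual extension code would go through MediaWiki's database abstraction rather than raw SQL.

```python
import pymysql  # illustration only; the real code uses MediaWiki's DB layer

# Connection details are placeholders for the example.
conn = pymysql.connect(host="localhost", user="wiki", password="secret", database="wiki")

with conn.cursor() as cur:
    # Take a named server-side lock first (name and timeout are invented).
    cur.execute("SELECT GET_LOCK('translate-messageindex', 30)")
    if cur.fetchone()[0] != 1:
        raise RuntimeError("another message index refresh is already running")
    try:
        # BEGIN/COMMIT happen strictly inside the GET_LOCK/RELEASE_LOCK calls.
        cur.execute("BEGIN")
        cur.execute("DELETE FROM translate_messageindex WHERE tmi_key = %s", ("removed-key",))
        cur.execute(
            "REPLACE INTO translate_messageindex (tmi_key, tmi_value) VALUES (%s, %s)",
            ("changed-key", "new-value"),
        )
        cur.execute("COMMIT")
    finally:
        cur.execute("SELECT RELEASE_LOCK('translate-messageindex')")
```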

aaron raised the priority of this task from High to Unbreak Now!. Oct 22 2015, 2:39 AM
aaron added a project: Blocked-on-Language.

I am aware of this request and I have discussed it with the team. I cannot promise a timely fix for you, for the following reasons:

  • The fix is not trivial (in fact, it has not even been decided yet how to fix it)
  • I currently have only a 20-hour-per-week commitment
  • I have to support the language team on their already planned work (especially now that some of the engineers are on holiday)

Having said that, I am thinking about how to fix this and will work on it as soon as I find time.

The Translate extension is already doing some in-memory diffing. The easiest fix seems to be to verify that the diffing is working, and to replace delete_all+insert with delete+replace.
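This is not the actual patch (that is the Gerrit change below), just a sketch of the delete+replace shape it describes: feed the in-memory diff into targeted DELETE and REPLACE statements instead of wiping and rewriting the whole table. sqlite3 stands in for MySQL here since both accept REPLACE INTO; the key names are invented.

```python
import sqlite3

def apply_index_diff(conn, deletions, upserts):
    """Write only what actually changed: DELETE vanished keys, REPLACE changed rows."""
    with conn:  # one small transaction instead of delete_all + full re-insert
        if deletions:
            conn.executemany(
                "DELETE FROM translate_messageindex WHERE tmi_key = ?",
                [(key,) for key in deletions],
            )
        if upserts:
            conn.executemany(
                "REPLACE INTO translate_messageindex (tmi_key, tmi_value) VALUES (?, ?)",
                upserts,
            )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE translate_messageindex (tmi_key TEXT PRIMARY KEY, tmi_value TEXT)")
apply_index_diff(conn, ["removed-key"], [("changed-key", "new-value")])
```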

Change 249436 had a related patch set uploaded (by Nikerabbit):
Try to avoid huge inserts in DatabaseMessageIndex

https://gerrit.wikimedia.org/r/249436

Change 249436 merged by jenkins-bot:
Try to avoid huge inserts in DatabaseMessageIndex

https://gerrit.wikimedia.org/r/249436

How to confirm this is fixed?

You can search for it in the DBPerformance log in logstash (the entry in the task description is from there).

This error no longer appears in logstash after the patch was deployed.