Wikibase\Repo\Store\SQL\EntityPerPageTable::{closure} creating high number of deadlocks
Open, LowPublic

Description

There seems to be a large number of API calls to wikidata creating a high number of deadlocks like this (450/hour):
{

"_index": "logstash-2015.09.04",
"_type": "mediawiki",
"_id": "AU-ZFLBnIkWKhegMyoko",
"_score": null,
"_source": {
  "message": "Wikibase\\Repo\\Store\\SQL\\EntityPerPageTable::{closure}\t10.64.32.28\t1213\tDeadlock found when trying to get lock; try restarting transaction (10.64.32.28)\tINSERT  INTO `wb_entity_per_page` (epp_entity_id,epp_entity_type,epp_page_id,epp_redirect_target) VALUES ('16507538','item','22922124',NULL)",
  "@version": 1,
  "@timestamp": "2015-09-04T15:58:33.340Z",
  "type": "mediawiki",
  "host": "mw1221",
  "level": "ERROR",
  "tags": [
    "syslog",
    "es",
    "es",
    "normalized_message_trimmed"
  ],
  "channel": "wfLogDBError",
  "url": "/w/api.php",
  "ip": "10.64.32.107",
  "http_method": "POST",
  "server": "www.wikidata.org",
  "referrer": null,
  "uid": "4943efb",
  "process_id": 21645,
  "wiki": "wikidatawiki",
  "db_server": "10.64.32.28",
  "db_name": "wikidatawiki",
  "db_user": "wikiuser",
  "method": "DatabaseBase::reportQueryError",
  "errno": 1213,
  "error": "Deadlock found when trying to get lock; try restarting transaction (10.64.32.28)",
  "sql1line": "INSERT  INTO `wb_entity_per_page` (epp_entity_id,epp_entity_type,epp_page_id,epp_redirect_target) VALUES ('16507538','item','22922124',NULL)",
  "fname": "Wikibase\\Repo\\Store\\SQL\\EntityPerPageTable::{closure}",
  "normalized_message": "Wikibase\\Repo\\Store\\SQL\\EntityPerPageTable::{closure}\t10.64.32.28\t1213\tDeadlock found when trying to get lock; try restarting transaction (10.64.32.28)\tINSERT  INTO `wb_entity_per_page` (epp_entity_id,epp_entity_type,epp_page_id,epp_redirect_target) VALUE"
},
"sort": [
  1441382313340
]

}

They started on Friday 4 Sept at around 10:50 and continues 5 hours later. They are not causing issues on the infrastructure, but they seem to be interfering with themselves. We need to identify the cause: if it is legitimate traffic, change how the inserts are done; if it is abnormal traffic, maybe avoid it in the first place.

jcrespo created this task.Sep 4 2015, 4:07 PM
jcrespo updated the task description. (Show Details)
jcrespo raised the priority of this task from to Needs Triage.
jcrespo added a subscriber: jcrespo.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 4 2015, 4:07 PM
Addshore added a comment.EditedSep 4 2015, 5:01 PM

I can confirm this slightly odd traffic is me. See epp_entity_id in the query which is 16507538 referring to Q16507538
https://www.wikidata.org/w/index.php?title=Q16507538&action=history

The traffic is legitimate, but abnormal..
Generally the rate of redirect creation would be nowhere near what is has been for the past 5 hours thus I don't think we need to worry about these deadlocks!

(I'll let you see respond and close if you feel appropriate) :)

jcrespo closed this task as Declined.Sep 4 2015, 5:37 PM
jcrespo claimed this task.

I'm more than ok with the work you are doing, I was just suggesting making the code faster :-)

aude added a subscriber: aude.Sep 4 2015, 5:50 PM

I am sure there is some underlying issue and think we should look into it, to make the code more robust and capable of handling stuff such as what addshore is doing :)

aude reopened this task as Open.Sep 4 2015, 6:06 PM
aude removed jcrespo as the assignee of this task.
aude set Security to None.

we should investigate if there is something we can do to improve the code to make it more robust.

although we have discussed discontinuing (dropping?) this table, I'm not sure we will do that anytime soon or at all.

If we want to see the exact rate we are dealing with I can probably go and pull out a number!

Also as said on IRC I asume the transactions retry when they fail due to a deadlock? Otherwise something is very odd as I only encountered 2 errors through the api in the 6 hours.

jcrespo triaged this task as Low priority.Sep 9 2015, 10:07 AM

I set the priority to low "nice to fix it, but it is not breaking anything now". Feel free to disagree with me.

hoo added a subscriber: hoo.Oct 7 2015, 4:02 PM

See merged ticket, this happened again when 300-400 new pages per minute were being created, with 35 parallel threads.