
Wikibase\Repo\Store\SQL\EntityPerPageTable::{closure} creating high number of deadlocks
Closed, Resolved · Public

Description

There seem to be a large number of API calls to Wikidata creating a high number of deadlocks (about 450/hour) like this one:
{
  "_index": "logstash-2015.09.04",
  "_type": "mediawiki",
  "_id": "AU-ZFLBnIkWKhegMyoko",
  "_score": null,
  "_source": {
    "message": "Wikibase\\Repo\\Store\\SQL\\EntityPerPageTable::{closure}\t10.64.32.28\t1213\tDeadlock found when trying to get lock; try restarting transaction (10.64.32.28)\tINSERT  INTO `wb_entity_per_page` (epp_entity_id,epp_entity_type,epp_page_id,epp_redirect_target) VALUES ('16507538','item','22922124',NULL)",
    "@version": 1,
    "@timestamp": "2015-09-04T15:58:33.340Z",
    "type": "mediawiki",
    "host": "mw1221",
    "level": "ERROR",
    "tags": [
      "syslog",
      "es",
      "es",
      "normalized_message_trimmed"
    ],
    "channel": "wfLogDBError",
    "url": "/w/api.php",
    "ip": "10.64.32.107",
    "http_method": "POST",
    "server": "www.wikidata.org",
    "referrer": null,
    "uid": "4943efb",
    "process_id": 21645,
    "wiki": "wikidatawiki",
    "db_server": "10.64.32.28",
    "db_name": "wikidatawiki",
    "db_user": "wikiuser",
    "method": "DatabaseBase::reportQueryError",
    "errno": 1213,
    "error": "Deadlock found when trying to get lock; try restarting transaction (10.64.32.28)",
    "sql1line": "INSERT  INTO `wb_entity_per_page` (epp_entity_id,epp_entity_type,epp_page_id,epp_redirect_target) VALUES ('16507538','item','22922124',NULL)",
    "normalized_message": "Wikibase\\Repo\\Store\\SQL\\EntityPerPageTable::{closure}\t10.64.32.28\t1213\tDeadlock found when trying to get lock; try restarting transaction (10.64.32.28)\tINSERT  INTO `wb_entity_per_page` (epp_entity_id,epp_entity_type,epp_page_id,epp_redirect_target) VALUE"
  },
  "sort": [
    1441382313340
  ]
}

They started on Friday, 4 September, at around 10:50 and were still ongoing 5 hours later. They are not causing issues on the infrastructure, but they do seem to be interfering with each other. We need to identify the cause: if it is legitimate traffic, change how the inserts are done; if it is abnormal traffic, maybe avoid it in the first place.
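If the traffic turns out to be legitimate, the simplest change on the insert side is usually to catch MySQL error 1213 and retry the statement. The sketch below is purely illustrative (it is not the Wikibase code, and the helper name is made up); it assumes a PDO connection in ERRMODE_EXCEPTION to the repo database:

<?php
// Illustrative sketch only, not the actual Wikibase code.
// Retries an INSERT into wb_entity_per_page when MySQL reports
// error 1213 (deadlock victim), with a small back-off between attempts.
// Assumes $db is a PDO connection configured with ERRMODE_EXCEPTION.

function insertEntityPerPageRow( PDO $db, int $entityId, string $entityType, int $pageId ): void {
    $sql = 'INSERT INTO wb_entity_per_page ' .
        '(epp_entity_id, epp_entity_type, epp_page_id, epp_redirect_target) ' .
        'VALUES (:entityId, :entityType, :pageId, NULL)';

    $maxAttempts = 3;
    for ( $attempt = 1; $attempt <= $maxAttempts; $attempt++ ) {
        try {
            $stmt = $db->prepare( $sql );
            $stmt->execute( [
                ':entityId' => $entityId,
                ':entityType' => $entityType,
                ':pageId' => $pageId,
            ] );
            return; // success
        } catch ( PDOException $e ) {
            // errorInfo[1] holds the driver-specific error code; 1213 = deadlock.
            $isDeadlock = isset( $e->errorInfo[1] ) && (int)$e->errorInfo[1] === 1213;
            if ( !$isDeadlock || $attempt === $maxAttempts ) {
                throw $e;
            }
            usleep( 100000 * $attempt ); // back off before retrying
        }
    }
}

Note that retrying a single statement only helps when it is not part of a larger transaction: InnoDB rolls back the whole transaction of the deadlock victim, so in that case the caller has to re-run the entire transaction, not just the failed INSERT.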

Related Objects

(list of related tasks; task titles not shown here — statuses range from Open to Resolved and Declined, with assignees including hoo, Ladsgroup, adrianheine, daniel, Lydia_Pintscher, WMDE-leszek, aude, Marostegui, and Andrew)
Event Timeline

jcrespo raised the priority of this task to Needs Triage.
jcrespo updated the task description. (Show Details)
jcrespo added a subscriber: jcrespo.

I can confirm this slightly odd traffic is me. See epp_entity_id in the query, which is 16507538, referring to Q16507538:
https://www.wikidata.org/w/index.php?title=Q16507538&action=history

The traffic is legitimate, but abnormal.
Generally the rate of redirect creation would be nowhere near what it has been for the past 5 hours, thus I don't think we need to worry about these deadlocks!

(I'll let you respond and close if you feel it appropriate.) :)

jcrespo claimed this task.

I'm more than OK with the work you are doing; I was just suggesting making the code faster :-)

I am sure there is some underlying issue and think we should look into it, to make the code more robust and capable of handling stuff such as what addshore is doing :)

aude removed jcrespo as the assignee of this task.
aude set Security to None.

We should investigate whether there is something we can do to improve the code to make it more robust.

Although we have discussed discontinuing (dropping?) this table, I'm not sure we will do that anytime soon, or at all.
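As a purely hypothetical illustration of "more robust" (not something Wikibase has implemented or committed to here), the insert could be made idempotent with INSERT IGNORE, assuming wb_entity_per_page has a unique key on the entity columns. That does not prevent deadlocks by itself, but it makes retries and redundant inserts from parallel jobs harmless:

<?php
// Illustration only (not the Wikibase implementation): an idempotent
// variant of the same insert. A duplicate row becomes a silent no-op
// instead of an error, assuming a unique key on the entity columns.
// $db is again assumed to be a PDO connection to the repo database.

function insertEntityPerPageRowIgnore( PDO $db, int $entityId, string $entityType, int $pageId ): void {
    $sql = 'INSERT IGNORE INTO wb_entity_per_page ' .
        '(epp_entity_id, epp_entity_type, epp_page_id, epp_redirect_target) ' .
        'VALUES (:entityId, :entityType, :pageId, NULL)';

    $stmt = $db->prepare( $sql );
    $stmt->execute( [
        ':entityId' => $entityId,
        ':entityType' => $entityType,
        ':pageId' => $pageId,
    ] );
}

As far as I know, the same effect can be had through MediaWiki's database abstraction by passing the IGNORE option to the insert() wrapper, so no raw SQL would be needed.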

If we want to see the exact rate we are dealing with, I can probably go and pull out a number!

Also, as said on IRC, I assume the transactions retry when they fail due to a deadlock? Otherwise something is very odd, as I only encountered 2 errors through the API in the 6 hours.

I set the priority to Low: "nice to fix, but it is not breaking anything now". Feel free to disagree with me.

See the merged ticket: this happened again when 300-400 new pages per minute were being created, with 35 parallel threads.

hoo claimed this task.

The code in question no longer exists: The wb_entity_per_page table is no longer being updated.