number of database updates multiplied x3 since 29 October
Closed, Resolved · Public

Assigned To

Authored By

	jcrespo
	Nov 2 2015, 10:15 AM

Description

This is mainly observed on s1, s4 and s7. There is an increase on the other shards too, but it is less sudden or it subsides afterwards.

One spike seems to have kicked it off, observed for example at 07:59:35 on commons and at 06:00:29 on enwiki.

2 main suspects: page_links_updated and wbc_entity_usage.

Related Objects

Mentioned Here: T125838: Implement usage tracking without eu_touched
rOMWC29ccbd24839f: Fix broken boilerplate maxjobs default in RunJobs.php
T117304: Investigate Memcached spike

Event Timeline

jcrespo created this task. Nov 2 2015, 10:15 AM

jcrespo raised the priority of this task from to Needs Triage.

jcrespo updated the task description. (Show Details)

jcrespo added projects: Performance Issue, DBA, Wikidata, SRE.

jcrespo subscribed.

Restricted Application added a subscriber: Aklapper. Nov 2 2015, 10:15 AM

ori set Security to None. Nov 2 2015, 10:16 AM

ori added a subscriber: Performance-Team.

akosiaris renamed this task from "number of database updates multiplied x3 since 29 November" to "number of database updates multiplied x3 since 29 October". Nov 2 2015, 10:20 AM

The trend has not finished:

It seems to peak at 4:30 and 7am.

Sounds like the result of fixing rpc/RunJobs to properly run jobs until the 30-second limit rather than one at a time (which wasted huge amounts of time in setup overhead and caused massive job backlogs, particularly for the 'enqueue' and 'refreshLinks' queues). Keeping up with the queues means more DB traffic. This was noticed by Erik and fixed by me on 2015-10-29.

06:30 logmsgbot: aaron@tin Synchronized rpc/RunJobs.php: 29ccbd248 (duration: 00m 17s)
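To illustrate why running jobs until the time limit raises throughput so much, here is a minimal simulation sketch. It is not the RunJobs.php code: the function name `jobs_completed` and the setup and per-job costs are hypothetical values chosen only to show the effect of paying the setup overhead once per request instead of once per job.

```python
def jobs_completed(wall_ms, setup_ms, job_ms, per_request_budget_ms=None):
    """Count jobs one runner finishes in wall_ms of wall-clock time.

    per_request_budget_ms=None models the old behaviour: one job per request.
    Otherwise a request keeps running jobs until its budget is used up.
    All costs are hypothetical and given in integer milliseconds.
    """
    done = 0
    elapsed = 0
    # Start a new request only if setup plus at least one job still fits.
    while elapsed + setup_ms + job_ms <= wall_ms:
        elapsed += setup_ms  # per-request setup overhead
        request_time = 0
        while elapsed + job_ms <= wall_ms:
            elapsed += job_ms
            request_time += job_ms
            done += 1
            if per_request_budget_ms is None or request_time >= per_request_budget_ms:
                break  # request ends; the next one pays setup again
    return done


# Hypothetical costs: 500 ms setup per request, 100 ms per job, one minute of runner time.
one_at_a_time = jobs_completed(60_000, 500, 100)
batched = jobs_completed(60_000, 500, 100, per_request_budget_ms=30_000)
print(one_at_a_time, batched)
```

With these assumed costs the batched runner completes several times as many jobs per minute, which is consistent with a matching rise in database writes once the backlog starts draining.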

Related: T117304

In any case, if the run rate for any type of job is too high for some reason, then runner count and $wgJobBackoffThrottling can be adjusted.
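As a rough sketch of how those two knobs interact (not MediaWiki's implementation): as I understand it, $wgJobBackoffThrottling caps the work items per second for a job type on each runner, so the overall ceiling scales with the runner count. The function names and the numbers below are hypothetical.

```python
def backoff_seconds(work_items, max_items_per_second):
    """Seconds a runner would wait after a batch so that its average
    rate for this job type stays at or below the configured cap."""
    if max_items_per_second <= 0:
        raise ValueError("cap must be positive")
    return work_items / max_items_per_second


def overall_ceiling(runner_count, max_items_per_second):
    """Upper bound on work items per second across all runners,
    assuming the cap applies to each runner independently."""
    return runner_count * max_items_per_second


# Hypothetical values: a cap of 20 items/s per runner and 10 runners.
print(backoff_seconds(100, 20))
print(overall_ceiling(10, 20))
```

Lowering either the per-type cap or the number of runners lowers the ceiling, so either one can be used to rein in a job type that generates too much DB traffic.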

OK, if it is explained and expected, then it is not urgent.

A higher rate of updates does not necessarily imply worse performance (it could mean less lag, for example, if the updates are smaller). However, there could be overhead in round trips. I will let you own this and decide whether to close it, as you may have a better overview of whether this is affecting application DB reads or lag. It is creating around 2000 extra QPS per large server on enwiki.

I do not see operational problems, such as a higher rate of failed connections.

jcrespo removed a project: Wikidata. Nov 3 2015, 12:29 PM

Lydia_Pintscher added subscribers: Lydia_Pintscher, aude, hoo. Nov 4 2015, 1:44 PM

Restricted Application added a subscriber: StudiesWorld. Nov 4 2015, 1:44 PM

jcrespo moved this task from Triage to Blocked external/Not db team on the DBA board. Nov 17 2015, 3:25 PM

And, as an update, the number of updates and selects is at an all-time low right now (compared to last week).

fgiunchedi triaged this task as Medium priority. Dec 1 2015, 12:39 PM

fgiunchedi subscribed.

Danny_B added a project: Performance-Team. Aug 8 2016, 11:34 AM

Danny_B removed a subscriber: Performance-Team.

Is this still relevant? Might have been fixed with T125838: Implement usage tracking without eu_touched.

This is not ongoing.

	F2911313: db1072.png
	Nov 3 2015, 12:07 PM
