
number of database updates multiplied x3 since 29 October
Closed, Resolved, Public

Description

This is mainly observed on s1, s4 and s7. There is an increase on the other shards too, but it is less sudden, or it subsides afterwards.

A single spike seems to have triggered it, observed for example at 07:59:35 on commons and at 06:00:29 on enwiki.

Two main suspects: page_links_updated and wbc_entity_usage.

Event Timeline

jcrespo set the priority of this task to Needs Triage.
jcrespo updated the task description.
jcrespo subscribed.
akosiaris renamed this task from "number of database updates multiplied x3 since 29 November" to "number of database updates multiplied x3 since 29 October" (Nov 2 2015, 10:20 AM).

The trend is still ongoing:

[Graph: db1072.png]

It seems to peak at 04:30 and 07:00.

Sounds like the result of fixing rpc/RunJobs to properly run jobs until the 30-second limit rather than one at a time (which wasted huge amounts of time in setup overhead and caused massive job backlogs, particularly for the 'enqueue' and 'refreshLinks' queues). Keeping up with the queues means more DB traffic. This was noticed by Erik and fixed by me on 2015-10-29.

06:30 logmsgbot: aaron@tin Synchronized rpc/RunJobs.php: 29ccbd248 (duration: 00m 17s)

Related: T117304
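Why would fixing RunJobs raise DB traffic? A rough Python throughput model helps here. Everything in it is hypothetical: the class, its field names, and the setup and per-job costs are illustrative assumptions. Only the 30-second limit comes from this task. The model is a toy and is not how RunJobs.php is implemented.

```python
from dataclasses import dataclass


@dataclass
class RunnerModel:
    """Toy cost model of one job runner; all costs are hypothetical."""
    setup_ms: int                 # assumed fixed cost of each RunJobs request
    job_ms: int                   # assumed average cost of a single job
    time_limit_ms: int = 30_000   # the 30 s limit mentioned in this task

    def jobs_per_request(self, run_until_limit: bool) -> int:
        if not run_until_limit:
            return 1  # old behaviour: one job per request
        # New behaviour: keep taking jobs while the next one still fits.
        return max(1, self.time_limit_ms // self.job_ms)

    def jobs_per_second(self, run_until_limit: bool) -> float:
        n = self.jobs_per_request(run_until_limit)
        # Setup is paid once per request, then amortised over n jobs.
        return n * 1000 / (self.setup_ms + n * self.job_ms)


if __name__ == "__main__":
    model = RunnerModel(setup_ms=200, job_ms=50)
    for mode in (False, True):
        print(f"run_until_limit={mode}: {model.jobs_per_second(mode):.1f} jobs/s")
```

Each job issues its own writes, so a runner that completes more jobs per second also issues more DB writes per second. That is consistent with the observed increase, but it does not prove the fix caused it.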

In any case, if the run rate for any job type is too high, the runner count and $wgJobBackoffThrottling can be adjusted.
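$wgJobBackoffThrottling maps a job type to the number of job "work items" each runner process may run per second. In LocalSettings.php that looks like `$wgJobBackoffThrottling['refreshLinks'] = 20;`, where the value 20 is arbitrary. The Python sketch below models the idea only, under the assumption that a runner which just did k work items of a throttled type waits k / rate seconds before running that type again. The class and method names are hypothetical, and this is not MediaWiki's JobRunner code.

```python
from typing import Dict


class BackoffThrottle:
    """Hypothetical per-job-type rate cap, modelled loosely on the idea
    behind $wgJobBackoffThrottling (job type -> work items/second/runner)."""

    def __init__(self, items_per_second: Dict[str, float]):
        self.items_per_second = items_per_second
        self.blocked_until: Dict[str, float] = {}

    def can_run(self, job_type: str, now: float) -> bool:
        # A type is runnable once its backoff window has elapsed.
        return now >= self.blocked_until.get(job_type, 0.0)

    def record(self, job_type: str, work_items: int, now: float) -> float:
        """Record completed work; return the backoff (seconds) imposed."""
        rate = self.items_per_second.get(job_type)
        if not rate:
            return 0.0  # unlisted job types are not throttled
        backoff = work_items / rate
        # Extend any backoff that is still in effect rather than resetting it.
        start = max(now, self.blocked_until.get(job_type, now))
        self.blocked_until[job_type] = start + backoff
        return backoff
```

For example, with a cap of 20 items/s for refreshLinks, a runner that just did 10 items would skip that type for 0.5 s, while enqueue jobs, which are unlisted, keep running. Lowering one type's rate caps only that type's writes, whereas reducing the runner count scales all job types down at once.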

OK, if it is explained and expected, then it is not urgent.

A higher rate of updates does not necessarily mean worse performance (it could even mean less lag, for example if the individual updates are smaller). However, the extra round trips could add overhead. I will let you own this and decide whether to close it, as you may have a better overview of whether it is affecting application DB reads or lag. This is creating around 2000 extra QPS per large server on enwiki.

I do not see operational problems, such as a higher rate of failed connections.

As an update: the number of updates and selects is at an all-time low right now (compared to last week).

Related? 50% less job queueing: https://grafana.wikimedia.org/dashboard/db/job-queue-rate?panelId=1&fullscreen&from=1447820310472&to=1447824844728&var-Job=All

fgiunchedi triaged this task as Medium priority (Dec 1 2015, 12:39 PM).
fgiunchedi subscribed.
jcrespo claimed this task.

This is no longer ongoing.