
enwiki's job is about 28m atm and increasing
Closed, Resolved · Public

Event Timeline

Glaisher raised the priority of this task from to Needs Triage.
Glaisher updated the task description.
Glaisher subscribed.
Glaisher renamed this task from enwiki's job is about 22m and increasing to enwiki's job is about 22m atm and increasing. (May 8 2015, 5:24 PM)
Glaisher triaged this task as High priority.
Glaisher updated the task description.
Glaisher added a project: WMF-JobQueue.
Glaisher set Security to None.
Glaisher removed a subscriber: Betacommand.
Apr 30 13:50:29 <Krenair>	Someone just pointed out in tech that enwiki has a ridiculously large job queue at the moment
Apr 30 16:44:53 <legoPanda>	Betacommand: I see 10 million refreshlinks jobs???
Apr 30 21:49:58 <legoPanda>	AaronSchulz: do you know why enwiki has 11m refreshLinks jobs queued?
May 07 16:25:20 <T13|mobile>	[16:24:33] There's concerns that the enwp job queue is stuck since it's growing so much and pushing 20 million. Can someone peek and poke at it as needed?
May 07 16:25:20 <T13|mobile>	[16:25:03] <MatmaRex> T13|mobile: i'd wager this is fallout from last saturday, when someone accidentally disabled the job queue
May 07 16:32:03 <T13|mobile>	"jobs": 19977207
May 07 16:32:58 <MatmaRex>	T13|mobile: just to reassure you, the job queue is (probably) working again, it was broken only for a short while
May 07 16:35:14 <legoktm>	T13|mobile: looks like they're all refreshLinks jobs
May 07 16:37:00 <MatmaRex>	T13|mobile: some jobs actually generate more jobs when executed :D
May 07 16:38:40 <legoktm>	well, it's executing jobs
May 07 16:40:36 <MatmaRex>	T13|mobile: for example: (simplifying, since i don't know exactly how it works) say you edit a template used on 200 000 pages. rather than generate 200 000 jobs to update the pages immediately, which itself would take a long time, MediaWiki instead generates (say) 100 jobs, each of which generates 2000 jobs, each of which actually updates a page.
May 07 16:45:22 <legoktm>	T13|mobile: don't complain about job queue length when you're the one who made it so long! :P
May 07 17:01:23 <manybubbles>	oh my that is a lot of jobs
May 07 18:15:52 <T13|away>	legoktm: would my guess that part of the reason the jobqueue is still ever expanding might be related to SULF?
May 07 18:39:18 <T13|away>	[18:15:52] legoktm: would my guess that part of the reason the jobqueue is still ever expanding might be related to SULF?
May 08 18:16:16 <Betacommand>	Are ops aware of the enwiki job queue issue?
May 08 18:19:45 <Glaisher>	 "jobs": 21894746,
May 08 18:20:38 <Krenair>	the other job types seem relatively low
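To illustrate the fan-out MatmaRex describes above: when a heavily-transcluded template is edited, MediaWiki enqueues a small number of jobs that partition the template's backlinks into ranges and enqueue further batched jobs, rather than queueing one job per affected page up front. A minimal sketch of that idea, assuming the standard BacklinkCache/JobQueueGroup APIs; batch sizes and parameter names are illustrative, not the actual RefreshLinksJob code:

```
// Sketch only: how one template edit fans out into many refreshLinks jobs.
function enqueueRefreshLinksForTemplate( Title $template ) {
	$backlinkCache = $template->getBacklinkCache();
	// Split the (possibly huge) set of pages embedding the template into
	// ranges; each range becomes one batched job instead of queueing one
	// job per page immediately.
	$batches = $backlinkCache->partition( 'templatelinks', 100 ); // 100 titles per batch (illustrative)

	$jobs = array();
	foreach ( $batches as $batch ) {
		list( $start, $end ) = $batch;
		$jobs[] = new JobSpecification(
			'refreshLinks',
			array( 'table' => 'templatelinks', 'range' => array( 'start' => $start, 'end' => $end ) ),
			array( 'removeDuplicates' => true ),
			$template
		);
	}
	JobQueueGroup::singleton()->push( $jobs );
}
```

Each batched job then reparses its share of pages (or splits further), which is why the queue can keep growing for a while after the original edits.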

<Krenair> This has been going on since April 30th at least

I took a quick look at the job runners, and they seem to be running fine, without being starved of resources. Notably, it appears to be only the refreshLinks jobs that are piling up.

Krenair renamed this task from enwiki's job is about 22m atm and increasing to enwiki's job is about 23m atm and increasing. (May 9 2015, 4:09 PM)
Krenair raised the priority of this task from High to Unbreak Now!.

Change 209719 had a related patch set uploaded (by Aaron Schulz):
Increase jobrunner::runners_basic

https://gerrit.wikimedia.org/r/209719

Change 209852 had a related patch set uploaded (by Aaron Schulz):
Made triggerOpportunisticLinksUpdate() jobs make use of parser cache

https://gerrit.wikimedia.org/r/209852
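Roughly what "make use of parser cache" means here, as a hedged sketch rather than the actual patch: before doing a full reparse, a links-refresh job can check the parser cache and reuse a still-valid ParserOutput, so the expensive parse is skipped when a cached rendering already exists. The class and method names below are stock MediaWiki; the surrounding structure is assumed for illustration:

```
// Sketch: reuse the parser cache when refreshing link tables for $title.
$page = WikiPage::factory( $title );
$parserOptions = $page->makeParserOptions( 'canonical' );

// Try the parser cache first; only reparse on a miss.
$output = ParserCache::singleton()->get( $page, $parserOptions );
if ( !$output ) {
	$content = $page->getContent( Revision::RAW );
	$output = $content->getParserOutput( $title, $page->getLatest(), $parserOptions );
}

// Update the link tables from whichever ParserOutput we ended up with.
$update = new LinksUpdate( $title, $output );
$update->doUpdate();
```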

Change 209877 had a related patch set uploaded (by Aaron Schulz):
Removed duplicated jobs in triggerOpportunisticLinksUpdate()

https://gerrit.wikimedia.org/r/209877
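For context on the deduplication above (again a sketch under assumptions, not the actual change): jobs pushed with the removeDuplicates option allow the queue to drop an identical job that is already pending instead of enqueuing it a second time.

```
// Sketch: push a refreshLinks job that the queue may deduplicate against
// an identical pending job. Job parameters are omitted/illustrative.
$job = new JobSpecification(
	'refreshLinks',
	array(),                              // job parameters (omitted here)
	array( 'removeDuplicates' => true ),  // let the queue discard duplicates
	$title
);
JobQueueGroup::singleton()->push( $job );
```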

EoRdE6 renamed this task from enwiki's job is about 23m atm and increasing to enwiki's job is about 24m atm and increasing. (May 9 2015, 9:13 PM)
EoRdE6 subscribed.

Now past 25.6m. I made edits to templates as far back as April 19th that haven't filtered through to the articles yet.

Change 209719 merged by Ori.livneh:
Increase jobrunner::runners_basic

https://gerrit.wikimedia.org/r/209719

3gg5amp1e renamed this task from enwiki's job is about 24m atm and increasing to enwiki's job is about 28m atm and increasing. (May 11 2015, 6:47 PM)
3gg5amp1e subscribed.

"jobs": 27803968

Change 209852 merged by jenkins-bot:
Made triggerOpportunisticLinksUpdate() jobs make use of parser cache

https://gerrit.wikimedia.org/r/209852

Change 210243 had a related patch set uploaded (by Aaron Schulz):
Made triggerOpportunisticLinksUpdate() jobs make use of parser cache

https://gerrit.wikimedia.org/r/210243

Change 210244 had a related patch set uploaded (by Aaron Schulz):
Made triggerOpportunisticLinksUpdate() jobs make use of parser cache

https://gerrit.wikimedia.org/r/210244

Change 209877 merged by jenkins-bot:
Removed duplicated jobs in triggerOpportunisticLinksUpdate()

https://gerrit.wikimedia.org/r/209877

Change 210244 merged by jenkins-bot:
Made triggerOpportunisticLinksUpdate() jobs make use of parser cache

https://gerrit.wikimedia.org/r/210244

Change 210243 merged by jenkins-bot:
Made triggerOpportunisticLinksUpdate() jobs make use of parser cache

https://gerrit.wikimedia.org/r/210243

Change 210246 had a related patch set uploaded (by Aaron Schulz):
Bumped the $wgJobBackoffThrottling refreshLinks limit

https://gerrit.wikimedia.org/r/210246
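For reference, $wgJobBackoffThrottling maps a job type to how many jobs of that type each runner may execute per second, so bumping the refreshLinks entry lets the runners drain the backlog faster. A hedged example; the value shown is illustrative, not the actual production setting:

```
// Illustrative value only; the real production limit is not quoted in this task.
$wgJobBackoffThrottling = array(
	'refreshLinks' => 20, // max refreshLinks jobs per second, per runner
);
```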

Well, for sure job runners are working harder now:

graph.php.png (257×397 px, 21 KB)

Queues on most wikis approach 0 or are in the thousands. s1 databases don't seem to have suffered from the bump, or at least there's no visible change in the graphs other than a brief (unrelated?) jump in mysql_innodb_buffer_pool_pages_dirty https://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&m=cpu_report&tab=ch&vn=&hide-hf=false&hreg[]=db10%2852|51|55|57|65|66|72|73%29

The job queue seems to have begun dropping slowly, though it's still near 29 million jobs.

"mwscript showJobs.php enwiki --group" shows it as still going up
Edit: And not long after I said that, I looked again and it had gone down. Okay then...

Change 210611 had a related patch set uploaded (by Aaron Schulz):
Temporary hack to drain excess refreshLinks jobs

https://gerrit.wikimedia.org/r/210611

Change 210610 had a related patch set uploaded (by Aaron Schulz):
Temporary hack to drain excess refreshLinks jobs

https://gerrit.wikimedia.org/r/210610

Change 210246 merged by Chad:
Bumped the $wgJobBackoffThrottling refreshLinks limit

https://gerrit.wikimedia.org/r/210246

Change 210610 merged by jenkins-bot:
Temporary hack to drain excess refreshLinks jobs

https://gerrit.wikimedia.org/r/210610

Change 210611 merged by jenkins-bot:
Temporary hack to drain excess refreshLinks jobs

https://gerrit.wikimedia.org/r/210611

Now en.wiki is down to a mere 21 million. According to https://wikiapiary.com/wiki/Wikipedia_%28en%29, it started dropping this morning at 8 UTC; at this rate, the queue should be drained in a matter of hours.

now down to 78 jobs! do we consider this done?

ori claimed this task.

@ArielGlenn how are you coming up with 78 jobs? I haven't seen it go below 1.15 million. I still consider it done, but it's still a little higher than normal according to the graphs.

And, for what it's worth, there are still template edits from as far back as April 19th that haven't filtered through. I don't know in what order it works through the jobs, but I would have assumed oldest to newest.

@Mlaffs You would have assumed very wrong... From my understanding, the job queue is not a linear, easy-to-follow thing. Changing a template with 50K transclusions does not mean 50K jobs get added in any particular order. It actually creates a job that, based on a bunch of different factors and variables, creates further jobs; those in turn decide what jobs are needed and in what order to run them. Once one of those deciding jobs finishes, it creates still more jobs to check whether the work is actually done or needs to be run again, and it makes more jobs based on that, including a job that reorders all the jobs... Or something like that...

@Technical, I was watching the estimate provided at the en.wp link I mentioned above. Granted, it's only an estimate, but still.

I thought that was no longer an estimate when using the Redis job queue.

> And, for what it's worth, there are still template edits from as far back as April 19th that haven't filtered through. I don't know in what order it works through the jobs

If you check the graph at https://wikiapiary.com/wiki/Wikipedia_%28en%29 for exact timings, you'll see that 27 million jobs were consumed in just 19 hours; that's probably the effect of rMWba91f0a2d339, which skipped certain "redundant" jobs. This bug was resolved once the abnormal mass of jobs had been removed.

Over the next 24 hours the decrease was about 600k, so we're back to business as usual, even though there is still some backlog to work through.
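A plausible sketch of how "redundant" jobs can be skipped, assuming the usual rootJobTimestamp mechanism; this is an inference, not the content of rMWba91f0a2d339: if the page was already touched/reparsed after the root job that spawned this one was enqueued, the job can return early without doing the work.

```
// Sketch (assumption): skip a refreshLinks job whose work is already done.
$page = WikiPage::factory( $title );
$rootTimestamp = isset( $params['rootJobTimestamp'] )
	? $params['rootJobTimestamp']
	: null;

if ( $rootTimestamp !== null && $page->getTouched() >= $rootTimestamp ) {
	// The page was (re)parsed after the change that queued this job,
	// so redoing the links update would be redundant.
	return true; // report success without reparsing
}
```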

Yes, click the edit task button in the top right hand corner, @Wbm1058.

Current large job queue types, as of a few minutes ago:

ParsoidCacheUpdateJobOnDependencyChange: 10688
refreshLinks: 65786
cirrusSearchLinksUpdate: 113788
RestbaseUpdateJobOnDependencyChange: 91488