High-priority jobs like enotifs are executed very slowly
Open, Public

Description

Email notifications used to be instant; now they take about 20 minutes on en.wiki (with only 60 thousand jobs in the queue).
Not only does this point to a problem with the job queue system, and a non-trivial regression, but it's also very confusing, because I'm sent notifications when they're already obsolete (for instance, because I've already replied).


Dear Nemo bis,

The Wikipedia page User talk:Nemo bis has been changed on 13 January 2013
by anonymous user 76.126.142.118, see
http://en.wikipedia.org/wiki/User_talk:Nemo_bis for the current
revision.

See
http://en.wikipedia.org/w/index.php?title=User_talk:Nemo_bis&diff=next&oldid=532890436
to view this change.


Received: from imp-3.mail.tiscali.it (10.39.115.235) by mx-3-it.mail.tiscali.it (8.5.148)
id 50BF36D0094B0EEF for <redacted>@tiscali.it; Sun, 13 Jan 2013 21:01:44 +0100
Received: from wiki-mail.wikimedia.org ([208.80.152.133])
by imp-3.mail.tiscali.it with
id nY1j1k02z2swdko01Y1kqf; Sun, 13 Jan 2013 21:01:44 +0100
x-cnfs-analysis: v=2.0 cv=RYES+iRv c=1 sm=2 a=P51sRyCuLXUxWMHwWK9oAA==:17
a=eIhxMilvRf8A:10 a=z82XInz0jxkA:10 a=RyZ8rIAjjLkA:10 a=eztASiHJGFwA:10
a=IkcTkHD0fZMA:10 a=3GbmggnxAAAA:8 a=8pif782wAAAA:8 a=d2uY_mg3cpUA:10
a=nk0ike9KCJb9eP9e8BIA:9 a=QEXdDO2ut3YA:10 a=c7XZu54lUV4A:10
a=9vCFg7g2Nj6V2bzh:21 a=HUl_rzNbRn9v3Gf1:21 a=P51sRyCuLXUxWMHwWK9oAA==:117
Received: from mw8.pmtpa.wmnet ([10.0.11.8]:57845)
by mchenry.wikimedia.org with esmtp (Exim 4.69)
(envelope-from <wiki@wikimedia.org>)
id 1TuTkG-0003E4-Fs
for <redacted>@tiscali.it; Sun, 13 Jan 2013 20:01:28 +0000
Received: from apache by mw8.pmtpa.wmnet with local (Exim 4.76)
id 1TuTkG-0008Ux-Bg
for <redacted>@tiscali.it; Sun, 13 Jan 2013 20:01:28 +0000
To: Nemo bis
Subject: Wikipedia page User talk:Nemo bis has been changed by anonymous user 76.126.142.118
From: MediaWiki Mail <wiki@wikimedia.org>
Reply-To: reply@not.possible
Date: Sun, 13 Jan 2013 20:01:28 +0000
MIME-Version: 1.0
Content-type: text/plain; charset=UTF-8
Content-transfer-encoding: 8bit
Message-ID: <enwiki.50f3129856d1c5.83285442@en.wikipedia.org>
X-Mailer: MediaWiki mailer
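To see where the time goes, the Received headers above can be turned into per-hop delays. What follows is a minimal Python sketch; `received_times` and `hop_delays` are hypothetical helpers, and it assumes (as in this paste) that header continuation lines have lost their indentation and that every Received header ends with `; <RFC 2822 date>`. Only the hops after MediaWiki hands the message to the local Exim are visible this way; time spent waiting in the job queue before that only shows up when comparing the edit timestamp with the Date header.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime

# A header starts with "Name:" at the beginning of a line. In this paste the
# continuation lines lost their leading whitespace, so any other non-empty
# line is treated as a continuation of the previous header.
HEADER_START = re.compile(r"^[A-Za-z][A-Za-z0-9-]*:")


def received_times(raw: str) -> list[datetime]:
    """Timestamps of the Received headers, oldest hop first."""
    headers: list[str] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        if HEADER_START.match(line) or not headers:
            headers.append(line)
        else:
            headers[-1] += " " + line
    # The date follows the last ";" of each Received header.
    times = [
        parsedate_to_datetime(h.rsplit(";", 1)[1].strip())
        for h in headers
        if h.lower().startswith("received:") and ";" in h
    ]
    # Each relay prepends its Received header, so the newest hop is on top.
    return times[::-1]


def hop_delays(raw: str) -> list[float]:
    """Seconds elapsed between consecutive hops (timezone offsets respected)."""
    times = received_times(raw)
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

Fed the January headers above, `hop_delays` returns `[0.0, 16.0, 0.0]`: about 16 seconds in total, which suggests (assuming the servers' clocks were in sync) that the ~20-minute lag was not spent in mail delivery.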


Version: wmf-deployment
Severity: normal
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=55822

bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz43936.
Nemo_bis created this task. Via Legacy, Jan 13 2013, 8:15 PM
Aklapper added a comment. Via Conduit, Jan 21 2013, 2:24 PM

Once https://ganglia.wikimedia.org is accessible again, I could look at the JobQueue graph...

Nemo, is the lag of ~20 min still a problem?

/me looking at https://gerrit.wikimedia.org/r/#/q/project:mediawiki/core+-owner:L10n-bot+message:jobqueue,n,z

Nemo_bis added a comment. Via Conduit, Jan 21 2013, 6:09 PM

The job queue on en.wiki is now under 2000 or so, so this looks like the wrong time to try to reproduce this bug. https://en.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=statistics
Anyway, next time you can ask on my user talk page and I'll compare the timestamps of the edit and the enotif. :-)
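When trying to reproduce this, the job queue size can be read from the same API query linked above and noted next to the edit and enotif timestamps. A minimal Python sketch, assuming the JSON response nests the figure under `query.statistics.jobs` (the value may be approximate or cached); `siteinfo_url`, `job_count` and `fetch_job_count` are hypothetical helper names, and the User-Agent string is a placeholder to replace with real contact details, since Wikimedia asks API clients to identify themselves.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def siteinfo_url(api: str) -> str:
    """Build the siteinfo/statistics query linked above, requesting JSON."""
    params = {"action": "query", "meta": "siteinfo", "siprop": "statistics", "format": "json"}
    return f"{api}?{urlencode(params)}"


def job_count(payload: dict) -> int:
    """Extract the job queue figure from a decoded siteinfo response."""
    return int(payload["query"]["statistics"]["jobs"])


def fetch_job_count(api: str = "https://en.wikipedia.org/w/api.php") -> int:
    """Query the live API (network access required)."""
    # Placeholder User-Agent: replace with a real contact address.
    req = Request(siteinfo_url(api), headers={"User-Agent": "enotif-lag-check/0.1 (you@example.org)"})
    with urlopen(req, timeout=30) as resp:
        return job_count(json.load(resp))
```

Calling `fetch_job_count()` just before an edit to a watched page records the queue size that the resulting enotif had to wait behind.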

Nemo_bis added a comment. Via Conduit, Mar 13 2013, 5:35 PM

Severity should probably be raised, because it now takes hours to receive an enotif from mediawiki.org (job queue is 0 now, ~20 at 14:00 CET): 15:22–17:05 UTC in the example below.

Received: from wiki-mail.wikimedia.org ([208.80.152.133])
by imp-2.mail.tiscali.it with
id B55A1l00w2swdko0155BbL; Wed, 13 Mar 2013 18:05:11 +0100
x-cnfs-analysis: v=2.0 cv=KYdQQHkD c=1 sm=2 a=P51sRyCuLXUxWMHwWK9oAA==:17
a=gbdniXhMvlMA:10 a=RyZ8rIAjjLkA:10 a=cNjpVsleRgUA:10 a=eztASiHJGFwA:10
a=IkcTkHD0fZMA:10 a=3GbmggnxAAAA:8 a=4P5xif6CAAAA:8 a=KcaC6ams3nQA:10
a=mdTHgZqYbhYL0A32_hcA:9 a=QEXdDO2ut3YA:10 a=4wRdB16iIHwA:10
a=P51sRyCuLXUxWMHwWK9oAA==:117
Received: from mw1003.eqiad.wmnet ([10.64.0.33]:38380)
by mchenry.wikimedia.org with esmtp (Exim 4.69)
(envelope-from <wiki@wikimedia.org>)
id 1UFnVc-00068m-87
for <redacted>; Wed, 13 Mar 2013 15:22:28 +0000
Received: from apache by mw1003.eqiad.wmnet with local (Exim 4.76)
id 1UFnVc-00075V-19
for <redacted>; Wed, 13 Mar 2013 15:22:28 +0000
To: Nemo bis <redacted>
Subject: MediaWiki page Help:Extension:Translate/Configuration has been changed by Nikerabbit
From: MediaWiki Mail <wiki@wikimedia.org>
Reply-To: reply@not.possible
Date: Wed, 13 Mar 2013 15:22:28 +0000
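The same comparison for the March example fits in a few lines. This Python sketch only reuses timestamps copied from the headers above and assumes the clocks on mw1003, mchenry and the ISP's server were in sync.

```python
from email.utils import parsedate_to_datetime

# Timestamps copied from the March headers above; each keeps its UTC offset.
date_header = parsedate_to_datetime("Wed, 13 Mar 2013 15:22:28 +0000")  # Date: header
at_mchenry = parsedate_to_datetime("Wed, 13 Mar 2013 15:22:28 +0000")   # received by mchenry from mw1003
reached_isp = parsedate_to_datetime("Wed, 13 Mar 2013 18:05:11 +0100")  # received by imp-2 from wiki-mail

print(at_mchenry - date_header)  # 0:00:00
print(reached_isp - at_mchenry)  # 1:42:43
```

Under that assumption, no measurable time passed between the Date header and mchenry accepting the message, while about 1 h 43 min passed before tiscali's imp-2 received it from wiki-mail.wikimedia.org; for this message, the gap visible in the headers lies in the outgoing relay path rather than before the mail was generated.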

Nemo_bis added a comment. Via Conduit, Mar 27 2013, 7:40 PM

If bug 46603 is right, "Site requests" is the correct component.
If it's purely a job queue problem and the mail relay doesn't factor into it, perhaps we simply have too much in "high priority"?

Nemo_bis added a comment. Via Conduit, May 2 2013, 6:00 PM

Currently it's basically instant: no time (1 s? unless the Date header is wrong) is spent on the apaches, and about 20 s between mchenry.wikimedia.org and wiki-mail.wikimedia.org.
The global job queue is very low, around 100k; I'll check again when it gets higher.
https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&r=hour&z=default&jr=&js=&st=1365625056&z=large

aaron added a comment. Via Conduit, Jun 25 2013, 9:25 PM

Closing

Nemo_bis added a comment. Via Conduit, Sep 12 2013, 5:24 PM

Reopening: we have reports that password reminders on en.wiki take 60 minutes to arrive.
I can't think of any reason other than this bug; the global job queue is reportedly around 2 million. https://ganglia.wikimedia.org/latest/graph_all_periods.php?c=Miscellaneous%20pmtpa&h=hume.wikimedia.org&v=823574&m=Global_JobQueue_length&r=month&z=default&jr=&js=&st=1365625056&z=large

aaron added a comment. Via Conduit, Sep 17 2013, 5:42 PM

From Graphite, none of the job queue push/pop graphs look remarkable over the last 2 months. There are lots of Parsoid jobs, though (about 2 million on enwiki).

MZMcBride added a comment. Via Conduit, Sep 20 2013, 10:54 PM

(In reply to comment #7)

> Reopening: we have reports that password reminders on en.wiki take 60 minutes
> to arrive.

Link(s)?

> I can't think of any reason other than this bug; global job queue is
> reportedly around 2 millions.

There are apparently different queues.

Nemo_bis added a comment. Via Conduit, Sep 20 2013, 11:00 PM

(In reply to comment #9)

> (In reply to comment #7)
> > Reopening: we have reports that password reminders on en.wiki take 60 minutes
> > to arrive.
>
> Link(s)?

Nope. Reported on #wikimedia-tech, relayed from #wikipedia-en-help I think.

> > I can't think of any reason other than this bug; global job queue is
> > reportedly around 2 millions.
>
> There are apparently different queues.

Yes (and it would be good to raise the concurrency for high-priority jobs: it's still at 6 and used to be 8 until April, IIRC), but that doesn't mean the queues don't affect each other; that has happened in the past, e.g. with bug 42614.
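The concurrency point can be made concrete with a deliberately simplified model: a backlog of equally long jobs shared by a fixed number of runners that never idle and receive no new work while draining. `drain_time_seconds` is a hypothetical helper, not part of MediaWiki, and real job runners behave far less uniformly.

```python
import math


def drain_time_seconds(backlog: int, seconds_per_job: float, runners: int) -> float:
    """Time to clear `backlog` equal-length jobs with `runners` parallel workers.

    Simplified model: no new jobs arrive and no runner ever idles.
    """
    if runners < 1:
        raise ValueError("need at least one runner")
    # Each round runs up to `runners` jobs at once; the last round may be partial.
    rounds = math.ceil(backlog / runners)
    return rounds * seconds_per_job
```

In this model, for any backlog that is a multiple of 24 (so both 6 and 8 divide it evenly), going from 8 runners to 6 stretches the drain time by a factor of 8/6, i.e. about a third longer.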

Aklapper added a comment. Via Conduit, Mar 15 2014, 11:24 PM

Nemo / MZ: Are you aware of any recent issues (I'm not)?
This might end up as WORKSFORME now...

Aklapper added a comment. Via Conduit, Oct 16 2014, 12:49 PM

Is anybody aware of any recent issues (I'm not), or is this WORKSFORME now?

Aklapper added a comment. Via Conduit, Nov 7 2014, 2:00 PM

Last call: Is anybody aware of any recent issues (I'm not), or is this WORKSFORME now?

Nemo_bis added a comment. Via Conduit, Nov 7 2014, 7:23 PM

This bug can only be tested when the job queue is very high.

Aklapper added a project: Regression. Via Web, Dec 5 2014, 2:52 PM
Nemo_bis added a project: MediaWiki-Email. Via Web, Jan 4 2015, 6:57 PM
Nemo_bis set Security to None.
Liuxinyu970226 added a subscriber: Liuxinyu970226. Via Web, Wed, Mar 18, 6:00 AM
