Move non-critical monthly jobs to the nice queue
Closed, Resolved · Public · Estimated Story Points: 1

Assigned To

Authored By

	Tbayer
	Feb 1 2018, 1:37 AM

Description

As discussed several times recently (e.g. https://phabricator.wikimedia.org/T182628#3890663 and on #wikimedia-analytics), the cluster load tends to be very high during the first days of every month, which makes Hive queries very sluggish and often delays data analysis work.
This is presumably because of the many recurring monthly jobs that are launched at that point. E.g. right now I'm seeing no fewer than 53 jobs at https://yarn.wikimedia.org/cluster/scheduler, all in either the root.production or the root.default queue. It would be very nice if some of the less critical ones could be moved to the 'nice' queue that was established for this purpose not long ago in T156841.
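For illustration only, here is a minimal sketch of how a single Hive job could be pointed at a lower-priority YARN queue. It assumes the job runs through the `hive` CLI on MapReduce, where the standard `mapreduce.job.queuename` property selects the queue. The helper name `hive_command`, the file name `monthly_report.hql`, and the exact queue name accepted by the scheduler (`nice` here, possibly `root.nice` on the cluster) are assumptions, not taken from the refinery code.

```python
import shlex
from typing import List


def hive_command(query_file: str, queue: str = "nice") -> List[str]:
    """Build a hive CLI invocation that runs query_file in the given YARN queue.

    The queue name is an assumption; check the scheduler page for the
    name the cluster actually expects.
    """
    return [
        "hive",
        # Standard Hadoop property that selects the YARN queue for MapReduce jobs.
        "--hiveconf", f"mapreduce.job.queuename={queue}",
        # Hypothetical query file supplied by the caller.
        "-f", query_file,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try anywhere.
    print(shlex.join(hive_command("monthly_report.hql")))
```

Recurring jobs launched by a scheduler would need the equivalent property set in their own job configuration rather than on a command line.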

Details

	Subject	Repo	Branch	Lines +/-
	Delay clickstream monthly generation by 10 days	analytics/refinery	master	+7 -0


Related Objects

Mentioned Here: T156841: Hadoop: Add a lower priority queue: nice queue

Event Timeline

Tbayer created this task. Feb 1 2018, 1:37 AM

Restricted Application added a project: Analytics. Feb 1 2018, 1:37 AM

Restricted Application added a subscriber: Aklapper.

Tbayer updated the task description. Feb 1 2018, 1:42 AM

Tbayer mentioned this in Unknown Object (Task). Feb 1 2018, 5:03 AM

@Tbayer: maybe you can help us identify here what is not critical?

We could schedule the app sessions jobs later in the month, for example; this data does not seem to be looked at much. Would that work?

Nuria moved this task from Incoming to Wikistats on the Analytics board. Feb 5 2018, 5:29 PM

Tbayer added subscribers: chelsyx, mpopov. Feb 6 2018, 9:28 PM

There aren't that many monthly jobs to move (mw-history, uniques, and now clickstream), and this month was especially bad because of some work that Erik B was doing. Let's delay the clickstream job so that it does not start until the 10th of the month. @JAllemandou

Ottomata added a subscriber: JAllemandou. Feb 12 2018, 5:11 PM

Change 409966 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Delay clickstream monthly generation by 10 days

https://gerrit.wikimedia.org/r/409966

gerritbot added a project: Patch-For-Review. Feb 12 2018, 5:38 PM

JAllemandou renamed this task from "Move non-critical monthly jobs to the nice queue" to "Move Clickstream job to later in the month". Feb 12 2018, 5:39 PM

JAllemandou claimed this task.

JAllemandou set the point value for this task to 1.

JAllemandou edited projects, added Analytics-Kanban; removed Analytics.

JAllemandou moved this task from Next Up to In Code Review on the Analytics-Kanban board.

mpopov awarded a token. Feb 12 2018, 5:47 PM

@JAllemandou This is not what the task was about; how about we create a separate one for the Clickstream job?

In T186180#3946710, @Nuria wrote:

@Tbayer: maybe you can help us identify here what is not critical?

We could schedule the app sessions jobs later in the month, for example; this data does not seem to be looked at much. Would that work?

No, we would still want to have that data as soon as possible. We just want to avoid it interfering with more time-sensitive one-off queries while those queries run.

Are there any technical issues with moving such monthly jobs into the nice queue?

Change 409966 merged by Nuria:
[analytics/refinery@master] Delay clickstream monthly generation by 10 days

https://gerrit.wikimedia.org/r/409966

JAllemandou moved this task from In Code Review to Ready to Deploy on the Analytics-Kanban board. Feb 15 2018, 10:40 AM

JAllemandou moved this task from Ready to Deploy to Done on the Analytics-Kanban board. Feb 15 2018, 5:01 PM

Nuria closed this task as Resolved. Mar 2 2018, 9:32 PM
