
Move non-critical monthly jobs to the nice queue
Closed, Resolved · Public · 1 Estimated Story Points

Description

As discussed repeatedly recently (e.g. https://phabricator.wikimedia.org/T182628#3890663 or several times on #wikimedia-analytics), the cluster load tends to be very high during the first days of every month, causing Hive queries to become very sluggish and often delaying data analysis work.
This is obviously because of the many recurring monthly jobs that are launched at that point. E.g. right now I'm seeing no fewer than 53 jobs at https://yarn.wikimedia.org/cluster/scheduler , all in either the root.production or the root.default queue. It would be very nice if some of the less critical ones could be moved to the 'nice' queue that was established for such purposes not long ago in T156841.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Tbayer mentioned this in Unknown Object (Task). Feb 1 2018, 5:03 AM

@Tbayer: maybe you can help us identify here what is not critical?

We could schedule jobs for app sessions later in the month, for example; this data does not seem to be looked at much. Would that work?

There aren't that many monthly jobs to move (mw-history, uniques, and now clickstream), and this month was especially bad because of some work that Erik B was doing. Let's delay the clickstream job so that it does not start until the 10th of the month. @JAllemandou

Change 409966 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Delay clickstream monthly generation by 10 days

https://gerrit.wikimedia.org/r/409966

JAllemandou renamed this task from Move non-critical monthly jobs to the nice queue to Move Clickstream job to later in the month. Feb 12 2018, 5:39 PM
JAllemandou claimed this task.
JAllemandou set the point value for this task to 1.
JAllemandou edited projects, added Analytics-Kanban; removed Analytics.
JAllemandou moved this task from Next Up to In Code Review on the Analytics-Kanban board.
Tbayer renamed this task from Move Clickstream job to later in the month to Move non-critical monthly jobs to the nice queue. Edited Feb 12 2018, 6:05 PM

@JAllemandou This is not what the task was about; how about we create a separate one for the Clickstream job?

@Tbayer: maybe you can help us identify here what is not critical?

We could schedule jobs for app sessions later in the month, for example; this data does not seem to be looked at much. Would that work?

No, we would still want to have that data as soon as possible - we just want to avoid it interfering with more time-sensitive one-off queries while those queries are running.

Are there any technical issues with moving such monthly jobs into the nice queue?

Change 409966 merged by Nuria:
[analytics/refinery@master] Delay clickstream monthly generation by 10 days

https://gerrit.wikimedia.org/r/409966