As has been discussed repeatedly in recent weeks (e.g. https://phabricator.wikimedia.org/T182628#3890663 or several times on #wikimedia-analytics), cluster load tends to be very high during the first days of every month. This makes Hive queries very sluggish and often delays data analysis work.
This is obviously because of the many recurring monthly jobs that are launched at that time. E.g. right now I'm seeing no fewer than 53 jobs at https://yarn.wikimedia.org/cluster/scheduler, all in either the root.production or the root.default queue. It would be very nice if some of the less critical ones could be moved to the 'nice' queue that was set up for exactly this purpose not long ago in T156841.
Details
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| Delay clickstream monthly generation by 10 days | analytics/refinery | master | +7 -0 |
Related Objects
- Mentioned Here
  - T156841: Hadoop: Add a lower priority queue: nice queue
Event Timeline
@Tbayer: maybe you can help us identify which jobs are not critical?
For example, we could schedule the app session jobs later in the month; this data does not seem to be looked at much. Would that work?
There aren't that many monthly jobs to move (mw-history, uniques, and now clickstream), and this month was especially bad because of some work Erik B was doing. Let's delay the clickstream job so that it does not start until the 10th of the month. @JAllemandou
Change 409966 had a related patch set uploaded (by Joal; owner: Joal):
[analytics/refinery@master] Delay clickstream monthly generation by 10 days
@JAllemandou: This is not what the task was about; how about we create a separate task for the clickstream job?
No, we would still want to have that data as soon as possible; we just want to avoid it interfering with more time-sensitive one-off queries while those queries run.
Are there any technical issues with moving such monthly jobs into the nice queue?
Change 409966 merged by Nuria:
[analytics/refinery@master] Delay clickstream monthly generation by 10 days