
Can't re-run failed Oozie workflows in Hue/Hue-Next (as non-admin)
Closed, Declined (Public)

Description

As a member of the analytics-product-users group, I should be able to re-run the analytics-product system user's killed Oozie workflows, but the buttons are disabled for me.

Luca has tried a possible fix with https://gerrit.wikimedia.org/r/c/operations/puppet/+/665352 but the button was still disabled for me.

Event Timeline

Change 665361 had a related patch set uploaded (by Bearloga; owner: Elukey):
[analytics/wmf-product/jobs@master] wikipediapreview_stats: add ACL to allow job re-run

https://gerrit.wikimedia.org/r/665361

Change 665361 merged by Bearloga:
[analytics/wmf-product/jobs@master] wikipediapreview_stats: add ACL to allow job re-run

https://gerrit.wikimedia.org/r/665361

Adding the ACL made it possible for me to manage the job from the command line, but I still can't manage it from the Hue interface.

razzi edited projects, added Analytics-Clusters; removed Analytics.
razzi added a subscriber: razzi.

Let me take a look at this configuration.

Ottomata triaged this task as High priority. Mar 9 2021, 5:46 PM
Ottomata moved this task from Backlog to Q3 2020/2021 on the Analytics-Clusters board.

Looking at Hue source code, it seems it looks at the job's setting for mapreduce.job.acl-modify-job to determine if the kill button should be enabled. Excuse my lack of knowledge here, but do we use job acls at all? Would that be as simple as adding a line like here?
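
For reference, here is a minimal sketch of how an ACL-driven check like this could work, assuming the value follows Hadoop's documented ACL string format (comma-separated users, a space, then comma-separated groups; `*` allows everyone). This is not Hue's actual code: `parse_hadoop_acl`, `can_modify_job`, and the user name `some-user` are hypothetical and only for illustration.

```python
def parse_hadoop_acl(acl):
    """Split a Hadoop-style ACL string into (users, groups, allow_all)."""
    if acl.strip() == "*":
        return set(), set(), True
    # Users come before the first space, groups after it.
    users_part, _, groups_part = acl.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.strip().split(",") if g}
    return users, groups, False


def can_modify_job(user, user_groups, job_owner, acl, is_admin=False):
    """Return True if the user may kill or re-run the job (hypothetical rule)."""
    if is_admin or user == job_owner:
        return True
    users, groups, allow_all = parse_hadoop_acl(acl)
    return allow_all or user in users or bool(groups & set(user_groups))


# "some-user" is a made-up member of the analytics-product-users group.
membership = ["analytics-product-users"]
print(can_modify_job("some-user", membership, "analytics-product", "analytics-product-users"))   # False
print(can_modify_job("some-user", membership, "analytics-product", " analytics-product-users"))  # True
```

Under this format a bare `analytics-product-users` is read as a user name, so a group-only ACL needs a leading space. Whether Hue parses the value the same way is an assumption that would need checking against its source.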

Good finding @razzi, never seen the option applied anywhere, but I suspect that it would need to be passed to the workflow.xml's job properties, otherwise it wouldn't work. We have never had this problem, maybe because our usernames are Hue admins. One test that we could do is:

  1. remove the admin flag for user razzi in Hue
  2. verify if the kill button is still available or not
  3. re-add the admin flag

In this way we'd also know why it works for us. Granting admin to everyone may be too much, since in my opinion it is good that only analytics have those perms, so we could ask Product Analytics to experiment with mapreduce.job.acl-modify-job. Not sure if mapreduce.cluster.acls.enabled is also relevant, and if it needs to be set for the job too.

Given that we are going to concentrate our efforts on Airflow, I would try to find a compromise, since PA is already able to kill jobs via oozie CLI, to avoid spending hours on this. Let's time box it and see if the ACL setting works :)
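
If it helps the experiment, here is a rough sketch of what passing mapreduce.job.acl-modify-job through the workflow could look like. It renders Oozie-style `<property>` entries for an action's `<configuration>` element. `render_configuration` is a hypothetical helper, the value is the group name from this task, and whether Hue honors a property set at this level is exactly what would need testing.

```python
import xml.etree.ElementTree as ET


def render_configuration(properties):
    """Build an Oozie-style <configuration> element from a dict of job properties."""
    configuration = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Return the element serialized as a plain string.
    return ET.tostring(configuration, encoding="unicode")


# Property name and value as discussed in this task; adjust before pasting into workflow.xml.
print(render_configuration({"mapreduce.job.acl-modify-job": "analytics-product-users"}))
```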

I'm actually working on an Oozie job today! I'll test whether mapreduce.job.acl-modify-job works 😊

@nshahquinn-wmf any luck with that setting?

Thanks for the reminder—mapreduce.job.acl-modify-job = analytics-product-users alone did not help.

Perhaps mapreduce.cluster.acls.enable is also needed, but what would the value be? "True"? 1?

I will be working on the same Oozie job in a week or so; if you tell me what to add, I can try then.

@nshahquinn-wmf try setting mapreduce.cluster.acls.enable to true?

Hmm, no, that didn't work either. Given what @elukey said earlier, I think it makes sense not to invest more time in this and focus instead on making all our Airflow dreams come true 😊