[Data Platform] Stop and remove oozie services
Closed, Resolved | Public

Description

We have now finished the migration from Oozie to Airflow so, as far as I know, Oozie is no longer required.

https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Cluster/Oozie
https://wikitech.wikimedia.org/wiki/Data_Engineering/Systems/Cluster/Oozie/Administration

However, the Oozie services are still running on an-coord1001 and an-test-coord1001, and we still have alerts configured in Icinga.

btullis@an-coord1001:~$ oozie admin -servers
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/oozie/lib/log4j-slf4j-impl-2.17.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/oozie/lib/slf4j-log4j12-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/oozie/lib/slf4j-simple-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
an-coord1001.eqiad.wmnet : http://an-coord1001.eqiad.wmnet:11000/oozie
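The one useful line in the output above is buried under JVM and SLF4J messages. As a minimal sketch, the noise could be filtered out as follows. The function name parse_oozie_servers is hypothetical, and the "host : url" layout is assumed from the single example shown here rather than from any documented format.

```python
def parse_oozie_servers(output: str) -> dict[str, str]:
    """Return {hostname: url} from `oozie admin -servers` output (hypothetical helper)."""
    servers = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines plus the JVM and SLF4J messages that precede the server list.
        if not line or line.startswith(("SLF4J:", "Picked up ")):
            continue
        # Assumed layout: "host : url". Split on the first " : " only,
        # so the colons inside the URL are left alone.
        host, sep, url = line.partition(" : ")
        if sep and url.startswith("http"):
            servers[host.strip()] = url.strip()
    return servers
```

Given the output captured above, this would return a single entry mapping an-coord1001.eqiad.wmnet to its URL on port 11000.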

There is also a known vulnerability that affects our version of Oozie (T212416), so I propose that we stop the service and remove both the server and client components.

Tagging Data-Engineering, as they will be able to say with more certainty whether Oozie is truly deprecated.

Event Timeline

I suggest the following steps before turning off:

  • Check the logs to confirm that no jobs have been running on the system
  • Send an email / Slack notification to the relevant engineering teams that the service will be shut off in a week's time
  • Turn off the service after the one-week grace period
  • Remove the alerts from Icinga
  • Remove the installation / libraries

Open to comment / other suggestions.
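For the first step in the list above, one option besides reading the logs would be to ask the Oozie server itself which jobs are still active. The following is only a sketch: it builds the query URLs and does not fetch them, so any authentication the cluster requires is left out. The /v1/jobs path, the jobtype, filter and len parameters, and the job type values are assumed from the Oozie web-services API and should be checked against the installed version. jobs_query_url is a hypothetical helper name, and the base URL is the one reported by oozie admin -servers in the description.

```python
from urllib.parse import urlencode

# Base URL as reported by `oozie admin -servers` in the task description.
OOZIE_URL = "http://an-coord1001.eqiad.wmnet:11000/oozie"

# Assumed job types accepted by the jobs endpoint: workflows, coordinators, bundles.
JOB_TYPES = ("wf", "coord", "bundle")


def jobs_query_url(base_url: str, jobtype: str, status: str = "RUNNING", length: int = 50) -> str:
    """Build a URL listing jobs of one type with the given status (hypothetical helper)."""
    params = {"jobtype": jobtype, "filter": f"status={status}", "len": length}
    # urlencode escapes the "=" inside the filter value as %3D.
    return f"{base_url}/v1/jobs?{urlencode(params)}"


if __name__ == "__main__":
    for jobtype in JOB_TYPES:
        print(jobs_query_url(OOZIE_URL, jobtype))
```

An empty result for each job type would only show that nothing is running at that moment, so it complements the log check rather than replacing it.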

Ahoelzl renamed this task from Stop and remove oozie services to [Platform] Stop and remove oozie services. Oct 20 2023, 4:56 PM
Ahoelzl renamed this task from [Platform] Stop and remove oozie services to [Data Platform] Stop and remove oozie services. Oct 20 2023, 5:15 PM
Gehel triaged this task as Medium priority. Nov 3 2023, 10:27 AM
Gehel raised the priority of this task from Medium to High. Nov 15 2023, 9:50 AM

Change 974618 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Stop oozie server and remove some resources

https://gerrit.wikimedia.org/r/974618

Change 974619 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove the oozie client

https://gerrit.wikimedia.org/r/974619

Change 974620 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Stop applying oozie profiles in most places

https://gerrit.wikimedia.org/r/974620

Change 974646 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove the oozie integration from hue

https://gerrit.wikimedia.org/r/974646

Change 974647 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove oozie configuration from core hadoop configuration files

https://gerrit.wikimedia.org/r/974647

Change 974648 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Update our kerberos scripts to remove oozie customisation

https://gerrit.wikimedia.org/r/974648

Change 974649 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove all remaining references to oozie and clean up

https://gerrit.wikimedia.org/r/974649

Change 974618 merged by Btullis:

[operations/puppet@production] Stop oozie server and remove some resources

https://gerrit.wikimedia.org/r/974618

Change 974650 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Remove the oozie keytab from the hadoop coordinator role

https://gerrit.wikimedia.org/r/974650

Change 974650 merged by Btullis:

[operations/puppet@production] Remove the oozie keytab from the hadoop coordinator role

https://gerrit.wikimedia.org/r/974650

Change 974651 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Stop installing the oozie shared library for spark2

https://gerrit.wikimedia.org/r/974651

Change 974651 merged by Btullis:

[operations/puppet@production] Stop installing the oozie shared library for spark2

https://gerrit.wikimedia.org/r/974651

BTullis closed subtask Restricted Task as Resolved. Nov 15 2023, 5:22 PM

Change 974619 merged by Btullis:

[operations/puppet@production] Remove the oozie client

https://gerrit.wikimedia.org/r/974619

Change 974620 merged by Btullis:

[operations/puppet@production] Stop applying oozie profiles in most places

https://gerrit.wikimedia.org/r/974620

Change 974648 merged by Btullis:

[operations/puppet@production] Update our kerberos scripts to remove oozie customisation

https://gerrit.wikimedia.org/r/974648

Good progress so far.

I'd like to proceed to: Remove the oozie integration from hue | https://gerrit.wikimedia.org/r/c/operations/puppet/+/974646

After that, I believe that I can: Remove oozie configuration from core hadoop configuration files | https://gerrit.wikimedia.org/r/c/operations/puppet/+/974647

After that, I believe that the cleanup patch should be a no-op: Remove all remaining references to oozie and clean up | https://gerrit.wikimedia.org/r/c/operations/puppet/+/974649

Any reviews welcome.

Change 974646 merged by Btullis:

[operations/puppet@production] Remove the oozie integration from hue

https://gerrit.wikimedia.org/r/974646

Change 974647 merged by Btullis:

[operations/puppet@production] Remove oozie configuration from core hadoop configuration files

https://gerrit.wikimedia.org/r/974647

Mentioned in SAL (#wikimedia-analytics) [2023-11-23T14:12:00Z] <btullis> roll-restarting hadoop masters on test cluster for T341893

Mentioned in SAL (#wikimedia-analytics) [2023-11-23T14:58:33Z] <btullis> merging 974649: Remove all remaining references to oozie and clean up | https://gerrit.wikimedia.org/r/c/operations/puppet/+/974649 for T341893

Change 974649 merged by Btullis:

[operations/puppet@production] Remove all remaining references to oozie and clean up

https://gerrit.wikimedia.org/r/974649

BTullis moved this task from Needs Review to Done on the Data-Platform-SRE board.

I've finished this removal, and I've had a good crack at archiving/updating the related Oozie docs on Wikitech.