Move kafka clusters to fixed uid/gid
Closed, Resolved (Public)

Description

We are currently running our kafka clusters with different uid/gids, which is not great when dealing with OS upgrades.

I added some code to facilitate a standardized uid/gid for kafka, 916, that we should roll out to our clusters.
The idea, for each cluster, is to do the following:

  1. Disable puppet on the target cluster
  2. File a change like the following and merge it: https://gerrit.wikimedia.org/r/c/operations/puppet/+/743163
  3. For every node,
    1. stop kafka and kafka mirror
    2. execute the script below
    3. re-enable puppet and run it (to bring back kafka daemons and make sure that the new code works fine).
#!/bin/bash

set -x

change_uid() {
    # $1 new uid
    # $2 username
    if id "$2" &>/dev/null
    then
        OLD_UID=$(id -u "$2")
        usermod -u "$1" "$2"
        # Re-own everything still owned by the old uid, skipping pseudo and mounted filesystems.
        # -r: do not run chown when find matches nothing; -h: change symlinks themselves, not their targets.
        find / \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false -o -user "$OLD_UID" -print0 | xargs -0 -r chown -h "$1"
    fi
}

change_gid() {
    # $1 new gid
    # $2 group name
    if getent group "$2" &>/dev/null
    then
        OLD_GID=$(getent group "$2" | cut -d ":" -f 3)
        groupmod -g "$1" "$2"
        # Same as above, for group ownership.
        find / \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false -o -group "$OLD_GID" -print0 | xargs -0 -r chgrp -h "$1"
    fi
}

## kafka

change_uid 916 kafka
change_gid 916 kafka

I have tested the procedure on the Kafka test cluster and it worked fine :)
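
As a follow-up check on a node, a sketch like the one below could count what is still owned by the old ids once the script has run. It is not part of the original procedure: count_owned_by is a hypothetical helper name, it reuses the same find expression and pruned paths as the script, and it assumes GNU find (for -false and -print0). The old uid/gid has to be noted before the migration, since usermod/groupmod replace it.

```shell
#!/bin/bash
# Hypothetical helper (assumption, not from the task): print how many
# filesystem entries are owned by a given numeric uid or gid.
count_owned_by() {
    # $1 "user" or "group"
    # $2 numeric id to look for
    # $3 directory to scan (defaults to /)
    local kind="$1" id="$2" root="${3:-/}"
    # Each match is emitted NUL-terminated, so counting NUL bytes counts
    # entries even when file names contain newlines.
    find "$root" \( -path /proc -o -path /mnt -o -path /sys -o -path /dev -o -path /media \) -prune -false \
        -o "-$kind" "$id" -print0 2>/dev/null | tr -dc '\0' | wc -c
}
```

For example, if the kafka user had uid 123 before the move (an illustrative number only), `count_owned_by user 123` printing 0 afterwards would suggest nothing was missed outside the pruned paths.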

Clusters to move:

  • Jumbo (Data Engineering)
  • Test
  • Main eqiad (ServiceOps)
  • Main codfw (ServiceOps)
  • Logging eqiad (Observability)
  • Logging codfw (Observability)

Event Timeline

elukey updated the task description.

Change 743351 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::kafka::main: use fixed uid/gid in the codfw cluster

https://gerrit.wikimedia.org/r/743351

Change 743351 merged by Elukey:

[operations/puppet@production] role::kafka::main: use fixed uid/gid in the codfw cluster

https://gerrit.wikimedia.org/r/743351

Mentioned in SAL (#wikimedia-operations) [2021-12-06T09:09:01Z] <elukey> move kafka main codfw to fixed uid/gid for the kafka user (requires a stop/start of all daemons) - T296982

Change 743914 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Configure the kafka jumbo cluster to use a fixed uid/gid

https://gerrit.wikimedia.org/r/743914

odimitrijevic moved this task from Incoming to Ingest on the Data-Engineering board.

Change 752677 had a related patch set uploaded (by Herron; author: Herron):

[operations/puppet@production] kafka-logging: move to fixed UID/GID for kafka user

https://gerrit.wikimedia.org/r/752677

Change 752677 merged by Herron:

[operations/puppet@production] kafka-logging: move to fixed UID/GID for kafka user

https://gerrit.wikimedia.org/r/752677

elukey claimed this task.

Forgot a couple of things to do for a complete cleanup:

  1. We should move deployment-prep's clusters as well to the fixed uid/gid.
  2. The profile::kafka::broker::use_fixed_uid_gid hiera value should be removed from puppet (since the default will switch to true).

Mentioned in SAL (#wikimedia-releng) [2022-05-23T08:06:11Z] <elukey> move kafka logging in deployment-prep to fixed uid/gid - T296982

Mentioned in SAL (#wikimedia-releng) [2022-05-23T08:29:30Z] <elukey> move kafka main in deployment-prep to fixed uid/gid - T296982

Mentioned in SAL (#wikimedia-releng) [2022-05-23T08:37:55Z] <elukey> move kafka jumbo in deployment-prep to fixed uid/gid - T296982

The three kafka clusters in deployment-prep are using the new uid/gid. Before making the profile::kafka::broker::use_fixed_uid_gid option true by default, I'll follow up with SRE to verify that no other cluster is left to move.
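
One way to check whether a given host still needs to be moved is to compare the current ids of the kafka user and group with 916. The sketch below is only an illustration: has_fixed_ids is a hypothetical name, and it relies on the same id and getent calls used by the migration script.

```shell
#!/bin/bash
# Hypothetical check (assumption, not from the task): succeeds only when
# the given user and group already have the expected numeric ids.
has_fixed_ids() {
    # $1 username, $2 group name, $3 expected uid, $4 expected gid
    local uid gid
    uid=$(id -u "$1" 2>/dev/null) || return 1
    gid=$(getent group "$2" | cut -d ":" -f 3)
    [ "$uid" = "$3" ] && [ -n "$gid" ] && [ "$gid" = "$4" ]
}

# Possible use on a broker:
#   has_fixed_ids kafka kafka 916 916 || echo "$(hostname): still to move"
```

The function returns non-zero both when the ids differ and when the user or group does not exist, so a host without kafka installed would also be reported.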

Change 797127 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Set fixed uid/gid for kafka by default

https://gerrit.wikimedia.org/r/797127

Change 797127 merged by Elukey:

[operations/puppet@production] Set fixed uid/gid for kafka by default

https://gerrit.wikimedia.org/r/797127

Change is rolled out everywhere, and now we have sane defaults in profile::kafka::broker.