Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | | Gehel | T405232 User Migration from Run dev instances to airflow devenv. |
| Resolved | | Stevemunene | T405340 User Migration from Run dev instances to airflow devenv. Pre-Migration Planning |
Event Timeline
To help identify the current users, I have set up sessions with some of the active users to see what they are working on and to find a better way of automating the discovery.
I spotted some users still running the older airflow dev instances the other day whilst investigating an issue with stat1010. I used this command:
btullis@stat1010:~$ pgrep -fa run_dev_instance
904858 sudo -u analytics-privatedata ./run_dev_instance.sh -m /tmp/airflow_research_fab -p 8989 research
904859 /usr/bin/bash ./run_dev_instance.sh -m /tmp/airflow_research_fab -p 8989 research
2163557 /usr/bin/bash ./run_dev_instance.sh -i -p 8088 -m /home/cdobbins traffic
Running it across all 4 stat servers, we can see only one other user mentioned, at the moment.
btullis@cumin1003:~$ sudo cumin A:stat 'pgrep -fa run_dev_instance'
4 hosts will be targeted:
stat[1008-1011].eqiad.wmnet
OK to proceed on 4 hosts? Enter the number of affected hosts to confirm or "q" to quit: 4
===== NODE GROUP =====
(1) stat1010.eqiad.wmnet
----- OUTPUT of 'pgrep -fa run_dev_instance' -----
904858 sudo -u analytics-privatedata ./run_dev_instance.sh -m /tmp/airflow_research_fab -p 8989 research
904859 /usr/bin/bash ./run_dev_instance.sh -m /tmp/airflow_research_fab -p 8989 research
2163557 /usr/bin/bash ./run_dev_instance.sh -i -p 8088 -m /home/cdobbins traffic
===== NODE GROUP =====
(1) stat1009.eqiad.wmnet
----- OUTPUT of 'pgrep -fa run_dev_instance' -----
3550843 sudo -u analytics-privatedata ./run_dev_instance.sh -m /tmp/fab_airflow/ -p 8989 research
3550844 /usr/bin/bash ./run_dev_instance.sh -m /tmp/fab_airflow/ -p 8989 research
===== NODE GROUP =====
(1) stat1011.eqiad.wmnet
----- OUTPUT of 'pgrep -fa run_dev_instance' -----
491589 sudo -u analytics-privatedata ./run_dev_instance.sh -m /tmp/aikochou_airf-low -p 8796 ml
491590 /usr/bin/bash ./run_dev_instance.sh -m /tmp/aikochou_airf-low -p 8796 ml
================
PASS |█████████████████████████████████████████████████████████████████████████████████▊ | 75% (3/4) [00:00<00:00, 5.16hosts/s]
FAIL |███████████████████████████▎ | 25% (1/4) [00:00<00:01, 1.73hosts/s]
25.0% (1/4) of nodes failed to execute command 'pgrep -fa run_dev_instance': stat1008.eqiad.wmnet
75.0% (3/4) success ratio (< 100.0% threshold) for command: 'pgrep -fa run_dev_instance'. Aborting.: stat[1009-1011].eqiad.wmnet
75.0% (3/4) success ratio (< 100.0% threshold) of nodes successfully executed all commands. Aborting.: stat[1009-1011].eqiad.wmnet
While this is only a snapshot in time, at least we have identified a few users who still run the legacy airflow dev instances.
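As a possible step towards automating the discovery, here is a minimal sketch that condenses the pgrep output into one line per running instance: PID, port, base directory and instance name. The function name summarise_dev_instances is made up for illustration, and the parsing assumes the argument layout seen in the output above (-p for the port, -m for the base directory, instance name as the last argument). The sample input is copied from the stat1010 output.

```shell
#!/bin/sh
# Hypothetical helper (not part of any existing tooling): reads
# `pgrep -fa run_dev_instance` output on stdin and prints
# "pid port base_dir instance" for each running dev instance.
summarise_dev_instances() {
  awk '
    # The sudo wrapper and the bash child describe the same instance,
    # so skip the wrapper to avoid counting it twice.
    $2 == "sudo" { next }
    {
      port = "-"; dir = "-"
      for (i = 2; i < NF; i++) {
        if ($i == "-p") port = $(i + 1)   # port given with -p
        if ($i == "-m") dir  = $(i + 1)   # base directory given with -m
      }
      # Assumption: the instance name is always the last argument.
      print $1, port, dir, $NF
    }'
}

# Sample input taken from the stat1010 output in this task.
summarise_dev_instances <<'EOF'
904858 sudo -u analytics-privatedata ./run_dev_instance.sh -m /tmp/airflow_research_fab -p 8989 research
904859 /usr/bin/bash ./run_dev_instance.sh -m /tmp/airflow_research_fab -p 8989 research
2163557 /usr/bin/bash ./run_dev_instance.sh -i -p 8088 -m /home/cdobbins traffic
EOF
```

On a stat host this could be fed directly with `pgrep -fa run_dev_instance | summarise_dev_instances`. The base directory is kept in the output because, for instances started via sudo, it is often the only hint about who the real user is. Separately, pgrep exits with status 1 when nothing matches, which is probably why stat1008 was reported as a failure in the cumin run; appending `|| true` to the remote command should avoid that, although I have not tested it.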
@fkaelin , @CDobbins , @AikoChou
If all of these users are aware of the new method, then we're doing well.
Note that @fkaelin is using the sudo method to run an airflow instance as a system user.
sudo -u analytics-privatedata ./run_dev_instance.sh
This isn't possible with the new airflow-devenv method, since we wanted to prevent dev instances from modifying production data that is owned by analytics-privatedata (at least in HDFS/Hive).
All interactions with Hadoop/Hive now occur using one's own Kerberos principal.
@Stevemunene will reach out to @fkaelin , @CDobbins , and @AikoChou to ensure that they know about the new dev envs and to see if they need help with the migration. He will also announce the retirement of run_dev_instance.sh on our Slack channel and email list.
I have reached out individually to the affected users and also shared a message with the wider team about the upcoming transition.