Check home/HDFS leftovers of razzi
Closed, Resolved (Public)

Description

Access for Razzi Abuissa has been removed. Because they were a member of the "analytics-privatedata-users" group, we need to check whether any data was left in their home directories on the stat* hosts or in HDFS.

The Kerberos principal has already been removed.
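As a rough illustration of the per-host check described above, the sketch below reports whether a user's home directory still exists and lists its top-level contents. The function name `list_leftovers` and its arguments are hypothetical, not part of any existing tooling; the HDFS and Hive side is shown only as comments because it assumes an `hdfs` client and valid credentials on the host.

```shell
#!/bin/sh
# Hypothetical helper: report whether a departed user's home directory
# still exists under a base directory, and list its top-level contents.
# Usage: list_leftovers USERNAME [BASE_DIR]
list_leftovers() {
    user=$1
    base=${2:-/home}
    dir="$base/$user"
    if [ -d "$dir" ]; then
        echo "LEFTOVERS: $dir"
        # -A includes dotfiles; -l shows owner, size and modification time
        ls -lA "$dir"
    else
        echo "CLEAN: $dir"
    fi
}

# HDFS and Hive side (assumes an `hdfs` client and working authentication):
#   hdfs dfs -ls /user/razzi
#   hdfs dfs -ls /user/hive/warehouse/razzi.db
```

Running `list_leftovers razzi` on each stat host would print either a `CLEAN:` line or a listing similar to the ones recorded in the timeline below.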

Event Timeline

====== stat1004 ======
total 513244
drwxr-xr-x  2 26051 wikidev      4096 Jul 20  2021 hdfs-namenode-fsimage
-rw-rw-r--  1 26051 wikidev   1245367 Jan 10 16:42 part.txt
-rw-r--r--  1 26051 wikidev      3155 Oct 28  2020 razzi-key.txt
drwxrwxr-x 11 26051 wikidev      4096 Mar 16  2021 refinery
-rw-r--r--  1 root  root    524288000 May 18  2021 test.img
drwxrwxr-x  6 26051 wikidev      4096 Dec  7  2020 venv
drwxrwxr-x  6 26051 wikidev      4096 Dec  7  2020 venv3

====== stat1005 ======
total 102740
drwxrwxr-x 16 26051 wikidev      4096 Feb  3 16:55 amundsen
-rw-r--r--  1 26051 wikidev     64837 Feb  9  2021 Detailed_Pageview_Report.ipynb
drwxr-xr-x 11 26051 wikidev      4096 Jun 11  2020 neo4j-community-4.0.6
-rw-rw-r--  1 26051 wikidev 105113455 Jun 16  2020 neo4j-community-4.0.6-unix.tar.gz
-rw-rw-r--  1 26051 wikidev         0 Feb  3 17:00 neo4j_log.txt
-rw-rw-r--  1 26051 wikidev       136 Feb  3 16:55 run_neo4j.sh
-rw-rw-r--  1 26051 wikidev       141 Oct  6  2020 test.hql
-rw-r--r--  1 26051 wikidev       833 Oct  6  2020 Untitled.ipynb
drwxr-xr-x  7 26051 wikidev      4096 Oct  6  2020 venv

====== stat1006 ======
total 8
-rw-r--r-- 1 26051 wikidev   72 Oct  8  2020 Untitled.ipynb
drwxr-xr-x 7 26051 wikidev 4096 Oct  8  2020 venv

====== stat1007 ======
total 422244
drwxrwxr-x  3 26051 wikidev      4096 Oct 30  2020 check_maxmind_backup
-rw-r--r--  1 root  root        49287 Oct  6  2020 Detailed_Pageview_Report.ipynb
drwxr-xr-x  9 26051 wikidev      4096 Jul  2  2021 elasticsearch-7.13.3
-rw-rw-r--  1 26051 wikidev 327177336 Jul  7  2021 elasticsearch-7.13.3-linux-x86_64.tar.gz
drwxr-xr-x 11 26051 wikidev      4096 Jun 11  2020 neo4j-community-4.0.6
-rw-rw-r--  1 26051 wikidev 105113455 Jun 16  2020 neo4j-community-4.0.6-unix.tar.gz
-rw-rw-r--  1 26051 wikidev         0 Feb  3 16:49 neo4j_log.txt
-rw-rw-r--  1 26051 wikidev       135 Feb  3 16:49 run_neo4j.sh
drwxrwxr-x 11 26051 wikidev      4096 Oct 20  2020 source

====== stat1008 ======
total 552944
drwxrwxr-x 17 26051 wikidev      4096 Feb  4 15:24 amundsen
drwxrwxr-x  2 26051 wikidev      4096 Mar 14 18:10 bin
drwxrwxr-x  2 26051 wikidev      4096 Mar 14 18:10 compiler_compat
-rw-------  1 26051 wikidev 101524037 Mar 14 18:09 conda_dist_env.2022-03-14T18.04.00.tgz
drwxrwxr-x 11 26051 wikidev      4096 Mar 14 18:12 conda_karapace
drwxrwxr-x  2 26051 wikidev      4096 Mar 14 18:10 conda-meta
lrwxrwxrwx  1 26051 wikidev        20 Feb  4 15:30 elasticsearch -> elasticsearch-7.13.3
drwxr-xr-x 10 26051 wikidev      4096 Feb  3 05:57 elasticsearch-7.13.3
-rw-rw-r--  1 26051 wikidev 327177336 Jul  7  2021 elasticsearch-7.13.3-linux-x86_64.tar.gz
drwxrwxr-x  4 26051 wikidev      4096 Apr 14  2021 flerovium_backup
drwxrwxr-x  8 26051 wikidev      4096 Mar 14 18:10 include
-rw-r--r--  1 26051 wikidev       736 Mar 14 18:14 karapace.config.json
drwxrwxr-x 15 26051 wikidev      4096 Mar 14 18:10 lib
lrwxrwxrwx  1 26051 wikidev        22 Feb  3 18:25 neo4j -> neo4j-community-3.5.30
drwxr-xr-x 11 26051 wikidev      4096 Feb  3 19:25 neo4j-community-3.5.30
-rw-r--r--  1 26051 wikidev 137349874 Feb  3 05:44 neo4j-community-3.5.30-unix.tar.gz
-rw-------  1 26051 root          301 Sep 14  2020 piwik.pw
-rw-rw-r--  1 26051 wikidev       220 Oct  7  2020 popular_pages.hql
-rw-r--r--  1 26051 wikidev     29543 Apr 16  2021 pyspark_install_pandas.ipynb
-rw-r--r--  1 26051 wikidev     39947 Apr 16  2021 pyspark_wmfdata.ipynb
drwxrwxr-x  9 26051 wikidev      4096 Mar 14 18:10 share
drwxrwxr-x  3 26051 wikidev      4096 Mar 14 18:10 ssl
-rw-rw-r--  1 26051 wikidev         7 Sep 15  2020 testsecret.txt
drwxrwxr-x  3 26051 wikidev      4096 Mar 14 18:10 x86_64-conda_cos6-linux-gnu
drwxrwxr-x  3 26051 wikidev      4096 Mar 14 18:10 x86_64-conda-linux-gnu

======= HDFS ========
Found 7 items
drwx------   - razzi     razzi              0 2021-04-01 00:00 /user/razzi/.Trash
drwxr-xr-x   - razzi     razzi              0 2021-11-09 02:59 /user/razzi/.sparkStaging
drwx------   - razzi     razzi              0 2021-04-16 20:32 /user/razzi/.staging
drwxr-xr-x   - razzi     razzi              0 2020-10-15 20:06 /user/razzi/16
-rw-r-----   3 analytics analytics         17 2021-03-19 21:06 /user/razzi/mysql-analytics-client-pw.txt
drwxr-x---   - analytics analytics          0 2021-03-23 01:08 /user/razzi/sqoop
drwxr-x---   - razzi     razzi              0 2021-05-25 18:10 /user/razzi/testdir

====== Hive =========
drwxr-x---   - razzi                  analytics-privatedata-users          0 2021-04-16 20:03 /user/hive/warehouse/razzi.db/banner_history
drwxr-x---   - razzi                  analytics-privatedata-users          0 2021-04-16 20:32 /user/hive/warehouse/razzi.db/banner_history2
drwxr-x---   - razzi                  analytics-privatedata-users          0 2020-10-06 19:57 /user/hive/warehouse/razzi.db/test

I've reviewed everything above and it can all be safely deleted. An admin needs to do this with cumin; see instructions (ping @Ottomata). The HDFS and Hive parts are already done; I took care of them.

BTullis subscribed.

I have carried out this removal of files.

btullis@cumin1001:~$ sudo cumin 'C:profile::analytics::cluster::client or C:profile::hadoop::master or C:profile::hadoop::master::standby' 'rm -rf /home/razzi'
15 hosts will be targeted:
an-airflow1001.eqiad.wmnet,an-coord[1001-1002].eqiad.wmnet,an-launcher1002.eqiad.wmnet,an-master[1001-1002].eqiad.wmnet,an-test-client1001.eqiad.wmnet,an-test-coord1001.eqiad.wmnet,an-test-master[1001-1002].eqiad.wmnet,stat[1004-1008].eqiad.wmnet
Ok to proceed on 15 hosts? Enter the number of affected hosts to confirm or "q" to quit 15
===== NO OUTPUT =====
PASS |█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100% (15/15) [00:12<00:00,  1.18hosts/s]
FAIL |                                                                                                                                                                          |   0% (0/15) [00:12<?, ?hosts/s]
100.0% (15/15) success ratio (>= 100.0% threshold) for command: 'rm -rf /home/razzi'.
100.0% (15/15) success ratio (>= 100.0% threshold) of nodes successfully executed all commands.
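Since `rm -rf` produces no output on success, a follow-up check can confirm that the directory is really gone. The sketch below is one possible way to do that; `path_is_gone` is a hypothetical name, and the cumin invocation in the comments assumes that cumin counts a non-zero exit status as a failure for that host.

```shell
#!/bin/sh
# Hypothetical check: succeed (exit 0) only if the path no longer exists.
# -e follows symlinks, so -L is tested as well to catch a dangling symlink.
path_is_gone() {
    [ ! -e "$1" ] && [ ! -L "$1" ]
}

# The same test might be fanned out with cumin, reusing the host query from
# the removal above (assumption: a non-zero exit marks the host as failed):
#   sudo cumin '<same host query>' 'test ! -e /home/razzi'
```

A 100% success ratio from such a run would indicate that no targeted host still has the directory.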