
Check home/HDFS leftovers of gilles
Open, High, Public

Description

Access for Gilles Dubuc has been removed. We need to check whether any data was left in his home dirs on stat*/notebook*/HDFS, since he was a member of the "analytics-privatedata-users" group.

I've removed the Kerberos principal.

Event Timeline

odimitrijevic moved this task from Incoming to Ops Week on the Analytics board.

Following https://wikitech.wikimedia.org/wiki/Analytics/Ops_week#Have_any_users_left_the_Foundation?

15:15:09 [:/Users/otto] $ wmf-check-analytics-home gilles

====== stat1004 ======
total 266872
-rw-rw-r-- 1 4319 wikidev         0 Nov 20  2018 13
-rw-rw-r-- 1 4319 wikidev         0 Aug 23  2019 19
-rw-rw-r-- 1 4319 wikidev      1438 Jan 17  2019 2019-01.csv
-rw-rw-r-- 1 4319 wikidev       963 Feb 28  2019 2019-01.tsv
-rw-rw-r-- 1 4319 wikidev      4910 Mar  8  2019 2019-02.tsv
-rw-r--r-- 1 4319 wikidev     13230 Mar  5  2019 asoranking.py
-rw-rw-r-- 1 4319 wikidev 224603240 May 23  2019 export.2019.tsv
-rw-rw-r-- 1 4319 wikidev      4449 May 23  2019 export2.sql
-rw-rw-r-- 1 4319 wikidev         0 Nov 30  2018 export.new2.tsv
-rw-rw-r-- 1 4319 wikidev  19140001 Oct 16  2018 export.new.tsv
-rw-rw-r-- 1 4319 wikidev  17584474 Oct 16  2018 export.old.tsv
-rw-rw-r-- 1 4319 wikidev      5429 May 23  2019 export.sql
-rw-rw-r-- 1 4319 wikidev   1347606 Nov 30  2018 export.tsv
drwxrwxr-x 6 4319 wikidev      4096 Jun  3  2019 foo
-rw-rw-r-- 1 4319 wikidev       520 Nov 21  2018 frisps.sql
-rw-rw-r-- 1 4319 wikidev   4048820 Nov 21  2018 frisps.tsv
-rw-rw-r-- 1 4319 wikidev       332 Feb 28  2019 query.sql
-rw-r--r-- 1 4319 wikidev     15315 Dec  4  2018 randomforest2.py
-rw-rw-r-- 1 4319 wikidev      3671 Oct  4  2018 randomforest.py
-rw-rw-r-- 1 4319 wikidev    965717 Feb 28  2019 result.tsv
-rw-rw-r-- 1 4319 wikidev     34649 Sep 19  2019 screenlog.0
-rw-rw-r-- 1 4319 wikidev       520 Nov 21  2018 seisps.sql
-rw-rw-r-- 1 4319 wikidev     21919 Nov 21  2018 seisps.tsv
-rw-rw-r-- 1 4319 wikidev       562 Oct  1  2018 surveyresponsetime.sql
-rw-rw-r-- 1 4319 wikidev   5206994 Oct  1  2018 surveyresponsetime.tsv
-rw-rw-r-- 1 4319 wikidev       516 Nov 21  2018 usisps.sql
-rw-rw-r-- 1 4319 wikidev    204623 Nov 21  2018 usisps.tsv

====== stat1005 ======
total 32
-rw-r--r-- 1 4319 wikidev   255 Jun 21  2019 smart.jpg
drwxr-xr-x 2 4319 wikidev  4096 Jun 21  2019 thumbor
-rw-r--r-- 1 4319 wikidev 22339 Jun 21  2019 thumbor.conf

====== stat1006 ======
total 260
drwxr-xr-x  2 4319 wikidev   4096 Jan 29  2015 20150103
drwxr-xr-x  2 4319 wikidev   4096 Jan 29  2015 20150120
-rw-r--r--  1 4319 wikidev 162957 Sep  5  2017 cron_last_run
drwxr-xr-x 10 4319 wikidev   4096 May  3  2017 multimedia
-rw-rw-r--  1 4319 wikidev  28517 Sep 11  2017 T173580
-rwxrw-r--  1 4319 wikidev   1155 Sep 11  2017 T173580.sh
-rwxrw-r--  1 4319 wikidev   1061 Sep 11  2017 T173580.sh.save
drwxrwxr-x  2 4319 wikidev   4096 Jun 21  2019 thumbor
-rw-r--r--  1 4319 wikidev  22339 Jun 21  2019 thumbor.conf
drwxr-xr-x  2 4319 wikidev  20480 Jun 23  2017 tsvs_new
drwxr-xr-x  8 4319 wikidev   4096 Jan 30  2015 tsvs_sql

====== stat1007 ======
total 140096
-rw-r--r--  1 4319 wikidev        0 Mar 27  2019 2019-02.tsv
-rw-rw-r--  1 4319 wikidev      828 May 28  2019 2019-04.tsv
drwxr-xr-x 15 4319 wikidev     4096 Mar 15  2019 articlequality
-rw-rw-r--  1 4319 wikidev       40 Oct 13  2020 bar
-rw-r--r--  1 4319 wikidev  8151040 Oct 13  2020 cp3052-hit-front.log
-rw-r--r--  1 4319 wikidev   184320 Oct 13  2020 cp3052-hit-local.log
-rw-rw-r--  1 4319 wikidev      200 Oct 20  2020 cp3052.hql
-rw-r--r--  1 4319 wikidev  2187264 Oct 13  2020 cp3052-miss.log
-rw-r--r--  1 4319 wikidev   610304 Oct 13  2020 cp3052-pass.log
-rw-rw-r--  1 4319 wikidev  1368834 Oct 20  2020 cp3052-responseend
-rw-rw-r--  1 4319 wikidev  1092021 Oct 20  2020 cp3052-responseend.fixed
-rw-rw-r--  1 4319 wikidev  1091900 Oct 20  2020 cp3052-responseend.fixed.filtered
-rw-rw-r--  1 4319 wikidev  7391169 Oct 20  2020 cp3052-responseend-oversampled
-rw-rw-r--  1 4319 wikidev  5926859 Oct 20  2020 cp3052-responseend-oversampled.fixed
-rw-rw-r--  1 4319 wikidev  5926514 Oct 20  2020 cp3052-responseend-oversampled.fixed.filtered
-rw-rw-r--  1 4319 wikidev   493615 Oct 13  2020 cp3052-responsestart
-rw-rw-r--  1 4319 wikidev   493608 Oct 13  2020 cp3052-responsestart.fixed
-rw-rw-r--  1 4319 wikidev  2655959 Oct 13  2020 cp3052-responsestart-oversampled
-rw-rw-r--  1 4319 wikidev  2655952 Oct 13  2020 cp3052-responsestart-oversampled.fixed
-rw-r--r--  1 4319 wikidev  8003584 Oct 13  2020 cp3054-hit-front.log
-rw-r--r--  1 4319 wikidev   217088 Oct 13  2020 cp3054-hit-local.log
-rw-rw-r--  1 4319 wikidev      200 Oct 20  2020 cp3054.hql
-rw-r--r--  1 4319 wikidev  2265088 Oct 13  2020 cp3054-miss.log
-rw-r--r--  1 4319 wikidev   577536 Oct 13  2020 cp3054-pass.log
-rw-rw-r--  1 4319 wikidev  1374253 Oct 20  2020 cp3054-responseend
-rw-rw-r--  1 4319 wikidev  1097107 Oct 20  2020 cp3054-responseend.fixed
-rw-rw-r--  1 4319 wikidev  1096925 Oct 20  2020 cp3054-responseend.fixed.filtered
-rw-rw-r--  1 4319 wikidev  7415140 Oct 20  2020 cp3054-responseend-oversampled
-rw-rw-r--  1 4319 wikidev  5948197 Oct 20  2020 cp3054-responseend-oversampled.fixed
-rw-rw-r--  1 4319 wikidev  5947679 Oct 20  2020 cp3054-responseend-oversampled.fixed.filtered
-rw-rw-r--  1 4319 wikidev   494559 Oct 13  2020 cp3054-responsestart
-rw-rw-r--  1 4319 wikidev   494552 Oct 13  2020 cp3054-responsestart.fixed
-rw-rw-r--  1 4319 wikidev  2677168 Oct 13  2020 cp3054-responsestart-oversampled
-rw-rw-r--  1 4319 wikidev  2677161 Oct 13  2020 cp3054-responsestart-oversampled.fixed
-rw-rw-r--  1 4319 wikidev       40 Oct 13  2020 foo
-rw-rw-r--  1 4319 wikidev      596 Jun  6  2018 foo.py
-rw-rw-r--  1 4319 wikidev        5 May 28  2019 foo.tmp
-rw-rw-r--  1 4319 wikidev     1994 Mar  1  2021 foo.tsv
-rwxrwxr-x  1 4319 wikidev    35840 Oct 13  2020 ministat
drwxr-xr-x  2 4319 wikidev     4096 Oct 13  2020 ministat-master
drwxrwxr-x  4 4319 wikidev     4096 Mar 14  2019 mwbzutils
-rw-rw-r--  1 4319 wikidev 10823126 Jun 12  2020 out2.txt
-rw-rw-r--  1 4319 wikidev 51970626 Jun 15  2020 out.txt
-rw-r--r--  1 4319 wikidev      698 Mar 18  2019 splitdump.py
-rw-rw-r--  1 4319 wikidev      655 Mar  3  2021 T276121.py
-rw-rw-r--  1 4319 wikidev      389 Mar  1  2021 T276121.sql
-rwxrwxr-x  1 4319 wikidev       81 Mar  1  2021 testbeelinetsv.sh
-rw-rw-r--  1 4319 wikidev       46 May 28  2019 test.tmp
drwxr-xr-x  8 4319 wikidev     4096 Oct  4  2018 venv

====== stat1008 ======
total 0

======= HDFS ========
Found 2 items
drwx------   - gilles gilles          0 2019-06-23 00:00 /user/gilles/.Trash
drwx------   - gilles gilles          0 2021-05-21 13:06 /user/gilles/.staging

====== Hive =========

@Krinkle, not sure if you are the right person to ask, but do you know if there is any reason to save any of the above data?

Some of the scripts (.py, .sh, .sql, .hql) may be helpful as I'm not sure we documented all the analysis in question for some of our datasets.

I don't think any of the data or other files need to be kept (tsv, csv, log, and others). Anything there has presumably already been published in a safe aggregate manner, or is not of interest to us, or will already have been ensured to remain available as sanitized data in the regular HDFS places.
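To make that split concrete, here is a minimal sketch (not an existing tool) that partitions file names by extension into "scripts worth a look" and "everything else". The function name `split_scripts_from_data` and the sample list are hypothetical; the extensions are the ones named in the comment above, and the sample names are taken from the stat1007 listing.

```python
from pathlib import PurePosixPath

# Extensions named above as possibly worth keeping (an assumption of this sketch).
SCRIPT_SUFFIXES = {".py", ".sh", ".sql", ".hql"}


def split_scripts_from_data(names):
    """Partition file names into (scripts, others) by their final extension."""
    scripts, others = [], []
    for name in names:
        suffix = PurePosixPath(name).suffix.lower()
        if suffix in SCRIPT_SUFFIXES:
            scripts.append(name)
        else:
            others.append(name)
    return scripts, others


# A few names from the stat1007 listing; directories are not handled here.
sample = [
    "cp3052.hql",
    "cp3052-miss.log",
    "T276121.py",
    "T276121.sql",
    "testbeelinetsv.sh",
    "foo.tsv",
    "ministat",
]
scripts, others = split_scripts_from_data(sample)
print("scripts:", scripts)
print("others:", others)
```

Only the last extension is considered, so a backup such as `T173580.sh.save` lands in the "others" bucket and would need a manual look.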

OK. @Krinkle, can I copy them to your home dirs and chown them to you?

@Krinkle ping :) To unblock this task we could either move all the old home dirs under yours (something like /home/krinkle/gilles/etc..) or only some files, and then drop the rest. What do you think?

That's fine yeah, just transfer them all and I'll take care of it.
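For reference, a rough dry-run sketch of what "transfer them all" could look like on one host. This is an assumption about the mechanics, not the procedure actually used: the source path `/home/gilles`, the helper name `plan_transfer`, and the choice of `mkdir`/`cp -a`/`chown -R` are illustrative, and the destination follows the `/home/krinkle/gilles` layout suggested above. It only builds the command strings and does not execute anything.

```python
import shlex


def plan_transfer(src_home, dest_dir, new_owner):
    """Return shell commands (as strings) for a copy-then-chown; nothing is run."""
    commands = [
        # Create the destination directory if it does not exist yet.
        ["mkdir", "-p", dest_dir],
        # "src/." copies the directory contents, including dotfiles.
        ["cp", "-a", f"{src_home}/.", f"{dest_dir}/"],
        # Hand ownership to the new owner; the group is left unchanged.
        ["chown", "-R", new_owner, dest_dir],
    ]
    return [shlex.join(cmd) for cmd in commands]


# Hypothetical paths; to be reviewed and run per host by someone with root.
plan = plan_transfer("/home/gilles", "/home/krinkle/gilles", "krinkle")
for line in plan:
    print(line)
```

Printing the plan first makes it easy to review before running anything as root, and the same plan can be repeated for each stat host that had leftovers.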