The access for Djellel Difallah was removed. We need to check whether any data was left in the home dirs on stat*/HDFS, since they were part of the "analytics-privatedata-users" group. The Kerberos principal has already been removed. The points of contact for any questions about data to be retained are @MGerlach and @Isaac.
Description
Event Timeline
====== stat1004 ======
total 0
====== stat1005 ======
total 266892
-rw-r--r-- 1 22235 wikidev 4588519 Feb 12 2020 core_stable.tar.gz
drwxrwxr-x 7 22235 wikidev 4096 Feb 25 2020 data
-rw-r--r-- 1 22235 wikidev 44229 Jan 8 2020 en.txt
-rw-rw-r-- 1 22235 wikidev 3817346 Feb 26 2020 enwiki.dis
-rw-r--r-- 1 22235 wikidev 19152 Oct 21 2019 ExamplesPySpark_b.ipynb
-rw-r--r-- 1 22235 wikidev 15027 Jan 9 2020 f2
-rw-r--r-- 1 22235 wikidev 151716 Jan 7 2020 hs_err_pid2535.log
-rw-r--r-- 1 22235 wikidev 0 Oct 21 2019 __init__.py
-rw-r--r-- 1 22235 wikidev 27192 Feb 12 2020 Link Page Arabic.ipynb
-rw-r--r-- 1 22235 wikidev 20152 Feb 12 2020 Link Page Ar-Copy1.ipynb
-rw-r--r-- 1 22235 wikidev 36227 Feb 25 2020 Link Page AR-Copy2.ipynb
-rw-r--r-- 1 22235 wikidev 257252 Feb 21 2020 Link Page CS-Copy1.ipynb
-rw-r--r-- 1 22235 wikidev 77777 Feb 12 2020 Link Page CS.ipynb
-rw-r--r-- 1 22235 wikidev 40224 Jan 9 2020 Link Page En.ipynb
-rw-r--r-- 1 22235 wikidev 9323 Feb 12 2020 Link Page KO-Copy1.ipynb
-rw-r--r-- 1 22235 wikidev 29440 Feb 17 2020 Link Page KO-copy2.ipynb
-rw-r--r-- 1 22235 wikidev 14964 Feb 12 2020 Link Page KO.ipynb
-rw-r--r-- 1 22235 wikidev 26395 Feb 12 2020 Link Page Vi-Copy1.ipynb
-rw-r--r-- 1 22235 wikidev 30704 Feb 21 2020 Link Page VI-Copy2.ipynb
-rw-r--r-- 1 22235 wikidev 6161 Feb 12 2020 Link Page Vi.ipynb
-rw-r--r-- 1 22235 wikidev 131796811 May 22 2019 lm-ar-opus-large-backward-v0.1.pt
-rw-r--r-- 1 22235 wikidev 131796801 May 22 2019 lm-ar-opus-large-forward-v0.1.pt
-rw-r--r-- 1 22235 wikidev 20827 Oct 21 2019 MostPopularUserAgents.ipynb
drwxr-xr-x 12 22235 wikidev 4096 Feb 12 2020 nltk_data
drwxrwxr-x 2 22235 wikidev 4096 Feb 25 2020 notebooks
drwxr-xr-x 2 22235 wikidev 4096 Feb 10 2020 output.singletons
-rw-r--r-- 1 22235 wikidev 17408 Feb 12 2020 Parser.ipynb
-rw-rw-r-- 1 22235 wikidev 2539 Apr 19 2020 pig_1587337383578.log
drwxr-xr-x 5 22235 wikidev 4096 Feb 7 2020 polyglot_data
drwxr-xr-x 2 22235 wikidev 4096 Oct 21 2019 __pycache__
drwxr-xr-x 10 22235 wikidev 4096 Feb 12 2020 pywikibot
-rw------- 1 22235 wikidev 1375 Feb 12 2020 pywikibot.lwp
drwxr-xr-x 2 22235 wikidev 4096 Nov 7 2019 repos
-rw-r--r-- 1 22235 wikidev 3464 Feb 25 2020 reqs.txt
-rw-r--r-- 1 22235 wikidev 63559 Oct 21 2019 rPCA.ipynb
-rw-r--r-- 1 22235 wikidev 2781 Oct 21 2019 rpca.py
-rw-r--r-- 1 22235 wikidev 12162 Dec 10 2019 SpaCy test.ipynb
-rw-r--r-- 1 22235 wikidev 2448 Jan 8 2020 test.py
-rw-r--r-- 1 22235 wikidev 35 Feb 17 2020 throttle.ctrl
-rw-r--r-- 1 22235 wikidev 9040 Jan 18 2020 Untitled1.ipynb
-rw-r--r-- 1 22235 wikidev 38970 Jan 31 2020 Untitled2.ipynb
-rw-r--r-- 1 22235 wikidev 4636 Feb 7 2020 Untitled3.ipynb
-rw-r--r-- 1 22235 wikidev 7954 Jan 8 2020 Untitled.ipynb
-rw------- 1 22235 wikidev 251 Feb 12 2020 user-config.py
-rw------- 1 22235 wikidev 494 Feb 12 2020 user-password.py
drwxrwxr-x 6 22235 wikidev 4096 Feb 25 2020 venv
-rw-r--r-- 1 22235 wikidev 197284 Feb 9 2020 viwiki.dis
====== stat1006 ======
total 3184876
drwxr-xr-x 21 22235 wikidev 4096 Aug 25 2020 backup
drwxrwxr-x 3 22235 wikidev 4096 Jun 19 2020 bipart.csv
-rw-r--r-- 1 22235 wikidev 15258830 Dec 11 2019 csanchors.pickle
-rw-rw-r-- 1 22235 wikidev 722507 Jun 23 2020 diff2
drwxrwxr-x 13 22235 wikidev 4096 May 29 2020 diff-match-patch
-rw-rw-r-- 1 22235 wikidev 72153 May 29 2020 diff_match_patch-current.jar
-rw-rw-r-- 1 22235 wikidev 5579 Jun 23 2020 diff.tmp
drwxrwxr-x 3 22235 wikidev 4096 Jun 7 2020 enwiki_dataset.parquet
-rw-r--r-- 1 22235 wikidev 19016342 Dec 11 2019 koanchors.pickle
-rw-rw-r-- 1 22235 wikidev 1685398 May 26 2020 lucene-analyzers-common-8.5.2.jar
-rw-rw-r-- 1 22235 wikidev 3475136 May 26 2020 lucene-core-8.5.2.jar
-rw-rw-r-- 1 22235 wikidev 3198873213 Jun 20 2020 m2vbi.csv
drwxrwxr-x 3 22235 wikidev 4096 Aug 8 2020 nltk_data
drwxrwxr-x 4 22235 wikidev 4096 Aug 15 2020 notebooks
drwxrwxr-x 3 22235 wikidev 4096 Jun 8 2020 repo
drwxrwxr-x 3 22235 wikidev 4096 Aug 26 2020 sock
-rw-rw-r-- 1 22235 wikidev 7890 Jun 23 2020 sock_no_indefinite.csv
-rw-rw-r-- 1 22235 wikidev 7236492 Jun 23 2020 sock_parse_comment.csv
-rw-rw-r-- 1 22235 wikidev 3973911 Jun 22 2020 socks.csv
-rw-rw-r-- 1 22235 wikidev 7618403 Jun 23 2020 socks_full.csv
-rw-rw-r-- 1 22235 wikidev 3000865 Jun 23 2020 socks_template.csv
-rw-rw-r-- 1 22235 wikidev 20487 Jun 23 2020 tmp_apostroph.csv
drwxrwxr-x 7 22235 wikidev 4096 Jun 8 2020 venv
-rw-r--r-- 1 22235 wikidev 277968 Jun 17 2020 whitelist.csv
====== stat1007 ======
total 24421704
-rw-r--r-- 1 22235 wikidev 11503531 Feb 13 2020 000000_0
drwxrwxr-x 3 22235 wikidev 4096 Dec 12 2019 anom
-rw-rw-r-- 1 22235 wikidev 5601051999 May 25 2020 bios_full.csv.bz2
-rw-rw-r-- 1 22235 wikidev 6734633300 May 25 2020 bios_full.tgz
-rw-rw-r-- 1 22235 wikidev 504 May 25 2020 bio.sql
-rw-rw-r-- 1 22235 wikidev 10737013804 May 25 2020 bios_wikidata.csv
-rw-rw-r-- 1 22235 wikidev 1357446538 May 25 2020 bios_wikidata.tgz
-rw-rw-r-- 1 22235 wikidev 304698 May 29 2015 brickhouse-0.7.1.jar
drwxrwxr-x 3 22235 wikidev 4096 Dec 5 2019 data
-rw-rw-r-- 1 22235 wikidev 191964391 May 28 2020 full9.csv
-rw-rw-r-- 1 22235 wikidev 80675214 May 28 2020 full9.csv.gz
-rw-r--r-- 1 22235 wikidev 6268756 May 25 2020 ids.csv
drwxrwxr-x 5 22235 wikidev 4096 Dec 8 2019 linkrec
-rw-rw-r-- 1 22235 wikidev 865298 Jun 10 2020 master.csv
-rw-rw-r-- 1 22235 wikidev 8314882 Jun 10 2020 master_original.csv
-rw-rw-r-- 1 22235 wikidev 226 Jun 9 2020 master.sql
drwxrwxr-x 4 22235 wikidev 4096 Apr 19 2020 nltk_data
-rw-rw-r-- 1 22235 wikidev 27463250 Apr 20 2020 nltk_data.zip
-rw-rw-r-- 1 22235 wikidev 7917 Dec 17 2019 out
-rw-rw-r-- 1 22235 wikidev 3910 Feb 10 2020 pig_1581330083849.log
-rw-r--r-- 1 22235 wikidev 11503531 Feb 14 2020 redirect
drwxrwxr-x 2 22235 wikidev 4096 Jan 20 2020 resultsMapping-CoOcurrenceCountPandas
drwxrwxr-x 2 22235 wikidev 4096 Jan 20 2020 scp
drwxrwxr-x 2 22235 wikidev 4096 Jan 20 2020 SectionsCharacterization
drwxrwxr-x 8 22235 wikidev 4096 Aug 10 2020 sockpuppet
-rw-rw-r-- 1 22235 wikidev 137 May 28 2020 socks
-rw-rw-r-- 1 22235 wikidev 1371534 May 28 2020 socks.csv
-rw-rw-r-- 1 22235 wikidev 236 May 25 2020 wikid
-rw-r--r-- 1 22235 wikidev 237310243 Jan 20 2020 wikidataSixLanguages.csv.g
drwxrwxr-x 2 22235 wikidev 4096 Jan 20 2020 wikidataSixLanguages.csv.gz
====== stat1008 ======
total 4
drwxrwxr-x 6 22235 wikidev 4096 Oct 13 08:47 venv
======= HDFS ========
Found 36 items
drwx------ - dedcode dedcode 0 2020-09-16 00:00 /user/dedcode/.Trash
drwxr-xr-x - dedcode dedcode 0 2020-08-16 21:59 /user/dedcode/.sparkStaging
drwx------ - dedcode dedcode 0 2020-08-16 20:57 /user/dedcode/.staging
drwxr-xr-x - dedcode dedcode 0 2020-06-19 14:04 /user/dedcode/bipart.csv
-rw-r--r-- 3 dedcode dedcode 304698 2020-06-07 09:08 /user/dedcode/brickhouse-0.7.1.jar
-rw-r--r-- 3 dedcode dedcode 2545 2020-04-20 07:33 /user/dedcode/comment_properties_mapper2.py
-rw-r--r-- 3 dedcode dedcode 37347264 2020-04-20 05:23 /user/dedcode/denv.zip
drwxrwxrwx - dedcode dedcode 0 2020-06-08 22:27 /user/dedcode/embeddings
drwxr-xr-x - dedcode dedcode 0 2020-05-29 19:04 /user/dedcode/graph
drwxr-xr-x - dedcode dedcode 0 2020-02-11 13:41 /user/dedcode/linkrec
drwxr-xr-x - dedcode dedcode 0 2020-02-10 15:15 /user/dedcode/ltrees
drwxr-xr-x - dedcode dedcode 0 2020-06-20 14:15 /user/dedcode/m2vbipart.csv
-rw-r--r-- 3 dedcode dedcode 27463250 2020-04-20 05:40 /user/dedcode/nltk_data.zip
drwxr-xr-x - dedcode dedcode 0 2020-02-25 17:03 /user/dedcode/notebooks
drwxr-xr-x - dedcode dedcode 0 2020-02-12 03:46 /user/dedcode/output.pairs
drwxr-xr-x - dedcode dedcode 0 2020-02-12 03:46 /user/dedcode/output.singletons
drwxr-xr-x - dedcode dedcode 0 2020-02-12 03:46 /user/dedcode/output.triples
drwxr-xr-x - dedcode dedcode 0 2020-05-29 19:12 /user/dedcode/simplewiki.parquet
drwxr-xr-x - dedcode dedcode 0 2020-05-30 10:31 /user/dedcode/sock_joal
drwxr-xr-x - dedcode dedcode 0 2020-06-23 13:54 /user/dedcode/sock_parse_comment.csv
drwxr-xr-x - dedcode dedcode 0 2020-06-23 00:33 /user/dedcode/sock_template.csv
drwxr-xr-x - dedcode dedcode 0 2020-06-07 22:17 /user/dedcode/sockdata
drwxr-xr-x - dedcode dedcode 0 2020-06-22 10:15 /user/dedcode/socks.csv
-rw-r--r-- 3 dedcode dedcode 207437778 2020-04-20 05:53 /user/dedcode/test_spark_venv.zip
-rw-r--r-- 3 dedcode dedcode 3854 2020-04-20 07:17 /user/dedcode/textproperties.py
drwxr-xr-x - dedcode dedcode 0 2020-04-19 23:30 /user/dedcode/token_out
drwxr-xr-x - dedcode dedcode 0 2020-02-12 04:03 /user/dedcode/vi.pab_table
drwxr-xr-x - dedcode dedcode 0 2020-02-12 04:04 /user/dedcode/vi.pabc_table
drwxr-xr-x - dedcode dedcode 0 2020-02-12 03:51 /user/dedcode/vi.pairs
drwxr-xr-x - dedcode dedcode 0 2020-02-12 03:51 /user/dedcode/vi.singletons
drwxr-xr-x - dedcode dedcode 0 2020-02-12 03:51 /user/dedcode/vi.triples
drwxr-xr-x - dedcode dedcode 0 2020-04-20 05:27 /user/dedcode/virtualenv
drwxr-xr-x - dedcode dedcode 0 2020-04-20 07:33 /user/dedcode/wikidiff_output_feat_split
drwxr-xr-x - dedcode dedcode 0 2019-11-22 03:44 /user/dedcode/wikidiff_output_feat_split5
drwxr-xr-x - dedcode dedcode 0 2019-11-22 00:21 /user/dedcode/wikidiff_output_new_split5
drwxr-xr-x - dedcode dedcode 0 2020-05-28 22:28 /user/dedcode/wikidiff_output_split
====== Hive =========
drwxrwxrwt - dedcode hdfs 0 2020-05-25 12:39 /user/hive/warehouse/dedcode.db/bio_pageids
drwxr-xr-x - dedcode hadoop 0 2020-05-31 14:05 /user/hive/warehouse/dedcode.db/df1
drwxr-xr-x - dedcode hadoop 0 2020-06-07 16:51 /user/hive/warehouse/dedcode.db/enwiki_history_agg2
drwxr-xr-x - dedcode hadoop 0 2020-06-07 21:11 /user/hive/warehouse/dedcode.db/enwiki_history_agg_compact
drwxr-xr-x - dedcode hadoop 0 2020-06-13 23:25 /user/hive/warehouse/dedcode.db/enwiki_history_agg_new
drwxr-xr-x - dedcode hadoop 0 2020-06-13 23:07 /user/hive/warehouse/dedcode.db/enwiki_history_agg_new_test
drwxr-xr-x - dedcode hadoop 0 2020-06-15 06:06 /user/hive/warehouse/dedcode.db/enwiki_history_agg_part
drwxr-xr-x - dedcode hadoop 0 2020-08-14 11:49 /user/hive/warehouse/dedcode.db/enwiki_history_agg_sample
drwxr-xr-x - dedcode hadoop 0 2020-08-14 07:41 /user/hive/warehouse/dedcode.db/enwiki_history_agg_sample3
drwxr-xr-x - dedcode hadoop 0 2020-08-14 07:27 /user/hive/warehouse/dedcode.db/enwiki_history_agg_sample4
drwxr-xr-x - dedcode hadoop 0 2020-06-15 03:13 /user/hive/warehouse/dedcode.db/enwiki_history_agg_temp
drwxr-xr-x - dedcode hadoop 0 2020-06-15 01:25 /user/hive/warehouse/dedcode.db/enwiki_history_agg_tmp
drwxr-xr-x - dedcode hadoop 0 2020-06-15 02:27 /user/hive/warehouse/dedcode.db/enwiki_history_agg_tmp2
drwxr-xr-x - dedcode hadoop 0 2020-06-16 10:10 /user/hive/warehouse/dedcode.db/enwiki_history_diff
drwxr-xr-x - dedcode hadoop 0 2020-06-15 03:45 /user/hive/warehouse/dedcode.db/enwiki_history_diff_part
drwxr-xr-x - dedcode hadoop 0 2020-06-12 20:06 /user/hive/warehouse/dedcode.db/enwiki_history_diff_part_talk
drwxr-xr-x - dedcode hadoop 0 2020-06-12 22:18 /user/hive/warehouse/dedcode.db/enwiki_history_part
drwxr-xr-x - dedcode hadoop 0 2020-06-12 10:56 /user/hive/warehouse/dedcode.db/enwiki_history_part_talk
drwxr-xr-x - dedcode hadoop 0 2020-06-06 09:50 /user/hive/warehouse/dedcode.db/enwiki_history_part_year
drwxr-xr-x - dedcode hadoop 0 2020-08-16 07:43 /user/hive/warehouse/dedcode.db/enwiki_ig
drwxr-xr-x - dedcode hadoop 0 2020-08-16 09:47 /user/hive/warehouse/dedcode.db/enwiki_ig_bis
drwxr-xr-x - dedcode hadoop 0 2020-08-16 00:50 /user/hive/warehouse/dedcode.db/enwiki_ig_prep
drwxr-xr-x - dedcode hadoop 0 2020-08-16 20:03 /user/hive/warehouse/dedcode.db/enwiki_interaction_graph
drwxr-xr-x - dedcode hadoop 0 2020-06-07 21:33 /user/hive/warehouse/dedcode.db/enwiki_sock_dataset
drwxr-xr-x - dedcode hadoop 0 2020-06-08 07:39 /user/hive/warehouse/dedcode.db/enwiki_sock_dataset_full
drwxr-xr-x - dedcode hadoop 0 2020-06-16 21:30 /user/hive/warehouse/dedcode.db/enwiki_user_feat
drwxrwxrwt - dedcode hadoop 0 2020-05-25 10:25 /user/hive/warehouse/dedcode.db/ids_dataset
drwxr-xr-x - dedcode hadoop 0 2020-06-03 23:22 /user/hive/warehouse/dedcode.db/simple_history_part_year
drwxr-xr-x - dedcode hadoop 0 2020-06-17 09:45 /user/hive/warehouse/dedcode.db/sock_dataset
drwxr-xr-x - dedcode hadoop 0 2020-06-17 12:41 /user/hive/warehouse/dedcode.db/sock_dataset_whitelist
drwxrwxrwt - dedcode hdfs 0 2020-06-07 11:52 /user/hive/warehouse/dedcode.db/sock_id
drwxr-xr-x - dedcode hadoop 0 2020-06-15 16:01 /user/hive/warehouse/dedcode.db/sock_ids
drwxr-xr-x - dedcode hadoop 0 2020-08-13 22:03 /user/hive/warehouse/dedcode.db/sock_label
drwxr-xr-x - dedcode hadoop 0 2020-06-07 13:31 /user/hive/warehouse/dedcode.db/tmp_diffs
drwxrwxrwt - dedcode hdfs 0 2020-04-20 14:37 /user/hive/warehouse/dedcode.db/users_data
drwxrwxrwt - dedcode hdfs 0 2020-05-20 23:33 /user/hive/warehouse/dedcode.db/wdhuman
drwxrwxrwt - dedcode hdfs 0 2020-05-21 02:44 /user/hive/warehouse/dedcode.db/wdhumanids
drwxrwxrwt - dedcode hadoop 0 2020-06-17 09:33 /user/hive/warehouse/dedcode.db/whitelist
drwxrwxrwt - dedcode hdfs 0 2020-05-25 11:58 /user/hive/warehouse/dedcode.db/wiki_data
drwxrwxrwt - dedcode hdfs 0 2020-05-25 11:42 /user/hive/warehouse/dedcode.db/wiki_ids
@Isaac @MGerlach could you please check whether we need to keep anything, or whether we can drop it all? :)
@elukey thanks for the ping. I just talked with Djellel.
- hdfs/hive: all data can be dropped
- stat100X[5,6,7,8]: /user/dedcode/: is it possible to keep this for some time? We are mainly interested in keeping potentially relevant code (*.py, *.ipynb, *.java). Do you have any suggestions for how to back up those files without going through every folder manually?
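One possible way to collect only the code files without visiting each folder by hand is sketched below. It is a minimal sketch, not something that was run for this task: the function name `backup_code`, the `SKIP_DIRS` list, and the destination directory are all hypothetical, and it assumes the destination lives outside the source tree. It walks the home directory recursively and copies every file with one of the three extensions mentioned above, preserving the relative folder layout.

```python
import shutil
from pathlib import Path

# Extensions taken from the comment above (*.py, *.ipynb, *.java).
CODE_SUFFIXES = {".py", ".ipynb", ".java"}

# Assumption: these folders hold installed packages or downloaded data,
# not the user's own code, so they are skipped. Adjust as needed.
SKIP_DIRS = {"venv", "__pycache__", "nltk_data", "polyglot_data"}


def backup_code(src_root, dest_root, suffixes=CODE_SUFFIXES, skip_dirs=SKIP_DIRS):
    """Copy code files from src_root into dest_root, keeping relative paths.

    Returns the list of relative paths that were copied.
    dest_root is assumed to be outside src_root.
    """
    src_root = Path(src_root)
    dest_root = Path(dest_root)
    copied = []
    # sorted() materialises the walk before any file is copied.
    for path in sorted(src_root.rglob("*")):
        rel = path.relative_to(src_root)
        # Skip anything that sits below one of the excluded folders.
        if any(part in skip_dirs for part in rel.parts[:-1]):
            continue
        if path.is_file() and path.suffix in suffixes:
            target = dest_root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also keeps timestamps
            copied.append(rel)
    return copied
```

It could be called once per stat host, for example `backup_code("/home/dedcode", "/some/backup/dir")` with a destination of your choosing, and the returned list reviewed before anything is deleted. Matching is case-sensitive, so a file ending in `.PY` would not be picked up.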
@MGerlach I can move the /home/dedcode dirs under your username. What we care about is that an active user maintains/owns them, so we can ping someone in case there are issues. Would that be OK? You would then be in charge of dropping the data when needed :)
@MGerlach I created /home/mgerlach/dedcode_home on the stat boxes and changed the file ownership to your username. Let me know if you can read the files.
I am going to proceed with dropping the HDFS and Hive data :)
Mentioned in SAL (#wikimedia-analytics) [2021-03-11T08:15:56Z] <elukey> hdfs dfs -rmr /user/dedcode on an-launcher1002 (data in trash for a month) - T276748
Mentioned in SAL (#wikimedia-analytics) [2021-03-11T08:25:46Z] <elukey> drop database dedcode cascade in hive - T276748