Check home/HDFS leftovers of ryanmax
Closed, Resolved (Public)

Description

The access for

Todd Leroux <toddleroux>
Ryan Steinberg <ryanmax>
Joe Wass <afandian2>

was removed. We need to check whether any data was left in their home directories on the stat* hosts or in HDFS, since they were members of the "analytics-privatedata-users" group.

There were no Kerberos principals. The point of contact for deciding whether to keep any data is @Miriam.

Event Timeline

No files of interest for toddleroux

btullis@marlin:~$ check-user-leftovers toddleroux

====== stat1004 ======
total 0

====== stat1005 ======
total 0

====== stat1006 ======
total 0

====== stat1007 ======
total 0

====== stat1008 ======
total 0

======= HDFS ========
Found 2 items
drwxr-xr-x   - toddleroux toddleroux          0 2019-05-20 21:35 /user/toddleroux/.sparkStaging
drwx------   - toddleroux toddleroux          0 2019-03-03 15:45 /user/toddleroux/.staging

====== Hive =========

Here are the files belonging to ryanmax - @Miriam what would you like us to do with the following files and tables?

====== stat1007 ======
total 42456
drwxrwxr-x 3 20482 wikidev     4096 Jan 11  2019 env
drwxrwxr-x 4 20482 wikidev     4096 Dec  5  2018 env.bak
-rw-rw-r-- 1 20482 wikidev   330758 Dec 14  2018 neg_offset_sample.tsv
drwxrwxr-x 2 20482 wikidev     4096 May 20  2019 pcor-data
drwxrwxr-x 2 20482 wikidev     4096 Apr 23  2019 py
-rw-rw-r-- 1 20482 wikidev 43115274 Dec 14  2018 section_id.tsv
drwxrwxr-x 8 20482 wikidev     4096 Sep 23  2019 sql
drwxrwxr-x 2 20482 wikidev     4096 Dec  5  2018 topic-data-from-miriam

======= HDFS ========
Found 12 items
drwx------   - ryanmax ryanmax           0 2019-10-31 00:00 /user/ryanmax/.Trash
drwxr-xr-x   - ryanmax ryanmax           0 2019-09-30 19:39 /user/ryanmax/.sparkStaging
drwx------   - ryanmax ryanmax           0 2019-09-25 17:59 /user/ryanmax/.staging
drwxr-xr-x   - ryanmax ryanmax           0 2019-06-01 00:33 /user/ryanmax/anonymous_citationusage_april.parquet
drwxr-xr-x   - ryanmax ryanmax           0 2019-06-01 00:29 /user/ryanmax/anonymous_pageloads_april.parquet
-rw-r--r--   3 ryanmax ryanmax 16873458971 2019-03-07 05:15 /user/ryanmax/enwiki-20190301-pages-articles-multistream.xml.bz2
-rw-r--r--   3 ryanmax ryanmax 16918450547 2019-05-03 23:37 /user/ryanmax/enwiki-20190320-pages-articles-multistream.xml.bz2
-rw-r--r--   3 ryanmax ryanmax 16973178656 2019-05-03 23:40 /user/ryanmax/enwiki-20190401-pages-articles-multistream.xml.bz2
-rw-r--r--   3 ryanmax ryanmax 17016934888 2019-05-03 23:41 /user/ryanmax/enwiki-20190420-pages-articles-multistream.xml.bz2
drwxr-xr-x   - ryanmax ryanmax           0 2019-05-31 23:59 /user/ryanmax/session_ids.parquet
drwxr-xr-x   - ryanmax ryanmax           0 2019-09-30 16:20 /user/ryanmax/test
-rw-r--r--   3 ryanmax ryanmax    44368803 2019-08-06 16:38 /user/ryanmax/tiny-20190401.xml

====== Hive =========
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-05-08 00:20 /user/hive/warehouse/ryanmax.db/archived_free_count
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-08-01 18:31 /user/hive/warehouse/ryanmax.db/archived_free_count_w_date
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-07-26 17:15 /user/hive/warehouse/ryanmax.db/archived_page_lengths
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2018-12-06 01:24 /user/hive/warehouse/ryanmax.db/archived_page_topic
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2018-12-05 21:07 /user/hive/warehouse/ryanmax.db/archived_page_topics
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-04-24 03:31 /user/hive/warehouse/ryanmax.db/archived_pages_with_extlinks_all_dates
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-03-22 03:53 /user/hive/warehouse/ryanmax.db/archived_projmed
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-04-24 03:29 /user/hive/warehouse/ryanmax.db/archived_projmed_categories
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-04-24 03:30 /user/hive/warehouse/ryanmax.db/archived_projmed_with_extlinks
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-07-30 22:43 /user/hive/warehouse/ryanmax.db/archived_sections
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-05-08 21:51 /user/hive/warehouse/ryanmax.db/archived_top1k_med
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-06-01 06:17 /user/hive/warehouse/ryanmax.db/archived_top1k_med_anon
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-05-09 03:39 /user/hive/warehouse/ryanmax.db/archived_top_hosts_notwpm_events
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-06-01 08:37 /user/hive/warehouse/ryanmax.db/archived_top_hosts_notwpm_events_anon
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-05-08 23:35 /user/hive/warehouse/ryanmax.db/archived_top_hosts_w_events
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-06-01 07:36 /user/hive/warehouse/ryanmax.db/archived_top_hosts_w_events_anon
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-05-08 23:47 /user/hive/warehouse/ryanmax.db/archived_top_hosts_wpm_events
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-06-01 07:46 /user/hive/warehouse/ryanmax.db/archived_top_hosts_wpm_events_anon
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-08-09 20:18 /user/hive/warehouse/ryanmax.db/externallinks
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-08-09 18:27 /user/hive/warehouse/ryanmax.db/externallinks_from_dump_20190401
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-08-09 18:27 /user/hive/warehouse/ryanmax.db/externallinks_from_dump_20190420
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-04 00:31 /user/hive/warehouse/ryanmax.db/free_id_types
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-08-01 18:27 /user/hive/warehouse/ryanmax.db/infobox_count
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-08-01 13:49 /user/hive/warehouse/ryanmax.db/page_lengths_w_date
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-20 18:08 /user/hive/warehouse/ryanmax.db/pages_w_with_extlinks
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-20 18:08 /user/hive/warehouse/ryanmax.db/pages_wpm
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-20 18:08 /user/hive/warehouse/ryanmax.db/pages_wpm_with_extlinks
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-05-20 17:16 /user/hive/warehouse/ryanmax.db/pcor_pmids
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-05-20 16:07 /user/hive/warehouse/ryanmax.db/pcori_pmids
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-05-20 17:15 /user/hive/warehouse/ryanmax.db/pmids_in_w
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-04 17:49 /user/hive/warehouse/ryanmax.db/population_externallinks
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-04 19:49 /user/hive/warehouse/ryanmax.db/population_freelink_id_types
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-04 20:29 /user/hive/warehouse/ryanmax.db/population_infobox
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-25 17:55 /user/hive/warehouse/ryanmax.db/population_page_titles_20190420
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-20 18:12 /user/hive/warehouse/ryanmax.db/population_w_pages_with_extlinks
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-20 18:10 /user/hive/warehouse/ryanmax.db/population_wpm_pages_with_extlinks
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-24 06:58 /user/hive/warehouse/ryanmax.db/population_wpm_sections
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-20 18:08 /user/hive/warehouse/ryanmax.db/projmed_categories
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-04 17:28 /user/hive/warehouse/ryanmax.db/trash_population_w_pages_with_extlinks
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-09-04 17:26 /user/hive/warehouse/ryanmax.db/trash_population_wpm_pages_with_extlinks
drwxr-x---   - ryanmax                analytics-privatedata-users          0 2019-04-24 03:41 /user/hive/warehouse/ryanmax.db/wpm_sections
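When deciding whether data like this is worth keeping, it can help to know how much space the regular files occupy. The following is a minimal sketch, not part of the original check: `sum_ls_bytes` is a hypothetical helper that assumes the `hdfs dfs -ls` column layout shown above (size in the fifth field) and only counts lines for regular files, since directories report 0 in this listing.

```shell
# Hypothetical helper: sum the byte sizes of regular files in
# `hdfs dfs -ls` output read from stdin.
# Assumes the column layout shown in the listing above:
#   perms  replication  owner  group  size  date  time  path
# Directory lines (starting with "d") report 0 here and are skipped.
sum_ls_bytes() {
  # %.0f avoids scientific notation for large totals in some awk variants
  awk '/^-/ { total += $5 } END { printf "%.0f\n", total }'
}

# Sample input: the regular files from the /user/ryanmax listing above
listing='-rw-r--r--   3 ryanmax ryanmax 16873458971 2019-03-07 05:15 /user/ryanmax/enwiki-20190301-pages-articles-multistream.xml.bz2
-rw-r--r--   3 ryanmax ryanmax 16918450547 2019-05-03 23:37 /user/ryanmax/enwiki-20190320-pages-articles-multistream.xml.bz2
-rw-r--r--   3 ryanmax ryanmax 16973178656 2019-05-03 23:40 /user/ryanmax/enwiki-20190401-pages-articles-multistream.xml.bz2
-rw-r--r--   3 ryanmax ryanmax 17016934888 2019-05-03 23:41 /user/ryanmax/enwiki-20190420-pages-articles-multistream.xml.bz2
drwxr-xr-x   - ryanmax ryanmax           0 2019-05-31 23:59 /user/ryanmax/session_ids.parquet
-rw-r--r--   3 ryanmax ryanmax    44368803 2019-08-06 16:38 /user/ryanmax/tiny-20190401.xml'

printf '%s\n' "$listing" | sum_ls_bytes
```

For the sample lines this prints 67826391865, roughly 67.8 GB before replication. The figure excludes the contents of the parquet and other directories; on the cluster itself, `hdfs dfs -du -s -h /user/ryanmax` would report the full recursive size.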

No data for afandian2

btullis@marlin:~$ check-user-leftovers afandian2

====== stat1004 ======
total 0

====== stat1005 ======
total 0

====== stat1006 ======
total 0

====== stat1007 ======
total 0

====== stat1008 ======
total 0

======= HDFS ========

====== Hive =========

Removed home directories for toddleroux and afandian2 with:

sudo cumin 'C:profile::analytics::cluster::client or C:profile::hadoop::master or C:profile::hadoop::master::standby' 'rm -rf /home/toddleroux'
sudo cumin 'C:profile::analytics::cluster::client or C:profile::hadoop::master or C:profile::hadoop::master::standby' 'rm -rf /home/afandian2'
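Because these commands run `rm -rf` across many hosts, a typo or an empty username would be costly. As a hedged sketch only (not something used in this task), a small wrapper could validate the username and print the command for review before anyone runs it. The function name `print_rm_home_cmd` and the allowed-character rule are assumptions; the host query is copied from the commands above.

```shell
# Hypothetical helper: print (do not run) the cumin command that removes
# one user's home directory, after a basic sanity check on the username.
print_rm_home_cmd() {
  user="$1"
  case "$user" in
    # Reject an empty name or anything outside [a-z0-9_-], so the command
    # can never expand to 'rm -rf /home/' or contain a path separator.
    ''|*[!a-z0-9_-]*)
      echo "refusing: invalid username '$user'" >&2
      return 1
      ;;
  esac
  # Host selection query copied verbatim from the commands above
  query='C:profile::analytics::cluster::client or C:profile::hadoop::master or C:profile::hadoop::master::standby'
  printf "sudo cumin '%s' 'rm -rf /home/%s'\n" "$query" "$user"
}

print_rm_home_cmd toddleroux
```

Printing rather than executing keeps a human in the loop: the operator can read the generated line and paste it once it looks right.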

Removed the HDFS home for toddleroux with:

sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r /user/toddleroux

The data is still present for ryanmax as shown above in: T325527#8862442.
@Miriam - did you get a chance to assess whether there is value in retaining this data, or are you happy for us to remove it?

BTullis renamed this task from "Check home/HDFS leftovers of toddleroux / ryanmax / afandian2" to "Check home/HDFS leftovers of ryanmax". Aug 11 2023, 10:04 AM

Oh sorry @BTullis I completely missed this, and thanks @Sfaci for the ping!
Is it possible to move this data to @tizianopiccardi's home, as he is a co-author of the paper?

The files in the folder of ryanmax can be deleted. The relevant files were already moved to my home folder.

Gehel triaged this task as Low priority. Nov 15 2023, 9:35 AM
Gehel moved this task from Incoming to Ready for Work on the Data-Platform-SRE board.

Thanks all for your input.
I have now removed the files with:

sudo cumin 'C:profile::analytics::cluster::client or C:profile::hadoop::master or C:profile::hadoop::master::standby' 'rm -rf /home/ryanmax'