
Check home/HDFS leftovers of nathante
Closed, Resolved (Public)


Access for Nathan TeBlunthuis (nathante) was removed. We need to check whether any data was left in home dirs on the stat*/notebook* hosts or in HDFS, since the user was a member of the "analytics-privatedata-users" group.

I've already removed the Kerberos principal.

Event Timeline

====== stat1004 ======
total 24
-rw-r--r-- 1 nathante wikidev 12682 Nov 18  2018 DwellTimeModels.R.r
drwxrwxr-x 3 nathante wikidev  4096 Sep 25  2018 R
drwxrwxr-x 2 nathante wikidev  4096 Oct  2  2018 reading_time
ls: cannot access '/var/userarchive/nathante.tar.bz2': No such file or directory

====== stat1005 ======
total 0
ls: cannot access '/var/userarchive/nathante.tar.bz2': No such file or directory

====== stat1006 ======
total 5928776
-rw-r--r--  1 nathante wikidev        204 Aug 20  2019 all_wikis.csv
-rw-r--r--  1 nathante wikidev       7386 Aug 14  2019 deleted_config_revisions.json
-rw-r--r--  1 nathante wikidev       3170 Aug 14  2019 deleted_config_revisions.pickle
drwxr-xr-x 18 nathante wikidev       4096 Jun  3  2019 editquality
-rw-r--r--  1 nathante wikidev   10875043 May  2  2019 label_edits_gender_geo.pickle
drwxrwxr-x  3 nathante wikidev       4096 Apr 18  2019 local
drwxr-xr-x 15 nathante wikidev       4096 Jun 13  2019 mediawiki
drwxr-xr-x 14 nathante wikidev       4096 Oct  1  2019 mw_revert_tool_detector
drwxr-xr-x 29 nathante wikidev       4096 Mar 13  2020 nathante
-rw-rw-r--  1 nathante wikidev 6027045510 Sep 30 21:14 nathante_wmf_export.tar.gz
drwxr-xr-x  3 nathante wikidev       4096 Sep 19  2019 ores_bias_plots
drwxr-xr-x 19 nathante wikidev      32768 Sep 30 18:06 ores_bias_project
drwxr-xr-x  5 nathante wikidev       4096 Sep 30 20:35 ores_project_bias_analysis
drwxrwxr-x  2 nathante wikidev       4096 Sep 30 20:35 ores_project_code
drwxrwxr-x  2 nathante wikidev       4096 Sep 30 20:49 ores_project_data
-rw-r--r--  1 nathante wikidev   20157871 May  1  2019 page_wikidata_properties.pickle
drwxr-xr-x  2 nathante wikidev       4096 Sep  1  2019 __pycache__
drwxr-xr-x  7 nathante wikidev       4096 Jun 25  2019 pyRemembeR
drwxr-xr-x  6 nathante wikidev       4096 Oct 30  2019 R
drwxrwxr-x  3 nathante wikidev       4096 Sep 30 20:31 readingtime_nonsensitive
-rw-r--r--  1 nathante wikidev   12852408 Jun 27  2019 remember_python.RDS
-rwxr-xr-x  1 nathante wikidev          0 Jun 27  2019 remember_python.RDS.lock
drwxr-xr-x  2 nathante wikidev       4096 Sep 27  2018 seaborn-data
drwxr-xr-x  5 nathante wikidev       4096 Jun 21  2019 temp
drwxr-xr-x  6 nathante wikidev       4096 Oct 17  2018 test
drwxr-xr-x  8 nathante wikidev       4096 Oct 17  2018 venv
ls: cannot access '/var/userarchive/nathante.tar.bz2': No such file or directory

====== stat1007 ======
total 33912
drwxr-xr-x  8 20110 wikidev     4096 Jun 20  2019 ess-18.10.2
-rw-rw-r--  1 20110 wikidev  2939425 Nov 10  2018 ess-18.10.2.tgz
-rw-rw-r--  1 20110 wikidev       41 Jun  4  2019 github_access_token
-rw-rw-r--  1 20110 wikidev   127156 Jun 20  2019 hs_err_pid131876.log
-rw-rw-r--  1 20110 wikidev    64420 Jun 20  2019 hs_err_pid145522.log
-rw-rw-r--  1 20110 wikidev   122383 Jun 20  2019 hs_err_pid146065.log
-rw-rw-r--  1 20110 wikidev    64420 Jun 20  2019 hs_err_pid146218.log
-rw-rw-r--  1 20110 wikidev    64420 Jun 20  2019 hs_err_pid146247.log
-rw-rw-r--  1 20110 wikidev    64420 Jun 20  2019 hs_err_pid146277.log
-rw-rw-r--  1 20110 wikidev    64420 Jun 20  2019 hs_err_pid146305.log
-rw-rw-r--  1 20110 wikidev    64420 Jun 20  2019 hs_err_pid146340.log
-rw-rw-r--  1 20110 wikidev   120812 Jun 20  2019 hs_err_pid146402.log
-rw-rw-r--  1 20110 wikidev    64468 Jun 20  2019 hs_err_pid146560.log
drwxrwxr-x 19 20110 wikidev     4096 Apr 29  2019 mediawiki-config
drwxrwxr-x 10 20110 wikidev     4096 Aug 14  2019 mw_revert_tool_detector
drwxrwxr-x  8 20110 wikidev     4096 Jun  5  2019 ores_bias
-rwxrwxr-x  1 20110 wikidev      459 Apr 27  2019 #.profile#
drwxrwxr-x  3 20110 wikidev     4096 May 21  2019 R
drwxr-xr-x 15 20110 wikidev     4096 May 21  2019 R-3.6.0
-rw-rw-r--  1 20110 wikidev 30449618 Apr 26  2019 R-3.6.0.tar.gz
-rw-rw-r--  1 20110 wikidev   236521 Jun 20  2019 replay_pid146065.log
-rw-rw-r--  1 20110 wikidev   226822 Jun 20  2019 replay_pid146402.log
ls: cannot access '/var/userarchive/nathante.tar.bz2': No such file or directory

====== stat1008 ======
total 0
ls: cannot access '/var/userarchive/nathante.tar.bz2': No such file or directory

======= HDFS ========
Found 17 items
drwx------   - nathante nathante          0 2020-02-05 00:00 /user/nathante/.Trash
drwxr-xr-x   - nathante nathante          0 2020-09-26 20:00 /user/nathante/.sparkStaging
drwx------   - nathante nathante          0 2019-06-27 03:44 /user/nathante/.staging
drwxr-xr-x   - nathante hdfs              0 2018-10-28 18:36 /user/nathante/cleanReadingData
drwxr-xr-x   - nathante nathante          0 2020-09-24 19:35 /user/nathante/cutoff_revisions_sample_N5000.csv
drwxr-xr-x   - nathante nathante          0 2018-11-04 05:54 /user/nathante/modelReadingTime_stratsamp
drwxr-xr-x   - nathante nathante          0 2018-11-04 06:00 /user/nathante/modelReadingTime_stratsamp_bigger
drwxr-xr-x   - nathante nathante          0 2018-11-06 07:49 /user/nathante/modelReadingTime_stratsamp_smaller
drwxr-xr-x   - nathante nathante          0 2019-12-03 22:02 /user/nathante/ores_bias
drwxr-xr-x   - nathante nathante          0 2020-09-27 06:16 /user/nathante/ores_bias_data
-rw-r--r--   3 nathante nathante  725084695 2020-01-11 23:52 /user/nathante/ores_bias_project
drwxr-xr-x   - nathante hdfs              0 2018-09-25 23:08 /user/nathante/output
drwxr-xr-x   - nathante hdfs              0 2018-10-28 18:48 /user/nathante/pageEventTimings
drwxr-xr-x   - nathante hdfs              0 2018-09-25 23:00 /user/nathante/pageEventTotalTimeHist
drwxr-xr-x   - nathante hdfs              0 2018-09-27 07:28 /user/nathante/readingDepthSample
drwxr-xr-x   - nathante hdfs              0 2018-09-27 00:59 /user/nathante/rerrTotalLength
drwxr-xr-x   - nathante nathante          0 2020-09-24 19:37 /user/nathante/threshold_strata_counts_N5000.csv

====== Hive =========
drwxr-xr-x   - nathante         hadoop                          0 2019-11-19 19:55 /user/hive/warehouse/nathante.db/cutoff_revisions_sample_2periods
drwxr-xr-x   - nathante         hadoop                          0 2019-11-06 23:02 /user/hive/warehouse/nathante.db/cutoff_revisions_sample_simplestrata
drwxrwxrwt   - nathante         hdfs                            0 2018-10-26 05:56 /user/hive/warehouse/nathante.db/neg_span_spike
drwxr-xr-x   - nathante         hadoop                          0 2018-10-29 04:43 /user/hive/warehouse/nathante.db/readingdatamodel_stage1
drwxrwxrwt   - nathante         hadoop                          0 2018-10-31 05:50 /user/hive/warehouse/nathante.db/readingdatamodel_stage2
drwxr-xr-x   - nathante         hadoop                          0 2018-10-09 00:12 /user/hive/warehouse/nathante.db/samp_cleanreadingdata
drwxr-xr-x   - nathante         hadoop                          0 2018-10-10 05:57 /user/hive/warehouse/nathante.db/samp_cleanreadingdata_pages
drwxr-xr-x   - nathante         hadoop                          0 2018-10-31 18:37 /user/hive/warehouse/nathante.db/tablereadingdatamodel_stage2
drwxr-xr-x   - nathante         hadoop                          0 2019-03-31 16:45 /user/hive/warehouse/nathante.db/un_developent_data
drwxr-xr-x   - nathante         hadoop                          0 2019-06-18 18:51 /user/hive/warehouse/nathante.db/wiki_weeks_tab
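
A listing like the one above could be gathered with a small helper along these lines. This is only a sketch, not the script that was actually used: the function name `leftover_check_commands`, the use of `ssh`, and the `/home/<user>` location are assumptions, while the host names, the `/var/userarchive/<user>.tar.bz2` path, and the HDFS/Hive paths are taken from the output above. It only builds read-only commands and prints them; nothing is executed.

```python
import shlex

# Hosts that appear in the listing above.
STAT_HOSTS = ("stat1004", "stat1005", "stat1006", "stat1007", "stat1008")


def leftover_check_commands(user, hosts=STAT_HOSTS):
    """Return (label, argv) pairs of read-only commands listing a user's leftovers."""
    quoted = shlex.quote(user)
    # Assumption: home directories live under /home/<user> on every host.
    remote = f"ls -l /home/{quoted}; ls -l /var/userarchive/{quoted}.tar.bz2"
    commands = [(host, ["ssh", host, remote]) for host in hosts]
    # HDFS home directory and Hive warehouse database directory of the user.
    commands.append(("HDFS", ["hdfs", "dfs", "-ls", f"/user/{user}"]))
    commands.append(("Hive", ["hdfs", "dfs", "-ls", f"/user/hive/warehouse/{user}.db"]))
    return commands


if __name__ == "__main__":
    for label, argv in leftover_check_commands("nathante"):
        print(f"====== {label} ======")
        # Printed for review only; pass argv to subprocess.run() to execute.
        print(shlex.join(argv))
```

Keeping the commands as data makes it easy to review them before running anything against production hosts.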

@Groceryheist is there anything that you need to save? Also cc: @calbon @leila

@elukey the only items @Groceryheist needs to export are at T264255 (awaiting security and analytics review). The rest can be purged.

@leila, T264255 is now resolved (I believe a tarball with all required files was copied over to a public location).
Can you please confirm that we can proceed to delete the data on the stat100* machines and in HDFS?

All stat100x home dirs purged; only HDFS/Hive left!

mforns claimed this task.

Deleted both the HDFS and Hive directories, plus the corresponding database in Hive.
Marking this as resolved!