Check home/HDFS leftovers of neilpquinn-wmf
Closed, Resolved (Public)

Description

Neil moved to a new account name (the new one is nshahquinn-wmf). Since he had/has access to analytics-privatedata-users, we need to check whether any data was left in home directories on stat*/HDFS under the old neilpquinn-wmf username.

(Neil should already have transferred everything proactively; we are just following our defined procedure for retiring accounts here.)

The Kerberos principal has already been removed.

Point of contact for any questions is @nshahquinn-wmf

Event Timeline

Hi @nshahquinn-wmf - I've noticed that there are still some files on the stats servers owned by your previous account: neilpquinn-wmf.

====== stat1005 ======
total 140
drwxrwxrwx 5 12049 wikidev  4096 Mar 30 23:32 Audiences-External_automatic_translation
drwxrwxrwx 6 12049 wikidev  4096 Feb 27 10:09 canonical-data
drwxrwxrwx 9 12049 wikidev  4096 Feb 24 14:13 misc-analysis
-rwxrwxrwx 1 12049 wikidev 88858 May 27 01:08 most-internally-referred.ipynb
drwxrwxrwx 6 12049 wikidev  4096 May 27 00:01 nshahquinn
drwxrwxrwx 3 12049 wikidev  4096 May 27 01:06 population
drwxrwxrwx 5 12049 wikidev  4096 Jul 21  2022 product_analytics_jobs
-rwxrwxrwx 1 12049 wikidev   718 Oct 15  2019 publish-notebook.sh
drwxrwxrwx 2 12049 wikidev  4096 Dec  1  2022 __pycache__
-rwxrwxrwx 1 12049 wikidev   924 May 30 23:41 sandbox.ipynb
drwxrwxrwx 6 12049 wikidev  4096 Feb 10 05:00 wiki-comparison
drwxrwxrwx 6 12049 wikidev  4096 May 25 00:03 Wikistories
drwxrwxrwx 9 12049 wikidev  4096 Dec 15  2022 wmfdata-python
drwxrwxrwx 3 12049 wikidev  4096 Oct 21  2022 wmfdata-sandbox

====== stat1006 ======
total 24704
-rw-rw-r-- 1 12049 wikidev  6368865 Mar 22  2016 baseline_edits.tsv
drwxr-xr-x 6 12049 wikidev     4096 Sep 23  2022 campaigns
-rw-rw-r-- 1 12049 wikidev     2210 Aug  4  2016 mobile_edit_tag_permutations.tsv
-rw-rw-r-- 1 12049 wikidev     2866 Jun 27  2016 monthly_edits.tsv
-rw-r--r-- 1 12049 wikidev  2182230 Apr 21  2016 neilpquinn_VE_experiment_revs.tsv
-rw-r--r-- 1 12049 wikidev     1228 Nov 23  2022 Untitled.ipynb
-rw-rw-r-- 1 12049 wikidev  2536214 Dec 18  2015 VE_experiment_cohort.tsv
-rw-rw-r-- 1 12049 wikidev 14178087 May  3  2016 VE_experiment_revs.tsv
drwxr-xr-x 6 12049 wikidev     4096 Oct 12  2022 wikistories
drwxr-xr-x 7 12049 wikidev     4096 Nov 23  2022 wmfdata-python

====== stat1008 ======
total 4
-rw-r--r-- 1 12049 wikidev 72 Oct  5  2022 Untitled.ipynb

Would you like me to do anything with regard to these files, or are they safe for us to remove?
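
The numeric owner (12049) in the listings above is what ls prints when a UID no longer maps to an account name, which is presumably the case here because the old account was retired. A minimal sketch for picking such entries out of saved `ls -l` output is below. The function name unowned_entries is hypothetical, and it assumes file names without spaces.

```shell
# unowned_entries: read `ls -l` output on stdin and print the names of
# entries whose owner column is a bare numeric UID.
# Hypothetical helper, not part of any existing tooling.
unowned_entries() {
  # Field 3 is the owner; header lines such as "total 4" have too few
  # fields and are skipped. Assumes names contain no whitespace.
  awk 'NF >= 9 && $3 ~ /^[0-9]+$/ { print $NF }'
}

# Example (not executed here):
#   ls -l /home/neilpquinn-wmf | unowned_entries
```

On a live host, GNU find's -nouser test gives the same answer directly from the filesystem.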

I also notice that there are some Hive database directories still belonging to neilpquinn-wmf.

====== Hive =========
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2019-03-19 00:07 /user/hive/warehouse/neilpquinn.db/content_namespaces
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2020-07-21 19:24 /user/hive/warehouse/neilpquinn.db/countries_test
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2019-04-10 23:38 /user/hive/warehouse/neilpquinn.db/editor_month_new
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2019-04-10 00:09 /user/hive/warehouse/neilpquinn.db/editor_month_official
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2021-09-07 20:41 /user/hive/warehouse/neilpquinn.db/kaios_experiment_event
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2020-08-06 11:11 /user/hive/warehouse/neilpquinn.db/kaios_wp_webrequest
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2018-10-18 01:00 /user/hive/warehouse/neilpquinn.db/mob_or_ve_edits
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2020-02-05 22:13 /user/hive/warehouse/neilpquinn.db/new_editors
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2021-01-29 16:09 /user/hive/warehouse/neilpquinn.db/test
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2020-07-30 17:32 /user/hive/warehouse/neilpquinn.db/test_chart
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2018-11-26 21:50 /user/hive/warehouse/neilpquinn.db/wiki_articles
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2019-06-12 15:51 /user/hive/warehouse/neilpquinn.db/wiki_education_foundation_participants
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2021-01-29 15:58 /user/hive/warehouse/neilpquinn.db/wmfdata_test
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2021-01-30 09:49 /user/hive/warehouse/neilpquinn.db/wmfdata_test_1
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2021-01-30 09:49 /user/hive/warehouse/neilpquinn.db/wmfdata_test_2
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2021-01-30 09:50 /user/hive/warehouse/neilpquinn.db/wmfdata_test_3
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2021-02-01 13:28 /user/hive/warehouse/neilpquinn.db/wmfdata_test_4
drwxrwx---   - neilpquinn-wmf         analytics-privatedata-users          0 2023-02-27 19:34 /user/hive/warehouse/nshahquinn.db/wikipediapreview_stats_altered
drwxrwx---   - neilpquinn-wmf         analytics-privatedata-users          0 2023-04-18 02:53 /user/hive/warehouse/nshahquinn.db/wikipediapreview_stats_backup
drwxr-x---   - neilpquinn-wmf         analytics-privatedata-users          0 2023-01-25 02:38 /user/hive/warehouse/nshahquinn.db/wmfdata_test_1

How would you like us to handle these? Are they safe for us to remove, or would you rather that we change the ownership or rename them somehow?
Thanks.

@BTullis thanks for working on this! Everything you mentioned, on the stat servers and in Hive, is safe to delete.

BTullis claimed this task.

Great! Thanks @nshahquinn-wmf

I have removed the POSIX home directories with:

sudo cumin 'C:profile::analytics::cluster::client or C:profile::hadoop::master or C:profile::hadoop::master::standby' 'rm -rf /home/neilpquinn-wmf'

For the Hive files, I checked whether there was a Hive database named neilpquinn, but there was not.

btullis@an-launcher1002:~$ sudo -u hdfs kerberos-run-command hdfs hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hive/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Logging initialized using configuration in file:/etc/hive/conf.analytics-hadoop/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive (nshahquinn)> use neilpquinn;
FAILED: SemanticException [Error 10072]: Database does not exist: neilpquinn
hive (nshahquinn)>

Therefore, I can simply delete the HDFS files that were left over.

btullis@an-launcher1002:~$ sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r /user/hive/warehouse/neilpquinn.db
23/08/14 13:59:31 INFO fs.TrashPolicyDefault: Moved: 'hdfs://analytics-hadoop/user/hive/warehouse/neilpquinn.db' to trash at: hdfs://analytics-hadoop/user/hdfs/.Trash/Current/user/hive/warehouse/neilpquinn.db
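
The check-then-delete step above could be scripted roughly as follows. This is only a sketch: db_absent is a hypothetical helper name, and the usage lines in the comments assume that `hive -S -e 'show databases;'` prints one database name per line on this cluster.

```shell
# db_absent NAME: succeed only if NAME is not among the database names
# supplied on stdin (one per line). Hypothetical helper.
db_absent() {
  # -x matches whole lines, -F treats the name as a fixed string.
  ! grep -qxF -- "$1"
}

# Possible usage (not executed here; capture the list first so that a
# failed hive call is not mistaken for "database absent"):
#   dbs=$(sudo -u hdfs kerberos-run-command hdfs hive -S -e 'show databases;') || exit 1
#   printf '%s\n' "$dbs" | db_absent neilpquinn \
#     && sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r /user/hive/warehouse/neilpquinn.db
```

As the log line shows, -rm without -skipTrash only moves the directory to the hdfs user's trash, so the removal can be undone until the trash is emptied.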

The last three entries are a little more complicated. Two of them appear to be tables owned by neilpquinn-wmf in a database called nshahquinn, which also contains two other tables. The third, wmfdata_test_1, appears to be an orphan: its directory exists in HDFS, but it does not appear in Hive.

hive (default)> use nshahquinn;
OK
Time taken: 1.231 seconds
hive (nshahquinn)> show tables;
OK
tab_name
log4j:ERROR No output stream or file set for the appender named [console].
wikipediapreview_stats_altered
wikipediapreview_stats_backup
wikipediapreview_stats_test
wikis
Time taken: 0.196 seconds, Fetched: 4 row(s)
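
Comparing the table list with the directory names under the warehouse path is how an orphan such as wmfdata_test_1 shows up. A rough sketch of that comparison is below, assuming both lists have already been saved to plain text files with one name per line; find_orphans is a hypothetical name.

```shell
# find_orphans TABLES_FILE DIRS_FILE: print directory names that have no
# matching Hive table name. Hypothetical helper; both files hold one
# name per line.
find_orphans() {
  # -f reads the table names as patterns, -x and -F require exact
  # whole-line matches, and -v keeps only the directories left unmatched.
  grep -vxFf "$1" "$2"
}

# Example (not executed here):
#   find_orphans tables.txt dirs.txt
```

A table with no matching directory (for example one stored outside the warehouse path) is not reported, since only the directory side is checked.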

I will first drop the two tables:

hive (nshahquinn)> drop table wikipediapreview_stats_altered;
OK
Time taken: 0.456 seconds
hive (nshahquinn)> drop table wikipediapreview_stats_backup;
OK
Time taken: 0.123 seconds
hive (nshahquinn)>

... then follow up by deleting any remaining files.

btullis@an-launcher1002:~$ sudo -u analytics kerberos-run-command analytics hdfs dfs -ls /user/hive/warehouse/nshahquinn.db/
Found 2 items
drwxrwx---   - nshahquinn-wmf analytics-privatedata-users          0 2023-08-12 01:06 /user/hive/warehouse/nshahquinn.db/wikis
drwxr-x---   - neilpquinn-wmf analytics-privatedata-users          0 2023-01-25 02:38 /user/hive/warehouse/nshahquinn.db/wmfdata_test_1

btullis@an-launcher1002:~$ sudo -u hdfs kerberos-run-command hdfs hdfs dfs -rm -r /user/hive/warehouse/nshahquinn.db/wmfdata_test_1
23/08/14 14:06:33 INFO fs.TrashPolicyDefault: Moved: 'hdfs://analytics-hadoop/user/hive/warehouse/nshahquinn.db/wmfdata_test_1' to trash at: hdfs://analytics-hadoop/user/hdfs/.Trash/Current/user/hive/warehouse/nshahquinn.db/wmfdata_test_1

I think that's all done now. Please do let me know if anything seems amiss @nshahquinn-wmf