Clean up home dirs for users jamesur and nithum
Closed, Resolved · Public · 3 Story Points

Description

The users jamesur and nithum have been offboarded from one or more analytics groups. We need to check whether they left any data in their home directories on the stat or notebook hosts, or on HDFS (/user/ and Hive databases).
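These checks can be scripted. Below is a minimal Python sketch. It assumes passwordless `ssh` to each host, home directories under `/srv/home/<user>` (the path that appears in the status report further down this task), and an `hdfs` client on the machine running it. The host list and the `check_commands`/`run_checks` helpers are hypothetical, not the team's actual tooling.

```python
import subprocess

# Hypothetical host list: the stat/notebook hosts checked later in this task.
HOSTS = ["stat1004", "stat1006", "stat1007", "notebook1003", "notebook1004"]
HIVE_WAREHOUSE = "/user/hive/warehouse"


def check_commands(user, hosts=HOSTS):
    """Return (label, argv) pairs covering every place a user may leave data."""
    checks = [(host, ["ssh", host, "ls", "-l", f"/srv/home/{user}"]) for host in hosts]
    # HDFS home directory.
    checks.append(("HDFS", ["hdfs", "dfs", "-ls", f"/user/{user}"]))
    # Hive databases are <name>.db directories in the warehouse; list them all
    # so they can be filtered by owner afterwards.
    checks.append(("Hive", ["hdfs", "dfs", "-ls", HIVE_WAREHOUSE]))
    return checks


def run_checks(user, hosts=HOSTS, run=subprocess.run):
    """Run each check and collect its combined stdout/stderr, keyed by label."""
    report = {}
    for label, argv in check_commands(user, hosts):
        result = run(argv, capture_output=True, text=True)
        report[label] = result.stdout + result.stderr
    return report
```

Printing each entry of `run_checks("nithum")` under a `====== label ======` header gives a report in the same shape as the status update later in this task. The `run` parameter is there only so the sketch can be exercised without real hosts.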

Event Timeline

elukey triaged this task as Normal priority. · Dec 17 2018, 2:53 PM
elukey created this task.
Restricted Application added a subscriber: Aklapper. · Dec 17 2018, 2:53 PM
fdans claimed this task. · Dec 17 2018, 5:40 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.
fdans added a comment. · Dec 18 2018, 2:25 PM

Stuff to delete for both users:

jamesur:

  • home dir in stat1005
  • home dir in hdfs
  • database jamesur in hive (1 table)

nithum:

  • home dir in stat1004
  • home dir in stat1005
  • home dir in stat1007
  • home dir in notebook1003
  • home dir in notebook1004
  • home dir in hdfs
  • database nithum in hive (1 table)

Since this is the first time we are doing this, let's also document the process in https://wikitech.wikimedia.org/wiki/Analytics/Team/Oncall before closing the task.

If you haven't already, is there any chance you could ship my stat1005 stuff over to @jrbs (shell name foks)? It would let him use the templates I've already created for future lookups, and double-check that none of my files still need to be kept for legal reasons (all of that should already be in the safe, but just as a double check). If it's too hard or already deleted, it's not the end of the world.

@Jalexander done! There is now a directory called jamesur on stat1007 (stat1005 is deprecated) in foks's home directory, owned by `foks:root` and readable, writable and executable only by him. Is there anything else that you want to keep? Is the Hive db important?

Thanks :)
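The hand-off above (copy the old home directory into the new owner's home, give it to `foks:root`, and make it read/write/execute for the owner only, i.e. mode 0700) can be sketched in Python as follows. This is a hypothetical helper, not the command that was actually run. It assumes the files are already on the target host, and it skips the ownership change, which requires root, when no owner is given.

```python
import os
import shutil
import stat


def hand_off_home(src, dest_parent, new_owner=None, new_group=None):
    """Copy the directory src into dest_parent and lock the copy to its new owner.

    Hypothetical helper. Changing ownership requires root, so that step is
    skipped when new_owner is None (useful for a dry run as a normal user).
    Returns the new path and its permission string.
    """
    dest = os.path.join(dest_parent, os.path.basename(os.path.normpath(src)))
    shutil.copytree(src, dest)
    if new_owner is not None:
        # Hand every copied file and directory to the new owner, e.g. foks:root.
        shutil.chown(dest, user=new_owner, group=new_group)
        for root, dirs, files in os.walk(dest):
            for name in dirs + files:
                shutil.chown(os.path.join(root, name), user=new_owner, group=new_group)
    # 0700 on the top-level directory: rwx for the owner, nothing for group or
    # others, which keeps other (non-root) users out of the whole tree.
    os.chmod(dest, 0o700)
    return dest, stat.filemode(os.stat(dest).st_mode)
```

Run as root, a call such as `hand_off_home("/srv/home/jamesur", "/srv/home/foks", "foks", "root")` would produce a `drwx------ foks root` directory, assuming the `/srv/home` layout seen elsewhere in this task.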

Perfect, thanks! Nope, the other two can happily be deleted :).

elukey added a subscriber: leila. · Jan 24 2019, 2:38 PM

@leila hello :)

Do you know anything about Nithum Thain? As far as I can see, he worked with Ellery on some projects (T157724).

Thanks!

leila added a comment. · Jan 24 2019, 2:44 PM

@elukey yes. Emailing him now to get clearance to delete his home directory.

We got an answer back: green light to delete nithum's data.

elukey added a comment. · Feb 5 2019, 5:50 PM

Current status:

  • jamesur
====== stat1004 ======
ls: cannot access '/srv/home/jamesur': No such file or directory

====== stat1006 ======
ls: cannot access '/srv/home/jamesur': No such file or directory

====== stat1007 ======
ls: cannot access '/srv/home/jamesur': No such file or directory

====== notebook1003 ======
ls: cannot access '/srv/home/jamesur': No such file or directory

====== notebook1004 ======
ls: cannot access '/srv/home/jamesur': No such file or directory

======= HDFS ========
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
ls: `/user/jamesur': No such file or directory

====== Hive =========
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
drwxrwxrwt   - jamesur          hadoop          0 2016-01-12 02:49 /user/hive/warehouse/jamesur.db

  • nithum
====== stat1004 ======
total 0

====== stat1006 ======
ls: cannot access '/srv/home/nithum': No such file or directory

====== stat1007 ======
total 40
drwxrwxr-x 3 15708 wikidev  4096 Mar 24  2017 detox
-rw-rw-r-- 1 15708 wikidev 32101 Feb 23  2017 QueryResult.java
-rw-rw-r-- 1 15708 wikidev    13 Feb 23  2017 sqoop.password

====== notebook1003 ======
total 0

====== notebook1004 ======
total 4
drwxr-xr-x 7 15708 wikidev 4096 Jul  5  2018 venv

======= HDFS ========
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Found 3 items
drwx------   - nithum nithum          0 2017-03-24 17:57 /user/nithum/.staging
drwxr-xr-x   - nithum nithum          0 2017-06-05 18:15 /user/nithum/talk_diff_external

====== Hive =========
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
drwxrwxrwt   - nithum           hadoop          0 2017-02-23 19:23 /user/hive/warehouse/nithum.db
drwxrwxrwt   - nithum           hadoop          0 2017-03-24 17:57 /user/hive/warehouse/tlwiki.db
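The Hive section appears to list warehouse directories by owner rather than by name, which would explain why tlwiki.db shows up next to nithum.db. Below is a minimal Python sketch of such an owner filter. It assumes the usual `hdfs dfs -ls` column layout (permissions, replication, owner, group, size, date, time, path). `owned_by` is a hypothetical helper.

```python
def owned_by(listing, user):
    """Return the paths in `hdfs dfs -ls` output whose owner column is `user`."""
    paths = []
    for line in listing.splitlines():
        # Entries have 8 whitespace-separated columns: permissions, replication,
        # owner, group, size, date, time, path. maxsplit keeps a path containing
        # spaces intact; banner lines such as "Found 3 items" have fewer columns.
        fields = line.split(maxsplit=7)
        if len(fields) == 8 and fields[2] == user:
            paths.append(fields[7])
    return paths
```

Filtering on the owner column rather than on the database name is the point. A check that looked only for `nithum.db` would have missed `tlwiki.db`.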

elukey removed fdans as the assignee of this task. · Feb 5 2019, 6:36 PM
elukey added a subscriber: fdans.

Zooming in on tlwiki.db:

checking table: tlwiki.logging
location:hdfs://analytics-hadoop/user/hive/warehouse/tlwiki.db/logging,

checking table: tlwiki.page
location:hdfs://analytics-hadoop/user/hive/warehouse/tlwiki.db/page,

checking table: tlwiki.revision
location:hdfs://analytics-hadoop/user/hive/warehouse/tlwiki.db/revision,

checking table: tlwiki.user
location:hdfs://analytics-hadoop/user/hive/warehouse/tlwiki.db/user,
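Before dropping a database, it is worth confirming that every table's location sits inside the database directory, so the drop neither touches nor leaves behind data elsewhere. Below is a minimal Python sketch that parses the `checking table:` / `location:` lines above. `table_locations` and `tables_outside` are hypothetical helpers, and the input format is assumed from this report only.

```python
def table_locations(report):
    """Map each table name to its location from 'checking table:'/'location:' lines."""
    locations = {}
    table = None
    for raw in report.splitlines():
        line = raw.strip()
        if line.startswith("checking table:"):
            table = line[len("checking table:"):].strip()
        elif line.startswith("location:") and table is not None:
            # Drop the trailing comma left over from the table description.
            locations[table] = line[len("location:"):].rstrip(",")
            table = None
    return locations


def tables_outside(locations, db_dir):
    """Return the tables whose location is not under db_dir, sorted by name."""
    prefix = db_dir.rstrip("/") + "/"
    return sorted(t for t, loc in locations.items() if not loc.startswith(prefix))
```

For the four tlwiki tables above, `tables_outside(..., "hdfs://analytics-hadoop/user/hive/warehouse/tlwiki.db")` returns an empty list: all of their data lives under tlwiki.db.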

The db seems to be owned by user nithum, so I'd drop it. Comments?

More info on this db: as the name suggests, it contains data sqooped from tlwiki (the number of revisions is consistent with a recent snapshot). The data format is not optimal (Hive-oriented, not even Avro), and the data is old compared to what we currently provide. I suggest we drop it.

Everything cleaned up!

elukey claimed this task. · Feb 11 2019, 12:49 PM
elukey added a project: Analytics-Kanban.
elukey set the point value for this task to 3.
elukey moved this task from Next Up to Done on the Analytics-Kanban board.
Nuria closed this task as Resolved. · Feb 14 2019, 5:09 AM