
Add trash folder to hadoop
Closed, Resolved · Public

Event Timeline

Nuria triaged this task as High priority.
Nuria added a project: Analytics-Kanban.

Change 423156 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet/cdh@master] cdh::hadoop: add the config support for HDFS Trash

https://gerrit.wikimedia.org/r/423156

Tested the two values that I've set in the above patch in labs:

elukey@hadoop-master-1:~$ hdfs dfs -rm -r -f /user/elukey/.sparkStaging
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
18/03/30 13:28:03 INFO fs.TrashPolicyDefault: Moved: 'hdfs://analytics-hadoop-labs/user/elukey/.sparkStaging' to trash at: hdfs://analytics-hadoop-labs/user/elukey/.Trash/Current/user/elukey/.sparkStaging
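For completeness, a restore is just a move back out of the trash, since the trash mirrors the original directory layout under .Trash/Current. A minimal sketch using the path from the test above:

# Move the deleted directory back to its original location
hdfs dfs -mv /user/elukey/.Trash/Current/user/elukey/.sparkStaging /user/elukey/.sparkStaging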

So here's how they work, from https://developer.ibm.com/hadoop/2015/10/22/hdfs-trash:

Deletion interval specifies how long (in minutes) a checkpoint is kept before it expires and is deleted. It is the value of fs.trash.interval. The NameNode runs a thread to periodically remove expired checkpoints from the file system.

Emptier interval specifies how long (in minutes) the NameNode waits between runs of the thread that manages checkpoints. On each run the NameNode deletes checkpoints older than fs.trash.interval and creates a new checkpoint from /user/${username}/.Trash/Current. This frequency is determined by the value of fs.trash.checkpoint.interval, and it must not be greater than the deletion interval; this ensures that every emptier window contains one or more checkpoints in the trash.
For example, set

fs.trash.interval = 360 (deletion interval = 6 hours)
fs.trash.checkpoint.interval = 60 (emptier interval = 1 hour)

This causes the NameNode to create a new checkpoint every hour and to delete checkpoints that have existed longer than 6 hours.
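A quick way to double-check what a client actually picked up is the standard getconf subcommand (a sketch; the keys are the two properties above, values are in minutes):

# Effective trash settings as seen by this client's configuration
hdfs getconf -confKey fs.trash.interval
hdfs getconf -confKey fs.trash.checkpoint.interval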

In my case I've set the checkpoint interval to 5 minutes and the deletion interval to 30 minutes; this is the status after a few minutes:

elukey@hadoop-master-1:~$ hdfs dfs -ls /user/elukey/.Trash
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF-8
Found 1 items
drwx------   - elukey hdfs          0 2018-03-30 13:28 /user/elukey/.Trash/180330133000
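The timestamped directory name encodes the checkpoint time (yyMMddHHmmss). For reference, the standard CLI can also rotate checkpoints on demand instead of waiting for the emptier thread (a sketch, not something we need for the rollout):

# Roll .Trash/Current into a new timestamped checkpoint and
# remove checkpoints older than fs.trash.interval
hdfs dfs -expunge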

Let's discuss these values with the team. I was thinking that the trash should persist for several days, but that the drops we do via cron (retention) should delete skipping the trash, which I think should be possible.
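For reference, the CLI already supports bypassing the trash per invocation, which is what the retention jobs would use (the path below is hypothetical):

# Delete permanently, bypassing the trash (hypothetical path)
hdfs dfs -rm -r -skipTrash /user/elukey/some_old_data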

Change 423156 merged by Elukey:
[operations/puppet/cdh@master] cdh::hadoop: add the config support for HDFS Trash

https://gerrit.wikimedia.org/r/423156

Change 423613 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] profile::hadoop::common: allow to enable/disable the HDFS trash

https://gerrit.wikimedia.org/r/423613

Change 423613 merged by Elukey:
[operations/puppet@production] profile::hadoop::common: allow to enable/disable the HDFS trash

https://gerrit.wikimedia.org/r/423613

Change 423844 had a related patch set uploaded (by Elukey; owner: Elukey):
[analytics/refinery@master] Append '-skipTrash' to all the hdfs -rm invocations

https://gerrit.wikimedia.org/r/423844

Change 423845 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet/cdh@master] Add the -skipTrash option to hdfs -rm

https://gerrit.wikimedia.org/r/423845

Change 423845 merged by Elukey:
[operations/puppet/cdh@master] Add the -skipTrash option to hdfs -rm

https://gerrit.wikimedia.org/r/423845

Change 423844 merged by Elukey:
[analytics/refinery@master] Append '-skipTrash' to all the hdfs -rm invocations

https://gerrit.wikimedia.org/r/423844

Change 424237 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::analytics_cluster::hadoop:master|standby: enable HDFS trash

https://gerrit.wikimedia.org/r/424237

Change 424237 merged by Elukey:
[operations/puppet@production] role::analytics_cluster::hadoop:master|standby: enable HDFS trash

https://gerrit.wikimedia.org/r/424237

Mentioned in SAL (#wikimedia-operations) [2018-04-11T16:44:14Z] <elukey> restart hadoop hdfs namenodes on analytics100[12] to pick up HDFS Trash settings - T189051

Added documentation to https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster#recover_files_deleted_by_mistake_using_the_hdfs_CLI_rm_command?; the last step is to send an email to analytics@ (and possibly research and engineering?) to announce the new feature.