Today I noticed an Icinga warning about HDFS space used; we have crossed the 2PB mark:
https://grafana.wikimedia.org/d/000000585/hadoop?orgId=1&panelId=25&fullscreen&from=now-90d&to=now
The last 90d view shows that something changed during the past month, more or less from the last week of October onwards.
elukey@stat1004:~$ sudo -u hdfs hdfs dfs -du -h /
24       768 M    /system
1.2 T    3.7 T    /tmp
66.0 T   198.6 T  /user
73.3 T   219.9 T  /var
547.2 T  1.6 P    /wmf
The /user dir contains some big home dirs:
1.3 T   /user/mforns
1.4 T   /user/otto
2.5 T   /user/west1
2.9 T   /user/nuria
4.4 T   /user/ebernhardson
5.7 T   /user/ezachte
6.0 T   /user/nathante
7.6 T   /user/dsaez
9.5 T   /user/milimetric
9.8 T   /user/halfak
14.7 T  /user/piccardi
16.4 T  /user/joal
23.6 T  /user/leila
26.7 T  /user/druid
61.4 T  /user/hive
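For anyone who wants to reproduce a ranking like the one above, here is a minimal sketch. The helper name `top_dirs` is hypothetical, and it assumes `hdfs dfs -du` (without `-h`, so sizes are raw bytes) prints three columns: size, size including replicas, and path. Older Hadoop versions may print only two columns, in which case the field numbers need adjusting. Paths containing spaces are not handled.

```shell
# Hypothetical helper: rank directories by on-disk (replicated) size.
# Reads `hdfs dfs -du` style lines from stdin:
#   <bytes> <bytes_including_replicas> <path>
# Optional first argument: how many entries to show (default 15).
top_dirs() {
  sort -k2,2nr \
    | head -n "${1:-15}" \
    | awk '{ printf "%.1f T\t%s\n", $2 / (1024 ^ 4), $3 }'
}

# Example usage (needs HDFS access, so it is left commented out):
#   sudo -u hdfs hdfs dfs -du /user | top_dirs 15
```

Sorting on the raw byte count avoids having to compare human-readable units such as "T" and "P".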
The /var dir contains the YARN application logs (/var/log/hadoop-yarn/apps).
Space used seems to have grown by ~300T of replicated data over the past month (~1.7PB to ~2PB), i.e. ~100T unreplicated at the replication factor of 3 that the -du output above suggests. Since usage is still trending upward, let's figure out what is causing it.
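The arithmetic behind that estimate, as a small sketch. The function name `unreplicated_growth` is hypothetical, the inputs are the rounded figures from the graph in decimal terabytes, and the replication factor of 3 is an assumption inferred from the -du ratios rather than read from the cluster config.

```shell
# Hypothetical helper: estimate unreplicated growth from replicated totals.
# Arguments: <before_tb> <after_tb> <replication_factor>
unreplicated_growth() {
  # Integer arithmetic is enough for rounded estimates like these.
  echo $(( ($2 - $1) / $3 ))
}

# ~1.7PB -> ~2PB with an assumed replication factor of 3
unreplicated_growth 1700 2000 3
```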