
Archive /home/ezachte data on stat1007
Open, Medium, Public


/home/ezachte on stat1007 is 687G. Most of that is in

128G	./wikistats_backup
557G	./wikistats_data

Assuming this data isn't updated or rsynced to anymore, can we either delete or archive it in HDFS?
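A minimal sketch for checking the "isn't updated anymore" assumption before archiving. The function name `count_recent_files` and the 30-day window are illustrative choices, not something taken from this task.

```shell
#!/usr/bin/env bash
# Count regular files under a directory that were modified within the
# last N days. A result of 0 suggests nothing is still writing there.
# Note: file names containing newlines would be counted more than once.
count_recent_files() {
    local dir="$1" days="$2"
    find "$dir" -type f -mtime "-$days" -print | wc -l | tr -d ' '
}

# Example usage (paths from the task description; 30 days is an assumption):
#   count_recent_files /home/ezachte/wikistats_data 30
#   count_recent_files /home/ezachte/wikistats_backup 30
```

If both counts are 0, the directories are probably safe to archive, though a cron job that only reads from them would not show up in this check.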

Event Timeline

Ottomata created this task. Nov 13 2019, 6:46 PM
Restricted Application added a subscriber: Aklapper. Nov 13 2019, 6:46 PM
Ottomata renamed this task from Archive /home/ezacthe data on stat1007 to Archive /home/ezachte data on stat1007. Nov 14 2019, 6:25 PM
fdans triaged this task as Medium priority. Nov 14 2019, 6:33 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.
fdans added a subscriber: fdans.

Let's archive this in HDFS

I have just reapplied for server access with John Bond.
I was supposed to add the new public key myself at, but I can't even view that ticket as Erik_Zachte (ezachte).
Once I'm back online I will review the folders mentioned here, and comment.

This took a while, as I was totally focused on OpenStreetMap this summer (doing field surveys). :-)

Ah hi Erik! Ok thank you!

@Erik_Zachte Hi! Gentle ping to see if you have time to review the files during the next days :)

@elukey Hi! I'll get to this in coming days. Thanks for your patience.

So I first looked into the cron jobs that are still enabled under /home/ezachte. There are two.

One is running fine (compressing page view counts into daily/monthly zips for 3rd parties).

The other one runs fine up to the rsync step, which fails, so its output hasn't been published since March 26.
See e.g.

I copied a small part of the bash file to home/ezachte/wikistats/

Sun Dec 15 12:29:53 UTC 2019
+ cd /home/ezachte/wikistats/dumps/perl
+ rsync -av -ipv4 /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthly.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyAllProjects.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyAllProjectsOriginal.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyCombined.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyMobile.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyOriginal.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyOriginalCombined.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyOriginalMobile.htm
opening tcp connection to thorium.eqiad.wmnet port 873
sending daemon args: --server -vvlogDtpre.iLsfxC "--log-format=%i" .
EN (5 args)
@Error: Unknown module ''
rsync error: error starting client-server protocol (code 5) at main.c(1666) [sender=3.1.2]
+ exit

Any suggestion?
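The error message reports an empty module name (`Unknown module ''`). One possible cause, which is an assumption since the destination argument is not visible in the trace above, is that the module part of a `HOST::MODULE/PATH` destination expands to nothing, for example through an unset variable in the script. A small sketch of a guard that could run before the rsync call; `rsync_module`, `host.example` and `stats` are hypothetical names:

```shell
#!/usr/bin/env bash
# Extract the module name from an rsync daemon destination written in
# double-colon form: HOST::MODULE/PATH.
# Prints the module on success; returns non-zero if the destination has
# no "::" or if the module part is empty.
rsync_module() {
    local dest="$1"
    case "$dest" in
        *::*) ;;            # looks like daemon syntax, continue
        *) return 1 ;;      # not a HOST::MODULE destination
    esac
    local rest="${dest#*::}"    # strip the "HOST::" prefix
    local module="${rest%%/*}"  # keep the text before the first "/"
    [ -n "$module" ] || return 1
    printf '%s\n' "$module"
}

# Example usage (hypothetical destination):
#   dest="host.example::stats/EN"
#   rsync_module "$dest" >/dev/null || { echo "empty rsync module in: $dest" >&2; exit 1; }
```

Running the script with `set -u` would also make an unset variable fail early instead of silently producing an empty module.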