Archive /home/ezachte data on stat1007
Open, Medium · Public

Description

/home/ezachte on stat1007 is 687G. Most of that is in two directories:

128G	./wikistats_backup
557G	./wikistats_data

Assuming this data isn't updated or rsynced to stats.wikimedia.org anymore, can we either delete or archive it in HDFS?

Event Timeline

Ottomata created this task. · Nov 13 2019, 6:46 PM
Restricted Application added a subscriber: Aklapper. · Nov 13 2019, 6:46 PM
Ottomata renamed this task from Archive /home/ezacthe data on stat1007 to Archive /home/ezachte data on stat1007. · Nov 14 2019, 6:25 PM
fdans triaged this task as Medium priority. · Nov 14 2019, 6:33 PM
fdans moved this task from Incoming to Operational Excellence on the Analytics board.
fdans added a subscriber: fdans.

Let's archive this in HDFS.

I have just reapplied for server access with John Bond.
I was supposed to add the new public key myself at https://phabricator.wikimedia.org/T215790, but I can't even view that ticket as Erik_Zachte (ezachte).
Once I'm back online I will review the folders mentioned here and comment.

This took a while, as I was totally focused on OpenStreetMap this summer (doing field surveys). :-)

Ah hi Erik! Ok thank you!

@Erik_Zachte Hi! Gentle ping to see if you have time to review the files in the next few days :)

@elukey Hi! I'll get to this in the coming days. Thanks for your patience.

I first looked into the cron jobs that are still enabled under /home/ezachte. There are two.

One is running fine (compressing page view counts into daily/monthly zips for 3rd parties).

The other runs fine up to the rsync step, which fails, so its output hasn't been published since March 26.
See e.g. https://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm

I copied a small part of the bash file to /home/ezachte/wikistats/dammit.lt/bash/test_rsync.sh and ran it. This is the trace:

Sun Dec 15 12:29:53 UTC 2019
+ cd /home/ezachte/wikistats/dumps/perl
+ rsync -av -ipv4 /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthly.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyAllProjects.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyAllProjectsOriginal.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyCombined.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyMobile.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyOriginal.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyOriginalCombined.htm /home/ezachte/wikistats_data/dumps/out/out_wp/EN/TablesPageViewsMonthlyOriginalMobile.htm thorium.eqiad.wmnet::stats.wikimedia.org/htdocsEN
opening tcp connection to thorium.eqiad.wmnet port 873
sending daemon args: --server -vvlogDtpre.iLsfxC "--log-format=%i" . stats.wikimedia.org/htdocs
EN (5 args)
@Error: Unknown module 'stats.wikimedia.org'
rsync error: error starting client-server protocol (code 5) at main.c(1666) [sender=3.1.2]
+ exit

Any suggestion?
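
For reference when reading the error above: in rsync's daemon syntax (HOST::MODULE/PATH), the first path component after the double colon is treated as a module name, so the daemon on thorium is being asked for a module literally called stats.wikimedia.org. The sketch below only illustrates that split; parse_rsync_dest is a hypothetical helper written for this note, not part of rsync or of the wikistats scripts, and nothing in this thread says which modules thorium actually exports.

```shell
#!/bin/sh
# Hypothetical helper: split an rsync daemon destination of the form
# HOST::MODULE/PATH into its three parts and print them on one line.
parse_rsync_dest() (
  dest=$1
  host=${dest%%::*}      # everything before the first '::'
  rest=${dest#*::}       # everything after the first '::'
  module=${rest%%/*}     # first path component is the module name
  case $rest in
    */*) path=${rest#*/} ;;   # remainder after the module, if any
    *)   path= ;;
  esac
  printf 'host=%s module=%s path=%s\n' "$host" "$module" "$path"
)

# The destination used in the failing rsync call from the trace:
parse_rsync_dest 'thorium.eqiad.wmnet::stats.wikimedia.org/htdocsEN'
```

Running rsync with only the host and a trailing double colon (rsync thorium.eqiad.wmnet::) asks the daemon to list the modules it exports, which would show whether the module was renamed or removed.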