Push dumps.wm.o logs files to stat1002
Closed, Resolved (Public)

Description

Previously an access request; now see the title.

Event Timeline

Addshore raised the priority of this task to Medium.
Addshore updated the task description.
Addshore added a subscriber: Addshore.

The logs are on dataset1001, but we should really be copying them off somewhere else like all apache logs. Do you have access to logs on another host?

I have access to fluorine which contains mediawiki logs and udp2log.
Also the stat / analytics cluster.

Copying to either of those locations IMO would be good.

Well, these should end up on fluorine like everything else. Let me look into how that works (or someone who knows can tell me now).

I'd rather put 'em on fluorine like the mw apache logs.

I don't think any apache logs end up on fluorine
Well it looks like apache error logs end up on fluorine, but not access logs

Ah I see you are right; well then the rsync is fine.

Change 253594 had a related patch set uploaded (by ArielGlenn):
keep fewer dataset web server logs, add date to filename

https://gerrit.wikimedia.org/r/253594

Need to change the file name format for these logs, otherwise it's going to be very annoying for you on the other end of that rsync. See above patchset.
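To illustrate why the name format matters for the rsync: with numeric rotation suffixes the same day's log gets a new name at every rotation, so the receiving side sees it as a new file each time, while a date in the name stays stable. A minimal sketch, assuming a base name of `access.log` and a `-YYYYMMDD` suffix; both are placeholders for illustration, not the format used in the patchset.

```python
from datetime import date


def numbered_name(base: str, log_day: date, today: date) -> str:
    """Name of a rotated log under numeric rotation (.1 is the newest).

    The suffix is the age in days, so it changes every time logs rotate.
    """
    age = (today - log_day).days
    return f"{base}.{age}"


def dated_name(base: str, log_day: date) -> str:
    """Name of a rotated log with the date in the filename.

    The suffix format (-YYYYMMDD) is an assumption for illustration.
    """
    return f"{base}-{log_day.strftime('%Y%m%d')}"


if __name__ == "__main__":
    log_day = date(2015, 11, 16)
    for today in (date(2015, 11, 17), date(2015, 11, 18)):
        print(today, numbered_name("access.log", log_day, today),
              dated_name("access.log", log_day))
```

With dated names, a file that has already been copied keeps its name and can be skipped on the next run.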

After looking at the other rsyncs you do (erbium, oxygen), and considering the other syncs the dataset hosts do (datasets downloadable to the public), can datasets push to stat1002 rather than the other way around? We could add that right in the dumps module; the other way, it winds up in the dataset module with the other rsyncs, which doesn't feel clean to me. Also, if you wind up doing this for logs on any other hosts, auth/rsyncd config is centralized on your end instead of spread out on the other hosts.
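For what the push direction could look like: the dataset host runs rsync as the client and writes to an rsync daemon module on the receiving host, so the rsyncd config and auth live on the receiving side. This is only a rough sketch; the source directory, the destination host name, the module name and the subdirectory below are hypothetical placeholders, not the values from the actual puppet change.

```python
import shlex
from typing import List


def build_push_command(source_dir: str, dest_host: str,
                       dest_module: str, dest_subdir: str) -> List[str]:
    """Build an rsync client command that pushes a log directory
    to an rsync daemon module (HOST::MODULE/path syntax).

    All argument values are supplied by the caller; nothing here is
    taken from the real configuration.
    """
    # Trailing slash on the source: copy the directory contents,
    # not the directory itself.
    src = source_dir.rstrip("/") + "/"
    dest = f"{dest_host}::{dest_module}/{dest_subdir.strip('/')}/"
    # -a: archive mode (recursive, preserves times and permissions).
    return ["rsync", "-a", src, dest]


if __name__ == "__main__":
    cmd = build_push_command(
        "/srv/example/weblogs",      # hypothetical source directory
        "stat-host.example",         # hypothetical destination host
        "example_module",            # hypothetical rsyncd module
        "dumps-logs",                # hypothetical subdirectory
    )
    # Print the command line that a cron job could run.
    print(shlex.join(cmd))
```

For a pull, the same command would run on the receiving host with source and destination swapped, and the rsync daemon would have to be configured on each dataset host instead.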

I am fine with doing it either way :)
Someone from the analytics team may also have an opinion though!

Addshore renamed this task from "Requesting access to dataset-admins for Addshore" to "Push dumps.wm.o logs files to stat1002". Nov 19 2015, 2:00 PM
Addshore updated the task description.
Addshore removed a project: SRE-Access-Requests.

Changed the title and removed the access requests project per the discussion here.

Since no one from analytics noticed (silence = consent), I'll go ahead and do this the way described above.

Change 253594 merged by ArielGlenn:
keep fewer dataset web server logs, add date to filename

https://gerrit.wikimedia.org/r/253594

Any ETA on when we could start the rsync @ArielGlenn ?

Really? The bot didn't add the changeset to this ticket? Well it's this: https://gerrit.wikimedia.org/r/#/c/268129/ for the class, needs some cleanup and then to be called with the right destination. Where should they land exactly?

Hey sorry, I don't think I've seen this ticket before, hence the silence! I just commented on the change about pull vs. push.

After a ridiculous amount of help from @Ottomata (thank you!) this is now live, and a manual run of the cron job from the command line worked as expected, so closing.

By the looks of things this should be live but I don't see the logs in the location!

/a/log/webrequest/archive/dumps.wikimedia.org on stat1002 is full of them. Are you looking in the right place?

Ahh there you go! I was looking in the wrong place!!!!!!