
Push dumps.wm.o logs files to stat1002
Closed, Resolved, Public


Previously an access request; now see the title.

Event Timeline

Addshore created this task. Nov 16 2015, 3:52 PM
Addshore raised the priority of this task from to Medium.
Addshore updated the task description.
Addshore added a subscriber: Addshore.
Restricted Application added a subscriber: Aklapper. Nov 16 2015, 3:52 PM
Addshore set Security to None. Nov 16 2015, 3:53 PM
Addshore added a subscriber: Deskana.
hoo added a subscriber: hoo. Nov 16 2015, 3:54 PM
Addshore updated the task description. Nov 16 2015, 4:02 PM

The logs are on dataset1001, but we should really be copying them off somewhere else like all apache logs. Do you have access to logs on another host?

I have access to fluorine which contains mediawiki logs and udp2log.
Also the stat / analytics cluster.

Copying to either of those locations IMO would be good.

Well, these should end up on fluorine like everything else. Let me look into how that works (or someone who knows can tell me now).

I'd rather put 'em on fluorine like the mw apache logs.

Addshore added a comment. Edited Nov 16 2015, 9:23 PM

I don't think any apache logs end up on fluorine.
Well, it looks like apache error logs end up on fluorine, but not access logs.

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Nov 17 2015, 9:54 AM

Ah I see you are right; well then the rsync is fine.

Change 253594 had a related patch set uploaded (by ArielGlenn):
keep fewer dataset web server logs, add date to filename

We need to change the file name format for these logs, otherwise it's going to be very annoying for you on the other end of that rsync. See the patch set above.
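To illustrate why the file name format matters for an rsync consumer, here is a minimal sketch. It assumes the old logs used numbered rotation (`access.log.1`, `access.log.2`, ...) and the new ones carry a date suffix; the base name `access.log`, the suffix format, and all function names are hypothetical, not taken from the patch set. With numbered rotation every file changes content each day, so each sync has to re-transfer all of them; with dated names only the newest file is new.

```python
from datetime import date, timedelta


def numbered_layout(base, newest, kept):
    """Map file name -> date of the log it holds, under numbered rotation.

    base.1 always holds the newest rotated log, base.2 the one before, etc.
    """
    return {f"{base}.{i + 1}": newest - timedelta(days=i) for i in range(kept)}


def dated_layout(base, newest, kept):
    """Map file name -> date of the log it holds, under dated rotation.

    Each file keeps the date in its name, so its content never changes.
    """
    days = (newest - timedelta(days=i) for i in range(kept))
    return {f"{base}-{d:%Y%m%d}": d for d in days}


def needs_transfer(before, after):
    """Names in `after` that are new or hold different content than in `before`."""
    return sorted(name for name, held in after.items() if before.get(name) != held)


day1, day2 = date(2015, 11, 18), date(2015, 11, 19)

# Numbered rotation: every name now points at a different day's log.
print(needs_transfer(numbered_layout("access.log", day1, 3),
                     numbered_layout("access.log", day2, 3)))

# Dated rotation: only the newly rotated file has to be copied.
print(needs_transfer(dated_layout("access.log", day1, 3),
                     dated_layout("access.log", day2, 3)))
```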

After looking at the other rsyncs you do (erbium, oxygen), and considering the other syncs the dataset hosts do (datasets downloadable to the public), can datasets push to stat1002 rather than the other way around? We could add that right in the dumps module; the other way, it winds up in the dataset module with the other rsyncs, which doesn't feel clean to me. Also, if you wind up doing this for logs on any other hosts, auth/rsyncd config is centralized on your end instead of spread out on the other hosts.
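For reference, a push from the dataset host could look roughly like the sketch below. This is only an illustration under assumptions: the source directory, the rsync daemon module name, and the `*.gz` pattern are hypothetical placeholders, and the real job lives in puppet rather than in Python. The point is that with a push, the dataset host only needs to know one destination (`host::module`), while the rsyncd configuration and its access rules sit on stat1002.

```python
import shlex


def build_push_command(source_dir, dest_host, dest_module, pattern="*.gz"):
    """Build the argv for pushing matching log files to an rsync daemon module.

    -r recurses into the source directory, -t preserves modification times,
    and the include/exclude pair restricts the transfer to `pattern`.
    """
    return [
        "rsync",
        "-rt",
        f"--include={pattern}",
        "--exclude=*",
        f"{source_dir.rstrip('/')}/",      # trailing slash: copy the directory contents
        f"{dest_host}::{dest_module}/",     # double colon: rsync daemon module syntax
    ]


# Hypothetical source path and module name, for illustration only.
cmd = build_push_command("/var/log/example-webserver", "stat1002", "example-module")
print(shlex.join(cmd))
```

A cron entry on the dataset host would then run the resulting command once per rotation period.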

I am fine with doing it either way :)
Someone from the analytics team may also have an opinion though!

Addshore renamed this task from Requesting access to dataset-admins for Addshore to Push dumps.wm.o logs files to stat1002. Nov 19 2015, 2:00 PM
Addshore updated the task description.
Addshore removed a project: SRE-Access-Requests.

Changed the title and removed the access request project, per the discussion here.

Any progress here?

Since no one from analytics noticed (silence = consent), I'll go ahead and do this the way described above.

ArielGlenn moved this task from Backlog to Up Next on the Datasets-General-or-Unknown board.

Change 253594 merged by ArielGlenn:
keep fewer dataset web server logs, add date to filename

Any ETA on when we could start the rsync @ArielGlenn ?

Really? The bot didn't add the changeset to this ticket? Well it's this: for the class, needs some cleanup and then to be called with the right destination. Where should they land exactly?

Ottomata added a subscriber: Ottomata. Edited Feb 11 2016, 2:37 PM

Hey sorry, I don't think I've seen this ticket before, hence the silence! I just commented on the change about pull vs. push.

ArielGlenn closed this task as Resolved. Mar 4 2016, 10:35 AM

After a ridiculous amount of help from @Ottomata (thank you!) this is now live, and a manual run of the cron job from the command line worked as expected, so closing.

By the looks of things this should be live but I don't see the logs in the location!

/a/log/webrequest/archive/ on stat1002 is full of them. Are you looking in the right place?

Ahh there you go! I was looking in the wrong place!!!!!!

Addshore added a comment. Edited May 9 2016, 3:57 PM

T134776 as a followup