
Requests to lists.wikimedia.org should end up in Hadoop wmf.webrequest via Kafka!
Closed, Declined · Public

Description

It does not look like requests to this domain make their way into Hadoop!

Having these requests would be great!

9:19 PM <ottomata> but to get into webrequest, there has to be varnish and a varnishkafka instance

Event Timeline

Addshore created this task.Oct 23 2015, 8:26 PM
Addshore set the priority of this task to Needs Triage.
Addshore updated the task description.
Addshore added a subscriber: Addshore.
Restricted Application added a subscriber: Aklapper. Oct 23 2015, 8:26 PM
Addshore set Security to None.
Addshore added a project: Operations.
Dzahn added a subscriber: Dzahn. Oct 23 2015, 10:36 PM

What are you trying to find out?

Dzahn triaged this task as Medium priority. Oct 23 2015, 10:50 PM

Well, this mainly applies to dumps.wm.o (which the other ticket was opened for).
But I was looking for access and download counts of dumps, broken down by type of dump, etc.
Mainly for Wikidata, but of course if dumps.wm.o went to Hadoop this could be done for all dumps.

As for lists, I am looking at access to the list archives.

Nuria added a subscriber: Nuria. Oct 26 2015, 3:37 PM

In order to get these requests into Hadoop, this domain needs to be fronted by Varnish. Looking through puppet, I do not think that is the case (but I might be wrong). I would add Ops to this ticket, as they are the ones who have maintained the dumps domain thus far.

If you want to find this information now, maybe you can get it from the Apache logs?
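A minimal sketch of the Apache-log approach, assuming the standard Apache "combined" log format and assuming list archives are served under a hypothetical `/pipermail/<list>/` prefix (the Mailman 2 convention). Neither the log format nor the prefix is taken from the actual lists host configuration. The function only aggregates per-list counts and does not keep client IPs.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed Apache "combined" log format (the real LogFormat may differ):
# host ident user [time] "METHOD path protocol" status bytes "referer" "agent"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical archive prefix: assumes archives live under /pipermail/<list>/...
ARCHIVE_PREFIX = "/pipermail/"


def count_archive_views(lines: Iterable[str]) -> Counter:
    """Count successful (2xx) GET requests to list archives, keyed by list name."""
    counts: Counter = Counter()
    for line in lines:
        m = COMBINED_RE.match(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        if m.group("method") != "GET" or not m.group("status").startswith("2"):
            continue  # only count successful GETs
        path = m.group("path")
        if not path.startswith(ARCHIVE_PREFIX):
            continue  # not an archive request (e.g. /mailman/listinfo/...)
        # The first path segment after the prefix is taken to be the list name.
        list_name = path[len(ARCHIVE_PREFIX):].split("/", 1)[0]
        if list_name:
            counts[list_name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [26/Oct/2015:15:37:00 +0000] "GET /pipermail/wikitech-l/2015-October/thread.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
        '198.51.100.7 - - [26/Oct/2015:15:38:00 +0000] "GET /pipermail/wikidata/2015-October/000123.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
        '198.51.100.7 - - [26/Oct/2015:15:39:00 +0000] "GET /pipermail/wikitech-l/ HTTP/1.1" 304 - "-" "Mozilla/5.0"',
        '203.0.113.5 - - [26/Oct/2015:15:40:00 +0000] "GET /mailman/listinfo/wikitech-l HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    ]
    print(dict(count_archive_views(sample)))
```

Counting only 2xx responses is a design choice: it excludes 304 revalidations, which may or may not be wanted as "views" depending on the question being asked.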

Dzahn added a comment.Oct 26 2015, 7:47 PM

Ok, let's not mix up dumps. and lists. in a single ticket please.
They are different and unrelated.

I'm still not sure what we are trying to solve by getting these numbers for list archives.

JohnLewis closed this task as Declined. Nov 3 2015, 5:40 PM
JohnLewis claimed this task.
JohnLewis added a subscriber: JohnLewis.

Closing as declined for several reasons:

  • requires Varnish, which lists is not behind; putting it behind Varnish gains nothing and in the long run probably overcomplicates the setup (which proved annoying at the time of the migration anyway)
  • data on archive views is probably overly complex to handle in this regard because:
    • no useful data is stored besides the standard IP data (except for English ArbCom, which does store exactly who looked at which archives and when, to a T)
    • for private list archives like the English Wikipedia ArbCom archive, collecting viewing data would probably border on making that data accessible to more people than absolutely necessary, which is a bad thing

Honestly, some lists will make this a complex task, and with no real use case that justifies the work to make the changes, it will likely never be actioned.