
Do whatever is necessary to hook up production Wikidata Query Service to HDFS and other log collection systems
Closed, ResolvedPublic

Description

Top-line metrics to record:

  • Number of queries run per day
  • Number of unique users of the query service per day

Event Timeline

Deskana created this task.May 4 2015, 5:24 PM
Deskana raised the priority of this task from to Normal.
Deskana updated the task description.
Deskana added a project: Discovery.
Deskana added a subscriber: Deskana.
Restricted Application added a subscriber: Aklapper.May 4 2015, 5:24 PM
Deskana updated the task description.May 4 2015, 5:25 PM
Deskana set Security to None.

We should use the Varnish request logs, passed into Hadoop, for this; EventLogging can be used when we have more specific questions.

Counting query runs is just a simple COUNT(*); for unique users, let's count unique combinations of (ip, user_agent, accept_language).
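In Hive over the request logs this would be roughly a COUNT(*) plus a COUNT(DISTINCT) over the three fields. As a sketch of the same logic in Python — the field names (dt, ip, user_agent, accept_language) are assumptions, not the actual log schema:

```python
from collections import defaultdict

def daily_metrics(requests):
    """Per-day query counts and unique-user counts from parsed
    request-log rows (dicts). Field names are illustrative."""
    counts = defaultdict(int)
    users = defaultdict(set)
    for r in requests:
        day = r["dt"][:10]  # "2015-07-02T10:00:00" -> "2015-07-02"
        counts[day] += 1
        # A "user" is approximated by the (ip, UA, accept_language) tuple.
        users[day].add((r["ip"], r["user_agent"], r["accept_language"]))
    return {day: (counts[day], len(users[day])) for day in counts}

rows = [
    {"dt": "2015-07-02T10:00:00", "ip": "10.0.0.1",
     "user_agent": "A", "accept_language": "en"},
    {"dt": "2015-07-02T11:00:00", "ip": "10.0.0.1",
     "user_agent": "A", "accept_language": "en"},
    {"dt": "2015-07-02T12:00:00", "ip": "10.0.0.2",
     "user_agent": "B", "accept_language": "de"},
]
print(daily_metrics(rows))  # {'2015-07-02': (3, 2)}
```

Note that (ip, user_agent, accept_language) only approximates a user: NAT and shared browsers collapse distinct users, and UA changes split one user into several.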

Ori suggested another option: logstash, as described in http://www.bravo-kernel.com/2014/12/setting-up-logstash-1-4-2-to-forward-nginx-logs-to-elasticsearch/, forwarding the logs to logstash-beta.wmflabs.org

As said in the email thread, I'd like to know how easily and conveniently researchers would be able to access that machine. If the answer is "there isn't a way; we'd have to invent one", then please let's look for other options.

@Ironholds so far it's on http://logstash-beta.wmflabs.org but I'm not sure what other options we have for now.

How does one automatically get data out of logstash?

@Ironholds ok, I'll look into this and update here

How does one automatically get data out of logstash?

Cron and curl, mostly. There is an Elasticsearch feature called percolator, but I don't know whether it integrates with logstash.
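The cron-and-curl approach amounts to periodically querying the Elasticsearch indices that logstash writes into. A minimal sketch in Python that builds the curl command such a cron job might run — the host and the logstash-YYYY.MM.DD index-naming convention are assumptions:

```python
import json
from datetime import date

def export_command(day, host="logstash-beta.wmflabs.org"):
    """Build a curl invocation that pulls one day's events out of the
    Elasticsearch index behind logstash. Host and the daily
    logstash-YYYY.MM.DD index name are assumed, not confirmed."""
    index = "logstash-{:%Y.%m.%d}".format(day)
    # Fetch everything for the day; a real job would filter and paginate.
    query = {"query": {"match_all": {}}, "size": 10000}
    return "curl -s 'http://{}/{}/_search' -d '{}'".format(
        host, index, json.dumps(query))

print(export_command(date(2015, 7, 2)))
```

A crontab entry would then pipe this command's output to a file or downstream script once per day.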

Deskana renamed this task from Log metrics to evaluate the success of the Wikidata Query Service to Do whatever is necessary to hook up production Wikidata Query Service to HDFS and other log collection systems.Jul 2 2015, 5:05 PM

@Ironholds Now that the service is running in production, and is hooked up to varnish, do we need to do anything here or is it all automatically taken care of? Do you know?

Is it hooked up to the same varnishes? Which varnishes is it hooked up to? What do the requests look like? What's the URL it lives at?

@Ironholds @Smalyshev Sounds like you two need to have a quick hangout while I'm out to find answers to these questions. :-)

Indeed. @Smalyshev can you poke me Monday?

Deskana closed this task as Resolved.Aug 25 2015, 8:16 PM
Deskana claimed this task.

@Ironholds informs me that this is done. T109360 is to take the data from the logs and analyse it.