
Load webrequest raw data into Druid so ops can use it for troubleshooting
Closed, Resolved · Public · 3 Estimated Story Points

Description

Load sampled raw webrequest data into Druid so ops can use it for troubleshooting. The data is sampled at a 1/128 rate.

The data will include the fields we think are most useful from this table: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Traffic/Webrequest
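For context, here is a minimal PySpark sketch (not the production job) of what the 1/128 sampling and field selection could look like, assuming the wmf.webrequest Hive table and an illustrative subset of its fields; the real job may sample and select differently:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("webrequest_sampled_sketch").getOrCreate()

# The refined webrequest table documented on wikitech (see link above).
webrequest = spark.table("wmf.webrequest")

# One way to express a 1-in-128 sample: keep rows whose per-host sequence
# number is a multiple of 128 (assumption; the production job may use a
# different sampling mechanism).
sampled = (
    webrequest
    .where(F.col("sequence") % 128 == 0)
    # Illustrative subset of the fields considered most useful for ops.
    .select("dt", "uri_host", "uri_path", "http_status",
            "cache_status", "ip", "user_agent")
)

sampled.show(10, truncate=False)
```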

Event Timeline

Pinging @Gilles, @Krinkle, @BBlack, @ema, and @faidon so they know this data will be available on a regular basis. The data is available on Pivot: https://tinyurl.com/y7ogd7n5

Suggestions are welcome as to whether the data on Druid should include more fields from webrequest.

Friendly reminder to ourselves that we need to tag this data on the Pivot home screen.

Change 357191 had a related patch set uploaded (by Joal; owner: Joal):
[operations/puppet@production] Add webrequest dataset to pivot configuration

https://gerrit.wikimedia.org/r/357191

Change 357191 merged by Elukey:
[operations/puppet@production] Add webrequest dataset to pivot configuration

https://gerrit.wikimedia.org/r/357191

JAllemandou set the point value for this task to 3.Jun 5 2017, 4:42 PM

Thanks @JAllemandou! I gave it a quick try and it looks very interesting. How often is the data loaded from webrequest? In other words, how much lag should we be expecting?

Hey @fgiunchedi, you're welcome, I'm glad it's useful.
Data is loaded hourly, with between 1h30 and 2h of lag when everything works fine (which is most of the time).
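For anyone wanting to check the effective lag themselves, a minimal sketch of a Druid timeBoundary query against the broker; the broker URL and datasource name below are placeholders, not the real ones:

```python
import json
import requests

# Placeholder broker URL and datasource name; substitute the real Druid
# broker endpoint and the datasource created for this task.
BROKER_URL = "http://druid-broker.example.org:8082/druid/v2/"
query = {
    "queryType": "timeBoundary",
    "dataSource": "webrequest_sampled_128",
    "bound": "maxTime",
}

resp = requests.post(BROKER_URL, json=query, timeout=30)
resp.raise_for_status()

# The result contains the latest ingested timestamp; comparing it with the
# current time gives the loading lag (normally 1h30-2h as noted above).
print(json.dumps(resp.json(), indent=2))
```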

Let's go ahead with these changes then, if there are no additional suggestions from the ops team.

This is definitely interesting, so many thanks on behalf of all of us for setting this up and thinking of us! :) Like every tool, I think it will require some time before we get accustomed to having it and remember it when we investigate an incident, but I think over time it will happen and could be super useful.

One thing I've proposed before that could be useful for the raw logs & Druid, and perhaps even for the data in HDFS, is incorporating data from the GeoIP2 ISP database (which I don't believe we're subscribed to, but it's fairly cheap). Being able to aggregate by ISP/AS number could be useful for this kind of investigation ("how many hits were by Yandex"). AIUI, enriching the data pipeline with that kind of data may not be very easy with the current Kafka/Druid pipeline, so if that's the case, perhaps it's just something to keep in mind for future iterations and/or the Hadoop data.
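To illustrate the kind of lookup this would enable, a minimal sketch using MaxMind's geoip2 Python library against a GeoIP2 ISP database; the database path is an assumption, since per the comment above we are likely not subscribed to that product yet:

```python
import geoip2.database

# Assumed local path to a downloaded GeoIP2 ISP database (a paid MaxMind
# product, separate from the databases already in use).
reader = geoip2.database.Reader("/usr/share/GeoIP/GeoIP2-ISP.mmdb")

response = reader.isp("198.51.100.7")  # example/documentation IP address

print(response.autonomous_system_number)        # AS number (integer)
print(response.autonomous_system_organization)  # AS organization name
print(response.isp)                             # ISP name

reader.close()
```

Enriching each webrequest record with the AS number at ingestion time would then make "hits by ISP/AS" a simple group-by in Druid/Pivot.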

Filed a ticket in this regard; we should follow up on it separately: https://phabricator.wikimedia.org/T167907

Let's go ahead with adding this data as-is, however.

Milimetric triaged this task as Medium priority.Jun 22 2017, 3:07 PM