
Add the requestctl element of the x-analytics map to Turnilo's webrequest_sampled_128
Closed, Resolved, Public

Description

This will be very useful for debugging requestctl rules ad-hoc or just checking how often they fire on real traffic, intended or otherwise.

Extracting the entire string value after requestctl= is fine :)
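
As a rough illustration of taking "the entire string value after requestctl=", here is a minimal Python sketch. It assumes the raw X-Analytics header is a semicolon-separated list of key=value pairs (the refined webrequest data already exposes these pairs as x_analytics_map). The rule name `rule-a` in the example is hypothetical.

```python
def parse_x_analytics(header: str) -> dict[str, str]:
    """Split an X-Analytics-style string into a dict.

    Assumption: pairs are separated by ';' and each pair is key=value.
    Only the first '=' splits key from value, so the value is kept whole.
    """
    result: dict[str, str] = {}
    for pair in header.split(";"):
        if not pair:
            continue  # tolerate empty segments such as a trailing ';'
        key, sep, value = pair.partition("=")
        if sep:
            result[key] = value
    return result


def requestctl_value(header: str):
    """Return the entire requestctl value, or None when the key is absent."""
    return parse_x_analytics(header).get("requestctl")


print(requestctl_value("ns=0;requestctl=rule-a;https=1"))  # rule-a
```

Returning None for a missing key mirrors how a Hive map lookup yields NULL, so requests that no rule touched simply show up as an empty value.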

Event Timeline

ping @JAllemandou -- did I put this on the right Phab tag? It'd be really awesome to have, and I suspect it is a pretty easy change.

@CDanis if you want to give it a go

https://github.com/wikimedia/analytics-refinery/tree/master/oozie/webrequest/druid

has the files you need to modify. I think you make the same changes to the hourly and daily files. You need to modify the .hql that hoists the x_analytics_map values into top level fields, and then edit the load_webrequests_daily.json.template file to instruct druid to ingest those new fields.
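
The first step (hoisting x_analytics_map values into top-level fields) can be pictured with the Python sketch below, which mimics a HiveQL projection like `x_analytics_map['requestctl'] AS x_analytics_requestctl`. The output column name `x_analytics_requestctl` is hypothetical; the real name is whatever the patch chooses. Whatever it is, the Druid ingestion template then has to list that same name as a dimension.

```python
from typing import Any

# Hypothetical list of x_analytics_map keys to hoist; this task adds "requestctl".
HOISTED_KEYS = ("requestctl",)


def hoist_x_analytics(row: dict[str, Any], keys=HOISTED_KEYS) -> dict[str, Any]:
    """Return a copy of `row` with selected x_analytics_map entries as top-level fields.

    A missing map (or missing key) yields None, like NULL from a Hive map lookup.
    The map itself is dropped, since Druid gets the flattened fields instead.
    """
    xa = row.get("x_analytics_map") or {}
    out = {k: v for k, v in row.items() if k != "x_analytics_map"}
    for key in keys:
        # Hypothetical naming convention: x_analytics_<key>.
        out[f"x_analytics_{key}"] = xa.get(key)
    return out


row = {"uri_host": "en.wikipedia.org",
       "x_analytics_map": {"requestctl": "rule-a", "https": "1"}}
print(hoist_x_analytics(row))
```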

I've never done this but I think that is the way!

(BTW analytics/refinery is in gerrit, not github).

This is indeed not complicated - Andrew's description of what to do is comprehensive :)

Change 828633 had a related patch set uploaded (by CDanis; author: CDanis):

[analytics/refinery@master] Add requestctl sub-field to turnilo webrequest

https://gerrit.wikimedia.org/r/828633

I've written a patch, which is hopefully correct.

Once merged, would it be possible to re-run the past few weeks of webrequest to populate the new field in the old data?

Change 828633 merged by Joal:

[analytics/refinery@master] Add requestctl sub-field to turnilo webrequest

https://gerrit.wikimedia.org/r/828633

Patch merged; the next deploy should be around Tuesday next week. I've asked for a rerun of the Oozie jobs for the past 4 weeks.

This was deployed yesterday, daily job restarted as of Aug 1st.

(For future reference: unfortunately Hive, like standard SQL, doesn't let you use column aliases in GROUP BY. If you select something AS s, you can group by something, but you can't group by s. I always thought this was weird, especially since I wrote a DBMS with a query parser in college in a few days that did not have this limitation...)
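
Two common alias-free ways to write such a query are to repeat the full expression in GROUP BY, or to compute the alias in a subquery and group by it in the outer query. The sketch below runs both with Python's built-in sqlite3 on a hypothetical toy table `requests`. Note that SQLite itself accepts aliases in GROUP BY, so this cannot reproduce the Hive error; it only shows query shapes that should also be acceptable to Hive. In Hive the expression would typically be a map lookup such as `x_analytics_map['requestctl']`.

```python
import sqlite3

# Hypothetical toy table standing in for webrequest; only the query shape matters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (requestctl TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?)",
    [("rule-a",), ("RULE-A",), ("rule-b",), (None,)],
)

# Form 1: repeat the full expression in GROUP BY instead of referring to its alias.
REPEATED_EXPRESSION = """
    SELECT lower(requestctl) AS rule, COUNT(*) AS hits
    FROM requests
    WHERE requestctl IS NOT NULL
    GROUP BY lower(requestctl)
"""

# Form 2: compute the alias in a subquery; in the outer query it is a plain column.
SUBQUERY = """
    SELECT rule, COUNT(*) AS hits
    FROM (
        SELECT lower(requestctl) AS rule
        FROM requests
        WHERE requestctl IS NOT NULL
    ) t
    GROUP BY rule
"""


def run(query: str) -> list[tuple[str, int]]:
    # Sort client-side so the result does not depend on the engine's row order.
    return sorted(conn.execute(query).fetchall())


repeated = run(REPEATED_EXPRESSION)
via_subquery = run(SUBQUERY)
print(repeated)  # [('rule-a', 2), ('rule-b', 1)]
```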

Thanks! And good to know, sorry.

Is some other change needed to make this visible on Turnilo's list of dimensions itself? I don't see it available as a dimension in the UI for the webrequest_sampled_128 table.

Actually yes! I completely forgot about the need to update the explicit datasource definition in puppet: https://github.com/wikimedia/puppet/blob/18f9aac84a0525d502d7ed48391caba5766fdf32/modules/turnilo/templates/config.yaml.erb#L1915
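
Because the Turnilo data cube lists its dimensions explicitly, a new Druid column stays invisible until it is added there. A quick way to spot such gaps is sketched below in Python. It assumes the data cube config has been loaded from YAML into a dict with a `dimensions` list whose entries carry a `name` and an optional Plywood-style `formula` such as `$uri_host`, and that an omitted formula means `$<name>`. Treat both as assumptions to check against Turnilo's configuration docs. The column name `x_analytics_requestctl` is hypothetical.

```python
import re

# Matches a formula that is just a bare column reference, e.g. "$uri_host".
_BARE_COLUMN = re.compile(r"\$(\w+)")


def exposed_columns(cube: dict) -> set[str]:
    """Columns referenced by dimensions whose formula is a plain `$column`."""
    exposed = set()
    for dim in cube.get("dimensions", []):
        # Assumption: a dimension without an explicit formula refers to `$<name>`.
        formula = dim.get("formula") or f"${dim['name']}"
        match = _BARE_COLUMN.fullmatch(formula.strip())
        if match:
            exposed.add(match.group(1))
    return exposed


def missing_dimensions(druid_columns, cube: dict) -> list[str]:
    """Druid columns that no configured dimension exposes, sorted by name."""
    exposed = exposed_columns(cube)
    return sorted(col for col in druid_columns if col not in exposed)


# Hypothetical excerpt of a data cube definition after loading the YAML.
cube = {
    "name": "webrequest_sampled_128",
    "dimensions": [
        {"name": "uri_host", "formula": "$uri_host"},
        {"name": "http_status"},
    ],
}
druid_columns = ["uri_host", "http_status", "x_analytics_requestctl"]
print(missing_dimensions(druid_columns, cube))  # ['x_analytics_requestctl']
```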

Change 830666 had a related patch set uploaded (by CDanis; author: CDanis):

[operations/puppet@production] Add turnilo config for new requestctl field

https://gerrit.wikimedia.org/r/830666

Change 830666 merged by CDanis:

[operations/puppet@production] Add turnilo config for new requestctl field

https://gerrit.wikimedia.org/r/830666

CDanis claimed this task.