Mechanism to flag webrequests as "debug"
Closed, Resolved (Public)

Description

Problem Statement

While debugging, some folks want to send real-looking requests targeted at debug servers. These would go through Varnish, so they would end up in HDFS and therefore in our pipelines. The question is: how do we exclude them? One option is to key off something in X-Analytics, like pageview=debug or pageview=0. Any other ideas are welcome.

Event Timeline

@jijiki in discussing this with the team we want to brainstorm about it a bit. Some think there might be a better way. Give us until end of day tomorrow before you get too far with the varnish changes, is that ok?

Milimetric renamed this task from pageview=0 in X-Analytics supersedes anything else to Mechanism to flag webrequests as "debug". Sep 23 2020, 7:39 PM
Milimetric updated the task description.
jijiki added a comment. Edited Sep 24 2020, 9:01 PM

@Milimetric that is fine, take your time and thank you!

P.S. This is not urgent.

fdans triaged this task as Medium priority. Oct 8 2020, 5:22 PM
fdans moved this task from Incoming to Data Quality on the Analytics board.

@jijiki we talked this over and here are our thoughts:

  • let's use debug=1 in the header, that way it's more generic, in case other data pipelines need to ignore these (and not just the pageview pipeline)
  • we're currently thinking of just excluding all of these requests at "refine" time by basically filtering out "debug=1". This means kafka and the wmf_raw.webrequest table would have the requests, but they wouldn't be brought into wmf.webrequest.
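The refine-time filter proposed above can be sketched roughly as follows. This is only an illustration, not the actual refine job: it assumes X-Analytics is a semicolon-separated list of key=value pairs, and the record field name x_analytics and the helper names are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_x_analytics(header: str) -> Dict[str, str]:
    """Parse an assumed 'k1=v1;k2=v2' header into a dict, skipping malformed pairs."""
    pairs: Dict[str, str] = {}
    for item in header.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep and key:
            pairs[key] = value
    return pairs


def is_debug(header: str) -> bool:
    """True only when the header carries exactly debug=1."""
    return parse_x_analytics(header).get("debug") == "1"


def refine(raw_requests: Iterable[dict]) -> List[dict]:
    """Keep only non-debug requests; 'x_analytics' is a hypothetical field name."""
    return [r for r in raw_requests if not is_debug(r.get("x_analytics", ""))]
```

With this shape, the raw records stay untouched and only the refined output drops the debug=1 requests.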

@Milimetric Sorry for the late reply, thank you very much! I will move forward with the relevant patch. Do we need to coordinate after merging it?

Change 629735 had a related patch set uploaded (by Effie Mouzeli; owner: Effie Mouzeli):
[operations/puppet@production] varnish: check for pageview=0 value in X-Analytics header

https://gerrit.wikimedia.org/r/629735

I will move forward with the relevant patch. Do we need to coordinate after merging it?

Let's coordinate on when you'd like to run your first test and we'll try and deploy before that. Merging the patch and running some small sanity checks should be fine.

jijiki moved this task from Inbox 🐅 to Next up 🥌 on the User-jijiki board. Oct 20 2020, 10:50 AM
jijiki added a subscriber: ema. Edited Oct 28 2020, 9:11 AM

@Milimetric, after discussing with @ema, traffic feels that those requests should be visible in turnilo (e.g. webrequest_sampled_128), but we should be able to filter them out easily. Moreover, I gave it some more thought and I think we should add to the condition to have the "x-wikimedia-debug" present as well, along with the X-Analytics: debug=1 header. The way I have written it now, it can be easily used to run requests which would be invisible in turnilo. All our mwdebug servers are small VMs, so sending a lot of traffic towards them would take a lot of time :)

Your thoughts?

@Milimetric, after discussing with @ema, traffic feels that those requests should be visible in turnilo (e.g. webrequest_sampled_128), but we should be able to filter them out easily.

Indeed! In general, whether or not the requests constitute "debug" traffic is not very important for us. We want to see them in Turnilo, especially if they're a significant amount. Of course it would be totally fine to have an additional dimension available to filter that traffic away if needed.

Ok, some of these things are easy and some are a bit harder. Instead of filtering out the requests at "refine" time, we will update the pageview definition. So the requests will be there in the webrequest dataset, with is_pageview = false. To be able to easily filter them out when playing with webrequest_sampled_128, we'll need to add a dimension, probably x_analytics['debug'] = 1 as is_debug.

But I'm not sure I understand what you mean here:

I gave it some more thought and I think we should add to the condition to have the "x-wikimedia-debug" present as well, along with the X-Analytics: debug=1 header. The way I have written it now, it can be easily used to run requests which would be invisible in turnilo.

because the x-wikimedia-debug doesn't make it into Druid either as far as I see. Let me know if I missed something.
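The alternative described in this comment (keep the request in the dataset, force is_pageview to false, and expose an is_debug dimension) could look roughly like the sketch below. It assumes X-Analytics has already been parsed into a map stored under a hypothetical x_analytics key, and that the existing pageview decision is passed in as base_is_pageview; neither name comes from the real pipeline.

```python
def annotate(request: dict, base_is_pageview: bool) -> dict:
    """Return a copy of the request with is_debug and is_pageview set.

    'x_analytics' is assumed to be an already-parsed map of the header;
    'base_is_pageview' stands in for the existing pageview definition.
    """
    x_analytics = request.get("x_analytics") or {}
    debug = x_analytics.get("debug") == "1"
    return {
        **request,
        "is_debug": debug,
        # A debug request is never counted as a pageview.
        "is_pageview": base_is_pageview and not debug,
    }
```

The request itself is never dropped, so it remains visible in sampled datasets and can be filtered on is_debug when needed.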

jijiki moved this task from Next up 🥌 to Q3 2020 on the User-jijiki board. Dec 15 2020, 12:06 PM

ping @jijiki on the above question ^. In the meantime we have another request to add client source port to this data, so I wanted to bundle both changes together. That's tracked in T271953. Maybe we can meet quickly to sort this out?

fdans moved this task from Next Up to In Progress on the Analytics-Kanban board. Jan 21 2021, 6:19 PM

Change 629735 merged by Vgutierrez:
[operations/puppet@production] varnish: Set debug=1 in X-Analytics header

https://gerrit.wikimedia.org/r/629735

jijiki closed this task as Resolved. Jan 27 2021, 2:02 PM

@Milimetric patch is merged! We are setting debug=1 in the X-Analytics header if "X-Wikimedia-Debug" is present. Thank you for your effort!
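For readers who have not looked at the patch, the merged behaviour can be modelled in a few lines. This is a Python illustration of the rule stated above, not the VCL from the change: the function name tag_debug is hypothetical, and appending with ";" assumes the semicolon-separated key=value layout of X-Analytics.

```python
from typing import Dict


def tag_debug(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of headers with debug=1 added to X-Analytics
    when an X-Wikimedia-Debug header is present (any letter case)."""
    out = dict(headers)
    if any(name.lower() == "x-wikimedia-debug" for name in headers):
        existing = out.get("X-Analytics", "")
        # Append to any existing pairs rather than overwriting them.
        out["X-Analytics"] = f"{existing};debug=1" if existing else "debug=1"
    return out
```

Requests without X-Wikimedia-Debug pass through unchanged, so a client cannot hide traffic just by sending debug=1 on its own under this rule.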