Sticky header: Add agent_type and access_method to sticky header instrumentation
Closed, Resolved · Public · 3 Estimated Story Points

Description

Original Request

We use Hive UDFs to add these fields to the webrequest data. The code takes the user agent and/or the URI hostname.

We should add a Refine TransformFunction and apply it to all event data that has those input fields.

Acceptance criteria

  • Add access method to the schema that tracks returns to the top of the page

QA steps

  • follow the QA steps in T292586
  • when the scroll-to-top event gets logged, note that access_method is included:

Screen Shot 2021-11-24 at 2.27.22 PM.png (2×3 px, 1 MB)

QA Results - Beta

AC | Status | Details
1 | | T294246#7555033

QA Results - Prod

AC | Status | Details
1 | | T294246#7555040

Event Timeline

odimitrijevic moved this task from Incoming to Event Platform on the Analytics board.

hi @ovasileva - just some context on why this ticket is on our board. Out of T292586#7454994, it turns out that a few of the fields requested in the instrumentation spec can/will be available in all event tables in Hive. So this ticket is to upstream those fields to be automatically available for all event logging.

ovasileva raised the priority of this task from Medium to High. Oct 26 2021, 4:05 PM

@mforns @Ottomata I was curious about the timeline for getting this added and whether you need help from us. This blocks our next rollout (hence @ovasileva bumping to high).

@odimitrijevic to help prioritize. The code to get this done would not be a lot, perhaps a few hours of work to write, then a few more hours to test with fake data in the Data Lake to make sure it works, then an hour or two to deploy and configure it to be used.

This would add this information to ALL event tables.

Prioritizing the work after conversation with @Ottomata. Let's aim to complete by 11/29 when the web team is planning to QA work done in https://phabricator.wikimedia.org/T292586#7454994

Hey y'all, after a conversation with @Milimetric and @mforns, we realized that adding these fields is not the right thing to do. I steered you wrong in https://phabricator.wikimedia.org/T292586#7454873, I'm sorry about that.

  • agent_type - bot detection is complicated, and apparently can only be done well via predictions based on individual datasets. The logic we use for isSpider in Webrequest is pretty brittle, and only helps with identifying a few well-known, self-identifying 'spiders' beyond what UA-Parser gives us in user_agent_map['device_family'] == 'Spider'. We don't want to add it to all events. However, we can help you with the Hive SQL or Spark code needed to re-use this logic in your analysis.
  • access_method - I almost finished the code to add this one, but upon further reflection, determining the 'access method' of a request by inspecting the hostname and the user-agent is also a bit brittle. We do this for Webrequest because we don't have control over the producer of webrequest data. However, your instrumentation should have knowledge of the context it is operating in, e.g. in a mobile app it knows it is sending a request from a mobile app; in the mobile MediaWiki web UI it knows it is in a mobile version of the site, etc. It would be more correct for the instrumentation producer to set this field, so we won't be adding this field to all events. Again, we can help you with the Hive SQL or Spark code needed to re-use the logic we use in Webrequest on your events if you prefer to do that rather than sending it from the producer.
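
For readers who want a feel for what "re-using this logic in your analysis" could look like, here is a minimal Python sketch. It is not the Refinery/Webrequest code. The function names are hypothetical, the output labels ('user', 'spider', 'desktop', 'mobile web', 'mobile app') are assumed, and the hostname and user-agent heuristics (an `m` subdomain label, a `WikipediaApp` substring) are rough approximations. Only the `user_agent_map['device_family'] == 'Spider'` check comes directly from the comment above.

```python
from typing import Mapping


def agent_type_from_ua_map(user_agent_map: Mapping[str, str]) -> str:
    """Classify an event using only the UA-Parser signal named in the thread.

    Assumption: the labels 'spider' and 'user' are illustrative. The extra
    isSpider checks used for Webrequest are NOT reproduced here.
    """
    if user_agent_map.get("device_family") == "Spider":
        return "spider"
    return "user"


def guess_access_method(uri_host: str, user_agent: str) -> str:
    """Guess the access method from hostname and user agent.

    Assumption: this is a simplified stand-in for the Webrequest logic,
    and it is brittle for exactly the reasons given in the thread.
    """
    # Assumed marker: the apps identify themselves in the user agent.
    if "WikipediaApp" in user_agent:
        return "mobile app"
    # Look only at subdomain labels, e.g. ['en', 'm'] for en.m.wikipedia.org.
    subdomain_labels = uri_host.lower().split(".")[:-2]
    if "m" in subdomain_labels:
        return "mobile web"
    return "desktop"
```

A function like `guess_access_method` has to infer context from strings it does not control, which is why having the producer set the field directly is the more robust option.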

This is my/our fault for not properly grooming or prioritizing this request. I just assumed that adding fields would be the right thing to do, but I don't recall ever discussing this with my team. Something went wrong in our process here; we'll discuss this in our next team retrospective.

Let me know if I can help in any other way, e.g. by getting new versions of your schema out that include an access_method field and/or by providing code to reuse the isSpider Webrequest logic in your analysis.

thanks @Ottomata for the updates (cc @jwang)

@ovasileva I can work on getting access_method added to the schema and populated by the instrument as a follow up (@jwang confirmed in Slack that we're ok with user_agent_map for the time being)
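
As a language-neutral illustration of "populated by the instrument", the sketch below shows a producer setting access_method from context it already knows, rather than having it inferred downstream. This is not the actual WikimediaEvents patch: the function name, the `action` field, the `on_mobile_site` flag, and the 'mobile web' / 'desktop' values are all assumptions made for the example.

```python
from typing import Dict


def build_scroll_event(action: str, on_mobile_site: bool) -> Dict[str, str]:
    """Build an event payload with access_method set by the producer.

    Assumption: the instrument can tell which version of the site it is
    running in, so no hostname or user-agent inspection is needed.
    """
    access_method = "mobile web" if on_mobile_site else "desktop"
    return {"action": action, "access_method": access_method}
```

For example, `build_scroll_event("scroll-to-top", on_mobile_site=False)` would yield a payload whose access_method is 'desktop'.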

Thanks @Ottomata for the update, and @cjming for picking this up. I'll edit the task description here to account for adding access method to the schema

ovasileva renamed this task from Add agent_type and access_method to event data to Add agent_type and access_method to sticky header instrumentation. Nov 16 2021, 10:43 AM
ovasileva updated the task description.

Change 739659 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[schemas/event/secondary@master] Update web_ui_scroll schema

https://gerrit.wikimedia.org/r/739659

Change 739668 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[mediawiki/extensions/WikimediaEvents@master] Update scroll instrument

https://gerrit.wikimedia.org/r/739668

bwang removed bwang as the assignee of this task. Nov 18 2021, 9:16 PM
bwang added a subscriber: bwang.

Change 739659 merged by jenkins-bot:

[schemas/event/secondary@master] Update web_ui_scroll schema

https://gerrit.wikimedia.org/r/739659

Change 739668 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Update scroll instrument

https://gerrit.wikimedia.org/r/739668

nray reassigned this task from Edtadros to cjming.
nray moved this task from Code Review to QA on the Readers-Web-Backlog (Kanbanana-FY-2021-22) board.
nray added a subscriber: Edtadros.
cjming updated the task description.
ovasileva renamed this task from Add agent_type and access_method to sticky header instrumentation to Sticky header: Add agent_type and access_method to sticky header instrumentation. Dec 1 2021, 11:33 AM

According to T292586#7542866 it looks like there's a problem in the implementation here. Perhaps the patches are not deployed in production, yet?

Change 743227 had a related patch set uploaded (by Clare Ming; author: Clare Ming):

[mediawiki/extensions/WikimediaEvents@wmf/1.38.0-wmf.9] Update scroll instrument

https://gerrit.wikimedia.org/r/743227

Change 743227 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@wmf/1.38.0-wmf.9] Update scroll instrument

https://gerrit.wikimedia.org/r/743227

Mentioned in SAL (#wikimedia-operations) [2021-12-02T19:26:10Z] <taavi@deploy1002> Synchronized php-1.38.0-wmf.9/extensions/WikimediaEvents/modules/ext.wikimediaEvents/webUIScroll.js: Backport: [[gerrit:743227|Update scroll instrument (T294246)]] (duration: 00m 56s)

Test Result - Beta

Status: ✅ PASS
Environment: beta
OS: macOS Monterey
Browser: Chrome
Device: MBP
Emulated Device: NA

Test Artifact(s):

QA Steps

follow the QA steps in T292586

An event should be logged when a user is believed to have scrolled to the top. For now, this is defined as when the following sequence of events occurs:
The sticky header displays
At least 5 seconds pass
The sticky header disappears (because the user scrolls to the top)
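
The three-step sequence above can be read as a small state machine. The following Python sketch models it with explicit timestamps so it can be checked without a browser. It is an illustration only, not the code in webUIScroll.js, and the class and method names are hypothetical.

```python
from typing import Optional

MIN_VISIBLE_SECONDS = 5.0  # "At least 5 seconds pass"


class ScrollToTopDetector:
    """Tracks sticky-header visibility and reports when an event should be logged."""

    def __init__(self, min_visible_seconds: float = MIN_VISIBLE_SECONDS) -> None:
        self.min_visible_seconds = min_visible_seconds
        self._shown_at: Optional[float] = None

    def header_shown(self, now: float) -> None:
        # Step 1: remember when the header appeared; ignore repeat notifications.
        if self._shown_at is None:
            self._shown_at = now

    def header_hidden(self, now: float) -> bool:
        # Step 3: the header disappeared. Log only if step 2 also held,
        # i.e. the header had been visible for at least the minimum time.
        if self._shown_at is None:
            return False
        visible_for = now - self._shown_at
        self._shown_at = None  # reset so the next sequence starts clean
        return visible_for >= self.min_visible_seconds
```

A caller would invoke `header_shown` and `header_hidden` from whatever visibility callback it uses and log the scroll-to-top event whenever `header_hidden` returns `True`.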

✅ AC1: when the scroll-to-top event gets logged, note that access_method is included:

Screen Shot 2021-12-07 at 3.20.09 PM.png (229×661 px, 44 KB)

Test Result - Prod

Status: ✅ PASS
Environment: enwiki
OS: macOS Monterey
Browser: Chrome
Device: MBP
Emulated Device: NA

Test Artifact(s):

QA Steps

follow the QA steps in T292586

An event should be logged when a user is believed to have scrolled to the top. For now, this is defined as when the following sequence of events occurs:
The sticky header displays
At least 5 seconds pass
The sticky header disappears (because the user scrolls to the top)

✅ AC1: when the scroll-to-top event gets logged, note that access_method is included:

Screen Shot 2021-12-07 at 3.25.17 PM.png (233×674 px, 42 KB)

Edtadros updated the task description.

Looks good, resolving. Follow-ups will be documented in T294639: Schema QA sticky header instrumentation