
Create, and deploy working MobileWebUIActionsTracking schema
Closed, Resolved | Public | 5 Story Points

Description

Background

As part of AMC, we will be adding links to various special pages and other pages, in a way similar to the menus on Vector. We would like to look at engagement with these links so that we can iterate on these lists in the future.

Following T217851 and T218627, and while we're at T216152, we would like to use this schema to also measure clickthrough rates on navigation links outside the main menu (e.g. the new history link). This might amount to rebuilding parts of the former MobileWebClickTracking schema (which was split up some years ago due to EL performance limitations that no longer apply today).

Apart from instrumenting the additional buttons, this should involve adding a pageloaded event (similar to Schema:Print or Schema:MobileWebShareButton).

QA steps

On the staging environment, the sampling rate for the new schema is bumped to 100%. This means that every click will be tracked with MobileWebUIActionsTracking.
First, please enable the EventLogging log so that events are easier to test. Instructions can be found in the EventLogging Guide; see the "logging in your browser" section.

Then please verify that clicking on main menu entries sends events:

  • clicking Home button sends 'home'
  • clicking Random button sends 'random'.

In short - every click on an element with a 'data-event-name' HTML attribute has to trigger an event. The "action" has to be "click", and the event name is defined by the data-event-name HTML attribute.
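The rule above can be illustrated with a minimal Python sketch. It is not the WikimediaEvents code: `build_click_event` and the dictionary used to represent an element's HTML attributes are assumptions made purely for illustration.

```python
from typing import Optional


def build_click_event(attributes: dict) -> Optional[dict]:
    """Return the event for a clicked element, or None if it is untracked.

    `attributes` is a hypothetical stand-in for the element's HTML attributes.
    """
    name = attributes.get("data-event-name")
    if not name:
        # Elements without data-event-name do not trigger an event.
        return None
    return {"action": "click", "name": name}


print(build_click_event({"data-event-name": "home"}))
print(build_click_event({"href": "/wiki/Main_Page"}))
```

During QA, the logged event for the Home button should therefore carry the name 'home', and an element without the attribute should log nothing.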

Plan

  • Create new schema MobileWebUIActionsTracking
  • Move tracking code to the WikimediaEvents repository. The MinervaNeue skin shouldn't be aware of the Wikimedia analytics tracking system.
  • Decommission MobileWebMainMenuClickTracking - created a separate ticket for that

QA Results

AC	Status	Details
1		T220016#5371019
2		T220016#5371019

Acceptance criteria

Use the MainMenuClick schema as a base and:

  • Add the following:
    • Any links contained in the following menus:
      • main menu
      • actions bar
      • overflow menu
      • user menu (once it exists)
      • notifications (just the icon, not individual notifications)
    • The main menu link
    • The overflow menu link
    • The user menu link
  • Remove the following:
    • username
  • Double check sampling rate: sampling rate should be 50% for all links

Event Timeline


@MNeisler do we want to enable the schema on production with sampling rate 50% ?

pmiazga updated the task description. Jul 30 2019, 5:09 PM

Note: removed the requirement of a 100% sampling rate for all AMC users. We're enabling AMC for all users shortly, so there is no longer any need for special handling of AMC users.

I clarified with @pmiazga that it's expected that we're not sending seen events for UI elements yet. That is, we're tracking UI element interactions as we were before.

@MNeisler do we want to enable the schema on production with sampling rate 50% ?

@pmiazga Yes, I think we can enable with the 50% sampling rate for all clicks.

Per T220016#5377322. Alternatively, I'm happy to sign this off and create an "enable the instrumentation" task.

phuedx removed phuedx as the assignee of this task. Jul 30 2019, 6:42 PM
phuedx added a subscriber: phuedx.

Change 526516 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[operations/mediawiki-config@master] Enable MobileWebUIActionsTracking schema with 50% sampling rate

https://gerrit.wikimedia.org/r/526516

Change 526516 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable MobileWebUIActionsTracking schema with 50% sampling rate

https://gerrit.wikimedia.org/r/526516

Mentioned in SAL (#wikimedia-operations) [2019-07-30T23:06:53Z] <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Enable MobileWebUIActionsTracking schema with 50% sampling rate (T220016) (duration: 00m 48s)

Jdlrobson added a comment. Edited Jul 30 2019, 11:09 PM

Events are sampled, but it looks like sampling is not done by user session or by page - 50% of the time when refreshing I trigger events, 50% of the time I don't. Not sure if that was intentional.
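The difference between the two behaviours can be sketched as follows. This is illustrative only and makes no claim about how EventLogging actually samples: `sample_per_pageview`, `sample_per_session` and the hashing scheme are hypothetical.

```python
import hashlib
import random


def sample_per_pageview(rate: float, rng: random.Random) -> bool:
    """Independent coin flip on every page load."""
    return rng.random() < rate


def sample_per_session(token: str, rate: float) -> bool:
    """Deterministic decision derived from a (hypothetical) session token."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    # 48 bits fit exactly in a float, so the bucket is always in [0, 1).
    bucket = int.from_bytes(digest[:6], "big") / 2**48
    return bucket < rate


rng = random.Random(0)
# Per-pageview sampling: the decision can change on each refresh.
print([sample_per_pageview(0.5, rng) for _ in range(6)])
# Per-session sampling: the same token always gets the same decision.
print([sample_per_session("abc123", 0.5) for _ in range(6)])
```

With per-pageview sampling a tester sees events on roughly half of the refreshes, which matches the behaviour described above; with per-session sampling a given session is either always in or always out.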

I had to revert the deploy today - it caused this prominent spike in logstash

`
schema:EventError event_revision:19207217 level:ERROR wiki: type:eventlogging message:Additional properties are not allowed (u'modes' was unexpected) uuid:cf59c1d8b32011e981651418775b0d42 normalized_message:Additional properties are not allowed (u'modes' was unexpected) revision:14035058 tags:eventlogging_EventError, kafka, input-kafka-eventlogging, truncated_by_filter_truncate, es, normalized_message_untrimmed raw_event:{"dt": "2019-07-30T23:21:50Z", "event": {"action": "click", "destination": "/wiki/Sp%C3%A9cial:Random#/random",

Let's get the schema fixed, QA this again and make sure it gets deployed. Window is a little tight on this if we want it live for next Monday. Also note, the existing schema MobileWebMainMenuClickTracking is broken so currently we're not tracking any main menu clicks.
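The logged error is what a schema that forbids additional properties reports when an event carries a field the schema does not declare. A minimal sketch of that check, assuming a hypothetical set of declared property names (the actual schema revision is not reproduced here):

```python
def unexpected_properties(event: dict, declared: set) -> list:
    """Return the event keys that the schema does not declare, sorted by name."""
    return sorted(key for key in event if key not in declared)


# Hypothetical property list for illustration; not the actual schema revision.
DECLARED = {"action", "name", "destination", "token"}

event = {"action": "click", "name": "random", "modes": "stable"}
extra = unexpected_properties(event, DECLARED)
if extra:
    print(f"Additional properties are not allowed ({extra} unexpected)")
```

Running a check of this kind against sample events before enabling the schema would catch a client that sends a field such as `modes` before the schema revision that declares it is deployed.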

Jdlrobson renamed this task from Create new MobileWebUIActionsTracking schema to Create, and deploy working MobileWebUIActionsTracking schema. Jul 30 2019, 11:29 PM
Jdlrobson updated the task description.

Mentioned in SAL (#wikimedia-operations) [2019-07-30T23:31:24Z] <catrope@deploy1001> Synchronized wmf-config/InitialiseSettings.php: Revert "Enable MobileWebUIActionsTracking schema with 50% sampling rate" (T220016) (duration: 00m 47s)

Change 526688 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/WikimediaEvents@wmf/1.34.0-wmf.15] Improved MobileUIActions tracking schema

https://gerrit.wikimedia.org/r/526688

Change 526691 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[operations/mediawiki-config@master] Enable MobileWebUIActionsTracking schema with 50% sampling rate

https://gerrit.wikimedia.org/r/526691

Change 526688 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.34.0-wmf.15] Improved MobileUIActions tracking schema

https://gerrit.wikimedia.org/r/526688

Change 526691 merged by jenkins-bot:
[operations/mediawiki-config@master] Enable MobileWebUIActionsTracking schema with 50% sampling rate

https://gerrit.wikimedia.org/r/526691

Mentioned in SAL (#wikimedia-operations) [2019-07-31T16:37:52Z] <urbanecm@deploy1001> Synchronized php-1.34.0-wmf.15/extensions/WikimediaEvents/: SWAT: [[:gerrit:526688|Improved MobileUIActions tracking schema]] (T220016) (duration: 00m 54s)

Mentioned in SAL (#wikimedia-operations) [2019-07-31T16:39:32Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[:gerrit:526691|Enable MobileWebUIActionsTracking schema with 50% sampling rate]] (T220016) (duration: 00m 58s)

Jdlrobson assigned this task to phuedx. Jul 31 2019, 5:03 PM

A few quick checks of the current data:

select event.name, count(*) as event_count
from event.mobilewebuiactionstracking
where year = 2019 and month=7 and day = 31 
group by event.name
ORDER BY event_count DESC LIMIT 10000;

name	event_count
random	16237
home	10567
settings	5150
login	2092
nearby	1421
watchlist	812
profile	425
contributions	418
logout	133
homepage	9
communityportal	5
specialpages	3
preferences	2

select event.modes, COUNT(*) AS modes_count 
from event.mobilewebuiactionstracking
Where year = 2019 and month=7 and day = 31 
group by event.modes
ORDER BY modes_count DESC LIMIT 10000;

modes	modes_count
stable	36524
beta	667
beta,amc	42
stable,amc	35
desktop	6

At a glance this data is very consistent with what we were seeing in the old schema (obviously excluding AMC entries) - random was always the most popular feature there.

pmiazga reopened this task as Open. Aug 8 2019, 5:19 PM
pmiazga claimed this task.

Re-opening this task as not all menu entries are tracked. Some UserMenu/Overflow menu elements do not have data-event-name attribute, thus interactions with those elements are not tracked.

ovasileva updated the task description. Aug 8 2019, 5:22 PM

Re-opening this task as not all menu entries are tracked. Some UserMenu/Overflow menu elements do not have data-event-name attribute, thus interactions with those elements are not tracked.

added the above to the acceptance criteria (as it was in the "background" section below)

Change 529178 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/WikimediaEvents@master] Track minerva.MobileWebUIActionsTracking events

https://gerrit.wikimedia.org/r/529178

Per slack conversation with @MNeisler

Because we're now using the generic MobileWebUIActionsTracking schema, we should prefix event names so it's clear where they come from.
We will take the following steps:

  • add the menu. prefix when tracking each menu entry, e.g. menu.random, menu.home, menu.profile, ...
  • add the ui. prefix when tracking menu triggers, e.g. ui.mainmenu, ui.usermenu, ui.overflowmenu
  • the watchstar icon will be tracked with menu.watch or menu.unwatch, depending on the watch state. If the user is not watching the page and clicks the watchstar icon, a menu.watch event should be sent.

Also, per conversation with @ovasileva, there is no need to distinguish clicks on the same entry located in different menus. For example, the languages menu entry can be in the toolbar (regular user) or in the overflow menu (AMC user visiting a user page). In both cases the languages menu entry will be tracked with the menu.languages name.
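The naming convention described above can be sketched in a few lines of Python. The helper names (`menu_entry_event`, `menu_trigger_event`, `watchstar_event`) are hypothetical and exist only to make the rules concrete; they are not taken from the patches.

```python
def menu_entry_event(entry: str) -> str:
    """Event name for a click on a menu entry, e.g. 'menu.random'."""
    return f"menu.{entry}"


def menu_trigger_event(menu: str) -> str:
    """Event name for a click on the control that opens a menu, e.g. 'ui.mainmenu'."""
    return f"ui.{menu}"


def watchstar_event(is_watching: bool) -> str:
    """Event name for a watchstar click, based on the state before the click."""
    # A user who is not yet watching the page is about to watch it, and vice versa.
    return "menu.unwatch" if is_watching else "menu.watch"


print(menu_entry_event("languages"))
print(menu_trigger_event("overflowmenu"))
print(watchstar_event(is_watching=False))
```

Note that the entry name carries no information about which menu it was shown in, which is what makes menu.languages identical for the toolbar and the overflow menu.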

Change 529180 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/skins/MinervaNeue@master] Track all menu interactions

https://gerrit.wikimedia.org/r/529180

Marking the patch as WIP, as analytics infrastructure will need to be consulted on the newly expected traffic to this schema. I'm fairly confident we haven't run this by Nuria etc., and this increase is going to be considerable when we sample at 50%. Let's not melt the infrastructure. Alternatively, we should limit this logging to AMC only to ensure traffic estimates are accurate and we don't get blocked here.

pmiazga added a subscriber: Nuria. Aug 21 2019, 6:18 PM

The previous MobileWebMainMenuClickTracking schema tracked only main menu clicks. Because those actions are fairly rare, we were able to track them with a 50% sampling rate.
The new MobileWebUIActionsTracking schema will track many more click actions: not only clicks on menu elements, but also clicks on the different menus, icons on the toolbar, and tabs. Therefore the 50% sampling rate might be a bit too much for our analytics server.
It's very difficult to estimate how many events per minute we will get. Instead of spending too much time trying to come up with numbers, I propose that we:

  • change the sampling rate to something very small, like 1%, just before the train (Thursday, EU mid-day SWAT)
  • once the train hits all wikis, and if the analytics servers are doing fine and can handle 10 times more, bump it to 10% in the first available SWAT window (Thursday evening SF?)

@Nuria are you ok with us taking this path: first trying 1%, and then bumping to 10%?
@MNeisler is 10% sampling rate for you ok? Do you want to go higher?
/cc @ovasileva

Nuria added a comment. Aug 21 2019, 6:48 PM

@pmiazga Looks like this schema is to be deployed to all wikis. It has some unexplained peaks that do not look too good, as volume spikes for too short of a period (this could be an issue on the collection end if peaks were to appear in other schemas but they do not).
If the number of elements we are tracking now is an order of magnitude bigger than before, the peaks might be quite significant. I suggest a very cautious sampling rate of 0.01% for about a day.

@MNeisler is 10% sampling rate for you ok? Do you want to go higher?

I'm fine with increasing to a 10% sampling rate once we confirm it's working ok.

Change 532422 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[operations/mediawiki-config@master] Drop MobileWebUIActionsTracking sampling rate to 0.01%

https://gerrit.wikimedia.org/r/532422

Change 529178 abandoned by Pmiazga:
Track minerva.MobileWebUIActionsTracking events

Reason:
we decided to fix the MainMenu so it doesn't call stopPropagation(). This patch is not needed atm.

https://gerrit.wikimedia.org/r/529178

Change 532427 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/MobileFrontend@master] Edit icon should trigger UI click tracking events

https://gerrit.wikimedia.org/r/532427

Change 532429 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/skins/MinervaNeue@master] Avoid unnecessary stopPropagation usage so event click tracking can work

https://gerrit.wikimedia.org/r/532429

Change 532427 merged by jenkins-bot:
[mediawiki/extensions/MobileFrontend@master] Edit icon should trigger UI click tracking events

https://gerrit.wikimedia.org/r/532427

Change 532429 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] Avoid unnecessary stopPropagation usage so event click tracking can work

https://gerrit.wikimedia.org/r/532429

Change 529180 merged by jenkins-bot:
[mediawiki/skins/MinervaNeue@master] Track all menu interactions

https://gerrit.wikimedia.org/r/529180

Change 532422 merged by jenkins-bot:
[operations/mediawiki-config@master] Drop MobileWebUIActionsTracking sampling rate to 0.01%

https://gerrit.wikimedia.org/r/532422

Mentioned in SAL (#wikimedia-operations) [2019-08-27T11:11:24Z] <pmiazga@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:532422|Drop MobileWebUIActionsTracking sampling rate to 0.01% (T220016)]] (duration: 00m 46s)

Now we're tracking all clicks on menu elements (Main Menu, Toolbar, Overflow, User) with a 0.01% sampling rate.
Previously, for the MainMenu only, we were getting ~80-100 events per 10 minutes; now we're getting less than 1 event per 10 minutes (see https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1&var-schema=MobileWebUIActionsTracking&from=now-7d&to=now ). I'll bump the sampling rate by a factor of 100 (to 1%).

/cc @Nuria @MNeisler

Nuria added a comment. Sep 2 2019, 1:08 PM

Sounds good, +1

Change 533930 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[operations/mediawiki-config@master] Bump MobileWebUIActionsTracking sampling rate to 1 percent

https://gerrit.wikimedia.org/r/533930

Change 533930 merged by jenkins-bot:
[operations/mediawiki-config@master] Bump MobileWebUIActionsTracking sampling rate to 1 percent

https://gerrit.wikimedia.org/r/533930

Mentioned in SAL (#wikimedia-operations) [2019-09-03T11:10:59Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:533930|Bump MobileWebUIActionsTracking sampling rate to 1 percent (T220016)]] (duration: 00m 53s)

Mentioned in SAL (#wikimedia-operations) [2019-09-03T11:25:44Z] <ladsgroup@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:533930|Bump MobileWebUIActionsTracking sampling rate to 1 percent (T220016)]] (duration: 00m 52s)

pmiazga added a comment. Edited Sep 3 2019, 5:16 PM

We're receiving ~15 events per minute (previously we were receiving ~100 events). I think that we can still safely bump the sampling rate to ~10%, but first we need to understand why we get an event spike from time to time.
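As a rough back-of-the-envelope check, the expected volume at a new sampling rate can be projected from the observed one. The sketch below assumes that event volume scales linearly with the sampling rate, which ignores the spikes; `projected_events_per_minute` is a hypothetical helper, and rates are given in percent.

```python
def projected_events_per_minute(
    observed_per_minute: float,
    current_rate_percent: float,
    target_rate_percent: float,
) -> float:
    """Scale an observed event rate linearly to a different sampling rate."""
    if current_rate_percent <= 0:
        raise ValueError("current sampling rate must be positive")
    return observed_per_minute * target_rate_percent / current_rate_percent


# ~15 events per minute observed at 1%, projected to 10%.
print(projected_events_per_minute(15, 1, 10))
```

Under that assumption, ~15 events per minute at 1% corresponds to roughly 150 events per minute at 10%.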

https://grafana.wikimedia.org/d/000000018/eventlogging-schema?orgId=1&var-schema=MobileWebUIActionsTracking&from=now-10d&to=now

Nuria added a comment. Sep 3 2019, 9:42 PM

we need to understand why we get an event spike from time to time.

Agreed, I bet if you look at the data on Hadoop it will have loads of clues. Let us know if you need help.

I spent a little time digging into the data around some of the event spikes. I specifically looked at the spikes seen on 2019-08-20 at 12:00 and on 2019-09-04 at 17:30.

These spikes seem to be due to a large number of events recorded for a single IP address and session token at these times, likely indicating an unflagged bot (it was not flagged as useragent.is_bot).

On August 20th, there were a total of 2636 events associated with a single IP address and session token between 11:00 and 12:00, compared to around 1 to 40 events recorded for each of the other IPs during that hour. All the events recorded from this IP address during this time were clicks on the random feature and came from a Chrome Mobile browser.

SELECT ip, event.token,
COUNT(*) as events
FROM event.mobilewebuiactionstracking
WHERE year = 2019 and month =08 and day = 20
AND dt > '2019-08-20T11:00:00Z'
AND dt < '2019-08-20T12:00:00Z'
GROUP by ip, event.token
ORDER BY events DESC LIMIT 100

On September 4th, there were a total of 373 events associated with a single IP address and session token between 16:30 and 17:30, compared to around 1 to 8 events recorded for each of the other IPs during that hour. It looks like all the events recorded from this IP address during this time were clicks on the menu.edit feature and came from a Chrome Mobile browser.

SELECT ip, event.token,
COUNT(*) as events
FROM event.mobilewebuiactionstracking
WHERE year = 2019 and month =09 and day = 04
AND dt > '2019-09-04T16:30:00Z'
AND dt < '2019-09-04T17:30:00Z'
GROUP by ip, event.token
ORDER BY events DESC LIMIT 100
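The same grouping can be sketched in Python for anyone who wants to run the check on an exported sample. This is a minimal illustration, not part of the analysis above: `top_sources` and `dominant_sources` are hypothetical helpers, the 50% share threshold is an arbitrary assumption, and the rows are synthetic (the IPs come from the reserved documentation range).

```python
from collections import Counter


def top_sources(events: list, limit: int = 100) -> list:
    """Count events per (ip, token) pair, most frequent first."""
    counts = Counter((e["ip"], e["token"]) for e in events)
    return counts.most_common(limit)


def dominant_sources(events: list, share: float = 0.5) -> list:
    """Return the (ip, token) pairs that account for at least `share` of all events."""
    counts = Counter((e["ip"], e["token"]) for e in events)
    total = sum(counts.values())
    return [source for source, n in counts.items() if n / total >= share]


# Synthetic rows for illustration only; not real data.
sample = (
    [{"ip": "192.0.2.1", "token": "aaa"}] * 50
    + [{"ip": "192.0.2.2", "token": "bbb"}] * 3
    + [{"ip": "192.0.2.3", "token": "ccc"}] * 2
)
print(top_sources(sample, limit=3))
print(dominant_sources(sample))
```

A single pair dominating an hour's events, as in both spikes, is the pattern this kind of check is meant to surface.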

Let me know if there's anything else you think we should look into further.

Bumping sampling rate to 10%

Change 535536 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[operations/mediawiki-config@master] Bump MobileWebUIActionsTracking sampling rate to 10 percent

https://gerrit.wikimedia.org/r/535536

Change 535536 merged by jenkins-bot:
[operations/mediawiki-config@master] Bump MobileWebUIActionsTracking sampling rate to 10 percent

https://gerrit.wikimedia.org/r/535536

Mentioned in SAL (#wikimedia-operations) [2019-09-10T11:08:59Z] <urbanecm@deploy1001> Synchronized wmf-config/InitialiseSettings.php: SWAT: c780fa4: Bump MobileWebUIActionsTracking sampling rate to 10 percent (T220016) (duration: 00m 55s)

pmiazga removed pmiazga as the assignee of this task. Sep 10 2019, 11:14 AM

We're up to 10%! @MNeisler - if it looks good from your side, could you sign off?

MNeisler closed this task as Resolved. Sep 13 2019, 6:33 PM

I did a quick check of the current data coming in and the number of events. It looks good from my side. Some instrumentation errors are being addressed in another task. I'll go ahead and close this one.