Record InukaPageView from KaiOS app
Closed, Resolved (Public)

Description

We need to instrument the KaiOS app to send InukaPageView data.

A page view starts when an article is opened and continues until the user moves to a new article or returns to the home page; changing the section or viewing the quick facts does not result in a new pageview.

In addition, a visit to the app main page is also a pageview; it starts when the main page is opened and continues until the user moves to an article (probably through the search). However, since this will require more effort to implement, we can temporarily leave main page visits unlogged.

  • user_id: unique, persistent ID for each install
  • session_id: should behave like the session cookie on web
  • pageview_token: unique ID for each pageview, for deduplicating events
  • client_type: kaios-app
  • referring_domain
    • null for the first page viewed when the app is freshly opened (always the app main page)
    • kaios-app for views referred by other pages within the app
    • if we ever implement redirection from the web to the app on KaiOS devices, the domain of the referring wiki (e.g. hi.wikipedia.org)
  • load_dt: timestamp when the page is first rendered, after the loading screen
  • page_open_time: total time spent on the page, in ms
  • page_visible_time: total time the page was actually visible, in ms
  • section_count: how many sections the article contains
    • 0 for the app main page
  • opened_section_count: how many sections have been viewed, through paging or the TOC
    • 0 for the app main page
  • page_namespace: the actual namespace of the page shown to the user
    • -1 for the app main page
    • otherwise, will almost always be 0 (the article namespace) since the app tries to prevent users from finding their way to non-article pages
  • is_main_page: true for the app main page, false otherwise
  • is_search_page: true for the app main page, false otherwise
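
A minimal sketch of how one InukaPageView event could be assembled from the fields above. Only the field names come from the list; `startPageView`, `randomHex`, the `opts` object and the injected `now` clock are hypothetical names for illustration, not the app's actual code. It assumes the page is visible as soon as it is rendered, and it omits referring_domain when there is no referrer.

```javascript
// Hypothetical helper: random hex string of `bytes` bytes.
function randomHex(bytes) {
  var out = "";
  for (var i = 0; i < bytes; i++) {
    var b = Math.floor(Math.random() * 256);
    out += (b < 16 ? "0" : "") + b.toString(16);
  }
  return out;
}

// Starts tracking one pageview. `now` returns the current time in ms and is
// injected so the timing logic can be exercised without a real clock.
function startPageView(opts, now) {
  var token = randomHex(10);   // one token per pageview
  var loadedAt = now();
  var visibleSince = loadedAt; // assumption: visible once rendered
  var visibleTotal = 0;
  var opened = {};

  return {
    // Intended to be called from a visibilitychange handler.
    setVisible: function (visible) {
      if (visible && visibleSince === null) {
        visibleSince = now();
      } else if (!visible && visibleSince !== null) {
        visibleTotal += now() - visibleSince;
        visibleSince = null;
      }
    },
    // Records that a section was viewed; repeats are counted once.
    openSection: function (index) {
      opened[index] = true;
    },
    // Ends the pageview and returns the event to log.
    finish: function () {
      var end = now();
      if (visibleSince !== null) {
        visibleTotal += end - visibleSince;
        visibleSince = null;
      }
      var event = {
        user_id: opts.userId,
        session_id: opts.sessionId,
        pageview_token: token,
        client_type: "kaios-app",
        load_dt: new Date(loadedAt).toISOString(),
        page_open_time: end - loadedAt,
        page_visible_time: visibleTotal,
        section_count: opts.isMainPage ? 0 : opts.sectionCount,
        opened_section_count: opts.isMainPage ? 0 : Object.keys(opened).length,
        page_namespace: opts.isMainPage ? -1 : opts.namespace,
        is_main_page: opts.isMainPage,
        is_search_page: opts.isMainPage
      };
      // Leave referring_domain out entirely when there is no referrer.
      if (opts.referringDomain) {
        event.referring_domain = opts.referringDomain;
      }
      return event;
    }
  };
}
```

Changing sections only calls `openSection`, so it never starts a new pageview; a new call to `startPageView` would happen only when the user opens another article or returns to the main page.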

Event Timeline

SBisson edited projects, added Inuka-Team (Kanban); removed Inuka-Team.

Open question: is there a javascript component we can use to send data from the KaiOS app to eventlogging? Is it what T228181 is?

Pinging @jlinehan and @mpopov for advice.

Kind of? @jlinehan and I are still working to determine how to best integrate EPC on the web, whether to bundle it into EventLogging or make available as a separate library, etc., plus it's a new system which works with the Modern Event Platform overhaul of the backend.

ANYWAY! In the meantime, if your targeted version of KaiOS supports navigator.sendBeacon then that's really all the JS component you need to send data to EventLogging. See https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Client-side_events for details of which url endpoint to send events to.

We do have support for sendBeacon. So you're suggesting we craft the request just like what the server expects and send it to the right URL? Works for me!
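
A rough sketch of the approach just described: serialize the event, URL-encode it into the query string, and pass the resulting URL to navigator.sendBeacon. The capsule shape (`schema`, `revision`, `event`), the function names and the `beaconBase` argument are assumptions for illustration; the Wikitech page linked above is the authority on the real endpoint and format.

```javascript
// Builds the beacon URL: the JSON-serialized capsule is percent-encoded
// and placed in the query string. The capsule shape is an assumption.
function buildBeaconUrl(beaconBase, schema, revision, event) {
  var capsule = { schema: schema, revision: revision, event: event };
  return beaconBase + "?" + encodeURIComponent(JSON.stringify(capsule));
}

// Sends the event if sendBeacon is available; returns false otherwise.
function logEvent(beaconBase, schema, revision, event) {
  var url = buildBeaconUrl(beaconBase, schema, revision, event);
  if (typeof navigator !== "undefined" &&
      typeof navigator.sendBeacon === "function") {
    return navigator.sendBeacon(url);
  }
  return false;
}
```

Keeping URL construction separate from sending makes the encoding easy to check on a desktop machine before testing on a device.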

@mpopov I just read this from the doc

Note that you can send events from "any" domain but in prod only events coming from wikimedia domains are processed. There are many clones of wikimedia running our code (like bad.wikipedia-withadds.com) that in turns send us fake data.

When installed on the phone, our app is served from something like app://123456789/index.html. Does that mean EventLogging won't consider our data because it is coming from a random domain?

Ah, yeah! I had that in mind and was going to mention it but completely forgot. Yes, there will need to be a way to mark incoming events from the app as valid, probably by formatting the User-Agent in a specific way and including specific information (how Wikipedia iOS & Android apps do, I believe) and updating the filter server-side to allow KaiOS.

That will need to be a request for Analytics Engineering and working with them to figure out how the KaiOS app's UA should look to enable ingestion of events sent by it. I'll follow-up with @Neil_P._Quinn_WMF.

Do we need to log every peep of the user?

We don't need to and we aren't planning to. This data stream, which will also gather data from a sample of mobile web users in India, doesn't include the specific pages that are viewed and will be entirely deleted after 90 days 😊

We are planning to collect a separate, more detailed stream of data from the app itself, but as with our Android and iOS apps, we will not be tracking the specific pages read by individual devices.

Hope that helps!

@SBisson, I updated the task description with the things we discussed in our meeting this week.

I think the only remaining point is the os field; I suggest we rename it client_type, with allowable values of android-web, ios-web, kaios-web, and kaios-app.

Done

@nshahquinn-wmf Could you reach out to the right people about this so we know what we have to do in the app to ensure the data is ingested?

@SBisson, I've just filed T244547 for pageview counting and T244548 because we'll want to count the app's previews as well. Those tasks set out all the steps I know about, and I've tagged Nuria and Analytics so they can verify and do what's needed in their systems. I'll let you or @hueitan take it from here :)

AMuigai triaged this task as Medium priority. Feb 10 2020, 1:29 PM

The EventLogging "Client-side events" documentation says the URL is limited to 1000 chars, but the code and the information on Stack Overflow both say 2000 chars.

Should we follow 2000 or 1000?

@Milimetric, @Krinkle, do you know? It seems like the actual limit is 2000 characters and the Wikitech documentation is just mistaken, but I'm not sure.

Are there ever InukaPageView events that url-encode to strings longer than, say, 950 characters?

Yes, the code is correct. It's been for several years.

When EL first launched we limited the client to 1000 as that was the lowest common denominator between all the relevant layers (various browsers/devices, various proxies at WMF, varnishlog, zeromq, eventlogging/python etc.).

A number of years ago we bumped that to around 2000 and later to around 3000-5000 I think. However for EventLogging specifically the logical limit remains 2000 and I think is mostly limited by browsers (query string/ URL limitations) and varnishlog (request URL memory allocation).

This is the first I see it was documented on that wiki page so that clearly wasn't updated when we did that work :)

Thank you! I've updated the docs.
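
Given the roughly 2000-character limit discussed above, a client could check the encoded URL before sending it. A minimal sketch: `MAX_URL_LENGTH` and `fitsUrlLimit` are hypothetical names, and whether the limit applies to the whole URL or only the query string is an assumption to confirm against the EventLogging code.

```javascript
var MAX_URL_LENGTH = 2000; // assumed limit, per the discussion above

// Returns true when the fully encoded beacon URL fits within the limit.
function fitsUrlLimit(beaconBase, payload) {
  var url = beaconBase + "?" + encodeURIComponent(JSON.stringify(payload));
  return url.length <= MAX_URL_LENGTH;
}
```

Percent-encoding inflates JSON noticeably (each quote and brace becomes three characters), so the check is done on the encoded string rather than on the raw JSON.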

In T246295#5922749, @hueitan wrote:
In T246295#5921936, @nshahquinn-wmf wrote:

@hueitan what will the app's user agent look like?

Here's one UA example from the device

Mozilla/5.0 (Mobile; Nokia_2720_Flip; rv:48.0) Gecko/48.0 Firefox/48.0 KAIOS/2.5.2

@nshahquinn-wmf @hueitan: I would recommend appending WikipediaApp/<version> to the UA when sending events (if you aren't already) so it gets parsed by UAParser UDF to yield useragent.wmf_app_version.

It doesn't look like we can spoof the user agent on the sendBeacon call. Have you seen this being done before?

Nope :( Okay, so the options are…

  • Switch from navigator.sendBeacon to XHR if it's supported on KaiOS and our CORS config has Access-Control-Allow-Headers for User-Agent header. Then we can do something like
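var xhr = new XMLHttpRequest();
// beaconUrl is assumed to already carry the URL-encoded event in its query string
xhr.open("GET", beaconUrl);
// Only works if the platform and the server's CORS config allow this header
xhr.setRequestHeader("User-Agent", navigator.userAgent + " WikipediaApp/<version>");
xhr.send(); // a GET request carries no body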
  • Add an optional app_version field to InukaPageView to send with events from the app (this is probably the better of the two approaches)
  • Forget about logging app version entirely, if @nshahquinn-wmf is okay with it

Thanks for the research, @mpopov! For this data stream (InukaPageView), we could definitely add our own app version field. However, this won't work for pageview (T244547) and preview (T244548) logging; if we use the default user agent for the corresponding requests, we won't be able to distinguish between KaiOS web and KaiOS app traffic in those datasets.

So I would strongly prefer that we change the user agent if possible, and use the modified one for all our requests to Wikimedia servers.

Also, according to @fdans, the WikipediaApp/{{version number}} string specifically needs to be at the beginning of the user agent. (T244547#5865738)

After trying xhr.setRequestHeader as @mpopov suggested, with both GET and POST, the device doesn't send any request; it gets blocked.

Would that be possible to have a build-time option to disable this and other forms of tracking? For example in webpack.config.js

Hmm, that's unfortunate! If there's definitely no way for us to modify the user agent header, then for this instrumentation (the InukaPageView event logging), it won't be a big problem to use the standard user agent. This schema already contains a client_type field which will allow us to distinguish between KaiOS web and KaiOS app data.

The only issue will be that we won't be able to distinguish data from different versions of the app. I'm not sure how often we'll need to do that, but since it's easy to do, let's add it. I've already added an optional app_version field to the schema, so please just add it to the code.

Not being able to change the user agent will be a bigger problem for setting up the standard pageview counting (T244547), but I'll post about it in that task.

I just want to follow up on the User Agent issue. Normally, an app can set the user agent string; is the problem here that "apps" are just web apps running on top of the mobile browser? If so, it sounds like we need to add nuance in how we parse user agents. Maybe we can extend the schema with a custom user agent that gets used instead of the UA string when available. This way it would be transparent to analysis. I'll double check with the team but this should be possible; let me know if it's desirable and if my assumption is right.

That would be an elegant solution for EventLogging instrumentation! It would address the issues here and with the VirtualPageView instrumentation (T244548). However, we've come up with workarounds for both cases, even though they're less elegant; here, that's the app_version field, and with VirtualPageView, that's the access_method field we just added.

Moreover, this wouldn't solve the issue for the standard pageview counting, since there we're just making a request to the Page Content Service rather than sending something to EventLogging. In my opinion, coming up with a solution for that would be the most useful thing.

Rileych raised the priority of this task from Medium to High. Mar 11 2020, 1:57 PM

Neil's comment on getting access to the data:

There are two main ways to access events that have reached the servers. One is reading the Kafka event stream using a command line program called kafkacat; events show up close to instantly but it's difficult to find specific events in the stream (e.g. events from the development app out of all the InukaPageView data). So this is more useful right after a big deployment. Instructions: https://docs.google.com/document/d/1DyGT4eGo9bkt7Zz6nlciCKBrLkzYOoektRaFKPRE8hE/edit
The second is querying the Hive databases in our data lake; events take a few hours to show up here, but it's much easier to query for a specific type of event. Basic instructions: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hive
In either case, you'll need analytics data access, which takes time to get but will be very useful when QAing instrumentation. Instructions: https://wikitech.wikimedia.org/wiki/Analytics/Data_access

@SBisson I checked for KaiOS app events since the start of 11 March UTC, and found 7 events. One was from @hueitan's location:

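| user_id | session_id | pageview_token | client_type | referring_domain | load_dt | page_open_time | page_visible_time | section_count | opened_section_count | page_namespace | is_main_page | is_search_page | app_version |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 01e8f6c45c157b353c77 | ea3da06ed93834353ca4 | e41f9ae96e1969bfd476 | kaios-app | kaios-app | 2020-03-11T10:31:44.999Z | 43560 | 43560 | 5 | 0 | 0 | False | False | |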

The other six were from your location:

| user_id | session_id | pageview_token | client_type | referring_domain | load_dt | page_open_time | page_visible_time | section_count | opened_section_count | page_namespace | is_main_page | is_search_page | app_version |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1a522831547935e7d2a4 | 0fa25cb41b25ed578557 | 1de7a56416f3d183d40c | kaios-app | kaios-app | 2020-03-10T20:15:31.600Z | 59547304 | 59547304 | 0 | 0 | -1 | True | True | |
| 1a522831547935e7d2a4 | 0fa25cb41b25ed578557 | 61bea046815607a37ac2 | kaios-app | kaios-app | 2020-03-10T20:18:24.536Z | 59374369 | 59374369 | 8 | 0 | 0 | False | False | |
| 1a522831547935e7d2a4 | 0fa25cb41b25ed578557 | 25b797789b49fe094eca | kaios-app | kaios-app | 2020-03-10T20:15:02.173Z | 59576719 | 59576719 | 8 | 0 | 0 | False | False | |
| 1a522831547935e7d2a4 | 0fa25cb41b25ed578557 | 09c1e85cd49a43a9be90 | kaios-app | kaios-app | 2020-03-11T12:48:00.571Z | 1285 | 1285 | 8 | 0 | 0 | False | False | |
| 1a522831547935e7d2a4 | 0fa25cb41b25ed578557 | ff0aa0429ccc54a1277b | kaios-app | kaios-app | 2020-03-10T20:15:18.344Z | 59560559 | 59560559 | 6 | 0 | 0 | False | False | |
| 1a522831547935e7d2a4 | 0fa25cb41b25ed578557 | d238f4c039c3c016acb8 | kaios-app | kaios-app | 2020-03-10T20:15:08.012Z | 59570890 | 59570890 | 0 | 0 | -1 | True | True | |

A few things I noticed:

  • Five of your events have very similar, very high values for page_open_time and page_visible_time: about 59,500 seconds (16.5 hours). However, they all have different pageview tokens and happen close in time to each other. Something seems wrong there.
  • The app_version field is not set.
  • We discussed having the referring_domain set to null for the view to the main page when the app is first opened; you have two main page view events, but neither one has a null referring_domain. Did you already have the app open? Or have we deferred setting a null referrer in that situation?

Ahhh...I just checked the event validation error logs, and I see 2,351 InukaPageView errors from this month. It looks like all of them are kaios-app events. So far, I only see one type of error, where app_version is not recognized as a valid property. This is because we didn't bump the schema revision to 19883738 after I added that field.

Okay, I've checked systematically and there are two different errors occurring. The second occurs when referring_domain is set to null. I remember encountering this before; with EventLogging, if you want a non-required field to have a null value, you can't pass null, because it fails schema validation (JSON itself allows null, but the schema does not accept it for this field); you just have to not provide the field when you log the event. The intake service will then set the value as null.
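
A small sketch of the workaround described above: drop null-valued fields before logging so that optional fields are absent rather than null. `omitNullFields` is a hypothetical helper name, not something taken from the app.

```javascript
// Returns a copy of the event without null or undefined fields, so that
// optional fields are simply absent rather than sent as null.
function omitNullFields(event) {
  var out = {};
  Object.keys(event).forEach(function (key) {
    if (event[key] !== null && event[key] !== undefined) {
      out[key] = event[key];
    }
  });
  return out;
}
```

The check compares against null and undefined explicitly, so legitimate falsy values such as a section_count of 0 or an is_main_page of false are kept.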

@nshahquinn-wmf the schema version and referring_domain have been fixed[1]. You should see a small batch of events from my location.

@SBisson thank you! I am now seeing quite a few kaios-app events show up in the production database; 180 yesterday and 182 so far today. Most are not from your location, but the locations all correspond to different team members 🙂

The rate of validation errors has also dropped a lot, and the remaining ones are all using the old schema version so it's just outdated code.

@SBisson can I move this to done or something needs to be checked here?

@nshahquinn-wmf can you confirm that the events you are seeing make sense?

Here are a couple potential issues I've noticed in the events from 18 March to now; I'll add more as I continue my investigation.

@eamedina and @Jpita have a lot of different values for user_id. Since they probably have multiple testing devices and are reinstalling the app frequently, this may be correct, but I'm not sure.

| developer | events | unique user IDs |
| --- | --- | --- |
| Eduardo | 4291 | 1680 |
| Huei | 499 | 8 |
| José | 1114 | 416 |
| Stephane | 429 | 121 |
| Sudhanshu | 128 | 6 |

ua-parser seems to be recognizing our KaiOS testing devices as generic Firefox OS smartphones. As long as their user agents include "KaiOS" (case insensitive) somewhere, this should not be happening (I can't see the raw user agents).

| os_family | device_family | events |
| --- | --- | --- |
| Android | Nexus 5 | 8 |
| Firefox OS | Generic Smartphone | 523 |
| Linux | Other | 3737 |
| Mac OS X | Other | 2074 |
| Windows | Other | 394 |

However, the pageviews dataset does include a number of events from our testing devices where the operating system has been recognized as KaiOS (T244547#5991585), so I'm not sure what's going on.

I plan to finish this up tomorrow, FYI.

It actually does seem like this is some kind of bug. The majority of sessions have two different user IDs, which should never happen.

| user IDs per session | frequency |
| --- | --- |
| 1 | 820 |
| 2 | 1362 |
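
For illustration, the "user IDs per session" count can be sketched as a grouping over events. This is not the query used in the original analysis; `userIdsPerSession` and the shape of the input array are assumptions.

```javascript
// events: array of objects with user_id and session_id.
// Returns a map from "number of distinct user IDs seen in a session" to
// "number of sessions with that count".
function userIdsPerSession(events) {
  var bySession = Object.create(null);
  events.forEach(function (e) {
    if (!bySession[e.session_id]) {
      bySession[e.session_id] = Object.create(null);
    }
    bySession[e.session_id][e.user_id] = true;
  });
  var histogram = {};
  Object.keys(bySession).forEach(function (sessionId) {
    var distinct = Object.keys(bySession[sessionId]).length;
    histogram[distinct] = (histogram[distinct] || 0) + 1;
  });
  return histogram;
}
```

If user_id really is persistent per install, every session should fall into the bucket for 1.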

But 1680 unique IDs, how can that be possible!?

@nshahquinn-wmf I can confirm I have only one testing device. I install the app somewhat frequently, probably not as frequently as @Jpita though. Not sure what else I can add to this thread at the moment; you may be onto something about this being a bug.

Will keep an eye on this thread, let me know if there's a question for me.

I'm now pretty sure this is an issue with the EventLogging backend, not the app's instrumentation. Filed as T248560.

I've now finished an in-depth examination of the data from our testing devices.

In total, I've found three things that are clearly bugs:

  • the user agent not being parsed correctly. This is work for the Analytics team, not us (T248560).
  • session IDs sometimes being regenerated too early (T248753)
  • page visible times sometimes being incorrect (T248757)

I've found one thing that might be a bug: when I order users' events by time, there are frequently "non-contiguous" pageviews, where the same pageview token recurs multiple times, separated by at least one different token. With KaiOS devices, the majority of the recorded pageviews were actually non-consecutive.

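| OS family | pageview is consecutive | number of pageviews |
| --- | --- | --- |
| Android | False | 1 |
| Android | True | 4 |
| KaiOS | False | 181 |
| KaiOS | True | 91 |
| Mac OS X | False | 224 |
| Mac OS X | True | 636 |
| Windows | False | 51 |
| Windows | True | 72 |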

This might reflect use of the back button to reopen previous articles; if that's the case, this behavior is probably reasonable.
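
One way to make the "non-contiguous" definition concrete is sketched here: a pageview counts as consecutive only if all events carrying its token are adjacent in the user's time-ordered events. `consecutiveByToken` is a hypothetical name and this is not the code from the notebook.

```javascript
// tokens: one user's pageview tokens, ordered by event time.
// Returns a map from token to true (all occurrences adjacent) or
// false (the token recurs after at least one different token).
function consecutiveByToken(tokens) {
  var result = Object.create(null);
  var previous = null;
  tokens.forEach(function (token) {
    if (token in result) {
      if (token !== previous) {
        result[token] = false;
      }
    } else {
      result[token] = true;
    }
    previous = token;
  });
  return result;
}
```

For the sequence a, a, b, a, c, token a is marked non-consecutive because it reappears after b, while b and c are consecutive.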

Finally, I'm now not worried about sessions which have more than one user ID (T242358#5993180); all of these came from desktop devices so I'm assuming it's some quirk of development environments which won't affect actual users.

My notebook is up on GitHub.

Since the instrumentation has been set up for some time now, we're closing this. My recent bug reports (T248757 and T248753) will live on separately.