
Update MobileWikiAppTalk Schema to track session length
Closed, Resolved (Public)

Description

Background
The Android team is updating talk pages to make them easier to discover in a non-intrusive way and to help users understand how to use talk pages to communicate with each other and improve articles. One of our research questions is: what changes, if any, do our interventions make to the time users spend on talk pages?

The Task
Update our existing schema to ensure we are tracking session length while adhering to privacy policies.

Event Timeline

Note: @JTannerWMF - The MobileWikiAppTalk schema currently tracks time_spent for the following tasks:

  • submit
  • refresh
  • new_topic_click
  • reply_click
  • lang_change

Events open_topic and open_talk are also tracked without time_spent as they are initialization events.

If our Talk Page redesign adds new events or actions, we will need to instrument tracking for them in this schema.

SNowick_WMF renamed this task from Update Schema to track session length to Update MobileWikiAppTalk Schema to track session length. Dec 6 2021, 8:28 PM

This already exists. Disregard.

We will need to add an is_anon column to this schema so that we can break down drop-off and time-spent rates by logged-in vs. anonymous users.

The pagens field in this schema is populated with the Talk page title (e.g. Talk, Discussion, Обсуждение, User Talk). Can we add a field that holds the page namespace code number? MediaWiki_history uses the convention that namespace is the code and title is the name of the page, so should we consider adding a pagetitle field for the title and using pagens for the namespace code? This would mean our data is not backwards compatible, but since we only retain 90 days of data and aren't taking any data away, this should be OK, pending objections for reasons I haven't considered. @Dbrant and @Sharvaniharan, can you weigh in?

The workaround I'm using is to filter pagens with a pattern that depends on which wiki is being queried, since some of the wikis are non-English and their Talk page names are not in English.

regexp_like(event.pagens, '(?i)talk') -- all Talk; (?i) makes the match case-insensitive
regexp_like(event.pagens, '(?i)Discus') -- frwiki Talk
regexp_like(event.pagens, '(?i)نقاش') -- arwiki Talk
regexp_like(event.pagens, '(?i)संवाद') -- hiwiki Talk
regexp_like(event.pagens, '(?i)Pembicaraan') -- idwiki Talk
regexp_like(event.pagens, '(?i)note') -- jawiki Talk
regexp_like(event.pagens, 'ノート') -- jawiki Talk
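For illustration only, the same per-wiki matching can be sketched in Python. The substrings are copied from the patterns above; the names `TALK_PATTERNS` and `looks_like_talk` are hypothetical and not part of any existing tooling.

```python
import re

# Substrings copied from the workaround patterns in this thread.
# TALK_PATTERNS and looks_like_talk are hypothetical names for this sketch.
TALK_PATTERNS = {
    "all": r"talk",
    "frwiki": r"Discus",
    "arwiki": r"نقاش",
    "hiwiki": r"संवाद",
    "idwiki": r"Pembicaraan",
    "jawiki": r"note|ノート",
}


def looks_like_talk(pagens: str, wiki: str = "all") -> bool:
    """Return True if pagens contains the talk-name substring for the given wiki."""
    pattern = TALK_PATTERNS[wiki]
    # re.IGNORECASE mirrors the (?i) flag used in the Presto patterns.
    return re.search(pattern, pagens, flags=re.IGNORECASE) is not None
```

A localized name that has no entry in the table, such as Обсуждение, is not matched by any of these patterns, which is the main weakness of matching on names rather than on a numeric namespace code.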

Data validation shows the latest version of the app (2.7.50396-r-2022-03-03) is sending the correctly-formatted pagens values to MobileWikiAppTalk. See https://phabricator.wikimedia.org/T301023.

This ticket should be resolved: all actions currently track time_spent values. Any future Talk page design changes will need to have time_spent tracked as well, and those new values can be added to the implementation requests for those design changes.

Note: Re-opening this ticket for @Sharvaniharan to investigate. In the latest query (v. 50396 only) we are seeing events with pagens values of 0 and 2. From my understanding we should only be seeing odd-numbered values, since these are all Talk pages. We are also seeing a small number of events with a value of -1, which also needs to be checked: we shouldn't see any negative values, though this might be a clue to what's going on. I verified this on Hue as well as on Presto to make sure it wasn't just a string formatting issue.

Data

QUERY:

SELECT
  CAST(FROM_ISO8601_TIMESTAMP(dt) AS DATE) AS date,
  event.pagens AS pagens,
  COUNT(*) AS count
FROM event.mobilewikiapptalk
-- From 2022-03-06 onward; comparing month and day together keeps days 1-5 of later months
WHERE year = 2022 AND (month > 3 OR (month = 3 AND day >= 6))
  -- Restrict to app versions containing '-r-' and '50396'
  AND regexp_like(useragent.wmf_app_version, '-r-')
  AND regexp_like(useragent.wmf_app_version, '50396')
GROUP BY CAST(FROM_ISO8601_TIMESTAMP(dt) AS DATE), event.pagens
ORDER BY CAST(FROM_ISO8601_TIMESTAMP(dt) AS DATE)
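As a rough sketch of how rows returned by a query like the one above could be checked, the following Python tallies event counts by namespace category. It assumes each row has been reduced to a `(pagens, count)` pair and that pagens may arrive as a string; `classify_namespace` and `summarize` are hypothetical helper names.

```python
from collections import Counter


def classify_namespace(ns: int) -> str:
    """Bucket a namespace code by the rule described in this thread."""
    if ns < 0:
        return "special"   # negative codes such as -1
    if ns % 2 == 1:
        return "talk"      # odd codes are talk namespaces
    return "subject"       # even, non-negative codes are not talk namespaces


def summarize(rows):
    """Sum event counts per category; rows is an iterable of (pagens, count)."""
    totals = Counter()
    for ns, count in rows:
        # int() accepts both integer and string-formatted codes such as "-1".
        totals[classify_namespace(int(ns))] += count
    return dict(totals)
```

Any non-zero total under "subject" or "special" would correspond to the unexpected 0, 2, and -1 values reported here.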

Hi @SNowick_WMF
Thank you for re-opening. I think I see what the issue is on our end. We might in some cases be sending the namespace of an article rather than its talk page's namespace number. -1 is for special namespaces. Will update the ticket here when I fix it. Thank you :)
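To make the likely fix concrete: under the MediaWiki convention, each even subject namespace N is paired with the talk namespace N + 1. The sketch below is an assumption about the kind of mapping involved, not the app's actual code, and `talk_namespace` is a hypothetical name.

```python
def talk_namespace(ns: int) -> int:
    """Return the talk namespace code for a subject or talk namespace code."""
    if ns < 0:
        # Negative codes (e.g. -1, special pages) have no talk namespace.
        raise ValueError(f"namespace {ns} has no talk namespace")
    # Odd codes are already talk namespaces; even codes map to the next odd code.
    return ns if ns % 2 == 1 else ns + 1
```

For example, an article in namespace 0 maps to talk namespace 1, and namespace 2 maps to 3, so sending the article's own namespace would produce the 0 and 2 values seen in the data.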