[SPIKE] Investigate how sessionID is used in various schemas on talk pages
Closed, Resolved (Public)

Description

In order to evaluate the impact of the Usability Improvements we are making, we will need to track desktop talk page views.

T302999#7774670 proposes that we track desktop talk page views using the DesktopWebUIActions schema.

A requirement for doing the above is making sure that the following four schemas are sampling events at the same rates and using the same methods:

VisualEditorFeatureUse
EditAttemptStep
DesktopWebUIActions
Talk_Page_Edit

NOTE: a similar investigation into how sessionIDs are used across the mobile talk page schemas is happening in T303653.

Open Question(s)

1. To what extent, if any, are the sessionIDs that VisualEditorFeatureUse, EditAttemptStep, DesktopWebUIActions, and Talk_Page_Edit assign to people consistent?
2. If the sessionIDs that VisualEditorFeatureUse, EditAttemptStep, DesktopWebUIActions, and Talk_Page_Edit assign to people are NOT consistent, what work would be involved in making them so?

Done

Answers to the Open Question(s) above are documented

Related Objects

Status	Assigned	Task
Open	None	T249579 [EPIC] Usability improvements: make the actions, activity and content within talk pages easier to understand
Open	None	T294481 Instrument Usability Improvements
Resolved	ppelberg	T302999 Create instrumentation spec for Usability Improvements
Resolved	ppelberg	T304036 [SPIKE] Investigate how sessionID is used in various schemas on talk pages

Event Timeline

ppelberg created this task.Mar 17 2022, 12:30 AM

ppelberg moved this task from Backlog to Triaged on the DiscussionTools board.

ppelberg moved this task from Incoming to Upcoming on the Editing-team (Kanban Board) board.

ppelberg added a parent task: T302999: Create instrumentation spec for Usability Improvements.Mar 17 2022, 12:33 AM

ppelberg removed a parent task: T294481: Instrument Usability Improvements.

ppelberg mentioned this in T304037: Ensure Editing Team's use of DesktopWebUIActions schema will not interfere with Web Team's plans.Mar 17 2022, 12:43 AM

MNeisler moved this task from Triage to Tracking on the Product-Analytics board.Mar 17 2022, 1:29 AM

Quick definition: I'm going to use "editing schemas" to refer to EditAttemptStep, VisualEditorFeatureUse, and talk_page_edit, and "UIActions schemas" to refer to DesktopWebUIActions and MobileWebUIActions.

We have three things called a "session":

Browsing session: fetched via mw.user.sessionId(), which remains constant as a user navigates the site. This is called token in the UIActions schemas, and session_token in EditAttemptStep.
Editing session: generated every time a new EditAttemptStep init occurs. This is editing_session_id/editSessionId/session_id across the editing schemas (it's inconsistently named, but the value is always the same between them in a given session).
Pageview session: generated on every pageload and shared amongst code via mw.user.getPageviewToken(). This is what's used in the UIActions schemas and DiscussionTools to decide whether we're sampling events. (This is not what's used in WikiEditor / VisualEditor on desktop, though.) It's also not logged in any way in UIActions, but is present as page_token in EditAttemptStep.
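To make the naming differences concrete, here is a minimal Python sketch that normalizes the three identifiers for a single event. It assumes events arrive as flat dicts keyed by the field names mentioned above; that flat shape and the helper names first_present and session_ids are assumptions for illustration, not part of any schema. Since it isn't stated here which editing schema uses which of the three editing-session names, the sketch simply tries all of them.

```python
from typing import Optional

# Field names as listed above. Which editing schema uses which of the three
# editing-session names is not specified, so every candidate is tried.
EDIT_SESSION_FIELDS = ("editing_session_id", "editSessionId", "session_id")
BROWSING_SESSION_FIELDS = ("session_token", "token")


def first_present(event: dict, names: tuple) -> Optional[str]:
    """Return the first non-empty value found under any of the given names."""
    for name in names:
        value = event.get(name)
        if value:
            return value
    return None


def session_ids(event: dict) -> dict:
    """Normalize the three kinds of session identifier for one event.

    Assumes a flat event dict (an assumption made for this sketch).
    """
    return {
        "browsing": first_present(event, BROWSING_SESSION_FIELDS),
        "editing": first_present(event, EDIT_SESSION_FIELDS),
        # page_token is only present in EditAttemptStep; a missing or empty
        # value is reported as None.
        "pageview": event.get("page_token") or None,
    }
```

A UIActions-style event carrying only token would come back with a browsing id and nothing else, which is the gap discussed in the rest of this comment.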

These are completely different things, and are largely incompatible. EditAttemptStep can have multiple editing sessions in a given pageload, let alone browsing session, and we want to be able to distinguish between them. If we just used the browsing sessionid as the editing sessionid, we'd massively change the meaning of the data.

The browsing session token is logged in both the editing and UIAction schemas, as session_token or token respectively. It's also the only session identifier inside the UIAction schemas. For analysis purposes, it seems like this is what we'd need to hang everything on. It's going to be a bit of a pain, though -- it's not present in talk_page_edit because it's a client-side piece of data.

This will leave us with a stream of events all associated with a given browsing session, potentially including multiple edit sessions. You can try to split it up into individual pageloads by watching for action=init in the UIAction events plus grouping the edit sessions by page_token. (Bear in mind: there won't be events for every pageload in a given browsing session, because events are sampled per-pageload.)
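As a rough illustration of that splitting, the sketch below takes the events for one browsing session and separates them into pageloads. It assumes flat event dicts with a sortable "ts" timestamp field; the "ts" name and both function names are hypothetical. It also assumes the events of different pageloads do not interleave in time, which multiple open tabs could violate.

```python
from collections import defaultdict


def split_ui_events_by_pageload(ui_events: list) -> list:
    """Split one browsing session's UIActions events into per-pageload runs.

    A new run starts at each event whose action is "init". Events are
    assumed to carry a sortable "ts" field (hypothetical name).
    """
    runs = []
    for event in sorted(ui_events, key=lambda e: e["ts"]):
        # Start a new run on init, or if nothing has been started yet.
        if event.get("action") == "init" or not runs:
            runs.append([])
        runs[-1].append(event)
    return runs


def group_edit_events_by_page_token(edit_events: list) -> dict:
    """Group one browsing session's EditAttemptStep events by page_token."""
    groups = defaultdict(list)
    for event in edit_events:
        # Events without a page_token end up together under "".
        groups[event.get("page_token", "")].append(event)
    return dict(groups)
```

Matching a UIActions run to a page_token group would still have to be done by timestamp, because the two sides share no pageview identifier.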

If we wanted to make this more consistent, we'd need to decide how. Assuming we don't want to change the meaning of editing sessionids, the simplest change we could make would be to have the UIAction schemas also log the pageview token and then we'd fairly easily be able to group things together.

This leaves us with a decision for @MNeisler: is the current grouping we can generate sufficient? Do we need to add more data to the UIActions schemas?

DLynch mentioned this in T303653: [SPIKE] Investigate how sessionID is used in various *mobile* talk page schemas.Apr 14 2022, 6:10 PM

ppelberg assigned this task to MNeisler.Apr 14 2022, 7:20 PM

ppelberg moved this task from Upcoming to Blocked / Needs More Work on the Editing-team (Kanban Board) board.

Thanks for the summary @DLynch.

This leaves us with a decision for @MNeisler: is the current grouping we can generate sufficient? Do we need to add more data to the UIActions schemas?

Given the complexity of grouping by the browser sessions, I'd recommend adding a pageview token to the UIactions schema to be able to accurately join the datasets for our analysis. See my current thinking and a couple open questions below:

The key metric [i] we'll need to calculate by joining these schemas requires knowing the timestamp of the action=init event from the UIActions schema and the timestamp of the subsequent action=init event in the editing schemas. Since we're interested in tracking events within a single pageview (not a single browser session), being able to join and group by pageview sessions would be the most straightforward and consistent way to do this.
I also agree that we don't want to change the meaning of editing session ids to browser session id. The current definition of editing session id is extremely useful to how we analyze and track events within EditAttemptStep.
While it would not require the addition of more data, grouping by the currently available browser session ids would make the analysis much more complex (for the reasons detailed in T304036#7856122) and difficult to QA and confirm accurate estimates due to inconsistencies. If possible, I think spending a little more time on instrumentation might help save time in the long run.
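For illustration, this is roughly what the pageview-level join could look like if the UIActions events also carried a pageview token. This is a sketch only: it assumes flat event dicts, a page_token field on the UIActions side (which does not exist today), and a numeric "ts" field in seconds; all of those, and the function name, are hypothetical.

```python
def time_to_edit_init(ui_events: list, edit_events: list) -> dict:
    """Map page_token -> seconds from the UI init to the next editing init.

    Assumes both event lists carry "page_token", "action" and a numeric
    "ts" in seconds (hypothetical field names for this sketch).
    Pageviews with no later editing init are omitted.
    """
    # Earliest UI init timestamp per pageview.
    ui_init = {}
    for event in ui_events:
        if event.get("action") == "init" and event.get("page_token"):
            token = event["page_token"]
            ui_init[token] = min(event["ts"], ui_init.get(token, event["ts"]))

    # First editing init at or after the UI init, per pageview.
    durations = {}
    for event in sorted(edit_events, key=lambda e: e["ts"]):
        token = event.get("page_token")
        if event.get("action") != "init" or token not in ui_init:
            continue
        if event["ts"] >= ui_init[token] and token not in durations:
            durations[token] = event["ts"] - ui_init[token]
    return durations
```

Editing events with an empty page_token are never matched in this sketch, so any init logged without a pageview token would need separate handling.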

@DLynch - a couple questions regarding the use of pageview token:

Is there a page_token within talk_page_edit or can this be easily added? Note: Pending resolution of T305541 to confirm we are tracking all saved comments and topics within EditAttemptStep, I should be able to use just the data within EditAttemptStep but this will be needed if for some reason we are unable to resolve that issue.
The EditAttemptStep documentation indicates that the page_token field "will only be set for client-side (JavaScript-generated) events, for server-side events the value will be an empty string." Does this mean that we will not be able to track pageview sessions for wikitext editor init events, which are logged server-side in EditAttemptStep?

[i] Of all the Contributors that post on a talk page, the average time from when a Contributor views a talk page to when they click an affordance to comment or start a conversation.

MNeisler reassigned this task from MNeisler to DLynch.Apr 18 2022, 3:03 PM

Is there a page_token within talk_page_edit or can this be easily added?

There is not. Everything I said about the browsing session also applies here, unfortunately -- it's a client-side piece of data, and talk_page_edit is server-side logging without access to it. (It's also, technically, a different pageview depending on how you look at it.) It's something that could be added in much the same way as we made sure the editing session ID was passed through, if that's required.

Does this mean that we will not be able to track pageview sessions for wikitext editor init events, which are logged server-side in EditAttemptStep?

Yes. You'd need to hang everything from the ready/loaded events for wikieditor. This one can't be changed, since the init happens before we have any way to pass a pageview token in.

ppelberg reassigned this task from DLynch to MNeisler.Apr 19 2022, 12:15 AM

It's something that could be added in much the same way as we made sure the editing session ID was passed through, if that's required.

Ok. I think we can hold off on creating that task pending resolution of T305541. If we can ensure all posted topics and comments are logged correctly as saveSuccess in EditAttemptStep, then I can reliably use that schema for the metrics that require determining pageview. But good to know this can be added if needed.

Yes. You'd need to hang everything from the ready/loaded events for wikieditor. This one can't be changed, since the init happens before we have any way to pass a pageview token in.

Thanks for clarifying. This is not ideal but not a blocker. I'll make sure to account for this in the planned analyses.

I recommend we go ahead with investigating adding the pageview token to the UIActions schemas. I believe @ppelberg is creating a ticket for that work, and this ticket can be resolved unless there are any additional questions or issues. Reassigning this task to @ppelberg to confirm.

DLynch renamed this task from [SPIKE] Investigate how sessionID is used in various *desktop* talk page schemas to [SPIKE] Investigate how sessionID is used in various schemas on talk pages.May 2 2022, 3:18 PM

DLynch merged a task: T303653: [SPIKE] Investigate how sessionID is used in various *mobile* talk page schemas.

DLynch added subscribers: Whatamidoing-WMF, Ryasmeen, VPuffetMichel.

ppelberg mentioned this in T307640: Add page view token to UIAction schemas.May 4 2022, 11:58 PM

In T304036#7871981, @MNeisler wrote:

I recommend we go ahead with investigating adding the pageview token to the UI actions schemas. I believe @ppelberg is creating a ticket for that work...

@MNeisler the ticket you're referring to above is the newly-created T307640. I've assigned this task over to you to populate the Requirements section based on the approach you and David converged on.

@DLynch: when you're ready to share the patch that "adds a page view token to UIAction schemas," can you please attach it to T307640?

...and this ticket can be resolved unless there are any additional questions or issues. Reassigning this task to @ppelberg to confirm.

David + Megan: it sounds like y'all have all that you need to move forward. As such, I'm going to resolve this task.

Restricted Application added a project: User-Ryasmeen.May 5 2022, 12:05 AM
