[SPIKE] Investigate how sessionID is used in various schemas on talk pages
Closed, Resolved, Public

Description

In order to evaluate the impact of the Usability Improvements we are making, we will need to track desktop talk page views.

T302999#7774670 proposes that we track desktop talk page views using the DesktopWebUIActions schema.

A requirement for doing the above is making sure that the following four schemas are sampling events at the same rates and using the same methods:

  • VisualEditorFeatureUse
  • EditAttemptStep
  • DesktopWebUIActions
  • Talk_Page_Edit
NOTE: a similar investigation into how sessionIDs are used across the mobile talk page schemas is happening in T303653.

Open Question(s)

  • 1. To what extent – if any – is the sessionID that the VisualEditorFeatureUse, EditAttemptStep, DesktopWebUIActions, and Talk_Page_Edit schemas assign to people consistent?
  • 2. If the sessionIDs that the VisualEditorFeatureUse, EditAttemptStep, DesktopWebUIActions, and Talk_Page_Edit schemas assign to people are NOT consistent, what work would be involved in making them so?

Done

  • Answers to the Open Question(s) above are documented

Event Timeline

ppelberg moved this task from Backlog to Triaged on the DiscussionTools board.
ppelberg moved this task from Incoming to Upcoming on the Editing-team (Kanban Board) board.

Quick definition: I'm going to use "editing schemas" to refer to EditAttemptStep, VisualEditorFeatureUse, and talk_page_edit, and "UIActions schemas" to refer to DesktopWebUIActions and MobileWebUIActions.

We have three things called a "session":

  • browsing session, fetched via mw.user.sessionId(), which remains constant as a user navigates the site. This is called token in the UIActions schemas, and session_token in EditAttemptStep.
  • editing session, generated every time a new EditAttemptStep init occurs. This is editing_session_id/editSessionId/session_id across the Editing schemas (it's inconsistently named, but is always the same between them in a given session)
  • pageview session, which is generated on every pageload and shared amongst code via mw.user.getPageviewToken(). This is what's used in the UIAction schemas and DiscussionTools to decide whether we're sampling events. (This is not what's used in WikiEditor / VisualEditor on desktop, though.) It's also not logged in any way in UIActions, but is present as page_token in EditAttemptStep.
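As a rough illustration of the naming above, the sketch below maps the differently named fields onto one set of keys. The field names are the ones listed in the bullets; the `normalize` helper, the canonical key names, and the sample events are hypothetical and not part of any schema.

```python
# Field names as listed above. The canonical keys on the left are invented
# for this sketch only.
FIELD_ALIASES = {
    "browsing_session": ("token", "session_token"),
    "editing_session": ("editing_session_id", "editSessionId", "session_id"),
    "pageview": ("page_token",),
}


def normalize(event):
    """Return one canonical key per session type (None when not logged)."""
    result = {}
    for canonical, aliases in FIELD_ALIASES.items():
        # Take the first alias that is present and non-empty.
        result[canonical] = next(
            (event[name] for name in aliases if event.get(name)), None
        )
    return result


# Hypothetical sample events: one UIActions-style, one EditAttemptStep-style.
ui_event = normalize({"token": "b1", "action": "init"})
edit_event = normalize(
    {"session_token": "b1", "editing_session_id": "e1", "page_token": "p1"}
)
```

With this mapping, the UIActions-style event ends up with only a browsing session, which is the gap discussed in the rest of this comment.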

These are completely different things, and are largely incompatible. EditAttemptStep can have multiple editing sessions in a given pageload, let alone browsing session, and we want to be able to distinguish between them. If we just used the browsing sessionid as the editing sessionid, we'd massively change the meaning of the data.

This browsing session token is logged in both the editing and UIAction schemas, as session_token or token respectively. It's also the only session-identification inside the UIAction schemas. For analysis purposes, it seems like this is what we'd need to hang everything on. It's going to be a bit of a pain, though -- it's not present in talk_page_edit because it's a client-side piece of data.

This will leave us with a stream of events all associated with a given browsing session, potentially including multiple edit sessions. You can try to split it up to individual pageloads by watching for action=init in the UIAction events plus grouping the edit sessions by page_token. (Bear in mind: these won't be events for every pageload in a given browsing session, because these are sampled per-pageload.)
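The splitting described above could be sketched roughly as follows, for the events of a single browsing session. This is only an illustration: the `schema` and `dt` keys and the sample events are assumptions made for the sketch, not actual field names.

```python
from collections import defaultdict


def split_into_pageloads(events):
    """Group one browsing session's events into approximate pageloads.

    Each event is a dict with a hypothetical "schema" key ("ui" or "edit")
    and a sortable "dt" timestamp, plus the fields named in this comment.
    """
    ui_pageloads = []                   # one list of UI events per action=init
    edit_pageloads = defaultdict(list)  # page_token -> editing events

    for event in sorted(events, key=lambda e: e["dt"]):
        if event["schema"] == "ui":
            # A new action=init marks the start of a new (sampled) pageload.
            if event.get("action") == "init" or not ui_pageloads:
                ui_pageloads.append([])
            ui_pageloads[-1].append(event)
        else:
            edit_pageloads[event.get("page_token", "")].append(event)

    return ui_pageloads, dict(edit_pageloads)


# Hypothetical events from one browsing session ("b1").
events = [
    {"schema": "ui", "dt": 1, "token": "b1", "action": "init"},
    {"schema": "ui", "dt": 2, "token": "b1", "action": "click"},
    {"schema": "edit", "dt": 3, "session_token": "b1", "page_token": "p1", "action": "init"},
    {"schema": "ui", "dt": 10, "token": "b1", "action": "init"},
    {"schema": "edit", "dt": 12, "session_token": "b1", "page_token": "p2", "action": "init"},
]
ui_pageloads, edit_pageloads = split_into_pageloads(events)
```

Note that the two groupings are still keyed differently (position in the stream versus page_token), so pairing a UI pageload with its editing events is left as a separate step.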

If we wanted to make this more consistent, we'd need to decide how. Assuming we don't want to change the meaning of editing sessionids, the simplest change we could make would be to have the UIAction schemas also log the pageview token and then we'd fairly easily be able to group things together.

This leaves us with a decision for @MNeisler: is the current grouping we can generate sufficient? Do we need to add more data to the UIActions schemas?

Thanks for the summary @DLynch.

> This leaves us with a decision for @MNeisler: is the current grouping we can generate sufficient? Do we need to add more data to the UIActions schemas?

Given the complexity of grouping by the browser sessions, I'd recommend adding a pageview token to the UIActions schemas to be able to accurately join the datasets for our analysis. See my current thinking and a couple of open questions below:

  • The key metric [i] we'll need to calculate by joining these schemas requires knowing the timestamp of the action=init event from the UIActions schemas and finding the timestamp for the subsequent action=init event in the editing schemas. Since we're interested in tracking events within a single pageview (not a single browser session), being able to join and group by pageview sessions would be the most straightforward and consistent way to do this.
  • I also agree that we don't want to change the meaning of editing session ids to browser session id. The current definition of editing session id is extremely useful to how we analyze and track events within EditAttemptStep.
  • While it would not require the addition of more data, grouping by the currently available browser session ids would make the analysis much more complex (for the reasons detailed in T304036#7856122) and difficult to QA and confirm accurate estimates due to inconsistencies. If possible, I think spending a little more time on instrumentation might help save time in the long run.
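To make the first bullet concrete, here is a minimal sketch of the join, assuming the UIActions events carried a `page_token` field (which they currently do not) and that every event has a numeric `dt` timestamp in seconds. Both are assumptions for illustration, as is the sample data. The sketch also ignores the restriction to Contributors who go on to post.

```python
from statistics import mean


def seconds_to_first_edit_init(ui_events, edit_events):
    """Map each pageview token to the seconds between the UI init event
    and the first editing init event that follows it."""
    ui_init = {}
    for e in ui_events:
        if e["action"] == "init":
            tok = e["page_token"]
            # Keep the earliest init per pageview.
            ui_init[tok] = min(e["dt"], ui_init.get(tok, e["dt"]))

    durations = {}
    for e in edit_events:
        tok = e.get("page_token")
        if e["action"] != "init" or tok not in ui_init:
            continue
        delta = e["dt"] - ui_init[tok]
        if delta >= 0:
            # Keep the smallest non-negative gap, i.e. the first editing init.
            durations[tok] = min(delta, durations.get(tok, delta))
    return durations


# Hypothetical sample data.
ui_events = [
    {"page_token": "p1", "action": "init", "dt": 100.0},
    {"page_token": "p2", "action": "init", "dt": 200.0},
]
edit_events = [
    {"page_token": "p1", "action": "init", "dt": 130.0},
    {"page_token": "p1", "action": "init", "dt": 190.0},
    {"page_token": "p2", "action": "ready", "dt": 210.0},
]
durations = seconds_to_first_edit_init(ui_events, edit_events)
average = mean(durations.values())
```

Pageviews without an editing init (p2 here) simply drop out of the result, and a second editing session in the same pageview does not change the measured gap.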

@DLynch - a couple of questions regarding the use of pageview token:

  • Is there a page_token within talk_page_edit or can this be easily added? Note: Pending resolution of T305541 to confirm we are tracking all saved comments and topics within EditAttemptStep, I should be able to use just the data within EditAttemptStep but this will be needed if for some reason we are unable to resolve that issue.
  • The EditAttemptStep documentation indicates that the page_token field "will only be set for client-side (JavaScript-generated) events, for server-side events the value will be an empty string." Does this mean that we will not be able to track pageview sessions for wikitext editor init events, which are logged server-side in EditAttemptStep?

[i] Of all the Contributors that post on a talk page, the average time from when a Contributor views a talk page to when they click an affordance to comment or start a conversation.

> Is there a page_token within talk_page_edit or can this be easily added?

There is not. Everything I said about the browsing session also applies here, unfortunately -- it's a client-side piece of data, and talk_page_edit is server-side logging without access to it. (It's also, technically, a different pageview depending on how you look at it.) It's something that could be added in much the same way as we made sure the editing session ID was passed through, if that's required.

> Does this mean that we will not be able to track pageview sessions for wikitext editor init events, which are logged server-side in EditAttemptStep?

Yes. You'd need to hang everything from the ready/loaded events for wikieditor. This one can't be changed, since the init happens before we have any way to pass a pageview token in.
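For analysis, that fallback might look something like the sketch below. It relies on the documented behaviour that server-side events have an empty page_token; the `dt` timestamp key and the sample events are assumptions made for illustration.

```python
def pageview_anchor(session_events):
    """Pick the event that ties one editing session to a pageview.

    Prefers an init event that carries a page_token; otherwise falls back
    to the earliest ready/loaded event that does.
    """
    ordered = sorted(session_events, key=lambda e: e["dt"])
    for wanted in (("init",), ("ready", "loaded")):
        for event in ordered:
            if event["action"] in wanted and event.get("page_token"):
                return event
    return None


# Hypothetical wikitext-editor session: server-side init, client-side follow-ups.
wikitext_session = [
    {"action": "init", "dt": 1, "page_token": ""},
    {"action": "ready", "dt": 2, "page_token": "p9"},
    {"action": "loaded", "dt": 3, "page_token": "p9"},
]
# Hypothetical session whose init is logged client-side.
client_session = [
    {"action": "init", "dt": 5, "page_token": "p3"},
    {"action": "ready", "dt": 6, "page_token": "p3"},
]
wikitext_anchor = pageview_anchor(wikitext_session)
client_anchor = pageview_anchor(client_session)
```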

> It's something that could be added in much the same way as we made sure the editing session ID was passed through, if that's required.

Ok. I think we can hold off on creating that task pending resolution of T305541. If we can ensure all posted topics and comments are logged correctly as saveSuccess in EditAttemptStep, then I can reliably use that schema for the metrics that require determining pageview. But good to know this can be added if needed.

> Yes. You'd need to hang everything from the ready/loaded events for wikieditor. This one can't be changed, since the init happens before we have any way to pass a pageview token in.

Thanks for clarifying. This is not ideal but not a blocker. I'll make sure to account for this in the planned analyses.

I recommend we go ahead with investigating adding the pageview token to the UI actions schemas. I believe @ppelberg is creating a ticket for that work and this ticket can be resolved unless there are any additional questions or issues. Reassigning this task to @ppelberg to confirm.

DLynch renamed this task from [SPIKE] Investigate how sessionID is used in various *desktop* talk page schemas to [SPIKE] Investigate how sessionID is used in various schemas on talk pages. May 2 2022, 3:18 PM

> I recommend we go ahead with investigating adding the pageview token to the UI actions schemas. I believe @ppelberg is creating a ticket for that work...

@MNeisler the ticket you're referring to above is the newly-created T307640. I've assigned this task over to you to populate the Requirements section based on the approach you and David converged on.

@DLynch: when you're ready to share the patch that "adds a page view token to UIAction schemas," can you please attach it to T307640?

> ...and this ticket can be resolved unless there are any additional questions or issues. Reassigning this task to @ppelberg to confirm.

David + Megan: it sounds like y'all have all that you need to move forward. As such, I'm going to resolve this task.