Page MenuHomePhabricator

Update the WikiLambda instrumentation to use core interaction events
Closed, ResolvedPublic5 Estimated Story Points

Description

Epic: T346285: [Epic] Create Metrics Platform API for Submitting Core Interaction Events

Background

In T350495: [EPIC] Deploy latest version of Metrics Platform client libraries, we deployed the latest version of the JS client library. We can now migrate the WikiLambda instrumentation to use core interaction events.

Since the the WikiLambda instrumentation has a lot of instrumentation-specific (domain-specific?) custom data, this will provide a good case study in creating and using a new schema that extends the core interaction event schema with the JS client library.

AC

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Hey @DMartin-WMF, @MNeisler – do you know what work would be involved in this, and are we able to do it right now?

Jdforrester-WMF moved this task from To triage to Backlog on the Abstract Wikipedia team board.

Hey @Jdforrester-WMF - Yes, I have a good idea what's involved, but also need some time to get the latest details and do some schema-creation. I've been anticipating this, but didn't think it would be time to do this until the new year. We could do it now, but I think it would unnecessarily disrupt my remaining H1-phase activities. I think it would make more sense to incorporate this into our team's H2 planning (for which I could write a brief PRD).

Note: now that we have to provide schemas with MP instruments, although it's relatively easy to do in the new version of MP, we need to allow time for getting them reviewed and possibly some iteration.

phuedx set the point value for this task to 5.Jan 16 2024, 10:47 AM

I think I have already a draft about how to map fields. Or at least something to start a discussion. I have create a summary in the following document: https://docs.google.com/spreadsheets/d/1WiPR-uzGlJ2LJ5DKks3WYRGV0W2uOl6l2WqwfdqZbrc

  • Once that's done, my main question is: Can we use the event name as another field? I mean, we considered the action field to store the action that is happening for each event but I realized that the event name already contains that information (it's pretty self-descriptive). At this moment the eventName and action fields seems to be a bit redundant because both are storing the same concept
  • Anyway it seems we'll need a custom fragment/schema because, in some cases, it's not enough with the 4-tuple (see repeated fields column in the spreadsheet). In that case, maybe it isn't worth to try to save/reuse fields as I mentioned before
  • According this mapping strategy I have already created the needed fragment and schema and I have also updated the instrument code and stream configuration. Would it be useful to push them right now to discuss it all together?

Thanks, @Sfaci ! Regarding your 3rd question, yes, it would definitely be useful for me to see what the potential new fragment, schema, and stream configuration would look like; that is, if it's easy for you to share them. (I don't feel like I need to see the instrument code at this time, though.) Regarding the first question, I will comment later on that.

Change 992223 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[operations/mediawiki-config@master] [DNM] Update the WikiLambda instrumentation to use core interaction events

https://gerrit.wikimedia.org/r/992223

Change 992224 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[schemas/event/secondary@master] [DNM] Update the WikiLambda instrumentation to use core interaction events

https://gerrit.wikimedia.org/r/992224

Change 992226 had a related patch set uploaded (by Santiago Faci; author: Santiago Faci):

[mediawiki/extensions/WikiLambda@master] [DNM] Update the WikiLambda instrumentation to use core interaction events

https://gerrit.wikimedia.org/r/992226

@DMartin-WMF @phuedx I have pushed all the changes I have on my laptop to show you the entire proposal for Wikilambda to use core interaction events.

First of all, I think it's really useful to read the document that David shared with us for a better understanding of the Wikilambda events: https://docs.google.com/document/d/1lwicEluVvgTfidmGuEsLNudR7BDb4EBijA4Ebi6HuCk/edit

And the following is the entire proposal we have so far:

I just wanted to post here some discussion we had in Slack because I think it's really interesting, I don't want to lose it. I was unable to catch it in the first approach. It's about the namespace field that I confused with the namespace one we use in the common schema. After clarifying these things, I realized there are more events than I thought at the beginning and I'm working on a new approach taking all this into consideration. It turned out that eventNameSpace is a variable that can have different values depending on some events that are described in the Wikifunction's documentation.

In slack:

@DMartin-WMF . Just a couple of questions about it:

  • I have seen there are two .editTester. but I can't find them in the code. Aren't they being used?
  • I have realized I missed some events, .editFunction.publish and .editFunction.cancel, .editImplementation.publish and .editImplementation.cancel these ones seem to call console.log instead of the dispatchEvent function > (if I'm not understanding wrongly the code). Am I right?
  • I don't find in the document two events that I found in the code. They are wf.ui.${eventNamespace}.canceland wf.ui.${eventNamespace}.publish. Should they be considered?

Hi @Sfaci Regarding your questions, I think they are all related to our use of the variable eventNamespace, which appears as follows in our code (in Publish.vue and PublishDialog.vue):
eventLogger.dispatchEvent( wf.ui.${ eventNamespace }.cancel, this.eventData );
this.dispatchEvent( wf.ui.${ eventNamespace }.publish, customData );
Both of these calls construct an event name which incorporates the current value of the variable eventNamespace. The value of eventNamespace depends on context, and it can take 5 different values (editFunction, editImplementation, editTester, editType, editZObject). So that means these 2 calls to dispatchEvent can produce 10 different event names:

  • wf.ui.editFunction.cancel, wf.ui.editImplementation.cancel, wf.ui.editTester.cancel, wf.ui.editType.cancel, wf.ui.editZObject.cancel
  • wf.ui.editFunction.publish, wf.ui.editImplementation.publish, wf.ui.editTester.publish, wf.ui.editType.publish, wf.ui.editZObject.publish

Additional notes:

  • wf.ui.${eventNamespace}.cancel and wf.ui.${eventNamespace}.publish never appear per se as event names, because ${eventNamespace} always resolves to the value of the variable, and that value always becomes part of the event name.
  • Instead of action=cancel and action=publish, as appears in the spreadsheet, I'm thinking it would be more natural to use the 10 event names above as possible values of action (after eliminating "wf.ui." and possibly replacing other "." with "_" if that's more correct syntax). Does that make sense?
  • In that case, there's no need to figure out a mapping for eventNamespace, so that proposed mapping could be deleted from your spreadsheet.
  • Finally, I should say I'm not completely sold yet on the adoption of the action schema fragment (although I do understand the motivation for it). More about this in future communications, after I've finished reviewing things.

I think this info covers your questions, but please let me know if not.

At the time of writting this, the mapping strategy approach has already been modified taking into consideration what I mentioned in my previous comment.

At the time of writing this, the instrument code, schemas and stream configuration are aligned with the new mapping strategy.
Details about the WIP MRs that need to be reviewed are in https://phabricator.wikimedia.org/T350497#9478332

After some comments and suggestions we have made some changes to the mapping strategy. In addition to that, I have changed instrument code and the fragment/schema to keep all these parts aligned during your review.

During the last days we have been discussing a bit in Slack. Just to catch up here I'm going to sum up where we are and what's being reviewed right now:

  • We are working in the new approach using more events with less data (we are considering removing 'is_dirty' and 'isnewzobject' fields
  • We have added new events (create, edit and view) to have the opportunity to manage the new even flows
  • cancel event has been converted to abandon or cancel event depending on whether there is pending changes or not when user finishes editing/creation.
  • We have added zobjecttypeas a field to edit and create events

All details are available at https://docs.google.com/spreadsheets/d/1WiPR-uzGlJ2LJ5DKks3WYRGV0W2uOl6l2WqwfdqZbrc/edit#gid=1314514655 (there is a new tab called event flows)

Please @DMartin-WMF @MNeisler, add what you consider if needed.

@Sfaci I finished reviewing the updated proposal. Overall, all the suggested changes and approach look good to me and I like the new proposed event flows. Below are a few comments and suggested changes/additions:

Addition of activity session or funnel token:

Since the new approach focuses on creating a sequence of events, I would recommend adding a new field to correlate all related events within a sequence, such as funnel_token or specific activity session id (similar to EditAttemptSteps editing_session_id). Right now, we have a performer.session_id logged however that's difficult to rely on as a browser session can be long and a user can perform multiple edit types within a single browser session.

Rethinking approach to cancel and abandon events

cancel event has been converted to abandon or cancel event depending on whether there is pending changes or not when user finishes editing/creation.

I agree with the idea of removing isdirty in an effort to replace some custom fields with individual events; however, creating a new "abandon" event would create two events both related to when a user clicks the "cancel" button.

If possible, I would prefer one single cancel or abort type event that tracks all edits that don't end in publish. This would make analytics simpler as we wouldn't need to aggregate multiple event types to determine how many users finish editing/creating without publishing.

To track if there are pending changes, would it be possible instead to create a new action to track when someone starts to make a change (such as, action = firstChange(flexible with naming here)). This would allow me to review the sequence of events to determine if changes were made.

For example, A user starts to create something and cancels after writing something would be indicated by the following sequence of events create -> firstChange -> cancel.
Would this approach be possible?

cc @DMartin-WMF

Glad to read you like the new proposal!
About your suggestions:

  • Addition of activity session or funnel token: I'd say there is no problem regarding the new field you said (funnel_token) or something similar because some fields like this are already included in the common contract. I added some of them to our spreadsheet (at the end as "some common fields that could be interesting") just to keep them in mind (funnel_token, funnel_name, funnel_event_sequence_position). I thought they could be useful to track the user. We could use those fields without adding them to the custom contract we are working on for Wikilambda. That's why I decided to include only the custom field for each event in the table below. They will be the ones we'll use to prepare the custom schema you'll need.
  • Rethinking approach to cancel and abandon events: I think is something we could do and it fits with the new way we decided to follow. And, if it's better for you to analyze the data, why not do it? I'll prepare the new event flows to review it.

@phuedx any thought about both suggestions?

@DMartin-WMF @MNeisler Just some updates I had pending on my side:

  • The custom schema is already updated. You can take a look at it in the gerrit change. It's created according to the event flows approach we are currently discussing. Keep in mind that fields like funnel_token and other similar ones are already included in the common schema and we don't need to add them to the custom one.
  • I have added a new tab to our common spreadsheet (called "event flows 2.0") to include what @MNeisler proposed before about adding a change event instead of using two events when user cancel editing (cancel/abandon events). In that case, I considered that cancel is the right one we can keep and I have removed the abandon event. That way, we will know that the user is abandoning or not a pending change depending on there was a change event before the cancel one.

cc @phuedx

@Sfaci Thank you. The changes shown in "event flows 2.0" look good to me.

I believe the "publish" button is always disabled unless the user makes a change so we should expect to see an action = change occur before every publish event.

Thanks, @MNeisler for the suggestions.

Addition of activity session or funnel token: If activity session IDs (from mw.eventLog.id.getSessionId, as documented in datahub) can be used easily and reliably, then I'm on board. However:

  • I'm also wondering if Metrics Platform might want to consider providing this property; that is, adding performer.activity_session_id as a contextual attribute, along with performer.session_id and the rest. It would make sense to inquire now if the MP team might want to do this.
  • The datahub documentation linked above mentions that activity session IDs were omitted from the session_tick stream to protect privacy. Does that have any relevance for us?

Rethinking approach to cancel and abandon events: The proposed new event (firstChange) requires a new instrument, but it apparently can be added in a straightforward manner, so I'd be okay with that.

Cc: @phuedx @Sfaci

I'm also wondering if Metrics Platform might want to consider providing this property; that is, adding performer.activity_session_id as a contextual attribute, along with performer.session_id and the rest. It would make sense to inquire now if the MP team might want to do this.

I'll flag this to @VirginiaPoundstone and @WDoranWMF. Thanks!

Some updates have been done:

  • Instrumentation code is updated according to the new mapping strategy (including the new event change)
  • Stream configuration has been updated to include the new contextual attribute activity_token)
  • The change to add the new contextual attribute activity_token to the web/base schema is already pending review: T358758: Adding a new contextual attribute to the Metrics Platform JS client library: active_browsing_session_token
  • The update for fragment/schema for wikilambda is blocked until we merge the previous change (wikilambda schema is based on the web/base one). That's why activity_token is not included yet in the current change

In the ticket description there are links to the current changes updated according to this just, in case you want/can take a look to review something in advance

I'll moved the task to 'Blocked' just while we are reviewing the change about the new contextual attribute. We need that to start a formal review of all the work. Of course anyone can start reviewing anything in advance if you consider it could be useful.

Jdforrester-WMF changed the task status from Open to In Progress.Mar 8 2024, 7:29 PM
Jdforrester-WMF moved this task from Backlog to In Progress on the Abstract Wikipedia team board.

At this moment all changes related to this ticket are updated and active and can be formally reviewed.
All the details about all of them are included in the ticket description.

Sfaci updated the task description. (Show Details)

Change #992224 merged by jenkins-bot:

[schemas/event/secondary@master] Update the WikiLambda instrumentation to use core interaction events

https://gerrit.wikimedia.org/r/992224

Change #992226 merged by jenkins-bot:

[mediawiki/extensions/WikiLambda@master] Update the WikiLambda instrumentation to use core interaction events

https://gerrit.wikimedia.org/r/992226

Change #992223 merged by jenkins-bot:

[operations/mediawiki-config@master] Update the WikiLambda instrumentation to use core interaction events

https://gerrit.wikimedia.org/r/992223

Mentioned in SAL (#wikimedia-operations) [2024-04-03T20:26:20Z] <jforrester@deploy1002> Started scap: Backport for [[gerrit:992223|Update the WikiLambda instrumentation to use core interaction events (T350497)]]

Mentioned in SAL (#wikimedia-operations) [2024-04-03T20:28:52Z] <jforrester@deploy1002> sfaci and jforrester: Backport for [[gerrit:992223|Update the WikiLambda instrumentation to use core interaction events (T350497)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2024-04-03T20:45:24Z] <jforrester@deploy1002> Finished scap: Backport for [[gerrit:992223|Update the WikiLambda instrumentation to use core interaction events (T350497)]] (duration: 19m 03s)

Change #1016914 had a related patch set uploaded (by David Martin; author: David Martin):

[mediawiki/extensions/WikiLambda@master] Remove null and undefined properties from Metrics Platform events

https://gerrit.wikimedia.org/r/1016914

Change #1016914 merged by jenkins-bot:

[mediawiki/extensions/WikiLambda@master] Remove null and undefined properties from Metrics Platform events

https://gerrit.wikimedia.org/r/1016914