Page MenuHomePhabricator

Mark EditAttemptStep events from VisualEditor reusers like Content Translation
Closed, ResolvedPublic

Description

Background

Situation

  • Content Translation uses the visual editor codebase for its editing surface, so it's very likely that it is emitting EditAttemptStep (and VisualEditorFeatureUse) events.
  • The above is relevant to the Editing Team because they depends on the two schemas mentioned above to understand how people are using VE and DiscussionTools.

Complication

  • As they are currently implemented, when any team does analysis that uses data from the EditAttemptStep and VisualEditorFeatureUse schemas, that analysis will conflate usage data from distinct editing experiences/contexts: Visual editor and Content Translation. Specifically, we'd like to be able to do each of the following:
    • (1) Remove events related to Content translation if we want to review just primary Visual Editor page edits
    • (2) See content translation-related VE edits only
    • (3) Review ContentTranslation and VE edits together

Resulting action

  • Create a way for Product Analytics to differentiate between events logged in VEFU and EAS that are emitted through/by ContentTranslation usage. See the ===Approaches section below for how this could be implemented and achieved.

Approaches

Approach #1: suppress/prevent ContentTranslation events from being logged in EAS and VEFU schemas

  • "We could make sure these are suppressed to avoid skewing the data in those streams, where are meant to be for the "main" visual editor only. Whatever strategy we use could also be used for any other tools that reuse visual editor and should not be included in the same analytics (I'm not aware of any, but if not now, hopefully there will be more in the future!)."

Approach #2: explicitly tag/mark events being logged in EAS and VEFU schemas as being emitted by ContentTranslation

Requirements

  • Any person/group (e.g. members of the Editing and Product Analytics teams) looking at data logged using the VisualEditorFeatureUse and/or EditAttemptStep schemas can differentiate visual editor and non-visual editor events.

Open questions

  • How is ContentTranslation integrated into VE? The answer to this question will inform how we go about "dinstringuishiing" between the events emitted by these two interfaces.
    • ✅ContentTranslation is "integrated" into VE in ways that should give us the flexibility to decide whether it/its events are explicitly noted in integration or editor_interface within the two schemas. See: T270636#6759278.
  • What is an optimal approach for ensuring it is clear where the events VEFU and EAS are being emitted from?
    • ✅Approach #2: explicitly tag/mark events being logged in EAS and VEFU schemas as being emitted by ContentTranslation. See: T270636#6759278.

Done

  • All ===Open questions are addressed
  • ===Requirements are completed
  • An ===Approach is decided upon
  • The ===Requirements are implemented

Event Timeline

ppelberg moved this task from To Triage to Triaged on the VisualEditor board.

Assigning this to @ppelberg to comment the notes from the conversation he and @MNeisler has on 6-January-2021.

Assigning this to @ppelberg to comment the notes from the conversation he and @MNeisler has on 6-January-2021.

✅Done.

ppelberg updated the task description. (Show Details)

Next step

  • @MNeisler to decide which of the ===Approaches listed in the task description will deliver on the ===Requirements. Of course, if there is an approach that should be considered that we have not yet written on this ticket, please add it here.

Next step

  • @MNeisler to decide which of the ===Approaches listed in the task description will deliver on the ===Requirements. Of course, if there is an approach that should be considered that we have not yet written on this ticket, please add it here.

If feasible from an engineering perspecitve, I'd recommend Approach #2: explicitly tag/mark events being logged in EAS and VEFU schemas as being emitted by ContentTranslation as it gives us the most options and information when reviewing the data. With this approach, we have the options to (1) remove events related to Content translation if we want to review just primary Visual Editor page edits (2) see content translation-related VE edits only; or (3) review them both together. This could help highlight any impacts from the Content Translation or other tools on overall Visual Editor trends if needed.

EditAttemptStep already includes a field called event.integration , which lets us distinguish primary page edits (event.integration' = page) from other editor integrations such as discussion tool-related events (event.integration = discussiontools). This seems like it would be the best place to add a type to distinguish edits from content translation and other tools that may use the visual editor codebase in the future.

Next step

  • @MNeisler to decide which of the ===Approaches listed in the task description will deliver on the ===Requirements. Of course, if there is an approach that should be considered that we have not yet written on this ticket, please add it here.

If feasible from an engineering perspecitve, I'd recommend Approach #2: explicitly tag/mark events being logged in EAS and VEFU schemas as being emitted by ContentTranslation as it gives us the most options and information when reviewing the data. With this approach, we have the options to (1) remove events related to Content translation if we want to review just primary Visual Editor page edits (2) see content translation-related VE edits only; or (3) review them both together. This could help highlight any impacts from the Content Translation or other tools on overall Visual Editor trends if needed.

This context is helpful and the strategy you are proposing above sounds good to me, @MNeisler.

EditAttemptStep already includes a field called event.integration , which lets us distinguish primary page edits (event.integration' = page) from other editor integrations such as discussion tool-related events (event.integration = discussiontools). This seems like it would be the best place to add a type to distinguish edits from content translation and other tools that may use the visual editor codebase in the future.

Based on what @Esanders shared with me this morning [i], it sounds like the approach you are describing above will be feasible.

I've updated the task description accordingly and will leave @Esanders or @DLynch to comment whether anything needs to be adjusted once we prioritize working on this.

Next

  • In the meantime, @MNeisler can you please review the now-updated ===Requirements section of the task description to ensure it matches with what you had in mind? If everything looks good, please unassign yourself from the task.

i. ContentTranslation is "integrated" into VE in ways that should give us the flexibility to decide whether it/its events are explicitly noted in integration or editor_interface within the two schemas.

In the meantime, @MNeisler can you please review the now-updated ===Requirements section of the task description to ensure it matches with what you had in mind? If everything looks good, please unassign yourself from the task.

Looks good to me. Thanks for updating.

Pginer-WMF moved this task from Needs Triage to Upstream/Other teams on the ContentTranslation board.
Pginer-WMF removed a project: Language-Team.

There's two changes that need to happen here:

  1. ContentTranslation needs to adjust its Target (in ve.init.mw.CXTarget.js) to contain an appropriate static property integrationType.
  2. The schema needs to be updated to know about whatever that property is set to, since integration is an enum.

The former is easy. The latter does require me to work out how the new schema system works, since EditAttemptStep has been ported over.

DLynch renamed this task from Suppress EditAttemptStep events from visual editor reusers like Content Translation to Mark EditAttemptStep events from VisualEditor reusers like Content Translation.Aug 5 2021, 8:09 PM
DLynch claimed this task.

Change 710348 had a related patch set uploaded (by DLynch; author: DLynch):

[schemas/event/secondary@master] EditAttemptStep: Add a new integration

https://gerrit.wikimedia.org/r/710348

Also, to adjust this going forwards, we could suppress (much) unknown reuse causing logging by just not having a default and making it abort logging if none is set. This wouldn't help with reusers who extend ve.init.mw.ArticleTarget, but the more-generic ve.init.mw.Target would cope.

Change 710353 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/ContentTranslation@master] Set integrationType on CXTarget

https://gerrit.wikimedia.org/r/710353

@DLynch We have two integration to Visual Editor. One is ContentTranslation which you already added. Second is the integration to the Vue based Section Translation application. Should we add this change to SectionTranslationTarget too?

Change 710353 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] Set integrationType on CXTarget

https://gerrit.wikimedia.org/r/710353

Change 710348 merged by jenkins-bot:

[schemas/event/secondary@master] EditAttemptStep: Add a new integration

https://gerrit.wikimedia.org/r/710348

This should probably be verified once the train has gone out -- just double-checking that (a) @MNeisler seeing if there are events which are being stored with the appropriate integration, and (b) me making sure that we don't get some spike in schema validation errors in logstash. (I normally wouldn't worry, but this is the first time I've had to update one of the new-style schemas, so there's at least a chance I've missed something.)

I'll revisit B on Thursday, assuming the train goes smoothly.

Having checked, I don't see any validation errors showing up referencing the integration field. If @MNeisler could confirm we're actually seeing events getting logged with integration==contenttranslation, this'll be done.

@DLynch I checked the data logged in EditAttemptStep and have not seen any events logged to date with event.integration = 'contenttranslation' since the deployment of this addition last Thursday through today (24 August 2021). Only page and discussiontools integration events are currently being logged.

SELECT 
    event.integration,
   COUNT(*) AS n_events
FROM event.editattemptstep
WHERE
   year = 2021
   AND month = 08
   AND day > 18
GROUP BY
   event.integration

Results:
integration	n_events
page	4580299
discussiontools	59247

@MNeisler Thanks for checking -- let me go nudge this and see whether it's just logging being disabled, or if something else needs to happen to the schema...

I finally looked into this. There's a combination of things:

  1. ContentTranslation doesn't log anything to EditAttemptStep because it implements its own Target which doesn't inherit from DesktopArticleTarget (quite reasonably; they're very different workflows). CX could start doing this logging if it's desired -- it'd just need to add the calls to ve.track into the appropriate places in CXTarget.
  2. ContentTranslation doesn't log anything to VisualEditorFeatureUse because CXTarget doesn't specify a platformType, which the logging code requires. This would be trivial to add, but we'd probably want to do the EditAttemptStep work first otherwise you'd wind up with feature use events with no associated editing session which might confuse analysis.

Separately, the schema version in WikimediaEvents wasn't updated (but has been now); this would just have meant there were errors logged.

@MNeisler I think we could call this unblocked and close it. Our current state is that if CX starts emitting events (e.g. if the work is done, or we loosen our requirements) they'll now be suitably tagged and our dataset won't be confused. However, as-is no events should be logged.

Thanks for looking into this @DLynch!

While thinking about this task, I also realized that the different workflow to publish a content translation edit might not make it the best fit for EditAttemptStep. For example, with the ContentTranslation tool, draft or in-progress translations can be saved after a user starts an edit (translation) and then you can come back days or months after to complete it. Not sure how these edit sessions might end up being read by EditAttemptStep but I'm wondering if these in-progress translations might result in a number of edit sessions that include "init", "ready" and "firstChange" events but not "saveSuccess", resulting in lower edit completion rate metrics.

If CX ever starts emitting these events, it's useful to have this tag already in place so we can filter out these events easily and review them separately from other editing events.

@MNeisler I think we could call this unblocked and close it. Our current state is that if CX starts emitting events (e.g. if the work is done, or we loosen our requirements) they'll now be suitably tagged and our dataset won't be confused. However, as-is no events should be logged.

That makes sense to me. I'm fine with closing this.

(cc @ppelberg)

Per the discussion @MNeisler and I had on 9 Feb, we are going to mark this task as resolved with the following new information in our minds:

  1. Content Translation is NOT currently logging any events in EditAttemptStep
  2. If at any point in the future Content Translation had a need to log events in EditAttemptStep it is important that those events be explicitly tagged as being from Content Translation
  3. The tag referenced in "2)" is now implemented making it possible for ContentTranslation to log events in EAS if they choose to do so

cc @Pginer-WMF