
MinT for Readers MVP instrumentation: Use custom schema to include translation fragment
Closed, Resolved | Public | 4 Estimated Story Points

Description

Currently, the instrumentation for the MinT for Readers MVP (the Special:AutomaticTranslation page) uses the default generic /analytics/product_metrics/web/base/* schema.

However, a new schema has been introduced for this instrumentation (T369687). It adds a translation fragment that makes it easy to include translation-related data such as source/target language and source/target title (https://github.com/wikimedia/schemas-event-secondary/blob/master/jsonschema/fragment/analytics/product_metrics/translation/current.yaml).

The instrumentation for the MinT for Readers MVP should use this schema and, where needed, replace the action context field with the translation fragment for such events.

Instrumentation schema specification: https://docs.google.com/spreadsheets/d/1NmZKFZ2otet34hwnbpHtBWUWR1Llmbans1G7i01eR7E

Event Timeline

ngkountas changed the task status from Open to In Progress.Sep 30 2024, 9:41 AM
ngkountas triaged this task as Medium priority.
ngkountas moved this task from Backlog to LPL Essential 2024 Jul-Oct on the LPL Essential board.

Change #1076738 had a related patch set uploaded (by Nik Gkountas; author: Nik Gkountas):

[mediawiki/extensions/ContentTranslation@master] MinT MVP For Readers instrumentation: Use custom schema

https://gerrit.wikimedia.org/r/1076738

@ngkountas per our conversation today

  • schemaID: /analytics/product_metrics/web/translation/1.0.0
    • a materialized version that includes the base Metrics Platform schema along with fields from the fragment.
  • streamName: mediawiki.product_metrics.translation_mint_for_readers
    • slightly different from the previous one (translation_ was added), because we need a new streamName for the reason described by the Metrics Platform team:

If the instrument is already collecting data in production using the base schema, the stream config can't be updated to map to /analytics/product_metrics/web/translation, as we are setting the translation data object as required in your fragment, which is a breaking change on the consumer end of the data flow. So in this case, a new stream config should be deployed, and the old one removed (along with the corresponding Hive table).

Also, another note: all events should have the translation field. If we are not using any of its fields, please pass in an empty object {} to pass EventGate validation.
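
For illustration, here is a minimal Python sketch of the rule above: every event carries a translation object, falling back to {} when no fragment fields are used. The schemaID and streamName are the ones listed above. The `build_event` helper, the "click" action value, and the exact payload layout are assumptions for the example, not the actual ContentTranslation or Metrics Platform client code.

```python
# Values taken from the comment above.
SCHEMA_ID = "/analytics/product_metrics/web/translation/1.0.0"
STREAM_NAME = "mediawiki.product_metrics.translation_mint_for_readers"


def build_event(action, translation=None):
    """Return an event dict that always includes a `translation` object.

    Hypothetical helper: the real instrument goes through the Metrics
    Platform client, which fills in the remaining base-schema fields.
    """
    return {
        "$schema": SCHEMA_ID,
        "meta": {"stream": STREAM_NAME},
        "action": action,
        # An empty object is sent when no fragment fields are used,
        # so the required `translation` field is still present.
        "translation": dict(translation) if translation else {},
    }


# No translation data available: the field is present but empty.
event_without_data = build_event("click")

# Translation data available: pass only the fields that are known.
event_with_data = build_event(
    "click", {"source_language": "en", "source_title": "Moon"}
)
```

Defaulting to {} in a single helper keeps callers from accidentally omitting the field on events that have no translation data.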


Given that I have to update the stream configuration and register the new streamName, let us coordinate the change. As soon as your patch is ready to be merged, I will work on the changes to stream configuration and registration.

Change #1076738 merged by jenkins-bot:

[mediawiki/extensions/ContentTranslation@master] MinT For Readers MVP instrumentation: Update schema id and stream name

https://gerrit.wikimedia.org/r/1076738

Nikerabbit set the point value for this task to 4.Nov 11 2024, 9:30 AM

The events are being logged fine with the new fragment, and I am able to query the data at event.mediawiki_product_metrics_translation_mint_for_readers. Here is a quick overview of session initiations:

mint for readers daily.png (393×1 px, 40 KB)


However, I am a little confused by some of the events during the article confirmation step (i.e. when I confirm an article from the search results). Here are a couple of examples:

mint for readers moon article event.png (895×1 px, 280 KB)

In the event logged, the source_language is all and the source_title is చంద్రుడు, which is the name of the article in the target language (te). I'd expect these to be 'en' and 'Moon' instead.

mint for readers tenali article event.png (895×1 px, 189 KB)

In the above case, the source_language has been captured correctly as en but the source_title is the title of the article in the target language.


As the migration piece is complete (and that is the scope of this task), I can file a separate ticket to fix the above issues.

Thank you for the review @KCVelaga_WMF and also thank you for flagging these issues!

The issue here is that when a search result is clicked inside the "Search for a topic" step, there is often no source language pre-selected, so the current source language is set to all. Then, once the search result is selected and the user is navigated to the "Confirm topic" step, we select the most suitable source language for this translation, and only then can we log the source_language that would match your expectation.

Now, we can leave the implementation as it is, in case it is useful for us to know the language that was selected during search - as we do not log the search language anywhere else - and the displayed title of the search result. But if we want to log the event with the (finally) selected source language and source title, then we need to log it inside the "Confirm topic" step. If we want to follow the second approach, I would suggest changing the event from a search_result click event to a search_result_selected event (or something similar), for consistency.
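
To make the two options concrete, here is a rough Python sketch using the Moon example from above. The function names and the target_language field name are hypothetical; only source_language, source_title, the all placeholder, and the sample values come from this thread.

```python
# Placeholder used when no source language is pre-selected during search.
NO_LANGUAGE_SELECTED = "all"


def translation_at_search_click(search_language, displayed_title, target_language):
    """Option 1 (current behaviour): log what the search step knows.

    The language may still be the "all" placeholder, and the title is
    whatever the search result displayed.
    """
    return {
        "source_language": search_language,
        "source_title": displayed_title,
        "target_language": target_language,
    }


def translation_at_confirm_topic(source_language, source_title, target_language):
    """Option 2: log inside "Confirm topic", after the source is resolved."""
    if source_language == NO_LANGUAGE_SELECTED:
        # At this step a concrete source language should already be chosen.
        raise ValueError("source language has not been resolved yet")
    return {
        "source_language": source_language,
        "source_title": source_title,
        "target_language": target_language,
    }


# What the search_result click currently logs for the Moon example.
logged_at_click = translation_at_search_click("all", "చంద్రుడు", "te")

# What a search_result_selected event could log from "Confirm topic".
logged_at_confirm = translation_at_confirm_topic("en", "Moon", "te")
```

With the second option the event can only fire once a concrete source language exists, which is why it would have to move out of the search step.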

Sorry, missed the above message.

Now, we can leave the implementation as it is, in case it is useful for us to know the language that was selected during search - as we do not log the search language anywhere else - and the displayed title of the search result. But if we want to log the event with the (finally) selected source language and source title, then we need to log it inside the "Confirm topic" step. If we want to follow the second approach, I would suggest changing the event from a search_result click event to a search_result_selected event (or something similar), for consistency.

Sounds good to me. We can close this ticket in that case, I will create a separate ticket for the modification of the event itself.

Thank you @KCVelaga_WMF! I'm closing this task as done. Please feel free to create a new ticket about the issue discussed (and maybe mention it here, too, for visibility).