
Reenable Schema:Echo
Closed, Resolved (Public)

Description

This task represents the work of re-enabling the Echo schema.

Doing so will enable the Editing Team to gather metrics like those listed below which we will use to help decide whether Topic Subscriptions can be made available as an opt-out feature at a limited number of wikis (T280896).

Requirements

  • All events specified within Schema:Echo are being emitted and logged.

Done

  • All Requirements are met
  • @MNeisler verifies that aggregate data is landing in database(s) as expected

Event Timeline

MNeisler edited projects, added Product-Analytics; removed Product-Analytics (Kanban).
MNeisler moved this task from Triage to Tracking on the Product-Analytics board.

I think we don't actually want Schema:Echo, but instead we want Schema:EchoInteraction.

I can't find anywhere what exactly we wanted this for, but I assume it's something about whether users are actually viewing the notifications that we're sending. This data is available in Schema:EchoInteraction (per the description: "Logs activity related to how users interact with notifications produced via the Echo extension"), but not in Schema:Echo ("Logs events related to the generation of notifications via the Echo extension").

Schema:EchoInteraction is currently enabled and is logging events (I checked in Superset). If that's the only thing we need, then we're done!

If we actually need Schema:Echo for something, we'd need to migrate it to the Event Platform (the definition on meta.wikimedia.org doesn't do anything; we'd need to add it in the schemas/event/secondary repository), and also restore the actual code that does the logging (it was deleted in rECHO7336a1a67c77: Finalize migration of EchoMail and EchoInteraction to Event Platform).

(I dug up the spreadsheet we were using to talk about requirements.)

Echo was wanted for tracking:

  • A notification is sent to a user
  • Timestamp of notification sent

Those are distinct from EchoInteraction, because that only logs when the user actually checks their notifications. I believe the idea was to be able to track how long it takes between a user being sent a notification and when they actually see it -- which you can do because these schemas share an eventId field that lets you track a notification between them. (Also watching out for "Sharp increase in the number of notifications sent per day per contributor.")

The deletion is sort of a nuisance -- I think that hadn't happened at the point where we were originally investigating what was available.

That said, if this schema isn't available, it's probable that similar data could be extracted just by looking at the raw database table for echo events. Not sure how much more of a pain that is for Megan's analysis.

(I dug up the spreadsheet we were using to talk about requirements.)

Thanks @DLynch! For others' reference, those are documented here.

Echo was wanted for tracking:
A notification is sent to a user
Timestamp of notification sent

Yes - Specifically, we identified "Sharp increase in the number of notifications sent per day per contributor" as one of the primary quantitative indicators of disruption. We also identified "Percent of notifications sent that are unopened" as a leading indicator for several medium-ranked priority scenarios in the pre-mortem.

We are also currently planning on using data on when a notification was sent as part of the notification workflow funnel analysis being planned as part of the engagement analysis in T280898.

That said, if this schema isn't available, it's probable that similar data could be extracted just by looking at the raw database table for echo events. Not sure how much more of a pain that is for Megan's analysis.

@DLynch - A couple of questions regarding the raw database table before I can confirm whether it can be used:

Are https://www.mediawiki.org/wiki/Extension:Echo/echo_event_table and https://www.mediawiki.org/wiki/Extension:Echo/echo_notification_table the raw database tables for echo events?

Yes. I think for our purposes we'd only really need echo_notification -- it should contain a row per notification sent to each user, and includes a timestamp when the notification was sent and when it was read. We might be able to skip analysis of Schema:EchoInteraction entirely with that -- though the schema does include a lot more data about how the notification was interacted with.
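As a rough illustration of how the metrics discussed in this task could be derived from rows shaped like echo_notification, here is a minimal Python sketch. It assumes the column names notification_event, notification_user, notification_timestamp and notification_read_timestamp, 14-digit MediaWiki-style timestamps, and a NULL read timestamp for unread notifications. The sample rows and function names are hypothetical, and this is not the analysis that was actually run.

```python
from collections import Counter
from datetime import datetime

# Assumption: timestamps are stored as 14-digit strings (YYYYMMDDHHMMSS).
MW_TS = "%Y%m%d%H%M%S"

# Hypothetical sample rows, in the assumed column order:
# (notification_event, notification_user,
#  notification_timestamp, notification_read_timestamp)
rows = [
    (101, 7, "20210601120000", "20210601123000"),
    (102, 7, "20210601130000", None),  # never read
    (103, 8, "20210602090000", "20210603090000"),
    (104, 7, "20210602100000", "20210602100500"),
]


def parse(ts):
    """Convert a 14-digit timestamp string to a datetime."""
    return datetime.strptime(ts, MW_TS)


def sent_per_user_per_day(rows):
    """Count notifications sent, keyed by (user, ISO date)."""
    counts = Counter()
    for _event, user, sent, _read in rows:
        counts[(user, parse(sent).date().isoformat())] += 1
    return counts


def percent_unread(rows):
    """Percentage of notifications with no read timestamp."""
    if not rows:
        return 0.0
    unread = sum(1 for row in rows if row[3] is None)
    return 100.0 * unread / len(rows)


def seconds_to_read(rows):
    """Seconds between send and read, keyed by event id (read rows only)."""
    return {
        event: (parse(read) - parse(sent)).total_seconds()
        for event, _user, sent, read in rows
        if read is not None
    }
```

A sudden jump in the values returned by sent_per_user_per_day would correspond to the "sharp increase in the number of notifications sent per day per contributor" indicator, and percent_unread to "percent of notifications sent that are unopened".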

Do the raw database tables and EchoInteraction share an event_id field?

Yes, the eventId field in EchoInteraction should be the same as echo_event.event_id and echo_notification.notification_event.
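To make the shared key concrete, here is a small Python sketch that matches interaction events to sent notifications by event id and measures the delay until the first interaction. Only eventId comes from this discussion; the action and timestamp keys, the sample values, and the function name are illustrative assumptions. Since echo_notification holds a row per user, a real join would probably also need to match on the recipient, which this sketch ignores.

```python
from datetime import datetime

# Assumption: both sources use 14-digit timestamps (YYYYMMDDHHMMSS).
MW_TS = "%Y%m%d%H%M%S"

# Hypothetical mapping: notification_event -> time the notification was sent.
sent_at = {101: "20210601120000", 102: "20210601130000"}

# Hypothetical EchoInteraction-like events; key names other than
# eventId are made up for illustration.
interactions = [
    {"eventId": 101, "action": "link-click", "timestamp": "20210601124000"},
    {"eventId": 101, "action": "impression", "timestamp": "20210601122500"},
    {"eventId": 999, "action": "impression", "timestamp": "20210601150000"},
]


def first_interaction_delay(sent_at, interactions):
    """Seconds from send to earliest interaction, keyed by event id.

    Interactions with no matching notification are skipped, and
    notifications with no interaction are omitted from the result.
    """
    first_seen = {}
    for event in interactions:
        event_id = event["eventId"]
        if event_id not in sent_at:
            continue
        seen = datetime.strptime(event["timestamp"], MW_TS)
        if event_id not in first_seen or seen < first_seen[event_id]:
            first_seen[event_id] = seen
    return {
        event_id: (seen - datetime.strptime(sent_at[event_id], MW_TS)).total_seconds()
        for event_id, seen in first_seen.items()
    }


delays = first_interaction_delay(sent_at, interactions)
```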

Thanks @DLynch! Based on this, I agree that the echo_notification table can be used as an alternative for the metrics we were originally planning to collect using Echo.

I updated the instrumentation spec with this change.

I think we can close this ticket out as not needing any action then?