Set up a new board for "Working with templates in Visual Editor" and add graphs for stats specified in T258920: Collect on-going numbers related to VE template dialog and TemplateWizard use and T260343: Collect on-going numbers related to TemplateData editor dialog use
Description
Details
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T258917 Record template use and dialog interaction metrics
Open | | None | T262209 Reportupdater aggregation and Grafana board for VE template dialog, TemplateWizard and TemplateData editor metrics
Event Timeline
Change 647742 had a related patch set uploaded (by Andrew-WMDE; owner: Andrew-WMDE):
[analytics/reportupdater-queries@master] [WIP] Process EventLogging events for VisualEditor
@mforns I'm stuck trying to aggregate the TemplateWizard EventLogging schema. We introduced two new fields, user_id and user_edit_count, deployed in 1.36.0-wmf.20. These fields appear in the Kafka stream but not in event.templatewizard, and I don't understand why. I'm also blocked trying to make my query robust to the missing field, since this is something that legitimately might happen in the future after a schema migration. However, the event's revision field is null, which means I have no way of detecting the updated schema, and apparently HiveQL has no facility for testing for a struct field's existence.
Pinging you because this topic was rumored to be migrated to the new event infrastructure recently, so there might be an issue with sanitization and revision schema wiring. Possibly I need to edit a schema definition somewhere...
Change 649351 had a related patch set uploaded (by Awight; owner: Awight):
[analytics/reportupdater-queries@master] [WIP] Aggregate TemplateWizard metrics
I've learned a bit about the new event platform, but it doesn't quite match what I see in the existing TemplateWizard code. My suspicion is that this extension was used to pilot some kind of EventLogging legacy transition mode, but unfortunately this left the stream's pipeline in a slightly broken state.
Observations:
- The "revision" field is null for all new events in Hive event.TemplateWizard. This must be fixed (for new data; it does not need to be backfilled), since we need some way to set expectations about each event's structure.
- The metawiki schema page has been deprecated and should be deleted. We've been editing this with the expectation that it would update the validated schema, but it is already disconnected.
- The event platform schema must be updated to include our new fields.
- Local debugging requires an additional service, and a $wgEventLoggingServiceUri setting which I haven't quite figured out yet, something like "http://localhost:8192/v1/events".
- Seems there is an upcoming gotcha for new-event projects, $wgEventStreams and $wgEventLoggingStreamNames are not designed for encapsulation inside extensions. In other words, deployment will rely on syncing these production configuration variables.
> The metawiki schema page has been deprecated and should be deleted. We've been editing this with the expectation that it would update the validated schema, but it is already disconnected.
This was completely my fault. We've since established more of a process for migrating schemas. This one should have been marked as migrated to Event Platform on the talk page. We should also explore whether deleting these schemas from metawiki is ok.
> The "revision" field is null for all new events
This is expected: there is no longer a schema 'revision' id.
> Local debugging requires an additional service, and a $wgEventLoggingServiceUri setting which I haven't quite figured out yet, something like "http://localhost:8192/v1/events".
If you are using the eventgate-devserver, this will probably be the correct value.
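For local debugging, a minimal Python sketch of what a POST to that endpoint could look like. The URI is the one quoted above; the `$schema` value, the stream name, and the `event` payload are hypothetical placeholders, and sending a single JSON object (rather than an array) is an assumption about the dev server's input format.

```python
import json
import urllib.request

# Endpoint quoted in the comment above; assumes a local eventgate-devserver.
EVENT_SERVICE_URI = "http://localhost:8192/v1/events"


def build_event_request(event, uri=EVENT_SERVICE_URI):
    """Build (but do not send) a JSON POST request carrying one event."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        uri,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# All values below are hypothetical and only illustrate the shape of an event.
sample_event = {
    "$schema": "/analytics/legacy/templatewizard/1.0.0",  # hypothetical schema URI
    "meta": {"stream": "eventlogging_TemplateWizard"},     # hypothetical stream name
    "event": {"action": "launch"},                         # hypothetical payload
}

request = build_event_request(sample_event)
# To actually send it against a running dev server:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it possible to inspect the body and headers without the service running.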
> Seems there is an upcoming gotcha for new-event projects, $wgEventStreams and $wgEventLoggingStreamNames are not designed for encapsulation inside extensions. In other words, deployment will rely on syncing these production configuration variables.
$wgEventLoggingStreamNames is designed for that, but if it is set, its values must be keys in $wgEventStreams.
If $wgEventLoggingStreamNames is false, then no stream config will be used by EventLogging, meaning it will not attempt to use $wgEventStreams. This is the default in EventLogging's extension.json, and the default for development environments or any place that doesn't use stream config. See also: https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration
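The relationship between the two settings can be sketched in Python as follows. This only models the rule as described in this comment, not EventLogging's actual code; the function name and the dict representation of $wgEventStreams are assumptions for illustration.

```python
def active_stream_configs(stream_names, event_streams):
    """Model of the rule described above (not EventLogging's real code).

    stream_names  -- stands in for $wgEventLoggingStreamNames (False or a list)
    event_streams -- stands in for $wgEventStreams, modeled here as a dict
    """
    if stream_names is False:
        # Default: stream config is not used at all.
        return None
    missing = [name for name in stream_names if name not in event_streams]
    if missing:
        # Every configured name must be a key in the stream config.
        raise KeyError(f"streams not defined in event_streams: {missing}")
    return {name: event_streams[name] for name in stream_names}
```

For example, `active_stream_configs(False, {})` returns `None`, while a list containing a name that is absent from `event_streams` raises `KeyError`.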
After chatting, it looks like my course of action should be:
- Update the new event platform schema for TemplateWizard to include our new fields.
- Wait for deployment and the hive migration job to add fields to the hive table's schema.
- At this point, it's safe to deploy aggregation depending on the new field.
Change 649594 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/TemplateWizard@master] Switch event to use the new platform
Change 649599 had a related patch set uploaded (by Awight; owner: Awight):
[schemas/event/secondary@master] Add new action and user fields
Change 649600 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/TemplateWizard@master] Update event schema
@awight sorry for the delay, I see you got answers already.
> I'm also blocked trying to make my query robust to the missing field, since this is something that legitimately might happen in the future after a schema migration.
Just to add a nit here: Removing or renaming fields (as well as making optional fields required) is not allowed when modifying schemas.
This is meant to ensure schema backwards compatibility. So, you shouldn't need to worry about making queries robust against missing fields!
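To illustrate the rule, here is a rough Python sketch of a check over two JSON-Schema-like dicts. It covers only the restrictions named in this comment (removed or renamed fields, optional fields made required); it is not the validation the schema repositories actually run, and the function name is made up.

```python
def compatibility_problems(old_schema, new_schema):
    """List violations of the rule described above between two schema versions."""
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    old_required = set(old_schema.get("required", []))
    new_required = set(new_schema.get("required", []))

    problems = []
    # A rename looks like a removal plus an addition, so both are caught here.
    for name in sorted(old_props.keys() - new_props.keys()):
        problems.append(f"field removed or renamed: {name}")
    # Only fields that already existed as optional are flagged.
    for name in sorted((new_required - old_required) & old_props.keys()):
        problems.append(f"optional field made required: {name}")
    return problems


old = {"properties": {"action": {}, "namespace": {}}, "required": ["action"]}
added = {
    "properties": {"action": {}, "namespace": {}, "user_id": {}, "user_edit_count": {}},
    "required": ["action"],
}
print(compatibility_problems(old, added))  # adding optional fields yields no problems
```

Adding optional fields, as in the `added` example, passes this check, which matches the change made in this task.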
In this case, I was adding a field but thought that the query needed to handle the pre-migration schema. It turns out that after migration, Hive will have the new struct schema everywhere.
BTW, I ran into another issue (mentioned in code review): it seems to be illegal to add a new option to an enum. Seems like this should be a normal, supported evolution?
Change 649660 had a related patch set uploaded (by Awight; owner: Awight):
[operations/puppet@production] Add a job for some visualeditor metrics aggregation
Change 647742 merged by Mforns:
[analytics/reportupdater-queries@master] Process EventLogging events for VisualEditor
Change 649662 had a related patch set uploaded (by Awight; owner: Awight):
[operations/puppet@production] Add a job for TemplateWizard metrics aggregation
Change 649888 had a related patch set uploaded (by Awight; owner: Awight):
[analytics/reportupdater-queries@master] Include a bucket for anonymous editors
Change 649888 merged by Mforns:
[analytics/reportupdater-queries@master] Include a bucket for anonymous editors
Change 650237 had a related patch set uploaded (by Awight; owner: Awight):
[mediawiki/extensions/TemplateWizard@master] Don't send user fields when not logged in
Change 649600 abandoned by Awight:
[mediawiki/extensions/TemplateWizard@master] Update event schema
Reason:
Squashed into Id60b87a1811ede8e3e4757f282d7792dbb9efae2
Change 649351 merged by Mforns:
[analytics/reportupdater-queries@master] Aggregate TemplateWizard metrics
Change 649660 merged by Elukey:
[operations/puppet@production] Add a job for some visualeditor metrics aggregation
Change 649594 merged by jenkins-bot:
[mediawiki/extensions/TemplateWizard@master] Switch event to explicitly use the new platform
Change 649599 merged by Awight:
[schemas/event/secondary@master] Add new action and user fields
Change 650237 merged by jenkins-bot:
[mediawiki/extensions/TemplateWizard@master] New event semantics for performer fields
Change 655949 had a related patch set uploaded (by Awight; owner: Awight):
[analytics/reportupdater-queries@master] Push job start date forward to first data collection
Change 655949 merged by Mforns:
[analytics/reportupdater-queries@master] Push job start date forward to first data collection
Change 649662 merged by Razzi:
[operations/puppet@production] Add a job for TemplateWizard metrics aggregation
Added to the sprint to re-check if aggregations/backfills are/were happening. Let's not do the boards for the moment.
This is a bit of a weird ticket. It serves as an umbrella for the aggregation work that needs to be done for VisualEditor, MediaWiki-extensions-TemplateWizard and TemplateData metrics.
As far as I can tell this is done for the former two as described in T258920: Collect on-going numbers related to VE template dialog and TemplateWizard use.
But for T260343: Collect on-going numbers related to TemplateData editor dialog use, I think there's some stuff mentioned in the ticket that is not aggregated yet. It would make most sense to look at that step by step.
Set free again for the work to update the Grafana boards as specified in the tickets mentioned above.