
Create a logging schema for TemplateWizard
Closed, Resolved · Public · 3 Estimated Story Points

Description

We want to log the following for every instance of TemplateWizard use (for every saved edit):

  • When TemplateWizard launches
  • When a template from the recently used list is selected
  • When a template is discarded using the trash icon
  • When the 'Insert' button is clicked to insert a template
  • When the dialog is closed using the 'Close' button
  • Record which templates were inserted into the article on insertion
  • Record which templates were inserted into the article on successful save
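To make the list above concrete, here is a minimal sketch of one event record, assuming a single `action` enum with one value per listed interaction, plus optional template names and a revision ID. All field and enum names here are hypothetical; they are not taken from the published schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Action(Enum):
    """One value per logged interaction (hypothetical names)."""
    LAUNCH = "launch"
    SELECT_RECENT = "select-recent-template"
    DISCARD = "discard-template"
    INSERT = "insert-template"
    CLOSE = "close-dialog"
    SAVE = "save-page"


@dataclass
class TemplateWizardEvent:
    action: Action
    # Templates involved; only recorded on insertion and on save.
    template_names: List[str] = field(default_factory=list)
    # Revision ID of the page when editing started; None for new pages.
    revision_id: Optional[int] = None

    def __post_init__(self) -> None:
        # Reject template names on actions that should not carry them.
        if self.template_names and self.action not in (Action.INSERT, Action.SAVE):
            raise ValueError("template_names is only allowed for insert/save events")

    def to_dict(self) -> dict:
        """Serialize to a flat dict, omitting empty optional fields."""
        data = {"action": self.action.value}
        if self.template_names:
            data["template_names"] = list(self.template_names)
        if self.revision_id is not None:
            data["revision_id"] = self.revision_id
        return data


event = TemplateWizardEvent(Action.INSERT, ["Cite web"], revision_id=12345)
print(event.to_dict())
# {'action': 'insert-template', 'template_names': ['Cite web'], 'revision_id': 12345}
```

Using one enum field instead of a separate boolean per interaction keeps every event in the same shape, which makes counting by action a single group-by.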

Along with the above, we want to:

  • Record the wiki on which the action occurred
  • Record the parent revision ID of the edit, so we can later see how many of the edits made with TemplateWizard were reverted

For all of these actions, we need to come up with a robust logging schema that will allow us to pull data into graphs and examine the behavior.

Schema location: https://meta.wikimedia.org/wiki/Schema:TemplateWizard

Event Timeline

Mooeypoo triaged this task as Medium priority. Aug 28 2018, 9:53 PM
Mooeypoo created this task.
aezell set the point value for this task to 3. Sep 5 2018, 11:15 PM
aezell added a subscriber: nettrom_WMF.

Is any of this private data such that we need to limit retention? I'm thinking specifically of the revision ID.

Also, should @nettrom_WMF review this schema before we have the Schema Council of Schemas take a look?

Samwilson subscribed.

Other schemas (e.g. CitationUsage) record the revision ID, so I think it's fine (as long as we only keep it for 90 days; is that right?).

Schemas don't seem to ever record on which wiki the event occurred, so I'm assuming that is done for all events (at some higher level)?

I've started a schema: https://meta.wikimedia.org/wiki/Schema:TemplateWizard

Another point about the revision ID: we can only know the current revision ID (and then only for existing pages). We can't even know that it'll be the parent of the revision created upon save, can we? If there was another intervening but non-conflicting edit, then our edit would be a child of that one, wouldn't it? Does that make this harder to use for analysis, e.g. for finding out whether the edit was reverted?


We can't know the parent ID? I'm okay with saving the parent ID and knowing it won't be correct 100% of the time. It will be correct in the majority of cases, I'm guessing.
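One way to check the "majority of cases" guess after the fact is sketched below. It assumes, hypothetically, that each save event can be paired with the revision it produced and that the actual parent of that revision can be looked up from page history; neither lookup is part of the schema discussed here.

```python
from typing import Dict, List, Tuple


def parent_match_rate(saves: List[Tuple[int, int]], parent_of: Dict[int, int]) -> float:
    """Fraction of saves whose logged revision ID is the true parent.

    saves: (logged_base_rev_id, saved_rev_id) pairs (hypothetical pairing).
    parent_of: saved_rev_id -> actual parent revision ID from page history.
    """
    if not saves:
        return 0.0
    matches = sum(1 for base, saved in saves if parent_of.get(saved) == base)
    return matches / len(saves)


# Edit 1: editor opened rev 100, someone else saved 101, ours became 102 (parent 101).
# Edit 2: editor opened rev 200, ours became 201 (parent 200).
saves = [(100, 102), (200, 201)]
parent_of = {101: 100, 102: 101, 201: 200}
print(parent_match_rate(saves, parent_of))  # 0.5
```

The first edit illustrates the intervening-edit case raised above: the logged ID (100) is an ancestor of the saved revision but not its direct parent.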

Schemas don't seem to ever record on which wiki the event occurred, so I'm assuming that is done for all events (at some higher level)?

Hmm, I think we should look into this some more. @nettrom_WMF do you know where/how the wiki gets recorded? Are the database tables segregated by wiki?

Looks like the EventCapsule is wrapped around the events, and that's how fields like wiki, timestamp, etc. are added. I don't know if tables are ever segregated by wiki; all the logged data I've worked with has always been in a single table, with the wiki as a column for querying.
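The wrapping idea can be sketched as follows: the client sends only the schema-specific event, and a wrapper adds metadata such as the wiki. This assumes a capsule with just `schema`, `revision`, `wiki`, `timestamp` and `event` keys; the real EventCapsule defines its own field list, which this sketch does not reproduce. `wrap_in_capsule` and `events_for_wiki` are hypothetical helpers.

```python
import time
from typing import Any, Dict, List


def wrap_in_capsule(event: Dict[str, Any], schema: str, revision: int, wiki: str) -> Dict[str, Any]:
    """Hypothetical capsule: adds metadata around the client-supplied event."""
    return {
        "schema": schema,
        "revision": revision,
        "wiki": wiki,  # added here, so the event itself never needs a wiki field
        "timestamp": int(time.time()),
        "event": event,
    }


def events_for_wiki(rows: List[Dict[str, Any]], wiki: str) -> List[Dict[str, Any]]:
    """Filter one shared table by its wiki column, as described above."""
    return [row["event"] for row in rows if row["wiki"] == wiki]


# Schema revision taken from the oldid linked in this task.
rows = [
    wrap_in_capsule({"action": "launch"}, "TemplateWizard", 18374327, "enwiki"),
    wrap_in_capsule({"action": "save-page"}, "TemplateWizard", 18374327, "dewiki"),
]
print(events_for_wiki(rows, "enwiki"))  # [{'action': 'launch'}]
```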

So I think the only outstanding question is about revision_id, and whether we get the information we want when

  1. the logged revision ID may not be the actual parent (e.g. when there's an edit conflict), and
  2. after inserting a template into a new page, no revision ID will be logged on save.
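A rough revert estimate that tolerates both caveats might look like the sketch below. It assumes two hypothetical inputs not defined in the schema: `child_of`, mapping a logged revision ID to the next revision in that page's history, and `reverted`, a set of revision IDs already identified as reverted by some separate method.

```python
from typing import Dict, Iterable, Optional, Set


def estimate_revert_rate(
    base_rev_ids: Iterable[Optional[int]],
    child_of: Dict[int, int],
    reverted: Set[int],
) -> Optional[float]:
    """Approximate share of TemplateWizard saves that were later reverted.

    base_rev_ids: revision IDs logged on save; None for new pages.
    child_of: logged revision -> next revision in page history (hypothetical lookup).
    reverted: revision IDs known to have been reverted (hypothetical input).
    """
    # Caveat 2: new pages log no revision ID, so they are left out entirely.
    usable = [rid for rid in base_rev_ids if rid is not None]
    if not usable:
        return None
    # Caveat 1: the next revision may be someone else's intervening edit,
    # so this count is only an approximation.
    hits = sum(1 for rid in usable if child_of.get(rid) in reverted)
    return hits / len(usable)


rate = estimate_revert_rate([100, None, 200, 300], {100: 101, 200: 201, 300: 301}, {201})
print(rate)  # 0.3333333333333333
```

Returning `None` when no usable IDs exist keeps "no data" distinct from "no reverts".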

It sounds like that's fine, in which case we're good to go with adding the logging code to the extension.

@Samwilson I think that's fine. The revision ID may not be the parent 100% of the time. That's alright - we are only looking for rough numbers.

@Mooeypoo @aezell I'll let you sign off on this.

I was worried about the use of enums, but as I look at other schemas, it seems to be standard practice in this context.

👍

Samwilson moved this task from In Development to Q1 2018-19 on the Community-Tech-Sprint board.

https://meta.wikimedia.org/w/index.php?title=Schema:TemplateWizard&oldid=18374327 is the schema, subject, of course, to revision as we implement logging and learn more. I've started work on T200970: Add logging to gauge TemplateWizard usage.