
[4 hours] Spike: How hard would it be to track number of edits saved that used TemplateWizard?
Closed, Resolved · Public

Description

TemplateWizard use happens before an article is saved, so it's a bit tricky figuring out how many TemplateWizard edits actually get successfully saved. For example, an edit could have an edit conflict and be abandoned. Let's figure out how difficult that would be and if it's worth doing (as opposed to just counting how many times someone clicks the save button after inserting a template, which might be the better, but less accurate, option).

Event Timeline

kaldari triaged this task as Medium priority.

@Samwilson @Mooeypoo @MusikAnimal @MaxSem @aezell This ticket will be estimated in our Tuesday meeting. Let's discuss any potential concerns/missing information on the ticket ahead of that meeting.

Thanks for the ping. A couple of thoughts and questions off the top of my head:

What numbers we get

  • If the template is saved and the edit is saved but later deleted (or reverted), do we care? Are we just counting the original save?
  • We'll need to track not only the insertion of the template into the editor but also the saving of the edit with the template still in it. Seems like we'd want both numbers.

How we get them after the fact

  • We can scan wikitext in the database for template tags, right? Could we then cross-reference that with the editor used to make those edits? That is, can we see something like, "Here's an edit that has a template and that edit was made with the wikitext editor, therefore, the template was inserted with template wizard." The problem is if they just type or copy/paste the template tag into the editor. We'd have to have a date filter here to only check edits made after the TemplateWizard was enabled on that wiki.
  • What data exists about revisions that could be helpful here?

How we get them in "real-time"

  • I mentioned elsewhere that, based on my reading, EventLogging is preferable to EventBus for this kind of analytics.
  • Should we modify the TemplateWizard code to somehow set a flag on the edit?
  • Can TemplateWizard set a temporary value in the JS window environment that we could "catch" on the save of the edit which could ping EventLogging?

All good questions, Alex.

> What numbers we get
>
>   • If the template is saved and the edit is saved but later deleted (or reverted), do we care? Are we just counting the original save?

According to the spec in T200970, we would like to know how many edits made using TemplateWizard were later reverted. That will (potentially) be done by storing the parent revision ID of the edit.
This ticket is only about counting whether an edit made using TemplateWizard got saved in the first place.

> We can scan wikitext in the database for template tags, right? Could we then cross-reference that with the editor used to make those edits? That is, can we see something like, "Here's an edit that has a template and that edit was made with the wikitext editor, therefore, the template was inserted with template wizard." The problem is if they just type or copy/paste the template tag into the editor. We'd have to have a date filter here to only check edits made after the TemplateWizard was enabled on that wiki.

@aezell: Even after the TemplateWizard is deployed, most templates are still going to be inserted manually. For example, it will still be easier for me to type {{citation needed}} than to use the TemplateWizard to add it. I expect most TemplateWizard use will be for more complicated templates like Infobox templates.

@kaldari That makes sense. So, that idea won't work. I was dubious about it even as I wrote it.

Niharika renamed this task from "Spike: How hard would it be to track number of edits saved that used TemplateWizard?" to "[4 hours] Spike: How hard would it be to track number of edits saved that used TemplateWizard?". Aug 14 2018, 11:23 PM
Niharika moved this task from Needs Discussion to Up Next (June 3-21) on the Community-Tech board.
Niharika set the point value for this task to 1. Aug 14 2018, 11:49 PM
Niharika removed the point value 1 for this task.

One not-too-complicated way to do it could be to log whenever someone clicks the Insert button (which we want to do anyway) and, at that point, remember that they have done so. Later, when they click Publish, we send another log event, e.g. page-saved-after-template-insertion or whatever. (And of course we never send the 2nd event if they've not inserted a template.) My understanding is that we can't link the two events, so the 2nd event would re-send whatever info we want to know.
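A minimal sketch of that two-event approach, for illustration only. The class name, the event field names, and the injected `send` callback are all hypothetical; `send` stands in for whatever logging call ends up being used, and only the event name page-saved-after-template-insertion comes from the comment above.

```typescript
// Hypothetical event shapes; the field names are illustrative, not an existing schema.
type WizardEvent =
  | { action: "template-inserted"; template: string }
  | { action: "page-saved-after-template-insertion"; templates: string[] };

// Stand-in for the real logging call (an assumption, injected so it can be swapped).
type Sender = (event: WizardEvent) => void;

class TemplateInsertTracker {
  private readonly send: Sender;
  // Names of the templates inserted during this editing session.
  private inserted: string[] = [];

  constructor(send: Sender) {
    this.send = send;
  }

  // Call from the Insert button handler: log the insertion and remember it.
  onInsert(template: string): void {
    this.inserted.push(template);
    this.send({ action: "template-inserted", template });
  }

  // Call from the Publish button handler. Sends nothing if no template was inserted.
  onPublish(): void {
    if (this.inserted.length === 0) {
      return;
    }
    // The two events cannot be linked afterwards, so the template names are re-sent.
    this.send({
      action: "page-saved-after-template-insertion",
      templates: [...this.inserted],
    });
  }
}
```

Keeping the remembered state inside one small object means the Publish handler only needs a single call, and an editing session without any insertion produces no events at all.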

However, sending the 2nd event might fall under the same topic as Logging clicking on links and so we're advised not to do it (because the browser will navigate away from the page before the event request is sent, unless we purposefully delay it). Or is this out of date? I'm thinking it might be, because we are using sendBeacon (although IE and iOS Safari don't support that). I'll find out...
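To make the sendBeacon concern concrete, here is a hedged sketch of sending with a fallback. `navigator.sendBeacon(url, data)` is a real browser API that returns a boolean; everything else (the `sendOnUnload` helper, the `BeaconCapable` interface, and the `fallback` callback) is hypothetical, and this is not a description of how the existing logging code works.

```typescript
// Minimal view of the browser API used here; sendBeacon may be missing in some browsers.
interface BeaconCapable {
  sendBeacon?: (url: string, data: string) => boolean;
}

// Sends the payload and returns which transport was used.
function sendOnUnload(
  nav: BeaconCapable,
  url: string,
  payload: object,
  fallback: (url: string, body: string) => void
): "beacon" | "fallback" {
  const body = JSON.stringify(payload);
  // sendBeacon queues the request so that it can complete after the page navigates away.
  if (typeof nav.sendBeacon === "function" && nav.sendBeacon(url, body)) {
    return "beacon";
  }
  // Hypothetical fallback for browsers without sendBeacon (or when queuing fails),
  // e.g. a request that the caller deliberately waits for before navigating.
  fallback(url, body);
  return "fallback";
}
```

In a browser, the first argument would be the global `navigator` object; passing it in as a parameter only makes the fallback path easy to exercise.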

Also, I don't think the complexities around tracking the actual template text insertion, or alteration of the template text in the edit form, or later reversion of the inserted template, are worth worrying about. My understanding is that we're just trying to get a general idea of usage, and maybe if certain patterns emerge we can add more elaborate logging later to find out more about them. This could mean that we get false data if e.g. someone inserts a template and then removes it from the wikitext before saving.

@Samwilson I agree that having 100% perfectly correct numbers is wasted effort.

As to your question about the page navigating away, surely we aren't the only people wanting to send an event when the Publish button is clicked. I'd wager this is already happening and we could just add our event (and its data) to what's already happening.

> Also, I don't think the complexities around tracking the actual template text insertion, or alteration of the template text in the edit form, or later reversion of the inserted template, are worth worrying about. My understanding is that we're just trying to get a general idea of usage, and maybe if certain patterns emerge we can add more elaborate logging later to find out more about them. This could mean that we get false data if e.g. someone inserts a template and then removes it from the wikitext before saving.

That works for me. We don't want perfectly accurate numbers, rather a general sense of the project impact.

@aezell yes, I think you're right. And if there is an issue with sending events on submit, it's a bigger issue and not one TemplateWizard needs to fix.

I think the answer to the other part of this is that it's too hard to try to take into account edit conflicts, and that the log event emitted on save will just have to be taken with a pinch of salt.

If we have these log events:

  1. {{tpl}} is inserted on Page by User
  2. Page is saved by User after having templates [{{tpl1}}, {{tpl2}}, ...] inserted

Then we can count general usage of TemplateWizard (including how often a template is inserted but not followed up with a page-save) but will need to keep in mind that:

  1. {{tpl}} may have been edited or removed after insertion;
  2. Page may never actually have been saved then (i.e. edit conflict and abandoned).

Is that about right?
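As a rough illustration of what could be counted from those two log events, here is a sketch that aggregates them. The event shapes and the `summarize` function are hypothetical. Because the events cannot be linked, the "not saved" figure is only an estimate: it is the number of insertions minus the number of templates reported by save events, and it is clamped at zero because a repeated Publish click would report the same templates twice.

```typescript
// Hypothetical shapes for the two log events listed above.
type LoggedEvent =
  | { action: "template-inserted"; template: string }
  | { action: "page-saved-after-template-insertion"; templates: string[] };

interface UsageSummary {
  insertions: number;         // Insert clicks
  saveAttempts: number;       // Publish clicks that followed at least one insertion
  insertionsNotSaved: number; // estimate: insertions not reported by any save event
}

function summarize(events: LoggedEvent[]): UsageSummary {
  let insertions = 0;
  let saveAttempts = 0;
  let reportedOnSave = 0;
  for (const event of events) {
    if (event.action === "template-inserted") {
      insertions += 1;
    } else {
      saveAttempts += 1;
      reportedOnSave += event.templates.length;
    }
  }
  return {
    insertions,
    saveAttempts,
    // Clamp at zero: duplicate save events can over-report templates.
    insertionsNotSaved: Math.max(0, insertions - reportedOnSave),
  };
}
```

Note that `saveAttempts` counts Publish clicks, not successful saves, so the caveats about edited or removed templates and abandoned edits still apply to every number here.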

We could search for the presence of {{tpl name...}} on save, but there's no way to tell if it's the same text as was inserted so I don't think it's worthwhile.
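For reference, such a presence check could look roughly like the naive sketch below. The `containsTemplate` and `escapeRegExp` helpers are hypothetical, and the check deliberately skips name normalisation (letter case, underscores), redirects, and nowiki or comment sections. It also shows the limitation: a template typed by hand matches exactly the same way as one inserted by TemplateWizard.

```typescript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Naive check: does the wikitext contain a call to the exactly named template?
function containsTemplate(wikitext: string, name: string): boolean {
  const trimmed = name.trim();
  if (trimmed === "") {
    return false;
  }
  // Match "{{", optional whitespace, the name, optional whitespace, then "|" or "}}".
  const pattern = new RegExp(`\\{\\{\\s*${escapeRegExp(trimmed)}\\s*(\\||\\}\\})`);
  return pattern.test(wikitext);
}
```

A match says only that the template is present at save time; it cannot say how it got there or whether its parameters were changed after insertion.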

The outcomes of this investigation have been added to the other tickets.

Niharika moved this task from QA to Q1 2018-19 on the Community-Tech-Sprint board.