
Create a method to derive the number of 'Participants' and 'New Editors'
Closed, Resolved · Public · 5 Estimated Story Points

Description

Implementation description

Create a way to fetch the number of participants and new editors per event to be displayed in the reports.

  • The query should fetch all unique users that edited or created pages that are related to the event (based on the Page IDs stored for that event)
  • The query should return two numerical counts:
    • "Participants" represent the total count of unique users who made any revision within the timeframe (creating a new page or editing an existing page from the page IDs, or any of the other types of edits listed below under "The contributions we're tracking")
    • "New Editors" are a subset of "Participants", and are defined as a count of new accounts created, including users who registered up to 14 days before the event (on any wiki)
  • Timeframe for the calculation: the counts should consider participants and new editors from the start of the event to the last time we updated the Title IDs.
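A minimal Python sketch of the two counts described above, under stated assumptions: the `Revision` record, the `registrations` mapping (username to earliest registration on any wiki), and the function name `participant_counts` are hypothetical and do not come from the Event Metrics code. It only shows the set logic: unique users with a revision on an event page inside the timeframe, then the subset registered no earlier than 14 days before the event start (so accounts created during the event also count).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set, Tuple

# Window from the "New Editors" definition: registered up to 14 days before the event.
NEW_EDITOR_WINDOW = timedelta(days=14)


@dataclass(frozen=True)
class Revision:
    """Hypothetical minimal revision record; real rows would come from the wiki databases."""
    user: str
    page_id: int
    timestamp: datetime


def participant_counts(
    revisions: Iterable[Revision],
    event_page_ids: Set[int],
    registrations: Dict[str, datetime],
    start: datetime,
    last_updated: datetime,
) -> Tuple[int, int]:
    """Return (participants, new_editors) for one event.

    registrations maps a username to its earliest registration on any wiki (assumed input).
    """
    # Participants: unique users with a revision on an event page inside the timeframe.
    participants = {
        rev.user
        for rev in revisions
        if rev.page_id in event_page_ids and start <= rev.timestamp <= last_updated
    }
    # New Editors: the subset registered no earlier than 14 days before the event start.
    cutoff = start - NEW_EDITOR_WINDOW
    new_editors = {
        user for user in participants
        if user in registrations and registrations[user] >= cutoff
    }
    return len(participants), len(new_editors)
```

Because New Editors is computed only over the Participants set, the number of registration lookups is bounded by the participant count, which may matter for the resource concern raised under "Calculate New Editors if feasible".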

Deeper dive

Organizers want to know how many people participated in an event, and how many of those people are recently registered. In the past, Grant Metrics required a participant list; in such cases, the number of participants is more or less provided, and all we have to do is figure out which of the participants are New Editors. But Event Metrics will not require a participant list, so the total number of participants must be derived. This will be a big help for online events, for example, which don't typically require or even request signup.

Participant metrics defined

As used in the "Event summary" reports (T205561 and T206692); see T205561 for definitions:

  • "Participants"
  • "New editors"

Note on fixed vs. continuing metrics

  • Both the "Participants" and "New Editors" metrics remain fixed once the event period ends.

Note on presumed method

  • The Participants metric will presumably be achieved by applying the active event filters (time period, wikis, category, worklist), assessing the various contributions we track (see below), and then counting the unique (non-duplicate) actors who made those contributions. (If there is an easier way to get the same number, that's even better.)
    • The contributions we're tracking include Pages created, Pages improved, Edits, Uploaded files, Wikidata items created, and Wikidata items improved—all of which are also defined in T205561 and the various subtasks referenced there.
  • Calculate New Editors if feasible: Once we have the Participants number, we'll then determine the subset of those users who fit the "New Editors" definition (as per T205561). There is a concern that this may prove too resource-intensive and slow results too much. If so, it will be fine to provide this metric only for events with Participants lists.
    • If we end up dropping the New Editors figure in cases where there is no Participants list, then still present the metric (label) onscreen and show the relevant column in downloadable reports, but supply the value as "n/a" (not available).
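The presumed method above (apply the active filters, keep only tracked contribution types, count unique actors) could look roughly like the Python sketch below. Everything here is an assumption for illustration: the `Contribution` record, the kind labels in `TRACKED_KINDS`, and using a set of page titles (`pages`) to stand in for the category/worklist filter. `new_editors_cell` shows the "n/a" fallback for the case where New Editors is not computed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional, Set

# Hypothetical labels for the tracked contribution types (defined in T205561).
TRACKED_KINDS = frozenset({
    "page_created", "page_improved", "edit", "file_uploaded",
    "wikidata_item_created", "wikidata_item_improved",
})


@dataclass(frozen=True)
class Contribution:
    actor: str
    kind: str
    wiki: str
    page: str
    timestamp: datetime


def count_participants(
    contributions: Iterable[Contribution],
    wikis: Set[str],
    start: datetime,
    end: datetime,
    pages: Optional[Set[str]] = None,
) -> int:
    """Count unique actors whose tracked contributions pass every active filter.

    `pages` stands in for the category/worklist filter; None means no such filter.
    """
    actors = {
        c.actor
        for c in contributions
        if c.kind in TRACKED_KINDS
        and c.wiki in wikis
        and start <= c.timestamp <= end
        and (pages is None or c.page in pages)
    }
    return len(actors)


def new_editors_cell(count: Optional[int]) -> str:
    """Keep the New Editors label/column, but show "n/a" when it was not computed."""
    return "n/a" if count is None else str(count)
```

Counting a set of actors (rather than summing per-type counts) is what keeps someone who both uploaded a file and edited a page from being counted twice.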


Event Timeline

jmatazzoni created this task.

@Mooeypoo, here's the ticket we said I should write. Is this good for estimation? If not, please fix what needs fixing.

jmatazzoni renamed this task from Create a method to fetch number of 'Participants' and 'New Editors' to Create a method to derive the number of 'Participants' and 'New Editors'. Nov 1 2018, 9:30 PM
jmatazzoni mentioned this in Event Metrics.

@MusikAnimal, please see the note in the Description about "Calculate New Editors if feasible." This is from a note I have from you about how this won't be possible. But I wasn't sure whether you were just assuming we wouldn't show participants at all if we didn't have a list. As long as we plan to calculate Participants, is there a special reason that New Editors is a bridge too far?

As long as we have participants, yes we can determine if they are a new editor.

Until T206783 is resolved I don't think we can do anything without having either a list of participants or a list of pages (work list).

In T208546#4714651, @MusikAnimal wrote:

As long as we have participants, yes we can determine if they are a new editor.

Until T206783 is resolved I don't think we can do anything without having either a list of participants or a list of pages (work list).

@Mooeypoo, do you want to put T206783 on the list for engineering team to discuss? Also, I see that that ticket is actually a request for help from Analytics, so maybe you want/need to reach out to Nuria or someone for help?

jmatazzoni set the point value for this task to 5. Nov 7 2018, 1:00 AM

@Mooeypoo @MusikAnimal, This ticket creates a method for metrics presented in the Event Summary reports. The Phase II report "Pages Improved" includes a metric for "Editors," which seems pretty similar to these. In the ticket for Pages Improved, this metric is defined as follows:

Editors: Lists usernames of everyone who made edits to the page, with the qualification that the edits and editors must meet the filtering criteria (e.g., if a Participants list is supplied, Editors must be on that list; edits must be in the event time period, etc.).
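For comparison with the Participants count, the per-page "Editors" metric quoted above might be sketched as follows. The `(page_title, username, timestamp)` tuple shape and the name `editors_by_page` are hypothetical; the sketch only applies the two filters the definition mentions (event time period, and the Participants list when one is supplied). The main difference from Participants is that the unique users are grouped per page rather than counted once for the whole event.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical input shape: (page_title, username, timestamp) for each edit.
Edit = Tuple[str, str, datetime]


def editors_by_page(
    edits: Iterable[Edit],
    start: datetime,
    end: datetime,
    participants: Optional[Set[str]] = None,
) -> Dict[str, List[str]]:
    """Map each page to the sorted usernames whose edits pass the filters."""
    editors: Dict[str, Set[str]] = defaultdict(set)
    for page, user, ts in edits:
        if not start <= ts <= end:
            continue  # outside the event time period
        if participants is not None and user not in participants:
            continue  # a Participants list was supplied and the user is not on it
        editors[page].add(user)
    return {page: sorted(users) for page, users in editors.items()}
```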

So, do I need to make a separate Method ticket for Editors, or is it similar enough that I can just add it to this ticket?

The current New Editors definition says 15 days. Do we want to change it to 14, or did we just forget?

Actually, I'm not sure, but isn't the "official" definition different between wikis? It might be that we are defining something specific for Event Metrics' purposes rather than going with the (inconsistent, if I remember right?) definitions of the different wikis?

No such definition exists, we have to set our own standard.

In T208546#4883001, @MaxSem wrote:

No such definition exists, we have to set our own standard.

The definition included here, of 14 days, is the one that already exists from Grant Metrics, as per the Help page. I don't know where it came from, but I suggest we just move forward with that one and change it later if there is reason to. Pinging @Mooeypoo, fyi

@MaxSem, I just realized that this ticket is also relevant to the "7-day retention" metric, which measures retention of New Editors. This is an existing metric in Grant Metrics; the work of the present ticket is to make it work when there's no Participant list. So being able to identify New Editors without a Participant list is a prerequisite for that metric.

Below is the definition. Does it change the present ticket at all? Should I add it to the Description above?

  • 7-day retention: The % of New Editors (see definition above) who make at least one edit, in any Wikimedia project (in any namespace), between 7 days after the event and the time that the report is run.
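A rough Python sketch of the retention percentage as defined above, assuming a hypothetical `edit_times` mapping from username to edit timestamps across all Wikimedia projects and namespaces. Returning `None` when there are no New Editors (so a report could display "n/a") is a design choice of this sketch, not something stated in the definition.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Optional


def seven_day_retention(
    new_editors: Iterable[str],
    edit_times: Dict[str, List[datetime]],
    event_end: datetime,
    report_time: datetime,
) -> Optional[float]:
    """Percent of New Editors with at least one edit between event_end + 7 days and report_time.

    edit_times maps a username to edit timestamps across all Wikimedia projects and namespaces.
    Returns None when there are no New Editors, so a report could show "n/a".
    """
    users = set(new_editors)
    if not users:
        return None
    window_start = event_end + timedelta(days=7)
    retained = sum(
        1
        for user in users
        if any(window_start <= t <= report_time for t in edit_times.get(user, []))
    )
    return 100.0 * retained / len(users)
```

For example, with the event ending on 30 November and the report run on 31 December, a New Editor whose only later edit is on 3 December would not count as retained, while one editing on 8 December would.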

I've commented on the GitHub PR.

For a few events, I compared the figures of Participants and New Editors from the Event Summary page to what I could derive from the User Creation Log (https://en.wikipedia.org/wiki/Special:Log?type=newusers). I could not see any inconsistencies, although the user creation log did not always show when the user was first created (e.g. https://en.wikipedia.org/wiki/Special:Log?type=newusers&user=Dilettante%20Army&page=&wpdate=&tagfilter=).

I am assuming we want to count someone as a new editor if their account is created anytime until the end of the event, which is what experimenting with changing the event dates seems to show. I don't know if this would ever happen, as the user needs to exist before you can add them as a participant. Perhaps some participants might join an event late and have to be added retrospectively...

I checked the wikitext of the event summary, but this showed "Participants" twice and no "New Editors" (as noted also in T206692#4963802). I guess this is a bug which will be dealt with separately or in the linked task.

In T208546#4965014, @dom_walden wrote:

I am assuming we want to count someone as a new editor if their account is created anytime until the end of the event, which is what experimenting with changing the event dates seems to show. I don't know if this would ever happen, as the user needs to exist before you can add them as a participant. Perhaps some participants might join an event late and have to be added retrospectively...

New Editors is defined (in T205561) as "a count of new accounts created, including users who registered up to 14 days before the event (on any wiki)." Yes, that would include people who registered during the event period.

The only way for a retrospective addition would be if the organizer fills in a Participants list. In that case, then yes, the added person would be counted I think, since I'm pretty sure that if a Participants list is used, we don't use this method but simply count the number of Participants in the filter list—that is correct, right @MaxSem?

I checked the wikitext of the event summary, but this showed "Participants" twice and no "New Editors" (as noted also in T206692#4963802). I guess this is a bug which will be dealt with separately or in the linked task.

The duplication in the Event Summary wikitext is gone, but as noted in that ticket, no New Editors figure is called for in the wikitext. So if there is no other way to test, I suppose we will have to wait until T205561 is complete to see this figure in the CSV report? @dom_walden, do you think we should close this and reopen it if it looks wrong there, or leave this open?

I'm Resolving this, but created the QA ticket T217100, to remind us to QA this for the CSV version of Event Summary.