Define new campaign editor - for use in production to query for this metric
Closed, Resolved, Public

Description

We would like to count the number of new editors that come in through campaigns.
This has been done in a few ways, including:

Method 1: registers between the event start and the event end date --- FROM campaign_events from wikishared db within x1 (this is the method used by Event Metrics)
Method 2: registers between the event start and the event end date, for global accounts for an event; used by P&E Dashboard.
Method 3: registers from the event page --- FROM event_sanitized.serversideaccountcreation, SSAC; see also T328615

Questions to consider:

  • What about those that register after an event registration has been enabled but before the campaign starting?
  • What about those that are brought in through the campaign but register after an event has ended?
  • Do we want a buffer period before or after an event starting? or before or after an event registration tooling is enabled?
  • If we opt to incorporate various scenarios, how do we reconcile the different methods, if at all?
  • Does an existing editor's new account on project y constitute a new campaign editor account if y account joins an event?

How does our definition mirror, interact with, or intersect with Product Analytics' New Active Editors definition? What about growth definitions of new editors?

Further detail questions:
How to attribute - what about multiple events with overlapping dates...and a new editor without an eventReturnTo field attributed to an event (in the SSAC table) that joins more than one event? > attribute this editor to the first campaign that they sign up for.
And how to attribute that on the return, to retention? attribute to the multiple events?

Are there more questions to consider? Methods to add which aren't listed above?

Subtasks

  • Dashboard calculation method documented
  • Event metrics calculation method documented
  • feedback from campaigns
  • GDI feedback
  • feedback from Grants
  • discuss with research roundtable
  • feedback from product-campaigns ambassadors +
  • define and name 1-2 metrics (document use case and intended audience)

Event Timeline

Iflorez triaged this task as Medium priority. Feb 10 2023, 7:36 PM

Considering the following:
New editor when:

  • this is the first logging of their username (a global account) and it's not auto logged
  • returnTo field = campaign page
  • account created within 24 hours of their signing up for an event (for cases where the returnTo = Wikipédia:Accueil principal, None, Special:OAuth/authorize, Wikipedia:Why create an account?, etc.)

So one thing to think about is that most of the new accounts will be created before an event, potentially immediately before registering (won't be the same workflow).

I don't think it's very important to capture newcomers registered after an event -- if they didn't register to the event, we wouldn't have a signal that they were participating. Accounts need to exist before or during the event for other tracking.

I would count global newcomers rather than by specific project -- it's probably worth tracking if community members create accounts on new wikis during an event though: this is a signal that a campaign is expanding participation to other projects (i.e. capacity building) -- but this should be a separate metric signaling increased diversity of skills of editors.

As for the retention problem across multiple events -- it should be the first event they register to -- and then we should count the second event that they register to as a short-term retention.
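
A minimal sketch of this attribution rule, assuming we already have each newcomer's event sign-ups as (event ID, sign-up time) pairs. The function name and input shape are hypothetical and do not reflect the actual campaign_events schema.

```python
from datetime import datetime

def attribute_events(signups):
    """Split one newcomer's event sign-ups into the attributed (first)
    event and later events counted as short-term retention.

    signups: list of (event_id, signup_time) tuples (hypothetical shape).
    """
    ordered = sorted(signups, key=lambda s: s[1])  # earliest sign-up first
    if not ordered:
        return None, []
    first_event = ordered[0][0]
    # Every later sign-up to a different event is a retention signal.
    retention_events = [e for e, _ in ordered[1:] if e != first_event]
    return first_event, retention_events

first, later = attribute_events([
    ("B", datetime(2023, 3, 5)),
    ("A", datetime(2023, 3, 1)),
    ("C", datetime(2023, 4, 2)),
])
# first == "A"; later == ["B", "C"]
```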

Iflorez updated the task description.

Questions to consider:

What about those that register after an event registration has been enabled but before the campaign starting? Yes I would include these
What about those that are brought in through the campaign but register after an event has ended? I think that if they registered for an event and created an account within a time frame after it (e.g. 1 month), they should be considered as someone brought in by a campaign even if they did not actively edit in the campaign. This differs a bit from Alex's view, but I would go with what the campaigns team decides. The problem, as Alex notes, is how to track this. We would need to ensure there is an event registration and an account registration and that we can track the two.
Do we want a buffer period before or after an event starting? or before or after an event registration tooling is enabled? I think it depends how long the event registration tool is activated before the event. It may be difficult to set a time frame of before. The after could be the month I mentioned before.
If we opt to incorporate various scenarios, how do we reconcile the different methods, if at all?
Does an existing editor's new account on project y constitute a new campaign editor account if y account joins an event? I don't understand this question. But agree I would focus on global new editors across projects, as Alex says.
How does our definition mirror or interact or intersect with the Product Analytic's New Active Editors definition? What about growth definitions on new editors? I am not sure. Is this the definition of active editor you are referring to? "The number of registered users who made at least 5 content edits across all projects in the given month". I think here we are focusing first on knowing if they are new editors. Then we can focus on retention.
I think that we should think of two different definitions of retention: 1. Becoming an active editor (5 edits a month). 2. Continuing to edit after campaigns (someone who comes through a campaign and has edited at least once X amount of time after the event, e.g. 1 month or 6 months).
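
The two retention definitions could be sketched as follows. This is only an illustration: the function names are hypothetical, the caller is assumed to pass in dates of content edits, and "X amount of time after the event" is read here as "at least one edit on or after event end plus X days", which is one possible interpretation.

```python
from datetime import date, timedelta

def is_active_editor(edit_dates, year, month, min_edits=5):
    """Definition 1: at least `min_edits` edits in the given calendar month."""
    in_month = [d for d in edit_dates if d.year == year and d.month == month]
    return len(in_month) >= min_edits

def edited_after_event(edit_dates, event_end, days_after=30):
    """Definition 2: at least one edit `days_after` or more days after the event ended."""
    cutoff = event_end + timedelta(days=days_after)
    return any(d >= cutoff for d in edit_dates)

edits = [date(2023, 5, day) for day in range(1, 6)]  # five edits in May 2023
active_in_may = is_active_editor(edits, 2023, 5)                      # True
retained_1_month = edited_after_event(edits, date(2023, 3, 15), 30)   # True
retained_6_months = edited_after_event(edits, date(2023, 3, 15), 180) # False
```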

Further detail questions:
How to attribute - what about multiple events with overlapping dates...and a new editor without an eventReturnTo field attributed to an event (in the SSAC table) that joins more than one event? > attribute this editor to the first campaign that they sign up for. I think it is an interesting data point per se. We can attribute it to the first event but also see how many new editors are registering for multiple events, as this shows more engagement (and probably higher chances of retention).
And how to attribute that on the return, to retention? attribute to the multiple events?

Are there more questions to consider? Methods to add which aren't listed above? Happy to review further if useful. Maybe a conversation with Masana, Alex and me so we can align on what makes sense. I think there is still a lot to be built and aligned.

  • What about those that register after an event registration has been enabled but before the campaign starting? My general POV here is similar to Jessica's (would prefer to include), but would defer to Alex's subject matter expertise here.
  • What about those that are brought in through the campaign but register after an event has ended? I agree with Jessica in that I'd prefer to be more inclusive in who we code as "recruited" through a campaign, although as everyone has noted, tracking seems like a challenge.
  • Do we want a buffer period before or after an event starting? or before or after an event registration tooling is enabled? I'd defer to Alex and the Campaigns team here as I don't know that I'd be able to suggest a principled rule that would cover all or most events.
  • If we opt to incorporate various scenarios, how do we reconcile the different methods, if at all? I know this is an unhelpful non-answer, but it's hard to say without knowing the methods adopted.
  • Does an existing editor's new account on project y constitute a new campaign editor account if y account joins an event? My impulse is to say yes.
  • How to attribute - what about multiple events with overlapping dates...and a new editor without an eventReturnTo field attributed to an event (in the SSAC table) that joins more than one event? > attribute this editor to the first campaign that they sign up for. Since we have no way of definitively attributing editors to events here, I think this is a good option as long as we apply it consistently and transparently.
  • And how to attribute that on the return, to retention? attribute to the multiple events? This one is tricky, but I think we should just attribute to the first event.

Note: returnTo "Indicates the wiki page the user was on when initiating Create account."

Considering the following three metrics:

  1. New campaign global editor when:
    • this is the first logging of their username (a global account) and it's not auto logged AND
    • returnTo field = campaign page and the campaign is happening or going to happen in the next y days OR
    • account created within x days of their signing up for an event (for cases where the returnTo = Wikipédia:Accueil principal, None, Special:OAuth/authorize, Wikipedia:Why create an account?, etc.). Here x could be as little as 24 hours or as much as 30 days; y could be 30 days, 60 days, or whatever period prior to an event we allow registration.
  2. New global editor by way of a closed campaign event when:
    • this is the first logging of their username (a global account) and it's not auto logged AND
    • returnTo field = campaign page and the event has closed (note: returnTo data is only available for up to 90 days)
  3. New campaign wiki edition editor:
    • same as that of New campaign global editor except this is not the first logging of their username; this is only the first logging of a new wiki-specific editor ID.
    • Use case: "it's probably worth tracking if community members create accounts on new wikis during an event though: this is a signal that a campaign is expanding participation to other projects (i.e. capacity building) -- but this should be a separate metric signaling increased diversity of skills of editors."

Also, Retention tracking: for T300414 and others: Note this comment: "As for the retention problem across multiple events -- it should be the first event they register to -- and then we should count the second event that they register to as a short-term retention." per Alex's feedback

@JStephenson @Sadads @YLiou_WMF

Can you take a look at the last comment and provide your feedback?
Given what's here so far, it looks like I can close this ticket out within the next 7 days. If you disagree, please share more.

They look good to me, I don't have any significant suggestions for revisions.

  • What about those that register after an event registration has been enabled but before the campaign starting?

Yes, some event organizers may ask participants to create their account before registering for the event.
It would be a good idea to treat an account as new if, for example, it was created less than a week ago and has made fewer than 10 edits.

  • What about those that are brought in through the campaign but register after an event has ended?

I think event registration should be blocked or closed at that point; there is no need to count these late participants, since there should be no registration.

  • Do we want a buffer period before or after an event starting? or before or after an event registration tooling is enabled?
  • If we opt to incorporate various scenarios, how do we reconcile the different methods, if at all? It's possible and it makes sense.
  • Does an existing editor's new account on project y constitute a new campaign editor account if y account joins an event? It is considered a new user, so: yes, count it.

Notes & Details

  • We will focus on New campaign global user
  • see Research: Standard Metrics and Product Analytics: Data Glossary
  • Note: returnTo "Indicates the wiki page the user was on when initiating Create account."
  • Also, Retention tracking: for T300414 and others: Note this comment: "As for the retention problem across multiple events -- it should be the first event they register to -- and then we should count the second event that they register to as a short-term retention." per Alex's feedback

New user

  1. New campaign global user: a newly registered campaign user is a previously unregistered user creating a username for the first time on a Wikimedia project through campaign event pages or the registration extension.
    • this is the first logging of their username (a global account) and it's not auto logged AND
    • returnTo field = campaign page and the campaign is happening or going to happen in the next y days OR
    • account created within y days of their signing up for an event (for cases where the returnTo = Wikipédia:Accueil principal, None, Special:OAuth/authorize, Wikipedia:Why create an account?, etc.);
      • y = the max number of days prior to event start during which event registration can take place; currently set to 30.
      • Some y discussions can be found in T328032
  2. New user by way of a closed campaign event when:
    • this is the first logging of their username (a global account) and it's not auto logged AND
    • returnTo field = campaign page and the event has closed (note: returnTo data is only available for up to 90 days)
  3. New campaign wiki edition user:
    • same as that of New campaign global user except this is not the first logging of their username; this is only the first logging of a new wiki-specific editor ID.
    • Use case: "it's probably worth tracking if community members create accounts on new wikis during an event though: this is a signal that a campaign is expanding participation to other projects (i.e. capacity building) -- but this should be a separate metric signaling increased diversity of skills of editors."

New editor

  1. New campaign global editor: a new campaign global editor is a newly registered campaign global user who completes n edits to pages in any namespace of a Wikimedia project within t days of registration ( T); n=1, t=1.
    • this is the first logging of their username (a global account) and it's not auto logged AND
    • returnTo field = campaign page and the campaign is happening or going to happen in the next y days OR
    • account created within x days of their signing up for an event (for cases where the returnTo = Wikipédia:Accueil principal, None, Special:OAuth/authorize, Wikipedia:Why create an account?, etc.). Here x could be as little as 24 hours or as much as 30 days; y could be 30 days, 60 days, or whatever period prior to an event we allow registration.
  2. New editor by way of a closed campaign event when:
    • this is the first logging of their username (a global account) and it's not auto logged AND
    • returnTo field = campaign page and the event has closed (note: returnTo data is only available for up to 90 days)
  3. New campaign wiki edition editor:
    • same as that of New campaign global editor except this is not the first logging of their username; this is only the first logging of a new wiki-specific editor ID.
    • Use case: "it's probably worth tracking if community members create accounts on new wikis during an event though: this is a signal that a campaign is expanding participation to other projects (i.e. capacity building) -- but this should be a separate metric signaling increased diversity of skills of editors."
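
The "n edits within t days" part of the new editor definition could be checked roughly like this. It is a sketch under assumptions: the function name is hypothetical, the user is assumed to already qualify as a new campaign global user, and the window is treated as inclusive on both ends.

```python
from datetime import datetime, timedelta

def is_new_campaign_editor(registered_at, edit_times, n=1, t=1):
    """True if the user made at least n edits within t days of registering.

    Assumes the caller has already established that this user is a
    new campaign global user; only the edit threshold is checked here.
    """
    window_end = registered_at + timedelta(days=t)
    edits_in_window = [e for e in edit_times if registered_at <= e <= window_end]
    return len(edits_in_window) >= n

registered = datetime(2023, 5, 1, 10, 0)
same_day_edit = [datetime(2023, 5, 1, 15, 30)]
late_edit = [datetime(2023, 5, 4, 9, 0)]
# With n=1, t=1: the same-day edit qualifies, the edit three days later does not.
```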

Thanks Irene for sharing. For New campaign global user: just wondering if there will be a set time frame for this. I see various options.
What does "returnTo data is only available for up to 90 days" mean?
The New campaign wiki edition is a very interesting metric.

Thank you @JStephenson and @Sadads for walking through this with me. I've updated https://phabricator.wikimedia.org/T329382#8806349 with our discussion highlights.

  1. New Campaign User

a) this is the first logging of their username (a global account)
b) it is not auto logged
c) i) returnTo field = campaign page and the campaign is happening or going to happen in the next 30 days
OR
c) ii) account created within 30 days of their signing up for an event
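
Criteria a) through c) can be expressed as a small predicate. This is a sketch, not the production logic: all parameter names are hypothetical, and "within 30 days of their signing up" is assumed to mean at most 30 days in either direction.

```python
from datetime import date, timedelta
from typing import Optional

WINDOW = timedelta(days=30)  # the 30-day window from criteria c) i) and c) ii)

def is_new_campaign_user(
    is_first_global_account: bool,          # a)
    is_auto_created: bool,                  # b)
    account_created: date,
    event_signup: Optional[date] = None,    # c) ii)
    return_to_is_event_page: bool = False,  # c) i)
    event_start: Optional[date] = None,
    event_end: Optional[date] = None,
) -> bool:
    if not is_first_global_account or is_auto_created:
        return False
    # c) i): created from an event page while the event is running
    # or starts within the next 30 days.
    via_event_page = (
        return_to_is_event_page
        and event_start is not None
        and event_end is not None
        and event_start - WINDOW <= account_created <= event_end
    )
    # c) ii): account creation and event sign-up at most 30 days apart
    # (either direction; an assumption).
    near_signup = (
        event_signup is not None
        and abs(event_signup - account_created) <= WINDOW
    )
    return via_event_page or near_signup
```

For example, an account created on 2023-05-01 by a user who signed up for an event on 2023-05-10 passes via c) ii), while an auto-created account never passes.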

# Assumes `spark` is the wmfdata spark module and query_vars["usernames"]
# is a SQL tuple literal such as "('UserA', 'UserB')".
acp_ssac_TEST_CODE = spark.run("""
SELECT event.userName AS username,
    event.returnTo AS acp,
    CONCAT(CAST(year AS string), '-', LPAD(CAST(month AS string), 2, '0'), '-', LPAD(CAST(day AS string), 2, '0')) AS `date`
FROM event.serversideaccountcreation
WHERE
    event.userName IN {usernames} AND
    event.isselfmade = true AND
    -- keep only rows from the last 60 days, built from the year/month/day partitions
    CONCAT(CAST(year AS string), '-', LPAD(CAST(month AS string), 2, '0'), '-', LPAD(CAST(day AS string), 2, '0')) >= date_sub(current_date(), 60)
    --AND event.isApi = false -- app made
""".format(**query_vars))

adjust current_date() as appropriate

When I queried, earlier this month, for

event.returnTo AS acp

15 out of 77 accounts created in the prior 90 days had been created on an event page. All other pages either had no acp (None) or had a main or general page as their account creation pages, such as:

'Main page'
'Wikipédia:Accueil principal'
'Commons:Username policy', 
"Wikipédia:Nom d'utilisateur",
'Portail:Accueil',
'Aide:Compte utilisateur',
'Spécial:Accueil de l’espace personnel',

Given these results, I will stick to dates and will not rely on the returnTo field. returnTo may be helpful in the future as part of T328615, which could help us view trends at large for campaign events and new user account creation counts, but it's less helpful on a per-event basis.
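
For reference, the returnTo tally described above could be reproduced along these lines once the query results are in a Python list. The function name is hypothetical, the "Event:" prefix for event pages is an assumption about how event page titles appear in returnTo, and the sample values are made up for illustration (they are not query output).

```python
from collections import Counter

def tally_return_to(return_to_values, event_prefix="Event:"):
    """Count account creation pages by kind: event page, other page, or none."""
    counts = Counter()
    for value in return_to_values:
        if value is None:
            counts["none"] += 1
        elif value.startswith(event_prefix):  # assumed event page title prefix
            counts["event_page"] += 1
        else:
            counts["other_page"] += 1
    return counts

# Illustrative values only.
sample = ["Event:Example", None, "Main page", "Wikipédia:Accueil principal"]
counts = tally_return_to(sample)
# counts["event_page"] == 1, counts["none"] == 1, counts["other_page"] == 2
```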