Personalized first day: how many accounts are made from which contexts?
Closed, Resolved, Public

Description

This will be a data analysis task for @nettrom_WMF.

In the "Personalized first day" project, we will be asking a set of questions to users right after they create their accounts. This will be potentially disruptive to a user's activities if they were in the middle of editing. In order for us to know how many users we will be disrupting in what way, we would like to know what percentage of accounts are created from which contexts. Specifically:

  • How many accounts are created from the homepage?
  • How many accounts are created from the reading context?
  • How many accounts are created from the editing context?

We would like to know this for Czech, Korean, English, Ukrainian, German, and Arabic Wikipedias. We don't necessarily need to know this over time -- aggregated data from the last few months will probably be just fine.

In conversation with the team, someone mentioned that the Schema:ServerSideAccountCreation should capture this with the returnToQuery parameter.

Giving this a due date of two weeks from task creation.

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Deadline". · Oct 6 2018, 1:03 AM
Restricted Application added subscribers: Base, revi.

Might as well assign this to me, no?

Since @nettrom_WMF and I already talked about when we would need this (and that day occurs in our current sprint), I'm going to put this in the sprint.

Here's the HQL query I wrote to get data for this using the data logged by the ServerSideAccountCreation schema in the Data Lake. The query below is for the Czech Wikipedia. For the others, the reference to the main page ("Hlavní strana" in Czech) will be modified accordingly, as well as the wiki condition in the WHERE clause.

CREATE TABLE nettrom_growth.cs_creation_contexts AS
SELECT CONCAT(year, '-', month) AS reg_month,
  SUM(IF(event.returnTo = 'Hlavní strana'
         AND event.displayMobile = FALSE
         AND (event.returnToQuery IS NULL
              OR event.returnToQuery NOT REGEXP "action=v?edit"), 1, 0))
      AS num_mainpage_desktop,
  SUM(IF(event.returnTo != 'Hlavní strana'
         AND event.displayMobile = FALSE
         AND (event.returnToQuery IS NULL
              OR event.returnToQuery NOT REGEXP "action=v?edit"), 1, 0))
      AS num_reading_desktop,
  SUM(IF((event.returnToQuery REGEXP "action=edit"
          OR event.returnToQuery REGEXP "action=vedit")
         AND event.displayMobile = FALSE, 1, 0))
      AS num_editing_desktop,
  SUM(IF(event.returnTo = 'Hlavní strana'
         AND event.displayMobile = TRUE
         AND (event.returnToQuery IS NULL
              OR event.returnToQuery NOT REGEXP "action=v?edit"), 1, 0))
      AS num_mainpage_mobile,
  SUM(IF(event.returnTo != 'Hlavní strana'
         AND event.displayMobile = TRUE
         AND (event.returnToQuery IS NULL
              OR event.returnToQuery NOT REGEXP "action=v?edit"), 1, 0))
      AS num_reading_mobile,
  SUM(IF((event.returnToQuery REGEXP "action=edit"
          OR event.returnToQuery REGEXP "action=vedit")
         AND event.displayMobile = TRUE, 1, 0))
      AS num_editing_mobile
FROM event.serversideaccountcreation
WHERE year = 2018
AND month >= 3
AND month < 10
AND wiki = 'cswiki'
AND event.isApi = FALSE
AND event.returnTo IS NOT NULL
GROUP BY CONCAT(year, '-', month);

A couple of things to note:

  • This query filters out accounts created through the apps (using the event.isApi = FALSE condition).
  • It requires the account creation to have a return-to page. I decided to do this as I figured it would capture creations done through the normal account creation flow, rather than also capturing auto-created accounts. Since the requirements in the task do not appear to make an exhaustive list, I decided to apply this shortcut rather than join with the logging table.
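For readers who don't have Data Lake access, here is a minimal Python sketch of the per-row classification that the SUM(IF(...)) columns above perform. The function name `creation_context` and its arguments are hypothetical and only mirror the schema fields `event.returnTo` and `event.returnToQuery`. The desktop/mobile split (`event.displayMobile`) is left out to keep it short, and `re.search` stands in for Hive's REGEXP, which matches on any substring.

```python
import re
from typing import Optional

# Same pattern as in the HQL query: matches "action=edit" and "action=vedit"
# anywhere in the string (so "veaction=edit" matches as well).
EDIT_ACTION = re.compile(r"action=v?edit")


def creation_context(return_to: Optional[str],
                     return_to_query: Optional[str],
                     main_page: str = "Hlavní strana") -> Optional[str]:
    """Classify one account creation into mainpage / reading / editing."""
    if return_to is None:
        # Dropped by "event.returnTo IS NOT NULL" in the WHERE clause.
        return None
    if return_to_query is not None and EDIT_ACTION.search(return_to_query):
        # The editing columns do not look at returnTo at all.
        return "editing"
    if return_to == main_page:
        return "mainpage"
    return "reading"


if __name__ == "__main__":
    print(creation_context("Hlavní strana", None))
    print(creation_context("Praha", "action=edit&section=1"))
    print(creation_context("Speciální:Hledání", "search=foo"))
```

For the other wikis, only the `main_page` value would change, in the same way the query's main page title and wiki condition change.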

I got concerned and decided to test the assumption that autocreated accounts do not have the event.returnTo field set. I wrote this query to test it on the Czech Wikipedia, and altered it to also test on Korean.

SELECT l.log_action, COUNT(*) AS num_registrations
FROM (SELECT event.userId, event.returnTo AS returnTo
      FROM event.serversideaccountcreation
      WHERE year = 2018
      AND month > 2
      AND month < 9
      AND wiki = 'cswiki'
      AND event.returnTo IS NOT NULL) AS s
JOIN wmf_raw.mediawiki_user u
ON s.userId = u.user_id
   AND u.wiki_db = 'cswiki'
   AND u.snapshot = '2018-08'
JOIN wmf_raw.mediawiki_logging l
ON u.user_id = l.log_user
   AND l.wiki_db = 'cswiki'
   AND l.snapshot = '2018-08'
   AND l.log_type = 'newusers'
GROUP BY l.log_action
LIMIT 50;

In both cases they show no autocreated accounts.

I've got graphs and some discussion of the results in this work log entry on mediawiki.org. Below are the specific answers to the questions asked. Phabricator doesn't allow for pretty formatting, so the numbers are better shown in the table in the work log.

  • How many accounts are created from the homepage?
    • Czech: 1,151 on desktop, 347 on mobile
    • Korean: 2,812 on desktop, 476 on mobile
    • English: 93,116 on desktop, 35,087 on mobile
    • German: 6,608 on desktop, 2,434 on mobile
    • Arabic: 5,961 on desktop, 4,986 on mobile
    • Ukrainian: 1,347 on desktop, 360 on mobile
  • How many accounts are created from the reading context?
    • Czech: 2,902 on desktop, 646 on mobile
    • Korean: 6,826 on desktop, 1,247 on mobile
    • English: 385,538 on desktop, 139,195 on mobile
    • German: 19,309 on desktop, 5,027 on mobile
    • Arabic: 10,808 on desktop, 13,947 on mobile
    • Ukrainian: 3,889 on desktop, 801 on mobile
  • How many accounts are created from the editing context?
    • Czech: 336 on desktop, 861 on mobile
    • Korean: 675 on desktop, 2,000 on mobile
    • English: 21,798 on desktop, 120,393 on mobile
    • German: 6,436 on desktop, 5,964 on mobile
    • Arabic: 871 on desktop, 13,939 on mobile
    • Ukrainian: 301 on desktop, 1,011 on mobile

Some of the trends are maybe easier to spot in the graphs in the work log. For many of the wikis, the vast majority of accounts on desktop are created from the reading context, with German seeing a larger proportion of creations from editing. Account creation on mobile is around equal for editing and reading, with reading perhaps being somewhat more common.
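To make the desktop proportions concrete, the sketch below recomputes the share of desktop account creations that come from the editing context, using only the desktop counts listed above. The denominator (main page + reading + editing on desktop) is my assumption about how the proportion is defined; it does reproduce the 19.9% figure for German that is quoted further down in this task.

```python
# Desktop counts copied from the lists above: (main page, reading, editing).
DESKTOP = {
    "Czech": (1151, 2902, 336),
    "Korean": (2812, 6826, 675),
    "English": (93116, 385538, 21798),
    "German": (6608, 19309, 6436),
    "Arabic": (5961, 10808, 871),
    "Ukrainian": (1347, 3889, 301),
}


def editing_share(counts):
    """Percentage of desktop creations that came from the editing context."""
    mainpage, reading, editing = counts
    return 100.0 * editing / (mainpage + reading + editing)


if __name__ == "__main__":
    for wiki, counts in DESKTOP.items():
        print(f"{wiki}: {editing_share(counts):.1f}% of desktop creations from editing")
```

The same function works for the mobile counts if the tuples are swapped for the mobile numbers.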

After discussing with @nettrom_WMF today, we decided that the one last thing we would like to add is the count of people creating accounts from contexts other than reading, editing, or homepage. @nettrom_WMF -- if initial checks show that it is vanishingly small, I think we could just note that and skip actually producing the numbers and rewriting things.

@MMiller_WMF : I looked into the queries behind this and found that I was incorrect. As far as I can tell, the query captures 100% of reading/editing contexts but the "reading" context does not mean only reading an article. It will also capture accounts created from a context that has a query associated with it (e.g. a search). I don't think it's feasible to separate out queries, because they can also be used to read articles.

I updated the work log today to add to the description of the contexts. This was done to make it clear that we capture 100% of the relevant contexts, and that the "reading" context does not exclusively mean they were reading a wiki page; it will also capture, for instance, looking at search results.

That concludes the work on this task, so I'm closing it as resolved.

Today I learned that my data gathering is most likely biased, shifting accounts from the "editing" to the "reading" context. @SBisson showed me that the English Wikipedia's warning message that is shown to users who try to edit without being logged in does not contain the returntoquery parameter that we check for. This means that if a user clicks on the link to create an account from that warning message, it would not be counted. Note that the "Create account" link in the upper right hand corner on English Wikipedia does contain the returntoquery parameter if an edit is attempted, so those account creations are counted correctly.

I went through the six Wikipedias in our dataset and tried to figure out how this would affect our data gathering. Here's what I found:

Two Wikipedias are not affected: Czech and German. The Czech Wikipedia's warning message does not have a link to create an account, nor one to log in, so the only way users can create an account from the editing context is to use the "Create account" link in the upper right hand side of the screen. When it comes to the German Wikipedia, its warning message contains the returntoquery parameter, which is captured by our query.

Three Wikipedias are affected: Korean, Arabic, and Ukrainian. All of these display a warning message if the user is not logged in with a link to create an account, but the link does not contain any query information that we can use to determine that it was created from the edit context.

When it comes to the English Wikipedia, the link to create an account in the warning message does not contain the returntoquery parameter, but it contains the key/value pair campaign=anoneditwarning. This is captured in the ServerSideAccountCreation schema as the campaign parameter. I updated the HQL query for enwiki to the one below, gathered new data, verified that it was different, and updated the graphs accordingly. I'm in the process of updating the work log based on the new findings. Proportion of accounts created from the edit context in English is now 13.7%. While it's not the 19.9% we see in German, it's definitely higher than before, and might suggest that the actual proportions in the three affected Wikipedias are significantly higher than our data gathering found.

@Trizek-WMF and @revi : We've proposed in the survey measurements in the experiment plan for Personalized first day to split by account creation context. That means we'd like to have a way to confidently detect creations from the edit context on Korean Wikipedia. Have we already talked about updating the warning message so it contains something we can track (either the returntoquery or campaign parameter discussed above)? By the way, the German Wikipedia version contains both of those, so that's potentially a way to find how to write it.

Wrapping it up with the updated HQL query for the English Wikipedia:

-- Based on examining the data, "anoneditwarning" appears to be the
-- key campaign to capture edit-based creation on enwiki.
DROP TABLE IF EXISTS nettrom_growth.en_creation_contexts;
CREATE TABLE nettrom_growth.en_creation_contexts AS
SELECT CONCAT(year, '-', month) AS reg_month,
  SUM(IF(event.returnTo = 'Main Page'
         AND event.displayMobile = FALSE
         AND ((event.returnToQuery IS NULL
               AND event.campaign != "anoneditwarning")
              OR event.returnToQuery NOT REGEXP "action=v?edit"), 1, 0))
      AS num_mainpage_desktop,
  SUM(IF(event.returnTo != 'Main Page'
         AND event.displayMobile = FALSE
         AND ((event.returnToQuery IS NULL
               AND event.campaign != "anoneditwarning")
              OR event.returnToQuery NOT REGEXP "action=v?edit"), 1, 0))
      AS num_reading_desktop,
  SUM(IF((event.returnToQuery REGEXP "action=v?edit"
          OR event.campaign = "anoneditwarning")
         AND event.displayMobile = FALSE, 1, 0))
      AS num_editing_desktop,
  SUM(IF(event.returnTo = 'Main Page'
         AND event.displayMobile = TRUE
         AND ((event.returnToQuery IS NULL
               AND event.campaign != "anoneditwarning")
              OR event.returnToQuery NOT REGEXP "action=v?edit"), 1, 0))
      AS num_mainpage_mobile,
  SUM(IF(event.returnTo != 'Main Page'
         AND event.displayMobile = TRUE
         AND ((event.returnToQuery IS NULL
               AND event.campaign != "anoneditwarning")
              OR event.returnToQuery NOT REGEXP "action=v?edit"), 1, 0))
      AS num_reading_mobile,
  SUM(IF((event.returnToQuery REGEXP "action=v?edit"
          OR event.campaign = "anoneditwarning")
         AND event.displayMobile = TRUE, 1, 0))
      AS num_editing_mobile
FROM event.serversideaccountcreation
WHERE year = 2018
AND month >= 3
AND month < 10
AND wiki = 'enwiki'
AND event.isApi = FALSE
AND event.returnTo IS NOT NULL
GROUP BY CONCAT(year, '-', month);
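One thing that may be worth checking in the query above is how it treats NULLs. In SQL, comparing NULL with `!=` yields NULL rather than TRUE, and IF(...) takes its else-branch on NULL. The sketch below models that three-valued logic in Python for the shared non-edit predicate of the num_mainpage_* and num_reading_* columns. Whether `event.campaign` is ever NULL (rather than an empty string) in the logged events is not established in this task, so this is an assumption to verify against the data, not a confirmed problem. The helper names are hypothetical.

```python
import re


def sql_and(a, b):
    """Three-valued AND: None stands in for SQL NULL."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def sql_or(a, b):
    """Three-valued OR."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def sql_ne(a, b):
    """a != b, which is NULL if either side is NULL."""
    return None if a is None or b is None else a != b


def sql_not_regexp(value, pattern):
    """value NOT REGEXP pattern, which is NULL if value is NULL."""
    return None if value is None else re.search(pattern, value) is None


def non_edit_condition(return_to_query, campaign):
    """Non-edit predicate shared by the mainpage and reading columns."""
    return sql_or(
        sql_and(return_to_query is None, sql_ne(campaign, "anoneditwarning")),
        sql_not_regexp(return_to_query, "action=v?edit"),
    )


def counted(condition):
    # IF(cond, 1, 0) yields 0 when cond is FALSE or NULL.
    return 1 if condition is True else 0


if __name__ == "__main__":
    print(counted(non_edit_condition(None, "")))    # empty-string campaign
    print(counted(non_edit_condition(None, None)))  # NULL campaign
    print(counted(non_edit_condition("search=foo", "anoneditwarning")))
```

Under these assumptions, a row with both fields NULL would fall out of the main page and reading columns, and a row with a non-edit returnToQuery plus campaign=anoneditwarning would be counted in both a non-edit and an editing column. If either kind of row exists in the data, wrapping the campaign comparison in a NULL-safe form would be one way to handle it.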
Restricted Application changed the subtype of this task from "Deadline" to "Task". · Nov 6 2018, 12:54 AM

@nettrom_WMF -- thanks for following up on this. I clicked around just to make sure I understood all the combinations here, and then I started taking notes, and I made the outline below. Some notes:

  • It looks to me that English does have a return-to parameter. Like, here is a URL I got when I clicked "Create account" from the notice in the Visual Editor: https://en.wikipedia.org/w/index.php?title=Special:CreateAccount&campaign=anoneditwarning&returnto=Richardson_Dilworth
  • I noticed that in English and Korean, clicking "Create account" from the notice/warning in the source editor opened Special:CreateAccount in the same tab, but clicking it from the notice/warning in the visual editor opened Special:CreateAccount in a new tab. Czech doesn't have a way to create account from the editor (as you said).
  • @Trizek-WMF, @revi, and I can talk about this tomorrow in our check-in. I kind of think we shouldn't change the site's behavior with the return-to parameter, but maybe the campaign parameter could work.

Here's my outline:

English

Korean

  • Single edit tab
  • Click edit --> overlay comes up with choice of editor (source is encouraged)
  • Defaults to Visual Editor
    • There is a notice about creating an account.
    • “Create account” opens in new tab with no return-to.
    • After account creation, user is redirected to the homepage.
  • In source editor
    • There is a yellow banner about creating an account.
    • “Create account” opens in same tab with no return-to.
    • After account creation, user is redirected to the homepage.

Czech

  • Separate tabs for source editing and visual editor
  • Clicking on either brings up an overlay giving a choice between the two editors, with whichever tab you've clicked on (source or visual) having affordance in the choice.
  • In both, notices say that your IP address will be saved, but they do not encourage account creation (banner for source, notice for visual).

@MMiller_WMF : Good testing! I hadn't caught that the text editor and Visual Editor have different tab behavior.

You are correct that the warning in the English Wikipedia doesn't have the returntoquery parameter, but does have the campaign parameter. The one in German Wikipedia has both, e.g. trying to edit a random article the link for the "Melde dich an" text is https://de.wikipedia.org/w/index.php?title=Spezial:Anmelden/signup&campaign=anoneditwarning&returnto=Wolfgang_M%C3%BCns&returntoquery=action%3Dedit

As far as I'm concerned, if the campaign parameter is set we're good since that allows us to determine that the link in the warning message was used to create the account. Might need an engineer to verify that it doesn't alter any business logic, though, as I'm not sure if the campaign parameter is used for anything besides being logged by the account creation schema.
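As a quick way to see which tracking signals each signup link carries, here is a small Python sketch that parses the two URLs quoted in this discussion. It assumes the URL parameters returnto, returntoquery and campaign correspond to the schema fields of the same names, which is what the discussion above implies but which I have not verified in code. The function name `signup_link_signals` is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlsplit


def signup_link_signals(url: str) -> dict:
    """Extract the parameters our queries rely on from a signup link."""
    params = parse_qs(urlsplit(url).query)
    return_to_query = params.get("returntoquery", [None])[0]
    campaign = params.get("campaign", [None])[0]
    by_query = bool(return_to_query
                    and re.search(r"action=v?edit", return_to_query))
    return {
        "returnto": params.get("returnto", [None])[0],
        "returntoquery": return_to_query,
        "campaign": campaign,
        # Either signal is enough to attribute the creation to editing.
        "detectable_as_edit": by_query or campaign == "anoneditwarning",
    }


# URLs copied from the comments above.
EN_URL = ("https://en.wikipedia.org/w/index.php?title=Special:CreateAccount"
          "&campaign=anoneditwarning&returnto=Richardson_Dilworth")
DE_URL = ("https://de.wikipedia.org/w/index.php?title=Spezial:Anmelden/signup"
          "&campaign=anoneditwarning&returnto=Wolfgang_M%C3%BCns"
          "&returntoquery=action%3Dedit")

if __name__ == "__main__":
    for url in (EN_URL, DE_URL):
        print(signup_link_signals(url))
```

The English link is only detectable through the campaign parameter, while the German link carries both signals.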

@nettrom_WMF -- @revi and I discussed this today. If you specify exactly what they should add to which exact links, they can add it. Please add that here.

revi moved this task from Incoming to Doing on the User-revi board.

I'd like to keep things transparent, so here's an update on why I edited this comment to remove my suggestion. The proposal was to add the returnto and returntoquery parameters to the signup link in the warning message shown to users who try to edit without logging in (the anoneditwarning message). Currently, neither of those are present, which means that after signing up, the user is returned to the main page. Adding those two will change the user experience so that they return to the page they tried to edit, and the editor will load.

We're unsure how changing the new user experience in this way will affect our proposed interventions, and therefore need to discuss this more. Once we've figured this out, I'll post an update.

@revi : after a bit more discussion with @SBisson and @MMiller_WMF, we have two options on how to do this. I'll first describe our preferred approach (option 1), and then provide an alternative (option 2).

Option 1: Add the returnto and returntoquery parameters to the signup link.

This will make the signup link in the anoneditwarning message work exactly the same way as the other signup link on screen ("계정 만들기" in the upper right hand corner). Here's how to make this happen:

The Korean version of anoneditwarning is on 미디어위키:Anoneditwarning. The wikitext for the signup link on that page is currently:

<span class="plainlinks">'''[//ko.wikipedia.org/w/index.php?title=특수:로그인&type=signup 계정을 만들면]'''</span>

From my previous investigation into the signup links, I found that the German message has the parameters we want. Using that as a template, the Korean version becomes this piece of wikitext:

<span class="plainlinks">'''[{{fullurl:특수:로그인/signup|returnto={{FULLPAGENAMEE}}&returntoquery=action%3Dedit}} 계정을 만들면]'''</span>

One thing to note is that this will change how the signup link works: adding the two parameters means the user will be returned to the page they were editing, with the editor loaded, after they sign up. Currently this does not happen; they are instead taken to the main page (as @MMiller_WMF mentions in T206377#4723346). That current behavior is different from what the other signup link on the page does, as I described above. I think those two links should behave the same way and take the user back to editing the page they were on, but if the Korean community doesn't want to do that we have option 2.

Option 2: Add the campaign parameter with a value of anoneditwarning.

This parameter is used in various places to identify specific links that take the user to the signup page (one example is the link at the bottom of the login page, which has campaign=loginCTA in the URL). Both the English and German wikipedias use this in their anoneditwarning messages. From testing this out on Korean, what is needed is to add campaign=anoneditwarning to the link. That means that the first piece of wikitext mentioned above turns into this:

<span class="plainlinks">'''[//ko.wikipedia.org/w/index.php?title=특수:로그인&type=signup&campaign=anoneditwarning 계정을 만들면]'''</span>

Please get in touch with any questions you might have about this, and I'll do my best to help out!

I'll ask for community opinion @ VPT (whether this behavior change is fine with them), and will proceed with Option 1 if nobody cares (tm) until Tuesday KST. Is this fine with you or do you need this done quicker?

</workhat>

@revi told me today that his community did not have a problem with this. Please let us know when the change has been made.

Changes made less than a minute ago.

Thanks for helping make this happen, @revi!

I've confirmed that an account created using the link has the parameters set and is logged by the ServerSideAccountCreation schema in the way we expect. Everything looks good, so I'll close this task again.