
Analyze blocked account creations
Open, Medium, Public

Description

Recently, the Growth team deployed instrumentation to track blocked account creations (T306018). Using this instrumentation, we'd like to answer the following questions:

  • What is the geographic distribution of blocked account creation attempts?
  • How does the geographic distribution of blocked account creation attempts compare to that of successful account registrations?
  • Are we able to distinguish between “human” and “automated” account creation requests when it comes to blocked attempts?
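One way to compare the two geographic distributions is to normalize each to per-country shares and take the ratio of the shares. The sketch below is a minimal illustration in plain Python. It assumes each event has already been reduced to a country code. The function names and the sample data are hypothetical and do not come from the actual analysis.

```python
from collections import Counter

def country_shares(countries):
    """Return each country's share of the given events (shares sum to 1.0)."""
    counts = Counter(countries)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}

def compare_distributions(blocked_countries, successful_countries):
    """Map each country to (blocked share, successful share, ratio).

    A ratio above 1 means the country is over-represented among blocked
    attempts relative to successful registrations. The ratio is None when
    the country has no successful registrations in the sample.
    """
    blocked = country_shares(blocked_countries)
    successful = country_shares(successful_countries)
    result = {}
    for country in sorted(set(blocked) | set(successful)):
        b = blocked.get(country, 0.0)
        s = successful.get(country, 0.0)
        result[country] = (b, s, b / s if s else None)
    return result

# Hypothetical country codes, one entry per event.
blocked = ["IN", "IN", "US", "VN"]
successful = ["US", "US", "IN", "DE"]
for country, (b, s, ratio) in compare_distributions(blocked, successful).items():
    print(country, round(b, 2), round(s, 2), ratio)
```

Comparing shares rather than raw counts keeps large wikis and high-traffic countries from dominating the comparison. The ratio does not account for small sample sizes, so countries with only a few events should be read with caution.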
Deliverables

Summary of Initial Findings: https://docs.google.com/document/d/1RjGMbCeVeIuDurMoqTBBcgMJVTT0zPCPiBECxVaNMFc/edit?usp=sharing
Data for all wikis and all countries: https://docs.google.com/spreadsheets/d/1DabDRQdN9O2gdhoUBxEXasp0DegiCwBphCGGxmeftCM/edit?usp=sharing

Event Timeline

jwang triaged this task as Medium priority. Nov 15 2022, 3:10 AM
jwang moved this task from Triage to Current Quarter on the Product-Analytics board.

Here are some other questions that I recommend investigating:

  • How has the volume of blocked account registrations changed over the period for which we have this data?
  • What can we say about the device type, platform, browser, etc. of the blocked attempts versus the successful ones?
  • In addition to geography, I recommend looking at language editions.
  • Do we see the same user (i.e. IP address) attempt many failed account creations in a row?
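The "many failed attempts in a row from the same IP" question can be framed as finding, for each IP, the longest run of consecutive blocked attempts. A minimal sketch in plain Python, assuming events are available as hypothetical (timestamp, ip, was_blocked) tuples; the function name and the sample IPs (taken from documentation ranges) are illustrative only.

```python
from collections import defaultdict

def longest_blocked_streaks(events):
    """Return the longest run of consecutive blocked attempts for each IP.

    `events` is an iterable of (timestamp, ip, was_blocked) tuples. Events
    are ordered by timestamp, so "in a row" means consecutive attempts from
    the same IP, regardless of what other IPs did in between. Only IPs with
    at least one blocked attempt appear in the result.
    """
    current = defaultdict(int)  # length of the ongoing blocked run per IP
    longest = defaultdict(int)  # longest blocked run seen so far per IP
    for _, ip, was_blocked in sorted(events, key=lambda e: e[0]):
        if was_blocked:
            current[ip] += 1
            longest[ip] = max(longest[ip], current[ip])
        else:
            current[ip] = 0  # a successful creation ends the run
    return dict(longest)

# Hypothetical events: (timestamp, ip, was_blocked)
events = [
    (1, "198.51.100.7", True),
    (2, "198.51.100.7", True),
    (3, "203.0.113.5", False),
    (4, "198.51.100.7", True),
    (5, "198.51.100.7", False),
    (6, "198.51.100.7", True),
    (7, "203.0.113.5", True),
]
print(longest_blocked_streaks(events))
```

IPs whose longest streak exceeds some threshold could then be flagged for closer inspection; any such threshold would be a judgment call, not something the data defines.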

Hi @MMiller_WMF, @nettrom_WMF. Here are some initial (draft) findings from the analysis of blocked account creations. Let me know if you have any questions or comments.
The initial analysis focused on the following questions and requests:

  • What is the geographic distribution of blocked account creation attempts?
  • How does the geographic distribution of blocked account creation attempts compare to that of successful account registrations?
  • How has the volume of blocked account registrations changed over the period for which we have this data?
  • In addition to geography, look at language editions.
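Tracking volume over time usually means bucketing events by week. A minimal sketch in plain Python using ISO weeks, assuming the event dates have already been extracted; the function name and sample dates are hypothetical.

```python
from collections import Counter
from datetime import date

def weekly_counts(event_dates):
    """Count events per ISO week.

    Returns [((iso_year, iso_week), count), ...] in chronological order.
    """
    # isocalendar() yields (ISO year, ISO week, ISO weekday); the first two
    # identify the week, including weeks that straddle a calendar year boundary.
    counts = Counter(tuple(d.isocalendar())[:2] for d in event_dates)
    return sorted(counts.items())

# Hypothetical dates of blocked attempts.
blocked_dates = [date(2022, 11, 1), date(2022, 11, 3), date(2022, 11, 8)]
for (year, week), n in weekly_counts(blocked_dates):
    print(f"{year}-W{week:02d}: {n}")
```

The same Counter approach extends to a per-language-edition breakdown by counting ((iso_year, iso_week), wiki) pairs instead of weeks alone.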

Notes:

  • The data above excluded account creations/attempts from the API and bots, but included both self-creations and auto-creations. Once we figure out how to identify auto-creations in the mediawiki_accountcreation_block schema, we can refine the data.
  • The data above excluded account creations/attempts from the API, bots, and auto-creation.
  • @nettrom_WMF, could you review the data extraction code? (code)
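The exclusions described in the notes can be expressed as a single filter applied before any aggregation. The sketch below is a minimal illustration in plain Python. The event class and its field names (is_api, is_bot, is_autocreate) are hypothetical stand-ins: the real mediawiki_accountcreation_block schema may name or encode these differently, and per the first note, auto-creation was not yet identifiable for blocked attempts.

```python
from dataclasses import dataclass

@dataclass
class AccountCreationEvent:
    # Hypothetical fields; not the actual schema.
    wiki: str
    country: str
    is_api: bool
    is_bot: bool
    is_autocreate: bool

def keep_for_analysis(event, exclude_autocreate=True):
    """Drop API and bot requests, and optionally auto-created accounts."""
    if event.is_api or event.is_bot:
        return False
    if exclude_autocreate and event.is_autocreate:
        return False
    return True

# Hypothetical events.
events = [
    AccountCreationEvent("enwiki", "US", False, False, False),
    AccountCreationEvent("enwiki", "US", True, False, False),
    AccountCreationEvent("dewiki", "DE", False, False, True),
]
kept = [e for e in events if keep_for_analysis(e)]
print(len(kept))
```

The exclude_autocreate flag covers both versions of the data described in the notes: with it set to False, auto-created accounts are kept, matching the earlier extraction.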

Next steps

  • Discuss the initial findings and do a deeper dive.
  • Analyze the following questions:
    • Are we able to distinguish between “human” and “automated” account creation requests when it comes to blocked attempts?
    • What can we say about the device type, platform, browser, etc. of the blocked attempts versus the successful ones?
    • Do we see the same user (i.e. IP address) attempt many failed account creations in a row?