Page MenuHomePhabricator

Estimate the number of temporary accounts that would be created once IP Masking goes into effect
Closed, ResolvedPublic

Description

Motivation

We want to estimate the number of temporary accounts that would be created once IP Masking goes into effect to evaluate the impact for performance and SRE.

Spec:

  • Calculate how many unique IPs perform unregistered edits per
    • 7 days
    • 30 days
    • 90 days
  • [If we have this data] Calculate how many unique devices perform unregistered edits per Note: We only have unique devices data for readers not editors. ref
    • 7 days
    • 30 days
    • 90 days
  • Calculate the above for the top 5 wikis and a total for all wikis

Event Timeline

jwang triaged this task as High priority.Jan 24 2023, 6:09 PM
jwang edited projects, added Product-Analytics (Kanban); removed Product-Analytics.

For all wikis, how many unique IPs perform unregistered edits per 7 days, 30 days, 90 days?
Below summary is based on the stats between 2022-07-01 and 2022-12-31:

  • Per 7 days, the number of unique IPs where unregistered edits were submitted is 148973 on average. The maximum is 160066. The average is calculated by taking the average of the 12 complete weeks within the time frame.
  • Per 30 days, the number of unique IPs where unregistered edits were submitted is 591285 on average. The maximum is 627461. The average is calculated by taking the average of the 6 months within the time frame with the adjustment of the number of days. For example, for the month with 31 days, the number is adjusted by number of unique IPs *30/31.
  • Per 90 days, 1682704 unique IPs perform unregistered edits across wikis on average. The average is calculated by taking the average of 2022Q3 and 2022Q4 with the adjustment by number of days. For Q3 and Q4, it is adjusted by number of unique IPs *90/92.

Query for reference

query_ip_editor_7days='''
SELECT
       weekofyear(event_timestamp) AS week_n, COUNT(DISTINCT event_user_text) AS ips
FROM wmf.mediawiki_history
WHERE
        event_entity = 'revision' AND
        event_type = 'create' AND
        DATE(event_timestamp) BETWEEN '{START_YYYY_MM_DD}' AND  '{END_YYYY_MM_DD}' 
        AND event_user_is_anonymous = true  
        AND snapshot = '{SNAPSHOT}'
GROUP BY weekofyear(event_timestamp) 
'''
query_ip_editor_30days='''
SELECT
       substr(event_timestamp,1,7) AS month_n, COUNT(DISTINCT event_user_text) AS ips
FROM wmf.mediawiki_history
WHERE
        event_entity = 'revision' AND
        event_type = 'create' AND
        DATE(event_timestamp) BETWEEN '{START_YYYY_MM_DD}' AND '{END_YYYY_MM_DD}' 
        AND event_user_is_anonymous = true  
        AND snapshot = '{SNAPSHOT}'
GROUP BY substr(event_timestamp,1,7) 
'''
query_ip_editor_90days='''
SELECT QUARTER(event_timestamp) AS quarter_n , COUNT(DISTINCT event_user_text) AS ips
FROM wmf.mediawiki_history
WHERE
        event_entity = 'revision' AND
        event_type = 'create' AND
        DATE(event_timestamp) BETWEEN '{START_YYYY_MM_DD}' AND '{END_YYYY_MM_DD}' AND
        event_user_is_anonymous = true  
        AND snapshot = '{SNAPSHOT}'
GROUP BY QUARTER(event_timestamp) 
'''

For top 5 wikis, how many unique IPs perform unregistered edits per 7 days, 30 days, 90 days?
The top 5 wikis are selected based on the ranking in wiki comparison 2021. The top 5 wikis are: English Wikipedia, Spanish Wikipedia, German Wikipedia, Japanese Wikipedia, French Wikipedia.

WikiUnique IPs per 7 daysUnique IPs per 30 daysunique IPs per 90 days
English Wikipedia58402230788656485
Spanish Wikipedia1327752620150109
Japanese Wikipedia65512572973358
German Wikipediai69962850482312
French Wikipedia76523111789870

Thanks for this context, @Niharika! I think it's ok to wait and see when the temp user cookies roll out how those numbers compare with these. If we wanted to know ahead of time, we could probably devise some kind of sampled test by setting a simple cookie on edits and incrementing it... But I don't think it's worth the trouble.

mpopov subscribed.

Thank you, Jennifer!

@Niharika: I'm resolving this task as it seems to be compete according to original scope. If you have any follow-up questions or additional requests, please create a new task and work with Jennifer to figure out its priority relative to existing planned work.