
Explore the size of the potential users of IPBEE on ptwiki
Closed, Resolved · Public

Description

Background

While discussing whether we should run an A/B test for IPBEE, we want to know how many potential IPBEE users there are on ptwiki.

Acceptance Criteria

  • The number of anonymous users who attempt to edit on ptwiki
  • Estimate the ratio of editing sessions (session_id) per user for logged-in users
  • The number of account creations that originate from the editing context

Event Timeline

jwang triaged this task as Medium priority. (Oct 25 2022, 5:41 PM)
jwang created this task.
jwang edited projects, added Product-Analytics (Kanban); removed Product-Analytics.
jwang moved this task from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
  • The number of anonymous users who attempt to edit on ptwiki

Summary:

  • Anonymous editors make 760 edit attempts per day on average in the sampled data (6.25% sample rate).
  • Scaling by the sample rate (760 / 0.0625), the total number of edit attempts by anonymous editors is estimated at about 12,160 per day.
  • The number of unique event.editing_session_id values is almost the same as the number of events (they differ only on 2022-10-05: 830 vs. 831), which suggests that a new event.editing_session_id is issued for each edit attempt.
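| dt | is_oversample | num_sessions | num_events |
| --- | --- | --- | --- |
| 2022-10-01 | FALSE | 729 | 729 |
| 2022-10-02 | FALSE | 857 | 857 |
| 2022-10-03 | FALSE | 715 | 715 |
| 2022-10-04 | FALSE | 1305 | 1305 |
| 2022-10-05 | FALSE | 830 | 831 |
| 2022-10-06 | FALSE | 958 | 958 |
| 2022-10-07 | FALSE | 1121 | 1121 |
| 2022-10-08 | FALSE | 603 | 603 |
| 2022-10-09 | FALSE | 538 | 538 |
| 2022-10-10 | FALSE | 723 | 723 |
| 2022-10-11 | FALSE | 724 | 724 |
| 2022-10-12 | FALSE | 631 | 631 |
| 2022-10-13 | FALSE | 924 | 924 |
| 2022-10-14 | FALSE | 736 | 736 |
| 2022-10-15 | FALSE | 542 | 542 |
| 2022-10-16 | FALSE | 521 | 521 |
| 2022-10-17 | FALSE | 919 | 919 |
| 2022-10-18 | FALSE | 716 | 716 |
| 2022-10-19 | FALSE | 710 | 710 |
| 2022-10-20 | FALSE | 709 | 709 |
| 2022-10-21 | FALSE | 689 | 689 |
| 2022-10-22 | FALSE | 555 | 555 |
| 2022-10-23 | FALSE | 646 | 646 |
| 2022-10-24 | FALSE | 825 | 825 |
| 2022-10-25 | FALSE | 791 | 791 |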
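The summary figures can be recomputed from the daily table. A minimal Python sketch, assuming the 6.25% sample rate applies uniformly to every anonymous edit attempt (so dividing by 0.0625 undoes the sampling); the dictionary is simply the table transcribed by hand:

```python
# Daily (num_sessions, num_events) for anonymous 'init' edit attempts on ptwiki,
# transcribed from the table above (is_oversample = FALSE for every day).
DAILY = {
    "2022-10-01": (729, 729),
    "2022-10-02": (857, 857),
    "2022-10-03": (715, 715),
    "2022-10-04": (1305, 1305),
    "2022-10-05": (830, 831),
    "2022-10-06": (958, 958),
    "2022-10-07": (1121, 1121),
    "2022-10-08": (603, 603),
    "2022-10-09": (538, 538),
    "2022-10-10": (723, 723),
    "2022-10-11": (724, 724),
    "2022-10-12": (631, 631),
    "2022-10-13": (924, 924),
    "2022-10-14": (736, 736),
    "2022-10-15": (542, 542),
    "2022-10-16": (521, 521),
    "2022-10-17": (919, 919),
    "2022-10-18": (716, 716),
    "2022-10-19": (710, 710),
    "2022-10-20": (709, 709),
    "2022-10-21": (689, 689),
    "2022-10-22": (555, 555),
    "2022-10-23": (646, 646),
    "2022-10-24": (825, 825),
    "2022-10-25": (791, 791),
}

SAMPLE_RATE = 0.0625  # 6.25% sampling, as stated in the summary


def summarize(daily, sample_rate):
    """Return (mean sampled events/day, estimated unsampled events/day, mismatched days)."""
    total_events = sum(events for _, events in daily.values())
    mean_sampled = total_events / len(daily)
    # Undo the sampling: each sampled event stands for 1 / sample_rate events.
    estimated_total = mean_sampled / sample_rate
    # Days where distinct sessions < events (a repeated or missing editing_session_id).
    mismatched = [dt for dt, (sessions, events) in daily.items() if sessions != events]
    return mean_sampled, estimated_total, mismatched


mean_sampled, estimated_total, mismatched = summarize(DAILY, SAMPLE_RATE)
print(f"mean sampled edit attempts/day: {mean_sampled:.2f}")
print(f"estimated total edit attempts/day: {estimated_total:.0f}")
print("days with num_sessions != num_events:", mismatched)
```

Unrounded, the mean is 760.72 sampled attempts per day, or about 12,172 per day after scaling, consistent with the rounded 760 and 12,160.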

Query

SELECT CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS dt,
event.is_oversample,
COUNT(DISTINCT event.editing_session_id) AS num_sessions,
COUNT(1) AS num_events
FROM event.editattemptstep
WHERE year = 2022 AND month = 10
AND wiki = 'ptwiki'
-- anonymous users
AND event.user_id = 0
AND event.action = 'init'
-- edit attempts on page edit
AND event.init_type = 'page'
GROUP BY CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')), event.is_oversample
ORDER BY dt
LIMIT 10000
  • Estimate the ratio of editing sessions (session_id) per user for logged-in users

Based on data from 2022-10-01 to 2022-10-25, each unique logged-in editor has 3.8 sessions (unique event.editing_session_id values) on average (6,740 sessions / 1,770 editors).

| num_sessions | num_editors | num_events |
| --- | --- | --- |
| 6740 | 1770 | 6771 |
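The ratio quoted above is a single division over the October totals. A minimal sketch, assuming the flattened result row reads 6,740 sessions, 1,770 editors and 6,771 events (the split that reproduces the 3.8 figure):

```python
# October 2022 totals for logged-in editors' 'init' events on ptwiki (from the table above).
num_sessions = 6740
num_editors = 1770
num_events = 6771

# Average number of distinct editing sessions per distinct logged-in editor.
sessions_per_editor = num_sessions / num_editors
# A value close to 1.0 means almost every session has exactly one 'init' event.
events_per_session = num_events / num_sessions

print(f"sessions per editor: {sessions_per_editor:.2f}")
print(f"init events per session: {events_per_session:.3f}")
```

As with the anonymous-user data, the number of init events per session is close to 1 (about 1.005).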

Query

SELECT
COUNT(DISTINCT event.editing_session_id) AS num_sessions,
COUNT(DISTINCT event.user_id) AS num_editors,
COUNT(1) AS num_events
FROM event.editattemptstep
WHERE year = 2022 AND month = 10
AND wiki = 'ptwiki'
-- logged-in users
AND event.user_id != 0
AND event.action = 'init'
-- edit attempts on page edit
AND event.init_type = 'page'
jwang updated the task description.
  • The number of account creations that originate from the editing context

Summary:

  • On average, about 11 accounts per day are created from the editing context (276 accounts over the 25 days in the table below).
  • @Niharika, do you know whether users on ptwiki are returned to the original editing page after they create an account? If so, the ratio of account creations to edit attempts is estimated at about 0.091% (11 / 12,160).
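| dt | accounts |
| --- | --- |
| 2022-10-01 | 9 |
| 2022-10-02 | 8 |
| 2022-10-03 | 5 |
| 2022-10-04 | 5 |
| 2022-10-05 | 12 |
| 2022-10-06 | 9 |
| 2022-10-07 | 6 |
| 2022-10-08 | 8 |
| 2022-10-09 | 9 |
| 2022-10-10 | 16 |
| 2022-10-11 | 9 |
| 2022-10-12 | 15 |
| 2022-10-13 | 12 |
| 2022-10-14 | 16 |
| 2022-10-15 | 7 |
| 2022-10-16 | 6 |
| 2022-10-17 | 11 |
| 2022-10-18 | 17 |
| 2022-10-19 | 14 |
| 2022-10-20 | 15 |
| 2022-10-21 | 15 |
| 2022-10-22 | 11 |
| 2022-10-23 | 8 |
| 2022-10-24 | 20 |
| 2022-10-25 | 13 |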
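The daily mean and the creation-to-attempt ratio can be recomputed directly from the account table. A minimal sketch, assuming that ServerSideAccountCreation events are unsampled and that each account created from the editing context corresponds to one of the estimated daily anonymous edit attempts (taken here as 760 / 0.0625):

```python
# Daily account creations on ptwiki whose returnTo points at an edit action,
# transcribed from the table above (2022-10-01 .. 2022-10-25).
ACCOUNTS = [9, 8, 5, 5, 12, 9, 6, 8, 9, 16, 9, 15, 12,
            16, 7, 6, 11, 17, 14, 15, 15, 11, 8, 20, 13]

# Estimated total daily anonymous edit attempts: sampled mean / sample rate.
EDIT_ATTEMPTS_PER_DAY = 760 / 0.0625

mean_accounts = sum(ACCOUNTS) / len(ACCOUNTS)
# Assumption: account creations are counted in full, edit attempts are scaled up.
creation_rate = mean_accounts / EDIT_ATTEMPTS_PER_DAY

print(f"mean account creations/day: {mean_accounts:.2f}")
print(f"account creations per edit attempt: {creation_rate:.3%}")
```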

Query

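SELECT CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS dt,
event.displayMobile,
COUNT(1) AS accounts
FROM event.ServerSideAccountCreation
WHERE year = 2022 AND month = 10
AND wiki = 'ptwiki'
-- return-to query contains "action=edit" or "action=vedit" (unanchored, so "veaction=edit" also matches)
AND event.returnToQuery REGEXP "action=v?edit"
-- exclude accounts created through the API
AND event.isApi = FALSE
AND event.returnTo IS NOT NULL
-- one row per day and per displayMobile value
GROUP BY CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')), event.displayMobile
ORDER BY dt
LIMIT 1000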
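To see which return-to URLs the REGEXP "action=v?edit" filter keeps, the same pattern can be tried in Python. A minimal sketch, assuming Hive's REGEXP/RLIKE behaves as an unanchored search (true when any substring matches), which re.search mirrors; the example query strings are hypothetical:

```python
import re

# Same pattern as the Hive filter; unanchored, so it can match mid-string.
EDIT_CONTEXT = re.compile(r"action=v?edit")


def is_edit_context(return_to_query):
    """True if a returnToQuery string would pass the filter (NULL never passes)."""
    if return_to_query is None:
        return False
    return EDIT_CONTEXT.search(return_to_query) is not None


# Hypothetical returnToQuery values.
for query in ["action=edit&section=2", "veaction=edit", "action=vedit",
              "action=history", "action=submit", None]:
    print(repr(query), is_edit_context(query))
```

Because the pattern is unanchored, veaction=edit (VisualEditor's URL parameter) also passes the filter.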

@Longvape @Niharika, it seems there is a good amount of daily traffic from anonymous users attempting to edit on ptwiki.
Although some information needed to determine the total sample size is still unknown, such as the baseline account creation rate, its standard deviation, and the lift (increase in the account creation rate) we expect to see, we have confirmed that there are plenty of potential IPBEE users to sample.
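The missing inputs listed above plug into the standard normal-approximation sample-size formula for comparing two proportions. For a binary outcome such as account creation, the standard deviation follows from the rate itself (sqrt(p(1 - p))), so the baseline rate and the expected lift are the main unknowns. A minimal sketch, assuming a two-sided z-test with alpha = 0.05 and 80% power, and assuming each anonymous edit attempt can be treated as an independent randomization unit (if the unit is a user or IP, the daily count would differ). The baseline rate and lift in the usage lines are placeholders, not estimates from this task:

```python
import math
from statistics import NormalDist


def sample_size_per_arm(p_baseline, relative_lift, alpha=0.05, power=0.8):
    """Per-arm sample size for a two-sided, two-proportion z-test (normal approximation).

    p_baseline:    account creation rate in the control arm
    relative_lift: expected relative increase, e.g. 0.2 for +20%
    """
    p1 = p_baseline
    p2 = p_baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    # For a binary outcome the standard deviation is sqrt(p * (1 - p)),
    # so the spread terms are fully determined by the two rates.
    spread = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
              + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return math.ceil(spread ** 2 / (p2 - p1) ** 2)


# Hypothetical inputs, not measured values: 0.1% baseline, +20% relative lift.
n = sample_size_per_arm(0.001, 0.20)
# Assumes each of the ~12,160 estimated daily anonymous edit attempts is an eligible unit.
daily_units = 12160
print(f"{n} per arm, about {math.ceil(2 * n / daily_units)} days for two arms")
```

Smaller baseline rates or smaller lifts increase the required sample size sharply, which is why pinning down the baseline rate matters before fixing the test duration.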

I have completed the exploratory traffic analysis. Feel free to reopen this task if more questions come up.