
Teahouse retention analysis
Closed, Resolved, Public

Description

Analyze long-term (> 1 month) retention of a sample of Teahouse invitees vs. a control sample. The goal is to understand whether the positive socialization opportunities the Teahouse was designed to provide reduce new-editor attrition.

Data were gathered between October 2014 and January 2015. Around 200 new editors were invited to the Teahouse per day during the sample period, and 50 editors were randomly assigned to the control (no invite) condition per day.
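The daily split between invited and control editors could look something like the sketch below. It is a hypothetical illustration, not the procedure actually used: the function name, the shuffle-then-slice approach and the seed are assumptions, and any eligibility filtering is assumed to happen upstream. Only the 200/50 daily sizes come from the description above.

```python
import random

def assign_daily_groups(eligible_user_ids, n_invited=200, n_control=50, seed=None):
    """Hypothetical sketch: randomly split one day's eligible new editors.

    The 200 invited / 50 control sizes come from the task description;
    everything else here is an illustrative assumption.
    """
    rng = random.Random(seed)
    pool = list(eligible_user_ids)
    rng.shuffle(pool)                       # random order removes selection bias
    control = pool[:n_control]              # first 50 get no invite
    invited = pool[n_control:n_control + n_invited]  # next 200 get invited
    return invited, control

# Made-up pool of 400 eligible user IDs for one day.
invited, control = assign_daily_groups(range(1000, 1400), seed=42)
print(len(invited), len(control))  # 200 50
```

Because the control slice is taken first, a day with fewer than 250 eligible editors would shrink the invited group rather than the control group; a real design might prefer the opposite.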

See https://meta.wikimedia.org/wiki/Research:Teahouse_long_term_new_editor_retention

Related Objects

  • Resolved, assigned to Capt_Swing
  • Resolved, assigned to Halfak

Event Timeline

Capt_Swing claimed this task.
Capt_Swing raised the priority of this task to High.
Capt_Swing updated the task description.
Halfak updated the task description.
Halfak added a project: Research.
Halfak moved this task from Backlog to In Progress on the Research board.

@Halfak a list of all edits made by the users in both the experimental and control samples is available at S1-analytics-slave/jmorgan.th_retention_sample_preinvite_edits

This table includes metadata for all deleted edits, as well as for edits to deleted pages. To find the edits to deleted pages, filter on page_latest = 0. I'll update the work log with the queries I used to build the table (updating work logs, whee!!! Look at me, I'm turning into a bona fide Halfakerian data scientist!).
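A minimal sketch of the page_latest = 0 filter, run against an in-memory SQLite stand-in so it can be tried locally. The real table lives in MySQL on the analytics slave; only the table name and page_latest come from the comment above, while rev_id, rev_user and all rows are made-up assumptions.

```python
import sqlite3

# Toy stand-in for jmorgan.th_retention_sample_preinvite_edits.
# Assumed columns: rev_id, rev_user (hypothetical); page_latest is from the task.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE th_retention_sample_preinvite_edits "
    "(rev_id INTEGER, rev_user INTEGER, page_latest INTEGER)"
)
conn.executemany(
    "INSERT INTO th_retention_sample_preinvite_edits VALUES (?, ?, ?)",
    [(1, 10, 555), (2, 10, 0), (3, 11, 0), (4, 11, 777)],  # made-up rows
)

# Edits to deleted pages are the rows where page_latest = 0.
deleted_page_edits = conn.execute(
    "SELECT rev_id FROM th_retention_sample_preinvite_edits "
    "WHERE page_latest = 0 ORDER BY rev_id"
).fetchall()
print(deleted_page_edits)  # [(2,), (3,)]
```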

@Halfak I've built you a table of edits to users' talk pages made before they received an invite (or would have, in the case of the control group). Check out S1-analytics-slave/jmorgan.th_retention_sample_preinvite_talkpage_edits

Join this with the th_retention_sample table on th_retention_sample.user_id = th_retention_sample_preinvite_talkpage_edits.sample_user

This table includes deleted edits and edits to deleted pages. Edits to deleted pages have a '1' flag in the ut_is_deleted column, so you can tell them apart.
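The join described above, combined with the ut_is_deleted flag, might be used to count each sampled user's pre-invite talk page edits while leaving out edits to deleted pages. This is a sketch against toy SQLite tables: the rows are invented, and it assumes ut_is_deleted is 0 (or NULL) for edits to pages that still exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE th_retention_sample (user_id INTEGER, sample_group TEXT);
CREATE TABLE th_retention_sample_preinvite_talkpage_edits
    (sample_user INTEGER, rev_page INTEGER, ut_is_deleted INTEGER);
-- Made-up rows for illustration only.
INSERT INTO th_retention_sample VALUES (10, 'invited'), (11, 'control'), (12, 'invited');
INSERT INTO th_retention_sample_preinvite_talkpage_edits VALUES
    (10, 900, 0), (10, 900, 1), (11, 901, 0), (11, 901, 0);
""")

# LEFT JOIN keeps users with no talk page edits; the ut_is_deleted test sits
# in the ON clause so it filters edits without dropping those users.
rows = conn.execute("""
SELECT s.user_id,
       s.sample_group,
       COUNT(t.sample_user) AS talkpage_edits
FROM th_retention_sample AS s
LEFT JOIN th_retention_sample_preinvite_talkpage_edits AS t
       ON s.user_id = t.sample_user
      AND COALESCE(t.ut_is_deleted, 0) = 0
GROUP BY s.user_id, s.sample_group
ORDER BY s.user_id
""").fetchall()
print(rows)  # [(10, 'invited', 1), (11, 'control', 2), (12, 'invited', 0)]
```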

For talk pages that were moved (whether or not a redirect was left behind), the rev_page column contains the page_id of the new talk page, where the history now lives.

This list should contain everything except a very few odd exceptions, which I discarded. Details are in the work log entry.

I've added data to th_retention_sample on which sampled users visited the Teahouse, and updated the work log.

For each visitor, I've included some relevant information about that first visit: the revision ID of their first Teahouse edit, the timestamp of that edit, the page they edited, and the number of days between their invite (or pseudo-invite) and their first visit.
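One plausible way to derive the days-to-first-visit value from two MediaWiki timestamps (14-digit YYYYMMDDHHMMSS strings) is sketched below. How the days_to_ftr column was actually computed isn't stated here; this version assumes whole elapsed days, rounded down, and the function name is hypothetical.

```python
from datetime import datetime

MW_TS_FORMAT = "%Y%m%d%H%M%S"  # MediaWiki's 14-digit timestamp format

def days_to_first_teahouse_rev(invite_ts: str, first_th_rev_ts: str) -> int:
    """Hypothetical: whole days from (pseudo-)invite to first Teahouse edit.

    Assumes elapsed time floored to days; the real column may differ
    (e.g. counting calendar dates). Negative means the visit preceded the invite.
    """
    invite = datetime.strptime(invite_ts, MW_TS_FORMAT)
    first_visit = datetime.strptime(first_th_rev_ts, MW_TS_FORMAT)
    return (first_visit - invite).days  # timedelta.days floors toward -inf

print(days_to_first_teahouse_rev("20141101120000", "20141109080000"))  # 7
print(days_to_first_teahouse_rev("20141101120000", "20141031120000"))  # -1
```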

I found some interesting things:

  • only about 2% of invitees visited (less than half the visit rate in my earlier research, which required at least 10 pre-invite edits rather than the 5 required for this sample)
  • 47% of the invitees who visited waited more than a week before visiting the Teahouse
  • 31% of visitors waited a month or more before visiting
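Figures like the ones above can be derived from one days_to_ftr value per invitee. The sketch below is illustrative only: it assumes None marks a non-visitor, treats negative values as not counting (mirroring the days_to_ftr >= 0 filter used elsewhere in this task), approximates "a month" as 30 days, and uses made-up inputs, not the study's data.

```python
def visit_summary(days_to_ftr_by_invitee):
    """Summarize Teahouse visits among invitees (hypothetical helper).

    Each entry is None (never visited) or whole days from invite to first
    Teahouse edit. Negative values are excluded; "a month" = 30 days (assumed).
    """
    visitors = [d for d in days_to_ftr_by_invitee if d is not None and d >= 0]
    n_visitors = len(visitors)
    summary = {"visit_rate": n_visitors / len(days_to_ftr_by_invitee)}
    if n_visitors:  # avoid dividing by zero when nobody visited
        summary["share_waited_over_week"] = sum(d > 7 for d in visitors) / n_visitors
        summary["share_waited_month_plus"] = sum(d >= 30 for d in visitors) / n_visitors
    return summary

# Made-up example: 100 invitees, 4 of whom visited.
example = [None] * 96 + [0, 3, 12, 45]
print(visit_summary(example))
# {'visit_rate': 0.04, 'share_waited_over_week': 0.5, 'share_waited_month_plus': 0.25}
```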

That's a lot of people waiting a long time to visit! It should be interesting to compare the edits they made before they (finally) visited with the edits made just before visiting by people who visited much sooner.

To retrieve just the data on invitees who visited, use this query (replace SELECT * with SELECT COUNT(user_id) to get a count instead):

SELECT * FROM th_retention_sample WHERE sample_group = 'invited' AND first_teahouse_rev_id IS NOT NULL AND days_to_ftr >= 0;
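To see what each condition in this query does, it can be run against a toy SQLite copy of th_retention_sample. The rows are invented; the reading that days_to_ftr >= 0 drops users whose first Teahouse edit came before their invite is an inference from how the column is described, not something stated in the task.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE th_retention_sample (
    user_id INTEGER, sample_group TEXT,
    first_teahouse_rev_id INTEGER, days_to_ftr INTEGER
);
-- Made-up rows for illustration only.
INSERT INTO th_retention_sample VALUES
    (10, 'invited', 5001, 2),      -- visited two days after invite
    (11, 'invited', NULL, NULL),   -- never visited
    (12, 'invited', 5002, -3),     -- first Teahouse edit predates invite
    (13, 'control', 5003, 40);     -- control group, excluded by sample_group
""")

visitors = conn.execute("""
SELECT user_id
FROM th_retention_sample
WHERE sample_group = 'invited'
  AND first_teahouse_rev_id IS NOT NULL
  AND days_to_ftr >= 0
""").fetchall()
print(visitors)  # [(10,)]
```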

Thanks for continuing to poke on this, folks :)

@LuisV_WMF as of today, the plan is to present our findings at the next Metrics meeting!

Still gotta get some more findings together before then. Expect some substantial updates on Tuesday.

Some updates here: https://meta.wikimedia.org/wiki/Research_talk:Teahouse_long_term_new_editor_retention/Work_log/2015-10-20

TL;DR: The regression analysis doesn't make things much clearer. There may be a positive relationship between the effect of the Teahouse invite and the number of edits saved before the invite, but there doesn't appear to be a strong relationship between the invite effect and negative feedback.
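A simpler way to eyeball whether the invite effect grows with pre-invite activity is to compare retention rates (invited minus control) within bins of pre-invite edit counts. To be clear, this is a stratified comparison, not the regression described in the work log; the tuple layout, the bins, the boolean "retained" outcome and the example rows are all hypothetical.

```python
from collections import defaultdict

def invite_effect_by_stratum(users, bins):
    """Retention difference (invited minus control) per pre-invite edit bin.

    users: iterable of (sample_group, preinvite_edits, retained) tuples,
    where retained is a bool. bins: list of half-open (low, high) ranges.
    Hypothetical helper; not the regression model from the work log.
    """
    counts = defaultdict(lambda: [0, 0])  # (bin, group) -> [retained, total]
    for group, edits, retained in users:
        for low, high in bins:
            if low <= edits < high:
                cell = counts[((low, high), group)]
                cell[0] += int(retained)
                cell[1] += 1
                break
    effects = {}
    for b in bins:
        inv_ret, inv_n = counts[(b, "invited")]
        ctl_ret, ctl_n = counts[(b, "control")]
        if inv_n and ctl_n:  # skip bins missing either group
            effects[b] = inv_ret / inv_n - ctl_ret / ctl_n
    return effects

# Made-up rows, not study data.
users = [
    ("invited", 6, True), ("invited", 7, False), ("control", 5, True), ("control", 8, False),
    ("invited", 20, True), ("invited", 25, True), ("control", 15, True), ("control", 30, False),
]
bins = [(5, 10), (10, float("inf"))]
print(invite_effect_by_stratum(users, bins))  # {(5, 10): 0.0, (10, inf): 0.5}
```

Binning avoids any modeling assumptions but quickly runs out of data in sparse bins, which is one reason a regression with an invite-by-edits interaction term is the more usual choice.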

Halfak moved this task from Blocked to Done (current quarter) on the Research board.