
HostBot-AI study coordination with NotConfusing
Open, Normal, Public

Description

Research page: https://meta.wikimedia.org/wiki/Research:ORES-powered_TeaHouse_Invites

Currently scheduled for deployment in early/mid Jan.

  • review research page
  • calculate the average response rate to invites over the past 12 months (it's about 2.5%)
  • implement an "odd-numbered user ID" check in the HostBot code
  • securely share HostBot credentials with Max
  • log the bot request on User:HostBot
  • remove the odd-number check after the experiment concludes
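The odd/even check in the checklist amounts to a deterministic assignment on user ID parity. Below is a minimal Python sketch, assuming each candidate's numeric MediaWiki user ID is available. The function names and the `ai_parity` parameter are hypothetical and are not taken from the HostBot code. The checklist says "odd-numbered", while a later update says HostBot-AI invites evens only, so the arm's parity is left as a parameter here (default: even).

```python
from typing import Iterable, List, Tuple


def is_ai_arm(user_id: int, ai_parity: int = 0) -> bool:
    """True if user_id falls in the HostBot-AI arm (hypothetical helper).

    ai_parity = 0 sends even IDs to HostBot-AI; 1 sends odd IDs.
    """
    if ai_parity not in (0, 1):
        raise ValueError("ai_parity must be 0 (even) or 1 (odd)")
    return user_id % 2 == ai_parity


def split_by_parity(
    user_ids: Iterable[int], ai_parity: int = 0
) -> Tuple[List[int], List[int]]:
    """Partition candidate IDs into (ai_arm, control_arm) lists."""
    ai_arm: List[int] = []
    control_arm: List[int] = []
    for uid in user_ids:
        (ai_arm if is_ai_arm(uid, ai_parity) else control_arm).append(uid)
    return ai_arm, control_arm


if __name__ == "__main__":
    ai, control = split_by_parity([101, 102, 103, 104, 105])
    print("HostBot-AI arm:", ai)              # even IDs
    print("original HostBot arm:", control)   # odd IDs
```

Because parity is fixed per account, an editor who is considered again later always lands in the same arm. Removing the check after the experiment then comes down to no longer calling it.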

Event Timeline

Capt_Swing triaged this task as Normal priority.

@notconfusing @Halfak please tag and inherit as appropriate. Not sure where else you're tracking this work.

notconfusing added a comment (edited). Dec 5 2018, 3:16 PM

Did some power analysis:
https://egap.shinyapps.io/power-app/

With the binary outcome variable "did the user post on Wikipedia talk:Teahouse?",
a baseline proportion of 0.04,
and a target increase to 0.05
detected with 80% power (an 80% chance of seeing the effect),
the requirement is 14,000 samples per group.

28,000 samples / 300 invites per day ≈ 94 days.


To detect an increase to 0.06
requires 3,721 samples per group.

7,442 samples / 300 invites per day ≈ 25 days.


So the ultra-conservative route is to run for 12 weeks; the middle road is to run for 4 weeks.
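To cross-check these numbers without the web app, here is a minimal Python sketch of the usual normal-approximation sample size for comparing two proportions: n = (z₁₋α/₂ + z₁₋β)² · (p₁(1 − p₁) + p₂(1 − p₂)) / (p₁ − p₂)² per group. It assumes a two-sided α = 0.05, which the thread does not state, and 80% power. The function names are hypothetical. Under these assumptions the sketch gives 6,743 per group for 0.04 → 0.05 and 1,861 per group for 0.04 → 0.06. Those are roughly half the figures above, which would fit the EGAP app reporting total N across both arms. That is an inference from the arithmetic, not something the thread confirms.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


def days_needed(total_samples: int, invites_per_day: int) -> int:
    """Whole days of inviting needed to reach total_samples."""
    return math.ceil(total_samples / invites_per_day)


if __name__ == "__main__":
    for p2 in (0.05, 0.06):
        n = n_per_group(0.04, p2)
        print(f"0.04 -> {p2}: {n} per group, {2 * n} total, "
              f"{days_needed(2 * n, 300)} days at 300 invites/day")
```

The EGAP app may parameterize the problem differently (for example, by outcome standard deviation), so small differences from its output are expected.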

Capt_Swing moved this task from Staged to In Progress on the Research board. Dec 10 2018, 6:05 PM
Capt_Swing updated the task description. Jan 3 2019, 10:08 PM

@Capt_Swing says the base rate is actually 2-3%, so the updated calculations are:

With an 80% chance of detecting an effect (80% power):

from 2% to 3% we need 7,645 users per group
from 3% to 4% we need 11,000 users per group

At a rate of 150 users per group per day, we need
roughly 51 to 74 days (7,645 / 150 ≈ 51; 11,000 / 150 ≈ 73.3, rounded up to 74).
Since we've come this far, I think we may as well run for 75 days.

If we start on Feb 1, then Feb 1 + 75 days is 17 Apr 2019.
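The run length and end date can be checked with Python's standard library. Below is a small sketch that assumes the 150 users per group per day rate and the Feb 1, 2019 start date from the comment above. The helper name is hypothetical.

```python
import math
from datetime import date, timedelta


def run_length_days(users_per_group: int, users_per_group_per_day: int) -> int:
    """Whole days needed to enroll users_per_group users in each arm."""
    return math.ceil(users_per_group / users_per_group_per_day)


if __name__ == "__main__":
    for needed in (7_645, 11_000):
        print(needed, "users per group ->", run_length_days(needed, 150), "days")
    start = date(2019, 2, 1)
    print("end date:", start + timedelta(days=75))  # 2019-04-17
```

Rounding up with `math.ceil` reflects that a partial day of enrollment still has to be run in full.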

Halfak added a comment. Jan 7 2019, 9:15 PM

+1, I think this sounds like a reasonable test. We should probably set aside some time to run a 2-3 day pilot as well. Is that in the plan?

Updates (ping @Halfak, @Capt_Swing):

  • HostBot-AI is live now!
  • Completed:
      • Login procedure as HostBot with OAuth.
      • Extra checks for bot-exclusion templates and user warnings.
      • Inviting even-numbered user IDs only.
      • Lowered invite thresholds by 2 percentage points to compensate for live scores having lower average values than what I saw in training. (Drift?)
      • Completed manual tests triggering the bot from the command line. It now runs once per hour via cron on a Wikimedia VPS.
  • Need to check in on the code tomorrow to ensure that:
      • the maximum number of invites per day was not exceeded
      • the minimum number of invites (150) was sent
      • the error logs check out
      • there are no angry emails or talk page messages
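The next-day checks on invite volume could be scripted. Below is a minimal sketch that assumes the bot writes one log line per invite in a hypothetical `YYYY-MM-DD<TAB>username` format. The log format and function name are illustrative assumptions, and the maximum is left as a parameter because the thread gives only the 150-invite minimum.

```python
from collections import Counter
from typing import Iterable, List

MIN_INVITES_PER_DAY = 150  # minimum stated in the thread


def check_daily_invites(
    log_lines: Iterable[str],
    max_per_day: int,
    min_per_day: int = MIN_INVITES_PER_DAY,
) -> List[str]:
    """Return human-readable problems; an empty list means every day passed.

    Assumes a hypothetical log format: one 'YYYY-MM-DD<TAB>username' line per invite.
    """
    per_day = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        day, _user = line.split("\t", 1)
        per_day[day] += 1

    problems: List[str] = []
    for day in sorted(per_day):
        n = per_day[day]
        if n > max_per_day:
            problems.append(f"{day}: {n} invites exceed the maximum of {max_per_day}")
        if n < min_per_day:
            problems.append(f"{day}: {n} invites fall below the minimum of {min_per_day}")
    return problems


if __name__ == "__main__":
    sample = [f"2019-02-12\tExampleUser{i}" for i in range(160)]
    # max_per_day=300 is an illustrative value, not a documented HostBot limit.
    print(check_daily_invites(sample, max_per_day=300) or "all checks passed")
```

Error-log review and watching for complaints still need a human; only the count checks are easy to automate this way.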

Lab Notebook comments

  1. Things I'm noticing about HostBot-AI live: lots of the people invited have only edited their own user pages. Not necessarily a bad sign, but it may be a different sort of user we are inviting (in some cases they don't have any main-namespace edits at all, just futzing about on their own talk pages).
  2. The drift toward lower predictions, as mentioned earlier.

Also, assuming we don't need to restart the experiment for some reason,
I'm setting a calendar reminder for 29 Apr 2019 = 12 Feb 2019 + 76 days (75 full days, plus half of the 12th).

Thanks for the updates. It's exciting to see this moving forward. Would you mind posting your observations about newcomers who are only editing their user page on the Teahouse talk page? https://en.wikipedia.org/wiki/Wikipedia_talk:Teahouse#Research_about_new_users

I imagine there could be concerns about that. Presumably, we could filter out newcomers who never edit mainspace if that is desirable.
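A filter like that could look roughly as follows. It assumes each candidate record already carries per-namespace edit counts. The `Candidate` type and its fields are hypothetical because HostBot's data model isn't described here. Namespace 0 is MediaWiki's main (article) namespace.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Candidate:
    username: str
    # Hypothetical field: edit counts keyed by MediaWiki namespace number (0 = main).
    edits_by_namespace: Dict[int, int] = field(default_factory=dict)


def has_mainspace_edits(c: Candidate) -> bool:
    """True if the candidate has at least one edit in namespace 0."""
    return c.edits_by_namespace.get(0, 0) > 0


def filter_mainspace_editors(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only candidates with at least one main-namespace edit."""
    return [c for c in candidates if has_mainspace_edits(c)]


if __name__ == "__main__":
    pool = [
        Candidate("UserPageOnly", {2: 5}),         # namespace 2 = User
        Candidate("ArticleEditor", {0: 3, 2: 1}),
    ]
    print([c.username for c in filter_mainspace_editors(pool)])
```

If a filter like this were adopted during the experiment, it would presumably need to apply to both arms so the groups stay comparable.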

You know, I noticed that Jmo's HostBot was also inviting just as many users with no mainspace edits, which put me at ease. So I don't know whether mentioning it would be too alarming.
After running for one day, I noticed that HostBot-AI:

  1. is functioning smoothly (at least it is not crashing)
  2. is not inviting
  3. has had one bug involving inviting editors who are "re-predicted" (editors considered again on the same day because their edit count is higher than last time).
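The re-prediction bug comes down to missing de-duplication: an editor who is scored again later the same day could be invited twice. Below is a minimal sketch of a guard. It assumes the bot can look up the set of user IDs it has already invited, shown here as an in-memory set. The class name is hypothetical, and this is not the actual HostBot-AI fix.

```python
from typing import Iterable, List, Set


class InviteLedger:
    """Hypothetical helper: remembers who was invited so re-scored editors are skipped."""

    def __init__(self, already_invited: Iterable[int] = ()) -> None:
        self._invited: Set[int] = set(already_invited)

    def select_new(self, predicted_user_ids: Iterable[int]) -> List[int]:
        """Return IDs not yet invited and record them as invited.

        Duplicates within the same batch are collapsed as well.
        """
        fresh: List[int] = []
        for uid in predicted_user_ids:
            if uid not in self._invited:
                self._invited.add(uid)
                fresh.append(uid)
        return fresh


if __name__ == "__main__":
    ledger = InviteLedger()
    print(ledger.select_new([10, 12, 14]))  # [10, 12, 14]  (first hourly run)
    print(ledger.select_new([12, 16]))      # [16]  (12 was re-predicted)
```

Since the bot runs hourly from cron, a real ledger would need to persist between runs (for example, by seeding it from previously logged invites) rather than live in memory.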

I want to conduct a few more days of live testing until I'm happy with the class thresholds; then I think we are set to really go live with our experiment.

Aha! If the old HostBot was doing it too, I don't see a reason to raise the alarm.

Capt_Swing updated the task description. Feb 21 2019, 7:40 PM

Update: the experiment is live!

Update: we've requested a ~3-week extension of the trial to make sure we gather sufficient data; see the discussion here: https://en.wikipedia.org/wiki/Wikipedia:Bots/Requests_for_approval#HostBot_9