Aug 26 2019
Thanks for writing. Yes, as you noticed there are many confounding factors. The original Teahouse paper showed a retention improvement of heuristics-vs-control, so that side is "proved" (although a replication would always be great). As for AI-vs-control (not AI-vs-heuristics), I did have a plan for how to do that. Since I was to invite no more than 150 users per day, I tuned the threshold of the model to invite about 150 users per day, but on days that had more positives than that I gave the extra users the status "overflow", that is, would-be-invited. I planned to use these as the control for AI-vs-control. I see that I had 4,281 overflows and 8,223 invited, so I'm hoping we will have enough statistical power.
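The overflow design above can be sketched in a few lines. This is a hypothetical reconstruction, not the actual HostBot-AI code; only the 150/day cap and the "invited"/"overflow" statuses come from the log, and the function name and data shape are my own.

```python
def assign_statuses(scored_users, threshold, daily_cap=150):
    """Hypothetical sketch of the overflow-control design.

    scored_users: list of (user_id, score) pairs from the model.
    Users scoring at or above the threshold are invited, up to daily_cap;
    the remaining positives are labeled "overflow" and serve as the
    would-be-invited control group for the AI-vs-control comparison.
    """
    positives = sorted(
        ((uid, s) for uid, s in scored_users if s >= threshold),
        key=lambda pair: pair[1],
        reverse=True,
    )
    invited = [uid for uid, _ in positives[:daily_cap]]
    overflow = [uid for uid, _ in positives[daily_cap:]]
    return invited, overflow
```

With the cap set to 1 for illustration, `assign_statuses([(1, 0.9), (2, 0.8), (3, 0.4)], 0.5, daily_cap=1)` invites user 1 and puts user 2 in overflow; user 3 falls below the threshold entirely.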
Aug 18 2019
- create draft domain schema
@Envlh and I decided to make a roadmap for a "new" tool that would combine the best features of our two tools, and make a common platform to de-duplicate engineering effort.
Aug 9 2019
Jun 26 2019
Jun 25 2019
Jun 24 2019
Thank you for reporting @kai.nissen . Yes, I will take a look. This is not a WMDE problem. The essence of what happened is that I cloned the WMDE-fundraising-style "at the very top" banner, but at the last moment some CentralNotice admins on Meta did not want that very bold style, so I converted the CSS to use the standard banner position. This banner's run ends in 12 hours, so I won't push a fix in time, but we will fix it for the next CivilServant banner. Thanks for your attention everyone, and especially @Tacsipacsi for uploading screenshots.
Jun 23 2019
Jun 7 2019
Am I correct in thinking that CentralNotice banners that are displayed organically, that is, not via the &banner= URL parameter, will not give the CSP error? Otherwise this probably would have popped up sooner, right?
I find the CSP error happening differently on different language wikis. Errors on
May 16 2019
Thank you @Andrew !
Apr 28 2019
Hello @Krenair ,
Thanks for your response:
Apr 25 2019
Hello @bd808 , I would prefer to get the VPS if possible. The reason is that our stack is built on top of MySQL 8, and I'm not sure what issues lie ahead with using the per-tool MariaDB instances. Our stack was built two years ago without Toolforge compatibility in mind. I appreciate your response that it would be possible to run on Toolforge, and I was going to roll up my sleeves to try just that, until I saw your response about approval in the WMCS meeting. Here is another argument I would make for needing the VPS: (A) if our experiment is successful ("does nudged thanking improve editor retention and performance?") then we would convert the experimental software into a tool that would run for all users, which would definitely need lots of disk for caching, probably necessitating a VPS. Otherwise, (B) if the experiment is not successful, we would just spin down the VPS anyway.
Apr 16 2019
Mar 14 2019
Normally whgi.wmflabs.org tries to run the day after the dumps are created to keep its data as close to real-time as possible. The 1st and 15th would work for WHGI; I would just have to adjust the crontab. Thanks @ArielGlenn for being thoughtful beforehand.
Feb 15 2019
You know, I noticed that Jmo's HostBot was also inviting just as many users with no namespace edits, which put me at ease. So I don't know if mentioning it would be too alarming.
After running for one day, I noticed that Hostbot-AI is
- functioning smoothly (at least not crashing)
- not inviting
- had one bug revolving around inviting editors who are "re-predicted" (considering editors again if it's the same day and they have a higher edit count than last time).
Feb 13 2019
Also, this assumes we don't need to restart the experiment for some reason.
Setting a calendar reminder for 29 Apr 2019 = 12 Feb 2019 + 76 days (75 full days, plus half of the 12th).
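A quick sanity check of that date arithmetic with the standard library (the dates are from the log; the snippet itself is just a verification, not project code):

```python
from datetime import date, timedelta

# Experiment end reminder: start date plus the 76-day window.
start = date(2019, 2, 12)
reminder = start + timedelta(days=76)
# reminder is 2019-04-29, matching the 29 Apr 2019 calendar entry.
```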
+ HostBot-AI is live now!
+ Completed:
  + Login procedure as HostBot with OAuth.
  + Extra checks for bot-exclusion templates and user warnings.
  + Inviting evens-only.
  + Lowered invite thresholds by 2 percentage points to compensate for an issue where live scores have lower average values than what I saw in training. (Drift?)
  + Completed manual tests triggering the bot from the command line. Now executing once per hour via cron on a Wikimedia VPS.
+ Need to check in on the code tomorrow to ensure that:
  + The maximum invites per day was not exceeded.
  + The minimum invites (150) were sent.
  + Error logs check out.
  + No angry emails or Talk Page messages.
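My reading of "inviting evens-only" above is the usual HostBot A/B split: only users with even user IDs get invites, with odd IDs held out as a control. A minimal sketch under that assumption (the function name and the threshold value are illustrative, not the bot's actual code):

```python
# Hypothetical threshold after the 2-percentage-point lowering; the real
# value is not recorded in this log.
THRESHOLD = 0.48

def should_invite(user_id, score, threshold=THRESHOLD):
    """Invite only even-numbered user IDs whose model score clears the
    (lowered) threshold; odd IDs implicitly form the control group."""
    return user_id % 2 == 0 and score >= threshold
```

So `should_invite(4, 0.5)` is true, while an odd ID or a below-threshold score is rejected regardless of the other condition.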
Jan 26 2019
The way I found to get my screen back and still keep developing while this error occurs is:
$wgDevelopmentWarnings = false;
Jan 7 2019
@Capt_Swing says the base rate is actually 2-3% so the updated calculations are:
Dec 12 2018
- using a local MySQL database
- using a dev database (with an SSH tunnel)
Dec 7 2018
Using min instead of mean is better because it's (a) stricter, and (b) only 30% of users end up going multi-session anyway.
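The min-vs-mean choice above is about how to collapse a user's per-session model scores into one user-level score. A sketch, with a hypothetical data shape (a plain list of session scores per user):

```python
def user_score(session_scores, agg=min):
    """Collapse one user's per-session scores into a single score.

    Default is min: a user is only as good as their worst session,
    which is the stricter rule argued for above.
    """
    return agg(session_scores)

strict = user_score([0.9, 0.4, 0.7])  # min -> 0.4
lenient = user_score([0.9, 0.4, 0.7], agg=lambda xs: sum(xs) / len(xs))
```

Since only ~30% of users go multi-session, the two rules agree for the single-session majority and min only bites on the multi-session minority.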
Threshold analysis from: http://localhost:8888/notebooks/determine%20bot%20thresholds.ipynb
Dec 5 2018
Did some power analysis:
+ Past-paper approach: a chi-squared test on whether the user made an edit after period X, with groups defined by whether or not they were invited.
+ H1: the comparison would be the number of people surviving over time, based on which bot did the inviting.
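The past-paper test above is a chi-squared test on a 2x2 table (invited vs. not, edited after period X vs. not). A self-contained sketch of the statistic, without continuity correction; the counts in the usage example are made up for illustration, not experiment data:

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]],
    e.g. rows = invited / not invited,
         cols = edited after period X / did not.
    Uses the shortcut formula n*(ad - bc)^2 / (row and column totals)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

For example, `chi2_2x2(10, 90, 20, 80)` (10/100 invited users retained vs. 20/100 controls, hypothetical numbers) gives a statistic of 200/51, about 3.92, which with 1 degree of freedom is just on the edge of p = 0.05. In practice a library routine such as `scipy.stats.chi2_contingency` would also return the p-value.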
Nov 29 2018
Nov 26 2018
Final metric used was precision at k=300 (300 recommendations needed per day).
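Precision at k=300 is simply the fraction of the top 300 daily recommendations that are true positives. A minimal sketch (the data shape is hypothetical):

```python
def precision_at_k(scored, k):
    """scored: list of (model_score, is_true_positive) pairs.

    Rank by score descending, take the top k recommendations,
    and return the fraction that are true positives.
    """
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    return sum(1 for _, label in top if label) / k
```

For the experiment this would be evaluated with k=300, matching the ~300 recommendations needed per day.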
but need to move the experiment page to meta
Did a lot of work on this over the weekend.
Nov 22 2018
Nov 16 2018
the data and analysis of gtm
Done, for results see presentation.
Nov 15 2018
- Jmo and I chatted at CSCW 2018 about deploying the session-based newcomer-quality ML model (NCM) for use in the Teahouse.
- Jmo sounded encouraged by the early results and was willing to have a go at using NCM in HostBot instead of the current heuristics in HostBot1 (HB1), but there are a few obstacles to overcome:
Nov 13 2018
Great @Andrew, I would be happy to create it in the eqiad1-r region. Currently Horizon still shows the eqiad1-r region as "not-enabled" for the wikidumpparse project. Also, are resources shared between regions? Will I still need the quota increase to have both xlarge instances running at the same time?
Oct 30 2018
Metrics (Maximum recall with minimum precision at 95%)
@Halfak and I reconvened and found that redoing the dependency management of revscoring for this project's scope was too much at the moment. Our new goal is to create a Python package or simple "code snippet" that will allow other applications to use this model as easily as possible. There will be a very pretty function like newcomer_quality(user_id, timestamp) or newcomer_quality([revid(s)]). This is enwiki-only for now, although because of the interest from CivilServant, frwiki is a strong possibility. I am going to ask @Capt_Swing if he knows how HostBot would want the AI optimized; we have thought about it being a minimum-precision model (at around a 1-5% minimum).
Oct 29 2018
Note: I sent this talkpage message to everyone on en:Wikipedia:Labels/Edit quality
Starting repo with ipynb documenting work so far: https://github.com/notconfusing/newcomerquality.
Oct 25 2018
Oy, thanks for letting me know the deadline moved up.
Oct 16 2018
Ideas for feature engineering:
Yes it is, via this PR, https://github.com/wikimedia/wikilabels/pull/242
Oct 4 2018
TODO Aug 27
+ (Done) I18n'ize the revision tags
Oct 1 2018
Aug 27 2018
I loaded a test dataset to https://labels-staging.wmflabs.org/
Aug 20 2018
The PR is merged. Next up is to make and load test data. Questions to answer for the test data that I'm pondering:
- Which wiki?
- I'm thinking newcomers. Which definitions?
- How many observations and how many observations per workset?
Aug 10 2018
@Halfak updated to include new suggested data format. Viewable here: https://gist.github.com/notconfusing/908d7b1a077909ff84b61a0ef131ceea
Aug 9 2018
Created pull request at: https://github.com/wiki-ai/wikilabels/pull/242
Aug 6 2018
working on this here: https://github.com/notconfusing/wikilabels/tree/multi-diff-to-previous
Jul 24 2018
Jul 10 2018
Jul 9 2018
- From meeting notes: