Sun, Feb 25
Feb 21 2017
@leila We could talk to Magnus about how he does category search in https://tools.wmflabs.org/not-in-the-other-language. Also, this may be very easy to implement by using WDQS now ...
Feb 14 2017
That would be @DarTar .
Feb 9 2017
Thanks @RobH. Nithum signed an NDA that was approved by Manprit, Dario and Wes. I pointed Nithum to this ticket and asked him to complete the tasks you listed. The access group should be analytics-privatedata-users.
Feb 7 2017
Feb 6 2017
Jan 30 2017
Jan 27 2017
I set up a prototype on tool labs that consumes the related articles API.
Jan 23 2017
I'm in no rush, especially if I can get some budget to rent GPUs on AWS in the meantime.
Jan 18 2017
Dec 15 2016
Dec 9 2016
Dec 8 2016
Dec 2 2016
@RobH Thank you for the thorough investigation :). Now we know that the stat machines cannot accommodate a top-of-the-line GPU. That being said, there are many different options. Looking at what Nvidia has on offer, do you have a sense of what the most powerful model we can accommodate is?
Nov 30 2016
Nov 14 2016
Nov 11 2016
Oct 31 2016
This is the GPU we would like to order.
Oct 26 2016
The pseudo code does not quite match the current text description of the Double Bucket proposal.
Is there a way to link events from log.ContentTranslationCTA_11616099 to wikishared.cx_translations? At a high level, I want to see which individual translations were started from our tool.
Yes, I was operating under the assumption that stat1004 was the local "compute" node and that stat1002 is more or less reserved for Zachte.
@elukey I have a slight preference for stat1004 since it has access to HDFS
Oct 25 2016
Oct 21 2016
Oct 17 2016
Oct 13 2016
First draft of the paper is complete.
Oct 12 2016
@Halfak I'm happy to help out. What would be the best way for me to contribute? I could facilitate discussions, do some demos, or give a technical talk ...
Oct 11 2016
sent email on logistics to speakers yesterday
Oct 7 2016
Sep 30 2016
Sep 20 2016
We have models for personal attacks and aggression already. We have data on civility.
@Nuria I certainly don't disagree that segmentation must be done at the user level. I'm saying that the test statistics (or metrics as you are calling them) also need to be computed at a user level (i.e. compare the average number of clicks per user between treatment and control instead of just comparing the number of clicks across all users between treatment and control). To do this, there needs to be some way of grouping data by user in each experiment. The current proposal is missing a mechanism to achieve this.
@Neil_P._Quinn_WMF I'm saying that for any online A/B test you need to be able to group the experimental data by user. The proposed framework does not provide a mechanism to do this. It is great that Discovery uses a per-experiment unique user token to do user-level grouping. The system that fundraising uses does not do this, leading to many false positive test results.
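To make the user-level point concrete, here is a minimal sketch of the kind of computation I mean. It assumes each logged event carries a per-experiment user token and a bucket name; the field layout, the function names, and the choice of a Welch t statistic are illustrative assumptions, not part of the proposal.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def per_user_totals(events):
    """Sum clicks per user, then collect the per-user totals by bucket.

    `events` is an iterable of (user_token, bucket, clicks) tuples.
    This layout is a hypothetical example, not an existing schema.
    """
    totals = defaultdict(int)
    for user_token, bucket, clicks in events:
        totals[(bucket, user_token)] += clicks

    by_bucket = defaultdict(list)
    for (bucket, _user), total in totals.items():
        by_bucket[bucket].append(total)
    return dict(by_bucket)


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples.

    Each sample needs at least two users so the variance is defined.
    """
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


events = [
    ("u1", "control", 1), ("u1", "control", 2), ("u2", "control", 0),
    ("u3", "treatment", 4), ("u4", "treatment", 1), ("u4", "treatment", 1),
]
groups = per_user_totals(events)
t_stat = welch_t(groups["treatment"], groups["control"])
```

The unit of analysis here is the user, not the click: a single very active user contributes one observation instead of many correlated ones. Without a user token in the logs, the first step cannot be done at all.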
Aug 30 2016
@schana Has this been deployed yet?
One idea for comparing the two would be to run a quick experiment on Amazon Mechanical Turk. For some set of articles, generate recommendation sets from both systems and ask the turkers to compare the quality/relevance. You would have to take some care in designing your question, but it could be something along the lines of: "Which set of recommended topics would you consider more relevant to the seed topic?" Then we can get a confidence interval over what fraction of users prefer one version over the other. After choosing a set of seed articles and nailing down the question, this should be a pretty fast and cheap way to get a first assessment.
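As a rough sketch of the confidence interval step, assuming each turker judgment is an independent vote for one system or the other: the Wilson score interval is one reasonable choice for a binomial proportion, and the function name and the 95% level below are my own assumptions.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    `successes` is the number of judgments preferring system A out of `n`
    total judgments; z=1.96 corresponds to roughly 95% confidence.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 50 of 100 judgments prefer system A.
low, high = wilson_interval(50, 100)
```

If the interval excludes 0.5, that is a first indication that one system is preferred. Multiple judgments from the same turker are not independent, so in practice we would want to account for that as well.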
It seems like CX logs a special campaign name for translations started from the suggestions pane:
Aug 29 2016
The paper draft is in progress and is unlikely to be fully complete by the end of the quarter. The submission deadline is October 24.
Code is up on GitHub. Once we publish the data and paper, we will want to put some of the notebooks on PAWS.
We gathered labels for 50k article talk pages and built models that generalize to both the user and article talk namespaces. The following sample file shows ROC scores on held-out data broken down by namespace.
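For readers who want to reproduce a per-namespace breakdown on their own data, here is a small sketch. It assumes "ROC score" means area under the ROC curve and that each held-out row is a (namespace, label, score) tuple. The helper names and the toy rows are made up for illustration; they are not our code or our results.

```python
from collections import defaultdict


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive example is scored
    above a random negative one, counting ties as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(rows):
    """Compute AUC separately for each group (here: namespace)."""
    groups = defaultdict(lambda: ([], []))
    for group, label, score in rows:
        groups[group][0].append(label)
        groups[group][1].append(score)
    return {g: roc_auc(labels, scores) for g, (labels, scores) in groups.items()}


# Toy rows for illustration only, not real model output.
rows = [
    ("user", 1, 0.9), ("user", 0, 0.2), ("user", 1, 0.4), ("user", 0, 0.5),
    ("article", 1, 0.8), ("article", 0, 0.1),
]
scores_by_namespace = auc_by_group(rows)
```

The pairwise version is quadratic in the number of examples, so for a large held-out set a rank-based implementation from a library would be the better choice.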