Thu, Feb 14
Wed, Feb 13
@MMiller_WMF : We should document the questions we have around this variation, as we did for search. At this point, my main interest is behaviour across namespaces. In other words: how do users who see the help panel in the Help & Wikipedia namespaces differ in behaviour from those who see it when editing? We'll be able to answer that because we track the page namespace. While it might be interesting to know if certain pages are more likely to lead to questions being asked, I suspect we might also learn about that from the questions that are asked. It could also be something we consider looking into if we start seeing questions flow in.
Mon, Feb 11
Apart from "recommendations work", I'm not sure which parts of SuggestBot are relevant :) Maybe that we did attempt to simplify the process of getting suggestions for newcomers through WP:Teahouse/Suggestions. It's still complicated, though, partly because we were operating within the constraints of MediaWiki and SuggestBot's infrastructure.
Fri, Feb 8
Thu, Feb 7
Wed, Feb 6
One potential way to find articles to recommend is to search for articles in given task categories. For example, on the English Wikipedia there is a category called All articles needing copy edit. You can pass that as a parameter to search to restrict articles to that category, e.g. find articles about electric guitars that need copy editing.
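As a rough sketch of the category-restricted search described above: the snippet below builds a MediaWiki search API URL that combines a topic with the CirrusSearch `incategory:` keyword. The function name and the choice of the English Wikipedia endpoint are illustrative assumptions, and it only constructs the URL rather than making a request.

```python
from urllib.parse import urlencode

# Assumed endpoint: the English Wikipedia action API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_category_search_url(topic, category, limit=10):
    """Build a search URL restricted to one category (hypothetical helper)."""
    # Category names containing spaces need to be quoted after incategory:
    query = f'{topic} incategory:"{category}"'
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


url = build_category_search_url(
    "electric guitar", "All articles needing copy edit"
)
print(url)
```

Swapping in a different task category (or a different wiki's endpoint) only changes the arguments, which is what makes this approach attractive for recommendations.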
Tue, Feb 5
Mon, Feb 4
The first set of responses has been exported. Reassigning to @MMiller_WMF to follow up with the ambassadors.
@JAllemandou : Thanks for meeting up with me during All Hands to discuss this, and also giving me handy tips on working with the Data Lake, I really appreciated that! One thing we discussed was whether there is a need to support both blocks on IP ranges as well as single addresses. I looked into that and found that @TBolliger and I discussed it, and we're only interested in blocks of single IPs.
Thu, Jan 31
Fri, Jan 25
After much discussion, we are proposing to not store data for EditorJourney beyond 90 days. There is not currently a compelling use case that requires that we retain data.
While I'm closing this, I'll also thank @Neil_P._Quinn_WMF for pointing me to the right wikis, and for having the underlying data readily available, which made the work easy.
@MMiller_WMF : I checked the data in the database, and from what I can tell it looks good to me.
Wed, Jan 23
Based on the data logged in the Data Lake, this is no longer a problem. There are very few events with editor_interface set to other, typically one or two, some days none at all.
Confirmed that the schema data in the Data Lake does not contain the page_token field. As far as I'm concerned, the schema development is now complete, so I'm reassigning to @MMiller_WMF so he can review/close as needed.
I defined "mobile edits" as edits made through both the mobile site and the apps (since the request mentions iOS), and used the canonical list of mobile-heavy wikis. I removed edits by users with a bot flag. Comparing Oct–Dec 2017 with Oct–Dec 2018, I get the following:
We have done our initial analysis and published our initial report. That completes this work, so I'm closing this.
This analysis has been done, and a short writeup of the results is now available. Our conclusion is that we do not see a significant difference in activation rate between users who get the survey, and users in the control group.
Jan 18 2019
I gathered data from the Welcome survey from the Czech and Korean Wikipedias between 2018-11-19 and 2018-12-25, so as to use whole weeks and avoid dates where the spambot attack affected registrations. Here's what I found:
Jan 17 2019
Jan 16 2019
I verified that there is data flowing into the Data Lake also from Vietnamese Wikipedia, with 57 events currently recorded. Closing this ticket as this work is now completed.
Jan 15 2019
I went ahead and deleted all tables starting with "nettrom_" except the four tables referenced in T190434#4085830.
I wrote many of the queries during testing, but also found that things don't necessarily translate easily between the MariaDB testing environment we have for EventLogging, and the Data Lake where the production data ends up. Something to note for future projects, while we wait for a fully functional staging environment.
We've QA'ed as much as we can, closing this for now.
Jan 11 2019
Jan 10 2019
I was working on another ad-hoc analysis case a couple of days ago where I needed information about when a specific abuse filter was in effect. This abuse filter is hidden, meaning I couldn't access its history on-wiki, but I have access to that information in the MW database. In this case, it was also a recently updated filter, meaning having access to up-to-date information was needed.
Jan 8 2019
I ran a couple of queries of the EditorJourney data in the Data Lake, and there are no events there with user ID 0 that aren't log out events from before the fix was put in place. Neat to be able to verify that there wasn't a quality issue there, thanks for checking that @Etonkovidova ! And thanks @kostajh for identifying the fix!
Jan 4 2019
@kostajh During my meeting with @MMiller_WMF today we discussed storing the HelpPanel data, and when I was walking through the schema I noticed that we do not have a session identifier that we can store (we have page_token and session_token in the schema, but those cannot be stored indefinitely). I would like to have an identifier that allows us to combine all interactions with the help panel that occur during the same editing session, and that we can store for the duration of our Help Panel experiment. Looking at EditAttemptStep, I noticed that it has an editing_session_id field. Could we add something like that to the HelpPanel schema? It might be possible to reuse it for all I know; I'll leave the implementation details to you.
Jan 3 2019
Here's another use case that came up during the analysis of the survey results. I was asked if I could figure out what proportion of users who didn't supply an email address at registration added one in the survey.
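To make the question concrete, here is a minimal sketch of the calculation, assuming each user can be reduced to two booleans. The record layout and field names (`email_at_registration`, `email_in_survey`) are hypothetical and the sample rows are made up for illustration; they are not survey data.

```python
def proportion_added_email(users):
    """Share of users without a registration email who added one in the survey."""
    # Denominator: only users who did not supply an email at registration.
    without_email = [u for u in users if not u["email_at_registration"]]
    if not without_email:
        return None  # Undefined when nobody registered without an email.
    added = sum(1 for u in without_email if u["email_in_survey"])
    return added / len(without_email)


# Made-up example rows, purely illustrative.
sample_users = [
    {"email_at_registration": True, "email_in_survey": False},
    {"email_at_registration": False, "email_in_survey": True},
    {"email_at_registration": False, "email_in_survey": False},
    {"email_at_registration": False, "email_in_survey": False},
    {"email_at_registration": False, "email_in_survey": True},
]

proportion = proportion_added_email(sample_users)
print(proportion)
```

In practice the two booleans would come from joining registration data with the survey responses, which is the part that needs access to both data sources.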
Jan 2 2019
I've discussed this with both the Product Analytics team and @MMiller_WMF, and we've decided that the experiment with Variation C will depend on the outcome of our A/B test of Variation A, specifically how the survey affects editor activation rate (which is measured in T212799).
Dec 21 2018
During our check-in with @Nuria today, I briefly mentioned the current use case I have for getting data from MariaDB. Let me describe that use case and how it's connected to what the Growth Team is doing.
Dec 20 2018
Dec 14 2018
@kostajh : as far as I can see from looking up the Helpdesk pages in the data from the EditorJourney schema, we don't obfuscate those as they're in the Wikipedia namespace on both Czech and Korean? Or did I miss something?
I've created a GitHub repo where I'll put notebooks and graphs for analysis: https://github.com/nettrom/AHT-block-effectiveness-2018
Dec 12 2018
Dec 11 2018
Dec 10 2018
Dec 8 2018
Dec 7 2018
After discussing this with @Neil_P._Quinn_WMF, we propose to remove the information that identifies what page the user was attempting to edit. I've updated the patch so that the page_id, page_title, and revision_id fields are deleted.
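The effect of that change can be sketched as follows, assuming an event is represented as a flat dictionary. Only the three field names come from the comment above; the helper name, the other keys, and the sample values are hypothetical, and the actual patch may work quite differently.

```python
# Fields named in the comment above as identifying the page being edited.
PAGE_FIELDS = ("page_id", "page_title", "revision_id")


def strip_page_fields(event):
    """Return a copy of the event without the page-identifying fields."""
    return {key: value for key, value in event.items() if key not in PAGE_FIELDS}


# Hypothetical event, for illustration only.
event = {
    "page_id": 12345,
    "page_title": "Example",
    "revision_id": 67890,
    "page_ns": 0,
    "editor_interface": "visualeditor",
}

sanitized = strip_page_fields(event)
print(sanitized)
```

Returning a new dictionary rather than mutating the input keeps the original event available to any code that still needs it before storage.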