Fri, Apr 19
The previous results of analyzing the abandonment rate between the survey and control groups can be found in T206380#5045440. That analysis includes a split between desktop and mobile registrations, and we found that on the mobile side there was no significant difference in abandonment, while on the desktop side the difference was statistically significant. Because of this difference in effect I've also chosen to split further analysis in the same way.
We've decided on a data retention strategy and added the schema to the whitelist in T220033. I don't see a need to keep this task open.
Fri, Apr 12
@JAllemandou : Thanks for clarifying that, very much appreciated!
Thu, Apr 11
From reading this, it sounds to me like it'll be possible to identify these events for pages because of the difference between the creation and first edit timestamps. What I'm wondering is if this means that we'll have two fields in mediawiki_page_history? In other words, that we'll have page_creation_timestamp, which reflects the creation timestamp per the logging table, and page_first_edit_timestamp, which has the timestamp of the earliest edit for that page?
Wed, Apr 10
Noting that I updated P8309 with the current state of the schema, so that's also reflected here.
Noting that I've updated P8308 with the current state of the schema, so that's reflected here as well.
Agreed, it looks like we'll stay well below the limit. The team discussed this today, and I've added to our list of leading indicators to also measure the volume of hover events, to make sure that we stay below the limits. That will then also provide us with an indication of how this will scale when adding additional wikis.
Tue, Apr 9
Both @MMiller_WMF and I did some back-of-the-envelope estimation of active users today and reached roughly the same conclusion: we'll be well below thousands of active users at any given time. I based mine on the number of users who register on these wikis, used a generous estimate, and found that we might be looking at 10 visits per minute.
Mon, Apr 8
Sat, Apr 6
Wed, Apr 3
Tue, Apr 2
@Niharika : I'm not sure who on the Analytics Engineering team is responsible, and I noticed that neither of the tasks is assigned to anyone. My current understanding is that these changes are likely to arrive with the next snapshot, which should be available in a few days, or the one after that (in early May).
Mon, Apr 1
@Niharika : I picked this up again last week. At this point, I'd like to wait until partial block data is in the Data Lake to continue the work, because then I'll get block duration and edit revert detection for free rather than handle those myself. It would also be great to have IP blocks in the Data Lake, because so far a lot of the partial blocks are of IPs.
Fri, Mar 29
The measurement specification already listed the HelpPanel schema as an associated schema. I've now also updated the spec to mention how the helppanel_session_id and homepage_pageview_token fields will match for Homepage-related events.
@kostajh : This looks good to me, and I've gone through the measurement specification and updated it to reflect the current state of what we're capturing.
Thu, Mar 28
I'm a bit confused at this point about what we're proposing to do here, so here's a question to try to clear that up: is the suggestion here to use EditorJourney for capturing this, rather than have an event in the proposed HomePage schema with action=impression?
Just noting the discussion we had in T219387; as mentioned there, this change is OK with me! I've also updated the measurement specification so it mentions the HelpPanel schema and how we'll be using it.
Wed, Mar 27
@MMiller_WMF : this looks good to go to me!
Should we consider renaming the schema since it's becoming a more generic schema for posting questions to pages? Or split out the question-asking part from the search & links part (in other words: de-normalize the schema)?
Tue, Mar 26
We've added a table with the findings as a new section in the Help Panel experiment plan. There's also a description of the affected indicators and which ones we've acted upon.
Mar 22 2019
I went back and dug into the data around the spike in Echo blacklist usage in October 2017 a bit more. From what I can tell, there doesn't seem to be a reason to suspect the data is invalid. Since the spike does make interpretation of the remainder of the data very difficult, and we're in this case mainly interested in development over time with a higher weight to more recent developments, I changed the start date of the Echo blacklist usage graphs to 2018-01-01 as that removes the spike.
Mar 21 2019
I started this analysis on 2019-03-18, at which point we had 3,624 non-autocreated registrations since switching on the survey/control A/B test. Using the week of data prior to deployment I had earlier estimated the overall abandonment rate at 17.2%. A power analysis indicated that if the control group's abandonment rate equalled the estimate, we would be able to determine a significant change if the survey group's abandonment rate was outside the [13%,21%] range.
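To illustrate the kind of power calculation described above, here is a minimal sketch using only the Python standard library. It assumes a two-sided two-proportion z-test, a 5% significance level, and an even split of the 3,624 registrations between the survey and control groups. None of these are stated in the comment, and the function name `two_proportion_power` is made up for this example, so this is not necessarily the method that was actually used.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(p_control, p_treatment, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Uses the normal approximation with a pooled standard error under the
    null hypothesis and an unpooled one under the alternative.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)

    # Standard error when both groups share the same (pooled) rate.
    p_pooled = (p_control + p_treatment) / 2
    se_null = sqrt(2 * p_pooled * (1 - p_pooled) / n_per_group)

    # Standard error when the two groups have different true rates.
    se_alt = sqrt(
        p_control * (1 - p_control) / n_per_group
        + p_treatment * (1 - p_treatment) / n_per_group
    )

    diff = abs(p_treatment - p_control)
    # Probability that the test statistic lands beyond either critical value.
    upper = 1 - norm.cdf((z_crit * se_null - diff) / se_alt)
    lower = norm.cdf((-z_crit * se_null - diff) / se_alt)
    return upper + lower


if __name__ == "__main__":
    baseline = 0.172          # estimated overall abandonment rate
    n_per_group = 3624 // 2   # assumption: registrations split evenly
    for rate in (0.13, 0.15, 0.172, 0.19, 0.21):
        power = two_proportion_power(baseline, rate, n_per_group)
        print(f"survey rate {rate:.3f}: power {power:.2f}")
```

Under these assumptions, the power is lowest when the survey group's rate equals the baseline and grows as it moves towards either end of the range.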
I used data from this schema in T216185. My experience was similar to what @Tbayer mentions in that some preferences appear to have issues with duplication, and some do not. In this case the echo-notifications-blacklist and email-blacklist preferences were affected. It also seems that the issue changed over time, in other words that some users logged lots of preference changes during certain periods, and not during others.
Mar 20 2019
@Etonkovidova : Searched through the data from EditAttemptStep and HelpPanel in the Data Lake starting from 2019-01-01, and I didn't find any indications of HTML in the page_title field in either of those.
Mar 18 2019
Using all available data from EditorJourney, excluding known test users, and allowing "visit" to be either viewing the page or opening the editor, the results are as follows:
Mar 13 2019
Mar 12 2019
I have completed the snapshot analyses as well as some data cleanup to remove the graph spike in email usage in July 2017. I'll do a little more work to look into the Echo notification usage spike in October 2017, but I suspect that could be initial adoption and that we might prefer to omit the first few months and start the graph in January 2018.
Mar 11 2019
What's the current state of search in this module? I went back and read through the comment above about the result of the deep dive, but didn't see much about it since then (and it's not present in the mockup). Curious about that, as it'll inform the measurement specification.
Mar 8 2019
@TBolliger : Thanks again for providing those tasks and dates, much appreciated, and I'll make sure to incorporate those as I continue this work.
Mar 7 2019
We've completed our initial experiment and found no obvious detrimental effect from the survey. We've also run a second experiment against variation C, and found that Var A is preferable. Currently, we are running an experiment on Vietnamese Wikipedia with Var A and a control group, to learn more about the abandonment rate on that wiki (ref T216668 and T216669).
Mar 6 2019
Analysis by @MMiller_WMF and some digging by me indicates that events are not recorded correctly if the Visual Editor is used. Those events are instead recorded as reading events.
We monitored the SWAT deployment of the patch yesterday and confirmed through Kafka that reading events were flowing in. I've now also confirmed that reading events are present in the Data Lake.
During the analysis of usage of the email and Echo notification block lists, an issue with spikes in usage has come up. One key question about analyzing historical data of features like these is when these features were available, and in what capacity. For example, the email block list might've been available as a beta feature for a while. Secondly, there might've been announcements of the feature being available that would affect usage. Having that kind of information available aids analysis as it removes guesswork around why certain patterns emerge, so I'm documenting that here for future reference in other analysis tasks.
Mar 5 2019
Mar 1 2019
Feb 28 2019
We're fine with letting this data get purged as it otherwise would, so I'm closing this.
Just adding a comment to document that since we're also planning to soon add User & User talk, I'll regard the timestamp of that going live as the start of experimentation with additional contexts.
It's been about 7 hours since this went live, and using data from the replicated database, I get the following overview:
Feb 27 2019
Feb 26 2019
Feb 25 2019
I've completed a quick analysis of the abandonment rate using the eight days of data we have between the deployment of EditorJourney (on 2019-01-16) and the Welcome Survey (on 2019-01-24). In this analysis, I used the same approach that I used for the control group during the initial Welcome Survey experiment: does the user have more than three events logged in the EditorJourney data? Split by whether the account was created on the mobile or desktop site, for Vietnamese the result is:
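For reference, the classification rule described above can be sketched as follows. This is only an illustration: it assumes that a user with three or fewer logged events counts as having abandoned, which the comment implies but does not state. The function name `abandonment_by_platform` and the sample records are invented and are not the results of this analysis.

```python
from collections import defaultdict


def abandonment_by_platform(users, max_events_abandoned=3):
    """Return the abandonment rate per platform.

    `users` is an iterable of (platform, event_count) pairs, one per
    registered account. ASSUMPTION: an account with at most
    `max_events_abandoned` logged events is treated as abandoned.
    """
    totals = defaultdict(int)
    abandoned = defaultdict(int)
    for platform, event_count in users:
        totals[platform] += 1
        if event_count <= max_events_abandoned:
            abandoned[platform] += 1
    return {platform: abandoned[platform] / totals[platform] for platform in totals}


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = [
        ("desktop", 2), ("desktop", 7), ("desktop", 12), ("desktop", 3),
        ("mobile", 1), ("mobile", 5),
    ]
    for platform, rate in abandonment_by_platform(sample).items():
        print(f"{platform}: {rate:.1%} abandoned")
```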
Feb 22 2019
Feb 21 2019
The data captured by the HelpPanel schema has been QA'ed in various fashions since deployment. We monitored and verified data during deployment, have been investigating it afterwards, and also have checked it during analysis. Closing this task as resolved.
The instrumentation document has gone through the DACI and received approval. We've put the instrumentation in production and monitored the data periodically, and things seem to be going well. Am therefore closing this task.
Feb 20 2019
Feb 14 2019
Feb 13 2019
@MMiller_WMF : We should document the questions we have around this variation, similar to what we did for search. I think at this point, my main interest is behaviour across namespaces, in other words: how do users who see the help panel in the Help & Wikipedia namespaces differ in behaviour from those who see it when editing? We'll be able to answer that because we track the page namespace. While it might be interesting to know if certain pages are more likely to lead to questions being asked, I suspect we might also learn about that from the questions that are asked? It could also be something we consider looking into if we start seeing questions flow in.
Feb 11 2019
Apart from "recommendations work", I'm not sure which parts of SuggestBot are relevant :) Maybe that we did attempt to simplify the process of getting suggestions for newcomers through WP:Teahouse/Suggestions. It's still complicated, though, partly because we were operating within the constraints of MediaWiki and SuggestBot's infrastructure.