Fri, Jun 21
Discussed the Growth team metrics with @MMiller_WMF and updated the table with links to Growth's reports.
Thu, Jun 20
Wed, Jun 19
Fri, Jun 14
Thought I'd respond to this and make it actionable in case this moves forward:
Wed, Jun 12
We've calculated this for our target wikis individually, and now also for analyzing effects of interventions across multiple wikis with multilevel models (using simulations to estimate the statistical power). For the latter, we also simulated scenarios with additional wikis to understand whether those would allow us to detect smaller effect sizes in a reasonable amount of time.
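To illustrate the simulation approach to power estimation, here is a minimal sketch. It is not the multilevel model analysis described above: it swaps in a much simpler pooled two-proportion z-test, and every name and parameter value (`simulate_power`, `wiki_sd`, the rates) is a hypothetical chosen for illustration. The idea is the same, though: simulate many experiments with a known effect, and count how often the test detects it.

```python
import math
import random


def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_power(n_wikis, users_per_group, base_rate, effect,
                   wiki_sd=0.02, n_sims=500, alpha=0.05, seed=0):
    """Estimate power as the share of simulated experiments with p < alpha.

    Each wiki gets its own baseline rate (a crude stand-in for a random
    intercept); the treatment adds `effect` to that rate. All parameter
    values are hypothetical.
    """
    rng = random.Random(seed)
    n = n_wikis * users_per_group
    detected = 0
    for _ in range(n_sims):
        control = treatment = 0
        for _ in range(n_wikis):
            p_control = min(max(rng.gauss(base_rate, wiki_sd), 0.01), 0.99)
            p_treatment = min(max(p_control + effect, 0.01), 0.99)
            control += sum(rng.random() < p_control
                           for _ in range(users_per_group))
            treatment += sum(rng.random() < p_treatment
                             for _ in range(users_per_group))
        if two_proportion_p_value(treatment, n, control, n) < alpha:
            detected += 1
    return detected / n_sims
```

Calling `simulate_power` with a larger `n_wikis` and a smaller `effect` gives a rough sense of whether additional wikis would make smaller effects detectable.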
Upon reviewing this patch, @mforns raised concerns about the information we store about the mentors in this schema. It likely makes it possible to connect mentor and mentee, so the suggestion is to bucket edit counts. Thinking about this, we might want to bucket "time since last activity" as well to make both pieces of information less identifying.
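A minimal sketch of what the bucketing could look like. The bucket boundaries, labels, and function names below are assumptions made for illustration, not the values used in the patch.

```python
from bisect import bisect_right

# Hypothetical lower bounds and labels; the real thresholds would be
# decided as part of the schema review.
EDIT_COUNT_BOUNDS = [0, 1, 5, 100, 1000]
EDIT_COUNT_LABELS = ["0", "1-4", "5-99", "100-999", "1000+"]

DAYS_INACTIVE_BOUNDS = [0, 1, 7, 30]
DAYS_INACTIVE_LABELS = ["<1 day", "1-6 days", "7-29 days", "30+ days"]


def _bucket(value, lower_bounds, labels):
    """Return the label of the bucket whose lower bound is the largest one <= value."""
    if value < lower_bounds[0]:
        raise ValueError("value is below the lowest bucket")
    return labels[bisect_right(lower_bounds, value) - 1]


def bucket_edit_count(edit_count):
    """Map a mentor's exact edit count to a coarse bucket label."""
    return _bucket(edit_count, EDIT_COUNT_BOUNDS, EDIT_COUNT_LABELS)


def bucket_days_since_activity(days):
    """Map days since the mentor's last activity to a coarse bucket label."""
    return _bucket(days, DAYS_INACTIVE_BOUNDS, DAYS_INACTIVE_LABELS)
```

Storing only the label means two mentors with, say, 120 and 900 edits become indistinguishable in the logged data.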
Tue, Jun 11
@MMiller_WMF : Sure! I've added counts of the number of users who had events on desktop and mobile in EditorJourney, to account for any users who might've switched between sites. Then the table and the percentages work out like this:
Mon, Jun 10
I restricted the query to visits to their contributions specifically. This uses the 90 days' worth of data from EditorJourney, and is limited to non-autocreated accounts that were not registered through the API and that are not known test accounts belonging to Growth Team members. A user can be counted twice by visiting the page on both desktop and mobile, but the number of users who do so might be rather small.
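A small sketch of how the double counting across desktop and mobile can be made explicit. The input format (pairs of user ID and platform) and the function name are hypothetical; the actual counts came from a query against EditorJourney.

```python
def platform_overlap(events):
    """Count distinct users per platform from (user_id, platform) pairs.

    `platform` is assumed to be either "desktop" or "mobile".
    """
    events = list(events)  # allow generators, since we iterate twice
    desktop = {user for user, platform in events if platform == "desktop"}
    mobile = {user for user, platform in events if platform == "mobile"}
    return {
        "desktop_only": len(desktop - mobile),
        "mobile_only": len(mobile - desktop),
        "both": len(desktop & mobile),
        "total_unique": len(desktop | mobile),
    }
```

Summing the per-platform counts would count each user in `both` twice; `total_unique` is the denominator that avoids that.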
Verified that this is now deployed and data is retained/purged as expected.
Fri, Jun 7
Thu, Jun 6
Mon, Jun 3
@MMiller_WMF : I've drafted some suggested additions to the measurement specification, could you take a look at those and let me know if further changes are needed? Also, I updated this task's description so it mentions that we'd also like to capture clicks on the "back" buttons in the modules on the mobile site, because that kind of interaction isn't available on the desktop site.
Thu, May 30
@MMiller_WMF : The list in the description seems fine to me. I'm wondering about the need for capturing users' navigation to their user and/or user talk pages, though. We don't do that for desktop, so why capture that specifically for mobile? Is there a particular question we're interested in answering here? (In the measurement plan, we're mainly interested in how users get to the homepage, not how they leave it.)
The patch for this has been reviewed and +2'ed. Could it be merged in time for next week's deployment? Let me know if something on my end is needed to make that happen.
Documenting that we've published the results on mediawiki.org, see this analytics update. As mentioned there, we're working on further analysis grouping data from all wikis together.
Tue, May 28
Adding the other analysts as this might affect schemas they work with.
This is unlikely to be an issue for the schemas that the Growth Team works with, as those only store short-term tokens.
As far as I'm concerned, we don't need to store the specific rule since I can look that up in the log in the MW database. It's perhaps more important to understand why the post failed (e.g. because the user was blocked, or AbuseFilter triggered), rather than specifically what caused it? So for me, knowing the error message works, but that's largely because I can look the specifics up when needed.
May 24 2019
We moved the state of the modules in the HomepageModule schema into the state field, which for the email module is either "noemail", "unconfirmed", or "confirmed".
May 22 2019
I've used the data for all the leading indicator measurements and done a bit of digging into the data around that. Found no apparent issues with the instrumentation. Closing this as resolved, and if we find specific issues during further analysis we'll open specific tasks for those.
May 14 2019
@JAllemandou and @Ottomata : you're both listed as reviewers on https://gerrit.wikimedia.org/r/#/c/analytics/refinery/+/508626/. Any chance one of you could get that reviewed soon?
May 9 2019
May 8 2019
The patch was submitted for review yesterday morning (Pacific time). I'll update this task and move it to "In progress" on the current sprint board.
May 7 2019
May 2 2019
@JAllemandou : Thanks for your patience while this has been stuck in my backlog! I think this looks great as a first step and will enable me to answer the remaining questions around Partial Blocks that the Anti-Harassment Tools Team is interested in.
We've so far not found any significant differences and at the moment other things take priority. Moving this to Q1 so that we can pick it up again then.
May 1 2019
I've again updated P8308 with the current state of the schema, to reflect changes to referer_route, referer_namespace, and impact_module_state.
A draft has been written and sent to @MMiller_WMF for review, so this ends up in his column for now.
Apr 30 2019
Based on the measurement specification, if I open one of the Help or Mentor modules from the homepage, enter some text and submit it, then close the module, I'd expect the following events to be recorded in the HelpPanel schema:
Apr 29 2019
Apr 25 2019
I have verified that HelpPanel data is now being sanitized and stored correctly, so it's time to close this task.
Apr 19 2019
The previous results of analyzing the abandonment rate between the survey and control groups can be found in T206380#5045440. That analysis includes a split between desktop and mobile registrations, and we found that on the mobile side there was no significant difference in abandonment, while on the desktop side the difference was statistically significant. Because of this difference in effect I've also chosen to split further analysis in the same way.
We've decided on a data retention strategy and added the schema to the whitelist in T220033. I don't see a need to keep this task open.
Apr 12 2019
@JAllemandou : Thanks for clarifying that, very much appreciated!
Apr 11 2019
From reading this, it sounds to me like it'll be possible to identify these events for pages because of the difference between the creation and first edit timestamps. What I'm wondering is if this means that we'll have two fields in mediawiki_page_history? In other words, that we'll have page_creation_timestamp, which reflects the creation timestamp per the logging table, and page_first_edit_timestamp, which has the timestamp of the earliest edit for that page?
Apr 10 2019
Noting that I've updated P8309 with the current state of the schema, so that's also reflected here.
Noting that I've updated P8308 with the current state of the schema, so that's reflected here as well.
Agreed, it looks like we'll stay well below the limit. The team discussed this today, and I've added to our list of leading indicators to also measure the volume of hover events, to make sure that we stay below the limits. That will then also provide us with an indication of how this will scale when adding additional wikis.
Apr 9 2019
Both @MMiller_WMF and I did some back-of-the-envelope estimation of active users today, and reached roughly the same conclusion: we'll be well below thousands of active users at a given time. I based mine on the number of users who register on these wikis and used a generous estimate, and found that we might be looking at 10 visits per minute.
Apr 8 2019
Apr 6 2019
Apr 3 2019
Apr 2 2019
@Niharika : I'm not sure who on the Analytics Engineering team is responsible, and I noticed that neither of the tasks is assigned to anyone. My current understanding is that these changes are likely to arrive with the next snapshot, which should be available in a few days, or the one after that (in early May).
Apr 1 2019
@Niharika : I picked this up again last week. At this point, I'd like to wait until partial block data is in the Data Lake to continue the work, because then I'll get block duration and edit revert detection for free rather than handle those myself. It would also be great to have IP blocks in the Data Lake, because so far a lot of the partial blocks are of IPs.
Mar 29 2019
The measurement specification already listed the HelpPanel schema as an associated schema. I've now also updated the spec to mention how the helppanel_session_id and homepage_pageview_token fields will match for Homepage-related events.
@kostajh : This looks good to me, and I've gone through the measurement specification and updated it to reflect the current state of what we're capturing.
Mar 28 2019
I'm a bit confused at this point about what we're proposing to do here, so here's a question to try to clear that up: is the suggestion here to use EditorJourney for capturing this, rather than have an event in the proposed HomePage schema with action=impression?
Just noting the discussion we had in T219387, as mentioned there this change is OK with me! I've also updated the measurement specification so it mentions the HelpPanel schema and how we'll be using it.
Mar 27 2019
@MMiller_WMF : this looks good to go to me!
Should we consider renaming the schema since it's becoming a more generic schema for posting questions to pages? Or split out the question-asking part from the search & links part (in other words: de-normalize the schema)?
Mar 26 2019
We've added a table with the findings as a new section in the Help Panel experiment plan. There's also a description of the affected indicators and which ones we've acted upon.
Mar 22 2019
I went back and dug into the data around the spike in Echo blacklist usage in October 2017 a bit more. From what I can tell, there doesn't seem to be a reason to suspect the data is invalid. Since the spike does make interpretation of the remainder of the data very difficult, and we're in this case mainly interested in development over time with a higher weight to more recent developments, I changed the start date of the Echo blacklist usage graphs to 2018-01-01 as that removes the spike.