@Nuria : I can confirm what @mforns mentions. During my conversations with him yesterday, it became clear to me that the Growth team's use of EventLogging is an in-between case. Since we're running fairly long experiments, we need data for longer than the default 90 days, but we also need richer data than what we'd limit ourselves to if we were to store data indefinitely. Hence a 270-day sliding window for our sanitized data would work well for us. (This is also why we asked for deletion of sanitized data in T234870 when we completed the Help Panel experiment, by the way: we could no longer keep that data around.)
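To illustrate what a 270-day sliding window means in practice, here is a minimal sketch. The function names are hypothetical and this is not how the actual sanitization or purging jobs are implemented; it only shows the date arithmetic.

```python
from datetime import date, timedelta

# Assumption: retention is counted in whole days from "today".
RETENTION_DAYS = 270


def purge_cutoff(today, retention_days=RETENTION_DAYS):
    """Return the oldest date that is still inside the sliding window."""
    return today - timedelta(days=retention_days)


def is_expired(event_date, today, retention_days=RETENTION_DAYS):
    """True if an event is older than the window and would be purged."""
    return event_date < purge_cutoff(today, retention_days)
```

With these definitions, an event dated exactly 270 days before today is still kept, and anything older falls out of the window.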
Tue, Dec 3
I've now completed a preliminary analysis of question 3, quarterly measurement media containing structured fields using non-English languages. As discussed in our meeting last week, this translates to "files with captions in a non-English language". The code behind the analysis can be found in this notebook on GitHub.
Mon, Dec 2
Wed, Nov 27
I think this is partly a design issue, which @RHo should chime in on, and partly a measurement issue. With regard to the design part, I'm trying to think ahead to how things might work with guidance. If the user is recommended the Egg tart article, clicks through to it, goes somewhere else, and then later returns to it, should they again see the guidance (meaning we treat them as if they came through from the Homepage)? I think the answer to that affects how we connect the Homepage schema to the Help Panel and EditAttemptStep schemas in that situation.
Tue, Nov 26
Using the top 10 wikis based on the Wiki segmentation's size ranking (geometric mean of monthly active editors and monthly unique devices), I grabbed the number of accounts for each of them, the number of accounts with an email address set, and the number who have verified their email address. While verification numbers weren't requested, I already had that in the query I reused for this, and maybe the differences between wikis would be meaningful to the Community Tech team.
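As a rough sketch of the size ranking described above: the score is the geometric mean of the two metrics, and the top wikis are those with the highest scores. The function names and the input layout (a dict mapping a wiki name to a pair of numbers) are assumptions for illustration, not the Wiki segmentation's actual code.

```python
from math import sqrt


def size_score(monthly_active_editors, monthly_unique_devices):
    """Geometric mean of the two size metrics."""
    return sqrt(monthly_active_editors * monthly_unique_devices)


def top_wikis(metrics, n=10):
    """Return the n wiki names with the highest size score.

    metrics maps wiki name -> (monthly_active_editors, monthly_unique_devices).
    """
    ranked = sorted(metrics, key=lambda wiki: size_score(*metrics[wiki]), reverse=True)
    return ranked[:n]
```

Using the geometric mean keeps a wiki from ranking highly on the strength of one metric alone, since a very small value in either metric pulls the score down.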
Mon, Nov 25
Like so many others, I'd like to request my credentials for access on stat100x and notebook100x. My username is nettrom. I'll keep an eye on my Gmail spam folder as well, cheers! :)
Mon, Nov 18
Thu, Nov 7
Nov 1 2019
What happens here is most likely related to T237124.
Oct 30 2019
@MMiller_WMF : adding the Growth Team and assigning this to you so you can discuss with the engineers and prioritize it as needed.
Oct 28 2019
@Niharika : Is any additional work needed on this task? If not, I was thinking that it can be closed as resolved.
Oct 25 2019
@aezell : I agree with you. I went through all the schemas listed on meta, and from what I could tell the only schema that Comm Tech owns is TemplateWizard. That schema was labelled as "in development" but appears to be actively gathering data, so I updated its status to reflect that. It isn't whitelisted, though, so as far as I can tell, there's nothing more to do here. Closing as "resolved".
Oct 21 2019
Oct 16 2019
I grabbed data for users who registered on our four target wikis (Czech, Korean, Vietnamese, and Arabic) between Sept 2018 and Sept 2019, using the mediawiki_history table in the Data Lake. This allows for 14 days of editing after registration, as well as 48 hours for a revert to occur. Auto-created accounts and accounts created by others were excluded, as were any accounts identified as bots. For specifics, the notebook is on GitHub.
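A minimal sketch of the windows and exclusions described above, for readers who don't want to open the notebook. The field names (`auto_created`, `created_by_other`, `is_bot`) are hypothetical and are not the mediawiki_history column names, and the assumption that the 48-hour revert window is added on top of the 14-day editing window is mine; the notebook is the authoritative version.

```python
from datetime import datetime, timedelta

EDIT_WINDOW = timedelta(days=14)     # editing period after registration
REVERT_WINDOW = timedelta(hours=48)  # time allowed for a revert to occur


def observation_cutoffs(registered):
    """Return (last moment edits are counted, last moment reverts are counted)."""
    edits_until = registered + EDIT_WINDOW
    reverts_until = edits_until + REVERT_WINDOW
    return edits_until, reverts_until


def is_eligible(user):
    """Exclude auto-created accounts, accounts created by others, and bots."""
    return not (user["auto_created"] or user["created_by_other"] or user["is_bot"])
```

Under these assumptions, a user needs to have registered at least 16 days before the data was gathered for their full observation period to be covered.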
Oct 15 2019
@mforns : thanks for taking care of this! I've verified that the table doesn't contain any data prior to Oct 1st. Everything looks good here, so I'm closing this.
This work is done.
Oct 14 2019
@Aklapper : thanks for the ping on this! This task should still be open as we've got a couple of related tasks open. Once they are resolved, I'll make sure to close this task as well. I've removed the deadline date since that's no longer relevant.
Oct 10 2019
@fdans : Can confirm that the range to be deleted is the beginning of time (which is like April 2019) up to Oct 1. And yes, all fields are to be deleted. Thanks!
Oct 8 2019
The second part of this, me deleting the initial set of data, has been done:
This makes no sense as we'd probably just ask to whitelist it again in a month's time. Declining.
Oct 7 2019
Ran the following query to back up the data:
One aspect around this that has come up during the work on the Newcomer Tasks measurement plan is how we handle A/B testing variants of the intervention for existing users. We'll be discussing that in the team and see what we end up with.
Oct 3 2019
I used the current dataset from EditorJourney (last 90 days). For each of the three wikis where the Welcome Survey is deployed, I count the number of views of the survey (rather than registrations, because users can go back to the survey and access the links), as well as the number of views of the Tutorial and Help Desk pages. For the latter two, I only count views that have source=survey in the request to identify views that originated from the Welcome Survey. I counted the Tutorial and Help Desk links separately, and also split between views on Desktop and Mobile as requested. Finally, I calculate % of clicks on these links relative to the number of views of the Welcome Survey. The results are as follows:
Oct 2 2019
Moving this to the Icebox on the Product Analytics board. Not sure when/if this is going to be an issue (we're currently nowhere near danger territory, with 20–30 events/sec being the higher end of the dataset). Adding @MMiller_WMF to the subscriber list as well, as I figured he should know if something changes here.
Oct 1 2019
@Tgr : likewise, thanks for letting me know about these changes. I don't have any current analysis or reporting that will be affected, so we're good there as well. (My main concern was Marshall's reports, but since he already checked those, all is well.)
Sep 30 2019
Draft measurement plan has been created, reassigning to @MMiller_WMF for reviews and revisions.
Sep 28 2019
Sep 27 2019
@Mayakp.wiki has led the QA work on this, thanks for helping out!
Verified that the obfuscated namespaces are indeed obfuscated, and that the others are not. We don't have any data longer than 24 hours. Also spot-checked the dataset and didn't see anything of concern. Closing as resolved.
The draft measurement plan is being worked on, so I've claimed this task, updated tags, and moved it to the right columns.
Sep 25 2019
Sep 24 2019
As done in the previous comments (which I've deleted because they used erroneous data), I decided to reuse the code and graphs we had for our analysis around emails (T204785), where we got the proportion of registrations. In order to reflect any changes around deployment of the Homepage to Czech and Korean Wikipedias in May, the data gathering starts on 2019-01-01. Auto-created accounts are excluded from the analysis.