Thu, Mar 15
Wed, Mar 14
This is mostly done; all that remains is to finish updating the code that uploads the MassMessage lists.
My current workaround for this is to track my projects in Git (which is a good practice generally) and then push them to GitHub, which has pretty good support for viewing Jupyter notebooks. It works relatively well for me, and you can look at my sampling work for the annual editor survey for a good example.
Okay, I think this is all taken care of!
Tue, Mar 13
@JAnstee_WMF, @egalvezwmf Our main problem right now with the edit bins is that the initial set I proposed (the "N bins") has small buckets on the high and low ends, which leads some of the final bins to be too small (you can see the sizes in this notebook).
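One possible way to deal with undersized buckets at the ends is to fold each one into its neighbor until every bin reaches a minimum size. The sketch below is only an illustration of that idea, not the approach used in the notebook; the function name `merge_small_bins`, the `min_size` threshold, and the bin labels and counts in the example are all hypothetical.

```python
def merge_small_bins(bins, min_size):
    """Merge adjacent ordered bins until each holds at least min_size users.

    bins: list of (label, size) tuples, ordered from fewest to most edits.
    Returns a new list of (label, size) tuples.
    """
    merged = []
    for label, size in bins:
        if merged and merged[-1][1] < min_size:
            # The previous bin is still too small: absorb this bin into it.
            prev_label, prev_size = merged[-1]
            merged[-1] = (f"{prev_label} | {label}", prev_size + size)
        else:
            merged.append((label, size))
    # The final bin has no right-hand neighbor, so fold it leftwards if needed.
    if len(merged) > 1 and merged[-1][1] < min_size:
        last_label, last_size = merged.pop()
        prev_label, prev_size = merged[-1]
        merged[-1] = (f"{prev_label} | {last_label}", prev_size + last_size)
    return merged


# Illustrative counts only, not real survey data.
example_bins = [("1-4", 30), ("5-9", 500), ("10-99", 800),
                ("100-999", 400), ("1000+", 20)]
print(merge_small_bins(example_bins, min_size=100))
```

With these made-up counts, the small low-end bin is merged upwards and the small high-end bin is merged downwards, leaving three bins.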
Mon, Mar 12
We've done this pretty successfully: the Audiences department is likely to make a major commitment to new editors in the upcoming annual plan, which product managers would obviously lead, and the Community Engagement department is seriously considering a new hire to coordinate its work in the area.
Thu, Mar 8
I was just trying to edit the same page again, and all the content disappeared after pasting again. However, this time, I was pasting into the middle of the page rather than the bottom, so maybe the problem isn't the tags at the bottom like I thought?
Wed, Mar 7
@egalvezwmf, @JAnstee_WMF, here are the counts for some of our dimensions for the November–January sampling frame (I can easily update it to the December–February frame once the Analytics Engineering team has finished updating the Data Lake for the new month).
Tue, Mar 6
Oh, I also forgot that we need to decide how many users we want to sample from each of our sampling strata.
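As a starting point for that decision, one common option is proportional allocation with a floor, so that small strata still get enough users to analyze. This is a minimal sketch under that assumption, not a decision about how the survey will actually be sampled; `allocate_sample`, `min_per_stratum`, and the stratum names and sizes are hypothetical.

```python
def allocate_sample(stratum_sizes, total_sample, min_per_stratum=0):
    """Proportionally allocate a sample across strata, with a per-stratum floor.

    stratum_sizes: dict mapping stratum name -> number of users in the frame.
    The result can add up to more than total_sample, because shares are
    rounded up and small strata are raised to the floor.
    """
    population = sum(stratum_sizes.values())
    allocation = {}
    for name, size in stratum_sizes.items():
        # Integer ceiling of total_sample * size / population.
        share = (total_sample * size + population - 1) // population
        # Apply the floor, but never ask for more users than the stratum has.
        allocation[name] = min(size, max(share, min_per_stratum))
    return allocation


# Illustrative strata only, not the real sampling frame.
example_strata = {"low activity": 1000, "medium activity": 100, "high activity": 10}
print(allocate_sample(example_strata, total_sample=111, min_per_stratum=20))
```

In this made-up example the two smaller strata are oversampled relative to their share of the frame, so any overall estimates would need to be weighted accordingly.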
I've finished the Analytics Data Lake version of the sampling frame, which means it's now updated to use our new analytics infrastructure and to fix a few limitations in last year's query (e.g. looking at total edits from the last year, but not their distribution).
This is mostly done, but there are still a few outstanding questions: what our buckets for total annual edits should be (I will do some analysis to help answer this) and whether we will use "active on English Wikipedia" as a dimension in our stratification.
The Analytics Data Lake is the way of the future!
We actually have a very strong need for better edit tagging for analysis purposes. Currently, we can't effectively distinguish between edits made with the 2010 wikitext editor and edits made with non-bot editing tools that use the API. This is a problem because, when evaluating the need for and effects of interface changes, we obviously want to exclude edits made from interfaces that we don't control.
Mon, Mar 5
It's now clear that the Data Lake is the future; the mediawiki history data is not yet publicly available, but I'm confident it will be eventually.