Wed, Dec 11
@Iflorez, sorry for the delay. Here's the full list of issues I've noticed.
I discussed this with @Pginer-WMF today, and we do want to prioritize this above other Language work. We are not currently doing any development work, but there will be more community conversations as Google expands Toledo to new languages, so we want this data to be publicly available.
Tue, Dec 10
@Nuria I don't understand the logic of whitelisting only those three special pages (Search, RecentChanges, and Version). The definition page on Meta says that "On the other hand, special pages that users purposefully navigate to, like Special:RecentChanges or Special:Version, are included." (I did write that line, back in 2016, but that was intended as better phrasing of what the page said before, that "automatically-called special pages are excluded").
Thu, Dec 5
The other thing I was wondering about was the December 2017 data. The monthly unique devices data is exactly identical to the June 2018 data, which shouldn't happen, and I'm wondering what's going on there.
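For what it's worth, a check like this is easy to automate. Here's a minimal sketch, assuming the monthly data has been loaded into a dict keyed by month; the function name, the data layout, and the numbers in the usage example are all made up for illustration and are not from the actual spreadsheet.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def find_identical_months(
    monthly: Dict[str, Dict[str, int]]
) -> List[Tuple[str, str]]:
    """Return every pair of months whose per-country values match exactly.

    `monthly` maps a month label (e.g. "2017-12") to a dict of
    country -> unique devices. This layout is an assumption.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(monthly), 2)
        if monthly[a] == monthly[b]
    ]


# Made-up numbers, purely to show the shape of the input.
example = {
    "2017-12": {"IN": 100, "US": 200},
    "2018-03": {"IN": 110, "US": 190},
    "2018-06": {"IN": 100, "US": 200},
}
suspicious_pairs = find_identical_months(example)
```

Any pair this returns would be worth a manual look, since two different months matching exactly across every country is very unlikely to be real.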
Wed, Dec 4
@Iflorez, @cchen, we shouldn't call this done until we've updated the formatting to make all the sheets consistent (I suggest we follow the Jun 2018 sheet, which I spent a lot of time on!) and sent out an announcement to the interested departments (Product and the teams formerly known as Community Engagement). I'm happy to do that if you'd rather not :)
Mon, Dec 2
@mpopov, have you thought about the possibility of instead using "standard" Gerrit repos, which would get automatically mirrored to GitHub/wikimedia? I don't have much experience with this, but it seems like a good idea to use the standard set-up unless we have some strong reason for doing otherwise. That way, we can more easily get contributions from other Wikimedians and more easily apply our experience to contributing to other Wikimedia code (e.g. the analytics repos, instrumentation code).
Wed, Nov 27
I've finished this up; the data is in the same spreadsheet. I've highlighted some insights for Stephane and Angie in various meetings and emails; I was planning to summarize them here, but I don't think it's worth the time.
Tue, Nov 26
Fri, Nov 22
@SBisson I think we're planning to use this same schema for KaiOS app events too; if so, we should probably change the os field to something like `platform` and have kaios-web and kaios-app as possible values.
Thinking about sections made me think of another thing: our operating system filter will still catch Android and iOS tablets where the sections are big enough to be opened by default. The experience on these devices isn't really relevant to KaiOS, so I feel like we should use a screen-size filter to limit our data collection to phones only. @AMuigai, @SBisson what do you think?
Thu, Nov 21
Thank you for the quick responses, @SBisson!
Now that I've added the wiki names to the dataset, along with some other fixes (commit 817dc0d), this is all finished.
Wed, Nov 20
- My intention for the time_on_page field was to know the amount of time the user had the page visible/focused, but currently it just gives the absolute amount of time from page load to page unload. If it's feasible to track visibility time, could we add that as page_visible_time (and maybe rename time_on_page to page_open_time)?
- I just saw your comment about sections being open by default on mobile web; on Chrome for Android, I see them closed by default. Were you using the mobile website on a desktop browser? I think that when the screen size is tablet-size or higher, sections are opened by default, so they'd be closed by default for KaiOS devices. If that's the case, perhaps we can add a field like section_open_count which tracks the number of sections opened (ideally we'd track first time opens/unique section opens only, but it's fine if it's just a raw count with multiple opens of the same section included). This would also require a count of the number of sections on the page to contextualize.
- The dt field that's recorded by default will give the time that the event is received by the EventLogging server. Since we'll be holding events until the page unload, we should have a client_dt or load_dt field that gives the actual time the page was loaded so we can do our sessionizing accurately.
- The schema in its current form won't explicitly tell us if the user is on Special:Search. The fact that the page namespace will be -1 will be a strong signal, but is there any reason not to include an is_search_page variable?
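To make the first point concrete, here's a rough sketch of how the two time fields would differ. It's in Python for brevity; the real instrumentation would be client-side, and the function name, the input format (a list of timestamped visibility changes), and the `visible_at_load` default are all assumptions of mine rather than anything in the current schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PageTimes:
    page_open_time: float     # seconds from page load to page unload
    page_visible_time: float  # seconds the page was actually visible


def summarize_page_times(
    load_ts: float,
    unload_ts: float,
    visibility_changes: List[Tuple[float, bool]],
    visible_at_load: bool = True,
) -> PageTimes:
    """Compute both durations from timestamps given in seconds.

    `visibility_changes` is a list of (timestamp, is_visible) pairs,
    one per change in the page's visibility. This is a hypothetical
    input format, not the existing schema.
    """
    visible = visible_at_load
    last_ts = load_ts
    visible_total = 0.0
    for ts, now_visible in sorted(visibility_changes):
        if visible:
            # Count the stretch that just ended only if the page was visible.
            visible_total += ts - last_ts
        visible = now_visible
        last_ts = ts
    if visible:
        # Count the final stretch up to the unload.
        visible_total += unload_ts - last_ts
    return PageTimes(unload_ts - load_ts, visible_total)


# Page open for 100 s, hidden from t=30 to t=60.
times = summarize_page_times(0.0, 100.0, [(30.0, False), (60.0, True)])
```

In this example, page_open_time is 100 seconds while page_visible_time is 70, which is the kind of gap that time_on_page currently hides.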
I too am requesting Kerberos credentials for the stat and notebook machines. My username is neilpquinn-wmf.
Tue, Nov 19
This isn't urgent, but we should still do it in the next month or so.
The current draft dashboard is available at https://analytics.wikimedia.org/published/notebooks/WMF-Language/key-metrics.html (code at . Some notes:
- The new/experienced user classification is incorrect, producing a false decline in translations by new users (T237788).
- The dashboard currently needs to be manually updated and depends on the monthly mediawiki_history snapshots, so even if it were automatically updated it would only show new data once per month.
Mon, Nov 18
- Remove deleted wikis, which will require basing this on something other than the MediaWiki sites table.
- Include the full wiki name
Sat, Nov 16
I'm now mostly finished; I've put a bunch of data into this spreadsheet [WMF only], which now contains:
- Pageviews and unique devices by country
- Indian pageviews by state
- Top user agents
- Top viewed pages in India
- Top viewed projects in India
- Referrer types in India
- Top referring sites in India
Thu, Nov 14
Nov 11 2019
Assigning to remind you!
@kzimmerman send me the photos of our sticky notes 😁
Nov 9 2019
Nov 7 2019
Oct 30 2019
Stats about Firefox OS pageviews by Indian state are done and in this spreadsheet [WMF only]. I still need to rerun those numbers, limited to Wikipedias only.
Currently, https://analytics.wikimedia.org/published-datasets/ is returning 404. Any idea what's going on?
@Ottomata thank you! This looks like a great plan 😁
Oct 29 2019
Looks good! Since Growth doesn't want the data anymore, I filed T236770: Retire the ChangesListHighlights data stream, which we can deal with separately.