@EBernhardson, actually this is quite close to being done! If you look at the wiki segmentation spreadsheet, the database code is in column Z and the language name in English is in column AD. It doesn't include private wikis at all, and the data past row 663 is messed up (T199266), but it wouldn't be that much work to fix it up.
Wed, Feb 20
Fri, Feb 15
As Kate said, closing this because this parent task doesn't have any value. I've separated out the reverts issue as T216297.
@kzimmerman based on my discussions with Analytics, I decided that I should move my calculations away from the slowly dying dbstore1002 immediately rather than working around the locking of the staging database (T215589#4958363). That means I'll need to do this before we can run the February metrics.
It's not clear that porting these queries to reportupdater would be worth the effort, so we have no particular plans to actually do this right now.
Thu, Feb 14
Tue, Feb 12
Thanks to @DLynch's bugfixes, we are now properly collecting mobile feature use data (P8074). We will have 2 weeks of data after 22 February, so at that point I will be able to calculate the corresponding numbers for the mobile editor.
Mon, Feb 11
For the record, he needs this access so he can inspect EventLogging data related to the Editing team's products in Hadoop (one of the most important data streams, EditAttemptStep, is not available in the MariaDB EventLogging store).
Sun, Feb 10
Thu, Feb 7
You can do this using the mariadb.run function with the parameter host = "logs"—see https://github.com/neilpquinn/wmfdata/blob/master/wmfdata/mariadb.py#L21 for details.
Wed, Jan 30
Tue, Jan 29
Thu, Jan 24
Wed, Jan 23
Jan 23 2019
Jan 22 2019
Jan 19 2019
Jan 18 2019
This is actually kind of blocked on T212529: Standardize datetimes/timestamps in the Data Lake, unless we want to code the UDF to deal with four possible formats...
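To illustrate what "dealing with several possible formats" could look like, here is a rough Python sketch of the normalization logic such a UDF might wrap. The input formats below are assumptions chosen for illustration (a 14-digit MediaWiki-style string, a SQL-style string, an ISO 8601 string with a `Z` suffix, and Unix epoch seconds); the actual formats in the Data Lake are the ones listed in T212529. The function name `normalize_timestamp` is hypothetical, and all inputs are assumed to be in UTC.

```python
from datetime import datetime, timezone

# Target format: ISO 8601 in UTC (an assumption, not a decided standard).
OUTPUT_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

# Hypothetical non-numeric input formats, assumed for illustration only.
TEXT_FORMATS = (
    "%Y-%m-%d %H:%M:%S",   # SQL-style, e.g. 2019-01-18 12:30:00
    "%Y-%m-%dT%H:%M:%SZ",  # ISO 8601 with Z suffix
)


def normalize_timestamp(value):
    """Return `value` as an ISO 8601 UTC string, or raise ValueError."""
    text = str(value).strip()

    if text.isdigit():
        if len(text) == 14:
            # Treat exactly 14 digits as YYYYMMDDHHMMSS (MediaWiki-style).
            parsed = datetime.strptime(text, "%Y%m%d%H%M%S")
            parsed = parsed.replace(tzinfo=timezone.utc)
        else:
            # Treat any other all-digit value as Unix epoch seconds.
            parsed = datetime.fromtimestamp(int(text), tz=timezone.utc)
        return parsed.strftime(OUTPUT_FORMAT)

    for fmt in TEXT_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # Not this format; try the next one.
        return parsed.replace(tzinfo=timezone.utc).strftime(OUTPUT_FORMAT)

    raise ValueError(f"Unrecognized timestamp format: {value!r}")
```

The all-digit cases are split by length rather than left to `strptime`, because `strptime` accepts unpadded fields and could otherwise misread an epoch value as a date. Every extra format adds another branch like this, which is a decent argument for standardizing first.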
I'm going to boldly broaden this into a plea to fully standardize datetimes/timestamps in the Data Lake 🙏 There are actually two more formats that Morten didn't mention! I'll update the description.
Perhaps I can convince you to do the dirty work, @DLynch? 😁
As I mentioned at the Editing offsite in December, my hypothesis is that the difference is due to 2010 wikitext editor init events triggered by bots.
@Milimetric How quickly will you be able to set the cube up in Turnilo once we provide the transform spec? We're trying to prioritize this among our other work 😁
I discussed this with @Nuria in a meeting today and she clarified that T212386 is important work to Analytics Engineering and said the team plans to try installing the production SQL query script on the new hardware once it's set up (probably in mid-February). However, she can't make any commitments about what the team might do after that.
I sent the list to Edward, and now I'm just waiting for him to make the changes (or possibly object) before I close this.