Fri, Jan 18
Okay, here are the numbers which were calculated with the following conditions:
Thu, Jan 17
Thanks for clarifying! Okay, one more question for @Abit & @Ramsey-WMF just so everyone is on the same page. The statistic you want is: the % of all uploaded files which have had additions to their pages in the first 2 months after upload.
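To make sure we mean the same thing by that statistic, here is a minimal sketch of how it could be computed. Everything in it is an assumption for illustration: the input shape (one upload timestamp plus a list of later edit timestamps per file), the function name, and treating "2 months" as 60 days. It is not the query actually used.

```python
from datetime import datetime, timedelta

# Assumption: "first 2 months after upload" is approximated as 60 days.
WINDOW = timedelta(days=60)


def share_with_early_additions(files):
    """Return the % of files with at least one edit inside the window.

    `files` is a hypothetical iterable of (upload_time, [edit_times]) pairs.
    Edits stamped exactly at upload time are not counted as additions.
    """
    files = list(files)
    if not files:
        return 0.0
    hits = sum(
        1
        for uploaded, edits in files
        if any(uploaded < edit <= uploaded + WINDOW for edit in edits)
    )
    return 100.0 * hits / len(files)
```

The denominator is all uploaded files, including those never edited again, which matches the "% of all uploaded files" wording.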
Wed, Jan 16
Just updated my database of sampled pages (using December 2018 snapshot) and recounted pageviews from 2018-11-01 to 2019-01-15. There has not been any change, up or down, since the rollout:
@Ramsey-WMF: hi, I would like to clarify what "metadata" includes. Here's my initial list:
Tue, Jan 15
Yup, I just double checked that I could still reproduce it in 6.1.4 (yes) as a consistency check and then installed 6.2.0, tried it, and this has been fixed. Erasing articles & lists works correctly in v6.2! Thanks & good job @NHarateh_WMF & @JoeWalsh!
A huge chunk of my analysis got invalidated when Dmitry & I found out that the underlying data was faulty (T213190). Specifically, this affects all of the analysis related to session length, number of sessions, and number of pages read per session. Unfortunately the nature of the bug means that we won't be able to compare those metrics before & after the update.
Thu, Jan 10
Wed, Jan 9
By the way, on ouR side, the package 'wmf' (which I maintain) that we use for querying databases from R can be augmented to have an internal map of dbs to shards, and I can easily add a way to update that map from an external file. We would just require that the JSON/YAML/CSV/whatever file with the mapping is publicly accessible.
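A rough sketch of the idea, written in Python rather than R purely for illustration: an internal map of dbs to shards that can be refreshed from an external JSON document. The class name, method names, and the sample db/shard values are all hypothetical; this is not the 'wmf' package's actual interface.

```python
import json


class ShardMap:
    """Hypothetical in-memory map from database name to shard name."""

    def __init__(self, defaults):
        # Built-in mapping shipped with the package (placeholder values).
        self._map = dict(defaults)

    def update_from_json(self, text):
        """Merge in a mapping parsed from the text of an external JSON file."""
        loaded = json.loads(text)
        if not isinstance(loaded, dict):
            raise ValueError("expected a JSON object of db -> shard")
        self._map.update(loaded)

    def shard_for(self, dbname):
        """Look up the shard for a database, failing loudly if unknown."""
        try:
            return self._map[dbname]
        except KeyError:
            raise KeyError(f"no shard known for {dbname!r}") from None


# Placeholder values, for illustration only.
shards = ShardMap({"exampledb": "shard1"})
shards.update_from_json('{"exampledb": "shard2", "otherdb": "shard3"}')
```

Entries from the external file override the built-in ones, so the packaged map only serves as a fallback when the file is stale or unavailable.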
Tue, Jan 8
Thu, Jan 3
Wed, Jan 2
Fri, Dec 28
Dec 19 2018
Done. Just tested everything and it's all good, so I've deleted the instances running Ubuntu Trusty. The only instances up are running Debian Stretch.
Dec 18 2018
@EBjune Hiya! Even though I'm an admin for the project I can't do much because Horizon thinks there are 2 more instances than there actually are.
Dec 17 2018
We seem to have a few ghost instances that are preventing me from launching a new Stretch instance which would replace the existing discovery-production one that runs on Trusty:
It works! :D Thanks, @aborrero!
Dec 13 2018
Thank you, @Nuria!
@EBjune I am waiting for someone with access to the internal WMF apt repository to make shiny-server available.
Dec 12 2018
- Plan of action
Dec 11 2018
Finished what I could and the findings are available over at https://people.wikimedia.org/~bearloga/reports/wikidata-incompleteness.html
I'll check with the team at our meeting later today and let you know :)
Alright, dependencies for Shiny Server resolved. Now on to the problem of Shiny Server itself:
Dec 10 2018
I think the window on this closed.
Updating status to reflect reality.
Here's a draft of the report I sent out privately back in July: https://commons.wikimedia.org/wiki/File:Wikipedia_Android_app_multilingual_update_post-release_report.pdf
@ovasileva feel free to re-open if you have additional questions
Dec 7 2018
Things are looking okay so far:
Dec 6 2018
Dec 5 2018
Dec 4 2018
Nov 29 2018
Nov 22 2018
Nov 16 2018
@ovasileva just reminded me that we did discuss this before, but I forgot with everything else that was going on.
Nov 13 2018
Just met with Ramsey & Cormac. We are going to go with the 2-schema system as was recommended :)
Oct 31 2018
Oct 27 2018
Oct 26 2018
Just want to make a note that as the Android team has started including Echo notifications as app notifications (see https://www.mediawiki.org/wiki/Android_editing_features#Q1_-_July-September_2018), results of this analysis are of interest to that team.
Oct 25 2018
Oct 22 2018
(Updated the funnel analysis diagram because I had a brain toot that made me write "users" in place of "uploads")
I've thought about this and I think the current event-per-interaction approach should be scrapped in favor of a more forward-thinking solution. Analytics Engineering has some guidelines in place for creating EventLogging schemas so that the events are easily ingested into Druid, which makes them easy to visualize in Turnilo/Superset. Those tools are usable by non-analysts, which means @Ramsey-WMF et al. wouldn't be blocked by, say, the unavailability of a data analyst ;)
Currently Shiny Server is available (via its developer, RStudio) as a package only for Ubuntu Trusty. This task is about packaging it up ourselves to make it available on VMs running Debian…I guess Stretch at this point. (I'll update the task title & description.)
Oct 18 2018
Oct 17 2018
I've put together the results of the extensive clustering that I did into https://github.com/wikimedia-research/wiki-segmentation/tree/master/clustering-initial/deliverable