@ovasileva just reminded me that we did discuss this before, but I forgot with everything else that was going on.
Fri, Nov 16
Feature will be disabled for wikis:
- Indonesian: idwiki
- Portuguese: ptwiki
- Punjabi: pawiki, pnbwiki
- Dutch: nlwiki, nds_nlwiki
- Korean: kowiki
- Bhojpuri: bhwiki
- Cherokee: chrwiki
- Kazakh: kkwiki
- Catalan: cawiki
- French: frwiki
- Yoruba: yowiki
- Kalmyk: xalwiki
Tue, Nov 13
Just met with Ramsey & Cormac. We are going to go with the two-schema system, as recommended :)
Wed, Oct 31
Sat, Oct 27
Fri, Oct 26
Just want to make a note that since the Android team has started including Echo notifications as app notifications (see https://www.mediawiki.org/wiki/Android_editing_features#Q1_-_July-September_2018), the results of this analysis are of interest to that team.
Thu, Oct 25
Hi @mpopov! We are in the process of migrating everybody from stat1005 to stat1007 (not announced yet for users) but I am wondering if I could move the statistics::discovery stuff beforehand (afaics the wikimedia-discovery-golden cron and related things). What do you think?
Mon, Oct 22
(Updated the funnel analysis diagram because I had a brain toot that made me write "users" in place of "uploads")
I've thought about this and I think the current event-per-interaction approach should be scrapped in favor of a more forward-thinking solution. Analytics Engineering has some guidelines in place for creating EventLogging schemas in a way that the events are easily ingested into Druid, which makes them easy to visualize in Turnilo/Superset. Those tools are usable by non-analysts, which means @Ramsey-WMF et al. wouldn't be blocked by, say, the unavailability of a data analyst ;)
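To make the "easily ingested into Druid" point concrete, here is a hedged sketch (field names are hypothetical, not the actual AE guidelines) of the difference between a flat, Druid-friendly event and a nested event-per-interaction record that would need flattening first:

```python
# Hypothetical shape of a Druid-friendly event: one flat record per action,
# simple scalar dimensions (no nested objects), so Turnilo/Superset can
# slice on any field without an analyst writing custom queries.
druid_friendly_event = {
    "dt": "2018-10-22T14:00:00Z",  # timestamp Druid partitions on
    "wiki": "commonswiki",
    "step": "file-selection",      # low-cardinality dimension
    "action": "click",
    "sampling_rate": 0.1,
}

# A nested event-per-interaction design, by contrast, has to be
# flattened before Druid can use its fields as dimensions:
nested_event = {"event": {"interaction": {"target": {"id": "upload-btn"}}}}

def flatten(d, prefix=""):
    """Flatten nested dicts into 'a_b_c' keys, the usual pre-Druid transform."""
    out = {}
    for k, v in d.items():
        key = k if not prefix else f"{prefix}_{k}"
        if isinstance(v, dict):
            out.update(flatten(v, key))
        else:
            out[key] = v
    return out
```

The flat shape is what lets the events land in Druid without a custom transformation step.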
Currently Shiny Server is available (via its developer, RStudio) as a package only for Ubuntu Trusty. This task is about packaging it up ourselves to make it available on VMs running Debian…I guess Stretch at this point. (I'll update the task title & description.)
Hi all, just to let you know this now has a deadline of 2018-12-18 per https://wikitech.wikimedia.org/wiki/News/Trusty_deprecation#Cloud_VPS_projects
Please get in contact if you need help.
Oct 18 2018
Oct 17 2018
Who made the package for Ubuntu Trusty / where did it come from?
I've put together the results of the many, many rounds of clustering that I did into https://github.com/wikimedia-research/wiki-segmentation/tree/master/clustering-initial/deliverable
Oct 11 2018
That's fair :)
Oct 10 2018
Oct 3 2018
All good now :)
Oct 2 2018
Sep 28 2018
Alright, I wiped all the request counts starting with August 10th (after making a backup) so Golden/Reportupdater is going to start a re-count using the webrequests in the 'text' partition. WDQS stats re-count should be done by Monday. Thanks for your patience, folks!
Sep 27 2018
For example usage of Hive with Reportupdater, see: https://github.com/wikimedia/wikimedia-discovery-golden/tree/master/modules/metrics/wdqs
Sep 25 2018
Ok, I've added the analytics-search system user to the analytics-search-users group. You should have your script run `chgrp analytics-search-users <file>` after it creates the file.
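If the script doing the writing is in Python, a minimal sketch of the write-then-chgrp step could look like this (the `publish` helper and the data are illustrative; the group name is the one mentioned above):

```python
import os
import shutil

def publish(path, data, group="analytics-search-users"):
    """Write a file, then hand group ownership to the shared group
    so other group members can edit the file later."""
    with open(path, "w") as f:
        f.write(data)
    # Same effect as running: chgrp analytics-search-users <file>
    shutil.chown(path, group=group)
    # Make it group-writable (and world-readable), or the chgrp alone
    # won't let other members modify it.
    os.chmod(path, 0o664)
```

`shutil.chown` accepts a group name directly, so there's no need to look up the gid by hand.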
Logging out and back in worked.
Sep 24 2018
@Ottomata @Gehel: I tried editing stat1005:/srv/published-datasets/discovery/metrics/wdqs/basic_usage.tsv but couldn't because the file belongs to group analytics-search, not analytics-search-users (which I belong to) and that sort of makes sense because of how we have it configured right now in statistics::discovery:
If someone wants to take that on, there are instructions for building Shiny Server from source and we would enormously appreciate it.
Assigned to @mpopov Again, our apologies that the data sources are hardcoded like this. As I mentioned in our meeting, a better path forward would be using the tags for wdqs to identify the requests: https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-core/src/main/java/org/wikimedia/analytics/refinery/core/webrequest/tag/WDQSTagger.java
For archive happiness, this was done in T164603 :)
Oooh, exciting!!! :D
Sep 22 2018
Sep 20 2018
Updating to better reflect its actual priority in the grander scheme of things.
Sep 17 2018
I added a whole lot of tracking to UW, for pretty much every element that can be interacted with (full list below)
All of these would go into the UploadWizardFlowEvent schema, where only the flow id, flow position & event name are logged.
This should be enough to figure out *where* people drop out.
I'll wait for further input from @Tbayer before pushing to get this code-reviewed, in case we want to change things (log more context, ...)
Another question: with ~85000 monthly UploadWizard uses, logging every action will generate an enormous amount of data.
We should probably only log a fraction; what would a good sampling ratio be? 10%? 5%?
Sep 14 2018
Sep 12 2018
Sep 11 2018
Sep 7 2018
Sorry, I haven't checked my Phabricator emails in a while! Thanks so much @Ottomata! The upgrade has fixed the chart that wasn't working and has revealed that there's an issue with the data:
Hi! Thanks a lot for this task, it triggered some useful discussions. A couple of notes after checking the CDH release details:
On a more general note, we are currently considering whether it would be worthwhile to change distributions and move away from Cloudera (more details will appear in the parent task, T203693), but that would of course require a ton of time :)
After this looong post, I just want to say that we support this request and that we'll try to do everything we can to upgrade asap, but it might take a couple (or more) quarters before we're able to get to it.
Whoops, realized I was missing a digit in the version.
Sep 6 2018
Sep 5 2018
Dmitry and I have gone over the schemas I proposed and he gave them a thumbs up for instrumentation:
I updated the existing ToC interactions schema for the redesigned ToC: https://meta.wikimedia.org/wiki/Schema:MobileWikiAppToCInteraction
Sep 4 2018
Aug 28 2018
We'll discuss with Josh and Charlotte (once she's back from vacation)
Aug 20 2018
1. Any devices that send requests and have a uuid, regardless of whether the requests are intentional or unintentional (e.g. when the app is in the background, requests could be sent to fetch the feed and the reading list):
```
WHERE access_method = 'mobile app'
  AND COALESCE(x_analytics_map['wmfuuid'], parse_url(concat('http://bla.org/woo/', uri_query), 'QUERY', 'appInstallID')) IS NOT NULL
  AND webrequest_source IN ('text')
```
I prefer the first one, but I agree that we sometimes need to compare with historical data. What do you think about keeping both 1 and 4, in two tables?
Aug 9 2018
Aug 8 2018
Aug 7 2018
Motivation (beyond that it's just nice to have the latest and greatest): I'm trying to add a filter to a slice (which usually works), but when the filter is added, the slice goes from working totally fine to `unorderable types: str() < int()`:
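For context, that `unorderable types: str() < int()` traceback is Python refusing to order a string against an int (that particular wording is from Python 3.0–3.5), which typically surfaces when a sliced column ends up with mixed types. A minimal, illustrative reproduction (helper name is mine, not Superset's):

```python
def can_order(values):
    """Return True if Python can sort the values, False on the
    mixed-type TypeError that Superset surfaced."""
    try:
        sorted(values)
        return True
    except TypeError:
        return False
```

On Python 3, `can_order(["2018", 2018])` is False, which is why a filter that introduces a stray string into a numeric column breaks ordering in the slice.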
Jul 19 2018
@Charlotte: is this still relevant or can we close it?
Proof of concept dashboard up using the new test_gsc_all datasource in Druid: https://superset.wikimedia.org/superset/dashboard/wikipediagoogledemo/