Question for @Dbrant: would the app be able to store data in the user_properties table? It might need to be a separate API call after an account is created, since we probably can't include that data in the account creation API call. I'd rather do that than store a user_id in event data.
@Dbrant: let's finally get rid of 1:100 sampling in the sessions funnel?
Mon, Mar 18
After talking with Chelsy and thinking even more about this over the weekend, I'm gonna simplify the specs. Once Robin returns, I'll talk with him about the data he'd be interested in with respect to user experience flow, to aid future redesigns/adjustments.
Thu, Mar 14
Managed with others! :) I'll reopen if needed. Thanks for checking in!
Wed, Mar 13
Measure the length of sessions and the volume of edits made within the different app editing tasks. (We want to know how many cards deep into the feed people look vs. how many edits they actually make, to see whether the stuff we're presenting is compelling to people. We're also interested in whether people do this for long stretches at a time, as another signal of how compelling they find it.)
Mon, Mar 11
Thank you, @Catrope!!!
Thu, Mar 7
Draft posted at: https://www.mediawiki.org/wiki/User:MPopov_(WMF)/SEO/sameAs_test
Fri, Mar 1
Okay, Roan confirmed for me that's the correct interpretation.
Thu, Feb 28
Hi, @jrobell! On Android: 40711 in Dec 2017, 35882 in Dec 2018 – a decrease of about 11.9%
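For reference, the year-over-year change can be reproduced with a quick calculation. This is only a sketch in Python: the helper name `percent_change` is made up for illustration, and the only inputs are the two counts quoted in the comment.

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from `before` to `after`, in percent (negative means a decrease)."""
    return (after - before) / before * 100


# Counts quoted in the comment (Android, Dec 2017 vs. Dec 2018).
dec_2017 = 40711
dec_2018 = 35882

change = percent_change(dec_2017, dec_2018)
print(f"{change:.2f}%")
```

The unrounded figure is a decrease of roughly 11.86%, which truncates to 11.8% and rounds to 11.9%.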
I've identified a few potential issues with the query I've written for the past check-ins, so I'm working on resolving those to make sure the analysis is performed on vetted, correct data. (Gotta love those joins of partitioned tables in Hive.)
Wed, Feb 27
I've come across a potential blocker and I would like some clarification. For context:
Wed, Feb 20
Tried using analytics-mysql on stat1007 and got "permission denied". Follow-up question: will it be made available on SWAP?
Tue, Feb 19
Working on acquiring data for this and it turned out to be much, much harder and more involved than I anticipated. The analytics data we get from the app just has the notification ID, so I'm in the process of getting the Echo extension tables into Hive (that's the part that's causing problems & delays) so that I can get editing activity for users who got the notifications on Android.
@Charlotte: thanks for the ping! Right now my priorities are: the notifications analysis, SEO sameAs analysis, some Search query migration, and then this. I'm working on acquiring data for T213458 and it turned out to be much, much harder than I anticipated so that's creating some delays.
I just noticed that the tables related to the Echo extension are (surprisingly) not yet available in the enwiki shard (s1-analytics-replica.eqiad.wmnet), but are in analytics-store.eqiad.wmnet. Is there a page we can refer to to check on parity/status of data availability?
Feb 15 2019
@elukey: is there a recommendation for how to sqoop with the shards? since a shell command would look like:
Feb 14 2019
Feb 13 2019
@jcrespo: is it safe to assume that the current config of s3 (default) will stay that way? and if not, can I assume that the shard which is designated as the default one will have "(default)" in the comment?
Actually, I would like to request that https://github.com/wikimedia/operations-mediawiki-config/tree/master/dblists include a single downloadable file with a mapping of databases to shards.
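To make the request concrete, here is a rough sketch of what producing such a single mapping file could look like, assuming the source data is one list of database names per shard. The data below is placeholder content for illustration (only enwiki on s1 comes from this thread), and the function name `invert_shard_lists` is hypothetical; the real layout of the dblists directory would need to be checked.

```python
import json

# Hypothetical per-shard lists: shard name -> databases hosted on that shard.
# Placeholder data for illustration, not the real contents of the repository.
shard_lists = {
    "s1": ["enwiki"],
    "s2": ["exampledb_a", "exampledb_b"],
}


def invert_shard_lists(lists: dict) -> dict:
    """Turn {shard: [db, ...]} into {db: shard}, refusing ambiguous entries."""
    mapping = {}
    for shard, dbs in lists.items():
        for db in dbs:
            if db in mapping:
                raise ValueError(f"{db} is listed under both {mapping[db]} and {shard}")
            mapping[db] = shard
    return mapping


db_to_shard = invert_shard_lists(shard_lists)

# Serialize as one JSON document that could be published as a single file.
mapping_json = json.dumps(db_to_shard, indent=2, sort_keys=True)
print(mapping_json)
```

JSON is used here only because it is easy to consume from most languages; any flat format would do.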
Feb 11 2019
Oh, cool, thanks!
Feb 8 2019
Feb 7 2019
Query & scripts: https://github.com/wikimedia-research/SEO-Experiment-Sitemaps
Just like with sameAs (T211191#4885005), there is no visible change in traffic due to the intervention:
Feb 6 2019
- Can we see whether people have made accounts through the app?
Jan 25 2019
Jan 24 2019
Jan 23 2019
Cool! Thank you!
Jan 22 2019
Okay, here are the numbers which were calculated with the following conditions:
Jan 18 2019
Jan 17 2019
Thanks for clarifying! Okay, one more question for @Abit & @Ramsey-WMF just so everyone is on the same page. The statistic you want is: the % of all uploaded files which have had additions to their pages in the first 2 months after upload.
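To check that we mean the same thing, here is a minimal sketch of that statistic in Python. Everything in it is an assumption for illustration: the records are made up, "2 months" is approximated as 60 days, and the function name `pct_with_early_additions` is hypothetical.

```python
from datetime import date, timedelta

# Assumption: "first 2 months after upload" approximated as a 60-day window.
WINDOW = timedelta(days=60)

# Made-up records: (upload date, dates of later additions to the file's page).
uploads = [
    (date(2018, 1, 1), [date(2018, 1, 20)]),                    # addition inside the window
    (date(2018, 1, 1), [date(2018, 6, 1)]),                     # addition only after the window
    (date(2018, 2, 1), []),                                     # no additions at all
    (date(2018, 3, 1), [date(2018, 3, 1), date(2018, 9, 9)]),   # one inside, one after
]


def pct_with_early_additions(records, window=WINDOW) -> float:
    """Percent of all uploaded files with at least one addition within `window` of upload."""
    if not records:
        return 0.0
    hits = sum(
        1
        for uploaded, additions in records
        if any(uploaded <= d <= uploaded + window for d in additions)
    )
    return 100 * hits / len(records)


pct = pct_with_early_additions(uploads)
print(pct)
```

Note the denominator is all uploaded files, not only files that were ever edited, which matches the wording of the statistic above.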
Jan 16 2019
Just updated my database of sampled pages (using the December 2018 snapshot) and recounted pageviews from 2018-11-01 to 2019-01-15 (code & data over at GitHub). There has not been any change, up or down, since the rollout:
Jan 15 2019
Yup, I just double checked that I could still reproduce it in 6.1.4 (yes) as a consistency check and then installed 6.2.0, tried it, and this has been fixed. Erasing articles & lists works correctly in v6.2! Thanks & good job @NHarateh_WMF & @JoeWalsh!
A huge chunk of my analysis got invalidated when Dmitry & I found out that the underlying data was faulty (T213190). Specifically, this affects all of the analysis related to session length, number of sessions, and number of pages read per session. Unfortunately, the nature of the bug means that we won't be able to compare those metrics before & after the update.
Jan 10 2019
Jan 9 2019
By the way, on ouR side, the package 'wmf' (which I maintain) that we use for querying databases from R can be augmented to have an internal map of dbs to shards, and I can easily add a way to update that map from an external file. We would just require that the JSON/YAML/CSV/whatever file with the mapping is publicly accessible.
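As a rough illustration of the idea (in Python rather than R, and not the actual 'wmf' package code), a lookup backed by an externally maintained mapping might look like the sketch below. The JSON layout, the function names, and the fallback behaviour are all assumptions; s3 is used as the fallback only because it is described as the default shard elsewhere in this thread.

```python
import json

# Assumption: databases missing from the mapping live on the default shard.
DEFAULT_SHARD = "s3"


def load_shard_map(text: str) -> dict:
    """Parse the contents of a (hypothetical) published JSON file of {db: shard}."""
    return json.loads(text)


def shard_for(db: str, shard_map: dict, default: str = DEFAULT_SHARD) -> str:
    """Return the shard hosting `db`, falling back to the default shard."""
    return shard_map.get(db, default)


# Stand-in for the downloaded file; in practice this text would be fetched
# from wherever the mapping ends up being published.
example_file = '{"enwiki": "s1"}'
shard_map = load_shard_map(example_file)

print(shard_for("enwiki", shard_map))
print(shard_for("some_other_db", shard_map))
```

Refreshing the internal map then amounts to re-reading the file and replacing `shard_map`, without touching the lookup code.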
Jan 8 2019
Jan 3 2019
Jan 2 2019
Dec 28 2018
Dec 19 2018
Done. Just tested everything and it's all good, so I've deleted the instances running Ubuntu Trusty. The only instances up are running Debian Stretch.
Dec 18 2018
@EBjune Hiya! Even though I'm an admin for the project I can't do much because Horizon thinks there are 2 more instances than there actually are.
Dec 17 2018
We seem to have a few ghost instances that are preventing me from launching a new Stretch instance which would replace the existing discovery-production one that runs on Trusty:
It works! :D Thanks, @aborrero!