In T215819#4955223, @chelsyx wrote:
Wed, Feb 20
mpopov added a comment to T212386: Provide tools for querying MediaWiki replica databases without having to specify the shard.
Tried using analytics-mysql on stat1007 and got "permission denied". Follow-up question: will it be made available on SWAP?
Tue, Feb 19
mpopov moved T213458: Analyse Android notifications release from Backlog to Doing on the Product-Analytics board.
Working on acquiring data for this and it turned out to be much, much harder and more involved than I anticipated. The analytics data we get from the app just has the notification ID, so I'm in the process of getting the Echo extension tables into Hive (that's the part that's causing problems & delays) so that I can get editing activity for users who got the notifications on Android.
mpopov added a comment to T211366: Android navigation refresh - understand impact on user engagement metrics.
@Charlotte: thanks for the ping! Right now my priorities are: the notifications analysis, SEO sameAs analysis, some Search query migration, and then this. I'm working on acquiring data for T213458 and it turned out to be much, much harder than I anticipated so that's creating some delays.
@Charlotte: thanks for the ping! I'm working on acquiring data for T213458 and it turned out to be much, much harder than I anticipated.
Add support for querying x1
In T172410#4965227, @Marostegui wrote:
In T172410#4965217, @mpopov wrote:
I just noticed that the tables related to the Echo extension are (surprisingly) not yet available in the enwiki shard (s1-analytics-replica.eqiad.wmnet), but are in analytics-store.eqiad.wmnet. Is there a page we can refer to to check on parity/status of data availability?
The echo tables are on x1. x1 is a separate instance in production, and hence on the new dbstore model.
They are available on the old analytics because it has all the instances mixed.
I just noticed that the tables related to the Echo extension are (surprisingly) not yet available in the enwiki shard (s1-analytics-replica.eqiad.wmnet), but are in analytics-store.eqiad.wmnet. Is there a page we can refer to to check on parity/status of data availability?
mpopov added a comment to T212386: Provide tools for querying MediaWiki replica databases without having to specify the shard.
In T212386#4964982, @elukey wrote:
In T212386#4964927, @Ottomata wrote:
Sure, we can definitely work on a shared sqoop wrapper
I don't even mean a sqoop wrapper! I just think the script should be able to output the proper hostname:port, maybe in a 'dry-run' mode, rather than always connecting via the mysql CLI.
analytics-mysql --output (or --dry-run?) enwiki
Would just output the hostname:port for easy use with other tools.
Didn't read it carefully, this seems a great idea, going to work on it asap!
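The dry-run idea above can be sketched roughly as follows. This is only an illustration in Python: the two lookup tables, the port number, and the function name `resolve` are hypothetical, and nothing here reflects how analytics-mysql is actually implemented. Only the hostname pattern is taken from this feed.

```python
# Hypothetical lookup tables. The hostname is the one quoted in this feed;
# the port is a placeholder, not a confirmed value.
SECTION_ENDPOINTS = {
    "s1": ("s1-analytics-replica.eqiad.wmnet", 3311),
}
DB_SECTIONS = {"enwiki": "s1"}


def resolve(db, dry_run=False):
    """Return 'host:port' in dry-run mode, else an argv list for the mysql CLI."""
    section = DB_SECTIONS[db]
    host, port = SECTION_ENDPOINTS[section]
    if dry_run:
        # Print-only mode: other tools (e.g. sqoop) can consume this string.
        return f"{host}:{port}"
    # -h and -P are the mysql client's host and port options.
    return ["mysql", "-h", host, "-P", str(port), db]


if __name__ == "__main__":
    print(resolve("enwiki", dry_run=True))
```

Returning the endpoint as a plain string keeps the shard lookup reusable by tools that never call the mysql client at all.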
Fri, Feb 15
Update README.md
mpopov added a comment to T212386: Provide tools for querying MediaWiki replica databases without having to specify the shard.
@elukey: is there a recommendation for how to sqoop with the shards? since a shell command would look like:
Thu, Feb 14
Update for sharding setup
mpopov moved T216055: Move backend for current dashboard to pull data from Hadoop from Triage to Backlog on the Product-Analytics board.
Update for sharding setup
Wed, Feb 13
mpopov added a comment to T212386: Provide tools for querying MediaWiki replica databases without having to specify the shard.
@jcrespo: is it safe to assume that the current config of s3 (default) will stay that way? and if not, can I assume that the shard which is designated as the default one will have "(default)" in the comment?
mpopov added a comment to T212386: Provide tools for querying MediaWiki replica databases without having to specify the shard.
In T212386#4951229, @jcrespo wrote:
Important: dblists are not the canonical place for database distribution (although they try to stay in sync). The canonical source is the sectionsByDB array:
mpopov reopened T212386: Provide tools for querying MediaWiki replica databases without having to specify the shard as "Open".
Actually, I would like to request that https://github.com/wikimedia/operations-mediawiki-config/tree/master/dblists include a single downloadable file that maps each database to its shard.
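As a rough sketch of what such a mapping could look like once assembled, the Python below builds a database-to-shard dictionary from per-shard list text. It assumes each list holds one database name per line with `#` starting a comment; that format, the sample contents, and the function names are assumptions for illustration. Per the comment quoted above, the dblists are not the canonical source, so a real tool would more likely read sectionsByDB.

```python
def parse_dblist(text):
    """Return database names from list text (assumed: one per line, '#' comments)."""
    names = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


def build_mapping(dblists):
    """Invert {shard: list_text} into {database: shard}."""
    mapping = {}
    for shard, text in dblists.items():
        for db in parse_dblist(text):
            mapping[db] = shard
    return mapping


def shard_for(db, mapping, default="s3"):
    """Look up a database; unlisted ones fall back to the default shard."""
    return mapping.get(db, default)


if __name__ == "__main__":
    # Made-up sample contents, not copied from the real repository.
    sample = {"s1": "# English Wikipedia\nenwiki\n", "s8": "wikidatawiki\n"}
    mapping = build_mapping(sample)
    print(shard_for("enwiki", mapping), shard_for("somesmallwiki", mapping))
```

The `default="s3"` fallback mirrors the question in this thread about s3 being the default shard; whether that stays true is exactly what was being asked.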
Mon, Feb 11
Oh, cool, thanks!
Fri, Feb 8
Thu, Feb 7
mpopov added a comment to T209720: Determine impact of sitemaps on search traffic to Indonesian, Portuguese, Punjabi, Dutch, and Korean Wikipedias.
Query & scripts: https://github.com/wikimedia-research/SEO-Experiment-Sitemaps
mpopov moved T209720: Determine impact of sitemaps on search traffic to Indonesian, Portuguese, Punjabi, Dutch, and Korean Wikipedias from Next Up to Doing on the Product-Analytics board.
Just like with sameAs (T211191#4885005), there is no visible change in traffic due to the intervention:
Wed, Feb 6
Add YMD component extraction
- Can we see whether people have made accounts through the app?
Fri, Jan 25
mpopov moved T214129: Provide Product Analytics input on Modern Event Platform schema conventions from Triage to Doing on the Product-Analytics board.
In T213702#4908349, @Gilles wrote:
The folks who (may) still work on sitemaps might be interested in that data, though. @ovasileva @mpopov ?
Thu, Jan 24
mpopov added a comment to T211366: Android navigation refresh - understand impact on user engagement metrics.
In T211366#4900240, @JKatzWMF wrote:
- I saw in the ticket about the bug that hundreds of duplicate events were being sent.
- If we can detect them, is there a way to correct for them?
- If there are hundreds, what % of total events is that? The data doesn't have to be perfect, so if it's less than 5% of events of any given event-type, then I wouldn't let it get in the way of analysis.
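The 5% check described in the bullets above could be sketched like this in Python. It assumes duplicates are detectable as repeated (event type, event ID) pairs, which may not match how the bug in T213190 actually manifests; the function name and the sample data are made up for illustration.

```python
from collections import Counter


def duplicate_share(events):
    """Given (event_type, event_id) pairs, return {event_type: share of duplicates}."""
    totals = Counter()
    duplicates = Counter()
    seen = set()
    for event_type, event_id in events:
        totals[event_type] += 1
        key = (event_type, event_id)
        if key in seen:
            # Every repeat after the first occurrence counts as a duplicate.
            duplicates[event_type] += 1
        else:
            seen.add(key)
    return {t: duplicates[t] / totals[t] for t in totals}


if __name__ == "__main__":
    sample = [("session", 1), ("session", 1), ("session", 2), ("pageview", 1)]
    for event_type, share in duplicate_share(sample).items():
        flag = "OK" if share < 0.05 else "too many duplicates"
        print(event_type, round(share, 3), flag)
```

Dropping every repeat after the first occurrence is also the simplest correction, provided the first copy of each event is the trustworthy one.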
mpopov added a comment to T211841: Update Audiences page and Key Product Metrics deck with December 2018 Readers data.
Done
mpopov updated the task description for T211841: Update Audiences page and Key Product Metrics deck with December 2018 Readers data.
Jan 23 2019
mpopov updated subscribers of T202664: [EPIC] Count unique iOS & Android users precisely and in a privacy conscious manner that does not require opt in to send data.
@JMinor @Charlotte: after speaking with @kzimmerman we decided I should manage this project and that it has priority (at least on our team). I'll follow up with you on next steps.
mpopov renamed T202664: [EPIC] Count unique iOS & Android users precisely and in a privacy conscious manner that does not require opt in to send data from Calculate precisely number of unqiue users for IOS and Android in a privacy conscious manner that does not require opt in to send data to [EPIC] Count unique iOS & Android users precisely and in a privacy conscious manner that does not require opt in to send data.
Cool! Thank you!
mpopov added a comment to T211840: Update Audiences page and Key Product Metrics with November 2018 Readers data.
Done
mpopov updated the task description for T211840: Update Audiences page and Key Product Metrics with November 2018 Readers data.
mpopov moved T214490: page_creation_timestamp not always correct in mediawiki_history from Triage to Tracking on the Product-Analytics board.
@Abit @Ramsey-WMF in addition to T213597#4900741, here's the history of that metric with a 7-day rolling average to smooth the daily data a bit:
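For reference, the smoothing mentioned above can be sketched as a trailing 7-day rolling mean over daily values. This is a minimal Python illustration; whether the original chart used a trailing or a centered window is not stated, so the trailing window is an assumption.

```python
def rolling_mean(values, window=7):
    """Trailing rolling mean; emits one value per full window."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the value that just fell out of the window.
            running -= values[i - window]
        if i >= window - 1:
            out.append(running / window)
    return out


if __name__ == "__main__":
    daily = [1, 2, 3, 4, 5, 6, 7, 8]
    print(rolling_mean(daily))
```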
In T213597#4900903, @Neil_P._Quinn_WMF wrote:
True, but its revisions do have revision_is_deleted set, so you've already filtered them out of your query.
Jan 22 2019
Okay, here are the numbers which were calculated with the following conditions:
In T213597#4893765, @Neil_P._Quinn_WMF wrote:
I noticed one big thing: it seems like your counts of file page edits (n_edits_total, n_additions_total, etc.) include the initial edit that creates the pages, so in the end you're getting the proportion of files which have metadata added in the first 2 months, including during the initial upload.
I tried excluding those initial creations (event_timestamp != page_creation_timestamp), and it looks like the proportion goes from 99% to 50%.
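The effect of that exclusion can be illustrated with a small Python sketch. The field names event_timestamp and page_creation_timestamp come from the discussion above; the function name, the sample rows, and the omission of the 2-month window are simplifications for illustration, not the actual query.

```python
def share_of_files_edited(page_ids, edits, exclude_creation=True):
    """Share of pages with at least one qualifying edit.

    edits: dicts with page_id, event_timestamp, page_creation_timestamp.
    With exclude_creation=True, the edit that creates the page is ignored.
    """
    pages = set(page_ids)
    edited = set()
    for edit in edits:
        is_creation = edit["event_timestamp"] == edit["page_creation_timestamp"]
        if exclude_creation and is_creation:
            continue
        if edit["page_id"] in pages:
            edited.add(edit["page_id"])
    return len(edited) / len(pages)


if __name__ == "__main__":
    # Made-up rows: page 1 only has its creating edit, page 2 also has a later one.
    rows = [
        {"page_id": 1, "event_timestamp": "2018-06-01", "page_creation_timestamp": "2018-06-01"},
        {"page_id": 2, "event_timestamp": "2018-06-02", "page_creation_timestamp": "2018-06-02"},
        {"page_id": 2, "event_timestamp": "2018-06-20", "page_creation_timestamp": "2018-06-02"},
    ]
    print(share_of_files_edited([1, 2], rows, exclude_creation=False))
    print(share_of_files_edited([1, 2], rows, exclude_creation=True))
```

Since T214490 in this feed reports that page_creation_timestamp is not always correct in mediawiki_history, an equality test like this one may misclassify some edits.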
Jan 18 2019
Jan 17 2019
Thanks for clarifying! Okay, one more question for @Abit & @Ramsey-WMF just so everyone is on the same page. The statistic you want is: the % of all uploaded files which have had additions to their pages in the first 2 months after upload.
Jan 16 2019
Just updated my database of sampled pages (using December 2018 snapshot) and recounted pageviews from 2018-11-01 to 2019-01-15 (code & data over at GitHub). There has not been any change, up or down since the rollout:
mpopov closed T211191: Check in sameAs A/B test results, a subtask of T209891: Analyze results of sameAs A/B test, as Resolved.
mpopov moved T211191: Check in sameAs A/B test results from Next Up to Doing on the Product-Analytics board.
@Ramsey-WMF @Abit: hi, I would like to clarify what "metadata" includes. Here's my initial list:
Jan 15 2019
mpopov moved T211191: Check in sameAs A/B test results from Backlog to Next Up on the Product-Analytics board.
Yup, I just double checked that I could still reproduce it in 6.1.4 (yes) as a consistency check and then installed 6.2.0, tried it, and this has been fixed. Erasing articles & lists works correctly in v6.2! Thanks & good job @NHarateh_WMF & @JoeWalsh!
Dbrant awarded T211366: Android navigation refresh - understand impact on user engagement metrics a Mountain of Wealth token.
mpopov updated subscribers of T211366: Android navigation refresh - understand impact on user engagement metrics.
A huge chunk of my analysis got invalidated when Dmitry & I found out that the underlying data was faulty (T213190). Specifically, all of the analysis related to session length, number of sessions, and number of pages read per session. Unfortunately, the nature of the bug means that we won't be able to compare those metrics before & after the update.
Jan 10 2019
mpopov moved T191081: Explore event logging and metrics for Multilingual file captions from Triage to Next Up on the Product-Analytics board.
mpopov moved T213458: Analyse Android notifications release from Triage to Backlog on the Product-Analytics board.
mpopov moved T213460: Analytics schema for edit action feed from Triage to Backlog on the Product-Analytics board.
mpopov moved T213190: Duplicates sent by Android app's event logging from Triage to Tracking on the Product-Analytics board.
Jan 9 2019
mpopov updated subscribers of T212172: Provide feature parity between the wiki replicas and the Analytics Data Lake.
Never mind, per T170022#4800915 & T170022#4866564 I guess there's nobody actually managing Maps and RI is just doing maintenance and fixing critical bugs.
mpopov added a comment to T212386: Provide tools for querying MediaWiki replica databases without having to specify the shard.
By the way, on ouR side, the package 'wmf' (which I maintain) that we use for querying databases from R can be augmented to have an internal map of dbs to shards, and I can easily add a way to update that map from an external file. We would just require that the JSON/YAML/CSV/whatever file with the mapping is publicly accessible.
mpopov updated subscribers of T212172: Provide feature parity between the wiki replicas and the Analytics Data Lake.
In T212172#4853129, @chelsyx wrote:
Here are some use cases from my work for the iOS app team:
- Of course, as @Neil_P._Quinn_WMF mentioned in T161149, I will need change_tag and change_tag_def to figure out which edits are made through the iOS app.
- For Wikidata short description edits made through the app, the revision_comment_temp and comment tables are needed to figure out which language each edit is in.
Jan 8 2019
Jan 3 2019
Jan 2 2019
mpopov closed T202791: Update Audiences page and Key Product Metrics with October 2018 Readers data as Resolved.
Done
mpopov updated the task description for T202791: Update Audiences page and Key Product Metrics with October 2018 Readers data.
Instances deleted
Dec 28 2018
chelsyx awarded T204688: cloudvps: shiny-r project trusty deprecation a Like token.
Dec 19 2018
Done. Just tested everything and it's all good, so I've deleted the instances running Ubuntu Trusty. The only instances up are running Debian Stretch.
Dec 18 2018
@EBjune Hiya! Even though I'm an admin for the project I can't do much because Horizon thinks there are 2 more instances than there actually are.
Dec 17 2018
We seem to have a few ghost instances that are preventing me from launching a new Stretch instance which would replace the existing discovery-production one that runs on Trusty:
It works! :D Thanks, @aborrero!
Dec 13 2018
Thank you, @Nuria!
mpopov awarded T211706: Superset Updates a Love token.
mpopov added a comment to T211366: Android navigation refresh - understand impact on user engagement metrics.
In T211366#4820766, @Charlotte wrote:
If you can suggest ways to redesign the reading list analytics (separate ticket for us) let's do that based on the research questions we memorialised in the design deck for that feature.
mpopov closed T211606: As a user of Superset I would like it to be up-to-date so I'm not blocked by bugs that have already been fixed as Declined.
In T211606#4814078, @elukey wrote:
@mpopov please also keep in mind that things like T211605#4814020 could happen with a project that is still developing fast and does not care much about breaking existing users, so upgrades might not be easy :D
@EBjune I am waiting for someone with access to the internal WMF apt repository to make shiny-server available.
mpopov updated the task description for T202790: Update Audiences page and Key Product Metrics with September 2018 Readers data.
Dec 12 2018
mpopov updated subscribers of T211366: Android navigation refresh - understand impact on user engagement metrics.
- Plan of action
mpopov moved T211366: Android navigation refresh - understand impact on user engagement metrics from Next Up to Doing on the Product-Analytics board.
Dec 11 2018
mpopov updated subscribers of T197986: Evaluate and Quantify the state of multilingual labels on Wikidata.
Finished what I could and the findings are available over at https://people.wikimedia.org/~bearloga/reports/wikidata-incompleteness.html
mpopov added a comment to T211606: As a user of Superset I would like it to be up-to-date so I'm not blocked by bugs that have already been fixed.
I'll check with the team at our meeting later today and let you know :)
Alright, dependencies for Shiny Server resolved. Now on to the problem of Shiny Server itself:
Dec 10 2018
mpopov moved T186044: Reorganize metrics dashboard for Search Platform from Doing to Stalled on the Product-Analytics board.
I think the window on this closed.
Updating status to reflect reality.
Here's a draft of the report I sent out privately back in July: https://commons.wikimedia.org/wiki/File:Wikipedia_Android_app_multilingual_update_post-release_report.pdf
mpopov closed T184091: Multi-lingual use of Android app, a subtask of T184098: [EPIC] Analytics baseline for Android app, as Resolved.
@ovasileva feel free to re-open if you have additional questions
mpopov closed T211190: SameAs A/B test preliminary analysis, a subtask of T211191: Check in sameAs A/B test results, as Resolved.
@kzimmerman, @Neil_P._Quinn_WMF, and I need to schedule a discussion with @MNovotny_WMF to follow-up on the results
So @Gehel told me we might actually be fine. I'm testing out this theory but ran into a problem with the current puppet config for shiny_server module:
Dec 7 2018
Things are looking okay so far:
Dec 6 2018
mpopov moved T209891: Analyze results of sameAs A/B test from Triage to Backlog on the Product-Analytics board.