Using statistical analysis, Bayesian inference, machine learning, and software/data engineering to solve problems and inform decisions in Product Analytics
User Details
- User Since
- Jul 27 2015, 4:15 PM (457 w, 1 d)
- Availability
- Available
- IRC Nick
- bearloga
- LDAP User
- Bearloga
- MediaWiki User
- MPopov (WMF) [ Global Accounts ]
Wed, Apr 24
Approved! (airflow-analytics-product-admins membership)
Fri, Apr 19
@AndrewTavis_WMDE asked me for some thoughts/suggestions here :)
Thu, Apr 18
Tue, Apr 16
I think there's a simple way to accomplish this, but I don't think the end result would be particularly useful. For the end result to be useful for "I want to know how many users have this feature enabled", this will need careful planning and consideration to account for the complexity of user preferences.
@VirginiaPoundstone: Howdy! The underlying dataset will probably be the hardest part of this because of the challenges of how user preferences are stored and used. And then yeah, a Superset dashboard would be the simplest way to make that data available to the end users. It wouldn't be through Turnilo because the metrics aren't additive across dimensions, so it would need to be Superset.
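To illustrate what "not additive across dimensions" can look like, here is a minimal sketch. It assumes the metric is a distinct-user count and that one user can appear under several dimension values (for example, having the feature enabled on more than one wiki). The records, wiki names, and function names are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical (user, wiki) pairs where a feature is enabled; not real data.
ENABLED = [
    ("user_a", "enwiki"),
    ("user_a", "dewiki"),
    ("user_b", "enwiki"),
    ("user_c", "dewiki"),
]


def users_per_wiki(records):
    """Count distinct users within each wiki."""
    users = defaultdict(set)
    for user, wiki in records:
        users[wiki].add(user)
    return {wiki: len(members) for wiki, members in users.items()}


def users_overall(records):
    """Count distinct users across all wikis."""
    return len({user for user, _ in records})


per_wiki = users_per_wiki(ENABLED)
naive_total = sum(per_wiki.values())  # what summing across the dimension gives
true_total = users_overall(ENABLED)   # distinct users regardless of wiki
print(per_wiki, naive_total, true_total)
```

Summing the per-wiki counts double-counts user_a, so a tool that rolls up by adding pre-aggregated values would overstate the total. Under this assumption, the overall figure has to be computed separately for each combination of dimensions.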
Mon, Apr 15
@VirginiaPoundstone Howdy! This is not blocking anything. Thanks for checking!
Prompted by confusion in Slack
Wed, Apr 3
@Pginer-WMF: Please keep in mind that this curiosity needs to be prioritized in the context of other work requested of KC for Language. For him, the highest priority tasks are:
- T356765: Correlation between article length, number of translations within a time period, experience of users, and deletion rate.
- T341185: Design instrumentation approach for machine translation of Wikipedia article contents
So he might not get around to this for a while.
Mon, Apr 1
@KSarabia-WMF: You will probably need to submit a request through L3SC for this.
Mar 26 2024
import os
@nettrom_WMF will be developing it as an essential metric in the next FY for SDS 2.2 and will likely use that definition.
Mar 25 2024
Assigning to Megan who will work on this in Q4 to get baselines & reasonable target in before start of next FY.
Mar 22 2024
Thank you so much for looking into it, @phuedx!!!
@jwang: Hive has been / is being deprecated. I recommend you use Spark SQL instead (which also works with Iceberg tables, which hive CLI and wmfdata.hive do not).
Mar 19 2024
After examining:
- mediawiki.web_ui_actions stream config
- Schema for the MP-based instrument: /analytics/mediawiki/product_metrics/web_ui_actions/1.0.1
- The (desktop) instrument which makes a few modifications (like storing name as action_source) before submitting the interaction data with Metrics Platform.
I think it depends on the technical specifics of the migration, which @KSarabia-WMF would be able to verify. Essentially, my main concern was the potential presence of new permalink-copied events coming from the original instrument that might not be present in the new MP-based instrument, which would impact the data QA. If the new instrument automatically produces these new permalink events, then all is good and there's no concern.
Mar 17 2024
Mar 15 2024
Hey folks, is this being coordinated with the Web team who are currently in the middle of migrating *UIActionTracking to the Metrics Platform? T344274: Adopt Web Team Instrumentation to Metrics Platform
Mar 14 2024
Assigning to @Iflorez to perform the last step (as owner of the dashboard)
$ sudo -u analytics-product kerberos-run-command analytics-product hive -e "use wmf_product; drop table wikipedia_chatgpt_plugin_searches;"
OK
Time taken: 1.486 seconds
OK
Time taken: 1.972 seconds
$ sudo -u analytics-product kerberos-run-command analytics-product hdfs dfs -rm -R -skipTrash /user/analytics-product/data/wikipedia_chatgpt_plugin_searches
Deleted /user/analytics-product/data/wikipedia_chatgpt_plugin_searches
Mar 12 2024
+1 to Isaac's proposed solution of carrying wprov forward as wprov but also setting rprov=1 in case of a redirect to simplify analysis.
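A rough sketch of what that could look like when building the redirect URL. It assumes provenance travels as a wprov query parameter and that rprov=1 is added only when a wprov value is being carried across a redirect. The function name and example URLs are hypothetical, not taken from any existing code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redirect_target_with_provenance(request_url, target_url):
    """Return target_url with wprov carried over from request_url.

    Assumption: rprov=1 is added only when a wprov value is carried,
    so analysts can tell redirected provenance from direct provenance.
    """
    request_params = dict(parse_qsl(urlsplit(request_url).query))
    target = urlsplit(target_url)
    params = dict(parse_qsl(target.query))
    if "wprov" in request_params:
        params["wprov"] = request_params["wprov"]  # carried forward unchanged
        params["rprov"] = "1"  # marks that the request went through a redirect
    return urlunsplit(target._replace(query=urlencode(params)))


print(redirect_target_with_provenance(
    "https://example.org/wiki/Foo?wprov=abc1",
    "https://example.org/wiki/Bar",
))
```

Keeping the original parameter name means existing wprov-based queries continue to work, while the extra flag lets an analysis include or exclude redirected requests with a single filter.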
Mar 1 2024
Thank you @cmooney for taking this non-standard case on and helping KC out! This dual account thing has become a real thorn for KC so I'm glad we're on a path to get it taken care of.
Approved from my side, both for the request in general and analytics-product-users membership :)
Feb 29 2024
I would advise against using "session" in the name. When Jason and I were writing https://docs.google.com/document/d/100B4c1GqHHCAGnWLbDrgMQIJzxNI8vuf7jEMKG7DKeg/edit?usp=sharing we surveyed the landscape of schemas/instruments and saw that everything was just a session and that there were multiple levels of sessions: https://docs.google.com/document/d/11xTwL_j0BWgfdtZ_GOIlxg22rLP2lRraaA2c_-uMwJQ/edit#
Feb 28 2024
Okay, so it's been a few years now and this bug still exists and impacts the quality of our analyses substantially (especially for Future Audiences experiments that are aimed at mobile users) and we're not any closer to instrumenting pageviews.
As with the other task, Irene has a to-do this week to wrap this up by updating https://meta.wikimedia.org/wiki/Research_and_Decision_Science/Data_glossary#Contributor_metrics
Irene has a to-do this week to wrap this up by updating https://meta.wikimedia.org/wiki/Research_and_Decision_Science/Data_glossary#Contributor_metrics
Feb 23 2024
@SBisson: root-level dt is the client-side timestamp, while meta.dt is the server-side timestamp. Both are useful to have for analysis.
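As one example of why having both is useful, here is a minimal sketch that compares the two timestamps for a single event. It assumes both fields are ISO 8601 strings. The event values and helper names are made up for illustration.

```python
from datetime import datetime


def parse_iso(ts):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def client_server_lag_seconds(event):
    """Seconds between the client-side dt and the server-side meta.dt."""
    client_dt = parse_iso(event["dt"])
    server_dt = parse_iso(event["meta"]["dt"])
    return (server_dt - client_dt).total_seconds()


# Hypothetical event; the timestamps are invented for this example.
event = {
    "dt": "2024-02-23T10:00:00Z",
    "meta": {"dt": "2024-02-23T10:00:02.500Z"},
}
print(client_server_lag_seconds(event))
```

A large or negative difference can point to a skewed client clock or to events that were queued before being sent, which is worth knowing before ordering events by the client-side timestamp.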
Feb 22 2024
On contribution side:
In some cases the instrument sends event data containing activity_session_id and dt fields which are not present in version 1.2.0 of the contribution schema: https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/jsonschema/analytics/mediawiki/wikistories_contribution_event/1.2.0.yaml so a lot of events aren't passing schema validation.
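A quick way to spot such events is to compare each event's top-level fields against the properties the schema declares. The sketch below assumes the schema rejects undeclared properties. The declared-property set is a placeholder, not the real 1.2.0 property list, and the sample event is invented.

```python
def unexpected_fields(event, declared_properties):
    """Return the top-level event fields that the schema does not declare."""
    return sorted(set(event) - set(declared_properties))


# Placeholder set standing in for the schema's declared properties.
# "example_field" is hypothetical; consult the schema YAML for the real list.
DECLARED_PROPERTIES = {"$schema", "meta", "example_field"}

# Hypothetical event carrying the two extra fields mentioned above.
event = {
    "$schema": "/analytics/mediawiki/wikistories_contribution_event/1.2.0",
    "meta": {"stream": "example.stream"},
    "example_field": "value",
    "activity_session_id": "abc123",
    "dt": "2024-02-22T12:00:00Z",
}
print(unexpected_fields(event, DECLARED_PROPERTIES))
```

Running a check like this over a sample of rejected events would show whether activity_session_id and dt are the only offending fields or whether other fields are also missing from the schema.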
Feb 19 2024
Oh that's brilliant! Thanks so much for looking into it and shining light on this @phuedx!
Feb 16 2024
The decision has been made (to go with Vue/Codex).
Feb 15 2024
Thank you, Connie!
Feb 14 2024
Unsure what this request was for.
The ChatGPT plug-in experiment has concluded. We'll just need to do some clean-up (Phab task coming later).
https://docs.google.com/document/d/1rzD_rJEzH3HmyIekmtojfiwyXqPVdEIHBviFA8_HgH0/edit#heading=h.w7kdor9r67qo is as final as it will get. We'll need to adopt it widely on the team (and do more peer reviews in the first place) but right now there's nothing left to do here.
Feb 13 2024
SELECT
  normalized_host.project,
  namespace_id IS NULL AS ns_id_is_null,
  element_at(x_analytics_map, 'ns') IS NULL AS x_ns_is_null,
  page_id IS NULL AS page_id_is_null,
  element_at(x_analytics_map, 'page_id') IS NULL AS x_page_id_is_null,
  element_at(x_analytics_map, 'rev_id') IS NULL AS x_rev_id_is_null,
  COUNT(1) AS n_pageviews
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = 2024 AND month = 2 AND day = 12 AND hour = 1
  AND is_pageview
  AND uri_host IN ('en.wikipedia.org', 'en.m.wikipedia.org', 'commons.wikimedia.org', 'commons.m.wikimedia.org')
GROUP BY 1, 2, 3, 4, 5, 6
ORDER BY project, ns_id_is_null, x_ns_is_null, page_id_is_null, x_page_id_is_null, x_rev_id_is_null
@Mayakp.wiki: you and others are already in product-analytics-announce@, but the alerts aren't sent to that.
Feb 9 2024
SELECT user_agent
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = 2024 AND month = 2 AND day = 9 AND hour = 14
  AND uri_host = 'www.wikidata.org'
  AND is_pageview
  AND namespace_id = 640
  AND agent_type = 'spider'
  AND user_agent = '-'
LIMIT 10
Just tried the query in the description with some recent dates, but the dates I picked didn't have any requests with '-' UA strings, so it's hard to know if the problem persists. The query runs fine with user_agent = '-'; it just returns no data. I just made a request to https://www.wikidata.org/wiki/EntitySchema:E1 (according to https://www.wikidata.org/wiki/Help:Namespaces namespace 640 is EntitySchema, so I just picked one):
Feb 8 2024
@Jelto: This is to access dashboards in Superset that do access private data, no need for Hadoop/analytics client servers access. So no SSH public key needed. See https://wikitech.wikimedia.org/wiki/Analytics/Data_access
Feb 7 2024
@cchen: How about Superset & Hue?
Tagging DPE SRE in case this is specific to those tools.
Not yet. I believe @AndrewTavis_WMDE will be sharing some findings from WMDE side soon.
Thanks for merging & deploying, @JAllemandou!
Feb 6 2024
Thanks so much @MoritzMuehlenhoff!!
@BTullis: heads-up that @cchen won't be able to until T356645: Production data & systems access restoration for Connie Chen is resolved
@Pginer-WMF: Does this question relate to any of the hypothesis work your team is doing? If so, can you please share with hypothesis?
Feb 5 2024
@VirginiaPoundstone @WDoranWMF I renamed this task to refer to the (now deprecated) monoschema version of Metrics Platform. The partial migrations of the instruments were removed (see T351337, T351335).
Feb 1 2024
Shay met with me for consultation on this.
Before closing this task / withdrawing the request I'd like to get a confirmation from Goran whether the level of access is still needed and if it's just for the WMDE pipelines, in which case it would make sense to prioritize migrating those to WMDE's Airflow instance so that we can eventually revoke the unrestricted access to highly sensitive information.
Jan 31 2024
I'm also sorry for making the request without knowing about the prior request.
We should add expiry_date and expiry_contact fields to reflect the NDA
Thank you @AndrewTavis_WMDE for alerting us to this. Pinging @Manuel for visibility.
Jan 29 2024
This leads me to conclude that use of the MediaWiki core JS HTML docs is minimal, while interest in the docs may be high.
Jan 26 2024
@nettrom_WMF Thank you for sharing that code! I recently used it in T353666 and it was very helpful! Just wanted to show my appreciation.
Jan 24 2024
@KCVelaga_WMF Sam and I just chatted about this and enwiki might actually be good enough as a proxy for the service in general, and it's unlikely for the service to break on one particular wiki and not others.