@ngkountas @KCVelaga_WMF: Perhaps this will be useful: https://wikitech.wikimedia.org/wiki/Metrics_Platform/How_to/Create_An_Instrument#In_JavaScript
Today
Fri, May 17
Thu, May 16
@CDanis: Can you please paste the Spark code / Spark SQL query you used for reproducibility?
Wed, May 15
@Krinkle: Thank you for sharing the results of your queries in a manner consistent with the Data Publication Guidelines.
Tue, May 14
Mon, May 13
Thu, May 9
Tue, May 7
Thu, May 2
@AUgolnikova-WMF: Can you please fill out the details in the description to help me understand if/how this should be prioritized?
Wed, Apr 24
Approved! (airflow-analytics-product-admins membership)
Approved! (airflow-analytics-product-admins membership)
Apr 19 2024
@AndrewTavis_WMDE asked me for some thoughts/suggestions here :)
Apr 18 2024
Apr 16 2024
I think there's a simple way to accomplish this, but I don't think the end result would be particularly useful. For the end result to be useful (that is, to answer "I want to know how many users have this feature enabled"), this will need careful planning and consideration to account for the complexity of user preferences.
@VirginiaPoundstone: Howdy! The underlying dataset will probably be the hardest part of this because of the challenges of how user preferences are stored and used. And then yeah, a Superset dashboard would be the simplest way to make that data available to the end users. It wouldn't be through Turnilo because the metrics aren't additive across dimensions, so it would need to be Superset.
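A minimal sketch of one common reason a metric is not additive across dimensions, assuming the metric is a distinct-user count. The rows, wiki names, and user IDs below are hypothetical and only illustrate the arithmetic; they are not drawn from any real dataset.

```python
# Hypothetical (wiki, user_id) pairs for users who have a feature enabled.
rows = [
    ("enwiki", 1),
    ("enwiki", 2),
    ("dewiki", 2),  # user 2 has the feature enabled on two wikis
    ("dewiki", 3),
]

def distinct_users(pairs):
    """Count distinct users across all given (wiki, user_id) pairs."""
    return len({user_id for _, user_id in pairs})

def distinct_users_per_wiki(pairs):
    """Count distinct users separately for each wiki."""
    per_wiki = {}
    for wiki, user_id in pairs:
        per_wiki.setdefault(wiki, set()).add(user_id)
    return {wiki: len(users) for wiki, users in per_wiki.items()}

per_wiki = distinct_users_per_wiki(rows)
summed = sum(per_wiki.values())   # adds up the per-wiki counts
overall = distinct_users(rows)    # counts each user once
```

Here the per-wiki counts sum to more than the overall distinct count because one user appears under two wikis, which is why a tool that sums pre-aggregated values across dimensions would overstate the total.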
Apr 15 2024
In T252227#9655162, @dr0ptp4kt wrote: Okay, if I understand correctly, then the idea would be to...
- Continue "allowing" tagging of wprov for non-200 HTTP responses. It's mainly important people don't accidentally count those as pageviews when they're not pageviews (i.e., they should be using is_pageview or something similarly precise). It's useful to be able to quickly zoom in on these sorts of requests anyway, so even for a 30x response it is nice to have.
- If there's a 30x response for a redirect from desktop to mobile web and the URL came bearing a wprov, add that same wprov parameter name-value pair and also add the parameter name-value pair of rprov=1 in the target redirect URL (that's the thing that will be emitted in the Location: header).
Do I understand correctly?
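The redirect handling described in the second bullet can be sketched roughly as follows. This is only an illustration using Python's standard library, not the actual redirect code; the function name `build_redirect_location` and the example wprov value are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def build_redirect_location(request_url: str, target_url: str) -> str:
    """Return the Location value for a redirect, carrying wprov forward.

    If the incoming URL has a wprov parameter, copy it onto the target URL
    and add rprov=1 to mark that the request arrived via a redirect.
    """
    request_params = dict(parse_qsl(urlsplit(request_url).query))
    wprov = request_params.get("wprov")
    if wprov is None:
        # Nothing to carry forward; leave the target untouched.
        return target_url

    parts = urlsplit(target_url)
    # Keep the target's existing parameters, minus any stale wprov/rprov.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in ("wprov", "rprov")
    ]
    params.append(("wprov", wprov))
    params.append(("rprov", "1"))
    return urlunsplit(parts._replace(query=urlencode(params)))

location = build_redirect_location(
    "https://en.wikipedia.org/wiki/Cat?wprov=abc1",  # hypothetical wprov value
    "https://en.m.wikipedia.org/wiki/Cat",
)
```

With this shape, an analyst could filter on rprov=1 to separate redirected requests from direct ones while still grouping by the original wprov value.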
@VirginiaPoundstone Howdy! This is not blocking anything. Thanks for checking!
Prompted by confusion in Slack
Apr 3 2024
@Pginer-WMF: Please keep in mind that this curiosity needs to be prioritized in the context of other work requested of KC for Language. For him, the highest priority tasks are:
- T356765: Correlation between article length, number of translations within a time period, experience of users, and deletion rate.
- T341185: Design instrumentation approach for machine translation of Wikipedia article contents
So he might not get around to this for a while.
Apr 1 2024
@KSarabia-WMF: You will probably need to submit a request through L3SC for this.
Mar 26 2024
@nettrom_WMF will be developing it as an essential metric in the next FY for SDS 2.2 and will likely use that definition.
Mar 25 2024
Assigning to Megan who will work on this in Q4 to get baselines & reasonable target in before start of next FY.
Mar 22 2024
Thank you so much for looking into it, @phuedx!!!
@jwang: Hive has been / is being deprecated. I recommend you use Spark SQL instead (which also works with Iceberg tables, which hive CLI and wmfdata.hive do not).
Mar 19 2024
After examining:
- mediawiki.web_ui_actions stream config
- Schema for the MP-based instrument: /analytics/mediawiki/product_metrics/web_ui_actions/1.0.1
- The (desktop) instrument which makes a few modifications (like storing name as action_source) before submitting the interaction data with Metrics Platform.
I think it depends on the technical specifics of the migration, which @KSarabia-WMF would be able to verify. Essentially, my main concern was the potential presence of new permalink-copied events coming from the original instrument that might not be present in the new MP-based instrument, which would impact the data QA. If the new instrument automatically produces these new permalink events, then all is good and there's no concern.
Mar 17 2024
Mar 15 2024
Hey folks, is this being coordinated with the Web team who are currently in the middle of migrating *UIActionTracking to the Metrics Platform? T344274: Adopt Web Team Instrumentation to Metrics Platform
Mar 14 2024
Assigning to @Iflorez to perform the last step (as owner of the dashboard)
$ sudo -u analytics-product kerberos-run-command analytics-product hive -e "use wmf_product; drop table wikipedia_chatgpt_plugin_searches;"
OK
Time taken: 1.486 seconds
OK
Time taken: 1.972 seconds
$ sudo -u analytics-product kerberos-run-command analytics-product hdfs dfs -rm -R -skipTrash /user/analytics-product/data/wikipedia_chatgpt_plugin_searches
Deleted /user/analytics-product/data/wikipedia_chatgpt_plugin_searches
Mar 12 2024
+1 to Isaac's proposed solution of carrying wprov forward as wprov but also setting rprov=1 in case of a redirect to simplify analysis.
Mar 1 2024
Thank you @cmooney for taking this non-standard case on and helping KC out! This dual account thing has become a real thorn for KC so I'm glad we're on a path to get it taken care of.
Approved from my side, both for the request in general and analytics-product-users membership :)
Feb 29 2024
I would advise against using "session" in the name. When Jason and I were writing https://docs.google.com/document/d/100B4c1GqHHCAGnWLbDrgMQIJzxNI8vuf7jEMKG7DKeg/edit?usp=sharing we surveyed the landscape of schemas/instruments and saw that everything was just called a session, with multiple levels of sessions: https://docs.google.com/document/d/11xTwL_j0BWgfdtZ_GOIlxg22rLP2lRraaA2c_-uMwJQ/edit#
Feb 28 2024
Okay, so it's been a few years now and this bug still exists and impacts the quality of our analyses substantially (especially for Future Audiences experiments that are aimed at mobile users) and we're not any closer to instrumenting pageviews.
As with the other task, Irene has a to-do this week to wrap this up by updating https://meta.wikimedia.org/wiki/Research_and_Decision_Science/Data_glossary#Contributor_metrics
Irene has a to-do this week to wrap this up by updating https://meta.wikimedia.org/wiki/Research_and_Decision_Science/Data_glossary#Contributor_metrics
Feb 23 2024
@SBisson: root-level dt is the client-side timestamp, while meta.dt is the server-side timestamp. Both are useful to have for analysis.
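As a rough illustration of why both timestamps are useful, the sketch below computes the gap between the client-side `dt` and the server-side `meta.dt` for a single event. The event dictionary and the helper names are hypothetical, and the sketch assumes both fields are ISO 8601 strings in UTC.

```python
from datetime import datetime

def parse_dt(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # datetime.fromisoformat() only accepts 'Z' from Python 3.11 onward,
    # so normalize it to an explicit offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def client_server_offset_seconds(event: dict) -> float:
    """Seconds between the client timestamp (dt) and server timestamp (meta.dt)."""
    client_dt = parse_dt(event["dt"])
    server_dt = parse_dt(event["meta"]["dt"])
    return (server_dt - client_dt).total_seconds()

# Hypothetical event with only the two timestamp fields filled in.
example_event = {
    "dt": "2024-02-23T10:00:00Z",
    "meta": {"dt": "2024-02-23T10:00:05Z"},
}
offset = client_server_offset_seconds(example_event)
```

A large or negative offset can hint at client clock skew or delayed submission, which is one reason to keep both fields around for analysis.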
Feb 22 2024
On contribution side:
In some cases the instrument sends event data containing activity_session_id and dt fields which are not present in version 1.2.0 of the contribution schema: https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/jsonschema/analytics/mediawiki/wikistories_contribution_event/1.2.0.yaml so a lot of events aren't passing schema validation.
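A minimal sketch of how such events could be flagged, assuming the schema rejects properties it does not declare. The `ALLOWED_FIELDS` set and the sample event are hypothetical placeholders; the real property list should be read from the 1.2.0 schema file linked above.

```python
# Hypothetical stand-in for the properties declared by the schema;
# replace with the actual property names from the schema file.
ALLOWED_FIELDS = {"$schema", "meta", "field_a", "field_b"}

def unexpected_fields(event: dict, allowed: set) -> set:
    """Return top-level event fields that the schema does not declare."""
    return set(event) - allowed

# Hypothetical event carrying the two extra fields mentioned above.
sample_event = {
    "$schema": "/analytics/mediawiki/wikistories_contribution_event/1.2.0",
    "meta": {},
    "field_a": "example",
    "activity_session_id": "abc123",
    "dt": "2024-02-22T12:00:00Z",
}
extras = unexpected_fields(sample_event, ALLOWED_FIELDS)
```

This only checks top-level keys; it is not a substitute for full JSON Schema validation, but it is enough to spot which extra fields an instrument is sending.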
Feb 19 2024
Oh that's brilliant! Thanks so much for looking into it and shining light on this @phuedx!
Feb 16 2024
The decision has been made (to go with Vue/Codex).
Feb 15 2024
Thank you, Connie!
Feb 14 2024
Unsure what this request was for.
The ChatGPT plug-in experiment has concluded. We'll just need to do some clean-up (Phab task coming later).
https://docs.google.com/document/d/1rzD_rJEzH3HmyIekmtojfiwyXqPVdEIHBviFA8_HgH0/edit#heading=h.w7kdor9r67qo is as final as it will get. We'll need to adopt it widely on the team (and do more peer reviews in the first place) but right now there's nothing left to do here.
Feb 13 2024
SELECT
  normalized_host.project,
  namespace_id IS NULL AS ns_id_is_null,
  element_at(x_analytics_map, 'ns') IS NULL AS x_ns_is_null,
  page_id IS NULL AS page_id_is_null,
  element_at(x_analytics_map, 'page_id') IS NULL AS x_page_id_is_null,
  element_at(x_analytics_map, 'rev_id') IS NULL AS x_rev_id_is_null,
  COUNT(1) AS n_pageviews
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = 2024 AND month = 2 AND day = 12 AND hour = 1
  AND is_pageview
  AND uri_host IN ('en.wikipedia.org', 'en.m.wikipedia.org', 'commons.wikimedia.org', 'commons.m.wikimedia.org')
GROUP BY 1, 2, 3, 4, 5, 6
ORDER BY project, ns_id_is_null, x_ns_is_null, page_id_is_null, x_page_id_is_null, x_rev_id_is_null
@Mayakp.wiki: you and others are already in product-analytics-announce@, but the alerts aren't sent to that.
Feb 9 2024
SELECT user_agent
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = 2024 AND month = 2 AND day = 9 AND hour = 14
  AND uri_host = 'www.wikidata.org'
  AND is_pageview
  AND namespace_id = 640
  AND agent_type = 'spider'
  AND user_agent = '-'
LIMIT 10
I just tried the query in the description with some recent dates, but the dates I picked didn't have any requests with '-' UA strings, so it's hard to know if the problem persists. The query runs fine with user_agent = '-'; it just returns no data. I just made a request to https://www.wikidata.org/wiki/EntitySchema:E1 (according to https://www.wikidata.org/wiki/Help:Namespaces namespace 640 is EntitySchema, so I just picked one):
Feb 8 2024
@Jelto: This is to access dashboards in Superset that do access private data, no need for Hadoop/analytics client servers access. So no SSH public key needed. See https://wikitech.wikimedia.org/wiki/Analytics/Data_access
Feb 7 2024
@cchen: How about Superset & Hue?
Tagging DPE SRE in case this is specific to those tools.
Not yet. I believe @AndrewTavis_WMDE will be sharing some findings from WMDE side soon.
Thanks for merging & deploying, @JAllemandou!
Feb 6 2024
Thanks so much @MoritzMuehlenhoff!!
@BTullis: heads-up that @cchen won't be able to until T356645: Production data & systems access restoration for Connie Chen is resolved
@Pginer-WMF: Does this question relate to any of the hypothesis work your team is doing? If so, can you please share with hypothesis?
Feb 5 2024
@VirginiaPoundstone @WDoranWMF I renamed this task to refer to the (now deprecated) monoschema version of Metrics Platform. The partial migrations of the instruments were removed (see T351337, T351335).