Today

mpopov added a comment to T363685: MinT MVP: Implement instrumentation for key events .

@ngkountas @KCVelaga_WMF: Perhaps this will be useful: https://wikitech.wikimedia.org/wiki/Metrics_Platform/How_to/Create_An_Instrument#In_JavaScript

Tue, May 21, 3:19 PM · Patch-For-Review, Language-Team (Language-2024-April-June), MinT

Fri, May 17

mpopov updated the task description for T365203: [Data Quality] Implement wiki completeness check for MediaWiki History.
Fri, May 17, 1:30 PM · Data-Engineering

Thu, May 16

mpopov renamed T365203: [Data Quality] Implement wiki completeness check for MediaWiki History from [Data Quality] Implement completeness check for MediaWiki History to [Data Quality] Implement wiki completeness check for MediaWiki History.
Thu, May 16, 9:12 PM · Data-Engineering
mpopov created T365203: [Data Quality] Implement wiki completeness check for MediaWiki History.
Thu, May 16, 9:10 PM · Data-Engineering
mpopov added a comment to T365197: ISPDatabaseReader null pointer exception.

@CDanis: Can you please paste the Spark code / Spark SQL query you used for reproducibility?

Thu, May 16, 8:58 PM · Data-Platform-SRE (2024.05.06 - 2024.05.26), Patch-For-Review, Data-Engineering
mpopov created T365188: Portuguese and Spanish Wikipedia access for Mikhail Popov and Connie Chen.
Thu, May 16, 6:08 PM · Search-Console-access-request
mpopov changed the status of T304086: wmfdata-r v2 should mainly be a wrapper for wmfdata-py from Open to Stalled.
Thu, May 16, 5:55 PM · Product-Analytics
mpopov created T365144: Application Security Review Request : Quarto.
Thu, May 16, 1:14 PM · Product-Analytics, secscrum, Security, Application Security Reviews

Wed, May 15

mpopov added a comment to T342267: Investigate surprising "10% Other" portion of Analytics Browsers report.

@Krinkle: Thank you for sharing the results of your queries in a manner consistent with the Data Publication Guidelines.

Wed, May 15, 2:32 PM · Analytics-Data-Problem, Data Products (Data Products Sprint 14), MediaWiki-Platform-Team (Radar), Data-Engineering, Data-Engineering-Dashiki

Tue, May 14

mpopov awarded T363616: Explore citations included with revisions by editor experience and revert rate a Love token.
Tue, May 14, 7:31 PM · Wikimedia-Hackathon-2024
mpopov updated the task description for T364398: Add MW table 'cu_log' to data lake.
Tue, May 14, 4:12 PM · Data-Engineering, Data-Platform

Mon, May 13

mpopov updated the task description for T364547: Year-end report on FY2023-24 KR WE2.1.
Mon, May 13, 8:24 PM · Product-Analytics, FY2023-24-WE 2.1 Typography and palette customizations, Web-Team-Backlog

Thu, May 9

mpopov triaged T363238: Create measurement plan and instrumentation spec for IP reputation instrumentation as Medium priority.
Thu, May 9, 7:08 PM · Product-Analytics (Kanban)

Tue, May 7

mpopov triaged T364398: Add MW table 'cu_log' to data lake as Medium priority.
Tue, May 7, 3:34 PM · Data-Engineering, Data-Platform

Thu, May 2

mpopov added a comment to T362211: Baseline metrics for logo detection on upload wizard.

@AUgolnikova-WMF: Can you please fill out the details in the description to help me understand if/how this should be prioritized?

Thu, May 2, 7:00 PM · Product-Analytics, Structured-Data-Backlog
mpopov placed T362211: Baseline metrics for logo detection on upload wizard up for grabs.
Thu, May 2, 6:56 PM · Product-Analytics, Structured-Data-Backlog

Wed, Apr 24

mpopov added a comment to T363360: Requesting membership in airflow-analytics-product-admins for hghani.

Approved! (airflow-analytics-product-admins membership)

Wed, Apr 24, 6:22 PM · Movement-Insights, SRE, SRE-Access-Requests
mpopov added a comment to T363288: Requesting membership in airflow-analytics-product-admins for nshahquinn-wmf.

Approved! (airflow-analytics-product-admins membership)

Wed, Apr 24, 6:22 PM · Movement-Insights, SRE, SRE-Access-Requests

Apr 19 2024

mpopov updated subscribers of T362849: [Analytics] Segments of Wikidata's data over time.

@AndrewTavis_WMDE asked me for some thoughts/suggestions here :)

Apr 19 2024, 7:16 PM · Wikidata Analytics (Kanban), Wikidata

Apr 18 2024

mpopov triaged T362874: Gather baseline data for time-to-data-collection as High priority.
Apr 18 2024, 12:15 PM · Product-Analytics (Kanban)
mpopov edited projects for T362874: Gather baseline data for time-to-data-collection, added: Product-Analytics (Kanban); removed Product-Analytics.
Apr 18 2024, 12:14 PM · Product-Analytics (Kanban)
mpopov created T362874: Gather baseline data for time-to-data-collection.
Apr 18 2024, 12:14 PM · Product-Analytics (Kanban)

Apr 16 2024

mpopov added a comment to T361684: Create per-wiki user preference metrics.

a simple way to accomplish this

I think there's a simple way to accomplish this, but I don't think the end result would be particularly useful. For the end result to be useful (as in "I want to know how many users have this feature enabled"), this will need careful planning and consideration to account for the complexity of user preferences.

Apr 16 2024, 2:59 PM · Product-Analytics, Data Products
mpopov added a comment to T361684: Create per-wiki user preference metrics.

@VirginiaPoundstone: Howdy! The underlying dataset will probably be the hardest part of this because of the challenges of how user preferences are stored and used. And then yeah, a Superset dashboard would be the simplest way to make that data available to the end users. It wouldn't be through Turnilo because the metrics aren't additive across dimensions, so it would need to be Superset.
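
To illustrate what "not additive across dimensions" can mean, here is a minimal Python sketch. It assumes the metric is a distinct-user count, which is one common reason for non-additivity and not necessarily the exact situation in this task. The rows, wiki names, and user IDs are hypothetical.

```python
# Hypothetical rows: (wiki, platform, user_id) for users with a preference enabled.
rows = [
    ("enwiki", "desktop", "u1"),
    ("enwiki", "mobile", "u1"),   # the same user appears on both platforms
    ("enwiki", "mobile", "u2"),
    ("dewiki", "desktop", "u3"),
]


def distinct_users(rows, wiki=None, platform=None):
    """Count distinct users among rows matching the optional filters."""
    return len({
        user
        for w, p, user in rows
        if (wiki is None or w == wiki) and (platform is None or p == platform)
    })


# Per-platform counts for one wiki, computed separately.
per_platform = {
    p: distinct_users(rows, wiki="enwiki", platform=p)
    for p in ("desktop", "mobile")
}

# Summing the per-platform counts double-counts users seen on both platforms.
summed = sum(per_platform.values())

# The correct wiki-level count has to be computed from the underlying rows.
actual = distinct_users(rows, wiki="enwiki")

print(per_platform, summed, actual)
```

A tool that rolls up a dimension by summing pre-aggregated values would report the summed figure rather than the actual one, which is why a metric like this needs to be recomputed per slice.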

Apr 16 2024, 2:56 PM · Product-Analytics, Data Products

Apr 15 2024

mpopov added a comment to T252227: Mobile redirects drop provenance parameters.

Okay, if I understand correctly, then the idea would be to...

  1. Continue "allowing" tagging of wprov for non-200 HTTP responses. It's mainly important people don't accidentally count those as pageviews when they're not pageviews (i.e., they should be using is_pageview or something similarly precise). It's useful to be able to quickly zoom in on these sorts of requests anyway, so even for a 30x response it is nice to have.
  2. If there's a 30x response for a redirect from desktop to mobile web and the URL came bearing a wprov, add that same wprov parameter name-value pair and also add the parameter name-value pair of rprov=1 in the target redirect URL (that's the thing that will be emitted in the Location: header).

Do I understand correctly?
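
The second point could be sketched roughly as follows in Python. This is only an illustration of the URL rewriting, not how the redirect is actually implemented at the edge; the function name `build_redirect_target`, the example wprov value, and the choice to preserve the rest of the query string are all assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_redirect_target(request_url: str, mobile_host: str) -> str:
    """Return the URL to put in the Location: header of the 30x response.

    Hypothetical helper: swaps the host for the mobile host, carries the
    query string (including wprov) forward, and appends rprov=1 when the
    original request had a wprov parameter.
    """
    parts = urlsplit(request_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    names = {name for name, _ in params}

    # Only mark the request as redirected when there is provenance to carry.
    if "wprov" in names and "rprov" not in names:
        params.append(("rprov", "1"))

    return urlunsplit(
        (parts.scheme, mobile_host, parts.path, urlencode(params), parts.fragment)
    )


target = build_redirect_target(
    "https://en.wikipedia.org/wiki/Cat?wprov=sfla1", "en.m.wikipedia.org"
)
print(target)
```

With this shape, an analysis can filter on rprov=1 to separate the redirected request from the original one while still attributing both to the same wprov value.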

Apr 15 2024, 8:41 PM · Data-Engineering, Data Pipelines, Traffic-Icebox, SRE
mpopov added a comment to T355182: Past edits increase in wmf.edit_hourly with every new snapshot.

@VirginiaPoundstone Howdy! This is not blocking anything. Thanks for checking!

Apr 15 2024, 3:02 PM · Data Products, Data-Engineering
mpopov added a comment to T362535: Outdated referral traffic info in FAQ.

Prompted by confusion in Slack

Apr 15 2024, 2:15 PM · Tool-Pageviews
mpopov created T362535: Outdated referral traffic info in FAQ.
Apr 15 2024, 2:14 PM · Tool-Pageviews

Apr 3 2024

mpopov triaged T361086: Fetch information about 2 millionth CX/SX edit/article as Low priority.

@Pginer-WMF: Please keep in mind that this curiosity needs to be prioritized in the context of other work requested of KC for Language. For him, the highest priority tasks are:

So he might not get around to this for a while.

Apr 3 2024, 1:47 PM · Product-Analytics (Kanban), Language-analytics

Apr 1 2024

mpopov added a comment to T361475: Add mwclientpreferences cookie to Wikimedia Cookie Statement.

@KSarabia-WMF: You will probably need to submit a request through L3SC for this.

Apr 1 2024, 2:40 PM · Web-Team-Backlog

Mar 26 2024

mpopov closed T361024: NEW BUG REPORT SSL certificate verification error when using internal API endpoints from conda-analytics and Jupyter on stat host as Resolved.
import os
Mar 26 2024, 8:57 PM · Data-Platform-SRE, Data-Platform
mpopov added a comment to T360829: WE 1.2: Establish baseline for constructive activation.

@nettrom_WMF will be developing it as an essential metric in the next FY for SDS 2.2 and will likely use that definition.

Mar 26 2024, 3:43 PM · Product-Analytics (Kanban), Editing-team (Tracking)
mpopov updated the task description for T361024: NEW BUG REPORT SSL certificate verification error when using internal API endpoints from conda-analytics and Jupyter on stat host.
Mar 26 2024, 3:14 PM · Data-Platform-SRE, Data-Platform
mpopov triaged T361024: NEW BUG REPORT SSL certificate verification error when using internal API endpoints from conda-analytics and Jupyter on stat host as Low priority.
Mar 26 2024, 3:13 PM · Data-Platform-SRE, Data-Platform

Mar 25 2024

mpopov moved T360829: WE 1.2: Establish baseline for constructive activation from Triage to Upcoming Quarter on the Product-Analytics board.
Mar 25 2024, 5:28 PM · Product-Analytics (Kanban), Editing-team (Tracking)
mpopov assigned T360829: WE 1.2: Establish baseline for constructive activation to MNeisler.

Assigning to Megan who will work on this in Q4 to get baselines & reasonable target in before start of next FY.

Mar 25 2024, 5:27 PM · Product-Analytics (Kanban), Editing-team (Tracking)

Mar 22 2024

mpopov added a comment to T360829: WE 1.2: Establish baseline for constructive activation.

From https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Newcomer_tasks/Experiment_analysis,_November_2020#Detailed_findings

Mar 22 2024, 11:19 PM · Product-Analytics (Kanban), Editing-team (Tracking)
mpopov added a comment to T346350: Add revision ID to X-Analytics header.

Thank you so much for looking into it, @phuedx!!!

Mar 22 2024, 11:09 PM · MW-1.42-notes (1.42.0-wmf.17; 2024-02-06), Moderator-Tools-Team (Kanban), Traffic, good first task, Data Products, Product-Analytics, Automoderator
mpopov added a comment to T359993: Slowdown when querying via Hive.

@jwang: Hive has been / is being deprecated. I recommend you use Spark SQL instead (which also works with Iceberg tables, which hive CLI and wmfdata.hive do not).

Mar 22 2024, 5:06 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Data-Platform

Mar 19 2024

mpopov added a comment to T359182: Instrument permalink timestamps.

After examining:

Mar 19 2024, 6:02 PM · Editing QA, Product-Analytics (Kanban), MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Patch-For-Review, Editing-team (Kanban Board), DiscussionTools
mpopov updated subscribers of T359182: Instrument permalink timestamps.

I think it depends on the technical specifics of the migration, which @KSarabia-WMF would be able to verify. Essentially, my main concern was the potential presence of new permalink-copied events coming from the original instrument that might not be present in the new MP-based instrument, which would impact the data QA. If the new instrument automatically produces these new permalink events, then all is good and there's no concern.

Mar 19 2024, 12:56 PM · Editing QA, Product-Analytics (Kanban), MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Patch-For-Review, Editing-team (Kanban Board), DiscussionTools

Mar 17 2024

mpopov awarded T347970: [L] MachineVision: archive and remove all events and event schemas a Like token.
Mar 17 2024, 1:06 AM · Patch-For-Review, Structured-Data-Backlog (Current Work), MachineVision

Mar 15 2024

mpopov added a comment to T359182: Instrument permalink timestamps.

Hey folks, is this being coordinated with the Web team who are currently in the middle of migrating *UIActionTracking to the Metrics Platform? T344274: Adopt Web Team Instrumentation to Metrics Platform

Mar 15 2024, 1:41 PM · Editing QA, Product-Analytics (Kanban), MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Patch-For-Review, Editing-team (Kanban Board), DiscussionTools

Mar 14 2024

mpopov added a comment to T359440: [REQUEST] Product Analytics data instrumentation: CTR, Cirrus Search data and Superset dashboarding thereof.

Queries done: https://gitlab.wikimedia.org/repos/product-analytics/data-pipelines/-/tree/main/citation_needed/searches

Mar 14 2024, 7:41 PM · Product-Analytics (Kanban)
mpopov reassigned T360129: Decommission Wikipedia ChatGPT Plugin searches data from mpopov to Iflorez.

Assigning to @Iflorez to perform the last step (as owner of the dashboard)

Mar 14 2024, 4:56 PM · Product-Analytics (Kanban)
mpopov updated the task description for T360129: Decommission Wikipedia ChatGPT Plugin searches data.
Mar 14 2024, 4:55 PM · Product-Analytics (Kanban)
mpopov added a comment to T360129: Decommission Wikipedia ChatGPT Plugin searches data.
$ sudo -u analytics-product kerberos-run-command analytics-product hive -e "use wmf_product; drop table wikipedia_chatgpt_plugin_searches;"
OK
Time taken: 1.486 seconds
OK
Time taken: 1.972 seconds
$ sudo -u analytics-product kerberos-run-command analytics-product hdfs dfs -rm -R -skipTrash /user/analytics-product/data/wikipedia_chatgpt_plugin_searches
Deleted /user/analytics-product/data/wikipedia_chatgpt_plugin_searches
Mar 14 2024, 4:54 PM · Product-Analytics (Kanban)
mpopov moved T360129: Decommission Wikipedia ChatGPT Plugin searches data from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Mar 14 2024, 4:44 PM · Product-Analytics (Kanban)
mpopov triaged T360129: Decommission Wikipedia ChatGPT Plugin searches data as Medium priority.
Mar 14 2024, 4:44 PM · Product-Analytics (Kanban)

Mar 12 2024

mpopov added a comment to T252227: Mobile redirects drop provenance parameters.

+1 to Isaac's proposed solution of carrying wprov forward as wprov but also setting rprov=1 in case of a redirect to simplify analysis.

Mar 12 2024, 2:38 PM · Data-Engineering, Data Pipelines, Traffic-Icebox, SRE

Mar 1 2024

mpopov added a comment to T358658: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga.

Thank you @cmooney for taking this non-standard case on and helping KC out! This dual account thing has become a real thorn for KC so I'm glad we're on a path to get it taken care of.

Mar 1 2024, 5:07 PM · SRE, SRE-Access-Requests
mpopov added a comment to T358658: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga.

Approved from my side, both for the request in general and analytics-product-users membership :)

Mar 1 2024, 5:04 PM · SRE, SRE-Access-Requests

Feb 29 2024

mpopov added a comment to T358758: Adding a new contextual attribute to the Metrics Platform JS client library: active_browsing_session_token.

I would advise against using "session" in the name. When Jason and I were writing https://docs.google.com/document/d/100B4c1GqHHCAGnWLbDrgMQIJzxNI8vuf7jEMKG7DKeg/edit?usp=sharing we surveyed the landscape of schemas/instruments and saw that everything was just called a session and that there were multiple levels of sessions: https://docs.google.com/document/d/11xTwL_j0BWgfdtZ_GOIlxg22rLP2lRraaA2c_-uMwJQ/edit#

Feb 29 2024, 4:08 PM · Data Products (Data Products Sprint 11), MW-1.42-notes (1.42.0-wmf.24; 2024-03-26), Patch-For-Review, Metrics Platform Backlog

Feb 28 2024

mpopov updated the task description for T356765: Correlation between article length, number of translations within a time period, experience of users, and deletion rate..
Feb 28 2024, 1:37 PM · Product-Analytics (Kanban), Language-analytics
mpopov updated subscribers of T252227: Mobile redirects drop provenance parameters.

Okay, so it's been a few years now and this bug still exists and impacts the quality of our analyses substantially (especially for Future Audiences experiments that are aimed at mobile users) and we're not any closer to instrumenting pageviews.

Feb 28 2024, 12:40 PM · Data-Engineering, Data Pipelines, Traffic-Icebox, SRE
mpopov added a comment to T313622: Confirm whether or not the current definition of new_editor_retention is based on global registration and update data glossary.

As with the other task, Irene has a to-do this week to wrap this up by updating https://meta.wikimedia.org/wiki/Research_and_Decision_Science/Data_glossary#Contributor_metrics

Feb 28 2024, 12:32 PM · Product-Analytics (Kanban)
mpopov added a comment to T313549: Confirm whether or not the current definition of new active editors are based on global registration and update data glossary.

Irene has a to-do this week to wrap this up by updating https://meta.wikimedia.org/wiki/Research_and_Decision_Science/Data_glossary#Contributor_metrics

Feb 28 2024, 12:31 PM · Product-Analytics (Kanban)

Feb 23 2024

mpopov added a comment to T343183: Instrument the Wikistories share feature.

@SBisson: root-level dt is the client-side timestamp, while meta.dt is the server-side timestamp. Both are useful to have for analysis.
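
As a small illustration of why having both timestamps helps, the Python sketch below computes the gap between the client-side dt and the server-side meta.dt for a single event. The event payload and the helper names are hypothetical; only the two field locations come from the comment above.

```python
from datetime import datetime


def parse_dt(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def client_server_lag_seconds(event: dict) -> float:
    """Seconds between the client-side dt and the server-side meta.dt."""
    return (parse_dt(event["meta"]["dt"]) - parse_dt(event["dt"])).total_seconds()


# Hypothetical event: only the fields relevant to the comparison are shown.
event = {
    "dt": "2024-02-23T17:48:58.500Z",            # set by the client
    "meta": {"dt": "2024-02-23T17:49:00.000Z"},  # set when the server received it
}

lag = client_server_lag_seconds(event)
print(lag)
```

A large or negative lag may point to client clock skew or delayed submission, which is the kind of check that is not possible with only one of the two fields.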

Feb 23 2024, 5:49 PM · MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), Patch-For-Review, MW-1.42-notes (1.42.0-wmf.23; 2024-03-19), Product-Analytics (Kanban), Inuka-Team (Kanban), Wikistories

Feb 22 2024

mpopov added a comment to T343183: Instrument the Wikistories share feature.

On contribution side:
In some cases the instrument sends event data containing activity_session_id and dt fields which are not present in version 1.2.0 of the contribution schema: https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/jsonschema/analytics/mediawiki/wikistories_contribution_event/1.2.0.yaml so a lot of events aren't passing schema validation.
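
A quick way to spot this kind of mismatch is to compare the top-level fields of an event with the properties the schema declares. The Python sketch below is a simplification, not the real validation logic: the `ALLOWED_FIELDS` set is a hypothetical stand-in for the properties in version 1.2.0 of the schema, and `some_schema_field` is a placeholder name.

```python
# Hypothetical stand-in for the properties declared by the schema version;
# the real list lives in the schema file linked above.
ALLOWED_FIELDS = {"$schema", "meta", "some_schema_field"}


def unexpected_fields(event: dict, allowed: set) -> set:
    """Return top-level field names present in the event but not declared."""
    return set(event) - allowed


# Hypothetical event carrying the two extra fields mentioned in the comment.
event = {
    "$schema": "placeholder",
    "meta": {},
    "some_schema_field": "value",
    "activity_session_id": "abc123",
    "dt": "2024-02-22T20:00:00Z",
}

extra = unexpected_fields(event, ALLOWED_FIELDS)
print(sorted(extra))
```

Running a check like this over a sample of rejected events shows which undeclared fields account for the validation failures.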

Feb 22 2024, 8:05 PM · MW-1.43-notes (1.43.0-wmf.5; 2024-05-14), Patch-For-Review, MW-1.42-notes (1.42.0-wmf.23; 2024-03-19), Product-Analytics (Kanban), Inuka-Team (Kanban), Wikistories

Feb 19 2024

mpopov added a comment to T346350: Add revision ID to X-Analytics header.

Oh that's brilliant! Thanks so much for looking into it and shining light on this @phuedx!

Feb 19 2024, 7:19 PM · MW-1.42-notes (1.42.0-wmf.17; 2024-02-06), Moderator-Tools-Team (Kanban), Traffic, good first task, Data Products, Product-Analytics, Automoderator

Feb 16 2024

mpopov closed T357439: [REQUEST] Help me understand how often Administrators disable JS as Invalid.

The decision has been made (to go with Vue/Codex).

Feb 16 2024, 1:20 PM · Product-Analytics

Feb 15 2024

mpopov closed T343184: Gather basic stats about use of the KaiOS app segmented by store as Resolved.

Thank you, Connie!

Feb 15 2024, 6:04 PM · Inuka-Team, Product-Analytics (Kanban), KaiOS-Wikipedia-app

Feb 14 2024

mpopov closed T341744: Draft a debugging 1pager as Resolved.
Feb 14 2024, 7:53 PM · Product-Analytics (Kanban), User-Iflorez
mpopov closed T343246: Document data engineering items for Campaigns Product as Resolved.
Feb 14 2024, 7:53 PM · Product-Analytics (Kanban), Campaign-Tools, User-Iflorez
mpopov closed T342294: Output Jan-June 2023 Campaigns data as Resolved.
Feb 14 2024, 7:53 PM · Product-Analytics (Kanban), Campaign-Registration, User-Iflorez
mpopov closed T342069: productionize Isaac's chatgpt plug-in data cron job to airflow as Declined.
Feb 14 2024, 7:52 PM · Product-Analytics (Kanban), User-Iflorez
mpopov closed T336432: Baseline for new editors, new active editors, and new editor retention in SSA as Declined.

Unsure what this request was for.

Feb 14 2024, 7:52 PM · User-Iflorez, Product-Analytics (Kanban)
mpopov closed T343152: [REQUEST] Analyze labeled ChatGPT plugin data to understand if responses based on it are useful and not harmful as Resolved.
Feb 14 2024, 7:50 PM · User-Iflorez, Product-Analytics (Kanban)
mpopov closed T341747: Locate data on GLAM collaboration with Growth Team as Resolved.
Feb 14 2024, 7:48 PM · Campaign-Tools, Product-Analytics (Kanban), User-Iflorez
mpopov closed T347206: Upgrades to Superset ChatGPT plugin dashboard as Invalid.
Feb 14 2024, 7:48 PM · Product-Analytics (Kanban), Future-Audiences, User-Iflorez
mpopov claimed T313549: Confirm whether or not the current definition of new active editors are based on global registration and update data glossary.
Feb 14 2024, 7:45 PM · Product-Analytics (Kanban)
mpopov claimed T313622: Confirm whether or not the current definition of new_editor_retention is based on global registration and update data glossary.
Feb 14 2024, 7:44 PM · Product-Analytics (Kanban)
mpopov closed T318123: Draft a Campaigns Registration Tool Measurement Plan as Resolved.
Feb 14 2024, 7:42 PM · User-Iflorez, Campaign-Tools, Campaign-Registration, Product-Analytics (Kanban)
mpopov closed T320289: Draft Event-Registration/Creation Measurement Plan Specifications Document as Resolved.
Feb 14 2024, 7:40 PM · User-Iflorez, Product-Analytics (Kanban), Campaign-Registration, Campaign-Tools
mpopov closed T329382: Define new campaign editor - for use in production to query for this metric as Resolved.
Feb 14 2024, 7:39 PM · User-Iflorez, Campaign-Registration, Campaign-Tools, Product-Analytics (Kanban)
mpopov closed T334363: [SPIKE] Gather information on editor regional data pull differences as Resolved.
Feb 14 2024, 7:37 PM · User-Iflorez, Product-Analytics (Kanban)
mpopov closed T336603: Document methods for obtaining Grant ID for the campaigns registration tool , a subtask of T321814: EPIC: Add Grant ID support to event registration, as Resolved.
Feb 14 2024, 7:34 PM · WikimediaCampaignEvents, CampaignEvents, Campaign-Tools, Campaign-Registration
mpopov closed T336603: Document methods for obtaining Grant ID for the campaigns registration tool as Resolved.
Feb 14 2024, 7:33 PM · User-Iflorez, Product-Analytics (Kanban), Campaign-Tools (Campaign-Tools-Current-Sprint), Campaign-Registration
mpopov closed T336598: Document how to pull affiliate data for the campaigns product extension as Resolved.
Feb 14 2024, 7:33 PM · User-Iflorez, Product-Analytics (Kanban), Campaign-Tools (Campaign-Tools-Current-Sprint), Campaign-Registration
mpopov closed T340496: Consult on Organizer Lab as Resolved.
Feb 14 2024, 7:32 PM · Campaign-Tools, Product-Analytics (Kanban)
mpopov claimed T313549: Confirm whether or not the current definition of new active editors are based on global registration and update data glossary.
Feb 14 2024, 7:31 PM · Product-Analytics (Kanban)
mpopov closed T343167: [ChatGPT Plugin] Track long-term traffic & search API usage as Resolved.

The ChatGPT plug-in experiment has concluded. We'll just need to do some clean-up (Phab task coming later).

Feb 14 2024, 7:28 PM · Product-Analytics (Kanban)
mpopov closed T317927: Draft a code review checklist as Resolved.

https://docs.google.com/document/d/1rzD_rJEzH3HmyIekmtojfiwyXqPVdEIHBviFA8_HgH0/edit#heading=h.w7kdor9r67qo is as final as it will get. We'll need to adopt it widely on the team (and do more peer reviews in the first place) but right now there's nothing left to do here.

Feb 14 2024, 7:23 PM · User-Iflorez, Product-Analytics (Kanban)

Feb 13 2024

mpopov awarded T357462: Enable notifications for completion of Hive table snapshots a Like token.
Feb 13 2024, 8:23 PM · Movement-Insights, Data-Engineering, Data-Platform
mpopov added a comment to T346350: Add revision ID to X-Analytics header.
SELECT
  normalized_host.project,
  namespace_id IS NULL AS ns_id_is_null,
  element_at(x_analytics_map, 'ns') IS NULL AS x_ns_is_null,
  page_id IS NULL AS page_id_is_null,
  element_at(x_analytics_map, 'page_id') IS NULL AS x_page_id_is_null,
  element_at(x_analytics_map, 'rev_id') IS NULL AS x_rev_id_is_null,
  COUNT(1) AS n_pageviews
FROM wmf.webrequest 
WHERE webrequest_source = 'text'
  AND year = 2024 AND month = 2 AND day = 12 AND hour = 1
  AND is_pageview
  AND uri_host IN('en.wikipedia.org', 'en.m.wikipedia.org', 'commons.wikimedia.org', 'commons.m.wikimedia.org')
GROUP BY 1, 2, 3, 4, 5, 6
ORDER BY project, ns_id_is_null, x_ns_is_null, page_id_is_null, x_page_id_is_null, x_rev_id_is_null
Feb 13 2024, 6:30 PM · MW-1.42-notes (1.42.0-wmf.17; 2024-02-06), Moderator-Tools-Team (Kanban), Traffic, good first task, Data Products, Product-Analytics, Automoderator
mpopov added a comment to T301281: Add Product-Analytics Announcements to Airflow job for notifications.

@Mayakp.wiki: you and others are already in product-analytics-announce@, but the alerts aren't sent to that.

Feb 13 2024, 4:52 PM · Data Pipelines, Data-Engineering, Product-Analytics

Feb 9 2024

mpopov added a comment to T320926: wmf.webrequest: 'presto error: Corrupted statistics for column "[user_agent] optional binary " in Parquet file ...'.
SELECT
  user_agent
FROM wmf.webrequest
WHERE webrequest_source = 'text'
    AND year = 2024
    AND month = 2
    AND day = 9
    AND hour = 14
    AND uri_host = 'www.wikidata.org'
    AND is_pageview
    AND namespace_id = 640
    AND agent_type = 'spider'
    AND user_agent = '-'
LIMIT 10
Feb 9 2024, 4:27 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Data-Engineering
mpopov added a comment to T320926: wmf.webrequest: 'presto error: Corrupted statistics for column "[user_agent] optional binary " in Parquet file ...'.

Just tried the query in the description with some recent dates, but the dates I picked didn't have any requests with '-' UA strings, so it's hard to know if the problem persists. The query runs fine with user_agent = '-'; it just returns no data. I just made a request to https://www.wikidata.org/wiki/EntitySchema:E1 (according to https://www.wikidata.org/wiki/Help:Namespaces namespace 640 is EntitySchema, so I just picked one):

Feb 9 2024, 2:50 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Data-Engineering

Feb 8 2024

mpopov added a comment to T356917: Requesting access to analytics-privatedata-users for JTanner.

@Jelto: This is to access dashboards in Superset that do access private data, no need for Hadoop/analytics client servers access. So no SSH public key needed. See https://wikitech.wikimedia.org/wiki/Analytics/Data_access

Feb 8 2024, 6:25 PM · SRE, SRE-Access-Requests

Feb 7 2024

mpopov added a comment to T356645: Production data & systems access restoration for Connie Chen.

@cchen: How about Superset & Hue?

Feb 7 2024, 9:28 PM · Data-Platform-SRE, SRE, SRE-Access-Requests
Dzahn awarded T356645: Production data & systems access restoration for Connie Chen a Like token.
Feb 7 2024, 8:07 PM · Data-Platform-SRE, SRE, SRE-Access-Requests
mpopov added a project to T356645: Production data & systems access restoration for Connie Chen: Data-Platform-SRE.

Tagging DPE SRE in case this is specific to those tools.

Feb 7 2024, 7:16 PM · Data-Platform-SRE, SRE, SRE-Access-Requests
mpopov updated subscribers of T356279: Remove production data access for former WMDE staff member goransm.
Feb 7 2024, 6:49 PM · Patch-For-Review, User-ItamarWMDE, SRE, Data-Platform-SRE, SRE-Access-Requests
mpopov updated subscribers of T356279: Remove production data access for former WMDE staff member goransm.

Not yet. I believe @AndrewTavis_WMDE will be sharing some findings from WMDE side soon.

Feb 7 2024, 6:49 PM · Patch-For-Review, User-ItamarWMDE, SRE, Data-Platform-SRE, SRE-Access-Requests
mpopov closed T356214: Add `event.app_donor_experience` fields to event sanitization allowlist as Resolved.

Thanks for merging & deploying, @JAllemandou!

Feb 7 2024, 2:36 PM · Data-Engineering (Sprint 8)

Feb 6 2024

mpopov added a comment to T356645: Production data & systems access restoration for Connie Chen.

Thanks so much @MoritzMuehlenhoff!!

Feb 6 2024, 8:28 PM · Data-Platform-SRE, SRE, SRE-Access-Requests
mpopov added a comment to T284604: Running into errors while adding Hive table to Superset dataset.

@BTullis: heads-up that @cchen won't be able to until T356645: Production data & systems access restoration for Connie Chen is resolved

Feb 6 2024, 4:48 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), superset.wikimedia.org, Data-Engineering
mpopov updated the task description for T356765: Correlation between article length, number of translations within a time period, experience of users, and deletion rate..
Feb 6 2024, 3:27 PM · Product-Analytics (Kanban), Language-analytics
mpopov added a comment to T356765: Correlation between article length, number of translations within a time period, experience of users, and deletion rate..

@Pginer-WMF: Does this question relate to any of the hypothesis work your team is doing? If so, can you please share which hypothesis?

Feb 6 2024, 3:27 PM · Product-Analytics (Kanban), Language-analytics

Feb 5 2024

mpopov updated subscribers of T309013: EditAttemptStep Migration to (monoschema) MP.

@VirginiaPoundstone @WDoranWMF I renamed this task to refer to the (now deprecated) monoschema version of Metrics Platform. The partial migrations of the instruments were removed (see T351337, T351335).

Feb 5 2024, 9:05 PM · Metrics Platform Backlog (Metrics Platform Kanban), MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Editing-team, DiscussionTools