
mpopov (Mikhail Popov)
Manager, Data Science

User Details

User Since
Jul 27 2015, 4:15 PM (456 w, 17 h)
Availability
Available
IRC Nick
bearloga
LDAP User
Bearloga
MediaWiki User
MPopov (WMF) [ Global Accounts ]

Using statistical analysis, Bayesian inference, machine learning, and software/data engineering to solve problems and inform decisions in Product Analytics

Recent Activity

Fri, Apr 19

mpopov updated subscribers of T362849: [Analytics] Segments of Wikidata's data over time.

@AndrewTavis_WMDE asked me for some thoughts/suggestions here :)

Fri, Apr 19, 7:16 PM · Wikidata Analytics (Kanban), Wikidata

Thu, Apr 18

mpopov triaged T362874: Gather baseline data for time-to-data-collection as High priority.
Thu, Apr 18, 12:15 PM · Product-Analytics (Kanban)
mpopov edited projects for T362874: Gather baseline data for time-to-data-collection, added: Product-Analytics (Kanban); removed Product-Analytics.
Thu, Apr 18, 12:14 PM · Product-Analytics (Kanban)
mpopov created T362874: Gather baseline data for time-to-data-collection.
Thu, Apr 18, 12:14 PM · Product-Analytics (Kanban)

Tue, Apr 16

mpopov added a comment to T361684: Create per-wiki user preference metrics.

a simple way to accomplish this

I think there's a simple way to accomplish this, but I don't think the end result would be particularly useful. I believe that for the end result to be useful for questions like "I want to know how many users have this feature enabled", this will need careful planning and consideration to account for the complexity of user preferences.

Tue, Apr 16, 2:59 PM · Product-Analytics, Data Products
mpopov added a comment to T361684: Create per-wiki user preference metrics.

@VirginiaPoundstone: Howdy! The underlying dataset will probably be the hardest part of this because of the challenges of how user preferences are stored and used. And then yeah, a Superset dashboard would be the simplest way to make that data available to the end users. It wouldn't be through Turnilo because the metrics aren't additive across dimensions, so it would need to be Superset.
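
As a rough illustration of what "not additive across dimensions" means here, the sketch below uses made-up data: the rows, user names, and helper functions are all hypothetical and not taken from any real dataset. A user with the preference enabled on two wikis is counted once per wiki, so summing the per-wiki counts overstates the overall distinct count.

```python
from collections import defaultdict

# Hypothetical rows: (wiki, user) pairs where a preference is enabled.
rows = [
    ("enwiki", "alice"),
    ("enwiki", "bob"),
    ("dewiki", "alice"),
    ("dewiki", "carol"),
]


def users_per_wiki(rows):
    """Count distinct users within each wiki."""
    users = defaultdict(set)
    for wiki, user in rows:
        users[wiki].add(user)
    return {wiki: len(names) for wiki, names in users.items()}


def users_overall(rows):
    """Count distinct users across all wikis."""
    return len({user for _, user in rows})


per_wiki = users_per_wiki(rows)
# The sum of per-wiki counts and the overall distinct count disagree,
# which is why a tool that simply sums over a dimension gives the wrong total.
print(per_wiki, sum(per_wiki.values()), users_overall(rows))
```

A dashboard that precomputes each slice separately avoids this problem, whereas a tool that rolls up by summation would not.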

Tue, Apr 16, 2:56 PM · Product-Analytics, Data Products

Mon, Apr 15

mpopov added a comment to T252227: Mobile redirects drop provenance parameters.

Okay, if I understand correctly, then the idea would be to...

  1. Continue "allowing" tagging of wprov for non-200 HTTP responses. It's mainly important that people don't accidentally count those as pageviews when they're not pageviews (i.e., they should be using is_pageview or something similarly precise). It's useful to be able to quickly zoom in on these sorts of requests anyway, so even for a 30x response it is nice to have.
  2. If there's a 30x response for a redirect from desktop to mobile web and the URL came bearing a wprov, add that same wprov parameter name-value pair and also add the parameter name-value pair of rprov=1 in the target redirect URL (that's the thing that will be emitted in the Location: header).

Do I understand correctly?
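
The second step could be sketched roughly as follows. This is only an illustration in Python of the URL rewriting being described, not how the redirect is actually implemented; the function name `redirect_target` and the example URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redirect_target(request_url, target_url):
    """Carry wprov from the request URL onto the redirect target, adding rprov=1."""
    wprov = dict(parse_qsl(urlsplit(request_url).query)).get("wprov")
    if wprov is None:
        # The request carried no wprov, so the target is left untouched.
        return target_url
    parts = urlsplit(target_url)
    # Drop any existing wprov/rprov on the target so they are not duplicated.
    params = [(k, v) for k, v in parse_qsl(parts.query) if k not in ("wprov", "rprov")]
    params += [("wprov", wprov), ("rprov", "1")]
    return urlunsplit(parts._replace(query=urlencode(params)))


print(redirect_target(
    "https://en.wikipedia.org/wiki/Cat?wprov=abc1",
    "https://en.m.wikipedia.org/wiki/Cat",
))
```

The returned string is what would go in the Location: header; requests without a wprov pass through unchanged.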

Mon, Apr 15, 8:41 PM · Data-Engineering, Data Pipelines, Traffic-Icebox, SRE
mpopov added a comment to T355182: Past edits increase in wmf.edit_hourly with every new snapshot.

@VirginiaPoundstone Howdy! This is not blocking anything. Thanks for checking!

Mon, Apr 15, 3:02 PM · Data Products (Data Products Sprint 13), Data-Engineering
mpopov added a comment to T362535: Outdated referral traffic info in FAQ.

Prompted by confusion in Slack

Mon, Apr 15, 2:15 PM · Tool-Pageviews
mpopov created T362535: Outdated referral traffic info in FAQ.
Mon, Apr 15, 2:14 PM · Tool-Pageviews

Wed, Apr 3

mpopov triaged T361086: Fetch information about 2 millionth CX/SX edit/article as Low priority.

@Pginer-WMF: Please keep in mind that this curiosity needs to be prioritized in the context of other work requested of KC for Language. For him, the highest priority tasks are:

So he might not get around to this for a while.

Wed, Apr 3, 1:47 PM · Product-Analytics (Kanban), Language-analytics

Mon, Apr 1

mpopov added a comment to T361475: Add mwclientpreferences cookie to Wikimedia Cookie Statement.

@KSarabia-WMF: You will probably need to submit a request through L3SC for this.

Mon, Apr 1, 2:40 PM · Web-Team-Backlog (FY2023-24 Q4 Sprint 2)

Tue, Mar 26

mpopov closed T361024: NEW BUG REPORT SSL certificate verification error when using internal API endpoints from conda-analytics and Jupyter on stat host as Resolved.
import os
Tue, Mar 26, 8:57 PM · Data-Platform-SRE, Data-Platform
mpopov added a comment to T360829: WE 1.2: Establish baseline for constructive activation.

@nettrom_WMF will be developing it as an essential metric in the next FY for SDS 2.2 and will likely use that definition.

Tue, Mar 26, 3:43 PM · Editing-team (Tracking), Product-Analytics
mpopov updated the task description for T361024: NEW BUG REPORT SSL certificate verification error when using internal API endpoints from conda-analytics and Jupyter on stat host.
Tue, Mar 26, 3:14 PM · Data-Platform-SRE, Data-Platform
mpopov triaged T361024: NEW BUG REPORT SSL certificate verification error when using internal API endpoints from conda-analytics and Jupyter on stat host as Low priority.
Tue, Mar 26, 3:13 PM · Data-Platform-SRE, Data-Platform

Mon, Mar 25

mpopov moved T360829: WE 1.2: Establish baseline for constructive activation from Triage to Upcoming Quarter on the Product-Analytics board.
Mon, Mar 25, 5:28 PM · Editing-team (Tracking), Product-Analytics
mpopov assigned T360829: WE 1.2: Establish baseline for constructive activation to MNeisler.

Assigning to Megan who will work on this in Q4 to get baselines & reasonable target in before start of next FY.

Mon, Mar 25, 5:27 PM · Editing-team (Tracking), Product-Analytics

Mar 22 2024

mpopov added a comment to T360829: WE 1.2: Establish baseline for constructive activation.

From https://www.mediawiki.org/wiki/Growth/Personalized_first_day/Newcomer_tasks/Experiment_analysis,_November_2020#Detailed_findings

Mar 22 2024, 11:19 PM · Editing-team (Tracking), Product-Analytics
mpopov added a comment to T346350: Add revision ID to X-Analytics header.

Thank you so much for looking into it, @phuedx!!!

Mar 22 2024, 11:09 PM · MW-1.42-notes (1.42.0-wmf.17; 2024-02-06), Moderator-Tools-Team (Kanban), Traffic, good first task, Data Products, Product-Analytics, Automoderator
mpopov added a comment to T359993: Slowdown when querying via Hive.

@jwang: Hive has been / is being deprecated. I recommend you use Spark SQL instead (which also works with Iceberg tables, which hive CLI and wmfdata.hive do not).

Mar 22 2024, 5:06 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Data-Platform

Mar 19 2024

mpopov added a comment to T359182: Instrument permalink timestamps.

After examining:

Mar 19 2024, 6:02 PM · Product-Analytics (Kanban), MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Patch-For-Review, Editing-team (Kanban Board), DiscussionTools
mpopov updated subscribers of T359182: Instrument permalink timestamps.

I think it depends on the technical specifics of the migration, which @KSarabia-WMF would be able to verify. Essentially my main concern was potential presence of new permalink-copied events coming from original instrument that might not be present in the new MP-based instrument, which would impact the data QA. If the new instrument automatically produces these new permalink events, then all is good and there's no concern.

Mar 19 2024, 12:56 PM · Product-Analytics (Kanban), MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Patch-For-Review, Editing-team (Kanban Board), DiscussionTools

Mar 17 2024

mpopov awarded T347970: [L] MachineVision: archive and remove all events and event schemas a Like token.
Mar 17 2024, 1:06 AM · Patch-For-Review, Structured-Data-Backlog (Current Work), MachineVision

Mar 15 2024

mpopov added a comment to T359182: Instrument permalink timestamps.

Hey folks, is this being coordinated with the Web team who are currently in the middle of migrating *UIActionTracking to the Metrics Platform? T344274: Adopt Web Team Instrumentation to Metrics Platform

Mar 15 2024, 1:41 PM · Product-Analytics (Kanban), MW-1.42-notes (1.42.0-wmf.22; 2024-03-12), Patch-For-Review, Editing-team (Kanban Board), DiscussionTools

Mar 14 2024

mpopov added a comment to T359440: [REQUEST] Product Analytics data instrumentation: CTR, Cirrus Search data and Superset dashboarding thereof.

Queries done: https://gitlab.wikimedia.org/repos/product-analytics/data-pipelines/-/tree/main/citation_needed/searches

Mar 14 2024, 7:41 PM · Product-Analytics (Kanban)
mpopov reassigned T360129: Decommission Wikipedia ChatGPT Plugin searches data from mpopov to Iflorez.

Assigning to @Iflorez to perform the last step (as owner of the dashboard)

Mar 14 2024, 4:56 PM · Product-Analytics (Kanban)
mpopov updated the task description for T360129: Decommission Wikipedia ChatGPT Plugin searches data.
Mar 14 2024, 4:55 PM · Product-Analytics (Kanban)
mpopov added a comment to T360129: Decommission Wikipedia ChatGPT Plugin searches data.
$ sudo -u analytics-product kerberos-run-command analytics-product hive -e "use wmf_product; drop table wikipedia_chatgpt_plugin_searches;"
OK
Time taken: 1.486 seconds
OK
Time taken: 1.972 seconds
$ sudo -u analytics-product kerberos-run-command analytics-product hdfs dfs -rm -R -skipTrash /user/analytics-product/data/wikipedia_chatgpt_plugin_searches
Deleted /user/analytics-product/data/wikipedia_chatgpt_plugin_searches
Mar 14 2024, 4:54 PM · Product-Analytics (Kanban)
mpopov moved T360129: Decommission Wikipedia ChatGPT Plugin searches data from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Mar 14 2024, 4:44 PM · Product-Analytics (Kanban)
mpopov triaged T360129: Decommission Wikipedia ChatGPT Plugin searches data as Medium priority.
Mar 14 2024, 4:44 PM · Product-Analytics (Kanban)

Mar 12 2024

mpopov added a comment to T252227: Mobile redirects drop provenance parameters.

+1 to Isaac's proposed solution of carrying wprov forward as wprov but also setting rprov=1 in case of a redirect to simplify analysis.

Mar 12 2024, 2:38 PM · Data-Engineering, Data Pipelines, Traffic-Icebox, SRE

Mar 1 2024

mpopov added a comment to T358658: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga.

Thank you @cmooney for taking this non-standard case on and helping KC out! This dual account thing has become a real thorn for KC so I'm glad we're on a path to get it taken care of.

Mar 1 2024, 5:07 PM · SRE, SRE-Access-Requests
mpopov added a comment to T358658: Requesting access to wmf-nda, analytics-private-data, analytics-product for kcvelaga.

Approved from my side, both for the request in general and analytics-product-users membership :)

Mar 1 2024, 5:04 PM · SRE, SRE-Access-Requests

Feb 29 2024

mpopov added a comment to T358758: Adding a new contextual attribute to the Metrics Platform JS client library: active_browsing_session_token.

I would advise against using "session" in the name. When Jason and I were writing https://docs.google.com/document/d/100B4c1GqHHCAGnWLbDrgMQIJzxNI8vuf7jEMKG7DKeg/edit?usp=sharing we surveyed the landscape of schemas/instruments and saw that everything was just called a session and that there were multiple levels of sessions: https://docs.google.com/document/d/11xTwL_j0BWgfdtZ_GOIlxg22rLP2lRraaA2c_-uMwJQ/edit#

Feb 29 2024, 4:08 PM · Data Products (Data Products Sprint 11), MW-1.42-notes (1.42.0-wmf.24; 2024-03-26), Patch-For-Review, Metrics Platform Backlog

Feb 28 2024

mpopov updated the task description for T356765: Correlation between article length, number of translations within a time period, experience of users, and deletion rate..
Feb 28 2024, 1:37 PM · Product-Analytics (Kanban), Language-analytics
mpopov updated subscribers of T252227: Mobile redirects drop provenance parameters.

Okay, so it's been a few years now and this bug still exists and impacts the quality of our analyses substantially (especially for Future Audiences experiments that are aimed at mobile users) and we're not any closer to instrumenting pageviews.

Feb 28 2024, 12:40 PM · Data-Engineering, Data Pipelines, Traffic-Icebox, SRE
mpopov added a comment to T313622: Confirm whether or not the current definition of new_editor_retention is based on global registration and update data glossary.

As with the other task, Irene has a to-do this week to wrap this up by updating https://meta.wikimedia.org/wiki/Research_and_Decision_Science/Data_glossary#Contributor_metrics

Feb 28 2024, 12:32 PM · Product-Analytics (Kanban)
mpopov added a comment to T313549: Confirm whether or not the current definition of new active editors are based on global registration and update data glossary.

Irene has a to-do this week to wrap this up by updating https://meta.wikimedia.org/wiki/Research_and_Decision_Science/Data_glossary#Contributor_metrics

Feb 28 2024, 12:31 PM · Product-Analytics (Kanban)

Feb 23 2024

mpopov added a comment to T343183: Instrument the Wikistories share feature.

@SBisson: root-level dt is the client-side timestamp, while meta.dt is the server-side timestamp. Both are useful to have for analysis.
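
One way having both timestamps can help in analysis is estimating the gap between when the client recorded an event and when the server received it. The sketch below assumes an event shaped like a dict with ISO 8601 strings in `dt` and `meta.dt`; the sample values and helper names are made up for illustration.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def client_server_lag_seconds(event):
    """Seconds between the client-side dt and the server-side meta.dt."""
    client = parse_ts(event["dt"])
    server = parse_ts(event["meta"]["dt"])
    return (server - client).total_seconds()


# Hypothetical event; only the two timestamp fields are shown.
event = {
    "dt": "2024-02-23T17:49:00.000Z",
    "meta": {"dt": "2024-02-23T17:49:02.500Z"},
}
print(client_server_lag_seconds(event))
```

A large or negative lag can point to a skewed client clock or a delayed submission, which is one reason to keep both fields.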

Feb 23 2024, 5:49 PM · MW-1.42-notes (1.42.0-wmf.23; 2024-03-19), Product-Analytics (Kanban), Inuka-Team (Kanban), Wikistories

Feb 22 2024

mpopov added a comment to T343183: Instrument the Wikistories share feature.

On contribution side:
In some cases the instrument sends event data containing activity_session_id and dt fields which are not present in version 1.2.0 of the contribution schema: https://gerrit.wikimedia.org/r/plugins/gitiles/schemas/event/secondary/+/refs/heads/master/jsonschema/analytics/mediawiki/wikistories_contribution_event/1.2.0.yaml so a lot of events aren't passing schema validation.

Feb 22 2024, 8:05 PM · MW-1.42-notes (1.42.0-wmf.23; 2024-03-19), Product-Analytics (Kanban), Inuka-Team (Kanban), Wikistories

Feb 19 2024

mpopov added a comment to T346350: Add revision ID to X-Analytics header.

Oh that's brilliant! Thanks so much for looking into it and shining light on this @phuedx!

Feb 19 2024, 7:19 PM · MW-1.42-notes (1.42.0-wmf.17; 2024-02-06), Moderator-Tools-Team (Kanban), Traffic, good first task, Data Products, Product-Analytics, Automoderator

Feb 16 2024

mpopov closed T357439: [REQUEST] Help me understand how often Administrators disable JS as Invalid.

The decision has been made (to go with Vue/Codex).

Feb 16 2024, 1:20 PM · Product-Analytics

Feb 15 2024

mpopov closed T343184: Gather basic stats about use of the KaiOS app segmented by store as Resolved.

Thank you, Connie!

Feb 15 2024, 6:04 PM · Inuka-Team, Product-Analytics (Kanban), KaiOS-Wikipedia-app

Feb 14 2024

mpopov closed T341744: Draft a debugging 1pager as Resolved.
Feb 14 2024, 7:53 PM · Product-Analytics (Kanban), User-Iflorez
mpopov closed T343246: Document data engineering items for Campaigns Product as Resolved.
Feb 14 2024, 7:53 PM · Product-Analytics (Kanban), Campaign-Tools, User-Iflorez
mpopov closed T342294: Output Jan-June 2023 Campaigns data as Resolved.
Feb 14 2024, 7:53 PM · Product-Analytics (Kanban), Campaign-Registration, User-Iflorez
mpopov closed T342069: productionize Isaac's chatgpt plug-in data cron job to airflow as Declined.
Feb 14 2024, 7:52 PM · Product-Analytics (Kanban), User-Iflorez
mpopov closed T336432: Baseline for new editors, new active editors, and new editor retention in SSA as Declined.

Unsure what the purpose of this request was.

Feb 14 2024, 7:52 PM · User-Iflorez, Product-Analytics (Kanban)
mpopov closed T343152: [REQUEST] Analyze labeled ChatGPT plugin data to understand if responses based on it are useful and not harmful as Resolved.
Feb 14 2024, 7:50 PM · User-Iflorez, Product-Analytics (Kanban)
mpopov closed T341747: Locate data on GLAM collaboration with Growth Team as Resolved.
Feb 14 2024, 7:48 PM · Campaign-Tools, Product-Analytics (Kanban), User-Iflorez
mpopov closed T347206: Upgrades to Superset ChatGPT plugin dashboard as Invalid.
Feb 14 2024, 7:48 PM · Product-Analytics (Kanban), Future-Audiences, User-Iflorez
mpopov claimed T313549: Confirm whether or not the current definition of new active editors are based on global registration and update data glossary.
Feb 14 2024, 7:45 PM · Product-Analytics (Kanban)
mpopov claimed T313622: Confirm whether or not the current definition of new_editor_retention is based on global registration and update data glossary.
Feb 14 2024, 7:44 PM · Product-Analytics (Kanban)
mpopov closed T318123: Draft a Campaigns Registration Tool Measurement Plan as Resolved.
Feb 14 2024, 7:42 PM · User-Iflorez, Campaign-Tools, Campaign-Registration, Product-Analytics (Kanban)
mpopov closed T320289: Draft Event-Registration/Creation Measurement Plan Specifications Document as Resolved.
Feb 14 2024, 7:40 PM · User-Iflorez, Product-Analytics (Kanban), Campaign-Registration, Campaign-Tools
mpopov closed T329382: Define new campaign editor - for use in production to query for this metric as Resolved.
Feb 14 2024, 7:39 PM · User-Iflorez, Campaign-Registration, Campaign-Tools, Product-Analytics (Kanban)
mpopov closed T334363: [SPIKE] Gather information on editor regional data pull differences as Resolved.
Feb 14 2024, 7:37 PM · User-Iflorez, Product-Analytics (Kanban)
mpopov closed T336603: Document methods for obtaining Grant ID for the campaigns registration tool , a subtask of T321814: EPIC: Add Grant ID support to event registration, as Resolved.
Feb 14 2024, 7:34 PM · WikimediaCampaignEvents, CampaignEvents, Campaign-Tools, Campaign-Registration
mpopov closed T336603: Document methods for obtaining Grant ID for the campaigns registration tool as Resolved.
Feb 14 2024, 7:33 PM · User-Iflorez, Product-Analytics (Kanban), Campaign-Tools (Campaign-Tools-Current-Sprint), Campaign-Registration
mpopov closed T336598: Document how to pull affiliate data for the campaigns product extension as Resolved.
Feb 14 2024, 7:33 PM · User-Iflorez, Product-Analytics (Kanban), Campaign-Tools (Campaign-Tools-Current-Sprint), Campaign-Registration
mpopov closed T340496: Consult on Organizer Lab as Resolved.
Feb 14 2024, 7:32 PM · Campaign-Tools, Product-Analytics (Kanban)
mpopov claimed T313549: Confirm whether or not the current definition of new active editors are based on global registration and update data glossary.
Feb 14 2024, 7:31 PM · Product-Analytics (Kanban)
mpopov closed T343167: [ChatGPT Plugin] Track long-term traffic & search API usage as Resolved.

The ChatGPT plug-in experiment has concluded. We'll just need to do some clean-up (Phab task coming later).

Feb 14 2024, 7:28 PM · Product-Analytics (Kanban)
mpopov closed T317927: Draft a code review checklist as Resolved.

https://docs.google.com/document/d/1rzD_rJEzH3HmyIekmtojfiwyXqPVdEIHBviFA8_HgH0/edit#heading=h.w7kdor9r67qo is as final as it will get. We'll need to adopt it widely on the team (and do more peer reviews in the first place) but right now there's nothing left to do here.

Feb 14 2024, 7:23 PM · User-Iflorez, Product-Analytics (Kanban)

Feb 13 2024

mpopov awarded T357462: Enable notifications for completion of Hive table snapshots a Like token.
Feb 13 2024, 8:23 PM · Movement-Insights, Data-Engineering, Data-Platform
mpopov added a comment to T346350: Add revision ID to X-Analytics header.
-- Cross-tabulate which page fields are NULL in the parsed columns vs. the X-Analytics map
SELECT
  normalized_host.project,
  namespace_id IS NULL AS ns_id_is_null,
  element_at(x_analytics_map, 'ns') IS NULL AS x_ns_is_null,
  page_id IS NULL AS page_id_is_null,
  element_at(x_analytics_map, 'page_id') IS NULL AS x_page_id_is_null,
  element_at(x_analytics_map, 'rev_id') IS NULL AS x_rev_id_is_null,
  COUNT(1) AS n_pageviews
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = 2024 AND month = 2 AND day = 12 AND hour = 1 -- a single hourly partition
  AND is_pageview
  AND uri_host IN ('en.wikipedia.org', 'en.m.wikipedia.org', 'commons.wikimedia.org', 'commons.m.wikimedia.org')
GROUP BY 1, 2, 3, 4, 5, 6
ORDER BY project, ns_id_is_null, x_ns_is_null, page_id_is_null, x_page_id_is_null, x_rev_id_is_null
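
For readers unfamiliar with the map being queried, the sketch below shows in Python the kind of per-key presence check the query performs. It assumes the header is a string of `key=value` pairs separated by semicolons; the sample value and the `parse_x_analytics` helper are hypothetical.

```python
def parse_x_analytics(header):
    """Split a 'k=v;k=v' style string into a dict, skipping malformed pairs."""
    result = {}
    for pair in header.split(";"):
        key, sep, value = pair.strip().partition("=")
        if key and sep:
            result[key] = value
    return result


# Hypothetical header value with no rev_id present.
sample = "ns=0;page_id=12345;https=1"
fields = parse_x_analytics(sample)

# Mirrors the IS NULL checks in the query: True means the key is absent.
missing = {key: key not in fields for key in ("ns", "page_id", "rev_id")}
print(missing)
```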
Feb 13 2024, 6:30 PM · MW-1.42-notes (1.42.0-wmf.17; 2024-02-06), Moderator-Tools-Team (Kanban), Traffic, good first task, Data Products, Product-Analytics, Automoderator
mpopov added a comment to T301281: Add Product-Analytics Announcements to Airflow job for notifications.

@Mayakp.wiki: you and others are already in product-analytics-announce@, but the alerts aren't sent to that.

Feb 13 2024, 4:52 PM · Data Pipelines, Data-Engineering, Product-Analytics

Feb 9 2024

mpopov added a comment to T320926: wmf.webrequest: 'presto error: Corrupted statistics for column "[user_agent] optional binary " in Parquet file ...'.
SELECT
  user_agent
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = 2024
  AND month = 2
  AND day = 9
  AND hour = 14
  AND uri_host = 'www.wikidata.org'
  AND is_pageview
  AND namespace_id = 640
  AND agent_type = 'spider'
  AND user_agent = '-'
LIMIT 10
Feb 9 2024, 4:27 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Data-Engineering
mpopov added a comment to T320926: wmf.webrequest: 'presto error: Corrupted statistics for column "[user_agent] optional binary " in Parquet file ...'.

Just tried the query in the description with some recent dates, but the dates I picked didn't have any requests with '-' UA strings, so it's hard to know if the problem persists. The query runs fine with user_agent = '-'; it just returns no data. I just made a request to https://www.wikidata.org/wiki/EntitySchema:E1 (according to https://www.wikidata.org/wiki/Help:Namespaces namespace 640 is EntitySchema, so I just picked one):

Feb 9 2024, 2:50 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Data-Engineering

Feb 8 2024

mpopov added a comment to T356917: Requesting access to analytics-privatedata-users for JTanner.

@Jelto: This is to access dashboards in Superset that do access private data, no need for Hadoop/analytics client servers access. So no SSH public key needed. See https://wikitech.wikimedia.org/wiki/Analytics/Data_access

Feb 8 2024, 6:25 PM · SRE, SRE-Access-Requests

Feb 7 2024

mpopov added a comment to T356645: Production data & systems access restoration for Connie Chen.

@cchen: How about Superset & Hue?

Feb 7 2024, 9:28 PM · Data-Platform-SRE, SRE, SRE-Access-Requests
Dzahn awarded T356645: Production data & systems access restoration for Connie Chen a Like token.
Feb 7 2024, 8:07 PM · Data-Platform-SRE, SRE, SRE-Access-Requests
mpopov added a project to T356645: Production data & systems access restoration for Connie Chen: Data-Platform-SRE.

Tagging DPE SRE in case this is specific to those tools.

Feb 7 2024, 7:16 PM · Data-Platform-SRE, SRE, SRE-Access-Requests
mpopov updated subscribers of T356279: Remove production data access for former WMDE staff member goransm.
Feb 7 2024, 6:49 PM · Patch-For-Review, User-ItamarWMDE, SRE, Data-Platform-SRE, SRE-Access-Requests
mpopov updated subscribers of T356279: Remove production data access for former WMDE staff member goransm.

Not yet. I believe @AndrewTavis_WMDE will be sharing some findings from WMDE side soon.

Feb 7 2024, 6:49 PM · Patch-For-Review, User-ItamarWMDE, SRE, Data-Platform-SRE, SRE-Access-Requests
mpopov closed T356214: Add `event.app_donor_experience` fields to event sanitization allowlist as Resolved.

Thanks for merging & deploying, @JAllemandou!

Feb 7 2024, 2:36 PM · Data-Engineering (Sprint 8)

Feb 6 2024

mpopov added a comment to T356645: Production data & systems access restoration for Connie Chen.

Thanks so much @MoritzMuehlenhoff!!

Feb 6 2024, 8:28 PM · Data-Platform-SRE, SRE, SRE-Access-Requests
mpopov added a comment to T284604: Running into errors while adding Hive table to Superset dataset.

@BTullis: heads-up that @cchen won't be able to until T356645: Production data & systems access restoration for Connie Chen is resolved

Feb 6 2024, 4:48 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), superset.wikimedia.org, Data-Engineering
mpopov updated the task description for T356765: Correlation between article length, number of translations within a time period, experience of users, and deletion rate..
Feb 6 2024, 3:27 PM · Product-Analytics (Kanban), Language-analytics
mpopov added a comment to T356765: Correlation between article length, number of translations within a time period, experience of users, and deletion rate..

@Pginer-WMF: Does this question relate to any of the hypothesis work your team is doing? If so, can you please share which hypothesis?

Feb 6 2024, 3:27 PM · Product-Analytics (Kanban), Language-analytics

Feb 5 2024

mpopov updated subscribers of T309013: EditAttemptStep Migration to (monoschema) MP.

@VirginiaPoundstone @WDoranWMF I renamed this task to refer to the (now deprecated) monoschema version of Metrics Platform. The partial migrations of the instruments were removed (see T351337, T351335).

Feb 5 2024, 9:05 PM · Metrics Platform Backlog (Metrics Platform Kanban), MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Editing-team, DiscussionTools
mpopov renamed T309013: EditAttemptStep Migration to (monoschema) MP from EditAttemptStep Migration to MP to EditAttemptStep Migration to (monoschema) MP.
Feb 5 2024, 9:01 PM · Metrics Platform Backlog (Metrics Platform Kanban), MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Editing-team, DiscussionTools
mpopov closed T320281: Instrumentation Data-QA for event.mediawiki_edit_attempt as Invalid.

Closing this as the partial migrations have been decommissioned (T351335, T351337) following the learnings from T340702 and re-architecture of the Metrics Platform.

Feb 5 2024, 8:59 PM · MW-1.40-notes (1.40.0-wmf.27; 2023-03-13), Product-Analytics (Kanban)
mpopov closed T320281: Instrumentation Data-QA for event.mediawiki_edit_attempt, a subtask of T309013: EditAttemptStep Migration to (monoschema) MP, as Invalid.
Feb 5 2024, 8:59 PM · Metrics Platform Backlog (Metrics Platform Kanban), MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Editing-team, DiscussionTools
mpopov created T356645: Production data & systems access restoration for Connie Chen.
Feb 5 2024, 3:01 PM · Data-Platform-SRE, SRE, SRE-Access-Requests

Feb 1 2024

mpopov added a comment to T356214: Add `event.app_donor_experience` fields to event sanitization allowlist.

Shay met with me for consultation on this.

Feb 1 2024, 9:52 PM · Data-Engineering (Sprint 8)
mpopov added a comment to T356279: Remove production data access for former WMDE staff member goransm.

Before closing this task / withdrawing the request I'd like to get a confirmation from Goran whether the level of access is still needed and if it's just for the WMDE pipelines, in which case it would make sense to prioritize migrating those to WMDE's Airflow instance so that we can eventually revoke the unrestricted access to highly sensitive information.

Feb 1 2024, 4:10 PM · Patch-For-Review, User-ItamarWMDE, SRE, Data-Platform-SRE, SRE-Access-Requests

Jan 31 2024

mpopov updated subscribers of T356279: Remove production data access for former WMDE staff member goransm.

I'm also sorry for making the request without knowing about the prior request.

Jan 31 2024, 8:05 PM · Patch-For-Review, User-ItamarWMDE, SRE, Data-Platform-SRE, SRE-Access-Requests
mpopov updated subscribers of T356279: Remove production data access for former WMDE staff member goransm.

We should add expiry_date and expiry_contact fields to reflect the NDA

Jan 31 2024, 6:58 PM · Patch-For-Review, User-ItamarWMDE, SRE, Data-Platform-SRE, SRE-Access-Requests
mpopov updated subscribers of T356279: Remove production data access for former WMDE staff member goransm.

Thank you @AndrewTavis_WMDE for alerting us of this. Pinging @Manuel for visibility.

Jan 31 2024, 4:01 PM · Patch-For-Review, User-ItamarWMDE, SRE, Data-Platform-SRE, SRE-Access-Requests
mpopov created T356279: Remove production data access for former WMDE staff member goransm.
Jan 31 2024, 4:01 PM · Patch-For-Review, User-ItamarWMDE, SRE, Data-Platform-SRE, SRE-Access-Requests

Jan 29 2024

mpopov added a comment to T353666: [REQUEST] Get pageview data for MediaWiki core JS docs.

This leads me to conclude that use of the MediaWiki core JS HTML docs is minimal, while interest in the docs may be high.

Jan 29 2024, 4:04 PM · Tech-Docs-Team, MediaWiki-Documentation

Jan 26 2024

mpopov added a comment to T294654: Support querying a range of hourly data partitions.

@nettrom_WMF Thank you for sharing that code! I recently used it in T353666 and it was very helpful! Just wanted to show my appreciation.

Jan 26 2024, 4:01 PM · Data-Engineering, Product-Analytics, Wmfdata-Python

Jan 24 2024

mpopov added a comment to T354513: Superset dashboard for The Wikipedia Library eligibility notifications.

@KCVelaga_WMF Sam and I just chatted about this and enwiki might actually be good enough as a proxy for the service in general, and it's unlikely for the service to break on one particular wiki and not others.

Jan 24 2024, 3:45 PM · Product-Analytics (Kanban), Moderator-Tools-Team

Jan 18 2024

mpopov renamed T355182: Past edits increase in wmf.edit_hourly with every new snapshot from Minor data quality issue in wmf.edit_hourly to Past edits increase in wmf.edit_hourly with every new snapshot.
Jan 18 2024, 2:53 PM · Data Products (Data Products Sprint 13), Data-Engineering
mpopov updated the task description for T355182: Past edits increase in wmf.edit_hourly with every new snapshot.
Jan 18 2024, 2:48 PM · Data Products (Data Products Sprint 13), Data-Engineering
mpopov updated the task description for T355182: Past edits increase in wmf.edit_hourly with every new snapshot.
Jan 18 2024, 2:47 PM · Data Products (Data Products Sprint 13), Data-Engineering
mpopov added a comment to T355182: Past edits increase in wmf.edit_hourly with every new snapshot.

@WDoranWMF: Sorry about that! Yes, I will use https://phabricator.wikimedia.org/maniphest/task/edit/form/121/ going forward and update this ticket to use that template. I missed that link and since I was reporting a bug with the data I was referring to

If no task exists please fill in our bug form.

which is currently restricted ("You do not have permission to view this object.") so I went with this format.

Jan 18 2024, 2:45 PM · Data Products (Data Products Sprint 13), Data-Engineering
mpopov updated the task description for T355182: Past edits increase in wmf.edit_hourly with every new snapshot.
Jan 18 2024, 2:40 PM · Data Products (Data Products Sprint 13), Data-Engineering