Ottomata (Andrew Otto)
User

User Details

User Since
Oct 9 2014, 4:50 PM (582 w, 1 d)
Availability
Available
IRC Nick
ottomata
LDAP User
Ottomata
MediaWiki User
Ottomata

Recent Activity

Today

Ottomata updated subscribers of T410266: Explore how to migrate PyFlink to Java/Scala.

The choice of PyFlink was to help solve a problem: to enable teams to build and own their (realtime) derived data pipelines. But, as you say, no one is doing this. So, before we make a decision like this, I’d really like to work with @GGoncalves-WMF on the broader derived data problem from a platform product management perspective. What do our users need and what do we want to provide for them? So, I’d prefer if we moved a bit slowly and carefully on this. There are lots of questions about how to do the data transfer between the data platform and online storage for serving, as well as for streaming enrichment, etc.

Fri, Dec 5, 4:00 PM · Spike, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Event-Platform
Ottomata added a comment to T410266: Explore how to migrate PyFlink to Java/Scala.

If we do this... Java for sure! We built all of our Flink library tooling in Java. I like Scala too, and while it makes coding some things easier, it makes integrating with different unexpected things harder.

Fri, Dec 5, 3:58 PM · Spike, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Event-Platform

Tue, Nov 25

Ottomata added a comment to T406069: Global Editor Metrics - Druid mediawiki_history_reduced changes.

For my own (out of the loop) understanding, here are the changes to previously made decisions:

Tue, Nov 25, 6:32 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata updated the task description for T410056: EventBus tests fail without EventStreamConfig.
Tue, Nov 25, 5:55 PM · Data-Engineering, Event-Platform

Mon, Nov 24

Ottomata added a comment to T409358: Add page_id and namespace to X-Analytics header in Mobile App requests (2025 remake).

If we implement it on PCS level that would count only the requests that are cache miss on edge.

Mon, Nov 24, 4:34 PM · Essential-Work, Reader Growth Team, MobileFrontend (Core PHP), Content-Transform-Team (Work In Progress), Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog, Data-Engineering

Thu, Nov 6

Ottomata updated the task description for T409469: Enable ChangeProp to consume mediawiki.page_content_change.v1.
Thu, Nov 6, 8:28 PM · Data-Engineering, serviceops, Machine-Learning-Team
Ottomata updated subscribers of T409469: Enable ChangeProp to consume mediawiki.page_content_change.v1.
Thu, Nov 6, 8:26 PM · Data-Engineering, serviceops, Machine-Learning-Team
Ottomata updated subscribers of T409469: Enable ChangeProp to consume mediawiki.page_content_change.v1.

I wanted to understand how multi-DC ness relates to all the pieces here. Just writing down what I found:

Thu, Nov 6, 8:24 PM · Data-Engineering, serviceops, Machine-Learning-Team
Ottomata updated the task description for T401260: Global Editor Metrics - Data Persistence Design Review.
Thu, Nov 6, 7:19 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Data-Persistence
Ottomata added a comment to T405041: Global Editor Metrics - HTTP API endpoints.

Update for most recent API endpoints:

Thu, Nov 6, 4:39 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, OKR-Work, MediaWiki-Page-derived-data, Growth-Team, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog
Ottomata added a comment to T409105: mediawiki.page_change.v1 event stream - Investigate mistmatched meta.dt and dt (and rev_dt) fields.

If these are all caused by imports (are they? we should check for sure), then we should probably model a page_change_kind: import in the mediawiki.page_change.v1 event.

Thu, Nov 6, 4:36 PM · MW-Interfaces-Team, Data-Engineering, Event-Platform
Ottomata created T409464: mediawiki.page_change.v1 event - add a 'new revision created' field.
Thu, Nov 6, 4:35 PM · Data-Engineering, Event-Platform
Ottomata updated the task description for T409462: mediawiki.page_change.v1 event - add a page type field.
Thu, Nov 6, 4:31 PM · Data-Engineering, Event-Platform
Ottomata renamed T409462: mediawiki.page_change.v1 event - add a page type field from mediawiki.page_change.v1 - add a page type field to mediawiki.page_change.v1 event - add a page type field.
Thu, Nov 6, 4:30 PM · Data-Engineering, Event-Platform
Ottomata created T409462: mediawiki.page_change.v1 event - add a page type field.
Thu, Nov 6, 4:30 PM · Data-Engineering, Event-Platform
Ottomata updated the task description for T347282: [Event Platform] eventutilites-python: improve consistency guarantees of async process functions.
Thu, Nov 6, 1:55 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Event-Platform
Ottomata updated the task description for T347282: [Event Platform] eventutilites-python: improve consistency guarantees of async process functions.
Thu, Nov 6, 1:55 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Event-Platform

Nov 5 2025

Ottomata added a comment to T401260: Global Editor Metrics - Data Persistence Design Review.

Decision ^ here: T403660#11347022

Nov 5 2025, 9:52 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Data-Persistence
Ottomata added a comment to T403660: WE3.3.7 Year in Review and Activity Tab Services - Global Editor Metrics.

In today's meeting, we decided that "Good enough product" was sufficient for now. If this is not the case, Product will try to let us know as soon as possible.

Nov 5 2025, 9:51 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data, Growth-Team, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog
Ottomata updated the task description for T409358: Add page_id and namespace to X-Analytics header in Mobile App requests (2025 remake).
Nov 5 2025, 9:25 PM · Essential-Work, Reader Growth Team, MobileFrontend (Core PHP), Content-Transform-Team (Work In Progress), Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog, Data-Engineering
Ottomata updated subscribers of T409358: Add page_id and namespace to X-Analytics header in Mobile App requests (2025 remake).
Nov 5 2025, 9:22 PM · Essential-Work, Reader Growth Team, MobileFrontend (Core PHP), Content-Transform-Team (Work In Progress), Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog, Data-Engineering
Ottomata created T409358: Add page_id and namespace to X-Analytics header in Mobile App requests (2025 remake).
Nov 5 2025, 9:22 PM · Essential-Work, Reader Growth Team, MobileFrontend (Core PHP), Content-Transform-Team (Work In Progress), Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog, Data-Engineering
Ottomata added a comment to T408798: Spike: investigate incorrect page_id values in pageview_hourly.

Also

Nov 5 2025, 8:46 PM · MediaWiki-Platform-Team (Radar), MW-Interfaces-Team, Data-Engineering, MediaWiki-Core-Hooks
Ottomata added a comment to T408798: Spike: investigate incorrect page_id values in pageview_hourly.

Speaking of redirects:

Nov 5 2025, 8:42 PM · MediaWiki-Platform-Team (Radar), MW-Interfaces-Team, Data-Engineering, MediaWiki-Core-Hooks
Ottomata added a comment to T403660: WE3.3.7 Year in Review and Activity Tab Services - Global Editor Metrics.

Here is a product question about the "top k pages viewed" metric to discuss in today's sync meeting: T401260#11341613

Nov 5 2025, 3:30 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data, Growth-Team, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

Yep, we'll use mediawiki.page_content_change.v1. I think we just need to change the kafka_topic in change-prop, right?

Nov 5 2025, 3:06 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence

Nov 4 2025

Ottomata added a comment to T405039: Global Editor Metrics - Data Pipeline.

I've been testing backfilling pageview_per_editor_per_page. Fab repartitioned Alek's test table at fab.edit_per_editor_per_page_daily and it performs better now. I can backfill a month of pageview data using this table in a little over 5 minutes.

Nov 4 2025, 9:31 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata added a comment to T405039: Global Editor Metrics - Data Pipeline.

For intermediate Data Lake tables, Add HQL for edit_per_editor_per_page_daily and pageview_per_editor_per_page_daily (1196892) should be good to go from a data model and load query perspective.

Nov 4 2025, 8:19 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata updated the task description for T405039: Global Editor Metrics - Data Pipeline.
Nov 4 2025, 8:17 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata added a comment to T374656: Iceberg query planning for wmf_dumps.wikitext_raw takes a long time.

^^ did we create a new ticket? :)

Nov 4 2025, 7:59 PM · Dumps 2.0 (Kanban Board)
Ottomata added a comment to T401260: Global Editor Metrics - Data Persistence Design Review.

At T401260#11230961, we decided not to store per-editor per-page pageview metrics in Cassandra just to support the top k pageviews use case. This wasn't our favorite decision, because it means we have to maintain 2 different Cassandra tables and data pipelines, and the top k pageviews metric is no longer an additive timeseries metric. Product teams can't do 'top k in last 30 days', they can only do e.g. 'top k in October'.
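
Editorial illustration (not part of the original comment): a minimal Python sketch of why a precomputed top k is not additive across time buckets. The page names, dates, and counts are made up, and `top_k` and `sum_counts` are hypothetical helpers, not part of any Wikimedia pipeline.

```python
from collections import Counter

# Hypothetical daily pageview counts per page for one editor (made-up numbers).
daily_counts = {
    "2025-09-30": {"A": 10, "B": 9, "C": 8},
    "2025-10-01": {"A": 1, "B": 2, "C": 9},
}

def top_k(counts, k):
    """Return the k pages with the most views as (page, views) pairs."""
    return Counter(counts).most_common(k)

def sum_counts(days):
    """Add up the full per-page counts across several days."""
    total = Counter()
    for day in days:
        total.update(daily_counts[day])
    return total

# With full per-page counts, any window can be summed first and ranked after.
exact = top_k(sum_counts(daily_counts), k=1)

# With only a stored top-1 per day, merging drops page C's first-day views.
stored = [dict(top_k(daily_counts[day], k=1)) for day in daily_counts]
merged = Counter()
for partial in stored:
    merged.update(partial)
approx = top_k(merged, k=1)
```

Here `exact` ranks page C first, while `approx` ranks page A first, because the stored per-day top-1 entries no longer contain the counts needed to rank an arbitrary window.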

Nov 4 2025, 6:33 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Data-Persistence
Ottomata added a comment to T401260: Global Editor Metrics - Data Persistence Design Review.

We've got our first actual daily pageviews per editor per page data lake table record! @amastilovic backfilled and ran the pageviews daily query for 2025-10-25. On that day, we stored 26637692 records.

Nov 4 2025, 6:26 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Data-Persistence
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

But it's not about pages. It's about paragraphs, which are implicitly part of pages (revisions of pages),

True, but specifically about paragraphs that belong to MediaWiki pages. Paragraphs do not have a corresponding MediaWiki entity concept. Paragraphs do not have a unique id with which they can be referred to alone. They require a page_id (and/or revision_id) to be contextualized.

Nov 4 2025, 4:37 AM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence

Nov 3 2025

Ottomata added a comment to T409105: mediawiki.page_change.v1 event stream - Investigate mistmatched meta.dt and dt (and rev_dt) fields.

Here is a suspicious event from October

Nov 3 2025, 8:30 PM · MW-Interfaces-Team, Data-Engineering, Event-Platform
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

If that is not the case then I think we have to also consider page_revision_paragraph_tone_scores, or even wiki_page_revision_paragraph_tone_scores.

Nov 3 2025, 7:53 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Ottomata created T409105: mediawiki.page_change.v1 event stream - Investigate mistmatched meta.dt and dt (and rev_dt) fields.
Nov 3 2025, 7:31 PM · MW-Interfaces-Team, Data-Engineering, Event-Platform
Ottomata renamed T405040: Global Editor Metrics - backfill pageview metric data from Global Editor Metrics - backfill data to Global Editor Metrics - backfill pageview metric data.
Nov 3 2025, 5:17 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata added a comment to T408850: mediawiki_event_enrichment should enrich all events for the page_content_change stream.

...This conversation makes me think it would be useful to have a property in the event that indicates if the latest revision_id has changed. IIRC MW DomainEvents are actually named and modeled around this concept.

Nov 3 2025, 5:14 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Event-Platform
Ottomata added a comment to T408850: mediawiki_event_enrichment should enrich all events for the page_content_change stream.

Ya move is good.

Nov 3 2025, 5:12 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Event-Platform
Ottomata added a comment to T408939: Fix iceberg table location in hive metastore.

Both data and metadata get deleted when dropping an Iceberg managed table.

Hm, okay just asking for my education. This is different than regular Hive external tables then, yes?

Nov 3 2025, 5:09 PM · Data-Engineering
Ottomata added a comment to T400380: MediaWiki\Revision\RevisionAccessException: Unable to load fresh row for rev_id: {rev_id}.

@daniel, moving the convo from the patch to this ticket.

Nov 3 2025, 5:02 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, MW-Interfaces-Team, Event-Platform, MediaWiki-DomainEvents, Unstewarded-production-error, MediaWiki-Core-Revision-backend, Wikimedia-production-error
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

Could we go with page_paragraph_tone_scores?

I think it's clear that the data represents paragraphs from MediaWiki pages when we have page_id as part of primary key

Nov 3 2025, 2:49 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Ottomata added a comment to T408850: mediawiki_event_enrichment should enrich all events for the page_content_change stream.

@xcollazo do you want content also on page_change_kind == visibility_change and page_change_kind == delete? (Well uh, we can't do delete, because we can't get content after a page has been deleted.)

Nov 3 2025, 2:45 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Event-Platform
Ottomata added a comment to T408939: Fix iceberg table location in hive metastore.

location would have changed as part of the ALTER TABLE RENAME, and it would have broken the Iceberg table because Iceberg keeps track of fully qualified file names.

Nov 3 2025, 2:39 PM · Data-Engineering
Ottomata added a comment to T408939: Fix iceberg table location in hive metastore.

prevent data-dropping errors

Nov 3 2025, 2:39 PM · Data-Engineering
Ottomata added a comment to T407779: mediawiki_event_enrichment - update default params and tests to use mediawiki/page_change 1.3.0 (latest) schema.

I have a concern about changing the job name as I don't know what can be affected.

Nov 3 2025, 2:26 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Event-Platform

Oct 31 2025

Ottomata added a comment to T408798: Spike: investigate incorrect page_id values in pageview_hourly.

For diffs: could we not just modify the pageview algorithm and add an is_diff or pageview_kind=diff field that indicates if it was a diff pageview? We should know pretty easily by the URI path.
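
Editorial illustration (not part of the original comment): a rough Python sketch of flagging diff requests from the request URI. The function name `is_diff_request` is hypothetical, and both URL patterns it checks (a `diff` query parameter and a `/wiki/Special:Diff` path prefix) are assumptions about how diff views are addressed; the actual pageview definition may need different or additional rules, for example for localized special page names.

```python
from urllib.parse import urlsplit, parse_qs

def is_diff_request(uri: str) -> bool:
    """Guess whether a request URI points at a diff view (illustrative heuristic)."""
    parts = urlsplit(uri)
    # Assumption: diff views served through index.php carry a `diff` query parameter.
    if "diff" in parse_qs(parts.query, keep_blank_values=True):
        return True
    # Assumption: the Special:Diff page is addressed as /wiki/Special:Diff/<rev_id>.
    return parts.path.startswith("/wiki/Special:Diff")
```

A flag computed this way could populate a field such as the `is_diff` one suggested above, leaving the existing pageview counting untouched.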

Oct 31 2025, 8:08 PM · MediaWiki-Platform-Team (Radar), MW-Interfaces-Team, Data-Engineering, MediaWiki-Core-Hooks
Ottomata added a comment to T408939: Fix iceberg table location in hive metastore.

I also wonder if there is a real need to always use a specific (external) location for Hive-managed Iceberg tables. We did this with Hive tables in the past especially because not all tables were created via SQL. Some were directories and files in HDFS before a Hive table was layered on them (e.g. wmf_raw.webrequest, etc.)

Oct 31 2025, 7:59 PM · Data-Engineering
Ottomata added a comment to T408939: Fix iceberg table location in hive metastore.

When fixing this, we should use a fully qualified URL, not just "hdfs:///..." but "hdfs://analytics-hadoop/...", specifying the specific Hadoop cluster where the location is. (We do have an analytics-test-hadoop cluster ;) )
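
Editorial illustration (not part of the original comment): a small Python check that distinguishes the two URL forms by whether the URL names a cluster in its authority part. The function name `is_fully_qualified_hdfs` and the example paths in the note below are hypothetical.

```python
from urllib.parse import urlsplit

def is_fully_qualified_hdfs(location: str) -> bool:
    """True if the location is an hdfs:// URL that names its cluster (authority)."""
    parts = urlsplit(location)
    # "hdfs:///path" parses with an empty authority (netloc), so it depends on
    # whichever default filesystem the client is configured with.
    return parts.scheme == "hdfs" and parts.netloc != ""
```

Under this check, a location such as `hdfs:///wmf/data/example` is rejected and `hdfs://analytics-hadoop/wmf/data/example` is accepted.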

Oct 31 2025, 7:55 PM · Data-Engineering
Ottomata updated subscribers of T408939: Fix iceberg table location in hive metastore.
Oct 31 2025, 7:54 PM · Data-Engineering
Ottomata added a comment to T408942: Add code styles rules to analytics-refinery-source.

Hm, I thought we used https://gitlab.wikimedia.org/repos/maven/wmf-jvm-parent-pom#maven-checkstyle-plugin already?

Oct 31 2025, 7:49 PM · Data-Engineering, Essential-Work

Oct 30 2025

Ottomata added a comment to T408798: Spike: investigate incorrect page_id values in pageview_hourly.

Relevant: T371321: [Idea] Collect pageview data using client-side instrumentation

Oct 30 2025, 7:53 PM · MediaWiki-Platform-Team (Radar), MW-Interfaces-Team, Data-Engineering, MediaWiki-Core-Hooks
Ottomata removed a subtask for T405039: Global Editor Metrics - Data Pipeline: T408798: Spike: investigate incorrect page_id values in pageview_hourly.
Oct 30 2025, 7:53 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata removed a parent task for T408798: Spike: investigate incorrect page_id values in pageview_hourly: T405039: Global Editor Metrics - Data Pipeline.
Oct 30 2025, 7:53 PM · MediaWiki-Platform-Team (Radar), MW-Interfaces-Team, Data-Engineering, MediaWiki-Core-Hooks
Ottomata added a comment to T408798: Spike: investigate incorrect page_id values in pageview_hourly.

Relevant: T371321: [Idea] Collect pageview data using client-side instrumentation

Oct 30 2025, 7:52 PM · MediaWiki-Platform-Team (Radar), MW-Interfaces-Team, Data-Engineering, MediaWiki-Core-Hooks
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

We won't use a different source unit, so I think including page is unnecessary.

Oct 30 2025, 7:51 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Ottomata added a comment to T408798: Spike: investigate incorrect page_id values in pageview_hourly.

It seems there is more to look into here, but I wrote up the implications for Global Editor Metrics at T405039#11329322.

Oct 30 2025, 7:42 PM · MediaWiki-Platform-Team (Radar), MW-Interfaces-Team, Data-Engineering, MediaWiki-Core-Hooks
Ottomata added a comment to T405039: Global Editor Metrics - Data Pipeline.

@mforns and I were debugging our pageviews/per_editor queries yesterday, and we ran into a very unexpected issue with pageview_hourly. This issue is explored and (will be) documented in T408798: Spike: investigate incorrect page_id values in pageview_hourly.

Oct 30 2025, 7:27 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata updated subscribers of T408850: mediawiki_event_enrichment should enrich all events for the page_content_change stream.

Wow nice find. Def high priority and probably a great task for @JMonton-WMF to take on!

Oct 30 2025, 6:16 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Event-Platform
Ottomata added a comment to T405041: Global Editor Metrics - HTTP API endpoints.

I just deployed the pageviews/v3/per_editor endpoint. It will not work because there is no data behind it.

Oct 30 2025, 4:49 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, OKR-Work, MediaWiki-Page-derived-data, Growth-Team, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog
Ottomata added a comment to T408798: Spike: investigate incorrect page_id values in pageview_hourly.

For T405039: Global Editor Metrics - Data Pipeline, we are using pageview_hourly to compute editor impact metrics. We wanted to include the page_title in the output dataset, to make the metrics more useable. Since the same page_id is associated with many page_titles, this won't be possible.

Oct 30 2025, 4:11 PM · MediaWiki-Platform-Team (Radar), MW-Interfaces-Team, Data-Engineering, MediaWiki-Core-Hooks
Ottomata updated the task description for T408798: Spike: investigate incorrect page_id values in pageview_hourly.
Oct 30 2025, 3:40 PM · MediaWiki-Platform-Team (Radar), MW-Interfaces-Team, Data-Engineering, MediaWiki-Core-Hooks
Ottomata added a comment to T408719: Stop adding user-agent details to http.request_header.user-agent directly via EventLogging.

Just wondering what you meant when you say "other usages"

Oct 30 2025, 3:14 PM · Data-Engineering, Test Kitchen, Essential-Work, Technical-Debt, MediaWiki-extensions-EventLogging
Ottomata added a subtask for T405039: Global Editor Metrics - Data Pipeline: T408798: Spike: investigate incorrect page_id values in pageview_hourly.
Oct 30 2025, 3:02 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata added a parent task for T408798: Spike: investigate incorrect page_id values in pageview_hourly: T405039: Global Editor Metrics - Data Pipeline.
Oct 30 2025, 3:02 PM · MediaWiki-Platform-Team (Radar), MW-Interfaces-Team, Data-Engineering, MediaWiki-Core-Hooks
Ottomata added a comment to T408701: Enable event logging for the mediawiki.product_metrics.suggested_investigations_interaction stream on loginwiki.

there isn't any reason for restricting things.

Oct 30 2025, 3:01 PM · Product Safety and Integrity (Sprint Mint Choc Chip Ice Cream (Oct 20 - Nov 7)), CheckUser-SuggestedInvestigations, Metrics Platform
Ottomata added a comment to T408719: Stop adding user-agent details to http.request_header.user-agent directly via EventLogging.

Moving a Slack convo here to phab.

Oct 30 2025, 1:51 PM · Data-Engineering, Test Kitchen, Essential-Work, Technical-Debt, MediaWiki-extensions-EventLogging

Oct 29 2025

Ottomata moved T408165: Requesting Kerberos access for Jmoore111 from Incoming (new tickets) to Tag with Radar on the Data-Engineering board.
Oct 29 2025, 4:03 PM · SRE, SRE-Access-Requests, Data-Engineering-Radar, Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering
Ottomata moved T408696: Requesting Kerberos access for slyngshede from Incoming (new tickets) to Tag with Radar on the Data-Engineering board.
Oct 29 2025, 4:03 PM · Data-Engineering-Radar, Data-Engineering
Ottomata added a comment to T408701: Enable event logging for the mediawiki.product_metrics.suggested_investigations_interaction stream on loginwiki.

I'm not quite sure who would be responsible for figuring this out for sure. I would guess that it was disabled for loginwiki a long, long time ago by the old and now-nonexistent Services team. I assume there was a reason?

Oct 29 2025, 3:22 PM · Product Safety and Integrity (Sprint Mint Choc Chip Ice Cream (Oct 20 - Nov 7)), CheckUser-SuggestedInvestigations, Metrics Platform
Ottomata updated subscribers of T408701: Enable event logging for the mediawiki.product_metrics.suggested_investigations_interaction stream on loginwiki.
Oct 29 2025, 3:22 PM · Product Safety and Integrity (Sprint Mint Choc Chip Ice Cream (Oct 20 - Nov 7)), CheckUser-SuggestedInvestigations, Metrics Platform
Ottomata renamed T408687: Create example dbt models using Iceberg from Create example models using Iceberg to Create example dbt models using Iceberg.
Oct 29 2025, 3:15 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Essential-Work, Movement-Insights, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Epic
Ottomata added a comment to T408701: Enable event logging for the mediawiki.product_metrics.suggested_investigations_interaction stream on loginwiki.

If there are no privacy concerns with just enabling all TYPE_EVENT for loginwiki, that would be the simplest way to accomplish this. Otherwise, we will have to do something similar for loginwiki as was done for private wikis in T346046: [Search Update Pipeline] Source streams for private wikis.

Oct 29 2025, 3:07 PM · Product Safety and Integrity (Sprint Mint Choc Chip Ice Cream (Oct 20 - Nov 7)), CheckUser-SuggestedInvestigations, Metrics Platform
Ottomata moved T407779: mediawiki_event_enrichment - update default params and tests to use mediawiki/page_change 1.3.0 (latest) schema from Backlog to Stream Processing on the Event-Platform board.
Oct 29 2025, 2:42 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Event-Platform
Ottomata added a comment to T408641: stat1011: cannot create directory ‘/srv/published/datasets/one-off’: Permission denied.
13:26:24 [@stat1011:/home/otto] $ ls -la /srv/published/
total 28
drwxrwxr-x  6 root     wikidev           4096 Oct 31  2024 .
drwxr-xr-x  3 stats    wikidev           4096 Jun 21  2024 datasets
...
Oct 29 2025, 1:29 PM · SRE, Data-Engineering

Oct 28 2025

Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

IIUC, you're okay with not naming this table more specifically about structured tasks?

Oct 28 2025, 8:51 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Ottomata added a comment to T407863: AQS Wikimedia REST API - new API version.

each type of metric is basically its own microservice

Which begs the question: should it be? ;)

Oct 28 2025, 7:30 PM · MW-Interfaces-Team, RESTBase-API, serviceops, Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work
Ottomata added a comment to T403660: WE3.3.7 Year in Review and Activity Tab Services - Global Editor Metrics.

Docs are live! https://doc.wikimedia.org/generated-data-platform/aqs/analytics-api/reference/edits.html#get-number-of-edits-by-a-editor

Oct 28 2025, 7:01 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data, Growth-Team, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog
Ottomata updated the task description for T406069: Global Editor Metrics - Druid mediawiki_history_reduced changes.
Oct 28 2025, 6:59 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata added a comment to T406069: Global Editor Metrics - Druid mediawiki_history_reduced changes.

user_central_id is now in Druid mediawiki_history_reduced! Thanks @amastilovic !

Oct 28 2025, 6:59 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata added a comment to T407863: AQS Wikimedia REST API - new API version.

could you confirm that the version change is going to be applied for all AQS endpoints?

Oct 28 2025, 6:56 PM · MW-Interfaces-Team, RESTBase-API, serviceops, Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work
Ottomata added a comment to T403660: WE3.3.7 Year in Review and Activity Tab Services - Global Editor Metrics.

@Dbrant! Great news! edits/v3/per_editor is live!

Oct 28 2025, 6:17 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data, Growth-Team, Wikipedia-Android-App-Backlog, Wikipedia-iOS-App-Backlog
Ottomata added a comment to T307040: Propagate field descriptions from event schemas to Hive event tables and into DataHub.

@aqu why mergeComments vs e.g. mergeFieldsMetadata like in https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/987195/10/refinery-spark/src/main/scala/org/wikimedia/analytics/refinery/spark/sql/HiveExtensions.scala? Since Spark treats comments like a kind of field metadata, shouldn't we make the SparkSqlExtension stuff do the same for all metadata?

Oct 28 2025, 5:46 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Patch-For-Review, Product-Analytics
Ottomata added a comment to T377023: Add CI step to event schema repositories to test to fail if a schema is deleted.

Ya exactly.

Oct 28 2025, 5:33 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Event-Platform
Ottomata added a comment to T408538: Create a Revise Tone Task Generator in LiftWing.

Ya, currently codfw is the active datacenter, so only its topic will have real data. Try:

Oct 28 2025, 5:31 PM · Patch-For-Review, Machine-Learning-Team
Ottomata added a comment to T377023: Add CI step to event schema repositories to test to fail if a schema is deleted.

@JMonton-WMF thanks for the patch!

Oct 28 2025, 5:14 PM · Patch-For-Review, Data-Engineering (Q2 FY25/26 October 1st - December 31th), Event-Platform
Ottomata added a comment to T407863: AQS Wikimedia REST API - new API version.

I just deployed edit-analytics with the metrics/v3/edits/per_editor endpoint. It does not work externally at https://wikimedia.org/api/rest_v1/metrics/v3/edits/per_editor.

Oct 28 2025, 4:57 PM · MW-Interfaces-Team, RESTBase-API, serviceops, Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work
Ottomata added a project to T366248: Source the CirrusSearch index dumps from hadoop instead of a MW maintenance script: MediaWiki-Page-derived-data.
Oct 28 2025, 12:04 AM · MediaWiki-Page-derived-data, Discovery-Search (2025.10.20 - 2025.12.31), Data-Platform-SRE, Essential-Work, Patch-For-Review, DPE-Mediawiki-Content, Data-Engineering, CirrusSearch

Oct 27 2025

Ottomata renamed T407559: Global Editor Metrics - Data Pipeline - edit_per_editor_per_page_daily from Global Editor Metrics - Data Pipeline - edit_per_user_per_page_daily to Global Editor Metrics - Data Pipeline - edit_per_editor_per_page_daily.
Oct 27 2025, 8:27 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata renamed T407559: Global Editor Metrics - Data Pipeline - edit_per_editor_per_page_daily from Global Editor Metrics - Data Pipeline - user_edited_pages to Global Editor Metrics - Data Pipeline - edit_per_user_per_page_daily.
Oct 27 2025, 7:48 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), OKR-Work, MediaWiki-Page-derived-data
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

...and also back to the 'is it a cache' discussion!

Oct 27 2025, 6:02 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

every change to the model absolutely requires a change to the application code as well

This is probably a good thing. IIUC, model_version rarely changes, but if it does, you probably want to have a managed upgrade path. This also would give you the ability to A/B test serving different model versions. I would expect when this happens that ML could generate and store tasks using both models, until we are sure the new model_version is the one to use.

Oct 27 2025, 5:33 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

Yep, we'll use mediawiki.page_content_change.v1. I think we just need to change the kafka_topic in change-prop, right?

Oct 27 2025, 5:06 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

Yup, but is it otherwise different in any meaningful way?

Technically, maybe not. But in terminology/usage/common understanding, maybe! But yes, agree that we should sidetrack this discussion for larger stuff; as is, this is fine!

Oct 27 2025, 4:38 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

But also, it kind of is cache isn't?

I'm not sure if it is! At the very least, it is not a read-through cache. But as we discussed in slack, the line is blurry.

Oct 27 2025, 3:44 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

consumes the mediawiki.page_content_change.v1 events (triggered by changeprop)

Oct 27 2025, 3:38 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence
Ottomata added a comment to T400852: Prepare mediawiki-client-error Logstash dashboards for mobile subdomain sunsetting.

FYI, in case this is useful to you all in T304373, eventgate-logging-external events, like mediawiki.client_error, are now ingested into Hive in the Data Lake. There is now an event.mediawiki_client_error Hive table.

Oct 27 2025, 3:24 PM · MW-1.45-notes (1.45.0-wmf.24; 2025-10-21), Test Kitchen, Data-Engineering, MediaWiki-extensions-WikimediaEvents
Ottomata added a comment to T406765: Create a new gitlab repository for use with dbt.

I don't have a strong preference other than not dbt.

Oct 27 2025, 3:21 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th)
Ottomata moved T400360: Fix Hive event.development_network_probe table from In progress to Done on the Data-Engineering (Q2 FY25/26 October 1st - December 31th) board.
Oct 27 2025, 3:18 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Traffic
Ottomata added a comment to T401021: Data Persistence Design Review: Improve Tone Suggested Edits newcomer task.

I like it! Some field naming suggestions:

Oct 27 2025, 3:14 PM · Data-Engineering (Q2 FY25/26 October 1st - December 31th), Data-Persistence-Design-Review, Revise-Tone-Structured-Task, OKR-Work, Machine-Learning-Team, Growth-Team, Data-Persistence