Ottomata (Andrew Otto)
User

User Details

User Since
Oct 9 2014, 4:50 PM (493 w, 6 d)
Availability
Available
IRC Nick
ottomata
LDAP User
Ottomata
MediaWiki User
Ottomata

Recent Activity

Today

Ottomata updated subscribers of T361214: Public dashboard process.
Thu, Mar 28, 1:57 PM · Data-Engineering-Dashiki, Data-Engineering, Data Products

Yesterday

Ottomata added a comment to T266813: mw.user.generateRandomSessionId should return a UUID.

I think this is a library that Data Engineering owns?

@VirginiaPoundstone I don't think so. I believe mw.user.generateRandomSessionId is part of MediaWiki core.

Wed, Mar 27, 10:46 PM · Metrics Platform Backlog, Data Products, Data-Engineering, Analytics-Radar, Better Use Of Data, Product-Data-Infrastructure

Tue, Mar 26

Ottomata added a comment to T360642: Remove extra fields currently sent to Kafka.

meta.id

Do you know who set these fields with the current webrequest flow?

Tue, Mar 26, 6:37 PM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Fri, Mar 22

Ottomata added a comment to T360642: Remove extra fields currently sent to Kafka.

meta.id and meta.request_id

Fri, Mar 22, 7:41 PM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Thu, Mar 21

Ottomata added a comment to T348958: Bump memory to enable large artifacts sync on HDFS.
Supports only reading, with read-ahead of a predetermined block-size.

In the case that the server does not supply the filesize, only reading of
the complete file in one go is supported.
Thu, Mar 21, 8:12 PM · Structured-Data-Backlog, Data-Engineering

Tue, Mar 19

Ottomata added a comment to T348958: Bump memory to enable large artifacts sync on HDFS.

Hm, actually, as far as I can tell, reading from HTTP (and many other sources) uses https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.spec.AbstractBufferedFile, which has a default read blocksize of 5MB.
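
For illustration only, the idea of block-wise reading can be sketched with the standard library. This does not use fsspec; the helper name is hypothetical, and the 5 MiB figure is simply taken from the comment above. The point is that only one block is held in memory at a time, regardless of the total size.

```python
import io

# Assumed block size, mirroring the 5MB default mentioned above.
BLOCK_SIZE = 5 * 1024 * 1024


def copy_in_blocks(source, destination, block_size=BLOCK_SIZE) -> int:
    """Copy a binary stream block by block and return the number of bytes copied."""
    copied = 0
    while True:
        block = source.read(block_size)
        if not block:
            # An empty read means the source is exhausted.
            break
        destination.write(block)
        copied += len(block)
    return copied


# Tiny in-memory example with a deliberately small block size.
source = io.BytesIO(b"x" * 12)
destination = io.BytesIO()
copied = copy_in_blocks(source, destination, block_size=5)
```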

Tue, Mar 19, 12:17 AM · Structured-Data-Backlog, Data-Engineering

Mon, Mar 18

Ottomata added a comment to T348958: Bump memory to enable large artifacts sync on HDFS.

Or maybe:

Mon, Mar 18, 11:56 PM · Structured-Data-Backlog, Data-Engineering
Ottomata added a comment to T348958: Bump memory to enable large artifacts sync on HDFS.

Maybe: https://filesystem-spec.readthedocs.io/en/latest/api.html?highlight=clear%20cache#fsspec.utils.read_block ?

Mon, Mar 18, 11:37 PM · Structured-Data-Backlog, Data-Engineering
Ottomata added a comment to T359178: Check statsv and eventlogging VarnishKafka instances.

@Fabfur I'm really really hoping we can remove varnishkafka-eventlogging after we complete T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate, as part of T238230: Decommission EventLogging backend components by migrating to MEP.

Mon, Mar 18, 10:25 PM · Data-Engineering, Observability-Logging, Traffic
Ottomata added a comment to T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable".

This doesn't mean that MediaWiki shouldn't try to improve the situation by handling the failure to submit a job by saving it somewhere (a specific db table?) so we can replay it later. At the current failure rate, this would guarantee the jobs would be executed with an irrelevant cost in terms of resources.

Mon, Mar 18, 10:20 PM · MediaWiki-Engineering, Data-Engineering, Unstewarded-production-error, User-brennen, serviceops, WMF-JobQueue, Wikimedia-production-error
Ottomata added a comment to T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable".

I wonder if JobQueueGroup::lazyPush()/JobQueueEventBus could be rigged to make the provided jobs use "hasty" mode in EventGate?

Mon, Mar 18, 10:15 PM · MediaWiki-Engineering, Data-Engineering, Unstewarded-production-error, User-brennen, serviceops, WMF-JobQueue, Wikimedia-production-error

Fri, Mar 15

Ottomata updated subscribers of T291120: MediaWiki Event Carried State Transfer - Problem Statement.
Fri, Mar 15, 11:53 PM · Data-Engineering, Platform Engineering, Event-Platform, tech-decision-forum
Ottomata updated the task description for T347970: [L] MachineVision: archive and remove all events and event schemas.
Fri, Mar 15, 9:13 PM · Patch-For-Review, Structured-Data-Backlog (Current Work), MachineVision
Ottomata created T360210: Document instructions for deleting an event stream and its usages.
Fri, Mar 15, 4:14 PM · Metrics Platform Backlog, Event-Platform, Data-Engineering

Wed, Mar 6

Ottomata updated the task description for T120242: Consistent MediaWiki state change events | MediaWiki events as source of truth.
Wed, Mar 6, 1:30 PM · Data-Engineering, Analytics, DBA, WMF-Architecture-Team, Platform Team Legacy (Later), Event-Platform, Services (later)

Tue, Mar 5

Ottomata added a comment to T357537: Alerts Review: determine if we can use Prometheus to alert based on historical datasets.

Oh cool! @bking I read the linked notes but I'm missing how it's gonna work? How can you alert when dataset $X for partition $N is failing? Is there a way to make partition or hour or datetime or whatever a label?

Tue, Mar 5, 2:12 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24), Data-Engineering

Sun, Mar 3

Ottomata added a comment to T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable".

I still wonder why profile::kafka::mirror::properties doesn't blacklist all MW jobs?* Is anything making use of that extra data?

Sun, Mar 3, 2:17 PM · MediaWiki-Engineering, Data-Engineering, Unstewarded-production-error, User-brennen, serviceops, WMF-JobQueue, Wikimedia-production-error

Thu, Feb 29

Ottomata added a comment to T253058: DRY kafka broker declaration in helmfiles.

+1, or add this as a subtask of that?

Thu, Feb 29, 7:57 PM · Data-Engineering, Data-Platform-SRE, serviceops, SRE, Event-Platform
Ottomata added a comment to T358612: Investigate replacing Archiva with Gitlab repositories.

+1! to this idea!

Thu, Feb 29, 3:14 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Java-Scala-Standardization, Security, collaboration-services, Release-Engineering-Team

Tue, Feb 27

Ottomata added a comment to T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics.

<3

Tue, Feb 27, 9:42 PM · Data-Engineering (Sprint 9), observability, ChangeProp, Event-Platform, service-runner

Feb 27 2024

Ottomata updated subscribers of T309772: npm audit reports several security issues with Service runner.
Feb 27 2024, 1:22 PM · CX-cxserver, Security, service-runner

Feb 21 2024

Ottomata added a comment to T354557: Dataset Config Store.

Worth investigating? https://datacontract.com/

Feb 21 2024, 1:30 AM · Epic, Data-Engineering
Ottomata added a comment to T276088: Configuration Management for Kafka settings.

Just came across https://www.jikkou.io/docs/tutorials/get_started/ . Worth a look!

Feb 21 2024, 1:22 AM · Data-Platform-SRE, Data-Engineering, serviceops-radar, Event-Platform, Analytics-Radar, SRE
Ottomata closed T358073: kafka management as Invalid.
Feb 21 2024, 1:20 AM
Ottomata created T358073: kafka management.
Feb 21 2024, 1:20 AM

Feb 19 2024

Ottomata added a comment to T307959: [Event Platform] Design and Implement realtime enrichment pipeline for MW page change with content.

@lbowmaker @gmodena Should we resolve and close this?

Feb 19 2024, 3:39 PM · Data-Engineering, Event-Platform, Epic

Feb 15 2024

Ottomata added a comment to T356597: Investigate if the new 'Multiblocks' user blocks feature affects the mediawiki.user-blocks-change event stream.

@JWheeler-WMF EventBus extension uses the BlockIpComplete hook. If there are no changes to this hook API, then there are no changes needed for EventBus or the mediawiki.user-blocks-change stream. However, I'd assume that to accommodate the Multiblocks feature, the hook will need to be changed to represent the multiple expiration dates of the different blocks.

Feb 15 2024, 6:40 PM · Data Products (Data Products Sprint 09), Multiblocks, Community-Tech, Data-Engineering, Event-Platform

Feb 13 2024

Ottomata added a comment to T356762: [NEEDS GROOMING][SPIKE] Extract refine schema management into a dedicated tool.

Oh and in case you haven't seen it: EvolveHiveTable.

Feb 13 2024, 11:44 PM · Data-Engineering
Ottomata added a comment to T266813: mw.user.generateRandomSessionId should return a UUID.

I think because it was on the Event Platform board, but doesn't have anything really to do with Event Platform. Instead, it has to do with MW generated session IDs, which I believe are used in EventLogging instrumentation schemas.

Feb 13 2024, 2:29 AM · Metrics Platform Backlog, Data Products, Data-Engineering, Analytics-Radar, Better Use Of Data, Product-Data-Infrastructure

Feb 12 2024

Ottomata updated the task description for T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.
Feb 12 2024, 4:08 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General
Ottomata updated subscribers of T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.

PHP execution.
Afaik PHP execution is limited for security reasons to only specific directories. This will thus likely need a puppet change first to Apache config to allow this directory to execute PHP.

Given the transition to Docker/Helm/Kubernetes etc this will also need a corresponding change there, which has its own copy of the Apache config.

Feb 12 2024, 4:07 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General

Feb 11 2024

Ottomata added a comment to T357005: eventstreams regularly uses more than 95% of its memory limit.

If you have time to dive deep, you can live inspect a nodejs process and search for memory leaks.

Feb 11 2024, 9:16 PM · Data-Engineering (Sprint 9), Event-Platform, EventStreams, serviceops, Prod-Kubernetes, Kubernetes

Feb 9 2024

Ottomata added projects to T357005: eventstreams regularly uses more than 95% of its memory limit: Event-Platform, Data-Engineering.

wondering about the stream connection duration

Feb 9 2024, 11:47 PM · Data-Engineering (Sprint 9), Event-Platform, EventStreams, serviceops, Prod-Kubernetes, Kubernetes
Ottomata added a comment to T351837: [SPIKE] Assess impact of Move analytics log from Varnish to HAProxy.

all we'd need would be to switch our webrequest pipelines to start consuming from the proposed new table names discussed in T314956: [Event Platform] Declare webrequest as an Event Platform stream.

Feb 9 2024, 10:45 PM · Data Products (Data Products Sprint 07)
Ottomata added a comment to T351117: Move analytics log from Varnish to HAProxy.

I think that this more precise timestamp would be parseable by our ingestion system just fine, but we should verify. If we can get this precise I suppose...why not? I see that existent varnish dt is only seconds, which doesn't seem very precise, especially for webrequest. Perhaps we should take this opportunity to increase the precision a bit. If we can, we should strive for at least millisecond. Not a blocker for this task though.

Feb 9 2024, 10:42 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Ottomata updated the task description for T314956: [Event Platform] Declare webrequest as an Event Platform stream.
Feb 9 2024, 10:39 PM · Patch-For-Review, Data-Engineering, Event-Platform
Ottomata added projects to T336842: Introduce new logging schema: Data Products, Metrics Platform Backlog.
Feb 9 2024, 6:45 PM · WMDE-FUN-Sprint-2024-02-27, WMDE-FUN-Sprint-2024-02-13, Metrics Platform Backlog, Data Products, WMDE-FUN-Sprint-2024-01-30, WMDE-FUN-Team, WMDE-Fundraising-Tech
Ottomata updated subscribers of T336842: Introduce new logging schema.

respective doc page

Feb 9 2024, 6:45 PM · WMDE-FUN-Sprint-2024-02-27, WMDE-FUN-Sprint-2024-02-13, Metrics Platform Backlog, Data Products, WMDE-FUN-Sprint-2024-01-30, WMDE-FUN-Team, WMDE-Fundraising-Tech

Feb 8 2024

Ottomata added a comment to T356762: [NEEDS GROOMING][SPIKE] Extract refine schema management into a dedicated tool.

Hello! I'm not entirely sure what this ticket is trying to do, but here's some hopefully useful information:

Feb 8 2024, 3:33 PM · Data-Engineering

Feb 5 2024

Ottomata updated the task description for T314956: [Event Platform] Declare webrequest as an Event Platform stream.
Feb 5 2024, 2:38 PM · Patch-For-Review, Data-Engineering, Event-Platform

Feb 4 2024

Ottomata created T356597: Investigate if the new 'Multiblocks' user blocks feature affects the mediawiki.user-blocks-change event stream.
Feb 4 2024, 2:26 PM · Data Products (Data Products Sprint 09), Multiblocks, Community-Tech, Data-Engineering, Event-Platform

Jan 30 2024

Ottomata edited projects for T266813: mw.user.generateRandomSessionId should return a UUID, added: Data Products, Metrics Platform Backlog; removed Event-Platform.
Jan 30 2024, 11:52 PM · Metrics Platform Backlog, Data Products, Data-Engineering, Analytics-Radar, Better Use Of Data, Product-Data-Infrastructure
Ottomata added a comment to T352783: Change data platform-related IRC channels to improve communication.

Copypasting comment from Alerts Review doc:

Jan 30 2024, 11:47 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), observability

Jan 18 2024

Ottomata updated the task description for T341229: ProduceCanaryEvents job should be scheduled by Airflow and/or a k8s service.
Jan 18 2024, 6:25 PM · Data-Engineering (Sprint 9), Event-Platform
Ottomata renamed T341229: ProduceCanaryEvents job should be scheduled by Airflow and/or a k8s service from ProduceCanaryEvents job should be scheduled by Airflow to ProduceCanaryEvents job should be scheduled by Airflow and/or a k8s service.
Jan 18 2024, 6:25 PM · Data-Engineering (Sprint 9), Event-Platform
Ottomata added a project to T341229: ProduceCanaryEvents job should be scheduled by Airflow and/or a k8s service: Event-Platform.
Jan 18 2024, 6:13 PM · Data-Engineering (Sprint 9), Event-Platform

Jan 10 2024

Ottomata updated subscribers of T347421: [NEEDS GROOMING] schema services should be moved to k8s.
Jan 10 2024, 10:49 PM · Data-Platform-SRE, Event-Platform, Data-Engineering
Ottomata added a comment to T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.

Oh, and actually, we only need to count requests to mediawiki.org/beacon/event, so:

Jan 10 2024, 2:42 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General

Jan 8 2024

Ottomata added a comment to T349289: Upgrade eventlogging VM to bullseye (or bookworm).

Decommissioning probably won't get done until after I'm back from leave in late April. Can we wait that long?

Jan 8 2024, 3:17 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Data-Engineering, Event-Platform

Jan 5 2024

Ottomata added a comment to T259163: Migrate legacy metawiki schemas to Event Platform.

Okay great! Thank you.

Jan 5 2024, 2:47 PM · Data-Engineering, Better Use Of Data, Product-Analytics, MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), Product-Data-Infrastructure, Event-Platform
Ottomata added a comment to T212482: RFC: Evolve hook system to support "filters" and "actions" only.

Also, from the convo in December's tech leadership CoP meeting, I started thinking about how what we want for T291120: MediaWiki Event Carried State Transfer - Problem Statement is pretty similar to what is in MW's logging table, except we need the data to be structured, comprehensive and consistent (meaning no missing state changes). In T120242: Consistent MediaWiki state change events | MediaWiki events as source of truth, one of the solutions outlined is the 'Transactional Outbox' pattern, which is kinda similar to a comprehensive+structured logging table from which we can generate and externalize state change events. I betcha we could tie these ideas together somehow.

Jan 5 2024, 2:29 PM · Patch-For-Review, Platform Engineering Roadmap Decision Making, MediaWiki-Core-Hooks, Platform Team Initiatives (New Hook System), TechCom-RFC, TechCom
Ottomata added a comment to T259163: Migrate legacy metawiki schemas to Event Platform.

@SNowick_WMF, are latest versions of apps still sending the various MobileApp* events? I see a few events coming in, but maybe those are just from old versions?

Jan 5 2024, 1:42 AM · Data-Engineering, Better Use Of Data, Product-Analytics, MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), Product-Data-Infrastructure, Event-Platform
Ottomata added a comment to T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.

peak request rate was ~1900 requests/s.

Oh, that turnilo chart is per hour (I think), and is also sampled 1/128. 1900/s seemed like a lot! So more like a peak of 900*128/60/60 == 32 requests/s. (I think you misread the chart; the peak I see shows '900', not 1900.)
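
The estimate can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, assuming (as the comment tentatively does) that the chart shows hourly counts sampled at 1/128; the function name is hypothetical.

```python
def estimated_requests_per_second(sampled_count_per_hour: int, sampling_ratio: int = 128) -> float:
    """Scale a sampled hourly count back up and convert it to a per-second rate."""
    total_per_hour = sampled_count_per_hour * sampling_ratio
    return total_per_hour / (60 * 60)


peak = estimated_requests_per_second(900)
print(peak)  # 32.0
```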

Jan 5 2024, 1:28 AM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General
Ottomata added a comment to T346463: Identify and label prefetch proxy data in our traffic.

IIRC, the decision was to wait until the new year, so as not to risk a mistake while people were out on holidays.

Jan 5 2024, 1:14 AM · Traffic, Movement-Insights, Data-Engineering
Ottomata added a comment to T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.

Volume
peak request rate was ~1900 requests/s.

Jan 5 2024, 1:12 AM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General
Ottomata added a comment to T307040: Propagate field descriptions from event schemas to Hive event tables.

we should decide sometime soon

Jan 5 2024, 1:01 AM · Patch-For-Review, Product-Analytics, Data-Engineering

Jan 3 2024

Ottomata added a comment to T353680: Android Metrics Platform Migration Data Validation - first pass - first 4 tables.

Which reads to me as EventGate needing a logic update for how it formats a multi-status response

Jan 3 2024, 1:46 PM · Data Products (Data Products Sprint 05), Product-Analytics (Kanban), Patch-For-Review, Wikipedia-Android-App-Backlog (Android Release - FY2023-24)

Jan 2 2024

Ottomata added a comment to T307040: Propagate field descriptions from event schemas to Hive event tables.

Wow it...kinda...works~

Jan 2 2024, 10:01 PM · Patch-For-Review, Product-Analytics, Data-Engineering
Ottomata added a comment to T353715: Enable kafka log compaction for page_rerender on jumbo.

+1 k!

Jan 2 2024, 9:22 PM · Data-Platform-SRE (2024.01.01 - 2024.01.21), serviceops, SRE, CirrusSearch, Discovery-Search
Ottomata added a comment to T307040: Propagate field descriptions from event schemas to Hive event tables.

I think this would automatically just work if we could create/alter the tables through Spark directly, rather than through Hive.

Jan 2 2024, 9:20 PM · Patch-For-Review, Product-Analytics, Data-Engineering
Ottomata added a comment to T209453: Refine: Use Spark SQL instead of Hive JDBC.

I made some progress modifying Spark to make it support adding nested column. I'll stop here and wait for feedback from upstream before I clean it up and try a little harder.

Jan 2 2024, 7:16 PM · Data Pipelines, Data-Engineering
Ottomata added a comment to T353715: Enable kafka log compaction for page_rerender on jumbo.

Are you sure you want delete in the policy then? Perhaps you want to keep the latest event per page forever, so you can backfill fully from the topic?
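
To make the question concrete, here is a toy in-memory model, not Kafka's actual implementation: compaction keeps only the newest record per key, while an additional delete-style retention also drops records older than a cutoff. The record layout, function names, keys, and timestamps are all made up for illustration.

```python
from typing import NamedTuple


class Record(NamedTuple):
    key: str        # hypothetical page identifier
    timestamp: int  # hypothetical event time, in seconds
    value: str


def compact(log: list[Record]) -> list[Record]:
    """Keep only the last record seen for each key, preserving log order."""
    latest: dict[str, int] = {}
    for offset, record in enumerate(log):
        latest[record.key] = offset
    return [r for offset, r in enumerate(log) if latest[r.key] == offset]


def compact_and_delete(log: list[Record], min_timestamp: int) -> list[Record]:
    """Compact, then also drop records older than the retention cutoff."""
    return [r for r in compact(log) if r.timestamp >= min_timestamp]


log = [
    Record("page-1", 100, "v1"),
    Record("page-2", 200, "v1"),
    Record("page-1", 300, "v2"),
]
compacted = compact(log)
with_retention = compact_and_delete(log, min_timestamp=250)
```

In this toy model, compaction alone still holds the latest record for both page-1 and page-2, so a full backfill is possible; with the extra time-based deletion, the older page-2 record is gone.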

Jan 2 2024, 3:40 PM · Data-Platform-SRE (2024.01.01 - 2024.01.21), serviceops, SRE, CirrusSearch, Discovery-Search
Ottomata added a comment to T353454: [Event Platform] Review analytics switch approach VarnishKafka -> HAProxy.

I think the review is done. Migration is being tracked in T351117: Move analytics log from Varnish to HAProxy. Can we close this?

Jan 2 2024, 2:14 PM · Data-Engineering (Sprint 6)
Ottomata moved T353454: [Event Platform] Review analytics switch approach VarnishKafka -> HAProxy from Ready to Deploy to Done on the Data-Engineering (Sprint 6) board.
Jan 2 2024, 2:14 PM · Data-Engineering (Sprint 6)
Ottomata added a comment to T353715: Enable kafka log compaction for page_rerender on jumbo.

Interesting! Curious, so the reason for using compaction here is just to save space, not necessarily to keep the latest record per key forever?

Jan 2 2024, 1:54 PM · Data-Platform-SRE (2024.01.01 - 2024.01.21), serviceops, SRE, CirrusSearch, Discovery-Search

Dec 30 2023

Ottomata updated the task description for T323828: Update Pingback to use the Event Platform.
Dec 30 2023, 4:41 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, MediaWiki-General

Dec 27 2023

Ottomata added a comment to T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.

I might be able to set the ip field to the client IP, usually parsed and provided by varnish in the X-Client-IP header, but I think we don't need it for MediaWikiPingback, and shouldn't collect it if we don't. I'll not support it.

Dec 27 2023, 7:00 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General
Ottomata added a comment to T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.

suggested implementation to use mediawiki-config/docroot/mediawiki.org

Dec 27 2023, 6:47 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General

Dec 21 2023

Ottomata added a comment to T314956: [Event Platform] Declare webrequest as an Event Platform stream.

We just had a discussion in DE standup about T335306: [SPIKE] Evaluation on iceberg sensor for airflow. I'm sure there are many existent Hive sensors on the webrequest table. I'd rather not block on that task for this migration. I suggest we keep this as a regular Hive table.

Dec 21 2023, 6:09 PM · Patch-For-Review, Data-Engineering, Event-Platform
Ottomata added a comment to T314956: [Event Platform] Declare webrequest as an Event Platform stream.

Couple questions back at you: is webrequest append only?

yes

If not, how do we do rewrites today?

If we do, they are per hour. We re-refine the entire hour.

Dec 21 2023, 6:06 PM · Patch-For-Review, Data-Engineering, Event-Platform
Ottomata updated the task description for T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.
Dec 21 2023, 5:26 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General
Ottomata added a comment to T351117: Move analytics log from Varnish to HAProxy.

Do you have an estimate of the duration for which we'd be dual-writing?

Dec 21 2023, 4:40 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Ottomata added a comment to T314956: [Event Platform] Declare webrequest as an Event Platform stream.

Hm, alternatively, we could just have the raw and refined tables be brand newly named tables and ingestion jobs during the migration, and then do the final cutover with a RENAME TABLE.

Dec 21 2023, 4:33 PM · Patch-For-Review, Data-Engineering, Event-Platform
Ottomata updated the task description for T314956: [Event Platform] Declare webrequest as an Event Platform stream.
Dec 21 2023, 4:32 PM · Patch-For-Review, Data-Engineering, Event-Platform
Ottomata added a comment to T314956: [Event Platform] Declare webrequest as an Event Platform stream.

As for Hive tables. I'm trying to decide how best to do the migration. Perhaps, it would be easiest to keep the existent wmf.webrequest refined Hive table as is. The raw table would change to webrequest_frontend as imported from the new streams, but the webrequest refine airflow job would switch to refining from webrequest_frontend raw table once we are ready to do the migration cutover.

Dec 21 2023, 4:30 PM · Patch-For-Review, Data-Engineering, Event-Platform
Ottomata updated the task description for T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.
Dec 21 2023, 4:01 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General
Ottomata updated the task description for T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.
Dec 21 2023, 4:01 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General
Ottomata added a comment to T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.

After a discussion in Slack, I have changed the suggested implementation to use mediawiki-config/docroot/mediawiki.org. This would make the solution only work from mediawiki.org/beacon/event, but would avoid any need for custom routing or custom deployment. MediaWikiPingback sends events to mediawiki.org, so this would suffice to unblock the eventlogging backend decom.

Dec 21 2023, 4:00 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General
Ottomata updated the task description for T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.
Dec 21 2023, 3:59 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General
Ottomata updated the task description for T314956: [Event Platform] Declare webrequest as an Event Platform stream.
Dec 21 2023, 3:52 PM · Patch-For-Review, Data-Engineering, Event-Platform
Ottomata updated the task description for T314956: [Event Platform] Declare webrequest as an Event Platform stream.
Dec 21 2023, 3:45 PM · Patch-For-Review, Data-Engineering, Event-Platform
Ottomata updated subscribers of T314956: [Event Platform] Declare webrequest as an Event Platform stream.
Dec 21 2023, 3:44 PM · Patch-For-Review, Data-Engineering, Event-Platform
Ottomata updated subscribers of T314956: [Event Platform] Declare webrequest as an Event Platform stream.

@Antoine_Quhen asked if we should consider making the new webrequest Hive table an Iceberg table. @JAllemandou @xcollazo can/should we do this?

Dec 21 2023, 3:41 PM · Patch-For-Review, Data-Engineering, Event-Platform
Ottomata updated subscribers of T314956: [Event Platform] Declare webrequest as an Event Platform stream.

How should we layout and name the new stream(s)?

Dec 21 2023, 3:41 PM · Patch-For-Review, Data-Engineering, Event-Platform
Ottomata added a comment to T338796: Rewrite all Airflow sensors that use datacenter prepartitions to depend on both datacenters.

@xcollazo do we need this anymore now that we've enabled canary events for all MW state event streams? You should be able to depend on both datacenter partitions being marked as ready, even if there are no real events in one of the DCs.

Dec 21 2023, 2:21 PM · Data-Engineering (Sprint 9), Data Products (Data Products Sprint 05), serviceops-radar

Dec 20 2023

Ottomata added a comment to T351117: Move analytics log from Varnish to HAProxy.

To do this migration plan ^, we'd need Kafka jumbo to support 2x webrequest volume while we migrate. Let's check with Data Platform SREs, @brouberol @BTullis ? Whatcha think?

Dec 20 2023, 6:42 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
Ottomata updated the task description for T323828: Update Pingback to use the Event Platform.
Dec 20 2023, 5:30 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, MediaWiki-General
Ottomata updated the task description for T323828: Update Pingback to use the Event Platform.
Dec 20 2023, 5:24 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, MediaWiki-General
Ottomata updated the task description for T323828: Update Pingback to use the Event Platform.
Dec 20 2023, 5:22 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, MediaWiki-General
Ottomata updated the task description for T323828: Update Pingback to use the Event Platform.
Dec 20 2023, 5:19 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, MediaWiki-General
Ottomata updated the task description for T323828: Update Pingback to use the Event Platform.
Dec 20 2023, 5:08 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, MediaWiki-General
Ottomata added a comment to T323828: Update Pingback to use the Event Platform.

T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate

Dec 20 2023, 3:41 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, MediaWiki-General
Ottomata created T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate.
Dec 20 2023, 3:40 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, Data-Engineering, Event-Platform, MediaWiki-General
Ottomata added a comment to T323828: Update Pingback to use the Event Platform.

Alright, I spoke with @CCicalese_WMF today. The pingback data is very useful for making decisions like when we can deprecate versions of PHP, etc. It is impossible to force people to upgrade old installed versions of MediaWiki. If we decommission the legacy eventlogging backend, old installs will stop sending valuable data.

Dec 20 2023, 3:31 PM · MediaWiki-Platform-Team (Radar), Patch-For-Review, MediaWiki-General

Dec 19 2023

Ottomata updated the task description for T331399: Create new mediawiki.page_links_change stream based on fragment/mediawiki/state/change/page.
Dec 19 2023, 6:28 PM · Data-Engineering, Event-Platform, Machine-Learning-Team
Ottomata closed T335982: Upgrade eventutilities-flink Java lib to Flink 1.17 as Resolved.
Dec 19 2023, 5:38 PM · Data-Engineering
Ottomata closed T335982: Upgrade eventutilities-flink Java lib to Flink 1.17, a subtask of T335408: Upgrade Flink Image to 1.17, as Resolved.
Dec 19 2023, 5:38 PM · Event-Platform (Sprint 12), Data-Engineering
Ottomata closed T335408: Upgrade Flink Image to 1.17 as Resolved.
Dec 19 2023, 5:37 PM · Event-Platform (Sprint 12), Data-Engineering
Ottomata updated the task description for T331894: Improve how we address outside k8s infrastructure from within charts (e.g. network policies).
Dec 19 2023, 5:17 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Patch-For-Review, Prod-Kubernetes, Kubernetes, serviceops
Ottomata updated subscribers of T314956: [Event Platform] Declare webrequest as an Event Platform stream.
Dec 19 2023, 4:27 PM · Patch-For-Review, Data-Engineering, Event-Platform