
Ottomata (Andrew Otto)
User


User Details

User Since
Oct 9 2014, 4:50 PM (424 w, 5 d)
Availability
Available
IRC Nick
ottomata
LDAP User
Ottomata
MediaWiki User
Ottomata

Recent Activity

Today

Ottomata moved T323914: Deploy Mediawiki Stream Enrichment on an-launcher1002. from In Review to Done on the Event-Platform Value Stream (Sprint 05) board.
Wed, Nov 30, 4:17 PM · Data-Engineering, Event-Platform Value Stream (Sprint 05)
Ottomata added a comment to T323914: Deploy Mediawiki Stream Enrichment on an-launcher1002..

Nice!

Wed, Nov 30, 4:17 PM · Data-Engineering, Event-Platform Value Stream (Sprint 05)
Ottomata added a comment to T324101: Request for access to analytics-platform-eng-admins for mlitn.

Reason for access: need query search usage via jupyter for Structured Data pipelines

I'm not sure if analytics-platform-eng-admins is the correct group for this. I think you want analytics-privatedata-users with kerberos access.

Wed, Nov 30, 3:54 PM · Patch-For-Review, SRE, SRE-Access-Requests
Ottomata renamed T324108: [SPIKE] Use Flink for batch backfilling from [SPIKE] Use Flink to develop bounded service to [SPIKE] Use Flink for batch backfilling.
Wed, Nov 30, 2:59 PM · Event-Platform Value Stream, Data-Engineering
Ottomata added a comment to T324114: Flink + Event Platform integration for writing into streams via Table API.

BTW, there may be other better ways to do this than a custom serialization format. Please comment / update description with findings.

Wed, Nov 30, 2:51 PM · Data-Engineering, Event-Platform Value Stream
Ottomata updated the task description for T324114: Flink + Event Platform integration for writing into streams via Table API.
Wed, Nov 30, 2:50 PM · Data-Engineering, Event-Platform Value Stream
Ottomata created T324114: Flink + Event Platform integration for writing into streams via Table API.
Wed, Nov 30, 2:48 PM · Data-Engineering, Event-Platform Value Stream
Ottomata updated subscribers of T316519: Create a shared flink docker image.
Wed, Nov 30, 2:34 PM · Event-Platform Value Stream (Sprint 05), Patch-For-Review, Data-Engineering-Planning
Ottomata updated subscribers of T316519: Create a shared flink docker image.

Writing down some ideas and thoughts from today's talk with @gmodena:

Wed, Nov 30, 2:33 PM · Event-Platform Value Stream (Sprint 05), Patch-For-Review, Data-Engineering-Planning
Ottomata added a comment to T324074: eventstreams cannot be deployed and its deployments will need to be destroyed and recreated.

Thank you both!

Wed, Nov 30, 12:48 PM · Event-Platform Value Stream, Data-Engineering-Planning, SRE, Kubernetes

Yesterday

Ottomata added a comment to T320367: Check home/HDFS leftovers of bmansurov.

It might be easier to either leave these files in place and revisit this in February, or to have them moved under your ownership. When we archive, we zip everything up and put it in HDFS. We can of course get it back, but maybe it is easier to be able to browse them as usual?

Tue, Nov 29, 6:27 PM · Data-Engineering-Planning
Ottomata closed T321542: Check home/HDFS leftovers of bscarone as Declined.

+1, sounds good. Let's decline this task then. We can reopen or make a new one if/when bscarone leaves again :)

Tue, Nov 29, 6:20 PM · Data-Engineering-Planning
Ottomata added a comment to T319266: Check home/HDFS leftovers of jmads.

@Dendelele, please approve the removal of the following files:

Tue, Nov 29, 6:17 PM · Data-Engineering-Planning
Ottomata closed T319268: Check home/HDFS leftovers of nikafor as Resolved.

There are no leftover data files owned by nikafor in the analytics cluster. nikafor's hdfs and regular homedirs have already been removed.

Tue, Nov 29, 6:16 PM · Data-Engineering-Planning
Ottomata added a comment to T320367: Check home/HDFS leftovers of bmansurov.

@Miriam, please approve for removal of the following files and Hive tables.

Tue, Nov 29, 6:14 PM · Data-Engineering-Planning
Ottomata added a comment to T321542: Check home/HDFS leftovers of bscarone.

Alright! Leaving this task open for now then.

Tue, Nov 29, 6:11 PM · Data-Engineering-Planning
Ottomata added a comment to T322107: Check home/HDFS leftovers of faidon.

@mark, please approve for removal of the following files:

Tue, Nov 29, 6:09 PM · Data-Engineering-Planning
Ottomata closed T322182: Check home/HDFS leftovers of ejoseph as Resolved.

There are no leftover data files owned by ejoseph in the analytics cluster.

Tue, Nov 29, 6:08 PM · Data-Engineering-Planning
Ottomata closed T315841: Check home/HDFS leftovers of dpifke as Resolved.

There are no leftover data files owned by dpifke in the analytics cluster.

Tue, Nov 29, 6:07 PM · Data-Engineering-Planning
Ottomata updated subscribers of T316072: Check home/HDFS leftovers of eyener.

Hi @jrobell1, the following files are leftover in eyener's home directories on the stat boxes. Do you approve their removal? We can archive things that need to be kept, but we'd prefer to remove.

Tue, Nov 29, 6:04 PM · Data-Engineering-Planning
Ottomata closed T313316: requesting Kerberos password for mikeraish (MRaishWMF) as Resolved.
Tue, Nov 29, 5:28 PM · Data-Engineering-Planning

Mon, Nov 28

Ottomata added a comment to T318846: EventGate should support producing keyed messages for Kafka partitioning.

I also merged stream config changes to configure message_key_fields for the rc0.mediawiki.page_change stream, and in beta, tested that keys were produced to consistent topic partitions.
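To illustrate why keyed messages land on consistent partitions, here is a minimal sketch. It is not EventGate's code: the shape of `key_fields` (a mapping from key name to a dotted path in the event), the helper names, and the use of md5 are all assumptions made for illustration, and real Kafka clients use their own partitioner hashes.

```python
import hashlib
import json


def message_key(event: dict, key_fields: dict) -> str:
    """Build a deterministic key from selected event fields.

    key_fields maps a key name to a dotted path into the event
    (a hypothetical shape, chosen for this sketch).
    """
    key = {}
    for name, path in key_fields.items():
        value = event
        for part in path.split("."):
            value = value[part]
        key[name] = value
    # sort_keys keeps the serialized key stable regardless of dict ordering.
    return json.dumps(key, sort_keys=True)


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash.

    md5 is a stand-in here; it is not the hash Kafka clients actually use.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```

Two events for the same page produce the same key, and therefore the same partition, no matter what else differs between them.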

Mon, Nov 28, 8:34 PM · Event-Platform Value Stream (Sprint 05)
Ottomata moved T318846: EventGate should support producing keyed messages for Kafka partitioning from In Review to Done on the Event-Platform Value Stream (Sprint 05) board.
Mon, Nov 28, 8:33 PM · Event-Platform Value Stream (Sprint 05)
Ottomata added a comment to T323828: Update Pingback to use the Event Platform.

Hm, yes, but I guess I mean at least this hardcoded producer code in MW core wouldn't have a hardcoded external dependency?

Mon, Nov 28, 7:57 PM · MediaWiki-General
Ottomata added a comment to T323828: Update Pingback to use the Event Platform.

needs to happen during installation, we can't rely on extensions

We also can't rely on them, as perhaps we want stats on a MediaWiki with no extensions installed?

Mon, Nov 28, 6:45 PM · MediaWiki-General
Ottomata added a comment to T321088: Add support for jupyterlab on conda-analytics.

can you please install the latest conda deb package on an-test-client1001

@xcollazo, done.

Mon, Nov 28, 4:10 PM · Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Analytics-Jupyter, Product-Analytics
Ottomata added a comment to T323911: Grant Access to wmf for abartov.

Approve!

Mon, Nov 28, 3:46 PM · SRE, LDAP-Access-Requests
Ottomata added a comment to T311129: [Shared Event Platform] Produce new mediawiki.page-change stream from MediaWiki EventBus.

Nice, @pfischer, please keep your eye on T308017: Design Schema for page state and page state with content (enriched) streams, there are some structural changes we may make to the schema (flattening?) in the next RC.

Mon, Nov 28, 3:45 PM · Event-Platform Value Stream (Sprint 04), Patch-For-Review
Ottomata added a comment to T323828: Update Pingback to use the Event Platform.

Oof, I didn't know there was a hardcoded use of EventLogging inside of MediaWiki core. This seems pretty fragile. This migration makes sense, but are we sure we want to continue doing this in the long term?

Mon, Nov 28, 3:39 PM · MediaWiki-General
Ottomata added a comment to T319056: Requesting access to analytics-privatedata-users for Wenjun Fan.

Wenjun's access is ssh-less access to analytics-privatedata-users group, right? If so, to remove their public key from the task description

Correct.

Mon, Nov 28, 3:35 PM · SRE, SRE-Access-Requests
Ottomata added a comment to T322145: Requesting access to analytics-privatedata-users & Kerberos identity for Hghani.

Hi, this sounds like an issue with your ssh config and your ssh key. If your key is configured correctly, ssh should not prompt you for a password:

Mon, Nov 28, 3:34 PM · SRE, SRE-Access-Requests
Ottomata added a comment to T322022: Flink SQL queries should access Kafka topics from a Catalog.

Very cool! Code? :)

Mon, Nov 28, 3:27 PM · Event-Platform Value Stream (Sprint 05), Data-Engineering-Planning
Ottomata added a comment to T318397: Optimization of conda-analytics deb package.

I'm fine either way. I think I prefer two packages if we want to keep the worker installed size smaller, if we don't care, then let's just remove the debconf variable.

Mon, Nov 28, 2:53 PM · Data-Engineering-Planning, Data Pipelines, Data-Engineering-Kanban
Ottomata added a comment to T308017: Design Schema for page state and page state with content (enriched) streams.

build increasingly complex code to not fall out of sync with Mediawiki (akin to the heroic scale of what Joseph put together for mediawiki-history)

Mon, Nov 28, 2:51 PM · Event-Platform Value Stream, Data-Engineering, Patch-For-Review
Ottomata added a comment to T323556: Requesting membership of the analytics group in gerrit for 'smunene@wikimedia.org'.

Approved.

Mon, Nov 28, 2:47 PM · Gerrit-Privilege-Requests
Ottomata removed a project from T318193: Clean up wikimetrics: Event-Platform Value Stream.
Mon, Nov 28, 2:09 PM · Data-Engineering-Planning, Projects-Cleanup
Ottomata removed a project from T259804: Rename geoeditors_blacklist_country: Event-Platform Value Stream.
Mon, Nov 28, 2:08 PM · Data-Engineering-Planning, Analytics-Clusters, Voice & Tone
Ottomata edited projects for T317182: Move archiva to private IPs + CDN, added: Data-Engineering; removed Event-Platform Value Stream, Data-Engineering-Planning.

@EChetty I don't think this task belongs in Event Platform. Removing tag.

Mon, Nov 28, 2:07 PM · Data-Engineering-Planning, Shared-Data-Infrastructure
Ottomata added a comment to T307679: EventStreams doesn't show the Wikistories-* streams.

In stream-beta they should show up automatically. I do see https://stream-beta.wmflabs.org/v2/ui/#/?streams=mediawiki.wikistories_contribution_event. Can we close this task?

Mon, Nov 28, 2:05 PM · Event-Platform Value Stream, Data-Engineering-Planning, EventStreams
Ottomata moved T316519: Create a shared flink docker image from Backlog to Sprint 05 on the Event-Platform Value Stream board.
Mon, Nov 28, 2:05 PM · Event-Platform Value Stream (Sprint 05), Patch-For-Review, Data-Engineering-Planning
Ottomata moved T316519: Create a shared flink docker image from Next Up to In Progress on the Event-Platform Value Stream (Sprint 05) board.
Mon, Nov 28, 2:04 PM · Event-Platform Value Stream (Sprint 05), Patch-For-Review, Data-Engineering-Planning

Mon, Nov 21

Ottomata added a comment to T306939: Q4:(Need By: TBD) rack/setup/install kafka-jumbo101[0-5].

@Papaul is it possible the ssds and hdds are reversed, as they were in https://phabricator.wikimedia.org/T314160#8166665 ?

Mon, Nov 21, 6:44 PM · SRE, ops-eqiad, DC-Ops
Ottomata added a comment to T308017: Design Schema for page state and page state with content (enriched) streams.

Thanks @Tgr! At this point it is easy enough to remove, and we can always add it back in later if/when we need it. I'd prefer to solve this problem by making the event model simpler for now anyway.

Mon, Nov 21, 5:31 PM · Event-Platform Value Stream, Data-Engineering, Patch-For-Review
Ottomata added a comment to T320812: [SPIKE] Deploy event driven stateless Flink service to DSE cluster.

o/ I am working on Flink and flink operator images now:

Mon, Nov 21, 5:30 PM · Event-Platform Value Stream, Shared-Data-Infrastructure, Data-Engineering-Planning
Ottomata added a comment to T323280: Grant ssh access to analytics-admins to dcausse and gmodena.

Done, I removed irrelevant parts, if that is okay.

Mon, Nov 21, 5:16 PM · SRE, SRE-Access-Requests, Data-Engineering
Ottomata updated the task description for T323280: Grant ssh access to analytics-admins to dcausse and gmodena.
Mon, Nov 21, 5:16 PM · SRE, SRE-Access-Requests, Data-Engineering
Ottomata added a comment to T323280: Grant ssh access to analytics-admins to dcausse and gmodena.

@jcrespo I can make this change once the other approvals have been given.

Mon, Nov 21, 2:32 PM · SRE, SRE-Access-Requests, Data-Engineering

Thu, Nov 17

Ottomata added a comment to T318846: EventGate should support producing keyed messages for Kafka partitioning.

Although I don't love putting 'kafka' in the name here, who knows maybe one day we won't be using kafka for this.

Thu, Nov 17, 10:11 PM · Event-Platform Value Stream (Sprint 05)
Ottomata added a comment to T318846: EventGate should support producing keyed messages for Kafka partitioning.

Are you open to bikeshedding on the name key_fields

So open.

Thu, Nov 17, 10:08 PM · Event-Platform Value Stream (Sprint 05)
Ottomata added a comment to T308017: Design Schema for page state and page state with content (enriched) streams.

I'm not following the aspect about page properties not being persisted through edits

I don't know if I totally follow either, but there is more context in the initial collab design doc; see "Do we want page properties?" and the comment.

Thu, Nov 17, 7:52 PM · Event-Platform Value Stream, Data-Engineering, Patch-For-Review
Ottomata added a comment to T306895: Write dedicated cassandra authorization code to read password from file when loading.

OR! You could get fancier and make an HDFS puppet file provider :)

Thu, Nov 17, 7:27 PM · Data Pipelines (Sprint 04), Data-Engineering-Planning, Patch-For-Review, Cassandra
Ottomata added a comment to T306895: Write dedicated cassandra authorization code to read password from file when loading.

Yes, that's what I was thinking. Make it so that the exec in the defined type pulls the file from HDFS and diffs the content with the secret as part of the unless condition.
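The idea of guarding the exec with a content diff can be sketched in Python. This is only an illustration of the check, not the Puppet code: `read_remote` and `write_remote` are hypothetical callables standing in for fetching the file from HDFS and putting it back.

```python
import hashlib
from typing import Callable, Optional


def content_matches(current: Optional[bytes], desired: bytes) -> bool:
    """Return True when the stored content already equals the desired secret."""
    if current is None:
        # The file does not exist yet, so it cannot match.
        return False
    return hashlib.sha256(current).digest() == hashlib.sha256(desired).digest()


def sync_secret(
    read_remote: Callable[[], Optional[bytes]],
    write_remote: Callable[[bytes], None],
    desired: bytes,
) -> bool:
    """Write the secret only when the stored copy is missing or differs.

    Returns True if a write happened, mirroring an exec whose
    'unless' condition is the content comparison.
    """
    if content_matches(read_remote(), desired):
        return False
    write_remote(desired)
    return True
```

Compared with an exec that only runs when the file is absent, this also rewrites the file when the secret changes.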

Thu, Nov 17, 7:26 PM · Data Pipelines (Sprint 04), Data-Engineering-Planning, Patch-For-Review, Cassandra
Ottomata added a comment to T323294: EventBus: Error: Call to a member function isCurrent() on null.

Interesting! I see there are some checks in the older EventBusHooks that guard against this. Will add the same ones in PageChangeHooks now.

Thu, Nov 17, 7:08 PM · MW-1.40-notes (1.40.0-wmf.12; 2022-11-28), User-brennen, Event-Platform Value Stream, Wikimedia-production-error, Data-Engineering
Ottomata updated the task description for T308017: Design Schema for page state and page state with content (enriched) streams.
Thu, Nov 17, 3:27 PM · Event-Platform Value Stream, Data-Engineering, Patch-For-Review
Ottomata updated the task description for T308017: Design Schema for page state and page state with content (enriched) streams.
Thu, Nov 17, 3:27 PM · Event-Platform Value Stream, Data-Engineering, Patch-For-Review
Ottomata updated subscribers of T308017: Design Schema for page state and page state with content (enriched) streams.

In https://phabricator.wikimedia.org/T317768#8400702 @Isaac wrote:

Thu, Nov 17, 3:18 PM · Event-Platform Value Stream, Data-Engineering, Patch-For-Review
Ottomata added a comment to T308017: Design Schema for page state and page state with content (enriched) streams.

2 more questions to answer:

Thu, Nov 17, 3:12 PM · Event-Platform Value Stream, Data-Engineering, Patch-For-Review
Ottomata added a comment to T317768: Proposal: deprecate the mediawiki.revision-score stream in favour of more streams like mediawiki-revision-score-<model>.

@Isaac, for sake of continuity, let's have this discussion over on T308017: Design Schema for page state and page state with content (enriched) streams.

Thu, Nov 17, 3:04 PM · Data-Engineering-Planning, Research, Machine-Learning-Team
Ottomata added a comment to T306895: Write dedicated cassandra authorization code to read password from file when loading.

We can, but it isn't done with the Puppet file resources, so there isn't any detection of file changes, only an exec to put the file if it doesn't exist

Thu, Nov 17, 3:02 PM · Data Pipelines (Sprint 04), Data-Engineering-Planning, Patch-For-Review, Cassandra
Ottomata created T323280: Grant ssh access to analytics-admins to dcausse and gmodena.
Thu, Nov 17, 2:55 PM · SRE, SRE-Access-Requests, Data-Engineering
Ottomata claimed T316519: Create a shared flink docker image.
Thu, Nov 17, 2:10 PM · Event-Platform Value Stream (Sprint 05), Patch-For-Review, Data-Engineering-Planning

Wed, Nov 16

Ottomata updated subscribers of T306939: Q4:(Need By: TBD) rack/setup/install kafka-jumbo101[0-5].

Hi, checking in, any updates here?

Wed, Nov 16, 2:43 PM · SRE, ops-eqiad, DC-Ops
Ottomata added a comment to T323217: [SPIKE] Evaluate a pyflink version of Mediawiki Stream Enrichment.

Best SQL Example here. Will be much better with a catalog.

Wed, Nov 16, 1:46 PM · Event-Platform Value Stream (Sprint 05), Data-Engineering

Tue, Nov 15

Ottomata updated subscribers of T318846: EventGate should support producing keyed messages for Kafka partitioning.

@phuedx , want to check in with you about this, and see if you have any thoughts.

Tue, Nov 15, 4:18 PM · Event-Platform Value Stream (Sprint 05)
Ottomata added a comment to T321925: Allow Cormac Parle and Marco Fossati to deploy analytics-platform-eng Airflow instance.

Yeehaw

Tue, Nov 15, 4:18 PM · Data Pipelines (Sprint 04), Data-Engineering-Planning
Ottomata added a comment to T323032: streamconfigs action API query module returns everything(?) as associative arrays.

Oo, I see. Right, empty arrays, I remember now.

Tue, Nov 15, 4:16 PM · MW-1.40-notes (1.40.0-wmf.13; 2022-12-05), Metrics-Platform-Planning (Metrics Platform Kanban), EventStreams

Mon, Nov 14

Ottomata added a comment to T323032: streamconfigs action API query module returns everything(?) as associative arrays.

Interesting. Ideally PHP would just do the right thing and distinguish between integer indexed arrays and associative arrays (objects).
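A small Python sketch of the ambiguity, modeling a PHP array as an ordered dict. The function names and the `force_object` parameter are hypothetical and used only for illustration; the behavior mirrors how a sequentially keyed array serializes as a JSON list while anything else serializes as an object.

```python
import json


def is_list_like(array: dict) -> bool:
    """True when the keys are exactly 0..n-1 in order."""
    return list(array.keys()) == list(range(len(array)))


def encode(array: dict, force_object: bool = False) -> str:
    """Serialize a PHP-style array as a JSON list or a JSON object."""
    if is_list_like(array) and not force_object:
        return json.dumps(list(array.values()))
    # JSON object keys must be strings.
    return json.dumps({str(k): v for k, v in array.items()})
```

The empty array is the awkward case: it has no keys, so it looks list-like and comes out as `[]` even when the consumer expects an object, unless the caller forces an object.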

Mon, Nov 14, 9:04 PM · MW-1.40-notes (1.40.0-wmf.13; 2022-12-05), Metrics-Platform-Planning (Metrics Platform Kanban), EventStreams
Ottomata moved T318846: EventGate should support producing keyed messages for Kafka partitioning from In Progress to In Review on the Event-Platform Value Stream (Sprint 04) board.
Mon, Nov 14, 8:22 PM · Event-Platform Value Stream (Sprint 05)
Ottomata moved T322320: Investigate using Spark Streaming as an Event Service Platform from Next Up to In Review on the Event-Platform Value Stream (Sprint 04) board.
Mon, Nov 14, 8:22 PM · Spike, Event-Platform Value Stream (Sprint 04)
Ottomata moved T318846: EventGate should support producing keyed messages for Kafka partitioning from Next Up to In Progress on the Event-Platform Value Stream (Sprint 04) board.
Mon, Nov 14, 8:22 PM · Event-Platform Value Stream (Sprint 05)
Ottomata updated the task description for T322320: Investigate using Spark Streaming as an Event Service Platform.
Mon, Nov 14, 8:13 PM · Spike, Event-Platform Value Stream (Sprint 04)
Ottomata added a comment to T322320: Investigate using Spark Streaming as an Event Service Platform.

I've already demoed creating an Event Platform based Spark Streaming DataFrame here. (If we chose to invest in Spark Streaming, we'd abstract more of that, as we have for Flink, with DataFrame factory functions and/or a Catalog implementation.) Defining the UDF is pretty much the same as in Flink, except you don't always have to specify the return type. I believe you do if the type is a complex/nested one, and in that case you'd use Spark's own DataType system, which is similar to Flink's.

Mon, Nov 14, 8:11 PM · Spike, Event-Platform Value Stream (Sprint 04)
Ottomata added a comment to T322591: Requesting access to analytics-privatedata-users for Dasm.

Approved.

Mon, Nov 14, 4:18 PM · Patch-For-Review, SRE, SRE-Access-Requests

Thu, Nov 10

Ottomata added a comment to T321925: Allow Cormac Parle and Marco Fossati to deploy analytics-platform-eng Airflow instance.

Okay, did some things, @Cparle or @mfossati can you try deploying now?

Thu, Nov 10, 7:00 PM · Data Pipelines (Sprint 04), Data-Engineering-Planning
Ottomata added a comment to T314981: Add a webrequest sampled topic and ingest into druid/turnilo.

The idea is to proceed for a first iteration (namely a new kafka topic + druid indexation on a separate datasource, say webrequest_128_live) without having a proper event and schema in place, so that we could validate if the whole workflow works and if it is valuable for SRE. Then we can definitely add one, what do you think?

Ya sounds good. When you are ready, the minimal requirement of adding the schema and event stream config won't be hard.

Thu, Nov 10, 4:27 PM · Patch-For-Review, Traffic, Data Pipelines, User-fgiunchedi, Data-Engineering-Planning, Foundational Technology Requests
Ottomata added a comment to T314981: Add a webrequest sampled topic and ingest into druid/turnilo.

overridden by the DE batch jobs

QQ: do the existent DE batch jobs already produce all the same info you are trying to produce here with benthos?

Thu, Nov 10, 3:16 PM · Patch-For-Review, Traffic, Data Pipelines, User-fgiunchedi, Data-Engineering-Planning, Foundational Technology Requests
Ottomata added a comment to T322670: Requesting access to analytics-privatedata-users for David.pujol.

Approved from DE.

Thu, Nov 10, 2:06 PM · SRE, SRE-Access-Requests
Ottomata added a comment to T322795: Requesting access to analytics-privatedata-users for ryasmeen (superset access with no server access).

Approved.

Thu, Nov 10, 2:05 PM · User-Ryasmeen, SRE, SRE-Access-Requests

Wed, Nov 9

Ottomata updated subscribers of T304450: Create conda .deb and docker image.

@EChetty why Event Platform here?

Wed, Nov 9, 9:35 PM · Event-Platform Value Stream, Data-Engineering-Planning, Patch-For-Review
Ottomata moved T308017: Design Schema for page state and page state with content (enriched) streams from Sprint 04 to Backlog on the Event-Platform Value Stream board.
Wed, Nov 9, 9:34 PM · Event-Platform Value Stream, Data-Engineering, Patch-For-Review
Ottomata moved T308017: Design Schema for page state and page state with content (enriched) streams from Backlog to Sprint 04 on the Event-Platform Value Stream board.
Wed, Nov 9, 9:34 PM · Event-Platform Value Stream, Data-Engineering, Patch-For-Review
Ottomata edited projects for T308017: Design Schema for page state and page state with content (enriched) streams, added: Event-Platform Value Stream; removed Event-Platform Value Stream (Sprint 03).
Wed, Nov 9, 9:33 PM · Event-Platform Value Stream, Data-Engineering, Patch-For-Review
Ottomata reopened T308017: Design Schema for page state and page state with content (enriched) streams as "Open".

Re-opening to discuss a schema change.

Wed, Nov 9, 9:33 PM · Event-Platform Value Stream, Data-Engineering, Patch-For-Review
Ottomata reopened T308017: Design Schema for page state and page state with content (enriched) streams, a subtask of T307959: [Shared Event Platform] Design and Implement POC Flink Service to Combine Existing Streams, Enrich and Output to New Topic, as Open.
Wed, Nov 9, 9:32 PM · Data-Engineering-Planning, Event-Platform Value Stream, Epic
Ottomata added a comment to T320968: Easy Flink Python UDF + SQL enrichment.

if we're deriving it from schemas then the user would have to go to the schema repo and figure out what they have to return anyways

Yeah, you are right, and it is kind of weird to associate the return value of a UDF with an event JSONSchema. It makes total sense for the inputs and outputs of the streaming pipelines, but not so much for intermediate steps, like function calls.

Wed, Nov 9, 8:09 PM · Event-Platform Value Stream (Sprint 04), Spike, Data-Engineering-Planning
Ottomata added a comment to T322022: Flink SQL queries should access Kafka topics from a Catalog.

Some thoughts and trials of implementing a Flink Event Platform catalog here and in some comments below.

Wed, Nov 9, 6:28 PM · Event-Platform Value Stream (Sprint 05), Data-Engineering-Planning

Mon, Nov 7

Ottomata added a comment to T320968: Easy Flink Python UDF + SQL enrichment.

I got sidetracked into getting a working content enrichment pyflink UDF SQL thing working. Finally got it!

Mon, Nov 7, 8:46 PM · Event-Platform Value Stream (Sprint 04), Spike, Data-Engineering-Planning
Ottomata added a comment to T320968: Easy Flink Python UDF + SQL enrichment.

Hm, maybe you can do a different topic? It might be better to do a temp topic with your name in it, so it is clear that is just you testing things. 'tchin.test0'?

Mon, Nov 7, 8:40 PM · Event-Platform Value Stream (Sprint 04), Spike, Data-Engineering-Planning
Ottomata added a comment to T322350: Requesting access to analytics-privatedata-users & Kerberos identity & deployment POSIX group & ml-team-admins for Ilias Sarantopoulos.

Approve!

Mon, Nov 7, 1:28 PM · SRE, SRE-Access-Requests
Ottomata added a comment to T320968: Easy Flink Python UDF + SQL enrichment.

If you want to run Flink in k8s and write to HDFS, then this will be a problem: this is the k8s "kerbarrier".

Mon, Nov 7, 1:26 PM · Event-Platform Value Stream (Sprint 04), Spike, Data-Engineering-Planning
Ottomata added a comment to T322339: Requesting access to ops and analytics for stevemunene.

Since this is ops/sre/root(?) access, is there any approval that needs to happen from SRE?

Mon, Nov 7, 12:39 PM · SRE, SRE-Access-Requests

Fri, Nov 4

Ottomata added a comment to T317768: Proposal: deprecate the mediawiki.revision-score stream in favour of more streams like mediawiki-revision-score-<model>.

FYI, we have deployed a rc0.mediawiki.page_change stream to group0 wikis! Example event here. It has the development/mediawiki/page/change schema. We put this in development/ as we wanted to indicate that it is still a WIP and subject to change.

Fri, Nov 4, 1:43 PM · Data-Engineering-Planning, Research, Machine-Learning-Team

Thu, Nov 3

Ottomata updated subscribers of T318863: Event Platform and DataHub Integration.
Thu, Nov 3, 6:46 PM · Data-Catalog, Data-Engineering-Planning, Event-Platform Value Stream
Ottomata added a comment to T318863: Event Platform and DataHub Integration.

No clue if this is the right approach, but perhaps we could use ingestion transforms to augment the existent kafka ingestion with Event Platform event schemas? From 5 minutes of reading docs, I think we'd do this by implementing a transformer that can transform the schemaMetadata aspect of a dataset entity?

Thu, Nov 3, 6:46 PM · Data-Catalog, Data-Engineering-Planning, Event-Platform Value Stream
Ottomata updated the task description for T318863: Event Platform and DataHub Integration.
Thu, Nov 3, 6:04 PM · Data-Catalog, Data-Engineering-Planning, Event-Platform Value Stream
Ottomata added a comment to T319214: Evaluate Benthos as stream processor.

FYI I am working on writing a more specific requirements spec for event platform producers:

Thu, Nov 3, 4:34 PM · Patch-For-Review, Event-Platform Value Stream, Data-Engineering-Planning, Observability-Logging, Machine-Learning-Team, observability
Ottomata updated the task description for T321557: EventBus' stream config destination_event_service setting should move into producers.mediawikI_eventbus specific settings..
Thu, Nov 3, 2:59 PM · Data-Engineering, Event-Platform Value Stream, MW-1.40-notes (1.40.0-wmf.8; 2022-10-31)
Ottomata added a comment to T311129: [Shared Event Platform] Produce new mediawiki.page-change stream from MediaWiki EventBus.

BTW we are live on group0 wikis now.

Thu, Nov 3, 2:16 PM · Event-Platform Value Stream (Sprint 04), Patch-For-Review
Ottomata added a comment to T316049: Unify all Product Analytics ETL jobs.

To ease the creation of simple DAGs, we could implement a wizard

Instead of a wizard, perhaps we could just create an abstraction (task groups?) around simple input/output jobs? Parameterize input and output frequency and locations (hive table / hdfs path), and the job the user wants to run?

Thu, Nov 3, 2:15 PM · Product-Analytics (Kanban), Epic
Ottomata added a comment to T322147: Requesting access to analytics-privatedata-users & Kerberos identity for Ilooremeta.

Approved.

Thu, Nov 3, 2:04 PM · SRE, SRE-Access-Requests