User Details
- User Since
- Oct 9 2014, 4:50 PM (468 w, 1 d)
- Availability
- Available
- IRC Nick
- ottomata
- LDAP User
- Ottomata
- MediaWiki User
- Ottomata
Fri, Sep 22
Wed, Sep 20
Why is only eventgate-analytics-external configured to refresh its internal cache, without an embedded schema registry fallback?
Tue, Sep 5
Would this task be more appropriately titled "Document the onboarding journey for building simple streaming enrichment apps"?
Thu, Aug 31
AIUI the json-schema-merge-allof package allows for its behaviour to be changed on a per-keyword
Jul 27 2023
Related: T336176: MediaWiki user types
Jul 18 2023
Filter out null edits
Huh! Interesting. Current me does not remember this at all.
Jul 11 2023
What's weird about this is that these are actual old state changes. It's not just that there is drift; some events that happened years ago are being emitted into the stream now. E.g. why is an edit event for https://pl.wikipedia.org/w/index.php?diff=prev&oldid=1130401 being emitted now?
Jul 8 2023
close it in a few days if there are no more comments,
We should keep this open, I really think this should not happen.
We removed meta.dt requiredness in common/2.0.0.
I understand its impact on MediaWiki's clients, but would some value like 5 be worth considering, to see how Kafka behaves? I mean, to see whether the queues start to go down after it.
Jul 7 2023
Oo, it would be really nice if we could modify the job logic a little bit, to be able to produce events with the time appropriate for the scheduled task time. That way, we could backfill more easily.
So cool!
Jul 6 2023
Would love to see all of our REST APIs specify deterministic output types, ideally following some of the same schema guidelines used by Event Platform. This would allow us to use REST API endpoints as 'lookup tables', and allow us to join them and other datasets with SQL.
Meeting today, discussed / decided the following:
When investigating, maybe start with event.mediawiki_page_change_v1 instead of page_content_change?
Possibly related: T340471: [Airflow] P.O.C. on Iceberg sensor using Snapshot metadata to keep status of updates, if/when we decide to move all event tables to Iceberg.
FWIW, I think EventStreamConfig in wikimedia-event-utilities only requests all stream config once, on instantiation, not individually for each stream. So it's not like there are a bunch of HTTP requests all at once. There is only one.
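A rough sketch of that fetch-once behaviour, written in Python for brevity. The class, function, and stream names here are hypothetical and are not the wikimedia-event-utilities API; `fake_load_all` stands in for the single HTTP request.

```python
from typing import Any, Callable, Dict, List, Optional

StreamConfigs = Dict[str, Dict[str, Any]]


class StreamConfigCache:
    """Hypothetical cache: loads every stream's config once, then serves lookups locally."""

    def __init__(self, load_all: Callable[[], StreamConfigs]) -> None:
        # The only call to the loader happens here, at instantiation.
        self._configs = load_all()

    def get(self, stream_name: str) -> Optional[Dict[str, Any]]:
        # Per-stream lookups read from memory and never trigger a new request.
        return self._configs.get(stream_name)


calls: List[str] = []


def fake_load_all() -> StreamConfigs:
    # Stand-in for one HTTP request that returns config for all streams.
    calls.append("request")
    return {
        "example.stream_a": {"schema_title": "example/a"},
        "example.stream_b": {"schema_title": "example/b"},
    }


cache = StreamConfigCache(fake_load_all)
config_a = cache.get("example.stream_a")
config_b = cache.get("example.stream_b")
request_count = len(calls)
```

However many streams are looked up, `request_count` stays at one, which is the point being made above.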
Let's wait a few days, and if we see none (or fewer?) canary events, let's resolve this task.
Pretty sure there is a configurable env var BUILD_LIBRDKAFKA that can conditionally disable this. eventgate-wikimedia installs the librdkafka1 .deb and sets BUILD_LIBRDKAFKA=0
Jul 5 2023
@gmodena and I debugged this today, and realized it was because we never implemented support for specifying the schema versions used by the Kafka source, and always use the latest. This conflicts with the RowTypeInfo we use when reading from the source into a DataStream.
@gmodena we can close this task, ya?
Jul 3 2023
In the end, the main drawback is that this modulo value must never change unless you drain the pipeline before changing it (the keyed state cannot be redistributed by Flink automatically if the key function changes).
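A small sketch of why changing the modulo is a problem. This is plain Python, not Flink code; `key_for` and the page IDs are hypothetical. It only shows that the same input can map to a different key once the modulus changes, so state stored under the old key no longer lines up with new records.

```python
def key_for(page_id: int, modulo: int) -> int:
    """Hypothetical key function: bucket a record by page_id modulo a fixed value."""
    return page_id % modulo


page_ids = [3, 10, 17, 42]

# Keys assigned while the pipeline runs with modulo 4.
keys_mod_4 = {pid: key_for(pid, 4) for pid in page_ids}

# Keys the same records would get if the modulo were changed to 5.
keys_mod_5 = {pid: key_for(pid, 5) for pid in page_ids}

# Records whose key differs between the two settings: their existing
# keyed state would sit under a key that new events no longer use.
moved = [pid for pid in page_ids if keys_mod_4[pid] != keys_mod_5[pid]]
```

Any record in `moved` would be routed away from the state accumulated for it, which is why the pipeline has to be drained before the value changes.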
Jun 30 2023
Jun 29 2023
Backfill:
Jun 28 2023
could implement Draft-3 required field support in eventutilities-core JsonSchemaConverter
Done in patch. This will allow us to use this class in refinery-source, and delete the one there.
I'm declining this task. I realized that we let eventgate handle setting meta.dt right now anyway, so its precision should be fine. Not much we can do about MW provided timestamps, since they are in second precision anyway.
^ sounds good!
Handled by flink operator and in config/documentation
This should be done. Even though we process asynchronously, we emit events in the order the tasks receive them.
We should avoid team names in functional code / namespacing. Team names change often.
Jun 27 2023
WIP patch for doing this ^.
- Moved decision log to wikitech: https://wikitech.wikimedia.org/wiki/Event_Platform/Decision_Log
- Added better docs for stream config deployment: https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#Stream_Configuration_deployment
- Added docs on how to rename streams: https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#Renaming_streams
Added more logging, and scheduled it for next week's train: https://etherpad.wikimedia.org/p/analytics-weekly-train
The conditional checks if the type is Java null (meaning not present), or if it is set to JSONSchema "null".
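A minimal sketch of that kind of conditional, written in Python rather than the Java of the code being discussed. `is_null_type` and the dict-shaped schema node are assumptions for illustration; Python's `None` stands in for Java null.

```python
from typing import Any, Mapping


def is_null_type(schema: Mapping[str, Any]) -> bool:
    """Return True if 'type' is absent (None) or is the JSONSchema type "null"."""
    schema_type = schema.get("type")  # None when the keyword is not present
    return schema_type is None or schema_type == "null"


missing_type = is_null_type({"description": "no type keyword"})
explicit_null = is_null_type({"type": "null"})
string_type = is_null_type({"type": "string"})
```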
Wait, no, the check that throws this error is:
BTW, I don't think this is related to T309717: Event Utilities partially downloads schemas anymore.
I think we can close this? In T330236: Event partitions missing since 2023-02-21T10:00 for stream without events (canary events not produced?) we merged and deployed https://gerrit.wikimedia.org/r/c/analytics/refinery/source/+/894642/, which adds a timeout (and some retries?).
The wikitech doc now says that we (as in service ops) are required to save and restore the state.
This would only be required after cluster updates (like for WDQS), not for regular application lifecycles.
Alright! We are finally on Spark 3, and deployed the error logging change that Dan wrote last year!
Jun 26 2023
Approved.