User Details
- User Since
- Oct 9 2014, 4:50 PM (498 w, 4 d)
- Availability
- Available
- IRC Nick
- ottomata
- LDAP User
- Ottomata
- MediaWiki User
- Ottomata
Today
^ changed title to remove the controversial 'source of truth' terminology.
Yesterday
@phuedx @VirginiaPoundstone should this task be resolved?
@VirginiaPoundstone another task to decline or resolve?
Oh, or perhaps the subtasks should be done first?
@phuedx @VirginiaPoundstone being bold and declining. Please reopen if this was wrong.
Fri, Apr 26
Do you suggest using something like uslfo_webrequest_text instead?
Thu, Apr 25
Oh, another piece of info: WMF traffic frontends set a 15 minute timeout on all connections. This causes connected SSE clients to reconnect every 15 minutes. The disconnect should decrement the client IP's connection count. But, if there are enough connections from the same IP, I think the reconnect would be more likely to end up at a worker that is already at the limit for that IP.
Or, could we just avoid rate limiting Cloud VPS / Toolforge IPs in EventStreams code? Or at least increase the limit by a lot for those IPs?
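Purely to illustrate the failure mode (a made-up sketch, not EventStreams' actual code): naive per-worker counting, with a hypothetical exemption list for Cloud VPS / Toolforge ranges, looks roughly like this.

```python
from collections import defaultdict

CONNECTION_LIMIT = 2              # per IP, per worker (illustrative value)
EXEMPT_PREFIXES = ("172.16.",)    # e.g. a Cloud VPS internal range (assumed)

connections = defaultdict(int)    # this worker's counts only

def try_connect(ip: str) -> bool:
    if ip.startswith(EXEMPT_PREFIXES):
        return True               # no limiting for exempt ranges
    if connections[ip] >= CONNECTION_LIMIT:
        return False              # this worker already thinks the IP is at its limit
    connections[ip] += 1
    return True

def disconnect(ip: str) -> None:
    # The 15 minute frontend timeout triggers this, followed by an
    # immediate reconnect -- possibly handled by a *different* worker
    # whose own count for this IP was never decremented.
    if not ip.startswith(EXEMPT_PREFIXES):
        connections[ip] = max(0, connections[ip] - 1)
```

Because each worker only sees its own counts, a reconnecting client can be rejected by one worker while another worker still holds a stale count for it.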
Wed, Apr 24
This is probably not helpful, but EventStreams' naive, IP-based local rate limiting is pretty dumb. If there were a smarter, more global solution in WMF prod (maybe there is these days), we'd much prefer to use that.
Fri, Apr 19
I fear I read that task (the way it is written, at least) differently.
Thu, Apr 18
Being bold.
We could append (or prepend) other pieces of information to the sequence number (like the haproxy process id) to avoid duplicates.
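A toy sketch of that idea (the names here are made up):

```python
def make_sequence_id(hostname: str, pid: int, seq: int) -> str:
    # Prefix the emitting host and process id so that two haproxy
    # processes that both count from 0 cannot produce colliding ids.
    return f"{hostname}-{pid}-{seq}"

# "cp3050-12345-987" and "cp3050-12399-987" stay distinct even though
# both processes emitted sequence number 987.
print(make_sequence_id("cp3050", 12345, 987))
```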
Replied at T120242#9726131
There is a lil discussion about this topic in T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable". Moving that discussion here.
Hello! I don't think this task is resolved. Perhaps you meant to decline it?
search index not getting updated in 0.001% of edits
Wed, Apr 17
For replicating state changes (T120242) [...]
Why though? Why is 99.9999% (or 99.999999% or 99.99%) not enough?
see the CAP theorem
C != eventual-C. Eventual Consistency + AP is feasible and done often.
Mon, Apr 15
Fri, Apr 5
Very cool!
I prefer the "by functionality" organization
Yep, cool with me. Let the naming bikeshed begin.
Curious! What's the status on collaboration with the rest of the org on NodeJS services and library support? IIUC there is essential Tech department work planned for this.
Thu, Apr 4
Perhaps we can close T361017: [SPIKE] Can we express Event Platform configs in Datasets Config? as duplicate?
Should we have 2 lib files, one for schema and one for data, for both Hive and Iceberg? Or one file doing both as it is now?
Wed, Apr 3
Just read Antoine's patch and I think I'm missing something, so I thought I could ask here.
Being bold, reopen if needed.
Mar 28 2024
Mar 27 2024
I think this is a library that Data Engineering owns?
@VirginiaPoundstone I don't think so. I believe mw.user.generateRandomSessionId is part of MediaWiki core.
Mar 26 2024
meta.id
Do you know who sets these fields in the current webrequest flow?
Mar 22 2024
meta.id and meta.request_id
Mar 21 2024
Supports only reading, with read-ahead of a predetermined block size. In the case that the server does not supply the file size, only reading the complete file in one go is supported.
Mar 19 2024
Hm, actually, as far as I can tell, reading from HTTP (and many other sources) uses https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.spec.AbstractBufferedFile, which has a default read blocksize of 5MB.
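If the 5 MiB default matters for a given workload, the block size can be overridden per open. A small sketch (the URL is a placeholder):

```python
import fsspec

# AbstractBufferedFile's default read-ahead block size is 5 MiB
# (5 * 2**20 bytes); it can be overridden per open() call.
with fsspec.open(
    "https://example.org/some/large/file.bin",  # placeholder URL
    mode="rb",
    block_size=1 * 2**20,  # read ahead 1 MiB at a time instead of 5 MiB
) as f:
    header = f.read(4096)
```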
Mar 18 2024
Or maybe:
@Fabfur I'm really really hoping we can remove varnishkafka-eventlogging after we complete T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate, as part of T238230: Decommission EventLogging backend components by migrating to MEP.
This doesn't mean that MediaWiki shouldn't try to improve the situation by handling a failure to submit a job: save it somewhere (a specific db table?) so we can replay it later. At the current failure rate, this would guarantee the jobs get executed, at a negligible cost in terms of resources.
I wonder if JobQueueGroup::lazyPush()/JobQueueEventBus could be rigged to make the provided jobs use "hasty" mode in EventGate?
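For context, my understanding is that hasty mode makes EventGate respond before the Kafka produce completes. A hedged sketch of what a hasty intake request looks like (host, schema, and stream here are placeholders):

```python
import requests

events = [{
    "$schema": "/test/event/1.0.0",    # placeholder schema URI
    "meta": {"stream": "test.event"},  # placeholder stream name
}]
resp = requests.post(
    "https://eventgate.example.org/v1/events",  # placeholder intake host
    params={"hasty": "true"},
    json=events,
    timeout=5,
)
print(resp.status_code)  # expect 202: accepted, produce outcome unknown
```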
Mar 15 2024
Mar 6 2024
Mar 5 2024
Oh cool! @bking I read the linked notes but I'm missing how it's gonna work. How can you alert that dataset $X for partition $N is failing? Is there a way to make partition or hour or datetime or whatever a label?
Mar 3 2024
I still wonder why profile::kafka::mirror::properties doesn't blacklist all MW jobs. Is anything making use of that extra data?
Feb 29 2024
+1, or add this as a subtask of that?
+1! to this idea!
Feb 27 2024
<3
Feb 21 2024
Worth investigating? https://datacontract.com/
Just came across https://www.jikkou.io/docs/tutorials/get_started/ . Worth a look!
Feb 19 2024
@lbowmaker @gmodena Should we resolve and close this?
Feb 15 2024
@JWheeler-WMF The EventBus extension uses the BlockIpComplete hook. If there are no changes to this hook's API, then no changes are needed for EventBus or the mediawiki.user-blocks-change stream. However, I'd assume that to accommodate the Multiblocks feature, the hook will need to change to represent the multiple expiration dates of the different blocks.
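Purely as an illustration of the shape change (hypothetical field names, not the real schema): a single-block event can carry one expiry, while a multiblocks-aware one would need a list.

```python
# Hypothetical before/after payload shapes, illustrating why a single
# expiry field can't represent multiple concurrent blocks.
single_block_event = {
    "blocks": {"expiry": "2024-03-01T00:00:00Z", "sitewide": True},
}
multi_block_event = {
    "blocks": [
        {"expiry": "2024-03-01T00:00:00Z", "sitewide": True},
        {"expiry": "2024-06-01T00:00:00Z", "sitewide": False},
    ],
}
```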
Feb 13 2024
Oh and in case you haven't seen it: EvolveHiveTable.
I think because it was on the Event Platform board, but doesn't have anything really to do with Event Platform. Instead, it has to do with MW generated session IDs, which I believe are used in EventLogging instrumentation schemas.
Feb 12 2024
PHP execution.
Afaik PHP execution is limited, for security reasons, to only specific directories. This will thus likely need a puppet change to the Apache config first, to allow this directory to execute PHP. Given the transition to Docker/Helm/Kubernetes etc., this will also need a corresponding change there, since that has its own copy of the Apache config.
Feb 11 2024
If you have time to dive deep, you can live inspect a nodejs process and search for memory leaks.
Feb 9 2024
wondering about the stream connection duration
all we'd need would be to switch our webrequest pipelines to start consuming from the proposed new table names discussed in T314956: [Event Platform] Declare webrequest as an Event Platform stream.
I think this more precise timestamp would be parseable by our ingestion system just fine, but we should verify. If we can get this precise, I suppose...why not? I see that the existing varnish dt is only second precision, which doesn't seem very precise, especially for webrequest. Perhaps we should take this opportunity to increase the precision a bit. If we can, we should strive for at least millisecond precision. Not a blocker for this task though.
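For illustration, here's second vs. millisecond precision for a dt field in Python (not the actual varnish/haproxy code):

```python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)
seconds_dt = now.strftime("%Y-%m-%dT%H:%M:%SZ")     # e.g. 2024-02-09T12:34:56Z
millis_dt = now.isoformat(timespec="milliseconds")  # e.g. 2024-02-09T12:34:56.789+00:00
print(seconds_dt, millis_dt)
```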
respective doc page
Feb 8 2024
Hello! I'm not entirely sure what this ticket is trying to do, but here's some hopefully useful information:
Feb 5 2024
Feb 4 2024
Jan 30 2024
Copypasting comment from Alerts Review doc:
Jan 18 2024
Jan 10 2024
Oh, and actually, we only need to count requests to mediawiki.org/beacon/event, so:
Jan 8 2024
Decommissioning probably won't get done until after I'm back from leave in late April. Can we wait that long?
Jan 5 2024
Okay great! Thank you.
Also, from the convo in December's tech leadership CoP meeting, I started thinking about how what we want for T291120: MediaWiki Event Carried State Transfer - Problem Statement is pretty similar to what is in MW's logging table, except we need the data to be structured, comprehensive, and consistent (meaning: no missing state changes). In T120242: Eventually Consistent MediaWiki State Change Events, one of the solutions outlined is the 'Transactional Outbox' pattern, which is kinda similar to a comprehensive+structured logging table from which we can generate and externalize state change events. I betcha we could tie these ideas together somehow.
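A minimal sketch of the Transactional Outbox pattern mentioned above, using sqlite for brevity (tables, columns, and the event shape are all made up):

```python
import json
import sqlite3

# Transactional Outbox sketch: the state change and its event are written
# in the SAME transaction, so no state change can go missing before publish.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         payload TEXT, published INTEGER DEFAULT 0);
    INSERT INTO page VALUES (1, 'Old_Title');
""")

def rename_page(page_id: int, new_title: str) -> None:
    with db:  # one transaction covers both writes
        db.execute("UPDATE page SET title = ? WHERE id = ?", (new_title, page_id))
        db.execute("INSERT INTO outbox (payload) VALUES (?)",
                   (json.dumps({"event": "page-rename",
                                "page_id": page_id,
                                "new_title": new_title}),))

def publish_pending() -> None:
    # A separate poller externalizes the events and marks them published;
    # in production this would produce to Kafka instead of printing.
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id").fetchall()
    for row_id, payload in rows:
        print("produce:", payload)
        db.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    db.commit()

rename_page(1, "New_Title")
publish_pending()
```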
@SNowick_WMF, are the latest versions of the apps still sending the various MobileApp* events? I see a few events coming in, but maybe those are just from old versions?
peak request rate was ~1900 requests/s.
Oh, that turnilo chart is per hour (I think), and is also sampled 1/128. 1900/s seemed like a lot! So it's more like a peak of 900*128/60/60 == 32 requests/s. (I think you misread the chart; the peak I see shows '900', not 1900.)
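Showing the arithmetic:

```python
# ~900 sampled requests per hour at 1/128 sampling implies an
# unsampled rate of about 32 requests/s.
sampled_per_hour = 900
sampling_factor = 128
print(sampled_per_hour * sampling_factor / 3600)  # 32.0
```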
IIRC, the decision was to wait until the new year, so as not to risk a mistake while people were out on holidays.
Volume
peak request rate was ~1900 requests/s.
we should decide sometime soon
Jan 3 2024
Which reads to me as EventGate needing a logic update for how it formats a multi-status response
Jan 2 2024
Wow it...kinda...works~