User Details
- User Since
- Oct 9 2014, 4:50 PM (493 w, 6 d)
- Availability
- Available
- IRC Nick
- ottomata
- LDAP User
- Ottomata
- MediaWiki User
- Ottomata [ Global Accounts ]
Today
Yesterday
I think this is a library that Data Engineering owns?
@VirginiaPoundstone I don't think so. I believe mw.user.generateRandomSessionId is part of MediaWiki core.
Tue, Mar 26
meta.id
Do you know who set these fields with the current webrequest flow?
Fri, Mar 22
meta.id and meta.request_id
Thu, Mar 21
Supports only reading, with read-ahead of a predetermined block size. If the server does not supply the file size, only reading the complete file in one go is supported.
Tue, Mar 19
Hm, actually, as far as I can tell, reading from HTTP (and many other sources) uses https://filesystem-spec.readthedocs.io/en/stable/api.html#fsspec.spec.AbstractBufferedFile, which has a default read blocksize of 5MB.
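To make the read-ahead behavior concrete, here is a minimal stdlib-only sketch of block-based read-ahead, analogous to (but much simpler than) what fsspec's AbstractBufferedFile does with its default 5MB block size. The class and values here are illustrative, not fsspec's actual implementation.

```python
import io

class BlockReadAheadFile:
    """Toy read-ahead reader: fetches fixed-size blocks from an underlying
    file object and serves subsequent reads from the cached block.
    Illustrative only; fsspec's AbstractBufferedFile is far more complete."""

    def __init__(self, raw, block_size=5 * 2**20):  # fsspec's default is 5MB
        self.raw = raw
        self.block_size = block_size
        self.pos = 0
        self.cache_start = None
        self.cache = b""

    def read(self, n):
        out = b""
        while n > 0:
            block_start = (self.pos // self.block_size) * self.block_size
            if self.cache_start != block_start:
                # Read-ahead: fetch the whole block, not just n bytes.
                self.raw.seek(block_start)
                self.cache = self.raw.read(self.block_size)
                self.cache_start = block_start
            offset = self.pos - block_start
            chunk = self.cache[offset:offset + n]
            if not chunk:
                break
            out += chunk
            self.pos += len(chunk)
            n -= len(chunk)
        return out

f = BlockReadAheadFile(io.BytesIO(b"abcdefghij" * 100), block_size=64)
print(f.read(5))   # b'abcde'
print(f.read(10))  # b'fghijabcde' -- served from the cached 64-byte block
```

The point is that many small reads against a remote HTTP source collapse into a few large block fetches, which is why the block size matters for request volume.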
Mon, Mar 18
Or maybe:
@Fabfur I'm really really hoping we can remove varnishkafka-eventlogging after we complete T353817: Create legacy EventLogging proxy HTTP intake (for MediaWikiPingback) endpoint to EventGate, as part of T238230: Decommission EventLogging backend components by migrating to MEP.
This doesn't mean that MediaWiki shouldn't try to improve the situation by handling a failure to submit a job by saving it somewhere (a specific DB table?) so we can replay it later. At the current failure rate, this would guarantee the jobs get executed, at a negligible cost in terms of resources.
I wonder if JobQueueGroup::lazyPush()/JobQueueEventBus could be rigged to make the provided jobs use "hasty" mode in EventGate?
Fri, Mar 15
Wed, Mar 6
Tue, Mar 5
Oh cool! @bking I read the linked notes but I'm missing how it's going to work. How can you alert that dataset $X partition $N is failing? Is there a way to make partition or hour or datetime or whatever a label?
Sun, Mar 3
I still wonder why profile::kafka::mirror::properties doesn't blacklist all MW job topics? Is anything making use of that extra data?
Thu, Feb 29
+1, or add this as a subtask of that?
+1! to this idea!
Tue, Feb 27
<3
Feb 27 2024
Feb 21 2024
Worth investigating? https://datacontract.com/
Just came across https://www.jikkou.io/docs/tutorials/get_started/ . Worth a look!
Feb 19 2024
@lbowmaker @gmodena Should we resolve and close this?
Feb 15 2024
@JWheeler-WMF The EventBus extension uses the BlockIpComplete hook. If there are no changes to this hook API, then there are no changes needed for EventBus or the mediawiki.user-blocks-change stream. However, I'd assume that to accommodate the Multiblocks feature, the hook will need to be changed to represent the multiple expiration dates of the different blocks.
Feb 13 2024
Oh and in case you haven't seen it: EvolveHiveTable.
I think because it was on the Event Platform board, but doesn't have anything really to do with Event Platform. Instead, it has to do with MW generated session IDs, which I believe are used in EventLogging instrumentation schemas.
Feb 12 2024
PHP execution.
AFAIK PHP execution is limited, for security reasons, to specific directories only. This will thus likely need a Puppet change to the Apache config first, to allow this directory to execute PHP. Given the transition to Docker/Helm/Kubernetes etc., this will also need a corresponding change there, which has its own copy of the Apache config.
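For illustration, directory-scoped PHP execution with php-fpm is typically enabled with a directive along these lines. The path and socket here are assumptions, not the actual WMF config:

```apache
# Hypothetical sketch; real paths and handler config will differ.
<Directory "/srv/mediawiki/docroot/example">
    <FilesMatch "\.php$">
        SetHandler "proxy:unix:/run/php/fpm.sock|fcgi://localhost"
    </FilesMatch>
</Directory>
```

The same scoping would need to be mirrored in the Kubernetes copy of the Apache config.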
Feb 11 2024
If you have time to dive deep, you can live inspect a nodejs process and search for memory leaks.
Feb 9 2024
wondering about the stream connection duration
all we'd need would be to switch our webrequest pipelines to start consuming from the proposed new table names discussed in T314956: [Event Platform] Declare webrequest as an Event Platform stream.
I think this more precise timestamp would be parseable by our ingestion system just fine, but we should verify. If we can get this precise, I suppose...why not? I see that the existing varnish dt is only second precision, which doesn't seem very precise, especially for webrequest. Perhaps we should take this opportunity to increase the precision a bit. If we can, we should strive for at least milliseconds. Not a blocker for this task though.
respective doc page
Feb 8 2024
Hello! I'm not entirely sure what this ticket is trying to do, but here's some hopefully useful information:
Feb 5 2024
Feb 4 2024
Jan 30 2024
Copypasting comment from Alerts Review doc:
Jan 18 2024
Jan 10 2024
Oh, and actually, we only need to count requests to mediawiki.org/beacon/event, so:
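The original query was not preserved in this feed, but the kind of filter meant above would look roughly like this. The field names (uri_host, uri_path) follow the webrequest schema convention; the data is made up:

```python
# Hypothetical sketch: count only requests to mediawiki.org/beacon/event.
requests = [
    {"uri_host": "www.mediawiki.org", "uri_path": "/beacon/event"},
    {"uri_host": "en.wikipedia.org", "uri_path": "/w/index.php"},
    {"uri_host": "www.mediawiki.org", "uri_path": "/beacon/event"},
]

count = sum(
    1 for r in requests
    if r["uri_host"] == "www.mediawiki.org" and r["uri_path"] == "/beacon/event"
)
print(count)  # 2
```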
Jan 8 2024
Decommissioning probably won't get done until after I'm back from leave in late April. Can we wait that long?
Jan 5 2024
Okay great! Thank you.
Also, from the convo in December's tech leadership CoP meeting, I started thinking about how what we want for T291120: MediaWiki Event Carried State Transfer - Problem Statement is pretty similar to what is in MW's logging table, except we need the data to be structured, comprehensive and consistent (meaning no missing state changes). In T120242: Consistent MediaWiki state change events | MediaWiki events as source of truth, one of the solutions outlined is the 'Transactional Outbox' pattern, which is kinda similar to a comprehensive+structured logging table from which we can generate and externalize state change events. I betcha we could tie these ideas together somehow.
@SNowick_WMF, are latest versions of apps still sending the various MobileApp* events? I see a few events coming in, but maybe those are just from old versions?
peak request rate was ~1900 requests/s.
Oh, that Turnilo chart is per hour (I think), and is also sampled 1/128. 1900/s seemed like a lot! So it's more like a peak of 900*128/60/60 == 32 requests/s. (I think you misread the chart; the peak I see shows '900', not 1900.)
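Reproducing that back-of-envelope math: ~900 sampled requests per hour at 1/128 sampling, converted to unsampled requests per second:

```python
# Chart shows ~900 sampled requests in a one-hour bucket, sampled 1/128.
sampled_per_hour = 900
sampling_rate = 128
per_second = sampled_per_hour * sampling_rate / 60 / 60
print(per_second)  # 32.0
```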
IIRC, the decision was to wait until the new year, so as not to risk a mistake while people were out on holidays.
Volume
peak request rate was ~1900 requests/s.
we should decide sometime soon
Jan 3 2024
Which reads to me as EventGate needing a logic update for how it formats a multi-status response
Jan 2 2024
Wow it...kinda...works~
+1 k!
I think this would automatically just work if we could create/alter the tables through Spark directly, rather than through Hive.
I made some progress modifying Spark to make it support adding nested column. I'll stop here and wait for feedback from upstream before I clean it up and try a little harder.
Are you sure you want 'delete' in the cleanup policy then? Perhaps you want to keep the latest event per page forever, so you can backfill fully from the topic?
I think the review is done. Migration is being tracked in T351117: Move analytics log from Varnish to HAProxy. Can we close this?
Interesting! Curious, so the reason for using compaction here is just to save space, not necessarily to keep the latest record per key forever?
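For reference, the distinction above maps to Kafka's topic-level settings. A sketch (illustrative values; not the actual topic config being discussed):

```properties
# compact keeps the latest record per key; delete expires old segments
# by time/size. They can also be combined.
cleanup.policy=compact,delete
retention.ms=604800000
```

With plain compaction (no delete), the latest record per key is retained indefinitely, which is what enables a full backfill from the topic.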
Dec 30 2023
Dec 27 2023
I might be able to set the ip field to the client IP, usually parsed and provided by varnish in the X-Client-IP header, but I think we don't need it for MediaWikiPingback, and shouldn't collect it if we don't. I'll not support it.
suggested implementation to use mediawiki-config/docroot/mediawiki.org
Dec 21 2023
We just had a discussion in DE standup about T335306: [SPIKE] Evaluation on iceberg sensor for airflow. I'm sure there are many existing Hive sensors on the webrequest table. I'd rather not block on that task for this migration. I suggest we keep this as a regular Hive table.
Couple questions back at you: is webrequest append only?
yes
If not, how do we do rewrites today?
If we do, they are per hour. We re-refine the entire hour.
Do you have an estimate of the duration for which we'd be dual-writing?
Hm, alternatively, we could just have the raw and refined tables be brand newly named tables and ingestion jobs during the migration, and then do the final cutover with a RENAME TABLE.
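A cutover via RENAME TABLE would look roughly like this in HiveQL. The new/old table names here are hypothetical placeholders:

```sql
-- Illustrative cutover, assuming hypothetical table names:
ALTER TABLE wmf.webrequest RENAME TO wmf.webrequest_old;
ALTER TABLE wmf.webrequest_new RENAME TO wmf.webrequest;
```

Renames are cheap metadata operations, so the switch itself would be near-instant once the dual-written tables are in sync.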
As for Hive tables: I'm trying to decide how best to do the migration. Perhaps it would be easiest to keep the existing wmf.webrequest refined Hive table as is. The raw table would change to webrequest_frontend, imported from the new streams, and the webrequest refine Airflow job would switch to refining from the webrequest_frontend raw table once we are ready to do the migration cutover.
After a discussion in Slack, I have changed the suggested implementation to use mediawiki-config/docroot/mediawiki.org. This would make the solution only work from mediawiki.org/beacon/event, but would avoid any need for custom routing or custom deployment. MediaWikiPingback sends events to mediawiki.org, so this would suffice to unblock the eventlogging backend decom.
@Antoine_Quhen asked if we should consider making the new webrequest Hive table an Iceberg table. @JAllemandou @xcollazo can/should we do this?
How should we layout and name the new stream(s)?
@xcollazo do we need this anymore now that we've enabled canary events for all MW state event streams? You should be able to depend on both datacenter partitions being marked as ready, even if there are no real events in one of the DCs.
Dec 20 2023
To do this migration plan ^, we'd need Kafka jumbo to support 2x webrequest volume while we migrate. Let's check with Data Platform SREs, @brouberol @BTullis ? Whatcha think?
Alright, I spoke with @CCicalese_WMF today. The pingback data is very useful for making decisions like when we can deprecate versions of PHP, etc. It is impossible to force people to upgrade old installed versions of MediaWiki. If we decommission the legacy eventlogging backend, old installs will stop sending valuable data.