Fri, May 7
Interesting, thanks! So, brainstorming how that would work for Debezium: since Debezium is just a slave process consuming a binlog, would it be possible to just stop it, change its configs so it points at a new master, and start it? As long as the same binlog position exists on both the old and new master, would that work?
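For concreteness, a minimal sketch of what that repoint might look like, assuming the connector runs under Kafka Connect (the connector name, hostnames, and credentials below are all made up). One caveat: Debezium's saved offset records a binlog file name and position, and those coordinates are generally not identical across hosts unless GTID-based positioning is in use, so "the same binlog position exists" is doing a lot of work in that question.

```shell
# Hypothetical: repoint an existing Debezium MySQL connector at the new
# master via the Kafka Connect REST API. Debezium resumes from its stored
# offset, which only makes sense if the new master's binlog coordinates
# (or GTID set) line up with the old one's.
curl -X PUT http://connect.example.org:8083/connectors/my-db-connector/config \
  -H 'Content-Type: application/json' \
  -d '{
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "new-master.example.org",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "********",
    "database.server.id": "184054"
  }'
```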
Ok let me try to rephrase, there are actually 2 distinct questions here.
Thu, May 6
That'd be great we can work with that! Thank you.
I just google image searched 'migrating elephants', saw one that looked good, and then saw that the link was for an article titled 'Migrating Elephants – How To Migrate Petabyte Scale Hadoop Clusters With Zero Downtime' :p
Can you confirm, then, that we can delete data older than 90 days? :-)
Reopen if needed
The sooner the better, but there isn't yet a deadline. We need to either migrate or decom all legacy EventLogging streams in order to turn off the old eventlogging backend system.
hmm, most likely it doesn't need IPs/geo. The stats are language-based rather than region-based
It'll be live on all Wikipedias by EOD today.
Oh my, we should prep the migration!
@mforns I guess I mean what Sam was saying: if we get a ton of URL-too-long errors (before we migrate), we can try to copy/paste the URL-shortening code into EventLogging.
Before migrating, we want to see if we can find an actual owner for this implementation.
Cat herding being tracked in T282131: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned
Not going to colocate after all :)
I've created a new task to track down the EventLogging usage: T282131: Determine which remaining legacy EventLogging schemas need to be migrated or decommissioned
I see, so forward port only if we have to?
deploy a partial revert of your change to re-introduce the URL length limiting code.
Wed, May 5
Oh interesting. Perhaps we should capture the expiry in that stream too!
Ok, I've started sorting through the long tail of schemas in the audit spreadsheet. I was able to mark quite a few as 'To deprecate' based on the schema talk page status and/or DMing some schema owners. A few I had to mark as 'To migrate', so we can start on those anytime.
A related q as we are figuring this out.
It's been a couple of weeks since I sent an email asking if anyone needed or used revert info in mediawiki.revision-create. No response, so I think we can move forward with this without blocking on T280538: Capture rev_is_revert event data in a stream different than mediawiki.revision-create.
Hi @EYener, as far as we can tell, this instrumentation code does not have an owner. It is custom code copy/pasted from the MW EventLogging extension years ago. It won't work as is for the Event Platform migration. I started working on this in T282012 but then realized it isn't quite as easy as it sounds.
Hey sorry y'all! I thought I had done a code search and removed all occurrences... must not have noticed this one on an-test-client somehow. Thank you.
Mon, May 3
Done in recent refactors.
This is now available in wmfdata via anaconda-wmf
If there are no objections, we will stop refining these and remove them from the event database during the week of May 10.
Ok, I've removed the requiredness in all the mediawiki fragments. I'm not going to touch common, as it has $schema and meta, and those should always be required. I'm also not going to touch the rdf_streaming_updater one, that is used by WDQS updater, and unlikely to be $refed by anything else.
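As an illustration of the distinction being drawn there, a sketch of what the common fragment's requiredness looks like, assuming YAML schema files and field names (`meta.stream`, `meta.dt`) as used elsewhere in the thread's event schemas; the exact layout here is assumed, not copied from the repo:

```yaml
# Illustrative sketch of a common fragment: only $schema and meta stay
# required, since every event must carry them; everything else that
# $refs this fragment decides requiredness for itself.
title: fragment/common
type: object
required:
  - $schema
  - meta
properties:
  $schema:
    type: string
  meta:
    type: object
    properties:
      stream:
        type: string
      dt:
        type: string
        format: date-time
```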
more standard model using a hook and EventBus extension.
Meaning, EventBus would make a score request to ORES, and then submit the revision-score event?
Thanks Michael! :)
This stream is exposed publicly so may be used by the community for various purposes. It is also ingested into Hive and used by researchers and product analysts, but I don't have an understanding of how much or for what :)
Could there be a dedicated precache instance/cluster for ORES that didn't serve regular traffic?
There are still some child tasks of this Newpyter parent task, but as of today I think we can call the 'Newpyter' project done.
Did the following on each stat box:
Instead of doing this work to recreate the replicas with a different binlog format now, could we wait for the new db hardware, set up multi-instance MariaDBs, and then enable the proper binlog format then? We'll basically be recreating each replica anyway, so it might be more worth our time to just do it then.
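For reference, the config change in question would be small; a hedged my.cnf fragment, assuming row-based logging is the "proper" format being discussed (the section name and server_id are illustrative):

```ini
# Hypothetical per-instance config for one of the rebuilt replicas.
[mysqld]
binlog_format = ROW      # the format Debezium-style binlog consumers expect
log_bin       = ON
server_id     = 101      # must be unique per instance in the replication topology
```

Since `binlog_format` only affects how new events are written, the usual reason to rebuild rather than flip it live is to get a binlog history that is uniformly in the new format.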
Fri, Apr 30
Just saw this task. FYI these schemas were migrated on March 8 2021.