Update Pingback to use the Event Platform
Open, Needs Triage, Public

Description

From both a recent conversation with @daniel and per T200375: Implement Pingback v2, it seems that Pingback should be updated to use the Event Platform. This migration is a little more complicated because the event producer doesn't use the EventLogging extension: the instrument needs to run during installation, when extensions aren't loaded.

TODO

Per T259163: Migrate legacy metawiki schemas to Event Platform:

Links

  1. Implementation in MediaWiki Core
  2. https://wikitech.wikimedia.org/wiki/Event_Platform/EventLogging_legacy

Event Timeline

Oof, I didn't know there was a hardcoded use of EventLogging inside of MediaWiki core. This seems pretty fragile. This migration makes sense, but are we sure we want to continue doing this in the long term?

I think the question of whether we want to have pingback really depends on the question of how much we want to support third-party installations. If we want to support them, having statistics about them is really helpful. So I personally think it's super important to have this.

What makes this difficult is that it's something that needs to happen during installation, so we can't rely on extensions. That makes it a bit awkward...

needs to happen during installation, we can't rely on extensions

We also can't rely on them, as perhaps we want stats on a MediaWiki with no extensions installed?

Perhaps receiving these stats should be fully encapsulated within MediaWiki core then, rather than relying on EventLogging/Event Platform? Some hardcoded API endpoint to receive and store stats in a MediaWiki database table?

That code and table would be unused on all but one MediaWiki installation... There is only one instance (mediawiki.org, I guess) that receives Pingback, right?

So the code for sending it should be in core, but the code for receiving it shouldn't. It doesn't even need to be in MediaWiki at all, could be something else entirely.

Hm, yes, but I guess I mean at least this hardcoded producer code in MW core wouldn't have a hardcoded external dependency?

Meh, too much work for what this is. I don't love it, but I don't have a better solution. Proceed! :)

Change 938271 had a related patch set uploaded (by Phuedx; author: Phuedx):

[mediawiki/core@master] WIP: pingback: Also submit pingback to Event Platform

https://gerrit.wikimedia.org/r/938271

Change 938814 had a related patch set uploaded (by Phuedx; author: Phuedx):

[schemas/event/secondary@master] Migrate MediaWikiPingback schema

https://gerrit.wikimedia.org/r/938814

https://gerrit.wikimedia.org/r/938814 can be merged as is.

https://gerrit.wikimedia.org/r/938271 cannot be merged until we've decided on a stream name.

Once the latter is merged, we will (very?) slowly start to see an uptick in events on that stream but we will still receive a lot of events via EventLogging. We will continue to receive events via EventLogging from active MediaWiki installations that aren't up-to-date. AIUI we can support this by the following:

  1. Not manually evolving the Hive table to use the new schema
  2. Creating a new stream, e.g. mediawiki.pingback.migrated, which isn't ingested as a "legacy EventLogging" stream
  3. Updating https://gerrit.wikimedia.org/r/938271 to submit events to that stream
  4. Updating the queries in the analytics/reportupdater-queries repo to query both the existing event.mediawikipingback and the new event.mediawiki_pingback_migrated Hive tables
  5. [Later] Updating the queries again after the EventLogging pipeline is decommissioned

Do we want to support this?

phuedx updated the task description.

https://gerrit.wikimedia.org/r/938271 cannot be merged until we've decided on a stream name.

Thankfully, I was wrong about this.

There are already events flowing on the eventlogging_MediaWikiPingback stream from the Legacy EventLogging processor service. We can configure Pingback to submit events to the same stream as they will be compatible. Direct consumers of the stream and/or the event.mediawikipingback Hive table will be unaffected.

This is the last schema to migrate! OH BOYYYYYY

+1 to CR. Do we need a review from someone else to merge to mediawiki/core?

Change 981446 had a related patch set uploaded (by Phuedx; author: Phuedx):

[operations/mediawiki-config@master] ext-EventStreamConfig: Add eventlogging_MediaWikiPingback stream config

https://gerrit.wikimedia.org/r/981446

Updates from slack thread:

Going to talk to Cindy next week about this.

But it seems that even if we merge this, we probably can't decom the eventlogging backend until the current LTS MW expires? TBD.

Alright, I spoke with @CCicalese_WMF today. The pingback data is very useful for making decisions like when we can deprecate versions of PHP, etc. It is impossible to force people to upgrade old installed versions of MediaWiki. If we decommission the legacy eventlogging backend, old installs will stop sending valuable data.

Cindy asked that we wait until all old versions have almost stopped sending data, meaning that they are not really used anymore. It is not clear how long this will take, but if the trends match those of some older versions, it could be around 5 years.

I'd rather not wait another 5 years to turn off this backend. We came up with an idea that will let us do so:

Since MediaWiki Pingback will be the ONLY remaining eventlogging schema to migrate to event platform, we can find a way to create a proxy service/endpoint that will translate the incoming legacy eventlogging GET request into one that is POSTed to eventgate. We decided that if we can do this as a MediaWiki API endpoint, we won't have to run any bespoke service to do this.

I'll make a subtask to describe the work.
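To make the proxy idea above concrete, here is a minimal sketch in Python of the translation step only. It is not the actual implementation. It assumes the legacy request carries a percent-encoded JSON capsule in its query string, terminated by ';', and that the target event needs a $schema URI plus a meta.stream field. The schema URI and version below are hypothetical placeholders; only the stream name comes from this task.

```python
import json
from datetime import datetime, timezone
from urllib.parse import unquote, urlsplit

# The stream name is taken from this task; the schema URI is a
# hypothetical placeholder, not a confirmed value.
STREAM = "eventlogging_MediaWikiPingback"
SCHEMA_URI = "/analytics/legacy/mediawikipingback/1.0.0"


def decode_legacy_capsule(url: str) -> dict:
    """Decode the percent-encoded JSON capsule from a legacy GET URL."""
    query = urlsplit(url).query
    # Assumption: legacy clients end the query string with ';'.
    return json.loads(unquote(query.rstrip(";")))


def to_platform_event(capsule: dict, received_at: datetime) -> dict:
    """Return a copy of the capsule addressed to the Event Platform stream."""
    event = dict(capsule)  # shallow copy; the input capsule is left untouched
    event["$schema"] = SCHEMA_URI
    event["meta"] = {"stream": STREAM}
    # Assumption: the proxy stamps the event with the time it was received.
    event["dt"] = received_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return event
```

A proxy endpoint would call decode_legacy_capsule on the incoming request URL, pass the result through to_platform_event, and POST the JSON-encoded result to eventgate. Validation, error handling and the HTTP call itself are deliberately left out.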

Change 938814 merged by jenkins-bot:

[schemas/event/secondary@master] Migrate MediaWikiPingback schema

https://gerrit.wikimedia.org/r/938814

Change 984627 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/mediawiki-config@master] wgEventStreams - Add eventlogging_MediaWikiPingback stream

https://gerrit.wikimedia.org/r/984627

Change 985023 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/mediawiki-config@master] Create eventlogging-processor legacy converter to proxy to eventgate for mediawiki.org

https://gerrit.wikimedia.org/r/985023

Change 984627 merged by jenkins-bot:

[operations/mediawiki-config@master] wgEventStreams - Add eventlogging_MediaWikiPingback stream

https://gerrit.wikimedia.org/r/984627

Mentioned in SAL (#wikimedia-operations) [2023-12-30T16:55:44Z] <otto@deploy2002> Synchronized wmf-config/ext-EventStreamConfig.php: Config: [[gerrit:984627|Add eventlogging_MediaWikiPingback stream (T323828)]] (duration: 15m 10s)

Change 981446 abandoned by Phuedx:

[operations/mediawiki-config@master] ext-EventStreamConfig: Add eventlogging_MediaWikiPingback stream config

Reason:

Done in I37f745a77af164329933e1a32fbd029c80d56ee4

https://gerrit.wikimedia.org/r/981446