
EventBus - Add central user id to MediaWiki events
Closed, ResolvedPublic

Description

This will allow us to associate users across wikis in events.

Minimum requirement is to add this to mediawiki.page_change.v1 event stream, but we may want to add it elsewhere too.

This will allow us to calculate daily global editor metrics for T403660.
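
As a rough illustration of why a cross-wiki identifier matters for that metric, here is a minimal sketch that counts distinct editors per day across wikis. The event shape is an assumption, simplified for the example (a `dt` timestamp, a `wiki_id`, and a `performer` object carrying `user_central_id`); `daily_global_editors` is a hypothetical helper, not part of any existing pipeline.

```python
from collections import defaultdict


def daily_global_editors(events):
    """Count distinct central user ids per UTC day across all wikis.

    Assumes each event is a dict with an ISO 8601 `dt` string and a
    `performer` dict that may contain `user_central_id`.
    """
    editors_by_day = defaultdict(set)
    for event in events:
        central_id = event.get("performer", {}).get("user_central_id")
        if central_id is None:
            # No central id on this event (e.g. an anonymous edit): skip it.
            continue
        day = event["dt"][:10]  # "YYYY-MM-DD" prefix of the timestamp
        editors_by_day[day].add(central_id)
    return {day: len(ids) for day, ids in sorted(editors_by_day.items())}
```

With only per-wiki user ids, the same person editing on two wikis would be counted twice; keying on the central id counts them once.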

Event Timeline

Ottomata renamed this task from EventBus - Add CentralAuth global user id to MediaWiki events to EventBus - Add CentralAuth central user id to MediaWiki events. Sep 4 2025, 4:47 PM

Change #1184889 had a related patch set uploaded (by Ottomata; author: Ottomata):

[mediawiki/extensions/EventBus@master] Lookup and add a user_central_id on serialized users

https://gerrit.wikimedia.org/r/1184889

Ottomata renamed this task from EventBus - Add CentralAuth central user id to MediaWiki events to EventBus - Add central user id to MediaWiki events. Sep 4 2025, 5:45 PM
Ottomata triaged this task as High priority.

@mforns patches ready for review if you are willing! :)

cc @tchin as well.

I waffled between user_central_id and central_user_id, but chose user_central_id because https://wikitech.wikimedia.org/wiki/User:Ottomata/Organized_Code#Nouns_before_adjectives

Please comment if there are other preferences.

Thank you very much for the invitation to bikeshed! 👍 ❤️ 🚲 🏠

I'd suggest user_global_id instead. IMO, that's by far the most common term for "pan-Wikimedia". Just consider this search for "global" on Meta-Wiki, which brings up "global user pages", "global bans", "global rights", "global renames", and more. If you search for "central", on the other hand, CentralNotice is the only example of something high profile named "central".

We also use "global" in the metrics context; "global active editors" is something I say a lot, but I feel like if I said "central active editors", people would be a bit confused.

Even talking specifically about user identity code, even though the extension is called CentralAuth, the table that backs it is called globaluser.

Hm, I started with that, but MediaWiki software actually calls this concept 'central id', independent of the CentralAuth extension. CentralAuth just happens to be a central id provider.

Since these are specifically MediaWiki state change events, I try to lean towards sticking closely with MediaWiki concepts rather than more colloquial terms.

But, I could be convinced! I also was expecting this concept to be called 'global user' since that is the term I had heard thrown around too.

Since these are specifically MediaWiki state change events, I try to lean towards sticking closely with MediaWiki concepts rather than more colloquial terms.

Interesting! I think that's a reasonable objective, and I can see how it follows from thinking of the Event Platform as a part of MediaWiki.

On the other hand, I am thinking of the Event Platform as a part of the Wikimedia data ecosystem, and choosing the widely-understood Wikimedia term over an incidental piece of MediaWiki jargon follows naturally from that. For example, we want to add this to the mediawiki_history dataset, and there at least I think the Wikimedia perspective clearly wins. So we might end up with user_central_id here and user_global_id there, which seems unfortunate although not necessarily wrong.

In this case, I'd argue that "global" is the better choice given that:

  • it's the clearly dominant Wikimedia term
  • it might be the dominant term at wiki farms generally; see how it's used at Miraheze, for example
  • it's used by the only existing MediaWiki central ID provider (which is probably the only one that will ever exist)

But, like I said, I grok the reasoning for "central" and I definitely don't find it terrible, so up to you 😊

Def an important choice and I am on the fence! Let's get some more minds. Asking in Slack in #mediawiki-interfaces, #talk-to-data-engineering and #working-with-data

Both are honestly fine.

But given the explicit bikeshedding invite, I !vote for

  • user_central_id if this is even in theory something that could be used on a non-SUL Wikimedia wiki (= a fishbowl/private wiki) or outside the Wikimedia wiki farm
    • This is what MediaWiki core uses to refer to the feature that looks up an identifier for a specific user that is consistent across different wikis in the same wiki farm. It happens to be the CentralAuth user ID when that's installed and the local user ID when not.
    • Also, codesearch in deployed code for central *id (in case-insensitive mode) has about double the number of hits compared to global *id. The majority of the hits for the latter are in Wikibase, while the former is much more concentrated in user- and authentication-focused code.
  • user_global_id if not.
    • "Global account" is the user-facing term for a SUL account. But we tend to use the term "user" instead of "account" in technical contexts.
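
To make the "central ID provider" idea concrete, here is a minimal sketch of the behaviour described above: a pluggable lookup that yields a farm-wide ID when shared accounts exist and falls back to the local user ID otherwise. This is not MediaWiki's actual API; `LocalIdProvider`, `SharedAccountProvider` and `serialize_user` are hypothetical names, and the output field names other than `user_central_id` are assumptions.

```python
from typing import Dict, Optional, Protocol, Tuple


class CentralIdProvider(Protocol):
    """Anything that can map a local user to an id shared across wikis."""

    def central_id_for(self, wiki: str, local_user_id: int) -> Optional[int]: ...


class LocalIdProvider:
    """Standalone wiki: the local user id doubles as the central id."""

    def central_id_for(self, wiki: str, local_user_id: int) -> Optional[int]:
        return local_user_id


class SharedAccountProvider:
    """Wiki farm with shared accounts: (wiki, local id) maps to one farm-wide id."""

    def __init__(self, attachments: Dict[Tuple[str, int], int]):
        self._attachments = attachments

    def central_id_for(self, wiki: str, local_user_id: int) -> Optional[int]:
        # Returns None when the local account is not attached to a shared one.
        return self._attachments.get((wiki, local_user_id))


def serialize_user(provider: CentralIdProvider, wiki: str,
                   local_user_id: int, user_text: str) -> dict:
    """Build a user dict, adding user_central_id only when one is known."""
    user = {"user_id": local_user_id, "user_text": user_text}
    central_id = provider.central_id_for(wiki, local_user_id)
    if central_id is not None:
        user["user_central_id"] = central_id
    return user
```

Because the serializer only depends on the provider interface, the field name stays meaningful whichever provider is configured, which is the argument for "central" over a term tied to one extension.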

Even talking specifically about user identity code, even though the extension is called CentralAuth, the table that backs it is called globaluser.

... and the class that references that table is called CentralAuthUser :-)

  • it's used by the only existing MediaWiki central ID provider (which is probably the only one that will ever exist)

I still dream we get around to implementing the evil plans some day :-)

it's used by the only existing MediaWiki central ID provider (which is probably the only one that will ever exist)

I'm actually surprised no other providers exist. (I thought for sure GoogleLogin would do it, since it already uses Google IDs internally. This proposed fork of it does but it was abandoned.) It's easy to write one; for most authentication extensions it's conceptually a good fit; and a bunch of global functionality extensions depend on it but otherwise can work with any authentication extension (maybe as long as it uses the same username on every wiki).

Some points from Slack:

@Reedy

https://www.mediawiki.org/wiki/Manual:Central_ID is publicly documented, global id not so much
We expose them as centralids in ApiQueryUserInfo, so if someone was looking for correlating values... https://en.wikipedia.org/w/api.php?action=help&modules=query%2Buserinfo

@Ottomata

re docs: non MW software docs do seem to refer to this concept as 'global' e.g. https://meta.wikimedia.org/wiki/Global_rename_policy

@Tgr

Arguably central ID is a more abstract concept. E.g. on officewiki CentralIdLookup will return the local user ID (which is as central as it gets). You could set up a wiki with some different lookup mechanism and that would use "central IDs" as well but it would be a different ID scheme. Very unlikely to happen in practice though.

Central IDs are an official MediaWiki core concept and the various global things aren't. But some of the Global* extensions are actually built around central IDs and can work with any kind of single-login farm. Their names predate central IDs (which were added for AuthManager, so 2016-ish).

I'm leaning towards 'central'. If there are no more strong objections, I'm going to proceed with that.

I will also add this decision to https://wikitech.wikimedia.org/wiki/Data_Platform/Data_modeling_guidelines#WMF-specific_Conventions

Change #1184889 merged by jenkins-bot:

[mediawiki/extensions/EventBus@master] Lookup and add a user_central_id on serialized users

https://gerrit.wikimedia.org/r/1184889

Change #1189530 had a related patch set uploaded (by Ottomata; author: Ottomata):

[mediawiki/extensions/WikimediaEvents@master] Fix ip_reputation.score validation errors in production

https://gerrit.wikimedia.org/r/1189530

mediawiki.ip_reputation.score stream is now failing validation. I did not realize it was using the EventBus user entity serializer. Submitted patches to fix.
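
The thread does not say which validation rule failed, so the following is only a sketch of one plausible failure mode: a shared serializer starts emitting a new field, and a stream whose schema rejects undeclared properties begins failing. The schemas and the tiny `validate` helper are hypothetical and cover just the subset of JSON Schema needed for the illustration.

```python
def validate(value, schema, path="$"):
    """Return a list of error strings; an empty list means the value is valid.

    Supports only nested objects with `properties` and
    `additionalProperties: False`, which is enough for this illustration.
    """
    errors = []
    if schema.get("type") == "object" and isinstance(value, dict):
        declared = schema.get("properties", {})
        for name, item in value.items():
            if name in declared:
                errors.extend(validate(item, declared[name], f"{path}.{name}"))
            elif schema.get("additionalProperties", True) is False:
                errors.append(f"{path}.{name}: unexpected property")
    return errors


def user_schema(with_central_id):
    """Hypothetical user entity schema, before or after the new field."""
    properties = {"user_id": {"type": "integer"}, "user_text": {"type": "string"}}
    if with_central_id:
        properties["user_central_id"] = {"type": "integer"}
    return {"type": "object", "properties": properties, "additionalProperties": False}


def stream_schema(with_central_id):
    """Hypothetical event schema that embeds the user entity as `performer`."""
    return {
        "type": "object",
        "properties": {"performer": user_schema(with_central_id)},
        "additionalProperties": False,
    }


# Event as produced once the shared serializer adds the new field.
event = {"performer": {"user_id": 7, "user_text": "Alice", "user_central_id": 1001}}

old_errors = validate(event, stream_schema(with_central_id=False))
new_errors = validate(event, stream_schema(with_central_id=True))
```

Under these assumptions `old_errors` flags `user_central_id` while `new_errors` is empty, which is why every stream that reuses the serializer needs its schema updated alongside the serializer change.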

@kostajh what is the urgency of this? When the train goes to group2 in 1.5 hours from now, I assume that all mediawiki.ip_reputation.score will fail validation. Can we live with lost events until next week's train? or should we backport asap?

Change #1189536 had a related patch set uploaded (by Kosta Harlan; author: Ottomata):

[mediawiki/extensions/WikimediaEvents@wmf/1.45.0-wmf.19] Fix ip_reputation.score validation errors in production

https://gerrit.wikimedia.org/r/1189536

Change #1189530 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@master] Fix ip_reputation.score validation errors in production

https://gerrit.wikimedia.org/r/1189530

Change #1189536 merged by jenkins-bot:

[mediawiki/extensions/WikimediaEvents@wmf/1.45.0-wmf.19] Fix ip_reputation.score validation errors in production

https://gerrit.wikimedia.org/r/1189536

Mentioned in SAL (#wikimedia-operations) [2025-09-18T18:11:26Z] <jhuneidi@deploy1003> Started scap sync-world: Backport for [[gerrit:1189536|Fix ip_reputation.score validation errors in production (T403664)]]

Mentioned in SAL (#wikimedia-operations) [2025-09-18T18:17:57Z] <jhuneidi@deploy1003> kharlan, jhuneidi: Backport for [[gerrit:1189536|Fix ip_reputation.score validation errors in production (T403664)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2025-09-18T18:28:43Z] <jhuneidi@deploy1003> Finished scap sync-world: Backport for [[gerrit:1189536|Fix ip_reputation.score validation errors in production (T403664)]] (duration: 17m 17s)

Change #1189552 had a related patch set uploaded (by Ottomata; author: Ottomata):

[machinelearning/liftwing/inference-services@main] Bump version of mediawiki/page/prediction_classification_change event schema

https://gerrit.wikimedia.org/r/1189552

Change #1189552 merged by jenkins-bot:

[machinelearning/liftwing/inference-services@main] Bump version of mediawiki/page/prediction_classification_change event schema

https://gerrit.wikimedia.org/r/1189552

Change #1189626 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/deployment-charts@master] ml-services: deploy ml services that use event streams

https://gerrit.wikimedia.org/r/1189626

Change #1189626 merged by jenkins-bot:

[operations/deployment-charts@master] ml-services: deploy ml services that use event streams

https://gerrit.wikimedia.org/r/1189626