
Make CentralIdLookup easier to use safely across environments (different providers, part of cluster/not part, etc.)
Open, Needs Triage, Public

Description

There are some potential gotchas when using CentralIdLookup:

a. You can get a central ID from LocalIdLookup that could be used incorrectly. This can only be used on that same wiki, or on other wikis using shared user tables (these two cases should be different providers: T170996). If you're expecting (incorrectly) that the wiki is part of a farm, you may not realize you're using LocalIdLookup.

b. Central IDs can only be used within the wikiset they came from, but there is no way to identify which wikiset that is. You can't even identify the provider from the string serialization of the ID.

This makes it hard to use the central IDs in a fool-proof way cross-wiki.

Possible solution that addresses both: Make the canonical central ID a string, with a UID identifying the wikiset it came from, then the user-specific part after. E.g.

56e0cfd98f3fcf2abf0d6:123

That would then be mapped by the provider to whatever ID it uses internally.

A standalone wiki would form a wikiset of one, with a UID generated at installation time. For other installs (e.g. CentralAuth or shared user tables), all the connected wikis would use the same wikiset ID.
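
A minimal sketch of the proposed string format, written in Python purely for illustration (this is not MediaWiki code). The class name `ScopedCentralId` and its methods are hypothetical, and splitting on the first colon is an assumption; only the `wikiset-UID:user-part` shape and the example value come from the proposal above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCentralId:
    """Hypothetical canonical central ID: a wikiset UID plus a user-specific part."""

    wikiset: str    # UID identifying the wikiset the ID came from
    user_part: str  # provider-specific part, mapped by the provider to its internal ID

    def serialize(self) -> str:
        return f"{self.wikiset}:{self.user_part}"

    @classmethod
    def parse(cls, text: str) -> "ScopedCentralId":
        # Assumption: split on the first colon only, so the user part
        # is free to contain further colons if a provider needs them.
        wikiset, sep, user_part = text.partition(":")
        if not sep or not wikiset or not user_part:
            raise ValueError(f"not a scoped central ID: {text!r}")
        return cls(wikiset, user_part)


example = ScopedCentralId.parse("56e0cfd98f3fcf2abf0d6:123")
```

Because the wikiset travels with the ID, a consumer can tell where a serialized ID came from without knowing which provider produced it.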

Event Timeline

a) You can get a central ID from LocalIdLookup that could be used incorrectly. This can only be used on that same wiki, or on other wikis using shared user tables (maybe these two cases should even be different providers). If you're expecting (incorrectly) that the wiki is part of a farm, you may not realize you're using LocalIdLookup.

I don't really understand this part. You should just be able to use it - what scenario are you running into where it doesn't work?

Central IDs can only be used on the same wikiset of wikis, but there is no way to identify which wikiset it came from. You can't even identify the provider from the string serialization of the ID.

What do you mean by wikiset?

a) You can get a central ID from LocalIdLookup that could be used incorrectly. This can only be used on that same wiki, or on other wikis using shared user tables (maybe these two cases should even be different providers). If you're expecting (incorrectly) that the wiki is part of a farm, you may not realize you're using LocalIdLookup.

I don't really understand this part. You should just be able to use it - what scenario are you running into where it doesn't work?

E.g. if you tried to transwiki a Flow board from officewiki to mediawikiwiki, the central user ID from officewiki would not be valid for importing to mediawikiwiki.

However, if you're exporting from cawiki to mediawikiwiki, it will work fine (after converting back to local user with localUserFromCentralId). See T154830: Transwiki (within a farm) support for Flow dumps/imports and https://gerrit.wikimedia.org/r/#/c/337895/4/includes/Dump/Exporter.php .

This is just an example, though. The point is to make it more fool-proof to use.
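
To make the transwiki example concrete, here is a rough Python sketch (illustrative only, not MediaWiki code) of the check an importer could make if central IDs carried a wikiset prefix as proposed in the description. The wikiset UIDs and both function names are hypothetical.

```python
# Hypothetical wikiset UIDs. Under the proposal, a real value would be
# generated at installation time or shared by all wikis of one farm.
WIKISET_OF = {
    "cawiki": "wmf-centralauth",
    "mediawikiwiki": "wmf-centralauth",
    "officewiki": "officewiki-standalone",
}


def export_central_id(source_wiki: str, central_id: int) -> str:
    """Serialize a central ID together with the wikiset of the exporting wiki."""
    return f"{WIKISET_OF[source_wiki]}:{central_id}"


def can_import_central_id(scoped_id: str, target_wiki: str) -> bool:
    """True only if the ID's wikiset prefix matches the target wiki's wikiset."""
    wikiset, _, user_part = scoped_id.partition(":")
    return bool(user_part) and wikiset == WIKISET_OF[target_wiki]


from_cawiki = export_central_id("cawiki", 123)
from_officewiki = export_central_id("officewiki", 123)
```

With these definitions, `can_import_central_id(from_cawiki, "mediawikiwiki")` is true and `can_import_central_id(from_officewiki, "mediawikiwiki")` is false, so the mismatch is caught at import time instead of silently mapping to the wrong user.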

IIRC, there was also a case where LocalIdLookup was (wrongly) used to try to detect if the wiki was part of CentralAuth.

Central IDs can only be used on the same wikiset of wikis, but there is no way to identify which wikiset it came from. You can't even identify the provider from the string serialization of the ID.

What do you mean by wikiset?

A group of connected wikis using the same provider. E.g. "the WMF wikis that are part of CentralAuth". officewiki would be a different wikiset since it's not CentralAuth, and an independent wiki farm using CentralAuth would also be a separate wikiset. I can change the terminology if you think another term would be clearer.

Another problem that arises from this is with shared user tables. The cases are as follows:

  1. If the wiki uses CentralAuth, the lookup object will be a CentralIdLookup and it'll return a global user ID from the CentralAuth table
  2. If the wiki uses a shared user table, the lookup object will be a LocalIdLookup and it'll return a user ID from the shared user table
  3. If the wiki doesn't use either, the lookup object will be a LocalIdLookup and it'll return a user ID from the local user table (which is meaningless on any other wiki)

In code written for WMF production extensions, we often want to get a central user ID only if the wiki is an SUL wiki (i.e. we want something that returns a CA global user ID on e.g. cawiki, but null on officewiki). There is no way to do this right now, so instead we look at whether the lookup object is a CentralIdLookup or a LocalIdLookup. But that ends up breaking our code for third parties that use a shared user table.

Another problem that arises from this is with shared user tables. The cases are as follows:

  1. If the wiki uses CentralAuth, the lookup object will be a CentralIdLookup and it'll return a global user ID from the CentralAuth table

CentralAuthIdLookup. CentralIdLookup is the abstract base class everything (including LocalIdLookup) subclasses.

In code written for WMF production extensions, we often want to get a central user ID only if the wiki is an SUL wiki (i.e. we want something that returns a CA global user ID on e.g. cawiki, but null on officewiki). There is no way to do this right now, so instead we look at whether the lookup object is a CentralIdLookup or a LocalIdLookup. But that ends up breaking our code for third parties that use a shared user table.

Yes, I suggested in the description that "maybe these two cases should even be different providers". Now I removed the maybe and made a subtask. :) T170996: Separate "shared user table" into its own provider (not LocalIdLookup)
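
The three cases from the earlier comment can be modeled in a few lines of Python (illustrative only, not MediaWiki code) to show why a class-based check misjudges shared user tables today and how a separate provider would fix it. `SharedUserTableIdLookup` is an invented name for the provider proposed in T170996, and `looks_cross_wiki` is a hypothetical stand-in for the class check described above.

```python
from enum import Enum


class Setup(Enum):
    CENTRAL_AUTH = "CentralAuth"
    SHARED_USER_TABLE = "shared user table"
    STANDALONE = "standalone"


# Provider class used for each setup today, per the three cases in this thread.
CURRENT_PROVIDER = {
    Setup.CENTRAL_AUTH: "CentralAuthIdLookup",
    Setup.SHARED_USER_TABLE: "LocalIdLookup",
    Setup.STANDALONE: "LocalIdLookup",
}

# Hypothetical mapping after the split; the new class name is invented.
SPLIT_PROVIDER = {
    **CURRENT_PROVIDER,
    Setup.SHARED_USER_TABLE: "SharedUserTableIdLookup",
}


def looks_cross_wiki(provider_class: str) -> bool:
    """The class-based check: anything other than LocalIdLookup counts as cross-wiki."""
    return provider_class != "LocalIdLookup"


today = {s: looks_cross_wiki(CURRENT_PROVIDER[s]) for s in Setup}
after_split = {s: looks_cross_wiki(SPLIT_PROVIDER[s]) for s in Setup}
```

Today the check returns false for a shared-user-table wiki even though its IDs are valid on the other wikis sharing that table; with a separate provider only the standalone case returns false. Note that this still does not answer the narrower question "is this an SUL wiki?", which needs the provider (or wikiset) to be identifiable.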

Daniel Kinzler also raised this at https://gerrit.wikimedia.org/r/#/c/349977/4/lib/includes/Changes/EntityChange.php .