Page MenuHomePhabricator

RevisionStore must not expose user IDs from a foreign wiki
Open, HighPublic

Description

RevisionStore currently constructs User objects via User::newFromAnyId(), assuming rev_user (resp ar_user) refers to a user ID on the local wiki. This is however not the case if the RevisionStore was constructed with $wikiId != false.

See also: T222380: Decide how to represent the identity of users from another wiki.

Event Timeline

daniel created this task.Apr 30 2019, 5:47 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 30 2019, 5:47 PM
daniel triaged this task as High priority.EditedApr 30 2019, 5:48 PM

Bumping to high, since this has the potential to cause subtle data corruption and confusing failure modes. Cross-wiki revision access is still uncommon, so this isn't a very high risk, but still bad enough that we should look into it soon, I think.

Tgr added a comment.Apr 30 2019, 5:54 PM

What should it expose, though? It could go with a user with the given name and ID 0, but it's not great and I think there's no strong guarantee that such a user won't accidentally be converted into the local user with the same name somewhere. It could go with a user with ID 0 and the name [interwiki]>[username] which is what we use for imports; except there's no guarantee there will be an interwiki (granted, I can't really think of a realistic example, all foreign content will be from "hubs" like Wikidata, Commons or Meta, and those do have interwikis).

Perhaps we should take another step back and ask what use case RevisionStore being able to load from other wikis is actually supposed to support. The only places outside of tests I see ->getRevisionStore( $wiki ) being passed a value for $wiki are two calls in Wikibase (but I can't tell with the the chained DI there whether $wiki is ever actually non-local) and the implementation of the deprecated Revision::getRevisionText() (for which I see no non-test callers passing a value for $wiki).

If there is a good use case, perhaps we should have "ForeignRevisionRecord", "ForeignUserIdentity", and so on such that a typehint for RevisionRecord, UserIdentity, and so on won't match them. See T208776 for more discussion along those lines.

Tgr added a comment.Apr 30 2019, 7:17 PM

The main thing I can think of is shadow namespace-like features like global user pages, global templates, global scripts, shared-repo image description pages. Currently these tend to be done via the API (or action=render) but at least for global templates that wouldn't work, plus you can have private wikis and whatnot. (Worth noting that some of these are expected to operate within a wiki farm, but global templates and shared-repo image description pages do not.) Also use of Wikidata items within wikitext / Lua, which is not "shadow" in that there is no local equivalent, but a similar use case.

Then there's recentchanges replication functionality (for Wikidata or Commons, for example) - not sure if that involves RevisionStore, but either way it will run into the same problem.

I think it's best to not allow constructing RevisionStores on foreign repos until we have a clear-cut use-case. Then we can add the functionality to support that use-case. It's not obvious to me that global user pages and such are best implemented using RevisionStore on the local wiki -- somebody needs to come up with an actual design that they plan to implement before that can be decided.

I'll file subtasks for the immediate solution (force ID to 0) and the full solution (not clear yet).

daniel added a comment.May 2 2019, 4:43 PM

What should it expose, though?

I filed T222380: Decide how to represent the identity of users from another wiki. to answer this question. Let's continue the discussion there.

daniel added a comment.May 2 2019, 4:46 PM

I think it's best to not allow constructing RevisionStores on foreign repos until we have a clear-cut use-case.

We already have this in production: Wikibase uses RevisionStoreFactory::getRevisionStore to access revisions from another wiki. So far, it only cares about revision content, not meta-data, so no harm done. But I want a stop-gap in place to prevent any confusion, while we think of a proper solution (see T222380).