
Resolve ambiguity of entity ID prefixes used on Commons.
Closed, Invalid (Public)

Description

Wikimedia Commons has been a WikibaseClient for several years now, accessing data from Wikidata, using entity IDs with no prefix, like Q64 and P31.

However, as Commons is becoming a WikibaseRepo in its own right to support SDC by running the WikibaseMediaInfo extension, the empty ID prefix will now conceptually refer to the local repo, that is, commons itself, so commons can use identifiers like M56789 to refer to MediaInfo entities.

There are essentially three options:

  • Keep the "split brain" approach, and accept the fact that a wiki that acts as both a repo and a client will have duplicate service instances, using separate and potentially inconsistent configuration. This is the status quo, and comes with the potential for subtle and hard-to-track bugs down the road.
  • Consolidate the configuration and internal state of repo and client code (there is an old design document about this). In concrete terms, this means that there must be only one WikibaseServices instance, which implies that there is only one top level EntityRevisionLookup, only one underlying EntityNamespaceLookup, etc. This would mean that the empty entity prefix can only resolve to one repo, and on repos, the empty prefix would always refer to the local wiki. This would break all content on Commons that currently uses Q-IDs or P-IDs with no prefix. They would have to be migrated, by changing the relevant templates and modules, and cleaning up using a bot.
  • Consolidate internal state and config, but introduce the concept of default repos per entity type. This way, the meaning of the empty prefix could depend on which type of entity is referenced, and Q12345 could be handled as equivalent to wd:Q12345, if wikidata is specified as the default repo for items. This could be done without much trouble by adding an additional mapping to MappingEntityIdParser.
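As a rough illustration of the third option, the sketch below applies a default repo per entity type to unprefixed IDs. It is not Wikibase code: the letter-to-type table, the `wd` repo name, and the function name `normalize_entity_id` are assumptions made for this example only.

```python
# Hypothetical sketch of "default repo per entity type" (option 3).
# The tables below are illustrative assumptions, not real Wikibase config.
ENTITY_TYPE_BY_LETTER = {"Q": "item", "P": "property", "M": "mediainfo"}

# Default repo per entity type; "" stands for the local repo.
DEFAULT_REPO_BY_TYPE = {"item": "wd", "property": "wd", "mediainfo": ""}


def normalize_entity_id(raw_id: str) -> str:
    """Return the ID with the default repo prefix applied if none was given."""
    if ":" in raw_id:
        return raw_id  # an explicit prefix always wins
    entity_type = ENTITY_TYPE_BY_LETTER.get(raw_id[:1])
    if entity_type is None:
        raise ValueError(f"unknown entity type for ID {raw_id!r}")
    repo = DEFAULT_REPO_BY_TYPE.get(entity_type, "")
    return f"{repo}:{raw_id}" if repo else raw_id
```

With these assumed tables, an unprefixed item ID is treated like its `wd:`-prefixed form, while an unprefixed MediaInfo ID stays local.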

Event Timeline

Option 3 sounds sanest in terms of fixing/avoiding the issue without a costly migration.

While I like the cleanliness of option 2, I'm inclined to agree with James that option 3 is sanest.

For option three, we'd have to decide whether the per-entity-type defaults should be defined separately from the prefix mapping, or whether they should use some kind of special syntax. I'd suggest the hybrid approach we have also been using for including the target slot in EntityNamespaceLookup: use separate arrays internally (cleaner), but use some special syntax in the config (simpler). So for commons, you would have a mapping like [ '@item' => 'wikidata', '@property' => 'wikidata' ], with the @ indicating that the mapping is not for prefixes, but for entity types.
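The hybrid approach could be sketched roughly as follows: one combined mapping in the config, split into two separate structures internally. The function name `split_mapping` and the sample `wd` prefix are hypothetical and only serve the illustration.

```python
from typing import Dict, Tuple


def split_mapping(config: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a combined config mapping into (prefix mapping, entity type defaults).

    Keys starting with "@" name entity types; all other keys are ID prefixes.
    """
    prefix_mapping: Dict[str, str] = {}
    type_defaults: Dict[str, str] = {}
    for key, target in config.items():
        if key.startswith("@"):
            type_defaults[key[1:]] = target  # strip the "@" marker
        else:
            prefix_mapping[key] = target
    return prefix_mapping, type_defaults
```

The config stays a single flat array, which is simple to write, while the code that consumes it never has to inspect the "@" marker again.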

I just realized that this probably blocks access to MediaInfo from wikitext on commons. I have not confirmed this, but if my mental model is right, this needs to be fixed before we'll be able to access MediaInfo from wikitext.

Taking a step back to think about how the functionality in question was implemented/outlined two years ago, here are my thoughts:

  • Having two not-necessarily-identical configs flying around, with parts of the code arbitrarily picking one config and other parts picking the other, seems like a bug/unfinished implementation to me. I am surprised it only surfaces now (it was 99% me who messed that up), but it should be fixed. If the current "broken" state actually makes Commons work, the fixing schedule can of course be postponed :)
  • It looks to me like Commons and Wikidata federation is a bit of a special case of federation as it was envisioned as a general concept. There is no need to have both local and Wikidata items (which is what most non-Wikimedia Wikibase instances request in the context of federation). "Funnily" enough, the "typical" federation is nowhere in use due to the current implementation's limitations, so the only real use case is this special/reverse one.
  • I don't claim to have a thorough understanding of the Commons issues now, but it seems to me that those two kinds of federation, i.e. one intending to have different entity types in different repos, possibly having e.g. items from multiple repos, and the one where it is clear some entity types are coming from repo A, and some from repo B, are actually separate things; they're not really overlapping. The former requires and is based on the concept of prefixes (to be able to distinguish between different sources of items), whereas the latter could actually do without having prefixes at all. Both make sense as separate approaches (the former for non-Wikimedia Wikibases, the latter for Commons, for instance). I am not aware of any practical or planned instance where mixing both concepts would actually be needed. Therefore I would strongly encourage us to NOT mix both approaches in the implementation and to NOT create a super generic federation where all the things could be done using some config magic. The existing stuff is already overly complicated. Let's at least not make it worse.

Coming back to the particular Commons topic: from WMDE's perspective, option 3 is really something we would not like to see added as a feature. I do understand that converting millions of Commons pages just to change Qxyz to wd:Qxyz is going to be a costly migration. We're happy to help with coming up with some temporary/intermediate solution that would allow Commons to keep running while the migration is ongoing.

That said, I am wondering whether, from the Commons perspective, using prefixes is something that's intended? Or actually the opposite? I am not aware (as in: I simply don't know) whether prefixing Wikidata items on Commons has ever been discussed in the previous 2 years. Or has it simply been assumed that "we need to add prefixes because this is what the software requires"? Or did we implement the wrong feature in the first place a few years ago?

Finally, I have to admit that, having been away from IRC for a while, I don't feel fully up-to-date with the status of federation between Commons and Wikidata. My assumption was that Daniel's recent change to the beta commons config allowed beta wikidata items to be used on Commons, and all seems fine for now? Is this correct, or is it all still completely broken?
Regarding the most recent report above: @daniel, could you please elaborate on what exactly does not work, once you've verified whether it does or not? Having more information would make it easier for us (at least for me) to reason about the problem.

I have confirmed locally that the current situation prevents access to MediaInfo entities using {{#statements}} (and presumably also from Lua, which uses the same underlying mechanism). This means that MediaInfo cannot be used on file description pages as intended until this is resolved.

Depending on configuration, it is possible to access MediaInfo entities from wikitext using the "commons:" prefix, once https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/480494 is merged. This however only works if the statements in the MediaInfo entities use prefixed property IDs (so the main snaks have property IDs like "wikidata:P17" instead of just "P17"; more about this in another comment to follow).

Example wikitext:

* From P1 of Q4: {{#statements:P1|from=Q4}}                                   <!-- this works as expected -->
* From P1 of wikidata:Q4: {{#statements:P1|from=wikidata:Q4}}                 <!-- this does not work, the "wikidata" prefix is not used by client code -->
* From P1 of commons:M3: {{#statements:P1|from=commons:M3}}                   <!-- this works, since client code has a repo defined for the "commons" prefix (however using cross-wiki access, not recognizing it as the local wiki). Note that the statement must use "wikidata:P1" as the property ID internally, and the repo definition must have a mapping for "wikidata" to "". -->
* From wikidata:P1 of commons:M3: {{#statements:wikidata:P1|from=commons:M3}} <!-- this fails, since client code does not have a repo defined for the "wikidata" prefix; it then tries to find a property with the label 'wikidata:P1', and fails. -->
* From P1 of M3: {{#statements:P1|from=M3}}                                   <!-- this does not work, since client code does not have a namespace or slot defined to look up MediaInfo entities. -->

Output:

* From P1 of Q4: Four
* From P1 of wikidata:Q4:
*  From P1 of commons:M3: a test with prefix
* From wikidata:P1 of commons:M3:
Failed to render property wikidata:P1: Property not found for label 'wikidata:P1' and language 'en'
* From P1 of M3:

Relevant config:

$wmgWikibaseClientEntityNamespaces = [
	'item' => 0,
	'property' => 120,
];

$wmgWikibaseClientRepoNamespaces = [
	'item' => '',
	'property' => 'Property',
];

$wmgWikibaseClientRepositories = [
	'' => [
		'repoDatabase' => 'wikidata',
		'entityNamespaces' => [
			'item' => 0,
			'property' => 120,
			'lexeme' => 146,
		],
		'baseUri' => 'http://wikidata.web.mw.localhost:8080/entity/',
		'prefixMapping' => [ '' => '' ],
	],
	'commons' => [
		'repoDatabase' => 'commons',
		'entityNamespaces' => [ 'mediainfo' => '6/mediainfo' ],
		'baseUri' => 'http://commons.web.mw.localhost:8080/entity/',
		'prefixMapping' => [
			'wikidata' => ''
		],
	],
];

$wmgWikibaseForeignRepositories = [
	'wikidata' => [
		'repoDatabase' => 'wikidata',
		'baseUri' => 'http://wikidata.web.mw.localhost:8080/entity/',
		'supportedEntityTypes' => [ 'item', 'property' ],
		'prefixMapping' => [],
		'entityNamespaces' => [ 'item' => 0, 'property' => 120 ]
	],
];
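The outcomes listed above follow from which prefixes and entity types the client-side repository definitions know about. The following is a toy model of that lookup, not the actual Wikibase resolution code: the letter-to-type table and the function name `resolve` are assumptions, and only the entity types per prefix are taken from $wmgWikibaseClientRepositories.

```python
# Toy model only: entity types known per prefix on the client side,
# mirroring the 'entityNamespaces' keys of $wmgWikibaseClientRepositories.
CLIENT_REPOSITORIES = {
    "": {"item", "property", "lexeme"},
    "commons": {"mediainfo"},
}

# Assumed mapping from the leading ID letter to the entity type.
ENTITY_TYPE_BY_LETTER = {"Q": "item", "P": "property", "L": "lexeme", "M": "mediainfo"}


def resolve(entity_id):
    """Return (prefix, entity type) if the client can look the ID up, else None."""
    prefix, _, local_id = entity_id.rpartition(":")
    repo_types = CLIENT_REPOSITORIES.get(prefix)
    if repo_types is None:
        return None  # no repo is defined for this prefix
    entity_type = ENTITY_TYPE_BY_LETTER.get(local_id[:1])
    if entity_type not in repo_types:
        return None  # the repo has no namespace or slot for this entity type
    return (prefix, entity_type)
```

In this model `Q4` and `commons:M3` resolve, while `wikidata:Q4` (unknown prefix) and `M3` (no MediaInfo namespace on the empty prefix) do not, matching the output shown above.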

One source of confusion is the fact that the CachingPropertyInfoLookup used by client code and repo code are hitting the same cache entry. That cache entry ends up containing both prefixed (wikidata:P1) and unprefixed (just P1) property IDs, side by side. This in turn causes API modules, which use the repo service instances, to also accept unprefixed property IDs, which causes data corruption: MediaInfo entities can have some Statements that use prefixed property IDs, and some that use unprefixed property IDs, and these are treated as different and incompatible (correctly - according to the configuration, they come from different repos). Repo side code (in "split brain" operation) should not accept unprefixed property IDs or unprefixed item IDs, since there are no properties or items on the local repo.
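A minimal sketch of how two lookups that share one cache entry end up mixing ID styles. The class name, the cache key, and the `string` data type are invented for illustration; the real CachingPropertyInfoLookup is not modelled here.

```python
SHARED_CACHE = {}
CACHE_KEY = "property-info"  # hypothetical key, used by both lookups


class CachingLookupSketch:
    """Stand-in for a caching property info lookup bound to one ID prefix."""

    def __init__(self, cache, prefix):
        self.cache = cache
        self.prefix = prefix

    def store(self, local_id, data_type):
        # Each side writes IDs in its own style into the same cache entry.
        full_id = f"{self.prefix}:{local_id}" if self.prefix else local_id
        self.cache.setdefault(CACHE_KEY, {})[full_id] = data_type

    def known_ids(self):
        return sorted(self.cache.get(CACHE_KEY, {}))


client = CachingLookupSketch(SHARED_CACHE, "")        # client code: no prefix
repo = CachingLookupSketch(SHARED_CACHE, "wikidata")  # repo code: prefixed IDs
client.store("P1", "string")
repo.store("P1", "string")
```

After both writes, `repo.known_ids()` contains `P1` as well as `wikidata:P1`, which is the side-by-side state described above. Giving each side its own cache key would be one way to keep them apart, though that is only a suggestion and not something proposed in this thread.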

  • Having two not-necessarily-identical configs flying around, with parts of the code arbitrarily picking one config and other parts picking the other, seems like a bug/unfinished implementation to me. I am surprised it only surfaces now (it was 99% me who messed that up), but it should be fixed. If the current "broken" state actually makes Commons work, the fixing schedule can of course be postponed :)

That's indeed the current situation, yes. It's a consequence of the fact that client and repo are independent extensions, neither of which depends on the other. Both should indeed share config for the DataAccess component, but doing so would be incompatible with the requirements and data we have.

  • I don't claim to have a thorough understanding of the Commons issues now, but it seems to me that those two kinds of federation, i.e. one intending to have different entity types in different repos, possibly having e.g. items from multiple repos, and the one where it is clear some entity types are coming from repo A, and some from repo B, are actually separate things; they're not really overlapping. The former requires and is based on the concept of prefixes (to be able to distinguish between different sources of items), whereas the latter could actually do without having prefixes at all. Both make sense as separate approaches (the former for non-Wikimedia Wikibases, the latter for Commons, for instance). I am not aware of any practical or planned instance where mixing both concepts would actually be needed. Therefore I would strongly encourage us to NOT mix both approaches in the implementation and to NOT create a super generic federation where all the things could be done using some config magic. The existing stuff is already overly complicated. Let's at least not make it worse.

I tend to agree, but I'd like to further discuss with you how much of a difference this really makes in the code, how confusing a "mixed" config is, and what options a wikibase instance may have to go from one model to the other. We'd probably have to provide conversion scripts.

Coming back to the particular Commons topic: from WMDE's perspective, option 3 is really something we would not like to see added as a feature. I do understand that converting millions of Commons pages just to change Qxyz to wd:Qxyz is going to be a costly migration. We're happy to help with coming up with some temporary/intermediate solution that would allow Commons to keep running while the migration is ongoing.

I don't think we can target that solution without consulting the community. And in the light of what you said above, commons seems to fit the "no prefix" case, where we could just map entity types to repos.

That said, I am wondering whether, from the Commons perspective, using prefixes is something that's intended? Or actually the opposite? I am not aware (as in: I simply don't know) whether prefixing Wikidata items on Commons has ever been discussed in the previous 2 years. Or has it simply been assumed that "we need to add prefixes because this is what the software requires"? Or did we implement the wrong feature in the first place a few years ago?

Using prefixes to access wikidata has not been proposed to the community, and was never the plan. To my knowledge, "option 2" in this ticket is the first time it has been proposed. And yes, the only reason to do it is that that's how the software is currently designed.

This was recognized as an issue when we first designed federation, and we discussed the idea of mapping entity types to repos, but this was put off for later, and then forgotten...

Finally, I have to admit that, having been away from IRC for a while, I don't feel fully up-to-date with the status of federation between Commons and Wikidata. My assumption was that Daniel's recent change to the beta commons config allowed beta wikidata items to be used on Commons, and all seems fine for now? Is this correct, or is it all still completely broken?

beta-wikidata items and properties can be used on beta-commons now, in both repo (with prefix) and client (without prefix) code. But accessing MediaInfo from wikitext is not possible there. It's a bit tricky to test, since the UI for adding statements is currently disabled.

Regarding the most recent report above: @daniel, could you please elaborate on what exactly does not work, once you've verified whether it does or not? Having more information would make it easier for us (at least for me) to reason about the problem.

See my comments above.

A detail to consider when going for "federation without prefixes": does this mean no prefixes just for user input, or also in the JSON serialization? Using no prefixes there either may seem intuitive, but it makes the data more brittle, and harder to exchange between instances. Other repos consuming that data will then need to replicate the data-type mapping instead of a prefix mapping. That seems error-prone to me.

Also, how about the RDF output? At least for that, we will need different prefixes for entities from different sources.

I tend to agree, but I'd like to further discuss with you how much of a difference this really makes in the code,

As I am gone for two more weeks, it would probably be best that I answer briefly here: to me it seems both kinds of federation are completely separate and possibly even mutually exclusive. So I would rather not mix these, neither conceptually nor in terms of code. So depending on how you look at it, the difference might be either "a lot" or "not at all".

and what options a wikibase instance may have to go from one model to the other. We'd probably have to provide conversion scripts.

Per what was said above, it is not clear to me in what cases such conversions would be needed. What situations do you have in mind?
Second thoughts: is the idea to potentially use the prefixed federation for now on Commons, although it might be "wrong", and then switch to the non-prefixed one once it has been implemented? With this switch I imagine one would want to do some conversion.

I understand that Commons does not want any prefixes. The current functionality of Wikibase is not what Commons needs and can use. That said, WMDE is not able to commit any resources to implementing the needed functionality on the Wikibase side this year. We would be able to tell more in January, once everyone is back from holidays and we know what exactly it is that we need to build for SDOC.
I am sorry I cannot offer more at this point, but the calendar has no mercy, and this is too big a thing for an ad-hoc quick fix.

Maybe in the next week or so you'd be able to answer those open questions (prefixes or not in JSON, RDF etc), and generally define what the desired functionality would be. That would help us give a better answer on how big a task we are talking about.

As I am gone for two more weeks, it would probably be best that I answer briefly here: to me it seems both kinds of federation are completely separate and possibly even mutually exclusive. So I would rather not mix these, neither conceptually nor in terms of code. So depending on how you look at it, the difference might be either "a lot" or "not at all".

As far as I can see, that means duplicating a lot of code. Internally, these things look very, very similar.

Also, I'd rather not have them look different in the JSON or the API. If they do, clients have to know about the difference, and implement both as well.

and what options a wikibase instance may have to go from one model to the other. We'd probably have to provide conversion scripts.

Per what was said above, it is not clear to me in what cases such conversions would be needed. What situations do you have in mind?

Mostly the thing we had on commons:

"Oh hey, let's use data from wikidata! No prefixes needed." A year later: "You know what would be cool? Having our own data items!"

Second thoughts: is the idea to potentially use the prefixed federation for now on Commons, although it might be "wrong", and then switch to the non-prefixed one once it has been implemented? With this switch I imagine one would want to do some conversion.

This is the status quo, really - federation with prefixes is in the staging pipeline for deployment on commons. With client and repo code configured for different prefixes.

I understand that Commons does not want any prefixes. The current functionality of Wikibase is not what Commons needs and can use. That said, WMDE is not able to commit any resources to implementing the needed functionality on the Wikibase side this year. We would be able to tell more in January, once everyone is back from holidays and we know what exactly it is that we need to build for SDOC.
I am sorry I cannot offer more at this point, but the calendar has no mercy, and this is too big a thing for an ad-hoc quick fix.

Well, implementing a type-to-repo configuration for use by PrefixMappingEntityIdParser would not be a lot of work, and would fix the problem. I understand that you don't like this option conceptually, but I don't see a problem with it from the perspective of code.

This seems a quick win, and since SDC is supposed to deliver in January, may be the only option.

Maybe in the next week or so you'd be able to answer those open questions (prefixes or not in JSON, RDF etc), and generally define what the desired functionality would be. That would help us give a better answer on how big a task we are talking about.

We may get away with an initial deployment in "split brain" mode, but for statements, we need to properly solve this.

After some discussion with Daniel: I've been advocating for internal federation to only ever take the same entity type from a single repository. I will continue to do so because I believe it is the right thing to do. But we can't say with enough confidence that this assumption will hold true forever.

With Lydia's response, I'm a little confused - it seems to conflict with Leszek's assertion that WMDE wants to avoid that particular future. If WMDE is looking for a tie-breaking vote, it seems like the entire SDC team (including myself) is on board with "option 3", so...should that be the path forward?

I am fine with us thinking about this over the holiday break, as well - no need to rush it as we aren't blocked by this until (at least) late January, as far as I understand.

As far as I can see, the situation is this: Option 1 is ruled out already, since data access from wikitext doesn't work with that approach. Option 2 would need community consensus. Option 3 (default repo per entity type, with prefixes used internally) could be implemented without disturbing anything. Option 4, proposed by Leszek (federation directly based on entity type, with no prefixes used internally) would work as well, but is made unattractive by Lydia's statement that we cannot guarantee that we will not need prefixes in the future, which would put us back into the position we are in now, giving us the choice between options 2 and 3.

Cost estimate for option 3: I can implement this, with Adam reviewing, in one sprint (two weeks). Ironing out issues that come up later may add another couple of weeks. However, note that my January and February are already quite full, and I'd have to coordinate this with my duties for the core platform team. The other person most familiar with the code in question is Leszek.

The cost for option 4 is probably a bit more than this, but not much.

To me, having per-item-type mapping where some types can be imported from Wikidata (Q, P) and some local to Commons (M) sounds the most promising (i.e. option 3). We also must note that P is a special type that is unlike any other, as it denotes predicates, not subjects/objects. We should either agree that control over creating properties still lies in Wikidata (Commons people may be unhappy about this) or create some process where Commons can decide to create Wikidata properties (may be tricky community-wise) or somehow have Commons-only properties (technically challenging I presume, even with prefixes since people would be confused about them). Another solution would be to have P in Commons and duplicate necessary properties (not ideal either for many reasons).

For RDF, Commons should definitely know prefixes for Q and P (either from config or from some kind of federation API) since otherwise it's impossible to generate proper statements involving any of these. The outcome should be that Wikidata items on Commons have same URIs as on Wikidata proper, while things particular to Commons (including Commons properties, if any, and Commons M-items) should have distinct prefixes.

Disclaimer: In this comment I exclusively focus on technical/Wikibase as a software aspects of the topic. I am entirely opaque to what might be good or not so good social/community process additions to what the software allows/provides.

Thinking about the problem at hand (the Commons case), the possible (though not certain) future needs of WMF wikis, and the needs we've collected from non-WMF Wikibase instances, and having had my brain melted multiple times, I came up with the following:

It seems there are two problems that Wikibase tries to solve with prefixes/federation:

  1. Being able to use/reference entities from other Wikibase instances (e.g. Commons using items/properties from Wikidata, or FactGrid using properties from Wikidata)
  2. Being able to, within a Wikibase instance, mix entities of a particular type coming from multiple sources (i.e. Wikibase instances). A potential example of this would be FactGrid using Wikidata items, but also having their own FactGrid-specific items.

In my understanding, what 1 needs is the ability to access other wikis' databases (either through direct DB access, or through some more sophisticated API), and 2 could be solved by prefixes as the way to disambiguate identifiers.

The existing code, which is now causing trouble, is imperfect in the sense that, while trying to solve both tasks 1 and 2 at the same time, it seems to give too much importance to prefixes, and specifically it makes the empty/no prefix somehow special and bound to the "local" database.

That would be it for stating the obvious.

Instead of trying to bend the existing code to work for Commons and hopefully for other future uses, I tried to take a step back, and was thinking about solving those two challenges kind of separately.
The idea I'd like to hear comments from @daniel, @Addshore and others about would be the following:

  • forget about the currently implemented prefixes, repositories, special '' prefix etc
  • As a first step implement the possibility to configure Wikibase to read entities from different "sources" (for now only using different DBs accessed through MW DB code)
  • As a next step (not required for SDoC), implement the possibility to allow using entities of type X coming from multiple sources. To differentiate IDs, prefixes would be used.

Without a doubt I have a limited perspective, so I might be missing some important use case. Please point out holes in the plan above.
I am aware I might simply be suggesting to do https://xkcd.com/927/ here, but the plan outlined above might be viable:

  • It would allow Commons to define MediaInfo without needing to change any existing pages, templates etc on Commons which refer to P123 and Q456 already
  • It would leave the topic of prefixes/disambiguation for the second implementation phase, meaning faster unblocking SDoC features
  • Adding prefixes as a second step would be still fulfilling needs of wikibase instances that e.g. want to use both Wikidata and own items.

As we have spent quite some time talking about the configuration, and how clumsy it is now, this is how potentially this all would be configured in pseudo JSON if the "entity sources" and "prefixes" concept got split.

Configuring different sources of entities:

...
"entitySources": {
    "item": {
        "database": "wikidatawiki",
        "namespaceId": 120,
        "conceptUriBase" : "http://wikidata.org/entity/"
    },
    "property": {
        "database": "wikidatawiki",
        "namespaceId": 122,
        "conceptUriBase" : "http://wikidata.org/entity/"
    },
    "mediainfo": {
        "database": false,
        "namespaceId": 777,
        "conceptUriBase" : "http://commons.wikimedia.org/entity/"
    }
}
...
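To make the single-source idea concrete, here is a small sketch that picks the source by entity type alone and builds the concept URI from it. The values are copied from the pseudo JSON above; the letter-to-type table and both function names are assumptions for illustration.

```python
# Values mirror the pseudo JSON above; False means "the local database".
ENTITY_SOURCES = {
    "item": {
        "database": "wikidatawiki",
        "namespaceId": 120,
        "conceptUriBase": "http://wikidata.org/entity/",
    },
    "property": {
        "database": "wikidatawiki",
        "namespaceId": 122,
        "conceptUriBase": "http://wikidata.org/entity/",
    },
    "mediainfo": {
        "database": False,
        "namespaceId": 777,
        "conceptUriBase": "http://commons.wikimedia.org/entity/",
    },
}

# Assumed mapping from the leading ID letter to the entity type.
ENTITY_TYPE_BY_LETTER = {"Q": "item", "P": "property", "M": "mediainfo"}


def source_for(entity_id):
    """Pick the entity source purely by entity type; no prefixes involved."""
    return ENTITY_SOURCES[ENTITY_TYPE_BY_LETTER[entity_id[0]]]


def concept_uri(entity_id):
    """Build the concept URI from the source's base URI and the plain ID."""
    return source_for(entity_id)["conceptUriBase"] + entity_id
```

Under this model, `Q456` on Commons keeps the same concept URI it has on Wikidata, and `M3` gets a Commons URI, without any prefix appearing in the ID.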

And once the multiple-sources-per-entity-type functionality is implemented, the config could for example look along the lines of the following (with, of course, some B/C code handling the single-source kind of config too):

...
"entitySources": {
    "item": [
        {
            "prefix": "",
            "database": "wikidatawiki",
            "namespaceId": 120,
            "conceptUriBase" : "http://wikidata.org/entity/"
        }
    ],
    "property": [
        {
            "prefix": "",
            "database": "wikidatawiki",
            "namespaceId": 122,
            "conceptUriBase" : "http://wikidata.org/entity/"
        },
        {
            "prefix": "commons",
            "database": false,
            "namespaceId": 555,
            "conceptUriBase" : "http://commons.wikimedia.org/entity/"
        }
    ],
    "mediainfo": [
        {
            "prefix": "",
            "database": false,
            "namespaceId": 777,
            "conceptUriBase" : "http://commons.wikimedia.org/entity/"
        }
    ]
}
...
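The B/C handling mentioned above could look roughly like this: a single-source entry (a plain object) is wrapped into a one-element list with an empty prefix, so the rest of the code only deals with the multi-source shape. The function names are hypothetical, and the sketch assumes the config has already been decoded into Python dicts and lists.

```python
def normalize_sources(config):
    """Accept both config shapes and always return lists of sources per type."""
    normalized = {}
    for entity_type, value in config.items():
        if isinstance(value, dict):
            # Old single-source shape: treat it as the unprefixed source.
            value = [{"prefix": "", **value}]
        normalized[entity_type] = value
    return normalized


def find_source(sources, entity_type, prefix=""):
    """Return the source registered for this entity type and prefix, or None."""
    for source in sources.get(entity_type, []):
        if source["prefix"] == prefix:
            return source
    return None
```

A usage example under the same assumptions:

```python
sources = normalize_sources({
    "item": {"database": "wikidatawiki"},
    "property": [
        {"prefix": "", "database": "wikidatawiki"},
        {"prefix": "commons", "database": False},
    ],
})
local_property_source = find_source(sources, "property", "commons")
```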

It should be possible to provide some rough estimates on timeline of implementing this, but I would not like to use it as a means of choosing between options. I would like to be certain we are about to build the right thing before we get into discussing timelines and see what could be the minimum version to account for deadlines etc.

Thoughts?

It all sounds sane to me.

It would allow Commons to define MediaInfo without needing to change any existing pages, templates etc on Commons which refer to P123 and Q456 already

This is one of the more annoying points here, but also the point that makes me lean toward getting rid of the "" prefix assumption for local entities.
Having P123 and wikidata:P123 on commonswiki would be a right pain...

Though it also raises some other thoughts:

  • So when and if commons will be able to define properties on commonswiki, they will have to prefix them with common: or some other prefix. IMO this sucks a bit. But there is no reasonable way to get around this right now.
  • The code loading the P123 without a prefix on commons right now is in my head separate to the repo code that is going to do our federation, although maybe this is yet another point for bringing the 2 closer together.

Changing any prefixes or lack of prefixes in the future will be a massive migration and not something that we want to do, but if we pull in the fact that we already refer to entities from wikidata on commons then we are already at that stage.

In the context of this thread, it's worth recalling the ongoing wish from Commons users for Commons categories to be able to have their own local items on the Commons wikibase.

At the moment only items on Wikidata itself are available to store structured data for Commons categories. This is okay so far as it goes -- currently there are about 2.1 million Wikidata items linked to Commons categories and supporting "wikidata infobox", which have been very well received.

But that is still under 30% of all Commons categories, and some of the Wikidata community have very considerable reservations about extending it further -- see for example the skepticism and rejection expressed in this discussion recently at Project Chat.

It would be highly desirable to be able to store structured data for all 7.3 million Commons categories -- and in particular for categories for complex intersections of topics, and for "non-notable" people, both of which are rather unwelcome on Wikidata. Being able to document, by wikibase statements, what these categories relate to would be hugely helpful, to

  • support wikibase-derived infoboxes, to explain the meaning of the category internationally and multilingually
  • allow internationalised labels and descriptions to be added -- a long-time Commons request
  • allow volunteers to work together to build up a structured understanding of the meaning of Commons categories
  • gather the understanding of the meaning of categories needed to translate a file's membership of a category into appropriate Commons wikibase statements for the file -- hugely important if we want Commons wikibase to get populated
  • identify gaps on Wikidata -- ie 'simple' things (people, places, ideas etc) not currently represented on Wikidata, but present in the Commons ontology
  • in the reverse direction, make it possible for wikibase statements on files to be used to verify, extend or refine the categorisation of those files -- making categories more systematic and thorough and complete

All of this becomes possible for volunteers to work on, if local Commons wikibase items can be available for Commons categories.

Distinguishing these local items via a prefix, eg c:Q1234, as opposed to plain Q2345 on Wikidata, would seem a very acceptable way to allow both them and Wikidata items to exist and each be referenced.

@WMDE-leszek have you had a chance to think about how WMDE might implement this? Are you able to share any timeframe?

As a postscript to my comment two posts above, note that in such a scenario a Commons category page might well be associated with both an item on Wikidata (via a sitelink equivalence) and a local item on the Commons wikibase.

The bulk of the work on this topic was already complete.
The status quo is that entities currently exist once within the Wikimedia landscape, and thus "entity ID prefixes" are not used.
This topic, though not related to commons, will come up again in another form when we tackle federated properties pt2.