
Document how to set up federated Wikibase instances
Closed, Resolved, Public

Description

Documentation should explain which components must be enabled (i.e. both Repo and Client are required) and what configuration must be provided for a Wikibase instance to be able to access entities from other repositories.

Event Timeline

Change 341514 had a related patch set uploaded (by wmde-leszek):
[mediawiki/extensions/Wikibase] Slightly expand documentation on configuring federation

https://gerrit.wikimedia.org/r/341514

Some documentation already exists at docs/federation.wiki. https://gerrit.wikimedia.org/r/341514 expands the existing docs a bit.

WMDE-leszek moved this task from Review to Backlog on the Wikidata-Former-Sprint-Board board.
WMDE-leszek removed a project: Patch-For-Review.

As https://gerrit.wikimedia.org/r/341514 is being merged right now, one could consider this task done.
On the other hand, it is fair to say that the documentation could still be improved (e.g. more high-level docs, or maybe something more technical?).
As the author of a good part of the related code and some of the documentation, I'd very much appreciate others commenting on the state of the documentation. If there is still room for improvement, it should be described in the task description; once we agree that the documentation is good enough, this task should be closed.

Change 341514 merged by jenkins-bot:
[mediawiki/extensions/Wikibase] Slightly expand documentation on configuring federation

https://gerrit.wikimedia.org/r/341514

Change 342615 had a related patch set uploaded (by WMDE-leszek):
[mediawiki/extensions/Wikibase] Document adding interwiki prefix for federated repos

https://gerrit.wikimedia.org/r/342615

Change 342615 merged by jenkins-bot:
[mediawiki/extensions/Wikibase] Document adding interwiki prefix for federated repos

https://gerrit.wikimedia.org/r/342615

WMDE-leszek moved this task from Doing to Backlog on the Wikidata-Former-Sprint-Board board.
WMDE-leszek removed a project: Patch-For-Review.

I tried to create a federated setup using the provided documentation, to see if it’s sufficient. I eventually succeeded, but there were some problems. Here are my main takeaways:

  • I had no idea how to set up interwiki links, and the first documentation I found was outdated. Is the usual consumer of this documentation assumed to be familiar with this? (In the end it’s just a single SQL statement, so I’d say we could include it anyways.)
  • I was not sure what I should put in the foreignRepositories config. What are entity types ('item' (correct) or 'Item' (namespace, incorrect))? Is a symbolic database name the same as a normal database name? What’s the base URI for? (I still don’t know the answer to the last question – the link to the remote item comes from the interwiki URL, not this base URI.)

    (It’s possible that this, too, is assumed to be clear to the usual reader, and that I’m just unusually unfamiliar with these terms.)
  • I first tried to have local and remote items and properties, and got this error:

    > Using same entity types on multiple repositories is not supported yet. "item" has already be defined for repository ""

    This is, in my opinion, a very severe restriction. As far as I can tell, this effectively means that a federated setup without the WikibaseMediaInfo extension, or some other extension providing entity types beyond item and property, does not make sense – you need local items (otherwise you don’t have anything to put statements on), so the only thing you can take from the remote repository is properties. The documentation should definitely mention that.
  • At one point, I got a really weird error ($wgContLang was null) when I had two entries in the foreignRepositories config. Are multiple entries supported?
  • Attempting to configure a repo with no local entities at all ($wgWBRepoSettings['entityNamespaces'] = [];) resulted in an error message in the setup scripts. I assume the interpretation of “empty array ≈ unset variable” is unintentional.

You can also read the full adventure here (CommonMark; paste here to render):

WMDE-leszek added a subscriber: Lucas_Werkmeister_WMDE.

Thank you @Lucas_Werkmeister_WMDE! This is exactly the kind of feedback I wanted to get on this documentation. Your work is very much appreciated!

I hope going through all this was not too annoying for you. It might have been easier if you could have got more immediate help or advice when you ran into issues. But you managed to get it working, which gives some hope.

You raised lots of valid remarks and questions that need answering; thanks for this. I am going to address them one by one once I am back from the seasonal break.

Change 348939 had a related patch set uploaded (by WMDE-leszek):
[mediawiki/extensions/Wikibase@master] Expand slightly documentation regarding setting up federated repositories

https://gerrit.wikimedia.org/r/348939

I've submitted https://gerrit.wikimedia.org/r/348939, which slightly expands the existing documentation based on your feedback, @Lucas_Werkmeister_WMDE. It also includes an example config, which hopefully helps with some of the places you found non-obvious.

Please also find a few comments below on what you found.

> I had no idea how to set up interwiki links, and the first documentation I found was outdated. Is the usual consumer of this documentation assumed to be familiar with this? (In the end it’s just a single SQL statement, so I’d say we could include it anyways.)

I assumed this should not be covered in the federation-related documentation. On the other hand, I agree there is no clear documentation on how to add an interwiki. Interwikis, Sites, etc. are a bit of a mess if you ask me, and I don't really know how this should actually be documented (by updating https://www.mediawiki.org/wiki/Manual:Interwiki?). That is a weak excuse, though, I admit.
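Since adding the interwiki prefix boils down to a single SQL INSERT, here is a minimal sketch of what that statement does, run against a simplified in-memory SQLite copy of MediaWiki's `interwiki` table. The column names follow MediaWiki core's `interwiki` table; the prefix `wiki2`, the URL, and the helpers `add_interwiki` and `resolve` are hypothetical. On a real wiki the INSERT goes into that wiki's own database (including any table prefix), and the `iw_local`/`iw_trans` flags may need other values for your setup.

```python
import sqlite3

# Simplified stand-in for MediaWiki core's `interwiki` table (types simplified).
SCHEMA = """
CREATE TABLE interwiki (
    iw_prefix TEXT PRIMARY KEY,  -- prefix used in links, e.g. [[wiki2:...]]
    iw_url    TEXT NOT NULL,     -- article URL; $1 is replaced by the page title
    iw_api    TEXT NOT NULL,     -- api.php URL, may be empty
    iw_wikiid TEXT NOT NULL,     -- wiki ID, may be empty
    iw_local  INTEGER NOT NULL,  -- flag, set to 0 in this sketch
    iw_trans  INTEGER NOT NULL   -- flag, set to 0 in this sketch
)
"""


def add_interwiki(conn, prefix, article_url, api_url="", wiki_id=""):
    """Insert one interwiki row (hypothetical wrapper around the single INSERT)."""
    if "$1" not in article_url:
        raise ValueError("article_url must contain the $1 placeholder")
    conn.execute(
        "INSERT INTO interwiki (iw_prefix, iw_url, iw_api, iw_wikiid, iw_local, iw_trans) "
        "VALUES (?, ?, ?, ?, 0, 0)",
        (prefix, article_url, api_url, wiki_id),
    )


def resolve(conn, prefix, title):
    """Return the URL that the link prefix:title points to, or None if the prefix is unknown."""
    row = conn.execute(
        "SELECT iw_url FROM interwiki WHERE iw_prefix = ?", (prefix,)
    ).fetchone()
    return None if row is None else row[0].replace("$1", title)


# Hypothetical setup: register the remote repository under the prefix "wiki2".
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
add_interwiki(conn, "wiki2", "http://localhost/wiki2/index.php/$1", wiki_id="wiki2")
print(resolve(conn, "wiki2", "Item:Q1"))
```

As noted earlier in the thread, it is this interwiki URL, not the base URI, that determines where links to remote entity pages point.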

> I was not sure what I should put in the foreignRepositories config. What are entity types ('item' (correct) or 'Item' (namespace, incorrect))?

Good point. I've changed this bit to say "entity type IDs", as is done in other doc files (e.g. docs/options.wiki). I also hope the included example helps to figure out which one is relevant here.

> Is a symbolic database name the same as a normal database name?

I've borrowed the term "symbolic" from other parts of the documentation. As above, I hope the changed wording and the example make it clearer now. It is not necessarily a "normal" database name, as it can also be false, meaning "use the local wiki's database". This should be fairly clear to people familiar with DB-related MediaWiki or Wikibase code, but it is not necessarily obvious to everyone setting up a wiki. Good point.
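To make these two points concrete, the following Python sketch checks one foreign-repository entry modelled as a plain dict: entity types must be given as entity type IDs such as 'item' or 'property' (not namespace names such as 'Item'), and the database must be either a symbolic database name or False for the local wiki's database. The key names `entityTypes` and `repoDatabase` and the helper `check_repo_entry` are assumptions for illustration; docs/federation.wiki and docs/options.wiki describe the actual keys.

```python
# Entity type IDs provided by Wikibase itself; extensions such as
# WikibaseMediaInfo add more (pass them in via `known_types`).
DEFAULT_ENTITY_TYPE_IDS = frozenset({"item", "property"})


def check_repo_entry(name, entry, known_types=DEFAULT_ENTITY_TYPE_IDS):
    """Return a list of human-readable problems with one foreign repo entry.

    `entry` is a hypothetical dict mirror of a foreignRepositories entry.
    """
    problems = []
    for type_id in entry.get("entityTypes", []):
        if type_id not in known_types:
            # 'Item' or 'Property' are namespace names, not entity type IDs.
            hint = " (looks like a namespace name)" if type_id.lower() in known_types else ""
            problems.append(f"{name}: unknown entity type ID {type_id!r}{hint}")
    db = entry.get("repoDatabase")
    # False means "use the local wiki's database"; anything else must be a
    # non-empty symbolic database name.
    if db is not False and not (isinstance(db, str) and db):
        problems.append(f"{name}: repoDatabase must be a symbolic database name or False")
    return problems


print(check_repo_entry("wiki2", {"entityTypes": ["Property"], "repoDatabase": "wiki2"}))
```

The call above reports the namespace-name mistake described in the question, while `{"entityTypes": ["property"], "repoDatabase": False}` passes.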

> What’s the base URI for? (I still don’t know the answer to the last question – the link to the remote item comes from the interwiki URL, not this base URI.)

It is not the same thing as the interwiki URL; it could be something completely different. This "baseUri" is about concept URIs: it relates to linked data and RDF, and actually has little to do with MediaWiki, interwikis, etc. I am not sure where the appropriate documentation for this would be; e.g. https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format seems to assume it is clear what it is.
Related to that: I really don't like the fact that the key in the "foreignRepositories" array is called "baseUri" (yes, it was me who introduced it). It should rather be "baseConceptUri" or something similar. There was actually a patch that, among other things, renamed this setting key, but it has unfortunately been put on hold. I think I am going to rename it separately anyway, as "base URI" is far too confusing.
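A small Python sketch of the distinction drawn above: a repository's base concept URI is a prefix for the RDF concept URIs of its entities (for Wikidata, for instance, http://www.wikidata.org/entity/Q42), independent of the interwiki URL used for links to entity pages. The repository names, URIs, and helper names below are hypothetical; the empty name '' stands for the local repository, as in the error message quoted in this task.

```python
def concept_uri(base_uris, repo, entity_id):
    """Build the concept URI of `entity_id` from its repository's base URI."""
    return base_uris[repo] + entity_id


def split_concept_uri(base_uris, uri):
    """Return (repository name, entity ID) for a concept URI, or None.

    The longest matching base URI wins, so nested prefixes are handled.
    """
    matches = [repo for repo, base in base_uris.items() if uri.startswith(base)]
    if not matches:
        return None
    repo = max(matches, key=lambda r: len(base_uris[r]))
    return repo, uri[len(base_uris[repo]):]


# Hypothetical setup: a local repo plus one foreign repo named "wiki2".
BASE_URIS = {
    "": "http://localhost/wiki1/entity/",
    "wiki2": "http://localhost/wiki2/entity/",
}

print(concept_uri(BASE_URIS, "wiki2", "P1"))
print(split_concept_uri(BASE_URIS, "http://localhost/wiki2/entity/P1"))
```

Nothing here involves the interwiki table: the concept URI identifies the entity in linked data, while the interwiki URL only decides where a human-facing link points.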

> I first tried to have local and remote items and properties, and got this error:
>
> > Using same entity types on multiple repositories is not supported yet. "item" has already be defined for repository ""
>
> This is, in my opinion, a very severe restriction. As far as I can tell, this effectively means that a federated setup without the WikibaseMediaInfo extension, or some other extension providing entity types beyond item and property, does not make sense – you need local items (otherwise you don’t have anything to put statements on), so the only thing you can take from the remote repository is properties. The documentation should definitely mention that.

This is now mentioned explicitly in the federation docs. And it is indeed quite a strict limitation. We decided to have each entity type provided by only one repository in order to postpone dealing with cases where search results from multiple repos must be merged (custom sorting, etc.). As federation is currently limited to setups with shared database access, which considerably reduces its usefulness for third-party (non-Wikimedia) installations, this limitation is not that pressing, though. In the Wikimedia case (Commons, Wiktionary) we don't think we would want items from multiple repos. And since removing the shared-database-access requirement would take a fair amount of work (i.e. also supporting "federation using the API", etc.), we decided to keep the one-repository-per-entity-type limitation in place for a start as well.
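The restriction discussed above (each entity type may come from only one repository) can be modelled as a simple uniqueness check. This Python sketch is an illustration, not Wikibase's actual implementation; only the error text is copied from the message quoted in this task, and the repository names are hypothetical ('' is the local repository).

```python
def entity_type_owners(repo_entity_types):
    """Map each entity type ID to the single repository providing it.

    `repo_entity_types` maps repository names ('' = local) to lists of
    entity type IDs. Raises ValueError if a type is claimed twice.
    """
    owners = {}
    for repo, entity_types in repo_entity_types.items():
        for type_id in entity_types:
            if type_id in owners:
                raise ValueError(
                    "Using same entity types on multiple repositories is not "
                    f'supported yet. "{type_id}" has already be defined for '
                    f'repository "{owners[type_id]}"'
                )
            owners[type_id] = repo
    return owners


# Allowed: local items, properties from the foreign repository "wiki2".
print(entity_type_owners({"": ["item"], "wiki2": ["property"]}))
```

Adding `"item"` to the "wiki2" list as well reproduces the quoted error, which is why, without extra entity types such as those from WikibaseMediaInfo, only properties can sensibly be taken from the remote repository.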

> At one point, I got a really weird error ($wgContLang was null) when I had two entries in the foreignRepositories config. Are multiple entries supported?

This is interesting. Can you reproduce it again? I failed to reproduce it by simply configuring two foreign repos. Multiple foreign repositories should of course be allowed.

> Attempting to configure a repo with no local entities at all ($wgWBRepoSettings['entityNamespaces'] = [];) resulted in an error message in the setup scripts. I assume the interpretation of “empty array ≈ unset variable” is unintentional.

If I remember correctly, I was not very convinced by the change either (it was made in https://gerrit.wikimedia.org/r/#/c/341781/), but I am afraid this interpretation is intentional. And I can see some sense in the justification for the change in principle (if you don't want to provide any entity types, just don't use Wikibase).
With that change in place, though, it becomes a bit hacky to have a Wikibase that is not used to provide items and properties, as you noticed. The way I dealt with this is to set only the relevant entity types (i.e. possibly no items or properties) in $wgWBClientSettings['repoNamespaces']. The example in docs/federation.wiki now mentions this too. I hope we will manage to improve all this a bit; namespace configuration is just rather entangled, and therefore tricky.
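The "empty array ≈ unset variable" interpretation discussed above comes down to how the setting is tested. The short Python model below contrasts a falsy check, which rejects an empty mapping exactly like a missing one, with an `is None` check, which would tell the two apart. Both functions and their error messages are hypothetical; they only illustrate the trade-off and are not Wikibase's setup code.

```python
class ConfigError(ValueError):
    """Raised for an unusable setting (hypothetical)."""


def local_entity_namespaces_strict(setting):
    """Behaviour described above: unset (None) and empty ({}) are both rejected."""
    if not setting:
        raise ConfigError("no local entity types configured")
    return dict(setting)


def local_entity_namespaces_lenient(setting):
    """Alternative: only None is an error; {} means "no local entity types"."""
    if setting is None:
        raise ConfigError("entityNamespaces is not set")
    return dict(setting)


print(local_entity_namespaces_lenient({}))
```

The strict variant matches the intent described in the reply (a repo that provides no entity types should not be a Wikibase Repo at all), while the lenient one would have allowed a repo with no local entities; that is why the workaround moves the choice of entity types to the client-side repoNamespaces setting instead.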

Thanks, it’s much better with the new change.

> This is interesting. Can you reproduce it again?

It seems to be related to my wiki2 database – it occurs even when I only configure that one wiki in the foreign repositories. I can send you an SQL dump of the database if you’re interested.

Change 348939 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Expand slightly documentation regarding setting up federated repositories

https://gerrit.wikimedia.org/r/348939