Phabricator

Feature Request: data format for portability between wikibase instances
Open, Needs Triage, Public

Description

Hello,

I would like to understand if it is possible to export data from a Wikibase instance and import it into another Wikibase instance without loss of fidelity.

Let’s say that I have a simple Wikibase instance with three entities:
P1 locatedIn
Q1 State of Wisconsin
Q2 City of Madison

And there’s a statement on Q2:
Q2 P1 Q1 (Madison is locatedIn Wisconsin)

The specific scenario I am considering: my dataset lives in a hosted Wikibase provider like WBStack, and WBStack decides to shut down. I would like to be able to migrate the content of my WBStack instance to another Wikibase instance, but I would like all of the identifiers on the entities to remain consistent when I import them into the new Wikibase, e.g. I want my new Wikibase to have P1, Q1, and Q2 with the same meanings. This is important because I want queries like

?city P1 Q1

to work without having to be updated for new P and Q numbers, since queries will live in my scripts and applications, external to my Wikibase, and won’t be updated by any automated import tool.
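To make the concern concrete, the query above can be sketched as SPARQL against the instance’s query service. This is only an illustration: the wdt:/wd: prefix convention follows the default Wikibase RDF mapping, and example.org stands in for the instance’s actual concept URI base.

```python
# Sketch: the '?city P1 Q1' query written as SPARQL for a hypothetical
# Wikibase query service. The base URI (example.org) is a placeholder;
# the wd:/wdt: prefixes follow the default Wikibase RDF mapping.

def city_query(prop_id: str, item_id: str, base: str = "http://example.org") -> str:
    """Build a SPARQL query for '?city <prop> <item>'."""
    return (
        f"PREFIX wd: <{base}/entity/>\n"
        f"PREFIX wdt: <{base}/prop/direct/>\n"
        f"SELECT ?city WHERE {{ ?city wdt:{prop_id} wd:{item_id} . }}"
    )

query = city_query("P1", "Q1")
print(query)
```

If P1 and Q1 keep their IDs across the migration, this query string stays valid as-is; at most the base URI changes.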

I am not excited about MySQL dumps as the interchange format, because I worry that a dump is tied to a specific version of Wikibase: say I have a dump of my Wikibase made while running 1.34, but WBStack is running 1.36 - will I be able to import my backup into WBStack? And does a hosted provider really want to import sites from MySQL dumps?

To further clarify: I am specifically focused on getting data into Wikibase, and not just using the contents of a Wikibase in a triplestore like BlazeGraph/WDQS. If there is a clear path to import an RDF export of a Wikibase back into Wikibase itself, I would be happy to go that route, but I don’t believe that is possible today.
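For per-entity exports, Wikibase does serve an entity’s full JSON (or RDF) via Special:EntityData, and that JSON carries the entity’s own ID. A small sketch of the URL shape, assuming the wiki lives at example.org with the standard /wiki/ article path:

```python
# Sketch: build Special:EntityData export URLs for a single entity.
# The JSON served there embeds the entity's own ID, which is exactly
# what needs to survive a migration. example.org and the /wiki/ path
# are placeholders for the real instance.

def entity_data_url(entity_id: str, fmt: str = "json",
                    base: str = "https://example.org/wiki") -> str:
    return f"{base}/Special:EntityData/{entity_id}.{fmt}"

print(entity_data_url("Q2"))         # JSON export of Q2
print(entity_data_url("Q2", "ttl"))  # Turtle (RDF) export of Q2
```

The missing piece, as the task says, is the reverse direction: there is no stock tool that takes these exports and recreates the entities in another Wikibase under the same IDs.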

Note that I am also not (necessarily) asking for a Wikidata dump to be produced in this format, though I could imagine that many people would find that useful to create a local mirror of Wikidata in their own Wikibase.

I do think that in the end what I would like to be able to do is create entities in Wikibase with specific identifiers, e.g. I want to be able to start a blank Wikibase and create a new entity as Q43788 without having to create 43,787 empty items first (Q43788 is Madison, WI on Wikidata). But to be clear, I am not asking for any sort of guaranteed identifier sync with Wikidata. Even if I create Q43788 as ‘Madison WI’ in my local, non-Wikidata instance of Wikibase, I do not expect Wikidata to respect any P or Q numbers I create locally, and I would take responsibility for conflict management in my own Wikibase, e.g. it is my problem if I try to merge two Wikibase instances that have different ideas of what Q42 should represent.

My ask might be as simple as “I want to be able to create Wikibase entities by supplying my own IDs, instead of Wikibase issuing them”, but I am still exploring Wikibase and I suspect the full story is more involved.
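One existing route to ID-preserving transfer is MediaWiki’s XML import, which creates pages under their original titles (Item:Q43788 and so on). The sketch below builds a deliberately simplified page element in that spirit; it is an illustration of the shape only, not the full MediaWiki export schema (which also carries a schema version, site info, namespace numbers, and revision metadata).

```python
# Sketch: a minimal, simplified MediaWiki-XML-style <page> element
# carrying an item's JSON under its own title (Item:Q43788). XML import
# preserves page titles, and for Wikibase the entity ID lives both in
# the title and inside the page content. NOT the full export schema.
import json
import xml.etree.ElementTree as ET

def item_page_xml(entity_id: str, label_en: str) -> str:
    entity_json = json.dumps({
        "type": "item",
        "id": entity_id,  # the ID also travels inside the page content
        "labels": {"en": {"language": "en", "value": label_en}},
    })
    page = ET.Element("page")
    ET.SubElement(page, "title").text = f"Item:{entity_id}"
    revision = ET.SubElement(page, "revision")
    ET.SubElement(revision, "text").text = entity_json
    return ET.tostring(page, encoding="unicode")

print(item_page_xml("Q43788", "Madison WI"))
```

A real dump wraps many such pages in a <mediawiki> root; the point here is just that the ID is ordinary page data, so an import mechanism that respects titles respects IDs.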

Thank you for all of your work on Wikibase and Wikidata!

Event Timeline

I would say that MediaWiki export and import, or SQL dumps, are a better way to migrate data between Wikibase instances than some Wikibase-specific features. Both of these should also preserve the edit history. (Wikibase doesn’t allow import of entities by default, but there seems to be an allowEntityImport setting that can be set to allow it, though I confess I haven’t tried it.) Have you tried MediaWiki’s XML export/import, or experienced problems with it?
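For reference, the setting mentioned above is a Wikibase repo setting in LocalSettings.php (a config fragment, untested here):

```php
// LocalSettings.php on the importing wiki: allow XML import to accept
// pages in the entity namespaces (off by default).
$wgWBRepoSettings['allowEntityImport'] = true;
```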

> I am not excited about MySQL dumps as the interchange format, because I worry that it ties the dump to a specific version of Wikibase - say I have a dump of my wikibase from running 1.34, but WBStack is running Wikibase 1.36 - will I be able to import my backup into WBstack? Does a hosted provider really want to import sites using MySQL dumps?

Since MediaWiki supports updating from older versions (within limits: see T259771), I assume newer MediaWiki+Wikibase versions can deal with SQL dumps produced by older versions. (Probably not the other way around, though.) That said, I haven’t tried this myself. (However, I know for certain that Wikibase is able to read page content that uses older serialization formats – after all, we still have such revisions in Wikidata.)