
Tool to copy Wikibase entities from one Wikibase to another
Closed, Resolved (Public)

Description

As a new Wikibase user, I would like to copy entities, primarily properties and items from an existing Wikibase to my new one, so that I can have a starting point in my knowledge graph from which to build.

After a small prototyping exercise during our Wikibase.cloud team week, we attempted to build a tool using pywikibot to copy pages from one wiki to another. We tried to use the transferbot.py script under the hood, but realised that it does not work on Wikibase entities.

To take an incremental step towards this goal, we want to try an alternative technical implementation that could tell us how realistic it would be to use this as a foundation for a future user-facing feature. This ticket, however, doesn't anticipate a complete user-facing feature, although we are aware that we (or some other technically minded user) might use the outcome to do some real-world importing.

We realised that a sensible architecture for this would be a container image that we can run which takes the following parameters:

  • a source wiki
  • a target wiki (+ credentials to edit there; maybe OAuth)
  • a list of entities to copy (or perhaps optionally all entities in the case of a smaller source Wikibase)

One way to do this would be to use wikibase-cli, which we investigated a little further and which seemed plausible.

When a property whose data type is missing on the target wiki is about to be copied, we should log a warning, skip that property, and continue with the remaining entities.
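The skip-and-continue rule above could be sketched roughly as follows. This is only an illustration: the function name `filter_copyable`, the logger name and the idea of passing in a set of data types supported by the target are assumptions, not part of any existing tool.

```python
import logging

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("transfer-sketch")


def filter_copyable(entities, target_datatypes):
    """Return the entities that can be created on the target wiki.

    Properties whose data type is not in `target_datatypes` are skipped
    with a warning; items are always kept.
    """
    copyable = []
    for entity in entities:
        is_property = entity.get("type") == "property"
        if is_property and entity.get("datatype") not in target_datatypes:
            logger.warning(
                "skipping %s: data type %r is not available on the target wiki",
                entity.get("id"),
                entity.get("datatype"),
            )
            continue  # do not abort the whole run, just move on
        copyable.append(entity)
    return copyable
```

How the set of data types available on the target wiki would be obtained is left open here.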

Out of scope for now:

  • copy all statements
  • for when we do copy over statements: what should happen about statements referencing entities that are not present (i.e. the depth of copy)?
  • restricting the copy to specific languages
  • (not a problem while we're not copying statements) figuring out what to do with statements whose datatype is missing on the target wiki

Acceptance Criteria:

  • A container image (i.e. a Dockerfile and some tooling to build it)
  • That takes, as either ENV variables, a mounted file, or standard input:
    • a list of entities
    • A source wiki
    • A target wiki
  • That, after completing, has imported items and properties with their labels, descriptions and aliases in all languages, plus the datatype of properties, to the target wiki
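As a rough sketch of how the inputs listed in these criteria might be collected: the variable names `SOURCE_WIKI`, `TARGET_WIKI` and `ENTITIES` below are hypothetical, as is the fallback to standard input when no entity list is given in the environment. Only item (Q) and property (P) IDs are accepted, matching the scope of this ticket.

```python
import re
from dataclasses import dataclass

# Items (Q...) and properties (P...) are the only entity types in scope here.
ENTITY_ID = re.compile(r"[QP][1-9][0-9]*")


@dataclass
class TransferConfig:
    source: str
    target: str
    entities: list


def load_config(env, stdin_text=""):
    """Build a config from an environment mapping, falling back to stdin
    text for the entity list. All variable names are assumptions."""
    source = env["SOURCE_WIKI"]
    target = env["TARGET_WIKI"]
    raw = env.get("ENTITIES") or stdin_text
    # Accept comma-, space- or newline-separated IDs.
    ids = raw.replace(",", " ").split()
    invalid = [i for i in ids if not ENTITY_ID.fullmatch(i)]
    if invalid:
        raise ValueError(f"not item or property IDs: {invalid}")
    return TransferConfig(source=source, target=target, entities=ids)
```

In a real entry point this could be called as `load_config(os.environ, sys.stdin.read())`.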

Details

Other Assignee
Deniz_WMDE

Event Timeline

Evelien_WMDE renamed this task from Feature Request: Tool to move Wikibase entities from one Wikibase to another to Tool to move Wikibase entities from one Wikibase to another. Dec 5 2023, 4:23 PM
Evelien_WMDE updated the task description.

What are the arguments against enabling 'allowEntityImport' and just importing an export file?

Charlie_WMDE updated the task description.
Charlie_WMDE subscribed.

What are the arguments against enabling 'allowEntityImport' and just importing an export file?

Hi! I think this presents a different approach for a different use case. It would probably make more sense for a "back up and restore" type of feature than for a "pick some data from other places to help you get started" use case.

Importing an export file (for example) would almost always result in overwriting the early entities in the Wikibase you're importing into.

Please feel free to open a separate ticket in the backlog if you have a specific reason you want something to be implemented in this direction :)

Fring renamed this task from Tool to move Wikibase entities from one Wikibase to another to Tool to copy Wikibase entities from one Wikibase to another. Mar 12 2024, 10:43 AM

Command for doing the above using wikibase-cli:

wb data $ENTITIES -i "$SOURCE" | jq -c '{type,labels,descriptions,aliases,datatype}' | wb create-entity --batch -i "$TARGET"
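For readers less familiar with jq, the middle step of this pipeline keeps only the type, labels, descriptions, aliases and datatype of each entity and drops everything else (IDs, claims, sitelinks). A minimal Python sketch of that projection, assuming one JSON entity per input line, might look like this; the names `KEPT_KEYS`, `project_entity` and `project_ndjson` are made up for illustration.

```python
import json

# The five keys the pipeline forwards to the target wiki.
KEPT_KEYS = ("type", "labels", "descriptions", "aliases", "datatype")


def project_entity(entity):
    """Keep only the whitelisted keys; a missing key becomes None,
    which serialises to null (as "datatype" does for items)."""
    return {key: entity.get(key) for key in KEPT_KEYS}


def project_ndjson(lines):
    """Yield one compact JSON document per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:
            projected = project_entity(json.loads(line))
            yield json.dumps(projected, separators=(",", ":"))
```

Dropping the source ID means the target wiki assigns its own IDs to the created entities.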

The repository for the Docker image is here: https://github.com/wbstack/transferbot

The tool worked for me and did what's stated in the ACs in some cursory tests, but it can probably still be tweaked further to our liking:

➜  transferbot git:(main) docker run -e TARGET_WIKI_OAUTH_CONSUMER_KEY="x" -e TARGET_WIKI_OAUTH_CONSUMER_SECRET="x" -e TARGET_WIKI_OAUTH_TOKEN="x" -e TARGET_WIKI_OAUTH_TOKEN_SECRET="x" --rm ghcr.io/wbstack/transferbot:main https://yetagain.wikibase.dev https://updatewiki.wikibase.dev Q3
processing line 1: {"type":"item","labels":{"en":{"language":"en","value":"en"}},"descriptions":{"en":{"language":"en","value":"the language code for the english language maybe"}},"aliases":{},"datatype":null}
{"entity":{"type":"item","id":"Q6","labels":{"en":{"language":"en","value":"en"}},"descriptions":{"en":{"language":"en","value":"the language code for the english language maybe"}},"aliases":{},"claims":{},"sitelinks":{},"lastrevid":3},"success":1}
done processing 1 lines: successes=1 errors=0

with the transferred entity at https://updatewiki.wikibase.dev/wiki/Item:Q6 (source at https://yetagain.wikibase.dev/wiki/Item:Q3)

I could verify the functionality with 162 entities. I created a PR for some fixes/improvements: https://github.com/wbstack/transferbot/pull/1

Another observation: composing the run command on the fly seems quite error-prone and feels a bit tricky to me. Juggling secrets like that is always a bit stressful, I think. Also, for example, I forgot to add -ti to docker run, which left me unable to Ctrl+C out of it. Maybe reading from a .env file or something similar could make sense? I made an example in this PR: https://github.com/wbstack/transferbot/pull/2

I merged the PR with the small fixes/improvements that Rosalie reviewed, and closed the other one, as I think it was out of scope and not very useful for now. I think the repo that exists now is a cool step in the right direction and meets the ACs here, so I consider this done.