
Importing a Wikibase entity via XML breaks creating new entities
Open, Medium, Public, BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  1. In a fresh Wikibase installation, create an item to get Item:Q1
  2. Export the item as XML (e.g. using /Special:Export/Item:Q1)
  3. In a second fresh Wikibase installation, import the XML
  4. Attempt to create a new item in the second Wikibase installation via Special:NewItem

What happens?:
Receive an error about the item already existing.

In the first screenshot, you can see the error.
The second shows that the error is not caused by the label already being used.
The third serves to confirm that the second would have turned up a match by correctly finding the imported item.

Screenshot 2021-08-19 at 21-55-12 Create a new Item - Wikipartments.png

Screenshot 2021-08-19 at 21-55-31 Search results for Test2 - Wikipartments.png

Screenshot 2021-08-19 at 21-55-37 Search results for Test - Wikipartments.png

What should have happened instead?:

A new Item should have been created.

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc:
Installed Software
MediaWiki 1.36.1
PHP 7.4.22 (fpm-fcgi)
MariaDB 10.5.11-MariaDB
ICU 67.1
LuaSandbox 4.0.2
Lua 5.1.5

Installed Skins
Timeless 0.9.1 (80cc022) 18:56, 23 July 2021
Vector 1.0.0 (b9deb92) 19:04, 23 July 2021

Installed extensions
Flagged Revisions – (c63cb4d) 19:16, 22 July 2021
CodeEditor – (e587a94) 12:59, 22 July 2021
WikiEditor 0.5.3 (cf1d759) 14:23, 23 July 2021
ParserFunctions 1.6.0 (d999660) 13:28, 27 May 2021
Scribunto – (ac71012) 21:34, 23 July 2021
TitleBlacklist 1.5.0 (67533b2) 05:07, 28 May 2021
WikibaseClient – (a00a340073)
WikibaseRepository – (a00a340073)
WikipartmentsGeocode 0.1.0
WikipartmentsMessages 0.0.0

(The last two are custom extensions of mine.)

More information
This issue is caused by the import process not updating (and/or creating) entries in the wb_id_counters table.
(See also https://wikibase.consulting/transferring-wikibase-data-between-wikis/)

Because id generation happens non-atomically with entity creation, failing to create the item still updates the wb_id_counters table with an incremented id. In the specific reproduction case given here, this means a second attempt to create the item will succeed. If more items were imported, however, attempts will fail until the id is incremented to an unused id.

If non-sequential ids are imported, this will create failures sporadically (whenever an attempt to reuse an imported id is reached).
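
To make the failure mode concrete, here is a toy Python model of the behaviour described above. It is not Wikibase code: `ToyRepo` and its methods are made-up names, and the single `counter` attribute only stands in for the item row in wb_id_counters. The one property it tries to capture is that the id is reserved before creation is attempted and the increment is kept even when creation fails.

```python
class ToyRepo:
    """Toy stand-in for a Wikibase repo: a counter plus a set of stored item ids."""

    def __init__(self):
        self.counter = 0    # plays the role of the wb_id_counters row for items
        self.items = set()  # numeric ids of items that exist as pages

    def import_item(self, numeric_id):
        # Models the XML import: the page is stored, the counter is NOT touched.
        self.items.add(numeric_id)

    def create_item(self):
        # The id is reserved first; the increment is kept even if creation fails.
        self.counter += 1
        new_id = self.counter
        if new_id in self.items:
            return None  # corresponds to the "item already exists" error
        self.items.add(new_id)
        return new_id


def failed_attempts_before_success(imported_ids):
    """Count how many creation attempts fail before one succeeds."""
    repo = ToyRepo()
    for numeric_id in imported_ids:
        repo.import_item(numeric_id)
    failures = 0
    while repo.create_item() is None:
        failures += 1
    return failures
```

In this model, importing only Q1 gives one failed attempt followed by a success, matching the reproduction case. Importing Q1 through Q3 gives three failures, and importing only Q2 and Q4 gives failures on exactly the second and fourth attempts.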

Event Timeline

I locally have a solution using the AfterImportPage hook to compare the highest recorded id in wb_id_counters for the imported entity type to the imported entity id. If the imported id is higher, it writes the imported id to the wb_id_counters table. I'm not sure this is the best solution, though, for a few reasons:

  1. If something goes wrong during page import, but some of the revisions were successfully imported, then it doesn't solve the problem, as the AfterImportPage hook won't be called.
  2. It requires reading from (and possibly writing to) the database an extra time for every page import, slowing down the import process, which is already a concern (see T287164: Improve bulk import via API)
  3. It adds a second hook to the import process (somehow it feels like more hooks for one task is less good than fewer hooks for one task)
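
The compare-and-write step described above can be sketched as follows. This is a Python illustration against an in-memory SQLite database, not the actual patch: the helper names are hypothetical, and the column names `id_type` and `id_value` are my assumption about the wb_id_counters schema, so check them against the real table before relying on this.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a simplified wb_id_counters table."""
    conn = sqlite3.connect(":memory:")
    # Assumed, simplified schema: one row per entity type.
    conn.execute(
        "CREATE TABLE wb_id_counters ("
        "id_type TEXT PRIMARY KEY, id_value INTEGER NOT NULL)"
    )
    return conn


def current_counter(conn, id_type):
    """Return the stored counter for id_type, or None if there is no row yet."""
    row = conn.execute(
        "SELECT id_value FROM wb_id_counters WHERE id_type = ?", (id_type,)
    ).fetchone()
    return None if row is None else row[0]


def bump_counter_if_higher(conn, id_type, imported_id):
    """Raise the counter for id_type to imported_id if it is lower or missing."""
    stored = current_counter(conn, id_type)
    if stored is None:
        # No row yet (fresh installation): create it with the imported id.
        conn.execute(
            "INSERT INTO wb_id_counters (id_type, id_value) VALUES (?, ?)",
            (id_type, imported_id),
        )
    elif imported_id > stored:
        conn.execute(
            "UPDATE wb_id_counters SET id_value = ? WHERE id_type = ?",
            (imported_id, id_type),
        )
    # If imported_id <= stored, the counter is already past it: nothing to do.
    conn.commit()
```

The extra read (and possible write) per imported page mentioned in point 2 is visible here: every call performs at least one SELECT.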

In case using the AfterImportPage hook is a good solution, I'm working to prepare what I have locally (currently based off of commit a00a340073) to be based off of master and to replace its use of the old Hook system with the new HookHandler system. I'll push up a patch once I'm done (should be tonight or tomorrow).

Alternate idea: instead of using the AfterImportPage hook, I could just extend the already-used onImportHandleRevisionXMLTag hook handler.

This has more impact on performance, as this can be called multiple times per page.

It does solve the missing id on partial import problem, though. However, it has the opposite problem: it will update with the id (consuming it if it's the highest one) in the event of a completely unsuccessful import. That seems better, though, as all that does is waste an id, instead of breaking functionality. (As far as I can tell, there's no issue if there's a gap in ids, but please correct me if I'm wrong about that.)
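
The trade-off between the two hook placements can be illustrated with a small Python model. Everything here is hypothetical (the function name, the `fail_at` parameter, and the idea of a page import as a simple loop over revisions); it only contrasts "update the counter once after the whole page imported" with "update the counter each time a revision is handled".

```python
def simulate_page_import(entity_id, revisions, fail_at, per_revision):
    """Return (counter, page_exists) after importing one page into a fresh wiki.

    fail_at: index of the revision at which the import aborts, or None.
    per_revision: True models consuming the id while handling each revision,
                  False models consuming it once after the page import finishes.
    """
    counter = 0
    stored_revisions = 0
    for index in range(revisions):
        if per_revision:
            # Consume the id before we know whether the revision will be stored.
            counter = max(counter, entity_id)
        if index == fail_at:
            break  # import aborted at this revision
        stored_revisions += 1
    else:
        # Reached only if the loop did not abort, i.e. the page fully imported.
        if not per_revision:
            counter = max(counter, entity_id)
    return counter, stored_revisions > 0
```

With three revisions of Q7 and a failure at the second one, the after-import variant leaves the page in place with the counter still at 0 (a later collision), while the per-revision variant ends with the counter at 7. If the very first revision fails, the per-revision variant has consumed id 7 without storing a page, which only leaves a gap.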

Change 714011 had a related patch set uploaded (by CtrlZvi; author: CtrlZvi):

[mediawiki/extensions/Wikibase@master] repo: Consume an entity's id on import

https://gerrit.wikimedia.org/r/714011

Addshore triaged this task as Medium priority. Aug 31 2021, 2:23 PM

(As far as I can tell, there's no issue if there's a gap in ids, but please correct me if I'm wrong about that.)

There is indeed no issue with this.

@CtrlZvi: Per emails from Sep18 and Oct20 and https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup , I am resetting the assignee of this task because there has not been progress lately (please correct me if I am wrong!). Resetting the assignee avoids the impression that somebody is already working on this task. It also allows others to potentially work towards fixing this task. Please claim this task again when you plan to work on it (via Add Action...Assign / Claim in the dropdown menu) - it would be welcome. Thanks for your understanding!