
Bulk import: evaluate current tooling
Closed, Resolved, Public

Description

Current Situation:

  • Users report that bulk import is too slow and error-prone.
  • There are several related tools and workarounds.

Goal:

  • Technically evaluate our own solutions and the external projects that came up in user research, such as QuickStatements, RaiseWikibase, Wikibase Insert, etc.
  • Gather knowledge that will inform our decision on how to improve the bulk import functionality.

Acceptance Criteria:

  • We know what is and isn't handled by the existing tools.
  • We know if benchmarking is possible/needed.

Event Timeline

adee_wmde changed the task status from Open to In Progress. (Oct 29 2024, 10:09 AM)
adee_wmde moved this task from Sprint Backlog to Doing on the Wikibase Suite Team (Sprint-∞) board.

Several tools for bulk data import into Wikibase were reviewed and grouped by approach: API wrappers, scripts that interact with the database, and direct database access. The tools were compared on usability and performance, using benchmarks published by their developers. The review also highlighted that the API has no native support for bulk import. Most tools exist as workarounds for this gap, which underscores the limitations of the current API. The ideal solution would combine the high performance of direct database access with the Wikibase integration and features that the API provides.
These findings were shared with the product team for feedback. We also consulted the user research team and colleagues who worked on the REST API. As a result, we decided to prioritize further research into two potential solutions: extending the API with a bulk import method, or implementing a CLI tool.

@adee_wmde could you please list the reviewed tools? Even better would be to publish the review itself somewhere, for the record. It seems like interesting work to keep publicly accessible.

@adee_wmde could you please list the reviewed tools?