
Migrate Wikidata off of Blazegraph
Open, High, Public

Description

There are currently many tasks about evaluating and analyzing Blazegraph replacements, such as:

T206560 - Evaluate alternatives to BG (including lots of subtasks around testing and evaluating alternatives)
T306725 - Decide which BG services to migrate (assuming a migration is bound to happen)

... but no task for the migration itself. The migration seems unavoidable and urgent, hence this task.

We should migrate before reloads fail. Blazegraph instability has been slowing down data reloads on WDQS and may prevent them altogether next time. Since the Query Service is the public-facing part of Wikidata in many contexts, a failed reload would effectively prevent WD itself from being updated.

@Gehel wrote:

TL;DR: We expect to successfully complete the recent data reload on Wikidata Query Service soon, but we've encountered multiple failures related to the size of the graph, and anticipate that this issue may worsen in the future. Although we succeeded this time, we cannot guarantee that future reload attempts will be successful given the current trend of the data reload process. Thank you for your understanding.

Proposal:

  • Migrate WD to a different db backend before the query service next needs a reload, even if that means running two backends side by side for a time (T290839).
  • Document the migration process for ourselves and for other wikibase users.

Motivation to do this now:

  1. We need a new production-quality backend. Practicing + testing a migration also rehearses future recovery workflows.
  2. Working through the migration process will bring needed attention to this critical step in WD growth.
  3. Whatever the challenges, waiting until a backend failure happens will be worse.
  4. There is an ongoing tax on delaying migration: more issues open every season about slowness, failures, or other inconsistencies with BG.

Event Timeline

Sj renamed this task from "Migrate off of Blazegraph" to "Migrate Wikidata off of Blazegraph". Feb 24 2023, 8:07 PM
Sj updated the task description.

Possibly this is already covered elsewhere and can be closed and merged, for example into existing discussions about:

  • A rough timeline for migration
  • Current status of decisions made & pending, about where and how to migrate
  • A map of components + services affected, so they can be notified and run their own downstream analysis

Other issues that may depend on the details above:

  • Plans for handling possible obstacles or failures
  • Metrics to run on test datasets, then on the full system post-migration
  • Challenges and stopgaps that this may resolve (technical + social debt)
  • Stalled features or service requests that may become possible
MPhamWMF moved this task from Incoming to Scaling on the Wikidata-Query-Service board.