[Epic] Splitting the graph in WDQS
Open, High, Public

Description

To stabilize the Wikidata Query Service, we are looking into splitting the graph inside Blazegraph into two (or potentially more) subgraphs. This ticket tracks the investigation into what a sensible split would be and what its consequences are, followed by the work to carry out the split.

Related Objects

Status | Assigned
Open | None
Open | None
Resolved | Gehel
Resolved | Manuel
Resolved | AndrewTavis_WMDE
Declined | None
Declined | None
Resolved | dr0ptp4kt
Resolved | dcausse
Resolved | Lydia_Pintscher
Resolved | Gehel
Open | dcausse
Resolved | bking
Resolved | bking
Resolved | bking
Resolved | RKemper
Resolved | RKemper
Resolved | dr0ptp4kt
Resolved | RKemper
Resolved | Dzahn
Resolved | RKemper
Resolved | RKemper
Open | None
Resolved | RKemper
Resolved | Gehel
Resolved | bking
Open | None
Resolved | AndrewTavis_WMDE
Duplicate | AndrewTavis_WMDE
Open | None
Resolved | Gehel
Resolved | dcausse
Resolved | dr0ptp4kt
Resolved | dcausse
Open | Sannita
Open | None
Resolved | dcausse
Resolved | dcausse
Open | pfischer
Open | None
Open | dcausse
Open | None
Open | None
Open | None
Open | None
Open | None
Open | None

Event Timeline

Gehel triaged this task as High priority. May 22 2023, 12:58 PM
Gehel moved this task from Incoming to Scaling on the Wikidata-Query-Service board.
Gehel moved this task from Scaling to Epics on the Wikidata-Query-Service board.

When T345475 is done, we should have 3 new WDQS hosts in CODFW that could be used for the graph splitting experiment. @RKemper, let us know if you have any objections to this plan.

We'll use eqiad hosts instead; see T347505.

Mentioned in SAL (#wikimedia-operations) [2024-03-05T22:37:10Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-05T22:37:14Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-07T18:22:18Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on wdqs[1022-1025].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-07T18:22:38Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on wdqs[1022-1025].eqiad.wmnet with reason: T337013