[Epic] Splitting the graph in WDQS
Open, High, Public

Description

In order to stabilize the Wikidata Query Service, we are looking into splitting the graph inside Blazegraph into two (or potentially more) subgraphs. This ticket tracks the investigation into what a sensible split would be and what its consequences are, and then the work of making it happen.

Related Objects

Status     Assigned
Open       None
Open       None
Resolved   Gehel
Resolved   Manuel
Resolved   AndrewTavis_WMDE
Declined   None
Declined   None
Resolved   dr0ptp4kt
Resolved   dcausse
Resolved   Lydia_Pintscher
Resolved   Gehel
Resolved   dcausse
Resolved   bking
Resolved   bking
Resolved   bking
Resolved   RKemper
Resolved   RKemper
Resolved   dr0ptp4kt
Resolved   RKemper
Resolved   Dzahn
Resolved   RKemper
Resolved   RKemper
Open       None
Resolved   RKemper
Resolved   Gehel
Resolved   bking
Resolved   Gehel
Resolved   AndrewTavis_WMDE
Duplicate  AndrewTavis_WMDE
Open       None
Resolved   Gehel
Resolved   dcausse
Resolved   dr0ptp4kt
Resolved   dcausse
Invalid    Sannita
Resolved   Lucas_Werkmeister_WMDE
Resolved   dcausse
Resolved   dcausse
Resolved   pfischer
Resolved   EBernhardson
Resolved   dcausse
Open       None
Resolved   RKemper
Resolved   Stevemunene
Resolved   Stevemunene
Resolved   Stevemunene
Resolved   Stevemunene
Resolved   RKemper
Resolved   RKemper
Invalid    Gehel
Resolved   RKemper
Resolved   ItamarWMDE
Resolved   Stevemunene
Duplicate  None
Resolved   RKemper
Resolved   dcausse
Resolved   bking
Resolved   dcausse
Open       RKemper
Resolved   Gehel
Resolved   AudreyPenven_WMDE
Resolved   pfischer
Open       None
Open       None
Open       AndrewTavis_WMDE
Open       None
Open       None
Open       None
Open       None
Open       RKemper
Open       None
Open       None
Open       RKemper
Open       None
Open       None
Resolved   bking

Event Timeline

Gehel triaged this task as High priority. (May 22 2023, 12:58 PM)
Gehel moved this task from Incoming to Scaling on the Wikidata-Query-Service board.
Gehel moved this task from Scaling to Epics on the Wikidata-Query-Service board.

When T345475 is done, we should have 3 new WDQS hosts in CODFW that could be used for the graph splitting experiment. @RKemper let us know if you have any objections to this plan.

We'll use eqiad hosts instead; see T347505.

Mentioned in SAL (#wikimedia-operations) [2024-03-05T22:37:10Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-05T22:37:14Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-07T18:22:18Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on wdqs[1022-1025].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-07T18:22:38Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on wdqs[1022-1025].eqiad.wmnet with reason: T337013

Will this also solve T261764? Or is the graph used by the Query Service different from the one used by the API?

I don't believe the API in T261764 uses the Query Service, which means it would not be affected by the graph split.

Mentioned in SAL (#wikimedia-operations) [2024-08-23T15:52:57Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 17:00:00 on wdqs[1023-1024].eqiad.wmnet with reason: noisy alerts related to graph split T337013

Mentioned in SAL (#wikimedia-operations) [2024-08-23T15:53:13Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 17:00:00 on wdqs[1023-1024].eqiad.wmnet with reason: noisy alerts related to graph split T337013