
[Epic] Splitting the graph in WDQS
Closed, Resolved (Public)

Description

To stabilize the Wikidata Query Service (WDQS), we are investigating splitting the graph stored in Blazegraph into two (or potentially more) subgraphs. This ticket tracks the investigation into what a sensible split would be and what its consequences are, and then the work of carrying out the split.

Related Objects

Status | Assignee
Open | none
Resolved | BTracy-WMF
Resolved | Gehel
Resolved | Manuel
Resolved | AndrewTavis_WMDE
Declined | none
Declined | none
Resolved | dr0ptp4kt
Resolved | dcausse
Resolved | Lydia_Pintscher
Resolved | Gehel
Resolved | dcausse
Resolved | bking
Resolved | bking
Resolved | bking
Resolved | RKemper
Resolved | RKemper
Resolved | dr0ptp4kt
Resolved | RKemper
Resolved | Dzahn
Resolved | RKemper
Resolved | RKemper
Resolved | Gehel
Resolved | RKemper
Resolved | Gehel
Resolved | bking
Resolved | Gehel
Resolved | AndrewTavis_WMDE
Duplicate | AndrewTavis_WMDE
Resolved | Gehel
Resolved | Gehel
Resolved | dcausse
Resolved | dr0ptp4kt
Resolved | dcausse
Invalid | Sannita
Resolved | Lucas_Werkmeister_WMDE
Resolved | dcausse
Resolved | dcausse
Resolved | pfischer
Resolved | EBernhardson
Resolved | dcausse
Resolved | RKemper
Resolved | RKemper
Resolved | Stevemunene
Resolved | Stevemunene
Resolved | Stevemunene
Resolved | Stevemunene
Resolved | RKemper
Resolved | RKemper
Invalid | Gehel
Resolved | RKemper
Resolved | ItamarWMDE
Resolved | Stevemunene
Duplicate | none
Resolved | RKemper
Resolved | dcausse
Resolved | bking
Resolved | dcausse
Resolved | RKemper
Resolved | RKemper
Resolved | RKemper
Resolved | RKemper
Resolved | RKemper
Resolved | dcausse
Resolved | Gehel
Resolved | Lucas_Werkmeister_WMDE
Resolved | pfischer
Resolved | RKemper
Open | none
Resolved | AndrewTavis_WMDE
Resolved | RKemper
Resolved | Stevemunene
Resolved | RKemper
Resolved | bking
Resolved | RKemper
Resolved | RKemper
Resolved | RKemper
Resolved | RKemper
Resolved | RKemper
Resolved | Stevemunene
Resolved | RKemper
Resolved | Gehel
Resolved | dcausse
Resolved | bking
Resolved | bking
Resolved | bking
In Progress | none
Open | none
Resolved | RKemper
Resolved | RKemper
Invalid | RKemper
Resolved | bking
Invalid | bking
Open | none

Event Timeline

When T345475 is done, we should have 3 new WDQS hosts in CODFW that could be used for the graph splitting experiment. @RKemper let us know if you have any objections to this plan.

We'll use eqiad hosts instead; see T347505.

Mentioned in SAL (#wikimedia-operations) [2024-03-05T22:37:10Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-05T22:37:14Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on wdqs[1022-1024].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-07T18:22:18Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 60 days, 0:00:00 on wdqs[1022-1025].eqiad.wmnet with reason: T337013

Mentioned in SAL (#wikimedia-operations) [2024-03-07T18:22:38Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 60 days, 0:00:00 on wdqs[1022-1025].eqiad.wmnet with reason: T337013

Will this also solve T261764? Or is the graph used by the Query Service different from the one used by the API?

I don't believe that uses the Query Service, so it would not be affected by the split.

Mentioned in SAL (#wikimedia-operations) [2024-08-23T15:52:57Z] <bking@cumin2002> START - Cookbook sre.hosts.downtime for 17:00:00 on wdqs[1023-1024].eqiad.wmnet with reason: noisy alerts related to graph split T337013

Mentioned in SAL (#wikimedia-operations) [2024-08-23T15:53:13Z] <bking@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 17:00:00 on wdqs[1023-1024].eqiad.wmnet with reason: noisy alerts related to graph split T337013

Change #1122151 had a related patch set uploaded (by Bking; author: Bking):

[operations/mediawiki-config@master] wdqs-categories: use new split graph hosts (wdqs-main) for categories

https://gerrit.wikimedia.org/r/1122151

Change #1124535 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/mediawiki-config@master] wdqs categories: switch to internal-main

https://gerrit.wikimedia.org/r/1124535

Change #1122151 merged by Ryan Kemper:

[operations/mediawiki-config@master] wdqs-categories: remove extraneous wgCirrusSearchCategoryEndpoint value

https://gerrit.wikimedia.org/r/1122151

Change #1124535 merged by jenkins-bot:

[operations/mediawiki-config@master] wdqs categories: switch to internal-main

https://gerrit.wikimedia.org/r/1124535

Mentioned in SAL (#wikimedia-operations) [2025-03-25T20:26:26Z] <ryankemper@deploy1003> Started scap sync-world: Backport for [[gerrit:1124535|wdqs categories: switch to internal-main (T375520 T385896 T337013)]]

Mentioned in SAL (#wikimedia-operations) [2025-03-25T20:33:11Z] <ryankemper@deploy1003> ryankemper: Backport for [[gerrit:1124535|wdqs categories: switch to internal-main (T375520 T385896 T337013)]] synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

Mentioned in SAL (#wikimedia-operations) [2025-03-25T20:48:06Z] <ryankemper@deploy1003> Finished scap sync-world: Backport for [[gerrit:1124535|wdqs categories: switch to internal-main (T375520 T385896 T337013)]] (duration: 21m 40s)

BTracy-WMF claimed this task.