Provide a low availability / scalability full graph endpoint to ease the transition to a split graph
Closed, Resolved, Public

Description

Scholia (the main use case for the scholarly sub-graph) has requested an extension to ease the transition. While we don't have the resources to run a fully fledged full graph endpoint in parallel to the split endpoints, we should be able to provide a single server (lower availability and lower scalability), on a new dedicated endpoint, which would still expose the full graph for a transitional period. We expect that new endpoint to be available for a limited amount of time, probably until the end of December 2025.

Given the rate of queries requiring scholarly articles that we see in our logs, a single server should have enough capacity to handle the load.

We won't have any SLO on this new endpoint. With a single server, any failure, issue, or maintenance operation will generate downtime. In particular, we will not be able to react to issues outside of working hours, so failures on a Friday are unlikely to be addressed until the following Monday.

AC

  • DNS record created for query-legacy-full.wikidata.org
  • Corresponding backend.yaml trafficserver routing config for query-legacy-full.wikidata.org/ (UI) and query-legacy-full.wikidata.org/sparql (SPARQL endpoint)
  • Separate UI created for this purpose (include a header at the top of the page that explains that the service is temporary)
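For illustration, a client could address the SPARQL endpoint named in the list above roughly as follows. This is a minimal sketch, not part of the task: the helper name `build_sparql_request` and the default User-Agent string are hypothetical, and it assumes the endpoint accepts the standard `query` GET parameter of the SPARQL 1.1 Protocol. It only builds the request; nothing is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the acceptance criteria above.
ENDPOINT = "https://query-legacy-full.wikidata.org/sparql"


def build_sparql_request(query: str,
                         user_agent: str = "example-client/0.1 (hypothetical)") -> Request:
    """Build (but do not send) a GET request carrying a SPARQL query.

    Assumption: the endpoint follows the SPARQL 1.1 Protocol, i.e. the
    query travels in the `query` URL parameter and JSON results are
    requested through the Accept header.
    """
    url = f"{ENDPOINT}?{urlencode({'query': query})}"
    headers = {
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,
    }
    return Request(url, headers=headers)


req = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(req.full_url)
```

Since the endpoint has no SLO, a real client would probably also want to handle connection errors and retry later; that part is left out here.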

Event Timeline

Gehel triaged this task as High priority. Jan 22 2025, 9:12 AM

Per IRC conversation with Traffic, we won't need to touch LVS to make this change; we can simply add a DNS record and an ATS mapping. That means it'll be a lot less work to roll out this new endpoint.

RKemper updated Other Assignee, added: Stevemunene.
RKemper updated the task description.
RKemper added a subscriber: Stevemunene.

Change #1122676 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/dns@master] wdqs: Create DNS entry for one full graph host

https://gerrit.wikimedia.org/r/1122676

Change #1121726 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: add routing for legacy full graph host

https://gerrit.wikimedia.org/r/1121726

Change #1122678 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/deployment-charts@master] wdqs: create new ui for wdqs legacy full

https://gerrit.wikimedia.org/r/1122678

Change #1122678 merged by Ryan Kemper:

[operations/deployment-charts@master] wdqs: create new ui for wdqs legacy full

https://gerrit.wikimedia.org/r/1122678

Mentioned in SAL (#wikimedia-operations) [2025-03-03T22:46:19Z] <ryankemper> T384422 k8s deployment of wikidata-query-legacy-full-gui release in codfw looks fine, proceeding to eqiad

Change #1122676 merged by Ryan Kemper:

[operations/dns@master] wdqs: Create DNS entry for one full graph host

https://gerrit.wikimedia.org/r/1122676

Change #1121726 merged by Ryan Kemper:

[operations/puppet@production] wdqs: add routing for legacy full graph host

https://gerrit.wikimedia.org/r/1121726

Mentioned in SAL (#wikimedia-operations) [2025-03-03T22:56:38Z] <ryankemper> T384422 Deploying backend.yaml routing patch; after it's deployed we should theoretically be able to see a UI at https://query-legacy-full.wikidata.org/

Realized my approach with the DNS patch is likely incorrect. The current (wrong) approach is having an entry in templates/wmnet that does this:

wdqs-legacy-full 300 IN CNAME wdqs2009.codfw.wmnet.

However, what we actually want is a subdomain of wikidata.org, wdqs-legacy-full.wikidata.org, that points to dyna.wikimedia.org., and then the backend.yaml will look the same as it does now (namely, routing a dyna request for wdqs-legacy-full.wikidata.org/ to miscweb-k8s and a request for wdqs-legacy-full.wikidata.org/sparql to https://wdqs2009.codfw.wmnet/sparql).

Will get an example patch up and then try it out tomorrow (Tuesday). I've still got some brushing up to do on how dyna works under the hood before I can proceed.
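The routing split described in the comment above can be sketched as a small lookup. This is only an illustration of the intended behaviour, not the backend.yaml schema or ATS configuration: the function name `pick_backend` and the constants are hypothetical, and the sketch uses the hostname from the acceptance criteria (query-legacy-full.wikidata.org) rather than the wdqs-legacy-full name used in the comment.

```python
from typing import Optional
from urllib.parse import urlsplit

# Values taken from the task; the constant names themselves are made up.
LEGACY_HOST = "query-legacy-full.wikidata.org"
UI_BACKEND = "miscweb-k8s"
SPARQL_BACKEND = "https://wdqs2009.codfw.wmnet/sparql"


def pick_backend(url: str) -> Optional[str]:
    """Return the backend a request would be routed to, or None.

    Requests under /sparql go to the single full-graph host; every
    other path on the legacy hostname goes to the UI on miscweb-k8s.
    """
    parts = urlsplit(url)
    if parts.hostname != LEGACY_HOST:
        return None  # not handled by this mapping
    path = parts.path or "/"
    if path == "/sparql" or path.startswith("/sparql/"):
        return SPARQL_BACKEND
    return UI_BACKEND


print(pick_backend("https://query-legacy-full.wikidata.org/"))
print(pick_backend("https://query-legacy-full.wikidata.org/sparql?query=x"))
```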

Change #1124197 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/dns@master] wdqs: create query-legacy-full.wikidata.org

https://gerrit.wikimedia.org/r/1124197

Change #1124197 merged by Ryan Kemper:

[operations/dns@master] wdqs: create query-legacy-full.wikidata.org

https://gerrit.wikimedia.org/r/1124197

Change #1124481 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: fix query-legacy-full cert typo

https://gerrit.wikimedia.org/r/1124481

Change #1124481 merged by Ryan Kemper:

[operations/puppet@production] wdqs: fix query-legacy-full cert typo

https://gerrit.wikimedia.org/r/1124481

Jelto subscribed.

I deployed the admin-ng service in all wikikube clusters to apply the change. The ingress is now configured for query-legacy-full.wikidata.org.

Curl confirms the service is generally listening:

curl -I --resolve query-legacy-full.wikidata.org:30443:$(dig +short k8s-ingress-wikikube.svc.eqiad.wmnet) https://query-legacy-full.wikidata.org:30443
HTTP/2 200 
date: Wed, 05 Mar 2025 08:57:26 GMT
server: istio-envoy
last-modified: Fri, 31 Jan 2025 09:28:51 GMT
etag: "4671-62cfd2ba75ac0"
accept-ranges: bytes
content-length: 18033
cache-control: no-cache
content-type: text/html
x-envoy-upstream-service-time: 7

However, I'm not fully sure what the difference is between query-legacy-full.wikidata.org and wdqs-legacy-full.wikidata.org; I'll leave that to you.
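For readers unfamiliar with the curl invocation above: `--resolve host:port:addr` makes curl connect to the given address while still using the public hostname for TLS and the Host header, so the ingress can be tested before public DNS and routing are in place. A minimal sketch of how that check could be scripted follows. The helper names and the 192.0.2.10 address are hypothetical placeholders; the hostname and port come from the comment above.

```python
from typing import List


def build_ingress_check(host: str, port: int, ingress_ip: str) -> List[str]:
    """Build the argv for a HEAD request pinned to a specific ingress IP."""
    return [
        "curl", "-I",
        # Pin host:port to the ingress address instead of using DNS.
        "--resolve", f"{host}:{port}:{ingress_ip}",
        f"https://{host}:{port}",
    ]


def status_code(head_output: str) -> int:
    """Extract the numeric status from the first line of `curl -I` output."""
    first_line = head_output.strip().splitlines()[0]
    return int(first_line.split()[1])


# 192.0.2.10 is a documentation-range placeholder, not a real ingress IP.
args = build_ingress_check("query-legacy-full.wikidata.org", 30443, "192.0.2.10")
print(" ".join(args))
print(status_code("HTTP/2 200 \nserver: istio-envoy\n"))
```

The argv list could be passed to `subprocess.run(args, capture_output=True, text=True)` from a host that can reach the ingress, with `status_code` applied to the captured output.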

RKemper updated the task description.

Change #1139537 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] query-legacy-full: set cluster in hiera

https://gerrit.wikimedia.org/r/1139537

Change #1139537 merged by Ryan Kemper:

[operations/puppet@production] query-legacy-full: set cluster in hiera

https://gerrit.wikimedia.org/r/1139537