RKemper (Ryan Kemper)
User

User Details

User Since
May 1 2020, 10:28 PM (292 w, 6 h)
Availability
Available
LDAP User
Unknown
MediaWiki User
RKemper (WMF) [ Global Accounts ]

Recent Activity

Today

RKemper added a comment to T411919: hw troubleshooting: PERC1 battery failure for an-worker1148.

In the past I've had to assign the ticket to the relevant person, but I don't see those instructions in the template so hopefully I didn't mess up by not putting an assignee :)

Sat, Dec 6, 2:42 AM · SRE, ops-eqiad, DC-Ops
RKemper updated the task description for T411919: hw troubleshooting: PERC1 battery failure for an-worker1148.
Sat, Dec 6, 2:42 AM · SRE, ops-eqiad, DC-Ops
RKemper updated the task description for T411919: hw troubleshooting: PERC1 battery failure for an-worker1148.
Sat, Dec 6, 2:39 AM · SRE, ops-eqiad, DC-Ops
RKemper moved T411919: hw troubleshooting: PERC1 battery failure for an-worker1148 from Backlog to Hardware Failure / Troubleshoot on the ops-eqiad board.
Sat, Dec 6, 2:37 AM · SRE, ops-eqiad, DC-Ops
RKemper created T411919: hw troubleshooting: PERC1 battery failure for an-worker1148.
Sat, Dec 6, 2:36 AM · SRE, ops-eqiad, DC-Ops

Thu, Dec 4

RKemper added a comment to T411568: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts.

Stat host reboots completed.

Thu, Dec 4, 10:36 PM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), SRE
RKemper added a comment to T411568: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts.

Oh, with respect to the patch, we should also get https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/976163/1/cookbooks/sre/hadoop/reboot-workers.py reviewed and merged at the same time since it's directly relevant to this

Thu, Dec 4, 6:28 AM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), SRE
RKemper added a comment to T411568: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts.

an-worker* partially done. made https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1214664 to allow us to reboot a subset of a cluster's hosts while still handling the need to restart one journal node at a time properly. patch needs a bit of fixup.

Thu, Dec 4, 6:27 AM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), SRE

Wed, Dec 3

RKemper added a comment to T410956: OpenSearch on K8s: Allow applications to target specific DC for writes.

In short, it looks like opensearch-ipoid-test-bootstrap-0 is failing to properly initialize, leading to pod/opensearch-ipoid-test-masters-0 being unable to bootstrap the cluster

Wed, Dec 3, 10:51 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28)
RKemper added a comment to T410956: OpenSearch on K8s: Allow applications to target specific DC for writes.

We merged the above two patches and deployed. The opensearch-ipoid-test cluster is having issues bootstrapping:

Wed, Dec 3, 10:51 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28)
RKemper added a comment to T407702: Multiple deleted items still available in Wikidata Query Service.

It's interesting that the deletion in the task description is still not processed. I spot-checked every wdqs host in eqiad, and they all agree that the item still exists, so there's definitely an issue here and it's not merely confined to a few hosts.

Wed, Dec 3, 1:46 AM · Wikidata-Omega, Wikidata, Wikidata-Query-Service
RKemper added a comment to T411568: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts.

an-worker* reboots ongoing now

Wed, Dec 3, 1:23 AM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), SRE
RKemper created T411568: October 2025 Bullseye reboots: Data Platform Engineering-owned hosts.
Wed, Dec 3, 1:09 AM · Patch-For-Review, Data-Platform-SRE (2025.11.07 - 2025.11.28), SRE

Tue, Dec 2

RKemper moved T393966: Update WDQS SLO lag queries to reflect graph split changes from In Progress to Needs Review on the Data-Platform-SRE (2025.11.07 - 2025.11.28) board.

The current iteration of https://gerrit.wikimedia.org/r/c/operations/puppet/+/1202049/5/modules/profile/files/thanos/recording_rules.yaml has removed the sum and rate functions, since we can rely on pyrra to compute some intermediate metrics from these SLIs.

Tue, Dec 2, 7:23 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, User-Elukey, Essential-Work, SRE-SLO, observability
RKemper added a comment to T389859: WDQS: Alert on high thread count.

I like the proposal of >800 for 30 minutes, it should hopefully avoid too much flapping from the alert.

Tue, Dec 2, 7:08 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Essential-Work
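
The ">800 for 30 minutes" proposal above is a sustained-threshold condition: a short spike should not fire, only a breach that holds for the full window. A minimal Python sketch of that logic follows. It is an illustration only, not the actual alert rule from the patch; the function name, the sample format (timestamp in seconds, thread count), and the one-sample-per-minute spacing are all assumptions.

```python
from typing import Iterable, Tuple

THRESHOLD = 800          # thread count proposed in the comment above
HOLD_SECONDS = 30 * 60   # "for 30 minutes"


def should_fire(samples: Iterable[Tuple[int, float]],
                threshold: float = THRESHOLD,
                hold_seconds: int = HOLD_SECONDS) -> bool:
    """Return True if the value stays above threshold for hold_seconds.

    samples: (unix_timestamp_seconds, value) pairs (hypothetical format).
    Any sample at or below the threshold resets the timer, which is what
    keeps a flapping metric from firing.
    """
    breach_start = None
    for ts, value in sorted(samples):
        if value > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= hold_seconds:
                return True
        else:
            breach_start = None
    return False


# One sample per minute (assumed scrape interval).
sustained = [(t, 900) for t in range(0, 1860, 60)]
short_spike = [(t, 900 if t < 600 else 500) for t in range(0, 2400, 60)]
flapping = [(t, 700 if t == 1200 else 900) for t in range(0, 2460, 60)]

print(should_fire(sustained), should_fire(short_spike), should_fire(flapping))
```

The reset on every non-breaching sample is the design choice that trades detection latency for fewer flaps.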

Thu, Nov 27

RKemper added a comment to T410406: Racking request for wdqs10(2[8-9]|3[0-2]).

@bking Did you have any luck with reimage? or do you need any assistance?

Thu, Nov 27, 8:42 AM · Essential-Work, SRE, Data-Platform-SRE (2025.11.07 - 2025.11.28), ops-eqiad, DC-Ops, Wikidata, Wikidata-Query-Service
RKemper added a comment to T410573: October 2025 Bullseye reboots: Search Platform-owned hosts.

@RKemper There's still a missed host: cirrussearch2084 is marked as fixed, but it is on an old kernel and has an uptime of 215 days

Thu, Nov 27, 8:30 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Essential-Work, SecTeam-Processed, SRE-swift-storage, Infrastructure Security, SRE, Security

Wed, Nov 26

RKemper added a comment to T410406: Racking request for wdqs10(2[8-9]|3[0-2]).

Looks like wdqs1032 and wdqs1029 at minimum might need another reimage

Wed, Nov 26, 7:57 PM · Essential-Work, SRE, Data-Platform-SRE (2025.11.07 - 2025.11.28), ops-eqiad, DC-Ops, Wikidata, Wikidata-Query-Service
RKemper updated the task description for T410573: October 2025 Bullseye reboots: Search Platform-owned hosts.
Wed, Nov 26, 7:34 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Essential-Work, SecTeam-Processed, SRE-swift-storage, Infrastructure Security, SRE, Security

Wed, Nov 19

RKemper added a comment to T362114: OpenSearch on K8s: Create Dashboards.

We can see the newly indexed documents here

Wed, Nov 19, 8:14 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), OKR-Work

Tue, Nov 11

RKemper closed T407406: Add https://query.wikitrek.org/sparql to WDQS allowlist as Resolved.
Tue, Nov 11, 9:17 AM · Essential-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Wikidata, Wikidata-Query-Service
RKemper closed T407405: Add https://ld.ncl.edu.tw/fuseki/lod/query to WDQS allowlist as Resolved.
Tue, Nov 11, 9:17 AM · Essential-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Wikimedia Taiwan, Wikidata, Wikidata-Query-Service
RKemper closed T407406: Add https://query.wikitrek.org/sparql to WDQS allowlist, a subtask of T402892: ☎️ Wikidata Allowlist nominations: manual discovery, as Resolved.
Tue, Nov 11, 9:17 AM · Wikibase Cloud
RKemper closed T407405: Add https://ld.ncl.edu.tw/fuseki/lod/query to WDQS allowlist, a subtask of T402892: ☎️ Wikidata Allowlist nominations: manual discovery, as Resolved.
Tue, Nov 11, 9:17 AM · Wikibase Cloud
RKemper closed T407381: Add https://beta.sparql.swisslipids.org/sparql to WDQS allowlist as Resolved.
Tue, Nov 11, 9:17 AM · Essential-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Wikidata, Wikidata-Query-Service
RKemper closed T407381: Add https://beta.sparql.swisslipids.org/sparql to WDQS allowlist, a subtask of T402892: ☎️ Wikidata Allowlist nominations: manual discovery, as Resolved.
Tue, Nov 11, 9:17 AM · Wikibase Cloud
RKemper closed T407382: Add https://dati.cultura.gov.it/sparql to WDQS allowlist as Resolved.
Tue, Nov 11, 9:17 AM · Essential-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Wikidata, Wikidata-Query-Service
RKemper closed T407382: Add https://dati.cultura.gov.it/sparql to WDQS allowlist, a subtask of T402892: ☎️ Wikidata Allowlist nominations: manual discovery, as Resolved.
Tue, Nov 11, 9:17 AM · Wikibase Cloud
RKemper added a comment to T362114: OpenSearch on K8s: Create Dashboards.

It's a little difficult to work on the new dashboard with the clusters not being used; we should probably run some test queries so we have some more data plumbing through. For example, one of the most important things we want is a measure of QPS, which presumably we can get via search rate, but that metric currently has no data, so I'm not totally sure.

Tue, Nov 11, 4:15 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), OKR-Work
RKemper reassigned T392222: Create ops-focused OpenSearch dashboard from RKemper to bking.
Tue, Nov 11, 4:13 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Essential-Work
RKemper claimed T392222: Create ops-focused OpenSearch dashboard.
Tue, Nov 11, 4:10 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Essential-Work

Nov 5 2025

RKemper added a comment to T390860: Elasticsearch dependency upgrade in spicerack.

@elukey Brian and I just tested out a couple of operations, and everything looks good. I think we're ready for the full release.

Nov 5 2025, 10:24 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Data-Engineering-Radar, Discovery-Search, Data-Engineering, Infrastructure-Foundations

Nov 4 2025

RKemper closed T407378: Add https://data.muziekweb.nl/MuziekwebOrganization/Muziekweb/sparql/Muziekweb to WDQS allowlist as Resolved.
Nov 4 2025, 7:47 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Wikidata, Wikidata-Query-Service
RKemper closed T407378: Add https://data.muziekweb.nl/MuziekwebOrganization/Muziekweb/sparql/Muziekweb to WDQS allowlist, a subtask of T402892: ☎️ Wikidata Allowlist nominations: manual discovery, as Resolved.
Nov 4 2025, 7:47 PM · Wikibase Cloud
RKemper closed T407380: Add https://www.performing-arts.ch/sparql to WDQS allowlist as Resolved.
Nov 4 2025, 7:45 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Wikidata, Wikidata-Query-Service
RKemper closed T407380: Add https://www.performing-arts.ch/sparql to WDQS allowlist, a subtask of T402892: ☎️ Wikidata Allowlist nominations: manual discovery, as Resolved.
Nov 4 2025, 7:45 PM · Wikibase Cloud
RKemper closed T407412: Replace http://agrovoc.uniroma2.it/sparql with https://agrovoc.fao.org/sparql in WDQS allowlist as Resolved.
Nov 4 2025, 7:45 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Wikidata, Wikidata-Query-Service
RKemper closed T407412: Replace http://agrovoc.uniroma2.it/sparql with https://agrovoc.fao.org/sparql in WDQS allowlist, a subtask of T402892: ☎️ Wikidata Allowlist nominations: manual discovery, as Resolved.
Nov 4 2025, 7:45 PM · Wikibase Cloud

Nov 3 2025

RKemper added a comment to T389859: WDQS: Alert on high thread count.

We could alert when this metric hits >1000 or maybe >1500

Nov 3 2025, 10:41 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Essential-Work
RKemper renamed T407406: Add https://query.wikitrek.org/sparql to WDQS allowlist from Add https://query.wikitrek.org/sparql] to WDQS allowlist to Add https://query.wikitrek.org/sparql to WDQS allowlist.
Nov 3 2025, 10:12 PM · Essential-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Wikidata, Wikidata-Query-Service
RKemper claimed T407406: Add https://query.wikitrek.org/sparql to WDQS allowlist.
Nov 3 2025, 10:12 PM · Essential-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Wikidata, Wikidata-Query-Service
RKemper claimed T407405: Add https://ld.ncl.edu.tw/fuseki/lod/query to WDQS allowlist.
Nov 3 2025, 10:12 PM · Essential-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Wikimedia Taiwan, Wikidata, Wikidata-Query-Service
RKemper claimed T407382: Add https://dati.cultura.gov.it/sparql to WDQS allowlist.
Nov 3 2025, 10:12 PM · Essential-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Wikidata, Wikidata-Query-Service
RKemper added a project to T407381: Add https://beta.sparql.swisslipids.org/sparql to WDQS allowlist: Data-Platform-SRE (2025.10.17 - 2025.11.07).
Nov 3 2025, 10:11 PM · Essential-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Wikidata, Wikidata-Query-Service
RKemper claimed T407381: Add https://beta.sparql.swisslipids.org/sparql to WDQS allowlist.
Nov 3 2025, 10:11 PM · Essential-Work, Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Wikidata, Wikidata-Query-Service

Oct 31 2025

RKemper claimed T362114: OpenSearch on K8s: Create Dashboards.
Oct 31 2025, 9:05 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), OKR-Work
RKemper added a comment to T362114: OpenSearch on K8s: Create Dashboards.

Discussed this with Brian. First step for me is to take a look at https://grafana-rw.wikimedia.org/d/c0a89788-c6fe-4d06-aeb2-70b63049599e/opensearch-on-k8s?orgId=1&from=now-7d&to=now&timezone=browser&var-datasource=P0AF0B00C3C579A2D&var-interval=1m&var-cluster=opensearch-test&var-node=$__all&var-shard_type=$__all&var-pool_name=$__all, pull out the most directly useful panels and create a new section at the top with those concise metrics (we'll still have the auto-generated stuff below it).

Oct 31 2025, 9:02 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), OKR-Work

Oct 29 2025

RKemper updated subscribers of T393966: Update WDQS SLO lag queries to reflect graph split changes.

@dcausse In this updated version of the SLI we don't want to count throttled requests as either a success or failure, but rather exclude them entirely. However I'm having a bit of trouble understanding how all the pieces fit together.

Oct 29 2025, 9:37 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, User-Elukey, Essential-Work, SRE-SLO, observability
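
The exclusion described in the comment above (throttled requests count as neither success nor failure) comes down to removing them from the denominator before computing the ratio. A minimal Python sketch of that arithmetic follows. The function name, parameter names, and sample numbers are hypothetical, and it assumes the failure count does not already include throttled requests; it is not the recording rule from the patch.

```python
def success_rate(total: int, failed: int, throttled: int) -> float:
    """Success ratio with throttled requests excluded entirely.

    Assumes `failed` does not include throttled requests (an assumption,
    not something stated in the task).
    """
    eligible = total - throttled
    if eligible <= 0:
        # Assumption: no eligible traffic is treated as meeting the objective.
        return 1.0
    return (eligible - failed) / eligible


# Hypothetical numbers: 1000 requests, 10 failures, 100 throttled.
print(success_rate(total=1000, failed=10, throttled=100))  # 890 / 900, roughly 0.989
```

With the same hypothetical numbers, counting throttled requests as failures would give 890 / 1000 = 0.89, and counting them as successes would give 990 / 1000 = 0.99, so the choice visibly moves the SLI.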
RKemper closed T408063: doing a wdqs categories transfer w/ --force flag wipes out /srv/wdqs/data_loaded on dest host as Resolved.
Oct 29 2025, 8:46 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07)
RKemper closed T408163: wdqs data-transfer: make the --force behavior default as Resolved.
Oct 29 2025, 8:45 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07)
RKemper closed T408165: Requesting Kerberos access for Jmoore111, a subtask of T408164: Requesting access to Superset, Turnilo, Spark, Presto, Hive, Hadoop, Jupyter for Jmoore111, as Resolved.
Oct 29 2025, 8:37 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), SRE, SRE-Access-Requests
RKemper closed T408165: Requesting Kerberos access for Jmoore111 as Resolved.

Okay, we verified that the kerberos principal is set up and Justin can kinit successfully.

Oct 29 2025, 8:37 PM · SRE, SRE-Access-Requests, Data-Engineering-Radar, Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering
RKemper added a comment to T408165: Requesting Kerberos access for Jmoore111.

Oops, needed to have made it for jmoore111. I deleted the old principal and recreated with the proper name:

Oct 29 2025, 8:30 PM · SRE, SRE-Access-Requests, Data-Engineering-Radar, Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering
RKemper added a comment to T408165: Requesting Kerberos access for Jmoore111.

Configured a kerberos principal (hopefully I was supposed to do that in this ticket and not a separate request):

Oct 29 2025, 8:20 PM · SRE, SRE-Access-Requests, Data-Engineering-Radar, Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Data-Engineering
RKemper added a comment to T393966: Update WDQS SLO lag queries to reflect graph split changes.

Working on the new metrics here. The panel labeled success rate is what will ultimately be the SLI. There are still a couple of further changes to make:

  • merge the two datacenters' metrics (they're just separated right now while the final query is getting assembled)
  • subtract throttled requests
Oct 29 2025, 7:32 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, User-Elukey, Essential-Work, SRE-SLO, observability

Oct 28 2025

RKemper closed T372094: WDQS: Document review/refresh for https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service as Resolved.
Oct 28 2025, 7:45 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Documentation, Wikidata, Wikidata-Query-Service
RKemper closed T372094: WDQS: Document review/refresh for https://wikitech.wikimedia.org/wiki/Wikidata_Query_Service, a subtask of T337013: [Epic] Splitting the graph in WDQS, as Resolved.
Oct 28 2025, 7:45 PM · Discovery-Search, Epic, Wikidata-Query-Service, Wikidata

Oct 27 2025

RKemper added a comment to T407378: Add https://data.muziekweb.nl/MuziekwebOrganization/Muziekweb/sparql/Muziekweb to WDQS allowlist.

Direct URL seems to be https://data.muziekweb.nl/_api/datasets/MuziekwebOrganization/Muziekweb/services/Muziekweb/sparql

Oct 27 2025, 9:18 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Wikidata, Wikidata-Query-Service
RKemper added a comment to T407380: Add https://www.performing-arts.ch/sparql to WDQS allowlist.

When I make a request via the UI it ends up going to https://www.performing-arts.ch/sparql?repository=default, so I'll try that URL

Oct 27 2025, 9:17 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Wikidata, Wikidata-Query-Service
RKemper claimed T407378: Add https://data.muziekweb.nl/MuziekwebOrganization/Muziekweb/sparql/Muziekweb to WDQS allowlist.
Oct 27 2025, 9:16 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Wikidata, Wikidata-Query-Service
RKemper claimed T407380: Add https://www.performing-arts.ch/sparql to WDQS allowlist.
Oct 27 2025, 9:16 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Wikidata, Wikidata-Query-Service
RKemper added a project to T407412: Replace http://agrovoc.uniroma2.it/sparql with https://agrovoc.fao.org/sparql in WDQS allowlist: Data-Platform-SRE (2025.10.17 - 2025.11.07).
Oct 27 2025, 9:09 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Wikidata, Wikidata-Query-Service
RKemper claimed T407412: Replace http://agrovoc.uniroma2.it/sparql with https://agrovoc.fao.org/sparql in WDQS allowlist.
Oct 27 2025, 9:09 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), Wikidata, Wikidata-Query-Service

Oct 24 2025

RKemper added a comment to T406920: deepcategory search fails to show all expected results.

@dcausse Here's the current state now, looks like everything's synced up:

Oct 24 2025, 7:06 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), CirrusSearch, Commons
RKemper moved T408163: wdqs data-transfer: make the --force behavior default from Backlog - project to Needs Review on the Data-Platform-SRE (2025.10.17 - 2025.11.07) board.
Oct 24 2025, 7:04 AM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07)

Oct 23 2025

RKemper created T408163: wdqs data-transfer: make the --force behavior default.
Oct 23 2025, 9:28 PM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07)
RKemper added a comment to T393966: Update WDQS SLO lag queries to reflect graph split changes.

Met with rzl.

Oct 23 2025, 8:20 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, User-Elukey, Essential-Work, SRE-SLO, observability
RKemper added a comment to T389859: WDQS: Alert on high thread count.

With respect to the thread-count approach, here's an interesting graph of a recent deadlock: https://grafana-rw.wikimedia.org/d/000000489/wikidata-query-service?orgId=1&refresh=1m&var-cluster_name=wdqs-main&from=2025-10-22T20:47:05.347Z&to=2025-10-23T08:26:43.628Z&timezone=utc&var-graph_type=%289102%7C919%5B35%5D%29&viewPanel=panel-22

Oct 23 2025, 8:28 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Essential-Work
RKemper moved T408063: doing a wdqs categories transfer w/ --force flag wipes out /srv/wdqs/data_loaded on dest host from Backlog - project to Needs Review on the Data-Platform-SRE (2025.10.17 - 2025.11.07) board.
Oct 23 2025, 7:46 AM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07)
RKemper added a comment to T408063: doing a wdqs categories transfer w/ --force flag wipes out /srv/wdqs/data_loaded on dest host.

Confirmed the patch fixes the issue:

Oct 23 2025, 7:45 AM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07)
RKemper added a comment to T389859: WDQS: Alert on high thread count.

Uploaded a proposed solution that alerts if the triples count metric is missing. Blazegraph deadlock always leads to the triple count metric missing.

Oct 23 2025, 7:13 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, Essential-Work
RKemper created T408063: doing a wdqs categories transfer w/ --force flag wipes out /srv/wdqs/data_loaded on dest host.
Oct 23 2025, 7:02 AM · Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07)

Oct 22 2025

RKemper created T408026: Add WDQS triples disrepancy alerting.
Oct 22 2025, 9:12 PM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Essential-Work, Wikidata, Wikidata-Query-Service
RKemper added a comment to T390860: Elasticsearch dependency upgrade in spicerack.

Looks like Brian already covered it, but just to reiterate, the one cumin test host approach sounds good. We'll just need to exercise a few of the codepaths. I'd probably start with something simple, like hitting the flush synced shards method, and then, if that works, move on to the cookbooks that Brian mentioned, which rely indirectly on the elasticsearch spicerack library.

Oct 22 2025, 5:32 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Data-Engineering-Radar, Discovery-Search, Data-Engineering, Infrastructure-Foundations
RKemper added a comment to T393966: Update WDQS SLO lag queries to reflect graph split changes.

@Gehel @RKemper Hi! A while ago I had a chat with Ryan to figure out how to improve the current WDQS SLOs being reported in Pyrra (slo.wikimedia.org). The traffic server's metrics are used, ending up with one SLO for each DC. We'd prefer to use something closer to the service, like nginx metrics on the wdqs hosts (and possibly having a single SLO without splitting by DCs, since this is an active/active service). Are those SLOs used at the moment? Namely, are they periodically checked etc.? Otherwise I'd propose to remove their config to clean up the Pyrra's status, and then restart when you are ready. Lemme know!

Oct 22 2025, 6:39 AM · Data-Platform-SRE (2025.11.07 - 2025.11.28), Patch-For-Review, User-Elukey, Essential-Work, SRE-SLO, observability

Oct 21 2025

RKemper moved T403036: Add query.portal.mardi4nfdi.de to WDQS allowlist from Blocked/Waiting to Done on the Data-Platform-SRE (2025.10.17 - 2025.11.07) board.

Fixed the query: https://query.wikidata.org/#SELECT%20%3Fs%20%3Fp%20%3Fo%20%7B%0A%20%20SERVICE%20%3Chttps%3A%2F%2Fquery.portal.mardi4nfdi.de%2Fsparql%3E%20%7B%0A%20%20%20%20SELECT%20%2a%20%7B%0A%20%20%20%20%20%20%3Fs%20%3Fp%20%3Fo%0A%20%20%20%20%7D%20LIMIT%2010%0A%20%20%7D%0A%7D

Oct 21 2025, 9:09 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Wikidata, Wikidata-Query-Service
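
The fixed query link above and the failing link from the Oct 7 entry on the same task differ only in where LIMIT sits: outside the SERVICE block in the failing query, inside a subquery within it in the fixed one. A minimal Python sketch that decodes both URL fragments back into SPARQL makes the difference visible. The two fragments are copied from the links in this feed; the helper names are hypothetical, and the sketch only decodes text, it does not contact any endpoint.

```python
from urllib.parse import unquote

# Fragment (everything after "#") of the failing link from the Oct 7 entry.
FAILING_FRAGMENT = (
    "SELECT%20%3Fs%20%3Fp%20%3Fo%20%7B%0A%20%20SERVICE%20%3Chttps%3A%2F%2F"
    "query.portal.mardi4nfdi.de%2Fsparql%3E%20%7B%0A%20%20%20%20%3Fs%20%3Fp%20%3Fo"
    "%0A%20%20%7D%0A%7D%0ALIMIT%2010"
)

# Fragment of the fixed link from the Oct 21 entry.
FIXED_FRAGMENT = (
    "SELECT%20%3Fs%20%3Fp%20%3Fo%20%7B%0A%20%20SERVICE%20%3Chttps%3A%2F%2F"
    "query.portal.mardi4nfdi.de%2Fsparql%3E%20%7B%0A%20%20%20%20SELECT%20%2a%20%7B"
    "%0A%20%20%20%20%20%20%3Fs%20%3Fp%20%3Fo%0A%20%20%20%20%7D%20LIMIT%2010"
    "%0A%20%20%7D%0A%7D"
)


def decode_query(fragment: str) -> str:
    """Percent-decode a query.wikidata.org URL fragment into SPARQL text."""
    return unquote(fragment)


def limit_inside_braces(query: str) -> bool:
    """True if LIMIT appears before the final closing brace of the query."""
    return query.index("LIMIT") < query.rindex("}")


failing_query = decode_query(FAILING_FRAGMENT)
fixed_query = decode_query(FIXED_FRAGMENT)

print(fixed_query)
print(limit_inside_braces(failing_query), limit_inside_braces(fixed_query))
```

A plausible reading, which is an assumption rather than something stated in the task, is that a LIMIT inside the SERVICE subquery is applied by the remote endpoint, whereas the outer LIMIT is only applied after the federated results come back.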
RKemper moved T401919: Also add RKD schema.org Knowledge Graph to the WDQS allowlist from Blocked/Waiting to Done on the Data-Platform-SRE (2025.10.17 - 2025.11.07) board.

derp, turns out it was just the limit order! the following query works:

Oct 21 2025, 9:07 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Wikibase Cloud, Wikidata, Wikidata-Query-Service

Oct 20 2025

RKemper updated subscribers of T401919: Also add RKD schema.org Knowledge Graph to the WDQS allowlist.

@dcausse any guesses why this federation isn't working here?

Oct 20 2025, 10:05 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Wikibase Cloud, Wikidata, Wikidata-Query-Service
RKemper added a comment to T406920: deepcategory search fails to show all expected results.

Sounds like some initial investigation has been done by the team and the issue seems to be localized to codfw hosts. We can likely fix the issue by doing data transfers of the categories graph from an eqiad host to all the codfw hosts, but we may want to take a closer look before doing so in case we miss an opportunity to identify a bug in our process.

Oct 20 2025, 9:54 PM · Discovery-Search (2025.10.20 - 2025.12.31), Essential-Work, Data-Platform-SRE (2025.10.17 - 2025.11.07), CirrusSearch, Commons

Oct 10 2025

RKemper moved T405978: Re-image remaining full graph hosts to post-graph-split roles from In Progress to Done on the Data-Platform-SRE (2025.09.26 - 2025.10.17) board.

This is done.

Oct 10 2025, 6:35 AM · Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work, Wikidata-Query-Service, Wikidata

Oct 8 2025

RKemper added a comment to T405978: Re-image remaining full graph hosts to post-graph-split roles.

Sigh, had host all ready for the data-transfer and then ran the reimage by mistake. Probably my sign to log off for the night :) This host will need a scap deploy and data transfer when done

Oct 8 2025, 4:57 AM · Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work, Wikidata-Query-Service, Wikidata
RKemper added a comment to T405978: Re-image remaining full graph hosts to post-graph-split roles.

^ oops that should say wdqs-main host not wdqs-internal-main

Oct 8 2025, 3:54 AM · Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work, Wikidata-Query-Service, Wikidata
RKemper added a comment to T405978: Re-image remaining full graph hosts to post-graph-split roles.

wdqs1018 has been reimaged and scap-deployed. data-transfer in progress

Oct 8 2025, 3:40 AM · Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work, Wikidata-Query-Service, Wikidata

Oct 7 2025

RKemper moved T403036: Add query.portal.mardi4nfdi.de to WDQS allowlist from Done to Blocked/Waiting on the Data-Platform-SRE (2025.09.26 - 2025.10.17) board.

Oops, updated the wrong ticket. This is failing with upstream request timeout: https://query.wikidata.org/#SELECT%20%3Fs%20%3Fp%20%3Fo%20%7B%0A%20%20SERVICE%20%3Chttps%3A%2F%2Fquery.portal.mardi4nfdi.de%2Fsparql%3E%20%7B%0A%20%20%20%20%3Fs%20%3Fp%20%3Fo%0A%20%20%7D%0A%7D%0ALIMIT%2010

Oct 7 2025, 9:56 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Wikidata, Wikidata-Query-Service
RKemper moved T402905: Add Artsdata to WDQS allowlist from Blocked/Waiting to Done on the Data-Platform-SRE (2025.09.26 - 2025.10.17) board.

The federation seems to be failing. Maybe the https://kg.artsdata.ca/sparql URL is wrong? I'm seeing it be redirected to https://artsdata-trifid-production.herokuapp.com/sparql

Looks like we'll want https://artsdata-trifid-production.herokuapp.com/query

Oct 7 2025, 9:55 PM · Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work, Wikidata, Wikidata-Query-Service
RKemper added a comment to T401919: Also add RKD schema.org Knowledge Graph to the WDQS allowlist.

Based on a request inspected via browser tools -> network, I think the correct URL is going to be https://rkd.triply.cc/_api/datasets/rkd/RKD-SDO-Knowledge-Graph/sparql

Oct 7 2025, 9:52 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Wikibase Cloud, Wikidata, Wikidata-Query-Service
RKemper moved T403036: Add query.portal.mardi4nfdi.de to WDQS allowlist from Blocked/Waiting to Done on the Data-Platform-SRE (2025.09.26 - 2025.10.17) board.

EDIT: Ignore the below, mixed up tickets!

Oct 7 2025, 9:24 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Wikidata, Wikidata-Query-Service
RKemper added a comment to T390860: Elasticsearch dependency upgrade in spicerack.

Alright, we've got tests passing and it looks like we're ready to merge! It's been a while since I've merged a new spicerack version, are there still a bunch of manual steps that you need to run @elukey or is it pretty hands-off?

Oct 7 2025, 8:05 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Data-Engineering-Radar, Discovery-Search, Data-Engineering, Infrastructure-Foundations
RKemper claimed T392622: ProbeDown - query.wikidata.org.
Oct 7 2025, 6:44 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Wikidata, Wikidata-Query-Service

Oct 6 2025

RKemper added a comment to T402905: Add Artsdata to WDQS allowlist.

The federation seems to be failing. Maybe the https://kg.artsdata.ca/sparql URL is wrong? I'm seeing it be redirected to https://artsdata-trifid-production.herokuapp.com/sparql

Oct 6 2025, 9:19 PM · Data-Platform-SRE (2025.09.26 - 2025.10.17), Essential-Work, Wikidata, Wikidata-Query-Service
RKemper added a comment to T401919: Also add RKD schema.org Knowledge Graph to the WDQS allowlist.
Oct 6 2025, 9:17 PM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Wikibase Cloud, Wikidata, Wikidata-Query-Service

Sep 23 2025

RKemper closed T395772: Teardown lvs for wdqs public pool as Resolved.

Removed via sudo -E cumin 'A:config-master' 'rm -fv /srv/config-master/pybal/*/wdqs':

Sep 23 2025, 7:24 PM · Essential-Work, Data-Platform-SRE (2025.09.05 - 2025.09.26), Epic, Wikidata-Query-Service, Wikidata
RKemper closed T395772: Teardown lvs for wdqs public pool, a subtask of T337013: [Epic] Splitting the graph in WDQS, as Resolved.
Sep 23 2025, 7:24 PM · Discovery-Search, Epic, Wikidata-Query-Service, Wikidata
RKemper added a comment to T395772: Teardown lvs for wdqs public pool.
Sep 23 2025, 6:58 PM · Essential-Work, Data-Platform-SRE (2025.09.05 - 2025.09.26), Epic, Wikidata-Query-Service, Wikidata
RKemper created T405395: DPE SRE work to enable testing of Blazegraph alternatives.
Sep 23 2025, 6:32 PM · Essential-Work, Data-Platform-SRE (2025.09.26 - 2025.10.17), Wikidata-Query-Service, Wikidata

Sep 18 2025

RKemper added a comment to T390860: Elasticsearch dependency upgrade in spicerack.

Have pushed out various improvements to the code. Still much to do on the unit test side.

Sep 18 2025, 6:56 AM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Data-Engineering-Radar, Discovery-Search, Data-Engineering, Infrastructure-Foundations

Sep 12 2025

RKemper updated subscribers of T390860: Elasticsearch dependency upgrade in spicerack.

Current state:

Sep 12 2025, 4:13 AM · Data-Platform-SRE (2025.10.17 - 2025.11.07), Essential-Work, Data-Engineering-Radar, Discovery-Search, Data-Engineering, Infrastructure-Foundations

Sep 4 2025

RKemper claimed T403738: Apply Envoy updates to wcqs and wdqs hosts.
Sep 4 2025, 6:06 PM · Essential-Work, Data-Platform-SRE (2025.08.16 - 2025.09.05), SRE, serviceops, envoy
RKemper added a comment to T403738: Apply Envoy updates to wcqs and wdqs hosts.

Current status:

Sep 4 2025, 5:59 PM · Essential-Work, Data-Platform-SRE (2025.08.16 - 2025.09.05), SRE, serviceops, envoy

Sep 2 2025

RKemper closed T398820: Add RKD to WDQS allowlist as Resolved.
Sep 2 2025, 3:06 PM · Essential-Work, Data-Platform-SRE (2025.08.16 - 2025.09.05), Discovery-Search (2025.08.15 - 2025.09.05), Wikibase Cloud, Wikidata, Wikidata-Query-Service