Yesterday

RKemper updated subscribers of T361525: Degraded RAID on elastic2088.

Was able to get a puppet run on elastic2088, but since that run a couple hours ago the host is ssh unreachable (it hangs indefinitely). Seeing some concerning stuff in the drac via getsel on elastic2088.mgmt.codfw.wmnet:

Wed, Apr 17, 2:52 AM · ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05), Patch-For-Review

Mon, Apr 15

RKemper moved T338009: Create dashboards for Search SLOs from In Progress to Needs Review on the Data-Platform-SRE (2024.04.15 - 2024.05.05) board.

Added further context to the SLI section of the documentation explaining what each query type actually means. I believe there are no more outstanding TODOs on this task.

Mon, Apr 15, 6:18 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Discovery-Search (Current work)
RKemper added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

@RKemper Is your point that the queries should return a result? Neither DBLP nor Wikidata has the predicate foaf:name, so it's clear that both SERVICE queries return an empty result. Here is an example of a query that gives a result:

PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?editor ?editorName
WHERE {
  SERVICE <https://qlever.cs.uni-freiburg.de/api/wikidata> {
    wd:Q113544723 wdt:P179 ?editor.
    ?editor schema:name ?editorName.
  }
}
Mon, Apr 15, 5:40 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata
RKemper added a comment to T362534: Make Elasticsearch rolling operation cookbook safer: default to one node per run.

I vote that we choose the default based on the cluster being operated on. So 3 for eqiad and codfw, 2 for cloudelastic and 1 for relforge.

Mon, Apr 15, 5:25 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05)
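
The per-cluster default proposed in the T362534 comment above could be sketched roughly as follows. This is only an illustration and not the cookbook's actual code: the names `DEFAULT_NODES_PER_RUN` and `nodes_per_run`, the `override` parameter, and the fallback of 1 for unknown clusters are all assumptions.

```python
from typing import Optional

# Values taken from the comment: 3 for eqiad and codfw,
# 2 for cloudelastic, 1 for relforge. The dict name is hypothetical.
DEFAULT_NODES_PER_RUN = {
    "eqiad": 3,
    "codfw": 3,
    "cloudelastic": 2,
    "relforge": 1,
}


def nodes_per_run(cluster: str, override: Optional[int] = None) -> int:
    """Return how many nodes to operate on per run for a cluster.

    An explicit override wins. Unknown clusters fall back to 1, which is
    an assumption chosen to match the "one node per run" safety goal.
    """
    if override is not None:
        if override < 1:
            raise ValueError("nodes per run must be at least 1")
        return override
    return DEFAULT_NODES_PER_RUN.get(cluster, 1)


if __name__ == "__main__":
    for name in ("eqiad", "codfw", "cloudelastic", "relforge"):
        print(name, nodes_per_run(name))
```

Keeping the defaults in a single mapping means an operator-supplied value can still take precedence while each cluster gets a safe starting point.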

Mon, Apr 8

RKemper moved T362080: decommission wdqs1025 from Backlog to Done on the Data-Platform-SRE (2024.03.25 - 2024.04.14) board.

Created subtask for dc-ops' side of the decom. Resolving this parent ticket now.

Mon, Apr 8, 11:03 PM · Patch-For-Review, Data-Platform-SRE (2024.03.25 - 2024.04.14)
RKemper added a subtask for T362080: decommission wdqs1025: T362122: decommission wdqs1025.eqiad.wmnet.
Mon, Apr 8, 11:01 PM · Patch-For-Review, Data-Platform-SRE (2024.03.25 - 2024.04.14)
RKemper added a parent task for T362122: decommission wdqs1025.eqiad.wmnet: T362080: decommission wdqs1025.
Mon, Apr 8, 11:01 PM · ops-eqiad, SRE, decommission-hardware
RKemper created T362122: decommission wdqs1025.eqiad.wmnet.
Mon, Apr 8, 11:01 PM · ops-eqiad, SRE, decommission-hardware

Thu, Apr 4

RKemper moved T361525: Degraded RAID on elastic2088 from Backlog to Blocked / Waiting on the Data-Platform-SRE (2024.03.25 - 2024.04.14) board.
Thu, Apr 4, 6:59 PM · ops-codfw, Data-Platform-SRE (2024.04.15 - 2024.05.05), Patch-For-Review

Tue, Apr 2

RKemper closed T357533: Allow federated queries with the Iconclass sparql endpoint as Resolved.
Tue, Apr 2, 3:45 PM · Data-Platform-SRE, Wikidata, Wikidata-Query-Service
RKemper closed T358882: Decommission elastic2037-2054 as Resolved.

With dc-ops having closed out the decom subtask, this should be all done.

Tue, Apr 2, 6:12 AM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Patch-For-Review
RKemper closed T358882: Decommission elastic2037-2054, a subtask of T353878: Service implementation for elastic2087-2109, as Resolved.
Tue, Apr 2, 6:11 AM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Patch-For-Review
RKemper claimed T361268: Service implementation for elastic110[3-7].
Tue, Apr 2, 6:11 AM · Data-Platform-SRE

Thu, Mar 28

RKemper added a subtask for T358882: Decommission elastic2037-2054: T361305: decommission elastic20[37-54].codfw.wmnet.
Thu, Mar 28, 8:41 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Patch-For-Review
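
The decommission task titles in this feed use a bracketed range shorthand for host names (for example `elastic20[37-54].codfw.wmnet` for elastic2037-2054). A minimal sketch of how such a pattern expands, assuming a single numeric range per pattern; the helper `expand_host_range` is hypothetical and is not the tooling used by SRE:

```python
import re
from typing import List


def expand_host_range(pattern: str) -> List[str]:
    """Expand one bracketed numeric range such as 'wdqs100[6-8]'.

    Zero padding is preserved from the width of the range start, so
    'wdqs10[09-10]' yields 'wdqs1009' and 'wdqs1010'. Patterns without
    a bracketed range are returned unchanged.
    """
    match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", pattern)
    if match is None:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)
    return [
        f"{prefix}{number:0{width}d}{suffix}"
        for number in range(int(start), int(end) + 1)
    ]


if __name__ == "__main__":
    hosts = expand_host_range("elastic20[37-54].codfw.wmnet")
    print(len(hosts), hosts[0], hosts[-1])
```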
RKemper added a parent task for T361305: decommission elastic20[37-54].codfw.wmnet: T358882: Decommission elastic2037-2054.
Thu, Mar 28, 8:41 PM · SRE, ops-codfw, decommission-hardware
RKemper created T361305: decommission elastic20[37-54].codfw.wmnet.
Thu, Mar 28, 8:41 PM · SRE, ops-codfw, decommission-hardware
RKemper assigned T361286: Fatal error detected on elastic2088 to Papaul.
Thu, Mar 28, 8:02 PM · Patch-For-Review, SRE, ops-codfw, Data-Platform-SRE
RKemper added a comment to T353878: Service implementation for elastic2087-2109.

elastic2088 is unreachable and reported as missing from PuppetDB by the Netbox report. No host should be powered on with puppet disabled or not working for a longer period of time. Please either reimage it or shut it down now and reimage it at a later stage (before powering it on).

Thu, Mar 28, 6:42 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Patch-For-Review
RKemper updated the task description for T361268: Service implementation for elastic110[3-7].
Thu, Mar 28, 5:56 PM · Data-Platform-SRE
RKemper added a comment to T358046: decommission cloudelastic100[1-4].wikimedia.org.

@RKemper Thanks for bringing this up! I missed running the script for this device. It's been run and decommissioned.

Thu, Mar 28, 5:54 PM · SRE, ops-eqiad, decommission-hardware
RKemper created T361268: Service implementation for elastic110[3-7].
Thu, Mar 28, 5:53 PM · Data-Platform-SRE
RKemper reopened T358046: decommission cloudelastic100[1-4].wikimedia.org as "In Progress".

@VRiley-WMF In netbox I see cloudelastic1003 still listed as decommissioning, whereas the other cloudelastic hosts are marked as Offline. Is it just the step to set netbox status to Offline that we're missing or are there other steps that still need to be run on cloudelastic1003 as well?

Thu, Mar 28, 5:40 PM · SRE, ops-eqiad, decommission-hardware

Thu, Mar 21

RKemper added a comment to T360697: Investigate/fix broken apifeatureusage index deletion.

Glancing at ryankemper@apifeatureusage1001:~$ sudo journalctl -u curator_actions_apifeatureusage_eqiad:

Thu, Mar 21, 8:46 PM · Discovery-Search, Data-Platform-SRE

Wed, Mar 20

RKemper added a comment to T353845: decommission wdqs100[6-8].

Had forgotten to properly assign dc-ops as well as tag for the DC. Straightened that out now, so this should be ready for dc-ops to do the decom.

Wed, Mar 20, 8:08 PM · SRE, ops-eqiad, decommission-hardware
RKemper assigned T353845: decommission wdqs100[6-8] to Jclark-ctr.
Wed, Mar 20, 8:07 PM · SRE, ops-eqiad, decommission-hardware

Mar 15 2024

RKemper updated the task description for T358841: Adapt gitlab pipelines for the new wmf-jvm-parent-pom.
Mar 15 2024, 6:53 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Release-Engineering-Team, Discovery-Search, Data-Engineering, Metrics Platform Backlog, Java-Scala-Standardization

Mar 13 2024

RKemper added a comment to T347034: RESTBase /v1/related endpoint should call the MW action API with a GET not a POST.

With https://github.com/wikimedia/restbase/pull/1336 being merged, is this ticket now resolved?

Mar 13 2024, 6:46 PM · API Platform, RESTBase Sunsetting, Essential-Work, Wikifeeds, Sustainability (Incident Followup), Discovery-Search

Mar 4 2024

RKemper updated the task description for T338009: Create dashboards for Search SLOs.
Mar 4 2024, 7:37 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Discovery-Search (Current work)

Mar 1 2024

RKemper moved T356651: Rebuild and deploy textify plugin from Needs Review to Done on the Data-Platform-SRE (2024.03.04 - 2024.03.24) board.

This should be all done; @TJones can you confirm all is working as it should?

Mar 1 2024, 9:42 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Discovery-Search (Current work)

Feb 29 2024

RKemper claimed T345337: spicerack: tox fails to install PyYAML using python 3.11 on bookworm.

@fnegri @brouberol Yeah, Brian and I will work on getting this tested and merged. Thanks for the heads up!

Feb 29 2024, 7:36 PM · cloud-services-team (FY2023/2024-Q3-Q4), Patch-For-Review, Infrastructure-Foundations, SRE-tools, Spicerack

Feb 21 2024

RKemper added a comment to T351488: Allow federated queries with the MiMoTextBase SPARQL endpoint.

@HinMar Okay, I think we've got the endpoints properly allowed. Queries appear to be working for me. Are you seeing the same?

Feb 21 2024, 10:45 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata, Wikidata-Query-Service
RKemper added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

@Hannah_Bast Okay, we figured out what was preventing the allowed endpoints from being updated properly. https://w.wiki/6q2i doesn't get an error message anymore, although the query itself returns no results.

Feb 21 2024, 10:43 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata
RKemper moved T346455: Allow federated queries with the NFDI4Culture Knowledge Graph from In Progress to Blocked / Waiting on the Data-Platform-SRE (2024.02.12 - 2024.03.03) board.
Feb 21 2024, 10:41 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata, Wikidata-Query-Service
RKemper added a comment to T346455: Allow federated queries with the NFDI4Culture Knowledge Graph.

@Loz.ross Yes the change to allowed endpoints did not get properly deployed; we've fixed that now. However there's another issue now that we've gotten past the Service URI not being allowed:

Feb 21 2024, 10:41 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata, Wikidata-Query-Service

Feb 20 2024

RKemper moved T357780: Decommission cloudelastic1001-1004 from Backlog to Done on the Data-Platform-SRE (2024.02.12 - 2024.03.03) board.
Feb 20 2024, 8:34 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03)
RKemper updated the task description for T358046: decommission cloudelastic100[1-4].wikimedia.org.
Feb 20 2024, 8:33 PM · SRE, ops-eqiad, decommission-hardware
RKemper updated the task description for T358046: decommission cloudelastic100[1-4].wikimedia.org.
Feb 20 2024, 8:25 PM · SRE, ops-eqiad, decommission-hardware
RKemper created T358046: decommission cloudelastic100[1-4].wikimedia.org.
Feb 20 2024, 8:24 PM · SRE, ops-eqiad, decommission-hardware
RKemper claimed T357533: Allow federated queries with the Iconclass sparql endpoint.
Feb 20 2024, 7:38 PM · Data-Platform-SRE, Wikidata, Wikidata-Query-Service

Feb 15 2024

RKemper added a comment to T356651: Rebuild and deploy textify plugin.

Finished the upload process; next up is rolling restart of cluster and merge of https://gitlab.wikimedia.org/repos/search-platform/cirrussearch-elasticsearch-image/-/merge_requests/7?commit_id=f1028a26dff38603bea67a8edda8337dab07bbfc

Feb 15 2024, 5:51 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03), Discovery-Search (Current work)
RKemper edited P19522 Plugin Upload Process.
Feb 15 2024, 5:15 PM · Discovery-Search (Current work)

Feb 6 2024

RKemper added a comment to T338009: Create dashboards for Search SLOs.

Added new threshold markers at 95% for the 4 SLO graphs. We may want to revise the % SLO upwards, but let's stick with 95% for now until we get another quarter of data.

Feb 6 2024, 7:46 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Discovery-Search (Current work)

Feb 1 2024

RKemper added a comment to T351488: Allow federated queries with the MiMoTextBase SPARQL endpoint.

@RKemper : Thank you for your message. The project has ended, but we still kindly ask you to whitelist this endpoint. We at the Trier Center for Digital Humanities will continue to work with LOD beyond this one project and would be delighted to be able to run federated queries starting from Wikidata directed towards the MiMoTextBase. We have developed an approach in MiMoText that we now want to transfer and adapt for other domains in a new project (“LODinG” – Linked Open Data in the Humanities). We are still very interested in the 'wikiverse' and in gaining as much experience as possible in the area of 'federation', which we see as the absolute key to the LOD vision. We are also planning to provide a showcase and if the whitelisting could be done relatively soon, we would like to include this new 'federation direction'. Can you estimate how long it will take?

Feb 1 2024, 7:44 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata, Wikidata-Query-Service
RKemper added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.

Yes, https://qlever.cs.uni-freiburg.de/api/dblp is the URL for API calls, whereas https://qlever.cs.uni-freiburg.de/dblp (without the /api) is the URL of the QLever UI. Same for all the other endpoints.

For example, https://qlever.cs.uni-freiburg.de/api/dblp?query=SELECT+%2A+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D+LIMIT+10 gives you the results for SELECT * WHERE { ?s ?p ?o } LIMIT 10 as application/sparql-results+json .

Feb 1 2024, 7:41 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata
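
The API URL in the T339347 comment above is the endpoint plus a form-encoded `query` parameter. A small sketch of building it with the Python standard library; the function name `build_query_url` is hypothetical, and no request is actually sent:

```python
from urllib.parse import urlencode

# Endpoint quoted in the comment (the /api path, not the QLever UI path).
QLEVER_DBLP_API = "https://qlever.cs.uni-freiburg.de/api/dblp"


def build_query_url(endpoint: str, sparql: str) -> str:
    """Return the GET URL for a SPARQL query against the given endpoint.

    urlencode() applies form encoding, so spaces become '+' and
    characters such as '*', '{', '?' and '}' are percent-encoded.
    """
    return f"{endpoint}?{urlencode({'query': sparql})}"


url = build_query_url(QLEVER_DBLP_API, "SELECT * WHERE { ?s ?p ?o } LIMIT 10")
print(url)
```

The printed URL should match the one quoted in the comment. To ask for `application/sparql-results+json` explicitly, a client would typically also send a matching `Accept` header.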

Jan 31 2024

RKemper added a comment to T346455: Allow federated queries with the NFDI4Culture Knowledge Graph.

@Loz.ross Sorry for the delay, we've added the endpoint. Can you confirm it's working with an example query?

Jan 31 2024, 10:22 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata, Wikidata-Query-Service
RKemper added a comment to T339347: qlever dblp endpoint for wikidata federated query nomination.
Jan 31 2024, 10:22 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata-Query-Service, Wikidata
RKemper added a comment to T351488: Allow federated queries with the MiMoTextBase SPARQL endpoint.

@HinMar Sorry for missing this request - our bad! I see your earlier comment mentioned the project expiring by end of 2023. Is the project still ongoing and therefore we should still whitelist this new endpoint or should I instead close this ticket out?

Jan 31 2024, 10:19 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Wikidata, Wikidata-Query-Service

Jan 30 2024

RKemper closed T355272: ProbeDown as Resolved.
Jan 30 2024, 4:54 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11)

Jan 25 2024

RKemper moved T351354: Service implementation for cloudelastic1007-1010 from In Progress to Blocked / Waiting on the Data-Platform-SRE (2024.01.22 - 2024.02.11) board.

Old masters are no longer master-eligible. They're still participating in the actual cluster; we're holding off on the physical decom until T355617 is done

Jan 25 2024, 10:48 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03)

Jan 24 2024

RKemper added a comment to T351354: Service implementation for cloudelastic1007-1010.

Forgot to add the Bug: label but https://gerrit.wikimedia.org/r/c/operations/puppet/+/992826 is part of this ticket as well

Jan 24 2024, 10:58 PM · Data-Platform-SRE (2024.02.12 - 2024.03.03)

Jan 23 2024

RKemper closed T355593: Re-generate webserver-misc-apps.discovery.wmnet cergen certificate, a subtask of T351650: Expose 3 new dedicated WDQS endpoints, as Resolved.
Jan 23 2024, 7:25 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper closed T355593: Re-generate webserver-misc-apps.discovery.wmnet cergen certificate as Resolved.
Jan 23 2024, 7:25 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), collaboration-services
RKemper moved T350464: Expose SPARQL endpoints with full wikidata data set and with split graph to enable experimentation on federation with a split graph from Blocked/Waiting to Needs Reporting on the Discovery-Search (Current work) board.

This should be all done, with the new experimental services accessible at:

Jan 23 2024, 5:19 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper updated the task description for T350464: Expose SPARQL endpoints with full wikidata data set and with split graph to enable experimentation on federation with a split graph.
Jan 23 2024, 5:17 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper moved T354658: Create 3 microsites for wdqs full graph, main graph, & scholarly articles from In Progress to Done on the Data-Platform-SRE (2024.01.22 - 2024.02.11) board.

Experimental microsites are up and externally reachable:

Jan 23 2024, 5:15 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper updated the task description for T354658: Create 3 microsites for wdqs full graph, main graph, & scholarly articles.
Jan 23 2024, 5:13 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper moved T355593: Re-generate webserver-misc-apps.discovery.wmnet cergen certificate from Backlog to Done on the Data-Platform-SRE (2024.01.22 - 2024.02.11) board.

We've rolled this out following the steps in https://wikitech.wikimedia.org/wiki/Cergen#Update_a_certificate

Jan 23 2024, 5:12 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), collaboration-services
RKemper moved T351650: Expose 3 new dedicated WDQS endpoints from In Progress to Done on the Data-Platform-SRE (2024.01.22 - 2024.02.11) board.
Jan 23 2024, 5:11 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper updated the task description for T351650: Expose 3 new dedicated WDQS endpoints.
Jan 23 2024, 5:11 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Jan 22 2024

RKemper added a subtask for T350464: Expose SPARQL endpoints with full wikidata data set and with split graph to enable experimentation on federation with a split graph: T355593: Re-generate webserver-misc-apps.discovery.wmnet cergen certificate.
Jan 22 2024, 8:02 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper added a parent task for T355593: Re-generate webserver-misc-apps.discovery.wmnet cergen certificate: T350464: Expose SPARQL endpoints with full wikidata data set and with split graph to enable experimentation on federation with a split graph.
Jan 22 2024, 8:02 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), collaboration-services
RKemper moved T354661: Generate TLS certs for new WDQS endpoints from In Progress to Done on the Data-Platform-SRE (2024.01.22 - 2024.02.11) board.

These 3 new services have their internal certs working with Envoy. Moving to Done and spun off https://phabricator.wikimedia.org/T355593 for the last cert-related work.

Jan 22 2024, 8:01 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata
RKemper created T355593: Re-generate webserver-misc-apps.discovery.wmnet cergen certificate.
Jan 22 2024, 7:59 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), collaboration-services
RKemper moved T347624: Refactor sre.wdqs.data-transfer to use new spicerack class api from In Progress to Needs Review on the Data-Platform-SRE (2024.01.22 - 2024.02.11) board.
Jan 22 2024, 7:47 PM · Data-Platform-SRE (2024.03.04 - 2024.03.24)
RKemper moved T354658: Create 3 microsites for wdqs full graph, main graph, & scholarly articles from Needs Review to In Progress on the Data-Platform-SRE (2024.01.22 - 2024.02.11) board.
Jan 22 2024, 7:44 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper moved T354662: Create DNS records for 3 new WDQS endpoints from In Progress to Done on the Data-Platform-SRE (2024.01.22 - 2024.02.11) board.
Jan 22 2024, 7:42 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata
RKemper updated the task description for T355589: Migrate Search SLOs to prometheus based metrics.
Jan 22 2024, 7:28 PM · Data-Platform-SRE, Discovery-Search
RKemper added a comment to T355589: Migrate Search SLOs to prometheus based metrics.

Should be moved to Blocked / Waiting. However for now I think I need to leave it in incoming until it's been triaged by the Search Platform team.

Jan 22 2024, 7:27 PM · Data-Platform-SRE, Discovery-Search
RKemper created T355589: Migrate Search SLOs to prometheus based metrics.
Jan 22 2024, 7:26 PM · Data-Platform-SRE, Discovery-Search
RKemper moved T338009: Create dashboards for Search SLOs from In Progress to Done on the Data-Platform-SRE (2024.01.22 - 2024.02.11) board.

Finished the documentation. With the new dashboard up in https://grafana-rw.wikimedia.org/d/xiWr1c5Iz/search-slos?orgId=1, this work is complete.

Jan 22 2024, 7:24 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Discovery-Search (Current work)

Jan 19 2024

RKemper added a comment to T355278: ProbeDown - *_experimental_wikidata_org.

Deployed https://gerrit.wikimedia.org/r/c/operations/puppet/+/991680 to back out these microsites over the weekend and cut down on noise

Jan 19 2024, 12:07 AM · collaboration-services

Jan 18 2024

RKemper added a comment to T354662: Create DNS records for 3 new WDQS endpoints.

Deployed the following changes via sudo -i authdns-update:

Jan 18 2024, 7:50 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata

Jan 16 2024

RKemper added a comment to T354658: Create 3 microsites for wdqs full graph, main graph, & scholarly articles.

We'll need to add 3 entries to https://gerrit.wikimedia.org/r/c/operations/puppet/+/668543/4/modules/profile/manifests/microsites/wdqs.pp following the model in https://gerrit.wikimedia.org/r/c/operations/puppet/+/668543/

Jan 16 2024, 8:01 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper added a comment to T354661: Generate TLS certs for new WDQS endpoints.

Talked with gehel, ebernhardson, and inflatador. We're going to start with full-experimental.query.wikidata.org, main-experimental.query.wikidata.org, scholarly-experimental.query.wikidata.org to get these 3 test endpoints up. Meanwhile, we can open up the convo with the community as far as what the ultimate "final" naming/domain scheme will be wrt https://phabricator.wikimedia.org/T354043

Jan 16 2024, 7:36 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata
RKemper added a comment to T354661: Generate TLS certs for new WDQS endpoints.

We did the initial work to get envoy via PKI / cfssl operational in https://phabricator.wikimedia.org/T354555#9454855. Next up is adding specific alt-names for the three new endpoints. Here are a few proposals for the naming scheme:

  • full.query.wikidata.org, main.query.wikidata.org, scholar.query.wikidata.org
  • full.wikidata.org, main.wikidata.org, scholar.wikidata.org
  • full-query.wikidata.org, main-query.wikidata.org, and scholarly-query.wikidata.org
  • full-query.wikidata.org, main-query.wikidata.org, and scholar.wikidata.org
  • full-graph.wikidata.org, main-graph.wikidata.org, and scholar-graph.wikidata.org
Jan 16 2024, 7:27 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata
RKemper added a comment to T338009: Create dashboards for Search SLOs.

Made several improvements to the dashboard: collated SLIs into a single row, added threshold markers for every SLI, added y-axis labelling, and added a soft max of 600ms to autocomplete latency, since Grafana was otherwise setting the y-axis max below 600 because no data points at or above 600ms exist.

Jan 16 2024, 6:49 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Discovery-Search (Current work)

Jan 10 2024

RKemper moved T350703: Restart Search Platform-owned services for Java 8 / Java 11 security updates from In Progress to Done on the Data-Platform-SRE (2024.01.01 - 2024.01.21) board.
Jan 10 2024, 6:21 PM · Data-Platform-SRE (2024.01.01 - 2024.01.21)
RKemper added a comment to T350703: Restart Search Platform-owned services for Java 8 / Java 11 security updates.

@MoritzMuehlenhoff Oops it appears we made the same mistake twice :P Can you do one more check for us? I think everything is all set now:

Jan 10 2024, 12:34 AM · Data-Platform-SRE (2024.01.01 - 2024.01.21)

Jan 9 2024

RKemper added a comment to T338009: Create dashboards for Search SLOs.

Finished adding the SLO dashboards to https://grafana-rw.wikimedia.org/d/H6f-bA7Sk/rkemper-search-sli-test?orgId=1&from=now-90d&to=now. Remaining steps:

Jan 9 2024, 7:25 PM · Data-Platform-SRE (2024.04.15 - 2024.05.05), Discovery-Search (Current work)
RKemper created T354662: Create DNS records for 3 new WDQS endpoints.
Jan 9 2024, 4:01 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata
RKemper created T354661: Generate TLS certs for new WDQS endpoints.
Jan 9 2024, 3:59 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata
RKemper removed a subtask for T350464: Expose SPARQL endpoints with full wikidata data set and with split graph to enable experimentation on federation with a split graph: T354658: Create 3 microsites for wdqs full graph, main graph, & scholarly articles.
Jan 9 2024, 3:51 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper added a subtask for T351650: Expose 3 new dedicated WDQS endpoints: T354658: Create 3 microsites for wdqs full graph, main graph, & scholarly articles.
Jan 9 2024, 3:51 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper edited parent tasks for T354658: Create 3 microsites for wdqs full graph, main graph, & scholarly articles, added: T351650: Expose 3 new dedicated WDQS endpoints; removed: T350464: Expose SPARQL endpoints with full wikidata data set and with split graph to enable experimentation on federation with a split graph.
Jan 9 2024, 3:51 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper updated the task description for T351650: Expose 3 new dedicated WDQS endpoints.
Jan 9 2024, 3:51 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper created T354658: Create 3 microsites for wdqs full graph, main graph, & scholarly articles.
Jan 9 2024, 3:50 PM · Data-Platform-SRE (2024.01.22 - 2024.02.11), Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jan 8 2024

RKemper updated the task description for T350464: Expose SPARQL endpoints with full wikidata data set and with split graph to enable experimentation on federation with a split graph.
Jan 8 2024, 5:11 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper added a comment to T353482: decommission wdqs10[09-10].eqiad.wmnet.

@Jclark-ctr Yes, these hosts are fully ready to be decom'd.

Jan 8 2024, 4:15 PM · SRE, ops-eqiad, decommission-hardware

Jan 4 2024

RKemper added a comment to T353878: Service implementation for elastic2087-2109.

elastic2087 has joined the cluster as a bullseye host. I haven't officially pooled it yet.

Jan 4 2024, 9:16 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Patch-For-Review
RKemper moved T350703: Restart Search Platform-owned services for Java 8 / Java 11 security updates from In Progress to Needs Review on the Data-Platform-SRE (2024.01.01 - 2024.01.21) board.

@MoritzMuehlenhoff This should be all done. Let us know if you see any rogue java processes hanging around!

Jan 4 2024, 8:51 PM · Data-Platform-SRE (2024.01.01 - 2024.01.21)
RKemper updated the task description for T350703: Restart Search Platform-owned services for Java 8 / Java 11 security updates.
Jan 4 2024, 8:37 PM · Data-Platform-SRE (2024.01.01 - 2024.01.21)

Jan 2 2024

RKemper updated the task description for T353878: Service implementation for elastic2087-2109.
Jan 2 2024, 8:05 PM · Data-Platform-SRE (2024.03.25 - 2024.04.14), Patch-For-Review

Dec 20 2023

RKemper updated subscribers of T351671: Service implementation for wdqs10[17-21].

After talking in the #wikimedia-sre IRC channel, I'll run the sre.network.configure-switch-interfaces cookbook myself, and then Volans will take care of the puppetdb/debmonitor stuff after seeing if the cookbook can be improved to handle those idempotently.

Dec 20 2023, 11:16 PM · Data-Platform-SRE (2023.12.01 - 2023.12.31)
RKemper moved T351671: Service implementation for wdqs10[17-21] from In Progress to Done on the Data-Platform-SRE (2023.12.01 - 2023.12.31) board.
Dec 20 2023, 10:23 PM · Data-Platform-SRE (2023.12.01 - 2023.12.31)
RKemper updated the task description for T351671: Service implementation for wdqs10[17-21].
Dec 20 2023, 10:23 PM · Data-Platform-SRE (2023.12.01 - 2023.12.31)
RKemper added a comment to T351671: Service implementation for wdqs10[17-21].

Decom cookbook ran: https://sal.toolforge.org/log/tSXrhYwBhuQtenzvzt4I

Dec 20 2023, 10:22 PM · Data-Platform-SRE (2023.12.01 - 2023.12.31)
RKemper added a comment to T353845: decommission wdqs100[6-8].

Decom cookbook ran: https://sal.toolforge.org/log/tSXrhYwBhuQtenzvzt4I

Dec 20 2023, 10:22 PM · SRE, ops-eqiad, decommission-hardware
RKemper added a subtask for T351671: Service implementation for wdqs10[17-21]: T353845: decommission wdqs100[6-8].
Dec 20 2023, 10:21 PM · Data-Platform-SRE (2023.12.01 - 2023.12.31)
RKemper added a parent task for T353845: decommission wdqs100[6-8]: T351671: Service implementation for wdqs10[17-21].
Dec 20 2023, 10:21 PM · SRE, ops-eqiad, decommission-hardware
RKemper created T353845: decommission wdqs100[6-8].
Dec 20 2023, 10:21 PM · SRE, ops-eqiad, decommission-hardware