Generated new cergen certs for wdqs.discovery.wmnet that include wdqs1016 in the alt_names instead of wdqs1005. Followed the steps below:
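As a sanity check after regeneration, the SAN list on the new cert can be inspected with openssl. The sketch below is illustrative only: it generates a throwaway self-signed cert with the expected SANs rather than using the actual cergen output, and the wdqs1016.eqiad.wmnet FQDN and /tmp paths are assumptions.

```shell
# Generate a throwaway self-signed cert carrying the expected SANs
# (illustrative only; the real cert comes from cergen).
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/wdqs-demo.key -out /tmp/wdqs-demo.crt \
  -subj '/CN=wdqs.discovery.wmnet' \
  -addext 'subjectAltName=DNS:wdqs.discovery.wmnet,DNS:wdqs1016.eqiad.wmnet' \
  2>/dev/null

# Confirm the new host appears (and the old one does not) in the SAN list
openssl x509 -in /tmp/wdqs-demo.crt -noout -ext subjectAltName
```

The same `openssl x509 -noout -ext subjectAltName` invocation works against the real cert file once it is deployed.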
Aug 31 2023
Aug 30 2023
Aug 29 2023
Aug 28 2023
Aug 17 2023
Some observations from last two patches, tested on wdqs2007 before reverting due to issues:
Aug 16 2023
Built wmf-elasticsearch-search-plugins_7.10.2-9 and wmf-elasticsearch-search-plugins_7.10.2-9~bullseye (https://apt.wikimedia.org/wikimedia/pool/thirdparty/elastic710/w/wmf-elasticsearch-search-plugins/); installed on all elastic* hosts (incl. relforge* and cloudelastic*). Rolling restarts not completed yet. relforge* can be restarted at any time, but elastic* and cloudelastic* must wait till after an ongoing reindex of all wikis has completed.
Will be blocked/waiting for a few days while a reindex of all wikis completes, in order to apply the newest settings.
Aug 15 2023
Aug 14 2023
Patch was merged here: https://gerrit.wikimedia.org/r/947928
Aug 8 2023
Looks like we lost track of this a bit. @bking and I can work on this this week.
Aug 7 2023
Just some investigation we did to understand where the metrics come from: probe_ssl_earliest_cert_expiry comes from the blackbox exporter (see random docs). That metric is used by the alerting repo here: https://github.com/wikimedia/operations-alerts/blob/4ecc222e95710395a6f9a7039e487186d2264323/team-sre/probes.yaml#L55
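For reference, an alert on that metric typically looks something like the rule below. This is an illustrative sketch, not the actual rule in the operations-alerts repo; the alert name, 30-day threshold, `for` duration, and labels are all made up.

```yaml
groups:
  - name: ssl_expiry
    rules:
      - alert: CertAlmostExpired
        # probe_ssl_earliest_cert_expiry is a unix timestamp exported by the
        # blackbox exporter; fire when the earliest-expiring cert in the chain
        # has less than 30 days left.
        expr: probe_ssl_earliest_cert_expiry - time() < 30 * 86400
        for: 3h
        labels:
          severity: warning
        annotations:
          summary: "TLS certificate for {{ $labels.instance }} expires in under 30 days"
```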
Aug 3 2023
Checked like so:
In T338159#9002663, @EBernhardson wrote: It looks like we added only the link, could we add a paragraph about how to use this dashboard as well?
Aug 1 2023
Jul 28 2023
Jul 25 2023
Decom cookbook finished, and dc-ops ticket created (see ticket desc AC section for ticket #)
Jul 24 2023
Jul 21 2023
wdqs202[1-2] have been brought into service. With the merging of https://gerrit.wikimedia.org/r/c/operations/puppet/+/940272, all hosts are now in service and have alerting enabled.
With the new hosts in service, we can now begin decom'ing these hosts at our convenience.
Jul 20 2023
All of these hosts except wdqs202[1-2] are in service. Those last two hosts will be brought in service after a final data xfer (ongoing).
Jul 18 2023
In T342162#9025774, @thcipriani wrote: I think I have the context to understand this.
It looks like /srv/deployment/wdqs/wdqs-cache/revs/$CURRENT_DEPLOY_COMMIT_HASH/.git/config-files/etc/query_service is symlinked at /etc/query_service/ldf-config.json; is that true?
I see this line in log output:
/etc/query_service/ldf-config.json is already linked to current rev (use --force to override)
Which is exactly what you're describing. It comes from the check in scap deploy here: https://gitlab.wikimedia.org/repos/releng/scap/-/blob/master/scap/deploy.py#L212
The assumption is that if a file is already symlinked, there's no need to regenerate the file. But it sounds like it's a bad assumption in this case, is that true?
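The check essentially reduces to "is the target already a symlink resolving to the current rev's copy of the file". A minimal sketch of that logic in shell, using hypothetical /tmp paths rather than the real /srv/deployment layout:

```shell
# Hypothetical stand-ins for the deployed rev and the /etc symlink target
rev="/tmp/demo-rev/config-files/etc/query_service"
link="/tmp/demo-etc/ldf-config.json"

mkdir -p "$rev" /tmp/demo-etc
printf '{}' > "$rev/ldf-config.json"
ln -sf "$rev/ldf-config.json" "$link"

# The skip condition: the link already resolves to the current rev's file,
# so the config file is not regenerated.
if [ "$(readlink -f "$link")" = "$(readlink -f "$rev/ldf-config.json")" ]; then
  echo "already linked to current rev"
fi
```

Under that condition a stale file behind an up-to-date symlink would never be refreshed, which matches the bad-assumption concern above.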
First draft of this ticket is up. There are a couple of things that aren't perfect:
Jul 17 2023
Jul 13 2023
Jun 29 2023
Merged patch (had wrong ticket in commit message): https://gerrit.wikimedia.org/r/c/operations/puppet/+/934403
Jun 27 2023
This should be done, but I haven't yet run a validation command to sanity-check that the correct version is in place.
Jun 26 2023
May 30 2023
Should be deployed as of today.
May 22 2023
Thanks for the patience on this! This is getting deployed today.
The documentation aspect of this ticket is already done. Basically two things are left to do to close this ticket out:
May 18 2023
Relforge SAL entry: https://phabricator.wikimedia.org/T274204#8862474
May 17 2023
We've built the new package 7.10.2-5. Haven't yet done a restart of hosts.
May 15 2023
May 11 2023
We've noticed that on the bullseye hosts, the blazegraph prometheus exporters are in a restart loop, ultimately [likely] due to differing python versions breaking the current implementation of our exporter script.
May 9 2023
@Gehel With the recording rule removed in https://gerrit.wikimedia.org/r/912382, there shouldn't be any performance issues since we're not recording anything. The latest query settings in https://gerrit.wikimedia.org/r/c/operations/grafana-grizzly/+/917938 and previous patches are sufficient for acceptable performance on the query, i.e. we don't get timeouts when viewing the graph.
With https://gerrit.wikimedia.org/r/c/operations/grafana-grizzly/+/917938, we now have the grizzly dashboard where we want it. That was the last blocker for closing out this ticket, so this should be all done.
Forgot to link patch but here's the (hopefully final) grizzly patch to get this where we want it: https://gerrit.wikimedia.org/r/c/operations/grafana-grizzly/+/917938
May 4 2023
Apr 26 2023
We're examining wdqs2022, where we have completed the transfer of /srv/wdqs/ yet blazegraph is not starting.
Apr 19 2023
Apr 17 2023
Apr 13 2023
Apr 12 2023
In T333656#8773505, @Dzahn wrote: hi @RKemper, was wondering if you can bring this one up in your team meeting or so (no rush, but would be nice to have): https://gerrit.wikimedia.org/r/c/operations/dns/+/905754 cheers, Daniel
Apr 11 2023
- Things we looked at
Apr 3 2023
Mar 14 2023
In T324335#8683259, @Gehel wrote: After investigation, configuring log4j to talk directly to syslog is adding too much complexity related to the Java Security Manager. We will keep logstash to do log forwarding for now.
Rerouted a shard like so:
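The actual command wasn't captured in this export. A manual shard move via the Elasticsearch cluster reroute API generally looks like the sketch below; the index, shard number, and node names are placeholders, and the curl line is commented out since it needs access to the live cluster, so only the payload is validated locally.

```shell
# Hypothetical reroute payload; index/shard/node names are placeholders.
body='{"commands":[{"move":{"index":"enwiki_content","shard":0,"from_node":"elastic1001","to_node":"elastic1002"}}]}'

# Sanity-check the JSON before sending it anywhere
echo "$body" | python3 -m json.tool > /dev/null && echo "payload ok"

# Apply against the cluster (requires cluster access):
# curl -s -XPOST 'https://localhost:9200/_cluster/reroute' \
#   -H 'Content-Type: application/json' -d "$body"
```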
Mar 13 2023
Mar 9 2023
We've zeroed out the cluster (transient|persistent).indices.recovery.max_bytes_per_sec settings for eqiad & codfw:
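Assuming "zeroed out" means the explicit overrides were removed (the usual way is setting them to null, which makes the cluster fall back to its default recovery throttle rather than literally setting the rate to zero), the request shape is roughly as below. The endpoint is a placeholder and the curl line is commented out since it needs cluster access; only the payload is validated locally.

```shell
# Clear the recovery-rate override in both transient and persistent settings;
# null removes the setting rather than setting the rate to zero.
body='{"transient":{"indices.recovery.max_bytes_per_sec":null},"persistent":{"indices.recovery.max_bytes_per_sec":null}}'

echo "$body" | python3 -m json.tool > /dev/null && echo "payload ok"

# curl -s -XPUT 'https://localhost:9200/_cluster/settings' \
#   -H 'Content-Type: application/json' -d "$body"
```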
Mar 8 2023
Mar 7 2023
Mar 6 2023
Mar 2 2023
Decom ticket for dc-ops: T331074