
RKemper (Ryan Kemper)
User

User Details

User Since
May 1 2020, 10:28 PM (49 w, 17 h)
Availability
Available
LDAP User
Unknown
MediaWiki User
RKemper (WMF) [ Global Accounts ]

Recent Activity

Thu, Mar 25

RKemper added a comment to T267927: Reload wikidata journal from fresh dumps.

wdqs1009, wdqs1010, and wdqs2008 are done, so we need to data-transfer to the remaining instances.

Thu, Mar 25, 5:25 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Wed, Mar 24

RKemper updated the task description for T278378: Pull Elasticsearch config out of Spicerack.
Wed, Mar 24, 7:30 PM · Discovery-Search (Current work)
RKemper created T278378: Pull Elasticsearch config out of Spicerack.
Wed, Mar 24, 7:28 PM · Discovery-Search (Current work)
RKemper moved T277792: Create new elasticsearch cookbook that combines a plugin upgrade with a full reboot from Needs review to Needs Reporting on the Discovery-Search (Current work) board.
Wed, Mar 24, 1:42 AM · Discovery-Search (Current work)

Mon, Mar 22

RKemper updated the task description for T278185: hw troubleshooting: IPMI sensor critical for elastic1042.eqiad.wmnet.
Mon, Mar 22, 9:17 PM · Discovery-Search (Current work), DC-Ops
RKemper created T278185: hw troubleshooting: IPMI sensor critical for elastic1042.eqiad.wmnet.
Mon, Mar 22, 9:04 PM · Discovery-Search (Current work), DC-Ops

Fri, Mar 19

RKemper added a comment to T275885: Generate SSL certification for relforge1003.eqiad.wmnet and relforge1004.eqiad.wmnet.

I forgot to try running a curl command from inside the analytics network *before* deploying the new cert, so I don't have a good before/after comparison, but curling relforge from within the analytics network hangs indefinitely, which is a good sign (it should reject the cert and return immediately if it's still broken).
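
The before/after comparison described above depends on telling a hang apart from an immediate certificate rejection. A minimal sketch of that distinction, assuming a probe written with Python's standard library; the function name `classify_probe_failure` and the labels it returns are hypothetical and not part of any relforge tooling:

```python
import socket
import ssl


def classify_probe_failure(exc: BaseException) -> str:
    """Map an exception raised by an HTTPS probe to a coarse failure label."""
    # Certificate verification failures are raised during the TLS handshake,
    # so the client returns promptly instead of waiting.
    if isinstance(exc, ssl.SSLCertVerificationError):
        return "cert-rejected"
    # No response arrived before the deadline (the "hangs" case).
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timed-out"
    # The peer actively refused the TCP connection.
    if isinstance(exc, ConnectionRefusedError):
        return "refused"
    return "other"
```

Wrapping the probe in a `try`/`except` and passing the caught exception to this function would give a label that can be recorded both before and after a cert deploy.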

Fri, Mar 19, 3:46 AM · Discovery-Search (Current work)
RKemper added a comment to T275885: Generate SSL certification for relforge1003.eqiad.wmnet and relforge1004.eqiad.wmnet.
Fri, Mar 19, 3:21 AM · Discovery-Search (Current work)
RKemper added a comment to T275885: Generate SSL certification for relforge1003.eqiad.wmnet and relforge1004.eqiad.wmnet.

After creating a new manifest and running the cert gen command, we need to copy the newly generated secret key in decrypted form to another location in the /srv/private repo. Then we chown all the new files to make sure they're owned by gitpuppet (it's possible there's a git commit hook that does this for me but I didn't see one so I have just been playing it safe). Finally, we need to copy over the pubkey to the operations/puppet repo.

Fri, Mar 19, 3:13 AM · Discovery-Search (Current work)
RKemper added a comment to T275885: Generate SSL certification for relforge1003.eqiad.wmnet and relforge1004.eqiad.wmnet.
ryankemper@puppetmaster1001:/srv/private$ sudo cergen -c 'relforge.*' --generate --base-path /srv/private/modules/secret/secrets/certificates /srv/private/modules/secret/secrets/certificates/certificate.manifests.d
2021-03-19 02:55:31,498 INFO     cergen                                   Generating certificates ['relforge.svc.eqiad.wmnet'] with force=False
2021-03-19 02:55:31,498 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating all files, force=False...
2021-03-19 02:55:31,500 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating certificate file
/usr/lib/python3/dist-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for puppetmaster1001.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python3/dist-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for puppetmaster1001.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python3/dist-packages/urllib3/connection.py:362: SubjectAltNameWarning: Certificate for puppetmaster1001.eqiad.wmnet has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
2021-03-19 02:55:33,004 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating CA certificate file
2021-03-19 02:55:33,005 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating PKCS12 keystore file
2021-03-19 02:55:33,285 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating Java keystore file
2021-03-19 02:55:34,365 INFO     Certificate(relforge.svc.eqiad.wmnet)    Importing PuppetCA(puppetmaster1001.eqiad.wmnet_8140) cert into Java keystore
2021-03-19 02:55:35,406 INFO     Certificate(relforge.svc.eqiad.wmnet)    Generating Java truststore file with CA certificate PuppetCA(puppetmaster1001.eqiad.wmnet_8140)
Fri, Mar 19, 2:56 AM · Discovery-Search (Current work)
RKemper added a comment to T275885: Generate SSL certification for relforge1003.eqiad.wmnet and relforge1004.eqiad.wmnet.

New cergen-based manifest (`modules/secret/secrets/certificates/certificate.manifests.d/relforge.certs.yaml`) to generate `relforge.svc.eqiad.wmnet`:

Fri, Mar 19, 2:49 AM · Discovery-Search (Current work)

Thu, Mar 18

RKemper moved T277792: Create new elasticsearch cookbook that combines a plugin upgrade with a full reboot from Incoming to Needs review on the Discovery-Search (Current work) board.
Thu, Mar 18, 7:26 PM · Discovery-Search (Current work)
RKemper created T277792: Create new elasticsearch cookbook that combines a plugin upgrade with a full reboot.
Thu, Mar 18, 7:20 PM · Discovery-Search (Current work)

Sat, Mar 13

RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

DNS change logs

ryankemper@authdns1001:~$ sudo authdns-update
Updating authdns1001.wikimedia.org (self)...
Pulling the current revision from https://gerrit.wikimedia.org/r/operations/dns.git
Reviewing 85d9b49dc2ff0f8e3657f6f2cd91ce3df79bd1cf...
Sat, Mar 13, 1:15 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

The issues with envoy were resolved by running sudo /usr/local/sbin/build-envoy-config -c /etc/envoy to properly build /etc/envoy/envoy.yaml. Puppet should already have done that, triggered by a sudo systemctl restart envoyproxy.service, but it didn't, perhaps because of a race condition. See https://gerrit.wikimedia.org/g/operations/puppet/+/b7dacbca9fae42b32bb91fd485a3f2c70ff903b3/modules/envoyproxy/manifests/init.pp#81 and https://gerrit.wikimedia.org/g/operations/puppet/+/b7dacbca9fae42b32bb91fd485a3f2c70ff903b3/modules/envoyproxy/manifests/conf.pp#30 for the puppet code that normally does it automatically.

Sat, Mar 13, 1:14 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.
Sat, Mar 13, 1:06 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Fri, Mar 12

RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

I missed a step yesterday: I'd updated /srv/private as well as the public labs/private repo but missed the step for updating operations/puppet with the new pubkey: https://gerrit.wikimedia.org/r/c/operations/puppet/+/671267

Fri, Mar 12, 9:51 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

Current status for when I pick this back up:

Fri, Mar 12, 8:48 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper committed rLPRI59521d29e59b: wdqs: new query-preview for wdqs1009 (test host) (authored by RKemper).
wdqs: new query-preview for wdqs1009 (test host)
Fri, Mar 12, 8:31 AM
RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

Ah, so poking around the certificate.manifests.d directory I see certs that don't necessarily follow the discovery.wmnet pattern. To me that implies Option 2 should be working, so I might be missing something. Here's an example that doesn't use discovery:

ryankemper@puppetmaster1001:/srv/private$ cat modules/secret/secrets/certificates/certificate.manifests.d/analytics_http_ui.certs.yaml
yarn.wikimedia.org:
  authority: puppet_ca
  expiry: null
  alt_names: ["yarn.wikimedia.org", "hue.wikimedia.org", "hue-next.wikimedia.org", "superset.wikimedia.org", "pivot.wikimedia.org", "turnilo.wikimedia.org", "stats.wikimedia.org", "analytics.wikimedia.org", "piwik.wikimedia.org", "datasets.wikimedia.org"]
  key:
    password: REDACTED
    algorithm: ec
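
Finding manifests like the one above, whose certificate names fall outside the discovery.wmnet pattern, can be done mechanically once the YAML is parsed. A rough sketch, assuming the manifest has already been loaded into a plain dict; the helper name `names_outside_suffix` is hypothetical, and the `alt_names` list is shortened for brevity:

```python
from typing import List


def names_outside_suffix(manifest: dict, suffix: str) -> List[str]:
    """Return certificate names in a parsed manifest that do not end with `suffix`."""
    return sorted(name for name in manifest if not name.endswith(suffix))


# Parsed form of the manifest shown above (alt_names shortened, password redacted).
manifest = {
    "yarn.wikimedia.org": {
        "authority": "puppet_ca",
        "expiry": None,
        "alt_names": ["yarn.wikimedia.org", "hue.wikimedia.org"],
        "key": {"password": "REDACTED", "algorithm": "ec"},
    }
}

print(names_outside_suffix(manifest, ".discovery.wmnet"))  # ['yarn.wikimedia.org']
```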
Fri, Mar 12, 8:18 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

Option 2 fails to even generate the cert. All the cergen documentation is written for a certificate like query-preview.discovery.wmnet and not wdqs1009.eqiad.wmnet or query-preview.wikidata.org. So I do think this just isn't what cergen is built to do.

Fri, Mar 12, 8:07 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

Hit a big blocker with the current proposed approach of using wdqs1009.eqiad.wmnet as the cert name:

Fri, Mar 12, 8:00 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Mar 10 2021

RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

Finished rolling back to the previous iteration of wdqs.discovery.wmnet cert since we're now going to create a net-new cert wdqs1009.eqiad.wmnet for wdqs-test

Mar 10 2021, 7:35 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

ryankemper@puppetmaster1001:/srv/private$ git status
On branch master
Changes not staged for commit:

(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
Mar 10 2021, 7:19 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

ryankemper@puppetmaster1001:/srv/private$ git status
On branch master
Changes not staged for commit:

(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
Mar 10 2021, 4:59 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Mar 8 2021

RKemper renamed T252504: Automate the smoke test of the canary deployment of WDQS from Smoke test the canary deployment of WDQS to Automate the smoke test of the canary deployment of WDQS.
Mar 8 2021, 4:42 PM · wdwb-tech, Sustainability (Incident Followup), Wikidata-Query-Service, Wikidata

Mar 5 2021

RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

Here's how the ats mapping looks after deploying the backend.yaml changes:

Mar 5 2021, 12:49 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Mar 4 2021

RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

Posting logs of our IRC convo from ~1 month ago for context when I tag people for review:

Mar 4 2021, 6:02 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Mar 3 2021

RKemper moved T275345: Medium error reported for sda on elastic2045 from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Mar 3 2021, 6:28 AM · Discovery-Search (Current work), SRE, ops-codfw
RKemper moved T274555: elastic2054 unresponsive from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Mar 3 2021, 6:27 AM · SRE, ops-codfw, Discovery-Search (Current work)
RKemper added a comment to T274203: Build Extra Plugin with extra-analysis-khmer and deploy to Maven Central.

PLUGIN BUILD & UPLOAD STEPS PERFORMED:

# Starting from plugins repo
# (1) Build locally and scp over to build host
./debian/rules prepare_build
cd ..
ssh 'deneb.codfw.wmnet' 'sudo rm -rfv ~/plugins'
Mar 3 2021, 6:00 AM · Discovery-Search (Current work)
RKemper updated the task description for T274204: Deploy new version of Extra Plugin (with Khmer filter) to Elasticsearch cluster.
Mar 3 2021, 5:36 AM · Discovery-Search (Current work)
RKemper added a comment to T274203: Build Extra Plugin with extra-analysis-khmer and deploy to Maven Central.

Note: The initial build/upload was broken due to operator error, so we built/uploaded the new 6.5.4-6, which has been confirmed to work.

Mar 3 2021, 5:35 AM · Discovery-Search (Current work)

Mar 2 2021

RKemper added a comment to T275885: Generate SSL certification for relforge1003.eqiad.wmnet and relforge1004.eqiad.wmnet.

From https://wikitech.wikimedia.org/wiki/Cergen:

Mar 2 2021, 9:30 PM · Discovery-Search (Current work)
RKemper added a comment to T274204: Deploy new version of Extra Plugin (with Khmer filter) to Elasticsearch cluster.

@TJones Thanks, I'll tap in David or Zbyszko to see if they can find the error.

Mar 2 2021, 7:52 AM · Discovery-Search (Current work)

Feb 26 2021

RKemper added a comment to T275345: Medium error reported for sda on elastic2045.

Side note: Just noticed I named the tmux session elastic1065. Fortunately as can be seen above we're reimaging the proper host, elastic2045 :P

Feb 26 2021, 5:33 AM · Discovery-Search (Current work), SRE, ops-codfw
RKemper added a comment to T267927: Reload wikidata journal from fresh dumps.

Note: Puppet is still disabled on wdqs2008 while the reload runs. It occurred to me that I'm not sure if puppet actually needs to be disabled during data reloads or if that's just a precaution we've historically taken - any insight here @Gehel?

Feb 26 2021, 5:12 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper added a comment to T274751: Upgrade firmware on wdqs1009.

@Cmjohnson The data reload is complete on wdqs1009, so the host can now have its firmware upgraded and be rebooted at its convenience. Note this is an internal wdqs test host, so there is no public-facing service for us to worry about.

Feb 26 2021, 3:47 AM · Discovery-Search (Current work), wdwb-tech, SRE, Wikidata-Query-Service, ops-eqiad, DC-Ops, Wikidata

Feb 25 2021

RKemper added a comment to T267927: Reload wikidata journal from fresh dumps.

Downtimed wdqs2008 until 2021-03-04 21:56:59

Feb 25 2021, 7:58 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper added a comment to T274204: Deploy new version of Extra Plugin (with Khmer filter) to Elasticsearch cluster.

I started doing restarts in eqiad, but hit a show-stopper: any node with the new plugin version had its elasticsearch systemd units stuck in a failure state that persisted across restarts. The most suspicious log-line by far is java.nio.file.AccessDeniedException: /var/run/elasticsearch:

Feb 25 2021, 12:32 AM · Discovery-Search (Current work)

Feb 24 2021

RKemper moved T274204: Deploy new version of Extra Plugin (with Khmer filter) to Elasticsearch cluster from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Feb 24 2021, 11:20 PM · Discovery-Search (Current work)
RKemper claimed T274204: Deploy new version of Extra Plugin (with Khmer filter) to Elasticsearch cluster.
Feb 24 2021, 10:57 PM · Discovery-Search (Current work)
RKemper claimed T274203: Build Extra Plugin with extra-analysis-khmer and deploy to Maven Central.
Feb 24 2021, 10:53 PM · Discovery-Search (Current work)
RKemper moved T274203: Build Extra Plugin with extra-analysis-khmer and deploy to Maven Central from Ready for Development to Needs Reporting on the Discovery-Search (Current work) board.

Now that the new debian package is built & uploaded, we can proceed to the actual roll-out (https://phabricator.wikimedia.org/T274204) when ready

Feb 24 2021, 10:41 PM · Discovery-Search (Current work)
RKemper added a comment to T274203: Build Extra Plugin with extra-analysis-khmer and deploy to Maven Central.

The new debian package has been built and uploaded.

Feb 24 2021, 10:39 PM · Discovery-Search (Current work)
RKemper added a comment to T265113: Memory issue on elastic1063 caused elasticsearch to be killed.

Commands used to unban elastic1063:

Feb 24 2021, 10:10 PM · Discovery-Search (Current work), ops-eqiad, SRE
RKemper created T275658: Kibana: Render kibana settings file based off of Kibana/Elasticsearch version.
Feb 24 2021, 5:51 PM · Patch-For-Review, Discovery-Search (Current work)
RKemper added a comment to T275549: Data-transfer from wdqs2008 to wdqs1010 following re-image of wdqs1010.

Closed because this is (somewhat) redundant with T267927; will track in that ticket

Feb 24 2021, 7:27 AM · Discovery-Search (Current work)
RKemper closed T275549: Data-transfer from wdqs2008 to wdqs1010 following re-image of wdqs1010 as Declined.
Feb 24 2021, 7:26 AM · Discovery-Search (Current work)
RKemper added a comment to T275345: Medium error reported for sda on elastic2045.

@Gehel Yup I can get elastic2045 re-imaged and unbanned once we get sda replaced

Feb 24 2021, 3:25 AM · Discovery-Search (Current work), SRE, ops-codfw

Feb 23 2021

RKemper updated the task description for T275549: Data-transfer from wdqs2008 to wdqs1010 following re-image of wdqs1010.
Feb 23 2021, 7:58 PM · Discovery-Search (Current work)
RKemper created T275549: Data-transfer from wdqs2008 to wdqs1010 following re-image of wdqs1010.
Feb 23 2021, 7:48 PM · Discovery-Search (Current work)

Feb 19 2021

RKemper renamed T274204: Deploy new version of Extra Plugin (with Khmer filter) to Elasticsearch cluster from Deploy new version of Extra Pugin (with Khmer filter) to Elasticsearch cluster to Deploy new version of Extra Plugin (with Khmer filter) to Elasticsearch cluster.
Feb 19 2021, 8:52 AM · Discovery-Search (Current work)

Feb 10 2021

RKemper added a comment to T262211: Service implementation for relforge100[34].

From

ryankemper@relforge1004:~$ sudo systemctl status kibana.service
● kibana.service - Kibana
   Loaded: loaded (/etc/systemd/system/kibana.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2021-02-10 00:38:09 UTC; 2min 41s ago
  Process: 1040 ExecStart=/usr/share/kibana/bin/kibana -c /etc/kibana/kibana.yml (code=exited, status=64)
 Main PID: 1040 (code=exited, status=64)
Feb 10 2021, 12:50 AM · Patch-For-Review, Discovery-Search (Current work)
RKemper added a comment to T262211: Service implementation for relforge100[34].

Closing the loop on the above, it looks like newsfeed.enabled exists in Elasticsearch 7 but not in Elasticsearch 6.

Feb 10 2021, 12:39 AM · Patch-For-Review, Discovery-Search (Current work)
RKemper claimed T274321: relforge: discuss possible PII concerns with relforge data.
Feb 10 2021, 12:38 AM · Discovery-Search (Current work)
RKemper created T274321: relforge: discuss possible PII concerns with relforge data.
Feb 10 2021, 12:21 AM · Discovery-Search (Current work)
RKemper moved T262211: Service implementation for relforge100[34] from Needs Reporting to In Progress on the Discovery-Search (Current work) board.

Moving back to in-progress - I'd thought that all systemd units were working properly now, but kibana.service is still failing on relforge100[3,4].

Feb 10 2021, 12:02 AM · Patch-For-Review, Discovery-Search (Current work)

Feb 9 2021

RKemper updated the task description for T274314: relforge: open up access to relforge100[3,4].
Feb 9 2021, 11:49 PM · Patch-For-Review, Discovery-Search (Current work)
RKemper updated the task description for T274314: relforge: open up access to relforge100[3,4].
Feb 9 2021, 11:48 PM · Patch-For-Review, Discovery-Search (Current work)
RKemper created T274314: relforge: open up access to relforge100[3,4].
Feb 9 2021, 11:07 PM · Patch-For-Review, Discovery-Search (Current work)
RKemper moved T262211: Service implementation for relforge100[34] from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

Updated netbox entries to mark the servers as active.

Feb 9 2021, 7:54 PM · Patch-For-Review, Discovery-Search (Current work)
RKemper updated the task description for T262211: Service implementation for relforge100[34].
Feb 9 2021, 7:52 PM · Patch-For-Review, Discovery-Search (Current work)
RKemper added a comment to T262211: Service implementation for relforge100[34].

The above issue is resolved; our order of operations is a bit flawed and results in puppet trying to install packages such as elasticsearch-oss before it can "see" the package (presumably because apt-get update had not been run in time). The issues self-healed in ~30 minutes; I manually restarted the failing services once the state in puppet-land had resolved itself.
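
The ordering constraint described above can be written as a small dependency graph. This is only an illustration using Python's `graphlib`, not how Puppet resolves ordering internally; the step names are illustrative, and the assumption (taken from the comment) is that the package index refresh has to happen before the install:

```python
from graphlib import TopologicalSorter

# Each key depends on every step in its value set (step names are illustrative).
dependencies = {
    "install elasticsearch-oss": {"apt-get update"},
    "start elasticsearch service": {"install elasticsearch-oss"},
}

# static_order() yields each step only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

With the dependency stated explicitly, the install step can never be scheduled ahead of the index refresh.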

Feb 9 2021, 7:52 PM · Patch-For-Review, Discovery-Search (Current work)
RKemper added a comment to T262211: Service implementation for relforge100[34].

Prometheus exporters are having trouble:

Feb 9 2021, 7:31 PM · Patch-For-Review, Discovery-Search (Current work)
RKemper moved T274213: Reboot wdqs hosts from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

See https://sal.toolforge.org/log/A5m9h3cBgTbpqNOmqYik for timing of reboot. Also see https://phabricator.wikimedia.org/T274270 for related ticket that came out of this (reboot took super long)

Feb 9 2021, 7:09 PM · Discovery-Search (Current work)
RKemper triaged T274213: Reboot wdqs hosts as High priority.
Feb 9 2021, 7:08 PM · Discovery-Search (Current work)

Feb 8 2021

RKemper added a comment to T267927: Reload wikidata journal from fresh dumps.

Still waiting for the latest dumps to be downloaded (few more hours), then need to reboot WDQS hosts as part of https://phabricator.wikimedia.org/T274213, then can do the actual data-reload

Feb 8 2021, 11:06 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper moved T274213: Reboot wdqs hosts from Incoming to In Progress on the Discovery-Search (Current work) board.
Feb 8 2021, 11:03 PM · Discovery-Search (Current work)
RKemper created T274213: Reboot wdqs hosts.
Feb 8 2021, 11:02 PM · Discovery-Search (Current work)
RKemper added a comment to T266495: Create Debian Package for Flink.

(See https://phabricator.wikimedia.org/T273097#6805355 for why this ticket has been closed)

Feb 8 2021, 10:37 PM · Wikidata-Query-Service, Wikidata
RKemper added a comment to T273636: Blazegraph journal for wcqs is too big.
Feb 8 2021, 10:21 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper moved T273636: Blazegraph journal for wcqs is too big from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

WCQS is back in service; updating the notification channels right now and will comment back here after

Feb 8 2021, 10:15 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper moved T267927: Reload wikidata journal from fresh dumps from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Feb 8 2021, 4:54 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper moved T273636: Blazegraph journal for wcqs is too big from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Feb 8 2021, 4:53 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper claimed T273636: Blazegraph journal for wcqs is too big.
Feb 8 2021, 4:53 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Feb 5 2021

RKemper added a comment to T267927: Reload wikidata journal from fresh dumps.

sudo cookbook sre.wdqs.data-reload wdqs1009.eqiad.wmnet --reuse-downloaded-dump --reload-data wikidata --skolemize --reason 'T267927: Reload wikidata jnl from fresh dumps' --task-id T267927 is failing with:

Feb 5 2021, 10:51 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Feb 4 2021

RKemper added a comment to T266470: Expose wdqs1009 to wdqs users and gather feedback.

TODO from IRC meeting with bblack/gehel: create a DNS entry (CNAME to dyna.wm.o), another set of entries in backend.yaml map, create another minisite (with the appropriate configuration)

Feb 4 2021, 7:42 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Feb 3 2021

RKemper added a comment to T273636: Blazegraph journal for wcqs is too big.

Notified Wikidata mailing list and also posted here: https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service#Updates

Feb 3 2021, 7:54 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper added a comment to T273636: Blazegraph journal for wcqs is too big.

wcqs-beta-01.eqiad.wmflabs is running low on disk space due to its blazegraph journal dataset size. In order to free up space we will need to take the service down, delete the journal and re-import from the latest dump. Service interruption will begin at Feb 4 18:30 UTC and continue until the data reload is complete.

Feb 3 2021, 7:50 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper added a comment to T273097: Create Flink Base Image.

@akosiaris Is your concern with the idea of using a `flink` base image solution mainly just centered around the inefficiency/inconvenience of needing SRE to merge any flink version upgrades? Since we have an embedded SRE on search (me) and to a lesser extent Guillaume, I think it wouldn't be too much of a problem. In general having our dependencies managed by a docker image will make it easier for us to be explicit about what version we're using, and it seems like the default docker-y way of doing things. Is there a technical reason why a base image might not be a good idea?

Feb 3 2021, 7:00 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper added a comment to T267927: Reload wikidata journal from fresh dumps.

We'll want to reload these this Friday, because the latest dumps should be available Thursday evening.

Feb 3 2021, 5:41 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Feb 1 2021

RKemper claimed T267927: Reload wikidata journal from fresh dumps.
Feb 1 2021, 4:43 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
RKemper claimed T266470: Expose wdqs1009 to wdqs users and gather feedback.
Feb 1 2021, 4:33 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper moved T266470: Expose wdqs1009 to wdqs users and gather feedback from Ready for Development to In Progress on the Discovery-Search (Current work) board.
Feb 1 2021, 4:32 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
RKemper updated the task description for T273097: Create Flink Base Image.
Feb 1 2021, 4:25 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Jan 28 2021

RKemper added a comment to T265113: Memory issue on elastic1063 caused elasticsearch to be killed.

@Jclark-ctr In addition to Erik's point above about dmidecode being installed, we just deployed a patch to install edac-util on all Elasticsearch systems (this includes logstash*, cloudelastic* btw). So edac-util is now available for use

Jan 28 2021, 11:43 PM · Discovery-Search (Current work), ops-eqiad, SRE

Jan 27 2021

RKemper moved T272713: Failing HTTP check on WDQS servers after latest deployment from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

Barring any further issues cropping up, this is done.

Jan 27 2021, 1:25 AM · Patch-For-Review, Discovery-Search (Current work)

Jan 26 2021

RKemper committed rLPRI83db7a0dbf14: wdqs: add dummy key for new wdqs-internal cert (authored by RKemper).
wdqs: add dummy key for new wdqs-internal cert
Jan 26 2021, 10:18 PM
RKemper added a project to T272444: decommission relforge1001.eqiad.wmnet and relforge1002.eqiad.wmnet: ops-eqiad.
Jan 26 2021, 10:18 PM · SRE, ops-eqiad, decommission-hardware
RKemper reassigned T272444: decommission relforge1001.eqiad.wmnet and relforge1002.eqiad.wmnet from RKemper to Cmjohnson.
Jan 26 2021, 10:03 PM · SRE, ops-eqiad, decommission-hardware
RKemper updated the task description for T273009: Update decommission phab template to remove unneeded steps.
Jan 26 2021, 10:00 PM · DC-Ops
RKemper renamed T273009: Update decommission phab template to remove unneeded steps from Update decommission phab template to remove unneeded homer step to Update decommission phab template to remove unneeded steps.
Jan 26 2021, 9:59 PM · DC-Ops
RKemper updated the task description for T272444: decommission relforge1001.eqiad.wmnet and relforge1002.eqiad.wmnet.
Jan 26 2021, 9:43 PM · SRE, ops-eqiad, decommission-hardware
RKemper created T273009: Update decommission phab template to remove unneeded steps.
Jan 26 2021, 9:34 PM · DC-Ops
RKemper updated the task description for T272444: decommission relforge1001.eqiad.wmnet and relforge1002.eqiad.wmnet.
Jan 26 2021, 8:52 PM · SRE, ops-eqiad, decommission-hardware
RKemper updated the task description for T272444: decommission relforge1001.eqiad.wmnet and relforge1002.eqiad.wmnet.
Jan 26 2021, 8:36 PM · SRE, ops-eqiad, decommission-hardware
RKemper added a comment to T272713: Failing HTTP check on WDQS servers after latest deployment.

Since resolving this monitoring issue is one of our highest priorities, here's a handoff for Tues Jan 26 so that Europe can make headway:

Jan 26 2021, 8:28 AM · Patch-For-Review, Discovery-Search (Current work)
RKemper added a comment to T272713: Failing HTTP check on WDQS servers after latest deployment.

Finished generating new cert. Here's a (password-redacted) log of the changes made:

Jan 26 2021, 8:22 AM · Patch-For-Review, Discovery-Search (Current work)

Jan 23 2021

RKemper added a comment to T272713: Failing HTTP check on WDQS servers after latest deployment.

I've downtimed the WDQS sparql alerts until next week.

Jan 23 2021, 6:07 AM · Patch-For-Review, Discovery-Search (Current work)