
bking (Brian King)
User

User Details

User Since
Dec 15 2021, 9:19 PM (5 w, 5 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
BKing (WMF) [ Global Accounts ]

Recent Activity

Fri, Jan 21

bking added a comment to T299797: Deploy new elastic cluster nodes on deployment-prep.

Second attempt: created instance ID 48d468a8-7733-47fd-a078-cc4d931d1545 with hostname 'deployment-elastic00' (same as before). This time, the DNS record was created. I could get an SSH prompt, but could not log in.

Fri, Jan 21, 11:35 PM · Discovery, Release-Engineering-Team (Radar), Beta-Cluster-Infrastructure
bking added a comment to T299797: Deploy new elastic cluster nodes on deployment-prep.

Booted instance ID 48ba77ab-3c6d-46ca-93fd-7a0785d7f45c with hostname 'deployment-elastic00'. I could ping it and get an SSH prompt, but could not log in. Checking other instances for user-data and other possible ways to converge (puppet-related metadata?).

Fri, Jan 21, 9:07 PM · Discovery, Release-Engineering-Team (Radar), Beta-Cluster-Infrastructure
bking added a comment to T299797: Deploy new elastic cluster nodes on deployment-prep.

Looking at https://phabricator.wikimedia.org/T278689 as a reference point for code changes required for new nodes. Will need to check with @Majavah for all details.

Fri, Jan 21, 8:13 PM · Discovery, Release-Engineering-Team (Radar), Beta-Cluster-Infrastructure
bking added a project to T299797: Deploy new elastic cluster nodes on deployment-prep: Discovery.
Fri, Jan 21, 8:05 PM · Discovery, Release-Engineering-Team (Radar), Beta-Cluster-Infrastructure
bking updated the task description for T278641: Migrate deployment-prep away from Debian Stretch to Buster/Bullseye.
Fri, Jan 21, 7:58 PM · Release-Engineering-Team (Radar), Beta-Cluster-Infrastructure
bking created T299797: Deploy new elastic cluster nodes on deployment-prep.
Fri, Jan 21, 7:46 PM · Discovery, Release-Engineering-Team (Radar), Beta-Cluster-Infrastructure

Tue, Jan 18

bking closed T299410: Disable puppet failure alerts on wcqs-beta-01.wikidata-query.eqiad.wmflabs as Resolved.
Tue, Jan 18, 2:49 PM · Discovery-Search (Current work)
bking created T299410: Disable puppet failure alerts on wcqs-beta-01.wikidata-query.eqiad.wmflabs.
Tue, Jan 18, 2:49 PM · Discovery-Search (Current work)

Fri, Jan 14

bking closed T299151: Ban elastic2035 from prod elastic clusters as Resolved.
Fri, Jan 14, 10:58 PM · Discovery
bking added a comment to T299151: Ban elastic2035 from prod elastic clusters.

Banned elastic2035 and elastic2051 (which was already broken) via the following commands:

Fri, Jan 14, 10:58 PM · Discovery
bking added a comment to T298738: Add bking as icinga user.

Confirmed working, sorry for the delay. Feel free to close.

Fri, Jan 14, 4:39 PM · SRE, SRE-Access-Requests
bking reopened T299151: Ban elastic2035 from prod elastic clusters as "In Progress".
Fri, Jan 14, 2:44 PM · Discovery

Thu, Jan 13

bking created T299177: Fix package dependency error on elasticsearch puppet config.
Thu, Jan 13, 10:47 PM · Patch-For-Review, Discovery-Search (Current work)
bking closed T299151: Ban elastic2035 from prod elastic clusters as Resolved.
Thu, Jan 13, 9:15 PM · Discovery
bking added a comment to T299151: Ban elastic2035 from prod elastic clusters.
Thu, Jan 13, 9:11 PM · Discovery
bking added a comment to P18732 elastic2051.codfw.wmnet reimage failure.

When I try to apt-install manually from the host, I get the following failure:
`
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 openjdk-8-jdk : Depends: openjdk-8-jre (= 8u312-b07-1~deb9u1) but it is not going to be installed
                 Depends: openjdk-8-jdk-headless (= 8u312-b07-1~deb9u1) but it is not going to be installed
                 Depends: libx11-6 but it is not going to be installed
 wmf-elasticsearch-search-plugins : Depends: elasticsearch-oss (= 6.5.4) but it is not going to be installed or
                                             elasticsearch (= 6.5.4) but it is not installable
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).`
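
For triage, the unmet-dependency report above can be grouped by package. The following is a minimal sketch only: `parse_unmet` is a hypothetical helper (not part of apt or any WMF tooling), and it assumes the usual `package : Depends: ...` layout of apt's message, with the sample text copied from the output above.

```python
import re

# Sample text copied from the failure output in the comment above.
APT_OUTPUT = """\
The following packages have unmet dependencies:
 openjdk-8-jdk : Depends: openjdk-8-jre (= 8u312-b07-1~deb9u1) but it is not going to be installed
                 Depends: openjdk-8-jdk-headless (= 8u312-b07-1~deb9u1) but it is not going to be installed
                 Depends: libx11-6 but it is not going to be installed
 wmf-elasticsearch-search-plugins : Depends: elasticsearch-oss (= 6.5.4) but it is not going to be installed or
                                             elasticsearch (= 6.5.4) but it is not installable
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
"""


def parse_unmet(text):
    """Group apt 'unmet dependencies' lines by package (hypothetical helper)."""
    unmet = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        head = re.match(r"^(\S+) : Depends: (.+)$", line)
        if head:
            # A "package : Depends: ..." line starts a new package entry.
            current = head.group(1)
            unmet[current] = [head.group(2)]
        elif current and line.startswith("Depends: "):
            # Further dependencies of the same package.
            unmet[current].append(line[len("Depends: "):])
        elif current and unmet[current][-1].endswith(" or"):
            # An alternative ("a or b") wrapped onto the next line.
            unmet[current][-1] += " " + line
        else:
            # Any other line ends the current package block.
            current = None
    return unmet


if __name__ == "__main__":
    for package, deps in parse_unmet(APT_OUTPUT).items():
        print(package)
        for dep in deps:
            print("  -", dep)
```

Grouping this way makes it easier to see that the `elasticsearch-oss`/`elasticsearch` alternative is a single unmet dependency rather than two.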

Thu, Jan 13, 7:01 PM
bking edited P18732 elastic2051.codfw.wmnet reimage failure.
Thu, Jan 13, 6:59 PM
bking created T299151: Ban elastic2035 from prod elastic clusters.
Thu, Jan 13, 6:56 PM · Discovery
bking added a comment to T298674: Degraded RAID on elastic2051.

Looks like the server is trying to PXE boot from its 1 Gb NICs, but it should be using its 10 Gb NICs. Guessing this can be fixed in the BIOS, based on papaul's recommendations.

Thu, Jan 13, 5:42 PM · Discovery-Search (Current work), Patch-For-Review, SRE, ops-codfw
bking added a comment to T298674: Degraded RAID on elastic2051.

More details on failure:
`Exception raised while executing cookbook sre.hosts.reimage:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 234, in run
    raw_ret = runner.run()
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/reimage.py", line 455, in run
    self._install_os()
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/reimage.py", line 303, in _install_os
    self.remote_installer.wait_reboot_since(di_reboot_time, print_progress_bars=False)
  File "/usr/lib/python3/dist-packages/wmflib/decorators.py", line 210, in wrapper
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/spicerack/remote.py", line 582, in wait_reboot_since
    f"Uptime for {nodeset} higher than threshold: {round(uptime, 2)} > {round(delta, 2)}"
spicerack.remote.RemoteCheckError: Uptime for elastic2051.codfw.wmnet higher than threshold: 1415.17 > 1355.2
The reimage failed, see the cookbook logs for the details
Reimage executed with errors:
  • elastic2051 (FAIL)`
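
The error message suggests the check compares the host's reported uptime with the time elapsed since the installer reboot was requested: if the uptime is larger, the host has apparently not rebooted since then. The sketch below illustrates that comparison under this assumption; it is not spicerack's code, and `check_rebooted_since` and the local `RemoteCheckError` class are hypothetical stand-ins.

```python
from datetime import datetime, timedelta


class RemoteCheckError(Exception):
    """Local stand-in for spicerack.remote.RemoteCheckError (assumption)."""


def check_rebooted_since(uptime_seconds, since, now):
    """Raise if the host's uptime implies it has not rebooted after `since`.

    uptime_seconds: uptime reported by the host, in seconds.
    since: moment the reboot was requested.
    now: moment the uptime was read.
    """
    delta = (now - since).total_seconds()
    if uptime_seconds >= delta:
        # The host has been up longer than the time since the reboot request,
        # so the last boot must predate that request.
        raise RemoteCheckError(
            f"Uptime higher than threshold: {round(uptime_seconds, 2)} > {round(delta, 2)}"
        )


if __name__ == "__main__":
    # Values taken from the error above; the timestamp itself is illustrative.
    since = datetime(2022, 1, 13, 15, 0, 0)
    now = since + timedelta(seconds=1355.2)
    try:
        check_rebooted_since(1415.17, since, now)
    except RemoteCheckError as exc:
        print(exc)
```

Read this way, an uptime of 1415.17 seconds against a threshold of 1355.2 seconds means the host's last boot happened about a minute before the reboot time being checked.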
Thu, Jan 13, 3:41 PM · Discovery-Search (Current work), Patch-For-Review, SRE, ops-codfw
bking created P18732 elastic2051.codfw.wmnet reimage failure.
Thu, Jan 13, 3:40 PM
bking added a comment to T298674: Degraded RAID on elastic2051.

Per yesterday's conversation with @Gehel (and Moritz's suggestion above), we have elected to reimage this server to Stretch and deal with the Bullseye issues separately. Working this now...

Thu, Jan 13, 2:55 PM · Discovery-Search (Current work), Patch-For-Review, SRE, ops-codfw
bking reopened T298674: Degraded RAID on elastic2051 as "In Progress".
Thu, Jan 13, 2:49 PM · Discovery-Search (Current work), Patch-For-Review, SRE, ops-codfw
bking claimed T298674: Degraded RAID on elastic2051.
Thu, Jan 13, 2:49 PM · Discovery-Search (Current work), Patch-For-Review, SRE, ops-codfw

Wed, Jan 12

bking added a comment to T298674: Degraded RAID on elastic2051.

@MoritzMuehlenhoff This is a good point; will discuss further with my team today.

Wed, Jan 12, 2:21 PM · Discovery-Search (Current work), Patch-For-Review, SRE, ops-codfw

Tue, Jan 11

bking added a comment to T289135: Upgrade Cirrus Elasticsearch clusters to Debian Bullseye.

The output of 'run-puppet-agent': https://phabricator.wikimedia.org/P18581

Tue, Jan 11, 10:48 PM · Discovery-Search (Current work), SRE
bking updated subscribers of T296470: Initialize WCQS production servers.
Tue, Jan 11, 9:49 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
bking added a comment to T296470: Initialize WCQS production servers.

Started data load via tmux session on cumin1001 at ~ Tue Jan 11 16:53:46 2022. Expected to take at least 24 hours.

Tue, Jan 11, 8:58 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
bking added a comment to T298981: Create Kerberos login for Brian King (bking).

This is confirmed working, feel free to close this ticket.

Tue, Jan 11, 4:45 PM · SRE, Data-Engineering, LDAP-Access-Requests, SRE-Access-Requests
bking updated the task description for T298981: Create Kerberos login for Brian King (bking).
Tue, Jan 11, 4:14 PM · SRE, Data-Engineering, LDAP-Access-Requests, SRE-Access-Requests
bking updated the task description for T298981: Create Kerberos login for Brian King (bking).
Tue, Jan 11, 4:13 PM · SRE, Data-Engineering, LDAP-Access-Requests, SRE-Access-Requests
bking created T298981: Create Kerberos login for Brian King (bking).
Tue, Jan 11, 4:13 PM · SRE, Data-Engineering, LDAP-Access-Requests, SRE-Access-Requests
bking updated the title for P18581 'run-puppet-agent' output from elastic2051 (bullseye) from puppet output to 'run-puppet-agent' output from elastic2051 (bullseye).
Tue, Jan 11, 3:49 PM
bking added a comment to P18581 'run-puppet-agent' output from elastic2051 (bullseye).

output of apt-get update

Tue, Jan 11, 3:48 PM
bking created P18581 'run-puppet-agent' output from elastic2051 (bullseye).
Tue, Jan 11, 3:46 PM

Mon, Jan 10

bking added a comment to T298674: Degraded RAID on elastic2051.

The server reimage to bullseye is incomplete due to missing packages (among other things). I found an epic with more details; the next steps for me are to look at the output of the sre.hosts.reimage puppet run posted above and address the failures one by one.

Mon, Jan 10, 2:40 PM · Discovery-Search (Current work), Patch-For-Review, SRE, ops-codfw

Fri, Jan 7

bking added projects to T298817: Get familiar with ES non-prod environments: SRE, Discovery-Search (Current work).
Fri, Jan 7, 10:32 PM · Discovery-Search (Current work), SRE
bking created T298817: Get familiar with ES non-prod environments.
Fri, Jan 7, 10:32 PM · Discovery-Search (Current work), SRE

Thu, Jan 6

bking added a comment to T298674: Degraded RAID on elastic2051.

@Papaul Checked the box with 'hdparm'; the failed disk is at sda, but it is not displaying its serial number.

Thu, Jan 6, 11:02 PM · Discovery-Search (Current work), Patch-For-Review, SRE, ops-codfw
bking created T298738: Add bking as icinga user.
Thu, Jan 6, 9:48 PM · SRE, SRE-Access-Requests

Wed, Jan 5

bking added a comment to T298570: Consider filesystem/disk based improvements on WQDS servers.

@Aklapper Just added it, let me know if it looks correct. If associating a project is required, I'd also suggest making this a required field when creating tasks (if possible).

Wed, Jan 5, 3:40 PM · SRE, Discovery-Search (Current work)
bking added a project to T298570: Consider filesystem/disk based improvements on WQDS servers: Discovery-Search (Current work).
Wed, Jan 5, 3:38 PM · SRE, Discovery-Search (Current work)

Tue, Jan 4

bking created T298570: Consider filesystem/disk based improvements on WQDS servers.
Tue, Jan 4, 10:23 PM · SRE, Discovery-Search (Current work)
bking added a comment to T298525: Tune "BlazegraphFreeAllocatorsDecreasingRapidly" alerts.

Related commits here

Tue, Jan 4, 9:53 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
bking renamed T298525: Tune "BlazegraphFreeAllocatorsDecreasingRapidly" alerts from Tune "BlazegraphFreeAllocatorsDecreasingRapidly" to Tune "BlazegraphFreeAllocatorsDecreasingRapidly" alerts.
Tue, Jan 4, 4:22 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
bking updated subscribers of T298525: Tune "BlazegraphFreeAllocatorsDecreasingRapidly" alerts.

More context from @dcausse :

Tue, Jan 4, 3:08 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
bking claimed T298525: Tune "BlazegraphFreeAllocatorsDecreasingRapidly" alerts.
Tue, Jan 4, 2:57 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
bking created T298525: Tune "BlazegraphFreeAllocatorsDecreasingRapidly" alerts.
Tue, Jan 4, 2:40 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Dec 22 2021

bking placed T298194: Research improvements to Pwstore process up for grabs.
Dec 22 2021, 2:43 PM · SRE
bking updated the task description for T298194: Research improvements to Pwstore process.
Dec 22 2021, 2:24 PM · SRE
bking created T298194: Research improvements to Pwstore process.
Dec 22 2021, 2:21 PM · SRE

Dec 21 2021

bking added a subtask for T297907: SRE Onboarding - Brian King, Search Platform team: T298144: Complete onboarding tasks listed in mediawiki.
Dec 21 2021, 8:38 PM · Discovery-Search (Current work)
bking added a parent task for T298144: Complete onboarding tasks listed in mediawiki: T297907: SRE Onboarding - Brian King, Search Platform team.
Dec 21 2021, 8:38 PM · Discovery-Search (Current work)
bking created T298144: Complete onboarding tasks listed in mediawiki.
Dec 21 2021, 8:28 PM · Discovery-Search (Current work)
bking added a member for Discovery: bking.
Dec 21 2021, 8:08 PM
bking added a watcher for Discovery: bking.
Dec 21 2021, 8:08 PM
bking renamed T271143: Some Search Platform / Discovery clusters apparently do not support IPv6 from to Some Search Platform / Discovery clusters apparently do not support IPv6 .
Dec 21 2021, 3:41 PM · Discovery-Search (Current work), Infrastructure-Foundations, IPv6, User-crusnov, Discovery, SRE-tools
bking renamed T271143: Some Search Platform / Discovery clusters apparently do not support IPv6 from Some Search Platform / Discovery clusters apparently do not support IPv6 to .
Dec 21 2021, 3:39 PM · Discovery-Search (Current work), Infrastructure-Foundations, IPv6, User-crusnov, Discovery, SRE-tools
bking claimed T271143: Some Search Platform / Discovery clusters apparently do not support IPv6 .
Dec 21 2021, 3:32 PM · Discovery-Search (Current work), Infrastructure-Foundations, IPv6, User-crusnov, Discovery, SRE-tools

Dec 20 2021

bking changed the status of T297907: SRE Onboarding - Brian King, Search Platform team, a subtask of T297910: Requesting shell access for Brian King, from Open to In Progress.
Dec 20 2021, 4:51 PM · SRE, LDAP-Access-Requests, SRE-Access-Requests
bking changed the status of T297907: SRE Onboarding - Brian King, Search Platform team from Open to In Progress.
Dec 20 2021, 4:51 PM · Discovery-Search (Current work)
bking changed the status of T297907: SRE Onboarding - Brian King, Search Platform team, a subtask of T297891: Add bking (me) to #wmf-nda, from Open to In Progress.
Dec 20 2021, 4:51 PM · WMF-NDA-Requests

Dec 16 2021

bking renamed T297910: Requesting shell access for Brian King from Requesting access to RESOURCE for Brian King (bking@wikimedia.org) to Requesting access to LDAP groups for Brian King (bking@wikimedia.org).
Dec 16 2021, 8:58 PM · SRE, LDAP-Access-Requests, SRE-Access-Requests
bking created T297910: Requesting shell access for Brian King.
Dec 16 2021, 8:58 PM · SRE, LDAP-Access-Requests, SRE-Access-Requests
bking created T297891: Add bking (me) to #wmf-nda.
Dec 16 2021, 3:53 PM · WMF-NDA-Requests