
May 24 2023

klausman closed T277492: Investigate separating k8s-level users between our k8s and the ServiceOps k8s as Resolved.

Yes, I think we can close this. I'd even say that having different user configs in codfw vs eqiad is an antipattern.

May 24 2023, 8:52 AM · Machine-Learning-Team

May 22 2023

klausman claimed T337213: Update to KServe 0.11.
May 22 2023, 7:57 AM · Machine-Learning-Team

May 17 2023

klausman added a comment to T333124: Move Revert-risk multilingual model from staging to production.

The changes from 920208 have been deployed.

May 17 2023, 3:13 PM · Machine-Learning-Team, Lift-Wing

May 15 2023

klausman added a comment to T333124: Move Revert-risk multilingual model from staging to production.

Namespaces are live in both eqiad and codfw:

May 15 2023, 2:24 PM · Machine-Learning-Team, Lift-Wing
klausman committed rLPRIff13336077b8: role::mlserve: Add dummy data for revertrisk user.
May 15 2023, 2:07 PM

May 11 2023

klausman added a comment to T333124: Move Revert-risk multilingual model from staging to production.

@klausman do you have time to work with Aiko to push this to production during the next few days?

May 11 2023, 8:37 AM · Machine-Learning-Team, Lift-Wing

May 2 2023

klausman updated the task description for T334049: codfw row C switches upgrade.
May 2 2023, 1:24 PM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, serviceops, Infrastructure-Foundations, SRE Observability, collaboration-services, Platform Engineering, Traffic, Data-Engineering, Machine-Learning-Team, netops, cloud-services-team

Apr 28 2023

klausman closed T331712: Migrate ml-cache to Bullseye as Resolved.

All machines in codfw done.

Apr 28 2023, 8:42 AM · SRE, Machine-Learning-Team
klausman closed T331712: Migrate ml-cache to Bullseye, a subtask of T291916: Tracking task for Bullseye migrations in production, as Resolved.
Apr 28 2023, 8:42 AM · User-Elukey, Epic, Infrastructure-Foundations, SRE

Apr 26 2023

klausman updated the task description for T334049: codfw row C switches upgrade.
Apr 26 2023, 9:58 AM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, serviceops, Infrastructure-Foundations, SRE Observability, collaboration-services, Platform Engineering, Traffic, Data-Engineering, Machine-Learning-Team, netops, cloud-services-team
klausman updated the task description for T335042: codfw row D switches upgrade.
Apr 26 2023, 9:58 AM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, netops, Machine-Learning-Team, Traffic, collaboration-services, SRE Observability, serviceops, cloud-services-team, Infrastructure-Foundations, Platform Engineering
klausman updated the task description for T335042: codfw row D switches upgrade.
Apr 26 2023, 9:57 AM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, netops, Machine-Learning-Team, Traffic, collaboration-services, SRE Observability, serviceops, cloud-services-team, Infrastructure-Foundations, Platform Engineering
klausman updated the task description for T334049: codfw row C switches upgrade.
Apr 26 2023, 9:56 AM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, serviceops, Infrastructure-Foundations, SRE Observability, collaboration-services, Platform Engineering, Traffic, Data-Engineering, Machine-Learning-Team, netops, cloud-services-team
klausman updated the task description for T334049: codfw row C switches upgrade.
Apr 26 2023, 9:56 AM · Data-Platform-SRE, Discovery-Search (Current work), SRE, DBA, serviceops, Infrastructure-Foundations, SRE Observability, collaboration-services, Platform Engineering, Traffic, Data-Engineering, Machine-Learning-Team, netops, cloud-services-team

Apr 19 2023

klausman added a comment to T330414: Create ORES migration endpoint (ORES/Liftwing translation).

Namespace has been created on staging, and is visible:

Apr 19 2023, 1:59 PM · Patch-For-Review, Machine-Learning-Team
klausman committed rLPRI03f29d6c38a8: hiera: Add faux secrets for ores-legacy service on Lift Wing.
Apr 19 2023, 10:34 AM

Apr 18 2023

klausman updated the task description for T333377: eqiad row D switches upgrade.
Apr 18 2023, 1:50 PM · netops, DBA, Discovery-Search (Current work), SRE, cloud-services-team, Data-Engineering, Traffic, Machine-Learning-Team, collaboration-services, Infrastructure-Foundations, Platform Engineering, SRE Observability
klausman updated the task description for T333377: eqiad row D switches upgrade.
Apr 18 2023, 1:21 PM · netops, DBA, Discovery-Search (Current work), SRE, cloud-services-team, Data-Engineering, Traffic, Machine-Learning-Team, collaboration-services, Infrastructure-Foundations, Platform Engineering, SRE Observability

Apr 17 2023

klausman added a comment to T327620: Define SLI/SLO for Lift Wing.

Started a draft doc here: https://wikitech.wikimedia.org/wiki/SLO/Lift_Wing

Apr 17 2023, 4:21 PM · Machine-Learning-Team
klausman added a comment to T327620: Define SLI/SLO for Lift Wing.

Yes, my plan was to elaborate on my write-up a bit (it's mostly for sorting my thoughts), and then use the template you mentioned and develop that into something like the API GW SLO (with plenty of SRE input).

Apr 17 2023, 9:16 AM · Machine-Learning-Team

Mar 30 2023

klausman added a comment to T327620: Define SLI/SLO for Lift Wing.

https://docs.google.com/document/d/1NspQtkfyuD_kiYCgms1gRZeFFiAaetnk/edit <- My thoughts so far, comments here or on the doc welcome.

Mar 30 2023, 12:41 PM · Machine-Learning-Team

Feb 24 2023

klausman added a comment to T305447: Automate the procedure to bootstrap minikube on the ML-Sandbox and to share it by multiple users.

The current (now resolved) reason for the disk fillup was a 22G logfile:

Feb 24 2023, 11:19 AM · Machine-Learning-Team

Feb 21 2023

klausman moved T329135: Move secret keys to constants in WikiGPT from In Progress to Complete Q3 2022/23 on the Machine-Learning-Team board.
Feb 21 2023, 3:41 PM · Machine-Learning-Team
klausman moved T329003: Add unique URL to each answer in WikiGPT from In Progress to Complete Q3 2022/23 on the Machine-Learning-Team board.
Feb 21 2023, 3:41 PM · Machine-Learning-Team
klausman moved T329345: Fix feature to view old search results from In Progress to Complete Q3 2022/23 on the Machine-Learning-Team board.
Feb 21 2023, 3:41 PM · Machine-Learning-Team
klausman moved T329016: [WikiGPT] Improve search results of WikiGPT from In Progress to Blocked on the Machine-Learning-Team board.
Feb 21 2023, 3:41 PM · Machine-Learning-Team
klausman moved T329528: Fix WikiGPT copy link feature mobile view from Unsorted to Blocked on the Machine-Learning-Team board.
Feb 21 2023, 3:28 PM · Machine-Learning-Team
klausman moved T330165: eqiad row B switches upgrade from Unsorted to Complete Q3 2022/23 on the Machine-Learning-Team board.
Feb 21 2023, 3:10 PM · Patch-For-Review, Data Pipelines, Data-Engineering-Planning, DBA, Discovery-Search (Current work), SRE, serviceops, cloud-services-team, Machine-Learning-Team, Platform Engineering, SRE Observability, Infrastructure-Foundations, collaboration-services, Traffic
klausman moved T329073: eqiad row A switches upgrade from Unsorted to Complete Q3 2022/23 on the Machine-Learning-Team board.
Feb 21 2023, 3:10 PM · Patch-For-Review, Discovery-Search (Current work), Shared-Data-Infrastructure, Data-Engineering-Planning, DBA, SRE, Platform Engineering, Infrastructure-Foundations, Traffic, serviceops, Machine-Learning-Team, cloud-services-team, Data-Persistence, SRE Observability, collaboration-services

Feb 13 2023

klausman added a comment to T329556: K8s etcd on bullseye show TLS errors in logs.

I resolved this by doing the following:

Feb 13 2023, 5:52 PM · Patch-For-Review, Foundational Technology Requests, Shared-Data-Infrastructure, Kubernetes, Prod-Kubernetes, serviceops

Dec 13 2022

klausman added a comment to T324464: Cleanup NLLB200 docker image.

We did some more refactoring/improving of the Docker image today, and have done basic tests. The staging endpoint now uses the new image, and it looks like it's working fine. The new way of building the image has been committed to my fork of Stopes, on the usual aws_publish branch.

Dec 13 2022, 4:49 PM · Wikimedia Enterprise, Machine-Learning-Team, ContentTranslation
achou awarded T325051: Create new GitLab project group: machine-learning a Yellow Medal token.
Dec 13 2022, 3:32 PM · GitLab (Project Migration), Release-Engineering-Team
klausman moved T324468: Write/polish documentation for NLLb200 on AWS from Unsorted to In Progress on the Machine-Learning-Team board.
Dec 13 2022, 3:14 PM · Wikimedia Enterprise, Machine-Learning-Team
klausman created T325051: Create new GitLab project group: machine-learning.
Dec 13 2022, 10:49 AM · GitLab (Project Migration), Release-Engineering-Team

Dec 12 2022

klausman created P42674 (An Untitled Masterwork).
Dec 12 2022, 5:08 PM

Dec 5 2022

klausman placed T324464: Cleanup NLLB200 docker image up for grabs.
Dec 5 2022, 2:57 PM · Wikimedia Enterprise, Machine-Learning-Team, ContentTranslation
klausman created T324468: Write/polish documentation for NLLb200 on AWS.
Dec 5 2022, 2:56 PM · Wikimedia Enterprise, Machine-Learning-Team
klausman renamed T324467: Add monitoring+alerting for NLLB200 AWS service from Add monitoring+alerting for AWS service to Add monitoring+alerting for NLLB200 AWS service.
Dec 5 2022, 2:55 PM · Wikimedia Enterprise, Machine-Learning-Team, ContentTranslation
klausman created T324467: Add monitoring+alerting for NLLB200 AWS service.
Dec 5 2022, 2:54 PM · Wikimedia Enterprise, Machine-Learning-Team, ContentTranslation
klausman created T324464: Cleanup NLLB200 docker image.
Dec 5 2022, 2:54 PM · Wikimedia Enterprise, Machine-Learning-Team, ContentTranslation

Nov 30 2022

klausman created P41894 (An Untitled Masterwork).
Nov 30 2022, 4:17 PM
klausman added a comment to T323925: codfw: ManagementSSHDown for ores2009 and thumbor2004.

ores2009 is shutting down & powering off now

Nov 30 2022, 3:58 PM · SRE, serviceops, ops-codfw

Nov 28 2022

klausman moved T323916: Configure LW Inference services on API GW config from Unsorted to In Progress on the Machine-Learning-Team board.
Nov 28 2022, 3:12 PM · Machine-Learning-Team, Lift-Wing
klausman added projects to T323916: Configure LW Inference services on API GW config: Lift-Wing, Machine-Learning-Team.
Nov 28 2022, 3:12 PM · Machine-Learning-Team, Lift-Wing
klausman added a subtask for T288789: API Gateway Integration: T323916: Configure LW Inference services on API GW config.
Nov 28 2022, 3:11 PM · Epic, Machine-Learning-Team, Lift-Wing
klausman added a parent task for T323916: Configure LW Inference services on API GW config: T288789: API Gateway Integration.
Nov 28 2022, 3:11 PM · Machine-Learning-Team, Lift-Wing
klausman created T323916: Configure LW Inference services on API GW config.
Nov 28 2022, 3:11 PM · Machine-Learning-Team, Lift-Wing

Nov 25 2022

klausman moved T319178: Decide external URL scheme (on API GW) for models on Lift Wing from In Progress to Complete Q3 2022/23 on the Machine-Learning-Team board.
Nov 25 2022, 9:49 AM · Machine-Learning-Team, Lift-Wing

Nov 23 2022

klausman added a comment to T319178: Decide external URL scheme (on API GW) for models on Lift Wing.

After some discussion, we have decided that the API-GW side URL scheme for LW should look like:

Nov 23 2022, 3:02 PM · Machine-Learning-Team, Lift-Wing

Nov 22 2022

klausman added a subtask for T288789: API Gateway Integration: T319178: Decide external URL scheme (on API GW) for models on Lift Wing.
Nov 22 2022, 3:45 PM · Epic, Machine-Learning-Team, Lift-Wing
klausman added a parent task for T319178: Decide external URL scheme (on API GW) for models on Lift Wing: T288789: API Gateway Integration.
Nov 22 2022, 3:45 PM · Machine-Learning-Team, Lift-Wing
klausman closed T302516: Help Language team to make progress on open MT models to be used by Content Translation tool as Resolved.

I'll close this ticket for now, since the main effort is focused on NLLB200 on AWS (https://phabricator.wikimedia.org/T321781). If and when we look at MarianNMT again, we can reopen it (or, more likely, make a new task).

Nov 22 2022, 3:28 PM · Machine-Learning-Team

Nov 8 2022

klausman closed T312564: Move Wikilabels Postgres Instances to VMs as Resolved.
Nov 8 2022, 3:44 PM · Machine-Learning-Team
klausman added a comment to T307389: Upgrade wikilabels databases to buster/bullseye.

This still needs a fix to https://github.com/wikimedia/wikilabels-wmflabs-deploy/blob/master/config/00-main.yaml#L20 which I have prepared in https://github.com/wikimedia/wikilabels-wmflabs-deploy/pull/57

Nov 8 2022, 11:06 AM · Patch-For-Review, Machine-Learning-Team, Wikilabels, cloud-services-team (Kanban), Data-Services, Cloud-VPS (Debian Stretch Deprecation)

Nov 3 2022

klausman triaged T312564: Move Wikilabels Postgres Instances to VMs as Medium priority.
Nov 3 2022, 11:51 AM · Machine-Learning-Team

Nov 2 2022

klausman moved T307389: Upgrade wikilabels databases to buster/bullseye from Unsorted to Complete Q3 2022/23 on the Machine-Learning-Team board.
Nov 2 2022, 12:09 PM · Patch-For-Review, Machine-Learning-Team, Wikilabels, cloud-services-team (Kanban), Data-Services, Cloud-VPS (Debian Stretch Deprecation)
klausman moved T312564: Move Wikilabels Postgres Instances to VMs from In Progress to Complete Q3 2022/23 on the Machine-Learning-Team board.
Nov 2 2022, 12:09 PM · Machine-Learning-Team
klausman reopened T307389: Upgrade wikilabels databases to buster/bullseye as "In Progress".
Nov 2 2022, 12:09 PM · Patch-For-Review, Machine-Learning-Team, Wikilabels, cloud-services-team (Kanban), Data-Services, Cloud-VPS (Debian Stretch Deprecation)
klausman reopened T307389: Upgrade wikilabels databases to buster/bullseye, a subtask of T306061: Cloud VPS "clouddb-services" project Stretch deprecation, as In Progress.
Nov 2 2022, 12:09 PM · cloud-services-team, Data-Services, Cloud-VPS (Debian Stretch Deprecation)
klausman reopened T312564: Move Wikilabels Postgres Instances to VMs as "In Progress".
Nov 2 2022, 12:08 PM · Machine-Learning-Team
klausman closed T312564: Move Wikilabels Postgres Instances to VMs as Resolved.
Nov 2 2022, 10:43 AM · Machine-Learning-Team
klausman added a comment to T312564: Move Wikilabels Postgres Instances to VMs.

As just added to T307389: DBs have been migrated and docs updated. Taavi has shut down the old clouddb instances and will delete them in a week unless we find we still need them for some reason.

Nov 2 2022, 10:43 AM · Machine-Learning-Team
klausman closed T307389: Upgrade wikilabels databases to buster/bullseye as Resolved.

Created VM and Puppet stuff as detailed above, and migrated the data, then switched the uwsgi applications on the main instance and staging to use said VM. Updated docs accordingly, including this new section:

Nov 2 2022, 10:42 AM · Patch-For-Review, Machine-Learning-Team, Wikilabels, cloud-services-team (Kanban), Data-Services, Cloud-VPS (Debian Stretch Deprecation)
klausman closed T307389: Upgrade wikilabels databases to buster/bullseye, a subtask of T306061: Cloud VPS "clouddb-services" project Stretch deprecation, as Resolved.
Nov 2 2022, 10:41 AM · cloud-services-team, Data-Services, Cloud-VPS (Debian Stretch Deprecation)

Oct 25 2022

klausman moved T319178: Decide external URL scheme (on API GW) for models on Lift Wing from Parked to In Progress on the Machine-Learning-Team (Active Tasks) board.
Oct 25 2022, 2:54 PM · Machine-Learning-Team, Lift-Wing
klausman claimed T319178: Decide external URL scheme (on API GW) for models on Lift Wing.
Oct 25 2022, 2:42 PM · Machine-Learning-Team, Lift-Wing
klausman claimed T307389: Upgrade wikilabels databases to buster/bullseye.
Oct 25 2022, 2:38 PM · Patch-For-Review, Machine-Learning-Team, Wikilabels, cloud-services-team (Kanban), Data-Services, Cloud-VPS (Debian Stretch Deprecation)
klausman moved T312564: Move Wikilabels Postgres Instances to VMs from Parked to In Progress on the Machine-Learning-Team (Active Tasks) board.
Oct 25 2022, 2:04 PM · Machine-Learning-Team

Oct 3 2022

klausman triaged T319178: Decide external URL scheme (on API GW) for models on Lift Wing as High priority.
Oct 3 2022, 10:48 AM · Machine-Learning-Team, Lift-Wing
klausman created T319178: Decide external URL scheme (on API GW) for models on Lift Wing.
Oct 3 2022, 10:46 AM · Machine-Learning-Team, Lift-Wing

Sep 6 2022

klausman created T317091: Cable/connection issue on ml-cache1001.eqiad.wmnet.
Sep 6 2022, 10:34 AM · ops-eqiad

Aug 31 2022

klausman created P33717 (An Untitled Masterwork).
Aug 31 2022, 1:34 PM

Aug 22 2022

klausman added a comment to T315652: Define custom rate limit tiers for machine learning projects.

From an ML POV, the useful tiers would probably be:

Aug 22 2022, 10:44 AM · Core Platform Team Initiatives (API Gateway)

Aug 17 2022

klausman added a comment to T312564: Move Wikilabels Postgres Instances to VMs.

A few notes:

Aug 17 2022, 1:30 PM · Machine-Learning-Team

Jul 27 2022

klausman committed rLPRI756a0ad1d5b4: ML k8s: fix articletopic-outlink names.
Jul 27 2022, 11:10 AM
klausman committed rLPRI8e9d4c83b1ee: ml-k8s: add dummy secrects for article-outlink.
Jul 27 2022, 10:52 AM
klausman added a comment to T313822: codfw: ml-serve2001 memory issue DIMM A2.

Ok, the machine is booted and sitting in GRUB. @Papaul I can't seem to run memtest86+ via iDRAC (I just get a black screen). Can you check whether it works with direct access? Alternatively, do you know how to run it so that console redirection works? Thanks!

Jul 27 2022, 10:01 AM · Machine-Learning-Team, SRE, ops-codfw

Jul 26 2022

klausman closed T312550: uwsgi socket/UDP logger is broken if no other logger uses the same format as Resolved.

Change 817210 actually fixes this; we now see messages in logstash again. Apparently, an unset buffer size causes JSON generation to break. The upstream bug is still open, but I doubt it will be fixed soon, especially with a mitigation being available now.

Jul 26 2022, 1:39 PM · SRE

Jul 7 2022

klausman added a comment to T312550: uwsgi socket/UDP logger is broken if no other logger uses the same format.

Upstream issue: https://github.com/unbit/uwsgi/issues/2456

Jul 7 2022, 3:39 PM · SRE
klausman claimed T312550: uwsgi socket/UDP logger is broken if no other logger uses the same format.
Jul 7 2022, 2:47 PM · SRE
klausman updated the task description for T312550: uwsgi socket/UDP logger is broken if no other logger uses the same format.
Jul 7 2022, 2:29 PM · SRE
klausman updated the task description for T312550: uwsgi socket/UDP logger is broken if no other logger uses the same format.
Jul 7 2022, 2:29 PM · SRE
klausman created T312550: uwsgi socket/UDP logger is broken if no other logger uses the same format.
Jul 7 2022, 2:28 PM · SRE

Jul 5 2022

klausman added a comment to T302195: Create the ml-serve-staging k8s cluster.

Now also running draftquality for enwiki:

Jul 5 2022, 1:38 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)

Jun 29 2022

klausman created T311628: Create Swift account for readonly access to ML models.
Jun 29 2022, 1:32 PM · Machine-Learning-Team (Active Tasks), SRE-swift-storage, Lift-Wing

Jun 23 2022

klausman added a comment to T302195: Create the ml-serve-staging k8s cluster.

Prometheus is now correctly set up with its own volumes (we hadn't done that yet), and I managed to save the old data.

Jun 23 2022, 2:56 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)

Jun 22 2022

klausman added a comment to T302195: Create the ml-serve-staging k8s cluster.

Add'l things done:

Jun 22 2022, 1:25 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
klausman committed rLPRI60f26db2ba1a: pki: Fix wrong cluster name for ML staging k8s.
Jun 22 2022, 12:28 PM
klausman committed rLPRI45bed6f9e285: Add dummy secrets for ML staging k8s CA.
Jun 22 2022, 12:28 PM

Jun 21 2022

klausman created P29933 (An Untitled Masterwork).
Jun 21 2022, 10:51 AM

Jun 13 2022

klausman added a comment to T302195: Create the ml-serve-staging k8s cluster.

Istio config and (most of) the cert-manager config have been applied. For cert-manager, I need to sync up with Luca regarding part of said config referring to the ml-serve endpoints.

Jun 13 2022, 11:21 AM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)

May 13 2022

klausman closed T303801: Upgrade ORES to Debian Buster as Resolved.
May 13 2022, 9:06 AM · Machine-Learning-Team (Active Tasks), Patch-For-Review

May 10 2022

klausman claimed T303801: Upgrade ORES to Debian Buster.
May 10 2022, 2:21 PM · Machine-Learning-Team (Active Tasks), Patch-For-Review

Mar 22 2022

klausman committed rLPRI899d25e97d8d: labs: Add dummy keyfile for ML staging k8s in codfw.
Mar 22 2022, 4:06 PM
klausman committed rLPRIbd2fb2724109: hiera: Add k8s dummy tokens for ML staging env.
Mar 22 2022, 3:51 PM
klausman committed rLPRId554eac0951b: hiera: add dummy tokens for ML staging k8s setup.
Mar 22 2022, 2:24 PM

Mar 18 2022

klausman added a comment to T302701: Re-evaluate ip pools for ml-serve-{eqiad,codfw}.

I put the smaller staging allocation at the end to avoid fragmentation (at least for now; it can't be avoided forever, in my experience). Similarly, the Train/DSE range is "flipped" (/21 first) to avoid fragmentation between it and the preceding prod ranges. If sufficiently small ranges are needed in EQIAD for future projects, they should follow the same scheme as the staging ranges in CODFW (allocate from the end, and try to avoid fragmentation using the same alternating-sizes pattern as for prod/train).
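
The "large ranges from the start, small ranges from the end" idea can be sketched with Python's standard `ipaddress` module. This is only an illustration: the parent block `10.0.0.0/18`, the prefix lengths, and the helper names `carve_from_start` and `carve_from_end` are made up for the example and are not the ranges actually allocated.

```python
import ipaddress


def carve_from_start(parent, prefixlen, count):
    """Return the first `count` subnets of size /prefixlen inside `parent`."""
    subnets = parent.subnets(new_prefix=prefixlen)
    return [next(subnets) for _ in range(count)]


def carve_from_end(parent, prefixlen, count):
    """Return the last `count` subnets of size /prefixlen inside `parent`."""
    return list(parent.subnets(new_prefix=prefixlen))[-count:]


# Hypothetical parent block, chosen only for illustration.
parent = ipaddress.ip_network("10.0.0.0/18")

# Large "prod-like" ranges grow upward from the start of the block.
prod = carve_from_start(parent, 20, 2)

# Small "staging-like" ranges are taken from the very end of the block.
staging = carve_from_end(parent, 24, 2)

allocated = prod + staging

# The space in between stays contiguous, so a full /20 is still available.
middle = ipaddress.ip_network("10.0.32.0/20")
middle_is_free = not any(middle.overlaps(net) for net in allocated)

print("prod:   ", [str(n) for n in prod])
print("staging:", [str(n) for n in staging])
print("10.0.32.0/20 still free:", middle_is_free)
```

Had the two small ranges been placed directly after the prod ranges instead, they would have sat inside `10.0.32.0/20` and that aligned /20 would no longer be available as a single allocation.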

Mar 18 2022, 3:32 PM · Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
klausman added a comment to T302701: Re-evaluate ip pools for ml-serve-{eqiad,codfw}.

I have set up IP ranges (and sliced them up for our use):

Mar 18 2022, 3:29 PM · Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)

Mar 15 2022

klausman closed T302197: Create etcd cluster for ml-serve-staging k8s, a subtask of T302195: Create the ml-serve-staging k8s cluster, as Resolved.
Mar 15 2022, 5:55 PM · Patch-For-Review, Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)
klausman closed T302197: Create etcd cluster for ml-serve-staging k8s as Resolved.
# etcdctl -C https://ml-staging-etcd2001.codfw.wmnet:2379  cluster-health 
member 493aa03d462725d1 is healthy: got healthy result from https://ml-staging-etcd2002.codfw.wmnet:2379
member b12825ca936a35a6 is healthy: got healthy result from https://ml-staging-etcd2003.codfw.wmnet:2379
member fce0f93975c27096 is healthy: got healthy result from https://ml-staging-etcd2001.codfw.wmnet:2379
cluster is healthy
#
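
A check like the one above could also be evaluated by a script rather than by eye. The following is a minimal sketch that parses text of the shape shown in the output; the function name `parse_cluster_health` is hypothetical, and the regular expression is written only against the lines visible here, so any member line whose state is not `healthy` is simply reported as not healthy.

```python
import re

# Matches lines such as:
#   member <hex id> is healthy: got healthy result from <url>
MEMBER_RE = re.compile(
    r"^member (?P<id>[0-9a-f]+) is (?P<state>\w+): .* from (?P<url>\S+)$"
)


def parse_cluster_health(text):
    """Return (cluster_healthy, {member_id: (is_healthy, url)})."""
    members = {}
    cluster_healthy = False
    for raw in text.splitlines():
        line = raw.strip()
        match = MEMBER_RE.match(line)
        if match:
            members[match.group("id")] = (
                match.group("state") == "healthy",
                match.group("url"),
            )
        elif line == "cluster is healthy":
            cluster_healthy = True
    return cluster_healthy, members


# Sample input copied from the output above.
sample = """\
member 493aa03d462725d1 is healthy: got healthy result from https://ml-staging-etcd2002.codfw.wmnet:2379
member b12825ca936a35a6 is healthy: got healthy result from https://ml-staging-etcd2003.codfw.wmnet:2379
member fce0f93975c27096 is healthy: got healthy result from https://ml-staging-etcd2001.codfw.wmnet:2379
cluster is healthy
"""

healthy, members = parse_cluster_health(sample)
print(healthy, len(members))
```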
Mar 15 2022, 5:55 PM · Epic, Lift-Wing, Machine-Learning-Team (Active Tasks)