
Provision Kask for Echo timestamp storage in k8s
Closed, Resolved; Public

Description

Instances of Kask need to be provisioned (in both data centers) for storage of Echo timestamps (alert and notification last-seen times). These instances will connect to the RESTBase Cassandra cluster, and so will require keys created from that cluster's authority.

TODO: Establish current Echo storage throughput (Redis) to inform resource requirements

Details

Related Gerrit Patches:
[operations/deployment-charts@master] echostore: fixup Cassandra contact list
[operations/deployment-charts@master] echostore: remove affinity (copypasta from sessionstore)
[operations/deployment-charts@master] echostore: create production deployments
[operations/deployment-charts@master] echostore: create new staging deployment
[operations/deployment-charts@master] echostore: Add namespace creation stanzas

Event Timeline

Eevans triaged this task as Normal priority. Oct 1 2019, 9:06 PM
Eevans created this task.

Change 543212 had a related patch set uploaded (by Eevans; owner: Eevans):
[operations/deployment-charts@master] [WIP] echostore: create staging deployment

https://gerrit.wikimedia.org/r/543212

Change 543463 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] echostore: Add namespace creation stanzas

https://gerrit.wikimedia.org/r/543463

Change 543463 merged by jenkins-bot:
[operations/deployment-charts@master] echostore: Add namespace creation stanzas

https://gerrit.wikimedia.org/r/543463

Mentioned in SAL (#wikimedia-operations) [2019-10-16T14:24:04Z] <_joe_> creating namespaces and policies for echostore in codfw, T234376

Change 543212 merged by Eevans:
[operations/deployment-charts@master] echostore: create new staging deployment

https://gerrit.wikimedia.org/r/543212

Change 543699 had a related patch set uploaded (by Eevans; owner: Eevans):
[operations/deployment-charts@master] echostore: create production deployments

https://gerrit.wikimedia.org/r/543699

Change 543699 merged by Eevans:
[operations/deployment-charts@master] echostore: create production deployments

https://gerrit.wikimedia.org/r/543699

I'm unable to deploy to codfw; I'm seeing the following:

$ kubectl get events
LAST SEEN   TYPE      REASON              KIND         MESSAGE
46s         Warning   FailedScheduling    Pod          0/6 nodes are available: 2 Insufficient cpu, 4 node(s) didn't match node selector.
46s         Warning   FailedScheduling    Pod          0/6 nodes are available: 2 Insufficient cpu, 4 node(s) didn't match node selector.
46s         Warning   FailedScheduling    Pod          0/6 nodes are available: 2 Insufficient cpu, 4 node(s) didn't match node selector.
46s         Warning   FailedScheduling    Pod          0/6 nodes are available: 2 Insufficient cpu, 4 node(s) didn't match node selector.
9m6s        Normal    SuccessfulCreate    ReplicaSet   Created pod: kask-production-dfd5f9666-6zxfd
9m6s        Normal    SuccessfulCreate    ReplicaSet   Created pod: kask-production-dfd5f9666-c4mnd
9m6s        Normal    SuccessfulCreate    ReplicaSet   Created pod: kask-production-dfd5f9666-jx6sg
9m6s        Normal    SuccessfulCreate    ReplicaSet   Created pod: kask-production-dfd5f9666-2xztd
9m6s        Normal    ScalingReplicaSet   Deployment   Scaled up replica set kask-production-dfd5f9666 to 4
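
Each FailedScheduling message packs the per-reason node counts into a single line: here, 4 of the 6 nodes were excluded by the node selector and the remaining 2 lacked CPU. A minimal sketch, assuming the message format shown above, that splits such a message into one reason per line; the function name scheduling_reasons is hypothetical and not part of kubectl:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubectl): split a FailedScheduling
# message into one "<count> <reason>" line per reason.
scheduling_reasons() {
  # Drop everything up to "nodes are available: ", strip the trailing
  # period, then put each comma-separated reason on its own line.
  printf '%s\n' "$1" \
    | sed -e 's/^.*nodes are available: //' -e 's/\.$//' \
    | tr ',' '\n' \
    | sed -e 's/^ *//'
}

# Message copied from the events output above.
msg="0/6 nodes are available: 2 Insufficient cpu, 4 node(s) didn't match node selector."
scheduling_reasons "$msg"
```

Reading the counts this way makes it easier to see that the node selector, not just CPU, ruled out most of the nodes.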

Change 543711 had a related patch set uploaded (by Eevans; owner: Eevans):
[operations/deployment-charts@master] echostore: remove affinity (copypasta from sessionstore)

https://gerrit.wikimedia.org/r/543711

Change 543711 merged by Eevans:
[operations/deployment-charts@master] echostore: remove affinity (copypasta from sessionstore)

https://gerrit.wikimedia.org/r/543711

Eevans added a subscriber: Joe. Oct 16 2019, 9:59 PM

From a conversation w/ @Joe on IRC, it seems the nodeAffinity section (copypasta from the sessionstore deployment) was likely causing the problem. I issued a helmfile delete, and updated the config (removing that section), but am now getting:

$ helmfile diff
Adding repo stable https://releases.wikimedia.org/charts/
"stable" has been added to your repositories

Updating repo
Hang tight while we grab the latest from your chart repositories...
...Skip local chart repository
...Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈ 

helmfile.yaml: basePath=.
Comparing production stable/kask
"production" has no deployed releases

in ./helmfile.yaml: failed processing release production: helm exited with status 1:
  Error: "production" has no deployed releases
  Error: plugin "diff" exited with erro

Perhaps there is some step required after helmfile delete?

Change 543731 had a related patch set uploaded (by Eevans; owner: Eevans):
[operations/deployment-charts@master] echostore: fixup Cassandra contact list

https://gerrit.wikimedia.org/r/543731

Change 543731 merged by Eevans:
[operations/deployment-charts@master] echostore: fixup Cassandra contact list

https://gerrit.wikimedia.org/r/543731

Eevans added a subscriber: CDanis. Oct 16 2019, 11:10 PM

Hat tip to @CDanis, who pointed me at https://github.com/helm/helm/issues/3208#issuecomment-348154521; a helm delete production --purge did the trick.
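
The recovery sequence can be sketched roughly as follows. This assumes Helm 2 (where helm delete accepts --purge) and the release name production from the output above. The DRY_RUN variable and the run and purge_and_retry helpers are hypothetical, added only so the commands can be previewed before they are executed:

```shell
#!/usr/bin/env bash
# Sketch only: assumes Helm 2, where `helm delete` accepts --purge.
# DRY_RUN, run() and purge_and_retry() are illustrative helpers,
# not part of helm or helmfile.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"   # print the command instead of executing it
  else
    "$@"
  fi
}

purge_and_retry() {
  local release="$1"
  run helm delete "$release" --purge   # purge the release left behind by the earlier delete
  run helmfile diff                    # retry the diff that previously failed
}

purge_and_retry production
```

With DRY_RUN left at its default of 1 the script only prints the two commands; setting DRY_RUN=0 would execute them against the current cluster context.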

Joe added a comment. Oct 17 2019, 5:32 AM

Heh yes sorry, I forgot to tell you yesterday - you need to use helmfile destroy in newer versions of helmfile.

> Heh yes sorry, I forgot to tell you yesterday - you need to use helmfile destroy in newer versions of helmfile.

I'm pretty sure I tried that (it seemed like the Right Thing™ based on the description in the help synopsis), and got an error of a different kind. If that's supposed to be equivalent, I'll see if I can't suss out the exact error from my scroll buffer.

CCicalese_WMF closed this task as Resolved. Sat, Nov 2, 9:52 PM

Marking as Resolved per T234376#5582595.