
Upgrade sessionstore cluster to Cassandra 4.1.1
Closed, Resolved (Public)

Assigned To: Eevans
Authored By: Eevans, May 24 2023, 7:07 PM
Referenced Files
  • F37097221: sessionss latency distribution (posts).png (Jun 7 2023, 8:39 PM)
  • F37097219: sessions latency distribution.png (Jun 7 2023, 8:39 PM)
  • Restricted File (Jun 7 2023, 6:59 PM)
  • F37097130: image.png (Jun 7 2023, 6:54 PM)
  • F37097128: image.png (Jun 7 2023, 6:54 PM)

Description

codfw (2023-06-07, tentative)

  • Depool codfw data center
  • Create keyspace snapshots
sudo cumin 'sessionstore2*' 'c-foreach-nt snapshot -t 4x_upgrade-`date +%Y%m%d%H%I` -- sessions system system_auth system_distributed system_schema system_traces'
  • Upgrade sessionstore2001.codfw.wmnet (merge r926588)
    • Kask Ok (logs)?
    • Logging works (local/remote)?
    • Logged errors/warnings?
    • Prometheus metrics?
    • cqlsh works?
    • nodetool works?
    • Handles (generated) traffic Ok?
      • Load
      • Latency
      • GC
  • Upgrade sessionstore2002.codfw.wmnet (merge r926589)
  • Upgrade sessionstore2003.codfw.wmnet (merge r926590)
  • Generate traffic/load
ssh deploy2002.codfw.wmnet -- siege -f /home/eevans/T327954/urls.txt -i -c 64 -t 3H -d 0.1
    • Cassandra memory/GC Ok?
    • Load
    • Latency
    • Logged errors?
  • Repool codfw data center (+3h, tentative)
    • Logged errors/warnings?
    • Handles traffic Ok?
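The "nodetool works?" item in the codfw checklist can be partly scripted so that a host is only marked done once it actually reports the new release. A minimal sketch, assuming one Cassandra instance per invocation and the standard `nodetool version` output line `ReleaseVersion: <version>`; the helper name `check_release_version` is hypothetical and not part of the WMF tooling, and the output format of the `c-foreach-nt` wrapper is not assumed here.

```shell
#!/bin/bash
# check_release_version (hypothetical helper): reads `nodetool version`
# output on stdin and succeeds only if at least one "ReleaseVersion:" line
# is present and every such line reports the expected version.
check_release_version() {
  local expected="$1"
  awk -v want="$expected" '
    /^ReleaseVersion:/ { seen++; if ($2 != want) bad++ }
    END { if (seen > 0 && bad == 0) exit 0; exit 1 }
  '
}

# Example usage on an upgraded host (assumes nodetool is on PATH):
#   nodetool version | check_release_version 4.1.1 && echo "upgraded"
```

Doing the check in a single `awk` pass lets it fail both when a host still reports the old release and when the command produced no version line at all (for example, if nodetool could not connect).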

eqiad (2023-06-08, tentative)

  • Depool eqiad data center
  • Create keyspace snapshots
sudo cumin 'sessionstore1*' 'c-foreach-nt snapshot -t 4x_upgrade-`date +%Y%m%d%H%I` -- sessions system system_auth system_distributed system_schema system_traces'
  • Upgrade sessionstore1001.eqiad.wmnet (merge r928569)
  • Upgrade sessionstore1002.eqiad.wmnet (merge r928570)
  • Upgrade sessionstore1003.eqiad.wmnet (merge r928571)

Post-upgrade

  • Per-host Hiera settings moved back to role (r928572)
  • Set legacy_ssl_storage_port_enabled: false (remove assignment, r928573)
  • Set server_encryption_optional: false (remove assignment, r928573)
  • Clear snapshots
c-foreach-nt clearsnapshot -t 4x_upgrade-202306081604
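The snapshot tag in the snapshot commands is built with `date +%Y%m%d%H%I`. In GNU `date`, `%I` is the hour on the 12-hour clock (01 to 12), not the minute (`%M`), so the tag repeats the hour instead of recording minutes. This is consistent with the tag cleared above: the eqiad snapshots were created around 16:22 UTC (per SAL), and `%H%I` at 16:xx renders as `1604`. The sketch below reproduces this, assuming GNU `date` and hosts running in UTC; `snapshot_tag` and `snapshot_tag_minutes` are hypothetical helpers for illustration only.

```shell
#!/bin/bash
# snapshot_tag (hypothetical helper): render the tag that
# `snapshot -t 4x_upgrade-$(date +%Y%m%d%H%I)` would have produced
# at the given time, interpreted as UTC. Requires GNU date (-d).
snapshot_tag() {
  TZ=UTC date -d "$1" '+4x_upgrade-%Y%m%d%H%I'
}

# Same time, but with %M (minutes) in place of %I (12-hour clock hour).
snapshot_tag_minutes() {
  TZ=UTC date -d "$1" '+4x_upgrade-%Y%m%d%H%M'
}

snapshot_tag '2023-06-08 16:22:33'          # 4x_upgrade-202306081604
snapshot_tag_minutes '2023-06-08 16:22:33'  # 4x_upgrade-202306081622
```

Because `%H%I` depends only on the hour, any snapshot taken between 16:00 and 16:59 UTC gets the same `...1604` suffix, so two snapshot runs within the same hour would share a tag.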

Event Timeline

Eevans triaged this task as Medium priority (May 24 2023, 8:23 PM).
Eevans updated the task description.

Change 926588 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] sessionstore: upgrade sessionstore2001 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/926588

Change 926589 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] sessionstore: upgrade sessionstore2002 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/926589

Change 926590 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] sessionstore: upgrade sessionstore2003 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/926590

Eevans updated the task description.

Mentioned in SAL (#wikimedia-operations) [2023-06-07T15:02:33Z] <urandom> de-pooling sessionstore/codfw — T337426

Eevans updated the task description.

Change 926588 merged by Eevans:

[operations/puppet@production] sessionstore: upgrade sessionstore2001 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/926588

Mentioned in SAL (#wikimedia-operations) [2023-06-07T15:14:45Z] <urandom> Upgrading Cassandra to 4.1.1, sessionstore2001 — T337426

Change 926589 merged by Eevans:

[operations/puppet@production] sessionstore: upgrade sessionstore2002 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/926589

Mentioned in SAL (#wikimedia-operations) [2023-06-07T15:44:43Z] <urandom> Upgrading Cassandra to 4.1.1, sessionstore2002 — T337426

Change 926590 merged by Eevans:

[operations/puppet@production] sessionstore: upgrade sessionstore2003 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/926590

Mentioned in SAL (#wikimedia-operations) [2023-06-07T15:52:07Z] <urandom> Upgrading Cassandra to 4.1.1, sessionstore2003 — T337426

Mentioned in SAL (#wikimedia-operations) [2023-06-07T15:56:51Z] <urandom> Beginning (3 hour) generated traffic testing of sessionstore.svc.codfw.wmnet — T337426

Latency at the service level seems to be somewhat higher than before the upgrade...

Figures: service-level latency graphs (two images)

...but this isn't reflected in Cassandra latency, so I'm not sure what (if anything) to make of it, particularly since this is fake/generated (read: not production) traffic.

Attached file: F37097133

Mentioned in SAL (#wikimedia-operations) [2023-06-07T19:11:33Z] <urandom> (Re)pooling codfw sessionstore — T337426

Here are some latency metrics after repooling the datacenter.

Figure: Kask latency distribution (GET)
Figure: Kask latency distribution (POST)

Change 928569 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] sessionstore: upgrade sessionstore1001 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/928569

Change 928570 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] sessionstore: upgrade sessionstore1002 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/928570

Change 928571 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] sessionstore: upgrade sessionstore1003 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/928571

Change 928572 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] sessionstore: move per-host settings back to role

https://gerrit.wikimedia.org/r/928572

Change 928573 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] sessionstore: remove transitional settings

https://gerrit.wikimedia.org/r/928573

Mentioned in SAL (#wikimedia-operations) [2023-06-08T16:06:01Z] <urandom> depooling eqiad sessionstore for Cassandra upgrade — T337426

Mentioned in SAL (#wikimedia-operations) [2023-06-08T16:22:33Z] <urandom> creating pre-upgrade Cassandra snapshots, sessionstore/eqiad — T337426

Change 928569 merged by Eevans:

[operations/puppet@production] sessionstore: upgrade sessionstore1001 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/928569

Mentioned in SAL (#wikimedia-operations) [2023-06-08T16:26:47Z] <urandom> Upgrading Cassandra to 4.1.1, sessionstore1001 — T337426

Change 928570 merged by Eevans:

[operations/puppet@production] sessionstore: upgrade sessionstore1002 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/928570

Mentioned in SAL (#wikimedia-operations) [2023-06-08T16:35:11Z] <urandom> Upgrading Cassandra to 4.1.1, sessionstore1002 — T337426

Change 928571 merged by Eevans:

[operations/puppet@production] sessionstore: upgrade sessionstore1003 to Cassandra 4.1.1

https://gerrit.wikimedia.org/r/928571

Mentioned in SAL (#wikimedia-operations) [2023-06-08T16:40:59Z] <urandom> Upgrading Cassandra to 4.1.1, sessionstore1003 — T337426

Mentioned in SAL (#wikimedia-operations) [2023-06-08T16:46:43Z] <urandom> Starting traffic test against sessionstore.svc.eqiad.wmnet — T337426

Change 928572 merged by Eevans:

[operations/puppet@production] sessionstore: move per-host settings back to role

https://gerrit.wikimedia.org/r/928572

Change 928573 merged by Eevans:

[operations/puppet@production] sessionstore: remove transitional settings

https://gerrit.wikimedia.org/r/928573

Mentioned in SAL (#wikimedia-operations) [2023-06-08T18:18:48Z] <urandom> (Re)pooling sessionstore/eqiad — T337426

Eevans claimed this task.
Eevans updated the task description.