AQS multi-datacenter cluster expansion
Open, Medium, Public

Description

AQS (currently an eqiad-only cluster) is being expanded from 6 to 12 hosts (T305570), as well as being made multi-datacenter with the addition of 12 hosts in codfw (T305568).

The high-level steps to complete this are:

  • Enable encryption (T307798: Enable Cassandra encryption (inter-node & client))
  • Validate/correct multi-datacenter client configurations (T307799: Ensure AQS Cassandra client connections are multi-datacenter)
  • ALTER keyspaces to use NetworkTopologyStrategy (where/if necessary)
  • Bootstrap aqs2001-aqs2012 (T307801: Bootstrap new Cassandra nodes (codfw))
  • Bootstrap aqs1016-aqs1021 (T307802: Bootstrap new Cassandra nodes (eqiad)) OPTIONAL
  • ALTER keyspaces to include codfw replica count
  • Perform a rebuild operation from each of aqs2001-aqs2012 (nodetool rebuild -- eqiad)
    • aqs2001-a.codfw.wmnet (a7406ee0-ecbb-11ec-b266-0dba5031eff2)
    • aqs2001-b.codfw.wmnet (7dc18f80-ed70-11ec-a7d4-9369582268a3)
    • aqs2002-a.codfw.wmnet (4e105e60-ee51-11ec-88d3-95ab689283e4)
    • aqs2002-b.codfw.wmnet (403a0030-ef5e-11ec-a132-bf70daf94763)
    • aqs2003-a.codfw.wmnet (f3052140-f012-11ec-82b1-3d590b460781)
    • aqs2003-b.codfw.wmnet (ed24c600-f0c0-11ec-8e36-1bfbea6ebf33)
    • aqs2004-a.codfw.wmnet (5e618960-f19e-11ec-ac41-9fc127d8570e)
    • aqs2004-b.codfw.wmnet (af359120-f233-11ec-82f9-23ac7f2cf81f)
    • aqs2005-a.codfw.wmnet (d3efa220-ed70-11ec-930c-37f52ff7543a)
    • aqs2005-b.codfw.wmnet (dce05940-ee44-11ec-ae8e-af639f32c8d1)
    • aqs2006-a.codfw.wmnet (f6afe970-ee54-11ec-9771-0931f42f1a0a)
    • aqs2006-b.codfw.wmnet (64f11c10-ef5e-11ec-88f9-0ba4322deefb)
    • aqs2007-a.codfw.wmnet (f430a8f0-f012-11ec-98a1-bfd50867790a)
    • aqs2007-b.codfw.wmnet (750d03d0-f0ca-11ec-a849-935c51e02210)
    • aqs2008-a.codfw.wmnet (844cb870-f19e-11ec-9f86-adc804430e05)
    • aqs2008-b.codfw.wmnet (11f4dea0-f262-11ec-aba6-f19c7aafb0c1)
    • aqs2009-a.codfw.wmnet (20774f10-ed96-11ec-987a-c3d1e63f83e3)
    • aqs2009-b.codfw.wmnet (de0462b0-ee4b-11ec-a8e5-5bfd10bc44d0)
    • aqs2010-a.codfw.wmnet (34075790-ee55-11ec-ad09-ff723eba9afd)
    • aqs2010-b.codfw.wmnet (7e860780-ef5e-11ec-a9cd-5de303454aee)
    • aqs2011-a.codfw.wmnet (f5568b50-f012-11ec-b213-7994abb2ae9a)
    • aqs2011-b.codfw.wmnet (60cd2480-f0d5-11ec-b107-311d6ba62aad)
    • aqs2012-a.codfw.wmnet (b4fd2ae0-f19e-11ec-b8eb-19a09e1e7ed2)
    • aqs2012-b.codfw.wmnet (6ff0ff60-f23b-11ec-afd9-83845bb3eefc)
  • ALTER KEYSPACE system_auth... for codfw replicas
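
The identifiers listed next to each instance above have the layout of version 1 (time-based) UUIDs. Assuming that is what they are (the task does not say what they identify), the timestamp embedded in each one can be decoded to see when it was generated. A minimal Python sketch; the helper name `uuid1_to_datetime` is made up for illustration:

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUIDv1 timestamps count 100 ns intervals since the start of the
# Gregorian calendar (1582-10-15 00:00 UTC).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_to_datetime(value: str) -> datetime:
    """Return the UTC time embedded in a version 1 UUID."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError(f"{value} is not a version 1 UUID")
    # parsed.time is in 100 ns units; integer-divide by 10 for microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


# Two identifiers copied from the list above.
for instance, ident in [
    ("aqs2001-a.codfw.wmnet", "a7406ee0-ecbb-11ec-b266-0dba5031eff2"),
    ("aqs2012-b.codfw.wmnet", "6ff0ff60-f23b-11ec-afd9-83845bb3eefc"),
]:
    print(instance, uuid1_to_datetime(ident).isoformat())
```

Under that assumption, the first identifier decodes to 2022-06-15 14:58:45 UTC.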

Event Timeline

Eevans triaged this task as Medium priority. May 4 2022, 11:43 PM

Regarding replication strategy, the current state looks like the following:

CREATE KEYSPACE "local_group_default_T_pageviews_per_project_v2" WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;
CREATE KEYSPACE "local_group_default_T_mediarequest_top_files" WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;
CREATE KEYSPACE "local_group_default_T_lgc_pagecounts_per_project" WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;
CREATE KEYSPACE "local_group_default_T_mediarequest_per_file" WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;
CREATE KEYSPACE "local_group_default_T_top_percountry" WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;
CREATE KEYSPACE system_schema WITH replication = {'class': 'LocalStrategy'}  AND durable_writes = true;
CREATE KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '12'}  AND durable_writes = true;
CREATE KEYSPACE "local_group_default_T_editors_bycountry" WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;
CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}  AND durable_writes = true;
CREATE KEYSPACE system_traces WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '2'}  AND durable_writes = true;
CREATE KEYSPACE "local_group_default_T_top_pageviews" WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;
CREATE KEYSPACE "local_group_default_T_mediarequest_per_referer" WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;
CREATE KEYSPACE system_distributed WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}  AND durable_writes = true;
CREATE KEYSPACE image_suggestions WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;
CREATE KEYSPACE "local_group_default_T_unique_devices" WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;
CREATE KEYSPACE "local_group_default_T_top_bycountry" WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;
CREATE KEYSPACE "local_group_default_T_pageviews_per_article_flat" WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'}  AND durable_writes = true;

I propose applying the following (and plan to do so tomorrow at ~15:00 UTC if there are no objections):

ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '12'};
ALTER KEYSPACE system_traces WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '2'};
ALTER KEYSPACE system_distributed WITH replication = {'class': 'NetworkTopologyStrategy', 'eqiad': '3'};
NOTE: system_auth replication doesn't need to be this high, but we can wait to reduce it (3 is probably enough) until after we have working replicas in codfw.
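
The three ALTER statements above can be derived mechanically from the keyspace listing: every keyspace using SimpleStrategy is switched to NetworkTopologyStrategy, keeping its existing replica count in eqiad. A rough Python sketch of that mapping follows; the function name and the regular expression are illustrative assumptions, not tooling used for this task:

```python
import re

# Matches SimpleStrategy definitions as printed in the keyspace listing.
SIMPLE = re.compile(
    r"CREATE KEYSPACE (\S+) WITH replication = "
    r"\{'class': 'SimpleStrategy', 'replication_factor': '(\d+)'\}"
)


def to_network_topology(create_statements, datacenter="eqiad"):
    """Yield an ALTER statement for every SimpleStrategy keyspace found."""
    for statement in create_statements:
        match = SIMPLE.search(statement)
        if match is None:
            continue  # already NetworkTopologyStrategy, or LocalStrategy
        keyspace, factor = match.groups()
        yield (
            f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', '{datacenter}': '{factor}'}};"
        )


# A subset of the listing, copied verbatim.
current = [
    "CREATE KEYSPACE system_schema WITH replication = {'class': 'LocalStrategy'}  AND durable_writes = true;",
    "CREATE KEYSPACE system_auth WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '12'}  AND durable_writes = true;",
    "CREATE KEYSPACE system_traces WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '2'}  AND durable_writes = true;",
    "CREATE KEYSPACE system_distributed WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '3'}  AND durable_writes = true;",
]
for alter in to_network_topology(current):
    print(alter)
```

Keeping the replica count unchanged means the switch alters only the strategy class, not how many copies of the data exist in eqiad.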

Mentioned in SAL (#wikimedia-operations) [2022-06-02T13:44:08Z] <urandom> ALTER-ing system_auth replication strategy, AQS Cassandra cluster -- T307641

Mentioned in SAL (#wikimedia-operations) [2022-06-15T14:50:34Z] <urandom> ALTER-ing replication for codfw (Cassandra) expansion -- T307641

@ayounsi, @cmooney when streaming data for this rebuild from eqiad to codfw, is there a target throughput we should stay under? Where is the canonical place to monitor this?

Change 805883 had a related patch set uploaded (by Eevans; author: Eevans):

[operations/puppet@production] AQS: Use data-center apropos host list

https://gerrit.wikimedia.org/r/805883

@Eevans
The easiest way is to look at https://librenms.wikimedia.org/bill/bill_id=24/, and at each link (under Billed Ports) individually for better accuracy.
Note that LibreNMS has a 5-minute granularity, so if it shows a link saturating, the saturation could have started up to 5 minutes earlier. There is a "real time" link under individual ports, but its quality varies.

As for the setup, we currently have two active/active 10G links between eqiad and codfw (so either can saturate if, for example, there is a single 10Gbps transfer).

The thresholds depend on the length of the transfer:

  • If it's something actively monitored that can be stopped quickly (for example, if a link fails right at that time), then you can go up to 8Gbps
  • If it's a one-off, long-running background process, it's better not to cross 4Gbps
  • If it's something permanent, we should discuss it in more detail, but 2Gbps would be a safe starting point

Don't hesitate to sync up with us if you want more eyes on the link usage.

Thanks, this helps! This falls into the second of those scenarios (a long-running one-off). It sounds like we have plenty of room to ramp up from the current rate while staying conservative (well below the 4Gbps figure).
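
To make that headroom concrete, here is a back-of-the-envelope Python sketch that turns an inter-datacenter budget into a per-node streaming cap. It assumes the cap is applied on each sending node in megabits per second (as with Cassandra's `nodetool setinterdcstreamthroughput`), and that every sender could be streaming at once. The function name, the scenario keys, the sender count of 12, and the 50% safety margin are illustrative assumptions, not values from this task:

```python
import math

# Link guidance quoted earlier in this thread, in Gbps.
# The dictionary keys are hypothetical labels for the three scenarios.
LINK_GUIDANCE_GBPS = {
    "actively_monitored": 8,
    "one_off_background": 4,
    "permanent": 2,
}


def per_node_cap_mbps(scenario, sending_nodes, safety_margin=0.5):
    """Per-node cap (megabits/s) keeping the worst-case aggregate under budget."""
    if sending_nodes < 1:
        raise ValueError("need at least one sending node")
    if not 0 < safety_margin <= 1:
        raise ValueError("safety_margin must be in (0, 1]")
    # Convert Gbps to Mbps, then keep only the chosen fraction of the budget.
    budget_mbps = LINK_GUIDANCE_GBPS[scenario] * 1000 * safety_margin
    # Round down so the sum over all senders never exceeds the budget.
    return math.floor(budget_mbps / sending_nodes)


# Assumed: 12 sending nodes, all streaming concurrently.
cap = per_node_cap_mbps("one_off_background", sending_nodes=12)
print(f"per-node cap: {cap} Mb/s, worst-case aggregate: {cap * 12} Mb/s")
```

Whatever cap is chosen, actual link usage should still be checked in LibreNMS as suggested above, since other traffic shares the same links.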

Eevans updated the task description.

Mentioned in SAL (#wikimedia-operations) [2022-06-21T19:22:03Z] <urandom> replicating Cassandra system_auth keyspace to codfw -- T307641

Eevans updated the task description.
Eevans updated the task description.

Change 805883 merged by Ryan Kemper:

[operations/puppet@production] AQS: Use data-center apropos host list

https://gerrit.wikimedia.org/r/805883

Eevans updated the task description.
Eevans updated the task description.
Eevans moved this task from In-Progress to Complete on the Cassandra board.