
Deploy ceph radosgw processes to data-engineering cluster
Closed, ResolvedPublic

Description

Ceph clusters use a service called the Ceph Object Gateway to provide the Swift and S3 interface to clients. This service is commonly referred to as radosgw.
https://docs.ceph.com/en/reef/radosgw/

Unlike the osd, mon, mds, and mgr services, which are internal Ceph cluster daemons, radosgw is just a client application of the cluster.

In our case, each of our OSD servers is also going to run a radosgw daemon.
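As a rough illustration of how small the radosgw footprint is, a co-located gateway is just a client section in ceph.conf along these lines (the instance name, port and DNS name below are placeholders, not the configuration for this cluster):

# Illustrative sketch only: a radosgw instance defined as a Ceph client.
# The section name, port and rgw_dns_name values are placeholders.
[client.rgw.cephosd1001]
    rgw_frontends = "beast port=8080"
    rgw_dns_name = rgw.example.wmnet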

Event Timeline

Gehel lowered the priority of this task from High to Low. Dec 15 2023, 10:08 AM
Gehel moved this task from Blocked / Waiting to Misc on the Data-Platform-SRE board.
Gehel subscribed.

We're not working on object storage at the moment; we might reopen these tickets if object storage becomes a priority again.

Change #1034973 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] [WIP] Add radosgw services to the cephosd servers

https://gerrit.wikimedia.org/r/1034973

There was some discussion in T326945#9041188 about the object storage gateway configuration.

In summary, this will be a multi-zone configuration, with one zonegroup called wmnet. This cluster will support a zone called eqiad.
We will not be using multiple realms, so if there is a realm it will be called default.

We have a second cluster ready to be installed in codfw, which will in time support a codfw zone. It isn't installed yet, but I think we should reserve the service address for it now.

So I think that we want two anycast service IP addresses:

  • 10.3.0.8 (Anycast) - rgw.eqiad.anycast.wmnet
  • 10.3.0.9 (Reserved) - rgw.codfw.anycast.wmnet

We will then set up a wildcard DNS record so that any bucket name resolves to this address, e.g. bucket-name.rgw.eqiad.anycast.wmnet -> 10.3.0.8.
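As a sketch of what that would look like (zone-file syntax; the values are proposals, not a committed DNS change):

; Wildcard record so that any bucket name under the gateway hostname
; resolves to the anycast address.
rgw.eqiad.anycast.wmnet.    IN A     10.3.0.8
*.rgw.eqiad.anycast.wmnet.  IN CNAME rgw.eqiad.anycast.wmnet.

On the radosgw side, rgw_dns_name would then need to be set to rgw.eqiad.anycast.wmnet so that the bucket prefix is stripped from the Host header.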

Change #1034973 merged by Btullis:

[operations/puppet@production] Add radosgw services to the cephosd servers

https://gerrit.wikimedia.org/r/1034973

I don't want to bikeshed too much, but a couple of thoughts:

  • There is always a realm; I'd be tempted to give it a meaningful name (e.g. for the apus cluster, it's apus)
  • Similarly, would you mind a less all-encompassing and/or more meaningful zonegroup name? It's likely that we're going to have a number of Ceph clusters with a number of realms and zonegroups, and it'll minimise confusion if they're given names that correspond to their cluster/use.
  • apus is using the regular LVS service (so there is apus.discovery.wmnet, apus.svc.eqiad.wmnet, and apus.svc.codfw.wmnet) and upstream's default bucketname-in-url-path approach, rather than bucketname-as-hostname (cf. https://docs.ceph.com/en/reef/radosgw/s3/commons/#bucket-and-host-name ), which is simpler (see the sketch below). Given that you control the clients, do you need to faff around with wildcard DNS etc.?
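To make the distinction concrete, the two addressing styles look roughly like this (the bucket and object names are made up):

# Path-style (upstream default): the bucket is part of the URL path,
# so a single service hostname is enough.
curl https://apus.discovery.wmnet/my-bucket/my-object

# Virtual-hosted style: the bucket is part of the hostname, which is
# what the wildcard DNS record would be needed for.
curl https://my-bucket.rgw.eqiad.anycast.wmnet/my-object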

Thanks @MatthewVernon - much appreciated. Now is definitely a good time for bikeshedding.

How about if we use data-platform for the realm, and dpe for the zonegroup? Would that suit? What is your zonegroup name for the apus cluster? Maybe the realm and zonegroup names could be the same, without causing issues.
The trouble is that we don't really have a 'codename' for this cluster and team names have changed quite a lot over time, which is why we have a Kubernetes cluster called dse-k8s and lots of hosts prefixed with an- for analytics.

Yes, you might be right about there being no real need for the wildcard DNS. I'll bear that in mind.

For apus, we have realm apus, zonegroup apus_zg, zones eqiad and codfw.

OK, so maybe I should go with:

realm dpe, zonegroup dpe_zg, zones eqiad and codfw

I have created the realm and the zonegroup.

btullis@cephosd1001:~$ sudo radosgw-admin realm create --rgw-realm=dpe --default
{
    "id": "350a37b5-d907-4b0b-a680-b51cce916b02",
    "name": "dpe",
    "current_period": "a36cfe74-11ce-488a-9f1b-03fe5c58b212",
    "epoch": 1
}
btullis@cephosd1001:~$ sudo radosgw-admin zonegroup create --rgw-zonegroup=dpe_zg --default
{
    "id": "5705578a-fc34-45fe-ab85-9cbd39e3aff5",
    "name": "dpe_zg",
    "api_name": "dpe_zg",
    "is_master": false,
    "endpoints": [],
    "hostnames": [],
    "hostnames_s3website": [],
    "master_zone": "",
    "zones": [],
    "placement_targets": [
        {
            "name": "default-placement",
            "tags": [],
            "storage_classes": []
        }
    ],
    "default_placement": "default-placement",
    "realm_id": "350a37b5-d907-4b0b-a680-b51cce916b02",
    "sync_policy": {
        "groups": []
    },
    "enabled_features": [
        "resharding"
    ]
}
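Note that at this point the zonegroup still shows is_master: false and an empty master_zone; those are filled in by the zone create below. Realm/zonegroup/zone changes also normally need to be committed to the current period before the gateways pick them up, along the lines of the following (a sketch; the exact invocation isn't captured here):

sudo radosgw-admin period update --commit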

I have created the eqiad zone. I used the endpoint https://rgw.eqiad.dpe.anycast.wmnet.

btullis@cephosd1001:~$ sudo radosgw-admin zone create --rgw-zonegroup=dpe_zg --rgw-zone=eqiad --master --default --endpoints=https://rgw.eqiad.dpe.anycast.wmnet
{
    "id": "f2a063b8-decc-4444-bca6-18f4a297bd5e",
    "name": "eqiad",
    "domain_root": "eqiad.rgw.meta:root",
    "control_pool": "eqiad.rgw.control",
    "gc_pool": "eqiad.rgw.log:gc",
    "lc_pool": "eqiad.rgw.log:lc",
    "log_pool": "eqiad.rgw.log",
    "intent_log_pool": "eqiad.rgw.log:intent",
    "usage_log_pool": "eqiad.rgw.log:usage",
    "roles_pool": "eqiad.rgw.meta:roles",
    "reshard_pool": "eqiad.rgw.log:reshard",
    "user_keys_pool": "eqiad.rgw.meta:users.keys",
    "user_email_pool": "eqiad.rgw.meta:users.email",
    "user_swift_pool": "eqiad.rgw.meta:users.swift",
    "user_uid_pool": "eqiad.rgw.meta:users.uid",
    "otp_pool": "eqiad.rgw.otp",
    "system_key": {
        "access_key": "",
        "secret_key": ""
    },
    "placement_pools": [
        {
            "key": "default-placement",
            "val": {
                "index_pool": "eqiad.rgw.buckets.index",
                "storage_classes": {
                    "STANDARD": {
                        "data_pool": "eqiad.rgw.buckets.data"
                    }
                },
                "data_extra_pool": "eqiad.rgw.buckets.non-ec",
                "index_type": 0,
                "inline_data": true
            }
        }
    ],
    "realm_id": "350a37b5-d907-4b0b-a680-b51cce916b02",
    "notif_pool": "eqiad.rgw.log:notif"
}

I can check the placement_pools later on to make sure that the index always goes onto SSD, irrespective of whether the data buckets go onto the HDDs.
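If the index pool does need pinning to SSD, it could probably be done with a device-class CRUSH rule once the index pool exists (it is only created on first use). A sketch, with a hypothetical rule name:

# Sketch only: restrict the bucket index pool to the ssd device class.
# "rgw-index-ssd" is a hypothetical rule name.
sudo ceph osd crush rule create-replicated rgw-index-ssd default host ssd
sudo ceph osd pool set eqiad.rgw.buckets.index crush_rule rgw-index-ssd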

Change #1064026 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add ceph client config data for radosgw clients

https://gerrit.wikimedia.org/r/1064026

Change #1064026 merged by Btullis:

[operations/puppet@production] Add ceph client config data for radosgw clients

https://gerrit.wikimedia.org/r/1064026

Change #1064041 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Enable the correct service name for radosgw

https://gerrit.wikimedia.org/r/1064041

Change #1064041 merged by Btullis:

[operations/puppet@production] Enable the correct service name for radosgw

https://gerrit.wikimedia.org/r/1064041

Change #1064063 had a related patch set uploaded (by Btullis; author: Btullis):

[operations/puppet@production] Add TLS support to the radosgw services on the DPE ceph cluster

https://gerrit.wikimedia.org/r/1064063

Change #1064063 merged by Btullis:

[operations/puppet@production] Add TLS envoyproxy to the radosgw services on the DPE ceph cluster

https://gerrit.wikimedia.org/r/1064063

I have now deployed envoyproxy to all five cephosd servers.

btullis@cephosd1001:~$ openssl s_client -connect localhost:443
CONNECTED(00000003)
Can't use SSL_get_servername
depth=2 C = US, ST = California, L = San Francisco, O = "Wikimedia Foundation, Inc", OU = Cloud Services, CN = Wikimedia_Internal_Root_CA
verify return:1
depth=1 C = US, L = San Francisco, O = "Wikimedia Foundation, Inc", OU = SRE Foundations, CN = discovery
verify return:1
depth=0 CN = rgw.eqiad.dpe.anycast.wmnet
verify return:1
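A quick end-to-end check against the local listener would be something like this; an anonymous request should come back as an XML S3 response (an empty bucket listing or an access-denied error, depending on configuration):

# Sketch: hit the local envoy listener, skipping certificate verification.
curl -sk https://localhost:443/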

There are three new metadata pools for the cluster, which match what we had set out in T326945#9041188.

btullis@cephosd1001:~$ sudo ceph osd pool ls
.mgr
rbd-metadata-ssd
rbd-metadata-hdd
rbd-data-ssd
rbd-data-hdd
dse-k8s-csi-ssd
.rgw.root
eqiad.rgw.log
eqiad.rgw.control
eqiad.rgw.meta

I'm going to mark this as done for now. The radosgw services are running, as well as the reverse proxy and TLS terminator.
The next step is to get the load-balancing working; then I can start testing user ops and bucket ops, and check the placement_pool settings to ensure that we can specify which buckets go on the HDDs.
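Once the load-balancing is in place, a minimal smoke test would probably look something like this (the user, bucket and endpoint names are placeholders, and the credentials come from the user create output):

# Create a test S3 user and note the returned access/secret keys.
sudo radosgw-admin user create --uid=test-user --display-name="Test user"

# Then create a bucket through the S3 API with any S3 client, e.g. awscli:
aws --endpoint-url=https://rgw.eqiad.dpe.anycast.wmnet s3 mb s3://test-bucket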