
eqiad: 1 VM requested for community-crm
Closed, Resolved (Public)

Description

Cloud VPS Project Tested: civicrm-prototype T343486
Site/Location: codfw for the initial host
Number of systems: 1
Service: community-crm
Networking Requirements: private IP. Will serve https://community-crm.wikimedia.org (redirected to port 80 on the VM) via the CDN
Processor Requirements: 4
Memory: 4 GB
Disks: 40
Other Requirements: Prefer available for install/testing by 10/25.

This VM will be the first for the community-crm project. More details are in T343486 and T335190. Testing is currently wrapping up, and we hope to get a review of the Puppet module next week.

Event Timeline

As for the network, do you want a public IP or should this rather run on a private IP and then get served from our CDN? For most new services this is our typical mode of operation.

And does this really need 16G? It sounds like a run-of-the-mill web service, which usually needs much less, and 16G makes up one quarter of one of our virtualisation servers. If it needs so much, so be it, but bumping the memory retroactively is also a fairly simple operation, so we could also start with less and then increase as needed?

And wrt rampup and ongoing maintenance of the service, on the SRE side will this be operated by fr-tech or some SRE sub team like ServiceOps-Collab?

@MoritzMuehlenhoff It doesn't need the full 16G. I was just basing that off of the initial requests/approval when Quim was corresponding with Lukasz. We can start with 4G which should give us good overhead.

As for the ramp up and maintenance, that will be operated by fr-tech. We will require ssh access to the hosts to do database maintenance and code deployments. At a minimum we will need access for @Jgreen and myself. We may need access for additional fr-tech members, but that isn't determined yet.

Finally, in regards to a public IP or private/CDN, I'm not 100% certain. To my knowledge, there is no reason that it can't be hosted behind the CDN but I'm not sure if there could be issues with data being cached at the CDN. We would want to ensure that requests are not cached for anything but images and the static js and css files. CiviCRM is setting "cache-control: must-revalidate, no-cache, private" in case that helps.

I've read the CDN wikitech page but honestly I don't know enough about the CDN workings to make an informed decision and would welcome your thoughts or pointers on where to learn more.
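
For readers less familiar with the directives in that header: in standard HTTP caching semantics (RFC 9111), `private` tells shared caches such as a CDN not to store the response, and `no-cache` requires revalidation with the origin before any reuse. The sketch below is only an illustration of those generic semantics. The function names are hypothetical, the parser is deliberately simplified, and it does not model how the Wikimedia CDN is actually configured, which may handle origin headers differently.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into lowercase directives.

    Simplified: splits on commas, so quoted values that themselves
    contain commas are not handled.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Directives without a value (e.g. "private") map to None.
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def shared_cache_may_store(header: str) -> bool:
    """Return False when a shared cache must not store the response.

    Only checks the "private" and "no-store" directives; everything
    else about cacheability is out of scope for this sketch.
    """
    directives = parse_cache_control(header)
    return "private" not in directives and "no-store" not in directives


# Header value quoted in the comment above.
civicrm_header = "must-revalidate, no-cache, private"
print(shared_cache_may_store(civicrm_header))
```

With the header CiviCRM sends, the check reports that a standards-following shared cache would not store the response. Whether static assets get different headers is not stated in this thread and would need to be checked on the application itself.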

> @MoritzMuehlenhoff It doesn't need the full 16G. I was just basing that off of the initial requests/approval when Quim was corresponding with Lukasz. We can start with 4G which should give us good overhead.

Ok, sounds good. Bumping the RAM afterwards is as simple as changing a setting and rebooting the VM.

> As for the ramp up and maintenance, that will be operated by fr-tech. We will require ssh access to the hosts to do database maintenance and code deployments. At a minimum we will need access for @Jgreen and myself. We may need access for additional fr-tech members, but that isn't determined yet.

Ok, we can set up the necessary sudo rules to give you and Jeff root access via the frtech group.

> Finally, in regards to a public IP or private/CDN, I'm not 100% certain. To my knowledge, there is no reason that it can't be hosted behind the CDN but I'm not sure if there could be issues with data being cached at the CDN. We would want to ensure that requests are not cached for anything but images and the static js and css files. CiviCRM is setting "cache-control: must-revalidate, no-cache, private" in case that helps.
>
> I've read the CDN wikitech page but honestly I don't know enough about the CDN workings to make an informed decision and would welcome your thoughts or pointers on where to learn more.

I'm pretty sure that is all possible, but for the finer details it's probably best to sync up with the Traffic team in SRE. How about we start with a VM using a public IP address, so that you can go ahead with the initial ramp-up? Given that these are VMs, we can easily re-create them with a private address and then move them behind the caches in a subsequent step. If so, the last remaining thing we need is a hostname: crm1001 or similar?

I'd rather do it the other way around: start with a private IP behind the CDN and move it to a public one if there are blockers. But based on the exchanges here I'm not seeing any obvious blocker. Note that caching is not required when using the CDN (we have plenty of services that are not cached).
See also https://wikitech.wikimedia.org/wiki/Wikimedia_network_guidelines#Public_IPs for more info.

Thanks both for your feedback. I've chatted with Jeff and we think we should go ahead and start out with the private IP behind the CDN. Hopefully that will serve what we need and we won't need a public IP.

> Finally, in regards to a public IP or private/CDN, I'm not 100% certain. To my knowledge, there is no reason that it can't be hosted behind the CDN but I'm not sure if there could be issues with data being cached at the CDN.

You can take a look at puppet/hieradata/role/common/cache/text.yaml. There are examples for services that use caching ('normal') and others that are configured to not use caching ('pass'). For example, the people.wikimedia.org service is configured to never cache at the CDN level while still getting the other advantages.

people.wikimedia.org:
  caching: 'pass'
performance.wikimedia.org:
  caching: 'normal'
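
As a rough illustration of how such a per-service setting could be read, here is a minimal Python sketch. The two existing entries mirror the excerpt above; the community-crm entry, the dictionary name, and the helper function are hypothetical and are not taken from the real hiera file or from any Wikimedia tooling. Only the two modes mentioned in this thread are handled.

```python
# Caching modes from the excerpt above, expressed as a plain dict.
# The community-crm entry is a HYPOTHETICAL addition for illustration only.
CACHE_TEXT_SERVICES = {
    "people.wikimedia.org": {"caching": "pass"},
    "performance.wikimedia.org": {"caching": "normal"},
    "community-crm.wikimedia.org": {"caching": "pass"},  # hypothetical
}

# Only the modes named in this thread; the real file may define others.
KNOWN_MODES = ("normal", "pass")


def is_cached_at_cdn(hostname, services=CACHE_TEXT_SERVICES):
    """Return True if the service uses 'normal' caching, False for 'pass'."""
    mode = services[hostname]["caching"]
    if mode not in KNOWN_MODES:
        raise ValueError(f"unhandled caching mode for {hostname}: {mode!r}")
    return mode == "normal"


for host in CACHE_TEXT_SERVICES:
    print(host, "cached" if is_cached_at_cdn(host) else "not cached")
```

Under this assumption, a service marked 'pass' would still be served through the CDN but never from its cache, which matches the behaviour requested for the CiviCRM pages.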

@Dwisehaupt I think we have all data now except the hostname, see my earlier comment. crm1001 or something else?

Sorry, I forgot to respond to that. crm1001 is good.

Change 975000 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add crm1001 to site.pp

https://gerrit.wikimedia.org/r/975000

Change 975000 merged by Muehlenhoff:

[operations/puppet@production] Add crm1001 to site.pp

https://gerrit.wikimedia.org/r/975000

Change 975002 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Add crm* pattern to partman setup

https://gerrit.wikimedia.org/r/975002

Change 975002 merged by Muehlenhoff:

[operations/puppet@production] Add crm* pattern to partman setup

https://gerrit.wikimedia.org/r/975002

Change 975122 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Adapt VM name

https://gerrit.wikimedia.org/r/975122

Change 975122 merged by Muehlenhoff:

[operations/puppet@production] Adapt VM name

https://gerrit.wikimedia.org/r/975122

MoritzMuehlenhoff updated the task description.

Change 975202 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Configure crm2001 for Puppet 7

https://gerrit.wikimedia.org/r/975202

Change 975202 merged by Muehlenhoff:

[operations/puppet@production] Configure crm2001 for Puppet 7

https://gerrit.wikimedia.org/r/975202

Cookbook cookbooks.sre.hosts.reimage was started by jmm@cumin1001 for host crm2001.codfw.wmnet with OS bookworm

Cookbook cookbooks.sre.hosts.reimage started by jmm@cumin1001 for host crm2001.codfw.wmnet with OS bookworm completed:

  • crm2001 (PASS)
    • Removed from Puppet and PuppetDB if present and deleted any certificates
    • Removed from Debmonitor if present
    • Forced PXE for next reboot
    • Host rebooted via gnt-instance
    • Host up (Debian installer)
    • Add puppet_version metadata to Debian installer
    • Set boot media to disk
    • Host up (new fresh bookworm OS)
    • Generated Puppet certificate
    • Signed new Puppet certificate
    • Run Puppet in NOOP mode to populate exported resources in PuppetDB
    • Found Nagios_host resource for this host in PuppetDB
    • Downtimed the new host on Icinga/Alertmanager
    • First Puppet run completed and logged in /var/log/spicerack/sre/hosts/reimage/202311170845_jmm_2329512_crm2001.out
    • configmaster.wikimedia.org updated with the host new SSH public key for wmf-update-known-hosts-production
    • Rebooted
    • Automatic Puppet run was successful
    • Forced a re-check of all Icinga services for the host
    • Icinga status is optimal
    • Icinga downtime removed
    • Updated Netbox data from PuppetDB

Change 975209 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Create a new initial role for crm hosts

https://gerrit.wikimedia.org/r/975209

Change 975209 merged by Muehlenhoff:

[operations/puppet@production] Create a new initial role for crm hosts

https://gerrit.wikimedia.org/r/975209

Change 975229 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Apply crm role to crm2001

https://gerrit.wikimedia.org/r/975229

Mentioned in SAL (#wikimedia-operations) [2023-11-17T10:12:49Z] <jmm@cumin1001> START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new crm VM - jmm@cumin1001 - T349402"

Mentioned in SAL (#wikimedia-operations) [2023-11-17T10:17:51Z] <jmm@cumin1001> END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new crm VM - jmm@cumin1001 - T349402"

Change 975229 merged by Muehlenhoff:

[operations/puppet@production] Apply crm role to crm2001

https://gerrit.wikimedia.org/r/975229

Change 975234 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Extend Wmflib::Team type with Fundraising Tech

https://gerrit.wikimedia.org/r/975234

Change 975234 merged by Muehlenhoff:

[operations/puppet@production] Extend Wmflib::Team type with Fundraising Tech

https://gerrit.wikimedia.org/r/975234

Change 975242 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/puppet@production] Create a new crm-root group and apply to crm hosts

https://gerrit.wikimedia.org/r/975242

MoritzMuehlenhoff claimed this task.

crm2001.codfw.wmnet has been created and configured to allow logins by fr-tech SREs. Let me know if you run into any issues, I'm resolving the task for now.

I also created https://gerrit.wikimedia.org/r/c/operations/puppet/+/975242/ to give you sudo rights for root on the VM(s), but that needs approval in the SRE IF meeting happening next Monday.

Change 975242 merged by Muehlenhoff:

[operations/puppet@production] Create a new crm-root group and apply to crm hosts

https://gerrit.wikimedia.org/r/975242

> crm2001.codfw.wmnet has been created and configured to allow logins by fr-tech SREs. Let me know if you run into any issues, I'm resolving the task for now.
>
> I also created https://gerrit.wikimedia.org/r/c/operations/puppet/+/975242/ to give you sudo rights for root on the VM(s), but that needs approval in the SRE IF meeting happening next Monday.

@Dwisehaupt @Jgreen The permission group has been set up; let me know if this doesn't work as expected.