
[tbs.harbor.tools] Replicate toolsbeta harbor deployment
Closed, ResolvedPublic

Description

Replicate the same deployment we have in toolsbeta but in tools.

This will also require creating the proxy harbor.tools.wmflabs.org.

  1. Create tools-harbor-1.tools.eqiad1.wikimedia.cloud VM in the tools project
  2. Add a 500GB volume for the images (feel free to do some calculations to guess a good size)
  3. Add the hiera value profile::toolforge::harbor::cinder_attached: true to the instance puppet
  4. Create a prefix puppet config in horizon for the prefix tools-harbor- with the puppetclass role::wmcs::toolforge::harbor
  5. Run puppet on the new VM
  6. Run the script /srv/ops/harbor/prepare and run puppet again
  7. Check that harbor is up and running on that VM
  8. Add a proxy from horizon named tools-harbor.wmflabs.org pointing to the new harbor instance
  9. Create a security group in horizon allowing access on port 80 from 172.16.0.0/21 (would be nice to be more specific but I'm not really sure how)
  10. Create a Trove database and add the credentials to /etc/puppet/private in toolsbeta-puppetmaster-04 (profile::toolforge::harbor::db_harbor_pwd)
  11. Generate an admin password for Harbor and add that password to /etc/puppet/private in toolsbeta-puppetmaster-04 as profile::toolforge::harbor::admin_pwd
  12. Duplicate the maintain-harbor deployment for tools (TBD) moved to its own task: T332347
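To illustrate the address math behind step 9, here's a small Python sketch (this is just the CIDR membership check, not how OpenStack security groups actually evaluate rules):

```python
import ipaddress

# The security-group rule from step 9: allow port 80 only from 172.16.0.0/21,
# i.e. 172.16.0.0 through 172.16.7.255.
ALLOWED_NET = ipaddress.ip_network("172.16.0.0/21")

def is_source_allowed(ip: str) -> bool:
    """Return True if the source IP falls inside the allowed /21."""
    return ipaddress.ip_address(ip) in ALLOWED_NET

print(is_source_allowed("172.16.3.10"))  # True: inside the /21
print(is_source_allowed("172.16.8.1"))   # False: first /21 boundary crossed
```

Narrowing it further (as the step suggests) would mean replacing the /21 with the specific subnets or security groups of the clients that need registry access.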


Event Timeline

dcaro renamed this task from [tbs.harbor.tools] Replicate toolsbeta deployment to [tbs.harbor.tools] Replicate toolsbeta harbor deployment. Mar 7 2023, 9:12 AM
fnegri changed the task status from Open to In Progress. Mar 13 2023, 4:44 PM

@dcaro and @fnegri, regarding "Duplicate the maintain-harbor deployment for tools": it's theoretically possible to have the maintain-harbor tool on Toolforge handle the maintenance for both tools.harbor and toolsbeta.harbor; what I'm not sure of is whether that is an acceptable flow. The last time we discussed this we weren't able to make up our minds. Maybe we can conclude here?

@Raymond_Ndibe I think that might work, at least for the beta. @dcaro what do you think?

I created the instance and volume with the following params:

root@cloudcontrol1005:~# OS_PROJECT_ID=tools openstack server create --flavor g3.cores4.ram8.disk20 tools-harbor-1 --image debian-11.0-bullseye --network lan-flat-cloudinstances2b
root@cloudcontrol1005:~# OS_PROJECT_ID=tools openstack volume create --type high-iops --size 500 --description "Data volume for Harbor images" tools-harbor

I added the role::wmcs::toolforge::harbor Puppet role but it's failing with

Error: /Stage[main]/Profile::Toolforge::Harbor/File[/srv/harbor/data/secret/cert/server.crt]: Could not evaluate: Could not retrieve information from environment production source(s) file:///etc/acmecerts/toolforge/live/ec-prime256v1.chained.crt
Notice: /Stage[main]/Profile::Toolforge::Harbor/File[/srv/harbor/data/secret/cert/server.key]: Dependency File[/srv/harbor/data/secret/cert/server.crt] has failures: true
Warning: /Stage[main]/Profile::Toolforge::Harbor/File[/srv/harbor/data/secret/cert/server.key]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Toolforge::Harbor/Exec[reload-nginx-on-tls-update]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Toolforge::Harbor/Exec[ensure-compose-started]: Skipping because of failed dependencies
Error: /Stage[main]/Profile::Toolforge::Harbor/Acme_chief::Cert[toolforge]/File[/etc/acmecerts/toolforge]: Failed to generate additional resources using 'eval_generate': Error 403 on SERVER: access denied
Error: /Stage[main]/Profile::Toolforge::Harbor/Acme_chief::Cert[toolforge]/File[/etc/acmecerts/toolforge]: Could not evaluate: Could not retrieve file metadata for puppet://tools-acme-chief-01.tools.eqiad.wmflabs/acmedata/toolforge: Error 403 on SERVER: access denied

It looks like we have to tell acme to build a cert for it too somehow?

I think I might have advanced it a bit, not sure if it was the right way :/

Since it was getting a 403 from acme-chief, I checked the hiera data of the tools-acme-chief-* prefix, and there I added the harbor FQDN to the allowed list:

...
profile::acme_chief::certificates:
  toolforge:
    CN: toolforge.org
    SNI:
    - toolforge.org
    - '*.toolforge.org'
    - tools.wmflabs.org
    - '*.tools.wmflabs.org'
    authorized_regexes:
    - ^tools-proxy-[0-9]+\.tools\.eqiad\.wmflabs$
    - ^tools-docker-registry-[0-9]+\.tools\.eqiad\.wmflabs$
    - ^tools-docker-registry-[0-9]+\.tools\.eqiad1\.wikimedia\.cloud$
    - ^tools-harbor-[0-9]+\.tools\.eqiad1\.wikimedia\.cloud$    # <- this one
    challenge: dns-01
...
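As a quick sanity check, the new `authorized_regexes` entry can be verified against the hostname of the new VM (the second hostname below is the legacy-style name, shown only to confirm the pattern rejects it):

```python
import re

# The pattern added to the tools-acme-chief-* hiera data above; acme-chief
# uses these regexes to decide which hosts may fetch the cert material.
pattern = r"^tools-harbor-[0-9]+\.tools\.eqiad1\.wikimedia\.cloud$"

print(bool(re.match(pattern, "tools-harbor-1.tools.eqiad1.wikimedia.cloud")))  # True
print(bool(re.match(pattern, "tools-harbor-1.tools.eqiad.wmflabs")))           # False
```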

That allowed it to pull all the certs from acme-chief, but it then fails with:

Error: Could not set 'file' on ensure: No such file or directory - A directory component in /srv/harbor/data/secret/cert/server.crt20230313-16828-1riyhvk.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/profile/manifests/toolforge/harbor.pp, line: 99)
Error: Could not set 'file' on ensure: No such file or directory - A directory component in /srv/harbor/data/secret/cert/server.crt20230313-16828-1riyhvk.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/profile/manifests/toolforge/harbor.pp, line: 99)
Wrapped exception:
No such file or directory - A directory component in /srv/harbor/data/secret/cert/server.crt20230313-16828-1riyhvk.lock does not exist or is a dangling symbolic link
Error: /Stage[main]/Profile::Toolforge::Harbor/File[/srv/harbor/data/secret/cert/server.crt]/ensure: change from 'absent' to 'file' failed: Could not set 'file' on ensure: No such file or directory - A directory component in /srv/harbor/data/secret/cert/server.crt20230313-16828-1riyhvk.lock does not exist or is a dangling symbolic link (file: /etc/puppet/modules/profile/manifests/toolforge/harbor.pp, line: 99)
Notice: /Stage[main]/Profile::Toolforge::Harbor/File[/srv/harbor/data/secret/cert/server.key]: Dependency File[/srv/harbor/data/secret/cert/server.crt] has failures: true
Warning: /Stage[main]/Profile::Toolforge::Harbor/File[/srv/harbor/data/secret/cert/server.key]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Toolforge::Harbor/Exec[reload-nginx-on-tls-update]: Skipping because of failed dependencies
Warning: /Stage[main]/Profile::Toolforge::Harbor/Exec[ensure-compose-started]: Skipping because of failed dependencies

The dir /srv/harbor/data/secret/cert did not exist, so I manually created it (Puppet should probably do that at some point). It then got past that point, and hit errors when trying to bring the compose up (that's expected, as it requires a manual run the first time).
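The underlying failure mode is just "write a file into a directory tree that doesn't exist yet". A minimal sketch of the defensive fix (using a temp dir so it's self-contained; the real fix belongs in the Puppet manifest as `file` resources for the parent dirs):

```python
from pathlib import Path
import tempfile

# Stand-in for /srv: use a throwaway temp directory for the example.
base = Path(tempfile.mkdtemp())
cert = base / "harbor" / "data" / "secret" / "cert" / "server.crt"

# This is what had to be done by hand on the VM: create the parent tree
# before writing the cert file into it.
cert.parent.mkdir(parents=True, exist_ok=True)
cert.write_text("dummy certificate\n")

print(cert.exists())  # True
```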

It does seem that there's a typo somewhere xd

root@tools-harbor-1:/srv# tree
/srv
├── harbor
│   └── data
│       └── secret
│           └── cert
│               ├── server.crt
│               └── server.key
└── ops
    └── harbor
        ├── data
        │   └── secret
        │       └── cert
        ├── harbor.yml
        └── prepare

Change 898773 had a related patch set uploaded (by FNegri; author: FNegri):

[operations/puppet@production] [tbs.harbor] Fix wrong paths for Harbor certs

https://gerrit.wikimedia.org/r/898773

@dcaro thanks for finding the issue! I created a patch that will hopefully fix the missing paths.

it passed that point, and got errors when trying to bring the compose up

It looks like it failed only one time, and the following Puppet runs do not show the error anymore, but the docker containers are not running.

(that's expected, as it requires manually running the first time).

Is that first manual run documented somewhere? I imagine I have to make some changes in the yaml file generated by Puppet?

I checked the diff between the Puppet-generated yaml file and the one currently used in toolsbeta:

5c5
< hostname: dummy.harbor.fqdn
---
> hostname: harbor.toolsbeta.wmflabs.org
34c34
< harbor_admin_password: insecurityrules
---
> harbor_admin_password: xx
137c137
<     host: dummy.db.host
---
>     host: ttg4ncgzifw.svc.trove.eqiad1.wikimedia.cloud
204a205,210
>
> robot_accounts:
>   toolsbeta-image-builder:
>       password: "xx"
>   maintain-harbor:
>       password: "xx"

I have some questions:

  • I thought we were not using Trove because Postgres support was not good?
  • Are the robot_accounts credentials coming from Hiera?

Is that first manual run documented somewhere? I imagine I have to make some changes in the yaml file generated by Puppet?

Not yet, we decided to delay writing admin docs :/, feel free to create some docs for it though!
It's simple though, you just have to run the prepare script in that dir: https://github.com/toolforge/buildservice/blob/main/utils/get_harbor.sh#L21

I thought we were not using Trove because Postgres support was not good?

We are \o/, so far it's good enough, though we might want to set up some kind of backups eventually (also delayed)

Are the robot_accounts credentials coming from Hiera?

Yes, from the secrets repository in the project's puppetmaster. We can add them together if you want / haven't done it yet.

Change 898773 merged by FNegri:

[operations/puppet@production] [tbs.harbor] Fix wrong paths for Harbor certs

https://gerrit.wikimedia.org/r/898773

Change 899724 had a related patch set uploaded (by FNegri; author: FNegri):

[operations/puppet@production] [tbs.harbor] Clean up admin pwd management

https://gerrit.wikimedia.org/r/899724

I've reset the password for the toolsbeta-harbor database, and Harbor is back up and running (I also added it to the secrets in the puppetmaster).

Change 899724 merged by FNegri:

[operations/puppet@production] [tbs.harbor] Clean up admin pwd management

https://gerrit.wikimedia.org/r/899724

Change 900400 had a related patch set uploaded (by FNegri; author: FNegri):

[operations/puppet@production] [tbs.harbor] Remove duplicate pwd

https://gerrit.wikimedia.org/r/900400

I created a Trove DB instance for the new tools-harbor-1 VM, then created a database and a user inside it with:

postgres=# CREATE DATABASE harbor;
CREATE DATABASE
postgres=# CREATE USER harbor WITH ENCRYPTED PASSWORD 'xx';
CREATE ROLE
postgres=# GRANT ALL PRIVILEGES ON DATABASE harbor TO harbor;
GRANT

The password is stored in /etc/puppet/private/ in tools-puppetmaster-02.

fnegri updated the task description.

Up and running at https://tools-harbor.wmcloud.org/ 🎉

Change 900400 merged by FNegri:

[operations/puppet@production] [tbs.harbor] Remove duplicate pwd

https://gerrit.wikimedia.org/r/900400