
[harbor] Move harbor data to object storage service
Closed, Resolved · Public

Description

It'd be nice if Harbor stored data somewhere that wasn't linked to individual hosts.

https://wikitech.wikimedia.org/wiki/Help:Object_storage_user_guide

Possible Steps

Toolsbeta:

  • create necessary resources on horizon toolsbeta project
    • harborstorage object storage container
    • harbordb1 trove database (harbor uses this to store all the other state it needs to keep track of)
    • toolsbeta-harbor-2 compute instance (this is temporary and will be removed once the data migration is completed)
    • web-proxy exposing toolsbeta-harbor-2 to the internet; this should also be removed when toolsbeta-harbor-2 is removed.
  • figure out authentication for the harborstorage object storage
    • current auth method is toolsbeta s3 credentials generated for the toolforge-harbor user
  • setup harbor on the new instance and configure it to use the harborstorage object storage
    • setup harbor
    • configure harbor to use object storage (s3)
    • configure harbor to use the new trove database
    • test push and pull locally using lima-kilo
  • configure original toolsbeta-harbor-1 harbor setup to replicate to the new toolsbeta-harbor-2 through the harbor ui
  • announce toolsbeta harbor downtime (read-only mode)
  • reconfigure toolsbeta-harbor-1 harbor to use the newly populated harborstorage and trove database during the downtime
  • any necessary cleanup (delete toolsbeta-harbor-2, the old trove database, webproxy, etc)
  • unmount and delete toolsbeta-harbor volume
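The two reconfiguration steps above (object storage and Trove database) map to sections of Harbor's harbor.yml. A minimal sketch of what the reconfigured file could look like; the endpoint, bucket, hostname, and credential values here are placeholders, not the real project settings:

```yaml
# Sketch of the relevant harbor.yml sections after the migration.
# All values below are illustrative placeholders.
storage_service:
  s3:
    accesskey: REDACTED            # s3 credentials generated for the toolforge-harbor user
    secretkey: REDACTED
    region: default
    regionendpoint: https://object-storage.example.org   # object storage endpoint (placeholder)
    bucket: harborstorage
    secure: true

external_database:
  harbor:
    host: harbordb1.example.org    # Trove database hostname (placeholder)
    port: 5432
    db_name: registry
    username: harbor
    password: REDACTED
    ssl_mode: disable
```

After editing harbor.yml, Harbor's prepare script has to be re-run and the compose stack restarted for the change to take effect, which is what the "run prepare and docker-compose" step in the Tools section refers to.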

Tools:

  • create necessary resources on horizon tools project
    • harborstorage object storage container
    • tools-harbordb1 trove database (harbor uses this to store all the other state it needs to keep track of)
    • tools-harbor-2 compute instance based on trixie
      • create vm on horizon
      • run cookbook to update puppet certificate
    • web-proxy exposing tools-harbor-2 to the internet. should also be removed when migration is done.
  • figure out authentication for the harborstorage object storage
    • current auth method is tools s3 credentials generated for the toolforge-harbor user
  • setup harbor on tools-harbor-2 and configure it to use the harborstorage object storage
    • disable puppet on tools-harbor-2.
    • manually configure harbor to use the new harborstorage s3 object storage (this will be switched to automatic via puppet during the maintenance window)
    • manually configure harbor to use the new trove database (this will be switched to automatic via puppet during the maintenance window)
    • run prepare and docker-compose
    • test push and pull locally using lima-kilo
  • recreate all users, robot accounts and policies that currently exist in tools-harbor-1 on tools-harbor-2. Unfortunately these cannot be replicated, so we have to do them manually. For robot accounts, use the same name and password. For users, use the same name and email (we will send out an email telling users to change their password on harbor after the upgrade).
  • configure original tools-harbor-1 harbor setup to replicate to the new tools-harbor-2 through the harbor ui.
    • Go to administration -> registries and create a registry entry on tools-harbor-1 to point to tools-harbor-2.
    • Go to administration -> replication and create a replicate-to-tools-harbor-2 replication job on tools-harbor-1 to replicate to tools-harbor-2 (note that harbor replication fails if you try replicating too many projects at once, so you might need several replication rules using patterns similar to tool-{v,w,x,y,z}*/**; see the harbor documentation for more details)
    • trigger the job manually and make sure it works as expected
    • verify that all the projects are replicated to tools-harbor-2
  • add tools-harbor-2 to the maintain-harbor configuration and verify that maintain-harbor performs all expected tasks on the new harbor instance (it's possible that this will require briefly disabling maintain-harbor on tools-harbor-1). This can be done by temporarily disabling puppet on a target k8s-control node, changing the information in the secrets file, running the functional tests, then enabling puppet again.
    • user policies get created in tools-harbor-2
    • user quotas get created in tools-harbor-2
  • announce tools harbor downtime (read-only mode)
  • during maintenance window
    • disable puppet on tools-harbor-1
    • add the s3_config and update the trove database configuration in puppet
    • enable puppet on tools-harbor-2, run puppet agent --test, then run docker-compose down and up to verify that tools-harbor-2 is still up and serving harbor.
    • update the password of the robot account robot$tools-image-builder in tools-harbor-1 to the same one used for robot$tools-image-builder in tools-harbor-2. Harbor changed its password validation logic, so this password must be updated for any resource using the robot account to be able to connect to harbor after the upgrade.
    • make tools-harbor-1 read-only. This can be done through the harbor UI by going to administration -> configuration -> system settings
    • run replication job in the tools-harbor-1 harbor UI once more.
    • change the webproxy pointing to tools-harbor-1 to point to tools-harbor-2
    • test that everything works properly, perhaps by rebuilding lima-kilo, running a build, running a job, and running the functional tests on tools.
    • verify that maintain-harbor still works with the new harbor instance
    • any necessary cleanup
      • unmount the tools-harbor volume (keep it around for about 1 week before deleting)
      • delete tools-harbor-1 (keep it around for about 1 week before deleting)
      • delete the tools-harbordb trove database (keep it around for about 1 week before deleting)
      • delete tools-harbor-2.wmcloud.org webproxy
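Since a single replication rule fails when it matches too many projects at once, the tool-{v,w,x,y,z}*/** style rules above split projects into batches by the first letter after the tool- prefix. A small sketch of how such filter patterns could be generated; the function name and batch size are illustrative, not part of any existing tooling:

```python
def replication_patterns(projects, batch_size=5):
    """Group tool-* project names into Harbor replication filter patterns
    like 'tool-{v,w,x,y,z}*/**', batching first letters so that no single
    replication rule matches too many projects at once."""
    # Collect the distinct first letters appearing after the 'tool-' prefix.
    letters = sorted({p[len("tool-"):][:1] for p in projects if p.startswith("tool-")})
    patterns = []
    for i in range(0, len(letters), batch_size):
        batch = letters[i:i + batch_size]
        patterns.append("tool-{%s}*/**" % ",".join(batch))
    return patterns
```

Each returned pattern can then be pasted into its own replication rule's source resource filter in the harbor UI.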

Additional Steps

  • ensure the credentials for the toolforge-harbor user get moved to pwstore. pwstore currently seems to be down, and just so we don't forget, we should probably keep this task open until the credentials are successfully moved into pwstore.
  • email all users (administration -> users) on harbor asking them to change their harbor password if they want to continue using harbor

Event Timeline

dcaro triaged this task as High priority. (Edited) Jan 24 2024, 2:22 PM
dcaro subscribed.

This might solve T336668: [harbor] Create backups and/or replication - for the images themselves, the DB still would need backups.

dcaro renamed this task from Move harbor data to object storage service to [harbor] Move harbor data to object storage service. Mar 5 2024, 4:10 PM

Change #1093856 had a related patch set uploaded (by Raymond Ndibe; author: Raymond Ndibe):

[operations/puppet@production] profile::manifests::toolforge::harbor: add s3 auth to harbor config

https://gerrit.wikimedia.org/r/1093856

Change #1093856 merged by David Caro:

[operations/puppet@production] profile::manifests::toolforge::harbor: add s3 auth to harbor config

https://gerrit.wikimedia.org/r/1093856

Raymond_Ndibe changed the task status from Open to Stalled. Mar 13 2025, 2:51 PM
Raymond_Ndibe changed the task status from Stalled to In Progress. Jul 23 2025, 3:55 AM

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-18T16:53:26Z] <raymond-ndibe@cloudcumin1001> START - Cookbook wmcs.vps.refresh_puppet_certs on tools-harbor-2.tools.eqiad1.wikimedia.cloud (T350687)

Mentioned in SAL (#wikimedia-cloud-feed) [2025-08-18T16:54:58Z] <raymond-ndibe@cloudcumin1001> END (PASS) - Cookbook wmcs.vps.refresh_puppet_certs (exit_code=0) on tools-harbor-2.tools.eqiad1.wikimedia.cloud (T350687)

Not all projects are currently replicated, on tools-harbor-2:

image.png (232×920 px, 42 KB)

On tools-harbor-1:

image.png (232×920 px, 42 KB)

I think only empty projects are not being replicated.
These are the affected tools:

tool-arturo-test-tool
tool-chlod-staging
tool-dcaro-test10
tool-ldap-beta
tool-n-ninety-five
tool-patsabot
tool-pauliesnug-first-rust-tool
tool-toolforge-cli-test
tool-toolviews

Every single one of them has no artifacts. You can check this yourself by copying and pasting the names into tools-harbor and inspecting them.
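The "empty projects are skipped" observation can also be checked programmatically: Harbor's v2 API lists projects with a repo_count field, so the affected projects are exactly those with zero repositories. A sketch using sample data shaped like the API response (the HTTP call itself is omitted; values are illustrative):

```python
def empty_projects(projects):
    """Given Harbor /api/v2.0/projects response entries (a list of dicts),
    return the names of projects with no repositories, i.e. the ones
    the replication rules will skip."""
    return sorted(p["name"] for p in projects if p.get("repo_count", 0) == 0)

# Sample data shaped like a Harbor project listing (illustrative values).
sample = [
    {"name": "tool-toolviews", "repo_count": 0},
    {"name": "tool-patsabot", "repo_count": 0},
    {"name": "tool-some-busy-tool", "repo_count": 12},
]
```

Running this over the full project listing from tools-harbor-1 would reproduce the list of affected tools above.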

Mentioned in SAL (#wikimedia-cloud) [2025-08-19T14:22:46Z] <Raymond_Ndibe> setting tools-harbor-1 as read-only (T350687)

Mentioned in SAL (#wikimedia-cloud) [2025-08-19T14:37:00Z] <dcaro> flipped the tools-harbor.wmcloud.org endpoint to point to tools-harbor-2 (T350687)

Raymond_Ndibe updated the task description. (Show Details)
Raymond_Ndibe moved this task from In Review to Done on the Toolforge (Toolforge iteration 24) board.