User Details
- User Since
- Mar 6 2020, 9:03 PM (256 w, 5 d)
- Availability
- Available
- IRC Nick
- Raymond_Ndibe
- LDAP User
- Raymond Ndibe
- MediaWiki User
- Raymond Ndibe
Mon, Jan 27
Hello Arturo, welcome back!
Yeah, thanks for reporting this. This is partly my fault: I was working on something on toolsbeta (testing the Harbor upgrade). I just reverted some things, so this should no longer be an issue.
Thu, Jan 23
Yeah, I think there is a task somewhere about moving replica_cnf out of Puppet. Maybe it's time to work on that.
Reminder: Add to changelog on wikitech
Jan 6 2025
update https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Harbor/maintain-harbor when this is done
Dec 8 2024
I investigated this a bit. I think the problem is coming from the replica field: I forgot to account for it in loads, since it was added after the load logic was refactored. This should also have been caught by our functional tests.
@Multichill, please share your jobs.yaml file so I can attempt to reproduce this and see exactly what is happening.
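For context, here is a minimal sketch of the kind of jobs.yaml I have in mind, assuming the usual toolforge jobs load format; the job name, command and image are made up for illustration, and the replicas key is the field I suspect loads is not handling:

    - name: example-continuous-job      # placeholder job name
      command: ./run.sh                 # placeholder command
      image: python3.11
      continuous: true
      replicas: 2                       # the recently added field I think loads is missing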
Nov 27 2024
I looked into this a bit. In my opinion, there are 5 ways we know of that projects can be deleted (technically it's just 3; the other 2 are abstractions over those 3). Edit this list to add more if I missed anything.
Nov 7 2024
Before:
root@cloudcontrol1005:/home/raymond-ndibe# sudo radosgw-admin user info --uid tools\$tools
{
    "user_id": "tools$tools",
    "display_name": "tools",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 },
    "user_quota": { "enabled": true, "check_on_raw": false, "max_size": 8589934592, "max_size_kb": 8388608, "max_objects": 4096 },
    "temp_url_keys": [],
    "type": "keystone",
    "mfa_ids": []
}
radosgw-admin quota set --quota-scope=user --uid=tools\$tools --max-size=50G --max-objects=51107
Before:
root@cloudcontrol1005:/home/raymond-ndibe# sudo radosgw-admin user info --uid toolsbeta\$toolsbeta
{
    "user_id": "toolsbeta$toolsbeta",
    "display_name": "toolsbeta",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 },
    "user_quota": { "enabled": true, "check_on_raw": false, "max_size": 8589934592, "max_size_kb": 8388608, "max_objects": 4096 },
    "temp_url_keys": [],
    "type": "keystone",
    "mfa_ids": []
}
radosgw-admin quota set --quota-scope=user --uid=toolsbeta\$toolsbeta --max-size=50G --max-objects=51107
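To confirm the new quota actually took effect, the same user info command from above can simply be re-run afterwards and the user_quota block compared against the values passed to quota set:

    sudo radosgw-admin user info --uid toolsbeta\$toolsbeta
    # user_quota.max_size and user_quota.max_objects should now reflect the 50G / 51107 values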
Nov 6 2024
According to https://docs.openstack.org/keystone/zed/admin/credential-encryption.html, the configuration for this is the following (which also happens to be the default):
[credential]
provider = fernet
key_repository = /etc/keystone/credential-keys/
This was not explicitly configured in /etc/keystone/keystone.conf, /etc/keystone/domains/keystone.toolsbeta.conf, or /etc/keystone/domains/keystone.default.conf. The comment at https://gerrit.wikimedia.org/g/operations/puppet/+/87fad547f8948a4fca6d2c2b90fb13fcaa2d3b1e/modules/profile/manifests/openstack/base/keystone/fernet_keys.pp#87 leads me to believe that this was unimportant in the past but became the default in a newer Keystone version that we upgraded to without noticing the change in the changelog (this is a guess, but it seems the likely explanation).
@dcaro I think I figured out where this problem is coming from. There are two files, /etc/keystone/credential-keys/0 and /etc/keystone/credential-keys/1, with user:group=keystone:keystone and mode=600 on cloudcontrol1005 that don't exist on either cloudcontrol1006 or cloudcontrol1007.
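For whoever picks this up, a rough sketch (untested, just to illustrate what I mean) of comparing the key repositories across the three hosts. If the fix is to get keys onto 1006 and 1007, copying the existing keys from cloudcontrol1005 is probably what we want rather than generating fresh ones with keystone-manage credential_setup, since credentials already encrypted with the 1005 keys wouldn't decrypt under newly generated keys:

    # compare the credential key repositories on the three cloudcontrols
    for host in cloudcontrol1005 cloudcontrol1006 cloudcontrol1007; do
        ssh "$host".eqiad.wmnet 'sudo ls -l /etc/keystone/credential-keys/'
    done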
It might also be worth looking at the SQL driver. I am not sure how that part works, but if we are reading things from SQL, it might be worth looking for a create-read race condition.
To reproduce this using an S3 bucket:
- ssh into cloudcontrol1005, cloudcontrol1006 and cloudcontrol1007 (e.g. ssh cloudcontrol1006.eqiad.wmnet)
- on each cloudcontrol, run journalctl -u keystone -f
- ssh into toolsbeta-harbor-2.toolsbeta.eqiad1.wikimedia.cloud
- become raymond-ndibe (sudo su, then su raymond-ndibe) and run s3cmd info s3://harborstorage-2
- observe the 500 UnknownError that s3cmd returns, and watch the logs on all the cloudcontrols
Nov 5 2024
The priority of this should be high: it basically makes buckets unusable right now. I already tried creating and experimenting with two buckets, and I can't even push objects as small as 15 MB to either of them.
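For the record, the kind of test I was doing was roughly the following; the file name is arbitrary and the bucket placeholder stands for one of the two test buckets I created:

    # generate a ~15 MB file of random data and try to upload it
    dd if=/dev/urandom of=test-15mb.bin bs=1M count=15
    s3cmd put test-15mb.bin s3://<test-bucket>/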
Oct 25 2024
Since the Redis cache here doesn't need to be persisted, does anyone see a problem with running it either as a separate pod in the same namespace or as a sidecar container in the same pod?
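To make the sidecar option concrete, here is a rough sketch of what it could look like, assuming the plain upstream redis image with persistence turned off; the pod name and the app container are placeholders, not our actual manifests:

    apiVersion: v1
    kind: Pod
    metadata:
      name: example-app
    spec:
      containers:
        - name: app
          image: example-app:latest                      # placeholder for the actual application container
        - name: redis-cache
          image: redis:7-alpine                          # cache only, no volume mounted
          args: ["--save", "", "--appendonly", "no"]     # disable RDB snapshots and AOF persistence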