
Raymond_Ndibe (Ray)
Software Engineer

Projects (7)

User Details

User Since
Mar 6 2020, 9:03 PM (256 w, 5 d)
Availability
Available
IRC Nick
Raymond_Ndibe
LDAP User
Raymond Ndibe
MediaWiki User
Raymond Ndibe

Recent Activity

Mon, Jan 27

Raymond_Ndibe closed T384843: [components-api] skip functional tests for tools as Resolved.
Mon, Jan 27, 3:58 PM · Toolforge (Toolforge iteration 17)
Raymond_Ndibe claimed T384843: [components-api] skip functional tests for tools.
Mon, Jan 27, 3:45 PM · Toolforge (Toolforge iteration 17)
Raymond_Ndibe created T384843: [components-api] skip functional tests for tools.
Mon, Jan 27, 3:45 PM · Toolforge (Toolforge iteration 17)
Raymond_Ndibe added a comment to T384809: toolsbeta: maintain-kubeusers not running because ImagePullBackOff.

Hello Arturo, welcome back!
Yeah, thanks for reporting this. This is partly my fault. I was working on something on toolsbeta (testing the harbor upgrade). I just reverted some things, so this should no longer be an issue.

Mon, Jan 27, 10:07 AM · User-aborrero, Toolforge (Toolforge iteration 17), cloud-services-team

Fri, Jan 24

Raymond_Ndibe created T384720: [infra, harbor] use latest thirdparty/docker in harbor hosts.
Fri, Jan 24, 3:56 PM · Patch-For-Review, Toolforge (Toolforge iteration 17)
Raymond_Ndibe updated the task description for T384327: [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2.
Fri, Jan 24, 2:33 PM · Patch-For-Review, Toolforge (Toolforge iteration 17)
Raymond_Ndibe updated the task description for T384327: [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2.
Fri, Jan 24, 2:31 PM · Patch-For-Review, Toolforge (Toolforge iteration 17)
Raymond_Ndibe moved T362867: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28 from In Review to In Progress on the Toolforge (Toolforge iteration 17) board.
Fri, Jan 24, 1:37 PM · cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review
Raymond_Ndibe changed the status of T370245: [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones) from Open to In Progress.
Fri, Jan 24, 1:37 PM · Patch-For-Review, Toolforge (Toolforge iteration 17), User-Raymond_Ndibe, cloud-services-team
Raymond_Ndibe changed the status of T370245: [infra,k8s] remove deprecated kubelet flags before 1.28 upgrade (we might be able to remove all custom ones), a subtask of T362867: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28, from Open to In Progress.
Fri, Jan 24, 1:37 PM · cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review
Raymond_Ndibe changed the status of T384327: [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2 from Open to In Progress.
Fri, Jan 24, 1:37 PM · Patch-For-Review, Toolforge (Toolforge iteration 17)
Raymond_Ndibe changed the status of T384327: [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2, a subtask of T352417: [maintain-harbor] Manage project quotas via maintain-harbor, from Open to In Progress.
Fri, Jan 24, 1:37 PM · Toolforge (Toolforge iteration 17), Upstream, Patch-For-Review
Raymond_Ndibe renamed T384327: [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2 from [infra,harbor] upgrade to v2.10.1 to [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2.
Fri, Jan 24, 1:36 PM · Patch-For-Review, Toolforge (Toolforge iteration 17)
Raymond_Ndibe moved T362867: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28 from In Progress to In Review on the Toolforge (Toolforge iteration 17) board.
Fri, Jan 24, 1:36 PM · cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review
Raymond_Ndibe closed T361120: [jobs-cli,jobs-api] quota shows different units for limit and usage as Resolved.
Fri, Jan 24, 1:35 PM · Toolforge (Toolforge iteration 17), Patch-For-Review

Thu, Jan 23

Raymond_Ndibe renamed T384327: [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2 from [infra,harbor] upgrade to latest to [infra,harbor] upgrade to v2.10.1.
Thu, Jan 23, 7:34 PM · Patch-For-Review, Toolforge (Toolforge iteration 17)
Raymond_Ndibe changed the status of T374193: [k8s, infra] update pause image to 3.9 from Open to In Progress.
Thu, Jan 23, 7:08 PM · Patch-For-Review, Toolforge (Toolforge iteration 17), cloud-services-team
Raymond_Ndibe claimed T374193: [k8s, infra] update pause image to 3.9.
Thu, Jan 23, 7:05 PM · Patch-For-Review, Toolforge (Toolforge iteration 17), cloud-services-team
Raymond_Ndibe renamed T374193: [k8s, infra] update pause image to 3.9 from [k8s, infra] update pause image to 3.6 to [k8s, infra] update pause image to 3.9.
Thu, Jan 23, 7:05 PM · Patch-For-Review, Toolforge (Toolforge iteration 17), cloud-services-team
Raymond_Ndibe added a project to T369800: [replica_cnf,functional-tests] Run replica_cnf functional tests in lima-kilo with the rest of functional tests: User-Raymond_Ndibe.
Thu, Jan 23, 6:52 PM · User-Raymond_Ndibe, cloud-services-team, Toolforge
Restricted Application added a project to T369800: [replica_cnf,functional-tests] Run replica_cnf functional tests in lima-kilo with the rest of functional tests: cloud-services-team.

Yeah, I think there is a task somewhere about moving replica_cnf out of puppet. Maybe it's time to work on that.

Thu, Jan 23, 6:52 PM · User-Raymond_Ndibe, cloud-services-team, Toolforge
Raymond_Ndibe moved T374193: [k8s, infra] update pause image to 3.9 from Ready to be worked on to Toolforge iteration 17 on the Toolforge board.
Thu, Jan 23, 6:34 PM · Patch-For-Review, Toolforge (Toolforge iteration 17), cloud-services-team
Raymond_Ndibe claimed T384327: [infra,harbor] upgrade harbor v2.10.1 ---> v2.12.2.
Thu, Jan 23, 6:30 PM · Patch-For-Review, Toolforge (Toolforge iteration 17)
Raymond_Ndibe added a comment to T362621: Support HTTP health checks in jobs framework.

Reminder: Add to changelog on wikitech

Thu, Jan 23, 5:44 PM · Patch-For-Review, Toolforge (Toolforge iteration 17)
Raymond_Ndibe closed T317953: add on-wiki edits of toolforge tools to toolstats report as Resolved.
Thu, Jan 23, 1:21 PM · Toolforge (Toolforge iteration 17), cloud-services-team, User-Raymond_Ndibe
Raymond_Ndibe moved T317953: add on-wiki edits of toolforge tools to toolstats report from Ready to be worked on to Toolforge iteration 17 on the Toolforge board.
Thu, Jan 23, 1:21 PM · Toolforge (Toolforge iteration 17), cloud-services-team, User-Raymond_Ndibe

Wed, Jan 22

Raymond_Ndibe renamed T317953: add on-wiki edits of toolforge tools to toolstats report from add on-wiki edits of toolforge tools to toolviews report to add on-wiki edits of toolforge tools to toolstats report.
Wed, Jan 22, 10:42 AM · Toolforge (Toolforge iteration 17), cloud-services-team, User-Raymond_Ndibe
Raymond_Ndibe renamed T383081: Persist important toolforge k8s components logs from Persist maintain-harbor logs to Persist important toolforge k8s components logs.
Wed, Jan 22, 8:00 AM · Toolforge (Toolforge iteration 17), Patch-For-Review

Thu, Jan 16

Raymond_Ndibe added a comment to T317953: add on-wiki edits of toolforge tools to toolstats report.

Not necessarily. I think that is a decision we haven't yet made. I personally don't mind us taking it over. The purpose of this PR though is to get this task done and closed.

I asked because adding new features to store data that has nothing to do with page views and refactoring all of the code in "my" tool feels like a hostile takeover. Maybe y'all talked this through in 2022 without me and decided it would be fine? I'm just honestly trying to figure out if I am on the hook for fixing the tool going forward or not.

Thu, Jan 16, 4:57 PM · Toolforge (Toolforge iteration 17), cloud-services-team, User-Raymond_Ndibe
Raymond_Ndibe added a comment to T317953: add on-wiki edits of toolforge tools to toolstats report.

I was actively working on this just an hour ago, @bd808. Thanks for bringing it to my attention. I think it failed once again when it wasn't supposed to. I've fetched the latest changes on the bastion with git and enabled it again. It should no longer fail.

Can I assume this task and related MRs are a formal notice that the WMCS team owns the tool and does not need my help in maintaining it?

Thu, Jan 16, 4:35 PM · Toolforge (Toolforge iteration 17), cloud-services-team, User-Raymond_Ndibe
Raymond_Ndibe changed the status of T362621: Support HTTP health checks in jobs framework, a subtask of T348755: [jobs-api,webservice] Run webservices via the jobs framework, from Open to In Progress.
Thu, Jan 16, 4:16 PM · cloud-services-team, User-Raymond_Ndibe, Toolforge, Epic
Raymond_Ndibe changed the status of T362621: Support HTTP health checks in jobs framework from Open to In Progress.
Thu, Jan 16, 4:16 PM · Patch-For-Review, Toolforge (Toolforge iteration 17)
Raymond_Ndibe changed the status of T377420: [jobs-api,jobs-cli] Introduce a way to stop stuck cronjobs from Open to In Progress.
Thu, Jan 16, 4:16 PM · Patch-For-Review, Toolforge (Toolforge iteration 17), User-Raymond_Ndibe, User-aborrero, cloud-services-team
Raymond_Ndibe added a comment to T383081: Persist important toolforge k8s components logs.

Did you consider using successfulJobsHistoryLimit and failedJobsHistoryLimit to persist pod objects and the logs they include for some amount of time?

Thu, Jan 16, 4:15 PM · Toolforge (Toolforge iteration 17), Patch-For-Review
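
For context on the question above: successfulJobsHistoryLimit and failedJobsHistoryLimit are fields of the Kubernetes CronJob spec that control how many finished Jobs (and their pods, with logs) are kept around. Below is a minimal sketch of where they sit, written as a Python dict. The job name, schedule, image, and limit values are hypothetical placeholders, not taken from any Toolforge manifest.

```python
import json

# Hypothetical CronJob manifest. Only the two *JobsHistoryLimit field names
# come from the comment above; every other value is a placeholder.
cronjob = {
    "apiVersion": "batch/v1",
    "kind": "CronJob",
    "metadata": {"name": "example-component"},
    "spec": {
        "schedule": "*/5 * * * *",
        # Keep the last N finished Jobs so their pods and logs remain
        # available for a while (Kubernetes defaults are 3 and 1).
        "successfulJobsHistoryLimit": 5,
        "failedJobsHistoryLimit": 5,
        "jobTemplate": {
            "spec": {
                "template": {
                    "spec": {
                        "restartPolicy": "Never",
                        "containers": [
                            {"name": "main", "image": "example.invalid/image:latest"}
                        ],
                    }
                }
            }
        },
    },
}

print(json.dumps(cronjob, indent=2))
```

This only retains logs for as long as the Job objects exist, so it is a short-term retention knob rather than real log persistence.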

Wed, Jan 15

Raymond_Ndibe added a comment to T317953: add on-wiki edits of toolforge tools to toolstats report.

A Toolforge job for this task was enabled which seemed to do nothing except crash and spam my mailbox. @Andrew disabled the job.

Wed, Jan 15, 12:09 AM · Toolforge (Toolforge iteration 17), cloud-services-team, User-Raymond_Ndibe

Wed, Jan 8

Raymond_Ndibe changed the status of T383081: Persist important toolforge k8s components logs from Open to In Progress.
Wed, Jan 8, 4:32 PM · Toolforge (Toolforge iteration 17), Patch-For-Review

Jan 6 2025

Raymond_Ndibe reopened T361120: [jobs-cli,jobs-api] quota shows different units for limit and usage as "In Progress".
Jan 6 2025, 8:03 PM · Toolforge (Toolforge iteration 17), Patch-For-Review
Raymond_Ndibe closed T361120: [jobs-cli,jobs-api] quota shows different units for limit and usage as Resolved.
Jan 6 2025, 8:03 PM · Toolforge (Toolforge iteration 17), Patch-For-Review
Raymond_Ndibe closed T381650: Update maintain-harbor documentation after move from a tool to a component as Resolved.
Jan 6 2025, 8:02 PM · Toolforge (Toolforge iteration 16), cloud-services-team, Documentation
Raymond_Ndibe edited projects for T381650: Update maintain-harbor documentation after move from a tool to a component, added: Toolforge (Toolforge iteration 16); removed Toolforge.
Jan 6 2025, 8:02 PM · Toolforge (Toolforge iteration 16), cloud-services-team, Documentation
Raymond_Ndibe added a comment to T383081: Persist important toolforge k8s components logs.

update https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Harbor/maintain-harbor when this is done

Jan 6 2025, 7:56 PM · Toolforge (Toolforge iteration 17), Patch-For-Review
Raymond_Ndibe created T383081: Persist important toolforge k8s components logs.
Jan 6 2025, 7:28 PM · Toolforge (Toolforge iteration 17), Patch-For-Review
Raymond_Ndibe claimed T381650: Update maintain-harbor documentation after move from a tool to a component.
Jan 6 2025, 2:51 PM · Toolforge (Toolforge iteration 16), cloud-services-team, Documentation

Dec 8 2024

Raymond_Ndibe added a comment to T364204: toolforge jobs load flushes out all jobs.

I investigated this a bit. I think the problem is coming from the replica field. For some reason I forgot to account for that in load, since it was added after the load logic was refactored. Also, this should have been caught by our functional tests.

Dec 8 2024, 11:21 AM · cloud-services-team, Wikimedia-Hackathon-2024, Toolforge
Raymond_Ndibe added a comment to T364204: toolforge jobs load flushes out all jobs.

@Multichill, please share your jobs.yaml file so I can try to reproduce this and see exactly what is happening.

Dec 8 2024, 8:48 AM · cloud-services-team, Wikimedia-Hackathon-2024, Toolforge

Dec 6 2024

Raymond_Ndibe closed T358225: [maintain-harbor] Move to become a toolforge component as Resolved.
Dec 6 2024, 10:39 AM · Toolforge (Toolforge iteration 16), cloud-services-team, Patch-For-Review, User-Raymond_Ndibe

Nov 27 2024

Raymond_Ndibe added a comment to T380833: [harbor] some artifacts and projects seems to have gone missing.

I looked into this a bit. In my opinion, there are 5 ways we know projects can be deleted [technically it's just 3, the others are just abstractions over the 3] (edit this list to add more if I missed anything)

Nov 27 2024, 9:47 PM · User-aborrero, User-Raymond_Ndibe, cloud-services-team, Toolforge

Nov 26 2024

Raymond_Ndibe added a project to T380833: [harbor] some artifacts and projects seems to have gone missing: User-Raymond_Ndibe.
Nov 26 2024, 2:51 PM · User-aborrero, User-Raymond_Ndibe, cloud-services-team, Toolforge
Raymond_Ndibe closed T378180: [lima-kilo] support caching of container images using a cache disk as Resolved.
Nov 26 2024, 1:55 PM · Patch-For-Review, Toolforge (Toolforge iteration 16)
Raymond_Ndibe closed T375163: lima-kilo installation giving inconsistent result. Sometimes it works, sometimes it doesn't as Resolved.
Nov 26 2024, 1:53 PM · Toolforge (Toolforge iteration 16)
Raymond_Ndibe closed T374585: [lima-kilo] allow for the creation of a multi-node high availability cluster as Resolved.
Nov 26 2024, 1:52 PM · Toolforge (Toolforge iteration 16), User-Raymond_Ndibe
Raymond_Ndibe closed T377854: [harbor] Do not clean up images currently running in production as Resolved.
Nov 26 2024, 1:50 PM · Toolforge (Toolforge iteration 16), User-aborrero, cloud-services-team

Nov 25 2024

Raymond_Ndibe claimed T317953: add on-wiki edits of toolforge tools to toolstats report.
Nov 25 2024, 1:35 PM · Toolforge (Toolforge iteration 17), cloud-services-team, User-Raymond_Ndibe

Nov 20 2024

Raymond_Ndibe updated the task description for T362867: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28.
Nov 20 2024, 12:24 AM · cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review

Nov 12 2024

Raymond_Ndibe created T379633: unable to log-in to toolsbeta-harbor-1.
Nov 12 2024, 2:41 PM · Toolforge (Toolforge iteration 16), cloud-services-team

Nov 10 2024

Raymond_Ndibe updated the task description for T362867: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28.
Nov 10 2024, 4:23 AM · cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review
Raymond_Ndibe updated the task description for T362867: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28.
Nov 10 2024, 4:21 AM · cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review

Nov 7 2024

Raymond_Ndibe closed T379270: Increase Object Storage quota for Toolsbeta project as Resolved.
Nov 7 2024, 8:54 PM · Cloud-VPS (Quota-requests)
Raymond_Ndibe closed T379271: Increase Object Storage quota for Tools project as Resolved.
Nov 7 2024, 8:54 PM · Cloud-VPS (Quota-requests)
Raymond_Ndibe added a comment to T379271: Increase Object Storage quota for Tools project.

Before:

root@cloudcontrol1005:/home/raymond-ndibe# sudo radosgw-admin user info --uid tools\$tools
{
    "user_id": "tools$tools",
    "display_name": "tools",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": true,
        "check_on_raw": false,
        "max_size": 8589934592,
        "max_size_kb": 8388608,
        "max_objects": 4096
    },
    "temp_url_keys": [],
    "type": "keystone",
    "mfa_ids": []
}
Nov 7 2024, 8:54 PM · Cloud-VPS (Quota-requests)
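
To read the "Before" values in the output above more easily, here is a small sketch that converts the user_quota fields into human-friendly units. It assumes max_size is in bytes and max_size_kb is in KiB, which is consistent with the two numbers shown. The JSON is a trimmed copy of the output, not a fresh query.

```python
import json

# Trimmed copy of the "user_quota" object from the output above.
user_info = json.loads("""
{
  "user_quota": {
    "enabled": true,
    "max_size": 8589934592,
    "max_size_kb": 8388608,
    "max_objects": 4096
  }
}
""")

quota = user_info["user_quota"]
GIB = 1024 ** 3

# Assumption: max_size is bytes and max_size_kb is KiB, so both fields
# should describe the same limit.
assert quota["max_size"] == quota["max_size_kb"] * 1024

size_gib = quota["max_size"] / GIB
print(f"size limit: {size_gib:g} GiB, object limit: {quota['max_objects']}")
```

So the quota before the change was 8 GiB and 4096 objects per user.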
Raymond_Ndibe added a comment to T379271: Increase Object Storage quota for Tools project.
radosgw-admin quota set --quota-scope=user --uid=tools\$tools --max-size=50G --max-objects=51107
Nov 7 2024, 8:51 PM · Cloud-VPS (Quota-requests)
Raymond_Ndibe added a comment to T379270: Increase Object Storage quota for Toolsbeta project.

Before:

root@cloudcontrol1005:/home/raymond-ndibe# sudo radosgw-admin user info --uid toolsbeta\$toolsbeta
{
    "user_id": "toolsbeta$toolsbeta",
    "display_name": "toolsbeta",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": true,
        "check_on_raw": false,
        "max_size": 8589934592,
        "max_size_kb": 8388608,
        "max_objects": 4096
    },
    "temp_url_keys": [],
    "type": "keystone",
    "mfa_ids": []
}
Nov 7 2024, 8:50 PM · Cloud-VPS (Quota-requests)
Raymond_Ndibe added a comment to T379270: Increase Object Storage quota for Toolsbeta project.
radosgw-admin quota set --quota-scope=user --uid=toolsbeta\$toolsbeta --max-size=50G --max-objects=51107
Nov 7 2024, 8:49 PM · Cloud-VPS (Quota-requests)
Raymond_Ndibe updated the task description for T379270: Increase Object Storage quota for Toolsbeta project.
Nov 7 2024, 8:44 PM · Cloud-VPS (Quota-requests)
Raymond_Ndibe updated the task description for T379271: Increase Object Storage quota for Tools project.
Nov 7 2024, 8:43 PM · Cloud-VPS (Quota-requests)
Raymond_Ndibe updated the task description for T379270: Increase Object Storage quota for Toolsbeta project.
Nov 7 2024, 4:55 PM · Cloud-VPS (Quota-requests)
Raymond_Ndibe updated the task description for T379271: Increase Object Storage quota for Tools project.
Nov 7 2024, 4:54 PM · Cloud-VPS (Quota-requests)
Raymond_Ndibe created T379271: Increase Object Storage quota for Tools project.
Nov 7 2024, 3:58 PM · Cloud-VPS (Quota-requests)
Raymond_Ndibe created T379270: Increase Object Storage quota for Toolsbeta project.
Nov 7 2024, 3:56 PM · Cloud-VPS (Quota-requests)
Raymond_Ndibe closed T360626: Frequent radosgw 500 errors with Object Storage as Resolved.
Nov 7 2024, 3:30 PM · cloud-services-team, Cloud-VPS

Nov 6 2024

Raymond_Ndibe added a comment to T360626: Frequent radosgw 500 errors with Object Storage.

This might be related to these errors on logstash: https://logstash.wikimedia.org/goto/c7fa935688ccd6ccda0e11b420b747d1

[None req-ec867736-f8b3-40cd-aeef-fb2ee2daf81c swift service - - default default] Credential could not be decrypted. Please contact the administrator: cryptography.fernet.InvalidToken

This happens only on cloudcontrol1006/7 (requests that go to 1005 work). We might want to try this: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Procedures_and_operations#Rotating_or_revoking_keystone_fernet_tokens

Nov 6 2024, 8:36 PM · cloud-services-team, Cloud-VPS
Raymond_Ndibe claimed T360626: Frequent radosgw 500 errors with Object Storage.
Nov 6 2024, 8:34 PM · cloud-services-team, Cloud-VPS
Raymond_Ndibe added a comment to T360626: Frequent radosgw 500 errors with Object Storage.

According to https://docs.openstack.org/keystone/zed/admin/credential-encryption.html, the configuration for this (which also happens to be the default) is:

[credential]
provider = fernet
key_repository = /etc/keystone/credential-keys/

This was not explicitly configured in /etc/keystone/keystone.conf, /etc/keystone/domains/keystone.toolsbeta.conf, or /etc/keystone/domains/keystone.default.conf. The comment at https://gerrit.wikimedia.org/g/operations/puppet/+/87fad547f8948a4fca6d2c2b90fb13fcaa2d3b1e/modules/profile/manifests/openstack/base/keystone/fernet_keys.pp#87 leads me to believe that this was unimportant in the past but became the default in a newer keystone version, which we may have upgraded to without noticing the change in the changelog (this is a guess, but it is likely the reason).

Nov 6 2024, 7:03 PM · cloud-services-team, Cloud-VPS
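
As an illustration of the point above (options that are not set explicitly fall back to the defaults), here is a rough sketch using Python's configparser. Keystone itself reads its configuration through oslo.config, so this is only an approximation. The function name and the sample config text are hypothetical, and the default values are the ones quoted in the comment.

```python
import configparser

# Defaults as quoted in the comment above.
DEFAULTS = {
    "provider": "fernet",
    "key_repository": "/etc/keystone/credential-keys/",
}


def effective_credential_settings(conf_text: str) -> dict:
    """Return the [credential] settings, using the defaults for unset options."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {
        key: parser.get("credential", key, fallback=default)
        for key, default in DEFAULTS.items()
    }


# A config with no [credential] section at all, as described above.
settings = effective_credential_settings("[DEFAULT]\ndebug = false\n")
print(settings)
```

With no [credential] section present, the effective key repository is still /etc/keystone/credential-keys/, so that directory matters even though nothing in the config files mentions it.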
Raymond_Ndibe added a comment to T360626: Frequent radosgw 500 errors with Object Storage.

@dcaro I think I figured out where this problem comes from. There are two files, /etc/keystone/credential-keys/0 and /etc/keystone/credential-keys/1, with user:group=keystone:keystone and mode=600 on cloudcontrol1005 that don't exist on either cloudcontrol1006 or 1007.

Nov 6 2024, 6:44 PM · cloud-services-team, Cloud-VPS
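
The comparison described above could be sketched as a small helper that lists the files in a key repository together with their permission bits. Running it on each cloudcontrol host and comparing the output would show which hosts are missing keys. The function name is hypothetical, and the demo uses a temporary directory as a stand-in for /etc/keystone/credential-keys/.

```python
import os
import stat
import tempfile


def describe_key_repository(path: str) -> dict:
    """Map each file name in `path` to its permission bits as an octal string."""
    if not os.path.isdir(path):
        return {}  # repository missing entirely
    result = {}
    for name in sorted(os.listdir(path)):
        mode = stat.S_IMODE(os.stat(os.path.join(path, name)).st_mode)
        result[name] = oct(mode)
    return result


# Demo against a temporary directory standing in for the key repository.
with tempfile.TemporaryDirectory() as repo:
    for name in ("0", "1"):
        key_path = os.path.join(repo, name)
        with open(key_path, "w") as fh:
            fh.write("placeholder")
        os.chmod(key_path, 0o600)
    demo = describe_key_repository(repo)
    print(demo)
```

An empty result on a host would match what was observed on cloudcontrol1006 and 1007.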
Raymond_Ndibe added a comment to T360626: Frequent radosgw 500 errors with Object Storage.

It might also be worth looking at the sql driver. I am not sure how that part works, but if we are reading things from SQL, it might be worth looking for a create-read race condition.

Nov 6 2024, 4:14 AM · cloud-services-team, Cloud-VPS
Raymond_Ndibe added a comment to T360626: Frequent radosgw 500 errors with Object Storage.

This might be related to these errors on logstash: https://logstash.wikimedia.org/goto/c7fa935688ccd6ccda0e11b420b747d1

[None req-ec867736-f8b3-40cd-aeef-fb2ee2daf81c swift service - - default default] Credential could not be decrypted. Please contact the administrator: cryptography.fernet.InvalidToken

This happens only on cloudcontrol1006/7 (requests that go to 1005 work). We might want to try this: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Procedures_and_operations#Rotating_or_revoking_keystone_fernet_tokens

Nov 6 2024, 4:11 AM · cloud-services-team, Cloud-VPS
Raymond_Ndibe added a comment to T360626: Frequent radosgw 500 errors with Object Storage.

To reproduce this using an S3 bucket:

  • ssh into cloudcontrol1005, cloudcontrol1006, and cloudcontrol1007 (e.g. ssh cloudcontrol1006.eqiad.wmnet)
  • on each cloudcontrol, run journalctl -u keystone -f
  • ssh into toolsbeta-harbor-2.toolsbeta.eqiad1.wikimedia.cloud
  • become raymond-ndibe (sudo su, su raymond-ndibe) and run s3cmd info s3://harborstorage-2
  • observe the 500 UnknownError that s3cmd returns, and observe the logs on all cloudcontrols
Nov 6 2024, 4:07 AM · cloud-services-team, Cloud-VPS
Raymond_Ndibe added a comment to T360626: Frequent radosgw 500 errors with Object Storage.

@Raymond_Ndibe can you paste one or more example commands that are failing?

It's interesting that for OpenTofu it seems to only fail some times but not always.

We can probably rename the task to "Frequent radosgw 500 errors with Object Storage".

Nov 6 2024, 3:58 AM · cloud-services-team, Cloud-VPS
Raymond_Ndibe added a comment to T360626: Frequent radosgw 500 errors with Object Storage.

To reproduce this using an S3 bucket:

  • ssh into cloudcontrol1005, cloudcontrol1006, and cloudcontrol1007 (e.g. ssh cloudcontrol1006.eqiad.wmnet)
  • on each cloudcontrol, run journalctl -u keystone -f
  • ssh into toolsbeta-harbor-2.toolsbeta.eqiad1.wikimedia.cloud
  • become raymond-ndibe (sudo su, su raymond-ndibe) and run s3cmd info s3://harborstorage-2
  • observe the 500 UnknownError that s3cmd returns, and observe the logs on all cloudcontrols
Nov 6 2024, 3:56 AM · cloud-services-team, Cloud-VPS

Nov 5 2024

Restricted Application added a project to T360626: Frequent radosgw 500 errors with Object Storage: cloud-services-team.

The priority of this should be high. This basically makes buckets unusable right now. I already tried creating and experimenting with two buckets, and I can't push objects as small as 15 MB to either of them.

Nov 5 2024, 11:23 AM · cloud-services-team, Cloud-VPS

Oct 26 2024

Raymond_Ndibe changed the status of T377854: [harbor] Do not clean up images currently running in production from Open to In Progress.
Oct 26 2024, 1:39 AM · Toolforge (Toolforge iteration 16), User-aborrero, cloud-services-team

Oct 25 2024

Raymond_Ndibe changed the status of T358225: [maintain-harbor] Move to become a toolforge component from Open to In Progress.
Oct 25 2024, 12:04 PM · Toolforge (Toolforge iteration 16), cloud-services-team, Patch-For-Review, User-Raymond_Ndibe
Raymond_Ndibe changed the status of T362867: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28 from Open to In Progress.
Oct 25 2024, 12:02 PM · cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review
Raymond_Ndibe claimed T377420: [jobs-api,jobs-cli] Introduce a way to stop stuck cronjobs.
Oct 25 2024, 12:00 PM · Patch-For-Review, Toolforge (Toolforge iteration 17), User-Raymond_Ndibe, User-aborrero, cloud-services-team
Raymond_Ndibe claimed T377854: [harbor] Do not clean up images currently running in production.
Oct 25 2024, 12:00 PM · Toolforge (Toolforge iteration 16), User-aborrero, cloud-services-team
Raymond_Ndibe changed the status of T362867: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28, a subtask of T362868: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.29, from Open to In Progress.
Oct 25 2024, 12:00 PM · cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17)
Raymond_Ndibe edited projects for T358225: [maintain-harbor] Move to become a toolforge component, added: Toolforge (Toolforge iteration 16); removed Toolforge.
Oct 25 2024, 11:59 AM · Toolforge (Toolforge iteration 16), cloud-services-team, Patch-For-Review, User-Raymond_Ndibe
Raymond_Ndibe changed the status of T378180: [lima-kilo] support caching of container images using a cache disk from Open to In Progress.
Oct 25 2024, 11:59 AM · Patch-For-Review, Toolforge (Toolforge iteration 16)
Raymond_Ndibe added a project to T378180: [lima-kilo] support caching of container images using a cache disk: Patch-For-Review.
Oct 25 2024, 11:56 AM · Patch-For-Review, Toolforge (Toolforge iteration 16)
Raymond_Ndibe changed the status of T374585: [lima-kilo] allow for the creation of a multi-node high availability cluster from Open to In Progress.
Oct 25 2024, 10:58 AM · Toolforge (Toolforge iteration 16), User-Raymond_Ndibe
Raymond_Ndibe created T378180: [lima-kilo] support caching of container images using a cache disk.
Oct 25 2024, 10:57 AM · Patch-For-Review, Toolforge (Toolforge iteration 16)
Raymond_Ndibe added a comment to T369364: toolforge: integrate fourohfour as a custom component, rather than a normal tool.

Since the redis cache here doesn't need to be persisted, does anyone see a problem with running the redis cache either as a separate pod in the same namespace or as a container in the same pod?

Oct 25 2024, 10:16 AM · User-Raymond_Ndibe, Toolforge, User-aborrero, cloud-services-team

Oct 22 2024

Raymond_Ndibe added a project to T377781: [jobs-api,jobs-cli] Add support for replacing a running scheduled job when an overlapping schedule fires (`concurrencyPolicy: Replace`): User-Raymond_Ndibe.
Oct 22 2024, 8:27 AM · User-Raymond_Ndibe, Toolforge, cloud-services-team

Oct 21 2024

Raymond_Ndibe updated the task description for T362867: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.28.
Oct 21 2024, 1:19 PM · cloud-services-team (FY2024/2025-Q3-Q4), Toolforge (Toolforge iteration 17), Patch-For-Review

Oct 17 2024

Raymond_Ndibe added a project to T377420: [jobs-api,jobs-cli] Introduce a way to stop stuck cronjobs: User-Raymond_Ndibe.
Oct 17 2024, 3:18 PM · Patch-For-Review, Toolforge (Toolforge iteration 17), User-Raymond_Ndibe, User-aborrero, cloud-services-team

Oct 7 2024

Raymond_Ndibe created T376673: [openstack object storage] deleted files still occupying space.
Oct 7 2024, 10:44 PM · cloud-services-team, Cloud-VPS

Oct 2 2024

Raymond_Ndibe closed T373072: Decision Request: To strictly enforce semantic versioning rules for toolforge services' APIs or not as Resolved.
Oct 2 2024, 5:34 PM · Toolforge (Toolforge iteration 15), User-Raymond_Ndibe, Cloud Services Proposals, cloud-services-team
Raymond_Ndibe removed a project from T359641: [infra,k8s] Upgrade Toolforge Kubernetes to version 1.27: Patch-For-Review.
Oct 2 2024, 5:34 PM · Toolforge (Toolforge iteration 15), cloud-services-team (FY2024/2025-Q1-Q2)
Raymond_Ndibe updated the task description for T373072: Decision Request: To strictly enforce semantic versioning rules for toolforge services' APIs or not.
Oct 2 2024, 5:27 PM · Toolforge (Toolforge iteration 15), User-Raymond_Ndibe, Cloud Services Proposals, cloud-services-team
Raymond_Ndibe updated the task description for T373072: Decision Request: To strictly enforce semantic versioning rules for toolforge services' APIs or not.
Oct 2 2024, 5:26 PM · Toolforge (Toolforge iteration 15), User-Raymond_Ndibe, Cloud Services Proposals, cloud-services-team