User Details
- User Since
- Mar 6 2020, 9:03 PM (301 w, 2 d)
- Availability
- Available
- IRC Nick
- Raymond_Ndibe
- LDAP User
- Raymond Ndibe
- MediaWiki User
- Raymond Ndibe [ Global Accounts ]
Nov 13 2025
Lima-kilo env configurations for anyone who wants to recreate this (configmaps, limitranges, resourcequotas, etc.). I basically maxed everything out to ensure those never become an issue while running these tests. Keeping everything in a doc so it doesn't clutter the task:
https://docs.google.com/document/d/1LfXdcVB-Vh0I0IuoniCN325Tofzu7MK9bBnA8G9aLM0/edit?tab=t.0
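For reference, this is the general shape of one of the objects being tuned; the values below are placeholders, not the actual numbers (those live in the linked doc):

```yaml
# Illustrative only: a "maxed out" ResourceQuota of the kind described above.
# Name, namespace, and all limits are assumptions; the real configs are in the doc.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tool-test
  namespace: tool-test
spec:
  hard:
    requests.cpu: "32"
    requests.memory: 64Gi
    limits.cpu: "32"
    limits.memory: 64Gi
    pods: "64"
```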
Nov 12 2025
quick throw-away script for simple deployments in lima-kilo using the web images:
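Only the shebang of the script survived the paste; a minimal sketch of the kind of helper meant here, assuming the output is piped to kubectl (the tool name, image, and tool-&lt;name&gt; namespace pattern are illustrative, not the actual script):

```python
# Illustrative sketch only: not the actual throw-away script from this comment.
import json


def make_deployment(tool: str, image: str, port: int = 8000) -> dict:
    """Build a minimal Deployment manifest for a web image in lima-kilo."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": tool, "namespace": f"tool-{tool}"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": tool}},
            "template": {
                "metadata": {"labels": {"app": tool}},
                "spec": {
                    "containers": [{
                        "name": tool,
                        "image": image,
                        "ports": [{"containerPort": port}],
                    }],
                },
            },
        },
    }


if __name__ == "__main__":
    # Pipe the output into `kubectl apply -f -` inside lima-kilo.
    print(json.dumps(make_deployment(
        "test-tool",
        "docker-registry.tools.wmflabs.org/toolforge-bookworm-web-sssd",
    ), indent=2))
```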
#!/usr/bin/env python3
Nov 11 2025
image-config configmap currently has the below structure:
NOTE: the entry below is not an exact copy of what is in the config; I gathered many of the common aliases, state, and extras into a single config entry so we can talk about it
apiVersion: v1
data:
  images-v1.yaml: |
    bookworm:
      aliases:
        - tf-bullseye-std
        - tf-bullseye-std-DEPRECATED
      state: stable
      variants:
        jobs-framework:
          image: docker-registry.tools.wmflabs.org/toolforge-bookworm-sssd
        webservice:
          extra:
            resources: jdk
            wstype: generic
          image: docker-registry.tools.wmflabs.org/toolforge-bookworm-web-sssd
    ...
kind: ConfigMap
...

Few things to think about:
- How do we support aliases, state, and extra? First, which of those do we actually need (e.g. for backwards compatibility) and which are unnecessary? For the necessary ones, how do we represent them in harbor if we want the endpoint to be as simple as making a request to harbor and parsing the response? We certainly don't want to maintain a yaml in builds-api that defines these, since that would basically mean moving image-config into builds-api. A few options come to mind:
- extensive use of tags (e.g. aliases-tf-bullseye-std, aliases-tf-bullseye-std-DEPRECATED, state-stable, state-deprecated, resources-jdk, wstype-generic, etc.). If we go with this, we need a way of parsing these in builds-api (probably trivial). More importantly, we'll likely need a cookbook for maintaining these images (updating a tag to deprecated, specifying tags when uploading a new image, etc.).
- a helm chart on harbor (I hate this because it's no different from maintaining a local yaml in builds-api): with this you still have the images, plus a chart defining these "tags".
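To make the tags idea concrete, a sketch of how builds-api could fold flat Harbor tags back into the image-config shape (the tag naming scheme here is the hypothetical one from the bullet above, not anything Harbor defines):

```python
def parse_image_tags(tags: list[str]) -> dict:
    """Fold flat tags like 'aliases-tf-bullseye-std' or 'state-stable' back
    into the aliases/state/extra structure used by image-config.

    The prefix-value naming convention is an assumption, not a Harbor feature.
    """
    info: dict = {"aliases": [], "state": None, "extra": {}}
    for tag in tags:
        prefix, _, value = tag.partition("-")
        if prefix == "aliases":
            info["aliases"].append(value)
        elif prefix == "state":
            info["state"] = value
        else:
            # anything else becomes an extra key, e.g. resources-jdk, wstype-generic
            info["extra"][prefix] = value
    return info
```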
Nov 6 2025
In all cases where both variants exist, the webservice image is functionally a superset of the jobs-framework image, so the webservice image can most likely serve both purposes.
Nov 4 2025
This is more a thing done for backward compatibility than a bug that was mistakenly introduced.
Initially this was being returned as a string, so it was just carried over like that to avoid breaking anything for anyone who uses the API directly and expects it to be a string.
Oct 22 2025
Upgrade notification to individual maintainer draft
Upgrade Your Old Toolforge Jobs Version to V2 <name>
Affected tools:
actrial adamant admin ahechtbot air7538tools alertlive arkivbot aswnbot aw-gerrit-gitlab-bridge bothasava botorder brandonbot contribstats croptool csp-report danmicholobot dannys712-bot deployment-calendar dewikinews-rss dexbot dow dykautobot earwigbot emijrpbot erwin85 featured-content-bot ffbot fist fontcdn forrestbot galobot gerakibot gerrit-reviewer-bot h78c67c-bot hay hewiki-tools highly-agitated-pages itwiki itwiki-scuola-italiana jackbot jarry-common jorobot kian lists logoscope magnustools maintgraph map-of-monuments mitmachen mjolnir most-wanted nlwiki-herhaalbot non-robot openstack-browser pagepile pangolinbot1 patrocle phabbot phabsearchemail phansearch phpcs pickme quest random-featured rembot sdbot search-filters sergobot-statistics shex-simple socksfinder sourcemd spur status svbot2 svgcheck sz-iwbot technischewuensche tf-image-bot thanatos thanks thesandbot tnt-dev toolhub-extension-demo toolhunt-api tools-edit-count top25reportbot topicmatcher trainbow tutor typo-fixer update-1lib1ref vicbot2 video2commons wd-flaw-finder wdumps welcomebot wgmc wiki-patrimonio wiki-stat-portal wikicup wikidata-game wikidata-todo wikijournalbot wikilinkbot wikiloves wikiprojectlist wikivoyage wm-domains wmch wmde-access ws-cat-browser zhmrtbot zhwiki-teleirc
Job version upgrade email draft:
[Cloud-announce] Old Toolforge Jobs Upgrade To V2 on 2025-11-20
Oct 20 2025
Command:
sudo cookbook wmcs.openstack.quota_increase --project catalyst --cores 16 --ram 32768 --task-id T407733 --cluster-name eqiad1
Oct 8 2025
--type secret/config with the default being secret should work while creating envvars
I struggle to see why we need to handle configurations saved in a git repo differently from an ordinary URL, given that files in a particular branch on both gitlab and github (and most git servers) can be addressed as a simple URL.
What am I missing?
typical example is https://gitlab.wikimedia.org/repos/cloud/toolforge/components-api/-/raw/replace_destination_image_comparison_with_image_name/LICENSE?ref_type=heads
the above link can be read by anything; we do not need to know it's in a git repo on branch replace_destination_image_comparison_with_image_name
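To illustrate the point: the raw URL can be derived from host + repo + ref + path with plain string formatting, no git awareness required. The path layouts below follow the public GitLab and GitHub raw-file URL conventions; the helper itself is just a sketch:

```python
def raw_file_url(host: str, repo: str, ref: str, path: str) -> str:
    """Build a raw-file URL for a path at a given ref (sketch, two common hosts)."""
    if "gitlab" in host:
        # GitLab serves raw files at /<repo>/-/raw/<ref>/<path>
        return f"https://{host}/{repo}/-/raw/{ref}/{path}"
    # GitHub serves raw files from a separate host:
    # https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>
    return f"https://raw.githubusercontent.com/{repo}/{ref}/{path}"
```

Anything that can fetch a URL can then read the file, exactly as with the LICENSE example above.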
Sep 15 2025
This is similar to the error message you got @DamianZaremba. @dcaro you should also see this
Hello @DamianZaremba, can you help with reproducing the error in the last message you sent? From my experience, the only way this can happen is if you ran toolforge components deployment create (without --force-build) immediately after running toolforge build clean. We need to revisit the clean command; right now it deletes all the images in harbor while leaving the builds behind (unfortunately for our users, an existing build implies that the image should exist, which is the right UX, but that is not how it currently works).
Aug 27 2025
before
raymond-ndibe@cloudcontrol1006:~$ sudo wmcs-openstack quota show catalyst-dev
+-----------------------+-------+
| Resource              | Limit |
+-----------------------+-------+
| cores                 | 8     |
| ram                   | 16384 |
| gigabytes             | 80    |
...
+-----------------------+-------+
after
raymond-ndibe@cloudcontrol1006:~$ sudo wmcs-openstack quota show catalyst-dev
+-----------------------+-------+
| Resource              | Limit |
+-----------------------+-------+
| cores                 | 32    |
| ram                   | 65536 |
| gigabytes             | 670   |
...
+-----------------------+-------+
Aug 26 2025
- one-off | continuous jobs, examples:
- {"short": "pending", "messages": ["restarting, maybe retrying?"], "duration": "00:00:32", "up_to_date": false} for jobs that are restarting, either because of a failure when backoffLimit is specified (for jobs), or because the command has exited (for deployments).
- {"short": "pending", "messages": ["scheduling"], "duration": "00:00:32", "up_to_date": false} pod is waiting to be assigned to a node
- {"short": "pending", "messages": ["initializing"], "duration": "00:00:32", "up_to_date": false} pod init containers are still running, images are still being pulled
- {"short": "running", "messages": ["running"], "duration": "00:00:32", "up_to_date": true} all containers in the pod are running
- {"short": "succeeded", "messages": ["succeeded"], "duration": "00:00:32", "up_to_date": true} pod containers exited successfully
- {"short": "stopped", "messages": ["stopped"], "duration": "00:00:32", "up_to_date": true} (upcoming) job was stopped by the user, maybe to be restarted later.
- {"short": "failed", "messages": ["Command not found"], "duration": "00:00:32", "up_to_date": true} the pod's container(s) failed to run
- {"short": "unknown", "messages": ["unknown"], "duration": "00:00:32", "up_to_date": true} unable to get the status of the job for some reason
yeaaaa, I think I see where the problem is coming from.
raymond-ndibe@cloudcontrol1006:~$ sudo radosgw-admin user info --uid tools\$tools
{
    "user_id": "tools$tools",
    ...
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": true,
        "check_on_raw": false,
        "max_size": 53687091200,
        "max_size_kb": 52428800,
        "max_objects": 51107
    },
    ...
}
max_size_kb is 52428800 and that's equivalent to 50GB. The storage on horizon is 49.9GB. I just manually ran garbage collection on harbor. Let me see if I can build something rn
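For the record, the unit conversion (assuming binary units, which is how these radosgw fields are expressed):

```python
# Sanity-checking the radosgw quota numbers above.
max_size = 53687091200   # bytes, from user_quota.max_size
max_size_kb = 52428800   # KiB, from user_quota.max_size_kb

# max_size_kb is just max_size expressed in KiB
assert max_size_kb * 1024 == max_size

# and in binary gigabytes that is exactly 50, matching the ~49.9GB shown on horizon
gigabytes = max_size / 2**30
print(gigabytes)  # → 50.0
```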
might be worth checking the storage quota of the harborstorage s3 bucket. The fact that it was working initially but stopped suddenly makes me think it's something to do with storage quota. Let me check
looking at this
Aug 24 2025
sudo cookbook wmcs.vps.create_project --user robertsky --user chlod --cluster-name eqiad1 --project eseap --task-id T401957 --description "To host eseap.org website and other related digital assets (for now a phorge task tracker) for ESEAP Hub"
...
raymond-ndibe@cloudcontrol1006:~$ sudo wmcs-openstack quota show eseap
+-----------------------+-------+
| Resource              | Limit |
+-----------------------+-------+
| cores                 | 8     |
| instances             | 8     |
| ram                   | 16384 |
| fixed_ips             | None  |
| networks              | 100   |
| volumes               | 8     |
| snapshots             | 4     |
| gigabytes             | 80    |
| backups               | 10    |
| volumes_high-iops     | -1    |
| gigabytes_high-iops   | -1    |
| snapshots_high-iops   | -1    |
| volumes___DEFAULT__   | -1    |
| gigabytes___DEFAULT__ | -1    |
| snapshots___DEFAULT__ | -1    |
| volumes_standard      | -1    |
| gigabytes_standard    | -1    |
| snapshots_standard    | -1    |
| groups                | 4     |
| ports                 | 500   |
| rbac_policies         | 10    |
| routers               | 10    |
| subnets               | 100   |
| subnet_pools          | -1    |
| injected-file-size    | 10240 |
| injected-path-size    | 255   |
| injected-files        | 5     |
| key-pairs             | 100   |
| properties            | 128   |
| server-group-members  | 10    |
| server-groups         | 10    |
| floating-ips          | 0     |
| secgroup-rules        | 100   |
| secgroups             | 40    |
| backup-gigabytes      | 1000  |
| per-volume-gigabytes  | -1    |
+-----------------------+-------+
@Robertsky @Chlod, default quotas were used because there was no quota detail in the request. If you need some of these values changed, you need to create new requests
Aug 20 2025
For some reason we don't seem to be discussing the possibility of making one Toolforge object store and having a toolforge-storage to group and manage objects belonging to each tool. This seems more consistent with what a platform as a service is: less_flexibility+auto_management. If a tool needs access to its own s3 bucket, complete with keys and everything, aren't they better off creating an openstack project, etc.?
I believe status messages should be as uniform as possible. If we need to convey extra information, it's better to put that in some form of status-detail field.