
[o11y,logging,infra] Deploy Loki to store Toolforge tool log data
Closed, Resolved (Public)

Description

Log Toolforge tool output to a Loki instance.

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
logging: alloy: Deploy cluster-wide on tools | repos/cloud/toolforge/toolforge-deploy!871 | taavi | main-I7a4ff905e57696ed6534226596428a9294a1e3f6 | main
logging: alloy: Deploy to double the workers | repos/cloud/toolforge/toolforge-deploy!869 | taavi | main-I3094dc7db753f36154b35cc9c9b3b693a23bc4dc | main
logging: alloy: Deploy to even more nodes | repos/cloud/toolforge/toolforge-deploy!864 | taavi | main-Ifb17bf62ef6f5cf8a6e186c692ce6a456b163b3b | main
logging: alloy: Deploy to more workers | repos/cloud/toolforge/toolforge-deploy!862 | taavi | main-I395644a8e25e1cdef253a09d25320464fa696d6e | main
Deploy logging stack by default | repos/cloud/toolforge/lima-kilo!253 | taavi | main-I72ab6ba788ca68e9b86bf3bbdb338b15c913eaee | main
logging: alloy: Deploy to two more tools workers | repos/cloud/toolforge/toolforge-deploy!858 | taavi | main-I0b5a86931095bf9a951e14c9b38540df97e7f0df | main
logging: Deploy to tools | repos/cloud/toolforge/toolforge-deploy!856 | taavi | main-I2a75f16c20f737eaae579a36b0a927a037469c0f | main
logging: alloy: Allow running on the entire cluster | repos/cloud/toolforge/toolforge-deploy!836 | taavi | main-I7c5569f698935ca9794d6455fb76ea376d1f9b83 | main-I156a2539de7b40a8aeeed3933fe37d01c93d17a5
logging: alloy: Add routing for infrastructure logs | repos/cloud/toolforge/toolforge-deploy!835 | taavi | main-I156a2539de7b40a8aeeed3933fe37d01c93d17a5 | main-Ie19e6900e99bd48acbbcf8b86491318b614e8abe
logging: loki: Add second Loki instance for infrastructure logs | repos/cloud/toolforge/toolforge-deploy!834 | taavi | main-Ie19e6900e99bd48acbbcf8b86491318b614e8abe | main
logging: loki: Add network policy rule for object storage access | repos/cloud/toolforge/toolforge-deploy!827 | taavi | main-I729571e8337cbbf6b433eb5d6180b7dc6d3e4bb7 | main
logging: alloy: Fix loki write service name | repos/cloud/toolforge/toolforge-deploy!826 | taavi | main-I7a15f3e0883a50d8c4cb808fe6f055f09b446c1b | main
logging: loki: Set nameOverride | repos/cloud/toolforge/toolforge-deploy!825 | taavi | main-I59df6ccd608477f2810572f634ad259d9d9c0320 | main
logging: loki: Add missing emptyDir mounts in toolsbeta | repos/cloud/toolforge/toolforge-deploy!824 | taavi | main-I3639ecc2f5ab9b2b4de99d3e2ce13878ea98f62e | main
logging: Fix path to get_secret.sh | repos/cloud/toolforge/toolforge-deploy!823 | taavi | main-Ia5b1c14eb31da968d2010289033c6b5b53b07311 | main
logging: Add values to deploy to toolsbeta | repos/cloud/toolforge/toolforge-deploy!821 | taavi | main-I054208e27e5db1902a411d3b8df9a317c7aa5488 | main
logging: Add basic rate limiting and retention config | repos/cloud/toolforge/toolforge-deploy!807 | taavi | main-Ieb3604c5872169a2e8cd173b52177f46a0ef13fe | main
logging: Init component | repos/cloud/toolforge/toolforge-deploy!796 | taavi | main-If8f503514316703ce91f966fb6ad40b04ef8fdd0 | main
Adding loki to install | repos/cloud/toolforge/toolforge-deploy!669 | rook | T386480 | main
Adding loki | repos/cloud/toolforge/lima-kilo!226 | rook | T386480 | main

Related Objects

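Status | Subtype | Assigned | Task
Resolved | - | Bstorm | -
Resolved | - | GTirloni | -
Resolved | - | Bstorm | -
Resolved | - | Bstorm | -
Declined | - | None | -
Open | - | taavi | -
Resolved | Feature | taavi | -
Open | - | None | -
Open | - | None | -
Resolved | - | taavi | -
Resolved | - | taavi | -
Open | Feature | Raymond_Ndibe | -
Resolved | - | taavi | -
Open | - | None | -
Open | Feature | None | -
Resolved | - | taavi | -
Resolved | - | taavi | -
Resolved | - | Andrew | -
Resolved | - | dcaro | -
Resolved | - | None | -
Resolved | - | Andrew | -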

Event Timeline

I believe
https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/tree/T386480?ref_type=heads
and
https://gitlab.wikimedia.org/repos/cloud/toolforge/lima-kilo/-/tree/T386480?ref_type=heads
will get Loki installed in lima-kilo. @aborrero @dcaro @Andrew @Raymond_Ndibe opinions?

I fussed a little with the git clone in roles/k8s/tasks/toolforge-deploy_components.yaml to pull T386480 as it isn't merged yet, so I don't really have a full test of the patch. Is there a way to do a true test of a patch like this that I'm not realizing?

Thanks!

> I fussed a little with the git clone in roles/k8s/tasks/toolforge-deploy_components.yaml to pull T386480 as it isn't merged yet, so I don't really have a full test of the patch. Is there a way to do a true test of a patch like this that I'm not realizing?

Depends on your definition of true test :)

For a full lima-kilo rebuild with the new code, you can set the value lima_kilo_toolforge_deploy_repo.ref (defined in lima-kilo under roles/k8s/defaults/main.yaml), either by editing the file or by passing --extra-vars lima_kilo_toolforge_deploy_repo.ref=new/value when running start-devenv.sh.

If you just want to try the toolforge-deploy part of it you can apply the patch on the clone inside lima-kilo (under ~/toolforge-deploy) and run the deploy script ~/toolforge-deploy/deploy.sh by hand.

If you also want to test the lima-kilo changes without rebuilding, you can re-run Ansible from within lima-kilo with toolforge_ansible_run.sh -t k8s (or run it fully, without params).
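
The three routes above could be sketched roughly as follows. This is only an illustration: it assumes the branch under test is T386480, that the clone's remote is named origin, that start-devenv.sh is run from the lima-kilo checkout, and that toolforge_ansible_run.sh is on the PATH inside the VM. The REF, DEPLOY_DIR and RUN variables and the function names are made up for the sketch, and any arguments deploy.sh may expect are not covered here. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch of the three testing routes described above. By default this only
# prints the commands; set RUN=1 to actually execute them.
REF="${REF:-T386480}"                               # branch under test (from this task)
DEPLOY_DIR="${DEPLOY_DIR:-$HOME/toolforge-deploy}"  # clone inside the lima-kilo VM

# Print a command, and execute it only when RUN=1.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# 1. Full rebuild with the new code: run from the lima-kilo checkout.
full_rebuild() {
    run ./start-devenv.sh --extra-vars "lima_kilo_toolforge_deploy_repo.ref=${REF}"
}

# 2. toolforge-deploy part only: inside the VM, switch the existing clone to
#    the branch (assumption: fetched from a remote named "origin"), then run
#    the deploy script by hand.
deploy_only() {
    run git -C "$DEPLOY_DIR" fetch origin "$REF" &&
    run git -C "$DEPLOY_DIR" checkout FETCH_HEAD &&
    run "$DEPLOY_DIR/deploy.sh"
}

# 3. lima-kilo changes without a rebuild: inside the VM, re-run only the
#    k8s-tagged Ansible tasks (drop "-t k8s" for a full run).
rerun_ansible() {
    run toolforge_ansible_run.sh -t k8s
}
```

For example, calling deploy_only prints the two git commands and the deploy script invocation without touching anything, and RUN=1 deploy_only would execute them in order, stopping at the first failure.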

> Will get loki installed in lima-kilo. @aborrero @dcaro @Andrew @Raymond_Ndibe opinions?

The patches look ok to me (I have not tested them yet, I'm testing something else...). They have no extra config for Loki yet, right?

> The patches look ok to me (I have not tested them yet, I'm testing something else...). They have no extra config for Loki yet, right?

Correct, this is installing a stock Loki. Do we think we should merge these and continue from here, or take a different approach?

> Do we think we should merge these and continue from here, or take a different approach?

It should be ok. We could also not install Loki by default, so that people who don't need it yet don't get it installed. It depends on how soon you expect to start integrating it with the rest of Toolforge.

dcaro renamed this task from "loki into lima-kilo" to "[o11y,logging,infra] loki into lima-kilo". (Feb 19 2025, 8:51 AM)
dcaro triaged this task as High priority.
dcaro edited projects, added Toolforge (Toolforge iteration 17); removed Toolforge.
dcaro moved this task from Next Up to In Progress on the Toolforge (Toolforge iteration 17) board.

As I consider it more, I guess it doesn't make much of a difference whether it is merged now. About the biggest "risk" of not merging is that I eventually push a toolforge-deploy_components.yaml that is set to pull T386480. So perhaps it is better to wait until we decide more exactly what we want before merging.

dcaro moved this task from In Progress to Next Up on the Toolforge (Toolforge iteration 18) board.
taavi renamed this task from "[o11y,logging,infra] loki into lima-kilo" to "[o11y,logging,infra] Deploy Loki to store Toolforge log data". (May 23 2025, 2:32 PM)
taavi removed projects: User-aborrero, Epic.
taavi removed a subscriber: rook.
taavi subscribed.

Tentatively claiming.

Change #1154802 had a related patch set uploaded (by Majavah; author: Majavah):

[cloud/wmcs-cookbooks@main] toolforge: Add cookbook to mirror Loki-related images

https://gerrit.wikimedia.org/r/1154802

Change #1154802 merged by jenkins-bot:

[cloud/wmcs-cookbooks@main] toolforge: Add cookbook to mirror Loki-related images

https://gerrit.wikimedia.org/r/1154802

taavi lowered the priority of this task from High to Medium. (Jun 13 2025, 9:31 AM)

Change #1163729 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] P:toolforge::prometheus: Add scrape rules for Loki/Alloy

https://gerrit.wikimedia.org/r/1163729

taavi renamed this task from "[o11y,logging,infra] Deploy Loki to store Toolforge log data" to "[o11y,logging,infra] Deploy Loki to store Toolforge tool log data". (Jul 1 2025, 12:16 PM)
taavi updated the task description.

Change #1163729 merged by Majavah:

[operations/puppet@production] P:toolforge::prometheus: Add scrape rules for Loki/Alloy

https://gerrit.wikimedia.org/r/1163729