
[docs,envvars-api,jobs-api,builds-api] create docs on how to operate the cluster and core components
Closed, Resolved (Public)

Description

There seem to be no docs on how to operate the cluster, for example how to restart a given core component when it is suspected of misbehaving.

An example of why this matters is ticket T380832 ([jobs-api] crashing), in which an operator responding to an incident did not know how to restart the jobs-api.

We could:

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
envvars-service: add alerts | repos/cloud/toolforge/alerts!22 | dcaro | add_envvars_alert | main
builds-api: add up alert | repos/cloud/toolforge/alerts!21 | dcaro | add_builds_api_alert | main

Related Objects

Status | Subtype | Assigned | Task
Resolved | | dcaro |
Resolved | | dcaro |
Resolved | | None |
Resolved | | dcaro |
Resolved | | dcaro |
Resolved | | Raymond_Ndibe |
Resolved | | Raymond_Ndibe |
Resolved | | Raymond_Ndibe |
Resolved | | Raymond_Ndibe |
Resolved | | Raymond_Ndibe |
Resolved | | Raymond_Ndibe |
Resolved | | dcaro |
Resolved | | dcaro |
Resolved | | dcaro |
Resolved | | dcaro |

Event Timeline

dcaro renamed this task from "toolforge: create docs on how to operate the cluster and core components" to "[docs,envvars-api,jobs-api,builds-api] create docs on how to operate the cluster and core components". Nov 27 2024, 2:21 PM
dcaro triaged this task as High priority.
dcaro edited projects, added Toolforge (Toolforge iteration 16); removed Toolforge.
fnegri subscribed.

^ Removed the (Resolved) parent task to clean up the Task Graph

dcaro updated the task description.

Updated the docs and the dashboards in Grafana; I'll close this for now.

dcaro moved this task from Next Up to Done on the Toolforge (Toolforge iteration 22) board.