The SRE Clinic Duty panel (https://phabricator.wikimedia.org/W2880) has 800+ tasks, some of which have not been updated in 3 years. Since this panel is part of the Clinic Duty dashboard, it should only contain current tasks that the engineer on Clinic Duty can act on. This task will track the progress of updating the query so that it no longer returns non-actionable tasks.
Awesome, thanks!
I suggest that we:
- Filter out all tasks that haven't been updated in the last month (the "Updated After" constraint). I think the update date also reflects the creation date and sub-task updates
- Filter out tasks with open sub-tasks (e.g. tracking tasks); they would show up anyway once all their sub-tasks are closed
- Remove the "group by" and sort the tasks by date only
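The three proposed filters can be sketched as plain Python over in-memory records. This is a minimal illustration of the logic only, not the Phabricator saved-query form or the Conduit API: the `Task` record, its fields and `clinic_duty_view` are hypothetical, the 30-day window stands in for "the last month", and "sorted by date" is assumed to mean newest first.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Task:
    # Hypothetical minimal task record; NOT the Phabricator data model.
    task_id: str
    date_modified: datetime  # assumed to already reflect creation and sub-task updates
    open_subtasks: list = field(default_factory=list)


def clinic_duty_view(tasks, now, window=timedelta(days=30)):
    """Apply the three proposed filters to a list of tasks."""
    cutoff = now - window
    # 1. Keep only tasks modified within the window ("Updated After").
    recent = [t for t in tasks if t.date_modified >= cutoff]
    # 2. Drop tasks that still have open sub-tasks (e.g. tracking tasks).
    actionable = [t for t in recent if not t.open_subtasks]
    # 3. No grouping: one flat list, sorted by modification date, newest first.
    return sorted(actionable, key=lambda t: t.date_modified, reverse=True)


now = datetime(2022, 11, 1)
tasks = [
    Task("T1", now - timedelta(days=2)),
    Task("T2", now - timedelta(days=400)),        # stale: filtered out
    Task("T3", now - timedelta(days=5), ["T4"]),  # tracking task: filtered out
    Task("T4", now - timedelta(days=1)),
]
print([t.task_id for t in clinic_duty_view(tasks, now)])  # ['T4', 'T1']
```

Once T4 is closed and removed from T3's `open_subtasks`, T3 would reappear in the view on its own, which is the reasoning behind the second filter.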
We should maybe also try to agree on (and standardize) Herald rules that tag a team when one of its projects gets tagged.
For example, in T320962 "Herald added a project: Infrastructure-Foundations." because {H389} kicked in. If this were implemented for all teams, we would not need (or would rarely need) to keep a list of all project tags.
In this case it would tag T319163 with SRE Observability, and the same would apply to all the ops-* projects and the global DCops one.
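The idea of per-team Herald rules can be modeled as a small set of pattern-to-team mappings: once every team has such a rule, the triage search only has to check for team tags instead of maintaining a list of every component project. The sketch below is hypothetical throughout. `TEAM_RULES`, the patterns and the team tag names are illustrative and are not the actual WMF Herald configuration, and real Herald rules are configured in Phabricator, not in code.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: project-tag pattern -> team tag to add.
# They mimic per-team Herald rules; they are NOT the real Herald config.
TEAM_RULES = [
    ("Observability-*", "SRE Observability"),
    ("ops-*", "DC-Ops"),
]

# With team tags added automatically, triage only needs this short list.
TEAM_TAGS = {team for _, team in TEAM_RULES}


def apply_team_rules(projects):
    """Return the task's projects plus any team tags the rules would add."""
    tagged = set(projects)
    for pattern, team in TEAM_RULES:
        if any(fnmatchcase(p, pattern) for p in projects):
            tagged.add(team)
    return tagged


def needs_triage(projects):
    """A task still needs triage if no team tag is present after the rules run."""
    return not (apply_team_rules(projects) & TEAM_TAGS)


print(sorted(apply_team_rules({"SRE", "Observability-Alerting"})))
# ['Observability-Alerting', 'SRE', 'SRE Observability']
print(needs_triage({"SRE", "ops-eqiad"}), needs_triage({"SRE", "conftool"}))
# False True
```

Here a task tagged only with an unmapped project (the hypothetical `conftool` case) still lands in triage, which is where an explicit exclusion list would otherwise be needed.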
Excluded conftool, Observability-Alerting, Observability-Logging, Observability-Metrics, Observability-Tracing from the triage search.
Confirmed that for WMCS tasks, SRE should only be added if there is a specific action for SRE.
I updated the dashboard query to only show tasks updated in the last 30 days and created a separate search for historical tasks to be cleaned up later.
Now down to one task on this board
Can you share the query without this constraint? I'll try to clear down some of the backlog as part of my clinic duty.
Awesome, thanks!
Here's the link to the query for the 30 days+ tasks: https://phabricator.wikimedia.org/maniphest/query/JcQxUPok1XTJ/#R
It can also be found in the "Triage Information" panel on the Clinic Duty dashboard.
I have some practical questions:
- How should I "triage" things that are purely SRE issues or cannot be assigned to a specific team? E.g. "Rewrite all SRE scripts from Python to LISP (tracking)", "I (non-SRE team) need SRE help/want to provide SRE awareness about X work"
- If an issue is assigned (more or less correctly) to a team, and they decide "we are not working on this/not my problem" by removing their tag, what should I do? Close the ticket? Remove the SRE tag, since at least someone answered or subscribed to it?
(both use made up task titles, but are actual cases I run into)
I think both come down to whether we should tag with the desired/ideal state or the real state (is SRE for things we own, or for things we will eventually get to work on?).
To answer myself with a potential solution: I am thinking of hiding SRE tasks in the "(Acknowledged)" and "(Radar)" columns from triage, but I wonder if that is just delaying the inevitable: how should we handle tasks that are correct but that no one is going to work on? (Especially when several teams have different policies: some close them, some remove their tag, some just leave them untouched or in a separate column.)
Both are very good questions that I don't have authoritative answers to but I have opinions.
- For SRE-wide tasks there should be a person coordinating the effort, so my instinct would be to tag the task with the label of that person's team. Tasks that don't fit this assumption could warrant a review of whether an owner is needed or whether the task can be split among multiple teams.
- If a task is clearly within one team's area and they decide not to work on it then I would either resolve it as declined (the responsibility would lie with the owning team) or remove the SRE label.
I agree, but I think the second point should be communicated more widely. I think many people would prefer to have a task closed rather than left open forever and never worked on, but that requires a common understanding (e.g. declining now doesn't mean it is a bad idea, and the task could be reopened or pushed for again in the future).
Ideally, tasks that a team will not work on (due to capacity etc.) but that are not a bad idea should not get the status "declined". Instead they should remain open, have the team project tag removed, and keep the code repository tag.
I'd consider it good practice to differentiate between "this is on a team's plate" and "this is something to potentially do one day in a code repo, whoever may volunteer or find time" (especially as organizations tend to reorganize teams and responsibilities while a code repo may continue to exist). I know this does not scale for SRE teams because there are too many repos. Other teams in such cases created "Icebox" or "Freezer" workboard columns or separate $Teamname-Icebox project tags.
In the case of Traffic, Traffic-Icebox has become something of a graveyard. We're trying to work through it, but it's going to take a concerted effort as it's been treated like an unsustainable debt.
This has now been reduced to current tasks only, in large part thanks to @jbond. Resolving :)