Hi Sukhbir, as part of auditing recent incident follow-ups, I extracted this action item from the incident document where you were coordinator (link in description).
Today
Yesterday
@Scott_French should we expect this issue to reoccur this time around? cc @Blake
Given the subtasks are complete and we don't plan to invest more time in API Gateway (replaced by REST Gateway), I'll close this.
The dependent task T405292 is only at the ordering stage, so it seems difficult to complete this within Q3. @jijiki shall we move this back to Backlog and pick it up in Q4?
Reassigning to Hugh, who agreed to take a look.
To help make the case for prioritizing this work, would you be able to estimate the toil of your current release process: the total SRE time it takes and the savings we could hope to achieve with this automation?
Mon, Feb 9
Moving this to Incident Tooling for now. The corresponding subtask for serviceops (POC with MW) is already tracked on our workboard.
Changing the priority to Low as this was not actioned for quite some time.
Hi Filippo, since k8s-aux in codfw is now set up, what next steps do you recommend on this (feel free to unassign yourself after)?
Hi Clement, would you be able to triage this task? Is it still relevant given the plans for API Gateway?
Lowering priority as this was not actioned for a very long time.
@Clement_Goubert this is in Scheduled work but I don't think we'll have capacity to land it this quarter, so I'm moving it to Backlog.
@ssingh do you think you could help with some of the subtasks here?
@RLazarus can we close this? If not, can you update the description with exactly what is needed here?
Moving to Backlog as we won't have capacity for this this quarter.
Removing our tag as we don't seem to have anything actionable. Please re-add it if needed.
Removing our tag as the parent task is already tracked and we're planning which part can be done this quarter.
@Scott_French do you expect to have this done by end of quarter, or do you need others' help on it?
@Jclark-ctr Can I confirm what the next step is on this?
@jasmine_ can we make sure there's a guardrail so this isn't forgotten? Either a better runbook or automation. Please feel free to use the team forums if you're unsure of the approach.
Fri, Feb 6
Changing priority to Low.
@elukey assigning this to you as you're noted as reviewer on https://wikitech.wikimedia.org/wiki/Helm/Upstream_Charts/kserve
Unfortunately, Serviceops doesn't have the capacity to investigate further at this point in time. @cscott will the root-causing continue in the task that you created, and if so, can we dedupe?
@BBlack can this be closed? I don't see any action for Serviceops, so I'm removing our tag. Feel free to add it back if needed.
@Clement_Goubert could you PTAL and confirm when this could start and who can be the POC?
Thu, Feb 5
After Denisse's presentation last week, should we try to use their tool (and make feature requests if we miss something important)?
If this happens from time to time, would it be worth referencing those example bugs (or the MariaDB switchover) in the runbook, @Blake?
Wed, Feb 4
During last week's summit, we decided to take the opportunity to route automated updates to a new channel, #wikimedia-serviceops-feed.
SRE Observability, were you able to look into this?
Hugh stepped in to do that today. Thanks!