As a Wikibase Cloud EM I want to know the daily availability (percentage) of defined Wikibase Cloud services, so that I can learn whether a good enough level of service is provided to Wikibase Cloud users.
A/C
- Whether an Item page can be accessed is checked regularly.
- Whether Item data can be accessed using the API (wbgetentities) is checked regularly.
- After this is deployed to production/staging:
- Document how the wikis are set up.
- Document how to view the graphs.
On Wikibase.dev Kubernetes, uptime checks run every 60 seconds. The same frequency would be adequate on Wikibase.cloud.
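The two checks in the A/C could look roughly like the sketch below. It is only an illustration, not the implementation used on Wikibase.dev. The function names, the `/wiki/Item:<id>` page path, the `/w/api.php` endpoint path, and the 10-second timeout are assumptions that would need to be confirmed against the dedicated test wiki.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

CHECK_INTERVAL_SECONDS = 60  # frequency discussed in this ticket


def item_page_url(base_url: str, item_id: str) -> str:
    # Assumption: Items are served from the "Item:" namespace under /wiki/.
    return f"{base_url.rstrip('/')}/wiki/Item:{item_id}"


def wbgetentities_url(base_url: str, item_id: str) -> str:
    # Assumption: the Action API is reachable at /w/api.php.
    query = urllib.parse.urlencode(
        {"action": "wbgetentities", "ids": item_id, "format": "json"}
    )
    return f"{base_url.rstrip('/')}/w/api.php?{query}"


def entity_payload_ok(body: str, item_id: str) -> bool:
    """True if the response body lists the Item and does not flag it as missing."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    if not isinstance(data, dict):
        return False
    entities = data.get("entities")
    if not isinstance(entities, dict):
        return False
    entity = entities.get(item_id)
    return isinstance(entity, dict) and "missing" not in entity


def fetch(url: str, timeout: float = 10.0):
    """Return (status, body). Status 0 means no HTTP response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except (urllib.error.URLError, OSError):
        return 0, ""


def run_checks(base_url: str, item_id: str, fetch_fn=fetch) -> dict:
    """Run both A/C checks once and report pass/fail per check."""
    page_status, _ = fetch_fn(item_page_url(base_url, item_id))
    api_status, api_body = fetch_fn(wbgetentities_url(base_url, item_id))
    return {
        "item_page": page_status == 200,
        "wbgetentities": api_status == 200
        and entity_payload_ok(api_body, item_id),
    }
```

The API check inspects the response body as well as the status code, because wbgetentities can answer with HTTP 200 while marking the requested entity as `missing`. A scheduler (or a managed uptime-check service) would call `run_checks` every `CHECK_INTERVAL_SECONDS` and record the result.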
Note:
"Good enough level of service" is an internal, informal target set by the WMDE Engineering team. Checking against that target is out of scope for this task.
Notes from storytime:
- Uptime checks are not currently implemented on WB Cloud, just on wikibase.dev
- Could potentially hook up to GC metrics platform (storage and presentation of data). This needs investigation as part of this story.
- Which wiki are we checking? E.g. is there a dedicated test wiki that we create, or do we randomly select a user's wiki? T: on wikibase.dev the pattern is that we point to a dedicated wiki set up by the dev team (coffeebase). If we eventually add more things to monitor (e.g. editing), it makes sense to use a dedicated wiki. LM: all for a simple solution.
- Where is the data presented? Dashboard, email, report, etc. Wikibase.dev is not hooked up to anything to store/present the data. LM: the most important thing is that we have the data. Presentation, alerts, etc. could be handled later if sufficiently complex.
- Apply this to Wikibase.dev as well? T: keeping both as similar as possible would be beneficial. LM: it's not a requirement, but it could be decided as the right way forward during implementation. T: uptime checks are being done via Terraform, so there could be a benefit in aligning there.
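
Wherever the check results end up being stored, the daily availability percentage from the user story reduces to "successful checks divided by total checks, per day". A minimal sketch follows, assuming each result is a timezone-aware timestamp paired with a pass/fail flag and that days are counted in UTC. The function name and the input shape are hypothetical and not tied to any particular metrics platform.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_availability(results):
    """Map each UTC day to the percentage of checks that passed.

    `results` is an iterable of (timestamp, ok) pairs, where timestamp is a
    timezone-aware datetime and ok is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # day -> [passed, total]
    for timestamp, ok in results:
        day = timestamp.astimezone(timezone.utc).date()
        counts[day][1] += 1
        if ok:
            counts[day][0] += 1
    return {
        day: 100.0 * passed / total
        for day, (passed, total) in sorted(counts.items())
    }
```

At a 60-second interval a full day has 1440 samples, so a single failed check lowers that day's figure by roughly 0.07 percentage points.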