Per the [[ https://www.mediawiki.org/wiki/Wikimedia_services_policy | Wikimedia Services Policy ]], in order to launch the function orchestrator and evaluator services, these services need to provide operational metrics and logging, "according to the current WMF standards specified in the implementation guidelines."
Because we're using [[ https://www.mediawiki.org/wiki/ServiceTemplateNode | ServiceTemplateNode ]], we get some basic metrics and logging by default. We'll also get some additional metrics 'for free' by virtue of running behind the Envoy reverse-proxy middleware that SRE has set up.
**TODOs:**
- [x] Determine what metrics are required to satisfy the “WMF standards” for metrics and logging mentioned in the Service Policy document.
  - After chatting with SREs, it seems the full list of requirements is still to come. In general, using ServiceTemplateNode should take care of roughly 90% of the requirements.
- [x] List all the metrics provided by ServiceTemplateNode and Envoy.
  - See [metrics doc](https://docs.google.com/document/d/1aaqUXBbj-9XZrivCPussGNJ5GypUyD88EvexBzD_OG8/edit#)
- [x] Determine which additional metrics we want to report that are not provided by default, and what else we want to log.
  - See [metrics doc](https://docs.google.com/document/d/1aaqUXBbj-9XZrivCPussGNJ5GypUyD88EvexBzD_OG8/edit#)
- [x] Determine whether ServiceTemplateNode provides APIs for custom, application-specific logging and monitoring.
  - Yes.
- [ ] Write the code for collecting and reporting additional metrics and logging additional events.
- [ ] Write a page on Wikitech explaining how it all works.
This task depends on T307722 (Define SLIs and SLOs for function-* services), but some of the work can happen in parallel.