Mentioned in SAL (#wikimedia-cloud) [2021-11-08T10:34:35Z] <arturo> create service account srv-networktests following https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Service_accounts for T294955
Mentioned in SAL (#wikimedia-cloud) [2021-11-08T10:54:38Z] <arturo> [codfw1dev] create service account srv-networktests following https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Service_accounts for T294955
Got to a nice stopping point.
- tests have been deployed to both codfw1dev and eqiad1
- a spicerack cookbook has been created to help with automated usage
- a periodic job has been set up to help us monitor the health of the network
- however, the periodic job depends on Icinga monitoring of systemd services, which at the time of writing is disabled for eqiad1
- anyway, I'm not sure yet whether we want to be paged for errors reported by the testsuite (not sure yet how stable it will be)
- in any case, I'm leaving the systemd timer job enabled so that we can at least see the logs and know if the network was unstable at some point
- some docs have been created: https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Network/Tests
Next steps:
- decide whether we want to be paged by this
- extend with more checks!