
Investigate janitor, maintenance emails parser
Open, Low, Public, 0 Estimated Story Points

Description

We have a long-standing project idea of writing a tool that would parse providers' maintenance emails and automatically update a database.

I came across https://github.com/wasabi222/janitor which is still in an early stage but looks quite promising.

Some requirements it would need to be worth deploying in production:

  • Google group integration (or create a dummy email account just for that)
  • .ical feed
  • reduce the number of parsing failures for currently supported providers
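
For the .ical feed requirement, here is a minimal sketch of what rendering parsed maintenances as an iCalendar feed could look like using only the Python standard library. This is not janitor's code: the function names, the `(uid, summary, start, end)` tuple shape and the `PRODID` value are assumptions made for illustration. It skips RFC 5545 line folding, so a real feed would likely be better served by a dedicated library.

```python
from datetime import datetime, timezone


def ical_timestamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as an iCalendar UTC date-time."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def escape_text(value: str) -> str:
    """Escape the characters that are special in iCalendar TEXT values."""
    return (
        value.replace("\\", "\\\\")
        .replace(";", "\\;")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def render_feed(events) -> str:
    """Render (uid, summary, start, end) tuples as a minimal VCALENDAR.

    Hypothetical helper: long lines are not folded at 75 octets.
    """
    stamp = ical_timestamp(datetime.now(timezone.utc))
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//maintenance-feed//EN",  # placeholder identifier
    ]
    for uid, summary, start, end in events:
        lines += [
            "BEGIN:VEVENT",
            f"UID:{uid}",
            f"DTSTAMP:{stamp}",
            f"DTSTART:{ical_timestamp(start)}",
            f"DTEND:{ical_timestamp(end)}",
            f"SUMMARY:{escape_text(summary)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

Emitting all times in UTC keeps the feed independent of each provider's local timezone; calendar clients convert to the viewer's timezone on display.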

Nice to have:

  • Support for all our providers (so far it supports NTT, GTT, and Zayo; Telia seems to be a work in progress)
  • IRC notifications

Longer term:

  • Netbox integration for circuits lookup

Event Timeline

ayounsi created this task.

I tried this today. It was unable to parse Zayo or Telia new scheduled maintenance emails, but successfully parsed NTT and GTT new scheduled maintenance emails. At this point, the project looks like it would need quite a bit of fixing to fit our use case.

Might be better to switch focus to https://github.com/networktocode/circuit-maintenance-parser
cf. https://ripe82.ripe.net/archives/video/516/
But from the video, it might not be compatible with Netbox.

Not long ago I wrote an internship project proposal for this task. I'm putting it here so it doesn't get forgotten in a Google Doc.


Problem statement:
Part of clinic duty is maintaining the maintenance calendar, which involves quite a bit of toil:

  • Manually go through emails (quick look shows between 3 and 6 a day)
  • Understand what they mean and if they're relevant (which is not a given for all SREs, eg. match a notification to an actual circuit or DC)
  • Update the calendar (convert start/stop times, manage cancellations, reschedules, emergencies)
  • Check that no critical maintenance windows overlap

And here, any typo or omission can be problematic (eg. power maintenance in eqsin got missed and we had an outage).
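
The time conversion and overlap check from the list above could be sketched as follows. This is only an illustration under stated assumptions: the `Window` class, its field names and the helper functions are hypothetical and are not part of circuit-maintenance-parser or Netbox. It normalizes every window to UTC, then reports each pair of windows whose time ranges intersect.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import combinations


@dataclass(frozen=True)
class Window:
    """One maintenance window; field names are illustrative only."""
    circuit: str
    start: datetime  # must be timezone-aware
    end: datetime    # must be timezone-aware


def to_utc(dt: datetime) -> datetime:
    """Normalize an aware datetime to UTC and reject naive ones."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: the provider's timezone is required")
    return dt.astimezone(timezone.utc)


def find_overlaps(windows):
    """Return every pair of windows whose time ranges intersect."""
    overlaps = []
    for a, b in combinations(windows, 2):
        # Two half-open ranges intersect when each starts before the other ends.
        if to_utc(a.start) < to_utc(b.end) and to_utc(b.start) < to_utc(a.end):
            overlaps.append((a, b))
    return overlaps
```

Deciding which overlaps are critical (for example, two circuits serving the same site) would additionally need circuit metadata, which is where a Netbox lookup could come in.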

Project description:
There is an actively developed Python library that tackles this exact problem:
https://github.com/networktocode/circuit-maintenance-parser

Going that way would have many benefits:

  • automate the process (relieve all SREs of that clinic duty task)
  • reduce the risk of omissions or entry mistakes (less risk of outages)
  • automated overlap notification (less risk of outages)
  • more visibility on infra changes (eg. IRC notifications)

This library could be used as a standalone tool or as a Netbox plugin (similar to what Network To Code does with Nautobot). The exact scope and path are to be figured out either before the internship starts or with the intern. Possible stretch goals include using the providers' REST APIs when available (eg. Telia, Lumen).

There are also many benefits to having this as an internship project:

  • Can be done without giving root to the intern (easier onboarding) as it only relies on open source tools and emails.
  • Not urgent
  • Concrete results achievable in the timeframe, with possible stretch goals
  • Useful for WMF SREs
  • Helps the intern understand our infrastructure, both physically (circuits, DCs, etc) and software-wise (Netbox, emails)
  • Whether standalone or as a plugin, it would be open source and useful to the SRE or Netbox community at large
  • Similar to the above, possibility to contribute to the upstream project (eg. add support for providers)
  • Even if the intern doesn't stay with us, the project would be easily maintainable