Monitoring and alerting for Toolforge tools
Closed, Duplicate · Public

Assigned To

None

Authored By

	Sascha
	Mar 22 2021, 10:04 AM

Description

Does Toolforge have shared monitoring and alerting infrastructure? If not, could/should this be added? If yes, could it be better documented?

For example, my dinky little tool computes a data file and offers it for download. I’d like to get alerted when the data goes stale, which could happen when the pipeline for computing the data file has a problem. My webserver is exporting Prometheus metrics on https://qrank.toolforge.org/metrics and I’d like to get alerted when time() - qrank_last_modified_time_seconds gets greater than four weeks (subtracting timestamps from current time as per Prometheus recommendations). Being new to Toolforge, I couldn’t find any docs on where to add such monitoring rules. How do other tools currently get monitored? Does every tool run its own Prometheus server? (That would seem a little wasteful.)
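To make the staleness condition above concrete, here is a minimal Python sketch of the same check, time() - qrank_last_modified_time_seconds > four weeks, evaluated against text in the Prometheus exposition format. The helper names parse_gauge and is_stale are hypothetical, the parser only handles unlabelled samples, and treating a missing metric as stale is an assumption; this is not an existing Toolforge facility.

```python
import time

# Four weeks expressed in seconds, matching the threshold in the description.
FOUR_WEEKS_SECONDS = 4 * 7 * 24 * 60 * 60

METRIC_NAME = "qrank_last_modified_time_seconds"


def parse_gauge(metrics_text, name):
    """Return the value of an unlabelled sample called `name`, or None.

    Hypothetical helper: a deliberately small parser for the Prometheus
    text exposition format that skips comments and ignores labelled samples.
    """
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def is_stale(metrics_text, now=None, max_age=FOUR_WEEKS_SECONDS):
    """Return True when the data file is older than `max_age` seconds."""
    if now is None:
        now = time.time()
    last_modified = parse_gauge(metrics_text, METRIC_NAME)
    if last_modified is None:
        # Assumption: a missing metric is itself worth alerting on.
        return True
    return now - last_modified > max_age
```

A cron job could fetch the metrics page, pass its body to is_stale, and send a notification when the result is True; how to fetch and how to notify are left out because they depend on the tool's setup.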

As an external volunteer with limited time, I (sadly) can’t permanently keep an eye on my service. That’s why it would be quite important for me to automatically receive alerts when things go wrong. Surely other tool authors will be in a similar situation.

Related Objects

Mentioned In: T53434: Establish an internal system or a recommended external system for monitoring user-created Toolforge web services
T306790: Set up monitoring for community cronjobs

Event Timeline

Sascha created this task. Mar 22 2021, 10:04 AM

Restricted Application added a subscriber: Aklapper. Mar 22 2021, 10:04 AM

Have a look at community-labs-monitoring

Thanks for the pointer! Indeed, I was hoping the Wikimedia Cloud had something like Cortex or Thanos running on behalf of custom tools. Hm, considering how long these discussions seem to have been going on already, it doesn’t really look like this will be coming anytime soon. So, closing this ticket here as stalled; things won’t go any faster with more tickets around.

taavi moved this task from Backlog to Feature requests on the Toolforge board. May 16 2021, 4:43 PM

taavi mentioned this in T306790: Set up monitoring for community cronjobs. Apr 25 2022, 12:24 PM

I'd just like to add my support to this idea. Many of the tools that run in Toolforge are critical parts of the technical infrastructure that keeps the project going. They deserve all the normal logging, alerting and monitoring support that any serious production system has.

I'd love to see something like https://en.wikipedia.org/wiki/Graphite_(software) set up that any tool could easily feed performance data to and tool maintainers could build their own dashboards. There's really no reason for each tool developer to reinvent the wheel on this kind of stuff.

bd808 closed this task as a duplicate of T53434: Establish an internal system or a recommended external system for monitoring user-created Toolforge web services. Apr 25 2022, 3:40 PM

RoySmith mentioned this in T53434: Establish an internal system or a recommended external system for monitoring user-created Toolforge web services. Apr 25 2022, 3:56 PM
