Have Shellbox emit metrics
Open, Needs Triage, Public

Assigned To

None

Authored By

	Legoktm
	Jan 5 2021, 12:51 AM

Description

In addition to logs (T263545), Shellbox should also emit metrics to help measure and assess the health of the application. We should be able to use the PHP statsd client library and have it send to a sidecar that Prometheus scrapes.

Proposed metrics:

- counter of requests per endpoint (e.g. Score, imagemagick, etc.)
  - not sure whether we need a separate counter for errors, or whether that can be inferred from logging
- timing of how long it takes Shellbox to process each request, split per endpoint
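
As a rough illustration of the two proposed metrics, here is a minimal sketch of the StatsD line format for a per-endpoint counter and timer, sent over UDP to a local sidecar. It is written in Python for brevity rather than with the PHP client the task refers to. The metric names, the `shellbox.` prefix, and the sidecar address and port are all assumptions made for the example, not anything decided in this task.

```python
import socket


def statsd_counter(name: str, value: int = 1) -> str:
    """Format a StatsD counter line, e.g. 'shellbox.score.requests:1|c'."""
    return f"{name}:{value}|c"


def statsd_timing(name: str, millis: float) -> str:
    """Format a StatsD timer line; timers are reported in milliseconds."""
    return f"{name}:{millis:.0f}|ms"


def request_metrics(endpoint: str, elapsed_seconds: float) -> list:
    """Build the two proposed metrics for one handled request.

    The 'shellbox.<endpoint>.*' naming scheme is hypothetical.
    """
    prefix = f"shellbox.{endpoint}"
    return [
        statsd_counter(f"{prefix}.requests"),
        statsd_timing(f"{prefix}.duration", elapsed_seconds * 1000),
    ]


def send_lines(lines, host: str = "127.0.0.1", port: int = 9125) -> bytes:
    """Send newline-joined StatsD lines as one UDP datagram (fire and forget).

    The host and port are assumptions about where the sidecar listens.
    """
    payload = "\n".join(lines).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
    return payload
```

For example, `send_lines(request_metrics("score", 0.25))` would emit one counter increment and one 250 ms timing sample for a hypothetical `score` endpoint.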

In theory these metrics could also be collected/emitted by the MediaWiki client too.

Related Objects

Status	Assigned	Task
Resolved	Joe	T252745 Sandbox/limit child processes within a container runtime
Resolved	tstarling	T260330 RFC: PHP microservice for containerized shell execution
Open	None	T271179 Have Shellbox emit metrics

Event Timeline

Legoktm created this task. Jan 5 2021, 12:51 AM

Restricted Application added a subscriber: Aklapper. Jan 5 2021, 12:51 AM

I think we will be running each endpoint as a separate installation (to reduce the attack surface of any single container).

Apart from that, what you described can be covered by a single Prometheus histogram, to which we should add the following labels:

Status code of the response (if correctly codified in the code, this will also give us authorization errors and badly formed requests)
"endpoint" (that is, what program is being requested)
*maybe* some salient data about the request, like if it contained a pipe?

That's basically all we need to get info about the performance of Shellbox.
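
To make the single-histogram idea concrete, here is a minimal standard-library sketch of a histogram keyed by the `status` and `endpoint` labels, using cumulative "less than or equal" buckets the way Prometheus exposes them. The class name, bucket boundaries, and label values are assumptions for illustration. A real deployment would more likely rely on a Prometheus client library or on the sidecar's own StatsD-to-histogram mapping.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds, in seconds.
BUCKETS = (0.1, 0.5, 1.0, 5.0, 30.0)


class LabelledHistogram:
    """Request-duration histogram with (status, endpoint) labels."""

    def __init__(self, buckets=BUCKETS):
        self.buckets = tuple(sorted(buckets))
        # Per label set: one slot per bucket, plus a final +Inf slot.
        self.counts = defaultdict(lambda: [0] * (len(self.buckets) + 1))
        self.sums = defaultdict(float)

    def observe(self, status, endpoint, seconds):
        key = (str(status), endpoint)
        # Index of the first bucket whose upper bound is >= the observation;
        # values above every bound land in the +Inf slot.
        self.counts[key][bisect_left(self.buckets, seconds)] += 1
        self.sums[key] += seconds

    def cumulative(self, status, endpoint):
        """Return {upper_bound: cumulative_count} for one label set."""
        key = (str(status), endpoint)
        bounds = self.buckets + (float("inf"),)
        out, running = {}, 0
        for bound, n in zip(bounds, self.counts[key]):
            running += n
            out[bound] = running
        return out
```

With this shape, the request rate, the error rate (from the `status` label), and latency quantiles per endpoint can all be derived from one metric, which is the point made above.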

But if we're getting fancy, it would be great to also record the resource usage that PHP can report, in terms of memory used by the execution of an external program. I guess that would require more coding on the Shellbox side, though, and we don't really need it on day 1.
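
For a sense of what such resource reporting could look like, here is a small Unix-only sketch in Python that reads the operating system's child-process accounting around a command. It is an analogy, not the Shellbox implementation: the function name and the returned keys are hypothetical, and the PHP side would need its own equivalent.

```python
import resource
import subprocess
import sys


def run_with_usage(argv):
    """Run a command and report child resource usage (Unix only).

    RUSAGE_CHILDREN covers all waited-for children of this process, so:
    - CPU time is cumulative, and the before/after delta approximates this
      command, provided no other child is reaped concurrently;
    - ru_maxrss is a high-water mark over all children so far, not a
      per-command figure (kilobytes on Linux, bytes on macOS).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    completed = subprocess.run(argv, capture_output=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "exit_code": completed.returncode,
        "cpu_seconds": (after.ru_utime - before.ru_utime)
        + (after.ru_stime - before.ru_stime),
        "max_rss": after.ru_maxrss,
    }
```

The resulting numbers could be emitted as additional metrics alongside the request histogram, which is why this is reasonably left for later.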

jijiki moved this task from Incoming 🐫 to 🙈🙉🙊Backlog on the serviceops board. Sep 28 2022, 2:24 PM

jijiki moved this task from 🙈🙉🙊Backlog to 🛎 Services & Oids on the serviceops board. Nov 17 2022, 5:13 PM

tstarling moved this task from Backlog to Actually in Shellbox on the Shellbox board. Feb 20 2024, 4:33 AM
