(part of https://wikitech.wikimedia.org/wiki/Incident_documentation/20190425-prometheus)
It's possible to nearly OOM the eqiad ops Prometheus servers just by loading a long enough time range on certain Grafana dashboards.
This happens even though we're using the default settings for query.timeout, query.max-concurrency, and query.max-samples (2m, 20, and 50000000 respectively), which should be more than sufficient for a server with 94G of RAM (some explanation at https://www.robustperception.io/limiting-promql-resource-usage).
Probably the first thing to try is cutting query.max-samples to something like a third of its current value?
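A rough back-of-the-envelope sketch of why the defaults look safe on paper, and what cutting max-samples to a third would buy. It assumes about 16 bytes per in-memory sample (an int64 timestamp plus a float64 value) and that every one of the max-concurrency queries hits the sample limit at the same time. Both are simplifying assumptions: the estimate ignores series labels, Go GC headroom, and other query-engine overhead, so it is a lower bound on real usage, not a prediction.

```python
# Rough worst-case memory held by PromQL samples under the query limits.
# Assumption: ~16 bytes per sample (int64 timestamp + float64 value).
# This ignores series labels, Go GC headroom, and other engine overhead.
BYTES_PER_SAMPLE = 16
MAX_CONCURRENCY = 20               # default for query.max-concurrency
DEFAULT_MAX_SAMPLES = 50_000_000   # default for query.max-samples
SERVER_RAM_GB = 94


def worst_case_gb(max_samples: int, max_concurrency: int = MAX_CONCURRENCY) -> float:
    """Sample memory (in GB, 10**9 bytes) if every concurrent query hits max_samples."""
    return max_samples * BYTES_PER_SAMPLE * max_concurrency / 1e9


if __name__ == "__main__":
    for label, samples in [
        ("default", DEFAULT_MAX_SAMPLES),
        ("one third", DEFAULT_MAX_SAMPLES // 3),
    ]:
        gb = worst_case_gb(samples)
        print(f"{label}: max-samples={samples:,} -> ~{gb:.1f} GB of {SERVER_RAM_GB} GB")
```

Under these assumptions the default limits bound sample memory at about 16 GB and a third of max-samples at about 5.3 GB, both far below 94G. That gap is why nearly OOMing the servers is surprising, and it suggests the actual memory use includes overhead this simple arithmetic doesn't capture.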