Excimer UI profile lost when requested from mw-on-k8s
Closed, Resolved (Public)

Event Timeline

Change 963024 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/deployment-charts@master] services: fix xenon/arclamp redis egress rules

https://gerrit.wikimedia.org/r/963024

Change 963024 merged by Filippo Giunchedi:

[operations/deployment-charts@master] services: fix xenon/arclamp redis egress rules

https://gerrit.wikimedia.org/r/963024

This comment was removed by fgiunchedi.

Change 963274 had a related patch set uploaded (by Filippo Giunchedi; author: Filippo Giunchedi):

[operations/deployment-charts@master] mw: allow egress to excimer

https://gerrit.wikimedia.org/r/963274

Change 963274 merged by jenkins-bot:

[operations/deployment-charts@master] mw: allow egress to excimer

https://gerrit.wikimedia.org/r/963274

After enabling egress to webperf I was able to get an excimer profile posted, e.g. https://performance.wikimedia.org/excimer/profile/f4865b080f541cf8

@Krinkle, @Clement_Goubert and I were wondering whether excimer egress traffic should be enabled for mw-debug only or for any mw k8s deployment?

@fgiunchedi For baremetal, it is intentional that this is not limited to mwdebug. I can ssh to a random appserver to investigate an issue, and make a local curl request that sets X-Wikimedia-Debug. The WikimediaDebug features available in that context are logging (to a file, or Logstash) and profiling with Excimer. (XHGui is not available because it depends on php-tideways, which is unsuitable for production hosts given its high overhead even in a disabled state, unlike Excimer, which is specifically designed for production sampling.)

For Kubernetes, I imagine it's both harder and less common to need to investigate a specific pod. However, I suppose from an egress configuration point of view, it's the same whether we're talking about a naturally spawned mw-web pod (not mw-debug) or one that's spawned for the purpose of creating a shell and investigating something. Granted, in most cases we'll be spawning mw-debug pods for that purpose, but it seems harmless to allow the possibility, and the fewer unneeded differences between the two, the better, I think. Especially since the failure mode would be hard to detect: it'd likely look like a MySQL or webperf host issue rather than an egress issue.

That's my 2c anyway. No strong feelings either way.
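
For illustration, the local request described above (a request to an appserver that sets X-Wikimedia-Debug) could be sketched roughly as follows. This is only a sketch under assumptions: the helper names, the localhost URL, the "; "-separated header format, and the option names `log` and `excimer` are illustrative and not confirmed in this task.

```python
import urllib.request


def debug_headers(host, options):
    """Return request headers including X-Wikimedia-Debug (hypothetical helper)."""
    # Assumption: options are joined with "; " in the header value.
    return {
        "Host": host,  # public hostname, so the local server selects the right wiki
        "X-Wikimedia-Debug": "; ".join(options),
    }


def build_request(url, host, options):
    """Build, but do not send, a request carrying the debug header."""
    return urllib.request.Request(url, headers=debug_headers(host, options))


if __name__ == "__main__":
    # The URL and the option names below are assumptions for illustration.
    req = build_request(
        "http://localhost/wiki/Main_Page",
        "en.wikipedia.org",
        ["log", "excimer"],
    )
    print(req.full_url)
    for name, value in req.header_items():
        print(f"{name}: {value}")
```

Sending the request (for example with `urllib.request.urlopen(req)`) is left out on purpose, since it only makes sense from a host or pod that is actually serving MediaWiki.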

fgiunchedi claimed this task.

Thank you for the added context. I agree that special-casing mw-debug isn't worth it unless we absolutely need to. In this case we can keep the non-debug / debug symmetry in place as it is now. I'm resolving the task since excimer now works in k8s.