Steps to replicate the issue:
- Deploy the best-of app (https://gitlab.wikimedia.org/repos/future-audiences/best-of)
- Hit https://best-of.toolforge.org/api/category/random repeatedly (>10 times)
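
The repeated requests in the last step can be scripted. What follows is a minimal TypeScript sketch, not the exact script used for this report: the `hammer` function, the `StatusFetcher` type, and the attempt count are illustrative assumptions. Each request gets a distinct `foo` query parameter so that a failing request can be matched against the app's own log lines.

```typescript
// Minimal shape needed from a response. The global fetch in Node 18+ is
// expected to satisfy this when wrapped as (url) => fetch(url).
type StatusFetcher = (url: string) => Promise<{ status: number }>;

interface Failure {
  url: string;
  status: number;
}

// Sends `attempts` sequential requests and collects every non-200 response.
async function hammer(
  baseUrl: string,
  attempts: number,
  fetcher: StatusFetcher
): Promise<Failure[]> {
  const failures: Failure[] = [];
  for (let i = 0; i < attempts; i++) {
    // A unique query parameter makes each request identifiable in server logs.
    const url = `${baseUrl}?foo=${i}`;
    const res = await fetcher(url);
    if (res.status !== 200) {
      failures.push({ url, status: res.status });
    }
  }
  return failures;
}

// Example invocation (hypothetical attempt count, requires network access):
// hammer("https://best-of.toolforge.org/api/category/random", 50, (url) => fetch(url))
//   .then((failures) => console.log(failures));
```

Taking the fetcher as a parameter keeps the loop testable offline; any function that resolves to an object with a `status` field will work.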
What happens?:
The request intermittently returns a 500.
The Node application does not register the failure, so the error appears to originate somewhere between the app and the user.
What should have happened instead?:
The endpoint should always return 200.
best-of is a fairly vanilla Vite/Node service. I can run it locally and hammer this endpoint without any errors.
I wrote a small script to hammer the application repeatedly, and instrumented the application to log when it receives a request and when it sends the response. The script received a 500 for https://best-of.toolforge.org/api/category/random?foo=7, but the tool logged that it served the same request successfully: "Response sent: GET /api/category/random ?foo=7 - 200 (80ms)"
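
Instrumentation of this kind can be sketched roughly as follows. This is an assumption-laden illustration, not the code deployed in best-of: `withRequestLogging`, the `Req`/`Res` interfaces, and the "Request received" line are hypothetical, and only the "Response sent" format is modelled on the log line quoted above. The interfaces are a minimal subset intended to resemble Node's `http` request and response objects.

```typescript
// Hypothetical minimal request/response shapes (a subset of what Node's
// http module provides), declared here so the sketch is self-contained.
interface Req {
  method?: string;
  url?: string;
}
interface Res {
  statusCode: number;
  on(event: "finish", listener: () => void): unknown;
}
type Handler = (req: Req, res: Res) => void;

// Wraps a handler so that one line is logged on arrival and one on completion.
function withRequestLogging(
  handler: Handler,
  log: (line: string) => void = console.log
): Handler {
  return (req, res) => {
    const started = Date.now();
    const method = req.method ?? "UNKNOWN";
    const [path, query = ""] = (req.url ?? "").split("?");
    const target = query ? `${path} ?${query}` : path;

    log(`Request received: ${method} ${target}`);

    // "finish" fires once the response has been handed to the OS for sending.
    res.on("finish", () => {
      const elapsed = Date.now() - started;
      log(`Response sent: ${method} ${target} - ${res.statusCode} (${elapsed}ms)`);
    });

    handler(req, res);
  };
}
```

Note that in Node the `finish` event only means the response was handed off to the operating system, not that the client received it, so a 200 in this log is consistent with a proxy in front of the app returning a 500 to the user.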
I've confirmed the app isn't randomly crashing or restarting. Serving latency is never high (<300ms). The Grafana dashboard for the service never shows any 500s (https://grafana.wmcloud.org/d/TJuKfnt4z/tool-dashboard?orgId=1&var-cluster=P8433460076D33992&var-namespace=tool-best-of&from=now-6h&to=now&timezone=utc).
