
fawiki user reports getting 503 errors with message "upstream connect error or disconnect before headers"
Closed, Declined · Public · BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):
Cannot be consistently reproduced. I am hoping that those with Grafana access can identify the cases and help investigate further.

What happens?:
User:Mojtabakd reported on fawiki that while trying to save a page, he received the error message "upstream connect error or disconnect before headers".

This error message is found on line 67 of ElasticaErrorHandlerTest.php in the Extension:CirrusSearch source code, but note that this is a unit test script and not part of the core code of the extension.

What should have happened instead?:
If the error truly is being raised by a unit test, then it should never be shown to users in Production.

Event Timeline

Restricted Application added a subscriber: Aklapper.
Reedy subscribed.

I'm pretty sure it's not from that test code :)

I would certainly hope so. But searching for the error message only returns one result, and that is the unit test code I mentioned above.

Likely. But the fact that the error message shown appears to exist only in unit test code is also worth investigating.

CDanis added subscribers: RLazarus, CDanis.

This error message comes from Envoy, which we use for internal cross-service TLS termination.

SLyngshede-WMF subscribed.

This was merged with an unrelated bug, and the relevant tags were dropped. I've re-added the correct tags.

Gehel subscribed.

Removing the Search Platform team from this, it seems entirely unrelated to CirrusSearch.

There is T301505, which points out that this is a symptom of a number of different problems that can appear lower in our tech stack. It is not a cause in itself and doesn't help pinpoint what is happening; it only tells us that something is probably happening. We might want to merge this one into it (unless we can narrow this down further). @Huji, is there any extra data that could help us pinpoint this, e.g. timeframes, the browser used, or specific pages that errored out?


Unfortunately there has been no reply, so closing for the time being.