503 Service Unavailable
Closed, Resolved | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  • Visit any Wikimedia project 2 minutes ago, any page

What happens?:
503 Service Unavailable
No server is available to handle this request.

What should have happened instead?:
200

Event Timeline

TheresNoTime added a subscriber: TheresNoTime.

Visit any Wikimedia project 2 minutes ago, any page

unable to reproduce currently — time machine broken ( /j )

I tried to report the issue while it was happening, but Phabricator was also affected.

On https://grafana.wikimedia.org/ I notice a spike in requests. Maybe a DDoS?

fgiunchedi claimed this task.
fgiunchedi added a subscriber: fgiunchedi.

There was indeed a brief moment of unavailability (retroactively-posted incident at https://www.wikimediastatus.net/incidents/5k90l09x2p6k)

I'm optimistically resolving this task! Please reopen as needed, of course.

That just says "From 14:55 to 15:01 UTC users have been experiencing slow/unavailable access to Wikipedia and other sites".

"Users have been experiencing" is an odd choice of words here. It wasn't in our heads, it wasn't our crappy wifi. If you have no idea why this happened or how to prevent it from happening again (to the degree that's possible; if it's a DDoS there's only so much you can do, I guess), I'm a bit worried.

My apologies for the lack of details at this time.

There's an incident doc open internally and there will be one posted publicly with more details and insights, I'll post a link here once it is available. My apologies also for implying it wasn't a real incident, it was not my intention.

Hi @AlexisJazz !

That of course isn't a complete incident document, and isn't intended to be.

And the point of us posting a notice on wikimediastatus.net is indeed to confirm with users that it was not just in their heads or their crappy wifi.

The wording you found odd is our boilerplate text for events like this one, however, as sometimes they are caused by the whole-Internet equivalent of crappy wifi: issues beyond our control on backbone networks (see, for instance, https://blog.cloudflare.com/analysis-of-todays-centurylink-level-3-outage/)

This one was not caused by such an event -- and we do know very well what the cause was -- but I'm afraid we can't say more publicly at this time. I hope you understand.

There's an incident doc open internally and there will be one posted publicly with more details and insights, I'll post a link here once it is available. My apologies also for implying it wasn't a real incident, it was not my intention.

Thank you!

I found it odd because "users have been experiencing" is often corporate speak for "we know there's a problem but we plan on ignoring it". Given what you said I suppose here it's actually meant literally.

This one was not caused by such an event -- and we do know very well what the cause was -- but I'm afraid we can't say more publicly at this time. I hope you understand.

Yeah, I know. Either it's script kiddies and you don't want a witch hunt; or it's a serious organization that blackmails you and you can't speak pending the investigation; or a spider from Microsoft/Google/etc. went nuts and you can't start pointing fingers until stuff has been sorted out behind the scenes; or the issue was somehow caused by a bug/exploit and you can't speak until it's patched.

Good to hear the cause is known, that's always better than random unexplained outage.

Please see https://wikitech.wikimedia.org/wiki/Incidents/2022-06-14_overload_varnish_/_haproxy for the public incident report (we know what's going on, the report is light on details on purpose though)