
discourse.wmflabs.org is down
Closed, Resolved · Public

Description

A volunteer alerted me that https://discourse.wmflabs.org/ is down.

There isn't contact information in the 502 page, so it's not clear who to alert. I'm a fan of bringing this to production (T184461) so we can use it more reliably, but in the meantime it would be useful to get the wmflabs.org version up again, add some informative error pages, and add some monitoring to catch this kind of outage!
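As a rough illustration of the monitoring idea, here is a minimal sketch of an HTTP status check that could be run from cron or a similar scheduler. The helper names `status_is_ok` and `check_site` are hypothetical, and treating any 2xx or 3xx response as healthy is an assumption rather than an agreed policy for this site.

```shell
# Hypothetical helpers, not part of Discourse or any existing monitoring setup.

# Succeeds for 2xx and 3xx status codes, fails for anything else
# (including 502 and curl's "000" when no response was received).
status_is_ok() {
    case "$1" in
        2??|3??) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetches only the HTTP status code of a URL and reports OK or DOWN.
check_site() {
    url="$1"
    # -s: silent, -o /dev/null: discard the body,
    # -w '%{http_code}': print only the status code, --max-time: bound the wait
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || code=000
    if status_is_ok "$code"; then
        echo "OK $url ($code)"
    else
        echo "DOWN $url ($code)" >&2
        return 1
    fi
}
```

Calling `check_site https://discourse.wmflabs.org/` returns a non-zero exit status when the site answers with an error such as 502, which a scheduler or alerting wrapper could then act on.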

Event Timeline

Slaporte created this task. Sep 27 2018, 5:02 PM
Restricted Application added a subscriber: Aklapper. · Sep 27 2018, 5:02 PM
Aklapper closed this task as Invalid. Sep 27 2018, 11:24 PM
Aklapper removed a project: Discourse.

T184461 is unrelated to this - different URL. It looks like it's intentionally down as per T204501, hence closing as invalid.

Samwilson reopened this task as Open. Sep 27 2018, 11:36 PM
Samwilson added a project: Discourse.

The disourse-wam project that T204501 relates to is a different Discourse installation to discourse.wmflabs.org. The latter is part of the discourse project.

I'll have a look at restarting this one now.

Uh, I'm sorry! Thanks for the correction!

Out of space! :(

samwilson@discourse1002:/srv/discourse$ sudo ./launcher restart app
You have less than 5GB of free space on the disk where /var/lib/docker is located. You will need more space to continue
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3        19G   18G   80M 100% /

Would you like to attempt to recover space by cleaning docker images and containers in the system?(y/N)n

Answering y didn't help at all, so I tried:

samwilson@discourse1002:/srv/discourse$ sudo ./launcher cleanup

And that got us to:

samwilson@discourse1002:/srv/discourse$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3        19G   15G  3.0G  84% /

But that still wasn't enough. I don't have time right now to dig deeper. :-(
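For reference, the free-space condition the launcher complains about can be approximated with a small check like the one below. This is only a sketch: `free_kib` and `has_free_gib` are hypothetical helper names, and how the launcher itself measures the 5GB limit is not shown in this task, so the exact comparison here is an assumption.

```shell
# Hypothetical helpers; the launcher's own check may differ.

# Prints the available space, in KiB, on the filesystem holding the given path.
free_kib() {
    # -P: POSIX output format (one line per filesystem), -k: 1024-byte units
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Succeeds if at least the requested number of GiB is available.
has_free_gib() {
    path="$1"
    need_gib="$2"
    avail=$(free_kib "$path")
    [ "$avail" -ge $((need_gib * 1024 * 1024)) ]
}
```

For example, `has_free_gib /var/lib/docker 5 || echo "not enough space"` would print the warning for a filesystem in the state shown in the `df` output above.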

What is using all the disk space? (Consider running du on a few directories?)
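One way to answer that question is to list the largest directories one level at a time. The following is a minimal sketch; `largest_dirs` is a hypothetical helper name, and which directories are worth inspecting on discourse1002 (for example /var/lib/docker, which the launcher message mentions) is a guess.

```shell
# Hypothetical helper: prints the largest entries directly under a directory,
# biggest first, with sizes in KiB. The first line is the total for the
# directory itself.
largest_dirs() {
    dir="${1:-/}"
    count="${2:-10}"
    # -x: stay on one filesystem, -k: sizes in KiB, -d 1: direct children only
    du -x -k -d 1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}
```

Running it as root, for instance `sudo sh -c '. ./largest_dirs.sh; largest_dirs / 10'` (the script file name is also hypothetical), and then repeating it on the biggest subdirectory narrows down where the space went.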

discourse1002 is m1.medium, so 20GiB more can be allocated. Have you considered adding disk space?

I'm not sure what's using the space, but it's probably worth increasing the volume size at some point.

The site is now up again. We'll still want to fix the space problem, of course.

I recall there being a rotation scheme for old backups; perhaps we can find somewhere else to store them, which would both free up space on this machine and preserve the backups more safely.

Thanks for bringing it back to life!

Qgil awarded a token. Oct 1 2018, 10:19 AM
GTirloni closed this task as Resolved. Mar 21 2019, 8:31 PM
GTirloni triaged this task as Medium priority.