
Wikicontribs still returning HTTP 502 Gateway error
Closed, Resolved (Public)

Description

Trying to access https://wikicontrib.toolforge.org returns a 502 Bad Gateway error. This is not a problem with my network, since @srishakatux also confirmed it. Neither restarting the server nor a complete redeployment seems to help. I saw a comment under task T262550 (Toolforge returns HTTP 502 error) saying that https://gerrit.wikimedia.org/r/c/operations/puppet/+/626399 should prevent this in the future, but that doesn't seem to be the case.

The error I am getting is exactly this:

502 Bad Gateway
openresty/1.15.8.1

Event Timeline

Nintendofan885 renamed this task from ToolForge still returning HTTP 502 Gateway error to Wikicontribs still returning HTTP 502 Gateway error. Sep 20 2020, 3:11 PM
Nintendofan885 added a project: WikiContrib.

@Aklapper This issue still exists, and I think Raymond might need help from Cloud Services folk to investigate further. I'm wondering why this has been closed as invalid?

Ah, sorry in that case! If this is not an issue in WikiContrib itself then anyone feel free to reopen and set appropriate project tags (e.g. Toolforge) so someone can find and see this ticket - thanks a lot!

$ sudo become wikicontrib
$ tail error.log
2020-09-21 14:13:41: (mod_rewrite.c.282) mod_rewrite invalid result (not beginning with '/') while processing uri: /shop/.env
2020-09-21 14:58:00: (mod_rewrite.c.282) mod_rewrite invalid result (not beginning with '/') while processing uri: /
2020-09-21 14:58:00: (mod_rewrite.c.282) mod_rewrite invalid result (not beginning with '/') while processing uri: /
2020-09-21 14:58:00: (mod_rewrite.c.282) mod_rewrite invalid result (not beginning with '/') while processing uri: /
2020-09-21 15:04:42: (mod_rewrite.c.282) mod_rewrite invalid result (not beginning with '/') while processing uri: /robots.txt
2020-09-21 15:04:42: (mod_rewrite.c.282) mod_rewrite invalid result (not beginning with '/') while processing uri: /robots.txt
2020-09-21 15:04:42: (mod_rewrite.c.282) mod_rewrite invalid result (not beginning with '/') while processing uri: /robots.txt
2020-09-21 15:05:01: (mod_rewrite.c.282) mod_rewrite invalid result (not beginning with '/') while processing uri: /
2020-09-21 15:05:01: (mod_rewrite.c.282) mod_rewrite invalid result (not beginning with '/') while processing uri: /
2020-09-21 15:05:01: (mod_rewrite.c.282) mod_rewrite invalid result (not beginning with '/') while processing uri: /
$HOME/.lighttpd.conf
url.rewrite = (
    ".*\.(js|ico|gif|jpg|png|swf|css|woff|woff2|ttf)$" => "$0",
    "^" => "index.html",
)

Thanks @bd808, it works now. I added "/" to the rewrite rule and it started working as expected. Can anyone point out why this happened in the first place? It seemed to happen on its own, maybe because of an update or something. I am asking because I'd love to prevent this kind of incident from happening in the future.
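For reference, the fix described in this comment probably looks like the fragment below. This is only a sketch: the thread does not show the edited file, so it assumes the "/" was added in front of the catch-all target index.html. That is the one target in the original rule that does not begin with "/", which is what the mod_rewrite error in the log complains about. The first rule is unchanged, since "$0" expands to the matched request path and that already starts with "/".

```
url.rewrite = (
    ".*\.(js|ico|gif|jpg|png|swf|css|woff|woff2|ttf)$" => "$0",
    # Assumed fix: the catch-all target now begins with "/"
    "^" => "/index.html",
)
```

The webservice would presumably need a restart after such an edit for lighttpd to pick up the new configuration.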

I looked in the $HOME/error.log file to see when these errors first started:

2020-09-17 13:55:42: (server.c.1828) server stopped by UID = 54138 PID = 11594
2020-09-19 20:16:16: (server.c.1464) server started (lighttpd/1.4.53)
2020-09-19 20:16:20: (mod_rewrite.c.282) mod_rewrite invalid result (not beginning with '/') while processing uri: /

This shows that the tools.wikicontrib user (uid=54138) stopped the webservice at 2020-09-17 13:55:42. Then, a bit over two days later, the webservice was started again. The lighttpd configuration failures started immediately. What we cannot tell from these logs is a) what changes, if any, were made to $HOME/.lighttpd.conf between 2020-09-17 13:55:42 and 2020-09-19 20:16:16, and b) what changes, if any, were made to the webservice ... start command used on 2020-09-19.
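One way to pull those same lines out of a long error.log is sketched below. The function name first_rewrite_failure_context is made up for this example, and the patterns assume the log line wording shown above ("server stopped", "server started", "mod_rewrite invalid result"). It prints the last two server stop/start events seen before the first mod_rewrite failure, followed by that failure.

```shell
# Hypothetical helper: show the server stop/start events that immediately
# precede the first mod_rewrite failure in a lighttpd error log.
first_rewrite_failure_context() {
    awk '
        # Remember the two most recent server lifecycle lines.
        /server (stopped|started)/ { prev2 = prev1; prev1 = $0 }
        # On the first rewrite failure, print the context and stop reading.
        /mod_rewrite invalid result/ {
            if (prev2 != "") print prev2
            if (prev1 != "") print prev1
            print
            exit
        }
    ' "$1"
}
```

Run as the tool user, it could be called as first_rewrite_failure_context "$HOME/error.log".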

I have a hunch that prior to 2020-09-19 the webservice was running with --backend=gridengine and after it was running with --backend=kubernetes. If this hunch is correct, the material change would be moving from lighttpd v1.4.45 on Debian Stretch (grid engine) to lighttpd v1.4.53 on Debian Buster. https://github.com/lighttpd/lighttpd1.4/compare/lighttpd-1.4.45...lighttpd-1.4.53 shows 626 commits in the diff between those upstream versions. One of those commits was 1de1746. That commit introduced the error message output in your logs, and was first included in the lighttpd 1.4.50 release tag.

Thanks for taking the time to point this out, @bd808. That was very insightful. I think what we need to do at this point is figure out who has access to the tool, and which user stopped the server initially, if we are to prevent this from happening again.