
Labtestwiki returns 503 error
Closed, Resolved (Public)

Event Timeline

Bugreporter triaged this task as Unbreak Now! priority. Jul 8 2019, 12:55 PM
Urbanecm lowered the priority of this task from Unbreak Now! to Needs Triage. Edited Jul 8 2019, 1:34 PM
Urbanecm added a project: SRE.
Urbanecm subscribed.

Probably not UBN!. I've tested this locally on a random application server according to https://wikitech.wikimedia.org/wiki/Debugging_in_production:

[urbanecm@mw1261 ~]$ curl -H 'Host: labtestwikitech.wikimedia.org' "http://$(hostname -i)/wiki/Main_Page" 2>/dev/null
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://foundation.wikimedia.org/wiki/Main_Page">here</a>.</p>
</body></html>
[urbanecm@mw1261 ~]$

This just redirects to foundation.wikimedia.org. A redirect that does not work probably does not warrant UBN priority, so I am resetting this to Needs Triage.
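To compare redirect targets between hosts without reading the whole HTML body, the target can be pulled out of the 301 page. This is a minimal sketch: redirect_target is a hypothetical helper name, and it assumes the body has the same shape as the Apache-style 301 page shown above.

```shell
# redirect_target: hypothetical helper. Reads an HTML body on stdin and
# prints the href of the first link it finds (nothing if there is none).
redirect_target() {
  sed -n 's/.*<a href="\([^"]*\)".*/\1/p' | head -n 1
}

# Intended use on a host that can reach the backend directly (not run here):
#   curl -s -H 'Host: labtestwikitech.wikimedia.org' \
#     "http://$(hostname -i)/wiki/Main_Page" | redirect_target
```

curl can also report this from the response headers with its write-out variables, for example -o /dev/null -w '%{http_code} %{redirect_url}\n'.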

Note that I got "PHP Warning: Unable to start TLS: Can't contact LDAP server" while running a script across all wikis with foreachwiki; see logstash or T209565#5312987 for details. If this wiki is supposed to be inaccessible, maybe getting rid of this warning is a reason to delete the wiki?

Okay, it seems I tested from the wrong host. Anyway, labweb1001 gives a similar result.

[urbanecm@labweb1001 ~]$ curl -H 'Host: labtestwikitech.wikimedia.org' "http://$(hostname -i)/wiki/Main_Page" 2>/dev/null
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://labtestwikitech.wikimedia.org/wiki/Main_Page">here</a>.</p>
</body></html>
[urbanecm@labweb1001 ~]$
akosiaris triaged this task as Medium priority. Jul 9 2019, 3:21 PM
akosiaris added subscribers: Artur2343, bd808, Bstorm and 3 others.

The host that powered that site was labtestweb2001.wikimedia.org, but it was replaced by cloudweb2001-dev.wikimedia.org, which hasn't been put into service yet. Relevant tasks are T220426 and T218024. Tagging cloud-services-team and subscribing them to the task. I'll remove operations and wikimedia-production-error; I don't think those apply.

bd808 edited subscribers, added: aborrero; removed: Artur2343.

https://labtestwikitech.wikimedia.org/ is for internal testing rather than community testing, so there really should be no actual impact on the Wikimedia community here. We should, however, get the environment back up in the near future just for our own peace of mind.

jcrespo subscribed.

In addition to the above, there are now a few production errors when trying to run cron jobs:

Error connecting to 10.192.32.5 as user wikiadmin: :real_connect(): (HY000/1044): Access denied for user 'wikiadmin'@'%' to database 'labtestwiki'

While that access could be added, I don't think a development/staging host should have production passwords. Probably it should get a separate password/grant, be removed from the production configuration, and have its cron job run locally.
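MySQL/MariaDB error 1044 means the account authenticated but has no privileges on the named database. A separate grant could look roughly like the sketch below; the account name labtestwiki_admin, the host pattern, and the privilege list are all assumptions for illustration, not existing configuration.

```shell
# dedicated_grant_sql: hypothetical helper. Prints a GRANT statement that
# scopes an account to a single database.
# Arguments: account name, host pattern, database name.
# The privilege list is an assumption; adjust it to what the cron job needs.
dedicated_grant_sql() {
  printf 'GRANT SELECT, INSERT, UPDATE, DELETE ON `%s`.* TO '\''%s'\''@'\''%s'\'';\n' "$3" "$1" "$2"
}

# Example with made-up values; review the statement before running it anywhere.
dedicated_grant_sql labtestwiki_admin '10.%' labtestwiki
```

The existing grants for the account named in the error can be inspected with SHOW GRANTS FOR 'wikiadmin'@'%'; on the database server.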

Sorry, this shouldn't have alerted -- the downtime expired. This will be talking to a test database server (clouddb2001-dev).

While that access could be added, I don't think a development/staging host should have production passwords. Probably it should get a separate password/grant, be removed from the production configuration, and have its cron job run locally.

This cluster is the equivalent of testwiki for Wikitech. Password separation would be fine, but the environment is also 100% "production" in the IP space/vlan it is located in, the servers it runs on, and the access rights needed to interact with the deployment and its configuration.

100% "production"

That is OK. Then the bug is that this host lacks monitoring and has not been added to the zarcillo db production list. A different bug, but a bug nonetheless :-D.

Reedy claimed this task.
Reedy subscribed.

Wiki was fixed at some point