Labtestwiki returns 503 error
Open, Normal, Public

Event Timeline

Restricted Application added a subscriber: Aklapper. Jul 8 2019, 12:55 PM
Bugreporter triaged this task as Unbreak Now! priority. Jul 8 2019, 12:55 PM
Restricted Application added a subscriber: Liuxinyu970226. Jul 8 2019, 12:55 PM
Urbanecm lowered the priority of this task from Unbreak Now! to Needs Triage. Edited Jul 8 2019, 1:34 PM
Urbanecm added a project: Operations.
Urbanecm added a subscriber: Urbanecm.

Probably not a UBN! issue. I tested this locally on a random application server, following https://wikitech.wikimedia.org/wiki/Debugging_in_production:

[urbanecm@mw1261 ~]$ curl -H 'Host: labtestwikitech.wikimedia.org' "http://$(hostname -i)/wiki/Main_Page" 2>/dev/null
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://foundation.wikimedia.org/wiki/Main_Page">here</a>.</p>
</body></html>
[urbanecm@mw1261 ~]$

This just redirects to foundation.wikimedia.org. A broken redirect is probably not worth treating as a UBN! issue, so I'm resetting the priority to Needs Triage.

Note that I got "PHP Warning: Unable to start TLS: Can't contact LDAP server" while running a script across all wikis with foreachwiki; see logstash or T209565#5312987 for details. If this wiki is meant to be inaccessible, maybe getting rid of this warning is a reason to delete the wiki?

Okay, it seems I tested from an incorrect host. Anyway, labweb1001 gives a similar result.

[urbanecm@labweb1001 ~]$ curl -H 'Host: labtestwikitech.wikimedia.org' "http://$(hostname -i)/wiki/Main_Page" 2>/dev/null
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://labtestwikitech.wikimedia.org/wiki/Main_Page">here</a>.</p>
</body></html>
[urbanecm@labweb1001 ~]$
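
The two checks above can be condensed so that only the status code and redirect target are printed, instead of the full HTML body. The following is a minimal sketch, not an existing tool: `check_vhost` and `classify_vhost_result` are hypothetical helper names, and the classification labels are made up for illustration. It relies on curl's `-w` write-out variables `%{http_code}` and `%{redirect_url}`.

```shell
# Hypothetical helper: print "<status> <redirect target>" for a virtual host
# as served by one specific backend IP (same idea as the curl calls above).
check_vhost() {
    # $1 = virtual host name, $2 = backend IP, $3 = optional path
    curl -s -o /dev/null \
         -H "Host: $1" \
         -w '%{http_code} %{redirect_url}\n' \
         "http://$2${3:-/wiki/Main_Page}"
}

# Hypothetical helper: classify a result so that a redirect to a different
# host (as seen on mw1261) stands out from a redirect that stays on the
# requested host (as seen on labweb1001).
classify_vhost_result() {
    # $1 = virtual host name, $2 = HTTP status, $3 = redirect target (may be empty)
    case "$2" in
        301|302|307|308)
            case "$3" in
                http://"$1"/*|https://"$1"/*) echo "redirect-same-host" ;;
                *) echo "redirect-other-host" ;;
            esac ;;
        2??) echo "ok" ;;
        5??) echo "server-error" ;;
        *) echo "other" ;;
    esac
}

# Possible usage on an application server (assumes curl and hostname -i work
# as in the transcripts above):
#   check_vhost labtestwikitech.wikimedia.org "$(hostname -i)" |
#       { read -r status target
#         classify_vhost_result labtestwikitech.wikimedia.org "$status" "$target"; }
```

Fed the values from the transcripts above, the mw1261 response would be classified as `redirect-other-host` and the labweb1001 response as `redirect-same-host`, which looks like a plain HTTP-to-HTTPS redirect rather than a working page.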
akosiaris triaged this task as Normal priority. Jul 9 2019, 3:21 PM
akosiaris added subscribers: Artur2343, bd808, Bstorm and 3 others.

The host that powered that site was labtestweb2001.wikimedia.org, but it was replaced by cloudweb2001-dev.wikimedia.org, which hasn't been put into service yet. Relevant tasks are T220426 and T218024. Tagging cloud-services-team and subscribing them to the task. I'll remove operations and wikimedia-production-error; I don't think those apply.

bd808 edited subscribers, added: aborrero; removed: Artur2343.

https://labtestwikitech.wikimedia.org/ is for internal testing rather than community testing, so there really should be no actual impact to the Wikimedia community here. We should get the environment back up in the near future, however, just for our own peace of mind.

bd808 moved this task from Clinic Duty to Inbox on the cloud-services-team (Kanban) board.
jcrespo added a subscriber: jcrespo.

In addition to the above, there are now a few production errors when trying to run cron jobs:

Error connecting to 10.192.32.5 as user wikiadmin: :real_connect(): (HY000/1044): Access denied for user 'wikiadmin'@'%' to database 'labtestwiki'

While that access could be added, I don't think a development/staging host should have production passwords. Probably a separate password/grant should be given, it should be removed from production configuration, and the cron job should be run locally.

Sorry, this shouldn't have alerted -- the downtime expired. This will be talking to a test database server (clouddb2001-dev).

bd808 added a comment. Mon, Sep 16, 4:14 PM

While that access could be added, I don't think a development/staging host should have production passwords. Probably a separate password/grant should be given, it should be removed from production configuration, and the cron job should be run locally.

This cluster is the equivalent of testwiki for Wikitech. Password separation would be fine, but the environment is also 100% "production" in the IP space/vlan it is located in, the servers it runs on, and the access rights needed to interact with the deployment and its configuration.

100% "production"

That is OK; then the bug is that this host lacks monitoring and has not been inserted into the zarcillo db production list. Different bug, but a bug nonetheless :-D.