Investigate why Wikifactmine ElasticSearch has stopped
Open, Medium, Public

Description

Looks like elasticsearch has stopped.

It was being configured by Puppet using the same roles as the labs cluster. The role still seems to be in place, but nothing is listening on port 9200. Investigation is needed.
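
For anyone picking this up, a quick way to confirm the symptom is to test whether anything accepts TCP connections on 9200. The snippet below is only a sketch: the function name `port_is_open` is made up for illustration, and `localhost` assumes you run it on the node itself. A result of `False` does not distinguish a stopped service from a port blocked by a firewall.

```python
import socket


def port_is_open(host: str, port: int = 9200, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # "localhost" is an assumption: run this on the Elasticsearch node itself.
    state = "open" if port_is_open("localhost") else "closed or unreachable"
    print(f"localhost:9200 is {state}")
```

If the port is open, the next step would be an HTTP request against it to see whether Elasticsearch itself answers.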

Event Timeline

Tarrow triaged this task as Medium priority. Jul 3 2018, 9:45 AM
Tarrow created this task.

@Tarrow: Is Wikifactmine one of the Tools on Toolforge, or which project should be assigned to this task?

@Aklapper It's a Project Grants funded project which has two tools on Toolforge as well as a project on CloudVPS. I thought it had a component project for it but I guess it never did.

I'm not actively working on it now. I guess whoever is doing the work on it from https://meta.wikimedia.org/wiki/Grants:Project/ScienceSource may want one.

[offtopic] @Tarrow: If you or someone would like to have a workboard and project tag, please follow https://www.mediawiki.org/wiki/Phabricator/Creating_and_renaming_projects - thanks!

Separate but related: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/447564/ by @EBernhardson proposes to remove support for ElasticSearch 2.x in the puppet roles. We think that wikifactmine is the only cluster still running 2.x today.

@bd808 @EBernhardson I think the puppet role is good to be removed now. I've unapplied it in Horizon and removed the references in Hiera.

I can't ssh into any of these VMs at all at the moment. They're in a fairly precarious situation if not running puppet.

These three Elasticsearch nodes have local firewalls (ferm + iptables) that seem not to be managed by puppet. That means they're probably blocking whatever ports you would normally need to communicate. I edited the ferm configuration by hand to allow bastion access, so you should be able to ssh in and see what's going on.