
Possible use of tools-lab-elasticsearch cluster
Closed, Resolved · Public

Description

I'm working on the WikiFactMine project. We do some of our fact extraction with a tool called canary, which loads papers into an elasticsearch index; we then run a range of queries against that index to extract 'facts' and store those facts in a second index. At the moment we probably put our ES server under quite a lot of load, but if the cluster has no throttling built in we could temper this so that we don't slow down whatever else is indexed there. We currently use three indices: one for paper bodies, one for facts and one for paper metadata.

We have been running this on a server elsewhere, but it's currently having some hiccups; it would also be nice to have the open access material running on labs or tool-labs to help with the longevity of the project. I thought about requesting a labs project to run both the tool and elasticsearch on, but wanted to check first that it couldn't be done on tool-labs. It looks like it may be possible to run on tool-labs alone, although canary might have to be altered to run on the grid; if that proves too difficult we may still need to request a labs project.

Event Timeline

Restricted Application added a subscriber: Aklapper.
bd808 triaged this task as High priority. Nov 4 2016, 4:23 PM
bd808 moved this task from To Do to In Dev/Progress on the User-bd808 board.

Tool maintainer documentation for requesting access:

Tool admin documentation for granting access:

Handy script for generating password data:

#!/bin/bash
# Ugly script to generate password data
# for the Toolforge elasticsearch cluster
set -o nounset
set -o errexit

USER=${1:?Missing USER}

# Random 32-byte password, base64-encoded
PASS=$(openssl rand -base64 32)
# SHA-512 crypt hash of the password; printf '%s' with a quoted argument
# passes the password through verbatim, without a trailing newline
SHA512=$(printf '%s' "$PASS" | mkpasswd --stdin --method=sha-512)

echo "${1} envvars"
echo "toolforge envvars create TOOL_ELASTICSEARCH_USER ${USER}"
echo "toolforge envvars create TOOL_ELASTICSEARCH_PASSWORD ${PASS}"
echo

echo "${1} puppet master private (hieradata/labs/tools/common.yaml)"
echo "----"
echo "profile::toolforge::elasticsearch::haproxy::elastic_users:"
echo "  - name: '${1}'"
echo "    password: '${SHA512}'"
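
Once the envvars printed by the script have been created, a tool could read them at runtime to authenticate. What follows is only a minimal sketch, not the cluster's documented client setup. The `es_request` helper, the `ES_URL` variable and its placeholder default are hypothetical names introduced here. The sketch also assumes the HAProxy front end accepts HTTP basic auth, which the `elastic_users` password entries suggest but this thread does not state.

```shell
#!/bin/bash
# Sketch: authenticated request using the envvars created above.
# ES_URL and its default are hypothetical placeholders; substitute the
# real cluster address. HTTP basic auth is an assumption.
set -o nounset

es_request() {
    local path="${1:?Missing request path}"
    local base="${ES_URL:-https://elasticsearch.example.invalid}"

    # Fail early with a clear message if the credentials are missing.
    : "${TOOL_ELASTICSEARCH_USER:?TOOL_ELASTICSEARCH_USER is not set}"
    : "${TOOL_ELASTICSEARCH_PASSWORD:?TOOL_ELASTICSEARCH_PASSWORD is not set}"

    # CURL can be overridden (e.g. CURL=echo) to inspect the command
    # without contacting the cluster.
    "${CURL:-curl}" --silent --fail \
        --user "${TOOL_ELASTICSEARCH_USER}:${TOOL_ELASTICSEARCH_PASSWORD}" \
        "${base}${path}"
}
```

Calling `CURL=echo es_request /_cluster/health` prints the arguments instead of sending the request, which is a cheap way to check that the variables are picked up. Note that this passes the password on the command line, where other processes on the same host may be able to see it.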

@Tarrow you can just add your tool's name to this ticket to become our first user other than me. :)

Great! Can you add a user for 'tools.wikifactmine-pipeline'? Thanks.

Credentials are in /data/project/wikifactmine-pipeline/.elasticsearch.ini
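
For reading those credentials from a shell script, something like the sketch below might work. The layout of `.elasticsearch.ini` is not shown in this thread, so the sketch assumes a conventional INI file with a section header and `key = value` lines; the `ini_get` helper and the section and key names used with it are hypothetical. It splits each line on the first `=` only, because a base64 password can itself end in `=`.

```shell
#!/bin/bash
# Sketch: read one value from an INI file.
# Assumes "[section]" headers and "key = value" lines; the real layout of
# .elasticsearch.ini may differ.
set -o nounset

# Usage: ini_get FILE SECTION KEY
ini_get() {
    awk -v section="$2" -v key="$3" '
        /^[ \t]*\[/ {
            # Section header: strip the brackets and remember the name.
            gsub(/^[ \t]*\[|\][ \t]*$/, "")
            current = $0
            next
        }
        current == section {
            eq = index($0, "=")          # split on the FIRST "=" only
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { print v; exit }
        }
    ' "$1"
}
```

With the assumed layout, `ini_get /data/project/wikifactmine-pipeline/.elasticsearch.ini elasticsearch user` would print the user name; if the file uses a different section or key name, adjust the last two arguments to match.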

bd808 moved this task from Done to Archive on the User-bd808 board.