Possible use of tools-lab-elasticsearch cluster
Closed, Resolved · Public


I'm working on the WikiFactMine project. We do some of our fact extraction with a tool called canary, which loads papers into an elasticsearch index. We then run a range of queries against that index to extract 'facts' and store those facts in a second index. At the moment we probably put our ES server under quite a lot of load, but if the cluster has no built-in throttling we could temper this so that we don't slow down whatever else is indexed there. We currently use three indices: one for paper bodies, one for facts and one for paper metadata.

We have been running this on a server elsewhere, but it's currently having some hiccups. It would also be nice to have the open access material running on labs or tool-labs to help with the longevity of the project. I thought about requesting a labs project to run both the tool and elasticsearch on, but wanted to check first whether it could be done on tool-labs. It looks like it may be possible to run on tool-labs alone, although canary might have to be altered to run on the grid; if that is too difficult we may still need to request a labs project.

Event Timeline

Tarrow created this task. Nov 1 2016, 4:46 PM
Restricted Application added a project: User-bd808. Nov 1 2016, 4:46 PM
Restricted Application added a subscriber: Aklapper.
Restricted Application added a project: Cloud-Services. Nov 1 2016, 4:48 PM
Addshore added a subscriber: Addshore.
Addshore moved this task from Unsorted 💣 to Watching 👀 on the User-Addshore board.
bd808 triaged this task as High priority. Nov 4 2016, 4:23 PM
bd808 moved this task from To Do to In Dev/Progress on the User-bd808 board.

Tool maintainer documentation for requesting access:

Tool admin documentation for granting access:

Handy script for generating password data:

#!/usr/bin/env bash
# Ugly script to generate password data
# for the Tool Labs elasticsearch cluster
PASS=$(openssl rand -base64 32)
TFILE="$(basename "$0").$$.tmp"
echo "user=$USER"
echo "password=\"$PASS\""
echo ---------------------------------
# Feed the password to htpasswd on stdin (-i), creating the file (-c) with an MD5 hash (-m)
echo "$PASS" |
htpasswd -cmi "$TFILE" "$USER" &>/dev/null
cat "$TFILE"
rm "$TFILE"

@Tarrow you can just add your tool's name to this ticket to become our first user other than me. :)

Great! Can you add a user for 'tools.wikifactmine-pipeline'? Thanks.

bd808 closed this task as Resolved. Nov 15 2016, 5:22 PM

Credentials are in /data/project/wikifactmine-pipeline/.elasticsearch.ini
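A tool's job script could read those credentials roughly as follows. This is only a minimal sketch: the key names `user` and `password` come from the output of the password script above, while the helper name `ini_get` and the assumption that the file holds plain `key=value` lines (possibly under a section header, with optionally double-quoted values) are guesses that this task does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a simple key=value file.
# Assumption: the file layout is key=value, one per line; only the first
# match is used, and surrounding whitespace and double quotes are stripped.
ini_get() {
  local file=$1 key=$2
  sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" |
    head -n 1 |
    sed -e 's/[[:space:]]*$//' -e 's/^"\(.*\)"$/\1/'
}

# Usage (path taken from this task; ES_URL is a placeholder, not a real host):
#   INI=/data/project/wikifactmine-pipeline/.elasticsearch.ini
#   ES_USER=$(ini_get "$INI" user)
#   ES_PASS=$(ini_get "$INI" password)
#   curl --user "$ES_USER:$ES_PASS" "$ES_URL"
```

Keeping the lookup in a small function means the password never has to be copied into the tool's own source or configuration.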

bd808 moved this task from In Dev/Progress to Done on the User-bd808 board. Jan 7 2017, 12:19 AM
bd808 moved this task from Done to Archive on the User-bd808 board.