
Puppet changes required for elasticsearch 5.x upgrade
Closed, Resolved · Public

Description

ES 5.x has a couple of changes that will require coordination with the puppet code for rollout:

  • ES_HEAP_SIZE is no longer supported. Heap should be specified via ES_JAVA_OPTS
  • logging.yml has been removed; we must now use log4j2.properties

Details

Related Gerrit Patches:
operations/puppet : production · elasticsearch: correct iterator in ES5 jvm.options template
operations/puppet : production · Update elasticsearch module for es5 compatability
mediawiki/extensions/CirrusSearch : wmf/1.29.0-wmf.11 · Add method to provide custom index settings to IndexCreator
operations/mediawiki-config : master · Configure cirrus per-index settings
mediawiki/extensions/CirrusSearch : master · Add method to provide custom index settings to IndexCreator

Event Timeline

Restricted Application edited projects, added Discovery-Search; removed Discovery-Search (Current work). Jan 17 2017, 11:33 PM

I poked around a bit to see how this works. I don't think there will be any problems updating, but someone else should verify my findings.

The logging.yml -> log4j2.properties change can be handled by having both during the transition. Can't see any possible harm here.

ES_HEAP_SIZE can be replaced with ES_JAVA_OPTS today. The script in /usr/share/elasticsearch/bin/elasticsearch.in.sh will set JAVA_OPTS with some defaults (which would utterly fail in prod). Thankfully ES_JAVA_OPTS comes after JAVA_OPTS on the command line in both 2.x and 5.x, and the JVM will always take the last -Xmx and -Xms values provided.
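To make the "last value wins" reasoning concrete, here is a minimal sketch that mimics how the effective heap size falls out when JAVA_OPTS and ES_JAVA_OPTS both carry an -Xmx flag. The helper name effective_xmx and the option values are hypothetical, chosen only for illustration; the real defaults live in elasticsearch.in.sh.

```bash
#!/bin/bash
# Hypothetical helper: print the -Xmx value that would take effect when
# several are passed, assuming the last occurrence on the command line wins.
effective_xmx() {
  local opt last=""
  for opt in "$@"; do
    case "$opt" in
      -Xmx*) last="${opt#-Xmx}" ;;
    esac
  done
  printf '%s\n' "$last"
}

# Illustrative values only, not the actual packaged defaults.
JAVA_OPTS="-Xms256m -Xmx1g"
ES_JAVA_OPTS="-Xms30g -Xmx30g"

# Left unquoted on purpose so each option becomes its own argument,
# in the same order they would appear on the java command line.
effective_xmx $JAVA_OPTS $ES_JAVA_OPTS
```

Because ES_JAVA_OPTS is appended after JAVA_OPTS, the value printed is the one from ES_JAVA_OPTS.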

debt moved this task from needs triage to Up Next on the Discovery-Search board. Jan 19 2017, 11:05 PM

Change 333969 had a related patch set uploaded (by EBernhardson):
Update elasticsearch module for es5 compatability

https://gerrit.wikimedia.org/r/333969

Additional notes from the elasticsearch migration plugin:

  • Node attributes move to attr namespace
    • node.rack should be rewritten as node.attr.rack
    • node.row should be rewritten as node.attr.row
  • File Descriptors
    • At least 65536 file descriptors must be available to Elasticsearch (we have 65535, one short of the minimum, so it keeps complaining)
  • Mlockall
    • bootstrap.mlockall is set to true but mlockall has failed
    • bootstrap.mlockall has been renamed to bootstrap.memory_lock
  • Threadpool settings
    • threadpool.bulk.queue_size has been renamed to thread_pool.bulk.queue_size
    • threadpool.bulk.size has been renamed to thread_pool.bulk.size
    • threadpool.bulk.type has been renamed to thread_pool.bulk.type
  • Removed settings
    • path.plugins (This might be a pain, but perhaps we can symlink our plugin directory into /usr/share/elasticsearch/plugins ?)
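The renames in the list above are mechanical, so they can be sketched as a small filter. This is only an illustration of the mapping, assuming a flat elasticsearch.yml with one setting per line; the function name migrate_es5_settings and the sample input (including the plugin path) are made up, and the actual change belongs in the puppet templates rather than in a sed pass.

```bash
#!/bin/bash
# Hypothetical filter: read a 2.x-style config on stdin and write the
# 5.x setting names to stdout. Assumes flat "key: value" lines.
migrate_es5_settings() {
  sed -E \
    -e 's/^([[:space:]]*)node\.(rack|row):/\1node.attr.\2:/' \
    -e 's/^([[:space:]]*)bootstrap\.mlockall:/\1bootstrap.memory_lock:/' \
    -e 's/^([[:space:]]*)threadpool\.bulk\./\1thread_pool.bulk./' \
    -e '/^[[:space:]]*path\.plugins:/d'
}

# Sample input is invented for illustration.
migrate_es5_settings <<'EOF'
node.rack: A1
bootstrap.mlockall: true
threadpool.bulk.queue_size: 1000
path.plugins: /example/plugins
EOF
```

The path.plugins line is dropped rather than renamed, since that setting has no 5.x equivalent.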

Migration plugin also reports another error which I'm working on reproducing:

  • Index settings
    • Built-in similarities cannot be overridden
      • index.similarity.default.type

I created some indices locally with $wgCirrusSearchSimilarityProfile = 'wmf_defaults' so they would have the 'default' similarity configured. This looks to be a non-blocker: es5 happily loaded up the indices. We will probably still want to update the es5 branch to no longer set this and just use 'default' without configuring it.

Confirmed that the setting name changes will be problematic: elasticsearch 5 will scream bloody murder about the old names existing in the config file and refuse to start. Elasticsearch 5 in general is much more strict about configuration, whereas 2.x would happily ignore settings it doesn't understand.
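Since 5.x refuses to start on unknown settings, a quick pre-flight check for leftover 2.x names can catch problems before a restart. The sketch below is an assumption about how one might do that, not part of the patch: find_legacy_settings is a hypothetical name, and the pattern only covers the settings named in this task.

```bash
#!/bin/bash
# Hypothetical pre-flight check: print (with line numbers) any lines of a
# config read from stdin that use 2.x-only setting names from this task.
# Exit status is 0 when legacy names are found, non-zero otherwise.
find_legacy_settings() {
  grep -nE '^[[:space:]]*(node\.(rack|row)|bootstrap\.mlockall|threadpool\.|path\.plugins)'
}

# Sample input is invented for illustration.
if find_legacy_settings <<'EOF'
cluster.name: example
bootstrap.mlockall: true
EOF
then
  echo "config still contains 2.x-only settings"
fi
```

In practice the function would be fed the rendered config file for the host instead of the inline sample.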

@Gehel, curious what you think the best way forward here will be. I imagine for the deployment we can take down an entire cluster, install es5, and bring the entire cluster back up. During that downtime we can deploy a puppet patch that will update the config file to be es5 compatible wrt these settings. The difficulty, I think, is that we will want to target that patch to only the cluster we are upgrading. We don't want to end up in a position where we need to start a node in eqiad and can't because it has the 5.x configuration file.

I'm thinking perhaps we need to have some hiera variable, and within the codfw cluster this would resolve to '5' and in eqiad it resolves to '2', and the puppet will use a template appropriate to the elasticsearch version?

Another thing I've just noticed while testing (we probably would have noticed this on relforge anyway): the thread_pool.bulk.type setting no longer exists (per the documentation). We have it set to fixed, which is the default anyway, so it should be safe to remove across configurations for both 2.x and 5.x.

I've updated the patch with the above findings; it now requires a hiera variable to specify which version of elasticsearch the server should be running.

Change 336933 had a related patch set uploaded (by EBernhardson):
Add method to provide custom index settings to IndexCreator

https://gerrit.wikimedia.org/r/336933

Elasticsearch 5 additionally no longer allows us to set global defaults for index settings; they need to be set per-index. Those settings can either be provided in an index template (we would have to use a global template), or provided to elasticsearch when creating indices. The above patches set things up so CirrusSearch will set these properties on index creation. I'm not completely opposed to setting up a new index template for these use cases, although keep in mind that a template is only used when creating the index, so changing the template will only affect indices created after the change. Updating any of these values on existing indices will require iterating through the list of indices and issuing update requests.

Opinions, thoughts? I originally went with the method of providing settings to CirrusSearch because I'm not sure how to properly automate keeping an index template up to date for the cluster from puppet, and it seems like more work than necessary.

Change 336936 had a related patch set uploaded (by EBernhardson):
Configure cirrus per-index settings

https://gerrit.wikimedia.org/r/336936

EBernhardson added a comment. Edited Feb 9 2017, 11:14 PM

With the slowlog and merge_threads settings moving out of the configuration file, they all have to be explicitly set in the indices. I have created the above patches so CirrusSearch will provide them at creation time, and I am running the script below to update all existing indices with the settings:

#!/bin/bash
# Apply the slowlog and merge scheduler settings to every index on a cluster.
set -e
# Also fail if any stage of the pipeline below fails, not just the last one.
set -o pipefail

HOST=$1
if [ -z "$HOST" ]; then
  echo "Usage: $0 <host>" >&2
  exit 1
fi

# Column 3 of the default _cat/indices output is the index name;
# xargs then issues one settings update per index.
curl -s "https://$HOST:9243/_cat/indices" \
        | awk '{print $3}' \
        | xargs -I{} curl -XPUT "https://$HOST:9243/{}/_settings?master_timeout=5m" -d ' {
            "index": {
                "indexing.slowlog.threshold.index.info": "5s",
                "indexing.slowlog.threshold.index.debug": "2s",
                "indexing.slowlog.threshold.index.trace": -1,
                "search.slowlog.threshold.fetch.warn": "1s",
                "search.slowlog.threshold.fetch.info": "800ms",
                "search.slowlog.threshold.fetch.debug": "500ms",
                "search.slowlog.threshold.fetch.trace": "200ms",
                "merge.scheduler.max_thread_count": 1
            }
        }'

AFAICT the settings update has finished on all the indices in eqiad and codfw clusters with no errors.

Change 336933 merged by jenkins-bot:
Add method to provide custom index settings to IndexCreator

https://gerrit.wikimedia.org/r/336933

Change 337617 had a related patch set uploaded (by EBernhardson):
Add method to provide custom index settings to IndexCreator

https://gerrit.wikimedia.org/r/337617

Change 336936 merged by jenkins-bot:
Configure cirrus per-index settings

https://gerrit.wikimedia.org/r/336936

Change 337617 merged by jenkins-bot:
Add method to provide custom index settings to IndexCreator

https://gerrit.wikimedia.org/r/337617

Last time we upgraded (1.7->2.x) we had some annoying issues with the .deb package versions. We were only able to have one version in reprepro so we used some hackishness to tell all elasticsearch instances to keep whatever version of elasticsearch was installed and manually installed the 2.x deb on machines when we were ready to upgrade them.

I poked paravoid about whether we had any better solutions this time around, and he suggested using the experimental repo. I haven't looked at how to get packages into it yet, but the relevant puppet patch for the apt module is https://gerrit.wikimedia.org/r/336420. It seems we could add the 5.x package to experimental, and make sure we set apt::use_experimental in the appropriate hiera files when setting elasticsearch::version to 5. Seem plausible?

Other relevant convo: http://tty.gr/s/mediawiki-security-apt-repo

Change 333969 merged by Gehel:
Update elasticsearch module for es5 compatability

https://gerrit.wikimedia.org/r/333969

faidon added a subscriber: faidon. Feb 21 2017, 1:41 PM

> I poked paravoid about whether we had any better solutions this time around, and he suggested using the experimental repo. I haven't looked at how to get packages into it yet, but the relevant puppet patch for the apt module is https://gerrit.wikimedia.org/r/336420. It seems we could add the 5.x package to experimental, and make sure we set apt::use_experimental in the appropriate hiera files when setting elasticsearch::version to 5. Seem plausible?

Yeah, experimental would work for now. See T158583 for the broader conversation about this -- the situation with ElasticSearch was one of the reasons we are thinking this through.

Change 339434 had a related patch set uploaded (by Gehel):
elasticsearch: correct iterator in ES5 jvm.options template

https://gerrit.wikimedia.org/r/339434

Change 339434 merged by Gehel:
elasticsearch: correct iterator in ES5 jvm.options template

https://gerrit.wikimedia.org/r/339434

Deskana closed this task as Resolved. Mar 31 2017, 12:43 PM
Deskana claimed this task.