Upgrade elasticsearch to 5.6.14
Closed, Resolved, Public

Description

To ease migration to ES6 we should migrate to 5.6.14 first.

Event Timeline

dcausse triaged this task as High priority. Feb 12 2019, 5:20 PM
dcausse moved this task from needs triage to [epic] on the Discovery-Search board.

Change 491482 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch: relforge now uses elastic56 apt component

https://gerrit.wikimedia.org/r/491482

Change 491482 merged by Gehel:
[operations/puppet@production] elasticsearch: relforge now uses elastic56 apt component

https://gerrit.wikimedia.org/r/491482

Change 491485 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch/relforge: fix typo in hiera param for elasticsearch version

https://gerrit.wikimedia.org/r/491485

Change 491485 merged by Gehel:
[operations/puppet@production] elasticsearch/relforge: fix typo in hiera param for elasticsearch version

https://gerrit.wikimedia.org/r/491485

Mentioned in SAL (#wikimedia-operations) [2019-02-19T14:29:57Z] <gehel> rolling upgrade of elasticsearch on relforge - T215931

Change 491746 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14

https://gerrit.wikimedia.org/r/491746

Change 491746 merged by Gehel:
[operations/puppet@production] elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14

https://gerrit.wikimedia.org/r/491746

Mentioned in SAL (#wikimedia-operations) [2019-02-20T13:59:26Z] <gehel> rolling upgrade of elasticsearch / cirrus / codfw to 5.6.14 - T215931

Mentioned in SAL (#wikimedia-operations) [2019-02-21T13:18:30Z] <gehel> restarting rolling upgrade on elasticsearch / cirrus / codfw - T215931

Change 492044 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] [cirrus] Switch production search traffic to codfw (1/2)

https://gerrit.wikimedia.org/r/492044

Change 492045 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] [cirrus] Switch production search traffic to codfw (1/2)

https://gerrit.wikimedia.org/r/492045

Change 492044 merged by jenkins-bot:
[operations/mediawiki-config@master] [cirrus] Switch production search traffic to codfw (1/2)

https://gerrit.wikimedia.org/r/492044

Mentioned in SAL (#wikimedia-operations) [2019-02-22T00:17:14Z] <ebernhardson@deploy1001> sync-file aborted: T215931 (duration: 00m 00s)

Mentioned in SAL (#wikimedia-operations) [2019-02-22T00:18:11Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (1/2) (duration: 00m 46s)

Mentioned in SAL (#wikimedia-operations) [2019-02-22T00:21:11Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (1/2) (duration: 00m 45s)

Change 492045 merged by jenkins-bot:
[operations/mediawiki-config@master] [cirrus] Switch production search traffic to codfw (2/2)

https://gerrit.wikimedia.org/r/492045

Mentioned in SAL (#wikimedia-operations) [2019-02-22T00:45:23Z] <ebernhardson@deploy1001> sync-file aborted: T215931 [cirrus] Switch production search traffic to codfw (2/2) (duration: 00m 05s)

Mentioned in SAL (#wikimedia-operations) [2019-02-22T00:46:23Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (2/2) (duration: 00m 46s)

Prior to the switchover I ran a few queries against all indices to warm codfw up. I don't think one occurrence of this working is enough to call it a win, but we should try again when we switch back to eqiad. The Elasticsearch percentiles showed no noticeable latency spike when traffic moved from eqiad to codfw.

Grafana dashboard for time in question: https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?panelId=22&fullscreen&orgId=1&from=1550795053678&to=1550796853679&var-cluster=eqiad&var-smoothing=1&var-exported_cluster=search

At this point omega and psi were already serving traffic from codfw. The small latency spikes prior to the switchover are from queries slowing down as I ran the warmup queries. The query was run multiple times with combinations of the following words, which hopefully appear in many languages: a, or, the, wiki, wmf, mediawiki, wikipedia, la, to, and

Query issued:

{
    "query": {
        "multi_match": {
            "query": "wikipedia",
            "operator": "or",
            "fields": ["all", "all.plain", "title", "title.plain", "category", "category.plain", "heading.plain", "heading", "auxiliary_text.plain", "auxiliary_text", "file_text", "file_text.plain", "redirect.title.plain", "redirect.title", "text", "text.plain", "opening_text.plain", "opening_text", "all_near_match", "template", "template.plain"]
        }
    },
    "size": 9000
}
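For anyone repeating the warmup when traffic moves back, here is a rough sketch of how the combinations could be generated and sent. It is not the script that was actually used: the helper names, the `max_words` limit, and the `base_url` argument are all assumptions for illustration, and only the word list, field list, and query shape come from the comment above.

```python
import itertools
import json
import urllib.request

# Word list and field list copied from the comment above.
WARMUP_WORDS = ["a", "or", "the", "wiki", "wmf", "mediawiki",
                "wikipedia", "la", "to", "and"]
FIELDS = ["all", "all.plain", "title", "title.plain", "category",
          "category.plain", "heading.plain", "heading",
          "auxiliary_text.plain", "auxiliary_text", "file_text",
          "file_text.plain", "redirect.title.plain", "redirect.title",
          "text", "text.plain", "opening_text.plain", "opening_text",
          "all_near_match", "template", "template.plain"]


def build_warmup_query(words, size=9000):
    """Build the multi_match body shown above for the given words."""
    return {
        "query": {
            "multi_match": {
                "query": " ".join(words),
                "operator": "or",
                "fields": FIELDS,
            }
        },
        "size": size,
    }


def warmup_queries(words=WARMUP_WORDS, max_words=2):
    """Yield one query body per combination of 1..max_words words.

    max_words is a hypothetical knob; the original comment does not
    say how many words were combined.
    """
    for n in range(1, max_words + 1):
        for combo in itertools.combinations(words, n):
            yield build_warmup_query(combo)


def send_query(base_url, body, timeout=60):
    """POST one body to <base_url>/_search (all indices); return 'took' in ms.

    base_url is a placeholder for whatever cluster endpoint is being warmed.
    """
    req = urllib.request.Request(
        base_url.rstrip("/") + "/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["took"]
```

Calling `send_query(base_url, body)` for each `body` from `warmup_queries()` would issue the requests one at a time. Keeping them sequential seems the safer choice here, since the goal is to warm caches rather than to load-test the cluster.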

Change 492266 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch: upgrade elasticsearch / cirrus to 5.6.14

https://gerrit.wikimedia.org/r/492266

Change 492266 merged by Gehel:
[operations/puppet@production] elasticsearch: upgrade elasticsearch / cirrus to 5.6.14

https://gerrit.wikimedia.org/r/492266

Mentioned in SAL (#wikimedia-operations) [2019-02-22T09:16:34Z] <gehel> starting rolling upgrade on elasticsearch / cirrus / eqiad - T215931

Mentioned in SAL (#wikimedia-operations) [2019-02-22T18:02:43Z] <gehel> rolling upgrade on elasticsearch / cirrus / eqiad completed - T215931

debt claimed this task.
debt awarded a token.