To ease the migration to Elasticsearch 6, we should first upgrade to 5.6.14.
Status | Assigned | Task
---|---|---
Resolved | EBernhardson | T183281 [epic] ELK upgrade to 6.x (elasticsearch, kibana, logstash)
Resolved | None | T183282 [epic] Search cluster upgrade to 6.x
Resolved | Gehel | T215916 ElasticSearch 6 migration plan checklist (search cluster)
Resolved | debt | T215931 Upgrade elasticsearch to 5.6.14
Resolved | dcausse | T215932 Prepare a debian package with search plugins compatible with elastic 5.6.14
Resolved | Gehel | T216047 Create new elastic56 component in reprepro and upload elasticsearch and plugins
Resolved | Gehel | T216052 upgrade logstash and the logstash elasticsearch cluster to 5.6.14
Resolved | Mathew.onipe | T216993 Upgrade logstash plugins to 5.6.14
Event Timeline
Change 491482 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch: relforge now uses elastic56 apt component
Change 491482 merged by Gehel:
[operations/puppet@production] elasticsearch: relforge now uses elastic56 apt component
Change 491485 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch/relforge: fix typo in hiera param for elasticsearch version
Change 491485 merged by Gehel:
[operations/puppet@production] elasticsearch/relforge: fix typo in hiera param for elasticsearch version
Mentioned in SAL (#wikimedia-operations) [2019-02-19T14:29:57Z] <gehel> rolling upgrade of elasticsearch on relforge - T215931
Change 491746 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14
Change 491746 merged by Gehel:
[operations/puppet@production] elasticsearch: upgrade elasticsearch / cirrus / codfw to 5.6.14
Mentioned in SAL (#wikimedia-operations) [2019-02-20T13:59:26Z] <gehel> rolling upgrade of elasticsearch / cirrus / codfw to 5.6.14 - T215931
Mentioned in SAL (#wikimedia-operations) [2019-02-21T13:18:30Z] <gehel> restarting rolling upgrade on elasticsearch / cirrus / codfw - T215931
Change 492044 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] [cirrus] Switch production search traffic to codfw (1/2)
Change 492045 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[operations/mediawiki-config@master] [cirrus] Switch production search traffic to codfw (1/2)
Change 492044 merged by jenkins-bot:
[operations/mediawiki-config@master] [cirrus] Switch production search traffic to codfw (1/2)
Mentioned in SAL (#wikimedia-operations) [2019-02-22T00:17:14Z] <ebernhardson@deploy1001> sync-file aborted: T215931 (duration: 00m 00s)
Mentioned in SAL (#wikimedia-operations) [2019-02-22T00:18:11Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (1/2) (duration: 00m 46s)
Mentioned in SAL (#wikimedia-operations) [2019-02-22T00:21:11Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (1/2) (duration: 00m 45s)
Change 492045 merged by jenkins-bot:
[operations/mediawiki-config@master] [cirrus] Switch production search traffic to codfw (2/2)
Mentioned in SAL (#wikimedia-operations) [2019-02-22T00:45:23Z] <ebernhardson@deploy1001> sync-file aborted: T215931 [cirrus] Switch production search traffic to codfw (2/2) (duration: 00m 05s)
Mentioned in SAL (#wikimedia-operations) [2019-02-22T00:46:23Z] <ebernhardson@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T215931 [cirrus] Switch production search traffic to codfw (2/2) (duration: 00m 46s)
Prior to the switchover I ran a few queries against all indices to warm codfw up. I don't think one occurrence of this working is enough to call it a win, but we should try again when we switch back to eqiad. The Elasticsearch percentiles showed no noticeable latency spike when traffic moved from eqiad to codfw.
Grafana dashboard for time in question: https://grafana.wikimedia.org/d/000000455/elasticsearch-percentiles?panelId=22&fullscreen&orgId=1&from=1550795053678&to=1550796853679&var-cluster=eqiad&var-smoothing=1&var-exported_cluster=search
At this point omega and psi were already serving traffic from codfw. The small latency spikes prior to the switchover are those queries slowing down while I ran the warmup queries. The query was run multiple times with combinations of the following words, which hopefully appear in many languages: a, or, the, wiki, wmf, mediawiki, wikipedia, la, to, and
Query issued:
{ "query": { "multi_match": { "query": "wikipedia", "operator": "or", "fields": ["all", "all.plain", "title", "title.plain", "category", "category.plain", "heading.plain", "heading", "auxiliary_text.plain", "auxiliary_text", "file_text", "file_text.plain", "redirect.title.plain", "redirect.title", "text", "text.plain", "opening_text.plain", "opening_text", "all_near_match", "template", "template.plain"] } }, "size": 9000 }
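For reference when we repeat this on the switch back to eqiad, here is a rough sketch of how the warmup request bodies could be generated. It is not the script that was actually used: the function names, the `max_terms` limit, and the choice to combine at most two words are assumptions made for illustration. Only the word list, the field list, and the `size` come from the query above.

```python
import itertools
import json

# Words from the comment above, hoped to appear in many languages.
WARMUP_WORDS = ["a", "or", "the", "wiki", "wmf", "mediawiki",
                "wikipedia", "la", "to", "and"]

# Field list copied from the query issued above.
FIELDS = [
    "all", "all.plain", "title", "title.plain", "category",
    "category.plain", "heading.plain", "heading",
    "auxiliary_text.plain", "auxiliary_text", "file_text",
    "file_text.plain", "redirect.title.plain", "redirect.title",
    "text", "text.plain", "opening_text.plain", "opening_text",
    "all_near_match", "template", "template.plain",
]


def build_warmup_query(terms, size=9000):
    """Return a multi_match query dict for the given terms (hypothetical helper)."""
    return {
        "query": {
            "multi_match": {
                "query": " ".join(terms),
                "operator": "or",
                "fields": FIELDS,
            }
        },
        "size": size,
    }


def warmup_bodies(words=WARMUP_WORDS, max_terms=2):
    """Yield JSON request bodies for every combination of 1..max_terms words.

    max_terms=2 is an assumption; the original comment does not say how
    many words were combined per query.
    """
    for n in range(1, max_terms + 1):
        for combo in itertools.combinations(words, n):
            yield json.dumps(build_warmup_query(combo))


if __name__ == "__main__":
    # Print the bodies; sending them is left to the operator.
    for body in warmup_bodies():
        print(body)
```

Each printed body could then be POSTed to the `_search` endpoint of the cluster being warmed (the host and port depend on the cluster and are not specified here), repeating the run a few times as was done before the codfw switchover.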
Change 492266 had a related patch set uploaded (by Gehel; owner: Gehel):
[operations/puppet@production] elasticsearch: upgrade elasticsearch / cirrus to 5.6.14
Change 492266 merged by Gehel:
[operations/puppet@production] elasticsearch: upgrade elasticsearch / cirrus to 5.6.14
Mentioned in SAL (#wikimedia-operations) [2019-02-22T09:16:34Z] <gehel> starting rolling upgrade on elasticsearch / cirrus / eqiad - T215931
Mentioned in SAL (#wikimedia-operations) [2019-02-22T18:02:43Z] <gehel> rolling upgrade on elasticsearch / cirrus / eqiad completed - T215931