
Replicas set to two on logstash indices regardless of index age
Closed, Resolved · Public

Description

We have curator configs that should reduce the number of replicas to one for indices older than 15 days, but that doesn't seem to work. Every logstash index still has two replicas, regardless of age:

$ curl -s 'localhost:9200/_cat/indices?v'
health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2018.12.29        GBXsrnu4SrGV2--Mgafo6Q   1   2   56947044            0    147.5gb         49.1gb
green  open   logstash-2018.12.21        jtcTX4y6QnyYiY1Vz16P3g   1   2   53607035            0    142.6gb         47.4gb
green  open   logstash-2018.12.18        UxcN4EM7TF6Gg-gaEYmHTg   1   2   48340800            0    145.8gb         48.5gb
green  open   logstash-2018.12.24        ibzXTpwPQgmdES44kMIRYQ   1   2   53518762            0    135.5gb         45.1gb
green  open   .kibana                    5WZRRAvZRfGWNrZDmHPcTA   1   2       1371            4     12.8mb          4.2mb
green  open   logstash-2018.12.10        YTD9TfGUQAmBs9L9HNCa5g   1   2   51240630            0      149gb         49.6gb
green  open   logstash-syslog-2018.12.25 IwVpGu6aS322YD7DrMLn3A   1   2    1645604            0      2.4gb        848.2mb
green  open   logstash-2018.12.26        S4CaqHvTTEKdS5F2p4ix1g   1   2   51996102            0    131.7gb         43.8gb
green  open   test_dce2                  SbH_NnGNTuymg0pG5aUOnQ   5   1       2972            0     12.2mb          6.1mb
green  open   logstash-2019.01.06        rBBxlDUATEK-frx0cG_j5A   1   2   58126138            0    152.2gb         50.6gb
green  open   logstash-2018.12.12        WuKnArfNTu2pj7txVesb3w   1   2   55417345            0    151.7gb         50.5gb
green  open   logstash-2019.01.02        swY9p0eGS268SJLwBzL2Ow   1   2   63389826            0    173.8gb         57.8gb
green  open   logstash-syslog-2018.12.12 2jJd5CWxTtadl24DtWFj8g   1   2    1631882            0      2.4gb        838.4mb
green  open   logstash-2018.12.15        R0hwnPUcTnCWbSeRidrr6A   1   2   45688627            0    131.1gb         43.6gb
green  open   logstash-syslog-2018.12.10 JgHTxAYDSsGBsd6s-WBjWw   1   2    1658199            0      2.5gb        855.7mb
green  open   logstash-syslog-2019.01.07 yFJKq4mpQAKqGzz6IVPKXw   1   2    1006051            0      1.7gb        543.6mb
green  open   logstash-2018.12.08        SQmKCwT0RiCe6Tx37hdxOg   1   2   50308017            0    149.2gb         49.7gb
green  open   kibana-int-backup          XlaOghmlQ0aYGXYM0f_PXQ   1   0          0            0       209b           209b
green  open   logstash-syslog-2019.01.06 nYlPfdquT8quTj1GBB8gqQ   1   2    1690628            0      2.4gb        826.2mb
green  open   logstash-syslog-2019.01.05 sfWFe2BTRKWeJO1vY4975Q   1   2    1386156            0      1.9gb        660.3mb
green  open   logstash-2018.12.20        8bw5PlIKRYOEfUd263qVrQ   1   2   53989565            0      151gb         50.3gb
green  open   test_dce                   rIvcrcRmQvuc2N6KH0JYeg   5   1   10584843            0     27.1gb         13.5gb
green  open   logstash-2018.12.16        BMHa0iDiRMOINvH2zRcOGw   1   2   43789512            0    127.8gb         42.5gb
green  open   logstash-syslog-2018.12.30 OZfg0zW1Re6RAEzIruoykQ   1   2    1579530            0      2.3gb        787.9mb
green  open   .tasks                     t5umoFXvTvK32Sy4JOoRTw   1   1          7            0     20.3kb         10.1kb
green  open   test_dce3                  SCo3LM2hSyCqPiCZAxOSew   5   1          1            0     27.3kb         13.6kb
green  open   logstash-2019.01.03        WwdTtzH-QPuiv9OQMy0fOw   1   2   64055671            0    172.6gb         57.5gb
green  open   logstash-syslog-2019.01.03 WJfWsqp3SRuKBdUAxmzb9Q   1   2    1579089            0      2.3gb        794.8mb
green  open   logstash-2018.12.27        YbkNtSTCTNmzrPA0-HPxKA   1   2   60250646            0      156gb           52gb
green  open   logstash-2018.12.17        H8SupbWSQRGq8-bwjdJ6VQ   1   2   48457973            0    140.6gb         46.8gb
green  open   logstash-syslog-2018.12.27 wmwT_a3XTY-wQ4BBMlXAkA   1   2    1358517            0      1.8gb        621.5mb
green  open   logstash-syslog-2018.12.15 jT_MOJYaTrWBs5BP1LXFmQ   1   2    1652194            0      2.4gb        846.5mb
green  open   logstash-2018.12.19        7IcuISsdT8qDn9Cy1FhSCA   1   2   51007320            0    148.2gb         49.3gb
green  open   logstash-syslog-2019.01.02 RrupEAP4Qm-L58GELG0_7Q   1   2    1573466            0      2.3gb        793.7mb
green  open   logstash-syslog-2018.12.28 fhwMYTPcR2erNUW4Dnp45g   1   2    1433206            0      1.9gb        675.7mb
green  open   logstash-syslog-2018.12.17 aEj8YQaoQJeYwsJzimukvA   1   2    1724092            0      2.6gb        892.6mb
green  open   logstash-syslog-2018.12.20 f1PyefJxShyQwti1tASZ5A   1   2    2472243            0      3.5gb          1.1gb
green  open   logstash-syslog-2018.12.24 qrpZL37qQS6NN12nKUCVXg   1   2    1672144            0      2.5gb        855.9mb
green  open   logstash-syslog-2018.12.23 3OmAC5SVQFel2kFyaPOhgw   1   2    1518504            0      2.1gb        747.7mb
green  open   logstash-syslog-2018.12.11 JL09u5NkSqyIjvh5yFVwLg   1   2    1752284            0      2.5gb        886.4mb
green  open   logstash-syslog-2018.12.21 yO-XmhbxQyOjnDMLx3QnMQ   1   2    1562844            0      2.2gb          781mb
green  open   logstash-syslog-2018.12.13 355WuYMCR_-O-iA9UUW5oQ   1   2    1741833            0      2.5gb        880.8mb
green  open   logstash-syslog-2018.12.19 L65dtUd3SS2lG839cFm_MA   1   2    1759782            0      2.7gb        924.9mb
green  open   logstash-syslog-2018.12.08 4dVwHEZYQNq2pqwUh6N9mg   1   2    1551101            0      2.3gb        792.1mb
green  open   logstash-2019.01.05        2Fx_7OQRQ6KIhAVKUn38Mw   1   2   56544523            0    148.3gb         49.4gb
green  open   logstash-2018.12.22        kb0lJDpNRque5t5-QgViIg   1   2   55890581            0    142.2gb         47.3gb
green  open   logstash-syslog-2019.01.01 h4iCEhOmQh2BDk1zhhU7pA   1   2    1546546            0      2.2gb        763.8mb
green  open   logstash-syslog-2018.12.09 qnTq1u-GR7KxV-kbEljYow   1   2    1534798            0      2.2gb        768.2mb
green  open   logstash-syslog-2018.12.31 tvelftp1SJ6dU1_rs78RuQ   1   2    1530980            0      2.2gb          762mb
green  open   logstash-2018.12.14        0lNQ8YrmRsaEdxd9JqskXQ   1   2   46702224            0    136.5gb         45.4gb
green  open   logstash-syslog-2019.01.04 DxuBh0PeQByHd5nWrnuV_A   1   2    1543040            0      2.2gb        778.1mb
green  open   logstash-syslog-2018.12.22 vaKF6Qr2Su6OFb38i7LkNQ   1   2    1647449            0      2.4gb        848.7mb
green  open   logstash-syslog-2018.12.29 zgDFf4oPQ1Sfp_wY43pkJg   1   2    1431839            0      1.9gb        676.5mb
green  open   logstash-backup_dce2       y7tLA30ZSVujBu-7Bb3xFQ   1   1    1012975            0      2.2gb          1.1gb
green  open   logstash-2019.01.04        Qepp1vOcSlGtU9wWKKlv1w   1   2   59627277            0    163.1gb         54.2gb
green  open   logstash-2018.12.28        PH7STaYNQyO8qg8CUBMHhg   1   2   58035471            0    150.3gb           50gb
green  open   backup_dce2                Y5lrnJwiSvm0YNrPkBMLAA   5   1        998            0      5.8mb          2.9mb
green  open   logstash-2018.12.30        M5OFBQxXQ-iSoPErH6jz7w   1   2   60325056            0    153.9gb         51.2gb
green  open   logstash-syslog-2018.12.26 IHMpECwNSTmrelTzIZUBcw   1   2    1478988            0      2.1gb        731.9mb
green  open   logstash-2018.12.31        1KFD494FQ72pabgTR4hDEg   1   2   63755303            0    162.1gb           54gb
green  open   logstash-2018.12.25        mero1q1vSCekqUQmgnmHEQ   1   2   53363557            0    134.2gb         44.6gb
green  open   logstash-2019.01.07        4WgUoddNQCCq6E8qdQFv5g   1   2   34031329            0     90.2gb         30.3gb
green  open   logstash-2018.12.09        W5mh4FrFQJW_IKoDeXltSg   1   2   54013695            0    155.9gb         51.9gb
green  open   logstash-2019.01.01        QJZ3utoqT9GJlmGFpCpcig   1   2   62720069            0    166.4gb         55.4gb
green  open   logstash-2018.12.23        hDBtEFToSf6jrOlUOmkBvg   1   2   54233332            0    138.8gb         46.2gb
green  open   logstash-syslog-2018.12.14 KTcf4qPzSNaRByKOp4MhtQ   1   2    1862632            0      2.8gb        988.2mb
green  open   logstash-2018.12.11        gHse-Kc-Rfedgpvbt0-vFA   1   2   56393865            0    159.1gb         52.9gb
green  open   logstash-syslog-2018.12.16 KJ7XdzicSJSCWNceS9giZA   1   2    1634493            0      2.5gb        844.4mb
green  open   logstash-2018.12.13        0Q2NiijUQJSkF07G9j2OCw   1   2   51879419            0    146.5gb         48.8gb
green  open   logstash-syslog-2018.12.18 UP4At50qR46C0uhKX-L50g   1   2    1640802            0      2.5gb        856.5mb
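
For reference, the listing above can be checked mechanically. The following is a minimal sketch, not part of the curator setup: it assumes index names end in YYYY.MM.DD, that the check runs on 2019-01-07 (the day this task was filed), and that an index dated exactly 15 days back already counts as old. The function name `stale_replica_indices` and its parameters are made up for illustration.

```python
from datetime import date, datetime

# A few rows copied from the _cat/indices output above.
SAMPLE = """\
health status index                      uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2018.12.21        jtcTX4y6QnyYiY1Vz16P3g   1   2   53607035            0    142.6gb         47.4gb
green  open   .kibana                    5WZRRAvZRfGWNrZDmHPcTA   1   2       1371            4     12.8mb          4.2mb
green  open   test_dce2                  SbH_NnGNTuymg0pG5aUOnQ   5   1       2972            0     12.2mb          6.1mb
green  open   logstash-2019.01.06        rBBxlDUATEK-frx0cG_j5A   1   2   58126138            0    152.2gb         50.6gb
green  open   logstash-syslog-2018.12.12 2jJd5CWxTtadl24DtWFj8g   1   2    1631882            0      2.4gb        838.4mb
"""


def stale_replica_indices(cat_output, today, max_age_days=15, wanted_replicas=1):
    """Return dated logstash- indices that are at least max_age_days old
    (by the date in their name) and still have more than wanted_replicas."""
    stale = []
    for line in cat_output.strip().splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a full data row.
        if len(fields) < 6 or fields[0] == "health":
            continue
        name, replicas = fields[2], int(fields[5])
        if not name.startswith("logstash-"):
            continue
        try:
            # Assumption: the last 10 characters of the name are YYYY.MM.DD.
            index_day = datetime.strptime(name[-10:], "%Y.%m.%d").date()
        except ValueError:
            continue  # undated index such as logstash-backup_dce2
        age_days = (today - index_day).days
        if age_days >= max_age_days and replicas > wanted_replicas:
            stale.append(name)
    return sorted(stale)


if __name__ == "__main__":
    for name in stale_replica_indices(SAMPLE, date(2019, 1, 7)):
        print(name)
```

On the sample rows this flags logstash-2018.12.21 and logstash-syslog-2018.12.12, both of which should already be down to one replica.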

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jan 7 2019, 2:10 PM
herron added a subscriber: herron. · Jan 11 2019, 3:53 PM

This is causing logstash disks to get fuller than expected. Part of the root cause seems to be that the preceding curator action (forcemerge) doesn't complete, and thus the "reduce replicas" action never runs:

2019-02-05 06:42:03,890 ERROR     Failed to complete action: forcemerge.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: ConnectionTimeout caused by - ReadTimeoutError(HTTPConnectionPool(host='127.0.0.1', port=9200): Read timed out. (read timeout=21600))

Mentioned in SAL (#wikimedia-operations) [2019-02-05T15:15:27Z] <godog> force curator action 'replicas' to set older logstash indices to 1 replica - T213078

Change 488538 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash: curator: re-order replica prune to occur before forcemerge

https://gerrit.wikimedia.org/r/488538

Change 488538 merged by Herron:
[operations/puppet@production] logstash: curator: re-order replica prune to occur before forcemerge

https://gerrit.wikimedia.org/r/488538
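
Why the ordering matters can be illustrated with a toy model. This is only a sketch of "run actions in sequence, stop at the first failure", not curator's actual code. The names `run_actions`, `ok` and `forcemerge_times_out` are hypothetical, and the position of delete_indices in the old order is an assumption; the task only states that forcemerge ran before replicas.

```python
def run_actions(actions):
    """Run (name, callable) pairs in order and stop at the first failure.

    Returns the names of the actions that completed.
    """
    completed = []
    for name, action in actions:
        try:
            action()
        except Exception as exc:
            print(f"Failed to complete action: {name}: {exc}")
            break  # later actions are never attempted
        completed.append(name)
    return completed


def ok():
    """Stand-in for an action that succeeds."""


def forcemerge_times_out():
    """Stand-in for the forcemerge that hit the read timeout."""
    raise TimeoutError("Read timed out")


# Assumed old order: forcemerge runs before replicas.
OLD_ORDER = [
    ("delete_indices", ok),
    ("forcemerge", forcemerge_times_out),
    ("replicas", ok),
]

# Order after the patch, as seen in the curator log: replicas first.
NEW_ORDER = [
    ("delete_indices", ok),
    ("replicas", ok),
    ("forcemerge", forcemerge_times_out),
]

if __name__ == "__main__":
    print("old:", run_actions(OLD_ORDER))
    print("new:", run_actions(NEW_ORDER))
```

With the old order only delete_indices completes; with the new order replicas completes as well, even though forcemerge still fails afterwards.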

fgiunchedi closed this task as Resolved. · Feb 7 2019, 1:25 PM
fgiunchedi claimed this task.

Thanks to the patch from @herron this is fixed now:

2019-02-07 00:42:01,483 INFO      Preparing Action ID: 1, "delete_indices"
2019-02-07 00:42:01,489 INFO      Trying Action ID: 1, "delete_indices": Delete indices older than 31 days (based on index name), for logstash- prefixed indices. Ignore the error if the filter does not result in an actionable list of indices (ignore_empty_list) and exit cleanly.
2019-02-07 00:42:02,471 INFO      Deleting selected indices: ['logstash-syslog-2019.01.07', 'logstash-2019.01.07']
2019-02-07 00:42:02,471 INFO      ---deleting index logstash-syslog-2019.01.07
2019-02-07 00:42:02,471 INFO      ---deleting index logstash-2019.01.07
2019-02-07 00:42:02,823 INFO      Action ID: 1, "delete_indices" completed.
2019-02-07 00:42:02,823 INFO      Preparing Action ID: 2, "replicas"
2019-02-07 00:42:02,825 INFO      Trying Action ID: 2, "replicas": after 15 days set number of replicas to 1
2019-02-07 00:42:03,632 INFO      Setting the replica count to 1 for indices: ['logstash-syslog-2019.01.16', 'logstash-2019.01.16', 'logstash-2019.01.19', 'logstash-2019.01.08', 'logstash-2019.01.14', 'logstash-2019.01.09', 'logstash-syslog-2019.01.14', 'logstash-syslog-2019.01.11', 'logstash-2019.01.17', 'logstash-syslog-2019.01.10', 'logstash-syslog-2019.01.08', 'logstash-2019.01.11', 'logstash-2019.01.15', 'logstash-syslog-2019.01.15', 'logstash-2019.01.23', 'logstash-2019.01.10', 'logstash-syslog-2019.01.12', 'logstash-2019.01.18', 'logstash-2019.01.21', 'logstash-syslog-2019.01.21', 'logstash-syslog-2019.01.18', 'logstash-syslog-2019.01.19', 'logstash-syslog-2019.01.13', 'logstash-backup_dce2', 'logstash-2019.01.13', 'logstash-syslog-2019.01.23', 'logstash-2019.01.12', 'logstash-2019.01.20', 'logstash-syslog-2019.01.22', 'logstash-syslog-2019.01.17', 'logstash-2019.01.22', 'logstash-syslog-2019.01.20', 'logstash-syslog-2019.01.09']
2019-02-07 00:42:08,974 INFO      Action ID: 2, "replicas" completed.
2019-02-07 00:42:08,974 INFO      Preparing Action ID: 3, "forcemerge"
2019-02-07 00:42:08,977 INFO      Trying Action ID: 3, "forcemerge": forceMerge logstash- prefixed indices older than 2 days (based on index creation_date) to 1 segments per shard.  Delay 120 seconds between each forceMerge operation to allow the cluster to quiesce. This action will ignore indices already forceMerged to the same or fewer number of segments per shard, so the 'forcemerged' filter is unneeded.
2019-02-07 00:42:09,960 INFO      forceMerging selected indices
2019-02-07 00:42:09,960 INFO      forceMerging index logstash-syslog-2019.02.05 to 1 segments per shard.  Please wait...
2019-02-07 00:43:00,687 INFO      Pausing for 120.0 seconds before continuing...
2019-02-07 00:45:00,742 INFO      forceMerging index logstash-syslog-2019.02.04 to 1 segments per shard.  Please wait...
2019-02-07 00:45:47,625 INFO      Pausing for 120.0 seconds before continuing...
2019-02-07 00:47:47,726 INFO      forceMerging index logstash-2019.02.05 to 1 segments per shard.  Please wait...

There's still the issue that forcemerge might not complete or might time out, though that is out of scope for this task.