Upgrade Cassandra on AQS to 2.2.6-wmf5
Closed, Resolved | Public | 5 Estimated Story Points

Description

During T196044 a new patch version of Cassandra was built to allow full compatibility with Debian Stretch. AQS should be upgraded to that version as soon as possible.

Event Timeline

Tested on deployment-aqs01.eqiad.wmflabs, everything looks good. The debdiff is also good, only minor deps changed and nothing related to cassandra internals.

Change 440308 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::aqs: set cassandra version to 2.2.6-wmf5

https://gerrit.wikimedia.org/r/440308

Change 440308 merged by Elukey:
[operations/puppet@production] role::aqs: set cassandra version to 2.2.6-wmf5

https://gerrit.wikimedia.org/r/440308

Change 440337 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] role::aqs: deploy different cassandra versions

https://gerrit.wikimedia.org/r/440337

Change 440337 merged by Elukey:
[operations/puppet@production] role::aqs: deploy different cassandra versions

https://gerrit.wikimedia.org/r/440337

The new python-cassandra package seems to break cqlsh on Jessie:

elukey@aqs1004:~$ sqls
-bash: sqls: command not found
elukey@aqs1004:~$ cqlsh
Traceback (most recent call last):
  File "/usr/bin/cqlsh.py", line 161, in <module>
    from cqlshlib import cql3handling, cqlhandling, pylexotron, sslhandling
  File "/usr/lib/python2.7/dist-packages/cqlshlib/cql3handling.py", line 17, in <module>
    from .cqlhandling import CqlParsingRuleSet, Hint
  File "/usr/lib/python2.7/dist-packages/cqlshlib/cqlhandling.py", line 21, in <module>
    from cassandra.metadata import cql_keywords_reserved
ImportError: cannot import name cql_keywords_reserved

For the moment I left only aqs1004 on the new 2.2.6-wmf5 cassandra version, while the other nodes remain on wmf3. Rather than fix this issue for Jessie, I'd concentrate on the Stretch upgrades.

For posterity's sake (and as discussed in the Cassandra stand-up today):

In order to work around a bug that manifests under the Stretch version of Python, we brute-forced cqlsh in -wmf5 to use python-cassandra instead of the vendored version. See:

https://github.com/wikimedia/cassandra/blob/wmf/2.2.6-wmf5/debian/patches/120set_cqlsh_no_bundled.dpatch

However, it would seem that the version of cqlsh in 2.2.6 is incompatible with the version of python-cassandra in Jessie.
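
The traceback above fails on `from cassandra.metadata import cql_keywords_reserved`, so one quick way to tell whether a host's python-cassandra is new enough for this cqlsh is to probe for that name. This is only a sketch: the function name `driver_has_symbol` and the injectable `import_module` parameter are hypothetical helpers for illustration, not part of cqlsh or the driver.

```python
import importlib


def driver_has_symbol(module_name="cassandra.metadata",
                      symbol="cql_keywords_reserved",
                      import_module=importlib.import_module):
    """Return True if `symbol` is available in `module_name`.

    `import_module` is injectable (an assumption made for this sketch)
    so the check can be exercised without the driver installed.
    """
    try:
        module = import_module(module_name)
    except ImportError:
        # The driver (or the submodule) is not installed at all.
        return False
    # A missing attribute is what produces "cannot import name ...".
    return hasattr(module, symbol)


if __name__ == "__main__":
    print(driver_has_symbol())
```

If this prints False on a host, the system-wide driver would presumably trigger the same ImportError when cqlsh is forced to use it.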

Possible solutions:

  • Backport the version of python-cassandra in Stretch, to Jessie
  • Pin AQS to -wmf3
  • Fix the patch to source a templated file that defines CQLSH_NO_BUNDLED conditionally (i.e. don't set it on Jessie)
  • ...

Tested the third option (not setting CQLSH_NO_BUNDLED on Jessie); it leads to:

elukey@aqs1004:~$ cqlsh
Connection error: ('Unable to connect to any servers', {'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. Last error: Connection refused")})

IIRC that was the original issue :(

fdans triaged this task as Medium priority. (Jun 14 2018, 4:23 PM)
fdans added a project: Analytics-Kanban.
fdans moved this task from Incoming to Operational Excellence on the Analytics board.

Did that work before? Is Cassandra bound to loopback?
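
A "Connection refused" on 127.0.0.1:9042 usually just means nothing is listening on that address, so a plain TCP probe can separate "Cassandra is bound elsewhere" from "cqlsh is broken". The sketch below assumes Python 3 and uses only the standard library; `accepts_connections` is a hypothetical helper name, and 9042 is the port taken from the error message above.

```python
import socket


def accepts_connections(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution errors.
        return False


if __name__ == "__main__":
    # Probe loopback on the native-protocol port from the error above.
    print("127.0.0.1", accepts_connections("127.0.0.1", 9042))
```

Running the same probe against the address the instance is actually bound to should show whether the failure is specific to loopback.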

@Eevans for some reason I didn't check what you asked above, I was probably super tired and didn't test this:

elukey@aqs1004:~$ cqlsh -u cassandra aqs1004-a.eqiad.wmnet
Password:
Connected to Analytics Query Service Storage at aqs1004-a.eqiad.wmnet:9042.
[cqlsh 5.0.1 | Cassandra 2.2.6 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cassandra@cqlsh>

So simply commenting out CQLSH_NO_BUNDLED seems to work like a charm with 2.2.6-wmf5!

Are you going to hack that one host locally, or do you want to pursue fixing the package in a way that works for both Jessie and Stretch?

I think that we could simply keep two 2.2.6 versions in our puppet config, one for Jessie and one for Stretch, and be done with it. So basically re-upload the previous version of 2.2.6 in jessie-wikimedia, and set puppet accordingly. What do you think?
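
The idea of keeping one 2.2.6 build per distribution boils down to a lookup from Debian codename to package version. The real change would live in Puppet; the Python below is only an illustration of the mapping, with the two version strings taken from this task and the names `CASSANDRA_VERSION_BY_CODENAME` and `cassandra_version_for` invented for the sketch.

```python
# Versions taken from this task: -wmf3 stays on Jessie, -wmf5 goes to Stretch.
CASSANDRA_VERSION_BY_CODENAME = {
    "jessie": "2.2.6-wmf3",
    "stretch": "2.2.6-wmf5",
}


def cassandra_version_for(codename):
    """Return the Cassandra package version pinned for a Debian codename."""
    try:
        return CASSANDRA_VERSION_BY_CODENAME[codename.lower()]
    except KeyError:
        # Fail loudly rather than silently installing an untested build.
        raise ValueError("no Cassandra version pinned for %r" % codename)
```

Raising on an unknown codename is a design choice for the sketch: a host on an unlisted release should not quietly pick up either build.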

Hrmm, all of the solutions are kind of awful here (if not for the outcome, then for the time required), but this would work, yeah.

FWIW, if we fixed the package, I think we could do something like this (untested):

if python -c 'import cassandra' 2>/dev/null; then
    export CQLSH_NO_BUNDLED="1"
fi

Basically we'd be saying: "If you've installed the Python module, we'll (attempt to) use that; otherwise we'll (attempt to) use the vendored one." Then of course we'd need to make sure not to install the package on Jessie (where the older version isn't compatible), and make sure to install it on Stretch (to work around the annoying bug). This is also awful, but probably OK within the narrow context (and perhaps less awful than employing Puppet hackery).

But I'll leave it up to you. :)
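
For anyone who wants to reason about the conditional above outside of the package scripts, the same decision can be written as a small Python 3 function. This is a sketch, not what the package does: `cqlsh_environment` and its injectable `find_spec` parameter are hypothetical, and unlike the shell version it only locates the `cassandra` module instead of importing it.

```python
import importlib.util
import os


def cqlsh_environment(base_env=None, find_spec=importlib.util.find_spec):
    """Return a copy of the environment for launching cqlsh.

    CQLSH_NO_BUNDLED=1 is added only when a system-wide `cassandra`
    Python module can be located; otherwise the variable is left alone
    so cqlsh falls back to its vendored driver.
    """
    env = dict(os.environ if base_env is None else base_env)
    if find_spec("cassandra") is not None:
        env["CQLSH_NO_BUNDLED"] = "1"
    return env
```

The returned dictionary could be passed as `env=` to `subprocess.run` when starting cqlsh, which keeps the decision out of the caller's own environment.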

Change 442251 had a related patch set uploaded (by Elukey; owner: Elukey):
[operations/puppet@production] cassandra: add another package version to the 2.2 list

https://gerrit.wikimedia.org/r/442251

Change 442251 merged by Elukey:
[operations/puppet@production] cassandra: add another package version to the 2.2 list

https://gerrit.wikimedia.org/r/442251

Mentioned in SAL (#wikimedia-operations) [2018-06-28T13:49:14Z] <elukey> downgrade cassandra and cassandra-tools from 2.2.6-wmf5 to 2.2.6-wmf3 in jessie-wikimedia component/cassandra22 - T197062

elukey set the point value for this task to 5.
Vvjjkkii renamed this task from Upgrade Cassandra on AQS to 2.2.6-wmf5 to d5aaaaaaaa. (Jul 1 2018, 1:04 AM)
Vvjjkkii removed elukey as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description.
Vvjjkkii removed the point value for this task.
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot lowered the priority of this task from High to Medium. (Jul 3 2018, 3:23 AM)