
Cassandra secondary indexing problem for large pages
Closed, Resolved, Public

Description

When retrieving and storing a revision for a large page, the following Cassandra errors appear (output for Barack_Obama):

ResponseError: Can't index column value of size 1483409 for index null on local_group_default_T_parsoid_html.data
ResponseError: Can't index column value of size 842881 for index null on local_group_default_T_parsoid_dataW4ULtxs1oMqJeY.data

The offending line in Cassandra's code is src/java/org/apache/cassandra/cql3/statements/UpdateStatement.java:135 (version 2.1.3), which reports a failed secondary-index validation. CASSANDRA-3057, CASSANDRA-4240, CASSANDRA-8081 and CASSANDRA-8280 all indicate that values larger than about 64 KB cannot be indexed. However, we do not index the value column, so we should not be getting this error in the first place.
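To make the size limit concrete, here is a minimal, hypothetical client-side guard. It assumes the limit is 65,535 bytes (the maximum length expressible in an unsigned 16-bit length prefix); the names `MAX_INDEXED_VALUE_BYTES`, `can_index` and `oversized_values` are illustrative and not part of Cassandra or any driver. Note that in this bug the check fired even though the value column was not indexed, so a guard like this would only help for tables that really do index large values.

```python
# Hypothetical pre-write guard. Assumption: Cassandra 2.1 rejects indexed values
# larger than an unsigned 16-bit length prefix can describe (65,535 bytes).
MAX_INDEXED_VALUE_BYTES = 0xFFFF  # 65,535 bytes (assumed limit)


def can_index(value: bytes) -> bool:
    """Return True if `value` fits within the assumed secondary-index size limit."""
    return len(value) <= MAX_INDEXED_VALUE_BYTES


def oversized_values(sizes: dict[str, int]) -> list[str]:
    """Return the names whose value sizes (in bytes) exceed the assumed limit."""
    return [name for name, size in sizes.items() if size > MAX_INDEXED_VALUE_BYTES]


if __name__ == "__main__":
    # Sizes copied from the two error messages in this task.
    observed = {
        "local_group_default_T_parsoid_html.data": 1483409,
        "local_group_default_T_parsoid_dataW4ULtxs1oMqJeY.data": 842881,
    }
    # Both values are an order of magnitude above the limit.
    print(oversized_values(observed))
    print(can_index(b"x" * 1000))
```

Both values from the Barack_Obama revision are far above the limit, which is consistent with the error firing for any large page.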

Another peculiar detail is that the error message reports the index name as null.

Event Timeline

mobrovac raised the priority of this task to High.
mobrovac updated the task description.

Fortunately, this does not affect production, because the indexes in question do not exist there.

This is a Cassandra bug. To avoid it, we will stop creating these native secondary indexes for auto-created tables outside of production. As a result, we need to change how we implement global title and revision listings.

Cassandra's secondary indexes (SIs) are no longer used in RESTBase (RB), so resolving.