
Search backend error during full_text search for 'QUERY_STRING' after 39: i_o_exception: Can't read unknown type [50]
Closed, Resolved · Public

Description

Right after deploying a config change to activate ltr (learning to rank) on enwiki, we saw errors:

Search backend error during full_text search for 'query text' after 39: i_o_exception: Can't read unknown type [50]

This is certainly caused by the ltr plugin.

Event Timeline

Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2017-09-14T19:58:27Z] <dcausse> banning elastic1020 to see if T175951 is caused by mixed versions of the ltr plugin

debt triaged this task as High priority.
debt edited projects, added Discovery-Search (Current work); removed Discovery-Search.

This is definitely due to mixed versions of the ltr plugin: elastic1020 runs a different plugin version than the rest of the servers.
The binary format of the sltr query changed between these versions, making it impossible to use sltr when a request involves both elastic1020 and the other servers.
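How a format change can surface as this exact error is sketched below with a toy tagged encoding, loosely modeled on how Elasticsearch streams generic values (a type byte followed by the payload); the error text appears to match the message its stream reader uses for an unrecognized type byte. Everything here is hypothetical: the one-byte length prefix, the type tags, and the functions `write_query_v1`, `write_query_v2` and `read_query_v1` (with an invented `featureset` field added in "v2") are illustrations, not the real sltr wire format. The point is that a reader expecting the old layout lands on a byte it interprets as a type tag and fails.

```python
import io

# Hypothetical type tags for a tagged "generic value" encoding.
TYPE_STRING = 0
TYPE_INT = 1


def write_string(out: io.BytesIO, s: str) -> None:
    data = s.encode("utf-8")
    # Simplification: a single length byte, so only strings under 128 bytes.
    if len(data) >= 128:
        raise ValueError("sketch only supports strings shorter than 128 bytes")
    out.write(bytes([len(data)]))
    out.write(data)


def read_string(inp: io.BytesIO) -> str:
    length = inp.read(1)[0]
    return inp.read(length).decode("utf-8")


def write_generic(out: io.BytesIO, value) -> None:
    # Each generic value is a type byte followed by its payload.
    if isinstance(value, str):
        out.write(bytes([TYPE_STRING]))
        write_string(out, value)
    elif isinstance(value, int):
        out.write(bytes([TYPE_INT]))
        out.write(value.to_bytes(4, "big", signed=True))
    else:
        raise TypeError(f"unsupported value: {value!r}")


def read_generic(inp: io.BytesIO):
    tag = inp.read(1)[0]
    if tag == TYPE_STRING:
        return read_string(inp)
    if tag == TYPE_INT:
        return int.from_bytes(inp.read(4), "big", signed=True)
    raise IOError(f"Can't read unknown type [{tag}]")


def write_query_v1(model: str, params) -> bytes:
    out = io.BytesIO()
    write_string(out, model)
    write_generic(out, params)
    return out.getvalue()


def write_query_v2(model: str, featureset: str, params) -> bytes:
    out = io.BytesIO()
    write_string(out, model)
    write_string(out, featureset)  # hypothetical field only the v2 layout has
    write_generic(out, params)
    return out.getvalue()


def read_query_v1(data: bytes):
    # An old node only knows the v1 layout.
    inp = io.BytesIO(data)
    model = read_string(inp)
    # With v2 bytes, this reads the featureset length byte as a type tag.
    params = read_generic(inp)
    return model, params


if __name__ == "__main__":
    print(read_query_v1(write_query_v1("my_model", "query text")))  # ('my_model', 'query text')
    try:
        read_query_v1(write_query_v2("my_model", "my_featureset", "query text"))
    except IOError as err:
        print(err)  # Can't read unknown type [13]
```

In this sketch, a featureset name of length 0 or 1 would be misread as a valid tag and silently misparsed instead of failing loudly.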
Restarting more nodes will likely make the problem worse, with errors peaking in the middle of the rolling restart, when the cluster is split most evenly between the two versions.
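The expectation that errors peak mid-restart can be checked with a back-of-the-envelope model. The assumptions are mine, not from the task: the coordinating node and each of the `shard_nodes` data nodes a query touches are picked independently and uniformly, and any request that crosses plugin versions fails. With a fraction p = k/n of nodes upgraded, a query succeeds only if all involved nodes share a version, so it fails with probability 1 - p^(s+1) - (1-p)^(s+1). The function name and the cluster size below are hypothetical.

```python
from fractions import Fraction


def mixed_version_failure_probability(
    n_nodes: int, n_upgraded: int, shard_nodes: int = 1
) -> Fraction:
    """Chance that one query touches nodes running both plugin versions.

    Toy model (assumption): the coordinator and each of the `shard_nodes`
    data nodes are picked independently and uniformly, and any request
    that crosses versions fails.
    """
    if not 0 <= n_upgraded <= n_nodes:
        raise ValueError("n_upgraded must be between 0 and n_nodes")
    p_new = Fraction(n_upgraded, n_nodes)
    p_old = 1 - p_new
    involved = shard_nodes + 1  # coordinator plus the shard nodes it contacts
    # The query only succeeds if every involved node runs the same version.
    return 1 - p_new**involved - p_old**involved


if __name__ == "__main__":
    n = 30  # hypothetical cluster size, not the real eqiad node count
    for k in (0, 1, 8, 15, 22, 29, 30):
        print(k, float(mixed_version_failure_probability(n, k, shard_nodes=3)))
```

The curve is symmetric in k and n - k and highest at k = n/2, which is the "peak in the middle of the rolling restart"; fanning a query out to more shard nodes raises the whole curve.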

I think we have a few options:

  • I'd suggest trying to depool elastic1020 from pybal/lvs to limit errors over the weekend
  • On Monday: switch fulltext traffic to codfw and run a rolling restart on eqiad

or simply revert the config change that activated ltr.

Mentioned in SAL (#wikimedia-operations) [2017-09-15T07:38:59Z] <gehel> shutting down and masking elasticsearch on elastic1020 - T175951