
Search backend error during full_text search for 'QUERY_STRING' after 39: i_o_exception: Can't read unknown type [50]
Closed, Resolved · Public

Description

Right after deploying a config change to activate ltr on enwiki, we saw errors:

Search backend error during full_text search for 'query text' after 39: i_o_exception: Can't read unknown type [50]

This is certainly caused by the ltr plugin.

Event Timeline

dcausse created this task. Sep 14 2017, 7:19 PM
Restricted Application added a subscriber: Aklapper.

Mentioned in SAL (#wikimedia-operations) [2017-09-14T19:58:27Z] <dcausse> banning elastic1020 to see if T175951 is caused by mixed versions of the ltr plugin

debt assigned this task to dcausse. Sep 14 2017, 8:27 PM
debt triaged this task as High priority.
debt edited projects, added Discovery-Search (Current work); removed Discovery-Search.

This is definitely due to mixed versions of the ltr plugin: elastic1020 has a different version deployed than the rest of the servers.
The binary format of sltr changed between these versions, making it impossible to use sltr across elastic1020 and the rest of the servers.
Restarting more nodes will likely exacerbate the problem, with a peak in the middle of the rolling restart.

I think we have a few options:

  • I'd suggest trying to depool elastic1020 from pybal/lvs to limit errors during the weekend
  • On Monday: Switch fulltext traffic to codfw and run a rolling restart on eqiad

or simply undeploy the config change to activate ltr.
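
To spot this kind of mixed deployment before it causes errors, the plugin versions reported by each node can be compared. The following is only a minimal sketch, not what was run for this task. It assumes the text output of Elasticsearch's `_cat/plugins` endpoint with its default columns (node name, component, version) and assumes the plugin is listed as `ltr`. The function name, the second node name and the version strings in the usage note are hypothetical.

```python
from collections import defaultdict


def find_mixed_plugin_versions(cat_plugins_text, plugin_name):
    """Group nodes by the version of plugin_name they report.

    cat_plugins_text is assumed to be whitespace-separated lines of
    "<node> <component> <version>", as in the default _cat/plugins output.
    Returns {version: [nodes]} when more than one version is present,
    and an empty dict when all nodes agree.
    """
    nodes_by_version = defaultdict(list)
    for line in cat_plugins_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        node, component, version = parts[0], parts[1], parts[2]
        if component == plugin_name:
            nodes_by_version[version].append(node)
    # A single version means the cluster is consistent for this plugin.
    return dict(nodes_by_version) if len(nodes_by_version) > 1 else {}
```

For example, given hypothetical output in which elastic1020 reports version `x.y.2` and elastic1021 reports `x.y.1`, `find_mixed_plugin_versions(text, "ltr")` returns both versions with their nodes, which would flag elastic1020 as the outlier before a rolling restart spreads the mismatch.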

Mentioned in SAL (#wikimedia-operations) [2017-09-15T07:35:23Z] <gehel> depooling elastic1020 - T175951

Mentioned in SAL (#wikimedia-operations) [2017-09-15T07:38:59Z] <gehel> shutting down and masking elasticsearch on elastic1020 - T175951

debt closed this task as Resolved. Sep 22 2017, 2:02 PM