Global search fails with HTTP 500
Open, Needs Triage · Public · Bug Report

Event Timeline

Aklapper changed the subtype of this task from "Task" to "Bug Report". Aug 3 2021, 6:00 PM

@Urbanecm Are you sure about this? This task is about the tool failing for a specific query, possibly due to the emoji/Unicode plane 1 character, while T316420 seems to be a more recent issue.

Oh, sorry, didn't see that. I thought it was a generic "tool is down" issue. Re-opening.

Different issue indeed. I've fixed T316420, but this one is still failing. Apparently the combination of the 🌐 symbol with some text confuses CloudElastic (which powers Global Search). Here's a snippet of the VERY large output it gave:

"error":{
   "root_cause":[
      {
         "type":"query_shard_exception",
         "reason":"failed to create query: ngramField [source_text.trigram] is unknown.",
         "index_uuid":"7fnUbq97RQyGvWibHg71sA",
         "index":"omega:.ltrstore"
      },
      {
         "type":"query_shard_exception",
         "reason":"failed to create query: ngramField [source_text.trigram] is unknown.",
         "index_uuid":"QF6oi_7RQGaLcXuI-xGsfA",
         "index":"omega:.tasks"
      },
      {
         "type":"invalid_regex_exception",
         "reason":"invalid_regex_exception: Analyzer provided generate more than one tokens, if using 3grams make sure to use a 3grams analyzer, for input [\uDF10gl] first is [\uDF10gl] but [glo] was generated."
      },
      …

The globe icon seems to confuse the regex parser, or something. If I remove the GLOB text, the query times out instead, which is not unusual for difficult regex searches. What makes 🌐GLOB special is that CloudElastic seems to reject it very early on.
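One plausible reading of the `[\uDF10gl]` input in the errors above (our guess, not confirmed elsewhere in this thread): `\uDF10` is a lone UTF-16 surrogate. U+1F310 (🌐) sits outside the Basic Multilingual Plane, so UTF-16 encodes it as the surrogate pair D83C DF10, and a trigram window taken over UTF-16 code units rather than code points can start in the middle of that pair:

```python
# Sketch: how a 3-gram over UTF-16 code units produces the unpaired
# low surrogate \uDF10 seen in the invalid_regex_exception message.
s = "\U0001F310GLOB"  # the failing 🌐GLOB query text

# Split the UTF-16 big-endian encoding into 2-byte code units.
raw = s.encode("utf-16-be")
code_units = [raw[i:i + 2] for i in range(0, len(raw), 2)]

# Slide a 3-code-unit window across the string.
trigrams = [b"".join(code_units[i:i + 3]) for i in range(len(code_units) - 2)]

# The second window begins with the unpaired low surrogate DF10
# followed by "G" and "L", matching the [\uDF10gl] input the
# analyzer reported (after lowercasing).
print(trigrams[1].hex())  # df100047004c
```

If that reading is right, the 3-gram side of the query splits the emoji mid-character, while the analyzer works on whole code points and produces `[glo]` instead, hence the mismatch the error describes.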

@1234qwer1234qwer4 Did this search ever work for you in the past?

T316420 indicates there were some upstream changes to CloudElastic, so it's possible I need to change how the regex queries are constructed, or something. CC'ing @EBernhardson here in case they have ideas. The current code for reference is at Query.php#L86-L117.

I reported this just after submitting this search query for the first time and realising it fails.

Okay, in that case it's probably not a new issue caused by whatever recent changes were made to CloudElastic. Still, the fact that the query fails so early tells me either CloudElastic can't handle it to begin with, or we're constructing our query incorrectly. This is EBernhardson's area of expertise, so I shall wait to hear from them in hopes there's an easy solution.

The upstream change is that elastic was upgraded from 6.8 to 7.10.2 on cloudelastic, with production services migrating this week and the next. I had taken a quick look over global-search and it seemed to be avoiding the biggest changes (no more mapping types), but I seem to have missed these other details.

For the first few errors, against .ltrstore and .tasks, it looks like elastic has become stricter about missing fields. In 6.x, elasticsearch always allowed you to query unknown fields; they would resolve as equivalent to a match_none query, since there was nothing to match against. Elastic 7.x is now strict and issues errors instead.
For this I suspect we would need to change the index glob patterns to use *_content, *_general, *_file to target the wiki-specific indices.
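A hypothetical sketch of that index-pattern fix, searching only the wiki content indices so system indices like omega:.ltrstore and omega:.tasks are never touched. The patterns come from the comment above; the URL shape is the standard Elasticsearch multi-index `_search` path, not global-search's actual code:

```python
# Assumed fix: build the multi-index search path from content-only
# glob patterns instead of a pattern that also matches system indices.
index_patterns = ["*_content", "*_general", "*_file"]
search_path = "/" + ",".join(index_patterns) + "/_search"
print(search_path)  # /*_content,*_general,*_file/_search
```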

For the invalid regex exceptions, this is unexpected: I'm not aware of any changes we had to make on the cirrus side to get our integration tests passing for regex searches. We do run our plain searches against the source field using the query_string query, though, as opposed to through a match query as seen here. That might be harder to do here, as it requires code that escapes all the things users might do in a query_string context. I'll have to do some experimenting to find what is appropriate here.
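A sketch of the escaping step described above: before routing user input through a query_string query, the reserved query_string characters must be backslash-escaped. The character list follows the Elasticsearch query_string documentation (which also notes that `<` and `>` cannot be escaped at all and must be removed); the function name is ours, not from the global-search code:

```python
import re

# Reserved query_string characters that can be backslash-escaped.
# Per the Elasticsearch docs, < and > cannot be escaped, only removed.
_RESERVED = re.compile(r'([+\-=&|!(){}\[\]^"~*?:\\/])')

def escape_query_string(text: str) -> str:
    """Make arbitrary user text safe inside a query_string query."""
    text = text.replace("<", "").replace(">", "")
    return _RESERVED.sub(r"\\\1", text)

print(escape_query_string("a/b (c)"))  # a\/b \(c\)
```

Escaping `&` and `|` individually also neutralizes the two-character `&&` and `||` operators, so a per-character pass is enough for this purpose.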


FWIW, when the Search team was discussing this today, we realized this ticket was created over a year ago, so it's actually not related to the 7.10.2 upgrade.