
CirrusSearch throws an error on several wikis when searching for "intitle:/regex/"
Closed, Resolved · Public

Description

Steps to reproduce

  1. Search with the CirrusSearch keyword intitle:/regex/ on any wiki from the following list (via either the web UI or the API; I used Pywikibot):
  • wikipedia:be-tarask
  • wikipedia:cbk-zam
  • wikipedia:map-bms
  • wikipedia:roa-tara
  • wikipedia:sh
  • wiktionary:sh
  • wiktionary:shy
  • wikibooks:simple
  • wikiquote:simple
  • wikisource:zh-min-nan
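For reference, the same search can be sent straight to the MediaWiki Action API (action=query, list=search, srsearch) instead of going through Pywikibot. The sketch below only builds the request URL. The api_search_url helper is hypothetical, and it assumes the usual <code>.<family>.org/w/api.php layout for the wikis listed above:

```python
from urllib.parse import urlencode


def api_search_url(site: str, pattern: str) -> str:
    """Hypothetical helper: turn a "family:code" pair from the list above
    into an Action API search URL for intitle:/pattern/.

    Assumes the standard <code>.<family>.org host layout."""
    family, code = site.split(":", 1)
    host = f"{code}.{family}.org"
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"intitle:/{pattern}/",
        "format": "json",
    }
    # urlencode percent-encodes ':' and '/' inside the srsearch value.
    return f"https://{host}/w/api.php?{urlencode(params)}"


print(api_search_url("wiktionary:sh", "a"))
# → https://sh.wiktionary.org/w/api.php?action=query&list=search&srsearch=intitle%3A%2Fa%2F&format=json
```

Opening that URL on an affected wiki should return the cirrussearch-backend-error quoted below. The helper does not perform the request itself.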

Expected behavior
In my opinion, CirrusSearch should work on all non-private WMF wikis (unless that particular wiki's community has opted out).

Current behavior
On all of these wikis, CirrusSearch throws an error like this:

WARNING: API error cirrussearch-backend-error: We could not complete your search due to a temporary problem. Please try again later.

or this:

An error has occurred while searching: We could not complete your search due to a temporary problem. Please try again later.

I am not sure whether these wikis have CirrusSearch disabled or whether this is a bug. Perhaps some have it disabled and others are affected by a bug?

Event Timeline

Restricted Application added a subscriber: Aklapper.
Aklapper changed the task status from Open to Stalled. Apr 21 2020, 3:07 PM

@Dvorapa: Please provide a link that shows the issue, or explain what a "CirrusSearch keyword" is.
I go to https://sh.wikipedia.org and enter contentmodel or contentmodel: into the search field in the top right corner. No problems.
Please read https://www.mediawiki.org/wiki/How_to_report_a_bug and don't make other people guess or interpret - thanks! :)

Sorry, I forgot to mention which keyword. Thank you!

Dvorapa updated the task description.

Error response from sh.wiktionary:

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "[1:1091] [source_regex] failed to parse field [locale]",
        "line": 1,
        "col": 1091
      }
    ],
    "type": "parsing_exception",
    "reason": "[1:1091] [source_regex] failed to parse field [locale]",
    "line": 1,
    "col": 1091,
    "caused_by": {
      "type": "x_content_parse_exception",
      "reason": "[1:1091] [source_regex] failed to parse field [locale]",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unknown language: sh",
        "caused_by": {
          "type": "missing_resource_exception",
          "reason": "Couldn't find 3-letter language code for sh"
        }
      }
    }
  },
  "status": 400
}
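The useful part of this response is at the bottom of the nested caused_by chain: the source_regex query could not parse its locale field because no 3-letter language code exists for "sh". (That message appears to come from Java's Locale.getISO3Language(), though that is an inference from the wording.) A small sketch for pulling the innermost cause out of such a response; deepest_cause is a hypothetical helper, and the sample is a trimmed copy of the JSON above:

```python
import json
from typing import Tuple


def deepest_cause(error_body: str) -> Tuple[str, str]:
    """Follow the nested "caused_by" chain of an Elasticsearch error
    response and return the (type, reason) of the innermost cause."""
    node = json.loads(error_body)["error"]
    while "caused_by" in node:
        node = node["caused_by"]
    return node["type"], node["reason"]


# Trimmed copy of the sh.wiktionary response (root_cause, line and col omitted).
sample = json.dumps({
    "error": {
        "type": "parsing_exception",
        "reason": "[1:1091] [source_regex] failed to parse field [locale]",
        "caused_by": {
            "type": "x_content_parse_exception",
            "reason": "[1:1091] [source_regex] failed to parse field [locale]",
            "caused_by": {
                "type": "illegal_argument_exception",
                "reason": "Unknown language: sh",
                "caused_by": {
                    "type": "missing_resource_exception",
                    "reason": "Couldn't find 3-letter language code for sh",
                },
            },
        },
    },
    "status": 400,
})

print(deepest_cause(sample))
# → ('missing_resource_exception', "Couldn't find 3-letter language code for sh")
```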
Aklapper renamed this task from CirrusSearch throws an error on several wikis to CirrusSearch throws an error on several wikis when searching for "intitle:/regex/". Apr 21 2020, 3:39 PM
Aklapper changed the task status from Stalled to Open.
dcausse triaged this task as High priority. Jun 22 2020, 7:43 AM
dcausse moved this task from needs triage to elastic / cirrus on the Discovery-Search board.
dcausse subscribed.

Adding to the sprint as this seems to me quite important to fix.

Change 607235 had a related patch set uploaded (by DCausse; owner: DCausse):
[operations/mediawiki-config@master] Set proper language code for simple english wikis

https://gerrit.wikimedia.org/r/607235

Change 607278 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Filter known non-standard language codes used by WMF wikis

https://gerrit.wikimedia.org/r/607278

Change 607278 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Filter known non-standard language codes used by WMF wikis

https://gerrit.wikimedia.org/r/607278
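Going by its title, the CirrusSearch change avoids sending language codes that the backend cannot map to a locale. Below is a rough sketch of that idea, not the actual patch. The regex_locale helper and the NON_STANDARD_CODES set are hypothetical. The set simply reuses the subdomain codes of the wikis in this report. The sketch also assumes the backend falls back to a default locale when none is sent:

```python
from typing import Optional

# Hypothetical set: the subdomain codes of the wikis listed in this report.
# The real list in the CirrusSearch patch may differ.
NON_STANDARD_CODES = frozenset({
    "be-tarask", "cbk-zam", "map-bms", "roa-tara",
    "sh", "shy", "simple", "zh-min-nan",
})


def regex_locale(wiki_language: str) -> Optional[str]:
    """Return the locale to attach to an intitle:/regex/ query,
    or None to omit the locale field entirely (assumed fallback)."""
    code = wiki_language.lower()
    if code in NON_STANDARD_CODES:
        return None
    return code
```

Dropping the locale for a short list of known offenders keeps the fix small, at the cost of locale-specific behaviour (such as case folding) on those wikis.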

Change 607235 merged by jenkins-bot:
[operations/mediawiki-config@master] Set proper language code for some wikis

https://gerrit.wikimedia.org/r/607235
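The companion configuration change addresses the same problem at the source by giving some wikis a proper language code. One way to picture this is as a per-wiki override table that is consulted before a default value. The table and the language_code helper below are hypothetical. Only the simple English entries (mapped to "en") are suggested by the patch's original title. The real values in InitialiseSettings.php are not reproduced here:

```python
# Hypothetical overrides keyed by wiki database name; only the simple English
# wikis from this report are included, mapped to "en" as the patch title suggests.
LANGUAGE_CODE_OVERRIDES = {
    "simplewikibooks": "en",
    "simplewikiquote": "en",
}


def language_code(dbname: str, default: str) -> str:
    """Return the per-wiki override if one exists, else the default code."""
    return LANGUAGE_CODE_OVERRIDES.get(dbname, default)
```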

Mentioned in SAL (#wikimedia-operations) [2020-07-13T18:26:39Z] <dcausse@deploy1001> Synchronized wmf-config/InitialiseSettings.php: T250810: Set proper language code for some wikis (duration: 00m 56s)