
Cannot search for strings including pipe symbol (|); returns only AND Results
Open, Low, Public

Description

MW Version: 1.30

Scenario: I need to search pages for a string which contains an embedded pipe.

Request: Treat embedded pipes in searches as string characters.

Current: In MediaWiki search, embedded pipes behave as boolean AND.

Note: This issue was discovered and validated through the MW API, not through MW search interface.

Details:

As implied in this helpfile, the srsearch param does not accept multiple values. Therefore, I assume an embedded pipe will be treated as part of the search string.

However, this call:

https://gunretort.xyz/api.php?action=query&list=search&srsearch=NewTag|Anteater&srwhat=text&srnamespace=0|3000

returns pages which contain:

no pipe, unrelated word in-between:

NewTag
x
Anteater

no pipe, order is reversed, with an unrelated word in-between:

Anteater banana
NewTag

and (the desired):

{{NewTag|Anteater}}

The apparent logic is "contains both words anyplace on the page." In other words, the pipe in srsearch is being interpreted as an AND.

I.e., multi-value entries ARE supported in srsearch (although the helpfile suggests not), except they are interpreted as AND instead of the usual OR.

Is there any workaround to make an embedded pipe be treated as part of the string?
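One possible stopgap, not confirmed anywhere in this task, is to let the search return its candidates and then filter them on the client for the literal string. The sketch below assumes the page texts have already been fetched; the page titles and the `filter_literal` helper are hypothetical and only mirror the three pages described above.

```python
def filter_literal(pages, needle):
    """Keep only the titles whose text contains `needle` verbatim, pipe included."""
    return [title for title, text in pages.items() if needle in text]


# Hypothetical candidates mirroring the three pages returned by the search.
candidates = {
    "PageA": "NewTag\nx\nAnteater",        # no pipe, unrelated word in between
    "PageB": "Anteater banana\nNewTag",    # no pipe, order reversed
    "PageC": "{{NewTag|Anteater}}",        # the desired match
}

literal_hits = filter_literal(candidates, "NewTag|Anteater")
print(literal_hits)
```

This does not change what the search engine matches; it only discards the candidates that lack the exact string.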

Event Timeline

Johnywhy created this task.May 7 2018, 1:00 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 7 2018, 1:00 PM
Johnywhy updated the task description. (Show Details)May 7 2018, 1:01 PM
Johnywhy updated the task description. (Show Details)May 7 2018, 1:03 PM
Johnywhy updated the task description. (Show Details)
Johnywhy updated the task description. (Show Details)

(Again removing the ApiFeatureUsage tag as that extension seems to be unrelated here)

Without that error message, how would an API user find out that srsearch does not support multi values? How to avoid getting bug reports from users who will write that getting no results (when they have no strings on their wiki including a |) is unexpected behavior?

Johnywhy added a comment.EditedMay 7 2018, 1:19 PM

(Again removing the ApiFeatureUsage tag as that extension seems to be unrelated here)

Entering "API" in the Tags field here did not bring up MediaWiki-API. Apparently the Tags field here doesn't show substring matches.

Without that error message

What error message? There was none.

when they have no strings on their wiki including a |)

I don't???
Yes, I do. As stated in my OP, one page contains a pipe:

{{NewTag|Anteater}}

That page should be returned.

The 2 other returned pages do NOT contain NewTag|Anteater and should not be returned, yet they are.

This is unexpected behavior, isn't it?

Without that error message

What error message? There was none.

Ah, I am sorry! I thought that this was the same query as https://gunretort.xyz/api.php?action=query&list=search&srsearch=%1FNewTag%1FSpinach&srwhat=text&srnamespace=0|3000&format=xml mentioned in T194016, but it is not, true.

Anomie closed this task as Invalid.May 7 2018, 1:47 PM
Anomie added a subscriber: Anomie.

There's no API bug here. The search you link is returning the same results that the equivalent Special:Search query does, as it should do.

Chances are the underlying search engine is treating the pipe symbol (|) in the same way it does any other punctuation, more or less as a space.
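As a rough illustration of that explanation (a toy model, not the actual search backend code), the sketch below assumes the engine splits text on every non-alphanumeric character, lowercases the tokens, and requires all query tokens to appear somewhere on the page. The `tokenize` and `matches_all_terms` names are hypothetical.

```python
import re


def tokenize(text):
    # Assumption: any run of non-alphanumeric characters (pipe, braces,
    # whitespace, newlines) acts as a separator, and matching ignores case.
    return {t.lower() for t in re.split(r"[^0-9A-Za-z]+", text) if t}


def matches_all_terms(query, page_text):
    # A page matches when every query token occurs anywhere in its text.
    return tokenize(query) <= tokenize(page_text)


pages = [
    "NewTag\nx\nAnteater",
    "Anteater banana\nNewTag",
    "{{NewTag|Anteater}}",
    "NewTag only",
]
results = [matches_all_terms("NewTag|Anteater", p) for p in pages]
print(results)
```

Under these assumptions the first three pages match and the fourth does not, which is the "both words anywhere on the page" behavior reported in the task description.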

Anomie added a comment.May 7 2018, 1:48 PM

(Again removing the ApiFeatureUsage tag as that extension seems to be unrelated here)

Entering "API" in the Tags field here did not bring up MediaWiki-API. Apparently the Tags field here doesn't show substring matches.

For the record, that bug is tracked as T182458: Phabricator tag search seems to have regressed to prefix search.

Johnywhy added a comment.EditedMay 7 2018, 1:54 PM

There's no API bug here. the underlying search engine is treating the pipe symbol (|) in the same way it does any other punctuation, more or less as a space.

a pipe isn't a space. It shouldn't get interpreted as a space. How is that not a bug? Is there a way to escape it so it's treated like a pipe?

A page containing the following is returned. The order of the words is reversed, with an unrelated word in-between. That's not just a space:

Anteater banana
NewTag

The apparent logic is "contains both words anyplace on the page." In other words, the pipe in srsearch is being interpreted as an AND. I.e., multi-value entries ARE supported in srsearch, except they are interpreted as AND instead of OR.

That's definitely very different than "more or less a space". Definitely unexpected. Where's it documented that punctuation is not recognized in searches? Is there no workaround?

Johnywhy reopened this task as Open.May 7 2018, 1:58 PM
Johnywhy updated the task description. (Show Details)
Anomie closed this task as Invalid.May 7 2018, 2:11 PM

There's no API bug here. the underlying search engine is treating the pipe symbol (|) in the same way it does any other punctuation, more or less as a space.

a pipe isn't a space. It shouldn't get interpreted as a space. How is that not a bug?

It's not a bug in the API. If you think it's a bug in the search backend, file it against whatever search backend you're using.

If you think it's a bug in the search backend, file it against whatever search backend you're using.

I'm using stock MW 1.30 API. What search backend is that?

Anomie added a comment.May 7 2018, 2:13 PM

If you're not using an extension like CirrusSearch, you'd file it in MediaWiki-Search.

Johnywhy added a comment.EditedMay 7 2018, 2:15 PM

If you're not using an extension like CirrusSearch, you'd file it in MediaWiki-Search.

MediaWiki-Search is also on Phabricator.
So, should I create a new task, identical to this one, tagged MediaWiki-Search?
Or just tag this one MediaWiki-Search and re-open it?

thx

Anomie added a comment.May 7 2018, 2:18 PM

Neither. Either create a new task with the proper title and description of the behavior in question, or rewrite this task's title and description when you reopen, add MediaWiki-Search, and remove MediaWiki-API.

Johnywhy updated the task description. (Show Details)May 7 2018, 2:18 PM

Neither. Either create a new task with the proper title and description of the behavior in question, or rewrite this task's title and description when you reopen, add MediaWiki-Search, and remove MediaWiki-API.

what's wrong with the title and description?

Anomie added a comment.May 7 2018, 2:21 PM

As you've discovered, this has to do with the search engine rather than the API. But the title and description are currently focused on the API query you're making instead of clearly describing the behavior of the search engine that occurs both via the API and the web UI's Special:Search.

Restricted Application added projects: Discovery, Discovery-Search. · View Herald TranscriptMay 7 2018, 2:21 PM

@Johnywhy: Please feel free to change the task status to Open and fix the task description, maybe something like:
Do not treat the pipe symbol (|) in the same way as other punctuation (like a space)

(I'd do that myself, but internet in this place makes such requests time out for reasons I don't know. Thanks!)

Johnywhy renamed this task from Embedded Pipe in API srsearch Parameter Returning Unexpected Results to Bug? Embedded Pipe in MediaWiki Search Parameter Returning AND Results.May 7 2018, 7:41 PM
Johnywhy updated the task description. (Show Details)
Johnywhy reopened this task as Open.
Johnywhy updated the task description. (Show Details)May 7 2018, 7:45 PM
Johnywhy renamed this task from Bug? Embedded Pipe in MediaWiki Search Parameter Returning AND Results to How to Search for Embedded Pipe in MediaWiki Search? Pipe is Returning AND Results.
Johnywhy added a comment.EditedMay 7 2018, 8:04 PM

Almost solved. Surround the entire string with quotes. You can URL-encode (percent-encode) the quotes and pipe, or not; both URLs below work.
The solution below finds both:

NewTag|Anteater

and

NewTag Anteater

We don't want to find the second one (pipe interpreted as space).

The following search string finds the same two results as above, even though neither result contains curly braces. So, still a problem.

"{{NewTag|Anteater}}"

https://gunretort.xyz/api.php?action=query&list=search&srsearch=%22NewTag%7CAnteater%22&srwhat=text&srnamespace=0|3000|3004
https://gunretort.xyz/api.php?action=query&list=search&srsearch="NewTag|Anteater"&srwhat=text&srnamespace=0|3000|3004

It seems an escape character is needed, recognized at all levels of the stack.
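The two URLs above differ only in percent-encoding, which is why they behave the same. A minimal sketch using Python's standard `urllib.parse` shows the equivalence; the parameter values are copied from the URLs above, and nothing here changes how the search engine tokenizes the phrase.

```python
from urllib.parse import quote, unquote, urlencode

phrase = '"NewTag|Anteater"'

# Percent-encode every reserved character, including the quotes and the pipe.
encoded = quote(phrase, safe="")
print(encoded)

# Decoding gives back the original phrase, so both URL forms carry the same query.
print(unquote(encoded) == phrase)

# Building the whole query string; urlencode also encodes the pipes in srnamespace.
params = {
    "action": "query",
    "list": "search",
    "srsearch": phrase,
    "srwhat": "text",
    "srnamespace": "0|3000|3004",
}
query_string = urlencode(params)
print("https://gunretort.xyz/api.php?" + query_string)
```

Since the server decodes `%7C` back to `|` before the search runs, percent-encoding alone cannot act as the escape character asked for here.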

Aklapper renamed this task from How to Search for Embedded Pipe in MediaWiki Search? Pipe is Returning AND Results to Cannot search for strings including pipe symbol (|); returns only AND Results.May 8 2018, 10:45 AM
EBjune triaged this task as Normal priority.May 10 2018, 5:15 PM
EBjune added a project: patch-welcome.
EBjune added a subscriber: EBjune.

This would be a good task for a volunteer, and the search platform team would be glad to look over any patches submitted.

Anomie removed a subscriber: Anomie.May 10 2018, 5:24 PM
Johnywhy updated the task description. (Show Details)May 23 2018, 2:38 PM
Johnywhy updated the task description. (Show Details)

It seems an escape character is needed, recognized at all levels of the stack.

Johnywhy added a comment.EditedMay 26 2018, 12:23 PM

Then you don't understand what "priority" means.

Priority of a fix should be based on the importance of the functionality, or the severity of the bug.

Priority should NOT be based on whether or not someone volunteers to do it. That has nothing to do with priority.

For example, if a bug BREAKS the wiki, but nobody volunteers to work on it, we don't say it's Low Priority.

The section on the page you linked, explaining priority, mixes the two concepts (importance vs assignment), thus creating a mixed up, confused definition.

I've worked as a professional developer on many projects, for decades, and nowhere have I seen Priority used to indicate whether or not someone is working on it.

Other people agree with me on this:
https://m.mediawiki.org/wiki/Talk:Phabricator/Project_management

https://www.mediawiki.org/wiki/Phabricator/Project_management#Setting_task_priorities defines Normal priority as "someone is still planning to work on it."
I do not see that this is the case for this task...

(General discussions whether current definitions make sense need to go to that wiki discussion page; they are off-topic in this task.)

EBjune lowered the priority of this task from Normal to Low.May 29 2018, 11:18 PM
Vvjjkkii renamed this task from Cannot search for strings including pipe symbol (|); returns only AND Results to chdaaaaaaa.Jul 1 2018, 1:11 AM
Vvjjkkii raised the priority of this task from Low to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed a subscriber: Aklapper.
CommunityTechBot lowered the priority of this task from High to Low.
CommunityTechBot renamed this task from chdaaaaaaa to Cannot search for strings including pipe symbol (|); returns only AND Results.
CommunityTechBot added a subscriber: Aklapper.