
Find better input method for "These words", "Not these words", "One of these Words" and "Exactly this text"
Closed, Resolved · Public · 5 Story Points

Description

Motivation
Currently, "These words" and "Not these words" are treated differently than "One of these words" or "Exactly this text". The reason is that the first two allow whitespace as a separator, whereas the other two don't. Also, stemming is applied in both cases.
However, it seems confusing that those seemingly similar fields behave so differently.

Task
Change the field behavior as shown in the mock.

Acceptance Criteria:

  • There are no placeholders for "These words", "None of these words" and "One of these words".
  • There is the placeholder indicated in the mock for "Exactly this text".
  • "These words", "None of these words" and "One of these words" create pills on comma, whitespace and enter.
  • Enter stays aware of its context, and still submits the search if there is no pill to create.
  • "Exactly this text" behaves exactly like the main search bar technically.
  • Pasting still works as it does now: entering "fisch hase" will automatically turn into two pills.
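
The pill-creation criteria above can be sketched as follows. This is only an illustration of the intended behavior, not the extension's actual code: the function names `split_into_pills` and `on_enter` are hypothetical, and the separator set (comma and whitespace) is taken from the acceptance criteria.

```python
import re
from typing import List

# Assumed separators, per the acceptance criteria: comma and whitespace.
_SEPARATORS = re.compile(r"[,\s]+")


def split_into_pills(text: str) -> List[str]:
    """Split raw field input into pill labels, dropping empty pieces."""
    return [piece for piece in _SEPARATORS.split(text) if piece]


def on_enter(pending_text: str) -> str:
    """Decide what Enter does: create a pill if there is text, else submit."""
    return "create_pill" if split_into_pills(pending_text) else "submit"


print(split_into_pills("fisch hase"))   # ['fisch', 'hase']
print(split_into_pills("fisch, hase"))  # ['fisch', 'hase']
print(on_enter(""))                     # submit
```

Quotation marks are not separators in this sketch, so an input such as `"test"` would become a single pill with the quotes kept.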

Event Timeline

Restricted Application added projects: TCB-Team, Design. · Nov 24 2017, 12:05 PM
Restricted Application added a subscriber: Aklapper.
Lea_WMDE triaged this task as Normal priority. · Jan 18 2018, 8:36 AM
Lea_WMDE updated the task description.
Lea_WMDE moved this task from Text stuff to In FUN sprint on the Advanced-Search board.
Lea_WMDE renamed this task from Find better input method for "These words" and "Not these words" to Find better input method for "These words", "Not these words", "One of these Words" and "Exactly this text". · Jan 18 2018, 8:48 AM
Lea_WMDE updated the task description. · Jan 18 2018, 10:08 AM
Lea_WMDE updated the task description. · Jan 18 2018, 10:14 AM
Lea_WMDE set the point value for this task to 5.

@Lea_WMDE @Charlie_WMDE

  • "Exactly this text" behaves exactly like the main search bar technically

What should happen when open quotation marks are entered in this field but not closed?
For example we enter the following: "this is what I want to search for
Right now we would display the wrong search results, because everything from other fields gets added to the search query if the quotes are not closed.
We would offer two possible solutions:

  1. Display an error message saying that the search query was entered incorrectly.
  2. Keep count of the quotation marks entered and, if the count is odd, add the missing quotation mark.

Or if you have any other solutions in mind we're happy to hear them.
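
As a rough sketch of the two options above (the helper names are hypothetical, and escaped quotes are deliberately ignored), counting the double quotes is enough to either flag the input or append the missing quotation mark:

```python
def has_unclosed_quote(text: str) -> bool:
    """Option 1: detect an odd number of double quotes so an error can be shown."""
    return text.count('"') % 2 == 1


def close_unclosed_quote(text: str) -> str:
    """Option 2: append the missing closing quote when the count is odd."""
    return text + '"' if has_unclosed_quote(text) else text


print(has_unclosed_quote('"this is what I want to search for'))
# True
print(close_unclosed_quote('"this is what I want to search for'))
# "this is what I want to search for"
```

Appending the quote at the end is an assumption; the user may have meant to close the phrase somewhere else, which this sketch cannot know.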

Change 405290 had a related patch set uploaded (by Tonina Zhelyazkova; owner: Tonina Zhelyazkova):
[mediawiki/extensions/AdvancedSearch@master] Change the behavior of search term fields.

https://gerrit.wikimedia.org/r/405290

@Lea_WMDE @Charlie_WMDE
We can't make up our minds about another use case :)
What do we want to happen when the user puts quotes around text entered in the pill fields "These words" "None of these words" and "One of these words"?
Do we disregard the quotes? Do we leave them in? Confusion. I suppose the search result for <"test"> and <test> is different.
What do you think?

@Tonina_Zhelyazkova_WMDE, thanks for the questions! To:

  1. Wrong number of quotation marks in "exactly that text": You raise very valid points, @Charlie_WMDE and I should discuss this more. As part of this task, don't do any exception handling, though.
  2. User adding quotes to another field: Am I right to assume that if you treated them like any other sign, search would treat them the same way as exactly that text, and thus disregard stemming etc.? If yes, I don't think we should do any extra treatment. Our goal is to support the keywords that exist, and we basically don't aim to be smarter (which often means making things more complicated). So just take the input as it is if possible.
gabriel-wmde added a subscriber: gabriel-wmde. · Edited · Jan 23 2018, 9:56 AM

Please keep the following in mind when discussing this: If there is an odd number of quotes ("unclosed quotes"), they could interact with other fields in unexpected ways. For example an unclosed quote in "These words" or "Exactly this text" will make the - and OR keywords added by the other fields meaningless, suppressing their functionality without any user feedback (except unexpected search results).

Change 405290 merged by jenkins-bot:
[mediawiki/extensions/AdvancedSearch@master] Change the behavior of search term fields.

https://gerrit.wikimedia.org/r/405290

@Lea_WMDE @Charlie_WMDE

  • "Exactly this text" behaves exactly like the main search bar technically

What should happen when open quotation marks are entered in this field but not closed?
For example we enter the following: "this is what I want to search for
Right now we would display the wrong search results, because everything from other fields gets added to the search query if the quotes are not closed.
We would offer two possible solutions:

  1. Display an error message saying that the search query was entered incorrectly.
  2. Keep count of the quotation marks entered and, if the count is odd, add the missing quotation mark.

Or if you have any other solutions in mind we're happy to hear them.

@Tonina_Zhelyazkova_WMDE Sorry for the late response.

The way to behave here is to do nothing. If there's an additional quotation mark, it's just part of the text. So let's say someone types "hello world" "good morning: then "hello world" is processed regularly with quotation marks and the rest is just treated like in "These words". So basically as if the whole thing had been typed into the main upper search bar.

@Lea_WMDE @Charlie_WMDE
We can't make up our minds about another use case :)
What do we want to happen when the user puts quotes around text entered in the pill fields "These words" "None of these words" and "One of these words"?
Do we disregard the quotes? Do we leave them in? Confusion. I suppose the search result for <"test"> and <test> is different.
What do you think?

We leave them in. If it was a mistake they will quickly see it because the quotation marks will be clearly visible in the pill. There are cases where you would want to look for something with quotation marks and we want to keep this option open for the user. As in the scenario above, we can think about whether we would want to add a notification text in the form to notify the user of the potentially different search results, but that's for another ticket.

Please keep the following in mind when discussing this: If there is an odd number of quotes ("unclosed quotes"), they could interact with other fields in unexpected ways. For example an unclosed quote in "These words" or "Exactly this text" will make the - and OR keywords added by the other fields meaningless, suppressing their functionality without any user feedback (except unexpected search results).

@gabriel-wmde Can you please elaborate on that? How would they interfere? Why are the interferences unexpected? Are they not reproducible and documented somewhere? The unclosed quotes in "These words" or "Exactly this text" will always be attached to a word like "hund or so, because otherwise it would not be together in the pill.

If this is really the case then we should file a ticket for that because that seems to be a bug. An unused quote should not be suppressing anything normally, right? Very interested to hear more details on this. And thanks for the heads up.

A more concrete example (where the quotes were inserted erroneously due to the proximity of the keys) :

These words: stew"
Not these words: meat

The expectation would be to get a list of all pages that contain "stew", but not "meat", where the query would be stew -meat. Instead, the query will be stew" -meat, where the code which processes the query will "helpfully" remove everything that is not valid and we end up with all pages that contain "stew" **and** "meat", about 1500 fewer.
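
To make the concatenation concrete, here is a minimal sketch of how the pill fields might be joined into one query string. It assumes, based only on this discussion, that "Not these words" entries get a `-` prefix and "One of these words" entries are joined with `OR`; the function name `build_query` and the field order are hypothetical, not AdvancedSearch's actual implementation.

```python
from typing import List


def build_query(these_words: List[str],
                not_these_words: List[str],
                one_of_these_words: List[str]) -> str:
    """Concatenate the pill fields into one query string (assumed format)."""
    parts = list(these_words)
    # Assumption: each excluded word is prefixed with "-".
    parts.extend("-" + word for word in not_these_words)
    # Assumption: alternatives are joined with the OR keyword.
    if one_of_these_words:
        parts.append(" OR ".join(one_of_these_words))
    return " ".join(parts)


intended = build_query(["stew"], ["meat"], [])
actual = build_query(['stew"'], ["meat"], [])
print(intended)  # stew -meat
print(actual)    # stew" -meat

# The stray quote is not confined to its own field: the combined
# query now contains an odd number of double quotes.
print(actual.count('"') % 2 == 1)  # True
```

Because the fields are joined without any per-field sanitizing, a quote typed into one field ends up in the same string as the `-` and `OR` keywords generated from the others.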

Why are the interferences unexpected? Are they not reproducible and documented somewhere?

IMO they are unexpected because we're effectively splitting up one field, where the keywords and modifiers are typed sequentially and extraneous quotes can be counted, into four fields that are implicitly concatenated, but still, quotes can affect the behavior of other fields. The split into fields could suggest to users that their content is "sanitized" individually (meaning stew" is converted to stew), but it isn't.
The documentation is only for the query syntax and not really applicable to AdvancedSearch. Documenting the effects of quotes in the field hints would take 2-3 paragraphs and more examples; it's kind of "Advanced Advanced Search".

Still, my example feels very contrived, and I think being able to write whatever syntax you want into the text fields (and gaining more power through that) outweighs the drawback of accidentally searching for the wrong thing and wondering why, because those "accidents" are either typos or someone deliberately trying to construct a very advanced query.

BTW, - and OR in "these words" are also not escaped currently and could be used to construct even more complex queries.

Pablo-WMDE moved this task from Deploy to Done on the WMDE-Fundraising-Sprint-15 board.

Thank you for the explanation, @gabriel-wmde.

I have changed the placeholder text and the information in the infotext. The ticket can be found here: T189377

Lea_WMDE closed this task as Resolved. · Mar 23 2018, 12:26 PM