Page MenuHomePhabricator

The prefix keyword should augment and not override the list of requested namespaces
Closed, ResolvedPublic

Description

In T193392 an attempt to align the prefix keyword with common practices used by cirrus keyword was made by emitting warning to the user stating that the namespace requested is not compatible with the list of selected namespace.
This approach caused many custom search boxes (using the InputBox extension) to fail.

Overriding the list of namespaces is not a sane design and will cause issues later on when we'll improve the search query syntax.
I suggest another approach to make this keyword better fit future development:

  • at parse time the prefix keyword will inform the parser that it needs additional namespace to function properly
  • the parser will then decide what to do with this
    • if the namespace was already requested: noop
    • if the namespace was not requested: add it to the list requested namespaces

Since this behavior is unique to the prefix keyword I suggest adding a special key value that the parser can inspect when calling KeywordFeature::parseValue rather than promoting this behavior at the API level.

Impact:

  • existing queries should work as before
  • will allow to support OR between this keyword and other keywords (not possible if the namespace is overridden)

Event Timeline

dcausse triaged this task as Medium priority.May 29 2018, 8:41 AM
dcausse created this task.

Change 436031 had a related patch set uploaded (by DCausse; owner: DCausse):
[mediawiki/extensions/CirrusSearch@master] Prefix keyword should append its namespace not override the initial ones

https://gerrit.wikimedia.org/r/436031

I am not sure whether it's the same thing as the patch is fixing, but I think we have been mixing two things here:

  1. Global query filter, which is applied to all query regardless of query strings, etc. This is what should be set from namespace configuration coming from URL or other places, and I don't think query features should change it. This should also be used to control search behavior (e.g. for Wikidata etc.) and other global things.
  1. Namespaces for specific keywords. This should produce a local filter, which can be either AND-ed with filters provided by other filter keywords, or negated, or OR-ed, or combined any other way you like. For some queries, the effect would be the same as modifying the original namespace filter, but for some it won't be.

Separating those I think would clear up the matters somewhat. It can create conflicts - like having disjoint filters in prefix: and in global namespace setup - i.e. you can ask Property: search on Wikidata in query params and then say prefix:Help:Foo in the query string, and in that case the query would fail. I think it's OK.

I am also not sure that if you ask a namespace via prefix: that wasn't given in the URL we should add it - did it work before? If so, we may want to add the prefix-generated namespaces into the global filter, maybe.

Initially I wanted to change how prefix works by disallowing any changes to the original list of namespace requested from the URL. Sadly this caused many components to fail and I had to revert. I completely agree with you that this behavior makes it difficult for us to make some decisions based on the list of namespaces.
Fixing the root cause by changing the behavior of the prefix: keyword seems hard to achieve in a reasonable amount of time. I'm not saying that it's impossible.

The strategy here is to open the initial list of request namespaces to the ones that the prefix keyword wants:

  • user requests NS_MAIN but type prefix:Help:Foo, the list of requested namespaces will be [ NS_MAIN, NS_HELP ], the filter generated by the prefix keyword will be : [ 'title': 'Foo', namespace: NS_HELP ].
    • previously we simply overwrote the list of namespaces to [ NS_HELP ], forgetting the original NS_MAIN
    • the patch that was reverted emitted a warning and saying that these namespaces are incompatible and asked the user to align the list of namespaces to the one targeted by the prefix keyword. This caused too much trouble on the current wiki workflows.
  • user requests NS_MAIN but type prefix:Foo, the list of requested namespaces will remain unchanged.
  • user requests NS_MAIN but type prefix:all:Foo, the list of requested namespaces will be set to null (by cirrus standards it means all namespaces)

After the refactoring the original list of namespaces will still be available and should remain unchanged but the ParsedQuery will have a getter that returns the additional namespaces found in the syntax that the query building components should take into account to generate a valid query.

Overall I think what you suggest is exactly what I plan to do.

user requests NS_MAIN but type prefix:Help:Foo, the list of requested namespaces will be [ NS_MAIN, NS_HELP ]

This makes sense, but this may change behavior with specialized searches like Wikidata ones. Fortunately, prefix: seems to be mostly useless in Wikidata per se, but if we have namespace where prefix: is useful, it would require to do combined search instead of just local namespace search. Maybe not a big deal, as you can always set the namespace in the search box...

So for now we don't really distinguish between the initial namespaces and the namespaces found in the query.
In the current wikibase situation we fallback to cirrus builder anyways in the presence of keywords so this won't be an issue short term.

Later I agree that we will have to make an important decision:

  1. should all the decisions (about choosing the appropriate query builders) be based solely on the initial list of namespaces, this means also that the parser should not be able to parse keywords that may change the list of namespaces. E.g. when searching on [ NS_ITEM, NS_PROPS ] on wikibase we will spawn a special parser that is only meant to support syntax supported by your query builder.
    • choose all components based on the initial context i.e. requested namespaces (Parser/QueryBuilders/...) => parse => build query
    • drawback: the syntax may slightly change depending on the initial namespace
    • benefit: guarantee that the parsed query is compatible with your builders, in other words: no need to fallback
  2. should we defer all important decisions after parsing, this means we have a single parser and the logic to decide between query builders will be based on the initial set of namespaces + the ParsedQuery (AST).
    • parse the query => decision to choose builders based on inititial context + the parsed query => build query
    • drawback: more complex code to analyze the parsed query, needs a mechanism to fallback
    • advantage: a single syntax

I doubt that from a user POV option 1 is acceptable, it may be very confusing to have a varying syntax depending on the initial set of namespaces.

(This discussion should really go to T189880 )

I doubt that from a user POV option 1 is acceptable, it may be very confusing to have a varying syntax depending on the initial set of namespaces.

I agree. It may also be confusing for us, as it'd be tricky to figure out what is happening in the code in each particular query. I think we need to parse first, and then decide which query builder to use to build this query, in accordance with parsing results.

Change 436031 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Prefix keyword should append its namespace not override the initial ones

https://gerrit.wikimedia.org/r/436031

Vvjjkkii renamed this task from The prefix keyword should augment and not override the list of requested namespaces to 03baaaaaaa.Jul 1 2018, 1:07 AM
Vvjjkkii reopened this task as Open.
Vvjjkkii removed dcausse as the assignee of this task.
Vvjjkkii raised the priority of this task from Medium to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot renamed this task from 03baaaaaaa to The prefix keyword should augment and not override the list of requested namespaces .Jul 2 2018, 3:11 AM
CommunityTechBot closed this task as Resolved.
CommunityTechBot assigned this task to dcausse.
CommunityTechBot lowered the priority of this task from High to Medium.
CommunityTechBot updated the task description. (Show Details)
CommunityTechBot added subscribers: gerritbot, Aklapper.