
Special:Search creates broken links by dropping context
Open, Medium, Public

Description

Steps to reproduce

  • Open Special:Search@GermanWikipedia in advanced mode.
  • Enter the following search term: incategory:Vorlage:Fremdsprachenunterstützung insource:/invoke:Vorlage:lang\|full/
  • Make sure no particular namespace is specified.
  • Query.
  • 20 of 223 results should be shown.
  • Note that by nature of incategory all results are from template namespace.
  • In advanced mode all namespaces are searched.
  • Now try to get the next 20 from the result set, or extend the number of hits to 50, 100, 250 or 500.
  • Clicking any of these links no longer yields any results.
  • Affected:

    Show (previous 20 | next 20) (20 | 50 | 100 | 250 | 500)

Reason

Neither &profile=advanced nor any namespace information is included in the URL offered for subsequent pages.

  • Therefore the follow-up searches ns=0 only.
  • All results are expected in ns=10.

Remedy

The offered next/previous/more links must preserve the relevant settings from the current page URL, e.g. &profile=advanced or any &ns= parameter that might influence the result set.
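A minimal Python sketch of this remedy, assuming the pagination link is derived from the current page URL: every query parameter is copied over and only `limit` and `offset` are replaced. The helper name `paging_url` is hypothetical; this is not how MediaWiki's SpecialSearch.php actually builds its links.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def paging_url(current_url: str, offset: int, limit: int) -> str:
    """Build a next/previous/more link that keeps the context of the current page.

    Hypothetical helper: all parameters of the current URL (search, profile,
    ns0, ns10, ...) are kept; only limit and offset are replaced.
    """
    parts = urlsplit(current_url)
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ("limit", "offset")
    ]
    params += [("limit", str(limit)), ("offset", str(offset))]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(params), ""))


first = ("https://de.wikipedia.org/w/index.php?search=insource%3A%2Ffoo%2F"
         "&title=Spezial%3ASuche&profile=advanced&fulltext=1")
print(paging_url(first, offset=20, limit=20))
```

Copying everything rather than whitelisting known parameters means that a parameter added later (or by an extension) cannot silently drop out of the pagination links.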

Task has been rewritten

At first, an escaping/encoding/decoding problem was assumed.

  • It turned out that this was not the cause.
  • It just happened that suspicious characters were present in the search expression when the behaviour was first encountered.

Please ignore discussion before 16 December.

Event Timeline

Restricted Application added a subscriber: Aklapper.

@PerfektesChaos the URL attached to the task description leads to no results, and thus it's impossible to check the behavior you describe, as no search links are available.
Could you indicate the original search query you used and in which input box you typed it (if you used any of the new advanced input boxes)?
While trying to search for insource:/\|NAME-AMTSSPRACHE/ I found nothing wrong with the generated links (more results / next | previous page). Part of the search query is URL-encoded, but this is expected as long as the original search query is kept intact in the search box shown in the interface.

@PerfektesChaos the URL attached to the task description leads to no results

Well, if I click the link above I get 20 hits, out of 222 in total.

incategory:"Vorlage:Fremdsprachenunterstützung" insource:/invoke:Vorlage:lang\|full/

@PerfektesChaos I cannot reproduce, searching for incategory:"Vorlage:Fremdsprachenunterstützung" insource:/invoke:Vorlage:lang\|full/ leads to no results.
It might be because you set preferred namespaces to search by default? Could you check whether that is the case and verify that the list of namespaces is properly carried over when you click on next page? Perhaps the bug is there: are some saved namespaces lost when you click on such links?

I don't get results initially, but after clicking the "Alle" checkbox I do get 223 results. Try this link. :)

@Aklapper thanks, but can you reproduce the bug after that: empty results when clicking "next page"?

Hah, got distracted and then didn't perform the rest of the steps. No, I cannot reproduce the next steps on Firefox 63. Which browser was this tested with?

Okay, it is not an encoding issue, but undesired behaviour anyway.

  • The search is meaningful for ns=10 only.
  • Explicitly requesting this namespace makes the subsequent 20 50 100 250 500 links work.

Entering the search term without namespace specification into Special:Search form gives me 20 of 223 matching pages.

  • Yeah, fine.
  • Now I want to get the next 20, but the chain is broken.
  • Either the first search attempt fails, since it is limited to ns=0, and then I have to specify a namespace. This is reasonable (old behaviour, iirc?).
  • Or I get 20/223 results. Then the subsequent pages should let me scroll through this particular result set.
  • Currently, after entering the expression into the form, all namespaces seem to be searched, but the follow-up links do not search them.

Ah, and by the way, there are user preferences for search. Mine is cirrussearch-completion-profile-classic-pref-desc. This may or may not influence which namespaces are implicitly considered.

Oops, meanwhile I changed preferences. I have no idea which of the four search options I was assigned when I opened this task. I do not recall ever changing them, but I have been on-wiki for a decade.

@Aklapper Hi, Master of Ceremonies, would you recommend closing this task as invalid and reopening it with a new title and description, since the encoding cause turned out to be a wrong assumption?

  • I failed to pin the diverging behaviour on any user preference of my account.
  • Perhaps some kind of cookie is involved?
  • I always get 20 of 223 results on the first attempt.
  • User preferences were checked, no influence.

A new task description follows, replacing the old one.

I have used these links for years. It certainly worked as expected some weeks or months ago. Therefore this is also a Regression case, and something went wrong in recent major code changes.

@PerfektesChaos: No strong opinion here - feel free to update the task summary and update the task description and add Regression, I'd say :) (and thanks for investigating!)

PerfektesChaos renamed this task from Special:Search creates broken links by too much encoding to Special:Search creates broken links by dropping context. Dec 16 2018, 10:07 AM
PerfektesChaos removed a project: Discovery-Search.
PerfektesChaos updated the task description.

I have used these links for years. It certainly worked as expected some weeks or months ago.

I still can't reproduce:

  • Open Special:Search@GermanWikipedia in advanced mode.
    • OK
  • Enter the following search term: incategory:Vorlage:Fremdsprachenunterstützung insource:/invoke:Vorlage:lang\|full/
    • OK
  • Make sure no particular namespace is specified.
    • What does this mean? I have "Standard" selected (which is apparently Artikel). Should I unselect "Standard"?
  • Query.
    • OK
  • 20 of 223 results should be shown.
    • I see 0 results here (whether I keep Standard selected or remove it).

I have to select Vorlage explicitly to see any results, but then if I click next page the namespace selection is properly kept (with or without Advanced Search enabled). I don't see anything wrong.

Could you provide us with the following information:

  • whether you have the Advanced Search interface enabled or disabled
  • whether you can still reproduce the bug in your browser's incognito mode (basically, making sure you are not logged in)
  • paste the search URL just after hitting "search" the first time
  • paste the search URL just after clicking next page

Thank you.

  • &profile=advanced is perhaps ten years old and offers a selection of namespaces; if none is selected, all namespaces are searched.
  • &profile=default is the opposite: small classic form, but only ns=0 is searched.
  • There are also &profile=images and &profile=all; the latter searches all namespaces from the small form.

The behaviour is identical for the classic form, for the recently introduced Extension:AdvancedSearch support, and with that advanced feature switched off, which became possible a week ago.

In the URL mentioned, nothing is selected.

You might have something selected by default, which does not help to reproduce the issue. I have no idea what Standard might mean; in English, the article namespace is labelled (Main).

URL as asked (classic mode):
https://de.wikipedia.org/w/index.php?search=incategory%3AVorlage%3AFremdsprachenunterst%C3%BCtzung+insource%3A%2Finvoke%3AVorlage%3Alang\|full%2F&title=Spezial%3ASuche&profile=advanced&fulltext=1

Subsequent links are broken, since they do not contain &profile=advanced.

Same story with the recent Extension:AdvancedSearch enabled:
https://de.wikipedia.org/w/index.php?search=incategory%3AVorlage%3AFremdsprachenunterst%C3%BCtzung+insource%3A%2Finvoke%3AVorlage%3Alang\|full%2F&title=Spezial%3ASuche&profile=advanced&fulltext=1&advancedSearch-current={}

The follow-up link at offset=20 does not contain the profile:
https://de.wikipedia.org/w/index.php?title=Spezial:Suche&limit=20&offset=20&search=incategory%3AVorlage%3AFremdsprachenunterst%C3%BCtzung+insource%3A%2Finvoke%3AVorlage%3Alang\|full%2F&advancedSearch-current={}
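The difference between the two URLs above can be checked mechanically. A small Python sketch (standard library only; the function names are illustrative) that lists the query parameters present on the first result page but missing from the follow-up link:

```python
from urllib.parse import parse_qs, urlsplit


def param_names(url: str) -> set[str]:
    """Names of all query parameters in a URL."""
    return set(parse_qs(urlsplit(url).query, keep_blank_values=True))


def dropped_params(first_url: str, followup_url: str) -> list[str]:
    """Parameters present on the first result page but missing from the follow-up link."""
    return sorted(param_names(first_url) - param_names(followup_url))


# The two URLs quoted above (raw strings, because they contain a literal backslash).
FIRST = (r"https://de.wikipedia.org/w/index.php?search=incategory%3AVorlage%3AFremdsprachenunterst%C3%BCtzung"
         r"+insource%3A%2Finvoke%3AVorlage%3Alang\|full%2F&title=Spezial%3ASuche&profile=advanced"
         r"&fulltext=1&advancedSearch-current={}")
FOLLOWUP = (r"https://de.wikipedia.org/w/index.php?title=Spezial:Suche&limit=20&offset=20"
            r"&search=incategory%3AVorlage%3AFremdsprachenunterst%C3%BCtzung"
            r"+insource%3A%2Finvoke%3AVorlage%3Alang\|full%2F&advancedSearch-current={}")

print(dropped_params(FIRST, FOLLOWUP))  # → ['fulltext', 'profile']
```

The follow-up link gains `limit` and `offset` but loses `profile` (and `fulltext`), which matches the observation that the next page is served with the default profile.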

Sorry, this puzzles me, but I fail to reproduce it...

Please let us know if you can reproduce the issue while logged out. If you can't, we might want to investigate what kind of user settings may cause this behavior.

Easy way to reproduce: Switch off JavaScript.

  • The wiki needs to work without JS, e.g. for security reasons.
  • Even without JS, the basic server functionality has to be delivered.

Apparently, major changes came into effect one month ago, together with the AdvancedSearch extension.

One drastic modification:

  • What if no namespace is specified explicitly?
  • Behaviour for a decade has been that all namespaces are searched.
  • Now no namespace, therefore nothing is searched.
  • That is ridiculous and not intended by users.

The behaviour of the form changed, at least in &profile=advanced mode.

  • The old behaviour was to leave the checkboxes as they are.
  • Now something ticks the article (ns=0) option unconditionally if no namespace is selected.
  • I do not want this.
  • If I want to search in the project, template or category namespace, I do not want some automatism to pollute my result set with the article namespace.
  • This is patronizing. No algorithm should tick options I do not ask for.
  • With the AdvancedSearch extension, an unwanted article selection appears in the selection area, even if my business is back-office programming and I do not want other pages.
  • It is hard to get rid of the unwanted article selection without first specifying another namespace. That is confusing.
  • I should be free to select no namespace instead of being forced to search in ns=0, and if I do not ask for any particular namespace, all namespaces should be searched again, as before.

I have now tested for nearly an hour with various browsers and user environments.

  • In my usual experimental and development setup, some events apparently collided and blocked the entirely new automatisms.
  • After changing browser and user profile, other observations could be made.

I failed to find any documentation of the modified search behaviour, nor any communication or announcement of these changes to the wiki communities.

Pinging the AdvancedSearch extension folks so that they can weigh in.

  • What if no namespace is specified explicitly?
  • Behaviour for a decade has been that all namespaces are searched.
    • It seems impossible not to select something: whenever the list of requested namespaces is not provided, it is taken from the user settings, which use the wiki default if no customizations have been made. I don't see anything (yet) that changed in that respect. I am probably overlooking something, but since I fail to reproduce the issue it is getting hard to understand where to look.
  • Now no namespace, therefore nothing is searched.
    • If no namespaces are requested, the default ones should be used. I suspect that your settings got corrupted somehow; you did not tell us whether you can reproduce the bug when logged out (i.e. whether this happens only with your current user).
  • That is ridiculous and not intended by users.

Thanks for your investigations and your help.

The traditional approach has been:

  • &profile=default → ns=0
  • &profile=images → ns=6
  • &profile=all → every namespace of the project, each listed in the URL
  • &profile=advanced → if no ns specified, then all ns, otherwise those explicitly mentioned.

URLs have been collected over the years on project pages and in various discussions. These need to keep working, and they rely on the previous behaviour.

This situation cannot be reproduced interactively now with JavaScript enabled in regular browsers, since the modified script always attempts to select at least one namespace, and even if this is unwanted, that namespace is ns=0.

However, the PHP response of the server changed. With &profile=advanced but no ns=, all namespaces used to be searched; today none are. This applies to any URL, whether produced by the interactive form or taken from an existing page or bookmark. And this response is entirely independent of browser, JS or preferences.

BTW, no saved user namespace query has ever been involved in this task.

Thanks, could we reformulate the problem as:

When using the advanced profile and the user explicitly selects no namespaces, which namespaces does Special:Search assume?

  • expected behavior: all namespaces are requested.
  • actual behavior: the default list of namespaces is requested (default here means the wiki defaults, or the user defaults if present)

At a glance I don't see anything in the code indicating that this behavior changed recently, but I will continue to investigate in this direction.

Do we agree that the generated search links are not problematic and that we can rename the ticket to reflect this?

Thank you!

I am not sure about renaming the ticket.

On the question about the previous behaviour when no namespace is given:

  • I did a quick search in German Wikipedia and found many URLs that were supposed to show results, even results outside the default article namespace, without providing an &ns= parameter or mentioning a namespace in the search expression.
  • They were referred to in discussions, and participants mentioned the number of hits produced at that time.
  • Even on my personal talk page I found such a URL, dumped some weeks ago:

https://de.wikipedia.org/w/index.php?search=insource%3A%2Fref+name%3D%5C%22%5Chttp%3A%2F&title=Spezial:Suche&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D

  • Today this yields 15 matches, in article talk and user namespaces only, since such things have meanwhile been removed from dozens of articles.
  • JavaScript should be disabled to check the server response without smart algorithms intervening.
  • If the result is artificially narrowed to &limit=10, the page reloaded, and the next 10 requested, the PHP-generated URL is broken: &profile=advanced is missing in the subsequent link, so the next page assumes profile=default:

https://de.wikipedia.org/w/index.php?title=Spezial:Suche&limit=10&offset=10&search=insource%3A%2Fref+name%3D%5C%22%5Chttp%3A%2F

  • Some time ago it was possible to scroll through the result set, since &profile= was preserved.
  • This seems to me a pure PHP issue.
  • Personally, I do not use user defaults for namespaces, and I do not know what the wiki defaults look like for a Wikipedia.

Therefore, to answer the question about renaming the task: no, not for now. The circumstances are too mysterious overall: PHP, JavaScript, recent changes, but which ones? What are the causes? And where do the automatic namespace ticks on the form come from? Very confusing.

Lea_WMDE subscribed.

Hi all,
I double-checked the role of Advanced-Search and I don't think it is related: the line of code that decides which namespaces are used if no namespaces are selected is from 2016. The current behavior is that in case of no namespaces the user preferences for namespaces are selected (which is the wiki preferences if no user preference exists I think, and that is (Article) in de-wiki, (main) in en-wiki).
But just to be super sure: @PerfektesChaos, does the behavior that you described (not being able to see the full result list beyond 20) happen no matter whether Advanced-Search is enabled or not?

https://de.wikipedia.org/w/index.php?search=insource%3A%2Fref+name%3D%5C%22%5Chttp%3A%2F&title=Spezial:Suche&profile=advanced&fulltext=1&advancedSearch-current=%7B%7D

  • For me, this yields 15 matches, in article talk and user namespaces only.
  • I presume you do not see any match?

@Lea_WMDE current behavior is that in case of no namespaces the user preferences for namespaces are selected (which is the wiki preferences if no user preference exists I think, and that is (Article) in de-wiki

Well, regarding the line of code in rMW /includes/specials/SpecialSearch.php:

# Fallback to user preference
$nslist = $this->searchConfig->userNamespaces( $user );

Please note that the current preferences page does not offer any namespace choice.

  • The only search options there that affect PHP page generation concern title spelling.

Let us step back in history: the local help page as of 2014

  • Please look for: Die Namensräume, in denen du suchen willst, wenn du keine abweichenden Angaben machst. Standardwert: Artikel deaktiviert Anfang 2014
  • That reads as: the namespaces you want to search unless you specify otherwise. Default: article. Deactivated at the beginning of 2014.

It turns out that specific options are in effect for me that I cannot control, that I am not informed about, and that I did not even know were in effect.

  • Recent visitors like you did not set this option, and I assume you were not active before 2014.

If I refer to a URL whose result is not influenced by well-known personal preferences the addressee might set, I expect everybody to see the same result for that URL at around the same time.

  • I would call this a mess.
  • Remedy: the influence of the option that existed until 2013 needs to be removed from the code.
  • Otherwise, if rMW core is to keep supporting a preference that is no longer present in rMW core, all WMF user preferences need to be cleaned up by a server script.
  • This is the first step (of three) towards reorganizing this confusing namespace strategy.
  • At least it makes results unique, reproducible and predictable.

I'm not saying there is no bug. But I tried to reproduce it as described and wasn't able to. Neither with nor without JavaScript. I do have some questions, though:

  • In one post it is said that "&profile=default → ns=0". As far as I'm aware, this is not entirely correct. The "default" profile only defaults to the main namespace for anonymous and new users, as well as for users who never changed their default. Both the Advanced-Search interface and the original non-JavaScript interface provide a checkbox to "Remember selection for future searches". This changes the "default" profile for the user.
  • The current task description asks to "Make sure no particular namespace is specified" when clicking the search button. The assumption seems to be that something reproducible would happen then. But this is not the case. Instead, the user's default will be used, whatever that is.
  • The task description goes on to claim "that by nature of incategory all results are from template namespace". I'm not aware of such a feature. There are some keywords that are aware of namespaces, but incategory: is not one of them. The category name "Vorlage:Fremdsprachenunterstützung" in the example is just that, a category name. No code is aware that the substring "Vorlage:" refers to a namespace with the same name.

This leads to the question of what your current default profile looks like, @PerfektesChaos.

For me, this yields 15 matches, in article talk and user namespaces only. I presume you do not see any match?

Indeed. According to the link, no namespace selection was made, which means the user's default will be used. For me this is the main namespace, and there are zero results.

As far as I'm aware, Special:Search has behaved like this for a long time, long before Advanced-Search.

HNY.

First, I never used the “remember” feature, and I have only now learnt that my old setting is reused by the advanced form. I simply did not know that a preference I set a decade ago, and which is no longer shown in the preferences, still has an influence. On the Special:Search advanced form, the int:powersearch-remember checkbox is not ticked and does not tell me that I am reusing anything I asked to be remembered.

From APIs we know RESTful web services.

  • In a nutshell, this means that a single URL should contain all the information needed to reproduce the query.

If a search URL is mentioned on some page, stored as a bookmark or anywhere else, the result set at a given time has to be more or less the same for everybody.

  • In a discussion, every participant should be talking about the same thing, independent of personal configuration.
  • Presentation, decoration, initial sorting and so on may vary, but significant content must not be affected.
  • Only external changes may influence the result set.
  • Editing a page and removing a category or changing keywords are external influences.
  • The search index is updated some minutes earlier or later; that is what “more or less” means.

It also means that any user preference, such as exact title spelling, has to be part of the generated URL.

  • No user preference of whoever is currently opening that forwarded URL may influence the result set.
  • That has been the original sin of the historic approach.
  • Two people look at the same URL result simultaneously and talk about different things, since it is the same URL but not the same URL result.

Now returning to our namespace issue.

  • The URL is expected today to contain some explicit &ns settings.
  • For &ns42=1 there are two ways to deal with it:
    • It could be ignored and skipped.
    • If someone asks for matches in a non-existing namespace, it is reasonable for the result set to be empty.
  • If evaluating the explicit namespaces yields no (valid) namespaces, a fallback comes into effect (which also preserves URLs collected over a decade on project pages):
    • &profile=default ns=0 (site default; usually configured as the major content namespaces)
    • &profile=images ns=6
    • &profile=all every ns
    • &profile=advanced every ns
    • no valid &profile= means profile=default
  • A mismatch might occur: three explicit &ns and &profile=all.
    • I would drop the profile, since this is a manually edited URL.
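The fallback rules proposed above can be sketched in Python as a function of the URL parameters alone, with no user preference consulted. The namespace table, the `resolve_namespaces` name and the choice to skip unknown namespaces (the first option for `ns42=1`) are assumptions for illustration, not existing MediaWiki behaviour.

```python
# Hypothetical namespace table; a real wiki also defines extension namespaces.
EXISTING_NS = frozenset(range(16))  # 0 (main) ... 15 (category talk)
CONTENT_NS = frozenset({0})         # assumed site default for profile=default


def resolve_namespaces(params: dict[str, str]) -> frozenset[int]:
    """Namespaces to search, derived from the URL alone (no user preferences).

    Sketch of the proposed fallback rules; unknown namespaces such as
    ns42=1 are silently skipped (the first of the two options).
    """
    explicit = {
        int(name[2:])
        for name, value in params.items()
        if name.startswith("ns") and name[2:].isdigit() and value == "1"
    }
    explicit &= EXISTING_NS
    if explicit:
        # Explicit namespaces win; a conflicting profile (e.g. profile=all) is ignored.
        return frozenset(explicit)
    profile = params.get("profile", "default")
    if profile == "images":
        return frozenset({6})
    if profile in ("all", "advanced"):
        return EXISTING_NS
    # profile=default and any unknown profile value.
    return CONTENT_NS
```

Choosing the second option for `ns42=1` (an empty result set) would amount to skipping the intersection with `EXISTING_NS`, so that the unknown namespace is searched and simply matches nothing.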

The headline of this task is still valid.

  • While the page URL asks for &profile=advanced, the subsequent (previous 20 | next 20) (20 | 50 | 100 | 250 | 500) links drop &profile=, and the follow-up page assumes profile=default.

&profile=default ns=0 […]
&profile=advanced every ns

As I tried to explain, this is not what happens, and as far as I know it never happened. Both the "default" and the "advanced" mode load the user's preferred namespaces when no other namespace selection is provided.

I dug into the relevant code, and found this: https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/specials/SpecialSearch.php$621. It means:

  • The pagination links preserve the &profile=… parameter in all modes except the "advanced" mode.
  • In "advanced" mode, a series of &ns…=1 parameters is added instead. The profile parameter is omitted. This is done because the user is expected to manually select one or more namespaces.
  • If no namespace is selected, both the search results as well as the pagination links all fall back to "default" mode, which is achieved by omitting the profile parameter.
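The three points above can be modelled roughly as follows. This Python sketch is reconstructed from the description only, not ported from SpecialSearch.php; the function name and the dictionary shape are illustrative assumptions.

```python
def current_paging_params(profile: str, selected_ns: set[int], search: str,
                          offset: int, limit: int) -> dict[str, str]:
    """Rough model of the pagination links as described above (not the real PHP)."""
    params = {"search": search, "limit": str(limit), "offset": str(offset)}
    if profile == "advanced":
        # Advanced mode writes out the selected namespaces and omits the profile.
        for ns in sorted(selected_ns):
            params[f"ns{ns}"] = "1"
    else:
        # Every other mode keeps its profile parameter.
        params["profile"] = profile
    return params


# Advanced mode with no namespace ticked: the link carries neither profile nor ns,
# so the next page falls back to the "default" profile (user or wiki defaults).
print(current_paging_params("advanced", set(), "insource:/x/", 20, 20))
# → {'search': 'insource:/x/', 'limit': '20', 'offset': '20'}
```

In this model the empty-selection case is the only one where the pagination link carries no information about the scope of the original search.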

The code exists like this at least since 2011.

It might be that some code in the newer CirrusSearch backend falls back to the main namespace instead of the user's preference, causing a mismatch between the search results and the pagination links. @Smalyshev, are you able to check this?

Personally I agree that a more "RESTful" behavior that gives reproducible results for all (e.g. pagination) links on Special:Search would be nice. However, as far as I can tell, Special:Search never behaved like this. "Fixing" this issue would mean potentially changing or even removing the behavior of the "default" profile. This would break on-wiki links, users' bookmarks, and such. I don't know who would even be in a position to make a decision like this.

I'll need to check into this, but I don't think Cirrus would just default to the main namespace... In general, I'd expect SpecialSearch to deal with user defaults and so on, and Cirrus to always get a defined set of namespaces to search, but that may not be the case in reality.

Would it be fair to rename this something like the following:

  • Two users visiting the same search link have different searches run

My reading here is that the expectation was that a user could share a search link with someone else and they would see the same results, but that expectation was broken?

EBjune triaged this task as Medium priority. Jan 17 2019, 6:15 PM

Would it be fair to rename this something like the following:

  • Two users visiting the same search link have different searches run

My reading here is that the expectation was that a user could share a search link with someone else and they would see the same results, but that expectation was broken?

Well, yes, this is one of the problems.

  • One is as described in the current task description: when starting a search with &profile=advanced, the URLs offered to show the next page or more results drop &profile=advanced and fall back to &profile=default, thereby losing all namespaces other than the main content namespace.
  • Another point that emerged is that the historic implementation, with its dependence on user preferences, was a big sin, causing very confusing behaviour.
  • The user preference may be used to complete the very first call, but the URL of the result page shown by the browser needs to state explicitly all circumstances used to generate this page, leaving no leeway for user-specific defaults.
  • And any URL offered to scroll through or widen the result set needs to state the same circumstances explicitly.
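A sketch of this requirement in Python, assuming the server resolves the effective namespaces once and then writes them into the result URL as explicit ns…=1 parameters. The function `canonical_query` and its arguments are hypothetical, for illustration only.

```python
from urllib.parse import urlencode


def canonical_query(search: str, profile: str, selected_ns: list[int],
                    user_default_ns: list[int]) -> str:
    """Query string for a result page that states every namespace actually searched.

    Hypothetical: the user preference only completes the very first call;
    afterwards the URL itself is the complete description of the search.
    """
    effective = set(selected_ns) or set(user_default_ns)
    params = [("search", search), ("profile", profile), ("fulltext", "1")]
    params += [(f"ns{ns}", "1") for ns in sorted(effective)]
    return urlencode(params)


print(canonical_query("insource:/x/", "advanced", [], [0]))
# → search=insource%3A%2Fx%2F&profile=advanced&fulltext=1&ns0=1
```

Two users with different saved defaults would still get different URLs for their first search, but anyone who later opens either URL gets the same namespaces, regardless of their own preferences.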