Maniphest T190010

JavaScript redirect shows irrelevant internal information in search page
Closed, Resolved (Public)

Description

When I search for something that doesn't exist, e.g. https://en.wikipedia.org/wiki/Special:Search?search=zybjykht, some JavaScript code changes the displayed URL to add &searchToken=.... This is pretty inconvenient because it means I can't easily edit the URL to fix my search query.

In most cases this kind of fake redirect is useful, but here it's kind of irritating and just adds irrelevant information to the URL -- a passed in searchToken just gets ignored, so as far as I can tell this is just internal information.

Details

	Subject	Repo	Branch	Lines +/-
	Only add searchToken to url when something is clicked	mediawiki/extensions/CirrusSearch	master	+15 -4

Related Objects

Mentioned In: T275901: Notifications cannot be displayed from search result pages in mobile version
Mentioned Here: T217445: AdvancedSearch should always create URLs unambiguous about the namespaces being used
P7539 TestSearchSatisfcation browser breakdown sept 1-12 2018

Event Timeline

Shachaf created this task. Mar 19 2018, 3:43 AM

Restricted Application added projects: Discovery-ARCHIVED, Discovery-Search. Mar 19 2018, 3:43 AM

Restricted Application added a subscriber: Aklapper.

It's needed if the "remember selected namespaces" checkbox is selected. We should use JS to remove it if that box is not set.

Bawolff removed a project: MediaWiki-Core-Router. Mar 19 2018, 3:50 AM

~~I misunderstood this bug. Apparently we have JS code that uses the HTML5 history API to inject the search CSRF token into the address bar, which seems very silly.~~ Sorry, I made a lot of assumptions about this bug that were wrong. My bad. I was confusing it with the nsRemember field.

In particular it looks like this is implemented by src/mediawiki.action/mediawiki.action.view.redirect.js redirecting to the "canonical" URL.

AFAIK, this token is not a CSRF token and is not added by MediaWiki core's redirect handling.

Rather, it’s added by WikimediaEvents for use during EventLogging campaigns. I agree that it should be proactively hidden from users. It is generated on page load (if missing) and is also added to clicked links. But there should be no need to add it to the current URL when it is missing. And when it is found, after clicking a link, we should remember to hide it on page load as well (after storing it in memory).
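
To illustrate the "store it in memory, then hide it" step, here is a minimal sketch. It is not the WikimediaEvents or CirrusSearch code: `stripSearchToken` and `hideTokenOnLoad` are hypothetical names, and `win` is assumed to be a window-like object passed in so the sketch can also run outside a browser.

```javascript
// Hypothetical helper: split a search-page URL into its token and a clean URL.
function stripSearchToken(href) {
  const url = new URL(href);
  const token = url.searchParams.get('searchToken'); // null when absent
  url.searchParams.delete('searchToken');
  return { token, cleanHref: url.toString() };
}

// Hypothetical page-load hook: keep the token in memory (the return value)
// and rewrite the address bar without it. Nothing is touched when the URL
// carries no token.
function hideTokenOnLoad(win) {
  const { token, cleanHref } = stripSearchToken(win.location.href);
  if (token !== null) {
    win.history.replaceState(win.history.state, '', cleanHref);
  }
  return token;
}
```

In a browser this would be called once as `hideTokenOnLoad(window)`, with the returned token kept for later use when a link is clicked.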

We’ve done the same in the past with similar campaigns so this isn’t a new issue. We’ve solved it before already.

This searchToken is primarily used to associate search requests with clicks on those search result pages. The token itself is a random string that is inserted into a log message created when a user performs a search. We then join the search logs with the web request logs to generate click logs. These click logs are fed into the machine learning system that ranks search results. It is certainly not the best way, and we can discuss what might be a better way.

Some notes:

Additional query parameters cannot be added to search result URLs, as that would bypass caching. There is a special wprov query parameter that Varnish ignores, but it doesn't seem to fit well here.
Instead, the token is added to the URL of the search result page. Any link clicked then sends this token in the referrer header. This gives a direct connection between individual search events and the links that were clicked.
Other sites implement this as a redirect bounce. We could certainly do that, but it seemed more performant to avoid the extra round trip to record clicks.
We've evaluated using sendBeacon on clicks to record explicit events instead of parsing the web request logs for referrers. At that time we saw a large enough percentage of missing events from sendBeacon (caniuse still reports no sendBeacon support for IE or Safari) that we decided to collect via referrers.
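
The join between search logs and web request logs described above can be sketched roughly as follows. This is only an illustration, not the actual pipeline: the record shapes (`searchToken`, `query`, `url`, `referrer`) and the function names are assumptions made for the example.

```javascript
// Hypothetical: pull the searchToken out of a request's referrer, if any.
function tokenFromReferrer(referrer) {
  if (!referrer) return null; // empty referrer: nothing to join on
  try {
    return new URL(referrer).searchParams.get('searchToken');
  } catch (e) {
    return null; // referrer was not a parseable absolute URL
  }
}

// Hypothetical join: match each web request to the search that produced
// the result page it was clicked from, keyed by the token.
function buildClickLog(searchLogs, webRequests) {
  const byToken = new Map(searchLogs.map((s) => [s.searchToken, s]));
  const clicks = [];
  for (const req of webRequests) {
    const token = tokenFromReferrer(req.referrer);
    const search = token === null ? undefined : byToken.get(token);
    if (search) {
      clicks.push({ query: search.query, clickedUrl: req.url });
    }
  }
  return clicks;
}
```

Requests with an empty or unparseable referrer simply drop out of the join, which is why a browser that sends no referrer produces no click records under this approach.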

@EBernhardson See my earlier comment. The existing methodology is not a problem from a technical perspective. The method used is fine. But it does not require the token to be visible in the address bar while viewing the search page, nor after the user goes back to the SERP.

It should be solvable by delaying the change to the address bar until a link is clicked. (We should verify that browsers include URL changes made in an onclick handler in the referrer they send.)
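
A rough sketch of that idea, assuming a window-like object `win` and an already generated `token`. The names `withSearchToken` and `installLateToken` are made up for illustration, and this is not the patch that was eventually merged:

```javascript
// Hypothetical helper: return href with the searchToken parameter set.
function withSearchToken(href, token) {
  const url = new URL(href);
  url.searchParams.set('searchToken', token);
  return url.toString();
}

// Hypothetical: leave the address bar alone until a link is clicked, then
// add the token just before navigation. Whether a given browser reports
// the updated URL as the referrer is the part that needs verifying.
function installLateToken(win, token) {
  win.document.addEventListener('click', function (event) {
    const target = event.target;
    const link = target && target.closest ? target.closest('a[href]') : null;
    if (!link) {
      return; // not a link click, keep the URL clean
    }
    win.history.replaceState(
      win.history.state,
      '',
      withSearchToken(win.location.href, token)
    );
  }, true); // capture phase, so this runs before other click handlers
}
```

In a browser this would be wired up as `installLateToken(window, token)`; clicks that do not land on a link leave the URL untouched, so the query stays easy to edit by hand.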

I will see about getting access to BrowserStack so I can try out the late URL edit and see how it works across browsers.

EBjune triaged this task as Medium priority. Mar 29 2018, 5:25 PM

EBjune moved this task from needs triage to Up Next on the Discovery-Search board.

EBernhardson moved this task from Up Next to Current work on the Discovery-Search board. Sep 10 2018, 5:46 PM

EBernhardson edited projects, added Discovery-Search (Current work); removed Discovery-Search.

EBernhardson claimed this task. Sep 11 2018, 5:15 PM

Pulled a test set of browsers reporting to search eventlogging for sept 1-12 in P7539. This is not the same data as is collected in the code for this ticket, but is a close enough representative sample that is very easy to query. That data additionally isn't perfect since there is a target sampling rate (x events per wiki per day) rather than some percentage of all requests, but it's probably close enough. It will essentially down-weight some of the largest wikis in the stats. Below is everything with >1% of distinct search sessions for sept 1-14, as classified by our eventlogging user agent classifier.

os	browser	%	results
win 10	chrome 68	24.8%	good
win 7	chrome 68	16.1%	good
win 10	edge 17	4.8%	good
win 10	firefox 61	3.8%	good
os x	safari 11	3.5%	good (only tested high sierra / safari 11.1)
win 8.1	chrome 68	3.4%	good
os x	chrome 68	3.0%	good (tested sierra, os x 10.12)
win 10	ie 11	2.9%	Empty referrer. https://developer.microsoft.com/en-us/microsoft-edge/platform/issues/10474810/
win 7	firefox 61	2.9%	good
win 7	ie 11	2.4%	empty referrer. see above
win 10	firefox 62 beta	2.1%	good
win 10	chrome 69	2.0%	good
win 7	firefox 62 beta	1.6%	good
win 7	chrome 69	1.2%	good
win xp	chrome 49	1.2%	good
win 10	opera 55	1.1%	good

This is missing mobile, because the satisfaction schema doesn't run on mobile. The click tracking here does, so I pulled a second set of top browsers from https://pivot.wikimedia.org

Ua Os Family	Ua Os Major	Ua Browser Family	Ua Browser Major	View Count	View %	Results
iOS	11	Mobile Safari	11	627484493	16.3%	good (via iPhone 8)
Windows	10	Chrome	68	317142004	8.2%	good
Other	-	Other	-	218996440	5.7%	???
Android	8	Chrome Mobile	68	191093373	5.0%	good (via pixel 2)
Android	7	Chrome Mobile	68	186383266	4.8%	good (via s8)
Windows	7	Chrome	68	184638473	4.8%	good
Other	-	bingbot	2	147280458	3.8%	irrelevant
Other	-	Googlebot	2	140523077	3.6%	irrelevant
Windows	7	IE	11	115056439	3.0%	empty referrer (no change)
Android	6	Chrome Mobile	68	88377582	2.3%	good (via LG G5)
Windows	10	Firefox	61	79107521	2.1%	good
Other	-	YandexBot	3	67630940	1.8%	irrelevant
Mac OS X	10	Chrome	68	61526463	1.6%	good
iOS	10	Mobile Safari	10	57610289	1.5%	good (via iPhone 7)
Windows	10	Edge	17	52727199	1.4%	good
Mac OS X	10	Safari	11	51704555	1.3%	good
Windows	7	Firefox	61	49638804	1.3%	good
Mac OS X	10	Applebot	0	49432609	1.3%	irrelevant
Android	5	Chrome Mobile	68	45409188	1.2%	good (via s8)
Android	4	Chrome Mobile	38	45251742	1.2%	couldn't find anything older than chrome 63 for android 4 in browserstack
Windows	10	IE	11	41268526	1.1%	empty referrer (no change)
Android	8	Samsung Internet	7	40202711	1.0%	good (via s8, android 7. Not available under browserstack android 8)
Windows	8.1	Chrome	68	39722506	1.0%	good
iOS	11	Chrome Mobile iOS	68	39369767	1.0%	good (via iphone 8)

Overall everything except IE 11 works, and the proposed change did not cause that. The pre-existing use of history.replaceState triggers that bug.

Change 460110 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Only add searchToken to url when something is clicked

https://gerrit.wikimedia.org/r/460110

gerritbot added a project: Patch-For-Review. Sep 12 2018, 9:44 PM

EBernhardson moved this task from Incoming to Needs review on the Discovery-Search (Current work) board. Sep 12 2018, 9:44 PM

Change 460110 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Only add searchToken to url when something is clicked

https://gerrit.wikimedia.org/r/460110

ReleaseTaggerBot added a project: MW-1.32-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)). Sep 13 2018, 9:00 AM

EBernhardson moved this task from Needs review to Needs Reporting on the Discovery-Search (Current work) board. Sep 13 2018, 6:16 PM

debt closed this task as Resolved. Sep 13 2018, 9:09 PM

Krinkle edited projects, added Performance-Team (Radar); removed good first task. Sep 14 2018, 12:39 AM

Krinkle changed Risk Rating from N/A to default.

A recent update seems to have added another inconvenient redirect: https://en.wikipedia.org/wiki/Special:Search?search=zybjykht now redirects to https://en.wikipedia.org/wiki/Special:Search?search=zybjykht&ns0=1. Is this intentional?

This is an intentional feature added by the people behind AdvancedSearch. The high-level goal there is for the URL to represent what is being searched. In particular, if a user has a set of namespaces saved as their default search namespaces, their search URLs will not be shareable. The exact implementation details are debatable, but the overall goal is reasonable. See T217445 for more details; discussion of the feature should likely also happen there.

OK, that makes sense. I was going to suggest adding the extra parameters to the beginning rather than the end of the URL, so the search query is still at the end, but it looks like I can just add ns0=1 to my initial search query to avoid the redirect, which works well enough. Thanks for clarifying!

kostajh mentioned this in T275901: Notifications cannot be displayed from search result pages in mobile version. Mar 1 2021, 1:44 PM

Maintenance_bot removed a project: Patch-For-Review. Mar 1 2021, 2:10 PM
