Reader gets cacheable search results
Closed, Resolved · Public · 2 Estimated Story Points

Description

"As a Reader, I want search results to be cacheable on a very short time span, so if I make a typo and correct it my previous search results are retrieved much faster."

This is for typeahead search typos. If I type "Washingt", I get search results for that string; if I then type "p", I get very few results for "Washingtp"; pressing backspace then triggers a search for "Washingt" again, which should be served from cache. Apparently it is a big hassle for end users when the widget performs too many searches. The cache window should lie somewhere between the search indexing window and the time it takes to notice and correct a typo (under 60 s, maybe much less).
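To make the scenario concrete, here is a minimal Python sketch of what a short cache window buys. It models the cache as a hypothetical in-process `TypeaheadCache`, keyed by the exact query string, with a 60-second lifetime (the upper bound suggested above). In the real system the caching would happen in the browser or an HTTP cache driven by a Cache-Control header, not in a class like this; the fake clock and backend are assumptions for illustration.

```python
import time
from typing import Callable, Dict, List, Tuple


class TypeaheadCache:
    """Hypothetical short-lived cache of search results, keyed by the exact query."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get_or_fetch(self, query: str,
                     fetch: Callable[[str], List[str]]) -> List[str]:
        now = self.clock()
        entry = self._entries.get(query)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: no backend search needed
        results = fetch(query)  # miss or expired: ask the search backend
        self._entries[query] = (now, results)
        return results


# Simulate typing "Washingt", then "p", then backspace.
calls: List[str] = []


def backend(query: str) -> List[str]:
    calls.append(query)  # record every request that reaches the backend
    return [f"result for {query}"]


fake_time = [0.0]
cache = TypeaheadCache(ttl_seconds=60, clock=lambda: fake_time[0])
cache.get_or_fetch("Washingt", backend)
fake_time[0] = 1.0
cache.get_or_fetch("Washingtp", backend)
fake_time[0] = 2.0
cache.get_or_fetch("Washingt", backend)  # within 60 s: served from cache
print(calls)  # ['Washingt', 'Washingtp']
```

The backspace search never reaches the backend, which is exactly the saving the task asks for; once the window expires, the next lookup goes to the backend again.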

@eprodromou to determine the correct cache window.

Details

	Subject	Repo	Branch	Lines +/-
	SearchHandler: emit Cache-Control header.	mediawiki/core	master	+54 -8
	SearchHandler: emit cache control header	mediawiki/core	master	+13 -1


Related Objects

Status	Assigned	Task
Open	None	T49145 Formally deprecate jQuery UI after we've stopped using jQuery UI in extensions and core
Open	None	T100270 Replace use of jQuery UI and MW UI with OOUI across all Wikimedia-deployed extensions and core
Open	None	T85394 Use OOUI suggestions/autocompletion components only (instead of jquery.suggestions, jquery.ui.autocomplete)
Open	None	T125725 [epic] Update autocomplete search box with metadata and remove and delete the old searchSuggest system
Open	None	T177251 Dead keys prevent autocomplete in search box
Resolved	ovasileva	T244392 [GOAL] Deploy the new Vue.js search experience
Resolved	Volker_E	T249299 [Epic] Build the new Vue.js search experience
Resolved	phuedx	T244287 Build the Vue.js search component network client
Invalid	None	T229661 Core REST API in MediaWiki
Resolved	eprodromou	T245672 Search endpoint enhancement
Resolved	daniel	T245675 Reader gets cacheable search results

Event Timeline

eprodromou created this task. Feb 19 2020, 9:40 PM

Feedback from @Anomie:

Seems straightforward enough if SearchHandler doesn't already do it. For the Action API, you should be able to get this by including &maxage=5&smaxage=5 in the query.

See https://gerrit.wikimedia.org/g/mediawiki/core/+/6a5a1145b3f8acded6d35f6acac4ac8f29186658/includes/api/ApiMain.php#386 for the relevant Action API function.
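As a sketch of Anomie's suggestion, the Python snippet below builds an Action API request with maxage and smaxage set to 5 seconds. Using action=opensearch and en.wikipedia.org as the target is an assumption for illustration, and `opensearch_url` is a hypothetical helper, not part of any MediaWiki client library.

```python
from urllib.parse import urlencode


def opensearch_url(api_base: str, term: str, cache_seconds: int) -> str:
    """Build an Action API OpenSearch query that requests short-lived caching."""
    params = {
        "action": "opensearch",
        "search": term,
        "format": "json",
        "maxage": cache_seconds,   # max-age: how long the client may cache
        "smaxage": cache_seconds,  # s-maxage: how long shared caches may cache
    }
    return f"{api_base}?{urlencode(params)}"


print(opensearch_url("https://en.wikipedia.org/w/api.php", "Washingt", 5))
# https://en.wikipedia.org/w/api.php?action=opensearch&search=Washingt&format=json&maxage=5&smaxage=5
```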

eprodromou moved this task from Backlog to Next Sprint on the Platform Team Workboards (Green) board. Feb 26 2020, 7:20 PM

WDoranWMF set the point value for this task to 2. Feb 26 2020, 7:59 PM

WDoranWMF moved this task from Next Sprint to Ready on the Platform Team Workboards (Green) board. Feb 26 2020, 8:11 PM

eprodromou triaged this task as Medium priority. Feb 26 2020, 8:12 PM

@Pchelolo did some good research on this, from which I extrapolated with some heuristics. We have a 2-minute time frame for search indexing, with wide variation. I'm going to say we take half that period, 1 minute, as the best-case cache window.

I'd prefer that this window be set with a configuration variable that defaults to 0 (no caching) for developer systems and small installations.
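A minimal sketch of such a setting, assuming a hypothetical `SearchResultCacheExpiry` key (not a real MediaWiki configuration variable), and assuming that "no caching" is expressed as `Cache-Control: no-store` while a non-zero window produces a private cache entry. This is illustrative Python, not SearchHandler's actual PHP.

```python
from typing import Mapping

# Hypothetical setting name; the comment above only asks for "a configuration variable".
DEFAULT_SETTINGS: Mapping[str, int] = {"SearchResultCacheExpiry": 0}


def search_cache_control(settings: Mapping[str, int]) -> str:
    """Return a Cache-Control value for search results from a configured window."""
    ttl = int(settings.get("SearchResultCacheExpiry", 0))
    if ttl <= 0:
        # Default (developer systems, small installations): don't cache at all.
        return "no-store"
    # Assumption: cache only in the reader's browser, not in shared caches.
    return f"private, max-age={ttl}"


print(search_cache_control(DEFAULT_SETTINGS))                 # no-store
print(search_cache_control({"SearchResultCacheExpiry": 60}))  # private, max-age=60
```

A production wiki would then opt in by setting the value to 60, the one-minute window proposed above.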

daniel claimed this task. Feb 27 2020, 11:22 AM

daniel moved this task from Ready to Doing on the Platform Team Workboards (Green) board.

We have a config setting that controls caching of the results of the OpenSearch Action API module, which is used by the search widget: $wgSearchSuggestCacheExpiry. It defaults to 1200 seconds (twenty minutes); in production, it's 10800 seconds (three hours). ApiOpenSearch also marks this cache as public, so responses are cached in Varnish, protecting the search backend from traffic. I expect this is especially important for short prefixes: without it, the search backend would be hit with hundreds of searches per second for short prefixes like "e", "s", "n", etc. We had this happen with the completion search on Wikidata, causing massive load and timeouts.

If the target use case is indeed type-ahead, we should follow the same pattern as ApiOpenSearch. We could try to be smart and publicly cache only prefixes of up to three letters, making the cache private for longer input. That would prevent flooding of the Varnish layer while still protecting the search infrastructure. But this strategy should then be adopted by all APIs serving type-ahead completions.
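The prefix-length idea can be sketched as a small policy function. The three-letter threshold comes from the comment above; the exact header values and the `typeahead_cache_control` name are assumptions, and the 1200 seconds in the usage loop is simply the $wgSearchSuggestCacheExpiry default mentioned earlier.

```python
PUBLIC_PREFIX_MAX_LEN = 3  # "prefixes up to three letters"


def typeahead_cache_control(prefix: str, ttl_seconds: int) -> str:
    """Choose a Cache-Control value for a type-ahead response by prefix length."""
    if len(prefix) <= PUBLIC_PREFIX_MAX_LEN:
        # Few distinct values, very hot: let shared caches such as Varnish serve them.
        return f"public, max-age={ttl_seconds}, s-maxage={ttl_seconds}"
    # Many distinct values, rarely shared between readers: browser cache only.
    return f"private, max-age={ttl_seconds}"


for p in ["e", "Was", "Washingt"]:
    print(p, "->", typeahead_cache_control(p, 1200))
```

The design trade-off is visible in the two branches: the handful of short prefixes absorb most of the load and are worth sharing, while the long tail of longer inputs would mostly fill Varnish with entries nobody else requests.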

By the way, as far as I can tell, the search endpoint is not currently suitable for type-ahead suggestions, since it doesn't do prefix searches. It's based on the searchTitle() and searchText() methods in SearchEngine, whereas type-ahead completion should use completionSearchWithVariants(), as ApiOpenSearch does.
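To illustrate why whole-word matching is a poor fit for type-ahead, here is a deliberately simplified Python model. The whole-word matcher stands in for searchTitle()/searchText() and the prefix matcher for completionSearchWithVariants(); real backends such as CirrusSearch do far more (stemming, ranking, fuzzy matching), so this only shows the basic difference on a partially typed word. The title list is made up for the example.

```python
from typing import List

TITLES = ["Washington", "Washington, D.C.", "George Washington", "Wash"]


def fulltext_search(query: str, titles: List[str]) -> List[str]:
    """Toy whole-word matching, standing in for searchTitle()/searchText()."""
    q = query.lower()
    return [t for t in titles if q in t.lower().replace(",", " ").split()]


def prefix_search(query: str, titles: List[str]) -> List[str]:
    """Toy prefix matching, standing in for completionSearchWithVariants()."""
    q = query.lower()
    return [t for t in titles if t.lower().startswith(q)]


print(fulltext_search("Washingt", TITLES))  # []
print(prefix_search("Washingt", TITLES))    # ['Washington', 'Washington, D.C.']
```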

Agreed that the ApiOpenSearch pattern sounds far superior (I haven't actually looked at the code). It also sounds like more effort than we'd discussed for the current task. Fortunately, traffic to the current Core REST API search endpoint is modest, so it isn't melting down the servers, and I'm not aware of any reason traffic to this endpoint is about to skyrocket. So this sounds to me like a high-priority concern, but not an Unbreak Now one.

Suggestion: complete the current task in this sprint as currently described, using a hard-coded cache time rather than a configured one. That keeps the task straightforward and (assuming we cache privately, so there is no Varnish flooding) makes the current endpoint slightly less bad. Then file a separate task for reimplementing the endpoint using the techniques from the ApiOpenSearch module, and pick that up in the next sprint.

@daniel brings up a great point about prefix search versus full-text search. I think what we need for this particular epic is prefix search. I've broken out a new ticket T246387 for making that work. I called it 'title search', but that might be imprecise.

In our 1:1, Daniel and I compared the caching for Special:Search and the type-ahead widget. The special page has no caching, and the type-ahead widget has 2-hour caching. So it probably makes sense to mirror those values for /search/page and /search/title, respectively.

Change 575368 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] SearchHandler: emit cache control header

https://gerrit.wikimedia.org/r/575368

gerritbot added a project: Patch-For-Review. Feb 27 2020, 10:03 PM

Following Evan's comment above, I have made this a parent task of T246387. Caching will not be implemented for the existing search/page endpoint, but for a future search/title endpoint. So we can't work on this until we have the search/title endpoint.

daniel mentioned this in T246377: ObjectFactory needs a way to inject configuration settings into objects. Feb 28 2020, 11:03 AM

daniel added a subtask: T246377: ObjectFactory needs a way to inject configuration settings into objects.

daniel removed a parent task: T246387: Reader searches for a title prefix.

daniel moved this task from Doing to Blocked on the Platform Team Workboards (Green) board. Feb 28 2020, 11:10 AM

WDoranWMF moved this task from Blocked to Ready on the Platform Team Workboards (Green) board. Mar 25 2020, 6:50 PM

daniel moved this task from Ready to Doing on the Platform Team Workboards (Green) board. Mar 25 2020, 8:56 PM

Change 583458 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] SearchHandler: emit Cache-Control header.

https://gerrit.wikimedia.org/r/583458

daniel moved this task from Doing to Waiting for Review on the Platform Team Workboards (Green) board. Mar 25 2020, 9:47 PM

Change 575368 abandoned by Daniel Kinzler:
SearchHandler: emit cache control header

Reason:
Ia8366684203381b6c4dc55669a6877e53e9ffe40

https://gerrit.wikimedia.org/r/575368

Change 583458 merged by jenkins-bot:
[mediawiki/core@master] SearchHandler: emit Cache-Control header.

https://gerrit.wikimedia.org/r/583458

ReleaseTaggerBot added a project: MW-1.35-notes (1.35.0-wmf.27; 2020-04-07). Apr 6 2020, 4:00 PM

Maintenance_bot removed a project: Patch-For-Review. Apr 6 2020, 4:11 PM

daniel moved this task from Waiting for Review to Done on the Platform Team Workboards (Green) board. May 25 2020, 8:51 AM

Naike closed this task as Resolved. Jun 5 2020, 3:30 PM

eprodromou removed a subtask: T246377: ObjectFactory needs a way to inject configuration settings into objects. Jun 15 2020, 3:42 PM