
[4 hours] Investigate replacing geosearch with search keywords
Closed, ResolvedPublic

Description

Cirrus Search's Geo Search feature went live on Thursday, 14th July 2016. Special:Nearby currently uses the geosearch generator, provided by GeoData, but, given the examples in the Geo Search documentation, this could be replaced.
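To make the comparison concrete, here is a minimal sketch of the two request shapes being discussed. It assumes the English Wikipedia API endpoint, an arbitrary example coordinate, and the `nearcoord:` keyword syntax as described in the search documentation; the helper names are hypothetical and nothing here is taken from the MobileFrontend code.

```python
from urllib.parse import urlencode

# Assumed endpoint, for illustration only.
API = "https://en.wikipedia.org/w/api.php"


def geosearch_generator_params(lat, lon, radius_m=1000, limit=10):
    """Parameters for the existing GeoData geosearch generator."""
    return {
        "action": "query",
        "format": "json",
        "generator": "geosearch",
        "ggscoord": f"{lat}|{lon}",
        "ggsradius": radius_m,
        "ggslimit": limit,
    }


def search_keyword_params(lat, lon, radius_km=1, limit=10):
    """Parameters for a full text search using the nearcoord: keyword."""
    return {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": f"nearcoord:{radius_km}km,{lat},{lon}",
        "srlimit": limit,
    }


def build_url(params):
    """Encode a parameter dict into a full API URL."""
    return f"{API}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical coordinate used only as an example.
    print(build_url(geosearch_generator_params(37.7867, -122.3996)))
    print(build_url(search_keyword_params(37.7867, -122.3996)))
```

The first URL returns pages within the radius; the second runs a regular search restricted to the same area, so any other search keyword could be appended to `srsearch`.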

Questions

  1. Can Geo Search really be used as a replacement for the geosearch generator?
  2. How many daily "nearby queries" are performed?
  3. Are there any performance concerns with Geo Search?
  4. Would there be any such concerns under increased load?

...

Outcomes

  1. An email is sent to mobile-l summarising the investigation
  2. If appropriate, follow-on tasks are created

Event Timeline

  1. The main difference will be the ordering of results. The full text geo search feature is ordering results based on number of incoming links, and usage of particular templates (like Featured_article), in addition to additional full text features that can be used (filtering by categories, looking for words in articles, etc.). The existing geosearch generator sorts everything by distance.
  2. I don't have the number of dailies readily available (although they could be pulled from the logs in hive), but graphite reports a daily cycle ranging from 5 to 15 QPS. For comparison, full text ranges from 300 to 700 QPS.
  3. Due to sorting by distance, the pre-existing geosearch generator can be a bit more expensive than the new method. Calculating that distance ends up being relatively expensive. That is why the generator always had a strict limit on the size of the area it could return results for.
  4. I don't have enough data yet from usage of the new feature, but the initial numbers don't look too bad. The old generator has a p50 of 30-40ms and a p95 of 90-110ms. The new one (with very minimal data) is showing a p50 of 10-60ms and a p95 of 20-100ms. The time spent on the search cluster seems to be in the same ballpark as before, and we currently have room to grow usage of the cluster. Overall I don't think there is much to worry about here.
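The ordering difference in the first answer can be illustrated with a small sketch. The pages, coordinates, and link counts below are made up, and ranking by incoming links alone is a deliberate simplification of the full text scoring (which also considers templates and other signals). The per-page haversine calculation also hints at why distance sorting costs more.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Hypothetical pages; none of these values come from real data.
PAGES = [
    {"title": "Small stub building", "lat": 51.5008, "lon": -0.1247, "incoming_links": 3},
    {"title": "Major museum", "lat": 51.5194, "lon": -0.1270, "incoming_links": 4200},
    {"title": "Local park", "lat": 51.5030, "lon": -0.1300, "incoming_links": 150},
]
HERE = (51.5007, -0.1246)


def by_distance(pages, here):
    """Order pages the way the geosearch generator does: nearest first."""
    return sorted(
        pages,
        key=lambda p: haversine_km(here[0], here[1], p["lat"], p["lon"]),
    )


def by_incoming_links(pages):
    """Simplified stand-in for relevance ordering: most linked first."""
    return sorted(pages, key=lambda p: p["incoming_links"], reverse=True)


if __name__ == "__main__":
    print([p["title"] for p in by_distance(PAGES, HERE)])
    print([p["title"] for p in by_incoming_links(PAGES)])
```

With this sample data the two orderings are exact opposites: the nearest page is the stub, while the most linked page is the farthest away.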

@EBernhardson: Thanks for doing most of the work for us! I think @Nirzar would be most interested in/best to respond to your response to #1.

  1. Turns out we do have the number of dailies available, 800k - 900k/day. https://searchdata.wmflabs.org/metrics/#kpi_api_usage
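As a quick sanity check, the daily totals can be converted to an average request rate and compared with the 5 to 15 QPS cycle reported by graphite. This is plain arithmetic on the figures quoted above; the function name is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(requests_per_day):
    """Average queries per second implied by a daily request count."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    low = average_qps(800_000)
    high = average_qps(900_000)
    print(f"{low:.1f} to {high:.1f} QPS")  # prints "9.3 to 10.4 QPS"
```

An average of roughly 9 to 10 QPS sits in the middle of the 5 to 15 QPS daily cycle, so the two measurements agree.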

The main difference will be the ordering of results. The full text geo search feature is ordering results based on number of incoming links, and usage of particular templates (like Featured_article), in addition to additional full text features that can be used (filtering by categories, looking for words in articles, etc.). The existing geosearch generator sorts everything by distance.

This is one of the biggest problems we're trying to solve for the Nearby feature. It has good potential, but what it lacks right now is content. We need to surface relevant and important content instead of a list of buildings with stubs. We're working on this for the Wikipedia iOS app, and we wanted to do things like "museums nearby" and "parks nearby" to make the content more interesting to browse and more relevant to you as a reader.

In the first go, even if we only filter the results by a basic level of article quality, it will cut down the uninteresting nearby articles by a lot.

Here's some work design has done for the next release of Nearby on iOS, and we can definitely try to replicate this on web: without the map for now, but at least the content.
https://www.mediawiki.org/wiki/Wikimedia_Apps/Team/iOS/Nearby

jhobs renamed this task from Investigate replacing geosearch with CirrusSearch's Geo Search to [4 hours] Investigate replacing geosearch with CirrusSearch's Geo Search. Jul 21 2016, 8:12 PM
jhobs triaged this task as Medium priority.
jhobs moved this task from Incoming to Triaged but Future on the Web-Team-Backlog board.
MaxSem renamed this task from [4 hours] Investigate replacing geosearch with CirrusSearch's Geo Search to [4 hours] Investigate replacing geosearch with search keywords. Mar 30 2017, 4:47 PM
MaxSem subscribed.

Changed title because this functionality has been moved to GeoData.

If we evaluate this during kickoff and decide it is not important, we should decline this task.

Change 354943 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/extensions/MobileFrontend@master] POC: Use geosearch for nearby results

https://gerrit.wikimedia.org/r/354943

Jdlrobson added a subscriber: ovasileva.

Performance concerns aside, having looked at this, it doesn't seem like something we could just swap out for Special:Nearby.
This is a big change.

From what I've heard talking to people who use this feature, they use it to discover things near to them to edit. On Wikidata for instance, Nearby is very popular.

Switching to show results based on relevance throws away the editing use case of this feature.

Current view:

Screen Shot 2017-05-21 at 12.45.54 PM.png (659×916 px, 151 KB)

With search:

Screen Shot 2017-05-21 at 12.44.02 PM.png (677×718 px, 135 KB)

We might want to consider incorporating both in some way, but with regard to the spike, I don't suggest we replace it.

I should note that, last time I checked, Nearby was one of our least popular main menu items (based on clicks tracked via the Schema:MobileWebMainMenuClickTracking schema).

@ovasileva @Nirzar we can deploy this change on reading web staging if you want to have a play with this.

We may want to split up Nearby into editing/reading use cases.
I think a similar problem exists with the random feature. Most editors use it to discover pages to edit, while readers may want to use it to discover high quality pages to read.

Change 354943 abandoned by Jdlrobson:
POC: Use geosearch for nearby results

https://gerrit.wikimedia.org/r/354943

Can we get a resolution on this?
Doing nothing is always the cheapest option, so we could just resolve this and leave things as they are.
Is there anywhere it would make sense to document this?