
Enumerate the existing searchey API capabilities
Closed, Resolved, Public

Description

We need to know who our external API users are and what they're doing so that we can serve their needs. We are unable to make meaningful decisions about how to change our search APIs because we don't know how much people are using them, how they're using them, or what they're using them for. To find that out, we first need to know which APIs are available for people to use. This task is for us to explore our API and document that.

The expected output of this task is a list of API calls that return anything that could be considered search data, e.g.:

and so on.
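For illustration only, here is a minimal Python sketch of what entries in such a list might look like: it builds GET URLs for two well-known MediaWiki action API search calls, full-text search via `action=query&list=search` and title completion via `action=opensearch`. The en.wikipedia.org endpoint, the search terms, and the helper name `search_url` are assumptions chosen for the example. This is not a reconstruction of the task's original example list, which did not survive here.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration: English Wikipedia's action API entry point.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def search_url(params: dict) -> str:
    """Return a GET URL for the action API with the given query parameters."""
    return f"{API_ENDPOINT}?{urlencode(params)}"


# Illustrative search-data calls (hypothetical examples, not the task's list).
# Search terms are placeholders.
EXAMPLE_CALLS = {
    # Full-text search through the query module's list=search submodule.
    "full_text": {"action": "query", "list": "search",
                  "srsearch": "meaning", "format": "json"},
    # Title-completion search that returns OpenSearch-format suggestions.
    "opensearch": {"action": "opensearch", "search": "Te",
                   "limit": 10, "format": "json"},
}

if __name__ == "__main__":
    for name, params in EXAMPLE_CALLS.items():
        print(name, search_url(params))
```

The sketch only builds URLs and makes no network requests, so it can be used to list candidate calls without touching production.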

  • Stakeholder: Discovery
  • Benefit: We'll know what APIs we're exposing, so we can measure how people are using those
  • Estimate: Hours

Event Timeline

Jdouglas claimed this task.
Jdouglas raised the priority of this task to Medium.
Jdouglas updated the task description.
Jdouglas added projects: Epic, Discovery.

Note that there has been some recent API documentation work. It might not relate to this, but it is worth checking. The link is on the Discovery page: https://www.mediawiki.org/wiki/API:Search_and_discovery

Within core, it's pretty straightforward. The things that use the SearchEngine class, through which CirrusSearch is queried, are the two listed in the task description. Among the extensions deployed to the WMF cluster, the only addition is ApiFlowSearch, which is queried as:

I'll poke around some more, though, and look for other things that aren't using CirrusSearch but can be considered "search data".

Plausibly search-ish: all the example queries that are part of action=query (the ApiQuery* classes) on enwiki. Note that since these are example queries, they provide extra parameters that are not strictly necessary and contain dummy data.

I could attempt to prune down the above list, but I need some direction on what we mean by "search data".

Is T102079 (Metrics about the use of the Wikimedia web APIs) something that could help this task, or vice versa? How do you go from "know what APIs we're exposing" to "measure how people are using those"?

> How do you go from "know what APIs we're exposing" to "measure how people are using those"?

For posterity's sake, I will answer this question. The answer is "by assigning our analysts (Discovery-Analysis (Current work)) to do the appropriate measurements". But they need to know what they're measuring first.
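To make "the appropriate measurements" a little more concrete, here is a minimal sketch of one possible first step: tallying api.php requests by the module they invoke, with `action=query` requests broken down by their `list=` submodules. It assumes a hypothetical input of plain request URLs; the real request-log format, and the names `module_keys`, `count_modules`, and `SAMPLE_URLS`, are illustrative assumptions, not part of this task.

```python
from collections import Counter
from typing import Iterable, List
from urllib.parse import parse_qs, urlsplit


def module_keys(url: str) -> List[str]:
    """Classify one api.php request URL by the module(s) it invokes."""
    params = parse_qs(urlsplit(url).query)
    # "(none)" marks requests with no explicit action parameter.
    action = params.get("action", ["(none)"])[0]
    if action != "query":
        return [action]
    # list= may name several submodules separated by "|".
    lists = params.get("list", [""])[0]
    if not lists:
        # prop=/meta= submodules are not broken down in this sketch.
        return ["query"]
    return [f"query+{name}" for name in lists.split("|")]


def count_modules(urls: Iterable[str]) -> Counter:
    """Count how often each module appears across the given request URLs."""
    counts: Counter = Counter()
    for url in urls:
        counts.update(module_keys(url))
    return counts


# Hypothetical sample input; not real traffic.
SAMPLE_URLS = [
    "https://en.wikipedia.org/w/api.php?action=opensearch&search=Te",
    "https://en.wikipedia.org/w/api.php?action=query&list=search&srsearch=foo",
    "https://en.wikipedia.org/w/api.php?action=query&list=search|prefixsearch&srsearch=foo&pssearch=fo",
    "https://en.wikipedia.org/w/api.php?action=query&prop=info&titles=Foo",
]

if __name__ == "__main__":
    for key, n in count_modules(SAMPLE_URLS).most_common():
        print(key, n)
```

Keying on the module name keeps the tally aligned with an enumerated list of search-ish calls: once the list is settled, the analysts can filter the counts down to just those keys.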

Resolving, as T100468#1361613 documents this quite accurately.