Expose all useful file metadata currently in API to search engine
Open, Lowest, Public, Feature

Description

A recent topic appeared on Extension talk:CirrusSearch. The suggestion is to expose file metadata already available via the API to the search engine.

Suggested fields:

  • Video or audio of a certain "playtime"
  • Framecount
  • Looped images
  • Duration
  • Frame rate
  • Creation date

The example use case given: "The usecases are numerous, for instance for writing an article about world war one, one may want to filter images from that period. When looking for videos to add to a page one may want short animations to showcase the concept, e.g. a moving hurricane, and not be interested in very long videos. The same applies to animated images because in some cases they illustrate the concept better than others, and in some cases they don't, so it might be good to filter those either way."

Also mentioned were these fields from the Commons Metadata API:

  • GPSLatitude - latitude
  • GPSLongitude - longitude
  • LicenseShortName - short human-readable license name
  • LicenseUrl
  • DateTimeOriginal
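
As a rough illustration of where these fields come from, the sketch below builds an `imageinfo` query with `iiprop=extmetadata` and pulls the five fields out of a response. The helper names (`build_params`, `extract_fields`) and the sample response are hypothetical; the sample values are invented for illustration, and no network request is made.

```python
# Fields named in this task. Each extmetadata entry wraps its value as {"value": ...}.
WANTED = [
    "GPSLatitude",
    "GPSLongitude",
    "LicenseShortName",
    "LicenseUrl",
    "DateTimeOriginal",
]


def build_params(title):
    """Query parameters for api.php (hypothetical helper; it sends nothing)."""
    return {
        "action": "query",
        "format": "json",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
    }


def extract_fields(response):
    """Return {page title: {field: value}} for the wanted fields that are present."""
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        info = page.get("imageinfo") or [{}]
        ext = info[0].get("extmetadata", {})
        result[page.get("title", "")] = {
            key: ext[key]["value"] for key in WANTED if key in ext
        }
    return result


# Invented sample shaped like an API response; the values are placeholders.
SAMPLE = {
    "query": {
        "pages": {
            "1": {
                "title": "File:Example.jpg",
                "imageinfo": [
                    {
                        "extmetadata": {
                            "GPSLatitude": {"value": "51.5"},
                            "LicenseShortName": {"value": "CC BY-SA 4.0"},
                            "Artist": {"value": "Someone"},
                        }
                    }
                ],
            }
        }
    }
}

fields = extract_fields(SAMPLE)
```

Fields missing from a file's metadata are simply omitted from the result, which is the common case: not every file carries GPS coordinates or an original capture date.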

Related: T150809: Expose EXIF data to search engine

Event Timeline

Restricted Application added a subscriber: Aklapper.

We cannot support arbitrary key/value pairs because that's not how the Elasticsearch data model works. So, supporting all the data currently contained in imageprop is not feasible.

That said, if there is a specific list of things that are requested to be supported, then we can consider those. As it stands, there is only a "set of suggestions" rather than a "we want these specific things", so it's hard to act on that.
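
One way to picture the difference between arbitrary key/value pairs and a specific list is an explicit mapping in which every searchable field is declared with a type up front. The sketch below is only an illustration of that idea: the field names, the `FILE_METADATA_MAPPING` constant, and the `to_search_doc` helper are assumptions made here, not the CirrusSearch schema or anything proposed in this task.

```python
# Hypothetical explicit mapping: each field is declared with a type in advance.
FILE_METADATA_MAPPING = {
    "dynamic": "strict",  # documents containing undeclared fields are rejected
    "properties": {
        "duration_seconds": {"type": "float"},
        "frame_count": {"type": "integer"},
        "looped": {"type": "boolean"},
        "frame_rate": {"type": "float"},
        "created": {"type": "date"},
    },
}


def to_search_doc(metadata, mapping=FILE_METADATA_MAPPING):
    """Keep only the declared fields; return (doc, sorted list of dropped keys)."""
    declared = mapping["properties"]
    doc = {key: value for key, value in metadata.items() if key in declared}
    dropped = sorted(key for key in metadata if key not in declared)
    return doc, dropped


# Invented metadata for one file; "SomeVendorTag" stands in for an arbitrary key.
doc, dropped = to_search_doc(
    {"duration_seconds": 12.5, "frame_count": 300, "SomeVendorTag": "xyz"}
)
```

Under this assumption, only an agreed list of fields reaches the index, and anything else is dropped before indexing rather than stored as an open-ended key/value bag.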

@Liuxinyu970226 Thanks for tagging this for inclusion in Tech News! It'd be great to get input from that.

I want to note here that my early assessment of task priority above doesn't mean we'll necessarily be able to drop all our ongoing work to do this task when some input is provided. Of course, constructive input is always valued, and will be very useful when this task eventually gets worked on. :-)

Was there any feedback from the 'recently announced' announcement? If so, was there a set of specific things to be added?

Deskana moved this task from needs triage to search-icebox on the Discovery-Search board.
Deskana added a project: SDC General.

In practice, this is unlikely to be worked on soon, because any solution would be a special case that would later be undone by SDC General. So this needs to wait for that.

MPhamWMF subscribed.

Closing out Low and Lowest priority tasks that are over 6 months old and have had no activity within the last 6 months, in order to clear out the backlog of tickets we will not be addressing in the near term. Please feel free to reopen if you think a ticket is important, but bear in mind that given current priorities and resourcing, the Search team is unlikely to pick up these tasks for the foreseeable future. We hope that the requested changes have either been addressed or made irrelevant by work the team has done or is doing (e.g. upgrading Elasticsearch to a newer version will solve various ES-related problems), or will be subsumed by future work in a more generalized way.

JJMC89 lowered the priority of this task from Low to Lowest.
JJMC89 changed the subtype of this task from "Task" to "Feature Request".