
Investigate slowness of imageinfo extended metadata (extmetadata) queries
Open, Low, Public, Spike

Description

Imageinfo queries involving extended metadata for tens of images are very slow; for hundreds, they time out. Is something wrong here? Is caching working correctly? How can it be improved?
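One way to reproduce the slowdown described above is to time imageinfo requests that ask only for extended metadata, at increasing batch sizes. The following is a minimal sketch that only builds the request URLs and times an injected `fetch` callable. The Commons endpoint, the 50-title cap, and the helper names (`batches`, `extmetadata_url`, `time_requests`) are assumptions for illustration, not part of this task.

```python
import time
from typing import Callable, Iterator, List
from urllib.parse import urlencode

# Assumed endpoint: Wikimedia Commons' Action API.
API_URL = "https://commons.wikimedia.org/w/api.php"
# Usual per-request cap on multi-value parameters such as titles for non-bot clients.
MAX_TITLES = 50


def batches(items: List[str], size: int = MAX_TITLES) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def extmetadata_url(titles: List[str]) -> str:
    """Build an imageinfo query that requests only extended metadata."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "titles": "|".join(titles),  # multiple titles are pipe-separated
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def time_requests(urls: List[str], fetch: Callable[[str], object],
                  clock: Callable[[], float] = time.perf_counter) -> List[float]:
    """Call fetch(url) for each URL and return the elapsed seconds per call."""
    timings = []
    for url in urls:
        start = clock()
        fetch(url)
        timings.append(clock() - start)
    return timings
```

Against the live API, `fetch` would need to send a descriptive User-Agent header (Wikimedia's User-Agent policy requires one), for example via `urllib.request`. Comparing timings for batches of 10 versus 50 titles would show how the cost scales with the number of images.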

Event Timeline

Mholloway added a project: CommonsMetadata.
Restricted Application added a project: Multimedia. Jun 17 2019, 5:38 PM
Tgr added a subscriber: Tgr. Jun 17 2019, 8:33 PM

Reasons I can think of off the top of my head:

  • Some other part of the imageinfo API is slow. (extmetadata is cached via FormatMetadata::fetchExtendedMetadata(), but the API call itself is not.)
  • FormatMetadata::fetchExtendedMetadata() itself is slow. It has dynamic cache invalidation (even on a cache hit, the ValidateExtendedMetadataCache hook gets invoked), so while this is unlikely, it is not impossible.
  • A broken ValidateExtendedMetadataCache hook handler (for example, a bug in this recent patch).
  • The cache being (correctly) invalidated all the time by frequent edits from Structured Data on Commons (SDC). In theory, a change to the structured data shouldn't invalidate it, but this code predates Multi-Content Revisions (MCR) and is not slot-aware.
  • A bug affecting the caching logic (e.g., File::getDescriptionTouched() being broken).
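Several of the candidate causes listed above come down to how a re-validated cache behaves: hits still pay for validation, and any validator that rejects an entry (such as a check against the description-touched time) turns a hit into a miss. The following is a hypothetical, simplified Python model of that behavior, not the actual MediaWiki PHP code. `ExtMetadataCacheModel`, the `validators` list, and the `touched` timestamps are illustrative stand-ins for ValidateExtendedMetadataCache handlers and File::getDescriptionTouched().

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

THIRTY_DAYS = 30 * 24 * 3600  # cache expiry mentioned in this task, in seconds


@dataclass
class Entry:
    data: dict
    stored_at: float  # time at which the metadata was computed and cached


class ExtMetadataCacheModel:
    """Hypothetical, simplified model of a cache whose hits are re-validated.

    Not MediaWiki code: each validator stands in for a ValidateExtendedMetadataCache
    handler or for the description-touched timestamp check.
    """

    def __init__(self, compute: Callable[[str], dict],
                 validators: List[Callable[[str, Entry], bool]],
                 ttl: float = THIRTY_DAYS) -> None:
        self.compute = compute
        self.validators = validators
        self.ttl = ttl
        self.store: Dict[str, Entry] = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, title: str, now: float) -> dict:
        entry = self.store.get(title)
        if entry is not None and now - entry.stored_at < self.ttl:
            # Even a hit runs every validator: a slow one makes hits slow,
            # and an overly strict one turns hits into misses.
            if all(check(title, entry) for check in self.validators):
                self.hits += 1
                return entry.data
        # Miss (never cached, expired, or rejected by a validator): recompute and store.
        self.misses += 1
        data = self.compute(title)
        self.store[title] = Entry(data, now)
        return data


# Usage: a validator that rejects entries older than the file's last description edit.
touched = {"File:A.jpg": 0.0}  # hypothetical "description touched" timestamps
cache = ExtMetadataCacheModel(
    compute=lambda t: {"title": t},
    validators=[lambda t, e: touched[t] <= e.stored_at],
)
cache.fetch("File:A.jpg", now=10.0)   # miss: nothing cached yet
cache.fetch("File:A.jpg", now=20.0)   # hit: entry is fresh and valid
touched["File:A.jpg"] = 25.0          # e.g. a structured-data edit bumps the timestamp
cache.fetch("File:A.jpg", now=30.0)   # miss: entry predates the edit
```

In this model, frequent edits that bump the touched timestamp make the cache behave as if it were empty, which is the kind of pattern profiling would need to distinguish from a slow validator.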

If the problem can be reliably reproduced, profiling via X-Wikimedia-Debug might pin it down.

Tgr added a comment. Jun 17 2019, 8:36 PM

Also, the cache expiration is 30 days, so a request involving lots of cache misses would not be that unusual. The API should probably be modified to limit the number of uncached extmetadata lookups and force continuation when the limit is reached.
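The suggestion above, capping uncached extmetadata lookups per request and forcing continuation, can be sketched as follows. This is a hypothetical Python model, not ApiQueryImageInfo (which is PHP). `fetch_extmetadata_batch`, `is_cached`, `lookup`, `max_uncached`, and the integer continuation offset are illustrative assumptions; the real API would express continuation through its own continue parameters.

```python
from typing import Callable, Dict, List, Optional, Tuple


def fetch_extmetadata_batch(
    titles: List[str],
    start: int,
    is_cached: Callable[[str], bool],
    lookup: Callable[[str], dict],
    max_uncached: int,
) -> Tuple[Dict[str, dict], Optional[int]]:
    """Process titles[start:], allowing at most max_uncached cache misses.

    Cached titles are always served. The batch stops just before the first
    uncached title that would exceed the budget and returns that title's index
    as the continuation point (None when every title has been processed).
    """
    if max_uncached < 1:
        # With a budget of zero, an uncached first title would never be processed.
        raise ValueError("max_uncached must be at least 1 so every call makes progress")
    results: Dict[str, dict] = {}
    uncached = 0
    for i in range(start, len(titles)):
        title = titles[i]
        if not is_cached(title):
            if uncached == max_uncached:
                return results, i  # force continuation at this title
            uncached += 1
        results[title] = lookup(title)
    return results, None


# Usage: a client loop that follows continuation until no offset is returned.
cached = {"File:B.jpg", "File:D.jpg"}
titles = ["File:A.jpg", "File:B.jpg", "File:C.jpg", "File:D.jpg", "File:E.jpg"]
pages: List[List[str]] = []
offset: Optional[int] = 0
while offset is not None:
    batch, offset = fetch_extmetadata_batch(
        titles, offset, lambda t: t in cached, lambda t: {"title": t}, max_uncached=2
    )
    pages.append(sorted(batch))
```

Requiring a budget of at least one guarantees that each call processes at least one title, so the continuation loop always terminates; cached titles do not count against the budget, so warm requests still complete in a single call.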

Mholloway renamed this task from Investigate slowness of imageinfo extended metadata queries to Investigate slowness of imageinfo extended metadata (extmetadata) queries. Jun 17 2019, 8:37 PM
LGoto triaged this task as Low priority. Jun 20 2019, 3:37 PM
LGoto moved this task from Needs triage to Backlog on the Product-Infrastructure-Team-Backlog board.
LGoto added a project: Spike.
Restricted Application changed the subtype of this task from "Task" to "Spike". Jun 20 2019, 3:37 PM