Fields can have multiple values, in several different ways:
- some file metadata fields (EXIF etc.) can have multiple values
- we parse some data from the HTML of license templates, and some images have multiple license templates
- sometimes the same property can have values from both the file and the file description page
- categories, and any properties based on categories, are in a many-to-many relation with images
- (there are also multilingual values, which become multivalued when all languages are requested, but we already handle those)
Right now we handle this in a very hacky way for some fields (e.g. categories are concatenated with "|") and not at all for most others (one of the values is picked arbitrarily, depending on incidental details of the code). This will be especially problematic if we want to use CommonsMetadata as a helper tool for the Wikidata migration.
Proper multivalue handling should probably be able to:
- indicate whether a given field is multivalued
- indicate the source of each value (e.g. if one value comes from the file and another from the description, we should be able to tell which is which)
- keep related properties in sync (e.g. a multi-licensed image will have multiple license names and multiple license URLs; API users must be able to match each name to the right URL)
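One possible way to model the three requirements above is sketched below. This is a minimal, hypothetical Python sketch, not CommonsMetadata's actual code or API output format: the classes `Source`, `Value`, `Field` and `LicenseGroup`, the function `to_api`, the source labels and the output layout are all assumptions made for illustration. Field names such as `Artist`, `LicenseShortName` and `LicenseUrl` are used only to make the example readable.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Source(Enum):
    """Hypothetical labels for where a value was found."""
    FILE = "file"                # embedded file metadata, e.g. EXIF
    DESCRIPTION = "description"  # parsed from the file description page


@dataclass(frozen=True)
class Value:
    value: str
    source: Source  # requirement 2: every value carries its origin


@dataclass
class Field:
    name: str
    multivalued: bool  # requirement 1: explicit flag instead of guessing
    values: List[Value] = field(default_factory=list)

    def add(self, value: str, source: Source) -> None:
        # A single-valued field must not silently collect a second value
        # (instead of keeping whichever one happens to come last).
        if self.values and not self.multivalued:
            raise ValueError(f"{self.name} is single-valued")
        self.values.append(Value(value, source))


@dataclass
class LicenseGroup:
    """Requirement 3: related properties of one license stay in one record,
    so a license name can never be paired with another license's URL."""
    short_name: str
    url: str
    source: Source = Source.DESCRIPTION


def to_api(fields: List[Field], licenses: List[LicenseGroup]) -> Dict[str, object]:
    """Serialize to a hypothetical API structure."""
    out: Dict[str, object] = {}
    for f in fields:
        out[f.name] = {
            "multivalued": f.multivalued,
            "values": [{"value": v.value, "source": v.source.value} for v in f.values],
        }
    out["Licenses"] = [
        {"LicenseShortName": lic.short_name, "LicenseUrl": lic.url, "source": lic.source.value}
        for lic in licenses
    ]
    return out


# Example: an author recorded in both the file and the description page,
# and an image carrying two license templates.
artist = Field("Artist", multivalued=True)
artist.add("Alice", Source.FILE)
artist.add("Alice Example", Source.DESCRIPTION)
licenses = [
    LicenseGroup("CC BY-SA 4.0", "https://creativecommons.org/licenses/by-sa/4.0"),
    LicenseGroup("GFDL", "http://www.gnu.org/copyleft/fdl.html"),
]
result = to_api([artist], licenses)
print(result)
```

The main design choice is grouping license name and URL into one record rather than returning two parallel lists: with parallel lists, a missing or reordered entry in one list silently shifts every later pairing, while a grouped record cannot be mismatched.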
Version: unspecified
Severity: normal