Display standard WebM metadata/tags
Open, Needs Triage · Public

Description

This is a request to extend the metadata parsing of video files so that the standard metadata available in WebM video files is displayed on Commons image pages.

The WebM container definitions at https://www.webmproject.org/docs/container/ refer to https://matroska.org/technical/specs/tagging/index.html for the list of standard tags, which should be accepted for display when present. ffmpeg can add metadata in this format when files are reprocessed from other formats, and EXIF readers display these tags for WebM files, but Commons currently ignores them.
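As a rough illustration of how such tags can be written, the sketch below builds an ffmpeg argument list that uses the `-metadata key=value` option once per tag. The file names and the helper `build_tag_command` are hypothetical, and `-c copy` assumes the input streams are already WebM-compatible; a conversion from another format would need codec options instead.

```python
from typing import Dict, List


def build_tag_command(src: str, dst: str, tags: Dict[str, str]) -> List[str]:
    """Return an ffmpeg argument list that remuxes src to dst with extra tags.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    # "-c copy" copies the streams without re-encoding (assumes WebM-compatible input).
    cmd = ["ffmpeg", "-i", src, "-c", "copy"]
    for name, value in tags.items():
        # One "-metadata NAME=VALUE" pair per container-level tag.
        cmd += ["-metadata", f"{name}={value}"]
    cmd.append(dst)
    return cmd


# Placeholder file names; tag values taken from the example in this task.
example = build_tag_command(
    "input.webm",
    "tagged.webm",
    {
        "COPYRIGHT": "Public Domain",
        "PUBLISHER": "Centers for Disease Control and Prevention (CDC)",
    },
)
print(example)
```

The list can be passed to `subprocess.run(example, check=True)` on a machine where ffmpeg is installed.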

Some tags have immediate and obvious value to Wikimedia Commons and reusers, such as the COPYRIGHT and URL tags, which can help to confirm the status and source of files even when separated from their original web pages or renamed.

As an example, Defending_Our_Future-_Protecting_Humans_and_Animals_from_Antibiotic_Resistance.webm has the following tag name/string pairs embedded, as shown when the file is examined with http://exif.regex.info/exif.cgi:

Tag Name	Tag String
COMMENT	https://commons.wikimedia.org/wiki/User:Fae/Project_list/CDC_videos
URL	https://www.youtube.com/watch?v=5VNIL3gbqfI
PUBLISHER	Centers for Disease Control and Prevention (CDC)
COPYRIGHT	Public Domain
SUBJECT	Antibiotic Resistance
DATE_RELEASED	2019-10-31
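The same tags can also be read programmatically. A minimal sketch, assuming the JSON produced by `ffprobe -v quiet -print_format json -show_format FILE`, where container-level tags appear under `format.tags`: the helper `parse_format_tags` is hypothetical, and the sample JSON is hand-written from the values listed above rather than captured tool output.

```python
import json
from typing import Dict


def parse_format_tags(ffprobe_json: str) -> Dict[str, str]:
    """Extract container-level tags from ffprobe's JSON output.

    Hypothetical helper; keys are upper-cased so lookups do not depend
    on how the tool reports tag-name case.
    """
    data = json.loads(ffprobe_json)
    tags = data.get("format", {}).get("tags", {})
    return {name.upper(): value for name, value in tags.items()}


# Hand-written stand-in for ffprobe output, using values from this task.
sample = json.dumps(
    {
        "format": {
            "tags": {
                "URL": "https://www.youtube.com/watch?v=5VNIL3gbqfI",
                "COPYRIGHT": "Public Domain",
                "DATE_RELEASED": "2019-10-31",
            }
        }
    }
)

tags = parse_format_tags(sample)
for name in ("COPYRIGHT", "URL"):
    print(name, "=", tags.get(name, "(missing)"))
```

A display layer would then only need to pick the tag names it considers useful, such as COPYRIGHT and URL, from the returned dictionary.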

Event Timeline

Fae created this task. Nov 2 2019, 12:26 PM
Restricted Application added a subscriber: Aklapper. Nov 2 2019, 12:26 PM
Masumrezarock100 moved this task from Incoming to Backlog on the Commons board.
Fae added a comment. Edited Nov 10 2019, 12:44 PM

When Wikimedia Commons generates alternate transcodes (e.g. converting a WebM audio/video file, VP9/Opus, length 20 s, 1,080 × 1,080 pixels, to a smaller VP9 360p version), a different set of tags is written to the transcoded file, and several of the standard Matroska entries are dropped.

For example, Cúbrete la nariz y la boca al toser o estornudar (niños).webm contains the following tag/string pairs, all of which are lost in the transcoded versions:

Tag Name	Tag String
COMMENT	https://commons.wikimedia.org/wiki/User:Fae/Project_list/CDC_videos
PUBLISHER	Centers for Disease Control and Prevention (CDC)
COPYRIGHT	Public Domain
SUBJECT	CDC-TV
DATE_RELEASED	2019-03-06

I am raising this here as a related concern, discovered today, on the presumption that it does not need a separate ticket: any standard fields that are added should also be passed on by the transcode process.
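One way to check whether a transcode preserves the tags is to compare the tag sets of the original and the derived file. The sketch below is only an illustration: `dropped_tags` is a hypothetical helper, the original tags are the ones listed above, and the transcode's tag set is an invented placeholder, not what Commons actually writes.

```python
from typing import Dict, List


def dropped_tags(original: Dict[str, str], transcode: Dict[str, str]) -> List[str]:
    """Return the names of tags present in the original but absent in the transcode."""
    return sorted(set(original) - set(transcode))


# Tags of the original upload, as listed in this comment.
original = {
    "COMMENT": "https://commons.wikimedia.org/wiki/User:Fae/Project_list/CDC_videos",
    "PUBLISHER": "Centers for Disease Control and Prevention (CDC)",
    "COPYRIGHT": "Public Domain",
    "SUBJECT": "CDC-TV",
    "DATE_RELEASED": "2019-03-06",
}

# Hypothetical placeholder for the tags found in a transcoded version.
transcode = {"ENCODER": "hypothetical-encoder"}

missing = dropped_tags(original, transcode)
print(missing)
```

With real data, both dictionaries would come from reading the tags of the uploaded file and of each transcode; an empty result would mean that no tag names were lost.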