
Video2commons scrubbing metadata
Open, Needs Triage, Public

Description

Per this thread, Video2commons is scrubbing metadata. Please fix.

Event Timeline

While JPEG files nearly always come with some metadata (e.g. Exif), most MP4 files, and the WebM files transcoded from them, in most cases carry almost none. Also, exiftool can read metadata from WebM files but cannot write it; ffmpeg, however, can both read and write metadata of video files. And video files can contain a lot of metadata (just not in Exif format): videos can come with copyright information, information about the content (title, location, city, country) and notes. But even if a WebM file uploaded to Commons contains masses of metadata, the MediaWiki software will only show the audio and video codec version numbers as metadata at the bottom of the file description page on Commons. If you download the original file, you can still see the metadata with ffmpeg or exiftool. (As far as I am aware, this is not the case with the transcoded versions.)
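To check what metadata a given file actually carries, ffprobe (shipped with ffmpeg) can dump it as JSON. Below is a minimal Python sketch, assuming ffprobe is on the PATH; `probe_json` and `extract_tags` are hypothetical helper names. Container-level (global) tags sit under `"format"`, per-stream tags under each entry of `"streams"`.

```python
import json
import subprocess


def probe_json(path: str) -> str:
    """Run ffprobe on a file and return its JSON report (needs ffprobe on PATH)."""
    return subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        check=True, capture_output=True, text=True,
    ).stdout


def extract_tags(report: str) -> dict:
    """Split an ffprobe JSON report into global (container) tags and per-stream tags."""
    data = json.loads(report)
    global_tags = data.get("format", {}).get("tags", {})
    # Key the stream tags by ffprobe's stream index; streams without tags map to {}.
    stream_tags = {
        s.get("index", i): s.get("tags", {})
        for i, s in enumerate(data.get("streams", []))
    }
    return {"global": global_tags, "streams": stream_tags}
```

Usage would be something like `extract_tags(probe_json("original.mp4"))`, run once on the source file and once on the file downloaded from Commons.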

There are actually three problems:

  1. video2commons strips metadata from videos
  2. MW does not show existing metadata of video files
  3. MW transcoding of videos does not keep metadata

Problem 1 is probably the easiest to fix: set the options in ffmpeg to copy the metadata from the source MP4 file to the resulting WebM file.
Problem 3 should be easy as well: transcoding can copy the metadata from the uploaded file to the transcoded versions (ffmpeg).
Problem 2 is probably more difficult.
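For problem 1, here is a minimal Python sketch of the kind of ffmpeg command line a converter could build. The function name `webm_transcode_args` and the codec choice are assumptions for illustration; I don't know what video2commons actually runs. According to ffmpeg's documentation, global metadata is already copied from the first input file by default, so the explicit `-map_metadata 0` is mostly defensive.

```python
from typing import List


def webm_transcode_args(src: str, dst: str) -> List[str]:
    """Build an ffmpeg command line that transcodes src to WebM and keeps global metadata."""
    return [
        "ffmpeg", "-i", src,
        # Copy global (container-level) metadata from input file 0 to the output.
        # ffmpeg does this by default; spelling it out is defensive.
        "-map_metadata", "0",
        "-c:v", "libvpx-vp9",  # VP9 video encoder
        "-c:a", "libopus",     # Opus audio encoder
        dst,
    ]
```

It could be run with `subprocess.run(webm_transcode_args("input.mp4", "output.webm"), check=True)`.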

If you can tell me which flags I should add to ffmpeg, I'm happy to add them.

I can only provide this example of adding metadata to a WebM file:

ffmpeg -i input.webm -f ffmetadata -i metadata.txt -c copy -map_metadata 1 output.webm
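The metadata.txt in the command above uses ffmpeg's FFMETADATA format: a `;FFMETADATA1` header followed by `key=value` lines, UTF-8 encoded, with `=`, `;`, `#`, `\` and newlines escaped by a backslash. A minimal Python sketch for generating such a file from a dictionary, assuming only global tags (no chapter or stream sections); the function names and example tags are made up for illustration.

```python
def escape_ffmetadata(text: str) -> str:
    """Backslash-escape the characters that are special in FFMETADATA files."""
    out = []
    for ch in text:
        if ch in "=;#\\\n":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def to_ffmetadata(tags: dict) -> str:
    """Serialize global tags as an FFMETADATA1 document (header plus key=value lines)."""
    lines = [";FFMETADATA1"]
    for key, value in tags.items():
        lines.append(f"{escape_ffmetadata(key)}={escape_ffmetadata(str(value))}")
    return "\n".join(lines) + "\n"
```

For example, `open("metadata.txt", "w", encoding="utf-8").write(to_ffmetadata({"title": "Contra dance"}))` produces a file that the command above can read with `-f ffmetadata`.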

Maybe @Sdkb can give an example of an MP4 file with metadata.

This is running up against the limits of my technical knowledge pretty fast, unfortunately. What I know is just that when I tried to upload videos like File:Contra dancers at the 2019 Flurry Festival.webm (by downloading from my Google Photos account and then uploading via Video2Commons), the metadata I'd have liked to see (e.g. date, location) got scrubbed. Following the same process with photos rather than videos (without Video2Commons) preserves the metadata, so I'm pretty sure the loss is happening while using it to convert the format.

The Contra dancers... video downloaded from Commons does indeed contain no metadata apart from audio and video codec information. The upload comment states it was uploaded with a tool from Labs. But is there no information about the original file? What metadata did the original file contain?

Videos I have uploaded (native WebM uploads) do contain metadata if the original file is downloaded from Commons, but not in the transcoded versions delivered by Commons, and the metadata is not shown in the metadata section of the file description page.

@Sdkb: What is the URL of the source file?

@C.Suthorn, for privacy reasons I'm not willing to link to the URL from my Google Photos account.

@Sdkb can you provide another example of an MP4 file that was uploaded via video2commons and the resulting WebM file on Commons?

Aklapper changed the task status from Open to Stalled on Jul 28 2021, 9:21 PM.

@Sdkb: Could you please answer the last comment? Thanks in advance!

@Aklapper and @C.Suthorn, my process for uploading a video I've taken to Commons is always through Google Photos, so I can't provide an example different from the one I already gave. I've provided a quite thorough description of the issue at this point, which should be plenty for anyone who wants to replicate it and work toward solving it.

Sdkb changed the task status from Stalled to Open on Aug 3 2021, 4:09 PM.

No MP4 file has ever been provided as an example. The previously given example, File:Contra dancers at the 2019 Flurry Festival.webm, is a WebM file after transcoding. I cannot replicate a transcode from the result alone, without the source.

By the way, File:Contra dancers at the 2019 Flurry Festival.webm is from videoconvert, not video2commons.

zhuyifei1999 changed the task status from Open to Stalled on Aug 3 2021, 8:11 PM.

@zhuyifei1999 oh oops, that one was from a while ago so I must've misremembered which tool I used; apologies about that (it seems videoconvert may have the same scrubbing problem?). Anyway, I believe I also encountered the issue more recently with https://commons.wikimedia.org/wiki/File:Claremont_Taiko_performance.webm. I tried to upload the original file with this comment for comparison, but it gave an error. Again, all one needs to do to replicate the issue is to take an MP4 video file with metadata, upload it to Commons via video2commons, and notice that the metadata has disappeared.
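The comparison step of that replication (which tags disappeared?) could be scripted once the tags of both files have been read into dictionaries, for example from ffprobe's JSON output. A minimal Python sketch; `dropped_tags` is a hypothetical helper. Keys are compared case-insensitively on the assumption that MP4 and WebM may spell tag names differently, and the `encoder` tag is ignored because a re-encode is expected to rewrite it. Only missing keys are reported, since values such as dates may legitimately change format.

```python
from typing import Dict, Iterable


def normalise(tags: Dict[str, str]) -> Dict[str, str]:
    """Lower-case tag names so that e.g. TITLE and title compare equal."""
    return {k.lower(): v for k, v in tags.items()}


def dropped_tags(before: Dict[str, str], after: Dict[str, str],
                 ignore: Iterable[str] = ("encoder",)) -> Dict[str, str]:
    """Return tags present in the source file but absent from the converted file."""
    b, a = normalise(before), normalise(after)
    skip = {k.lower() for k in ignore}
    return {k: v for k, v in b.items() if k not in skip and k not in a}
```

With invented values, `dropped_tags({"creation_time": "...", "location": "+52.5+013.4/"}, {"CREATION_TIME": "..."})` reports only `location`.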

Sdkb changed the task status from Stalled to Open on Aug 4 2021, 7:22 PM.

@zhuyifei1999 I have created an example for you: http://up-and-download.de/wiki-largefiles/drei.mp4

The resulting webm should be like:

http://up-and-download.de/wiki-largefiles/Protest_by_the_»Hedonistische_Internationale_-_Sektion_Wilde_Möpse«_in_Berlin-Kreuzberg,_Mariannenplatz,_in_support_of_topfreedom_after_an_incident_at_the_Plansche_playground_2021-07-10_32.webm

This file still needs to be uploaded as a new revision of File:Protest_by_the_»Hedonistische_Internationale_-_Sektion_Wilde_Möpse«_in_Berlin-Kreuzberg,_Mariannenplatz,_in_support_of_topfreedom_after_an_incident_at_the_Plansche_playground_2021-07-10_32.webm

@zhuyifei1999 have you tested the example file? I need the server space back.

I downloaded it. I currently don't have time to test it however.

I believe the ffmpeg arguments for copying metadata are:

-map_metadata 0 to copy global (that is, container-level rather than per-stream) metadata

and for MP4 originals you might have to add -movflags use_metadata_tags

A minor problem with this (from what I have read) can be that some formats don't have global metadata, so you actually need to read it from a stream. But that is all very dependent on the input file, so it might get a bit messy.
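A minimal Python sketch of that input-dependent choice: use the input's global metadata if it has anything descriptive, otherwise copy the first descriptive stream's metadata into the output's global metadata (ffmpeg's documentation gives `-map_metadata 0:s:0` as the way to copy metadata from the first input stream to output global metadata). The `TECHNICAL_KEYS` list is a heuristic assumption, not an ffmpeg rule, and the function names are hypothetical. As far as I know, `-movflags use_metadata_tags` is documented as an option of ffmpeg's MP4/MOV muxer, i.e. it affects MP4 output, so this sketch does not add it for WebM output.

```python
from typing import Dict, List

# Heuristic (assumption): tags MP4 files commonly carry for technical reasons
# that say nothing about the content itself.
TECHNICAL_KEYS = {
    "major_brand", "minor_version", "compatible_brands",
    "encoder", "handler_name", "vendor_id", "language",
}


def descriptive(tags: Dict[str, str]) -> Dict[str, str]:
    """Keep only tags whose lower-cased name is not in TECHNICAL_KEYS."""
    return {k: v for k, v in tags.items() if k.lower() not in TECHNICAL_KEYS}


def metadata_map_args(global_tags: Dict[str, str],
                      stream_tags: List[Dict[str, str]]) -> List[str]:
    """Choose the source of the output's global metadata.

    stream_tags must be ordered by input stream index.
    """
    if descriptive(global_tags):
        # Useful container-level tags exist: copy input global -> output global.
        return ["-map_metadata", "0"]
    for index, tags in enumerate(stream_tags):
        if descriptive(tags):
            # Copy this input stream's metadata into the output's global metadata.
            return ["-map_metadata", f"0:s:{index}"]
    # Nothing descriptive found: leave ffmpeg's default mapping alone.
    return []
```

The returned arguments would go after `-i input.mp4` in the ffmpeg command line.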

BTW. "MW does not show existing metadata of video files" is T49487