Per this thread, Video2commons is scrubbing metadata. Please fix.
While JPEG files nearly always come with some metadata (e.g. Exif), most MP4 files, and the WebM files transcoded from them, have next to none in most cases. Also, exiftool can read, but not write, metadata of video files; ffmpeg, however, can do both. And video files can contain a lot of metadata (just not in Exif format): copyright information, information about the content (title, location, city, country), and notes. But even if a WebM file uploaded to Commons contains masses of metadata, the MW software will only show the audio codec and video codec version numbers as metadata at the bottom of the file description page on Commons. If you download the original file you can still see the metadata with ffmpeg or exiftool. (As far as I am aware, this is not the case with the transcoded versions.)
There are actually three problems:
- video2commons strips metadata from videos
- MW does not show existing metadata of video files
- MW transcoding of videos does not keep metadata
The first problem is probably the easiest to fix: set the ffmpeg options to copy the metadata from the source MP4 file to the resulting WebM.
The third problem should be easy as well: transcoding can copy the metadata from the uploaded file to the transcoded versions (ffmpeg).
The second problem is probably more difficult.
I can only provide this example of adding metadata to a webm-file:
ffmpeg -i input.webm -f ffmetadata -i metadata.txt -c copy -map_metadata 1 output.webm
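For anyone who wants to try the command above, the metadata.txt input has to be in ffmpeg's ffmetadata text format. The following is only a minimal sketch of how such a file could be generated; the function names and the sample tags are made up for illustration and are not part of video2commons.

```python
from __future__ import annotations


def escape_ffmetadata(text: str) -> str:
    """Backslash-escape the characters that are special in an ffmetadata file."""
    out = []
    for ch in text:
        # '=', ';', '#', '\' and newline must be escaped with a backslash.
        if ch in "=;#\\\n":
            out.append("\\")
        out.append(ch)
    return "".join(out)


def build_ffmetadata(tags: dict[str, str]) -> str:
    """Return the text of an ffmetadata file holding the given global tags."""
    lines = [";FFMETADATA1"]  # header line expected by ffmpeg
    for key, value in tags.items():
        lines.append(f"{escape_ffmetadata(key)}={escape_ffmetadata(value)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample tags, for illustration only.
    sample = {"title": "Example video", "copyright": "CC BY-SA 4.0"}
    with open("metadata.txt", "w", encoding="utf-8") as handle:
        handle.write(build_ffmetadata(sample))
```

The resulting metadata.txt can then be passed as the second input of the ffmpeg call shown above.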
Maybe @Sdkb can give an example of an MP4 file with metadata.
This is running up pretty fast against the limits of my technical knowledge, unfortunately. What I know is just that when I tried to upload videos like File:Contra dancers at the 2019 Flurry Festival.webm (by downloading from my Google Photos account and then uploading via Video2Commons), the metadata I'd have liked to have seen (e.g. date, location) got scrubbed. Doing the same process with photos rather than videos (without Video2Commons) preserves the metadata, so I'm pretty sure the loss is happening while using it to convert format.
The Contra dancers... video downloaded from Commons does indeed contain no metadata apart from audio and video codec information. The upload comment states it was uploaded with a tool from Labs, but there is no information on the original file. What metadata did the original file contain?
Videos I have uploaded (native WebM uploads) do contain metadata, provided the original file is downloaded from Commons. The transcoded versions delivered by Commons do not, and the metadata is not shown in the metadata section of the file description page.
@Sdkb: What is the URL of the source file?
@C.Suthorn, for privacy reasons I'm not willing to link to the URL from my Google Photos account.
@Sdkb can you provide another example of an mp4-file that was uploaded to video2commons and the webm-file on commons that resulted?
@Aklapper and @C.Suthorn, my process for uploading a video I've taken to Commons is always through Google Photos, so I can't provide an example different from the one I already did. I've provided a quite thorough description of the issue at this point, which should be plenty for anyone who wants to replicate it and work toward solving it.
There has never been an example of an MP4 file involved. The previously given example, File:Contra dancers at the 2019 Flurry Festival.webm, is a WebM after transcoding. I cannot replicate a transcode when I only have the result and no source.
By the way, File:Contra dancers at the 2019 Flurry Festival.webm is from videoconvert not video2commons.
@zhuyifei1999 oh oops, that one was from a while ago so I must've misremembered which tool I used; apologies about that (it seems videoconvert may have the same scrubbing problem?). Anyways, I believe I also encountered the issue more recently with https://commons.wikimedia.org/wiki/File:Claremont_Taiko_performance.webm. I tried to upload the original file with this comment for comparison, but it gave an error. Again, all one needs to do to replicate the issue is to take an mp4 video file with metadata, upload it to Commons via video2commons, and notice that the metadata has disappeared.
@zhuyifei1999 I have created an example for you: http://up-and-download.de/wiki-largefiles/drei.mp4
The resulting webm should be like:
that still needs to be uploaded as a new revision of File:Protest_by_the_»Hedonistische_Internationale_-_Sektion_Wilde_Möpse«_in_Berlin-Kreuzberg,_Mariannenplatz,_in_support_of_topfreedom_after_an_incident_at_the_Plansche_playground_2021-07-10_32.webm
I believe the ffmpeg arguments for copying metadata are:
-map_metadata 0, which copies global metadata (as in, not from the codec streams)
and for MP4 originals you might have to add -movflags use_metadata_tags
The minor problem with this (from what I read) is that some formats don't have global metadata, so you actually need to read from a stream. That is all very dependent on the input file, so it might be a bit of a mess.
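To make the suggestion above concrete, here is a rough sketch of how a tool could assemble those arguments. It is not the actual video2commons code; the function names are hypothetical, and whether -movflags use_metadata_tags has any effect when the output is WebM is an assumption taken from this thread, not something verified here.

```python
from __future__ import annotations

from pathlib import PurePath


def metadata_copy_args(input_path: str) -> list[str]:
    """Return the ffmpeg arguments proposed in this thread for keeping metadata."""
    # Copy global metadata from the first input (index 0) to the output.
    args = ["-map_metadata", "0"]
    # Assumption from the thread: MP4 originals may also need this flag.
    if PurePath(input_path).suffix.lower() == ".mp4":
        args += ["-movflags", "use_metadata_tags"]
    return args


def build_command(input_path: str, output_path: str) -> list[str]:
    """Build a full ffmpeg argument list (codec options omitted for brevity)."""
    return ["ffmpeg", "-i", input_path, *metadata_copy_args(input_path), output_path]


if __name__ == "__main__":
    print(" ".join(build_command("input.mp4", "output.webm")))
```

Stream-level metadata is not covered by this sketch; as noted above, some inputs would need per-stream mapping instead.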
BTW. "MW does not show existing metadata of video files" is T49487
In my testing when working with a single input and output file ffmpeg seemed to keep most global standard metadata.
I did the tests using ffmpeg 4.4.6 with V2C. I converted some mp4/h264 files (with added global metadata) to webm/av1 and I was able to confirm that title, description, genre, artist, and date are kept by ffmpeg in the resulting webm file with or without -map_metadata 0 and -movflags use_metadata_tags. I confirmed this using ffprobe. I didn't do an exhaustive test of all global metadata keys supported by mp4. One oddity I did notice was that VLC 3.0.20 was unable to see some, but not all, of the metadata in the output file.
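A comparison like the one described above can be scripted. The sketch below assumes the global tags of each file were dumped with `ffprobe -v quiet -print_format json -show_format FILE`; the function names and the sample JSON are illustrative only. Keys are lower-cased before comparing because the WebM output in this thread shows upper-case keys.

```python
from __future__ import annotations

import json


def global_tags(ffprobe_json: str) -> dict[str, str]:
    """Extract container-level tags from ffprobe's JSON output, keys lower-cased."""
    data = json.loads(ffprobe_json)
    tags = data.get("format", {}).get("tags", {})
    return {key.lower(): value for key, value in tags.items()}


def dropped_keys(source_json: str, output_json: str) -> list[str]:
    """List global tag keys present in the source but missing from the output."""
    return sorted(set(global_tags(source_json)) - set(global_tags(output_json)))


if __name__ == "__main__":
    # Hypothetical ffprobe output for a source MP4 and the converted WebM.
    source = '{"format": {"tags": {"title": "T", "artist": "A", "location": "+1+2/"}}}'
    output = '{"format": {"tags": {"TITLE": "T", "ARTIST": "A", "ENCODER": "Lavf"}}}'
    print(dropped_keys(source, output))
```

This only looks at global tags; stream-level tags would need `-show_streams` and a similar comparison per stream.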
I see no harm in explicitly adding these options to V2C's ffmpeg calls just to be safe, but I'm curious what specific metadata keys (and whether they are global or not) are being dropped so this can be debugged further. The example files in this thread are no longer available, unfortunately. Adding -movflags use_metadata_tags for MP4 inputs will copy custom/non-standard metadata to the output file, and is not the default behavior of ffmpeg, so maybe using that option will address this issue?
@Amdrel , if you'd like another example of a V2C video, here's one (or take your pick from the category).
Looking into this using that example, in the file I have downloaded on my computer I see that there is some data, like the frame rate and "media created" date when I took the video. However, other metadata like the GPS coordinates is missing. So there is a possibility I was mistaken back in 2021 and that this is actually Google Photos scrubbing the metadata for videos when you download in a way they don't do for photos.
When I download the example video you linked to I can see GPS coordinates in the global metadata labeled as LOCATION and LOCATION-eng when I probe the file with ffprobe. Here is the output that I get:
➜ ~/Downloads ffprobe Nihon-buyō_performance_at_the_Kennedy_Center.webm
ffprobe version 4.4.6 Copyright (c) 2007-2025 the FFmpeg developers
built with Apple clang version 17.0.0 (clang-1700.0.13.5)
configuration: --prefix=/opt/local --cc=/usr/bin/clang --datadir=/opt/local/share/data/ffmpeg --docdir=/opt/local/share/doc/ffmpeg --mandir=/opt/local/share/man --enable-audiotoolbox --disable-indev=jack --disable-libjack --disable-libopencore-amrnb --disable-libopencore-amrwb --disable-libxcb --disable-libxcb-shm --disable-libxcb-xfixes --enable-opencl --disable-outdev=xv --enable-sdl2 --disable-securetransport --enable-videotoolbox --enable-avfilter --enable-avresample --enable-fontconfig --enable-gnutls --enable-libass --enable-libbluray --enable-libdav1d --enable-libfreetype --enable-libfribidi --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-librsvg --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libzimg --enable-libzvbi --enable-lzma --enable-pthreads --enable-shared --enable-swscale --enable-zlib --enable-libaom --enable-libsvtav1 --arch=arm64 --enable-gpl --enable-libvidstab --enable-libx264 --enable-libx265 --enable-libxvid --enable-postproc
libavutil 56. 70.100 / 56. 70.100
libavcodec 58.134.100 / 58.134.100
libavformat 58. 76.100 / 58. 76.100
libavdevice 58. 13.100 / 58. 13.100
libavfilter 7.110.100 / 7.110.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 9.100 / 5. 9.100
libswresample 3. 9.100 / 3. 9.100
libpostproc 55. 9.100 / 55. 9.100
Input #0, matroska,webm, from 'Nihon-buyō_performance_at_the_Kennedy_Center.webm':
Metadata:
COM.ANDROID.MODEL: Pixel 7
MAJOR_BRAND : isom
MINOR_VERSION : 131072
COMPATIBLE_BRANDS: isomiso2mp41
COM.ANDROID.CAPTURE.FPS: 60.000000
LOCATION : +38.8944-77.0560/
LOCATION-eng : +38.8944-77.0560/
COM.ANDROID.MANUFACTURER: Google
ENCODER : Lavf58.76.100
Duration: 00:04:55.57, start: -0.007000, bitrate: 13551 kb/s
Stream #0:0(eng): Video: vp9 (Profile 0), yuv420p(tv, bt709, progressive), 1920x1080, SAR 1:1 DAR 16:9, 59.94 fps, 59.94 tbr, 1k tbn, 1k tbc (default)
Metadata:
HANDLER_NAME : VideoHandle
VENDOR_ID : [0][0][0][0]
ENCODER : Lavc58.134.100 libvpx-vp9
DURATION : 00:04:55.569000000
Stream #0:1(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
Metadata:
HANDLER_NAME : SoundHandle
VENDOR_ID : [0][0][0][0]
ENCODER : Lavc58.134.100 libopus
DURATION : 00:04:55.553000000
The coordinates seem to line up with the video description as they point to the Kennedy Center. Just in case the coordinates were added later I did an upload of the same video with a local version of V2C and was able to confirm that LOCATION: +38.8944-77.0560/ and LOCATION-eng: +38.8944-77.0560/ were still present in the video metadata after uploading. It may be possible that the tool that you're using to check the video metadata is not recognizing the GPS coordinates when stored under those keys?
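The LOCATION value above appears to follow the ISO 6709 decimal-degree form (signed latitude, signed longitude, trailing slash). As an illustration of how a tool could turn it into coordinates, here is a small sketch; the function name is hypothetical, and only the decimal-degree variant with an optional altitude is handled.

```python
from __future__ import annotations

import re

# Signed decimal latitude, signed decimal longitude, optional signed altitude,
# optional trailing slash. Other ISO 6709 variants are not handled here.
_LOCATION_RE = re.compile(
    r"^([+-]\d+(?:\.\d+)?)([+-]\d+(?:\.\d+)?)(?:[+-]\d+(?:\.\d+)?)?/?$"
)


def parse_location(value: str) -> tuple[float, float]:
    """Parse a LOCATION tag such as '+38.8944-77.0560/' into (latitude, longitude)."""
    match = _LOCATION_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized location string: {value!r}")
    return float(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    print(parse_location("+38.8944-77.0560/"))  # (38.8944, -77.056)
```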
To read the metadata, I was just looking at the metadata section of the file page. Is there a reason that the coordinates (which per above do seem to be part of the file's metadata) — and every other piece of metadata apart from "software used" — might not show up there in the same way they do for photos? If so, perhaps the issue is with how we display metadata for video files rather than something Video2Commons is doing.
You can use the MediaWiki API to look up what metadata has been recognized by MediaWiki. The selection of metadata shown on the file description page does indeed differ between file types, for example PNG, JPEG and WebP. What is shown for video files on the file description page can be changed by the developers. Still, V2C needs to include the metadata in the video file for it to be displayable.
You can download the original file (not a transcoded version) from the file description page and then use ffmpeg locally to see what metadata was included by V2C.
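For reference, the API lookup mentioned above can be done with an imageinfo query. The sketch below only builds the request URL, so it runs without network access; the helper name is made up, and the file title is one of the examples from this thread.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def metadata_query_url(file_title: str) -> str:
    """Build an API URL asking which metadata MediaWiki extracted for a file."""
    params = {
        "action": "query",
        "titles": file_title,
        "prop": "imageinfo",
        # 'metadata' is the handler-specific data, 'commonmetadata' the generic set.
        "iiprop": "metadata|commonmetadata",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(metadata_query_url("File:Claremont_Taiko_performance.webm"))
```

Opening the printed URL in a browser shows the metadata MediaWiki has stored for the original upload, which can then be compared with what ffprobe reports for the downloaded file.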
Thanks, @C.Suthorn! This is a little beyond my depth technically. But if there are pieces of metadata that aren't being shown for webm files that ought to be, to match the behavior of image file types, then it'd be helpful for someone to file a ticket for that.
It is beginning to sound like this is out of scope for v2c, but rather something for MediaWiki - is there someone cc'd here who might be able to assign to the right owner?
To read the metadata, I was just looking at the metadata section of the file page.
Well, that has sent everyone on a wild goose chase... A good reminder to always be specific about WHAT you are doing when filing a bug report.
MediaWiki TMH does support some webm metadata, this was done by T237154: Display standard webm metadata/tag. The list of supported metadata is https://github.com/wikimedia/mediawiki-extensions-TimedMediaHandler/blob/master/includes/Handlers/WebMHandler/WebMHandler.php#L446
Unknown keys are indeed not displayed right now.
Created a new task for this as T411005: Show webM/Matroska comments with unknown comment keys in File page's Metadata section
To reiterate:
- video2commons strips metadata from videos (it doesn't, or at least not any longer)
- MW does not show existing metadata of video files (rather, only known properties, T411005)
- MW transcoding of videos does not keep metadata (it doesn't [actively]. But we also don't do this for thumbnails of images so....)
Can we close this ticket?