The Wikipedia Zero pirates (T129845) have switched from hidden RAR files to blatant copyright violations. Most of the videos come from YouTube, which serves them in WebM format, so they are easy to download and re-upload. Unfortunately, Google does not embed any video identifier in these files, but it does list the encoder as Google. That value should be included in img_metadata.
A sample from اتحداك تشوف المقطع بدون ما تضحك مقاطع مضحكة جداََ عالم الاندرويد و المجانيات.webm (Identified as a YouTube video)
# hachoir-metadata AQCWqFD9iBU.webm
Common:
- Duration: 10 min 7 sec 44 ms
- Producer: Google
- MIME type: video/webm
- Endianness: Big endian
Video stream:
- Image width: 640 pixels
- Image height: 360 pixels
- Compression: V_VP8
Audio stream:
- Channel: stereo
- Sample rate: 44.1 kHz
- Bits/sample: 32 bits
- Compression: A_VORBIS
Note that legitimate videos come from YouTube pretty frequently...
If metadata extraction isn't done already, I think it would need to happen in getid3, the library that we use for fetching stream info, or else in a reimplementation (ugh).
So there are a couple of EBML elements in the WebM/Matroska stream that I think could easily be added to getid3's extraction:
MuxingApp 2 [4D]
WritingApp 2 
These are both "Google" in the typhoon example file.
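For anyone who wants to check a file by hand, here is a rough Python sketch of reading those two elements straight out of the EBML structure. It is not how getid3 does it. The element IDs (Segment 0x18538067, Info 0x1549A966, MuxingApp 0x4D80, WritingApp 0x5741) are taken from the Matroska specification; the function names are made up for illustration, and the sketch assumes a well-formed file with Info near the start.

```python
import io

# Element IDs from the Matroska specification (length-marker bits included).
SEGMENT = 0x18538067
INFO = 0x1549A966
MUXING_APP = 0x4D80
WRITING_APP = 0x5741


def read_vint(stream, keep_marker):
    """Read one EBML variable-length integer; return it, or None at EOF."""
    first = stream.read(1)
    if not first:
        return None
    byte = first[0]
    length, mask = 1, 0x80
    # The position of the first set bit gives the total length in bytes.
    while length <= 8 and not (byte & mask):
        mask >>= 1
        length += 1
    if length > 8:
        raise ValueError("invalid EBML vint")
    # IDs keep the marker bit, sizes drop it.
    value = byte if keep_marker else byte & (mask - 1)
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        raise ValueError("truncated EBML vint")
    for b in rest:
        value = (value << 8) | b
    return value


def find_app_strings(stream):
    """Return whichever of MuxingApp / WritingApp can be found."""
    names = {MUXING_APP: "MuxingApp", WRITING_APP: "WritingApp"}
    found = {}
    while len(found) < len(names):
        element_id = read_vint(stream, keep_marker=True)
        if element_id is None:
            break
        size = read_vint(stream, keep_marker=False)
        if size is None:
            break
        if element_id in (SEGMENT, INFO):
            continue  # master element: walk into its children
        if element_id in names:
            raw = stream.read(size)
            found[names[element_id]] = raw.decode("utf-8", "replace").rstrip("\x00")
        else:
            stream.seek(size, io.SEEK_CUR)  # skip everything else
    return found
```

Called as `find_app_strings(open("AQCWqFD9iBU.webm", "rb"))`, this should return a dict with both keys for a file like the sample above. It skips every element other than Segment and Info, so it never decodes cluster data.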
Also there can be Vorbis comments in the Vorbis audio stream, which list 'encoder=google' too. :) But that doesn't seem to be exposed to getid3 at the moment, and I don't know how hard it would be to integrate the Vorbis comment extraction.
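To give a sense of the effort involved, the comment header itself is simple to decode once you have the packet. The sketch below follows the Vorbis I comment header layout (packet type 0x03, the string "vorbis", a vendor string, then length-prefixed KEY=value entries). It assumes the packet has already been pulled out of the audio track's CodecPrivate data, where Matroska stores the three Vorbis headers with Xiph-style lacing; that unpacking step is not shown, and the function name is hypothetical.

```python
import struct


def parse_vorbis_comment_packet(packet):
    """Decode a Vorbis comment header packet into (vendor, comments)."""
    if packet[:7] != b"\x03vorbis":
        raise ValueError("not a Vorbis comment header packet")
    pos = 7

    def read_string():
        # Each string is a little-endian uint32 length followed by UTF-8 bytes.
        nonlocal pos
        (length,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        text = packet[pos:pos + length].decode("utf-8", "replace")
        pos += length
        return text

    vendor = read_string()
    (count,) = struct.unpack_from("<I", packet, pos)
    pos += 4

    comments = {}
    for _index in range(count):
        key, _sep, value = read_string().partition("=")
        # Field names are case-insensitive, and a key may repeat.
        comments.setdefault(key.lower(), []).append(value)
    return vendor, comments
```

The real work would be in locating and un-lacing CodecPrivate, not in this part.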
OK, the prior code was removing all the Matroska-specific metadata on the MediaWiki side, presumably because it was heavy on binary junk. The patch in https://gerrit.wikimedia.org/r/376088 puts the 'comments' subsection back, which contains the WritingApp and MuxingApp tags that list 'Google'.
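Once that subsection is available, flagging candidates could look roughly like the Python sketch below. The shape of the data is an assumption: it treats 'comments' as a mapping with lowercase keys `muxingapp` and `writingapp`, each holding a string or a list of strings, which may not match what the patch actually stores. As noted earlier, plenty of legitimate uploads come from YouTube too, so a match is only a hint for review.

```python
def encoder_hints(comments):
    """Collect lowercased muxing/writing application names.

    `comments` is assumed (not confirmed) to map 'muxingapp' and
    'writingapp' to a string or a list of strings.
    """
    hints = set()
    for key in ("muxingapp", "writingapp"):
        values = comments.get(key, [])
        if isinstance(values, str):
            values = [values]
        hints.update(value.strip().lower() for value in values)
    return hints


def looks_google_encoded(comments):
    """True when either application tag is exactly 'Google' (any case)."""
    return "google" in encoder_hints(comments)
```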