
Metadata : embed demographic, language metadata into audio file
Open, Low, Public, Feature

Description

As discussed here, it would be interesting to embed metadata into each wav file. The metadata would be the same as what is currently recorded on Wikimedia Commons (locutor, language, licence, ...).

Event Timeline

Pamputt changed the subtype of this task from "Task" to "Feature Request".

I looked into how this could be achieved.

In the RecordWizard, I found the following places where it "could" be possible to do this:

However, I could not find a PHP method that would allow us to add metadata to a file. It appears we might need libraries dedicated to that. At least, based on my search, each file format (.wav and .webm) has its own way of storing metadata.
Yet, I found this function: https://www.php.net/manual/en/pharfileinfo.setmetadata.php. So, maybe that's the way? It seems, though, to attach metadata to a file entry inside a Phar archive rather than to a standalone audio file.
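Because each container stores metadata in its own way, a tool would first need to tell the formats apart before choosing how to write. A minimal sketch in Python for illustration (the RecordWizard itself is not written in Python), assuming the first 12 bytes of the file are available; the magic numbers are those of RIFF/WAVE, Ogg and EBML (WebM/Matroska), and `detect_container` / `detect_file` are hypothetical helper names.

```python
from pathlib import Path


def detect_container(header: bytes) -> str:
    """Guess the container format from the first 12 bytes of a file."""
    # WAV: "RIFF", a 4-byte little-endian size, then the form type "WAVE"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    # Ogg: every page starts with the capture pattern "OggS"
    if header[:4] == b"OggS":
        return "ogg"
    # WebM is a Matroska profile; both start with the EBML magic number 0x1A45DFA3,
    # so this branch cannot distinguish WebM from other Matroska files
    if header[:4] == b"\x1a\x45\xdf\xa3":
        return "webm"
    return "unknown"


def detect_file(path: Path) -> str:
    """Read just enough of the file to identify its container."""
    with path.open("rb") as f:
        return detect_container(f.read(12))
```

Each result would then be handed to a format-specific writer (RIFF INFO chunks for WAV, Vorbis comments for Ogg, Matroska tags for WebM).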

Further research indicates that adding metadata to .wav files is difficult, and most likely impossible. However, I am currently investigating adding metadata to the .ogg files that are part of the datasets.
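In .ogg files, metadata is usually stored as Vorbis comments: a header packet holding a vendor string followed by a list of `FIELD=value` strings. The sketch below only shows how such a packet is laid out, assuming the files are Vorbis-encoded (Opus streams use an `OpusTags` packet with the same comment list but no framing bit). Writing it into an actual file means re-paginating the Ogg stream and recomputing page checksums, which a library such as mutagen (`mutagen.oggvorbis.OggVorbis`) takes care of. The field names `SPEAKER` and `LANGUAGE` are hypothetical choices, not an agreed convention for these datasets.

```python
import struct


def _check_field_name(name: str) -> None:
    # Vorbis field names: ASCII 0x20-0x7D, excluding '=' (0x3D)
    if not name or any(not (0x20 <= ord(ch) <= 0x7D) or ch == "=" for ch in name):
        raise ValueError(f"invalid Vorbis comment field name: {name!r}")


def build_vorbis_comment_packet(vendor: str, fields: dict[str, str]) -> bytes:
    """Serialize a Vorbis comment header packet (packet type 3)."""
    out = bytearray(b"\x03vorbis")  # packet type byte + "vorbis" signature
    vendor_bytes = vendor.encode("utf-8")
    out += struct.pack("<I", len(vendor_bytes)) + vendor_bytes
    comments = []
    for key, value in fields.items():
        _check_field_name(key)
        # Field names are case-insensitive; upper case is the usual convention
        comments.append(f"{key.upper()}={value}".encode("utf-8"))
    out += struct.pack("<I", len(comments))  # number of user comments
    for comment in comments:
        out += struct.pack("<I", len(comment)) + comment
    out += b"\x01"  # framing bit
    return bytes(out)


# Usage with hypothetical field names for the recording metadata
packet = build_vorbis_comment_packet(
    "example-vendor",
    {"speaker": "Example speaker", "language": "fra", "license": "CC BY-SA 4.0"},
)
```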

Yug renamed this task from "Embed metadata into wav file" to "Embed metadata into audio file". Feb 16 2021, 11:37 PM
Yug renamed this task from "Embed metadata into audio file" to "Metadata : embed demographic, language metadata into audio file". Feb 16 2021, 11:50 PM
Yug triaged this task as Low priority. Jul 6 2022, 10:40 AM

The WAV specification indicates (page 23, INFO List Chunk section) that metadata can be stored in a WAV file using predefined subchunks ("keywords"), each identified by a four-character ID such as INAM (name), IART (artist) or ICOP (copyright). The specification also says "New chunks may be defined, but an application should ignore any chunk it doesn't understand". So, in principle, we can define our own metadata chunks and add them to the WAV file. Obviously, the new chunks need to be documented somewhere. See also https://www.robotplanet.dk/audio/wav_meta_data/
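To make the INFO approach concrete, here is a minimal sketch using only the Python standard library: it appends a `LIST`/`INFO` chunk after the existing chunks of a WAV file, updates the RIFF size, and reads the tags back. Assumptions: text is stored as UTF-8 with a terminating NUL (the spec does not mandate UTF-8), odd-sized subchunks are padded to an even length as RIFF requires, and `ILNG` is a hypothetical chunk ID for the language, not one we know existing tools recognise. The helper names are illustrative.

```python
import io
import struct
import wave


def make_info_chunk(tags: dict[str, str]) -> bytes:
    """Build a RIFF 'LIST' chunk of type 'INFO' from {four-char ID: text}."""
    body = bytearray(b"INFO")
    for chunk_id, text in tags.items():
        if len(chunk_id) != 4 or not chunk_id.isascii():
            raise ValueError(f"chunk ID must be 4 ASCII characters: {chunk_id!r}")
        data = text.encode("utf-8") + b"\x00"  # NUL-terminated text (UTF-8 is an assumption)
        body += chunk_id.encode("ascii") + struct.pack("<I", len(data)) + data
        if len(data) % 2:
            body += b"\x00"  # RIFF chunks are padded to an even length
    return b"LIST" + struct.pack("<I", len(body)) + bytes(body)


def add_info_to_wav(wav_bytes: bytes, tags: dict[str, str]) -> bytes:
    """Return a copy of a WAV file with an INFO list appended after its chunks."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    riff_size = struct.unpack("<I", wav_bytes[4:8])[0]
    body = wav_bytes[8:8 + riff_size]  # "WAVE" followed by the existing chunks
    if len(body) % 2:
        body += b"\x00"  # keep the new chunk word-aligned
    new_body = body + make_info_chunk(tags)
    return b"RIFF" + struct.pack("<I", len(new_body)) + new_body


def read_info(wav_bytes: bytes) -> dict[str, str]:
    """Collect the text of every subchunk found in LIST/INFO chunks."""
    riff_end = 8 + struct.unpack("<I", wav_bytes[4:8])[0]
    tags: dict[str, str] = {}
    pos = 12  # first chunk after "RIFF", the size field and "WAVE"
    while pos + 8 <= riff_end:
        chunk_id = wav_bytes[pos:pos + 4]
        size = struct.unpack("<I", wav_bytes[pos + 4:pos + 8])[0]
        start = pos + 8
        if chunk_id == b"LIST" and wav_bytes[start:start + 4] == b"INFO":
            sub = start + 4
            while sub + 8 <= start + size:
                sub_id = wav_bytes[sub:sub + 4].decode("ascii")
                sub_size = struct.unpack("<I", wav_bytes[sub + 4:sub + 8])[0]
                raw = wav_bytes[sub + 8:sub + 8 + sub_size]
                tags[sub_id] = raw.rstrip(b"\x00").decode("utf-8")
                sub += 8 + sub_size + (sub_size % 2)  # skip the pad byte if any
        pos = start + size + (size % 2)
    return tags


# Usage: build a short silent WAV in memory, tag it, and read the tags back.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 100)

tags = {"INAM": "Bonjour", "IART": "Example speaker", "ICOP": "CC BY-SA 4.0", "ILNG": "fra"}
tagged = add_info_to_wav(buf.getvalue(), tags)
print(read_info(tagged))
with wave.open(io.BytesIO(tagged), "rb") as r:
    print(r.getnframes())  # the audio data is still readable after tagging
```

Because readers are expected to skip chunks they do not understand, appending the `LIST` chunk at the end should leave playback unaffected; the last lines check this with the `wave` module, which stops parsing once it reaches the `data` chunk.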

Finally, one needs to find software that can do this.