
Metadata : embed demographic, language metadata into audio file
Open, Needs Triage | Public | Feature

Description

As discussed here, it would be interesting to embed metadata into each wav file. The metadata would be the same as what is currently recorded on Wikimedia Commons (speaker, language, licence, ...).

Event Timeline

Pamputt changed the subtype of this task from "Task" to "Feature Request".

I looked into how this could be achieved.

In the RecordWizard, I found the following places where it "could" be possible to do this:

However, I could not find a PHP method that would allow us to add metadata to a file. It appears we might need libraries dedicated to that. At least, based on my search, each file format (.wav and .webm) has its own way of storing metadata.
That said, I found this PHP method: https://www.php.net/manual/en/pharfileinfo.setmetadata.php. So, maybe that's the way?

Further research indicates that it is hardly possible (and most likely impossible) to add metadata to .wav files. However, I'm currently investigating adding metadata to the .ogg files that are part of the datasets.
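
For comparison, the RIFF container used by .wav does define a LIST chunk of type INFO for text tags, although support in players and tools is uneven, which may be the difficulty referred to above. The following is only a minimal sketch in Python (not PHP, and not RecordWizard code) of what appending such a chunk could look like. The function names are hypothetical, the mapping of speaker, licence and language onto the IART, ICOP and ICMT tags is an assumption made for illustration, and so is the choice of UTF-8 for the text values.

```python
import io
import struct
import wave


def make_silent_wav(n_frames: int = 800) -> bytes:
    """Build a tiny mono 16-bit PCM WAV in memory to serve as demo input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def add_info_chunk(wav_bytes: bytes, tags: dict[str, str]) -> bytes:
    """Return a copy of wav_bytes with a LIST/INFO chunk appended at the end."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    body = b"INFO"
    for key, value in tags.items():
        if len(key) != 4:
            raise ValueError("INFO tag ids must be exactly 4 characters")
        data = value.encode("utf-8") + b"\x00"  # null-terminated text (UTF-8 is an assumption)
        body += key.encode("ascii") + struct.pack("<I", len(data)) + data
        if len(data) % 2:
            body += b"\x00"  # RIFF chunks are padded to an even length
    if len(wav_bytes) % 2:
        wav_bytes += b"\x00"  # keep the new chunk word-aligned
    out = wav_bytes + b"LIST" + struct.pack("<I", len(body)) + body
    # Rewrite the RIFF size field so it covers the appended chunk.
    return out[:4] + struct.pack("<I", len(out) - 8) + out[8:]


def read_info_chunk(wav_bytes: bytes) -> dict[str, str]:
    """Walk the top-level chunks and collect the tags of any LIST/INFO chunk."""
    tags: dict[str, str] = {}
    pos = 12  # skip "RIFF", the size field and "WAVE"
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        size = struct.unpack_from("<I", wav_bytes, pos + 4)[0]
        start = pos + 8
        if chunk_id == b"LIST" and wav_bytes[start:start + 4] == b"INFO":
            sub, end = start + 4, start + size
            while sub + 8 <= end:
                key = wav_bytes[sub:sub + 4].decode("ascii")
                n = struct.unpack_from("<I", wav_bytes, sub + 4)[0]
                raw = wav_bytes[sub + 8:sub + 8 + n]
                tags[key] = raw.rstrip(b"\x00").decode("utf-8")
                sub += 8 + n + (n % 2)
        pos = start + size + (size % 2)
    return tags


if __name__ == "__main__":
    tagged = add_info_chunk(
        make_silent_wav(),
        # Placeholder values; the tag mapping is illustrative only.
        {"IART": "ExampleSpeaker", "ICOP": "CC BY-SA 4.0", "ICMT": "lang=fra"},
    )
    print(read_info_chunk(tagged))
```

The chunk is appended after the audio data, so the samples themselves are untouched. Whether downstream tools actually display or preserve these tags would need to be checked separately.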

Yug renamed this task from Embed metadata into wav file to Embed metadata into audio file. Feb 16 2021, 11:37 PM
Yug renamed this task from Embed metadata into audio file to Metadata : embed demographic, language metadata into audio file. Feb 16 2021, 11:50 PM