As discussed here, it would be interesting to embed metadata into each WAV file. The metadata would be the same as what is currently recorded on Wikimedia Commons (speaker, language, licence, ...).
Event Timeline
I looked into how this could be achieved.
In the RecordWizard, I found the following places where it "could" be possible to do this:
- https://github.com/lingua-libre/RecordWizard/blob/5e3becc8d5ca09a4067f3e8147e8f05a2f1299df/modules/vue/rw.vue.studio.js#L269
- https://github.com/lingua-libre/RecordWizard/blob/5e3becc8d5ca09a4067f3e8147e8f05a2f1299df/modules/rw.Record.js
However, I could not find a PHP method that would allow us to add metadata to a file. It appears we might need libraries dedicated to that. At least, based on my search, each file format (.wav and .webm) has its own way of storing metadata.
Still, I found this method: https://www.php.net/manual/en/pharfileinfo.setmetadata.php. So, maybe that's the way?
Further research indicates that it is hardly possible (and most likely impossible) to add metadata to .wav files. However, I'm currently investigating adding metadata to the .ogg files that are part of the datasets.
The WAV specification indicates (page 23), in the INFO List Chunk section, that metadata can be stored in a WAV file using predefined chunks ("keywords"). The specification also says "New chunks may be defined, but an application should ignore any chunk it doesn't understand". So in principle, we can define our own metadata chunks and add them to the WAV file. Obviously, we would need to document the new chunks somewhere. See also https://www.robotplanet.dk/audio/wav_meta_data/
Finally, we need to find software that can do that.
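To illustrate the INFO list chunk approach, here is a minimal Python sketch that appends a `LIST`/`INFO` chunk to a WAV file held in memory and reads it back. It is not an existing tool: the function names are hypothetical, and the mapping of speaker, licence and language to the `IART`, `ICOP` and `ICMT` keywords is only an assumption for the example. The sketch also assumes a well-formed RIFF/WAVE file and encodes values as UTF-8, which is a choice made here, not something the INFO section prescribes.

```python
import io
import struct
import wave


def build_info_chunk(tags: dict[str, str]) -> bytes:
    """Build a RIFF 'LIST' chunk of type 'INFO' from 4-character keywords."""
    body = b"INFO"
    for tag_id, text in tags.items():
        if len(tag_id) != 4:
            raise ValueError(f"INFO keywords must be 4 characters: {tag_id!r}")
        data = text.encode("utf-8") + b"\x00"  # NUL-terminated (UTF-8 is an assumption)
        body += tag_id.encode("ascii") + struct.pack("<I", len(data)) + data
        if len(data) % 2:
            body += b"\x00"  # sub-chunks are padded to an even length
    return b"LIST" + struct.pack("<I", len(body)) + body


def add_info(wav_bytes: bytes, tags: dict[str, str]) -> bytes:
    """Return a copy of the WAV data with an INFO chunk appended at the end."""
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    payload = wav_bytes[8:]
    if len(payload) % 2:
        payload += b"\x00"  # pad the last chunk if the writer did not
    payload += build_info_chunk(tags)
    # The RIFF size field covers everything after the first 8 bytes.
    return b"RIFF" + struct.pack("<I", len(payload)) + payload


def read_info(wav_bytes: bytes) -> dict[str, str]:
    """Walk the top-level chunks and collect the entries of any INFO list."""
    tags: dict[str, str] = {}
    pos = 12  # skip 'RIFF', the size field and 'WAVE'
    while pos + 8 <= len(wav_bytes):
        chunk_id = wav_bytes[pos:pos + 4]
        (size,) = struct.unpack_from("<I", wav_bytes, pos + 4)
        start = pos + 8
        if chunk_id == b"LIST" and wav_bytes[start:start + 4] == b"INFO":
            sub, end = start + 4, start + size
            while sub + 8 <= end:
                tag_id = wav_bytes[sub:sub + 4].decode("ascii")
                (n,) = struct.unpack_from("<I", wav_bytes, sub + 4)
                raw = wav_bytes[sub + 8:sub + 8 + n]
                tags[tag_id] = raw.rstrip(b"\x00").decode("utf-8")
                sub += 8 + n + (n % 2)
        pos = start + size + (size % 2)
    return tags


# Demo: create a short silent WAV in memory, tag it, and read the tags back.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 80)
original = buf.getvalue()

# Hypothetical mapping: speaker -> IART, licence -> ICOP, language -> ICMT.
tags = {"IART": "Ana", "ICOP": "CC BY-SA 4.0", "ICMT": "fra"}
tagged = add_info(original, tags)
print(read_info(tagged))
```

Because the chunk is appended after the audio data, readers that skip unknown chunks should still see the same samples. Custom chunks beyond the predefined keywords could be written the same way, provided they are documented somewhere.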