
[DSE Hackathon] Sounds of the Commons: Neural Audio Mashups
Closed, Resolved, Public

Description

This task is for tracking and discussing the DSE Hackathon project "Sounds of the Commons: Neural Audio Mashups"

Description: There are a ton of cool and interesting audio files publicly available on Wikimedia Commons (https://commons.wikimedia.org/wiki/Category:Audio_files). First, let’s build a dataset of interesting sounds that fall into a few categories (e.g. music/nature/human/animal/interior/exterior). Next, we’ll write a script that randomly combines these audio files and samples the latent space of their combined embeddings to create new machine-generated audio files. Lastly, we’ll stitch all of our newly created sounds together into a never-before-heard sound-collage monstrosity and submit it back to Commons. We’ll leverage neural network architectures such as autoencoders, and tools like Magenta, to encode waveforms and perform resynthesis. If there’s time, we can also look at generating visuals to go along with the new sounds using Generative Adversarial Networks (GANs), or try to build a classifier that attempts to categorize the newly generated sounds. Don’t worry if these are new topics for you; all experience levels are welcome! There will be a number of micro-tasks that anyone can get involved with.
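For the final collage step, a minimal sketch of stitching clips together with a short linear crossfade between neighbours might look like the following. It assumes every generated sound has already been decoded to a mono list of float samples at one shared sample rate; `stitch_with_crossfade` is a hypothetical helper for illustration, not code from the project repo.

```python
def stitch_with_crossfade(clips, overlap):
    """Hypothetical helper: concatenate mono clips (lists of floats),
    blending `overlap` samples between consecutive clips with a linear
    crossfade. Assumes all clips share the same sample rate."""
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    if not clips:
        return []
    out = list(clips[0])
    for clip in clips[1:]:
        # The overlap cannot exceed the collage so far or the next clip.
        n = min(overlap, len(out), len(clip))
        start = len(out) - n
        for i in range(n):
            # Weight of the incoming clip rises linearly across the overlap.
            w = (i + 1) / (n + 1)
            out[start + i] = (1.0 - w) * out[start + i] + w * clip[i]
        # Append the part of the clip that was not blended.
        out.extend(clip[n:])
    return out
```

At 16 kHz, `overlap=1600` would blend about 0.1 s between neighbouring clips. A linear fade between unrelated sounds can dip in loudness mid-transition; an equal-power (sine/cosine) fade is a common alternative if that is audible.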

Comms channel (IRC or Slack?)

  • Slack - #dse-hackathon-neural-audio
  • IRC - #wikimedia-ml on Libera.Chat

Code Repo(s):
Let's share any and all code/scripts/etc on https://gitlab.wikimedia.org/

Tasks:

Event Timeline

@ACraze: Which project tag should this task have, so people can also find this task when searching via projects? Thanks!

Interested in playing with autoencoders.

write a script that will randomly combine these audio files and sample the latent spaces of their combined embeddings to create new machine-generated audio files

Does this mean we train the autoencoder on the dataset we curated from Commons and then have it generate a sample audio file from random numbers? Maybe I'm a bit confused about what 'randomly combining' audio files means here.

EDIT: I think the second point clarifies it a bit. We combine the encodings of two audio files to generate a new one? Interesting.


Yep, that's correct! We will interpolate between two encodings and have NSynth synthesize our new sounds!
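To make the interpolation idea concrete: NSynth's WaveNet autoencoder compresses audio into a slow sequence of 16-dimensional embedding frames (about 125 frames for a 4-second, 16 kHz clip), and a blend of two sounds is a weighted average of two such sequences, frame by frame. The sketch below uses plain nested lists in place of the real encoder output, and `interpolate_encodings` is a hypothetical helper; the actual Magenta encode/synthesize calls are not shown.

```python
def interpolate_encodings(enc_a, enc_b, alpha):
    """Hypothetical helper: linear blend of two encodings, where each
    encoding is a list of frames and each frame is a list of floats
    (frames x channels). alpha=0.0 gives enc_a, alpha=1.0 gives enc_b."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    mixed = []
    # zip() stops at the shorter encoding, so the result has the shorter length.
    for frame_a, frame_b in zip(enc_a, enc_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same number of channels")
        mixed.append([(1.0 - alpha) * a + alpha * b
                      for a, b in zip(frame_a, frame_b)])
    return mixed
```

Sweeping `alpha` over a few values (say 0.25, 0.5 and 0.75) would give several in-between encodings per pair to hand to the decoder.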

Quick update:

@fkaelin built a 90 GB dataset of the WAV files hosted on Commons and stored it on HDFS. Here's his code:
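Before encoding, each file has to become mono floating-point samples; the NSynth models were trained on 16 kHz mono audio, so Commons files at other rates would also need resampling. A minimal, standard-library-only sketch of the loading step follows. It assumes 16-bit PCM WAV input (other bit depths are rejected) and leaves resampling out; `read_wav_mono_float` is a hypothetical name, not part of the notebook linked above.

```python
import array
import sys
import wave


def read_wav_mono_float(path):
    """Hypothetical helper: read a 16-bit PCM WAV file and return
    (sample_rate, samples), with samples as floats in [-1.0, 1.0).
    Multi-channel audio is downmixed to mono by averaging channels.
    No resampling is done."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM is handled in this sketch")
        n_channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    pcm = array.array("h")  # signed 16-bit integers
    pcm.frombytes(raw)
    # WAV sample data is little-endian; swap bytes on big-endian hosts.
    if sys.byteorder == "big":
        pcm.byteswap()
    mono = []
    for i in range(0, len(pcm), n_channels):
        frame = pcm[i:i + n_channels]
        # Average the channels and scale int16 range to [-1.0, 1.0).
        mono.append(sum(frame) / (n_channels * 32768.0))
    return rate, mono
```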
https://gitlab.wikimedia.org/fab/research-ml/-/blob/fk/swift/notebooks/wav.ipynb

The next step is to pick files at random, encode them, mix / crossfade / otherwise process the encodings together, and then have NSynth generate new audio from the mashed-up encodings.
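One possible take on the random mixing step: draw pairs of files from two different categories, encode both, and morph from one encoding to the other over time instead of using a fixed blend. The sketch assumes a hypothetical `paths_by_category` dict mapping category names to lists of file paths, and encodings given as lists of frames; `random_pairs` and `crossfade_encodings` are illustrative names, and the encoding and synthesis steps are omitted.

```python
import random


def random_pairs(paths_by_category, n_pairs, seed=None):
    """Hypothetical helper: draw n_pairs (path_a, path_b) tuples, each
    taking one file from two different non-empty categories."""
    categories = [c for c, paths in paths_by_category.items() if paths]
    if len(categories) < 2:
        raise ValueError("need at least two non-empty categories")
    rng = random.Random(seed)  # seeded so runs can be reproduced
    pairs = []
    for _ in range(n_pairs):
        cat_a, cat_b = rng.sample(categories, 2)  # two distinct categories
        pairs.append((rng.choice(paths_by_category[cat_a]),
                      rng.choice(paths_by_category[cat_b])))
    return pairs


def crossfade_encodings(enc_a, enc_b):
    """Hypothetical helper: morph from enc_a to enc_b over time. The weight
    of enc_b ramps linearly from 0 at the first frame to 1 at the last
    shared frame; extra frames in the longer encoding are dropped."""
    n = min(len(enc_a), len(enc_b))
    out = []
    for t in range(n):
        w = t / (n - 1) if n > 1 else 0.0
        out.append([(1.0 - w) * a + w * b for a, b in zip(enc_a[t], enc_b[t])])
    return out
```

Fixing the seed means the same list of pairs can be regenerated later, which helps if the synthesis runs are interrupted.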

This may take a long time depending on the setup. Maybe we can run it in parallel, or as Spark jobs? We could even run a couple of jobs on the stat box too.
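If the work is split across a few independent jobs (for example separate processes on the stat box) rather than handed to Spark, which partitions data itself, one simple option is round-robin sharding: job k of N takes every N-th item from the same ordered list of pairs. This only works if every job builds that list identically, e.g. with a fixed random seed. `shard` is a hypothetical helper, not existing project code.

```python
def shard(items, num_shards, shard_index):
    """Hypothetical helper: return the items that job `shard_index`
    (0-based) should process when work is split round-robin across
    `num_shards` independent jobs. Shards are disjoint and together
    cover every item."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    if not 0 <= shard_index < num_shards:
        raise ValueError("shard_index must be in [0, num_shards)")
    return items[shard_index::num_shards]
```

A job started as shard 2 of 4 would call `shard(pairs, 4, 2)` and synthesize only those pairs, with no coordination needed between jobs.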

Change 726903 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] Add audio packages to support DSE hackathon on stat100x

https://gerrit.wikimedia.org/r/726903

Change 726903 merged by Elukey:

[operations/puppet@production] Add audio packages to support DSE hackathon on stat100x

https://gerrit.wikimedia.org/r/726903

Quick Update from Hackathon Day 3:

Hackathon Day 4 Update:

elukey claimed this task.