This task is for tracking and discussing the DSE Hackathon project "Sounds of the Commons: Neural Audio Mashups"
Description: There are a ton of cool and interesting audio files publicly available on Wikimedia Commons (https://commons.wikimedia.org/wiki/Category:Audio_files). First, we'll build a dataset of interesting sounds that fall into a few categories (e.g. music/nature/human/animal/interior/exterior). Next, we'll write a script that randomly combines these audio files and samples the latent space of their combined embeddings to create new machine-generated audio files. Lastly, we'll stitch all of our newly created sounds together into a never-before-heard sound-collage monstrosity and submit it back to the Commons. We'll leverage neural-net architectures like autoencoders and tools like Magenta to encode waveforms and perform resynthesis. If there's time, we can also look at generating visuals to go along with the new sounds using Generative Adversarial Networks (GANs), or try to build a classifier that attempts to categorize the newly generated sounds. Don't worry if these are new topics for you; all experience levels are welcome! There will be a number of micro-tasks that anyone can get involved with.
Comms channel (IRC or Slack?)
- Slack - #dse-hackathon-neural-audio
- IRC - #wikimedia-ml on libera
Code Repo(s):
Let's share any and all code/scripts/etc on https://gitlab.wikimedia.org/
Tasks:
- Create Dataset
- .wav files will work best (preferably shorter files, although we can trim longer ones if needed)
- should we keep track of all the file metadata for reference?
- Exploratory datasets: https://analytics.wikimedia.org/published/datasets/one-off/wav/example/
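Since longer files may need trimming, here is a minimal dependency-free sketch using Python's stdlib `wave` module; the function name and paths are placeholders, not part of any existing script.

```python
import wave

def trim_wav(src_path, dst_path, max_seconds):
    """Copy at most max_seconds of audio from src_path into dst_path."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        max_frames = int(params.framerate * max_seconds)
        frames = src.readframes(min(params.nframes, max_frames))
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is patched on close
        dst.writeframes(frames)
```

This only handles uncompressed PCM .wav files, which is what the wave module supports.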
- Design Pipeline / Workflow
- Rough idea: randomly select two audio files from the dataset -> encode each file -> interpolate the encodings -> synthesize audio from the combined encoding -> save as a new .wav file
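The selection and interpolation steps of the pipeline above can be sketched in a few lines; the model-specific encode/synthesize steps (NSynth's encoder and decoder in our case) are deliberately left out, and `pick_pair`/`interpolate` are hypothetical helper names.

```python
import random

import numpy as np

def pick_pair(files):
    """Randomly select two distinct audio files from the dataset."""
    return random.sample(files, 2)

def interpolate(z_a, z_b, t=0.5):
    """Linear blend of two embedding arrays: t=0 gives z_a, t=1 gives z_b."""
    return (1.0 - t) * z_a + t * z_b
```

A simple linear blend (t=0.5) is the usual starting point; varying t per mashup would add more variety to the collage.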
- Create generation script
- Let's try using Magenta's NSynth model
- Generate audio
- On CPU this seems to take ~6 minutes to generate 1 second of audio. Can we speed this up or run our script in parallel?
- Where should we store all the new audio files?
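Since generation is CPU-bound and each mashup is independent, one option is to launch several generation commands at once. A sketch using a thread pool to supervise child processes (the threads just wait, so the real work runs in parallel); the actual command lines for the NSynth generation step are an assumption and would need to be filled in.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_job(cmd):
    """Run one generation command in its own OS process."""
    return subprocess.run(cmd, check=True).returncode

def run_parallel(cmds, workers=4):
    """Run several CPU-bound generation commands concurrently.

    Each thread only waits on a child process, so the GIL is not a
    bottleneck here.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, cmds))
```

Setting `workers` to the machine's core count is a reasonable default, since each generation job should saturate one core.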
- Sequence new audio files into a "song"
- Can we do this with code? Maybe string together all the sound files using pygame.mixer.music module or a library like pydub?
- Basic sequencer code: https://gitlab.wikimedia.org/accraze/dse-hackathon-nsynth-intro/-/blob/main/sequencer.py
- Update: Some folks have been generating midi sequences with the MusicVAE models packaged in Magenta Studio
- What should we title the song? Idea: use an anagram solver? (i.e. https://www.thewordfinder.com/anagram-solver/)
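Besides pygame or pydub, the stitching can be done with the stdlib `wave` module alone. A minimal sketch, assuming all generated clips share the same sample rate, sample width, and channel count (which should hold if they all come from the same generation script):

```python
import wave

def concat_wavs(paths, out_path):
    """Append .wav clips end to end into a single output file."""
    with wave.open(out_path, "wb") as out:
        first = True
        for path in paths:
            with wave.open(path, "rb") as clip:
                if first:
                    # Copy rate/width/channels from the first clip.
                    out.setparams(clip.getparams())
                    first = False
                out.writeframes(clip.readframes(clip.getnframes()))
```

pydub would still be the better choice if we need crossfades, gain changes, or mixed formats.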
- Upload new audio to commons
- Should we upload all machine-generated audio or just the sequenced "song"?
- Can we use pywikibot?
- [Bonus] Generate visuals to go with new audio
- The lucidsonicdreams library lets you feed audio to a GAN model, which then generates a video file based on the pitches, frequencies, and rhythms in the audio.