This task is for tracking and discussing the DSE Hackathon project "Sounds of the Commons: Neural Audio Mashups"
Description: There are a ton of cool and interesting audio files publicly available on Wikimedia Commons (https://commons.wikimedia.org/wiki/Category:Audio_files). First, we'll build a dataset of interesting sounds that fall into a few categories (e.g. music/nature/human/animal/interior/exterior). Next, we'll write a script that randomly combines these audio files and samples the latent space of their combined embeddings to create new machine-generated audio files. Lastly, we'll stitch all of our newly created sounds together into a never-before-heard sound-collage monstrosity and submit it back to the Commons. We'll leverage neural-net architectures like autoencoders and tools like Magenta to encode waveforms and perform resynthesis. If there's time, we can also look at generating visuals to go along with the new sounds using Generative Adversarial Networks (GANs), or try to build a classifier that attempts to categorize the newly generated sounds. Don't worry if these are new topics for you; all experience levels are welcome! There will be a number of micro-tasks that anyone can get involved with.
Comms channel (IRC or Slack?)
- Slack - #dse-hackathon-neural-audio
- IRC - #wikimedia-ml on libera
Code Repo(s):
Let's share any and all code/scripts/etc on https://gitlab.wikimedia.org/
Tasks:
- Create Dataset
- .wav files will work best (preferably shorter files, although we can trim longer ones if needed)
- should we keep track of all the file metadata for reference?
- Exploratory datasets: https://analytics.wikimedia.org/published/datasets/one-off/wav/example/
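Since longer files may need trimming, here is a minimal dependency-free sketch using Python's stdlib `wave` module; the function name and paths are placeholders, not part of any existing script.

```python
import wave

def trim_wav(src_path, dst_path, max_seconds):
    """Copy at most max_seconds of audio from src_path into dst_path."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        max_frames = int(params.framerate * max_seconds)
        frames = src.readframes(min(params.nframes, max_frames))
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is patched on close
        dst.writeframes(frames)
```

This only handles uncompressed PCM .wav files, which is what the wave module supports.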
- Design Pipeline / Workflow
- Rough idea: randomly select two audio files from the dataset -> encode each file -> interpolate the encodings -> synthesize audio from the combined encoding -> save as a new .wav file
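The selection and interpolation steps of the pipeline above can be sketched in a few lines; the model-specific encode/synthesize steps (NSynth's encoder and decoder in our case) are deliberately left out, and `pick_pair`/`interpolate` are hypothetical helper names.

```python
import random

import numpy as np

def pick_pair(files):
    """Randomly select two distinct audio files from the dataset."""
    return random.sample(files, 2)

def interpolate(z_a, z_b, t=0.5):
    """Linear blend of two embedding arrays: t=0 gives z_a, t=1 gives z_b."""
    return (1.0 - t) * z_a + t * z_b
```

A simple linear blend (t=0.5) is the usual starting point; varying t per mashup would add more variety to the collage.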
- Create generation script
- Let's try using Magenta's NSynth model
- Generate audio
- On CPU this seems to take ~6 minutes to generate 1 second of audio. Can we speed this up or run our script in parallel?
- Where should we store all the new audio files?
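Since generation is CPU-bound and each mashup is independent, one option is to launch several generation commands at once. A sketch using a thread pool to supervise child processes (the threads just wait, so the real work runs in parallel); the actual command lines for the NSynth generation step are an assumption and would need to be filled in.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_job(cmd):
    """Run one generation command in its own OS process."""
    return subprocess.run(cmd, check=True).returncode

def run_parallel(cmds, workers=4):
    """Run several CPU-bound generation commands concurrently.

    Each thread only waits on a child process, so the GIL is not a
    bottleneck here.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_job, cmds))
```

Setting `workers` to the machine's core count is a reasonable default, since each generation job should saturate one core.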
- Sequence new audio files into a "song"
- Can we do this with code? Maybe string together all the sound files using pygame.mixer.music module or a library like pydub?
- Basic sequencer code: https://gitlab.wikimedia.org/accraze/dse-hackathon-nsynth-intro/-/blob/main/sequencer.py
- Update: Some folks have been generating midi sequences with the MusicVAE models packaged in Magenta Studio
- What should we title the song? Idea: use an anagram solver? (i.e. https://www.thewordfinder.com/anagram-solver/)
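Besides pygame or pydub, the stitching can be done with the stdlib `wave` module alone. A minimal sketch, assuming all generated clips share the same sample rate, sample width, and channel count (which should hold if they all come from the same generation script):

```python
import wave

def concat_wavs(paths, out_path):
    """Append .wav clips end to end into a single output file."""
    with wave.open(out_path, "wb") as out:
        first = True
        for path in paths:
            with wave.open(path, "rb") as clip:
                if first:
                    # Copy rate/width/channels from the first clip.
                    out.setparams(clip.getparams())
                    first = False
                out.writeframes(clip.readframes(clip.getnframes()))
```

pydub would still be the better choice if we need crossfades, gain changes, or mixed formats.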
- Upload new audio to commons
- Should we upload all machine-generated audio or just the sequenced "song"?
- Can we use pywikibot?
- [Bonus] Generate visuals to go with new audio
- The lucidsonicdreams library lets you feed audio to a GAN model, which then generates a video file based on the pitches, frequencies, and rhythms in the audio.