I'm interested in playing with autoencoders.
"Write a script that will randomly combine these audio files and sample the latent spaces of their combined embeddings to create new machine-generated audio files."
Does this mean we train the autoencoder on the dataset we curated from Commons and then have it generate a sample audio file from random numbers? Maybe I'm a bit confused about what 'randomly combining' audio files means here.
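To make the question concrete, here is a minimal sketch of one reading I can imagine: pick two clips at random, encode both, blend their latent vectors with a random weight, and decode the blend into a new clip. Everything in it is an assumption on my part and not the task's actual method. The clips are synthetic sine tones rather than the Commons files, PCA stands in for a trained autoencoder (it behaves like a linear one), and the names `make_toy_clips`, `LinearAutoencoder`, and `random_latent_mix` are hypothetical.

```python
import numpy as np


def make_toy_clips(n_clips=8, n_samples=256, seed=0):
    """Stand-in for the curated audio files: short sine tones of random pitch."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_samples, endpoint=False)
    freqs = rng.uniform(2.0, 20.0, size=n_clips)
    return np.sin(2 * np.pi * freqs[:, None] * t[None, :])


class LinearAutoencoder:
    """PCA used as a linear autoencoder (assumption: replaces a neural model)."""

    def __init__(self, latent_dim):
        self.latent_dim = latent_dim

    def fit(self, clips):
        # "Training": keep the top principal directions of the centered data.
        self.mean_ = clips.mean(axis=0)
        _, _, vt = np.linalg.svd(clips - self.mean_, full_matrices=False)
        self.components_ = vt[: self.latent_dim]
        return self

    def encode(self, clips):
        # Project each clip onto the latent directions.
        return (clips - self.mean_) @ self.components_.T

    def decode(self, latents):
        # Map latent vectors back to waveform space.
        return latents @ self.components_ + self.mean_


def random_latent_mix(model, clips, rng):
    """Blend the embeddings of two randomly chosen clips and decode the result."""
    i, j = rng.choice(len(clips), size=2, replace=False)
    alpha = rng.uniform(0.0, 1.0)  # random mixing weight
    z = model.encode(clips[[i, j]])
    z_mix = alpha * z[0] + (1.0 - alpha) * z[1]
    return model.decode(z_mix[None, :])[0]


clips = make_toy_clips()
model = LinearAutoencoder(latent_dim=4).fit(clips)
new_clip = random_latent_mix(model, clips, np.random.default_rng(1))
print(new_clip.shape)
```

Under this reading the randomness is in which clips get paired and how their embeddings are weighted. The other reading, decoding a latent vector drawn purely from random numbers, would skip the pairing step and pass a random vector straight to `decode`.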