Neural audio: tickling a RAVE model with sig~ and noise~ in Pure Data using nn~ (Martsman dataset) 

MARTSM_N /// martsman

In this video, I'm showcasing another dynamic control and audio setup in Pure Data, where I feed various kinds of signal data to the decoder unit of a RAVE model: constant values via the sig~ object and repetitive seed triggers into the noise~ object. The decoder interprets these signals as latent encodings and converts them into audio.
This model was trained on a large selection of my own release material, augmented into a dataset several days long.
Realtime processing is done via the nn~ object.
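As a rough illustration of this signal flow, here is a plain-Python sketch (not a Pd patch) of the kind of latent stream such a decoder receives: one dimension held constant like [sig~], one fed a reproducible noise burst like a re-seeded [noise~]. The latent size and block length are illustrative assumptions, not properties of the actual model.

```python
# Hypothetical sketch: the kind of latent block an nn~-hosted RAVE decoder
# might receive. LATENT_DIMS and BLOCK are illustrative assumptions.
import random

LATENT_DIMS = 8      # latent dimension count (model-dependent)
BLOCK = 16           # latent frames per control block (illustrative)

def constant_latent(value, frames=BLOCK):
    """Like [sig~ value]: a stream of identical latent values."""
    return [value] * frames

def seeded_noise(seed, amplitude=1.0, frames=BLOCK):
    """Like re-triggering a seeded noise source: the same seed
    reproduces the same burst every time."""
    rng = random.Random(seed)
    return [rng.uniform(-amplitude, amplitude) for _ in range(frames)]

# One latent frame block: dim 0 held constant, dim 1 fed seeded noise,
# remaining dims zeroed. The decoder would turn this into audio.
latent_block = [constant_latent(-2.5)] + [seeded_noise(42)] + \
               [[0.0] * BLOCK for _ in range(LATENT_DIMS - 2)]
```

The key point the sketch captures is reproducibility: re-sending the same seed yields the same latent burst, which is what makes the repetitive triggers in the patch musically consistent.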
The track "Architect" from my "Spoor" release was created using this setup. martsman.bandc...
RAVE is "A variational autoencoder for fast and high-quality neural audio synthesis", created by Antoine Caillon and Philippe Esling of the Artificial Creative Intelligence and Data Science (ACIDS) group at IRCAM, Paris.
RAVE on GitHub: github.com/aci...
nn~ on GitHub: github.com/aci...
To train RAVE models on Colab or Kaggle, you can use these Jupyter notebooks I've set up: github.com/dev...

Published: 18 Sep 2024

Comments: 11
@dishop · 7 months ago
Cool!
@tsdoihasdoihasdoih2493 · 11 months ago
v1 config proves to be killer quality once again
@martsm_n · 10 months ago
Yeah, they are definitely robust. I'd very much like to see/hear people's experiments with V2 and V3, however. There certainly must be advantages that simply do not fit my use case/practices. @antoinecaillon2531 maybe?
@apheadair · 11 months ago
Stunning! How do you find the best value range to manipulate the latent space? I see that in this case, you are between -5 and +3. Is it just by trying or is there a rule of thumb?
@martsm_n · 11 months ago
The exact values and constellations differ from model to model. In V1 models, the first 3-4 dimensions seem to be the most sensitive, and values in the +/- single-digit range lead to nice results, whereas in the higher dimensions you usually need to go into the two-digit range to notice differences in the timbre. V2 and V3 models behave differently, most probably due to the regularization evening things out between the dimensions: there I usually have a limited range of +/- 2 that I can sweep through without bleeding ears.
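That rule of thumb can be sketched as a small helper. The ranges below are assumptions lifted directly from the comment (single-digit for the first few dimensions of a V1 model, two-digit higher up, roughly +/- 2 for V2/V3), not constants of any actual RAVE export.

```python
# Hypothetical value ranges for sweeping RAVE latent dimensions,
# following the rule of thumb in the comment above. Not model constants.
def suggested_range(dim, model_version=1):
    """Return an (lo, hi) range worth exploring for a given latent dim."""
    if model_version >= 2:
        return (-2.0, 2.0)          # regularized models: narrower, more uniform
    return (-5.0, 5.0) if dim < 4 else (-20.0, 20.0)

def sweep(dim, steps=5, model_version=1):
    """Evenly spaced test values across the suggested range for one dim."""
    lo, hi = suggested_range(dim, model_version)
    return [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
```

For example, `sweep(0)` steps a sensitive low dimension through -5.0 to 5.0, while `sweep(6)` covers the wider two-digit range a higher dimension typically needs before the timbre changes audibly.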
@beatsbykabuki · 11 months ago
I could listen to this for hours! I noticed you use an augmentation script in your Kaggle notebook, but not in the Google Colab notebook. Is there a specific reason for this?
@martsm_n · 11 months ago
Thank you :) No particular reason to leave that script out of the Colab notebook, no. If you want to add it to your trainings, just make sure there's enough disk space on your Google Drive, or use the runtime's disk space to store the augmented dataset before preprocessing. The script can blow up your dataset quite a bit.
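A quick back-of-the-envelope check before running such a script can save a failed training run. The growth model below is a hypothetical simplification (one extra copy per augmentation variant); the actual multiplier depends on the script in the notebooks.

```python
# Hypothetical disk-space estimate for an augmented dataset.
# Assumes each variant writes one full-size copy alongside the original;
# check the actual augmentation script for its real multiplier.
def augmented_size_gb(original_gb, variants_per_file):
    """Original dataset plus one copy per augmentation variant."""
    return original_gb * (1 + variants_per_file)
```

So a 2 GB source dataset with four variants per file would need roughly 10 GB of free space on Google Drive or the runtime disk before preprocessing starts.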
@Karkwaa · 11 months ago
Wonderful work, thanks a lot! What is the role of your 'sequencex' sub patch, and specifically the 'giver' inside it?
@martsm_n · 11 months ago
Thanks for your feedback! These sub patches are responsible for generating a rhythmic/repetitive structure that triggers other events in the patch. Initially I used the metro object in earlier experiments but noticed irregularities in timing. The giver sub patch creates sample-based sequences using bang~ and has been more reliable.
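One plausible reason for the difference, sketched below: Pd's [metro] ticks in milliseconds at control rate, so its events effectively land on audio block boundaries (64 samples by default), while counting samples directly always hits the exact position. The numbers here (44.1 kHz, 64-sample blocks, 125 ms interval) are illustrative assumptions, not measurements of the actual patch.

```python
# Illustrative comparison: millisecond/control-rate scheduling vs.
# sample-accurate counting. SR and BLOCK_SAMPLES are assumptions.
SR = 44100
BLOCK_SAMPLES = 64   # Pd's default audio block size

def metro_positions(interval_ms, n):
    """[metro]-style: each event snaps to the nearest block boundary,
    so fractional-sample intervals jitter around the exact grid."""
    out = []
    for i in range(n):
        exact = i * interval_ms * SR / 1000.0
        out.append(round(exact / BLOCK_SAMPLES) * BLOCK_SAMPLES)
    return out

def sample_positions(interval_samples, n):
    """bang~-style sample counting: exact positions, zero drift."""
    return [i * interval_samples for i in range(n)]
```

At 125 ms the exact interval is 5512.5 samples, so the block-snapped event lands at sample 5504 instead, while the sample counter stays on its own grid exactly, which matches the comment's experience of sample-based sequencing being tighter.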
@ShihWeiChieh · 7 months ago
Why is Max so much more CPU-draining than Pd...?
@martsm_n · 7 months ago
Can't answer that, sorry 🤔