Guitar Lessons by Chip McDonald - chip@chipmcdonald.com

Friday, February 8, 2019

Prediction: there will be an A.I./GAN (Generative Adversarial Network) Learning VST Plugin That Will Revolutionize Audio Mixing by 2020

This could happen this year, but certainly within 3 years I would think.

A VST plug-in in which you provide a target sound - an example of what you want, basically as many conformal-equalization/convolution plugins do now, but...

... it uses adversarial machine learning, post-processing a sample of your novel sound.

  The tricky part, I think, would be eliminating pitch from the process. You don't want the plugin trying to pitch-correct your guitar input to the target sample's pitch. Another aspect that might be difficult would be integrating a time constant, so that it doesn't just do an FFT/convolution/bin-based transform.
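
One way to picture pulling pitch out of the process (my sketch, not something from any real product): train and evaluate the GAN on a pitch-invariant representation such as a cepstrally smoothed spectral envelope, so the harmonic ripple that encodes pitch is never in the data at all. The frame handling and the lifter_cutoff value below are arbitrary assumptions.

```python
import numpy as np

def spectral_envelope(frame, lifter_cutoff=30):
    """Pitch-invariant 'color' of one audio frame.

    Cepstral smoothing keeps the slowly-varying part of the log
    spectrum (timbre, pickup/speaker character) and discards the
    fast harmonic ripple that encodes pitch, so a GAN trained on
    this has nothing it could try to pitch-correct.
    """
    windowed = frame * np.hanning(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed)) + 1e-9)
    cepstrum = np.fft.irfft(log_mag)
    cepstrum[lifter_cutoff:-lifter_cutoff] = 0.0  # low-pass "lifter"
    return np.fft.rfft(cepstrum).real             # smoothed log spectrum
```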

  There are already plugins that claim to have neural-net based algorithms involved in evaluative processing. This is not the same as what I am suggesting, in that those plugins apply existing IR/filter tools to alter the sound, as opposed to directly replicating the sound from scratch. In other words, GANs are already used to make an input - a picture of someone - appear to be someone "new", modified by a GAN that has been trained on a data set.

The GAN doesn't know it's changing things we have labels for: colors, shading, angles, etc. It's just making the data fit what we want it to. In the same way, you'd feed your GAN plugin an example of guitar sounds you like, and it would morph your guitar sound by making an output data set fit your expectation data set.
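
To make the face-morphing analogy concrete, here's a heavily simplified sketch of the adversarial loop such a plugin might run, written in PyTorch. The frame size, network shapes, and the idea of working on spectral frames at all are illustrative assumptions on my part, not a description of any shipping product.

```python
import torch
import torch.nn as nn

FRAME = 2048  # size of one spectral frame (assumption)

# Generator: morphs a frame of *your* guitar toward the target style.
G = nn.Sequential(nn.Linear(FRAME, 1024), nn.ReLU(),
                  nn.Linear(1024, FRAME))

# Discriminator: the "expectation data set" judge - it only ever sees
# data, never labels like "EQ" or "compression".
D = nn.Sequential(nn.Linear(FRAME, 512), nn.ReLU(),
                  nn.Linear(512, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(my_frames, target_frames):
    """One adversarial step: D learns to tell real target frames from
    morphed ones, and G learns to fool D - no hand-made labels anywhere."""
    fake = G(my_frames)

    # --- discriminator update ---
    opt_d.zero_grad()
    loss_d = (bce(D(target_frames), torch.ones(len(target_frames), 1)) +
              bce(D(fake.detach()), torch.zeros(len(my_frames), 1)))
    loss_d.backward()
    opt_d.step()

    # --- generator update ---
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(len(my_frames), 1))
    loss_g.backward()
    opt_g.step()
```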

This might work really well applied to speaker simulation, since present convolution-based plugins only apply their math linearly, with a single fixed impulse response regardless of input level. A GAN applied to an example data set covering a range of dynamic levels into a speaker (the equivalent of a bright face versus a dark one, high eyebrows or low, etc.) would be able to learn a new function (applied to a DI guitar signal) that alters the data in a similar non-linear way across the whole input range.
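
A crude way to see what "non-linear across the input range" buys you (again my sketch, with hypothetical IRs): imagine impulse responses captured from the same speaker at a quiet and a loud drive level, blended per-frame by the input level. A GAN would learn this kind of level-dependent mapping implicitly, instead of interpolating hand-captured IRs.

```python
import numpy as np

def level_dependent_convolve(x, ir_quiet, ir_loud, frame=220):
    """Blend two same-length speaker IRs (hypothetical captures at low
    and high drive) by the RMS level of each frame - something a static
    single-IR convolution cannot do."""
    out = np.zeros(len(x) + len(ir_quiet) - 1)
    for start in range(0, len(x), frame):
        chunk = x[start:start + frame]
        level = min(np.sqrt(np.mean(chunk ** 2)) * 4.0, 1.0)  # crude 0..1 drive (arbitrary scale)
        ir = (1.0 - level) * ir_quiet + level * ir_loud       # interpolate between captures
        y = np.convolve(chunk, ir)
        out[start:start + len(y)] += y                        # overlap-add the tails
    return out
```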

 It wouldn't be real time at first, since you'd be applying the process to single buffered time frames - 5 ms chunks overlapping by 1 ms, maybe - across a 3 minute input file. So for each buffered frame you'd apply the GAN function at that frame's input level for the 5 ms (which I think means you'd have to train the GAN on a similar matrix derived from the same time base, 5 ms frames at 44.1 kHz). Repeat until EOF.
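
As a rough sketch of that offline pass (the 5 ms window and 1 ms overlap are the numbers above; process_frame is a stand-in for the trained GAN):

```python
import numpy as np

SR = 44100
FRAME = int(0.005 * SR)  # 5 ms analysis window
HOP = int(0.001 * SR)    # 1 ms step between overlapping frames

def render(x, process_frame):
    """Offline (non-real-time) pass: slide a 5 ms window over the whole
    file, run the GAN-derived transform on each frame, overlap-add the
    results, and normalize by the summed window energy. Tail handling
    past the last full frame is omitted for brevity."""
    window = np.hanning(FRAME)
    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    for start in range(0, len(x) - FRAME, HOP):  # repeat until EOF
        frame = x[start:start + FRAME] * window
        y = process_frame(frame) * window        # synthesis window for smooth joins
        out[start:start + FRAME] += y
        norm[start:start + FRAME] += window ** 2
    return out / np.maximum(norm, 1e-9)
```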

 I think that such a plugin could be used for "finalizing" guitar sounds and for mastering, but perhaps even for mixing, provided your example target has instrumentation similar to what you're giving it as an input to transform.

 It would be revolutionary, because if "trained" properly it would probably make a bedroom recording sound deceptively close, or identical, to Whatever Established Professional Recording one wanted. Or at least one could create a mastering spectral curve/harmonic balance that matched an input data set, which would either create weird artifacts on instrument sounds (in order to force the match) or, if the input set was close enough, bring it to the Uncanny Valley and perhaps make it sound strange in that respect as well. Which would be interesting, and would unfortunately probably attract attention for a few years as producers abuse the sound for its novelty.

 Or, it could simply work very well and "fix" whatever you record to sound as much as possible like whatever else you wanted.
