How iZotope uses machine learning to build cutting-edge audio production tools
Originally published on GPU Audio Medium
Two months ago, GPU Audio got the chance to interview Jean-Marc Jot, former iZotope VP of Research and Chief Scientist. Jean-Marc's role at the company included supervising the audio algorithm research team, which has worked on cutting-edge technology incorporated into iZotope products.

iZotope has developed some of the world's most innovative and celebrated audio production software tools, including the audio repair tool RX and the mastering suite Ozone. "We support the relatively near-term development of iZotope products," he began. "For my team, that part of the activity has been driven by the product roadmap — that [takes] about half of our time. The other half of our attention is [focused on] inventing, anticipating, and exploring future opportunities usually in continuation of what we've already shipped. [This involves] developing prototypes to enable other teams in the company to evaluate how successful those ideas might be."
iZotope differentiated itself early on with best-in-class uses of machine learning, primarily in its extremely popular software RX, an industry standard for audio correction of almost any kind. "Machine learning has been essential in a few industries closely related to our field such as image processing, recognition, speech processing and synthesis — but in audio and music, that's a more recent development so it still seems somewhat new and fresh," Jot said. While training ML models on music and audio data was less common, iZotope was one of the first companies to take the risk of investing in the field. "It was new and uncommon to leverage machine learning for the kind of products that the company made, but it was very successful: for example, some of the most innovative features of RX, which is one of the company's flagship products, leverage machine learning," he explained.

At iZotope, the primary application, as Jean-Marc mentioned, has been blind source separation. "Take, for instance, a recording that contained undesired interference — that could be background noise or it could be somebody shouting in the background unexpectedly — something that is annoying coming into your recording," he continued. With machine learning, you can train an algorithm to automatically classify different sounds so that it's then able to attenuate them.
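The classify-and-attenuate workflow Jot describes is commonly realized as time-frequency masking: a model decides, per spectrogram bin, how much of the energy belongs to the unwanted sound and scales it down. A minimal sketch, assuming magnitude spectrograms as NumPy arrays and using a given noise estimate in place of a trained network's prediction (the function name and inputs are illustrative, not iZotope's API):

```python
import numpy as np

def separate(mixture_mag, noise_estimate):
    """Attenuate time-frequency bins dominated by unwanted sound.

    `noise_estimate` is a per-bin magnitude estimate of the interference;
    in an ML system, a trained network would predict this (or the mask
    directly) from the mixture. Here it is passed in so the sketch stays
    self-contained.
    """
    # Soft ratio mask: near 1 where the target dominates the bin,
    # near 0 where the interference dominates.
    mask = mixture_mag / (mixture_mag + noise_estimate + 1e-8)
    return mixture_mag * mask
```

The mask is multiplied bin-by-bin into the mixture, so bins the model attributes mostly to interference are suppressed while the rest of the recording passes through largely untouched.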

Initially, the applications were mostly done using supervised learning, a method that requires labeling the data to obtain the desired results. Essentially, it's a subcategory of machine learning where there are both inputs and correct outputs to help train the algorithm. More recently, the team has used other methods too, including unsupervised learning and Generative Adversarial Networks (GAN). "Multiple neural networks fight each other to try to obtain a solution, [which] is better than what one could obtain alone," Jean-Marc said. "Supervised learning continues to be relevant; if you can't generate data that is exhaustively tagged and labeled, you're going to start resorting to unsupervised learning methods."
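Supervised learning, as described above, means training on pairs of inputs and correct outputs: for instance, per-frame audio features labeled "voice" or "noise". A toy sketch with made-up feature vectors and a nearest-centroid "model" standing in for a neural network (everything here is illustrative, not iZotope's actual method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labeled training set: 4-D feature vectors per audio frame,
# tagged 1 for "voice" and 0 for "background noise".
voice = rng.normal(loc=2.0, scale=0.5, size=(100, 4))
noise = rng.normal(loc=-1.0, scale=0.5, size=(100, 4))
X = np.vstack([voice, noise])
y = np.array([1] * 100 + [0] * 100)

# "Training": one centroid per labeled class. A real system would fit a
# neural network on far richer features, but the supervised principle
# (labels steer the model) is the same.
centroids = {c: X[y == c].mean(axis=0) for c in (0, 1)}

def classify(frame):
    # Assign a frame to the class whose centroid is nearest.
    return min(centroids, key=lambda c: np.linalg.norm(frame - centroids[c]))
```

When exhaustive labels like `y` above are unavailable, this is exactly where, as Jot notes, unsupervised or adversarial methods become the fallback.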

Even though machine learning opens up a plethora of opportunities, Jot conveyed that the company's research team often explores solving new problems without utilizing ML in the initial stages of algorithm invention. "You can't really solve a problem with machine learning unless you're able to feed data to train the machine learning model," he elucidated.

For SoundWide — the parent company that includes iZotope, Native Instruments, and Plugin Alliance — the needs of its community stand at the forefront. "The needs that we saw were to allow content creators to create with more ease [and] lower the barrier to entry for a growing user market," Jean-Marc said. "Any technique that we could think of that would make it easier and quicker to get to the creative state of flow is valuable in our product roadmap."