GPU Audio. The future of the audio industry lies in graphics cards.
ANOUK DYUSSEMBAYEVA | JULY 30, 2022
Originally published on GPU Audio Medium
The idea of using GPUs for audio has been in plain sight for a long time, and while many companies tried to approach it, they never quite achieved the results they intended. "[That happened] for fundamental reasons; computer science, math, and so on," says Alexander Talashov, GPU Audio co-founder and CEO, adding that everyone from Acustica Audio and Liquid Sonics to Nils Schneider made attempts at developing the tech. GPU Audio is the first company to develop a full-stack solution for GPU-based audio processing, solving the core challenges that previously made it impossible to apply graphics cards in the audio industry. That involved everything from building a scheduler to devising its own ways to operate at low latencies.

But like so many areas of scientific innovation, it took a fresh perspective to see things differently enough to solve these problems. Talashov, who goes by Sasha, wasn't new to graphics processing units, having programmed them for high-frequency stock trading, where enormous amounts of data have to be crunched. At the time, Vasiliy Sumatokhin, who had been working as a sound engineer for 20 years, had the initial idea of plugins that would offload processing to the GPU. The two were introduced by a mutual friend in 2011.
The beginning
"We didn't start immediately — we knew we needed to do something to unlock GPUs for pro audio," the CEO recounts. While high-frequency trading is low latent, it works with big chunks of data, which cannot be said for sound: when you do a recording at 1ms latency at 96 kHz sample rate, you work with 96 samples only. This is a tiny amount of data in comparison to stock trading. Founding the company in 2012, it almost became bankrupt. "We tried not with plug-ins but bringing the DAW itself," Sasha explains. The startup itself was viable, but its first and only investor got into financial troubles, leaving Sasha and his colleagues without a cash flow.

The team started from scratch in 2014, this time focusing on an engine that would run plugins, and began seeing significant results in 2018. "It was three years of research and development because we weren't just programming something," the co-founder says. In fact, to fully turn the concept into reality, they had to reinvent the math: describe the traditional algorithms in a form that could be migrated to the GPU architecture and still run fast.
Reinventing the math
As the CEO shares, some 95% of the math used to describe the algorithms behind pro audio products is based on classical digital signal processing. DSP algorithms are usually taught as strictly sequential, which makes parallel programming on graphics cards a very non-standard fit. Filters are a good example: traditionally they are built as IIR (infinite impulse response) filters, which are easy to implement, high in quality, and computationally cheap. But filter banks demand more power the more filters you add, and IIRs are a poor fit for that, because everything has to be processed sequentially: an IIR design contains feedback, since its response is infinite, and because each output sample depends on the previous one, the code cannot be parallelized.
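A minimal sketch makes that dependency visible; the one-pole low-pass below is a textbook IIR, not GPU Audio's code, and the coefficient names are hypothetical:

```cuda
// One-pole IIR low-pass: y[n] = b*x[n] + a*y[n-1].
// The recurrence on y[n-1] is the feedback the article describes: sample n
// cannot be computed until sample n-1 is done, so the loop is inherently serial.
void one_pole_lowpass(const float* x, float* y, int n, float b, float a) {
    float prev = 0.0f;                 // y[-1], the filter's state
    for (int i = 0; i < n; ++i) {
        prev = b * x[i] + a * prev;    // each step depends on the previous one
        y[i] = prev;
    }
}
```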

There are, however, filter designs and algorithms that map well to GPUs and can provide the same impulse response: they exhibit the same input-to-output behavior as an IIR filter, yet can be described by a formula that computes in parallel. "We found those designs and implemented our own designs of pretty much anything that can happen in pro audio, algorithmically speaking," Sasha says. "We also found other implementations and tuned them for our own purposes."
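The article doesn't disclose which designs those are, but the classic example of a parallel-friendly structure is the FIR (finite impulse response) filter: with no feedback, every output sample is independent, so one GPU thread can compute each one. A sketch, assuming simple zero-padded input:

```cuda
// FIR convolution: y[n] = sum over k of h[k] * x[n-k]. Unlike an IIR there is
// no feedback, so each output sample is independent of the others and can be
// assigned to its own GPU thread. Illustrative, not GPU Audio's implementation.
__global__ void fir_kernel(const float* x, const float* h,
                           float* y, int n, int taps) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float acc = 0.0f;
    for (int k = 0; k < taps; ++k) {
        int j = i - k;
        if (j >= 0) acc += h[k] * x[j];   // zero-padded history before sample 0
    }
    y[i] = acc;
}
```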

As an example, the founder describes a delay plugin, which is made of certain DSP components and algorithms. When you run a delay line on the CPU, you run it with one thread. Here, the team ran the delay line across multiple GPU cores, and even accounting for the need to transfer the data to the GPU and back, it ran as fast as on the CPU in real time. "In real time, we can run things as fast as on CPU mitigating the latency issue, but because the GPU has thousands of cores, we can run more plugins," Sasha elaborates.
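A sketch of what running "more plugins" can look like in practice: one thread block per channel and one thread per sample, so many delay lines are processed at once. The buffer layout and history handling are assumptions for illustration, not GPU Audio's code:

```cuda
// Many delay lines at once. Launch as delay_kernel<<<numChannels, buf_len>>>.
// `history` holds the last `delay` samples of each channel from the previous
// buffer, oldest first; a real plugin would maintain this ring state itself.
__global__ void delay_kernel(const float* in, const float* history,
                             float* out, int buf_len, int delay, float mix) {
    int ch = blockIdx.x;                     // channel index
    int i  = threadIdx.x;                    // sample index within the buffer
    if (i >= buf_len) return;
    const float* x = in      + ch * buf_len; // this channel's input
    const float* h = history + ch * delay;   // this channel's past samples
    int j = i - delay;
    float delayed = (j >= 0) ? x[j] : h[delay + j];
    out[ch * buf_len + i] = x[i] + mix * delayed;
}
```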
Building the program
A factor of equal importance was creating a program that would do it all. Conventionally, there are two parts: the DAW (Ableton Live, Reaper) and the plugins it hosts. For plugins to work in a DAW, there are pro audio standards: Steinberg's VST3 API, the Audio Unit SDK commonly used on Mac, and Avid's AAX standard. When you create a plug-in that connects to the DAW, you usually implement a VST3 interface or an Audio Unit interface, or use a framework like JUCE to make it work.

For GPU Audio, it works a lot differently. While the company delivers the same user experience and feel, the mechanism behind the plugin differs, relying on an entire stack of technology that hadn't existed in the industry before. A standard VST3 plug-in is built from three things: an editor that renders the plugin's GUI, a processor that does the actual audio processing (which turned out to be the trickiest part), and a controller that stores the state of the plug-in and is used by the GUI to store and recall the user-selected parameters.
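Reduced to bare interfaces, those three roles look roughly like this; these are simplified illustrations of the split, not the actual Steinberg SDK classes or signatures:

```cuda
// The three roles of a VST3-style plug-in, sketched as plain interfaces.
struct Editor {                 // draws the GUI and forwards user gestures
    virtual void render() = 0;
};
struct Controller {             // stores and recalls user-selected parameters
    virtual void  setParam(int id, float value) = 0;
    virtual float getParam(int id) const = 0;
};
struct Processor {              // does the actual audio processing
    // For a GPU-backed plug-in, this is where buffers would be handed to
    // the GPU engine instead of being processed on the CPU audio thread.
    virtual void process(const float* in, float* out, int numSamples) = 0;
};
```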

"The solution to implementing this was very unique and solved a bunch of historic challenges that were related to the industry and GPU audio segmentation," the CEO says. For example, when implementing your custom GUI, you have to think about how to launch and render the graphic user interface without the graphics card that's involved for GPU audio processing. In a computer, there are two GPUs — one embedded in the processor (Intel GPU) usually used for offloading the graphics and a more powerful one like the NVIDIA discrete card.

To render their own plugin GUI, the team implemented the VST3 editor as an embedded Chromium browser. This way, they were able to use HTML and JavaScript to lay out and render elements. "This is one of the key points because you can run Chrome with a certain GPU device to render the GUI and tell it to use the Intel GPU. You can use Chromium to render things on the embedded GPU as well as run the web code," Sasha continues. Approaching the problem this way lets you run the graphical user interface almost anywhere and keep the plug-in state somewhere else, which means that in the near future you could run the controls on your laptop while the real processing happens elsewhere.
Introducing the scheduler
At the same time, if you were to run two GPU FIR convolvers, which are two independent plugins, they wouldn't 'know' about each other, since each one connects to a GPU device and sends its data there on its own. Embedded in the GPU Audio engine is a scheduler that collects all of that data from those clients (the plugins) and then figures out how to dispatch the work so the results come back in time. "The scheduler manages all of the priorities of dispatching those plugins — it actually runs them simultaneously within tiny latencies super effectively," Talashov adds. Of course, the GPU-based plugins are specifically developed to be dispatched efficiently on the device, and the company is rolling out a plethora of plugins that can all be run on the scheduler as a sequence of effects.
When you run a game, it uses the graphics card exclusively for that one task. A song is entirely different: you might have 30 tracks, each with a left and a right channel. Add a sequence of GPU-powered effects and you get 60–100 channels, each with a chain of processing that has to run in order. "What the scheduler — an operating system for the audio on the GPU — does is it allows you to run many independent chains of tasks, effects, and processing of the GPU on the device itself, dispatch[es] them in a very complex behavior within tiny latencies, and returns them back," the CEO explains.
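GPU Audio's scheduler is proprietary, but the baseline idea can be sketched with CUDA streams: effects within one chain stay ordered on the same stream, while independent chains are free to overlap on the device. Everything here, the Chain struct and the launcher signature, is a hypothetical simplification:

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical descriptor of one track's effect chain.
struct Chain {
    std::vector<void (*)(float*, int, cudaStream_t)> effects; // kernel launchers
    float* buffer;    // device buffer holding this chain's audio
    int    samples;
};

void dispatch(std::vector<Chain>& chains) {
    std::vector<cudaStream_t> streams(chains.size());
    for (size_t c = 0; c < chains.size(); ++c) {
        cudaStreamCreate(&streams[c]);
        // Within one chain the effects stay in order (same stream);
        // across chains the GPU may run work concurrently.
        for (auto launch : chains[c].effects)
            launch(chains[c].buffer, chains[c].samples, streams[c]);
    }
    for (auto& s : streams) {             // wait for every chain to finish
        cudaStreamSynchronize(s);
        cudaStreamDestroy(s);
    }
}
```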
Building new ways to operate at low latencies
When talking about the industry's readiness for the innovation, Sasha admits that he encountered a bias coming from academia. Some stated that bringing graphics cards to pro audio was "meaningless", and many more were skeptical. The overall spirit was understandable: historically, GPUs were high-latency devices that couldn't react to tiny amounts of data quickly enough.

Back then, they were designed around the standard display refresh cycle: 16.6 milliseconds (ms) per frame, matching the 60 FPS refresh rate of monitors, which made offloading work from the CPU practically unreasonable. With modern PCI Express (Peripheral Component Interconnect Express, or PCIe), the startup was able to work with latencies lower than 150 microseconds for the round trip: data transfer both ways plus minimal DSP algorithm processing.

With PCIe replacing older bus standards and making it possible to pack and transfer data to the graphics card and back within a 1 ms latency window, GPU Audio found new ways to use it for low-latency applications. "For 1ms latency, I can pack 64 megabits; for PCIe Gen 4.0 and 5.0, it is even more than that," Sasha explains. "We didn't solve the issue — the industry solved that." At the same time, the modern SDKs for programming GPUs offered no clear way to work at such tiny latencies, so the team developed its own methods to turn graphics cards into low-latency devices.
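To see how comfortably audio fits in that budget, here is the arithmetic as a sketch; the 64-megabit figure comes from the quote above, while the 64-channel session shape is an illustrative assumption:

```cuda
#include <cstdio>

// Back-of-the-envelope latency budget using the article's figures.
int main() {
    const double sample_rate = 96000.0;                // 96 kHz
    const double window_s    = 0.001;                  // 1 ms latency window
    const int    channels    = 64;                     // immersive-mix scale
    const double samples     = sample_rate * window_s; // 96 per channel
    const double payload_mb  = samples * channels * sizeof(float) * 8 / 1e6;
    const double budget_mb   = 64.0;  // "For 1 ms latency, I can pack 64 megabits"
    printf("payload: %.3f Mbit vs budget: %.0f Mbit per 1 ms window\n",
           payload_mb, budget_mb);    // ~0.197 Mbit needed: ample headroom
    return 0;
}
```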
CPU vs GPU bias
Another bias revolved around a simple question: why use a GPU when a CPU is enough? There are, as with any technology, real cases where offloading doesn't make sense. For instance, when you're working as a DJ and doing playback in real time, using a graphics card isn't practical: the audio is already rendered, so you don't need much processing power. In the pro audio industry, however, the company's technology lets you work as a professional without investing heavily in gear. Where before you needed a DSP accelerator, or a desktop with a $2,000 64-thread AMD Threadripper CPU and a $1,000 motherboard, now you don't, because a capable GPU is already inside your $900 RTX 3060-based laptop.

This innovation not only gives you substantially more processing power but also creates a foundation for improving modern workflows, which have traditionally been limited by their reliance on the CPU. "When they work together on mixing and rendering audio movie set-ups, professionals have to use two or three PCs or two sockets, two processors, and server-grade solutions in order to do that on the fly," Talashov explains, emphasizing just how much processing power that requires. In this context, graphics cards fit very well into the spatial audio field: while you traditionally work with two channels during a recording session, you often have 64 or more channels when working with object-based immersive mixing workflows.
Looking into the future
GPUs are also a natural fit for machine-learning-powered algorithms, from effects that utilize neural networks to noise removal. "All of the companies that work with machine learning and AI use GPUs as a primary device to do the training of the neural networks — this is where we also unlock workflows for audio professionals and creators," he continues. "There are a ton of great fitting use cases for audio pros on top of unlocking more power and your GPU for less money. This is a foundation for the future of audio software and innovation to come."

GPU Audio positions itself as an infrastructure company: even though it develops products, its ultimate goal is to set a new audio standard for the industry. "We want to power up as many products as possible," Sasha concludes. "The idea is to create the standard and partner with companies through both SDK and direct collaboration to create products that would run heavy things, as well as creating products that lie beyond traditional pro audio and music experiences."