This article explores how different HDR-imaging-style techniques can be adapted to the audio domain.
The early adopters – game developers
The recently cross-linked article “Finding Your Way With High Dynamic Range Audio In Wwise” gives a good overview of how the HDR concept has already been adopted by game developers in recent years. Mixing in-game audio comes with its very own challenge: arbitrarily occurring audio events must be mixed in real time while the game is actually being played. When mixing off-line (as in a typical song production), by contrast, we have a static output format and no such issues.
So it comes as no surprise that the game developers’ approach turned out to be a rather automatic/adaptive in-game mixing system, capable of gating quieter sources depending on the overall volume of the entire mix, plus performing some overall compression and limiting. The off-line mixing engineer can always do better, and if a mix really is too difficult, even the arrangement can be fixed by hand during the mixing stage.
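To make the gating idea concrete, here is a minimal sketch of such a windowed HDR mixing pass. All names and parameters are my own illustration, not taken from Wwise or any actual game-audio API; the assumption is simply that the loudest momentary source pins the top of a sliding window, and sources falling below the window are gated out:

```python
def hdr_mix(sources_db, window_db=40.0):
    """Toy sketch of window-based HDR mixing (illustrative, not Wwise code).

    sources_db: per-source momentary loudness in dBFS.
    The loudest source defines the top of the HDR window; anything
    below (top - window_db) is considered masked and gated out.
    Returns one linear gain per source.
    """
    top = max(sources_db)          # the window rides on the loudest event
    floor = top - window_db        # below this, a source stops being rendered
    gains = []
    for level in sources_db:
        if level < floor:
            gains.append(0.0)      # gate the quiet voice entirely
        else:
            # shift the whole window down so the top maps to 0 dBFS
            gains.append(10 ** ((0.0 - top) / 20.0))
    return gains
```

With an explosion at -6 dBFS and footsteps at -60 dBFS inside a 40 dB window, the footsteps are gated while everything audible is shifted up as a block, which is exactly the tone-mapping-like behavior described above.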
There is a further shortcoming, and from my point of view it is the overly simplistic, reduced translation from “image brightness” into “audio loudness”. This might work to some extent, but ever since the loudness race emerged we have had clear proof of how utterly bad that can sound in the end. Far more details and effects must be taken into account to do better in terms of perceived dynamic range.
The plain 1:1 translation – focusing on bit resolution
In their paper “High Dynamic Range Simultaneous Signal Compositing, Applied To Audio”, R. Janzen and S. Mann presented a sort of 1:1 translation of the HDR concept, which deals with extending the effective bit depth in the ADC stage during recording. They also presented the compositing technique known from combining different LDR exposures; in audio, however, the exposures must be captured simultaneously and can no longer be shot serially. This is because our hearing is highly sensitive to out-of-phase recordings and the equalization artifacts they produce.
With such bit-resolution extensions, one can imagine sampling a 200 dB dynamic range signal using four 16-bit ADCs in parallel, just as an example. While this might appear rather esoteric in a well-controlled audio recording situation where a typical 24-bit converter suffices, there seem to be serious applications in RADAR or SONAR processing. One could also imagine a mobile audio recorder featuring such a high dynamic range, which would make the typically ugly-sounding automatic input gain leveler unnecessary even in difficult recording situations.
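The simultaneous compositing step can be sketched as follows. This is my own simplification of the idea, not code from the Janzen/Mann paper: several capture paths record the same signal at once, each through a different analog gain, and for every sample instant we keep the most sensitive path that is not clipping and undo its gain:

```python
def composite_hdr(samples_by_path, path_gains_db, full_scale=1.0):
    """Toy sketch of simultaneous multi-exposure compositing (illustrative).

    samples_by_path: one sample list per capture path, recorded in parallel,
    each path preceded by a different analog gain (path_gains_db, in dB).
    Per sample, pick the highest-gain path still inside its headroom and
    divide its gain back out to reconstruct the wide-range signal.
    """
    n = len(samples_by_path[0])
    # visit paths from most to least sensitive
    order = sorted(range(len(path_gains_db)),
                   key=lambda i: path_gains_db[i], reverse=True)
    out = []
    for t in range(n):
        for i in order:
            s = samples_by_path[i][t]
            if abs(s) < 0.99 * full_scale:          # headroom check: not clipped
                out.append(s / 10 ** (path_gains_db[i] / 20.0))
                break
        else:
            # even the least sensitive path clipped - use it anyway
            i = order[-1]
            out.append(samples_by_path[i][t] / 10 ** (path_gains_db[i] / 20.0))
    return out
```

Quiet passages are reconstructed from the high-gain path (good SNR), loud passages from the low-gain path (no clipping), mirroring how bracketed exposures are merged in HDR photography.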
A perceptual approach – dealing with perceived dynamics
If I had to boil the HDR imaging concept down to a single sentence, I would say it is all about balancing local vs. global contrast – as perceived by humans. And as the comprehensive article about HDR imaging written by Sven Bontinck already explained, that is a complex matter of perception within our visual system, involving both eye and brain. Within our hearing system this is likewise a matter of perception, and we usually call it a “psychoacoustic” effect.
Psychoacoustics heavily determines how we perceive transient vs. steady-state signals, how frequency-dependent our hearing is, why ear fatigue and masking effects occur, and how our hearing copes with and takes advantage of overtones – just to name some of the dimensions. As a side note, this is also the basis for designing lossy audio encoders (such as MPEG-1 Layer 3), which are capable of eliminating certain audio content without audible artifacts. In order to present a well-balanced dynamic range impression to the ear, it must be worked out how these different dimensions affect and interact with each other.
Today, the basic DSP building blocks and patterns needed to realize such a perceptual approach are not only available but also well understood: overtone generation, transient management, upward and downward compression/expansion, parallel processing, and look-ahead techniques, all handled with their overall frequency dependency in mind. Some specific combinations have already been implemented in broadcast processors and audio exciters, to name just two.
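As an example of one such building block, here is a minimal feed-forward downward compressor with a smoothed level detector. It is a bare-bones sketch for illustration only (parameter names and values are my own, and real designs would add look-ahead, frequency-dependent detection, and so on):

```python
import math

def compress(samples, threshold_db=-20.0, ratio=4.0,
             attack=0.01, release=0.2, sr=48000):
    """Minimal feed-forward downward compressor (illustrative sketch).

    A one-pole envelope follower in the dB domain drives the gain
    computer: levels above threshold_db are reduced by the given ratio.
    """
    a_att = math.exp(-1.0 / (attack * sr))    # fast smoothing when level rises
    a_rel = math.exp(-1.0 / (release * sr))   # slow smoothing when level falls
    env_db = -120.0                           # detector state, in dB
    out = []
    for x in samples:
        level_db = 20.0 * math.log10(max(abs(x), 1e-9))
        coeff = a_att if level_db > env_db else a_rel
        env_db = coeff * env_db + (1.0 - coeff) * level_db
        over = env_db - threshold_db
        # above threshold: apply negative gain so output rises at 1/ratio
        gain_db = over * (1.0 / ratio - 1.0) if over > 0.0 else 0.0
        out.append(x * 10 ** (gain_db / 20.0))
    return out
```

A full-scale steady tone with threshold -20 dB and ratio 4:1 settles at 15 dB of gain reduction; swapping the sign convention in the gain computer turns the same skeleton into upward compression or expansion, which is why these blocks combine so readily.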