how I listen to audio today

Developing audio effect plugins involves quite a lot of testing. While this appears to be an easy task as long as it’s all about measurable criteria, it gets much trickier beyond that. Then there is no way around (extensive) listening tests, which must be structured and follow a systematic approach to avoid ending up in fluffy “wine tasting” categories.

I’ve spent quite some time with such listening tests over the years, and some of the insights and principles are distilled in this brief article. They are not only useful for checking mix qualities or judging device capabilities in general but also give some essential hints about developing our hearing.

No matter what specific audio assessment task one is up to, it’s always about judging the dynamic response of the audio (dynamics) versus its distribution across the frequency spectrum (tonality). Both dimensions are best tested with transient-rich program material, such as mixes containing several acoustic instruments – e.g. guitars, percussion and so on – that also have sustaining elements and room information.

Drums are also a good starting point, but they do not offer enough variety to cover both aspects or, for example, to easily spot intermodulation distortion (IMD) artifacts. A rough but decent mix should do the job. Personally, I prefer raw mixes which are not yet processed that much, to minimize the influence of flaws already burned into the audio content, but more on that later.

Having such content in place makes it possible to focus the hearing and to listen along a) the instrument transients – instrument by instrument – and b) the changes and impact within particular frequency ranges. Let’s look at both aspects in more detail.

a) The transient information is crucial for our hearing because it is used not only to identify instruments but also to perform stereo localization. Transients basically determine how well we can separate different sources and how they are positioned in the stereo field. So if something “lacks definition”, it might simply be caused by not having enough transient information available and not necessarily by flaws in equalizing. Transients tend to mask other audio events for a very short period of time, and when a transient decays and the signal sustains, it unveils its pitch information to our hearing.

b) For the sustaining signal phases it is more relevant to focus on frequency ranges, since our hearing is organized in bands across the entire spectrum and is not able to distinguish different events within the very same band. For most comparison tasks it’s already sufficient to consciously distinguish between the low, low-mid, high-mid and high frequency ranges and to drill down further only if necessary, e.g. to identify specific resonances. Assigning specific attributes to the corresponding ranges is the key to improving our conscious hearing abilities. As an example, one might at first spot something “boxy sounding” only in the mid frequency range. But focusing on the very low frequency range might also expose effects contributing to the overall impression of “boxiness”. This reveals further, previously unnoticed strategies to properly manage such effects.
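To make the four-range idea a bit more tangible, here is a minimal sketch that sums the spectral power of a short snippet per range and reports which one dominates. The band edges in Hz are purely my assumption for illustration (the text names the ranges but not their limits), and the naive DFT is only meant for short test signals.

```python
import math

# Hypothetical band edges in Hz; the four ranges are named in the text,
# but these limits are assumptions made for this sketch only.
BANDS = [
    ("low", 20.0, 200.0),
    ("low-mid", 200.0, 1000.0),
    ("high-mid", 1000.0, 5000.0),
    ("high", 5000.0, 20000.0),
]

def band_energies(samples, sample_rate):
    """Sum the power of a naive DFT per band (O(n^2), short snippets only)."""
    n = len(samples)
    energies = {name: 0.0 for name, _, _ in BANDS}
    for k in range(1, n // 2):
        freq = k * sample_rate / n
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = -sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        power = (re * re + im * im) / (n * n)
        for name, lo, hi in BANDS:
            if lo <= freq < hi:
                energies[name] += power
    return energies

def dominant_band(samples, sample_rate):
    """Name of the range that carries the most power."""
    energies = band_energies(samples, sample_rate)
    return max(energies, key=energies.get)
```

Running this on a snippet before and after processing gives a rough hint in which range a change happened; it does not replace listening, it only helps to name the range.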

Overall, I cannot recommend highly enough educating the hearing in both dimensions to enable a more detailed listening experience and to become more confident in assessing certain audio qualities. Most kinds of compression/distortion/saturation effects present a good learning challenge since they can impact both audio dimensions very deeply. On the other hand, using already mixed material to assess the qualities of e.g. a new audio device turns out to be a very delicate matter.

Let’s say an additional HF boost now sounds unpleasant and harsh: is this a flaw of the added effect, or was it already there and has now just been pulled out of the mix? During all the listening tests I’ve done so far, a lot of tainted mixes unveiled such flaws that were not visible at first sight. In the given example you might find root causes like too much mid frequency distortion (coming from compression IMD or saturation artifacts) mirroring into the HF, or just inferior de-essing attempts. The most recent trend to grind down each and every frequency resonance is also prone to unwanted side effects, but that’s another story.

Further psychoacoustic effects need to be taken into account when we perform A/B testing. While comparing content at equal loudness is a well-known subject (nonetheless ignored by lots of reviewers out there), it is also crucial to switch back and forth between sources instantaneously and not with a break. This is because our hearing system is not able to memorize a full audio profile for much longer than a second. Then there is the “confirmation bias” effect: we always tend to be biased concerning the test result. Just having pressed that button or knowing the brand name already has to be seen as an influence in this regard. The only solution for this is blind testing.
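Both requirements, equal loudness and blind comparison, can be sketched in a few lines. Note the assumptions: plain RMS is used here as a crude stand-in for perceived loudness (a proper loudness measure would be better), and all function names are hypothetical helpers made up for this illustration.

```python
import math
import random

def rms(samples):
    """Root mean square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_gain(reference, candidate):
    """Linear gain that brings the candidate to the RMS level of the reference."""
    return rms(reference) / rms(candidate)

def blind_trials(n_trials, seed=None):
    """Hidden per-trial assignment: is the unknown source X really 'A' or 'B'?"""
    rng = random.Random(seed)
    return [rng.choice("AB") for _ in range(n_trials)]

def score(hidden, answers):
    """Number of trials in which the listener identified X correctly."""
    return sum(h == a for h, a in zip(hidden, answers))
```

The listener only ever sees “X” and answers “A” or “B”; the hidden list is compared afterwards. A score close to half of the trials suggests the difference was not reliably audible.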

Most of the time I listen through nearfield speakers and rarely through cans. I’ve been sticking to my speakers for more than 15 years now, and it was very important for me to get used to them over time. Before that I “upgraded” speakers several times unnecessarily. Having said that, using a coaxial speaker design is key for nearfield listening environments. After ditching digital room correction here in my studio, the signal path is now fully analog right after the converter. The converter itself is high-end, but today I think proper room acoustics right from the start would have been a better investment.

processing with High Dynamic Range (3)

This article explores how various HDR-imaging-like techniques can be adopted in the audio domain.

The early adopters – game developers

In the recently cross-linked article “Finding Your Way With High Dynamic Range Audio In Wwise”, a good overview was given of how the HDR concept has already been adopted by some game developers in recent years. Mixing in-game audio has its very own challenge, which is mixing arbitrarily occurring audio events in real time while the game is actually being played. In contrast, when we mix off-line (as in a typical song production), we have a static output format and of course don’t have such issues.

So it comes as no surprise that the game developer approach turned out to be a rather automatic/adaptive in-game mixing system which is capable of gating quieter sources depending on the overall volume of the entire audio, plus performing some overall compression and limiting. The “off-line mixing audio engineer” can always do better, and if a mix is really too difficult, even the arrangement can be fixed by hand during the mixing stage.
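The gating part of such an adaptive mixer can be sketched roughly as follows. This is a strongly simplified illustration and not the actual Wwise implementation: the idea of a window hanging below the loudest active source, the 40 dB default and the function name are all assumptions made for this example.

```python
def hdr_mix(levels_db, window_db=40.0):
    """Map source levels to levels relative to the loudest source.

    Sources that fall more than window_db below the loudest one are
    gated, which is marked here by None.
    """
    if not levels_db:
        return {}
    top = max(levels_db.values())  # the window top follows the loudest source
    mixed = {}
    for name, level in levels_db.items():
        relative = level - top
        mixed[name] = relative if relative >= -window_db else None
    return mixed
```

With an explosion at 120 dB, a voice at 90 dB and footsteps at 60 dB, the footsteps drop out of a 40 dB window while the voice stays in at -30 dB relative to the explosion. Once the explosion has decayed, the window moves down and the footsteps return.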

There is a further shortcoming, and from my point of view that is the overly simplistic and reduced translation from “image brightness” into “audio loudness”. This might work to some extent, but since the audio loudness race emerged we already have clear proof of how utterly bad that can sound in the end. At the very least, there are many more details and effects to be taken into account to perform better concerning dynamic range perception.

processing with High Dynamic Range (2)

This comprehensive and in-depth article about HDR imaging was written by Sven Bontinck, a professional photographer and a hobby-musician.

A matter of perception.

To be able to use HDR in imaging, we must first understand what dynamic range actually means. Sometimes I notice people mistake contrast in pictures for dynamic range. Those two concepts are related, but they are not the same. Let me start by briefly explaining how humans receive information with our eyes and ears. This is important because it influences the way we perceive what we see and hear and how we interpret that information.

We all know about the retina in our eyes where we find the light-sensitive sensors, the rods and cones. The cones provide us with daytime vision and the perception of colours. The rods allow us to see at low light levels and provide us with black-and-white vision. However, there is a third kind of photoreceptor, the so-called photosensitive ganglion cells. These cells give our brain information about length-of-day versus length-of-night duration, but also play an important role in pupillary control. Every sensor needs a minimum amount of stimulation to be able to react. At the same time, all kinds of sensors have a maximum amount that they may be exposed to. Above that limit, certain protection mechanisms start interacting to prevent damage occurring to the sensors.

processing with High Dynamic Range (1)

Back when I was at university, my very first DSP lectures were actually not about audio but image processing. Due to my interest in photography I followed this amazing and ever-evolving domain over time. Later on, High Dynamic Range (HDR) image processing emerged, and besides its high impact on digital photography, I immediately started to ask myself how such techniques could be translated into the audio domain. And to be honest, for quite some time I didn’t have a clue.


This image shows a typical problem digital photography still suffers from: the highlights are completely washed out and the lowlights turn into black abruptly without containing further nuances – the dynamic range performance is pretty poor, and this is not what the human eye would perceive, since it features both a higher dynamic range per se and a better adaptation to different (and maybe difficult) lighting conditions.

On top of that, we have to expect severe dynamic range limitations in the output media, whether that’s a cheap digital print, a crappy TFT display or the limited JPG file format, just as examples. Analog film and prints have such problems in principle as well, but not to the same extent, since they typically offer more dynamic resolution and the saturation behavior is rather soft, unlike digital hard clipping. And this is where HDR image processing comes in.

It typically distinguishes between single- and multi-image processing. In multi-image processing, a series of Low Dynamic Range (LDR) images is taken at different exposures and combined into one single new image which contains an extended dynamic range (thanks to some clever processing). Afterwards, this version is rendered back into an LDR image by utilizing special “tone mapping” operators, which perform a sort of dynamic range compression to obtain a better dynamic range impression, but now in an LDR file.
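To give an idea of what such a tone mapping operator does, here is a minimal sketch using Reinhard’s simple global operator L / (1 + L). That particular operator is my choice for illustration only; the text above does not name one, and real tone mappers are usually far more elaborate (local contrast, color handling and so on).

```python
def reinhard(luminance):
    """Compress an unbounded, non-negative luminance into the range [0, 1)."""
    return luminance / (1.0 + luminance)

def to_8bit(values):
    """Tone map HDR luminance values and quantize them to 8-bit LDR levels."""
    return [round(255 * reinhard(v)) for v in values]
```

Dark values pass almost linearly while ever brighter values are squeezed closer and closer together below the maximum. In other words, it behaves like a soft compression curve instead of a hard clip.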

In single-image processing, a single HDR image must already be available, and then just tone mapping is applied. As an example, the picture below takes advantage of single-image processing from a RAW file, which typically has a much higher bit depth (12 or even 14 bit with today’s sensor tech) than JPG (8 bit). As a result, a lot of dynamic information can be preserved even if the output file is still just a JPG. As an added bonus, such a processed image also translates much better across a wide variety of output devices, displays and viewing light conditions.


tasty meal preparations with Density mkIII

Since precise routing and stuff like that has not been written down in the cookbook as of now, here are some exciting tips and tricks to experiment with, and maybe to obtain a different approach to cooking audio with Density mkIII.


As a starter, just use the default preset and dial in huge amounts of compression with the DRIVE knob. Now mix this back with the dry signal by using the DRY:WET option to obtain a thick sounding result (New York style compression). Since the COLOR option ignores any DRY:WET settings, one can dial it in afterwards to thicken the soup even further. Hmm, tasty!
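For the curious, the principle behind New York style (parallel) compression can be sketched as below. This is not Density mkIII’s algorithm: the compressor here is a bare static hard-knee curve without attack or release, and the threshold, ratio and function names are assumptions made for illustration.

```python
import math

def compress(sample, threshold=0.25, ratio=4.0):
    """Static hard-knee compression of one sample (no attack/release)."""
    magnitude = abs(sample)
    if magnitude <= threshold:
        return sample
    # Above the threshold the level grows only by 1/ratio (in dB terms).
    compressed = threshold * (magnitude / threshold) ** (1.0 / ratio)
    return math.copysign(compressed, sample)

def parallel_mix(dry, wet_amount=0.5, threshold=0.25, ratio=4.0):
    """Blend the untouched signal with its heavily compressed copy."""
    return [
        (1.0 - wet_amount) * s + wet_amount * compress(s, threshold, ratio)
        for s in dry
    ]
```

Because the dry path keeps the transients while the compressed path lifts the quieter parts, the blend sounds thicker without flattening the peaks as much as the fully wet signal would.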

Second course

Set DRY:WET back to a 100% wet signal but also pull RANGE back to the left so that there is no gain reduction anymore. There is no compression now, but one can still use the MAKEUP knob to drive the gain of the non-linear circuits. Use this and experience a hot (driven) meal.

Main course

By finishing the second course, you not only have a sophisticated non-linear amplifier where you can dial in the coloration with the COLOR knob to taste. You can also use this in M/S mode to adjust the stereo imaging in a quite unique fashion, just by adjusting the amount of saturation per channel with the MAKEUP knobs. Omph, I’m feelin’ so wide now!
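Why driving mid and side differently changes the stereo image can be shown with a small sketch. The tanh curve is only a hypothetical stand-in for the plugin’s non-linear circuits, and the two drive parameters loosely play the role of the per-channel MAKEUP knobs; none of this is taken from the actual plugin.

```python
import math

def ms_saturate(left, right, mid_drive=1.0, side_drive=1.0):
    """Saturate mid and side separately, then decode back to left/right."""
    out_left, out_right = [], []
    for l, r in zip(left, right):
        mid = (l + r) / 2.0    # what both channels have in common
        side = (l - r) / 2.0   # what differs between the channels
        mid = math.tanh(mid_drive * mid)
        side = math.tanh(side_drive * side)
        out_left.append(mid + side)
        out_right.append(mid - side)
    return out_left, out_right
```

Driving the side channel harder than the mid raises the level of the difference signal relative to the common signal, which is heard as a wider image. A purely mono input has no side content and therefore stays mono.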


Just dial in some compression again by turning RANGE clockwise, maybe fully to the right, but RELAX the attack times so that some transients can pass. Those will now be eaten by the non-linear amplifier as an added sugar.

Espresso, anyone?

preFIX 1.0 – out now!

preFIX – getting those alignments done


preFIX – final teaser and release info


preFIX - gate and expander section with detailed sidechain filtering options


the gate/expander in use

written by susiwong

A basic gate has a single parameter, the threshold: when the level is above the threshold the signal passes unchanged; when the level drops below the threshold the signal gets switched off, simple as that. Attack time should ideally be as fast as possible without causing clicks or distortion, so with most gates it’s preset to a sensible compromise; a few good gates even offer you a choice of two settings. Knee, hold and release determine the shape and speed of the fade-out: release is responsible for the overall decay time, while knee changes the behaviour around the threshold level, helping you avoid the dreaded “motorboating” effect where the gate switches on and off rapidly. Think BSS or Drawmer gate vs Alesis compressor …

Hold simply specifies the “reaction time” from the moment the signal falls below the threshold until the beginning of the gain reduction – critical for preserving as much meat as possible from drums or keeping guitar decay intact. This is mostly what separates the good from the bad and the ugly. Last is the “range” or “floor” parameter: it sets a certain minimal volume to which the signal gets attenuated when dropping below the threshold, instead of being muted completely. This is very helpful when you need to reduce the background noise between a singer’s phrases, for example, and much less obtrusive than muting the track completely. Set the floor so the background noise gets masked well enough by the music; often 3 dB or 6 dB are enough. This technique is also known as downward expansion. Paired with a longer release and soft knee, it’s often used for distorted guitars (with slow decay), too.
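How threshold, hold, release and floor interact can be sketched as a simple gain computer. This is a minimal illustration under my own assumptions: instant attack, no knee, hold counted in samples and release expressed as a per-sample decay factor. It is not modelled on any particular gate.

```python
def gate_gains(levels, threshold, floor_gain=0.5, hold=2, release=0.5):
    """Per-sample gain of a gate with floor ("range"), hold and release.

    levels:     rectified input levels (linear)
    floor_gain: gain while closed; 0.5 is roughly -6 dB, 0.0 mutes fully
    hold:       samples the gate stays open after falling below threshold
    release:    per-sample decay factor towards the floor (smaller = faster)
    """
    gains = []
    gain = floor_gain   # start with the gate closed
    held = hold         # samples spent below the threshold so far
    for level in levels:
        if level >= threshold:
            gain = 1.0  # instant attack: the gate opens fully
            held = 0
        elif held < hold:
            held += 1   # hold phase: keep the current gain untouched
        else:
            # release phase: fade towards the floor
            gain = floor_gain + (gain - floor_gain) * release
        gains.append(gain)
    return gains
```

Multiplying the audio by these gains applies the gate. With a floor above zero the signal is only turned down instead of being muted, which is the downward expansion behaviour described above.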

Some good gates offer sidechain filters allowing you to “zero in” on the important part of a complex signal. Take a tom mic of a multi-miked drum set, for example, where a lot of similar signals (bleed) are fighting for control. Difficult even with sidechain, impossible without. It is worth noting that these filters do NOT influence your audio signal, only the signal used for detection, hence the name sidechain. And finally, an external sidechain even allows you to borrow a signal from another channel to trigger your gate – the creative options are huge. Unfortunately, not all hosts have this implemented in a user-friendly way. One popular example is tightening up the bass by triggering its gate from the kick.
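The bass-from-kick example boils down to this: the detection looks at one signal while the gain is applied to another. A deliberately bare sketch (no attack, hold or release; the function name and the default values are hypothetical):

```python
def keyed_gate(audio, key, threshold, floor_gain=0.0):
    """Gate `audio` based on the level of the external `key` signal."""
    return [
        a if abs(k) >= threshold else a * floor_gain
        for a, k in zip(audio, key)
    ]
```

Here `audio` would be the bass and `key` the kick: the bass only passes while the kick is above the threshold, and the kick itself never ends up in the output.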

loudness wars – episode IV

Yes, a new hope. While some of the recently established metering systems did not successfully manage the loudness race problems in general, there seems to be a new hope concerning those issues, and this comes from the broadcasters’ standardization efforts. Started in 2006, the ITU recommendation BS.1770-1 already defined a replacement for the common QPPM metering and was oriented towards loudness metering instead.