the world of sound localization according to psychoacoustics

Sound localization refers to the ability of the human auditory system to determine the location of a sound source in space. This is done by analyzing differences in the arrival time, intensity, and spectral content of the sound waves that reach the two ears. The auditory system is able to localize sounds both horizontally (azimuth) and vertically (elevation) in auditory space.

The brain processes the incoming sound signals from both ears to calculate the interaural time difference (ITD) and interaural level difference (ILD), which are used to determine the location of the sound source. Interaural time difference refers to the difference in the time it takes for a sound wave to reach each ear, while interaural level difference refers to the difference in the level of the sound wave that reaches each ear.
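To make these two cues a bit more tangible, here is a minimal Python sketch – my own illustration, not taken from any particular model – that estimates the ITD of a two-channel signal from the cross-correlation lag and the ILD from the RMS level ratio:

```python
import numpy as np

def estimate_itd_ild(left, right, sample_rate):
    """Estimate interaural time and level differences from a two-channel signal."""
    # ITD: lag of the cross-correlation peak between the two ear signals
    corr = np.correlate(left, right, mode="full")
    lag = np.argmax(corr) - (len(right) - 1)       # lag in samples
    itd_ms = 1000.0 * lag / sample_rate            # negative: left ear leads

    # ILD: RMS level ratio between the two ear signals, in dB
    rms = lambda x: np.sqrt(np.mean(x ** 2))
    ild_db = 20.0 * np.log10(rms(left) / rms(right))
    return itd_ms, ild_db

# toy example: the same click arriving 0.5 ms later and 6 dB quieter at the right ear
sr = 48000
left = np.zeros(1024);  left[100] = 1.0
right = np.zeros(1024); right[124] = 0.5           # 24 samples ~ 0.5 ms later, -6 dB
print(estimate_itd_ild(left, right, sr))           # ~ (-0.5, 6.0) -> source on the left
```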

The auditory system uses both ITD and ILD as complementary cues that work together to allow for accurate sound localization in the horizontal plane, aka the stereo field. For example, a sound coming from straight ahead arrives at both ears with virtually the same timing and level, so ITD and ILD are close to zero, while a sound coming from the side reaches the nearer ear earlier and at a higher level, producing a distinct ITD and ILD.

It’s also worth noting that the relative importance of ITD and ILD can vary depending on the frequency of the sound. At low frequencies, ITD is the dominant cue for sound localization, while at high frequencies, ILD becomes more important. Research has suggested that the crossover frequency between ILD and ITD cues for human sound localization is around 1.5 kHz to 2.5 kHz, with ITD cues being more useful below this frequency range and ILD cues being more useful above this range.
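A quick back-of-the-envelope calculation illustrates why the crossover sits in that region: once the wavelength shrinks to roughly the dimensions of the head, the interaural phase becomes ambiguous (weakening ITD) while head shadowing gets stronger (strengthening ILD). The head diameter below is just an assumed average:

```python
speed_of_sound = 343.0   # m/s
head_diameter = 0.175    # m, rough average ear-to-ear distance (assumed)

for f in (500, 1500, 2500, 5000):  # Hz
    wavelength = speed_of_sound / f
    print(f"{f:5d} Hz: wavelength {wavelength * 100:5.1f} cm "
          f"= {wavelength / head_diameter:.1f} x head diameter")
# around 1.5-2.5 kHz the wavelength approaches the size of the head,
# which is where the dominant cue shifts from ITD to ILD
```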

In addition to ITD and ILD, the auditory system also uses spectral cues – created by the shape of the outer ear and the filtering effects of the head and torso – to determine the location of sounds in the vertical plane and to tell apart sounds coming from the front and from behind.

The temporal characteristics of an audio event, such as its onset and duration, can have an impact on sound localization as well. Generally speaking, sounds with a distinct onset, such as a drum hit, are easier to localize than sounds with a more sustained character, such as white noise. This is because the onset of a sound provides a more salient cue for the auditory system to use in determining the location of the sound source, especially with regard to ITD.

In the case of a drum hit, the sharp onset creates a more pronounced difference in the arrival time and intensity of the sound at the two ears, which makes it easier for the auditory system to use ITD and ILD cues to locate the sound source. In contrast, with a more sustained signal like white noise, the auditory system may have to rely more on spectral cues and reverberation in the environment to determine the location of the sound source.

how I listen to audio today

Developing audio effect plugins involves quite a lot of testing. While this appears to be an easy task as long as it’s all about measurable criteria, it gets way more tricky beyond that. Then there is no way around (extensive) listening tests, which must be structured and follow some systematic approach to avoid ending up in fluffy “wine tasting” categories.

I’ve spent quite some time with such listening tests over the years, and some of the insights and principles are distilled in this brief article. They are not only useful for checking mix qualities or judging device capabilities in general but also give some essential hints about developing our hearing.

No matter what specific audio assessment task one is up to, it’s always about judging the dynamic response of the audio (dynamics) versus its distribution across the frequency spectrum (tonality). Both dimensions can be tested best by utilizing transient-rich program material, like mixes containing several acoustic instruments – e.g. guitars, percussion and so on – that also have sustaining elements and room information.

Drums are also a good starting point, but they do not offer enough variety to cover both aspects we are talking about, or to spot modulation artifacts (IMD) easily, just as an example. A rough but decent mix should do the job. Personally, I prefer raw mixes which are not yet processed that much, to minimize the influence of flaws already burned into the audio content – but more on that later.

Having such content in place allows one to focus the hearing and to listen along a) the instrument transients – instrument by instrument – and b) the changes and impact within particular frequency ranges. Let’s have a look into both aspects in more detail.

a) The transient information is crucial for our hearing because it is used not only to identify instruments but also to perform stereo localization. Transients basically determine how well we can separate different sources and how they are positioned in the stereo field. So let’s say something “lacks definition” – it might simply be caused by not having enough transient information available and not necessarily by flaws in equalization. Transients tend to mask other audio events for a very short period of time, and when a transient decays and the signal sustains, it unveils its pitch information to our hearing.

b) For the sustaining signal phases it is more relevant to focus on frequency ranges, since our hearing is organized in bands across the entire spectrum and is not able to distinguish separate events within the very same band. For most comparison tasks it’s already sufficient to consciously distinguish between the low, low-mid, high-mid and high frequency ranges, and to only drill down further if necessary, e.g. to identify specific resonances. Assigning specific attributes to the according ranges is the key to improving our conscious hearing abilities. As an example, one might at first attribute something “boxy sounding” purely to the mid frequency range. But focusing on the very low frequency range might also expose effects contributing to the overall impression of “boxiness”. This reveals further, previously overlooked strategies to properly manage such kinds of effects.
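As a practical aid for this kind of band-focused listening, a mix can be split into exactly those four ranges and each one soloed in turn. A minimal Python sketch; the band edges are just assumed values, not fixed definitions:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# hypothetical band edges for the four listening ranges (Hz) -- adjust to taste
BANDS = {
    "low":      (20,    250),
    "low-mid":  (250,   2000),
    "high-mid": (2000,  6000),
    "high":     (6000, 18000),
}

def split_into_bands(signal, sample_rate):
    """Return band-limited copies of the signal, one per listening range."""
    bands = {}
    for name, (lo, hi) in BANDS.items():
        sos = butter(4, [lo, hi], btype="bandpass", fs=sample_rate, output="sos")
        bands[name] = sosfiltfilt(sos, signal)
    return bands

# usage: solo one range at a time while listening, e.g.
# bands = split_into_bands(mix, 44100), then play back bands["low-mid"]
```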

Overall, I cannot recommend highly enough educating the hearing in both dimensions to enable a more detailed listening experience and to get more confident in assessing certain audio qualities. Most kinds of compression/distortion/saturation effects present a good learning challenge since they can impact both audio dimensions very deeply. On the other hand, using already mixed material to assess the qualities of e.g. a new audio device turns out to be a very delicate matter.

Let’s say an additional HF boost now sounds unpleasant and harsh: Is this a flaw of the added effect, or was it already there and is now just pulled out of the mix? During all the listening tests I’ve done so far, a lot of tainted mixes unveiled such flaws that were not obvious at first. In the case of the given example you might find root causes like too much mid frequency distortion (coming from compression IMD or saturation artifacts) mirrored in the HF, or just inferior de-essing attempts. The recent trend of grinding away each and every frequency resonance is also prone to unwanted side effects, but that’s another story.

Further psychoacoustics-related hearing effects need to be taken into account when we perform A/B testing. While comparing content at equal loudness is a well-known subject (nonetheless ignored by lots of reviewers out there), it is also crucial to switch back and forth between sources instantaneously and not with a break. This is due to the fact that our hearing system is not able to memorize a full audio profile for much longer than a second. Then there is the “confirmation bias” effect, which basically means that we always tend to be biased concerning the test result: just having that button pressed or knowing the brand name already has to be seen as an influence in this regard. The only solution for this is blind testing.
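The two mechanical parts of such a test – matching loudness and hiding which source is which – are easy to sketch. Below is a minimal illustration using simple RMS matching; a serious test would rather use an ITU-R BS.1770/LUFS measurement and gapless switching in the playback engine:

```python
import numpy as np

def rms_db(x):
    return 20.0 * np.log10(np.sqrt(np.mean(np.square(x))) + 1e-12)

def match_loudness(reference, candidate):
    """Gain-match the candidate to the reference (crude RMS matching, not LUFS)."""
    gain = 10.0 ** ((rms_db(reference) - rms_db(candidate)) / 20.0)
    return candidate * gain

def blind_pair(a, b, rng=None):
    """Present the two stimuli in random order; the order stays hidden until after the vote."""
    if rng is None:
        rng = np.random.default_rng()
    order = rng.permutation(2)
    stimuli = (a, b)
    return stimuli[order[0]], stimuli[order[1]], order
```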

Most of the time I listen through nearfield speakers and only rarely through cans. I’ve been sticking to my speakers for more than 15 years now, and it was very important for me to get used to them over time. Before that I had “upgraded” speakers several times unnecessarily. Having said that, using a coaxial speaker design is key for nearfield listening environments. After ditching digital room correction here in my studio, the signal path is now fully analog right after the converter. The converter itself is high-end, but today I think proper room acoustics right from the start would have been a better investment.

ThrillseekerXTC mkII released

ThrillseekerXTC – bringing mojo back

ThrillseekerXTC mkII is a psychoacoustic audio exciter based on a parallel dynamic equalizer circuit. It takes our hearing sensitivity into account, especially regarding the perception of audio transients, tonality and loudness.

The mkII version now introduces:
• Plugin operating level calibration for better gain staging and output-volume-compensated processing.
• A reworked DRIVE/MOJO stage featuring full-bandwidth signal saturation and a strong focus on perceived depth and dimension. It provides all those subtle qualities we typically associate with high-end analog outboard gear.
• Special attention has been paid to the mid frequency range by introducing signal compression which improves mid-range coherence and presence.
• Relevant parts of the plugin are running at higher internal sampling frequencies to minimize aliasing artifacts.
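For those curious about the oversampling point: the general idea of running a nonlinear stage at a higher internal rate can be sketched in a few lines. This is only a generic illustration with a tanh clipper standing in for the saturation – not the actual ThrillseekerXTC implementation:

```python
import numpy as np
from scipy.signal import resample_poly

def saturate_oversampled(x, factor=4):
    """Upsample, apply the nonlinearity at the higher rate, band-limit back down.
    Harmonics created above the original Nyquist are largely removed by the
    decimation filter instead of folding back as aliasing."""
    up = resample_poly(x, factor, 1)      # band-limited polyphase upsampling
    up = np.tanh(2.0 * up)                # stand-in saturation curve
    return resample_poly(up, 1, factor)   # filter and decimate to the original rate
```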

Available for Windows VST in 32 and 64 bit as freeware. Download your copy here.

42 Audio Illusions & Phenomena

In a comprehensive series of five YouTube videos, Casey Connor provided an awesome overview and demonstration of 42 (!) different psychoacoustic effects. Watching and hearing them (headphones required) is not only highly entertaining and educational but also provides some deep insights into why we do not all hear in exactly the same way. Relevant for all of us in the audio domain, whether it is sound design, mixing, mastering or development. Highly recommended!

processing with High Dynamic Range (3)

This article explores how some techniques akin to HDR imaging can be adopted right into the audio domain.

The early adopters – game developers

In the recently cross-linked article “Finding Your Way With High Dynamic Range Audio In Wwise”, a good overview was given of how the HDR concept has already been adopted by some game developers over recent years. Mixing in-game audio has its very own challenge, which is about mixing different, arbitrarily occurring audio events in real time while the game is actually played. Opposed to that, when we mix off-line (as in a typical song production) we have a static output format and of course don’t have such issues.

So it comes as no surprise that the game developer approach turned out to be a rather automatic/adaptive in-game mixing system, capable of gating quieter sources depending on the overall volume of the entire audio, plus performing some overall compression and limiting. The “off-line mixing audio engineer” can always do better, and if a mix is really too difficult, even the arrangement can be fixed by hand during the mixing stage.
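For illustration, the core of that windowing/gating idea fits in a few lines. The numbers are made up, and a real system (such as Wwise) of course works on streaming envelopes rather than static levels:

```python
def hdr_window_mix(source_levels_db, window_db=40.0):
    """The loudest active source defines the top of a loudness window;
    sources falling below that window are gated out of the mix."""
    top = max(source_levels_db.values())
    floor = top - window_db
    return {name: level >= floor for name, level in source_levels_db.items()}

# an explosion pushes the window up and masks out the quiet ambience
print(hdr_window_mix({"explosion": -3.0, "dialogue": -18.0, "ambience": -50.0}))
# -> {'explosion': True, 'dialogue': True, 'ambience': False}
```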

There is a further shortcoming, and from my point of view that is the too simplistic and reduced translation from “image brightness” into “audio loudness”, which might work to some extent – but ever since the audio loudness race emerged, we already have clear proof of how utterly bad that can sound in the end. At the very least, there are way more details and effects to be taken into account to perform better concerning dynamic range perception. [Read more…]

how old are your ears?

(via)

quote of the day

It’s not about bending the laws of physics at all, it’s about understanding hearing and utilizing psychoacoustics. – bootsy

hearing is believing

(via)

is seeing believing?

The McGurk effect is a compelling demonstration of how we all use visual speech information. The effect shows that we can’t help but integrate visual speech into what we ‘hear’.

This (hopefully) makes you rethink mixing “visually”.

(thanks to Seppes Santens)

myths and facts about aliasing

A recent trend in the audio producer scene seems to be judging an audio effect plug-in just by analyzing the harmonic spectrum, which is usually done by throwing a static sine wave right into the plug-in and then looking at the output with an FFT spectrum analyzer afterwards. In this article I’m going to talk about what this method is capable of, where its limitations and problems lie, and how aliasing quite often gets confused with a lot of other phenomena. I’m also clearly showing that this method alone is not sufficient to judge an audio plug-in’s quality in a black-box situation.
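For readers who want to reproduce that measurement setup, here is a minimal sketch: a static sine wave, a hard clipper standing in for the plug-in under test, and an FFT of the result. Without oversampling, the harmonics above Nyquist fold back to inharmonic frequencies – the aliasing components this article is about:

```python
import numpy as np

fs = 44100                        # sample rate of the "measurement"
n = 1 << 16
t = np.arange(n) / fs

x = np.sin(2 * np.pi * 5000 * t)  # the static sine-wave test tone
y = np.clip(1.5 * x, -1.0, 1.0)   # stand-in nonlinearity (hard clipper)

spectrum = np.abs(np.fft.rfft(y * np.hanning(n)))
db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)
freqs = np.fft.rfftfreq(n, 1.0 / fs)

# true odd harmonics sit at 15 kHz, 25 kHz, 35 kHz ... but 25 kHz and 35 kHz lie
# above fs/2 and fold back to 19.1 kHz and 9.1 kHz -- inharmonic alias components
for f in (5000, 15000, 19100, 9100):
    i = np.argmin(np.abs(freqs - f))
    print(f"{f / 1000:5.1f} kHz: {db[i]:6.1f} dB")
```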

a harmonic spectrum plot showing quantization noise, harmonic distortion and aliasing effects

[Read more…]