how I listen to audio today

Developing audio effect plugins involves quite a lot of testing. While this appears to be an easy task as long as it's all about measurable criteria, it gets way more tricky beyond that. Then there is no way around (extensive) listening tests, which must be structured and follow a systematic approach to avoid ending up in fluffy “wine tasting” categories.

I've spent quite some time with such listening tests over the years, and some of the insights and principles are distilled in this brief article. They are not only useful for checking mix qualities or judging device capabilities in general but also give some essential hints for developing our hearing.

No matter what specific audio assessment task one is up to, it's always about judging the dynamic response of the audio (dynamics) versus its distribution across the frequency spectrum (tonality). Both dimensions are best tested with transient-rich program material, like mixes containing several acoustic instruments – e.g. guitars, percussion and so on – which also have sustaining elements and room information.

Drums are also a good starting point, but they do not offer enough variety to cover both aspects we are talking about or, just as an example, to spot intermodulation artifacts (IMD) easily. A rough but decent mix should do the job. Personally, I prefer raw mixes which are not yet processed that much, to minimize the influence of flaws already burned into the audio content – but more on that later.

Having such content in place allows us to focus our hearing along a) the instrument transients – instrument by instrument – and b) the changes and impact within particular frequency ranges. Let's have a look at both aspects in more detail.

a) The transient information is crucial for our hearing because it is used not only to identify instruments but also to perform stereo localization. Transients basically determine how well we can separate different sources and how they are positioned in the stereo field. So if something “lacks definition”, this might just be caused by not having enough transient information available and not necessarily by flaws in equalization. Transients tend to mask other audio events for a very short period of time, and when a transient decays and the signal sustains, it unveils its pitch information to our hearing.

b) For the sustaining signal phases it is more relevant to focus on frequency ranges, since our hearing is organized in bands across the entire spectrum and is not able to distinguish separate events within the very same band. For most comparison tasks it's already sufficient to consciously distinguish between the low, low-mid, high-mid and high frequency ranges, and to drill down further only if necessary, e.g. to identify specific resonances. Assigning specific attributes to the according ranges is the key to improving our conscious hearing abilities. As an example, one might spot something “boxy sounding” as reflecting just in the mid frequency range at first. But focusing on the very low frequency range might also expose effects contributing to the overall impression of “boxiness”. This reveals further, previously unseen strategies to properly manage such kinds of effects.
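If you want to practice this kind of band-focused listening, a minimal sketch along the following lines splits a mix into the four coarse ranges so each one can be soloed. The crossover points at 250 Hz, 2 kHz and 6 kHz are my own rough assumption, not a standard – pick whatever matches your mental map of the spectrum:

import numpy as np
from scipy.signal import butter, sosfilt

def split_bands(x, fs, edges=(250.0, 2000.0, 6000.0), order=4):
    """Split signal x into low / low-mid / high-mid / high bands."""
    bands = []
    # low band: everything below the first crossover
    sos = butter(order, edges[0], btype="lowpass", fs=fs, output="sos")
    bands.append(sosfilt(sos, x))
    # inner bands: bandpass between adjacent crossovers
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        bands.append(sosfilt(sos, x))
    # high band: everything above the last crossover
    sos = butter(order, edges[-1], btype="highpass", fs=fs, output="sos")
    bands.append(sosfilt(sos, x))
    return bands  # [low, low-mid, high-mid, high]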

Overall, I cannot recommend highly enough educating the hearing in both dimensions, to enable a more detailed listening experience and to get more confident in assessing certain audio qualities. Most kinds of compression/distortion/saturation effects present a good learning challenge, since they can impact both audio dimensions very deeply. On the other hand, using already mixed material to assess the qualities of, e.g., a new audio device turns out to be a very delicate matter.

Let's say an additional HF boost now sounds unpleasant and harsh: Is this a flaw of the added effect, or was it already there and has now just been pulled out of the mix? During all the listening tests I've done so far, a lot of tainted mixes unveiled such flaws that were not apparent at first. In the case of the given example you might find root causes like too much mid frequency distortion (coming from compression IMD or saturation artifacts) mirrored in the HF, or just inferior de-essing attempts. The recent trend of grinding away each and every frequency resonance is also prone to unwanted side effects, but that's another story.
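To make that “mid frequency distortion mirroring in the HF” tangible, here is a small illustrative experiment (the tone pair, drive amount and soft clipper are arbitrary choices of mine, not a claim about any specific device): saturating a mid-band two-tone signal produces intermodulation products well above the original tones.

import numpy as np

fs = 48000
t = np.arange(fs) / fs
# two mid-band tones at 900 and 1100 Hz
x = 0.5 * np.sin(2 * np.pi * 900 * t) + 0.5 * np.sin(2 * np.pi * 1100 * t)
y = np.tanh(3.0 * x)  # memoryless soft saturation, drive factor 3

spec = 20 * np.log10(np.abs(np.fft.rfft(y * np.hanning(len(y)))) + 1e-12)
freqs = np.fft.rfftfreq(len(y), 1 / fs)
# tanh is odd-symmetric, so odd-order products appear: 2*f1 - f2 = 700 Hz,
# 2*f2 - f1 = 1300 Hz, 2*f1 + f2 = 2900 Hz, 2*f2 + f1 = 3100 Hz, and so on.
for f in (700, 900, 1100, 1300, 2700, 2900, 3100, 3300):
    k = int(np.argmin(np.abs(freqs - f)))
    print(f"{f:5d} Hz: {spec[k]:7.1f} dB")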

Further psychoacoustics-related hearing effects need to be taken into account when we perform A/B testing. While comparing content at equal loudness is a well-known subject (nonetheless ignored by lots of reviewers out there), it is also crucial to switch back and forth between sources instantaneously and not after a break. This is due to the fact that our hearing system is not able to memorize a full audio profile for much longer than a second. Then there is the “confirmation bias” effect, which basically means that we always tend to be biased concerning the test result: Just having pressed that button or knowing the brand name already has to be seen as an influence in this regard. The only solution for this is blind testing.
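A minimal sketch of the loudness-matching step, assuming plain RMS as the level measure (a proper LUFS measurement, e.g. via the pyloudnorm package, would be closer to perceived loudness):

import numpy as np

def match_rms(reference, candidate):
    """Scale `candidate` so its RMS level matches that of `reference`."""
    rms_ref = np.sqrt(np.mean(reference ** 2))
    rms_cand = np.sqrt(np.mean(candidate ** 2))
    gain = rms_ref / max(rms_cand, 1e-12)
    return candidate * gain, 20 * np.log10(gain)  # matched signal, applied gain in dB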

Most of the time I listen through nearfield speakers and only rarely through cans. I've been sticking to my speakers for more than 15 years now, and it was very important for me to get used to them over time. Before that, I “upgraded” speakers several times unnecessarily. Having said that, a coaxial speaker design is key for nearfield listening environments. After ditching digital room correction here in my studio, the signal path is now fully analog right after the converter. The converter itself is high-end, but today I think proper room acoustics right from the start would have been a better investment.

Dynamic 1073/84 EQ curves?

Yes we can! The 1073 and 1084 high shelving filters feature that classic frequency dip ahead of the HF boost itself. Technically speaking they are not shelves but bell curves with a very wide Q. But anyway, wouldn't it be great if that were program dependent – expanding and compressing according to the curve shape and giving a dynamic frequency response to the program material?
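For illustration, the static shape can be approximated with two standard RBJ cookbook peaking biquads: a wide bell cut in front of a wide bell boost. The center frequencies, gains and Q values below are my own rough guesses for demonstration, not measured 1073/84 data:

import numpy as np
from scipy.signal import freqz

def peaking_biquad(f0, gain_db, q, fs):
    """RBJ audio EQ cookbook peaking filter; returns (b, a)."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    return b / a[0], a / a[0]

fs = 48000
f = np.geomspace(20, 20000, 256)
b1, a1 = peaking_biquad(3000, -2.0, 0.4, fs)   # wide dip in front of the boost
b2, a2 = peaking_biquad(12000, 4.0, 0.5, fs)   # wide, shelf-like HF bell
h = freqz(b1, a1, worN=f, fs=fs)[1] * freqz(b2, a2, worN=f, fs=fs)[1]
curve_db = 20 * np.log10(np.abs(h))  # combined dip-plus-boost response
# A dynamic version would drive the two gain_db values from an envelope
# follower (dip band expanding, boost band compressing) instead of fixing them.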

Again, dynamic EQs make this an easy task today, and I've just created some presets for the TDR Nova EQ which you can copy right from here (see below after the break). Instructions: Choose one of the 3 presets (one for each specific original frequency setting – 10/12/16 kHz) and just tune the Threshold parameter for band IV (dip operation) and band V (boost operation) to fit the actual mix situation.

They sound pretty awesome! See also my Nova presets for the mixbus over here and the Pultec ones here.

[Read more…]

Kotelnikov GE – mastering

Here is my go-to mastering preset for Kotelnikov GE. Just change the threshold and you are there.

<TDRKotelnikovGE thresholdParam="-24.0" peakCrestParam="-3.0" softKneeParam="6.0" ratioParam="3.0" attackParam="6.0" releasePeakParam="20" releaseRMSParam="300" makeUpParam="0.0" dryMixParam="off" outGainParam="0.0" keyHPFrequencyParam="60" keyHPSlopeParam="6.0" keyStereoDiffParam="80" keyStereoBalanceParam="Center" fdrVisibleParam="On" fdrActiveParam="On" fdrTypeParam="Shelf A" fdrFrequencyParam="50" fdrAmountParam="80" yingParam="On" yangParam="Off" deltaParam="Off" bypassParam="Off" equalLoudParam="Off" qualityParam="Insane" modeParam="Stereo" grDispScaleParam="4" grDispModeParam="Gain Reduction"/>

released: SlickHDR

SlickHDR is a “Psychoacoustic Dynamic Processor” which:

  • balances the perceived global vs. local micro dynamics of any incoming audio.
  • creates a contrast-rich, detailed and clearly perceived image which translates way better across different listening environments.
  • provides a convenient workflow: simply adjust the three dynamic processors to show roughly the same load.
  • offers further, detailed control over overall tone and release time behavior.

The stunning UI artwork and all renders were done by Patrick once again. Made with love in Switzerland – as he said!

SlickHDR is a freeware VST audio plug-in for Windows x32 and you can download a copy right here: >>> DOWNLOAD <<<


SlickHDR – final teaser & release info

SlickHDR is a “Psychoacoustic Dynamic Processor” which:

  • balances the perceived global vs. local micro dynamics of any incoming audio.
  • creates a contrast-rich, detailed and clearly perceived image which translates way better across different listening environments.
  • provides a convenient workflow: simply adjust the three dynamic processors to show roughly the same load.
  • offers further, detailed control over overall tone and release time behavior.

Technically speaking, SlickHDR contains a coupled network of three dynamic processors with two of them running in a “stateful saturation” configuration and one based on look-ahead processing.

Fixed amounts of the unprocessed signal are then injected into the network at several specific points and also mixed back into the network's output. Being networked, all processors interact strongly with each other, and this is utilized to cope with a wide variety of sounds and to balance the perceived audio dynamic range.
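In a very simplified sketch, such a coupled network could look like the following. This is only an illustration of the coupling idea, not the actual implementation: the stage types, key routings and injection amounts are all made up, and the look-ahead stage is reduced to a plain compressor.

import numpy as np

def envelope(x, coeff=0.999):
    """One-pole envelope follower over the rectified signal."""
    env, out = 0.0, np.empty_like(x)
    for i, v in enumerate(np.abs(x)):
        env = coeff * env + (1 - coeff) * v
        out[i] = env
    return out

def gain_stage(x, env, threshold=0.25, ratio=3.0):
    """Static downward compression driven by an externally supplied envelope."""
    over = np.maximum(env / threshold, 1.0)
    return x * over ** (1.0 / ratio - 1.0)

def hdr_network(x, dry_inject=0.2, dry_mix=0.3):
    a = gain_stage(x, envelope(x))                     # stage 1
    b = gain_stage(a + dry_inject * x, envelope(a))    # stage 2: dry injected, keyed by stage 1
    c = gain_stage(b, envelope(b + dry_inject * x))    # stage 3: detector also sees the dry signal
    return (1 - dry_mix) * c + dry_mix * x             # dry mixed back into the output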

The stunning UI artwork and renders were done by Patrick once again. Made with love in Switzerland – as he said.

SlickHDR will be available around the end of January 2014 as a freeware VST audio plug-in for Windows x32.

processing with High Dynamic Range (3)

This article explores how different HDR-imaging-like techniques can be adapted to the audio domain.

The early adopters – game developers

The recently cross-linked article “Finding Your Way With High Dynamic Range Audio In Wwise” gives a good overview of how the HDR concept has already been adopted by game developers over recent years. Mixing in-game audio has its very own challenge: arbitrarily occurring audio events have to be mixed in real time while the game is actually played. Opposed to that, when we mix off-line (as in a typical song production) we have a static output format and, of course, don't face such issues.

So it comes as no surprise that the game developer approach turned out to be a rather automatic/adaptive in-game mixing system, capable of gating quieter sources depending on the overall volume of the entire audio, plus performing some overall compression and limiting. The “off-line mixing audio engineer” can always do better, and if a mix is really too difficult, even the arrangement can be fixed by hand during the mixing stage.
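As a toy model, such an adaptive gating scheme boils down to the following (this only loosely mirrors the Wwise concept; the 24 dB window width and the level numbers are made up for illustration):

def hdr_mix(sources, window_db=24.0):
    """sources: list of (name, level_db); returns audible sources with output gains."""
    window_top = max(level for _, level in sources)  # loudest source sets the window top
    audible = []
    for name, level in sources:
        if level >= window_top - window_db:              # inside the window: keep it
            audible.append((name, level - window_top))   # shift so the top maps to 0 dB
        # sources below the window are gated / not rendered at all
    return audible

print(hdr_mix([("explosion", 0.0), ("gunfire", -10.0), ("footsteps", -30.0)]))
# -> the footsteps fall below the 24 dB window and are dropped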

There is a further shortcoming, and from my point of view that is the overly simplistic and reduced translation of “image brightness” into “audio loudness”, which might work to some extent. But since the loudness race emerged, we already have clear proof of how utterly bad that can sound in the end. At the very least, way more details and effects have to be taken into account to perform better in terms of dynamic range perception. [Read more…]

processing with High Dynamic Range (2)

This comprehensive and in-depth article about HDR imaging was written by Sven Bontinck, a professional photographer and hobby musician.

A matter of perception.

To be able to use HDR in imaging, we must first understand what dynamic range actually means. Sometimes I notice people mistaking contrast in pictures for dynamic range. Those two concepts are related in some ways, but are not the same. Let me start by briefly explaining how we humans receive information with our eyes and ears. This is important because it influences the way we perceive what we see and hear and how we interpret that information.

We all know about the retina in our eyes, where we find the light-sensitive sensors, the rods and cones. The cones provide us with daytime vision and the perception of colours. The rods allow us to see at low light levels and provide black-and-white vision. However, there is a third kind of photoreceptor, the so-called photosensitive ganglion cells. These cells give our brain information about length-of-day versus length-of-night duration, but also play an important role in pupillary control. Every sensor needs a minimum amount of stimulus to be able to react. At the same time, every kind of sensor has a maximum amount it may be exposed to. Above that limit, certain protection mechanisms kick in to prevent damage to the sensors. [Read more…]

processing with High Dynamic Range (1)

Back when I was at university, my very first DSP lectures were actually not about audio but image processing. Due to my interest in photography, I followed this amazing and ever-evolving domain over time. Later on, High Dynamic Range (HDR) image processing emerged, and besides its high impact on digital photography, I immediately started to ask myself how such techniques could be translated into the audio domain. And to be honest, for quite some time I didn't have a clue.

[image: MM]

This image shows a typical problem digital photography still suffers from: The highlights are completely washed out, and the lowlights turn abruptly into black without containing further nuances – the dynamic range performance is rather poor. This is actually not what the human eye would perceive, since it features both a higher dynamic range per se and a better adaptation to different (and maybe difficult) lighting conditions.

On top of that, we have to expect severe dynamic range limitations in the output media, whether that's a cheap digital print, a crappy TFT display or the limited JPG file format, just as an example. Analog film and prints do have such problems in principle as well, but not to that extent, since they typically offer more dynamic resolution and their saturation behavior is rather soft, unlike digital hard clipping. And this is where HDR image processing chimes in.

It typically distinguishes between single- and multi-image processing. In multi-image processing, a series of Low Dynamic Range (LDR) images is taken at different exposures and combined into one single new image which contains an extended dynamic range (thanks to some clever processing). Afterwards, this version is rendered back into an LDR image by utilizing special “tone mapping” operators, which perform a sort of dynamic range compression to obtain a better dynamic range impression – but now within an LDR file.
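As a minimal sketch of those two steps (the mid-gray weighting and the global Reinhard operator L/(1+L) are textbook simplifications; real pipelines add camera response recovery, alignment and local operators on top):

import numpy as np

def merge_exposures(images, exposure_times):
    """Merge linearized LDR frames (values 0..1) into an HDR radiance estimate."""
    acc, wacc = 0.0, 0.0
    for img, t in zip(images, exposure_times):
        w = 1.0 - 2.0 * np.abs(img - 0.5)   # trust well-exposed pixels the most
        acc += w * img / t                   # per-frame radiance estimate
        wacc += w
    return acc / np.maximum(wacc, 1e-6)      # weighted average = HDR image

def tone_map(hdr):
    """Global Reinhard operator: compresses unbounded radiance back into 0..1."""
    return hdr / (1.0 + hdr)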

In single-image processing, one single HDR image must already be available, and then just tone mapping is applied. As an example, the picture below takes advantage of single-image processing from a RAW file, which typically has a much higher bit depth (12 or even 14 bits with today's sensor tech) as opposed to JPG (8 bits). As a result, a lot of dynamic information can be preserved even if the output file is still just a JPG. As an added bonus, such a processed image also translates way better across a wide variety of different output devices, displays and viewing light conditions.
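For a rough feel of what those bit depths mean, the back-of-the-envelope figure is about 6 dB of dynamic range per bit for linear quantization (real sensor dynamic range differs):

from math import log10

for bits in (8, 12, 14):
    print(f"{bits:2d} bit: {20 * log10(2 ** bits):5.1f} dB")
# ->  8 bit:  48.2 dB / 12 bit:  72.2 dB / 14 bit:  84.3 dB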

[image: MM-HDR]

interview series (8) – Sascha Eversmeier

Sascha, are you a musician yourself, or do you have some other sort of musical background? And how did you get started developing your very own audio DSP effects?

I started learning to play bass guitar in early 1988, when I was 16. Bass is still my main instrument, although I also play a tiny bit of 6-string, but I’d say I suck at that.

The people I played with in a band in my youth were mostly close friends I grew up with, and most of us kept on making music together when we finished school a couple of years later. I still consider that period (the mid-nineties) as sort of my personal heyday, music-wise. It's when you think you're doing brilliant things but the world doesn't take notice. Anyway. Although we all started out doing Metal, we eventually did Alternative and a bit of Brit-influenced Wave Rock back then.

That was also the time when more and more affordable electronic gear came up, so apart from the usual rock-band lineup, we also experimented with samplers, DATs, click tracks and PCs as recording devices. While that in fact made the 'band' context more complex – imagine loading a dozen disks into the E-MU at every start of the rehearsal until we equipped it with an MO drive – we soon found ourselves moving away from writing songs through jamming and toward actually “assembling” them by using a mouse pointer. In hindsight, that was really challenging. Today, the DAW world and the whole process of creating music is so much simpler and more intuitive, I think.

My first “DAW” was a PC running at 233 MHz, and we used PowerTracks Pro and Micro Logic – a stripped-down version of Logic – although the latter never clicked with me. In 1996 or 97 – can't remember – I purchased Cubase and must have ordered right within a grace period, as I soon got a letter from Steinberg saying they had now finished the long-awaited VST version and I could have it for free if I wanted. WTF? I had no idea what they were talking about. But Virtual Studio Technology, that sounded like I was given the opportunity to upgrade myself to being “professional”. How flattering, you clever marketing guys. Yes, gimme the damn thing, hehe.

When VST arrived, I was blown away. I had a TSR-8 reel machine, a DA-88 and a large Allen & Heath desk within reach and used to run the computer mainly as a MIDI sequencer. And now I could do it all inside that thing. Unbelievable. Well, the biggest challenge then was finding an affordable audio card, and I bought myself one that only had S/PDIF inputs and outputs and was developed by a German electronics magazine and sold in small numbers through a big retail store in Cologne, exclusively. 500 Deutschmarks for 16 bits on an ISA card. Wow.

The first plugin I bought was Waves Audio Track, sort of a channel strip, which was a cross-promotion offer from Steinberg back then, 1997, I guess. I can still recall its serial number by heart.

Soon, the plugin scene took off, and I collected everything I could, like the early mda stuff, NorthPole and other classics. As our regular band came to nothing, we gathered our stuff and ran sort of a small project studio where we recorded other bands and musicians and started using the PC as the main recording device. I upgraded the audio hardware to an Echo Darla card, but one of my mates soon brought in a Layla rack unit so that we had plenty of physical ins and outs.

You really couldn't foresee where the audio industry would go, at least I couldn't. I got on fine with this “hybrid” setup for quite a long time and did lots of recording and editing back then, but wasn't even thinking of programming audio software myself at all. I had done a few semesters of EE studies, but without really committing myself much.

Then the internet came along. In 1998, I made a cut and started taking classes in Informatics. Finished in 2000, I moved far away from West Germany to Berlin and had my first “real” job in one of those “new economy” companies, doing web-based programming and SQL. That filled the fridge and was somehow fun to do, but wasn't really challenging. As my classes had included C, C++ and also Assembler, and I still had a copy of Microsoft's Visual Studio, I signed up to the VST SDK one day. At first, I might have done pretty much the same thing as everybody: compile the “gain” and “delay” plugin examples and learn how it all fits together. VST was still at version 1 at that time, so there were no instruments yet, but I wasn't much interested in those anyway, or at least I could imagine writing myself a synthesizer. What I was more interested in was how to manipulate audio so that it could sound like a compressor or a tube device. I was really keen on dynamics processing at that time, perhaps because I always had too few of those units. I had plenty available when I was working part-time as a live-sound engineer, but back in my home studio, a cheap Alesis, dbx or Behringer was all I could afford. So why not try to program one? I basically knew how to read schematics, I knew how to solder, and I thought I knew how things should sound, so I just started hacking things together. Probably in the most ignorant and naive way, from today's perspective. I had no real clue and no serious tool set, apart from an old student's copy of Maple and my beloved Corel 7. But there were helpful people on the internet and a growing community of people devoted to audio software, and that was perhaps the most important factor. You just weren't alone. [Read more…]

interview series (7) – Dave Gamble

Dave, can you tell us a little about how you got into music, and your professional career as an audio effects developer so far?

Started writing trackers as a child, then wrote some code to allow me to DJ with trackers. By 14 I was writing commercial software. Had some great teachers and lecturers who helped me a lot. Did my final-year project with Focusrite. Won the project prize. Spent 4.5 years at Focusrite (I was employee 12 or 13) to add DSP to the company, during which time we acquired Novation, and grew quite a lot. We made a lot of money from audio interfaces, so that kinda took over, and I wanted to get back to the DSP (at Focusrite I did Forte suite, helped with Liquid Channel/Mix, Saffire suite, plus other non DSP projects). Left for Sonalksis, built all their shipping products (except CQ1 and DQ1), although I’d built tbk1 years before and they’d been selling it. Was fun but chaotic. Left to go freelance so I could start my own outfit, during which time I worked with Neyrinck, TAC System, Focusrite, Novation, Studio Devil, FXpansion, Brainworx/Plugin Alliance, etc. Then started dmgaudio. And here we are now. [Read more…]