how I listen to audio today

Developing audio effect plugins involves quite a lot of testing. This seems easy as long as it’s all about measurable criteria, but beyond that it gets much trickier. At that point there is no way around (extensive) listening tests, and these must be structured and follow a systematic approach to avoid ending up in fluffy “wine tasting” categories.

I’ve spent quite some time with such listening tests over the years, and some of the insights and principles are distilled in this brief article. They are not only useful for checking mix quality or judging device capabilities in general but also give some essential hints about developing our hearing.

No matter what specific audio assessment task one is up to, it’s always about judging the dynamic response of the audio (dynamics) versus its distribution across the frequency spectrum (tonality). Both dimensions are best tested with transient-rich program material like mixes containing several acoustic instruments – e.g. guitars, percussion and so on – that also contain sustaining elements and room information.

Drums are also a good starting point, but they do not offer enough variety to cover both aspects we are talking about or, just as an example, to spot modulation artifacts (IMD) easily. A rough but decent mix should do the job. Personally, I prefer raw mixes that are not yet heavily processed, to minimize the influence of flaws already burned into the audio content, but more on that later.

Having such content in place allows us to focus our hearing and to listen along a) the instrument transients – instrument by instrument – and b) the changes and impact within particular frequency ranges. Let’s have a look at both aspects in more detail.

a) Transient information is crucial for our hearing because it is used not only to identify instruments but also for stereo localization. It largely determines how well we can separate different sources and how they are positioned in the stereo field. So if something “lacks definition”, it might simply be caused by not having enough transient information available, and not necessarily by flaws in equalizing. Transients tend to mask other audio events for a very short period of time, and when a transient decays and the signal sustains, it reveals its pitch information to our hearing.

b) For the sustaining signal phases it is more relevant to focus on frequency ranges, since our hearing is organized in bands across the entire spectrum and is not able to distinguish different events within the very same band. For most comparison tasks it’s already sufficient to consciously distinguish between the low, low-mid, high-mid and high frequency ranges, and to drill down further only if necessary, e.g. to identify specific resonances. Assigning specific attributes to the corresponding ranges is the key to improving our conscious hearing abilities. As an example, one might at first attribute something “boxy sounding” to the mid frequency range alone. But focusing on the very low frequency range might also expose effects contributing to the overall impression of “boxiness”. This reveals further, previously unseen strategies for properly managing such effects.

Overall, I cannot recommend highly enough training the hearing in both dimensions to enable a more detailed listening experience and to become more confident in assessing certain audio qualities. Most kinds of compression, distortion and saturation effects present a good learning challenge since they can impact both audio dimensions very deeply. On the other hand, using already mixed material to assess the qualities of, e.g., a new audio device turns out to be a very delicate matter.

Let’s say an additional HF boost now sounds unpleasant and harsh: is this a flaw of the added effect, or was it already there and just pulled out of the mix? In all the listening tests I’ve done so far, a lot of tainted mixes revealed such flaws that were not apparent at first sight. In the given example you might find root causes like too much mid frequency distortion (coming from compression IMD or saturation artifacts) mirrored in the HF, or just inferior de-essing attempts. The recent trend of grinding down each and every frequency resonance is also prone to unwanted side effects, but that’s another story.

Further psychoacoustic effects need to be taken into account when we perform A/B testing. While comparing content at equal loudness is a well-known requirement (nonetheless ignored by lots of reviewers out there), it is also crucial to switch back and forth between sources instantaneously and not with a pause. This is because our hearing system cannot memorize a full audio profile for much longer than a second. Then there is “confirmation bias”: we always tend to be biased about the test result. Just knowing which button is pressed or knowing the brand name already has to be seen as an influence in this regard. The only solution to this is blind testing.
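As a rough illustration of level matching before an A/B switch, the sketch below computes the gain needed to bring a candidate signal to the same RMS level as a reference. Plain RMS is only a crude stand-in for perceived loudness (a loudness meter following ITU-R BS.1770 would be closer), and the helper names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_gain_db(reference, candidate):
    """Gain in dB to apply to `candidate` so that its RMS equals that of `reference`."""
    return 20.0 * math.log10(rms(reference) / rms(candidate))

def apply_gain_db(samples, gain_db):
    g = 10.0 ** (gain_db / 20.0)
    return [s * g for s in samples]

# Two versions of the same signal, the second one 3 dB louder.
ref = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(48000)]
louder = apply_gain_db(ref, 3.0)
gain = match_gain_db(ref, louder)        # about -3 dB
matched = apply_gain_db(louder, gain)    # now at the same RMS level as `ref`
```

Without such a correction, the louder version tends to win the comparison simply because it is louder.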

Most of the time I listen through nearfield speakers and rarely on cans. I’ve been sticking with my speakers for more than 15 years now, and it was very important for me to get used to them over time. Before that I had “upgraded” speakers several times unnecessarily. That said, a coaxial speaker design is key for nearfield listening environments. After ditching digital room correction here in my studio, the signal path is now fully analog right after the converter. The converter itself is high-end, but today I think proper room acoustics right from the start would have been a better investment.

interview series (12) – Daniel Weiss

First of all, congrats on your Technical Grammy Award this year! Daniel, you started DSP development in the early days of digital audio. What were the challenges at that time?

Thank you very much, Herbert.

Yes, I started doing digital audio back in 1979 when I joined Studer-Revox. In that year Studer started their digital audio lab with a group of newly employed engineers. At that time there were no DSPs or CPUs with enough power to do audio signal processing. We used multiplier and adder chips from the 74 chip series and/or those large multiplier chips used in military applications. We applied the “distributed arithmetic” technique: very efficient, but very inflexible compared to today’s processors.

The main challenges regarding audio applications were:

  • A/D and D/A converters had to be designed with audio in mind.
  • Digital audio storage had to rely on video tape recorders with their problems.
  • Signal processing was hardware coded, i.e. very inflexible.
  • DAWs as we know them today were not feasible due to the lack of speedy processors and the lack of large hard disks. (The size of the first hard disks started at about 10 MByte…)
  • Lack of any standards. Sampling frequencies, word lengths and interfaces were not standardized back then.

Later the TMS32010 DSP from TI became available – a very compromised DSP, hardly usable for pro audio.

And a bit later I was able to use the DSP32 from AT&T, a floating point DSP which changed a lot for digital audio processing.

What makes such a converter design special with regard to audio? And was DSP math as we know it today already in place, or was that also something still emerging at that time?

The A/D and D/A converters back then had the problem that they either were not fast enough for audio sampling frequencies (like 44.1 kHz) and/or their resolution was not high enough, i.e. below 14 bits.

There were some A/D and D/A modules available which were able to do digital audio conversion, but those were very expensive. One of the first (I think) audio-specific D/A converters was the Philips TDA1540, which is a 14 bit converter but has a linearity better than 14 bits. So we were able to enhance the TDA1540 by adding an 8 bit converter chip to generate two more bits, for a total of about 16 bits of conversion quality.

The DSP math was the same as it is today – mathematics is still the same, right? Digital signal processing is applied mathematics using the binary number system. The implementation of adders and multipliers differed to some extent from today’s approaches, though. The “distributed arithmetic” I mentioned, for instance, worked with storage registers, shift registers, a lookup table in ROM and an adder/storage register to implement a complete FIR filter. The multiplication was done via the ROM content: the audio data served as the ROM addresses, and the output of the ROM was the result of the multiplication.

An explanation is given here: http://www.ee.iitm.ac.in/vlsi/_media/iep2010/da.pdf
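To make the ROM-based multiplication a bit more tangible, here is a minimal software emulation of distributed arithmetic for a short FIR filter. It assumes B-bit two’s complement integer samples and integer coefficients; per bit position, one bit from every delay-line tap forms the ROM address, and the looked-up partial sums are shifted and accumulated, with the sign bit carrying negative weight. The function names are hypothetical, and this is a sketch of the general technique, not the Studer implementation.

```python
def build_lut(coeffs):
    """ROM contents: for each address, the sum of coefficients whose address bit is set."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]

def da_inner_product(lut, taps, bits):
    """Bit-serial sum(c_k * x_k) for `bits`-bit two's complement inputs."""
    acc = 0
    for j in range(bits):
        # Form the ROM address from bit j of every tap (one address bit per tap).
        addr = 0
        for k, x in enumerate(taps):
            addr |= ((x >> j) & 1) << k
        partial = lut[addr] * (1 << j)  # a plain shift in hardware
        # The MSB of a two's complement word has negative weight.
        acc = acc - partial if j == bits - 1 else acc + partial
    return acc

def da_fir(coeffs, signal, bits=8):
    lut = build_lut(coeffs)
    history = [0] * len(coeffs)  # delay line: history[k] holds x[n-k]
    out = []
    for x in signal:
        history = [x] + history[:-1]
        out.append(da_inner_product(lut, history, bits))
    return out

def direct_fir(coeffs, signal):
    """Reference FIR with ordinary multiplications, for comparison."""
    history = [0] * len(coeffs)
    out = []
    for x in signal:
        history = [x] + history[:-1]
        out.append(sum(c * h for c, h in zip(coeffs, history)))
    return out
```

No multiplier is needed, but the ROM grows as 2^N with the number of taps N, which is one reason why such designs were efficient yet very inflexible.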

Other DSP variants used standard multiplier and adder chips, which were cascaded for longer word lengths. But the speed of those chips was rather limited compared to today’s processors.

Was there still a need to work around such word-length and sample rate issues when you designed and manufactured the very first digital audio equipment under your own brand? As far as I remember, the DS1 compressor already introduced 96kHz internal processing right from the start. What were the main reasons for 96kHz processing?

When I started at Studer, the sampling frequencies were all over the place. No standards yet. So we built a universal Sampling Frequency Converter (Studer SFC16), which also had custom-built interfaces since those hadn’t been standardized either. No AES/EBU, for instance.

Later, when I started Weiss Engineering, the 44.1 and 48 kHz standards had already been established. We then also added 88.2 / 96kHz capabilities to the modular bw102 system, which was what we had before the EQ1 and DS1 units. It somehow became fashionable to use high sampling frequencies. There are some advantages to that, such as a higher tolerance for non-linear processing or less severe analog filtering in converters.

The mentioned devices have been critically acclaimed over the years, and not only by mastering engineers. What makes them so special? Is it the transparency or some other distinct design principle? And how do you achieve that?

There seems to be a special sound to our devices. I don’t know exactly what the reason for that is. Generally we try to make the units technically as good as possible, i.e. low noise, low distortion, etc.
It seems that this approach helps when it comes to sound quality…
And maybe our algorithms are a bit special. People sometimes think that digital audio is a no-brainer – there is that cookbook algorithm, you implement it and that is it. But in fact digital offers as many variants as analog does. Digital is just a different representation of the signal.

Since distortion is such a delicate matter in the design of a dynamics processor: can you share some insights about managing distortion in such a (digital) device?

The dynamics processor is a level controller where the level is set by a control signal derived from the audio signal. So it is an amplitude modulator, which means that sidebands are generated. The frequency and amplitude of the sidebands depend on the control signal and the audio signal. Thus, in the worst case, a sideband frequency can lie above half the sampling frequency (the Nyquist frequency) and gets mirrored at the Nyquist frequency. This is a bad form of distortion as it is not harmonically related to the audio signal.
This problem can be solved to some extent by raising the sampling frequency (e.g. doubling it) before the dynamics processing is applied, so that the Nyquist frequency is also doubled.
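The folding described above can be checked with a few lines of arithmetic. The sketch below simplifies the control signal to a single sinusoidal gain modulation (a real control signal has a whole spectrum), computes the first-order sidebands and shows where they land after sampling at 44.1 kHz versus 88.2 kHz. The 18 kHz tone and the 8 kHz modulation rate are hypothetical values chosen to make the effect obvious.

```python
def fold(freq_hz, fs_hz):
    """Map a frequency into 0..fs/2 the way sampling does (aliasing)."""
    f = freq_hz % fs_hz
    return fs_hz - f if f > fs_hz / 2 else f

def am_sidebands(audio_hz, control_hz, fs_hz):
    """First-order sidebands of a sinusoidal gain modulation.
    Returns (true frequency, frequency after sampling, aliased?) per sideband."""
    result = []
    for sb in (audio_hz - control_hz, audio_hz + control_hz):
        sb = abs(sb)
        result.append((sb, fold(sb, fs_hz), sb > fs_hz / 2))
    return result

# At 44.1 kHz the upper sideband (26 kHz) folds down to 18.1 kHz,
# an inharmonic component right in the audible band.
at_44k = am_sidebands(18000, 8000, 44100)
# At 88.2 kHz it stays at 26 kHz and can be filtered out before downsampling.
at_88k = am_sidebands(18000, 8000, 88200)
```

Raising the sampling frequency only helps if the signal is low-pass filtered properly when going back to the original rate, so the out-of-band sideband energy is removed rather than folded.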

Another problem in dynamics processors is peak detection. With high frequency content, the actual peak can lie between two consecutive samples and thus go undetected, because the processor only sees the actual samples. This problem can be solved to some extent by upsampling the sidechain (where the peak detection takes place) to e.g. 2 or 4 times the audio sampling frequency. This then allows a kind of “true peak” measurement.
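The inter-sample peak problem is easy to reproduce. In the sketch below, a full-scale tone at fs/4 is sampled with a 45° phase offset, so every sample lands at about 0.707 while the underlying waveform reaches 1.0. A 4x band-limited interpolation recovers the real peak. For brevity it interpolates by zero-padding the block’s spectrum, which assumes a periodic block; a real-time sidechain would use a polyphase FIR oversampler instead, similar in spirit to the true-peak meters described in ITU-R BS.1770. The function name is hypothetical.

```python
import numpy as np

def upsample_fft(x, factor):
    """Band-limited interpolation of a (periodic) block by zero-padding its spectrum."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum
    if n % 2 == 0:
        padded[n // 2] *= 0.5  # split the original Nyquist bin between +/- frequencies
    return np.fft.irfft(padded, n * factor) * factor

n = np.arange(64)
x = np.sin(2 * np.pi * n / 4 + np.pi / 4)  # fs/4 tone, samples straddle every peak

sample_peak = np.max(np.abs(x))                  # about 0.707 (-3 dBFS)
true_peak = np.max(np.abs(upsample_fft(x, 4)))   # about 1.0 (0 dBFS)
overshoot_db = 20 * np.log10(true_peak / sample_peak)
```

A sample-based peak detector would see this signal as 3 dB quieter than it really is and react too late or not at all.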

Your recent move from DSP hardware into the software plugin domain shouldn’t have been that big of a deal. Or was it?

Porting a digital unit to a plugin version is somewhat simpler than emulating an analog unit.
Still, porting our EQ1 and DS1 units was fairly demanding. The software of five DSPs and a host processor had to be ported to the computer platform. The Softube company did that for us.

Of course we tried to achieve a 1:1 port, such that the hardware and the plugin would null perfectly. This is almost the case. There are differences in the floating point format between the DSPs and the computer, so it is not possible to get exactly the same result – unless one used fixed point arithmetic, which we do not like to use for the applications at hand.
In addition, the plugin versions have more features because the processing power of a computer CPU is much higher than that of the five (old) DSPs the hardware uses. E.g. the sampling frequency can go up to 192kHz (hardware: 96kHz) and the dynamics EQ can be dynamic in all seven bands (hardware: 4 bands maximum).
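Why two floating point implementations of the same algorithm almost, but not perfectly, null can be shown with a small experiment. The sketch runs an identical one-pole low-pass once in 32-bit and once in 64-bit IEEE floating point and measures the residual. Which formats the EQ1/DS1 DSPs actually use is not stated in the interview; float32 versus float64 is just a convenient stand-in for “two different floating point formats”, and the filter and test signal are hypothetical.

```python
import numpy as np

def one_pole_lowpass(x, a, dtype):
    """y[n] = a*y[n-1] + (1-a)*x[n], evaluated entirely in the given float type."""
    a = dtype(a)
    b = dtype(1.0) - a
    state = dtype(0.0)
    y = np.empty(len(x), dtype=dtype)
    for i, sample in enumerate(x.astype(dtype)):
        state = a * state + b * sample
        y[i] = state
    return y

fs = 48000
t = np.arange(4800) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t) + 0.25 * np.sin(2 * np.pi * 7300 * t)

y64 = one_pole_lowpass(x, 0.9, np.float64)
y32 = one_pole_lowpass(x, 0.9, np.float32)

# Null test: subtract the two renderings and measure the residual relative to the output peak.
residual = y64 - y32.astype(np.float64)
null_depth_db = 20 * np.log10(np.max(np.abs(residual)) / np.max(np.abs(y64)))
```

The residual is far below anything audible but never exactly zero, because rounding happens at different bit positions in each format. Only identical fixed point arithmetic would guarantee a bit-exact null.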

Looking into the future of dynamic processing: Do you see anything new on the horizon or just the continuation of recent trends?

We at Weiss Engineering haven’t looked into the dynamics processing world recently. Probably one could take more intelligent approaches than current dynamics processors do, e.g. look at a whole track and decide, based on that overview, what to do with the levels over time. Machine learning could also help – I guess some people are working in that direction regarding dynamics processing.

From your point of view: will the loudness race ever come to an end, and can we expect more fidelity to return to consumer audio formats?

The streaming platforms help bring the loudness race down to a more bearable level. Playlists across a whole streaming platform should contain tracks with a similar loudness level for similar genres. If one track sticks out, it does not help. Luckily, some platforms take measures in that direction.
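The effect of loudness normalization on a hot master is plain arithmetic. The sketch assumes a player that normalizes to a -14 LUFS integrated loudness target (a value commonly cited for some platforms, used here only as an assumption) and, optionally, only ever turns tracks down. The function name and the example loudness values are hypothetical.

```python
def normalization_gain_db(track_lufs, target_lufs=-14.0, allow_boost=False):
    """Playback gain a loudness-normalizing player might apply (assumed behaviour)."""
    gain = target_lufs - track_lufs
    if gain > 0 and not allow_boost:
        return 0.0  # assumption: quieter tracks are left alone rather than boosted
    return gain

loud_master = normalization_gain_db(-7.0)     # turned down by 7 dB
dynamic_master = normalization_gain_db(-14.0) # played back unchanged
```

Both masters end up at the same integrated loudness, but the loud one has given up its crest factor without gaining any loudness in return, which takes much of the incentive out of the loudness race.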

Daniel, do you use any analog audio equipment at all?

We may have a reputation in digital audio, but we do analog as well. A/D and D/A converters are mostly analog and our A1 preamp has an analog signal path. Plus more analog projects are in the pipeline…


interview series (11) – Andreas Eschenwecker

Andy, your Vertigo VSC compressor has already become a modern classic. What drove you to create such a device?

I really like VCA compressors. VCA technology gives you a lot of freedom in design and development, and the user gets a very flexible tool in the end. I was very unhappy with all the VCA compressors on the market around 2000. They were not very flexible for different applications; these units worked well in one particular setting only. Changing the threshold or other parameters was fiddly and so on. But the main reason for starting the VSC project was that the new IC-based VCA compressors sounded one-dimensional and boxy.

Does this mean your design goal was a more transparent sounding device, or does the VSC also add a certain sound, just in a different/better way?

Transparency without sounding clean and artificial. The discrete Vertigo VCAs deliver up to 0.6% THD. Distortion can deliver depth without sounding muddy.
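For reference, a THD figure like the 0.6% quoted above relates the combined level of all harmonics to the level of the fundamental. The sketch measures it from an FFT of a synthetic test signal, a sine plus a 2nd harmonic at 0.6% amplitude. It assumes the test tone sits exactly on an FFT bin (an integer number of cycles in the block), so no window is needed; the function name and signal are hypothetical and say nothing about the actual Vertigo harmonic spectrum.

```python
import numpy as np

def thd(x, fundamental_bin):
    """THD = sqrt(sum of squared harmonic magnitudes) / fundamental magnitude.
    Assumes a bin-aligned test tone (integer number of cycles in the block)."""
    mag = np.abs(np.fft.rfft(x))
    nyquist_bin = len(mag) - 1
    harmonics = [mag[k * fundamental_bin]
                 for k in range(2, nyquist_bin // fundamental_bin + 1)]
    return np.sqrt(np.sum(np.square(harmonics))) / mag[fundamental_bin]

N = 4096
k0 = 32  # fundamental exactly on bin 32
n = np.arange(N)
x = np.sin(2 * np.pi * k0 * n / N) + 0.006 * np.sin(2 * np.pi * 2 * k0 * n / N)

measured = thd(x, k0)            # about 0.006
measured_percent = 100 * measured  # about 0.6 %
```

A single THD number says nothing about which harmonics are present or when they occur, which is exactly where the following answer gets more interesting.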

Does this design favour certain harmonics or – the other way around – suppress some unwanted distortions?

The VSC adds a different distortion spectrum depending on whether you increase the input level or add make-up gain. The most interesting fact is that most of the distortion and artifacts are created in the release phase of the compressor, not on signal peaks. It becomes obvious when the compressor returns from gain reduction to zero gain reduction, similar to a reverb swoosh after the peak that was leveled.

Where does your inspiration for such technical designs come from?

With my former company I repaired and did measurements on many common classic and sometimes ultra-rare compressors. Some sounded pretty good but were unreliable – some were very intuitive in a studio situation, some not…
At that time I slowly developed an idea of what kind of compressor I would like to use every day.

From your point of view: to what extent have compressor design principles changed over the years?

The designs changed a lot. Fewer discrete parts, fewer opto compressors (because a lot of essential parts are no longer produced), tube compressors suffer from poor modern tube manufacturing, and some designers nowadays go more for RMS detection and a feed-forward topology. With modern components there was no need for a feedback sidechain arrangement anymore. I think RMS is very common now because it seems easy to use at first glance. For most applications I prefer peak detection.
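The difference between the two detector types can be sketched with textbook one-pole envelope followers, as they might be used in a feed-forward design where the detector listens to the input signal. These are generic illustrations with hypothetical names and time constants, not the VSC detection circuit. On a steady full-scale sine the peak detector settles close to 1.0, while the RMS detector settles around 0.707, so the same threshold setting reacts quite differently on transient material.

```python
import math

def coeff(time_s, fs):
    """One-pole smoothing coefficient for a given time constant."""
    return math.exp(-1.0 / (time_s * fs))

def peak_detector(x, fs, attack_s=0.0001, release_s=0.1):
    """Rectify, then follow with separate attack and release smoothing."""
    att, rel = coeff(attack_s, fs), coeff(release_s, fs)
    env, out = 0.0, []
    for s in x:
        level = abs(s)
        a = att if level > env else rel
        env = a * env + (1.0 - a) * level
        out.append(env)
    return out

def rms_detector(x, fs, time_s=0.05):
    """Smooth the squared signal, then take the square root."""
    c = coeff(time_s, fs)
    ms, out = 0.0, []
    for s in x:
        ms = c * ms + (1.0 - c) * s * s
        out.append(math.sqrt(ms))
    return out

fs = 48000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]  # 1 s, 1 kHz, peak 1.0
peak_env = peak_detector(tone, fs)
rms_env = rms_detector(tone, fs)
```

The RMS detector averages over its time window and therefore largely ignores short peaks, which feels forgiving at first glance; the peak detector reacts to them directly.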

Having also a VSC software version available: Was it difficult to transfer all that analog experience into the digital domain? What was the challenge?

In my opinion the challenge is to sort out what to focus on. What influence do the input transformer or the output stage have? Some, of course. But most of the work went into emulating the detection circuit.

What advantages did you experience with the digital implementation, or do you consider analog to be superior in general?

I am more of an analog guy, so I still prefer the hardware. What I like about the digital emulations is that some functions are easy to implement in digital but would cost a fortune to produce in the analog unit.

Any plans for the future you might want to share?

At the moment I struggle with component delays. 2021/22 is not the right time for new analog developments. I guess some new digital products come first.


The TesslaSE Remake

There have been so many requests to revive the old and rusty TesslaSE, which I had already moved into the legacy folder. In this article I’m going to talk a little about the history of the plugin and its upcoming remake.

The original TesslaSE audio plugin was one of my first DSP designs aiming at a convincing analog signal path emulation, and it was created 15 years ago! Its release info stated that it would “model pleasant sounding ‘electric effects’ coming from transformer coupled tube circuits in a digital controlled fashion”, which basically refers to adding harmonic content and some subtle saturation as well as spatial effects to the incoming audio. In contrast to the static waveshaping approaches quite common at that time, those effects were already inherently frequency dependent and managed within an underlying mid/side matrix.

(Later on, this approach evolved into a true stateful saturation framework capable of modeling more than just memoryless circuits, and the TesslaPro version took advantage of audio transient management as well.)

This design was also used to suppress unwanted aliasing artifacts, since flawless oversampling was still computationally expensive at that time. Offering zero latency on top, TesslaSE always had a clear focus on being applied across the entire mixing stage, providing all those analog signal path subtleties here and there. All later revisions stuck to the very same concept.
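For readers unfamiliar with the mid/side matrix mentioned above, here is a minimal sum/difference encoder and decoder with a hook for processing Mid and Side separately. The 0.5 scaling and the example tanh curve on the Mid channel are assumptions for illustration; how TesslaSE scales its matrix or shapes its signals is not described here, and the function names are hypothetical.

```python
import math

def ms_encode(left, right):
    """Sum/difference matrix; the 0.5 scaling keeps a mono signal at unity level in Mid."""
    mid = [(l + r) * 0.5 for l, r in zip(left, right)]
    side = [(l - r) * 0.5 for l, r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Inverse matrix: L = M + S, R = M - S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def process_ms(left, right, mid_fx, side_fx):
    """Apply separate per-sample functions to Mid and Side, then return to L/R."""
    mid, side = ms_encode(left, right)
    return ms_decode([mid_fx(m) for m in mid], [side_fx(s) for s in side])

# Hypothetical example: gentle saturation on Mid only, Side passed through untouched.
out_l, out_r = process_ms([0.5, -0.25, 0.1], [0.2, -0.25, -0.3],
                          lambda m: math.tanh(1.2 * m), lambda s: s)
```

Treating Mid and Side differently lets a processor add spatial effects, since any change to the Side channel alters the stereo image while a mono signal (Side equal to zero) stays unaffected by it.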

The 2021 remake, TesslaSE mkII, doesn’t change that either but just polishes what’s already there. The internal gain staging has been reworked so that everything appears gain compensated to the outside and is dead easy to operate within a slick, modernized user interface. The transformer/tube circuit modeling also got some updates to appear more detailed and vibrant, while all non-linear algorithms are now oversampled for additional aliasing suppression.

Personally, I really enjoy the elegant sound of the update!

TesslaSE mkII will be released by the end of November for PC/VST under a freeware license.

ThrillseekerXTC mkII released

ThrillseekerXTC – bringing mojo back

ThrillseekerXTC mkII is a psychoacoustic audio exciter based on a parallel dynamic equalizer circuit. It takes our hearing sensitivity into account especially regarding the perception of audio transients, tonality and loudness.

The mkII version now introduces:
• Plugin operating level calibration for better gainstaging and output volume compensated processing.
• A reworked DRIVE/MOJO stage featuring full bandwidth signal saturation and a strong focus on perceived depth and dimension. It provides all those subtle qualities we typically associate with high-end analog outboard gear.
• Special attention has been paid to the mid frequency range by introducing signal compression, which improves mid-range coherence and presence.
• Relevant parts of the plugin are running at higher internal sampling frequencies to minimize aliasing artifacts.

Available for Windows VST in 32 and 64bit as freeware. Download your copy here.

Getting the most out of the SPL Tube Vitalizer

In this article I’m going to share some analysis insights and also propose an easy-to-follow 3-step approach for finding the sweet spot when processing any kind of material with this device.

Preparing for winter season: room heating with style

So, having a Tube Vitalizer here on my desk (at least for some time), I was surprised by the lack of usable online reviews and background information. One just finds the usual YT quality stuff, which might be entertaining in the best case but also spreads misinformation every so often. In fairness to those influencers, it must be said that the Vitalizer concept is really not that easy to grasp, and its quirky user experience doesn’t make it any easier. The manual itself is a mixed bag since it contains some useful hints and graphs on the one hand but lots of marketing blurb obscuring things on the other. Time to clean up the mess a little.

What it actually does

While it is easily slotted into the “audio exciter” bucket, a few more words are needed to describe what it actually does. Technically speaking, the Vitalizer is basically a parallel dynamic equalizer with an EQ curve behaviour that aims to mimic equal loudness contours as specified in ISO 226. Rather simplified, it can be seen as a high and low frequency shelving EQ for dialing in a basic “smile” EQ curve, but one which takes hearing-related (psychoacoustic) loudness effects into account. It also does this by generating curves differently depending on signal level, hence the term “dynamic EQ”. And wait, it also adds harmonic content galore.

Taming the beast

To obtain an equal loudness contour, the main equalizer’s center frequency must be set properly depending on the tonal balance of the actual source material. This center frequency can be dialed in anywhere between 1k and 20kHz with the Hi-Mid Freq knob, which defines a crossover point: frequencies below that point get attenuated, while higher frequencies get boosted. However, this attenuation is already a signal level dependent effect. In contrast, the LF EQ itself (which actually is not a shelving but a bell-type curve) has a fixed frequency tuned to 50Hz, and just the desired boost amount needs to be dialed in. The LF curve characteristic can be further altered (Bass soft/tight), which basically thickens or thins out the area below 100Hz. Finally, this EQ path can be compressed with the Bass Comp option.

A typical EQ curve created by the Vitalizer

On top of the main EQ path, the Tube Vitalizer offers an additional HF boost and compression option, both of which can be dialed in to complement the LF behaviour in a very similar fashion, but in the high frequency department. Internally, both are in a parallel configuration and mixed back into a dry signal path. The corresponding Process Level knob can be seen as a kind of dry/wet control, but only for the main EQ part. The upper HF part is mixed back in separately via the Intensity dial.

Gain-Staging is key

For the EQ section as a whole, the Drive knob is the ticket for proper gain staging. If compression can be dialed in properly for both compressors (as indicated by the blue flashing lights), the input gain is in the right ballpark. One might expect to hear actual compression going on, but it appears to be a rather gentle leveling effect.

Gain staging for the output stage has to be considered separately, which might become an issue if the tube stage is activated and operates in shunt limiting mode. Then you have to take care of proper input levels, since the attenuators for both output channels operate after the limiter and not before it.

Tube stage limiting: input (red) vs output (blue)

Which leads us directly to the additional harmonic content created by this device. First of all, this device always creates additional harmonic content, no matter what. One might expect it not to show any such content with the solid state output stage, but it actually does. The tube output stage just increases that content, in a signal level dependent way of course, and 2nd order harmonics are always part of it. A serious additional amount of harmonics gets added as soon as the HF filter is engaged by dialing in Intensity (with LC Filter mode activated!), but surprisingly, this always sounds very smooth and natural in the top end.

Delicious content

Also impressive is the low noise floor in both output stage modes, tube and solid state. The tube mode introduces pretty strong channel crosstalk, though.

Workflow – Finding the sweet spot in 3 easy steps

Initial condition:

  • Drive, Bass, Bass Comp and Intensity set to 0
  • Device is properly gain-staged

1. Set Process to 5 and find the best fit for Hi-Mid Freq for the given source material. For already mixed 2-bus material you can most likely narrow it down to 2-3kHz.

2. Dial in Bass (either left or right depending on source and taste) and some compression accordingly.

3. Only then dial in some further HF content via Intensity and some compression accordingly. Adjust HF Freq so that it fits the source and your taste.

Workflow – Tweaking just one knob

My good old buddy Bootsy told me this trick which works surprisingly well.

Initial condition:

  • Left most position: Bass
  • Right most position: Bass Comp, High Comp, High Freq
  • 12 o’clock position: Drive, Intensity
  • Hi-Mid-Freq set to 2.5kHz

Now, just dial in a little Process Level to taste.

He also recommends driving the input to some extent (VU hitting the red zone), using the Tube stage in limiter mode and always engaging LC Filter mode for HF.

What loudspeakers and audio transformers do have in common

Or: WTF is “group delay”?

Imagine a group of people on a guided tour through an exhibition. One might expect the group to reach the exhibition’s exit as a whole, but in reality part of the group might lag behind a little (e.g. just taking their time).

Speaking in terms of frequency response within audio systems, this sort of delay is referred to as “group delay”, measured in seconds. And if parts of the frequency range do not reach the listener’s ear at the very same time, the group delay is said to be no longer constant.

A flat frequency response does not tell anything about this phenomenon, so group delay must always be measured separately. Just for reference, delays above 1-4ms (depending on the actual frequency) can actually be perceived by human hearing.

This has always been a real issue in loudspeaker design in general, because certain audio events can no longer be perceived as a single event in time but are spread across a certain window of time. The root cause of this anomaly typically lies in electrical components like crossovers (frequency splitters), amplifiers or filter circuits in general, but also in physical loudspeaker construction patterns like bass reflex ports or transmission line designs.

The latter in particular change the group delay in the low frequency department very prominently. This can be seen as a design flaw, but on the other hand lots of hi-fi enthusiasts actually like this low end behaviour, which can deliver a very round and full bass experience even from a quite small speaker design. In such cases, one can measure more than 20ms of group delay in the frequency content below 100Hz, and I’ve seen plots of real designs showing 70ms at 40Hz, which is huge.

Such speaker designs should be avoided in mixing or mastering situations where precision and accuracy are required. It’s also one of the reasons why we can still find single driver speaker designs as primary or additional monitoring options in studios around the world. They have a constant group delay by design and do not shift some parts of the frequency range while leaving others intact.

As mentioned before, several analog circuit designs can also distort the constant group delay, and we can see very typical low end group delay shifts in audio transformer coupled circuit designs. Interestingly, even mastering engineers use such devices – whether in a compressor, EQ or tape machine – in their analog mastering chain.
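To see how a low end roll-off turns into frequency dependent delay, the sketch below evaluates the group delay τ(ω) = -dφ/dω of an analog 4th-order Butterworth high-pass with a 40 Hz corner. A vented (bass reflex) box is commonly modeled as a 4th-order high-pass system, so this serves as a rough stand-in; the Butterworth alignment and the 40 Hz corner are assumptions, not measurements of the designs mentioned above. The group delay is several milliseconds around the corner frequency but practically zero in the midrange, which is exactly the non-constant behaviour described in this article.

```python
import numpy as np

def butterworth4_highpass(f_hz, fc_hz):
    """Analog 4th-order Butterworth high-pass H(jw), a rough model of a vented box."""
    s = 1j * f_hz / fc_hz  # normalized complex frequency jw/wc
    den = s**4 + 2.6131259 * s**3 + 3.4142136 * s**2 + 2.6131259 * s + 1
    return s**4 / den

def group_delay(f_hz, h):
    """tau(w) = -d(phase)/d(omega), evaluated numerically on the frequency grid."""
    phase = np.unwrap(np.angle(h))
    return -np.gradient(phase, 2 * np.pi * f_hz)

f = np.geomspace(10, 20000, 4000)
gd = group_delay(f, butterworth4_highpass(f, 40.0))

gd_40_ms = np.interp(40.0, f, gd) * 1000    # several ms at the corner frequency
gd_2k_ms = np.interp(2000.0, f, gd) * 1000  # practically zero in the midrange
```

Steeper or more resonant low end alignments, as well as transformer coupling, push this low frequency delay further up while the frequency response may still look reasonably flat.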

out now: SlickEQ “Gentleman’s Edition”


Key specs and features

  • Modern user interface with outstanding usability and ergonomics
  • Carefully designed 64bit “delta” multi-rate structure
  • Three semi-parametric filter bands, each with two shape options
  • Five distinct EQ models: American, British, German, Soviet and Japanese
  • Low band offers an optional phase-lag able to delay low frequencies relative to higher frequencies
  • High pass filter with optional “Bump” mode
  • Low pass filter with two different slopes (6dB/Oct and 12dB/Oct)
  • Parametric Tilt filter with optional “V” mode.
  • Six output stages: Linear, Silky, Mellow, Deep, Excited and Toasted
  • Advanced saturation algorithms by VoS (“Stateful saturation”)
  • Highly effective loudness compensated auto gain control
  • Stereo, mono and sum/difference (mid/side) processing options
  • Frequency magnitude plot
  • Tool-bar with undo/redo, A/B, advanced preset management and more

SlickEQ is a collaborative project by Variety of Sound (Herbert Goldberg) and Tokyo Dawn Labs (Vladislav Goncharov and Fabien Schivre). For more details, please refer to the official product page: http://www.tokyodawn.net/tdr-vos-slickeq-ge/


interview series (9) – D.W. Fearn

Doug, when and how did you arrive in the music business?

I have had an interest in electronics ever since I was a kid growing up in the 1950s and 1960s. I built a crystal radio receiver when I was 8 and my first audio amplifier (tubes, of course) when I was 10. I passed the test for an amateur radio license when I was 12 and that experience of communicating using Morse code was excellent training for learning to hear. I built a lot of my own radio equipment, and experimented with my own designs.

The high school I attended had an FM broadcast station. Most of the sports and musical events were broadcast, and I learned about recording orchestras, marching bands, choirs, and plays. Friends asked me to record their bands, which was my first experience working with non-classical music.

Another major factor was that my father was a French horn player in the Philadelphia Orchestra. As a kid, I would attend concerts, rehearsals, and sometimes recording sessions and broadcasts. I learned a lot about acoustics by walking around the Academy of Music in Philadelphia during rehearsals.

It would seem logical that my musical exposure and my interest in electronics would combine to make the career in pro audio I have had for over 40 years now.

I was a studio owner for many years before starting the D.W. Fearn manufacturing business, which started in 1993.

a very comprehensive review on Thrillseeker VBL

Don’t miss reading the whole review here, with lots of hands-on examples.