A recent trend in the audio producer scene is to judge an audio effect plug-in solely by analyzing its harmonic spectrum, which is usually done by feeding a static sine wave into the plug-in and then looking at the output with an FFT spectrum analyzer. In this article I’m going to talk about what this method is capable of, where its limitations and problems lie, and how aliasing quite often gets confused with other phenomena. I’m also going to show that this method alone is not sufficient to judge an audio plug-in’s quality in a black-box situation.
So, what do we actually see in such a spectrum plot?
All kinds of noise in general
The defining quality of noise is its random distribution over time and frequency. It usually does not correlate with other artifacts, so it appears in the spectrum largely uncorrelated with the input signal.
Quantization noise
This is a special case of noise. This kind of noise relates to and depends on the DSP arithmetic which is used (floating point versus integer) and the corresponding bit depth. It is also algorithm dependent, and in general most feedback structures are sensitive to quantization errors. A specific example is the family of different Biquad implementations and how they are affected by the bit resolution in use: some show more noise in the low-frequency spectrum than others (which busts another myth, that all Biquads are the same, but I’m not going further into this one here – if you are interested, just read this short but excellent overview).
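As a rough illustration of how quantization noise depends on bit depth, here is a minimal pure-Python sketch (the 997 Hz test tone, the sample rate and the bit depths are arbitrary choices for this demo, not anything measured from a real plug-in). It rounds a full-scale sine to a fixed-point grid and compares the RMS error against the textbook estimate q/√12, where q is the quantization step:

```python
import math

def quantize(x, bits):
    """Round a [-1, 1] sample onto a signed fixed-point grid with `bits` bits."""
    scale = 2 ** (bits - 1)
    return round(x * scale) / scale

def rms_error(bits, n=48000):
    """RMS quantization error of one second of a 997 Hz sine at 48 kHz."""
    err2 = 0.0
    for i in range(n):
        x = math.sin(2 * math.pi * 997 * i / n)
        err2 += (quantize(x, bits) - x) ** 2
    return math.sqrt(err2 / n)

e16 = rms_error(16)
e24 = rms_error(24)

# Theory predicts an RMS error near q / sqrt(12), with q = 2**-(bits - 1):
print(e16, 2 ** -15 / math.sqrt(12))  # 16-bit: measured vs. predicted
print(e24, 2 ** -23 / math.sqrt(12))  # 24-bit: roughly 256x smaller
```

Each extra bit buys about 6 dB less quantization noise, which is why the 24-bit error comes out around 256 times smaller than the 16-bit one.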
Harmonic distortion
The defining quality of this kind of distortion is that it is always harmonically related to the original input signal. So the distortion might appear as the third or the fifth harmonic of the original (or both), just as an example. Harmonic distortion always occurs when a system shows nonlinear behaviour, and this is of course not limited to digital systems.
Aliasing
Under certain conditions, aliasing might appear. The aliasing effect is a purely digital effect and is always related to a frequency bandwidth limit. If a digital algorithm produces frequencies which are actually higher than the Nyquist frequency of the system, aliasing artifacts are generated which fold back into the supported bandwidth and so become visible in the spectrum. The usual method to avoid this is to increase the sample rate. Sometimes aliasing is hard to spot in the visible spectrum, namely when it is masked by harmonic distortion.
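The fold-back can be demonstrated in a few lines (the 1 kHz sample rate and 900 Hz tone are toy values chosen so the numbers are easy to follow). A tone above Nyquist produces exactly the same sample values as a folded tone below Nyquist, so after sampling the two are indistinguishable:

```python
import math

fs = 1000.0          # toy sample rate: Nyquist = 500 Hz
f_in = 900.0         # tone above Nyquist
f_alias = fs - f_in  # expected fold-back frequency: 100 Hz

# Sampling a 900 Hz sine at 1 kHz yields the same sample values as a
# 100 Hz sine with inverted sign (a -100 Hz tone), i.e. the energy
# reappears at 100 Hz in the analyzer.
samples_hi = [math.sin(2 * math.pi * f_in * n / fs) for n in range(32)]
samples_lo = [math.sin(2 * math.pi * f_alias * n / fs) for n in range(32)]

max_diff = max(abs(a + b) for a, b in zip(samples_hi, samples_lo))
print(max_diff)  # near zero: the sampled sequences cancel exactly
```

This is also why aliased components look so unmusical in a spectrum plot: unlike harmonics, the folded frequencies are in general not harmonically related to the input.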
Intermodulation distortion
This often gets confused with aliasing but is a completely different thing, and it does not only appear in the digital domain: a typical woofer in a loudspeaker system, for example, produces intermodulation distortion. This already implies that oversampling the system can’t be the cure in this case. IM distortion is always created when one part of the spectrum modulates another. A typical example is a dynamics processor, where the lower frequency content impacts the overall frequency response. Frequency band splitting is a common technique to reduce such artifacts. In a digital system, IM distortion can imply aliasing as well, but only if the newly created frequencies exceed the Nyquist frequency.
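A minimal sketch of the difference (again with arbitrarily chosen tone bins and a made-up mild nonlinearity): two tones through an even-order nonlinearity create sum and difference frequencies, which are harmonics of neither input tone – that is what distinguishes IM products from harmonic distortion:

```python
import math

N = 2048
k1, k2 = 40, 9   # two input tones, in DFT bins per window

def bin_mag(y, k):
    """Magnitude of DFT bin k by direct correlation."""
    re = sum(y[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))
    im = sum(y[n] * math.sin(2 * math.pi * k * n / N) for n in range(N))
    return math.hypot(re, im) / N

x = [math.sin(2 * math.pi * k1 * n / N) + math.sin(2 * math.pi * k2 * n / N)
     for n in range(N)]
y = [s + 0.2 * s * s for s in x]   # mild even-order nonlinearity

# Second-order IM products land at k1 +/- k2 (bins 49 and 31) -- neither
# is an integer multiple of bin 40 or bin 9.
im_sum = bin_mag(y, k1 + k2)
im_diff = bin_mag(y, k1 - k2)
print(im_sum, im_diff)
```

With a single-sine test signal these sum/difference products simply cannot appear, which is exactly why a static sine-sweep measurement tends to understate how a nonlinear processor behaves on real program material.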
In fact, there is quite a lot going on in a typical spectrum plot, and one should expect to see a mixture of the afore-mentioned phenomena rather than just a single one. In particular, you’ll never ever see aliasing alone in a spectrum plot.
What you see is what you get?
Unfortunately: No. There are indeed some problems and limitations with such a simple measurement method. Some arise from psychoacoustics, while others lie in the purely static approach of the measurement method itself.
The human hearing system in general
“Historically distortion has been measured using specific signals sent through a system and quantified by the degree to which the signal is modified by the system. The human hearing system has not been taken into account in these metrics.” And so Geddes and Lee, already in 2003, proposed a new approach to THD measurement which takes hearing into account, since hearing itself is nonlinear and loudness perception in particular is frequency dependent (Fletcher-Munson).
Special masking effects
As with any ordinary frequency component in a broadband audio signal, a harmonic can be masked as well. As an example, a simple phase shift of a harmonic can lower the perceived level of its distortion effect by up to 6dB, while the spectrum analysis shows the very same level, visually.
The lack of time/memory
The lack of memory in the discussed measurement method makes its inability to reflect what is actually perceived by human hearing even worse. E.g. if we have a DSP system which applies harmonic distortion based on audio transient information, the spectrum measurement is completely lost. For example, if harmonic distortion is applied only during the transient itself, the plot might look much worse than it actually sounds to human hearing. The other way around, if the distortion is not applied instantaneously, the analyzer does not see it at all, but we can of course hear it.
Looking at a spectrum plot gives just a glimpse, in the very same way that a simple map is just a snapshot and not the real deal. Or: does the nutrition facts list on your meal tell you how the meal is going to taste? (Not at all, and even worse: that list can be faked with artificial nutrition supplements which no human body can actually absorb – which some manufacturers actually do.) I see spectrum analysis rather as a developer tool, where the developer has enough insight into the circuit and knows what to look for in the spectrum (e.g. for debugging). Due to its limitations, such a method is not sufficient to be used in a black-box situation as the one and only tool for judging audio quality. Actual hearing is not properly reflected, and therefore the results might be completely misleading.
L.W. Lee and E.R. Geddes, “Auditory Perception of Nonlinear Distortion,” presented at the 115th Convention of the Audio Engineering Society, October 2003.