processing with High Dynamic Range (1)

Back when I was at university, my very first DSP lectures were actually not about audio but image processing. Due to my interest in photography, I have followed this amazing and ever-evolving domain over time. Later on, High Dynamic Range (HDR) image processing emerged, and besides its high impact on digital photography, I immediately started to ask myself how such techniques could be translated into the audio domain. And to be honest, for quite some time I didn’t have a clue.

[Image: MM]

This image shows a typical problem digital photography still suffers from: the highlights are completely washed out, and the lowlights turn abruptly into black without containing further nuances – the dynamic range performance is rather poor. This is not what the human eye would perceive, since the eye features both a higher dynamic range per se and a better adaptation to different (and maybe difficult) lighting conditions.

On top of that, we have to expect severe dynamic range limitations in the output media, whether that’s a cheap digital print, a crappy TFT display or the limited JPG file format, just as examples. Analog film and prints have such problems in principle as well, but not to the same extent, since they typically offer more dynamic resolution and their saturation behavior is rather soft, unlike digital hard clipping. And this is where HDR image processing chimes in.

It typically distinguishes between single- and multi-image processing. Within multi-image processing, a series of Low Dynamic Range (LDR) images is taken at different exposures and combined into one single new image which contains an extended dynamic range (thanks to some clever processing). Afterwards, this version is rendered back into an LDR image by utilizing special “tone mapping” operators, which perform a sort of dynamic range compression to obtain a better dynamic range impression, but now in an LDR file.
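To make the multi-image step concrete, here is a minimal sketch of such a merge, assuming a linear camera response and pre-aligned, normalized frames (real pipelines also recover the sensor’s response curve; all names and weights here are illustrative):

```python
import numpy as np

def merge_exposures(images, exposure_times):
    """Merge a bracketed LDR series into one HDR radiance estimate."""
    num = np.zeros_like(images[0], dtype=np.float64)
    den = np.zeros_like(images[0], dtype=np.float64)
    for img, t in zip(images, exposure_times):
        # "Hat" weighting: trust mid-tones, distrust clipped pixels.
        w = 1.0 - np.abs(2.0 * img - 1.0)
        num += w * (img / t)  # per-exposure estimate of scene radiance
        den += w
    return num / np.maximum(den, 1e-6)  # weighted average, extended range
```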

Within single-image processing, a single HDR image must already be available, and then just the tone mapping is applied. As an example, the picture below takes advantage of single-image processing from a RAW file, which typically has a much higher bit depth (12 or even 14 bit as of today’s sensor tech) as opposed to JPG (8 bit). As a result, a lot of dynamic information can be preserved even if the output file is still just a JPG. As added sugar, such a processed image also translates way better across a wide variety of different output devices, displays and viewing light conditions.
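The tone mapping step itself can be surprisingly compact. Here is a minimal sketch of a global Reinhard-style operator (the key value and the epsilon are illustrative choices); note that the final curve is a soft, saturating compression rather than a hard clip – which already hints at the audio analogy:

```python
import numpy as np

def tone_map(hdr, key=0.18):
    """Compress an HDR radiance map into [0, 1] for LDR output."""
    # Log-average luminance characterizes overall scene brightness.
    log_avg = np.exp(np.mean(np.log(hdr + 1e-6)))
    scaled = key * (hdr / log_avg)   # map the scene "key" to mid-grey
    return scaled / (1.0 + scaled)   # soft-knee curve, no hard clip
```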

[Image: MM-HDR]


Comments

  1. The concept of HDR audio is quite new. As I was looking for some information about it, I started to get the impression that this technique is essentially multiband compression but with a fancy name. Correct me if I’m wrong please.

  2. I remember that one way to do high dynamic range photos with people is to use a tripod to first photograph the background at perfect exposure and then the actual subject at perfect exposure. This is even used for extreme macro shots; the only difference between the two photos is the focus.

    I’ve realized there was a correlation between HDR in images and audio, but I’ve never made a specific comparison. It’d be awesome to see/hear that. If anything, HDR has been in music since long before photography, except the range was deliberately controlled by the composers and musicians, not by sound engineers, who didn’t exist (at least not like the ones today). Now it’s all about LOUD LOUD LOUD in some genres and the engineers are doing damage control for the most part. :D

  3. recursively burning bear says:

    the key here, though, is moderation. there are way too many overprocessed contre-jour sunset HDR landscapes out there! :)

    i wonder if a camera capable of shooting 32-bit images in one exposure will ever be built. the requirement to avoid objects moving across multiple exposures limits the feasible types of scenes at the current state of HDRI…

    if I were to make assumptions about what audio HDR processing would be like, I’d consider some kind of transient processor based on HDRI principles, i.e. where attack/release time would be roughly equivalent to the radius of tonemapping in HDRI, dynamically adjusted as a function of the “contrast” between samples
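    purely as a thought experiment, one way to read that idea in code could look like this (every name and constant here is a guess, not an actual design):

    ```python
    import numpy as np

    def adaptive_follower(x, sr, fast_ms=1.0, slow_ms=80.0):
        """Envelope whose smoothing 'radius' adapts to local contrast,
        loosely mirroring locally adaptive tonemapping. x: numpy array."""
        a_fast = np.exp(-1.0 / (sr * fast_ms / 1000.0))
        a_slow = np.exp(-1.0 / (sr * slow_ms / 1000.0))
        env_fast = env_slow = 0.0
        out = np.zeros_like(x)
        for n, level in enumerate(np.abs(x)):
            env_fast = a_fast * env_fast + (1 - a_fast) * level
            env_slow = a_slow * env_slow + (1 - a_slow) * level
            # "contrast" = gap between the fast and slow view of the signal;
            # a large gap means a transient -> track quickly, else smooth.
            a = a_fast if (env_fast - env_slow) > 0.0 else a_slow
            out[n] = a * (out[n - 1] if n else 0.0) + (1 - a) * level
        return out
    ```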

  4. One of the most interesting developments in the audio industry in ages. Always felt there was little difference between Instagram and Bootsie’s plugs. Let’s hear it for synesthesia! You’re the man Boots!

  5. Just be careful, H.

    If you come from photography (as I do), you must know that HDR is NOT always a good idea.
    Yes, it is “more” info, but it can lead to some very nasty (ugly) results. and in any case, one has to remember : this is a RE-DEFINITION of aesthetics. plain and simple.

    Our brains (through our ears and eyes) have gotten used, over the last 100 or so years, to a certain way visual/auditory info is presented to us, and diverting from this way can lead to unnatural (sometimes really nasty) results.

    You have to remember one more thing that people fail to pay attention to : there is a finite limit to how much we can perceive (visually and aurally). beyond that, it all becomes a mush and we fail to process audio/visual info the way it is intended. there is countless research on this topic.

    MORE DOESN’T NECESSARILY MEAN BETTER.

    Just remember : with power comes responsibility !

    I remember someone shooting a video clip with a GH3, with allegedly “wonderful” sharpness and high dynamic range (it was captured with an Atomos Ninja).

    The sad truth (to me at least) was that there was SO MUCH info to pay attention to, that I couldn’t focus on ANYTHING. everything “screamed” for attention, eventually leading to me noticing NOTHING.
    I ended up with a HUGE headache and didn’t understand what the clip was about, AT ALL.

    I’m sure you’ll do a great job, as always…. it’s just that it is a slippery topic (I’m not sure modern human beings are “ready” to process HDR, even though it is initially VERY impressive…)

    Just my 0.02

  6. HDR in audio could be a higher audible dynamic range.
    But what does this mean?
    Just as the eye can adapt its levels to different image areas, the ear would have to adapt to different “areas” of the audio as well, while these areas can be defined in different ways: frequency bands, panorama positions, phase, location … In fact the human ear, whilst often taken much less into account than the human eye, might be even more capable in these areas than the eye.
    I also find it very sad that so often there’s only one way to mix down music: the way “all the others do it” – over-compressed mud, more or less.
    It’s OK in the car or on the go in a noisy environment, but extremely annoying and fatiguing to listen to at home.
    Oh how much I enjoy my 70’s music here :-)

  7. Ariel said something very important:
    > …there was SO MUCH info to pay attention to, that I couldn’t focus on ANYTHING …

    I wonder how many of us would actually prefer the first photo for being more attractive due to its higher contrast and washed-out hair detail making her “even more blonde” ;)
    Do I really want an authentic, technically perfect image, or would I prefer one that comes closer to my dreams, focusing on few details?
    What is really missing today is something adaptive that will represent the content depending on the viewer or the listener, the environment and the scenario it’s presented in. At least with some brightness, environment sound/noise and mood sensors integrated. I claim to be the first applicant for a related non-commercial patent here :)

    • This reminds me of a discussion I’ve had with Bob Ohlsson, where he insisted that a new loudness standard can’t solve the problem. And indeed you need both: proper tools and standards on the production side, but also a solution on the playback side which adapts to the environment or personal preferences.

      • I prefer the first photo. why ? because it FOCUSES one (pun intended) on the subject. the second photo compels you – subconsciously – to pay attention to a lot of things, not necessarily ones you would have paid attention to if you had a choice (namely : the background). even the shirt, which now carries a lot more info (“50 shades of gray” :-] ), attracts our attention unknowingly, forcing us to process the visual data.

        HDR can be done. is it preferable ? not in my opinion. at least not in all situations.

        A movie has a subject and a story.
        A song has a singer and lyrics.

        When you pile so much info into your creative product, SOMETHING IS GOING TO BE SACRIFICED.

        You just have to choose what…

  8. Sven Bontinck says:

    As a professional photographer myself, and a hobby musician, I have a lot of experience with the role dynamics play, both in pictures and in music. Ansel Adams used his zone system for photography to make his images look the way he wanted them to. That involved very careful measuring of the light and the overall contrast within different areas of the picture he was going to capture. After that he adapted the development of each film to change the behaviour (steepness) of the curves. The last step was to print the whole dynamic range onto a medium that could reproduce as much as possible. For that he used baryta paper, which can create a very deep black and has a beautiful bright white as its paper base. The contrast such paper can give is exceptionally high and very beautiful to look at.

    However, I agree with some other posters that even in modern pictures and with very high-end cameras, an HDR photo is not always that attractive compared to a normal range picture. I deliberately call it a normal range and not LDR because it is a range that roughly mimics the dynamics of our sight. Of course bootsie is right to mention the negative influence of cheap prints, bad monitors and so on, but there is another problem.

    The idea of using a high dynamic range for music as in photography may look tempting, but one must not overlook the natural protection of our sensors, the eyes and the ears. Both have a system that protects us from too intense signals. The pupils of our eyes are like diaphragms in a lens; they regulate the amount of light that may enter our eyes. If this doesn’t work well, like when an ophthalmologist puts some pupil-dilating drops in them, one knows what our pupils do to protect us from too harsh, bright light.

    Our eyes have a certain dynamic range that they can capture at one moment. Everybody knows how hard it is to look at someone with the sun behind that person’s back. This is just a matter of too great dynamics. Our eyes protect themselves, our pupils contract to lower the amount of light entering our eyes, and at the same time the person in the foreground becomes almost completely black. Everybody recognizes this phenomenon, I guess? That is just the way we have learned to see things. Trying to fill in all the bits of information is like using a fill-in flash to lighten up the person’s face and front. It just doesn’t look natural most of the time.

    Even with high-end HDR cameras and multiple shots taken, the much higher total dynamic range is pushed into a normal range and the final result is often a lifeless, gray-ish, almost boring variant of the original. Without artificially enhancing the saturation and contrast, most of the time the original, non-HDR image seems more natural.

    The same can be seen in our ears. Tiny muscles will tense the eardrum at a certain threshold of loudness to give more resistance against louder sounds. This way the eardrum becomes stiffer, will not move as much, and our inner ears remain protected. This all happens within fractions of a second, but it takes some time to go from no contraction to full contraction and back. During contraction, the dynamic range that we can hear at one moment is shifted to a higher level. The lower levels that a person heard before, when the muscles were not contracted, become masked by the louder sensitivity level now. Therefore I think that in sound it is pretty useless to try to listen to an HDR signal, because our own protection system works time-dependently and will shift our sensitivity levels constantly. I think it will be very hard to find a technique that can bring benefits.

    What’s more, if we would use the whole dynamic range of a digital system, even a 16-bit one, it would exceed the dynamic range of our ears by far and our own sensitivity level would shift constantly, because even that 16-bit range is simply too high for our human hearing. Nowadays it is possible to create music without touching the volume button with levels from whisper quiet to ear-damaging, again with a 16-bit system and a good monitoring system. I wonder how you are going to implement HDR for sound? If it is possible, I am very interested to know how. Good luck with the challenge and thanks for all your hard work in the past.

    Sven

    • That’s an excellent post!!

      > What’s more, if we would use the whole dynamic range of a digital system, even a 16-bit one, it would exceed the dynamic range of our ears by far and our own sensitivity level would shift constantly, because even that 16-bit range is simply too high for our human hearing. Nowadays it is possible to create music without touching the volume button with levels from whisper quiet to ear-damaging, again with a 16-bit system and a good monitoring system.

      The thing is, current music productions don’t use the 16-bit dynamic range anymore. A typical production with an RMS level of -9 or even -6 dB actually uses only very few bits to encode its dynamics. This is the dilemma we are in, and so the challenge is (imho) how we can preserve more *perceived* dynamics in such an LDR output. Or, the other way around, how can we increase *perceived* loudness w/o squashing the audio down to such a poor DR.
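      As a rough back-of-the-envelope illustration of the “very few bits” point (the numbers are illustrative): at roughly 6 dB per bit, a master peaking at 0 dBFS with a -9 dBFS RMS level spans only about one and a half bits between average and peak:

      ```python
      import math

      DB_PER_BIT = 20 * math.log10(2)  # ~6.02 dB of range per bit

      def crest_factor_bits(peak_dbfs, rms_dbfs):
          """How many bits the peak-to-average span of a program occupies."""
          return (peak_dbfs - rms_dbfs) / DB_PER_BIT

      print(round(crest_factor_bits(0.0, -9.0), 2))  # -> 1.49
      ```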

      • Sven Bontinck says:

        I cannot agree more with your answer, Bootsie. The dynamic range in some productions is even less than 6 dB, and they are not worth listening to anymore.
        To me it’s all in the mix. When you, for example, carefully look at the way Bruce Swedien mixed the music for Michael Jackson’s albums many years ago, it is clear that these mixes have a high perceived loudness, but when zooming out in a DAW the waveforms seem not squashed at all; instead they reveal a lot of crisp transients. Like Swedien likes to say, it’s all in the transients. Maybe that is something you can use for your investigations?

        Comparing with pictures, it comes to my mind that I sometimes use selections of parts of images to do an optimization of the local contrast (raising the contrast a lot of the time). It’s a bit of cheating, I know, but it enhances the perceived overall contrast (dynamics) a lot. This way I can use a normal dynamic range when I take the photo, but with local enhancements of the contrast, and sometimes the saturation, I can create the impression that the overall dynamics in the picture are much higher than they really were in the original. The only condition is that the histogram (a diagram that shows the actual use and position of the light value range that you capture, versus the maximum range the camera can handle) is filled to the max, without overshoots in the highlights as well as in the shadows.
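        In code, a single-scale version of that local contrast trick might look like the following (the radius and amount are illustrative; real tools work with selections and multiple scales):

        ```python
        import numpy as np
        from scipy.ndimage import gaussian_filter

        def local_contrast(img, radius=25.0, amount=0.5):
            """Boost local contrast: push each pixel away from its
            neighborhood mean, leaving the global range mostly intact."""
            base = gaussian_filter(img, sigma=radius)  # low-pass 'neighborhood'
            detail = img - base                        # local deviations
            return np.clip(img + amount * detail, 0.0, 1.0)
        ```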

        Maybe you can use this technique for sound too?

        I wish you all the best.

        Sven

        • Sven,
          > … I sometimes use selections from parts of images …
          Now, applying this to audio and deciding how to automatically classify, find and treat such areas, that’s where it gets interesting!

      • Increasing perceived loudness without decreasing the D/R could easily be done by maximising only parts of the spectrum, leaving much space for other areas (or other instruments/sounds) in the spectrum outside the ear’s masked range (see adaptive quantization in mp3 encoding, which is a good example of how the dynamic range perceived by the ear depends on frequency and the masking window).
        How well this can be done in a musical context is another question. Sounds like the mixing process would have to be involved.

        • Sven Bontinck says:

          Hi Ralph
          Indeed it seems the most obvious way is to involve the mixing process to accomplish this goal; however, I think there is a way to do it on an already mixed track, too.
          Again comparing with photography, the overall contrast impression can artificially be enhanced by increasing the contrast at sharp edges, the so-called unsharp mask process or filter.
          With sound it should be possible to do a similar trick.
          The peaks in music are most of the time the loudest levels, present for a very short time span. The average RMS levels are the more or less sustained levels in between the peaks. At first it seems like raising the average levels by using compression or any other technique will always lower the perceived dynamics, because the difference between both becomes less.
          However, if a plugin could lower the average level of the music for some milliseconds prior to a transient or peak, I think that the short moment of ear relaxation right before the peak will give the impression that the following peaks are louder than they were before.
          That way one can raise the perceived loudness, because the average levels can be mixed a little bit hotter without touching the peak levels.
          I don’t know yet what time span before the peaks is needed, but a variable value should do the trick. Too short a time will give no perceivable effect, whilst too long a time will introduce a slight pumping effect. Anything in between can work in my opinion.

          If anyone is willing to test this, it can be done with traditional plugins; I don’t have the time atm to do it. Copy a mix and use a transient plugin to enhance the peaks in the copied track. Shift the copied track a few milliseconds ahead of the original and insert a gate in the original track. Use a side-chain input from the copied track. Let the gate close by a few dB at levels that just touch the peaks. You can play with the time shift as well as with the threshold and gain reduction. I am interested to know the results.
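          For anyone who’d rather try this offline, here is a minimal sketch of the same idea (the detector, parameter values and ramp shape are guesses, not a finished design):

          ```python
          import numpy as np

          def pre_transient_duck(x, sr, duck_db=-4.0, pre_ms=25.0, threshold=0.5):
              """Attenuate the signal for a short window *before* each detected
              peak so the following transient reads as louder. A real-time
              version needs lookahead, i.e. latency."""
              duck_lin = 10.0 ** (duck_db / 20.0)
              win = int(sr * pre_ms / 1000.0)
              env = np.abs(x)
              # Naive onset detector: first sample of each upward threshold crossing.
              onsets = np.flatnonzero((env[1:] >= threshold) & (env[:-1] < threshold)) + 1
              gain = np.ones_like(x)
              for n in onsets:
                  start = max(0, n - win)
                  # Ramp from the ducked level back to unity to avoid gain steps.
                  gain[start:n] = np.minimum(gain[start:n],
                                             np.linspace(duck_lin, 1.0, n - start,
                                                         endpoint=False))
              return x * gain
          ```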

          • Sven,
            I did a similar, even simpler experiment: I took a drum loop with enough ride cymbal in the background and did nothing else than reduce the volume to -7 dB for 25 ms before each transient.
            Your idea does indeed work! Although what you enhance is the transients, not the overall perceived loudness. Anyway, that’s a great concept that may at least return some “dynamic transients” to overly compressed audio without decreasing the perceived overall loudness. Nice!

            • Sven Bontinck says:

              Hi Ralph
              Thank you very much for trying my idea. You are absolutely right, and indeed it was my intention to enhance only the perception of the power of the transients with that technique, because when you do it that way it will trick our mind into thinking the music as a whole has more power, while at the same time the average RMS level does not need to be as high as in very hot productions. Your test proved that it wasn’t necessary to raise the average levels to get this impression.
              At the same time I hope that this way the listeners’ ears will stay healthier for longer, because listening for long periods at hot levels will ruin our ears in the end. I know too many young people who already have hearing problems. If we can give the impression that the music has more power by tricking our brains into perceiving louder transients, although the peaks remain at the same value and the average RMS does not need any changes, I think perhaps we are going in the right direction that Bootsie wanted to take?
              If this technique proves to be of good use, implementing it in a plugin will inevitably introduce a little bit of latency, because the “gate” or “ducking” needs to be able to work before the arriving transients. On the other hand, the latency will be reasonably short, as you tried it with 25 ms and it seemed to work already, right Ralph? I had about the same maximum value in mind, because anything longer would enter the time span of the early reflections that we can hear in reverbs, for example, and I don’t think that would give good results, because our brain can hear those very short time differences as separate events.

              • I would like to clearly separate between the perceived loudness of transients and the overall perceived average loudness.
                Sven, you are right in that the transients are subjectively enhanced and this will surely make the ear “focus on them” better, but it might only allow decreasing the RMS level by a very small, barely noticeable amount. Rather, it could help clear up the overly compressed audio a little bit and, so to say, re-activate the transients.
                Imagine the more or less noisy environments in which many, if not most, people listen to music today: in the car, on the street, on the bus … the loudness war was started for a reason.
                Even with highly “focussed”, sharp transients, reducing the RMS level will still be perceived as reduced average loudness.
                But maybe there is a small range in-between that can be used for reducing RMS level by a small amount, just so much that it’s barely noticeable, but gives transients a little more room to “breathe”.
                I definitely see the pre-transient-level-reducing-gate (let’s call it CupRetraga, Cutting Pre-Transient Gate ;o) as an appealing idea.
                And Bootsy might even add something completely different on top, as we know him ;)

  9. uncajesse says:

    I have a good idea what at least one component is. I have the best of its analog counterparts (using it tomorrow for a FoH gig), and can’t wait to see it done PROPERLY in digital form. A clue for inquisitive minds: a father and son in Florida. A man in California.

  10. PASCAL Philippe says:

    Just a post to point out that there is an alternative to HDR which directly generates an LDR image: “exposure fusion”. This is what is used in those modern cameras that take several exposures automatically.
    They call it “HDR”, but it is not, just like a “fake HDR effect” is not HDR. No camera is powerful enough to generate a 64-bit HDR and tonemap an LDR afterwards. It already takes several minutes on a dual core ;) Exposure fusion gives a very fast and more natural rendering by just “cutting” a slice of tones from each exposure to get the best of the sensor in each, unlike HDR, which “compresses” tones.
    Try it, you will probably like it ;)
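    For comparison with the HDR merge above, a single-scale sketch of what such a fusion might boil down to (real implementations such as Mertens et al. blend in a multiresolution pyramid; this shows only the “well-exposedness” weighting):

    ```python
    import numpy as np

    def exposure_fusion(images, sigma=0.2):
        """Blend bracketed LDR shots directly into one LDR result:
        no HDR intermediate, no tone mapping."""
        # Weight each pixel by how well exposed it is (bump around mid-grey).
        weights = [np.exp(-((img - 0.5) ** 2) / (2 * sigma ** 2)) + 1e-6
                   for img in images]
        total = np.sum(weights, axis=0)
        return sum((w / total) * img for w, img in zip(weights, images))
    ```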

    • Sven Bontinck says:

      Hi Pascal
      This is exactly what bootsie was talking about when he wrote about multi-image processing. Taking several exposures automatically is called bracketing and was originally used to be able to choose afterwards which exposure was the best for a given light condition, if the lighting circumstances were somewhat difficult. Nowadays it is used more and more for creating HDR images. The biggest problem is that it can only be used for static content.
      If you look at the dynamic range captured with exposure fusion, it really is an HDR range that is taken into account, albeit with some clever calculation as Bootsie stated already, but it is converted into 3 times 8 bit per color, meaning a 24-bit image.
      Every camera, even a low-quality consumer camera, internally uses more than 8 bits for each of the three colors. However, those consumer cameras use algorithms that will always render a 24-bit image for compatibility reasons and ease of use, mostly the well-known jpg image file with 8 bits per color.
      Some high-end professional cameras can capture 14 bits per color, 42 bits in total. This is not converted to 24-bit images if you use their RAW files. The dynamic range with such cameras is incredibly high. Otherwise, if you choose to save your images with those cameras as jpg’s, they are also rendered to a 24-bit image. And yes, the calculations are done in the camera’s own specialised processor in real time; sometimes those cameras can write 3×12 or 36-bit images at rates of almost ten frames per second, so they are very capable of tonemapping nowadays. If you recalculate a 36-bit image to a standard 24-bit file, that is tonemapping.
      In some way you can compare calculating RAW images into jpg’s with your DAW’s internal 24- or 32-bit (floating point) data calculations when you render a song to 16-bit CD quality. Both first have a very high dynamic range. A better way to compare both systems for imaging and sound is to look at the dynamic range a camera can capture versus the dynamic range a mic can capture, and the bit depth you set for recording. Those two values are very high with modern gear. So the problem lies not in the possibilities of our image capturing or sound recording gear, but in what we do with the data afterwards.
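      A quick sketch to put numbers on those bit depths (using the usual ~6 dB-per-bit rule of thumb for integer formats; 32-bit float behaves differently):

      ```python
      import math

      DB_PER_BIT = 20 * math.log10(2)  # ~6.02 dB per bit
      for bits, medium in [(8, "JPG, per color"), (14, "RAW, per color"),
                           (16, "CD audio"), (24, "DAW recording")]:
          print(f"{bits:2d} bit ({medium}): ~{bits * DB_PER_BIT:.0f} dB of range")
      ```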

  11. Wow.. I can’t wait to check this plug out, Bootsy. Very interesting project indeed. Right now, NY-type compression is helping me out as a way to catch all the nuances within a small dynamic range.. I am so eager to hear how your new plug works. ‘wish you the very best..

  12. Sven Bontinck says:

    Hi Ralph
    I cannot reply directly to your last post for some reason, but I’ll do it in this new post.
    You are absolutely right when you say that it will only help to focus the ear on the transients. That was also my intention, because when there are no clearly distinguishable peaks or transients anymore, because every bit is so heavily compressed or limited, music becomes loud, lifeless and dull. All the dynamics are gone by that time.
    At the same time I think that, like you wrote, if people listen to music in quiet environments, the dynamics will be perceived as higher if the music was also mixed and mastered that way.

    I’ll explain how, again comparing with imaging. The noise floor of music can be compared with the minimum amount of light present in a projection room. When people use a beamer, for example, the darkest or black parts of an image on the white projection screen have essentially the same brightness as the reflections of the ambient light (and some spill from the beamer, too) that is still present in that projection room. If the room is completely dark, it doesn’t matter how “white” the screen is. However, if the windows are still letting light through and there is a lot more ambient light because of that, the black zones in the projection will become grey-ish and we start losing dynamics. This phenomenon is the same as what happens when there is too much noise while we listen to music.

    There is a trick that can easily be used with imaging when there is too much ambient light present. Instead of a white screen, we can use a grey projection screen and a beamer with a brighter light. The white parts in the projection stay white, because the light loss caused by the grey screen is compensated by the brighter light source. At the same time, the black and dark parts gain detail, because they become darker again; the “noise floor”, or ambient light influence, is lowered by the grey screen. So, in imaging it is possible to create a higher perceptible dynamic range that way, without changing the original image content.

    If you translate this to music, the only two things to do are to bring the peaks or maximum levels to a higher level, so the low-level sounds come up above the noise floor, or to leave the maximum level intact and isolate the listener from environmental sound pollution. The dynamic range that you can hear in that case will increase by itself because of the lower noise floor.

    Regarding the loudness war, it is not only because people listen to music in cars and so on, but also because of the limited range we hear at one moment in time. If the dynamic range of music is better fitted into that limited range, we will like it more, because it is easier for the brain to retrieve information from. Even our brains are lazy at times, you know ;-)
    I still wonder what Bootsie will give us as a solution for enhancing the dynamics in music. It is a very interesting development indeed.

  13. Quite interesting when you overlay the 2 images in Photoshop with image 1 as the top layer and adjust its opacity. Around 34% keeps much of the detail whilst bringing the subject slightly more into focus. I guess that’s the equivalent of using NY compression in this analogy ;)
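    In code, that opacity slider maps nicely onto the wet/dry mix of parallel (“NY”) compression; a trivial sketch (the 0.34 mix simply mirrors the ~34% layer opacity and is purely illustrative):

    ```python
    import numpy as np

    def opacity_blend(top, bottom, opacity=0.34):
        """Layer-opacity-style crossfade: in audio terms, the wet/dry mix
        of parallel ('NY') compression."""
        return opacity * np.asarray(top) + (1.0 - opacity) * np.asarray(bottom)
    ```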

  14. Reblogged this on How to Produce Electronic Music.
