Noise Shaping

Jim Audiomisc · Feb 9, 2017

I've continued to think about ways to reduce the 'sea of noise bits' that tend to occupy many of the lowest bits per sample of most 'high rez' files and streams and bloat their sizes. It occurred to me to wonder if the technique known as 'Noise Shaping' might help.

If anyone is interested, they might like to have a look at

http://www.audiomisc.co.uk/MQA/intoshape/NoiseShapingHighRez.html

where I investigate this for basic feasibility.

Julf · Feb 9, 2017

Jim Audiomisc said:
If anyone is interested, they might like to have a look at

http://www.audiomisc.co.uk/MQA/intoshape/NoiseShapingHighRez.html

Interesting discussion - and yes, explaining noise shaping without resorting to the proper mathematical tools is a challenge.

In any case, it really boils down to "there is no free lunch". You can trade sample rate for number of bits and vice versa (DSD is the perfect example), but the total amount of data stays the same if you want to represent the original waveform as precisely as possible. The question is where the optimal trade-off point is. Do we really need to represent more than 16 bits of amplitude, considering the limited dynamic range of any real-life recording?

Jim Audiomisc · Feb 9, 2017

Julf said:
Interesting discussion - and yes, explaining noise shaping without resorting to the proper mathematical tools is a challenge.

The question is where the optimal trade-off point is. Do we really need to represent more than 16 bits of amplitude, considering the limited dynamic range of any real-life recording?

FWIW I've now added a link to a demo copy of the program I wrote to generate the example results. This may help some to get the idea. But I'm a lousy programmer so apologies to good coders who have a weak stomach.

For the purposes of argument I assumed that 16bit should be the target for the output LPCM. I doubt every real recording gets anywhere near needing that given how many recordings have three quarters of booger-all in terms of dynamic range. But that's another story... The advantage of 16bit is that it's a standard value. I have wondered about 384k 8bit Noise Shaped optimally. I wonder if that would make more sense than SACD/DSD as at least you *can* dither it properly without hitting its endstops. But it isn't exactly a common standard and might put people off. 8-]

Anyway, I was really just trying to examine the concept and invite people to have a think about it. More the better if it also helps people twig Noise Shaping.

I did look around for suitable filter values to get optimum shaping for audible noise reduction. But the examples I found all seem to assume you want the output at 44.1k or 48k, when here I want to keep the rate as per the input 'high rez' and preserve that. I found the Lipshitz paper that lists some examples, but dunno how to transform them up for the high rates.
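To make the idea concrete, here is a minimal sketch of requantizing 24-bit samples to 16-bit at the same sample rate, using TPDF dither and plain first-order error feedback. It is not the demo program mentioned above, and it does not use the psychoacoustically optimised coefficients from the Lipshitz paper. The function names, the 1 kHz test tone at 192k, and the crude moving-average 'low-pass' used to compare low-frequency noise are all assumptions made for illustration.

```python
import math
import random

def tpdf(rng):
    """Triangular-PDF dither spanning +/-1 output LSB (difference of two uniforms)."""
    return rng.random() - rng.random()

def requantize(samples_24bit, shaped=True, seed=1):
    """Reduce 24-bit integer samples to 16-bit at the same sample rate.

    shaped=True applies first-order error feedback, so the added noise is
    filtered by (1 - z^-1): less at low frequencies, more towards fs/2.
    """
    rng = random.Random(seed)
    out, err = [], 0.0
    for s in samples_24bit:
        x = s / 256.0                          # input expressed in 16-bit LSBs
        v = x - err if shaped else x           # subtract the previous sample's error
        y = math.floor(v + tpdf(rng) + 0.5)    # add dither, round to nearest integer
        y = max(-32768, min(32767, y))         # stay inside the 16-bit range
        err = y - v                            # total error made on this sample
        out.append(y)
    return out

def lf_noise_power(samples_24bit, out, span=32):
    """Mean-square error after a crude moving-average low-pass."""
    e = [y - s / 256.0 for s, y in zip(samples_24bit, out)]
    avg = [sum(e[i:i + span]) / span for i in range(len(e) - span)]
    return sum(a * a for a in avg) / len(avg)

fs = 192_000
tone = [round(1_000_000 * math.sin(2 * math.pi * 1000 * n / fs)) for n in range(8192)]
plain = requantize(tone, shaped=False)
shaped = requantize(tone, shaped=True)
print("LF noise power, plain TPDF  :", lf_noise_power(tone, plain))
print("LF noise power, first-order :", lf_noise_power(tone, shaped))
```

The shaped version should report noticeably less low-frequency noise power than the plain one, because the same total error has been pushed up towards fs/2. Higher-order or audibility-weighted shapers follow the same loop with a longer error filter.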

Julf · Feb 9, 2017

Jim Audiomisc said:
FWIW I've now added a link to a demo copy of the program I wrote to generate the example results. This may help some to get the idea. But I'm a lousy programmer so apologies to good coders who have a weak stomach.

It is nowhere near as bad as some of the code I have to deal with.

The advantage of 16bit is that it's a standard value. I have wondered about 384k 8bit Noise Shaped optimally. I wonder if that would make more sense than SACD/DSD as at least you *can* dither it properly without hitting its endstops. But it isn't exactly a common standard and might put people off.

I agree. Considering 192k/16 accomplishes the same result and takes up the same amount of data, but works with existing systems, I don't think it would catch on - even with the marketing budget of MQA.
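The 'same amount of data' point is easy to check with a few lines of arithmetic. This is only a sketch of raw, uncompressed stereo data rates; the helper name and the list of formats are assumptions chosen for illustration, with DSD64 taken as 1 bit at 64 x 44.1kHz.

```python
def pcm_data_rate(sample_rate_hz, bits_per_sample, channels=2):
    """Raw (uncompressed) data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

formats = {
    "384k/8": (384_000, 8),
    "192k/16": (192_000, 16),
    "96k/16": (96_000, 16),
    "44.1k/16 (CD)": (44_100, 16),
    "DSD64 (2.8224 MHz, 1 bit)": (2_822_400, 1),
}

for name, (rate, bits) in formats.items():
    print(f"{name:28s} {pcm_data_rate(rate, bits) / 1e6:7.3f} Mbit/s")
```

384k/8 and 192k/16 both come out at 6.144 Mbit/s for stereo, which is the trade of sample rate against word length described above.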

martin clark · Feb 9, 2017

Interesting thread scope...

Jim Audiomisc said:
For the purposes of argument I assumed that 16bit should be the target for the output LPCM. I doubt every real recording gets anywhere near needing that given how many recordings have three quarters of booger-all in terms of dynamic range. But that's another story... The advantage of 16bit is that it's a standard value.

Quite, the elephant in the 'HD' room - while there remains an advantage in capturing and editing at higher bit depth (purely to take the residue of editing DSP well below the noise floor) the reality is that studio mics and preamps struggle to achieve even 18 bits of resolution at optimal 'level'; capsule size vs. HF extension vs. thermal noise vs. a few other considerations etc. And if it was recorded to tape... Seems a shame we can't do something constructive with the 'sea of noise bits' that 'tend to occupy many of the lowest bits per sample' (great phrase!)

For instance - some (considerable) while ago Werner posted a link here to a recording reduced, with dither and noise shaping, to just 4-bit LPCM output; at the time many couldn't believe the result. It is certainly educational.

adamdea · Feb 10, 2017

Jim Audiomisc said:
FWIW I've now added a link to a demo copy of the program I wrote to generate the example results. This may help some to get the idea. But I'm a lousy programmer so apologies to good coders who have a weak stomach.

For the purposes of argument I assumed that 16bit should be the target for the output LPCM. I doubt every real recording gets anywhere near needing that given how many recordings have three quarters of booger-all in terms of dynamic range. But that's another story... The advantage of 16bit is that it's a standard value. I have wondered about 384k 8bit Noise Shaped optimally. I wonder if that would make more sense than SACD/DSD as at least you *can* dither it properly without hitting its endstops. But it isn't exactly a common standard and might put people off. 8-]

Anyway, I was really just trying to examine the concept and invite people to have a think about it. More the better if it also helps people twig Noise Shaping.

I did look around for suitable filter values to get optimum shaping for audible noise reduction. But the examples I found all seem to assume you want the output at 44.1k or 48k, when here I want to keep the rate as per the input 'high rez' and preserve that. I found the Lipshitz paper that lists some examples, but dunno how to transform them up for the high rates.

Thanks Jim, this is a really interesting article.
My feeling is that what stands in the way of people grasping this stuff is the fundamental point that once dithering is taken into account the bit depth merely defines the overall noise level.

Once that is grasped, it's fairly easy to have a rational conversation about what is required because it's clear that 16 bits is enough for any (final format distribution) need and also that a "rectangular box" noise function is pretty wasteful even at 44.1kHz. But the digital stair step idea runs deep.

However - back to noise shaping - one point which has been made in the past is that noise shaping might be dangerous if one is to perform DSP subsequently, e.g. for EQ or room processing or digital speaker filtering. Do you have any views on that? Also what level of ultrasonic garbage is safe?

On the 8/384 point, whilst it might have some merit, I really struggle to see the need for anything more than 16/96. If there really were any need for any more than 16/44, then I would expect that 16/96 would nail it. Those who insist on high rez seem to experience continuously improving returns as one gets beyond the possible corner case (24/96) through the hugely implausible (24/192), to the absurd (DXD) and the frankly get a grip on yourself man and get a life (DSD 512 and beyond).

This is the hallmark of foo: there are no limits to the imagination. Sadly this leaves the problem that no one is really interested in defining a reasonable spec for something more than 16/44 which covers the possibility that it might be too tight a spec but without wasting bits unnecessarily. Unless (possibly) that's what MQA is (wrapped up in a blanket of marketing nonsense)

DANOFDANGER · Feb 10, 2017

adamdea said:
This is the hallmark of foo: there are no limits to the imagination. Sadly this leaves the problem that no one is really interested in defining a reasonable spec for something more than 16/44 which covers the possibility that it might be too tight a spec but without wasting bits unnecessarily. Unless (possibly) that's what MQA is (wrapped up in a blanket of marketing nonsense)

Couldn't agree more.

Jim Audiomisc · Feb 10, 2017

adamdea said:
However - back to noise shaping - one point which has been made in the past is that noise shaping might be dangerous if one is to perform DSP subsequently, e.g. for EQ or room processing or digital speaker filtering. Do you have any views on that? Also what level of ultrasonic garbage is safe?

Someone would have to give me some more details wrt the idea that shaping is 'dangerous' for later DSP before I could really comment on that. But the reality is that most decent systems from the ADC onwards now will employ dither and noise shaping anyway.

Similarly, I'm not sure what the dividing line between 'safe' and 'unsafe' might be. But consider SACD/DSD which is essentially swamped in dither and noise shaping, and has to be because 'one bit' makes that unavoidable. If people think that is 'safe' then a few LSB-worth of shaped dither for 16bit would seem many orders of magnitude 'safer' to me.

People worried by this might also like to chew on the thought that having some shaped noise at HF may actually help *linearise* the behaviour of later stages in the chain like the DAC or even power amp.

As an opinion, I think that 96k/16 decently made and shaped should be fine for an audio delivery format. However it makes sense to use a higher rate and 24 bit for source recordings at the *start* of the process. Just as it makes sense to ensure the peaks don't get closer than a few dB to the max.

That to me seems good practice, simply to give more 'elbow room' for the recording process and any following reprocessing into the final result sold to the end-users.

IIRC Bob Stuart published at least one paper which said much the same many years ago. The problem is that none of this is new. It is that people in general don't seem aware of it.

FWIW I don't have any real worries if people prefer to use 192k/16 or higher rates. I suspect the reality is that - once you've dodged the wasted noise bits - there isn't much up there anyway. So once FLACed it wouldn't change the file size much. And an advantage of removing the excess noise is that you've taken out the 'added fat' and people can then see more clearly how much audio was in the package because the FLACed size is a better guide to the amount of content! 8-]

That might help people to stop judging by the size of the *box*.
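A rough way to see why noise in the bottom bits inflates a losslessly compressed file: random bits cannot be compressed, so every one of them ends up in the output. The sketch below uses Python's zlib purely as a stand-in for FLAC (an assumption; FLAC's predictor works differently and does much better on audio), and the 1 kHz test tone, sample count and helper name are likewise invented for illustration.

```python
import math
import random
import zlib

def pack24(samples):
    """Pack signed integers as 3-byte little-endian PCM."""
    return b"".join(s.to_bytes(3, "little", signed=True) for s in samples)

rng = random.Random(3)
fs, n = 96_000, 8192

# A 24-bit container whose bottom 8 bits are all zero (16 bits of real content).
clean = [round(4_000_000 * math.sin(2 * math.pi * 1000 * i / fs)) & ~0xFF
         for i in range(n)]
# The same samples with the bottom 8 bits filled with pure noise.
noisy = [s + rng.randrange(256) for s in clean]

size_clean = len(zlib.compress(pack24(clean), 9))
size_noisy = len(zlib.compress(pack24(noisy), 9))
print("bottom 8 bits zero :", size_clean, "bytes")
print("bottom 8 bits noise:", size_noisy, "bytes")
```

A periodic test tone flatters the 'clean' case, so the exact ratio means little. The point is only the direction: the noisy version cannot shrink below the size of its noise.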

As an aside, this might also help people to realise the implications when a HFN examination of a 'high rez' download shows it is actually a high rate 24 bit version of DSD with a *lot* of HF process noise.

davidsrsb · Feb 10, 2017

96/8 would probably work well, while keeping file size sane. The limit case of single bit suffered from clock jitter sensitivity, so 8 bit is likely a sensible compromise that suits computer architectures, and 96k is high enough to allow plenty of room to low-pass the shaping noise out.

Tony L · Feb 10, 2017

I must admit I've only skimmed Jim's article as the bits and bytes of digital audio are way beyond my current pay grade, but even so I've been around digital pro-audio pretty much from its birth so have a few views/hunches.

I agree completely that bit-depth is not the issue for domestic audio and that 16 bits used correctly is way more dynamic range than 98% of audio systems would have a hope in hell of handling, and even if they could chances are you'd not want to be in the room without ear-protection. It equates to about 96dB, so given we can't really hear much at all below about 40dB that is a heck of a lot of usable range. A look at the real DR stats for pop and rock CDs shows just how little is often used. I really don't think below 16 bit is worth considering though, I have too many memories of crunchy 8 bit samplers etc, though I guess much of that was the low sample-rate, which I am convinced is where the issue lies.

I am absolutely convinced that the problem with standard red-book is not the bit depth or frequency range, but the impact of the filter that abruptly cuts off all the malformed digital noise above 22kHz. This being why CD players, DACs etc sound different, and this being why well-sorted 96kHz or above recordings just tend to sound 'more analogue'. As such, and without having the math or anything to prove it, my hunch is 16/96 is all anyone should ever need, assuming the whole recording process is to that standard or it is a transcription of an analogue tape. It just gets that filter right up and out of the way. It likely won't do anything much to help old digital masters recorded to DAT etc, which is actually a heck of a lot of music from the mid-80s onward.

Having said all that, bog standard red book 16/44 can sound superb when really done right!
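For reference, the 'about 96dB' figure is just the ratio between full scale and one quantisation step, 20*log10(2^N). A minimal sketch follows (the helper name is an assumption); it ignores dither and noise shaping, both of which move the effective noise floor.

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantisation step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits:2d} bit: {dynamic_range_db(bits):6.2f} dB")
```

That works out at roughly 6.02 dB per bit, so 8 bits gives about 48 dB and 24 bits about 144 dB on the same basis.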

Jim Audiomisc · Feb 10, 2017

I'd be wary of 96k/8bit for reasons akin to Tony's comments about the problems of reconstruction filters, etc, with 44.1k/48k.

To get 96k/8bit to work you'd need reasonably high order Noise Shaping and a somewhat higher HF noise level than 96k/16bit. That would close up a lot of the safety-space between what is required and what might often be done.

Nice thing about low-order Shaping is that it isn't difficult to do reasonably well and still give a clear 'space' for the audio. Above third order can become difficult to do. And the harder something is to do well, the more scope there is for the music biz to louse it up! :-/

davidsrsb · Feb 10, 2017

Yes, you would need to add pre-emphasis and de-emphasis to stand a chance. One thing the MQA debate and analysis has shown is that you don't need to handle anything like 0dBFS above 20 kHz.

darrenyeats · Feb 10, 2017

I've been playing around with filters since I started doing upsampling via Squeezebox/SoX. I've used only linear phase, non-imaging filters, the only tweaking I've done is to where the passband ends. SoX always gives you an optimally smooth transition band.

ISTM that relaxing the "flat to 20kHz" thing helps a lot. Passband to 19kHz is enough - in fact, I'd be very happy if my hearing approached 19kHz! Am yet to try 18kHz.

A reconstruction filter rolling off 19-22kHz sounds better to me at the top end - for some recordings anyway - and I don't feel I'm missing anything. In fact the opposite. I don't know whether this is:
1. Simply the amps/speakers being asked to produce less energy >19kHz.
2. A gradual filter roll off. Does linear phase mean this should not matter? Don't know enough. I assume brick wall filters are done because they are easy/cheap/low processing power though.

I'm thinking there's a third possibility:
3. Mitigation of problems caused by filters used in producing music for 44/48kHz distribution (whether in ADCs or in studio). Again, I assume brick wall digital filters are done because they are easy/cheap/low processing power. But filter shape aside, fair to say a lot of digital filtering falls significantly short of current SOTA (which I believe is Saracon), and http://src.infinitewave.ca/ indicates the errors in most filters, whatever their general level, tend to build toward higher frequencies. So using a good up-sampling filter to undercut higher frequency problems caused by a poor down-sampling filter could improve quality on average! This would obviously depend on the recording - care taken, vintage of ADCs, resamplers etc.

The above would be in combination with non-linearity from amp/speakers, with inaudible frequencies producing distortion in the audible band.

SoX VHQ is not far behind Saracon in quality.

davidsrsb · Feb 11, 2017

Relaxing the filter 3dB point down to 19kHz simplifies the filter design.
Most CD players are made with >20kHz brick wall filters because the chip does it, and for specmanship.
Not feeding a typical dome tweeter with >19kHz energy that we cannot hear anyway avoids breakup nasties.

darrenyeats · Feb 11, 2017

Re:OP I use TPDF in SoX - see https://en.m.wikipedia.org/wiki/Dither
"If the signal being dithered is to undergo further processing, then it should be processed with a triangular-type dither that has an amplitude of two quantisation steps"

Given the upsampling, oversampling, filtering, sigma-delta etc that goes on in modern DACs, I'd go TPDF except with a NOS DAC.
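For anyone unsure what TPDF dither of two quantisation steps peak to peak actually buys you, here is a small sketch. It quantises a constant input that sits between two levels and shows that, with TPDF dither, the average output lands on the true value and the error power stays at about a quarter of an LSB squared whatever the input. The function names, seed and sample count are assumptions for illustration only.

```python
import math
import random

def quantize(x, rng=None):
    """Round x (in LSBs) to an integer, optionally adding TPDF dither first."""
    # Difference of two uniforms: triangular PDF, two LSBs peak to peak.
    d = (rng.random() - rng.random()) if rng is not None else 0.0
    return math.floor(x + d + 0.5)

def error_stats(x, n=20_000, seed=7):
    """Mean and mean-square of (output - input) for a constant input x."""
    rng = random.Random(seed)
    errs = [quantize(x, rng) - x for _ in range(n)]
    return sum(errs) / n, sum(e * e for e in errs) / n

for x in (0.0, 0.3, 0.5):
    mean, power = error_stats(x)
    print(f"x={x:.1f}  undithered={quantize(x)}  "
          f"dithered mean error={mean:+.3f}  error power={power:.3f}")
```

Without dither an input of 0.3 LSB always comes out as 0, so the error depends on the signal. With TPDF dither the error behaves like steady noise instead, which is why it is the usual baseline before any shaping is added.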

Jim Audiomisc · Feb 11, 2017

FWIW I used TPDF in my demo as I regard that as the basic standard.

And AIUI the reason 'brick wall' (i.e. sinc-based) filtering was widely adopted from the start is that this should then simply 'pass through' whatever the effective impulse response of the system used to generate the *recording* may be. i.e. it was assumed it's the job of the recording engineers, etc, to decide what they wanted you to get.

All that said, I used a first-gen Marantz player for about a decade quite happily. Although I *did* add another analogue low-pass filter after it. This was an old 'Toko' passive filter design they made for Yamaha's 'pilot tone nulling' FM tuners. These gave a flat response up to about 18kHz and then rolled off smoothly as they didn't need to kill the 19k pilot. Yamaha's auto-nulling did that. Worked quite nicely for the CD player.

Jim Audiomisc · Feb 11, 2017

I did experiment with using SoX for the requantization. However I found that it refused to let me employ any of the named shapings it lists. This was for 192k/24 -> 192k/16. My conclusion was that the shapings it offers are directed at producing 44.1k/48k output so only have coefficients for that. Did I miss something?

Werner · Feb 11, 2017

darrenyeats said:
A reconstruction filter rolling off 19-22kHz sounds better to me at the top end - for some recordings anyway - and I don't feel I'm missing anything. In fact the opposite. I don't know whether this is:
1. Simply the amps/speakers being asked to produce less energy >19kHz.
2. A gradual filter roll off. Does linear phase mean this should not matter?

I assume brick wall filters are done because they are easy/cheap/low processing power though.

The ideal reconstruction filter, sinc(x), is infinitely steep. It stands to reason that the industry attempts to approach this for digital replay. (But doing so, it ignores the ADC side of things, but that is a different story). So it is to be expected that most commercial CD filters, at least before the often entirely misunderstood apodising craze started, are quite steep and cut at 22kHz. Making them steep adds cost. Making them cut at exactly Fs/2 halves the number of coefficients, reducing cost again. Thus symmetrical linear phase half-band FIR. Since steepness is not infinite, imaging occurs.
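The coefficient saving from a half-band design can be seen directly. In the sketch below a windowed-sinc low-pass for a 2x oversampling stage has its -6 dB point at a quarter of the oversampled rate (i.e. Fs/2 of the original), and every second tap either side of the centre comes out as zero. The 23-tap length and the Hann-style window are assumptions for illustration, not taken from any particular DAC chip.

```python
import math

def halfband_taps(taps=23):
    """Windowed-sinc low-pass with cutoff at a quarter of the (oversampled) rate."""
    mid = (taps - 1) // 2
    h = []
    for n in range(taps):
        t = n - mid
        # Ideal low-pass impulse response for a cutoff of 0.25 cycles per sample.
        ideal = 0.5 if t == 0 else math.sin(math.pi * t / 2) / (math.pi * t)
        # Hann-style window chosen so the end taps are small but not exactly zero.
        window = 0.5 - 0.5 * math.cos(2 * math.pi * (n + 1) / (taps + 1))
        h.append(ideal * window)
    return h

h = halfband_taps()
zero = sum(1 for c in h if abs(c) < 1e-12)
print(f"{zero} of {len(h)} taps are zero and need no multiply")
```

The surviving taps are also symmetric about the centre, which is the linear-phase property and lets the hardware share multiplies again.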

This would be perfectly innocuous if the recording side was done properly, meaning with zero content at Fs/2. In such case the DAC side would not ring (since zero content at Fs/2), and would generate a minimum of images, since the ADC side already ensured that there is not much near Fs/2.

But that is not how industry works. Seeing those nice and economic linear phase half-band FIR filters in DACs, they decided to adopt the same filter style for in-ADC downsampling (and, broadly, also in downsampling software, which, until recently, was abysmally bad, see ProTools, Pyramix, Merging, ... at src.infinitewave.ca). So the ADCs also got steep half-band filters. These ensure a couple of bad things:
-aliasing happens, infecting the 20-22kHz part of the recording
-strong ADC filter ringing is inserted at 22kHz
-DAC-side ringing is triggered
-DAC-side imaging is triggered

In short: industry practice guarantees that the 20-22kHz range of CD is buggered during replay. Can't hear it directly, but metal domes will ring, and systems with IMD issues may show this up.

And yet, the recipe is simple, provided we drop that old fetish of needing flatness to beyond 20kHz.

Just somewhere in the chain, ideally at the ADC side, start rolling off at 18kHz, and reach zero, or at least a suitably low level (most music hasn't got that much of high treble anyway), at Fs/2. That gives the filter a 4kHz-wide transition band. 4kHz is also, give or take, the width of the highest critical band in the ears of healthy young people. This then means that the filter's ringing is of the same order as the innate temporal acuity of that highest critical band, i.e. good enough. So in one step you ensure that:
-DAC-side ringing is not triggered
-DAC-side imaging is not triggered
-any ringing is inaudible, even for those with fresh ears
-the downstream system is only fed with the music, nothing else.
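As a rough illustration of that recipe, the sketch below builds a short linear-phase low-pass for 44.1kHz that is close to flat at 18kHz and well down by Fs/2, then prints its response at a few frequencies. The Hann-windowed sinc design and the 41-tap length are assumptions picked to give a transition of about the right width; they are one possible way to do it, not a recommended or reference design.

```python
import cmath
import math

FS = 44_100          # CD sample rate
F_START = 18_000     # where the roll-off should begin
TAPS = 41            # assumed length; sets how wide the transition is

def lowpass_taps(fs=FS, f_start=F_START, taps=TAPS):
    """Linear-phase Hann-windowed sinc, -6 dB point midway between f_start and fs/2."""
    fc = (f_start + fs / 2) / 2 / fs              # cutoff in cycles per sample
    mid = (taps - 1) / 2
    h = []
    for n in range(taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    dc = sum(h)
    return [c / dc for c in h]                    # normalise to unity gain at DC

def gain(h, f, fs=FS):
    """Magnitude response of FIR taps h at frequency f in Hz."""
    return abs(sum(c * cmath.exp(-2j * math.pi * f * n / fs)
                   for n, c in enumerate(h)))

h = lowpass_taps()
for f in (1_000, 18_000, 20_000, 22_050):
    print(f"{f:6d} Hz  {20 * math.log10(max(gain(h, f), 1e-12)):7.2f} dB")
```

With these assumed values the response should be near 0 dB at 18kHz, around half amplitude near 20kHz, and well down at 22.05kHz. Changing TAPS trades filter length (and so ringing duration) against how sharply the transition band is defined.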

they are easy/cheap/low processing power. But filter shape aside, fair to say a lot of digital filtering falls significantly short of current SOTA (which I believe is Saracon), and

iZotope is the reference (*). Has been for a long time.

(* When configured properly. Last year it was very fashionable at CA to dream up iZotope settings and validate them by ear. A lot of garbage was generated, and people loved it. Of course, the tiniest settings differences were always day-and-night audible.)

Tony L · Feb 11, 2017

Jim Audiomisc said:
i.e. it was assumed it's the job of the recording engineers, etc, to decide what they wanted you to get.

Giggles. IME any such recording system is bought at great cost, unpacked, hooked up, the "engineer" spends the best part of a day swearing at some box called a 'word clock' and tries to figure out why Cubase is exactly one hour out of SMPTE sync with the f***ing ADAT, and when the right combination of swear words is found to rectify the situation it is never spoken of again. I'd be utterly amazed if you could find many/any studio sound engineers who understand this stuff down to the bits, bytes, noise shaping, filters etc level. Basically if most of what you are monitoring sticks to the tape you are done!

PS Obviously you do all this in Logic Pro X or whatever on a MacBook these days!

davidsrsb · Feb 11, 2017

Werner said:
Just somewhere in the chain, ideally at the ADC side, start rolling off at 18kHz, and reach zero, or at least a suitably low level (most music hasn't got that much of high treble anyway), at Fs/2. That gives the filter a 4kHz-wide transition band. 4kHz is also, give or take, the width of the highest critical band in the ears of healthy young people. This then means that the filter's ringing is of the same order as the innate temporal acuity of that highest critical band, i.e. good enough

So given that existing recordings were not filtered properly, a post-processing stage with an 18kHz 3dB point, 4th order or more low-pass, should reduce these artifacts?
