

Illusions - what do we really hear?

It irks me when people think scientific theories are little more than conjecture - e.g., "I have a theory the butler did it."

That's not on the same footing as, say, the Theory of Special Relativity, unless the butler is your favourite brother or something.

Joe
 
Yes, it's already been stated here but is worth repeating - what we hear is a construct of our brain's processing. Fundamentally, there is not enough data in the signals picked up by the two ears to fully determine the auditory scene that we construct - we need to use all sorts of pattern-matching, extrapolation, experience of how sounds behave in the real world, sight, etc. to generate the fairly robust auditory scene that we continuously experience.

There is a whole hot area of research into this called "auditory scene analysis" (ASA), which is trying to ascertain the rules/techniques that allow us to successfully do this in real time - it's quite a processing feat.

One of the important points that comes from the research is that we are continually processing the auditory data & updating our best-guess auditory scene.

People who interpret psychoacoustics as the illusory part of hearing & what makes it untrustworthy are completely missing this fundamental point - psychoacoustics is what allows us to make sense of the jumble of pressure waves impinging on our eardrums. It's what allows us to pick out auditory objects, such as the bassoon in the orchestra, & to follow its musical line through a performance or switch to listening to the string section.

As Adamea says, stereo reproduction is itself a trick - a trick that uses some learned knowledge of psychoacoustics to present an acceptable illusion of a real auditory scene. However, not knowing the full rules/techniques that our brains use somewhat hampers this goal of realistic audio reproduction. As a result, small discoveries are stumbled upon which audibly improve matters in a small way, but we have no clear explanation yet for how they work at the psychoacoustic level.

Without this knowledge of psychoacoustic rules, we are also stumbling around using unsophisticated measurements &, I believe, incorrect concepts about the limits of audibility. A lot of the improvements that I hear reported in audio are about increased realism, increased clarity, etc. - in other words, they are no longer about frequency/amplitude improvements - they are improvements in other factors which our psychoacoustic rules are picking up on & which we perceive as more realistic. Or maybe they are small changes in freq/amplitude that are currently dismissed as inaudible, but further knowledge of psychoacoustic workings may well reveal them to be audible as part of the dynamics of music, even if not when tested in a lab with simple tones?

I don't want to get into an objective vs. subjective debate, but I feel that this is at the heart of the conflict - what matters in the dynamic waveform to our psychoacoustic processing is not known & therefore not easily measured. It is also very difficult to A/B these sorts of improvements, as they are not the freq/amplitude/timing differences which are easily A/Bed.
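As an aside on "easily A/Bed": even when a difference is real, an A/B or ABX result only means something once enough trials are run - the arithmetic is just binomial. A minimal sketch, with invented trial counts:

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` ABX trials
    by pure guessing (probability 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# Invented run: 12 correct out of 16 trials.
print(f"{abx_p_value(12, 16):.3f}")  # ~0.038 - unlikely to be pure guessing
```

Twelve out of sixteen looks convincing, yet ten out of sixteen (p ~ 0.23) would prove nothing - which is partly why these subtle differences are so hard to pin down.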

It always strikes me that we are involved in a hobby that touches on these very complex, current areas of research - but the researchers are only using test tones & we are using music signals :). Realising this might bring a deeper understanding of why there are so many disagreements.
 
The video linked in the OP, showing the McGurk effect, is just one example of how all our perceptions are multi-sensory & of how our auditory perception uses any & all information to create its best-guess auditory scene. Another effect, the ventriloquism effect, we encounter all the time - in a cinema, the sound seems to come from the direction of the actors speaking on the screen. The same applies when watching any amplified band playing on stage - we hear the sounds coming from the direction of the instruments rather than from the speakers. In fact, I would also suggest that as the guitarist leans towards the crowd as he plays, we probably perceive the guitar as being played louder.

But it's not all one-way traffic - what we hear affects what we see, too. Most of the examples I know of are lab experiments, so not as interesting as the McGurk effect, but still relevant to the fact that perception is multi-sensory - or is "multi-modal" the correct term?
 
JK's post is largely correct, except that with hearing there is an abundance of data, not a lack. The ears provide an awful lot, and the other senses add to it, often in a contradictory fashion.

The auditory system creates order in this chaos, extracting objects in a hierarchy of processing (from hair cells up to you yourself foot tapping and air guitaring), extracting objects of increasing abstraction (less 'sound' and more 'concept') and decreasing data rate (from 1000s of nerve firings per second to 'Ah, a middle C on a Steinway in a small room').

This processing runs both ways. The upper levels, and even some of the more mechanistic lower levels, can be consciously steered by the auditor. That's why you can follow a bass line in the presence of many other things in the song. You ask the processor to deliver more of the bass thingies, and less of the rest.

All of this is extremely powerful and magnificently wonderful. But it also guarantees that one really cannot have the exact-same auditory experience twice: the perception is a function of the (objective) sound fields, plus the listener's mental baggage, plus the listener's conscious and unconscious steering of the process.
 
It always strikes me that we are involved in a hobby that touches on these very complex, current areas of research - but the researchers are only using test tones & we are using music signals :). Realising this might bring a deeper understanding of why there are so many disagreements.
I hear the sound of alarm bells
 
JK's post is largely correct
I think that's rather generous. His second half is a clumsy attempt to co-opt the constructed nature of hearing in favour of the opposite position, namely that every claim of audible change must correspond with a physical change in the signal.
...except that with hearing there is an abundance of data, not a lack. The ears provide an awful lot, and the other senses add to it, often in a contradictory fashion.

The auditory system creates order in this chaos, extracting objects in a hierarchy of processing (from hair cells up to you yourself foot tapping and air guitaring), extracting objects of increasing abstraction (less 'sound' and more 'concept') and decreasing data rate (from 1000s of nerve firings per second to 'Ah, a middle C on a Steinway in a small room').

This processing runs both ways. The upper levels, and even some of the more mechanistic lower levels, can be consciously steered by the auditor. That's why you can follow a bass line in the presence of many other things in the song. You ask the processor to deliver more of the bass thingies, and less of the rest.

All of this is extremely powerful and magnificently wonderful. But it also guarantees that one really cannot have the exact-same auditory experience twice: the perception is a function of the (objective) sound fields, plus the listener's mental baggage, plus the listener's conscious and unconscious steering of the process.
Beautifully put.
 
If you can't measure a change you aren't hearing one, I think. The mic is better than your ears.
Seriously? OK, off the top of my head let's take one example. The stereo image is slightly off centre. How is your mic (singular) going to hear that? Or there is a subtle phase anomaly that your ears pick up, making voices in a choir sound "off". Your mic picks that up too?
In many cases I fear human ears are far more sensitive than any mics we ordinary mortals have, or know how to use properly. I have spent my whole life refining my hearing ability (as most of the rest of us have) despite losing HF sensitivity as I age.
Mics and measurement techniques are constantly playing catch-up with what our poor old ears can do! I can even hear the change in sound quality when a single valve is changed. Eh?
 
Seriously? OK, off the top of my head let's take one example. The stereo image is slightly off centre. How is your mic (singular) going to hear that?
How could it? You are describing a stereophonic phenomenon.

But if the mic(s) can't hear something, how can it be in a recording?
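For what it's worth, a slightly off-centre image is exactly the kind of stereophonic property a pair of measured channels can expose. A minimal sketch, assuming a two-channel recording - the filename is hypothetical, and only the first second is correlated to keep it cheap:

```python
import numpy as np
from scipy.io import wavfile

# Hypothetical stereo recording; any 2-channel WAV will do.
rate, data = wavfile.read("choir.wav")
left = data[:rate, 0].astype(float)   # first second, left channel
right = data[:rate, 1].astype(float)  # first second, right channel

# A centred image has roughly equal RMS in both channels.
rms = lambda x: np.sqrt(np.mean(x ** 2))
balance_db = 20 * np.log10(rms(left) / rms(right))
print(f"L/R level difference: {balance_db:+.2f} dB")

# A time skew between the channels shows up as an off-centre
# cross-correlation peak.
corr = np.correlate(left, right, mode="full")
lag = corr.argmax() - (len(right) - 1)
print(f"L/R time offset: {lag / rate * 1e3:+.3f} ms")
```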
 
JK's post is largely correct, except that with hearing there is an abundance of data, not a lack. The ears provide an awful lot, and the other senses add to it, often in a contradictory fashion.
I disagree to some extent - there is not always enough data to reach one unique solution, i.e. to unequivocally map the auditory scene - you know the mathematical term for this sort of problem; is "intractable" the correct term? Solving it requires further data from the other senses or from our experience & knowledge of how sound works in the real world. Sometimes this extra information can be contradictory, but mostly it allows us to form the correct solution & map the auditory scene.
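As a toy illustration of that (my own sketch, not from the ASA literature): under the simplest plane-wave model of interaural time difference, a source in front and its mirror image behind produce identical ITDs, so the two ears alone cannot separate them - the classic front/back confusion. The head width is just an assumed round number:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air
EAR_SPACING = 0.18      # m, rough head width (assumed round number)

def itd_us(azimuth_deg: float) -> float:
    """Interaural time difference in microseconds for a far-field source,
    simple plane-wave model: ITD = (d / c) * sin(azimuth)."""
    return EAR_SPACING / SPEED_OF_SOUND * math.sin(math.radians(azimuth_deg)) * 1e6

# 30 degrees to the right in front, and 30 degrees to the right behind
# (azimuth 150), give exactly the same ITD:
print(f"{itd_us(30):.1f} us")   # ~262.4 us
print(f"{itd_us(150):.1f} us")  # ~262.4 us - the ears alone can't tell
```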

The auditory system creates order in this chaos, extracting objects in a hierarchy of processing (from hair cells up to you yourself foot tapping and air guitaring), extracting objects of increasing abstraction (less 'sound' and more 'concept') and decreasing data rate (from 1000s of nerve firings per second to 'Ah, a middle C on a Steinway in a small room').

This processing runs both ways. The upper levels, and even some of the more mechanistic lower levels, can be consciously steered by the auditor. That's why you can follow a bass line in the presence of many other things in the song. You ask the processor to deliver more of the bass thingies, and less of the rest.
Mostly agree, but again some small difference in my understanding. We perceive auditory objects, such as the bass line, because auditory processing has used rules/techniques to categorise & group some of the jumble of pressure waves together as belonging to one auditory object. We can focus on this object & follow its trajectory in real time as a result of this focus. But similarly we can switch, almost instantly, to shift our focus to another auditory object. I think the jury is still out on exactly how this is achieved - are all auditory objects represented in our processing & we just shift our focus from one to another, or does this shift in focus cause us to shift our processing of the auditory stream?

All of this is extremely powerful and magnificently wonderful. But it also guarantees that one really cannot have the exact-same auditory experience twice: the perception is a function of the (objective) sound fields, plus the listener's mental baggage, plus the listener's conscious and unconscious steering of the process.
Yes, we definitely can hear the same thing in very different ways due to what we decide to focus on.
 
How does a subtle phase anomaly make voices in a choir sound off?
What are you talking about?

When two or more voices go in & out of phase with respect to one another, it can be perceived as a vibrato effect. If there is a phase anomaly in the reproduction, this won't happen or will be perceived in a different way.
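The underlying physics is just beating: two tones a few Hz apart sum to one tone whose amplitude swells at the difference frequency. A minimal sketch with made-up frequencies:

```python
import numpy as np

rate = 44100
t = np.arange(rate) / rate  # one second of samples
f1, f2 = 440.0, 443.0       # two "voices" 3 Hz apart (made-up values)

# sin(2*pi*f1*t) + sin(2*pi*f2*t)
#   = 2 * cos(pi*(f1 - f2)*t) * sin(pi*(f1 + f2)*t)
# i.e. a ~441.5 Hz tone whose envelope pulses at |f1 - f2| = 3 Hz.
mix = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)
envelope = np.abs(2 * np.cos(np.pi * (f1 - f2) * t))
print(f"beat (vibrato-like) rate: {abs(f1 - f2):.0f} Hz")
```

The two sources here are steady; real voices drift in pitch, so the beat rate wanders rather than sitting at a fixed 3 Hz.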
 
When two or more voices go in & out of phase with respect to one another, it can be perceived as a vibrato effect. If there is a phase anomaly in the reproduction, this won't happen or will be perceived in a different way.

Bollocks.
 
Can someone point me at a concrete example of a subtle phase anomaly that makes choirs sound off?

Or can someone provide a choral recording that is susceptible to this effect?
 
JK's post is largely correct, except that with hearing there is an abundance of data, not a lack. The ears provide an awful lot, and the other senses add to it, often in a contradictory fashion.

This processing runs both ways.

I wonder, therefore, whether the abundance of aural stimulation, which isn't backed up by the other senses, might create conditions in which the psychoacoustic processes 'crave' further data (for want of a better expression). So, while one part of my brain might say "ah, a middle C on a Steinway", another part is frantically asking "where the **** is that Steinway?". Any small changes to the sonic picture might then have a surprisingly profound effect on the way that picture is perceived.
 
Here's the Flower Duet from Delibes' Lakme, mentioned in that paper I linked to, but performed by Anna Netrebko & Elina Garanca.
I could watch & listen to this all day, but I believe there are sections where you can clearly hear the two voices locking in vibrato?
https://youtu.be/Vf42IP__ipw
 

