Bruno Putzeys on audio pricing

Only in certain categories. I've not heard of failures to identify differences in loudspeakers or turntable systems, for example?

There is still value in blind tests for, e.g., loudspeakers, because you will focus on the sound rather than the appearance or design prejudices you might have (metal tweeters sound grating, ports chuff, etc.).

Tim

As I said, blind listening delivers some results when testing gross differences. The problem comes when identifying subtle differences - which are overwhelmingly not differentiated in blind testing. That either means they don't exist, or the 'resolution' of the test process is insufficient: ie, the method impairs fine-grained acuity because it modifies the mental state of the listener. That's the challenging question to consider.

No one disputes the fact that setting aside prejudice is useful: it's about whether the baby is thrown out with the bathwater here.
 
There is nothing intrinsically wrong with blind testing, or even double-blind testing. Reliable DBT schemas are a cornerstone of scientific method. The key word there though is 'Reliable'...

Audio is the only branch of science, technology or engineering that I know of that uses a one-size-fits-all approach to DBT. Audio seems to have adopted the ABX test as the only game in town to such an extent that people in audio use ABX and DBT as interchangeable terms. This sets up a series of logic-chopped fallacious statements, to make it seem as if criticising the use of ABX ultimately criticises scientific method.

ABX is not DBT. ABX is a sub-set of DBT, used primarily in the audio world. It derives from the triangle test used in food science since the 1940s. Food science professionals have developed other forms of DBT for their branch of sensory science, because following Nunnally (1960) it was shown that small samples generally fail to reject the null hypothesis and after Ennis (1993), it became clear that the sample sizes used in food science were simply too small to have statistical power. Today in other branches of sensory science, the results of a triangle test (or similar) are considered underpowered if they have a sample size of less than 220 (ideally 318). In audio, we consider an ABX test with 20 subjects to be a major project, and yet according to the research, this sample size is going to have an uncorrectable weighting toward Type II error by its very nature.
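The underpowering argument above is easy to demonstrate with an exact binomial calculation. This is a minimal sketch of my own (not from the thread): it finds the critical score a triangle/ABX-style test needs at a given significance level, then computes the probability of reaching it. The assumed "true" proportion correct (0.5, against the 1/3 guessing rate) is an illustrative stand-in for a subtle difference, not a measured figure.

```python
from math import comb

def exact_power(n, p_alt, p_null=1/3, alpha=0.05):
    """Exact power of a one-sided binomial test with n trials/subjects.

    Finds the smallest score k_crit whose tail probability under the
    null (guessing) rate is <= alpha, then returns the probability of
    scoring at least k_crit under the alternative rate p_alt.
    """
    def upper_tail(k, p):
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    k_crit = next(k for k in range(n + 1) if upper_tail(k, p_null) <= alpha)
    return upper_tail(k_crit, p_alt)

# Illustrative: listeners who genuinely hear the difference 50% of the
# time (vs. a 33% guessing rate) in a triangle-style test.
print(round(exact_power(20, 0.5), 2))   # ~0.41: a 20-subject test misses most real effects
print(round(exact_power(220, 0.5), 2))  # near 1.0 at the sample sizes food science uses
```

In other words, under these assumptions a 20-subject test would fail to detect a genuine audible difference more often than it detected it, which is exactly the Type II weighting described above.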

Unfortunately, the Semmelweis reflex runs pretty deep in audio, and to dare to criticise ABX makes me a knuckle-dragging science hater among those who don't want the boat rocked. But the fact remains that if we use a test that has been shown repeatedly to be underpowered even with a sample size far larger than the ones we can muster, it should not be considered robust or reliable. This doesn't even work on the "even a stopped clock tells the right time twice a day" rule; it's more like insisting the stopped clock is always telling the right time, and the rest of the world is wrong.

Now... what the hell does all this have to do with the OP?

References:

Ennis, D. M. (1993). The Power of Sensory Discrimination Methods. Journal of Sensory Studies, vol 8, pp. 353-370.
Nunnally, J.C. (1960). The Place of Statistics in Psychology. Educational and Psychological Measurement, vol. 20, no. 4, pp. 641-650.
 
Hmm, interesting results. I'm not sure I would agree with the conclusion except in its grossest sense. The second most popular speaker certainly wasn't the second most accurate speaker (speaker C clearly is). Its FR shows a classic boom-and-tizz response (at least at 0 deg, which I'm assuming is how they were oriented relative to the listening position). I would actually have ordered the speakers as A, C, D, B. D before B because, apart from being a bit ragged, it is actually flatter than B from around 400Hz up.

So actually the listeners rated arguably the least accurate speaker as their second choice. The correlation between 'most accurate' and 'most preferred' therefore falls apart somewhat.

I don't know whether it falls apart completely, given that the preferences are largely going to be subjective, but it is interesting to observe that in both the speaker comparison and the CD/MP3 comparison, the 'best' specification was the most popular choice.

Considering the CD/MP3 comparison, I'm surprised that it was only a 70:30 result and not something higher.
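Whether 70:30 is even a meaningful margin depends entirely on how many listeners produced it, which the thread doesn't say. A quick sketch of my own, with purely illustrative sample sizes, using an exact binomial test against a 50:50 "no real preference" null:

```python
from math import comb

def split_p_value(wins, n):
    """One-sided exact binomial p-value for a wins-out-of-n preference
    split against a 50:50 null (no real preference)."""
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2**n

# Illustrative sample sizes only; the actual listener count is unknown.
print(round(split_p_value(7, 10), 3))   # 0.172: 70:30 from 10 listeners proves nothing
print(round(split_p_value(28, 40), 3))  # well under 0.05: the same split from 40 is decisive
```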
 
Unfortunately, the Semmelweis reflex runs pretty deep in audio, and to dare to criticise ABX makes me a knuckle-dragging science hater among those who don't want the boat rocked. But the fact remains that if we use a test that has been shown repeatedly to be underpowered even with a sample size far larger than the ones we can muster, it should not be considered robust or reliable.

There is a remarkable *lack* of published blind test results. Personally I would love to see more and better testing and results. It seems to me though that the industry doesn't much like the objectivity that should result and would rather extol the virtues of high-res or rhapsodise about inky blacks. That holds us all back.

Tim
 
Put another way, I'd like to see all those who disagree with controversial null results (eg the famous Meyer/Moran test) to *do a better test* that disproves the result, not just to dream up rationales for why it might not be reliable.

Tim
 
Double-blind testing is OK, but if I want to be really sure about a purchase I always insist on triple-blind testing. It's the only way to guarantee success.
 
There is a remarkable *lack* of published blind test results. Personally I would love to see more and better testing and results. It seems to me though that the industry doesn't much like the objectivity that should result and would rather extol the virtues of high-res or rhapsodise about inky blacks. That holds us all back.

Tim

The charlatans will never support this sort of testing, and even non-charlatans like item audio do not see a commercial value. I think this means it ain't going to happen. Most purchasers use subjective assessment. This leads to some manufacturers going for "wow!" audio and visual characteristics. The only defence against this in the retail market is purchasers insisting on extended home evaluations ... which the non-charlatan dealers usually offer.

Nic P
 
Put another way, I'd like to see all those who disagree with controversial null results (eg the famous Meyer/Moran test) to *do a better test* that disproves the result, not just to dream up rationales for why it might not be reliable.

Tim

Except that if you run a 3-AFC, a duo-trio or a tetrad test and come up with results that do not reflect the findings of an ABX test, it gets rejected on the grounds of not giving the same results as an ABX test.

Been there, done that.

Edit: A Tetrad test is particularly worthy of inclusion here, because it is designed to retain statistical force with smaller sample sizes. We'd still need to be looking at 60+ subjects, but that's more reachable than 200+ subjects.
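The tetrad's sample-size advantage can be sketched numerically. Both tetrad and triangle have a 1/3 guessing rate, but Thurstonian modelling shows that for the same perceptual difference the tetrad yields a higher proportion of correct responses, which is where its extra power comes from. The per-trial rates below (0.45 triangle-like, 0.55 tetrad-like) are illustrative numbers of mine, not fitted values:

```python
from math import comb

def exact_power(n, p_alt, p_null=1/3, alpha=0.05):
    """Exact power of a one-sided binomial test with n subjects."""
    def upper_tail(k, p):
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))
    k_crit = next(k for k in range(n + 1) if upper_tail(k, p_null) <= alpha)
    return upper_tail(k_crit, p_alt)

def min_subjects(p_alt, target=0.8):
    """Smallest sample size reaching at least `target` power."""
    n = 5
    while exact_power(n, p_alt) < target:
        n += 1
    return n

# Same guessing rate, different per-trial hit rates for the same
# underlying difference (illustrative values only).
print(min_subjects(0.45))  # triangle-like: over a hundred subjects
print(min_subjects(0.55))  # tetrad-like: a few dozen subjects
```

The exact numbers depend on the assumed hit rates, but the roughly threefold reduction in required subjects is the pattern that makes the tetrad the more reachable design.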
 
Put another way, I'd like to see all those who disagree with controversial null results (eg the famous Meyer/Moran test) to *do a better test* that disproves the result, not just to dream up rationales for why it might not be reliable.

Tim

Not arguing for or against here, but the science/mathematics of statistical power relating to study sample sizes is considered "settled". So that is a perfectly valid and objective reason to raise doubts about the worth of studies whose sample sizes are too small to achieve statistical significance. There is nothing rose-tinted or wishful about it, just pure hard-nosed scientific principles being applied.

For something to be accepted as "fact" in scientific circles, the result must be based on studies that are:

- Objective: unbiased
- Repeatable: others must be able to obtain the same results
- Reliable: statistically reliable and accurate
- Peer reviewed: the methodology, conclusions etc. are all accepted

ALL of the above criteria must be met. They can't be picked and chosen to suit the expected outcome.
 
Except that if you run a 3-AFC, a duo-trio or a tetrad test and come up with results that do not reflect the findings of an ABX test, it gets rejected on the grounds of not giving the same results as an ABX test.

Been there, done that.

Got any interesting links to examples?

Thanks

Tim
 
Edit: A Tetrad test is particularly worthy of inclusion here, because it is designed to retain statistical force with smaller sample sizes. We'd still need to be looking at 60+ subjects, but that's more reachable than 200+ subjects.

Tetrad testing seems to be a very good way to go. It doesn't require the specification of a sensory attribute. Linky
 
Got any interesting links to examples?

Thanks

Tim

In the case of duo-trio, not any more. That all disappeared back when Dennis sold the mag group to Future.

As to tetrad testing, I want to research this more. It's not going well in terms of finding time and funding. No one has the money for research into research these days. I'm hoping that by shouting about it loud enough someone at one of the Universities will try running with the ball on this.
 
A Tetrad test is particularly worthy of inclusion here, because it is designed to retain statistical force with smaller sample sizes. We'd still need to be looking at 60+ subjects, but that's more reachable than 200+ subjects.
I think you're asking the wrong question, posing the wrong hypothesis.

One subject is enough, with a modest number of trials. We don't care whether the average human, or average hifi enthusiast can distinguish amps by ear, or whether there is general agreement on 'good', the question is far more basic. Is it possible for anyone to distinguish apparently competent amps/cables/CDPs by ear? Is the prose that's filled the Hifi press for 40 years anything more than fantasy?

Paul
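The "one subject, modest number of trials" design above is easy to quantify. A minimal sketch of my own: an exact binomial test against the 50% ABX guessing rate shows how many correct identifications a single listener needs before chance becomes implausible.

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided exact binomial p-value for a single listener's ABX
    score against the 50% guessing rate."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2**trials

# One listener, 16 trials: 12 correct rejects guessing at the usual
# 5% level; 11 correct does not.
print(round(abx_p_value(12, 16), 3))  # 0.038
print(round(abx_p_value(11, 16), 3))  # 0.105
```

So a single listener who can genuinely hear a difference only needs to go 12-for-16 to settle the "is it possible for anyone?" question in the affirmative.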
 
I think you're asking the wrong question, posing the wrong hypothesis.

One subject is enough, with a modest number of trials. We don't care whether the average human, or average hifi enthusiast can distinguish amps by ear, or whether there is general agreement on 'good', the question is far more basic. Is it possible for anyone to distinguish apparently competent amps/cables/CDPs by ear? Is the prose that's filled the Hifi press for 40 years anything more than fantasy?

Paul

The testers were incompetent or biased, the experimental method was flawed, the cables had different LCR properties, etc., etc., why would anyone let themselves in for the abuse from the objectivists when a positive result is published?

Nic P
 
I think you're asking the wrong question, posing the wrong hypothesis.

One subject is enough, with a modest number of trials. We don't care whether the average human, or average hifi enthusiast can distinguish amps by ear, or whether there is general agreement on 'good', the question is far more basic. Is it possible for anyone to distinguish apparently competent amps/cables/CDPs by ear? Is the prose that's filled the Hifi press for 40 years anything more than fantasy?

Paul

Under blind, level-matched conditions, yes. Hi-Fi Choice has been running blind, level-matched AB group tests for years. I worked on dozens of them... possibly into the hundreds. The differences were marked enough to be able to spot a model played in the morning session reintroduced in the afternoon session as a check.

The usual response to this is "well, that's what you get if you don't make the test blind enough".
 
Is the prose that's filled the Hifi press for 40 years anything more than fantasy?

Paul

I have found magazine reviews very useful for shortlisting equipment I want to audition. I know what aspects of audio reproduction I value, so can often tell from a review when a piece of kit is likely to be worth auditioning. I have seldom disagreed vehemently with a review. You also find reviewers whose opinions you come to trust.

Nic P
 
The usual response to this is "well, that's what you get if you don't make the test blind enough".
I'd rather know why you found differences.

And why wasn't the Harbeth Challenge a piece of cake for somebody?

Paul
 
I'd rather know why you found differences.

And why wasn't the Harbeth Challenge a piece of cake for somebody?

Paul

The poll I ran a year or so ago, asking if people heard significant differences between cables (interconnects and loudspeaker), had well over 200 votes, and somewhere in the high 70s percent felt they had heard significant differences. IMO the people who think there are no differences between competently designed electronics and cables are a very small, but vociferous, minority ... time for another poll?

Harbeth challenge ... do you really take such PR-based bollocks seriously!!!

Nic P
 
I'd rather know why you found differences.

And why wasn't the Harbeth Challenge a piece of cake for somebody?

Paul

Why we found differences... don't know, but we did find differences and those differences frequently did not equate to the 'audio status quo' (silver cables did not necessarily sound bright, Naim did not necessarily sound 'pacy'), although sometimes they did.

Expectation bias cuts both ways. Those who anticipate hearing a difference may hear a difference when there is none and those who anticipate no difference will be unable to hear differences even when they exist. But if those differences were heard when there were no differences to be had in reality, surely when products were resubmitted blind, the results would be inconsistent.

They weren't.

As to the Harbeth challenge, I have no idea why so few signed up for it. Alan Shaw got no takers for his challenge; even after modifying it to address the objections people raised (such as the comparator foot-switch and holding the test in a public place), I got as many people as could be counted on the fingers of one hand... that had been in a nasty accident with a threshing machine.

I would have expected those with strong opinions on either side of the fence would have rushed to support their point. But no.

As I said at the time, I suspect it's because those who want to fight are more keen to fight their corner than attempt to resolve the matter.
 
As I said at the time, I suspect it's because those who want to fight are more keen to fight their corner than attempt to resolve the matter.

There is nothing to resolve, other than the confusion that is confined to but a small handful of obsessive audiophiles.
 

