Anyone tried blind testing DACs?

It could just make ‘baubles’ that aren’t foo.
Keith

Knowing audiophiles, you can sell them anything as long as you can spin a tale about it and then publish it in a bauble magazine, or get some bauble reviewer on a forum to wax lyrical about how it's just more visual, how it lifts a veil so you can see so many more colours, how it's engineered from unobtainium, how this bauble is fashioned from 1-inch-thick anodised aluminium, and how it comes with a stack of three other baubles that work in system synergy. But you can't appreciate the bauble straight away, no Sir: it needs a burn-in lasting just long enough to outlast your window for a full refund, and you must 'live with it' on your Xmas tree for a few weeks to really appreciate it.

The bauble has been around for over a hundred years, yet again this year someone working in their garage has found a new technology to better last year's bauble, and does so every year, each one surpassing the latest and greatest. This breakthrough in bauble technology can be viewed at this year's bauble show over a beard, pie and ale.

Everything from stars, black holes, cells, pharmacology, quantum computing, engineering, etc. thrives on and aims for the top of the hierarchy of hypothesis testing, preferably the gold standard of heavily replicated, randomised, double-blind controlled trials. Yet audiophiles think audio reproduction of Norah Jones or DSOM is beyond bias and cohort correction because there is something elusive, magical and incomprehensible about it.

Finally, if you dare to suggest that the bauble is not better and not worth the five-figure price, then you must be a nasty objectivist troll and should be packed off to the local psych ward under section 3 of the Mental Health Act or, worse still, have your post wiped by the moderator who knows best when it comes to policing free speech.
 
Hg,
You can make a scientific test that's good from the point of view of "double blinding".

That does not help if everything else about the test is crap. What is the ambient noise in the room? What recordings? Are the listeners comfortable, and have they been given X hours to familiarise themselves with the system/room? Is each listener limited to no more than (say) four rounds of comparison per (say) day? Do we have enough participants to make that work? What is the state of the listeners' hearing? There are probably many more important aspects to consider!

Failure to disprove the null hypothesis is of no scientific value unless everything else about the test is spot on. It's not sufficient for it to be double blind.

This is what we have no chance of doing at home.

What we can do is listen double-blind at home, and if we can't disprove the null hypothesis blind, but we hear huge differences sighted, that helps! It suggests bias in our sighted perception. That's very useful knowledge, and that's why I encourage blind tests.

But it doesn't - it cannot - scientifically prove that no difference is audible because it's an amateur test.

Yes all the same problems plague sighted tests as well. This means certainty is beyond our grasp at home. I want to know why people think certainty must be possible in every situation.

A high degree of certainty must be earned, which is why gold-standard scientific trials are a big deal.

Exactly. Spot on. There are soooo many points being missed here.

All this blind listening is all very ‘science experiments R us’ :rolleyes:

Just out of interest....if you do visit a dealer and choose your favourite product ‘blind’...I am assuming that you take it home and always listen ‘blind’ also? Because if you don’t.....well, that doesn’t take a big pile of figuring out....
 
I rather suspect most contributors on here are of an age where the scientific method was not formally taught to them as teenagers. But whatever. I've thought a little about this over time, and I'm aware of the possible biases in sighted listening. But the advocates of blind testing seem almost wilfully blind to the possibility of bias or other confounding factors in a blind test, and they decline to control for any such bias by first establishing that the test methodology reliably (to the desired standard of statistical probability) distinguishes between two known differences of a similar degree to those you want the test to detect.

I am aware that when listening blind, my 'mode' of listening is different to when I am relaxed and enjoying my music, and also different to when I am listening in a sighted audition. I find myself in 'analytical' mode, which is somewhat different to how I listen normally. I would design a blind test to see if this was a confounding factor, by first using the proposed test methodology to distinguish between two known, different, sound sources, perhaps CD vs a decent bitrate MP3 or equivalent. If the test showed I could identify the difference reliably, then the test is likely to be sufficiently sensitive. I simply can't understand why those who demand blind tests, or advocate them as a gold standard, don't do this. I can only surmise it is because they have doubts that the differences would show themselves.

In the second 2014 DAC bake-off / blind tests we did a control to ensure all participants could (blind) tell the difference between two close volumes (one step on the pre-amp, which was 1 dB I seem to recall - a surprisingly subtle change) - all passed with ease. All failed to distinguish the DACs.
 
As far as I can see you have introduced an idea of blind test stress which may even be subliminal. But you have not provided any evidence that this phenomenon actually exists and influences the blind test results. So far it looks little more than an ad hoc argument to rescue the subjectivist position. In addition you seem to be shifting the burden of proof by asking others to disprove the ad hoc argument.

Heh...I suspect that you've never been subjected to actual scientific peer review before. Reviewers suggesting new/repeated experiments to account for possible confounding factors that the authors had not considered is par for the course. The onus is on the authors to satisfactorily cover all the bases to prove their point. The reviewers are never asked to conduct the experiments themselves.

Lack of solid peer review is reason enough to take objective measurements of hi-fi gear with a massive grain of salt. And forum comments don't count as rigorous peer review.
 
When you decide to listen to and like a piece of music, do you do so from a personal POV or one with a peer-reviewed history?

I assume that was directed at me but I don't understand the context of the question given what I wrote. I clearly don't hold the hobbyist attempt at "science," with its utter lack of rigor or review, in very high regard so to me the average objectivist stance doesn't carry any more weight or reliability than the average subjectivist stance. In the end, I'd rather just sit back and enjoy the music, with all my cognitive biases, unbalanced hearing, fully acknowledged imperfections in my listening environment, etc.

Iff there were rigorous, peer reviewed scientific studies assessing these questions, then I would certainly pay attention to them at purchase time. But far more time would be spent simply listening to music rather than analysing.
 
Hg,
You can make a scientific test that's good from the point of view of "double blinding".

That does not help if everything else about the test is crap. What is the ambient noise in the room? What recordings? Are the listeners comfortable, and have they been given X hours to familiarise themselves with the system/room? Is each listener limited to no more than (say) four rounds of comparison per (say) day? Do we have enough participants to make that work? What is the state of the listeners' hearing? There are probably many more important aspects to consider!

Failure to disprove the null hypothesis is of no scientific value unless everything else about the test is spot on. It's not sufficient for it to be double blind.

This is what we have no chance of doing at home.

Nonsense. Competent people are perfectly capable of performing satisfactory audibility experiments in their homes. It will take a bit of time, thought and often the help of others to measure what is wanted, but it is nothing difficult. The conditions are part of the experiment and require documenting. At a later date perhaps one or two things might have been done differently, but that doesn't invalidate the results of the experiment. Sound perception is not an exact quantity like sound pressure. Different conditions measure different sound perceptions, not invalid sound perceptions.

Like the example of scientifically invalid hypotheses discussed above, your null hypothesis is another excellent example of how to sell nonsense to scientifically illiterate audiophiles. If the objective of an audibility experiment is to test the hypothesis "people can identify a difference to a given level of confidence", then the results would be analysed to prove or disprove this. This is the common hypothesis tested. If the objective of the experiment is to test the hypothesis "people cannot hear a difference to a given level of confidence" (i.e. the results are drawn from a random population), then the results would be analysed differently to prove or disprove this. This is not high-level stuff but A-level school statistics. The second analysis is a common one, used for example to check if a die is loaded.
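For what it's worth, here is a rough sketch of that second analysis in Python (my own illustration with made-up counts, not anything anyone in this thread has run): checking whether a set of results is consistent with pure chance, exactly as one would check whether a die is loaded.

```python
from scipy.stats import chisquare

# Loaded-die check: hypothetical observed counts of each face over 120 rolls,
# compared with the 20-per-face expectation of a fair die.
rolls = [18, 22, 19, 25, 16, 20]
stat, p = chisquare(rolls, f_exp=[20] * 6)
print(f"die: chi2={stat:.2f}, p={p:.3f}")     # a high p means no evidence of loading

# The same idea applied to pooled listening trials: do the
# correct/incorrect counts look like 50/50 guessing?
answers = [108, 92]                            # hypothetical: 108 right, 92 wrong in 200 trials
stat, p = chisquare(answers, f_exp=[100, 100])
print(f"trials: chi2={stat:.2f}, p={p:.3f}")   # a high p is consistent with guessing
```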

If a person were genuinely to try to design an audibility experiment and followed the scientific method as taught to all school children, things like what hypothesis to test, what to measure (i.e. what sound perception means), and more would come out. It is not difficult, but it does require the person to genuinely want to perform a scientific experiment. This is likely to be rare among those audiophiles who have faith that expensive audiophile hardware is better and will be more motivated to build support for this belief than to find out what is actually going on in a cold, neutral, scientific manner.

What we can do is listen double-blind at home, and if we can't disprove the null hypothesis blind, but we hear huge differences sighted, that helps! It suggests bias in our sighted perception. That's very useful knowledge, and that's why I encourage blind tests.

Well, if people get pleasure from audibility experiments then go for it. Looks too much like work to me, but we all get different things from our hobbies. Unlike many audiophiles, however, I do have confidence in most of the already published audibility experiments, which are aligned with my experience and understanding of sound perception, so I have no motivation from this quarter. If a difference is so small it requires an audibility experiment to confirm its existence or not, then I am unlikely to be bothered and will opt for the cheapest, or decide on differences in some other parameter not directly associated with the sound.
 
The way to prove you can hear a difference is to listen in a series of trials and keep score of the trials where you succeed (e.g. ABX).

The null hypothesis is the assumption that no difference is audible, in which case the score would tend to be around half the trials (the listener just guessing).

You can calculate the chance of random guesses meeting or exceeding your score by fluke - it will depend on the number of wins AND the number of trials (stats). A chance of 5% or less is a significant result. (Edit:) I think to convince a DAC sceptic we'd need much less than that.

It's up to the listener to prove audibility. But giving them a fair shake is absolutely key to this. One important problem is the rapid onset of fatigue over multiple trials, which is why proper tests use many listeners. You need to consider the music too. I recall a blind test of MP3 vs FLAC where I scored no better than random guesses (and it felt like it!) with two tracks. The last track was better and I scored 9/10, which was a significant result. For one listener to score that over 10 trials means the difference was gross (trials 1-5 felt very easy; around trial 7 I was struggling; by trial 9, the one I got wrong, and trial 10 I was at sea). And as the difference gets more subtle, the stress goes up - do the maths.
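For anyone who wants to actually do the maths on that 9/10 score, here is a small Python sketch of the calculation described above (the function name and the second example score are mine, assuming the usual 50% chance of guessing each trial correctly):

```python
from math import comb

def p_by_fluke(wins, trials, p_guess=0.5):
    """Chance of getting `wins` or more correct out of `trials`
    purely by guessing (one-sided binomial tail)."""
    return sum(comb(trials, k) * p_guess**k * (1 - p_guess)**(trials - k)
               for k in range(wins, trials + 1))

print(p_by_fluke(9, 10))   # ~0.011, i.e. about 1.1% - comfortably below the 5% threshold
print(p_by_fluke(6, 10))   # ~0.377 - a 6/10 score would prove nothing
```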

I've spent more time doing blind tests than arguing about them.
 
In the second 2014 DAC bake-off / blind tests we did a control to ensure all participants could (blind) tell the difference between two close volumes (one step on the pre-amp, which was 1 dB I seem to recall - a surprisingly subtle change) - all passed with ease. All failed to distinguish the DACs.

Very good. Shows that the system works.

Tim
 
The way to prove you can hear a difference is to listen in a series of trials and keep score of the trials where you succeed (e.g. ABX).
Except for all the experiment design criticisms noted previously.

As an alternative where participation is more in line with normal listening, you could divide a population into two groups who listen to one piece of music via one item of equipment and another piece via a second item, the two groups having different combinations of music and equipment.

Each group grades the two pieces on the quality of recording and the quality of the performance on a numerical basis; differences in the grading of each piece between the two groups can be assessed using a chi-squared test.
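A minimal sketch of how that assessment might look in Python, assuming the grades are bucketed 1-5 and using made-up counts (the design above doesn't specify a grading scale, so this is purely illustrative):

```python
from scipy.stats import chi2_contingency

# Hypothetical counts of grades (1-5) given to the same piece of music,
# heard by group 1 on one item of equipment and by group 2 on the other.
grades_group1 = [2, 5, 12, 15, 6]
grades_group2 = [3, 6, 11, 14, 6]

# Chi-squared test of independence: do the two grade distributions
# differ by more than chance would explain?
stat, p, dof, expected = chi2_contingency([grades_group1, grades_group2])
print(f"chi2={stat:.2f}, p={p:.3f}")   # a small p would suggest the equipment mattered
```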
 
Interesting idea. It feels like a huge number of graders would be needed to find a signal - naturally a small signal, since these are the ones we doubt - in what would be a lot of noise (variations in equipment, conditions, and simply the way people subjectively grade, concentration level, etc.). Difficult in a different way.

I've not found an easy way to do it properly - I've researched, and I've tried doing things.
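To put a rough number on "a huge number of graders", here is a back-of-envelope power calculation (entirely my own assumptions about effect size, significance level and power - nothing from the thread):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size_sd, alpha=0.05, power=0.8):
    """Normal-approximation sample size for comparing two group means,
    with the true difference expressed in standard-deviation units."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_a + z_b) / effect_size_sd) ** 2)

print(n_per_group(0.5))   # ~63 graders per group for a "medium" difference
print(n_per_group(0.2))   # ~393 per group for a "small" one
```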
 
You would probably need to run the test with people coming to a common system and have a large testing population, but then operational convenience is not evidence that an experiment is appropriate.
 
The way to prove you can hear a difference is to listen in a series of trials and keep score of the trials where you succeed (e.g. ABX).

The null hypothesis is the assumption that no difference is audible, in which case the score would tend to be around half the trials (the listener just guessing).

You can calculate the chance of random guesses meeting or exceeding your score by fluke - it will depend on the number of wins AND the number of trials (stats). A chance of 5% or less is a significant result. (Edit:) I think to convince a DAC sceptic we'd need much less than that.
Since you've made an effort with an edit, I will restrain myself and repeat what I said above. Usually in audibility experiments one wants to test whether something is audible, and so this defines the hypothesis and the relevant statistical test. You seem to understand this, and that failing to demonstrate a difference does not mean no difference is audible. However, what you appear to be working hard to push away is the blindingly obvious step of performing a check for what you actually want to know. This is a different statistical test, one that checks to what degree of confidence the results are random, but it is a common one, used to check whether all sorts of things that should be random actually are.

[Deleted p*ss taking of audiophile nonsense which becomes harder and harder to resist as contributions in threads like this grow. It may be time to bow out of this one and wait for the next one, or the one after that, or...]
 
This is a different statistical test, one that checks to what degree of confidence the results are random, but it is a common one, used to check whether all sorts of things that should be random actually are.
Hg, this is the bit I'm not following. Can you unpack this, please?
 

