Well, the problem is knowing, or agreeing, where the limit of audibility is. If two amplifiers, correctly tested, measure identically, then I think we can be sure that any differences people hear are imagined.
BUT, if there are measured differences, it is very difficult to assert confidently that they lie below the limit of audibility for all listeners (or for the average listener, or for the trained one) without separately testing what that limit of audibility is. Of course we do have evidence for some limits: we can't hear above a certain frequency (a limit which falls with age), we can't hear frequency-related phase, we can't reliably level match by ear to better than about 1.5 dB, etc. But in general, if a test shows a difference, say in frequency response in the audible region, however small, how can we be sure that it will be inaudible? The assertion that all competent amplifiers into good speakers in a real listening environment are audibly identical can only be shown to be true by an extensive series of DBTs, which ain't going to happen (and wouldn't be accepted by the extreme subjectivistas anyway), and so the argument goes round.
My position is that amplifiers that measure the same sound the same, but that we don't really know at what level measured differences become audible.
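
To give a feel for why an "extensive series of DBTs" is a tall order, here is a rough sketch of the arithmetic behind a single ABX-style run. It assumes each trial is an independent 50/50 guess when no difference is audible, and it uses the conventional 5% significance threshold; the function names and that threshold are my own choices for illustration, not anything standardised.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by pure guessing.

    Assumes every trial is an independent coin flip (p = 0.5), i.e. the
    listener genuinely cannot hear a difference.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Upper tail of the binomial distribution with p = 0.5.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


def min_correct_for_significance(trials: int, alpha: float = 0.05):
    """Smallest score that guessing would reach with probability <= alpha.

    Returns None if even a perfect score is not convincing at this alpha.
    The default alpha of 0.05 is an assumed convention, not a rule.
    """
    for correct in range(trials + 1):
        if abx_p_value(correct, trials) <= alpha:
            return correct
    return None


if __name__ == "__main__":
    for n in (4, 10, 16):
        print(n, "trials -> need", min_correct_for_significance(n), "correct")
```

Under those assumptions a listener needs 12 right out of 16 before guessing becomes an unlikely explanation. Note that this only works in one direction: a failed run does not prove a difference is inaudible, which is exactly why settling the general question would take far more testing than anyone is likely to do.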