I disagree.
Room-related frequency response aberrations are significant, but here's the thing. I can continue to recognise different voices as they move from room to room. This isn't just a matter of continuous adaptation (my brain adjusting to their voice as I hear it move from room to room) - I can hear someone's voice in another room and recognise it with no prior experience of that person's voice in that room. In extreme cases I may not be able to make out what they are saying (if, say, the reverberation in the room were extremely long and powerful), but I'd still be able to recognise the individual voices.
If the changes from room to room were really so insurmountable, this should be impossible. Their voices will be shaped and influenced fairly heavily by the room in which they are speaking and its treatment, but their fundamental 'character' remains intact. The same applies to musical instruments (both live and recorded), animal sounds, kettles boiling, washing machines spin drying, the sound of typing, etc, etc.
Why would loudspeakers be an exception to this?
I disagree with you disagreeing, although I do agree with parts of what you say, and I support your right to disagree at all times.
Voices, like faces, are a special case of unusually sharp human perception: we see faces where there aren't faces. Despite the inherent similarity of all faces, every one strikes us as unique. The same would not apply quite as readily to differentiating pianos in different acoustics - although it's a testament to the brain's pattern-resolving ability that you can to some extent. Consider how much information about a recording can be derived from hearing it over the phone.
The bottom line is that everything's pretty good and we're counting angels dancing on pinheads.