Although the paper is not explicit about it, I think it may be flawed for an entirely different reason:
"Most of the tests were done using a pair of highly regarded, smooth-measuring full-range loudspeakers in a rural listening room"
The picture this paints is that a significant number of the 60 test subjects did a significant portion of their listening in one particular room with one particular system.
Pertinent questions are: how acquainted were the listeners with this room and system, and how well rested were their ears after having traveled to this rural place?
Habituation is a strong force in perception. For instance, a clear and stable stereo image can generally only be perceived when the listener knows the system and room very well, or when both are of an exceptionally high standard.
So, given that the differences between hi-res and well-done CD-res are never more than subtle, isn't there a chance that a sizeable portion of the tests was marred by participants not being receptive to small differences, for the reasons above? This would seriously skew the outcome.