It makes sense to have a sample rate that gives some headroom between the highest components of the music and the Nyquist limit. If nothing else, it allows for simpler filtering, etc. You also have to make some allowance for the *averaged* spectra not showing up rare transient events. The averaged result tells you the total amount of information, but not how to faithfully preserve it.
But I've never really understood the focus on 24bit. I have wondered if it arose for two reasons.
1) That people think in terms of bytes, so simply went from 2 to 3 bytes per sample. A decision shaped by the habits of computing in recent decades, which have largely forgotten systems that *didn't* use word lengths in multiples of 8 bits.
2) That people have no real understanding of how LPCM works when it comes to methods like noise shaping. So they don't understand how you can get audible resolution and dynamic range (e.g. a low noise floor) much higher than the bald value implied by 2^16 (about 96 dB).
Curious, really, given the enthusiasm of some for DSD, which is a classic exemplar of high rate, *low* bit depth. Although personally I think it goes too far and then encounters problems of its own. But that's another story...
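To make point 2 above a bit more concrete, here is a minimal sketch of first-order noise shaping (simple error feedback) in 'C'. It is only an illustration under my own assumptions, not anyone's production code: the names requantise_shaped, round_to_step and STEP are made up for the example, the samples are assumed to be 24 bit values held in int32_t, and dither is left out to keep it short and repeatable. A real shaper would normally add dither and use a higher-order filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STEP 256  /* one 16-bit LSB expressed on the 24-bit scale */

/* Round v to the nearest multiple of STEP (works for negative v too). */
static int32_t round_to_step(int32_t v)
{
    int32_t r = v % STEP;   /* in C99 the remainder takes the sign of v */
    if (r < 0)
        r += STEP;          /* now 0 <= r < STEP */
    int32_t down = v - r;   /* nearest multiple of STEP at or below v */
    return (r >= STEP / 2) ? down + STEP : down;
}

/* Requantise 24-bit samples to 16-bit resolution with first-order
   error feedback. The output stays on the 24-bit scale, so every
   output value is a multiple of STEP (its low 8 bits carry nothing). */
void requantise_shaped(const int32_t *in, int32_t *out, size_t n)
{
    assert(in != NULL && out != NULL);

    int32_t err = 0;  /* quantisation error made on the previous sample */

    for (size_t i = 0; i < n; i++) {
        /* Subtract the previous error before quantising. The error in
           the output then becomes err[i] - err[i-1], which pushes the
           noise towards high frequencies. */
        int32_t want = in[i] - err;
        int32_t q = round_to_step(want);
        err = q - want;  /* taken before clipping so it stays bounded */

        /* Clip to the 24-bit range, keeping a multiple of STEP. */
        if (q > 8388352)
            q = 8388352;
        if (q < -8388608)
            q = -8388608;
        out[i] = q;
    }
}
```

Feed this a steady input of 100 (less than half a 16 bit step) and plain rounding would return all zeros, i.e. the signal vanishes. The shaped version instead outputs a mix of 0 and 256 whose long-term average stays close to 100, so the low-level detail survives as an in-band average while the error is moved up in frequency.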
After that, marketing may come into it. People flocked to buy 'high power' vacuum cleaners when told they were about to be 'banned' (they weren't). Some makers had a great time selling cleaners simply because they were inefficiently designed and so used a lot of power to do no better than other cleaners that drew less power.
Maybe it's "bigger = better" marketing.
I can see the point of 96k/24 or higher when *recording* as it gives more 'space' for avoiding losses or mistakes. But once you *have* the recording I end up agreeing with Bob Stuart that you don't really need more than 96k/16, provided it is well produced.
The snag here is thus the usual one: the result depends on the care and skill of those making recordings and 'mastering' what gets distributed.
When I get a chance I'll add a webpage that gives some simple examples of 'C'-type code to make clear how anyone can write a bit freezing program. And add links to some of the example programs. But I'd hope that by now the basic idea and method are pretty clear to anyone interested who fancies writing a program to do this.
BTW I can get to the AES papers, etc. But not some of the other references. So I may ask for help with finding some of those.