

YouTube audio quality

Jim Audiomisc

pfm Member
I've become curious about the audio quality of YT output following someone saying their 'opus' audio-only version is 'best'. I'm doing this having been given an example that has been uploaded, so I know what "went in". I'm curious to know what versions people choose when getting a YT item for its audio, and how they select this, etc.

Only just started, but this
http://jcgl.orpheusweb.co.uk/temp/YT-4Spectra.jpeg
shows a fairly basic comparison that does, indeed, show some differences. None of the versions I tried match the spectrum of what was input. The 'opus' looks the same at a glance, but closer inspection shows it isn't the same at HF.

Having had a recommendation for the opus, do people have other 'versions' they think would be the best? In terms of audio.
 
Are you comparing an AAC upload with an opus conversion of the AAC?
I was under the impression that perceptual codecs were supposed to be transcoded from PCM not transcoded between themselves and that they could produce funny results if you did.
 
I intend to check and compare a number of different YT 'output format versions'. Chose opus as a start because of a recent comment. However part of the interest is that uploads may well *not* be opus. Indeed, in this case the upload was 44k1 sample rate, when the 'opus' transcode YT offer is 48k!

The averaged spectra are just a "quick peek". I'm aiming to look in more detail than that, to see if I can create meaningful in-out 'diff' patterns as a residual exposing the changes. It's annoying when the sample rate is changed, though, as it means I'll have to use a trick like upsampling everything to a high (common) rate and then aligning the results for a diff. Makes life interesting, though. :)
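As a rough illustration of the "common rate" arithmetic for the 44k1 and 48k cases mentioned above, here is a minimal Python sketch. The function name `common_rate` is made up for the example, and it only works out the numbers; it does not do any resampling.

```python
from fractions import Fraction
from math import gcd


def common_rate(rate_a: int, rate_b: int) -> int:
    """Lowest sample rate that is an integer multiple of both inputs."""
    return rate_a * rate_b // gcd(rate_a, rate_b)


rate_in, rate_out = 44_100, 48_000

# Exact resampling ratio between the two rates, reduced to lowest terms.
ratio = Fraction(rate_out, rate_in)

# Lowest rate at which both versions could share one sample grid.
shared = common_rate(rate_in, rate_out)

print("ratio:", ratio)
print("shared rate:", shared, "Hz")
print("upsample factors:", shared // rate_in, "and", shared // rate_out)
```

The lowest exact common rate comes out very high, so in practice a smaller shared rate with a rational resampler may be more convenient. That is a design choice, not something stated in the posts above.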
 
Making some progress with this. The snag with opus seems to be that it simply "doesn't do" 44k1 sample rate. So if someone uploads a 44k1 file it gets rate-changed in addition to being lossy encoded. That said, the changes made by YT's mp4 encoding are bigger than I'd expected, even keeping the sample rate the same.
 
I was under the impression that perceptual codecs were supposed to be transcoded from PCM not transcoded between themselves and that they could produce funny results if you did.

That's part of the basic problem here. In general, people who get YT videos don't know what was submitted as the *source* for YT to use. In addition, even when the same codec is used, it may be that the source was at a higher bitrate. So far as I know, YT don't offer a "what we got" choice. (?)

So what I can do is limited to what I've been able to get as a reliable source for comparison. However the advantage I have is that the material is also on commercial CDs and the person who produced them sent me his YT submissions for my 'source'. I now have a 48k submission as well as one at 44k1.

As with MQA I think this will take more than one or two examples. But what I have is enough to keep me occupied for now. Gives some preliminary results, and lets me write some analysis software. e.g. Needed to write a cross-correlator as what YT outputs is *not* time aligned to what was submitted. So I need to determine the offset to sample-align comparisons.
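For anyone wanting to try the alignment step, here is a minimal sketch of a brute-force cross-correlator in plain Python. It is not the analysis software mentioned above; the name `best_lag`, the synthetic noise test signal and the 37-sample offset are all invented for the illustration.

```python
import random


def best_lag(reference, delayed, max_lag):
    """Return the lag (in samples) at which `delayed` best matches `reference`.

    A positive result means `delayed` starts that many samples late.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(reference):
            m = n + lag
            if 0 <= m < len(delayed):
                score += x * delayed[m]
        if score > best_score:
            best, best_score = lag, score
    return best


# Synthetic stand-in for "what went in": 2000 samples of noise.
rng = random.Random(1)
source = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

# Stand-in for "what came out": delayed, scaled, with a little added error.
true_offset = 37
returned = [0.0] * true_offset + [0.8 * x + rng.gauss(0.0, 0.05) for x in source]

lag = best_lag(source, returned, max_lag=100)

# Sample-aligned copy for a later diff (this slice assumes lag >= 0).
aligned = returned[lag:lag + len(source)]
print("estimated offset:", lag)
```

A direct search like this is slow for long files, where an FFT-based correlation would be the usual choice, but it shows the idea: slide one signal against the other and keep the offset with the largest correlation sum.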

One delay has been that the machine I'm using refused to start up last night! Had to take it apart and shuggle some connectors. Now working again. Fingers crossed. (but not well cross correlated. 8-] )
 
OPUS is weird about how it deals with sample rates. I suggest reading the documentation to understand it better. https://opus-codec.org/

You can also pick up opusenc while you are there.
 
I get an error from the above URL which tells me it can't be fetched.

However, I've just put this

http://jcgl.orpheusweb.co.uk/temp/YT-NOT-FINAL.pdf

up as a taster and to indicate the sort of approach I'm taking to doing comparisons and trying to find out what a pass through YT does. As per the label, it is FAR from final and contains all sorts of typos, missed explanations, etc. And it only shows results for one example thus far. But it should show the ways I'm trying to identify changes that occur as the audio goes through YT.
 
That said, the changes made by YT's mp4 encoding are bigger than I'd expected, even keeping the sample rate the same.
Opus applies lowpass filter at 20 kHz, from RFC 6716:
"(*) Although the sampling theorem allows a bandwidth as large as half the sampling rate, Opus never codes audio above 20 kHz, as that is the generally accepted upper limit of human hearing."

And from your pdf:
"However at higher frequencies the level of the error becomes a larger fraction of the input. And above about 16kHz the error level is actually bigger than the input signal power!"
That's probably shaped dither.
 
Opus applies lowpass filter at 20 kHz, from RFC 6716:

And from your pdf:

that's probably shaped dither.

Thanks. Sounds plausible/reasonable. Need to look at the specs, etc.

I'll deal with the output using aac next. Then hope to do some other comparisons, etc.

Curious, though, to kill the signal >20k and not then keep the dither in the HF > 20k which that implicitly assumes isn't audible. But maybe the codec has reasons for this.
 
Curious, though, to kill the signal >20k and not then keep the dither in the HF > 20k which that implicitly assumes isn't audible.
Sorry, not sure what you mean. The lowpass filter is applied before encoding and the dither is applied after decoding. Actually, whether dither is applied, and of what type, depends on the decoder that is used.

For example, I used SoX to generate white noise and 1 kHz tone. To be somewhat similar to your case, I generated them at 44.1 kHz sampling rate and then converted to 48 kHz. Then I used ffmpeg to encode them to opus. And finally I decoded them back to wav in a few different ways:
  • opusdec
  • ffmpeg without additional options (so AFAICT without dither)
  • ffmpeg with triangular dither
  • ffmpeg with output to float
And here are the results:
(Sorry, seems like the forum doesn't let me post URLs. Look there:
i.postimg.cc/cH2F4sG5/sin1k.png
i.postimg.cc/ZnNVW1TY/whitenoise.png
)
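For anyone without SoX to hand, similar 44.1 kHz test signals can be made with the Python standard library. This is only a sketch of an alternative way to get the input files, not the SoX commands used for the plots; the function names, amplitudes and one-second duration are assumptions.

```python
import io
import math
import random
import struct
import wave

RATE = 44_100  # sample rate of the test signals described in the post


def sine(freq_hz, seconds, amplitude=0.5, rate=RATE):
    """Sine tone as a list of floats in the range [-amplitude, amplitude]."""
    count = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate) for n in range(count)]


def white_noise(seconds, amplitude=0.5, rate=RATE, seed=0):
    """Uniform white noise as a list of floats in [-amplitude, amplitude]."""
    rng = random.Random(seed)
    return [rng.uniform(-amplitude, amplitude) for _ in range(int(seconds * rate))]


def write_wav(target, samples, rate=RATE):
    """Write mono 16-bit PCM to a filename or a binary file-like object."""
    ints = [int(round(s * 32767)) for s in samples]
    frames = struct.pack(f"<{len(ints)}h", *ints)
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(frames)


tone = sine(1000.0, 1.0)
noise = white_noise(1.0)

# Written to memory here; pass a filename such as "sin1k.wav" to get a file.
buffer = io.BytesIO()
write_wav(buffer, tone)
```

The resulting WAV files could then be converted to 48 kHz and put through whichever encoder and decoders are being compared.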
 
And here are the results:
(Sorry, seems like the forum doesn't let me post URLs. Look there:
i.postimg.cc/cH2F4sG5/sin1k.png
i.postimg.cc/ZnNVW1TY/whitenoise.png
)
Here's the images
sin1k.png

whitenoise.png
 
Sorry, not sure what you mean. The lowpass filter is applied before encoding and the dither is applied after decoding. Actually, whether dither is applied, and of what type, depends on the decoder that is used.
...
(Sorry, seems like the forum doesn't let me post URLs.)

I've not yet been able to get the opus org url to work here. So to check - You mean that opus applies the filter when encoding into opus. And applies dither when decoding opus into LPCM?

FWIW I haven't had a problem here with adding URLs.
 

Interesting to see how much HF gets added when a simple sinewave goes through an encode-decode cycle. The comparison is interesting because it shows that opusdec does more like what I'd have expected and uses noise shaping to poke the dither noise up into the HF. It still seems too low an order of shaping to push the 'flat' bit at HF fully into > 20kHz, but that probably doesn't matter much, audibly.

FWIW I used Audacity to get LPCM from the YT videos. I'll try using ffmpeg, etc, to extract the audio, etc. But I'll probably do that later on, as I doubt most people playing YT videos are explicitly using opus or ffmpeg, rather than whatever is embedded in their player. Which means checking what VLC does, I guess, as a representative example. For now, though, I'll focus on the aac 44k1 example for comparison, using Audacity again to get LPCM.

And I'll probably know more when I can get the opus org pages to work!
 
So to check - You mean that opus applies the filter when encoding into opus. And applies dither when decoding opus into LPCM?
Yes. I think it's typical for any lossy codec to do that. At the beginning of encoding filter out what is not audible or barely audible, to not waste bits on preserving that. At the end of decoding, which is usually done in float type, apply dither when converting to 16 bit int type.
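To illustrate that last step, here is a minimal sketch of float to 16-bit conversion with and without dither. It uses plain flat (unshaped) triangular dither, which is an assumption made for simplicity and not what opusdec or any particular decoder does; the name `to_int16` and the test tone level are invented for the example.

```python
import math
import random


def to_int16(samples, dither=True, rng=None):
    """Quantise floats in [-1.0, 1.0] to 16-bit integers, optionally with TPDF dither."""
    rng = rng or random.Random(0)
    out = []
    for x in samples:
        scaled = x * 32767.0
        if dither:
            # Sum of two uniform values gives a triangular PDF spanning +/- 1 LSB.
            scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        q = int(math.floor(scaled + 0.5))  # round to nearest integer level
        out.append(max(-32768, min(32767, q)))  # clip to the 16-bit range
    return out


# A 1 kHz tone with a peak of only 0.4 LSB, i.e. below one quantisation step.
rate = 48_000
quiet_tone = [0.4 / 32767.0 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(480)]

plain = to_int16(quiet_tone, dither=False)
dithered = to_int16(quiet_tone, dither=True)

print("non-zero samples without dither:", sum(1 for v in plain if v != 0))
print("non-zero samples with dither:   ", sum(1 for v in dithered if v != 0))
```

Without dither the quiet tone rounds away to silence; with dither it survives as a signal buried in noise. A noise-shaped dither would additionally push that noise towards HF, which this sketch does not attempt.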

FWIW I haven't had a problem here with adding URLs.
I've just created an account, so it probably doesn't trust me yet :)

I'll also throw it out there, just in case, that preserving audible transparency and preserving waveform shape is not necessarily the same thing. Lossy codecs focus on the former, so analytical comparison of waveforms may be of limited usefulness.
 
I'll also throw it out there, just in case, that preserving audible transparency and preserving waveform shape is not necessarily the same thing. Lossy codecs focus on the former, so analytical comparison of waveforms may be of limited usefulness.

Agreed. However my interest is in exposing the extent to which the resulting output which people get varies from one YT-mode choice to another. Also perhaps with the choice of 'renderer' program, *and* with the choice made by the person who decided what to upload. Part of the problem here is not knowing what was uploaded in many cases.

Once people can see the variability they can consider that perhaps no single YT output mode is always the best - audibly or by analysis - and why that may be the case. The ideal would be if YT offered a "what we got" option. As it is, people may be choosing on the basis of some initial experience that doesn't apply in other cases - which they then miss out on as a result if they want the really 'best' audible version in every case.

Maybe someone should organise some listening tests by uploading examples of various formats, then cross comparing by ear. However that's not for me as I can't claim to have particularly good hearing.
 
YT offers stunning 8K UHD video, so I assume they could offer Quad DSD or 32/768 if they wanted.

So what audio modes does that 8k UHD offer? I just list the modes available, and for the test files and others I've got, that list is in the pro tem document I uploaded. When you choose a video, it gets audio from that list so far as I can tell. Is the UHD different? If so, can you give a URL / PID for one example and I'll see what it lists on offer for audio.
 
OK, I've now put up this page covering the issue http://www.audiomisc.co.uk/YouTube/SpotTheDifference.html

It is just a brief lifting of the lid, as more examples, etc, would be needed. However other 'audio promotion' YT videos may do when I have a CD of what they offer. And if I can find my data on the flac Proms from 2017 I can do more 'reference' as well at some point. In the meantime people might themselves like to compare the RVW (and other) videos against CDs - by mk 1 ears as well as analysis. :)
 

