From your description, I'm reading that the problem you believe is happening is that common-mode rejection on the Ethernet cabling is insufficient, so some noise bleeds across to the receiving equipment. If that is the case, then moving to optical would resolve the problem (though, as you point out, it may introduce other problems from the transceiver circuitry, which would need investigating).
It would be interesting to know whether it's possible to intentionally add common-mode noise to an Ethernet switch by, say, fiddling with the power rails. If so, it would probably make a decent test bed: you could vary the added noise and see how the receiver handles it.
My experience with SFP+ transceivers is limited to SR installations (typically within the rack). You would probably want to look at the different optical standards, as these might alter how the transceivers operate; for example, LR (single-mode) or LRM (multimode) might have different properties. Probably a lot of options to explore if you are so inclined.
As for jitter on packets being audible, that does stretch the imagination. You'd need the jitter to somehow negatively affect the data being written to the circular buffer inside the streamer, in a way that produces some sort of audible characteristic. Given that packets arrive in intermittent bursts, because the network has much greater bandwidth than the stream requires, I think it is unlikely this would bleed through in any way, even if you could come up with a mechanism (which I doubt).
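To make the buffering argument concrete, here is a rough Python sketch of a toy model, not of any real streamer. Everything in it is an assumption for illustration: the packet size, the prefill depth, the jitter figures and the `simulate` function itself are made up. Packets arrive at irregular times, but playback drains the buffer on its own fixed clock, so the samples that come out are the same either way, provided the buffer never runs dry.

```python
import random
from collections import deque

SAMPLE_RATE = 44_100   # assumed CD-quality stream
PACKET_SAMPLES = 441   # assumed packet size: 10 ms of audio
PREFILL_PACKETS = 5    # assumed: packets buffered before playback starts


def simulate(jitter_ms, n_packets=200, seed=0):
    """Return (played samples, underrun count) for a given arrival jitter."""
    rng = random.Random(seed)
    period_ms = 1000 * PACKET_SAMPLES / SAMPLE_RATE  # 10 ms per packet

    # Network side: each packet is delayed by a random amount, delivered in order.
    arrivals = []
    latest = 0.0
    for i in range(n_packets):
        latest = max(latest, i * period_ms + rng.uniform(0, jitter_ms))
        arrivals.append(latest)

    # Playback side: a fixed clock pulls one packet's worth of samples per tick.
    buffer = deque()
    played, underruns, next_in = [], 0, 0
    start_ms = arrivals[PREFILL_PACKETS - 1]  # start once the prefill has arrived

    for tick in range(n_packets):
        now = start_ms + tick * period_ms
        # Move every packet that has arrived by now into the buffer.
        while next_in < n_packets and arrivals[next_in] <= now:
            first = next_in * PACKET_SAMPLES
            buffer.append(range(first, first + PACKET_SAMPLES))
            next_in += 1
        if buffer:
            played.extend(buffer.popleft())
        else:
            underruns += 1                        # buffer ran dry: a dropout
            played.extend([None] * PACKET_SAMPLES)
    return played, underruns


clean, clean_underruns = simulate(jitter_ms=0)
jittery, jittery_underruns = simulate(jitter_ms=30)
print(clean == jittery, clean_underruns, jittery_underruns)
```

In this model, 30 ms of arrival jitter against a 40 ms prefill gives exactly the same output as no jitter at all. The only way jitter shows up is when it exceeds what the buffer can absorb, and then it is a dropout rather than a subtle change in sound. For scale, a 16-bit stereo stream at 44.1 kHz is about 1.4 Mbit/s, well under 1% of a gigabit link.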
In the terms of your article, I'm in the 'it can't make a difference' camp, but I'm totally fine with you exploring this and would welcome evidence (not anecdotal) that proves me wrong, as this is how progress is made.