Interesting stuff. I've never seen measurements of a headphone taken on multiple listeners' heads before. I think this could have some mileage in it, but only if you have volunteers from both ends of the bell curve w.r.t. head shape...
When looking over response graphs you do need to make a few adjustments to your expectations and understand the possible variations within the test being undertaken.
Usually when measuring headphone responses you need to take an average of multiple readings: you measure once, remove the phones from the rig, re-seat the phones and repeat the process several times. Some sites will publish the "raw data" to show how much variance you get between readings, but most only show the averaged result as a single graph. Headphones are very sensitive to positioning and placement on any head. Try it yourself one day: just move the phones forward, then to the back of the ear, or adjust the headband so they sit higher or lower over the pinna on the side of your head.
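To make the re-seat-and-average idea concrete, here is a rough Python sketch. The numbers are made up purely for illustration (they are not real measurements of any headphone), and I'm assuming the readings are simply averaged in dB at each frequency point, which may not be exactly how any particular site does it.

```python
from statistics import mean, stdev

# Hypothetical frequency points (Hz) and made-up readings in dB SPL.
# Each inner list is one measurement taken after re-seating the phones.
FREQS_HZ = [100, 1000, 3000, 8000]
reseats = [
    [92.0, 90.0, 98.0, 85.0],
    [91.0, 90.5, 97.0, 80.0],
    [93.0, 89.5, 99.0, 88.0],
]

def average_response(runs):
    """Arithmetic mean of the dB readings at each frequency point."""
    return [mean(column) for column in zip(*runs)]

def spread(runs):
    """Sample standard deviation at each frequency point (needs >= 2 runs)."""
    return [stdev(column) for column in zip(*runs)]

if __name__ == "__main__":
    rows = zip(FREQS_HZ, average_response(reseats), spread(reseats))
    for freq, avg, sd in rows:
        print(f"{freq:>5} Hz  mean {avg:6.2f} dB  spread {sd:5.2f} dB")
```

The "spread" column is the bit most published graphs leave out: it tells you how much the reading moved between seatings, so you can judge how much weight a small wiggle in the averaged curve deserves.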
Taking measurements directly from a real person's head is very tricky. It involves putting a small calibrated microphone in a defined position within the person's ear canal, and moving the microphone even a fraction of a millimeter will greatly impact the reading. Achieving a consistently repeatable result would be very difficult. The only proper way to do this would be to take casts of different ears, then fit each cast to the test rig with the microphone sitting in a set position for each test run. You would still need to take multiple readings to average out any variance.
Just for fun, there are also different standards for measuring headphone responses, each giving a slightly different reading. Some popular systems include the HMS from Head Acoustics, the GRAS series of rigs and the HATS (Head And Torso Simulator) range of rigs from Bruel & Kjaer. There are also different IEC standards that govern how the test should be performed in these rigs, as well as the geometry, reflectivity and hardness of the standardized ear sample used.
To make life even more interesting, many low-budget reviewers are also using a test rig called the miniDSP EARS (Earphone Audio Response System). While this rig is cheap, it is also not very well controlled. I am unaware of any international standards or norms that govern its calibration and implementation so as to achieve consistent results between different laboratories.
Rtings.com have a good video where Sam (their headphone techie) discusses the intricacies and vagaries of headphone measurement and rig calibration.
It's interesting to note that he doesn't actually mention which IEC standard he tests to. Based on the content of the video, they appear to be winging it to get results that look good across different use cases. It's not ideal, but at least they have disclosed their methodology and are consistent in their execution, so the results should be repeatable.
I suppose the take-home lesson from all this is that response curves should be taken as indicative rather than absolute. If all the graphs come from one lab (or reviewer), then you could expect any graph from that source to be directly comparable with the others. But a test done on the B&K HATS rig will differ from a test done on a GRAS rig. Likewise, tests done to different standards may show different results even when they are done on the same rig.
LPSpinner