Music Quality: Are Engineers Modeling the Human Ear Wrong?

Copyright claimed by Ron Plachno, author, as of date written October 16-19, 2013
Note that this article may be published in a new book and copyrighted through the Library of Congress

Are Engineers Modeling the Human Ear Wrong?

by Ronald J. Plachno
October 16-19, 2013

As I do home recordings and wonder which compression method for music is best, mp3 versus m4a and other types, a bigger issue occurs to me from my listening tests. This is my question:

THE QUESTION
Digital audio seems to reduce music quality, since it seems to combine multiple audio sources that the human ear hears independently into a single source which is simply less impressive to our more sophisticated ears.

THE EYE ANALOGY
In my opinion, science and engineering modeled the eye wrong in a similar manner. While the invention of movies and TV, which are optical illusions, was great, they underestimated the eye. Movies settled on 24 still frames a second. Since the human eye retains an image, we saw that as continuous motion. TV, due to interlacing, had 30 frames per second. For a long time it was believed that was all the eye could handle. But when progressive scan came out with 60 frames a second, it looked so great to some people that some salespeople called it "High Definition" - which should really refer to the number of lines down a screen, such as 1080, and not how often the screen changed (got refreshed). And now I understand some screens refresh at 120 frames per second - or more. How good is this eye of ours?

MY EAR OPINION

Even though we have only two ears it seems to me that each ear can hear many sounds for some reason quite independently. If those sounds have not yet been combined into a digital single sound, those independent sounds seem richer. Why do I say that? Some items below:

LIVE MUSIC - whether a band, an orchestra or a combination always seems far richer to me in audio quality and complexity than a two speaker stereo system.

5.1 / 7.1 SURROUND SOUND - If we have only two ears, then why is it that 5.1 surround sound sounds far more amazing to us than the two point source sound of stereo? We seem to be able to hear those 6 sounds independently... somehow. And we view it as an improvement over two sound sources. Of course for 7.1 it is 8 sounds.

NORMAL ROOM SOUNDS - I do believe at times I hear multiple sounds in a room and can tune into one of them and almost ignore the others. This means our ear is not simply combining those sounds; we hear them independently somehow - as if each ear perhaps has 50 microphones and not one. For example, I might hear a football game, my spouse, a refrigerator making noise, people talking in the background, a child's toy, and a bird singing outside, and somehow I seem to be able to tune my ear radio to the sound source I wish. I have never been a dog, but I think a dog does the same thing. They might be hearing 50 sounds, but one of those is of interest and they perk up their ears and tune their ear radios to that one sound.

NORMAL OUTDOOR SOUNDS

When I was young, our family lived across the street from railroad tracks, which at the time even included steam locomotives. Later, in the Chicago area, we had a house in the flight pattern of O'Hare Field, which at the time was the busiest airport in the US. For a time we lived in England in the path of the supersonic Concorde. At times people would ask us if the deafening sounds bothered us. We would ask "What sounds?" And again it appeared to me that our ears hear sounds separately. If something is quite normal as a background sound, our brains might even ignore it. Now if we heard just one sound source per ear, with the Concorde mixed in with the sounds around us, then it seems clear to me that I could not ignore it. It must be an independent sound, processed independently by my ear and brain combination, for me to ignore it.

AUDIOPHILES WHO LIKE TAPE BETTER - Back when I used to read audiophile articles, many audiophiles seemed not to trust digital at all. Some of them talked about the "Fat sound" of tape, where they said nothing was being thrown away. They said as soon as someone deals in an algorithm and goes digital, they are combining things and throwing sound away. Well, my thought is - what if they are right and wrong at the same time? Perhaps engineers make no errors at all in what they throw away - but it is the process or method of combining that is bad?

MIDI MUSIC PLAYING THE FIRST TIME - Other than live music, the only time I recall in the last ten years being "enthralled by music" was some of the times I played my midi music back and converted it to audio for the first time. Now my system is limited to 28 separate notes at a time and no more than 16 instruments for the midi portion. But let us say that 10 instruments playing 20 notes in total are far more than the two sound sources I normally hear from stereo. Now someone can argue that even though those sounds are being made fresh by a Roland Sound Canvas that converts midi to audio, the end result still must come out of two speakers. I agree with that. But I think that it is an analog addition of those sounds and not a digital one. I also now believe that speaker distance enters into all this somehow. But anyway, what I am saying here is a value judgment that I am only later trying to understand. But yes, a number of times when playing back midi to speakers for the first time, I felt "wow!! - a lot of sounds there!".

MY BELIEF OF THE EAR ERROR I BELIEVE ENGINEERS MAKE

Having been an engineer and around them a lot, I have at times seen engineers get the complex thing very right but miss something outside what they consider their own project, since they are not "looking in that direction". I think that is what we have here. I believe that engineers may have looked at an oscilloscope or frequency analyzer to see what the composite of 10 audio signals in a room looks like and then figured out how to combine them. The problem is not what they did. The problem is that they may have assumed our ears combine audio like an oscilloscope and/or frequency analyzer using a single microphone, whereas perhaps that is not what our ears do at all. Our ears may have the equivalent of "many microphones" that can separately deal with many sound sources at the same time.

MY POINT

I think our ears are far better than we gave them credit for. I believe an ear does not hear just one sound per ear but can distinguish many sounds from many sources. In that manner, as soon as sounds are combined digitally much is lost. And my ears tell me that much .... is lost ... after digital combining. I think that even the very simple example of why 5.1 surround sound or 7.1 surround sound is more impressive proves that point all by itself.

FURTHER OBSERVATIONS

I find that listening to a midi song playing through speakers before mix down can give results I will never hear after mix-down to digital. I seem to hear almost all the sounds distinctly and clearly, and therefore I sometimes foolishly think that I have adjusted my volume settings, and the volume changes during the song, correctly. But when I get to the digital mix down, some of what I thought I heard just a bit ago seems missing. Actually, in fairness, it seems to be there but at a lower sound level. It is as if, being now included in one sound source, it has to compete far harder in volume level to be heard than when it was a distinct and separate sound.
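
The arithmetic behind that last point can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any real mixing software: the three sine tones, their amplitudes, the 8000 samples-per-second rate, and the helper names (sine_track, mix_down, rms, level_db) are all made up for the example. It adds three tracks sample by sample into one waveform and then measures how far below the full mix the quietest track sits. It says nothing about how the ear separates sounds; it only shows the sums.

```python
import math

SAMPLE_RATE = 8000  # samples per second; an arbitrary value chosen for illustration


def sine_track(freq_hz, amplitude, num_samples, sample_rate=SAMPLE_RATE):
    """Return one hypothetical track: a sine tone as a list of float samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(num_samples)]


def mix_down(tracks):
    """Sum the tracks sample by sample into a single waveform."""
    return [sum(samples) for samples in zip(*tracks)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def level_db(part, whole):
    """Level of one track relative to the full mix, in decibels."""
    return 20 * math.log10(rms(part) / rms(whole))


# Three made-up "instruments": one loud, one medium, one quiet.
loud = sine_track(220, 0.5, SAMPLE_RATE)
medium = sine_track(330, 0.3, SAMPLE_RATE)
quiet = sine_track(440, 0.05, SAMPLE_RATE)

mix = mix_down([loud, medium, quiet])

print("samples in the single mixed waveform:", len(mix))
print("quiet track relative to the mix (dB):", round(level_db(quiet, mix), 1))
```

With these made-up amplitudes the quiet tone ends up roughly 21 dB below the combined signal, even though on its own it is a perfectly clean tone. After the sum there is one list of numbers, and the quiet tone is a small part of each of them.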

Also, oddly, headphones used for sound mixing seem closer to the final digital mix-down than to my speaker system, which is amplifying what are likely analog combined signals. I am not sure why. Perhaps for our ear to hear distinct audio sounds, the source must be at a distance or at some different angle for us to distinguish them. I do not have the answer. I can only say that there are big differences to my ear between analog sounds from speakers, headphones and digital mix-downs.

WHAT THIS MEANS TO ME REGARDING COMPRESSION

At the risk of getting everyone mad at me, I think we lose much as soon as we digitally mix from many different sound sources to a single source sound. And I think the problem is that no matter how great the genius of the engineer, this process seems to take multiple and distinct sound sources and combine them into a single very complex sound source. And in that process, the distinct nature of the different sounds seems lost, and now what is left is competing far, far more for volume settings to be heard - or to be forever lost in the background. Of course some types of compression are better than others, and I have also looked into that as have others, but digital music seems to have one thing in common - to my ears it seems a single audio source rather than the joy of many distinct audio sources.

To me, digital mixing of music is like a photo of the beach. Now depending on codec and compression, some photos will be brighter or have more contrast, but they are still photos. They are not the beach. At the beach our eyes see things in three dimensions. We hear beach sounds. Our toes feel the sand. We feel the wind and see the birds fly. A photo might be the best we have, but it is not the beach. Of course an engineer can say that is not a fair analogy since many senses are included there. Well, customers are not always reasonable people. But as engineers we should try to get as close as we can get.

MY SUGGESTED FIX

If the above is correct, then the issue is not a problem in making a technical error in combining - it is the combining approach in the first place. If I am right in modeling the ear as many microphones, then digitizing it would mean perhaps finding a way to keep 20 or more sounds separate and not combine them into a single waveform. That is easier to say than do. But perhaps one should start with a tape and see why some say it gives a better output. We can say those audiophiles are lunatics, but what if they are right but just do not know how to state the issue well?

On the other hand, one could take a group of beings that likely will not begin with the answer and yet be quite fair. One could do an experiment with dogs. Dogs seem quite good at hearing small sounds, such as the sound of their human's car as it pulls into the driveway. One could try multiple recording methods, and then watch the reaction of the dogs to sounds played back.

Yes, I know, some will say that I am mad for not "going with the flow here". But I think there is something here - otherwise, why do some think 5.1 surround sound is different enough from stereo to buy 4 more speakers?

------------

THE SUBJECTIVE SIDE

The above is all about the technical side only of musical recording, and it is almost solely based on doing home audio recordings and listening at various stages. However, at least for my tastes, musical enjoyment becomes far more complex based on a number of subjective issues. I realize that for engineers, this may be where the "good stuff" ends, since subjective is subjective, and yes I agree, engineers can do very little about fixing subjective items they cannot control.

What you Begin With

For my tastes it is still far more important what you begin with than the recording method itself. I see that so much in home recording. But first let us take the case where the "who", the person or group recorded, makes a huge difference to me. If my granddaughter sent me a song on mp3, of course I would listen to it, and right away, and treasure it. And I would likewise do that for relatives and friends as well. That would be far more important to me than listening to a higher quality recording of, say, a group I did not know or like.

There are also some professional groups that may not be available in high quality recordings. One of those, a group I believe called Los Admiradores, long ago came out with a phonograph record called "Bongos, Bongos, Bongos" that I admired very much. Great orchestration and clever song arrangement for what appeared to be a 6 or 7 piece band. To my knowledge this record never made it to what even I agree is the higher quality digital sound. And so I spent honestly hours trying to record a phonograph record, using scratch removal software and effects and the like, to try and get a modern copy. But even with the problems, I would far rather listen to their music in low audio quality than to many items of not so good music in a better recording quality method.

Mix Down Quality Matters More

Time and again I find that the quality of the mix-down from multiple sources into a single digital stereo output matters far more than what compression method is used later. It would be convenient and nice for me to blame the compression method, but I find too often it was the person doing the mix down who had far more to do with quality than what was done later. Of course this is generally a well known engineering principle - that you have to start with something good in the first place, because seldom can you make bad sound better. You can often make it worse, or if you are really good, it can stay the same, but better than the original is often not a reasonable goal.

And then there is "fix it in the mix"

"Fix it in the Mix" is a recording person's expression for trying to fix less than perfect sound recordings at the same time we mix down to the final stereo output. Perhaps a singer coughed; if so, our deft fingers may shove the volume for them (that single vocal track) down right at that time, for just a second, if we can avoid cutting into their singing. Perhaps there was noise at the beginning or end. That is easy, we kill those sounds. What if the singer did great but ended a phrase wrong in just one place? No problem, we lower the volume right at that point. And in my case, since I am a one person band, all of these issues come from me.

Well, okay, now let us say someone did a recording and they are not Frank Sinatra backed up by the real Paul Mauriat Orchestra or Percy Faith Orchestra. Okay, there are going to be days when a less than perfect group may actually sound better with some nuances missing. That means that an mp3 of them might in fact sound better since it eliminates some of the problem sounds and instead concentrates on the better sounds. And yes, unfortunately, I have seen that also - and most likely with my own recordings. Anyone remember the Dolby days when Dolby and DBX were used to kill noise? Some days they killed some music along with it. Some days that can be a feature and sometimes not.
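
Going back to the volume moves described for "fix it in the mix", here is a minimal sketch in Python of pulling one track down for a moment before the tracks are summed. Everything in it is an assumption made for illustration: the sample values are invented (the two 0.9 values stand in for a cough), and duck and mix_down are hypothetical helper names, not functions from any real recording program. A real mixer would also fade the gain in and out rather than switch it abruptly, which this sketch leaves out.

```python
def duck(track, start, end, gain):
    """Return a copy of track with the samples in [start, end) scaled by gain."""
    return [s * gain if start <= i < end else s for i, s in enumerate(track)]


def mix_down(tracks):
    """Sum the tracks sample by sample into a single waveform."""
    return [sum(samples) for samples in zip(*tracks)]


# Invented sample values: the two 0.9 entries stand in for a cough on the vocal track.
vocal = [0.2, 0.2, 0.9, 0.9, 0.2, 0.2]
backing = [0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

# Pull only the vocal track down to 10% for the two "cough" samples.
fixed_vocal = duck(vocal, start=2, end=4, gain=0.1)

mix = mix_down([fixed_vocal, backing])
print(mix)
```

The point of the sketch is that the fix is only possible while the vocal is still its own track. Once the cough has been summed into the mix, the same volume move would pull the backing down with it.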

If one adds the subjective items onto the technical items, the recording method or compression method becomes a most complex subject - at least for me.

Ronald J. Plachno
October 16-19, 2013
