I recently connected my AirPods Pro to our Apple TV, something I usually do on the occasion that my wife wants to sleep rather than unwind with some show. The resulting sound is always exceptionally good, even when it’s not immersive. In this particular case, the AirPods didn’t try to envelop me in sound because I was watching an older show. Instead, the stereo audio was augmented in such a way as to suggest the sound was more or less coming from the TV. This is actually what I want most of the time. Some home theater enthusiasts may balk at reading that, but I don’t want to be enveloped in sound while trying to wind down. All I want is clear dialog and not to be jarred by suddenly loud musical scores and sound effects. In other words, I just want my TV to sound right.
The CRT televisions of my youth didn’t sound good, but they sounded right. Despite only having one or two laughably small speakers, I rarely had any issue interpreting the sound of whatever movie or show I was watching. By comparison, the flat screen televisions of my adult lifetime never sounded right. Their audio quality was better, sure, but they all sounded wrong. Dialog was quiet and often obscured by other audio tracks. Increasing the volume helped, but at the expense of jarringly loud moments.
Edward Vega did an excellent YouTube video for Vox explaining this phenomenon, and specifically why dialog has gotten increasingly hard to hear. He lays out several reasons, but the parts I think are most germane to televisions not sounding right are dynamic range…
[A] big thing that [filmmakers] want to preserve is a concept called dynamic range: the range between your quietest sound and your loudest sound. If you have your dialog that’s going to be at the same volume as an explosion that immediately follows it, the explosion is not going to feel as big. You need that contrast in volume in order to give your ear a sense of scale. But the thing is, you can only make something so loud before it gets distorted. So if you want to create that wide dynamic range, you have no choice but to push those quieter sounds lower instead of pushing the louder sounds louder. So explosions go up and dialog comes down.
…and the large number of tracks necessary for modern surround sound.
The content that we watch [on our televisions and smartphones] is not mixed for us, primarily. Rerecording mixers mix for the widest surround sound format that is available. Typically, like big release films, that is Dolby Atmos, which has true 3D sound up to 128 channels. The thing is, if you’re not at a movie theater that can showcase the best sound Hollywood has to offer, [then] you can’t experience all of those channels. So after the movie is mixed for the 128 Atmos tracks, somebody has to create a separate version of the film’s audio where all those same sounds live on one or two or five tracks.
Basically, those who make movies and shows produce audio for multi-speaker theater setups, home or actual, and do so at the expense of more typical setups that involve just the TV’s built-in speakers¹.
Edward concludes his video by saying the issue is intractable and gives the audience three options:
So the solutions we have are:
- Buy better speakers and only go to theaters that have impeccable sound.
- Take a chill pill and try to just worry a little bit less about picking up every single word that gets said.
- Just keep the subtitles on.
Suggestions two and three are absurd, as is the notion of exclusively going to theaters. The only real suggestion is to buy speakers, but that raises the question: what kind of speakers does one need just to make the TV sound right? Those producing content would seemingly have everyone buy and install a home theater with capabilities as close to an actual theater as possible. My problem, beyond cost and wires, is that I don’t always want a theater-like experience for home viewing, just like I don’t want a concert-like experience for home listening. Most of the time, I don’t want an experience at all. Again, I just want my TV to sound right.
“Sounding right” seems like table stakes for any home theater system, but I’ve found it to be elusive. My previous receiver with two decent bookshelf speakers never sounded right, even after I added a center channel. In hindsight, and given the aforementioned complexity and priorities of audio in modern movies and shows, I now don’t see how those three dumb speakers ever could sound right. In fact, it seems the only way to make shows and movies sound right with dumb speakers is to use an ever-increasing number of them. That’s fine, even exciting, if you are a home theater enthusiast, but a bunch of dumb speakers is not the answer for me.
The answer for me, and I would wager for most people, is computerized speakers².
“Computerized speakers” is the term I am using to mean speakers with built-in computerized audio processing. This is in contrast to what I am calling “dumb speakers”, which merely reproduce already processed audio being sent from another component, typically a receiver. Having the smarts built in makes computerized speakers more flexible and less finicky than their dumb counterparts. With dumb speakers, more is necessary³. With computerized speakers, more is merely preference. You can have a single soundbar or fill the room with a bunch of computerized speakers for better immersion, and everything can sound right regardless. I’d wager most people don’t buy any speakers for their TV and that a majority of those who do just buy a soundbar.
Soundbars range in both quality and price. Some are cheap and probably sound like crap while others are expensive and reportedly sound quite good. Within this range, I think Apple is making some of the best computerized speakers⁴ for home theater on the market for their price. The HomePods that we typically use when watching TV also just sound right. All the dialog is clear and I still hear everything without being jarred by some overly loud sound effect or music cue.
While the HomePods can only kinda sorta fake surround sound, the AirPods Pro reproduce a remarkably spatial surround sound experience. “Spatial” is the word that Apple uses and one that I think is very apt. Like with stereo, surround sound with AirPods Pro can feel like it’s at a distance where the audio is set further back in the soundscape. Unlike stereo, this distance varies depending on what’s happening in the video. A close-up sounds close while a medium shot sounds further away. Apple’s deliberate use of distance is best illustrated by a third example of Apple’s exceptional computerized speakers.
I don’t own a Vision Pro, but a friend let me try his a few weeks ago. Top of my list was to sample some of Apple’s immersive video. Immersive video is not the same as 3D video. 3D video still comes from a rectangle and is therefore still directional. You look toward the rectangle to watch the 3D video. Immersive video has no rectangle. The video is all-encompassing, and you watch by looking all around. With non-immersive video, the Vision Pro’s audio pods kept some sound at a distance just like when I watch my Apple TV with AirPods. With immersive video, however, the sound is more often right there, which makes sense because you are right there.
Time was you could just buy the TV and have everything sound right. You could optionally add two decent dumb speakers and have everything sound good too, because everything you were watching was stereo or mono. The sound in modern movies and shows has gotten too complicated to sound good on two dumb speakers. Getting decent home theater sound with only one or two speakers requires good computerized speakers, and right now Apple is making some of the best computerized speakers for the buck, especially if you just want everything to sound right.
1. Supporting the top end makes sense, but doing so at the expense of how the majority watches shows and movies seems folly to me. It would be like if the makers of Cheers insisted on filming and presenting their sitcom in letterbox despite being solely viewed on standard aspect ratio televisions.
2. The obvious term to contrast with “dumb speaker” is “smart speaker”, but alas, that term is already taken.
3. It might be possible for a home theater system consisting of a receiver and two dumb speakers to sound right, but I am doubtful for two reasons. The first is the sheer number of drivers in a given speaker. Good dumb speakers tend to have one or two, maybe three drivers. Good computerized speakers, on the other hand, tend to have half a dozen or more. The second reason is that the market for home theater receivers has gotten more complex, not less, in part because they cater specifically to home theater enthusiasts.
4. It’s stuff like this that makes me feel like Apple should have kept “Computer” in its name.