Goodbye Stereo, Hello 360° Sound

In the past five years, there has been a paradigm shift in the speaker market. A new form factor of audio device has emerged: the 360-degree speaker. I want to look at the reasons behind the appearance of this form factor and the benefits it brings us.

First, it is important to look at the market segmentation reasoning:

1. Since the inclusion of Bluetooth in phones, there has been a variety of (initially mainly cheap) speakers that sought to abolish the need for cables. Docks and Bluetooth speakers were the answer. But at the time there was no premium Bluetooth speaker, and beyond sound quality there was room for more innovation (or gimmicks, like a floating speaker). One way to make the Bluetooth speaker feel premium was the 360° speaker.

The original Bluetooth speakers were directional, and since it is unknown where they will be placed, how many people will listen, and where those people are sitting, a directional speaker is at a disadvantage compared to a 360° one.

2. From another perspective, 360° speakers function as a cheaper alternative to hi-fi audio systems. Many customers just want to listen to music comfortably at home and do not need a full setup of wires and receivers. They also mostly play music from their mobile phones.


So it fits right in the middle. Now let’s look at some use cases:

Parties — It can be connected to other speakers for bigger sound, and it’s relatively easy for other people to connect to it.

Multi-room — It lets you play music in every section of your house while controlling it from your phone. It can also be controlled remotely.

Conference calls — or any call, really. You could put the call on your phone’s speaker instead, but that is often hard to hear.

Smart — Today we have assistant speakers with microphone arrays that sometimes come in a 360° form factor. It’s a bit different, but a 360° microphone array is as useful as a 360° speaker array.

I want to focus on the 360° form factor and discuss why it is so important and a real differentiator. To be able to understand more about 360° audio, its advantages and the future of 360° audio consumption, it is important to have a look at the history of sound systems.


The person as sound — Before there were speakers there were instruments. People used their own voices, then found resonance in drums and stringed instruments. That led to a very close, personal interaction that was mobile as well. People gathered around a singer or musician to hear them.

Phonograph and Gramophone — This was the first time music became reproducible mechanically. However, it was still mono (one channel). From an interaction perspective, it was a centerpiece with the sound coming out of the horn.

Stereo systems — Stereo was an ‘easy sell’; after all, we all have two ears, so speakers that serve them both are an obvious win. Some televisions shipped with mono speakers, but more advanced televisions had stereo speakers too.

Surround — 3.1/5.1/7.1 systems were introduced mainly for watching movies in an immersive way. These systems included front, back, center, and subwoofer speakers (sometimes even top and bottom). It is still quite rare to find music recorded for surround. Algorithms were also created to mimic surround on headphones.

But there is a limitation with these systems. Compare them to the first two sound sources above: the human voice and the Phonograph. Both had more mobility. You could place them wherever you wanted, and people would gather around and listen to music. It’s not exactly the same experience, but mobility doesn’t hurt the premise of the instrument. With stereo and surround systems, however, you need to sit in a specific, contained environment in a specific way to really enjoy their benefits. Sitting in a spot where you cannot sense that spatial experience makes these systems redundant.

Sources of music

Audio speakers in the present

Considering current technologies and their usage, our main music source is the mobile phone, a source that doesn’t need to be physically connected via cables. Our listening experience is more like a restaurant’s, where it doesn’t matter where the audio comes from as long as it’s immersive. 360° speakers provide exactly that with fewer speakers. But we lost something along the way: stereo and surround. In other words, we lost the immersive elements of spatial sound.

Audio speakers in the near future

There are huge investments in VR, AR, and AI, and all of these fields affect sound and speakers. In VR and AR we are immersed visually and aurally, currently via a headset and headphones. At home, we’ve started controlling things with our voices: turning lights on and off, changing music, and so on.

Apple’s HomePod holds huge promise in this respect. Its spatial algorithm could be the basis for incredible audio developments. Apple may have been late to the 360° market, but it has tremendous experience in audio and computing, which is why I think this is the next big audio trend: “the spatially aware 360° speaker”.

From Apple’s presentation

Although it is sold as a single speaker, it can obviously be bought in pairs or more. The way these units understand each other will be the key to this technology.

Spatiality is important because in a 360° speaker a lot of sound goes to waste and a lot of power is spent inefficiently. Some of the sound is pushed against a wall, which causes too much reverb, and most of the high frequencies not projected at you are useless.

Here are the elements to take into account:

  1. Location in the room — near a wall, in the corner, center?
  2. Where is the listener?
  3. How many listeners are there?
  4. Are there other speakers and where?

In Apple’s demonstration, it seems that some of these are being addressed. They clearly thought about these use cases, which is why they embedded their chip into the speaker, one that might improve over time.
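As a toy illustration of how a speaker might weigh those four factors, here is a sketch that boosts drivers aimed at listeners and ducks drivers firing into a nearby wall. All the names, angles, and weightings are my own assumptions for the example, not Apple’s actual algorithm:

```python
import math

def driver_gains(driver_azimuths, listener_azimuths, wall_azimuth=None):
    """Toy beam-steering sketch: per-driver gains for a 360° speaker.

    Boost drivers aimed at a listener; halve drivers firing into a wall.
    All angles are in degrees, measured around the speaker.
    """
    gains = []
    for d in driver_azimuths:
        # Best alignment with any listener (cosine falls off with angle).
        g = max(max(0.0, math.cos(math.radians(d - l)))
                for l in listener_azimuths)
        # Duck drivers pointing within 45° of the wall to cut reverb.
        if wall_azimuth is not None:
            off_wall = abs((d - wall_azimuth + 180) % 360 - 180)
            if off_wall < 45:
                g *= 0.5
        gains.append(round(g, 3))
    return gains

# One listener in front (0°), a wall behind (180°), four drivers:
print(driver_gains([0, 90, 180, 270], [0], wall_azimuth=180))
# → [1.0, 0.0, 0.0, 0.0]
```

A real product would also measure the room’s response with its microphones, but the inputs above map directly onto the four questions in the list.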

The new surround

360° speakers can already simulate 3D, depending on the array of drivers inside the hardware shell. This shows up as the ability to hear stereo if you position yourself in the right place.

But things get much more interesting if the speakers are aware of your location. If you are wearing a VR headset and have two 360° speakers, you could potentially walk around the room and get a complete surround experience. A game could be super immersive without the need for headphones. Projected into AR, a room could serve more than one person at a time.

Consider where music is listened to. In most settings, a 360° speaker would be of greater benefit than a stereo system: in cars (which usually have four speakers), offices, and clubs, 360° speakers would work better. Even headphones could be improved with spatial awareness to block noise from the surrounding environment, plus a compass to communicate your orientation. Even the TV experience could be upgraded with just HomePods and some software advancements.

What about products like Amazon Echo Show?

A screen is a classic one-directional interaction. Until we have 360-degree screens that work like a crystal ball with 360° audio, I don’t see it becoming the next big thing; after all, we still have our phones and tablets.

The future of 360 in relation to creation and consumption tools

Here are a bunch of hopes and assumptions:

1. Music production software will adopt 360° workflows to support the film and gaming industries, similar to 3D programs like Unity, Cinema 4D, and Adobe’s tools.

From Dolby Atmos

2. New microphones will arise that record an environment using three or more capsules. It will start with ways to reproduce 3D from two microphones, as field recorders do, but it will quickly move to more advanced instruments driven by mobile phones, which will adopt three to four microphones per phone to record 360° video with 360° sound. The same will apply to standalone 360° cameras.

3. A new file type that encodes multiple audio channels will emerge, along with a defined way of translating it to stereo and headphones.
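Formats like this already exist in embryo: first-order ambisonics stores a horizontal sound field in a handful of channels (W, X, Y) and can be decoded to any speaker layout, including stereo. A minimal sketch, assuming the classic FuMa channel convention and “virtual cardioid microphone” decoding (one common approach, not the only one):

```python
import math

SQRT2 = math.sqrt(2)

def encode_bformat(sample, azimuth_deg):
    """Encode a mono sample into horizontal first-order ambisonics
    (FuMa convention: the omni W channel is scaled down by sqrt(2))."""
    az = math.radians(azimuth_deg)
    return (sample / SQRT2, sample * math.cos(az), sample * math.sin(az))

def decode_stereo(w, x, y, spread_deg=90.0):
    """Decode W/X/Y to stereo using two virtual cardioid mics at ±spread."""
    def cardioid(az_deg):
        az = math.radians(az_deg)
        return 0.5 * (SQRT2 * w + x * math.cos(az) + y * math.sin(az))
    return cardioid(spread_deg), cardioid(-spread_deg)

# A source hard left (90°) should land almost entirely in the left channel:
w, x, y = encode_bformat(1.0, 90.0)
left, right = decode_stereo(w, x, y)
```

The same W/X/Y stream could just as well feed a ring of drivers in a 360° speaker, which is exactly the appeal of a speaker-agnostic file type.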


I can’t wait to see this become reality: a spatially aware auditory and visual future based on augmented reality, consumed through instruments like speakers, headphones, and smart glasses.


Voice assistance and privacy

Voice assistant technologies are hyped nowadays, but one of the main concerns voiced about them is privacy: the worry that devices listen to us all the time and document everything. For example, Google keeps every voice search its users make. It uses them to improve its voice recognition and provide better results, and it offers the option to delete them from your account.

A few questions come to mind: how often do companies go over your voice messages? How often do they compare them with other samples? How much does the system improve thanks to them? I will venture answers to these questions and suggest solutions.

A good example of a privacy-minded approach is Snapchat. Messages in Snapchat are controlled by the user, and they also disappear from the company’s servers. Given the age group Snapchat targeted, this was a brilliant decision: teenagers don’t want their parents to know what they do, and generally they want to “erase their sins”. Having things erased is closer to a real conversation than to a chat messenger.

Now imagine this privacy solution in a voice assistant context. Even though users want the AI to know them well, do they want it to know them better than they know themselves?

What do I mean by that? Some users wouldn’t want their technology to frown upon them and criticize them. Users also prefer data that doesn’t punish them for driving fast or having unhealthy habits, a model now led by insurance companies.

Having spent a lot of time in South Korea, I have experienced plenty of joyrides with taxi drivers. The way their car navigation works is quite grotesque: imagine a 15-inch map screen that turns blood red, with obnoxious sound effects, whenever they pass the speed limit.

Instead, users might prefer a supportive system that can differentiate between public information that can be shared with the family and private information that is better consumed alone. When driving, situations like this are common. Here is an example: a user is driving with a friend in the car. Someone calls, and because the call will play over the car’s sound system, the driver has to announce that someone else is with them. The announcement defines the context of the conversation and prevents content or behavior that should stay private.

The voice assistant will need contextual information so that it can figure out exactly what scenario the user is in and how and when to address them. But we will probably need to tell it about our scenario in some way too. Your wife can hear that you are with someone in the car but can’t quite decipher who, so she might ask, “are you with the kids?”.

Voice = social

Talking is a social experience that most people don’t do when they are alone. Remember the initial release of the Bluetooth headset? People in the street thought you were speaking to them when you were actually on the phone. Another example is in-car calling: some people thought the person in the car was crazy because he seemed to be talking to himself.

Because talking is a social experience, we need to be wary of who we speak to and where, and so does the voice assistant. I know a lot of parents with embarrassing stories of their kids blabbing things they shouldn’t say in front of a stranger, often something the parent said about a person or a social group. How would you educate your voice assistant? By creating a scenario where you actively choose what to share with it.

Companies might aspire to collect as much data as possible, but I doubt they really know how to use it, and it doesn’t match the expectations consumers have. From the users’ perspective, they probably want their voice assistant to be more of a dog than a human or a computer. People want a positive experience with a system that helps them remember what they forget, and that forgets what they don’t want remembered. A system that remembers you wanted to buy a ring for your wife but doesn’t say it out loud next to her, reminding you in a more personal way. A system that remembers your favorite show is back but doesn’t say it next to the kid because it’s not appropriate for their age.
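That kind of tact can be described very simply in code. A toy sketch, where the tags and names are entirely hypothetical rather than any real assistant’s API, that speaks a reminder aloud only when nobody it concerns is in the room:

```python
def should_announce(message_tags, people_present):
    """Toy 'tact' filter: speak a notification aloud only if nobody
    present is excluded by the message's privacy tags."""
    excluded = {tag.split(":", 1)[1]
                for tag in message_tags if tag.startswith("hide_from:")}
    return not excluded & set(people_present)

# The ring reminder stays silent while the wife is in the room:
print(should_announce(["reminder", "hide_from:wife"], ["me", "wife"]))  # False
print(should_announce(["reminder", "hide_from:wife"], ["me"]))          # True
```

The hard part, of course, is not the filter but reliably knowing who is present, which is exactly the contextual-information problem described above.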

A voice assistant that has Tact.

A dog-like assistant is probably the most a voice assistant can be nowadays. It will progress, but in the meantime users will settle for something cute like Jibo, which has some charm when it makes a mistake and can at least learn not to repeat it. If a mistake happens, say it tells something to the wrong person, users will expect a report of what was shared with other users in the house. The voice assistant should bear some responsibility.

Privacy mistakes can happen, but we need to know about them before it is too late.

Using Big Data

The big promise of big data is that it could heal the world globally using our behavior. There is a growing number of systems built to cope with the abundance of information; whether they cope or not is still a question. Many of these companies seem to be in the business of collecting for the sake of selling. They don’t really know what to do with the data; they just want to have it in case someone else knows what to do with it. Therefore, I am not convinced that the voice assistant needs all the information being collected.

What if it saved just one day of your data or a week, would that be contextual enough?

Last year I was fascinated by a device called Kapture. It records everything around you at any given moment, but if you notice something important happening you can tap it, and it saves the previous two minutes. Saving things retrospectively, capturing moments that are magical before you even realize they are, is incredible. You effortlessly collect data and curate it while all the rest is gone. Leaving voice messages to yourself, writing notes, sending them to others, getting a summary of your notes, what you cared about, what interested you, when you save most: all of these scenarios could be the future. The problem it solved for me was how to capture something that is already gone while keeping my privacy intact.

Kapture
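Kapture’s trick, keeping only a rolling window and saving it retrospectively on demand, is essentially a ring buffer. A minimal sketch, where the sample rate and window length are made-up toy values:

```python
from collections import deque

SAMPLE_RATE = 10        # toy value: 10 readings per second
WINDOW_SECONDS = 120    # a Kapture-style "last two minutes"

class RollingRecorder:
    """Record continuously into a fixed-size ring buffer; anything older
    than the window silently disappears unless the user taps to save it."""

    def __init__(self):
        self.buffer = deque(maxlen=SAMPLE_RATE * WINDOW_SECONDS)

    def record(self, sample):
        self.buffer.append(sample)  # the oldest sample falls off when full

    def tap(self):
        """The user taps the device: snapshot the last two minutes."""
        return list(self.buffer)

rec = RollingRecorder()
for i in range(2000):   # simulate ~200 seconds of input
    rec.record(i)
clip = rec.tap()        # only the most recent 1200 samples survive
```

Privacy is the default here: deletion requires no action at all, while keeping data requires an explicit gesture. That inversion is what made the device interesting to me.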

Social privacy

People are obsessed with looking at their information just as they are obsessed with looking in the mirror. It’s addictive, especially when it comes as a positive experience.

In a social context, the rule of “the more you give, the more you get” works, but it breaks down in software. Maybe at some point in the future that will change, but nowadays software just doesn’t have the variability and personalization required to actually make life better for people who are more “online”. Overall, the experience is more or less the same whether you have 10 friends on Facebook or 1,000. To be honest, it’s probably worse with 1,000. The same applies to Twitter and Instagram. Imagine what Selena Gomez’s Instagram looks like. Do you think someone at Instagram thought of that scenario, or gave her extra tools to deal with it? Nope. Companies talk about it but rarely act on it, and that definitely applies to voice data collection.

It seems clear that the amount users reveal doesn’t justify the value they get back. One of the worst user experiences is signing into an app with Facebook: the user is led to a screen requesting access to everything…and in return they are promised they can dictate notes with their voice. Does that have anything to do with their address or their online friends? No. Information is too cheap nowadays, and users have gotten used to pressing “agree” without reading. I hope we can standardize value for return while breaking information down the right way.

Why do we have to be listened to and documented every day if we can’t use it? Permissions should be flexible, and we should have a way to make the voice assistant stop listening when we don’t want it to. Leaving the room makes sense when we don’t want another person to hear us, but how will that work when the voice assistant is always with us? Should we tell it “stop listening for five minutes”?

Artificial intelligence is, by its very name, related to the brain, but maybe we should consider its usage and creation to be more related to the heart. Artificial Emotional Intelligence (A.E.I.) could help us think of the assistant differently.

Use or be used?

How does it improve our lives, and what price do we have to pay for it? In “Things I would like to do with my Voice Assistant” I discussed how useful certain capabilities would be compared to how much data each would need to become a reality.

So how far is the voice assistant from reading emotions, having tact, and syncing with everything? Can all this happen while taking care of privacy? Does your assistant snitch on you, or does it tell you when someone was sniffing around and asking weird questions? It’s not enough to choose methods like differential privacy to protect users. Companies should really consider the value of loyalty, and of creating a stronger bond between the machine and the human rather than between the machine and the company that created it.

Further into the future, we could reach scenarios like these:

There could also be a behavioral-understanding mechanism that mimics a stranger who has just met you in a pub. If you behave in a certain way, that person will probably know how to react supportively even though they didn’t know you before. In the same way, a computer that knows these kinds of behaviors can react to you, all the more so with sensors that report your physical state and recognize facial patterns and tone of voice.

Another good example is doctors, who can often diagnose a patient’s illness without looking at their full health history. Of course it’s easier to look at everything, but they usually only do so when they need to figure out something that isn’t simple. When things are simple, diagnosis should be faster, and in technology’s case, more private.

Summary

There are many ways to make voice assistants more private while helping people trust them, yet it seems no company has adopted this strategy. It might require a company that does not rely on an advertising-driven business model: a company that creates something released into the wild, a machine that becomes a friend with a double duty to the company and the user, but one that is at least truthful and open about what it shares.