Voice: The Next UI Layer

Cody Simms
Feb 29, 2016

Lately, I’ve been obsessed with the power of voice. Not voice as in self-expression (though important!), and not voice as in voice calls or — worse! — voicemail, but rather voice as the next major UI layer for software in the living room and beyond.

As voice recognition and, more importantly, AI-driven understanding of the meaning behind voice commands and queries mature, we’re seeing a dizzying array of voice-driven use cases start to take shape. Don’t get me wrong, I think voice is a feature, not a product, but I think it’s a convenient enough feature that it will change consumer behavior significantly. Apps and experiences that are quick to orient themselves to voice-based use cases will capture market share rapidly in the coming months and years, much as apps that were first to social and first to mobile have done over the past decade.

Voice will soon become a critical — if not the most critical — UI layer for navigation, search, and social in the living room and across home automation. We’re already seeing many signs of this from big and small players alike.

Echo: the next iPhone?

The most obvious case here is Amazon Echo. Echo is still early, and its current feature set is basically an MVP that hints at the possibilities ahead. Even so, I’ve found Echo to be the most impressive mainstream consumer product launched since the iPhone. And I’m starting to think that it can influence how we interact with technology inside the home just as strongly as the iPhone has changed how we interact with technology outside the home.

You see, I have a pretty nice Sonos system in my living room. The sound quality of my Sonos is probably 10X that of my Echo, and yet I find myself going to my Echo probably 10X more often. The reason? Convenience. Rather than finding my phone, unlocking it, opening the Sonos app, waiting for it to load, searching for what I want, selecting it, and hitting play…I simply have to say “Alexa, play Willie Nelson.” Done. Simple.

And this is the obvious use case. There are so many more things that Echo can do today that will only get better with time: weather, kitchen timers (super nice as a hands-free feature!), repeat shopping, IFTTT integrations, sports, NPR…the use cases are already quite vast…and just getting started. If I could do phone calls with Echo as my speakerphone, I would. If I could use Echo to power my TV, I absolutely would, just so I wouldn’t have to find my darn remote control every time. (And no, the smartphone never really did end up taking over the living room controls as we all thought it would…it just became an alternate living room device.) There are surely some cool things being developed between Echo and Slackbot that I can’t wait to try. I’m fascinated to see where Echo the platform and Echo the product (surely it will evolve into multiple form factors) both go over the next year and beyond.

Beyond Amazon, we’re already seeing Apple bring voice into the living room with Siri as a core part of the new Apple TV. (There is room for improvement, sure, but the potential is clearly there.) Google is innovating on voice with Google Now. And voice has long been a mainstay of the video game world, acting as a primary means of social communication and chat. Google and Apple have also both been pushing dictation-based use cases in other products, such as Google Docs and Apple iMessage…they are clearly working to sharpen their voice-based intelligence across the board.

VR is Lonely.

And then of course there is virtual reality. VR is seeing all types of innovation presently, but one of the unspoken challenges in VR is that it’s really darn hard to navigate cleanly. You can’t type easily in VR, and with or without controllers, your physical hands are obscured anyway. Switching from one app to another isn’t simple. And searching or inputting data in VR is just plain hard. Voice can solve this.

Another big challenge of VR? It’s pretty isolating. This isolation goes against just about every pattern of media consumption that has emerged in the last 15 years. Our attention spans have gotten shorter and shorter…and we have become more and more social while we consume. (Honestly, when was the last time you watched an hour of TV without looking at your phone once? And now ask the same question of a fifteen-year-old…) VR today is kind of the opposite. We can’t easily scroll feeds and check messages while we use it, and the promise of VR is that we’ll use it to escape for long stretches at a time. What changes this isolation? Voice. We’ve seen early attempts at voice as a social layer in VR from companies like AltspaceVR and LiveLikeVR, and I’m very bullish that voice will emerge as a primary UI for social in VR, in addition to solving the navigation and search issues.

Talk Don’t Type

Outside of the living room, there are use cases where voice is simply more convenient than typing and/or easier to consume than reading. At Techstars, we’ve adopted Voxer as our primary communication channel for all things “more urgent than email”. It’s so refreshing to not have to try to communicate nuance or detail in a long email but rather to just say what you need to say out loud. As a recipient of a vox, it’s great to hear the nuance and inflection of the person sharing their message. Voxer’s asynchronous, multi-party chat is so much easier than games of phone tag for a company like Techstars, where we are globally distributed and always on the go. It’s easy in and easy out. No waiting for ringing…no awkward “hey how’s it goings”. Just straight to the point. Just as Echo is more convenient than my Sonos, Voxer is more convenient than texting, email, or phone calls. When done right, voice is a killer feature. This is likely why we’re now seeing an explosion of voice-based social networks and channels like Unmute, Anchor, Clyp, and Pundit. They are doing to podcasting what Tumblr and Medium did to blogging.

David Hasselhoff: Visionary

And then there’s the automotive space. The promise of talking with our cars has been real at least since the days of Hasselhoff and KITT in Knight Rider (1982!), but most innovations in this space to date haven’t advanced past the late-1980s cars that talked at you…“Your door…is ajar…” As a true hands-free environment, the automobile offers an amazing canvas for future voice-controlled experiences. Tesla integrates with Google voice recognition software today, and one can only imagine significantly more innovation in the near future from Google, Apple, Tesla, and more.

I’m bullish on voice as a game-changing UI layer for the next generation of software-based user experiences. If you are working on something interesting in this space, I’d love to hear about it.

Finally, I’ll demo some of these use cases on Snapchat over the next 24 hours (and then they’ll expire). If interested, add me at http://snapchat.com/add/kidsallright