Punch, type, click, swipe, gesture, speak, THINK

The user interface (UI) has markedly changed in the past few quarters. From “mobile first, native vs web and responsive” story lines, founders’ and investors’ dreams now echo the next revolution of UI, voice (or is it gesture?).

The chart below shows a recent history of internet/human interface device sales, starting with the broad adoption of the keyboard + mouse via PCs in the 90s.


Source: PCs, Smartphones – BusinessInsider, Tablets – Digitimes, Echo – Meeker/KPCB, various

The transition from Static Epoch to Mobile Epoch coincides with an explosion of smartphone and tablet sales. This is not only a “mobile” story, however; it is a UI story as well. The mobile explosion was enabled by the development of small, cheap touch screens. Suddenly you could be connected and do anything anywhere with your fingers alone. Lugging a keyboard and a mouse around – or for that matter, typing on a BlackBerry keyboard – was just inadequate for seamless mobile work. We are now entering a new age, moving from the Touch Age to the next. The question is whether it is voice or gesture or both.

Both voice and gesture have been around for a while. Most premium handset manufacturers have had voice (Siri, Google Now, etc.) and gesture integrated in smartphones since 2013, and Kinect has been around since 2010. While these applications of voice and gesture had highly anticipated launches and early adoption, none went mainstream. Few people I know talk with or gesture to their phones, and no one I know – save for a few overgrown gamers – owns a Kinect. Tech cheerleaders would say the underlying technology was not yet ready for prime time, with burdensome training needs, reliability challenges and high battery use. However, my time as an engineer taught me that failed UI launches usually come from a lack of user insight rather than technology gaps.

It is simply weird to gesture or talk to my phone in a busy elevator or sitting among my colleagues at work. For kicks, I recently and unexpectedly “okay, googled” my Nexus 6P to set up a calendar invite while sitting near four quietly working colleagues. When I asked them afterwards how annoying it was to listen to me, on a scale of 1 to 10 (10 being insanely annoying), they said 2, 6, 7 and 4. That’s pretty annoying.

Context of use is everything. Mary Meeker’s 2016 report states that 43% of voice search occurs in the car while 36% occurs at home. That doesn’t leave a big slice for work and social pursuits. Not so surprising; Google Glass wasn’t much of a hit around other human beings either, but the car and home do make sense as environments where social challenges are minimal.

I don’t believe either voice or gesture will become as ubiquitously horizontal as touch, but both will play big roles in parts of our lives. Here is how I see it:


As you can see above, I am more bullish on voice. Outside of VR (I broadly consider VR to be part of gesture), gesture is mostly a natural extension of touch with a few more degrees of freedom and planes of movement. For many use cases, touch is simply easier and more efficient than gesture. Voice, on the other hand, allows you to be in control when your hands are otherwise occupied or aren’t near a device, opening up a far broader range of new use cases.

Platform domination of gesture and voice a risk for entrepreneurs

The “so what” for entrepreneurs is that the tech giants are rapidly staking their claims to be the voice and gesture platforms of record. The impact on startups depends on whether you are building a pure voice or gesture technology or a voice- or gesture-enabled application.

Launching a pure gesture or voice technology is a tough road for startups because it means long sales cycles selling embedded software to very large companies like Samsung, HTC, Google, Apple, Moto, etc. Alternatively, it means a direct-to-consumer hardware device like Thalmic Labs’ Myo Armband. Thalmic seems to be doing well for now, but few startup hardware devices have happy endings.

Soon every B2B and B2C software company will be considering its voice and gesture enablement strategies, much as companies embarked on mobile strategies en masse 5 to 7 years ago. This may evolve analogously to the iOS and Android app stores. Early in their lifecycles, app stores were a meaningful and sometimes differentiated distribution strategy for a software/app company, at least enough so to get viral adoption from an early base of consumers and hungry investors. No more. There is too much noise, and the data now show the vast majority of people spend their time on just three apps. The same will happen on Alexa and Oculus. Alexa is particularly problematic in that, without any visual cues, I won’t remember to use that niche birthday reminder skill I downloaded last week. So, should you launch an Alexa skill or Oculus app? If it channels a UI that your customer demands, yes, but as a business model unto itself, no.

I feel you hearing what I’m thinking. Yup, gesture and voice are not the end game. It might be 10 years, it might be 25, but UI by thought is the final frontier. Companies like Muse and Neurable are in the earliest stages of commercializing this possibility. In the meantime, we have a few more years to keep our thoughts to ourselves.

7 thoughts on “Punch, type, click, swipe, gesture, speak, THINK”

  1. Good thoughts, Guy.

    I’ve been watching Oculus and Magic Leap and other AR/VR plays. While interesting, the idea of a gesture-based experience that goes beyond ‘your personal space’ seems very out of place to me. I can’t imagine sitting at an airport gate and seeing everyone swiping the air to explore content the way they are consumed by their screens today. As you experienced with your co-workers, social acceptance of these new interaction methods will be as critical as the technology. A new product called Nucleus is the first video-based Alexa product for room-to-room or place-to-place video chatting using Alexa. It will be interesting to see if there are other screen-based extensions to Alexa over time to enable smarter interactions that go beyond just voice.

  2. Guy, good insights, but you are actually still missing the UI of the future. It’s not about a one-to-one replacement of swiping, poking or turning on switches with voice commands. It’s about the road toward contextual understanding of audio. Imagine having a personal assistant that uses audio signals as much as all other sensor information to know where you are and what your pattern of activity is, and to know you so intimately that you don’t have to speak precisely.

    And speaking of startups in this field, we are one that will begin to dominate the embedded audio space with our technology, but no VC thus far understands what we do. I’ve become so disillusioned with the VC community’s shortsightedness that we stopped trying to talk to VCs. VCs don’t understand the nature of embedded software and the product development cycle. They are stuck in the app world: $50K to write an app in 1 month and start counting downloads. That kind of investment makes a quick buck for investors but does not fundamentally add to productivity.

  3. Nice post, Guy.

    At a conference last week on the West Coast, digital marketers at enterprise companies were deeply concerned about how to weave together all the threads of digital interaction into a single user experience. Your assessment of voice and gesture UI fits into that wider context for me — clearly, consumers will expect to use every method at their disposal to address their problems, which you spell out nicely, and companies without investment in UI could be creating their own pain points for their customers.

    The upshot is that this is a reasonable place to invest, especially in exploratory technologies that might enable future business. Likewise, for enterprise companies, the timing is good to imagine future business models voice and gesture UI can open up, and how prepared their organizations are to capitalize on them.
