Punch, type, click, swipe, gesture, speak, THINK

The user interface (UI) story has changed markedly in the past few quarters. Founders and investors have moved on from the “mobile first,” “native vs. web,” and “responsive” storylines; their dreams now echo the next UI revolution: voice (or is it gesture?).

The chart below traces the recent history of internet/human interface device sales, starting with the broad adoption of the keyboard and mouse via PCs in the 90s.


Source: PCs, Smartphones – BusinessInsider, Tablets – Digitimes, Echo – Meeker/KPCB, various

The transition from the Static Epoch to the Mobile Epoch coincides with an explosion of smartphone and tablet sales. This is not only a “mobile” story, however; it is a UI story as well. The mobile explosion was enabled by the development of small, cheap touch screens. Suddenly you could be connected and do anything, anywhere, with your fingers alone. Lugging a keyboard and mouse around (or, for that matter, typing on a BlackBerry keyboard) was simply inadequate for seamless mobile work. We are now moving beyond the Touch Age into a new one, and the question is whether it belongs to voice, gesture, or both.

Both voice and gesture have been around for a while. Most premium handset manufacturers have integrated voice (Siri, Google Now, etc.) and gesture into their smartphones since 2013, and Kinect has been around since 2010. While these voice and gesture applications had highly anticipated launches and saw early adoption, none went mainstream. Few people I know talk or gesture to their phones, and no one I know (save for a few overgrown gamers) owns a Kinect. Tech cheerleaders would say the underlying technology was not yet ready for prime time, citing burdensome training needs, reliability problems, and high battery drain. My time as an engineer, however, taught me that failed UI launches usually stem from a lack of user insight rather than from technology gaps.

It is simply weird to gesture or talk to my phone in a busy elevator or while sitting among my colleagues at work. For kicks, I recently, and without warning, “okay, googled” my Nexus 6P to set up a calendar invite while sitting near four quietly working colleagues. When I afterwards asked them to rate how annoying it was to listen to me on a scale of 1 to 10 (10 being insanely annoying), they said 2, 6, 7, and 4. That’s pretty annoying.

Context of use is everything. Mary Meeker’s 2016 Internet Trends report finds that 43% of voice search occurs in the car and 36% at home. That leaves only about a fifth for work and social settings. Not so surprising: Google Glass wasn’t much of a hit around other human beings either, whereas the car and home make sense as environments where social friction is minimal.

I don’t believe either voice or gesture will become as ubiquitous and horizontal as touch, but both will play big roles in parts of our lives. Here is how I see it:


As you can see above, I am more bullish on voice. Outside of VR (I broadly consider VR to be part of gesture), gesture is mostly a natural extension of touch with a few more degrees of freedom and planes of movement. For many use cases, touch is simply easier and more efficient than gesture. Voice, on the other hand, allows you to be in control when your hands are otherwise occupied or aren’t near a device, opening up a far broader range of new use cases.

Platform domination of gesture and voice a risk for entrepreneurs

The “so what” for entrepreneurs is that the tech giants are rapidly staking their claims to be the voice and gesture platforms of record. How this affects your startup depends on whether you are building a pure voice or gesture technology or a voice- or gesture-enabled application.

Launching a pure gesture or voice technology is a tough road for startups because it means long sales cycles selling embedded software to very large companies like Samsung, HTC, Google, Apple, Moto, etc. The alternative is a direct-to-consumer hardware device like Thalmic Labs’ Myo Armband. Thalmic seems to be doing well for now, but few startup hardware devices have happy endings.

Soon every B2B and B2C software company will be considering its voice and gesture enablement strategy, much as companies embarked on mobile strategies en masse 5 to 7 years ago. This may play out much as it did with the iOS and Android app stores. Early in their lifecycles, app stores were a meaningful and sometimes differentiating distribution channel for a software/app company, at least enough to drive viral adoption among an early base of consumers and to attract hungry investors. No more. There is too much noise, and the data now show that the vast majority of people spend most of their app time on just three apps. The same will happen on Alexa and Oculus. Alexa is particularly problematic because, without any visual cues, I won’t remember to use that niche birthday reminder skill I downloaded last week. So, should you launch an Alexa skill or Oculus app? If it delivers a UI your customers demand, yes; as a business model unto itself, no.

I can hear what you’re thinking. Yup, gesture and voice are not the end game. It might be 10 years, it might be 25, but UI by thought is the final frontier. Companies like Muse and Neurable are in the earliest stages of commercializing this possibility. In the meantime, we have a few more years to keep our thoughts to ourselves.