After using the Amazon Echo for the past few months, it is impossible to ignore the advance toward ambient intelligence, where computers listen and sense all the time and respond to the presence of humans. The Echo moves us closer to interfacing with computers the way we do with other humans. But given the Echo’s currently limited capabilities, there are significant challenges to overcome before this becomes reality.
The Echo keeps me company in the kitchen when I cook and eat. In addition to playing music, I use it to order from Amazon, listen to audio books, set kitchen timers and add items to my grocery list.
In the short time I have spent with the Echo, I have come to see its enormous potential: the Echo is more than a voice-enabled device for buying Amazon products. It is a computing platform that can be extended through ‘skills’. Skills are voice commands that add functionality, like summoning an Uber or ordering pizza. These skills are built by third-party application developers, like apps on mobile devices.
As the Echo evolves with better natural language processing and sensory capabilities, it will provide many more services. It will be able to see, hear and detect motion, either on its own or by interfacing with external sensors. It will be able to distinguish me from my family members through voice recognition. If a camera were present, it could recognize me visually and identify my gestures to draw additional context around my speech.
An omni-directional motion sensor would enable the lights to be turned on or off automatically when I enter or leave the room. The camera, microphone and motion sensors could be used to enhance in-home security, generating notifications when detecting a stranger.
Enabling the Echo to use voice recognition along with visual identification to securely identify me will enable voice commerce. I could securely bank, pay bills, transfer money to family and friends, order food, book tickets and hotels, or perform any transaction that was carried out over the telephone in the past. All this could be performed without explicitly logging in, because my command for the transaction would automatically authenticate me.
Expanding the Reach
A device with a natural language interface would make the internet accessible to a group of users who would otherwise not be online due to technological challenges. For example, these devices could be used by seniors, visually impaired users and illiterate or semi-literate people. They could use them to communicate with family, friends and doctors, perform secure online commerce and access emergency services. Such a device would also be well suited to learning a foreign language.
But for all the promise shown by the Echo, there are some challenges that need to be overcome before we interface with computers through voice.
The Echo has been successful because it does a rather small set of things very well. The interface has been deliberately kept simple by being directive-based and non-conversational. Every question has to be prefixed by ‘Alexa’, and the Echo does not remember state between questions or the context around them. For example, I would like to follow the question “Alexa, where is ‘Star Trek Beyond’ playing?” with “Get me 2 tickets for 6:00 PM tonight”, but this is not currently possible. The Echo is not able to use the information about the movie from the first question in the second question. As another example, if a camera were present, it would enable the Echo to determine whether I am talking to it, using directionality or even by sensing whether I’m the only person in the room. I could point to a light and say, “turn it on” without having to explain to Alexa that “it” is the main light in the bedroom.
Additionally, unlike a mobile phone, this is a shared device, where user experience has to be personalized. The device has to be able to switch contexts when talking to different members of my family.
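To make the movie-ticket follow-up above concrete, here is a minimal Python sketch of what remembering context between requests could look like. This is only an illustration under my own assumptions, not how Alexa or its skills actually work: DialogContext, find_showtimes and book_tickets are hypothetical names invented for this example, and no real lookup or booking takes place.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DialogContext:
    """Hypothetical store for facts ("slots") mentioned in earlier requests."""
    slots: Dict[str, str] = field(default_factory=dict)

    def remember(self, **new_slots: str) -> None:
        # Keep the latest value mentioned for each slot name.
        self.slots.update(new_slots)

    def resolve(self, name: str, spoken: Optional[str] = None) -> Optional[str]:
        # Prefer what was just said; otherwise fall back to what was remembered.
        return spoken if spoken is not None else self.slots.get(name)


def find_showtimes(ctx: DialogContext, movie: str) -> str:
    # First request: the movie is named explicitly, so remember it.
    ctx.remember(movie=movie)
    return f"Looking up showtimes for {movie}"


def book_tickets(ctx: DialogContext, count: int, time: str,
                 movie: Optional[str] = None) -> str:
    # Follow-up request: the movie may be omitted and filled in from context.
    resolved = ctx.resolve("movie", movie)
    if resolved is None:
        return "Which movie would you like tickets for?"
    ctx.remember(movie=resolved)
    return f"Booking {count} tickets for {resolved} at {time}"


if __name__ == "__main__":
    ctx = DialogContext()
    print(find_showtimes(ctx, "Star Trek Beyond"))
    print(book_tickets(ctx, 2, "6:00 PM"))
```

Without the earlier question, the follow-up has no movie to refer to, so the sketch asks for one instead of guessing. On a shared device, keeping one such context per recognized speaker would be one way to stop family members' requests from bleeding into each other.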
Marketplace and Application Discovery
The most useful device is the one with the widest range of applications. Developers are drawn to a large customer base, cool technologies, ease of development and platforms where they can monetize their applications. Currently the Echo is still in the ‘cool gadget’ category without a compelling application.
A marketplace (a store) has to provide an efficient way to discover new skills and services. A store may have to be built that interacts with a user through a voice interface. Building one without visual context is an interesting, challenging problem.
Privacy and Ethics
As a user, I’m concerned that the Echo can hear, see and store information about me that may be used later for contextual reference. It is a challenge to store and use this data without compromising my privacy.
I am looking forward to the day when the Echo has evolved past these challenges. On that day, I will be watching my favorite episode of Star Trek when an incoming phone call from my accountant is redirected by Echo to voicemail, alerting me only when I finish watching the episode.