A developer has put together a GPT-3 demo that attempts to enhance Apple’s Siri voice assistant and allow far more conversational voice commands. In a video posted to Reddit, developer Mate Marshalko shows the assistant controlling his HomeKit smart home devices and responding to queries in response to relatively vague prompts that today’s voice assistants like Siri would typically struggle to understand.
For example, the AI is shown turning on Marshalko’s lights in response to the voice prompt “Just noticed that I’m recording this video in the dark in the office, can you do something about that?” Later on in the video Marshalko asks the assistant to set his bedroom to a temperature that would “help me sleep better,” and he responds by setting his bedroom thermostat to 19 degrees celsius.
“Just noticed that I’m recording this video in the dark in the office, can you do something about that?”
In an accompanying blog post, Marshalko explains how the demo works. Essentially, he’s using Apple’s Shortcuts app to interface between Siri, GPT-3, and his HomeKit-enabled smart home devices. A voice command to Siri causes Shortcuts to send a lengthy prompt to the AI service requesting a response in a machine-readable format. Once it gets a response, Shortcuts parses it to control smart home devices and/or respond via Siri. The blog post is worth reading in its entirety for a fuller explanation.
It’s an intriguing demo, and shows what voice assistants could be capable of by integrating this new generation of AI technology, but it’s not perfect. For starters, Marshalko says that each command costs $0.014 per API request sent to GPT-3. The video is also edited, so it’s unclear if the demo works 100 percent of the time or if responses are being shown selectively. A more intelligent backend also won’t do anything to help if the voice recognition software can’t understand what words a user is saying in the first place.
The demo raises larger questions about how we want voice assistants to work. Do you really want to have a conversation with a voice assistant, or do you just want to be able to bark short and simple commands to control it like a robot? Marshalko himself also raises the risk of the assistant saying “unexpected things” in a follow up Reddit comment:
Nevertheless, the demo still offers fascinating hints of what sort of voice assistant interactions might soon be possible if the likes of Apple, Amazon, and Google integrate this technology into their services. And it’s impressive that these tools are open and approachable enough that a relatively slick demo like this can be hacked together using widely available consumer software.