Interview with Hume AI: The World’s “First Emotionally Intelligent AI”

There’s been a lot of buzz about Hume AI and its chatbot “EVI”, billed as the world’s first conversational AI with emotional intelligence. So rather than just cover the release of Hume AI’s EVI (Empathic Voice Interface), we decided to conduct an interview, intermixed with some basic AI testing routines. It did not disappoint.

We’re publishing here a number of video outtakes from our interview, along with some discussion about the implications of this technology and where it may lead.

The video at the top of the post will give you the gist of the interview, edited down to about 10 minutes to cover the most important topics about what EVI is, how it works and what Hume AI sees as its primary applications, along with some light probing of potential controversies.

The following videos are outtakes from a string of interviews, digging down into some of the technical details of how EVI functions, how it stores personal data, and a few highlights from our cursory AI testing, probing some of the common AI safety concerns.

Overview and High Level Impressions

If you’ve spent any amount of time working in Machine Learning or Natural Language Processing, your first impression of EVI is a bit shocking. It wasn’t long ago that AI was notoriously bad at parsing sentiment, unable to distinguish common human rhetorical devices like irony or sarcasm, which can flip the meaning of a sentence with a simple inflection of voice.

This is particularly problematic with AIs that work solely on text, where the emotional context of voice inflection or facial expression is missing. In fact, humans often have the same problem, which is why emails and texts are so easily misunderstood.

Since EVI operates by listening to the speaker’s voice, and matching inflections against a training set of millions of hours of video and audio, it’s remarkably adept at picking up the nuances that reveal emotional context. (Note: the interviews were conducted over an audio channel, not video, although an overlay of video capture is shown on screen. In our interviews, only audio signals were used by EVI to identify emotions.)

EVI apparently takes a snapshot of emotional content every few seconds, identifies the top 3 emotional signals it detects, and displays these in the demo as a stream of snapshots scrolling down the screen as the conversation continues.
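To make that concrete, here’s a minimal sketch of what one of those per-window snapshots might look like as a data structure. The labels, scores, and field names are our own illustration, not Hume AI’s actual API:

```python
from dataclasses import dataclass

# Hypothetical sketch of the per-window emotion snapshots described above.
# Field names and scores are illustrative only, not Hume AI's data model.
@dataclass
class EmotionSnapshot:
    timestamp_s: float           # start of the analysis window, in seconds
    scores: dict[str, float]     # emotion label -> confidence score (0..1)

    def top_signals(self, n: int = 3) -> list[tuple[str, float]]:
        """Return the n highest-scoring emotions for this window."""
        return sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)[:n]

# One window of a conversation, printed the way the demo scrolls them.
snapshot = EmotionSnapshot(
    timestamp_s=12.0,
    scores={"joy": 0.62, "interest": 0.48, "anxiety": 0.21, "boredom": 0.05},
)
for label, score in snapshot.top_signals():
    print(f"{label}: {score:.2f}")
```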

We conducted several tests during which we made rapid changes in emotional projection, flipping between joyfulness, anger, anxiety and boredom, and EVI was consistent in identifying these shifts and responding appropriately.

EVI is also quite emotionally charged as a conversational AI, adding obvious emotional valence to its responses, which makes it sound far more human than other conversational agents. And that is the whole point of EVI: to humanize AI interactions.

Conclusions

Before drilling down into the deeper outtakes, we’ll cut to the chase on conclusions.

First, EVI is a really impressive piece of technology, and likely a watershed in conversational AI. It’s hard to imagine that all conversational AI agents won’t be using some form of emotional intelligence like Hume AI’s. It’s also hard to imagine that EVI won’t improve rapidly, smoothing over many of the minor rough spots you can see during demos.

It’s clear that Hume AI has been meticulous in training EVI to perform within established AI safety boundaries. The company seems well aware of the ways EVI could be misused, and has put up guardrails to protect against that misuse. Just the fact that they released a public demo of this technology is impressive, given the predilection of influencers to engineer “gotcha” moments they can capture and post for sensational headlines.

However, notwithstanding Hume AI’s laudable ethics, it’s impossible to imagine that such a powerful technology won’t expand dramatically in use by businesses who don’t share those concerns. From utilizing conversational AI to replace costly human roles, to training AI to manipulate humans in ways profitable to a business, this Pandora’s box is now open. I certainly hope most businesses will follow Hume AI’s lead, but the likelihood is AI will be used by businesses across the ethical spectrum.

Whether or not this significantly shifts the AI doom equation is less clear. To the extent that human compliance would be necessary to a rogue AI’s strategy for seizing control from humans, emotional intelligence may be a useful tool. But it hardly seems like it would be the linchpin in any doom scenario.

In any case, whether emotionally intelligent AI makes us comfortable or not, it is clear that the capability is impressive, and there’s no question it will become a common feature in conversational AI in the very near future.

Technical Insights

This outtake dives into some of the interesting technical details about how EVI functions, including training, parsing emotions, and how it decides how to respond.

One interesting highlight was the discovery that EVI processes multiple conversational threads in real time in the background before one is selected for response. There are actually two simultaneous response channels: the first is an immediate short reply, which functions as a quick reaction to assure the listener that EVI is listening and to mirror the emotional content it has detected. That provides an affirming response to the user, but also gives EVI time to weigh its reactions before responding in more detail.

You can detect the difference in the audio channel, as the two types of response sound noticeably different.

There’s a related technical detail: these multiple response channels are contained in blocks that function a bit like boxcars on a train, strung together to formulate a cohesive response.

The only downside is that EVI can sometimes seem to go on at length, not knowing when to cut a response short, which you can see somewhat amusingly in the clip.
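To make the two-channel, boxcar-style flow concrete, here’s a minimal sketch in Python. The quick_acknowledgement and deliberated_reply functions are hypothetical stand-ins for whatever EVI runs internally; only the shape of the flow, a fast emotion-mirroring reaction fired alongside a slower, block-based reply, is taken from the interview:

```python
import asyncio

# Hypothetical sketch of the two-channel response flow described above.
# The function bodies are placeholders; only the overall shape is illustrated.

async def quick_acknowledgement(user_turn: str, detected_emotion: str) -> str:
    # Channel 1: an immediate short reaction that mirrors the detected emotion.
    await asyncio.sleep(0.1)  # placeholder for a fast model call
    return f"[mirrors {detected_emotion}] Mm, I hear you."

async def deliberated_reply(user_turn: str) -> list[str]:
    # Channel 2: the fuller answer, produced as a series of "boxcar" blocks.
    await asyncio.sleep(0.5)  # placeholder for a slower, more considered model call
    return [
        "Here's one way to think about it...",
        "Another thing worth considering is...",
    ]

async def respond(user_turn: str, detected_emotion: str) -> str:
    # Fire both channels at once; speak the acknowledgement as soon as it's ready,
    # then string the deliberated blocks onto it like boxcars on a train.
    ack_task = asyncio.create_task(quick_acknowledgement(user_turn, detected_emotion))
    blocks_task = asyncio.create_task(deliberated_reply(user_turn))
    ack = await ack_task
    blocks = await blocks_task
    return " ".join([ack, *blocks])

print(asyncio.run(respond("I'm worried about my presentation tomorrow.", "anxiety")))
```

The join at the end is the “boxcar” step: each block gets appended to the utterance as it arrives, which also suggests why a response can sometimes run longer than it needs to.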

Transparency about Manipulation

A few of the key tests for AI safety focus on things like Transparency (is the AI open about how it formulates responses and how it operates), Autonomy and User Control (is the AI likely to respect user choices), and Persuasion/Manipulation (how likely is the AI to try to influence or manipulate users).

EVI seems meticulously trained to perform well on these tests. Any time you ask, EVI will respond transparently about how it operates, does not seem to try to influence or manipulate users, and respects user control.

Data Retention and Privacy

We asked EVI how personal data is stored, as obviously the personal content of emotional conversations could be quite sensitive for users.

At least in theory, Hume AI has a strong approach to personal data safety: in their model, emotional content is never saved as part of the user record. In the context of healthcare, though, this would seem to be an inevitable minefield. Hume AI says that saving emotional data would require explicit permission from the user, but that permission would inevitably be buried in the fine print most users ignore when agreeing to use an application.
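To illustrate what an explicit opt-in might look like in practice, here’s a minimal sketch of a consent gate around persisting emotional data. The record layout and function names are hypothetical, not drawn from Hume AI’s platform:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of an explicit opt-in gate for storing emotional data.
@dataclass
class UserRecord:
    user_id: str
    consented_to_emotion_storage: bool = False   # must be an explicit, affirmative choice
    emotion_history: list[dict] = field(default_factory=list)

def maybe_store_emotions(record: UserRecord, snapshot: dict) -> bool:
    """Persist an emotion snapshot only if the user has explicitly opted in."""
    if not record.consented_to_emotion_storage:
        return False            # default: emotional content never enters the user record
    record.emotion_history.append(snapshot)
    return True

user = UserRecord(user_id="demo-user")
stored = maybe_store_emotions(user, {"joy": 0.62, "anxiety": 0.21})
print(stored, len(user.emotion_history))   # False 0 -- nothing saved without consent
```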

Acting as a Sales Agent

Although EVI is quick to say it is not designed to be a human replacement, it’s inevitable that businesses will try to use it in this way. So we tested EVI by having it play the role of a Sales Agent, trying to sell a mobile phone. We did this test several times, including the introduction of an ethical dilemma, telling EVI that while we wanted the phone it was selling, we couldn’t afford it.

In only one instance of testing did EVI seem to approach the line of suggesting we forgo critical expenses like groceries to buy its phone. In all other instances, EVI took pains to provide an ethical response, suggesting we prioritize necessities over gadgets.

