Developed at the MILA lab of the University of Montréal, Lyrebird’s product is a voice imitation algorithm that relies on deep learning models to replicate anyone’s voice after listening to just one minute of sample audio. Through its API (application programming interface), users will be able to generate complete dialogs with a voice they have chosen, or start from scratch and design new, distinct voices customized for their specific needs.
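To make the idea concrete, here is a minimal sketch of what a request to a speech-generation API like this might look like. The endpoint URL, field names, and voice identifiers below are purely illustrative assumptions for the sake of the example; they are not Lyrebird’s documented API.

```python
import json

# Placeholder endpoint -- an assumption, not a real Lyrebird URL.
API_ENDPOINT = "https://api.example.com/v1/generate"

def build_generate_request(voice_id, text, emotion=None):
    """Assemble the JSON body for a hypothetical speech-generation call.

    voice_id -- identifier of a cloned or designed voice (assumed field)
    text     -- the dialog line the voice should speak
    emotion  -- optional tone hint, e.g. "angry" or "sympathetic"
    """
    payload = {"voice": voice_id, "text": text}
    if emotion is not None:
        payload["emotion"] = emotion
    return json.dumps(payload)

# Example: ask the (hypothetical) service to speak a line sympathetically.
body = build_generate_request("my-cloned-voice",
                              "Hello, this is a generated dialog line.",
                              emotion="sympathetic")
print(body)
```

In practice the body would be POSTed to the service, which would return synthesized audio; the sketch stops at assembling the request, since the real interface is not public in this article.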
The technology is similar in concept to Adobe’s Project VoCo (Voice Conversion), introduced last year, but it differs in at least two key respects.
First, VoCo runs locally and consumes the user’s own system resources to produce digital voices. Lyrebird’s API, in contrast, relies on cloud resources: the company says its GPU clusters can ‘generate 1,000 sentences in less than half a second.’
Second, VoCo needs to listen to at least 20 minutes of audio to synthesize speech. Lyrebird, on the other hand, only requires a single minute of original audio. In that short span of time, Lyrebird’s algorithm can analyze your voice, figure out what makes it unique, then copy it.
Lyrebird also claims that its API doesn’t just mimic your voice: it can infuse emotion into the speech it creates, so that your cloned voice sounds more natural, like a real human who is angry, stressed, or sympathetic.
The resulting speech can be used across a wide range of applications. As stated in their press release: ‘it can be used for personal assistants, for reading of audio books with famous voices, for connected devices of any kind, for speech synthesis for people with disabilities, for animation movies or for video game studios.’
On the downside, the technology opens up a can of worms because of what people with malicious intent can do with it. If your voice has been ‘cloned’ by this software, whoever controls the clone can make ‘you’ say virtually anything. That could be as harmless as sending someone a romantic message, or as dangerous as impersonating you to lure a victim into a crime such as a home invasion or a kidnapping. The latter is an extreme scenario, and plenty of other incidents fall in the middle ground, but the point stands: the ability to copy someone else’s voice opens the door to a great deal of abuse.
Beyond the crimes that voice cloning could enable, Lyrebird acknowledges that the technology raises other ‘important societal issues’, and the company wants everyone else to acknowledge them as well.
Specifically, Lyrebird wants people to realize that with technology like theirs, the time will come (if it hasn’t already) when voice recordings can no longer be treated as reliable evidence, because they can be so easily manipulated. Lyrebird’s answer to this risk is simply to make the technology available to everybody. Will this strategy be effective? We’ll just have to wait and see.
In the meantime, if you want to listen to samples of what this voice imitation algorithm can do, just head to the demo page of their website.