If you’re a voice actor or voice talent and you’re in various voice over groups on Facebook, you might have run into posts mentioning Speechelo, a company that offers text-to-speech (i.e. A.I. voices) at a low price. The idea is that you can enter the text you want voiced, and tada! You’ll have a great, human-sounding voiceover you’ll be able to use in your marketing videos, on the spot. Say goodbye to voice actors, expensive recordings, or delivery delays.
This is the company’s elevator pitch (taken from their website): “Instantly Transform Any Text Into A 100% Human-Sounding Voice Over with only 3 clicks! We GUARANTEE no one will tell your voiceover is A.I. generated.”
I was curious to hear what the fuss was all about, and honestly their supposedly A.I.-generated Vimeo demo looked and sounded pretty good. So, I clicked on the link in the latest Facebook post that mentioned the company and bought the software on sale for $37.
Before I give you my review, I should admit that my confidence in the company sank rather quickly. In the barely-over-24-hours since I bought the product, I received seven emails beyond my subscription receipt and login emails. I’m sure there are a few others piling up as I write this blog entry.
Meet the Spammy Company
First, they were selling me upgrades (ok, acceptable). It turns out the version I bought only allows users to create voice overs with scripts of fewer than 700 words.
Subsequent emails seemed to be promoting completely unrelated services (something to do with making money and an “apartament”; yes, the demo video has that typo in it). Upon closer inspection, it seems the video and emails are selling the idea of creating your own online video marketing agency, presumably using A.I. voices from Speechelo. Ok, that would be somewhat related, but I think you’ll agree their marketing needs some serious (serious) work.
Another email stated that Speechelo was wildly popular because they received 400 support tickets, which, if you ask me, really isn’t a selling point. The email suggests that 400 people couldn’t make the platform work for them in some way. (Or perhaps they wanted their money back?)
Overall, the company feels spammy.
The A.I. Voices
While the site says that the base version of the software has thirty A.I. voices, that includes all the languages available. When it comes down to it, there are only four English US voices in the base software and another four if you buy the Pro version. For the US (which would include Canada), there are two male voices, one female and one male child (though the child is definitely an older female). One thing all the voices have in common is that they all sound pleasant but very fuzzy (forgive the analogy but it’s like they have peach hairs on them).
As a native French Canadian, I also wanted to hear their FR CA options, but it turns out they only have one A.I. voice available and it’s only accessible with the Pro version of the software. I was still able to hear a very short preview of the voice and it was unfortunately extremely choppy (not nearly as good as most of the US voices).
Testing the Platform
Before exploring the platform, I watched the tutorial video. It turns out that to get the best results you must enter each sentence of your script one at a time (time-consuming, to say the least). The tutorial then suggests “merging” individual voice-generated sentences together to create one long VO file.
Upon using the platform in this way, I noticed that text changes became incredibly cumbersome. That said, you could prepare your text in advance, and/or download your voice-generated sentences individually. Either way, you can’t number your voice-generated sentences, so figuring out the order (whether on the platform or after downloading) isn’t straightforward.
If you choose the “merge” option and you then want to change your script, you’ll have to create a new voice-generated sentence (which won’t be in the right order before you merge) and delete the old one. You can, however, re-order your individual voice-generated sentences before merging them to create your final VO, but since you can’t number your sentences, it’s easy to make a mistake and put them in the wrong order. So we’re building puzzles.
Previewing A.I. Voices
As I was inputting sentences into the program, I couldn’t listen to full voice-generated previews of my text. Instead, the voice would only play small portions of my sentences. This is a bit pointless, since adding punctuation is apparently instrumental in getting a voice over that sounds natural. Their system forces you to generate the VO, listen back, delete if needed, create a new one by copy/pasting the old text, fix the punctuation, generate again, listen back, and so on.
Let’s hope (for their sake) they come up with a better system.
A.I. Voice Modes
Each A.I. voice could be used in three modes: Normal, Friendly, and Serious.
Aside from the fact that I couldn’t really tell the difference between them, the modes seemed impossible to change once the VO was generated, so versatility is an issue.
How Can I Get my A.I. Voice Over?
Once I created my voice-generated sentences and merged them into one longer VO file, I had to wait until a download button appeared. This took a while, during which I had no idea if I’d ever be able to get my voice over off of the platform (I really didn’t want to have to add to their support ticket count). I turned my attention to a few Facebook messages and when I eventually looked back at the page, I noticed a download button (sigh of relief).
This is the A.I. voice over I created with the program. I think I managed to stay on the Friendly mode, but perhaps it got switched to Normal or Serious on one or two sentences towards the end.
The Verdict: Human vs A.I. Voices
I don’t know about you, but the robotic and monotone A.I. voice makes it difficult to understand the information in the VO. I personally disconnect when I listen. This is precisely why announcer reads are no longer popular; repetitive musicality in speech makes it difficult to retain information. Conversational reads are not only more engaging, they’re just plain easier to understand (and retain).
While I’ll admit the voices are a huge step up in the world of synthetic voices and text-to-speech, we’re not quite there yet.
Who Will Use These A.I. Voices?
I personally don’t know what kind of company would find this type of A.I. VO acceptable. Perhaps a foreign one that needs a short text translated? A conspiracy theory YouTube channel? A company that simply doesn’t care about voice quality? As for those, I can’t name many that are based in North America. But sure, they probably exist.
All I know is that in today’s competitive climate, having a crappy voice over just isn’t the norm. If it were, a lot of people who aren’t able to sound conversational and real would be working a ton.
Artistic Limitations of A.I. Voices
Voice actors are more and more conversational and versatile, and it just doesn’t make sense to hire a voice that can only do one thing: be monotone.
If you still fear for your job, the most important element to keep in mind is that these voices can’t be directed. What you hear is what you get. You can change punctuation, add pauses, breaths, etc., but you’ll never truly be able to get a completely different read.
If it’s too good to be true, it usually is. I’m not convinced their initial demo was fully generated by the technology I tested today.
So, as voice actors, I don’t think we’re about to lose our jobs yet. For those who are nervous, keep on top of your acting training as these A.I. voices definitely aren’t going to replace great actors.