I don’t like the ending tone on some of the words, since it’s not a tone I use; it comes across as a bit grating. I also noticed (it may or may not be apparent to you from this quick sample) that the AI model leans more British than American in certain words and expressions. At any rate, with just 25 short clips of me saying completely unrelated phrases, I was able to train the model in under 5 minutes and have a custom model ready to take text input.
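Once the custom voice is trained, getting it to speak new text is just an API call. The sketch below shows roughly what that looks like; the endpoint path, payload fields, and the key/UUID placeholders are my assumptions based on Resemble's v2 REST API, not the exact code I used, so treat it as a starting point rather than a drop-in script.

```python
# Hedged sketch: asking a trained custom voice to speak arbitrary text.
# Endpoint path and payload schema are assumptions (Resemble v2-style API);
# the API key, project UUID, and voice UUID are placeholders.
import json
import urllib.request

API_KEY = "YOUR_RESEMBLE_API_KEY"   # placeholder
PROJECT_UUID = "your-project-uuid"  # placeholder
VOICE_UUID = "your-voice-uuid"      # placeholder


def build_clip_payload(text, voice_uuid):
    """Build the JSON body for a clip-creation request (assumed schema)."""
    return {
        "title": "demo clip",
        "body": text,             # the text the cloned voice will speak
        "voice_uuid": voice_uuid,
    }


def synthesize(text):
    """POST the text to the (assumed) clip-creation endpoint."""
    url = f"https://app.resemble.ai/api/v2/projects/{PROJECT_UUID}/clips"
    req = urllib.request.Request(
        url,
        data=json.dumps(build_clip_payload(text, VOICE_UUID)).encode(),
        headers={
            "Authorization": f"Token token={API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        # The response is expected to include a link to the rendered audio.
        return json.load(resp)


if __name__ == "__main__":
    print(synthesize("Hi there, it's me. Well, not really, but kind of."))
```

The point is that after training, the voice is addressed by an ID and fed plain text; everything else is ordinary HTTP.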
Here’s an example of a conversation with another bot deepfaking my voice:
Then I figured, instead of typing in the transcript myself, why not have my other app (which uses Deepgram’s Whisper API) generate it? Below is the output from that app, which also uses AI (my article on, and source code for, that app: http://flyingsalmon.net/quick-transcription-using-ai-whisper/ ):
Generating transcript from local audio file…
Seconds to execute: 1.532
Hi, there, it’s me. Well, not really, but kind of. I’m not really speaking here. Rather, I train the bot with my voice and he’s speaking my text input stimulating my voice, accent, and tone from 25 audio clips. Are you serious? Yep, I used this tool called Resembl. It’s a generative voice AI toolkit. Your site is ressembl dot ai.,
** Overall accuracy ** : 99.400%
*** Diarized transcript ***:
[Speaker:0] Hi, there, it’s me. Well, not really, but kind of. I’m not really speaking here.
[Speaker:0] Rather, I train the bot with my voice and he’s speaking my text input stimulating my voice, accent, and tone from 25 audio clips. Are you serious?
[Speaker:0] Yep, I used this tool called Resembl.
[Speaker:0] It’s a generative voice AI toolkit. Their site is ressemble dot ai.
Count of words detected: 61 | Unique speakers detected: 1
(Yes, it failed to detect the 2nd speaker as a distinct speaker, although it transcribed her words correctly.)
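For anyone curious what the transcription side involves, the sketch below shows a minimal call to Deepgram's pre-recorded audio endpoint with the hosted Whisper model and diarization turned on. The query parameters follow Deepgram's `/v1/listen` API; the API key and audio file name are placeholders, and the full app linked above does more (timing, accuracy stats, formatting) than this bare-bones version.

```python
# Hedged sketch: transcribing a local audio file with diarization via
# Deepgram's /v1/listen endpoint using the hosted Whisper model.
# The API key and file name are placeholders.
import json
import urllib.request
from urllib.parse import urlencode

API_KEY = "YOUR_DEEPGRAM_API_KEY"   # placeholder


def listen_url(model="whisper", diarize=True, punctuate=True):
    """Build the /v1/listen request URL with the chosen options."""
    params = urlencode({
        "model": model,
        "diarize": str(diarize).lower(),
        "punctuate": str(punctuate).lower(),
    })
    return f"https://api.deepgram.com/v1/listen?{params}"


def transcribe(path, mimetype="audio/wav"):
    """Send the raw audio bytes and return Deepgram's JSON response."""
    with open(path, "rb") as f:
        audio = f.read()
    req = urllib.request.Request(
        listen_url(),
        data=audio,
        headers={"Authorization": f"Token {API_KEY}",
                 "Content-Type": mimetype},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


if __name__ == "__main__":
    result = transcribe("my_clone_demo.wav")  # hypothetical file name
    # Plain transcript; per-word speaker labels live alongside each word.
    alt = result["results"]["channels"][0]["alternatives"][0]
    print(alt["transcript"])
```

The diarized transcript shown above comes from the per-word `speaker` labels in that same response; my app groups consecutive words by speaker to produce the `[Speaker:N]` lines.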
If you’re interested in the related articles that got me looking into voice simulation in the first place, check out the posts below: