If you’ve been active on social media lately, you might have noticed the viral trend of artists singing in the styles of different musicians. It’s a captivating phenomenon that’s been taking the internet by storm. However, there’s another development that’s been making waves in the music industry, and it’s got everyone on their toes: AI-generated vocals. These vocals are so convincing that distinguishing between a real singer and their AI-generated counterpart can be challenging. It’s like there’s an impostor among us, and they’re stealing the spotlight!
Case in point: ‘Heart on My Sleeve,’ a track that dropped a few months ago, was credited to a mysterious ‘Ghostwriter.’ The vocals on this song sounded eerily similar to Drake and The Weeknd. The catch? They were entirely AI-generated. The model behind it was so proficient that many listeners believed it was the real deal, prompting Universal Music Group to intervene and have the song removed from streaming platforms.
But how does this wizardry work, you ask? Well, let’s dive into the rabbit hole and explore the secrets of AI-generated vocals for research purposes only.
First, a word of caution: Attempting to use AI-generated vocals for commercial purposes without proper permission can lead to more trouble than triumph. It’s a legal minefield. (On a fun note, the artist Grimes has generously allowed the public to experiment with her voice, so go wild!)
Now, onto the nitty-gritty. AI voice cloning is the sorcery behind this phenomenon. It’s a process where AI models generate human-like speech from given text or replicate someone’s voice using existing speech recordings. This involves training deep learning models that pick up the intricacies of human speech, including pitch, intonation, and pronunciation.
In the case of voice cloning, the AI model is trained on a large dataset of a specific individual’s speech recordings. It analyzes and learns from these recordings to recreate that person’s voice, enabling it to speak any text in a voice that closely mimics the original speaker.
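Enter VITS, short for Variational Inference with adversarial learning for end-to-end Text-to-Speech, the model introduced in the paper ‘Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech.’ This AI model can be used to create uncannily realistic voice clones. At a high level, VITS has an encoding side and a decoding side: a posterior encoder compresses recorded audio into compact latent features, a prior encoder learns to predict matching features from the input text, and a decoder (which plays the role of the vocoder) turns those features back into a waveform. During training, a discriminator judges the output and pushes it toward sounding real. The result is high-quality audio that can be nearly indistinguishable from the original.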
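For the mathematically curious, the ‘conditional variational autoencoder’ part of the name can be summed up in one line. This is the standard conditional VAE training bound rather than anything specific to a particular implementation; the symbols are the usual textbook ones, with x standing for the audio, c for the condition (the text, in the text-to-speech case), and z for the hidden latent features:

```latex
\log p_\theta(x \mid c) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
\;-\; D_{\mathrm{KL}}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid c)\big)
```

In plain words: the first term rewards the model for rebuilding the audio from its latent features, and the second keeps the features inferred from audio close to the ones predicted from the condition alone. The adversarial part of the training comes on top of this bound.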
Now, let’s talk about Singing Voice Conversion (SVC). This fascinating field of research focuses on transforming one person’s singing voice into that of another while preserving the unique musical characteristics. It’s like taking Beyoncé and making her sing like Freddie Mercury while keeping the vibe intact.
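One musical characteristic that has to survive the conversion is the melody, and sometimes it needs to be moved up or down so it sits in the target singer’s range. Here is a minimal sketch of that transposition, assuming the pitch contour is stored as a list of frequencies in hertz with 0.0 marking frames where nobody is singing. The function name `transpose_f0` and that 0.0 convention are made up for illustration and are not taken from any particular tool; the only hard fact used is that one semitone is a frequency ratio of 2^(1/12).

```python
def transpose_f0(f0_hz, semitones):
    """Shift a pitch contour by a number of semitones.

    Frames with a value of 0.0 are treated as unvoiced (an assumption
    of this sketch) and are passed through unchanged.
    """
    ratio = 2.0 ** (semitones / 12.0)  # equal-temperament frequency ratio
    return [f * ratio if f > 0 else 0.0 for f in f0_hz]


# A short contour: A3, B3, a silent frame, then C4.
contour = [220.0, 246.94, 0.0, 261.63]

# Move the whole melody up one octave (12 semitones).
shifted = transpose_f0(contour, 12)
print(shifted)
```

Shifting by 12 semitones doubles every voiced frequency, so the melody keeps its shape and simply lands an octave higher.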
Hold onto your hats because we’ve got something even cooler: So-VITS SVC, short for SoftVC VITS Singing Voice Conversion. The name stacks three ideas: SoftVC (from ‘Soft Speech Units for Improved Voice Conversion’), VITS (the text-to-speech model described above), and SVC (Singing Voice Conversion). In So-VITS SVC, we combine the powers of all three to imitate a person’s singing voice and yield shockingly realistic results.
Here’s how it all comes together:
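- The source audio is recorded, capturing pitch and intonation.
- Speech features are extracted using the SoftVC content encoder, which keeps what is being sung while dropping most of who is singing it.
- These features, together with the pitch, are fed into the VITS vocoder, which has been trained on the target singer, to recreate the recording in that singer’s voice.
- Voilà! You have a voice clone that sounds remarkably close to the target singer.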
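To make the ‘capturing pitch’ step a little less magical, here is a toy sketch of pitch estimation using plain autocorrelation. It is not the pitch extractor that So-VITS SVC itself uses (real tools rely on more robust estimators); it only illustrates the idea on a synthetic tone. The function names, the 16 kHz sample rate, and the 80 to 1000 Hz search range are all assumptions chosen for the example.

```python
import math


def make_sine(freq_hz, sample_rate, n_samples):
    """Synthesize a pure tone standing in for one frame of a sung note."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def estimate_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of a frame in hertz.

    Tries every lag in the allowed range and keeps the one where the
    frame best matches a shifted copy of itself (highest autocorrelation).
    """
    min_lag = int(sample_rate / fmax)  # shortest period to consider
    max_lag = int(sample_rate / fmin)  # longest period to consider
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[i] * frame[i + lag]
                    for i in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


sample_rate = 16000
frame = make_sine(220.0, sample_rate, 1024)  # an A3, roughly 64 ms of audio
f0 = estimate_pitch(frame, sample_rate)
print(f"estimated pitch: {f0:.1f} Hz")  # lands close to 220 Hz
```

Because the lag is a whole number of samples, the estimate is only accurate to within a few hertz here. Running this frame by frame over a recording gives the pitch contour that travels alongside the content features into the vocoder.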
In conclusion, this journey into the world of AI-generated vocals sheds light on the groundbreaking and slightly bizarre future we’re entering. The fact that audio recordings can be manipulated this easily may be hard to swallow. While many people are cautious about AI voice cloning, fewer recognize its potential for innovation. So, what’s the hold-up? It’s time for another exciting project, and the possibilities are limitless!