Audio Deepfake

Audio deepfake, also called voice cloning, is a type of AI technology used to create audio files that sound like a particular person. The term ‘deepfake’ comes from the underlying technology, ‘deep learning’, which is a form of AI.

Understanding the Technology Behind Deepfake Voices

In recent times, audio deepfakes, a form of audio manipulation, have become widely accessible and thus pose a cybersecurity threat, since they can spread misinformation across social media. The technology can also be misused as a voice spoofing technique to manipulate public opinion for propaganda, defamation, or terrorism.

Applications of Deepfake

Deepfake technology has a number of positive use cases for both individuals and businesses.

Audio deepfake technology can help improve the lives of people with conditions that lead to speech disorders, such as Parkinson's disease, cancer, and multiple sclerosis, which can take away the ability to talk and communicate. Deepfake tools can transform a patient’s voice to sound more natural and accurate. Veritone Voice is a leading deepfake voice generator that provides tailored solutions for individuals based on their specific needs.

From a business perspective, it has created a number of opportunities. 

  • Films and TV Industry: Deepfake audio can be used to dub an actor’s voice even if he/she is unavailable or has passed away.
  • Animation Industry: Deepfake technology allows animators to give unique and distinct tones to their characters, regardless of their vocal range or languages. 
  • Game Development Industry: Deep fakes can create characters with realistic and dynamic dialogues that match their personalities.
  • Cross-Language Localization: Deepfake voice technology lets people speak various languages in their own voice, opening new possibilities for this application in various industries. Dubbing into multiple languages has also become simpler with the use of deepfakes.

How is a deepfake created?

The mechanisms most commonly used to create deepfakes with deep learning models are autoencoders and generative adversarial networks (GANs).

Autoencoders consist of two parts: an encoder and a decoder. The encoder takes input data and transforms it into a lower-dimensional representation, called a latent code or bottleneck. The decoder then tries to reconstruct the original input data from the latent code as closely as possible. The autoencoder is trained to minimize the reconstruction error, which is the difference between the input and the output.

Architecture of autoencoders
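To make the reconstruction objective concrete, here is a minimal linear autoencoder sketch. This is a toy example, not a real voice-cloning model: the 4-dimensional inputs, 2-dimensional bottleneck, learning rate, and step count are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                # toy input vectors (stand-ins for audio features)

W_enc = rng.normal(scale=0.1, size=(4, 2))   # encoder weights: 4-D input -> 2-D latent code
W_dec = rng.normal(scale=0.1, size=(2, 4))   # decoder weights: 2-D latent code -> 4-D output
lr = 0.01

# Reconstruction error before training, for comparison.
mse_init = float(np.mean((X @ W_enc @ W_dec - X) ** 2))

for _ in range(2000):
    Z = X @ W_enc            # encoder: compress input to the latent code (bottleneck)
    X_hat = Z @ W_dec        # decoder: reconstruct the input from the latent code
    err = X_hat - X          # reconstruction error
    # Gradient descent on mean squared reconstruction error.
    W_dec -= lr * (Z.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

mse = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
print(mse < mse_init)        # reconstruction error has decreased
```

Because the 2-D bottleneck cannot hold all the information in the 4-D input, the network is forced to learn a compact representation; real deepfake systems apply the same idea to much higher-dimensional audio features with deep, nonlinear encoders and decoders.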

Generative Adversarial Networks (GANs) are a class of neural networks used for unsupervised learning. A GAN is an approach to generative modeling that produces new data based on its training data: it automatically discovers and learns the regularities or patterns in the input data, so that the model can generate new outputs that could plausibly have been drawn from the original dataset.

A GAN consists of two neural networks that compete with each other and together capture, copy, and analyze the variations in a dataset. The two models are usually called the generator and the discriminator.

The generator network takes random input and produces samples, such as images, text, or audio, that resemble the training data, in an attempt to fool the discriminator. The discriminator network tries to distinguish between real and generated samples; it is trained with real samples from the training data and fake samples from the generator.

Generative Adversarial Network Architecture and its Components
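The adversarial loop described above can be sketched with a deliberately tiny example. Everything here is an illustrative assumption, not a real deepfake pipeline: the "real data" is a 1-D normal distribution, the generator is a linear map of noise, and the discriminator is a logistic classifier.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

a, b = 1.0, 0.0   # generator parameters: g(z) = a*z + b turns noise into a fake sample
w, c = 0.0, 0.0   # discriminator parameters: D(x) = sigmoid(w*x + c)
lr = 0.05

for _ in range(3000):
    real = rng.normal(3.0, 0.5, size=32)   # samples from the "training data"
    z = rng.normal(size=32)                # random noise input
    fake = a * z + b                       # generator output

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w -= lr * (np.mean((d_real - 1) * real) + np.mean(d_fake * fake))
    c -= lr * (np.mean(d_real - 1) + np.mean(d_fake))

    # Generator step: adjust a, b so the discriminator scores fakes as real.
    d_fake = sigmoid(w * fake + c)
    g_grad = -(1 - d_fake) * w             # gradient of -log D(fake) w.r.t. the fake sample
    a -= lr * np.mean(g_grad * z)
    b -= lr * np.mean(g_grad)

samples = a * rng.normal(size=1000) + b
print(round(float(samples.mean()), 2))     # generated samples drift toward the real mean of 3
```

The competition is visible in the two update steps: the discriminator's loss rewards telling real from fake, while the generator's loss rewards being misclassified as real, so improvement in one network pressures the other. Real audio GANs follow the same loop with deep networks over waveforms or spectrograms.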

Ethical Concerns of Deepfake

The potential misuse of deepfakes for spreading disinformation or manipulating public opinion has raised discussions on the ethical use of this technology.

  • Misinformation and Deception: Deepfakes can be used to spread misinformation or impersonate someone else, damaging the reputation of, and public trust in, individuals or organizations.
  • Fraud: Criminals can use deepfake audio to impersonate bank officials or government officials to extract sensitive information and commit fraud.
  • Privacy and Consent: Deepfakes can be created from a small audio recording, which can be obtained without consent or knowledge, violating privacy rights.
  • Manipulation: Deepfakes can be used to create fake recordings of socially marginalized communities, promoting harmful stereotypes and discrimination. It also enables the creation of fake news or propaganda to influence public opinion.
  • Lack of Regulation: Laws need to be implemented to regulate the use of deepfakes, as there are currently no laws addressing this concern.


By Hamsini Ramesh