What is a deepfake?
“Deepfake” is a portmanteau of “deep learning” and “fake”. Deep learning is a subset of artificial intelligence (AI) and machine learning. It uses neural networks, loosely modelled on the brain, to analyse data and train computer models quickly. It allows a user to, say, swap two people’s faces in a video or manipulate their voices, making Dev Anand look and sound like Raj Kapoor.
Simply put, a deepfake video can be made in three steps: extraction, training and creation. We will explain them through the face-swapping tool FakeApp. But a disclaimer first: this is not a step-by-step tutorial, but an overview of how photorealistic fake videos are made.
It is a highly contentious subject, illegal in many countries, and we are not encouraging people to source software to create hoaxes, such as a video of a politician saying outrageous things, or to insert people into pornographic films to harass or harm them. But such software is becoming abundant, and tools such as FakeApp and DeepFaceLab, or more recently the Chinese app ZAO, come with the extensive instructions required to create sophisticated deepfakes.
Step 1. Extraction
You want to transpose Dev Anand’s face onto another character, say Raj Kapoor. You need a large dataset of images, hundreds of them; alternatively, FakeApp’s GET DATASET feature extracts all frames from a video for you. Clicking on EXTRACT starts the process. Ideally, you need a video of person A and a video of person B. If your original video is called Shree420.mp4, the frames will be extracted into a folder called dataset-video. Inside is another folder called extracted, containing images ready to be used.
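The extraction step above can be sketched in code. What follows is a minimal illustration, not FakeApp’s actual code: the face detector is a stand-in (real tools use detectors such as Haar cascades or MTCNN), and video frames are represented as NumPy arrays rather than read from a real file like Shree420.mp4.

```python
import numpy as np

def detect_face(frame):
    """Stand-in face detector. Real tools use Haar cascades or MTCNN;
    here we simply pretend the face occupies the centre of the frame.
    Returns a bounding box (top, left, height, width)."""
    h, w = frame.shape[:2]
    return (h // 4, w // 4, h // 2, w // 2)

def extract_faces(frames, size=64):
    """Crop the detected face from every frame and resize each crop to a
    fixed square, mimicking the folder of aligned face images that the
    extraction step produces."""
    faces = []
    for frame in frames:
        top, left, fh, fw = detect_face(frame)
        crop = frame[top:top + fh, left:left + fw]
        # Nearest-neighbour resize to size x size, to keep the sketch
        # dependency-free (real tools use proper image resampling).
        rows = np.arange(size) * crop.shape[0] // size
        cols = np.arange(size) * crop.shape[1] // size
        faces.append(crop[rows][:, cols])
    return np.stack(faces)

# Stand-ins for frames pulled out of a source video.
frames = [np.random.rand(256, 256, 3) for _ in range(10)]
dataset = extract_faces(frames)
print(dataset.shape)  # (10, 64, 64, 3)
```

The result is the kind of uniform stack of aligned face crops that the training step consumes.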
Step 2. Training
Use the TRAIN tab. Here you come across Data A (the folder extracted from the background video Shree420) and Data B (faces of the person, Dev Anand, you want to insert into the Data A video). The training process learns to convert the face of person A (Kapoor) into that of person B (Anand).
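The core trick behind this training step is an autoencoder with one shared encoder and a separate decoder per person: at swap time, person A’s face is pushed through the shared encoder but reconstructed with person B’s decoder. The toy NumPy sketch below illustrates that architecture under heavy simplification; real tools use deep convolutional networks, not the single linear layers here, and the “faces” are random vectors standing in for flattened crops.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, LATENT, LR = 64, 16, 0.01

# One shared encoder, one decoder per person.
encoder = rng.normal(0, 0.1, (DIM, LATENT))
decoder_a = rng.normal(0, 0.1, (LATENT, DIM))  # reconstructs person A
decoder_b = rng.normal(0, 0.1, (LATENT, DIM))  # reconstructs person B

faces_a = rng.random((100, DIM))  # stand-in: flattened face crops of A
faces_b = rng.random((100, DIM))  # stand-in: flattened face crops of B

def train_step(faces, enc, dec):
    """One full-batch gradient step minimising the reconstruction
    error ||faces - decode(encode(faces))||^2. Updates enc and dec
    in place and returns the current mean squared error."""
    latent = faces @ enc
    err = latent @ dec - faces
    grad_dec = latent.T @ err / len(faces)
    grad_enc = faces.T @ (err @ dec.T) / len(faces)
    dec -= LR * grad_dec
    enc -= LR * grad_enc
    return float((err ** 2).mean())

losses = []
for epoch in range(300):
    # The shared encoder sees both people; each decoder sees only one.
    losses.append(train_step(faces_a, encoder, decoder_a))
    train_step(faces_b, encoder, decoder_b)

# The swap: encode person A's faces, decode with person B's decoder.
swapped = (faces_a @ encoder) @ decoder_b
```

Because the encoder is forced to represent both faces in the same latent space, decoding A’s encoding with B’s decoder yields B’s appearance in A’s pose and expression.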
Step 3. Creation
The process of creating a video is similar to the one in GET DATASET. Pressing CREATE will automatically extract all frames from the source video, crop and align all faces, process each face, merge the faces back into the original frames, store them in a folder and join all the frames to create the final video.
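The trickiest part of this last step is merging each processed face back into its frame without a visible seam. Below is a minimal sketch of one common approach, alpha-blending with a soft mask that fades out towards the edges; it is an illustration, not FakeApp’s method, and real tools add colour correction and seamless cloning on top. Frames are again NumPy arrays, and the final re-encoding into a video file (e.g. with ffmpeg) is left out.

```python
import numpy as np

def merge_face(frame, face, top, left):
    """Paste a processed face back into its frame, feathering the edges
    with a soft mask so the boundary is less visible."""
    fh, fw = face.shape[:2]
    # Soft mask: close to 1.0 in the centre, fading to 0.0 at the borders.
    ys = np.minimum(np.arange(fh), np.arange(fh)[::-1]) / (fh / 2)
    xs = np.minimum(np.arange(fw), np.arange(fw)[::-1]) / (fw / 2)
    mask = np.clip(np.outer(ys, xs), 0.0, 1.0)[..., None]
    region = frame[top:top + fh, left:left + fw]
    frame[top:top + fh, left:left + fw] = mask * face + (1 - mask) * region
    return frame

# Merge a 64x64 swapped face back into each 256x256 frame; the merged
# frames would then be joined into the final video.
frames = [np.random.rand(256, 256, 3) for _ in range(10)]
face = np.ones((64, 64, 3)) * 0.5  # stand-in for a swapped face crop
output = [merge_face(f, face, 96, 96) for f in frames]
```

Near the centre of the pasted region the swapped face dominates; at the borders the original frame shows through, hiding the seam.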
How about cloning your voice to sound like Shahrukh Khan in a clip, or manipulating the superstar’s speech to make him utter words we would not expect him to say (not in public at least)? In 2018, Jordan Peele and BuzzFeed teamed up to make a video of Barack Obama in which the American actor-director ventriloquizes the former President to call Donald Trump “a total and complete dipshit”. It is a sampler of how dangerously believable deepfakes look and sound.
Montreal start-up Lyrebird’s software requires just a minute’s worth of audio to generate a completely altered digital voice. In other words, if you feed the software a snippet of audio spoken by someone, it can then say whatever you want—in the same voice.
There are text-to-speech services as well, like Amazon’s Polly (launched in 2016), which can even replicate pauses for breathing. Microsoft, Google (with Duplex) and China’s Baidu have demonstrated human-like speech capability in software they have developed.
Deepfake techniques can also write content that mimics the voice and style of a specific person. American news website Axios has demonstrated the capabilities of a text-generating AI by feeding it two factual, human-written sentences. The AI used these sentences to create a compelling, yet false, news article about current world affairs.
A highlight of the generated text: “China uses new and innovative methods to enable its advanced military technology to proliferate around the world, particularly to countries with which we have strategic partnerships (said the Pentagon)”.