You’d believe something you see and hear, right?
I mean, your eyes and ears wouldn't betray you, would they?
We rely on our senses to guide us through life, but with the advent of technology, what we see and hear may be deceptive.
More and more software and applications allow you to alter someone’s voice or video.
Currently, people are using it for laughs and gags, such as the clip below where someone used "deepfake" technology to place President Trump's face onto comedian Jimmy Fallon in a sketch where Fallon was impersonating the U.S. president.
But seeing as it's almost impossible to tell that the right side is fake, the technology could have serious implications in the future.
Jordan Peele released a PSA video about how AI can alter footage and make someone believe something false.
The video starts with former U.S. President Obama saying some shocking things. Jordan then appears in the video to explain how videos and images can be manipulated to make you believe a certain thing, and why we shouldn't believe everything we see on the internet.
This could create controversies in the future if fabricated videos were released of political leaders saying controversial things or of public figures doing objectionable things.
People could be framed for crimes they didn't commit while the real perpetrator roamed free.
To save us from a similar situation on a call, Samsung Electronics Co., Ltd. proposed a solution in its recent patent 20200228648A1.
So, suppose you wish to run an analysis on a call. The system will pull a file on the person you're speaking with, containing reference data such as their voiceprint and video. If the system does not have a file on that person, it will try to pull one from the cloud.
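To picture that lookup step, here is a minimal Python sketch assuming a local on-device store with a cloud fallback. The names here (CallerProfile, LOCAL_STORE, fetch_profile_from_cloud) are placeholders I made up, not Samsung's actual code:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CallerProfile:
    caller_id: str
    voiceprint: list[float]      # reference voice features
    face_embedding: list[float]  # reference face/video features

# On-device store of previously built reference files (hypothetical).
LOCAL_STORE: dict[str, CallerProfile] = {}

def fetch_profile_from_cloud(caller_id: str) -> Optional[CallerProfile]:
    # Placeholder for the cloud lookup; a real system would query a server.
    return None

def get_reference_profile(caller_id: str) -> Optional[CallerProfile]:
    # Prefer the on-device file; fall back to the cloud copy if it's missing.
    return LOCAL_STORE.get(caller_id) or fetch_profile_from_cloud(caller_id)
```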
Then a pre-trained multi-stage neural network detection model will analyze the call, comparing the incoming voice and video against the reference data to detect whether any forgery is occurring.
It takes the help of multiple models: face classification detection models, voiceprint classification detection models, limb movement classification detection models, and even a lip language classification detection model.
All the acquired data is sent to its corresponding feature input, and a fully connected network, using the trained parameters of the two-stage neural network detection model, combines the results to check the call for abnormalities.
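As a rough illustration of how those per-modality outputs might be fused by a final fully connected stage, here is a toy NumPy sketch; the weights, bias, and threshold are invented for the example and are not from the patent:

```python
import numpy as np

def fuse_modality_scores(face: float, voiceprint: float,
                         limb_movement: float, lip_language: float) -> bool:
    """Return True if the fused score flags the call as a likely forgery.

    Each input is that modality's forgery score in [0, 1].
    """
    scores = np.array([face, voiceprint, limb_movement, lip_language])
    weights = np.array([0.3, 0.3, 0.2, 0.2])  # hypothetical learned weights
    bias = -0.4                               # hypothetical learned bias
    fused = float(weights @ scores) + bias
    probability = 1.0 / (1.0 + np.exp(-fused))  # sigmoid output unit
    return probability > 0.5

# Example: every modality looks suspicious, so the call is flagged.
print(fuse_modality_scores(face=0.9, voiceprint=0.8,
                           limb_movement=0.5, lip_language=0.7))  # True
```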
For the most accurate results, they take a few extra things into consideration, such as matching coughing and sneezing sounds to the corresponding on-screen actions, and making the terminal device collect image data without any beauty filters applied.
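Here is a toy version of that first cross-check, testing whether each audio event (like a cough) lines up with a matching motion in the video; the event format and the 0.5-second tolerance are my own assumptions:

```python
def events_are_consistent(audio_events: list[tuple[str, float]],
                          video_events: list[tuple[str, float]],
                          tolerance_s: float = 0.5) -> bool:
    """Check that each audio event has a matching video event nearby in time."""
    for label, t_audio in audio_events:
        if not any(v_label == label and abs(t_video - t_audio) <= tolerance_s
                   for v_label, t_video in video_events):
            return False  # a sound with no matching on-screen action
    return True

# A cough heard at 12.3s matches a cough motion seen at 12.4s.
print(events_are_consistent([("cough", 12.3)], [("cough", 12.4)]))  # True
```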
If the system detects a discrepancy, it triggers an abnormality alarm. This alarm displays information indicating that the identity of the call object is abnormal.
It then notifies the "real person", alerting them that someone is impersonating them on the other end of the call.
The system also notifies the cloud server to mark the number used in the call as the source of the forgery.
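Putting those three responses together, a hedged sketch of the alarm flow might look like this; the function names and print-based "UI" are stand-ins, not the patent's actual implementation:

```python
def show_warning(message: str) -> None:
    print(f"[ALARM] {message}")  # stand-in for an on-screen warning dialog

def alert_impersonated_user(user_id: str) -> None:
    print(f"[NOTIFY] {user_id}: someone is impersonating you on a call.")

def mark_number_as_forgery_source(number: str) -> None:
    print(f"[CLOUD] Marking {number} as a forgery source on the server.")

def handle_abnormality(number: str, impersonated_user: str) -> None:
    # Triggered when the detection model finds a discrepancy in the call.
    show_warning("The identity of the call object is abnormal.")
    alert_impersonated_user(impersonated_user)
    mark_number_as_forgery_source(number)

handle_abnormality("+1-555-0100", "alice")
```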
Hopefully, with methods like this, it will be easier to detect fake calls in the future.