Deepfakes, synthetic media created by advanced artificial intelligence technologies, have escalated into a major societal issue. These digital creations are so convincing that they blur the boundaries between reality and fabrication, making it challenging to discern whether a politician truly made a controversial statement or was misrepresented through technological manipulation.
Visar Berisha, a professor of electrical engineering at Arizona State University who also holds an appointment in the university’s College of Health Solutions, emphasizes that the reliability of recorded voices, once taken for granted, is now under scrutiny. “With the proliferation of voice cloning technologies, we are transitioning from a default of trust to one of skepticism,” Berisha explains.
In response to the harm AI-generated deepfakes can cause, from tarnished reputations to eroded confidence in public institutions, the U.S. Federal Trade Commission (FTC) launched its Voice Cloning Challenge, offering $35,000 in prizes for innovative ways to counter the threat.
One of the challenge’s winners, a project named OriginStory, takes a novel approach: a specialized microphone that confirms the human origin of spoken words at the moment they are recorded. The device records speech and embeds a watermark certifying its authenticity. That watermark acts as a verification token that accompanies the audio from the point of recording to eventual playback, preserving the integrity of the communication chain.
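To make the verification-token idea concrete, here is a minimal sketch of how a trusted capture device could attest to a recording and how any player could check that attestation later. This is an illustration only, not OriginStory’s published design: the choice of Ed25519 signatures, SHA-256 hashing, and every function name below are assumptions made for the example.

```python
# Sketch: a capture device signs a hash of the audio at recording time; a
# player verifies the signature at playback. Assumes the `cryptography`
# package; the token format and per-device key model are hypothetical.
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def sign_at_capture(audio: bytes, device_key: Ed25519PrivateKey) -> bytes:
    """Inside the trusted microphone: hash the audio and sign the digest."""
    digest = hashlib.sha256(audio).digest()
    return device_key.sign(digest)  # the token stored alongside the file


def verify_at_playback(audio: bytes, token: bytes,
                       device_pub: Ed25519PublicKey) -> bool:
    """At playback: recompute the hash and check the device's signature."""
    digest = hashlib.sha256(audio).digest()
    try:
        device_pub.verify(token, digest)
        return True   # audio is byte-identical to what the device attested
    except InvalidSignature:
        return False  # audio was altered, or never came from this device


# Demo, with a freshly generated key standing in for a per-device key.
key = Ed25519PrivateKey.generate()
audio = b"raw PCM samples would go here"
token = sign_at_capture(audio, key)
assert verify_at_playback(audio, token, key.public_key())
assert not verify_at_playback(audio + b"!", token, key.public_key())
```

A real deployment would also need to bind the token to the human-origin check described below and distribute device public keys through a trusted channel; neither problem is addressed in this sketch.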
The project has strong ties to Arizona State University, utilizing the university’s resources and securing a patent through Skysong Innovations, ASU’s exclusive intellectual property management entity. The development team is led by Berisha, alongside Daniel Bliss and Julie Liss from ASU’s School of Electrical, Computer and Energy Engineering and College of Health Solutions, respectively.
OriginStory leverages sensor technology already found in many consumer electronics to detect the biological signals that accompany human speech, such as vocal cord vibrations and movements of the speech articulators (lips, tongue, nasal cavity). By recording these biosignals concurrently with the speech itself, OriginStory can authenticate that the voice came from a human. The approach also preserves privacy: the biosignals reliably distinguish a human speaker from a machine, but they do not identify which human is speaking.
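As a toy illustration of that coincidence check, the sketch below declares a recording human-originated when the short-time energy of the audio rises and falls together with a synchronized biosignal channel, such as a contact sensor picking up vocal fold vibration. The envelope method, the 16 kHz sample rate, and the 0.5 correlation threshold are all assumptions made for the example; OriginStory’s actual detector has not been published.

```python
# Sketch: compare energy envelopes of the audio and a time-aligned biosignal.
# Real speech should show strongly correlated envelopes; audio replayed at a
# microphone with no accompanying biosignal should not.
import numpy as np


def is_human_speech(audio: np.ndarray, biosignal: np.ndarray,
                    threshold: float = 0.5) -> bool:
    """Return True if the two channels' energy envelopes co-vary."""
    frame = 320  # 20 ms frames, assuming both streams sampled at 16 kHz
    n = min(len(audio), len(biosignal)) // frame * frame
    env_a = np.sqrt(np.mean(audio[:n].reshape(-1, frame) ** 2, axis=1))
    env_b = np.sqrt(np.mean(biosignal[:n].reshape(-1, frame) ** 2, axis=1))
    # Pearson correlation between envelopes: near 1 when the sensor vibrates
    # whenever there is speech energy, near 0 for replayed or synthetic audio.
    env_a = env_a - env_a.mean()
    env_b = env_b - env_b.mean()
    denom = np.linalg.norm(env_a) * np.linalg.norm(env_b)
    if denom == 0:
        return False  # silence on either channel: nothing to authenticate
    return float(env_a @ env_b / denom) > threshold
```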
When that biosignal check succeeds, the device embeds a watermark in the audio file attesting to its human origin, so the recording can be verified whenever it is retrieved in the future, helping maintain public trust.
The idea for OriginStory was sparked by a disturbing scam in which a Phoenix-area mother received a fake ransom call claiming her daughter had been kidnapped, with an AI clone of the daughter’s voice used as proof. “It was a terrifying moment that really hit close to home,” Berisha reflects.
Julie Liss, a specialist in speech physiology and acoustics, stresses the critical need for protective measures against AI-generated voices, citing global security concerns. The project represents over a decade of collaborative efforts between Liss and Berisha, blending engineering with health sciences to tackle pressing challenges.
With the FTC recognition, the OriginStory team, which also collaborates with Drena Kusari, vice president of product at Microsoft, is motivated to further refine the technology and eventually bring it to market. The win underscores the need for innovations that can reliably certify the authenticity of human voices in an increasingly digital world.
Berisha views the team’s success in the FTC challenge as a strong endorsement of their work. “Being recognized by the FTC not only validates our approach but also emphasizes the societal need for technologies that can ensure a voice is genuinely human, from the moment it is recorded to the moment it is heard,” he states.