New AI Tool Fights Back Against Speech Eavesdropping
University of Maryland computer scientists developed an AI-enhanced system that protects personal voice data from automated surveillance.
When you make a voice call through Zoom, FaceTime or WhatsApp, you’re not just sharing what you say. You’re revealing your age, gender, emotional state, social background and personality—a biometric fingerprint just as unique as your face. And increasingly, artificial intelligence is listening.
“We already see phishing based on our online activities and what we type in emails,” said Nirupam Roy, an associate professor of computer science at the University of Maryland. “Now, a significant amount of our voice communications flow through digital platforms, so there’s an unprecedented vulnerability in privacy when it concerns our own speech. We anticipate that the threats will become very real with voice and speech data, especially as artificial intelligence gets thrown into the mix.”
While we worry about what we type in emails or post on social media, our voices inadvertently broadcast deeply personal information every time we communicate online. Voice data can be dangerous in the wrong hands, enabling targeted phishing attacks, deepfake generation, biometric theft and even sophisticated social engineering.
Roy is working to address this growing threat to our personal security. To protect human voice data from being stolen and used by malicious third parties, he and his research group at UMD designed VoiceSecure, an innovative system that obscures speech from artificial intelligence while keeping conversations crystal clear to human ears.
When every call becomes a data mine
It’s not just the content of a conversation that can be valuable to malicious actors. According to Roy, the greatest challenge in addressing privacy concerns is the “meta-linguistic” information that human voices carry: emotions, biological characteristics, stress patterns and identity markers.
“Government and military conversations often require strong protection against voice eavesdropping, but even low-stakes conversations can reveal a ton of information,” Roy said. “A mother’s FaceTime conversation with her son can reveal crucial personal details that can be used for creating anything from targeted ads to voice cloning for use in fraud.”
Scammers and deepfake creators use AI-generated voices to make their schemes more convincing. Biometric theft allows unauthorized access to voice-authenticated systems, such as bank accounts or patient health records. And sophisticated social engineering attacks become far more effective when attackers use detailed profiles built from genuine human speech patterns and biometric details.
Roy noted that companies and platforms already have procedures in place to keep user data safe, but these strategies often fall short in practice. Some solutions involve adding obscuring noise to audio conversations, which can degrade call quality for users. Traditional encryption, the most commonly used technique, also faces significant challenges, including the need for both ends to encrypt and decrypt content in real time, which consumes large quantities of computing power that not every device can comfortably sustain. Mismatched capabilities between users’ devices, such as a desktop computer and a mobile phone, can create security weak spots that adversaries can exploit.
“When communication systems become more complicated, end users lose control over their own data,” Roy said. “Even when we have end-to-end encryption on many platforms, these protections are often optional, difficult to implement or simply not followed. And it becomes easier for bad actors with tools like AI to exploit these weaknesses.”
Fighting AI with AI
Roy’s VoiceSecure system aims to address those limitations and fight malicious attacks by leveraging one key difference between humans and machines: how they both process sound.
“Human hearing has built-in limitations. People aren’t sensitive to every sound frequency equally. For example, two sounds close together at higher frequencies often cannot be distinguished from each other. Psychoacoustic effects shape how our brains understand sound—it’s not just about frequency, but also sensitivity and context,” Roy explained. “By contrast, machines treat all frequencies as individual data points with mathematical precision. They analyze every acoustic feature to identify speakers and extract information.”
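Roy’s point about uneven frequency sensitivity can be made concrete with the mel scale, a standard engineering approximation of human pitch perception. The sketch below uses the common O’Shaughnessy formula; it is an illustrative aside, not code from VoiceSecure:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to the mel scale (O'Shaughnessy formula),
    a standard approximation of how humans perceive pitch distance."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# Two pairs of tones, each separated by the same 200 Hz:
low_pair = (300.0, 500.0)      # low-frequency pair
high_pair = (7000.0, 7200.0)   # high-frequency pair

low_gap = hz_to_mel(low_pair[1]) - hz_to_mel(low_pair[0])
high_gap = hz_to_mel(high_pair[1]) - hz_to_mel(high_pair[0])

print(f"Perceptual gap at low frequencies:  {low_gap:.1f} mel")
print(f"Perceptual gap at high frequencies: {high_gap:.1f} mel")
# To a human ear, the low pair sounds far more distinct, while a
# machine sees both pairs as identical 200 Hz differences.
```

On this scale the 300/500 Hz pair is roughly seven times farther apart perceptually than the 7,000/7,200 Hz pair, even though both differ by exactly 200 Hz, which is the asymmetry between human and machine listening that Roy describes.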
Using AI-powered reinforcement learning, the VoiceSecure system optimizes voice signals to suppress features that machines rely on for recognition and profiling, while still preserving the characteristics humans use to understand speech and recognize each other. VoiceSecure, which works as a microphone module operating at a firmware or driver level, captures and transforms voice data at the earliest possible point in the communication pipeline, before it even reaches a device’s operating system. That delicate balance between human and machine listening could stand between a private conversation and an unwanted AI eavesdropper, Roy noted.
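As a toy illustration of the gap such a system could exploit (not VoiceSecure’s actual learned transformation), the same mel formula implies that a signal can be nudged farther in Hz at high frequencies than at low ones before a human would notice. The function names and the one-mel threshold below are hypothetical:

```python
import math

def mel_slope(f_hz: float) -> float:
    """Derivative of the mel scale: mels per Hz at frequency f.
    Smaller values mean humans resolve that region more coarsely."""
    return (2595.0 / math.log(10.0)) / (700.0 + f_hz)

def perturb_budget_hz(f_hz: float, just_noticeable_mel: float = 1.0) -> float:
    """Hypothetical 'room to hide': how far in Hz a component at f_hz
    could be shifted before moving one perceptual unit (here, 1 mel)."""
    return just_noticeable_mel / mel_slope(f_hz)

for f in (300.0, 1000.0, 4000.0, 8000.0):
    print(f"{f:6.0f} Hz -> ~{perturb_budget_hz(f):4.1f} Hz of sub-perceptual wiggle room")
```

The budget grows with frequency, so a transformation confined to that headroom can scramble the fine-grained features a recognition model measures while staying below what a listener can hear, which is the design space the article attributes to VoiceSecure.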
“Voice communication is very personal, so we wanted to maintain that human quality in our system. A mother should still be able to recognize her son’s voice during a call, but automated AI surveillance systems should fail to identify the speaker or extract sensitive biometric data,” Roy said. “The key to this work is playing in the gap between what humans can hear and what machines can hear.”
Roy and his team have already successfully tested altered audio from VoiceSecure on real users, confirming that conversations remain intelligible to humans while impenetrable to machines. Users can also customize their preferred privacy levels and maintain control of their voices without relying on the actions or technology of other parties, including their conversation partners and the communication platform. The team hopes to work with engineers and industry partners to package the system as installable software that can be applied to all computers and smart devices.
In the meantime, Roy notes that human vigilance is just as vital as technological defense in protecting digital systems and our privacy.
“Awareness is the key to ensuring security when humans are in the loop,” he said.
Collaborating with UMD Information Science Professor Mega Subramaniam and University of Maryland, Baltimore County Computer Science Assistant Professor Sanorita Dey, Roy has launched Cyber-Ninja, an AI-driven platform that transforms cybersecurity training into an interactive, game-like experience. Designed for teens and older adults, Cyber-Ninja helps users detect and avoid phishing attacks while building critical thinking skills and digital confidence. The team has already conducted successful workshops at libraries across Maryland, demonstrating how AI-powered education can strengthen community resilience against evolving digital threats.
“From customer service chatbots to robotic vacuum cleaners to embodied devices like Alexa, artificial intelligence has really become entrenched in our lives. And as AI becomes more physically present, the need for robust privacy protections becomes even more urgent,” Roy said. “We want AI to evolve, since it does so much good, but it is important to address evolving threats by evolving our own defense mechanisms as well.”
