Smart devices use automated speech recognition, or ASR, to translate sound waves into text, explains Mia Chiquier. She studies computer science at Columbia University in New York City, where she is a graduate student in a research lab run by Carl Vondrick. Her team's new program fools ASR by playing sound waves that vary with your speech. Those added waves jumble the sound signal, making it hard for the ASR to pick out the sounds of your speech. It "completely confuses this transcribing system," Chiquier says. She and her colleagues describe their new system as "voice camouflage." Chiquier described it on April 25 at the virtual International Conference on Learning Representations.

The volume of the masking sounds is not what's key. Chiquier likens them to the sound of a small air conditioner in the background. The trick to making them effective, she says, is having these so-called "attack" sound waves fit in with what someone says. To work, the system predicts the sounds that someone will say a short time in the future. Then it quietly broadcasts sounds chosen to confuse the smart speaker's interpretation of those words.

Step one in creating great voice camo: get to know the speaker. If you text a lot, your smartphone will start to anticipate what the next few letters or words in a message will be. It also gets used to what types of messages you send and the words you use. The new algorithm works in much the same way. "Our system listens to the last two seconds of your speech," explains Chiquier. These data help the algorithm learn and calculate what the team calls a predictive attack. That prediction is based on the characteristics of your voice and your language patterns. "Based on that speech, it anticipates the sounds you might make in the future." And not just sometime in the future, but half a second later. The attack amounts to the sound that the system plays alongside the speaker's words.

When the attack plays along with the words predicted by the algorithm, the combined sound waves turn into an acoustic mishmash that confuses any ASR system within earshot. And because the attack keeps changing with each sound someone speaks, it is hard for an ASR system to outsmart, says Chiquier.
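For readers who want to see the moving parts, here is a minimal Python sketch of the loop the article describes: capture audio, keep a rolling two-second window, predict a quiet masking waveform for the next half second, and play it over the speaker's upcoming words. The `PredictiveModel` class, the buffer handling, and the simulated audio I/O are all illustrative assumptions, not the team's actual code; only the two-second context and half-second lookahead come from the article.

```python
# A rough sketch of the predict-then-mask loop described above.
# Everything here is illustrative: `PredictiveModel` is a hypothetical
# stand-in for the team's trained network, and microphone/speaker I/O
# is simulated with random noise.

import numpy as np

SAMPLE_RATE = 16_000      # audio samples per second
CONTEXT_SEC = 2.0         # the system listens to the last two seconds
LOOKAHEAD_SEC = 0.5       # and masks speech half a second in the future

CONTEXT_LEN = int(SAMPLE_RATE * CONTEXT_SEC)
ATTACK_LEN = int(SAMPLE_RATE * LOOKAHEAD_SEC)


class PredictiveModel:
    """Hypothetical stand-in for the trained predictive-attack network.

    A real model would anticipate the speaker's next sounds from their
    voice and language patterns, then emit a quiet waveform crafted to
    confuse an ASR system. Here we just return low-level noise.
    """

    def predict_attack(self, recent_speech: np.ndarray) -> np.ndarray:
        assert recent_speech.shape == (CONTEXT_LEN,)
        return 0.01 * np.random.randn(ATTACK_LEN)  # air-conditioner quiet


def camouflage_loop(model: PredictiveModel, steps: int = 4) -> None:
    """Simulated real-time loop: record, predict, play, slide forward."""
    buffer = np.zeros(CONTEXT_LEN)  # rolling window of the last 2 seconds

    for _ in range(steps):
        # 1. Pretend to capture the next half second from a microphone.
        new_audio = np.random.randn(ATTACK_LEN)

        # 2. Slide the two-second context window forward.
        buffer = np.concatenate([buffer[ATTACK_LEN:], new_audio])

        # 3. Predict an attack waveform for the *upcoming* half second.
        attack = model.predict_attack(buffer)

        # 4. A real system would now play `attack` through a speaker so it
        #    overlaps the speaker's next words; here we just report its level.
        print(f"attack RMS: {np.sqrt(np.mean(attack ** 2)):.4f}")


if __name__ == "__main__":
    camouflage_loop(PredictiveModel())
```

The point of the rolling buffer is that the attack keeps changing with each new sound the speaker makes, which, per the article, is what makes the camouflage hard for an ASR system to outsmart.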