Can You Hear Me Now?
Ed Felten reports on a new technique to go from a recording of typing to the sequence of keystrokes:
Li Zhuang, Feng Zhou, and Doug Tygar have an interesting new paper showing that if you have an audio recording of somebody typing on an ordinary computer keyboard for fifteen minutes or so, you can figure out everything they typed. The idea is that different keys tend to make slightly different sounds, and although you don’t know in advance which keys make which sounds, you can use machine learning to figure that out, assuming that the person is mostly typing English text. (Presumably it would work for other languages too.)
Ed’s writeup is really good, so I don’t have a lot to add. The paper is “Keyboard Acoustic Emanations Revisited,” L. Zhuang, F. Zhou, and J. D. Tygar. The authors have a website [http://keyboard-emanations.org/], which “will shortly be supplemented with raw versions of our experimental data and setup.”
The paper starts to answer the noise question:
We have shown that the recognition rate is lower in noisy environments. Attacks will be less successful when, say, the user is playing music while typing. However, there is research in the signal processing area that separates voice from other sound in the same channel. For example, sophisticated Karaoke systems can separate voice and music. These techniques may also apply here.
I’m speechless…pun intended;-)
Isn’t this wholly dependent on the concept of predictive key usage? If the user were typing English prose, then the sounds and key frequency would be understood. But if the person is coding, or typing passwords, or numbers, or whatever, then audio recordings wouldn’t help, since there’s no cross-reference. Doesn’t matter if they hit ‘5’ more often than the ‘3’, there’s no reason the ‘sniffer’ would know that.
Spooks’ corner: listening to typing, Spycatcher, and talking to Tolkachev
A team of UCB researchers has coupled the sound of typing to various artificial intelligence learning techniques and recovered the text that was being typed. This calls to mind Peter Wright’s work. Poking around the net, I found that Shamir…
I believe the idea is that most of what the user will type will be prose, and the algorithm trains on that. Then when he types a password once in a while, you know what keys make what sounds. [Note that the algorithm doesn’t need to know in advance what sections are prose and what are passwords.]
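To make that concrete, here is a toy sketch of the "train on prose, then read the password" idea. It is not the method from the paper: it assumes the audio has already been clustered perfectly into one opaque id per key (the hypothetical `record` function fakes that step with a fixed permutation), and it swaps in simple word-pattern matching against a tiny made-up dictionary in place of the paper's machine learning. All names here are illustrative.

```python
from collections import defaultdict


def record(text):
    """Stand-in for recording + clustering (an assumption, not real audio):
    every key gets one opaque cluster id via a fixed hidden permutation."""
    return [tuple((7 * (ord(ch) - ord("a")) + 3) % 26 for ch in word)
            for word in text.split()]


def pattern(seq):
    """Repetition pattern of a word, e.g. 'hello' -> (0, 1, 2, 2, 3)."""
    seen = {}
    return tuple(seen.setdefault(x, len(seen)) for x in seq)


def consistent(cluster_word, word, mapping):
    """True if reading cluster_word as word contradicts nothing learned so far."""
    for c, ch in zip(cluster_word, word):
        if c in mapping:
            if mapping[c] != ch:
                return False
        elif ch in mapping.values():
            return False  # that letter already belongs to another cluster
    return True


def learn_mapping(cluster_words, dictionary):
    """Learn cluster id -> letter from prose, using only words that have
    exactly one dictionary candidate given what is already known."""
    by_pattern = defaultdict(list)
    for w in dictionary:
        by_pattern[pattern(w)].append(w)

    mapping = {}
    changed = True
    while changed:
        changed = False
        for cw in cluster_words:
            candidates = [w for w in by_pattern[pattern(cw)]
                          if consistent(cw, w, mapping)]
            if len(candidates) == 1:
                for c, ch in zip(cw, candidates[0]):
                    if c not in mapping:
                        mapping[c] = ch
                        changed = True
    return mapping


def decode(clusters, mapping):
    """Translate cluster ids to letters; unknown clusters become '?'."""
    return "".join(mapping.get(c, "?") for c in clusters)


# Tiny hypothetical dictionary and "overheard" prose.
dictionary = ["people", "typing", "letters", "attack", "sound", "the", "and", "on", "a"]
prose = record("people typing letters attack the sound")
mapping = learn_mapping(prose, dictionary)

# Later the user types a password that is not a dictionary word in the prose.
password_clusters = record("python")[0]
decoded = decode(password_clusters, mapping)
print(decoded)
```

The attacker never sees which key produced which sound. The prose pins down the key-to-sound mapping, and the password then falls out for free, as long as it only uses keys that also appeared in the prose. Digits or symbols that never show up in the training text would come out as '?', which is the limitation the earlier comment is pointing at.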