The GAZEploit attack consists of two parts, says Zhan, one of the lead researchers. First, the researchers created a way to identify when someone wearing the Vision Pro is typing by analyzing the 3D avatar they are sharing. For this, they trained a recurrent neural network, a type of deep learning model, with recordings of 30 people’s avatars while they completed a variety of typing tasks.
When someone is typing using the Vision Pro, their gaze fixates on the key they are likely to press, the researchers say, before quickly moving to the next key. “When we are typing our gaze will show some regular patterns,” Zhan says.
Wang says these patterns are more common during typing than if someone is browsing a website or watching a video while wearing the headset. “During tasks like gaze typing, the frequency of your eye blinking decreases because you are more focused,” Wang says. In short: Looking at a QWERTY keyboard and moving between the letters is a pretty distinct behavior.
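The typing signal the researchers describe boils down to a classic eye-tracking primitive: fixations (the eye dwelling briefly on one spot, such as a key) separated by rapid jumps. A minimal sketch of dispersion-based fixation detection gives a sense of the raw signal a classifier like theirs could build on. The thresholds, function names, and data here are illustrative assumptions, not taken from the GAZEploit paper.

```python
# Hypothetical sketch: dispersion-based fixation detection (I-DT style).
# Typing tends to produce many short, tightly clustered fixations
# separated by quick saccades between keys.

def _dispersion(points):
    """Bounding-box dispersion (width + height) of a set of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def detect_fixations(samples, max_dispersion=0.02, min_samples=5):
    """Group consecutive normalized gaze points (x, y) into fixations.

    A fixation is a run of at least `min_samples` points whose dispersion
    stays under `max_dispersion` (both thresholds are made-up values).
    Returns (start_index, end_index, centroid) tuples.
    """
    fixations = []
    i = 0
    while i + min_samples <= len(samples):
        j = i + min_samples
        if _dispersion(samples[i:j]) <= max_dispersion:
            # Grow the window while the gaze stays tightly clustered.
            while j < len(samples) and _dispersion(samples[i:j + 1]) <= max_dispersion:
                j += 1
            xs = [p[0] for p in samples[i:j]]
            ys = [p[1] for p in samples[i:j]]
            fixations.append((i, j - 1, (sum(xs) / len(xs), sum(ys) / len(ys))))
            i = j
        else:
            i += 1
    return fixations
```

The rhythm of those fixations, many short dwells in a keyboard-shaped region, is what distinguishes typing from browsing or watching video.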
The second part of the research, Zhan explains, uses geometric calculations to work out where someone has positioned the virtual keyboard and how large they have made it. “The only requirement is that as long as we get enough gaze information that can accurately recover the keyboard, then all following keystrokes can be detected.”
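Once the keyboard’s position and scale have been estimated, mapping each fixation to a key is simple geometry: snap the gaze point to the nearest key center on a QWERTY grid. The sketch below illustrates that final snapping step only; the layout coordinates, row stagger, and parameters standing in for the recovered keyboard pose are all assumptions for illustration.

```python
# Hypothetical sketch: snap a gaze fixation to the nearest QWERTY key,
# given an estimated keyboard origin and key size (the quantities the
# attack's geometric stage would recover from accumulated gaze data).

QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]

def nearest_key(gaze_x, gaze_y, origin=(0.0, 0.0), key_size=1.0):
    """Return the QWERTY key whose center is closest to the gaze point.

    `origin` and `key_size` stand in for the keyboard position and scale;
    each row is shifted by an approximate half-key stagger.
    """
    best, best_d2 = None, float("inf")
    for row_idx, row in enumerate(QWERTY_ROWS):
        stagger = 0.5 * row_idx  # rough per-row horizontal offset, in key units
        for col_idx, key in enumerate(row):
            cx = origin[0] + (col_idx + stagger) * key_size
            cy = origin[1] + row_idx * key_size
            d2 = (gaze_x - cx) ** 2 + (gaze_y - cy) ** 2
            if d2 < best_d2:
                best, best_d2 = key, d2
    return best
```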
Combining these two elements, the researchers were able to predict the keys someone was likely typing. In a series of lab tests, they had no prior knowledge of the victim’s typing habits or speed, nor of where the keyboard was placed. Even so, within a maximum of five guesses, they predicted the correct letters 92.1 percent of the time for messages, 77 percent of the time for passwords, 73 percent of the time for PINs, and 86.1 percent of the time for emails, URLs, and webpages. (On the first guess, the letters would be right between 35 and 59 percent of the time, depending on what kind of information they were trying to work out.) Duplicate letters and typos added extra challenges.
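The accuracy figures above are a ranked-guess metric: a keystroke counts as recovered if the true key appears among the model’s top five candidates. A short sketch shows how such a metric is computed; the candidate lists in the test are invented, and the function is not the researchers’ evaluation code.

```python
# Hypothetical sketch: top-k keystroke recovery rate. A keystroke is a
# "hit" if the true key appears in the model's k highest-ranked guesses.

def top_k_accuracy(true_keys, ranked_guesses, k=5):
    """Fraction of keystrokes whose true key is in the top-k candidate list."""
    hits = sum(1 for true, guesses in zip(true_keys, ranked_guesses)
               if true in guesses[:k])
    return hits / len(true_keys)
```

Loosening k from 1 to 5 is what lifts the reported rates from the 35–59 percent first-guess range to the 73–92 percent figures.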
“It’s very powerful to know where someone is looking,” says Alexandra Papoutsaki, an associate professor of computer science at Pomona College who has studied eye tracking for years and reviewed the GAZEploit research for WIRED.
Papoutsaki says the work stands out as it only relies on the video feed of someone’s Persona, making it a more “realistic” space for an attack to happen when compared to a hacker getting hands-on with someone’s headset and trying to access eye tracking data. “The fact that now someone, just by streaming their Persona, could expose potentially what they’re doing is where the vulnerability becomes a lot more critical,” Papoutsaki says.
While the attack was created in lab settings and hasn’t been used against anyone using Personas in the real world, the researchers say there are ways hackers could have abused the data leakage. They say, theoretically at least, a criminal could share a file with a victim during a Zoom call that requires them to log in to, say, a Google or Microsoft account. The attacker could then record the Persona while their target logs in and use the attack method to recover their password and access their account.
Quick Fixes
The GAZEploit researchers reported their findings to Apple in April and subsequently sent the company their proof-of-concept code so the attack could be replicated. Apple fixed the flaw in a Vision Pro software update at the end of July, which stops the sharing of a Persona if someone is using the virtual keyboard.
An Apple spokesperson confirmed the company fixed the vulnerability, saying it was addressed in visionOS 1.3. The company’s software update notes do not mention the fix, but it is detailed in the company’s security-specific note. The researchers say Apple assigned CVE-2024-40865 to the vulnerability and recommend people download the latest software updates.