Lessons from Threat Modeling Intensive With AI

At OWASP, we delivered the first version of our new “Threat Modeling Intensive with AI” course. This is part of how we’re exploring LLMs and their impact on our work, so I want to start with lessons from that delivery:
- First and foremost, we learned that LLMs are unreliable when you vibe threat model. The task is too broad to simply ask ChatGPT for a threat model. One of our students fed the exact same prompt to ChatGPT three times, and got 20 threats on one run and 44 on another.
- We saw that no one wants to review the output. In some cases, people were getting upwards of 50 pages out of the chatbots. Some folks learned to tune the output with prompt elements like “be concise” or “provide a bullet list,” or even with few-shot prompting: giving examples of the sort of output they wanted. (They also got a lot fewer imaginary risk tables!) Mike Ensign even built and shared rulesets as part of the course (and in follow-up afterwards!)
- When people did review the output, one of the most fascinating things we heard was “Before this course, I would have accepted this as a good threat model!”
- We saw that people enjoyed the experience of using LLMs to threat model. That’s important, and worth teasing apart because people like doing fun things. (That’s both obvious, and obviously not obvious to the makers of most enterprise software.) That’s probably a future post.
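The prompt-tuning elements mentioned above can be sketched in code. This is a minimal, hypothetical illustration, not what any student actually wrote: the example threats, format, and function names are my own, chosen to show how terse-output instructions combine with a few-shot example.

```python
# Hypothetical sketch of combining "be concise" / "bullet list"
# instructions with a few-shot example of the desired output format.
FEW_SHOT_EXAMPLE = """\
Component: login form
- Threat: credential stuffing against the login endpoint (STRIDE: Spoofing)
- Threat: verbose errors reveal valid usernames (STRIDE: Information Disclosure)
"""

def build_prompt(system_description: str) -> str:
    """Combine terse-output instructions with a few-shot example."""
    return (
        "You are helping threat model a system. Be concise. "
        "Provide a bullet list only; no risk tables, no prose.\n\n"
        f"Format your output like this example:\n{FEW_SHOT_EXAMPLE}\n"
        f"System to analyze:\n{system_description}"
    )

prompt = build_prompt("A web app with a login form backed by a Postgres user table.")
```

The point isn’t the exact wording; it’s that constraining format up front is what cut the 50-page outputs (and the imaginary risk tables) down to something someone might actually review.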
Our experimentation isn’t limited to that course; we’re constantly experimenting to see if we can improve customer experiences by using LLMs. Our experiments lead us to be solidly in the “measure outcomes, not use” camp. What we find, consistently, includes:
- Rewriting tricky emails? Awesome.
- Models vary dramatically in their ability to accomplish simple tasks like “list action items from a meeting.” Yesterday, one of them built four separate bullets for a single “make an introduction” task.
- Accidental focus lock continues to be a problem. Look closely at what happens after “Jira” in meeting minutes.
- The bigger the task, the less reliable the results.
This lines up pretty closely with what we see in the training results.
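“Measure outcomes, not use” can be as simple as re-running the same prompt and counting what comes back, like the student who got 20 threats on one run and 44 on another. A minimal sketch of that kind of measurement, with a stub standing in for a real LLM call:

```python
import random

def model_threats(prompt: str) -> list[str]:
    # Stub standing in for a real LLM API call; the randomized count
    # mimics the run-to-run swings we saw (e.g., 20 threats vs. 44).
    return [f"threat {i}" for i in range(random.randint(20, 44))]

def measure_variability(prompt: str, runs: int = 3) -> tuple[int, int]:
    """Return (min, max) threat counts across repeated runs of one prompt."""
    counts = [len(model_threats(prompt)) for _ in range(runs)]
    return min(counts), max(counts)

low, high = measure_variability("Threat model our payments service.")
```

If the spread between runs is wide, you’ve learned something about reliability before anyone spends an afternoon reviewing a single output.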
“Before this course, I would have accepted this as a good threat model!”
In that spirit of experimentation, we’ve built and delivered Threat Modeling Intensive with AI, a three-day course focused on how to threat model and how to leverage LLMs as you get more proficient and as you scale. We’ve built Threat Modeling AI Systems and are delivering it next month. That’s a two-day course for people who know how to threat model and want to dive deep into the specifics of how to apply that knowledge to AI systems. And we’re working on a “complete AI” edition for Black Hat, with baseline threat modeling skills from our Intensive, buttressed with both using LLMs to threat model and threat modeling LLM systems.
All of these new courses are part of how we’re learning and sharing our journey in these disruptive times.