Shostack + Friends Blog

LLM Threat Modeling Is Fun

Exploring the fun in LLM threat modeling, and how it’s both an interface choice and possibly a ‘dark pattern’

In my post on our Threat Modeling Intensive with AI, I wrote:

“We saw that people enjoyed the experience of using LLMs to threat model. That’s important, and worth teasing apart because people like doing fun things. (That’s both obvious, and obviously not obvious to the makers of most enterprise software.)”

Other people talk about the fun, but I want to talk about why using LLMs feels fun, how that fun is a result of either good interface design or dark patterns (or both), and how that fun relates to critical thinking and assessing our work.

Whatever else I’m going to say, fun is good interface design. Fun is a great tool. I spend energy on games in security because fun helps us get through to people. The chatbot is told to act as a cheerleader and push you to keep going. It offers ideas for next steps. (Many security people could learn from this framing.)

At work, we use LLMs to do things we have to do. Ideally, using LLMs — to threat model or anything else — lets us uplevel our thinking. Finding the right level of abstraction for a task can enable people to perform well by balancing challenge against ability. This is a basic precept of Csíkszentmihályi’s flow theory. So we can ask “does the fun help?”

Writing your own threat model, code, marketing copy, etc., can feel like toil, or work in the weeds. (For example, I hate spending time figuring out API conventions ... oh, this expects me to single-quote!)

Part of a chatbot’s appeal is the dopamine rush: we feel like we’re being productive... in ways that don’t always relate to productivity. But sometimes they do. Sometimes what we’re doing is upleveling the work, and directing rather than doing.


One consequence of that feeling is that people jump into using LLMs for some task, and spend far more time tuning the prompt than they would have spent doing the work themselves. That happens both for one-off prompts (a friend used one to draft a law school application) and for prompts you hope to reuse. For example, we use LLMs to analyze meeting transcripts in our Accelerator service, and we hope re-using those prompts will make us more efficient.

That chews up tokens pretty quickly. Of course, the slot machine has a conveniently located credit card reader. Less cynically: is that manipulative, or is it making it easy to experiment? It’s easy to imagine a chatbot UI that shows tokens used and tokens available. Would that serve the customer, or distract them? (In fact, the LLM editor tool I’m using, Wordsmith.page, just told me I have 3 of 5 edit passes left.)

It can be hard to step back from the feeling of productivity, or from the sunk time, and switch to an evaluation mode. One of the elements we saw in delivering a course on using LLMs to threat model was ... no one wanted to stop producing and start evaluating. When using LLMs to summarize, it’s easier to ask another question than to step back and realize things for yourself.

All of this ... the excitement, some of the manipulation, and the challenges to critical thinking ... is part of a disruption. It’s easy to be a cheerleader, it’s easy to be a cynic, and it’s important to find the ways in which we can make use of the new while managing the downside.


Image by Midjourney, “a photograph of a set of boston dynamics robot cheerleaders, encouraging people trying to do really hard work in an office --ar 8:3”