Shostack + Friends Blog

 

Twenty Years of Scaling Threat Modeling

Reflecting on 20 years of work to scale threat modeling

In 1999, I wrote my first paper on threat modeling, Breaking Up Is Hard To Do: Modeling Security Threats for Smartcards. Bruce and I talked on the phone a lot, and our analysis methodology was to think carefully about the problems. Threat modeling was something done by smart professionals with lots of experience and time available. At the same time, Loren Kohnfelder and Praerit Garg wrote their paper on the S.T.R.I.D.E. Model of Threats, but it wasn’t mentioned in public until later.

I mention both of these because they’re useful anchor points for understanding change.

Seven years later, on June 5, 2006, I started working for Steve Lipner and Eric Bidstrup at Microsoft. On my first day, Eric said something like “This threat modeling thing isn’t working. Fix it.” I asked something like “what does success look like?” and he said “that’s a good question. Figure it out and convince people you have it right.”

There are days when I feel like I’m still doing that, but more seriously, I want to look back on some of the new things that have emerged in that time. Before I do, I want to set the stage by talking about how Microsoft had transformed software security so radically that no one understood that they’d brought the concept of shift-left to security (but not named it).

Past

In the 1980s and 90s, the focus of software security, through tools like the Rainbow books, was evaluation. The key, orange, book was formally titled “DoD Trusted Computer System Evaluation Criteria.” The core idea was “we can’t trust anyone to build it right, and so we’re going to evaluate.” The government continued to invest in that evaluation-heavy approach through the 90s. The last book in the series was February 1994’s publication of “Procurement of Trusted Systems: Computer Security Contract Data Requirements List and Data Item Description.” (To be fair, the system also described a methodology to create a highly-trusted “A1” system, by writing formal mathematical proofs as part of development... but real systems never got there.) It’s worth noting that much of the authoritative history of those rules is in The Birth and Death of the Orange Book, by the very same Steve Lipner who was then my boss.

Microsoft had made the decision to push security to developers, with assurance of observable properties.

Microsoft ships a lot of software. The Windows division, at the time, was 6,000 engineers. That meant that anything that could be automated should be automated, anything that couldn’t be automated had a very high bar to clear, and even things that could be automated had a high bar. For example, we had both the PREfix and PREfast static analysis tools. As I recall, PREfix was slow, worked across code, and had a high false positive rate. PREfast was scoped much more tightly, and we had a specific mandatory ruleset. We had other tools to check, for example, compile flags for checked-in code. All of which, by 2006, had shifted left.

In contrast to everything else in the SDL, threat modeling wasn’t a scalable, automation-friendly process. Reviewing threat models was completely manual. The threat modeling process was roughly 17 steps (depending on the version you looked at). The first was “list all your assumptions,” which is impossible, and that list was never revisited, which is annoying. No one knew what quality looked like. When we had experts like Mike Howard, Dave LeBlanc or Window Snyder in the room, threat modeling worked great. When they left, it sometimes worked and sometimes went sideways.

Those problems led to my threat modeling work at Microsoft: the Four Question Framework, the SDL Threat Modeling Tool v3, and a focus on how engineers threat model. The Elevation of Privilege card deck was a response to the observation that the software, designed to enable validation and scaling, had pulled the fun out of the analysis. The book came from wanting to share what I’d learned, and what I’ve learned continues to inform how we train and help firms accelerate today.

Present

Since then, we’ve seen tremendous growth in threat modeling. The practice has shifted from smart people talking through a problem to one driven by a wide variety of tools and methodologies. We have structured approaches that we can teach and that produce consistent results.

We’ve seen growth in software for threat modeling, including the emergence of enterprise-grade tools like ThreatModeler, programming language support like PyTM, services like Tutamantic, a separate category of design review tools like Seezo, and a stack of emergent AI threat modeling tools like OWASP Precogly.

More broadly:

I can confidently say threat modeling is growing because for years I’ve been watching that growth. And in Threat Modeling for the Defense Industry: Past, Present, and Future, Hyunsuk Cho and Seungjoo Kim included their analysis of academic papers on threat modeling. They found there were more papers published in 2023 alone than in all the work published from 2003 to 2016 (273 and 261, respectively).

Image: A graph showing steady linear growth.

Future

The rise of LLMs has driven tremendous interest in both understanding the threat model we should apply to LLMs, and asking if we can use LLMs to help us threat model.

For me, that’s not just curiosity: it’s driving experimentation and exploration, and it’s driving new tools such as PHANTOM-B.

For the company, it’s new elements in our Accelerator toolkit, new trainers creating and delivering new trainings on threat modeling AI systems, on using LLMs to threat model, and more.

For threat modeling specialists, it's a time of great demand, as the skills and methodologies we’ve honed become more relevant in all sorts of conversations. We’ll need to continue developing new methodologies like ATLAS, the OWASP AI Exchange and LLM Top 10, the Berryville analyses, and a whole lot more. We’ll need to ask where human judgment adds specific technical value or adds assurance.

It’s easy to veer here into AI, rather than threat modeling, but threat modeling (asking what can go wrong and what we’ll do about those things) will continue to be a place where humans will be held accountable. Engineering, as a profession, carries a responsibility to the public, and threat modeling is a key technique we use to deliver on that responsibility.

All of these reflect a shift from “can we scale threat modeling” to “threat modeling is a crucial element of how we conceptualize and deliver more secure systems.” I’m deeply grateful to Eric and Steve for giving me a chance to address that and to share the lessons we were learning, to the great many people at Microsoft and then elsewhere who told me what wasn’t working, and then later those who shared what was working, built communities and more. It’s been a heck of a ride, and ... you ain’t seen nothin’ yet.

Image: A stack of books showing my work on some of the foundations.