Five Threat Model Diagrams for Machine Learning
Some diagrams to help clarify machine learning threats
When we talk about “threat modeling machine learning,” some people seem to be talking about threats to machine learning systems, some about threats from those systems, and some even about threats to our jobs. (That last one comes up more when talking about “AI and cybersecurity.”)
For threat model Thursday, I wanted to share a few sketches I’m working on to help clarify what we’re talking about. Diagrams that we can point to are an incredibly powerful tool, and they’re so simple that they can be hard to talk about.
So this first set of diagrams shows threats to and threats from the ML system.
It also lets me talk about an element of sketching: which data flow arrows do we need? In the first diagram, I show only the threats, expecting that the response is implicit. In the second, I show both the “write email” and the “phishing response,” and I didn’t bother with sequence numbers. In a more complex diagram, I might have. I could also have used a message sequence diagram to show the second one, and then time would have been more visible, but I’d have lost the similarity between the diagrams, which helps us see the “to/from” nature that I wanted to emphasize.
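For readers who prefer code to pictures, here is a rough sketch of the same to/from idea, with each arrow written down as a small record. Everything named here is an assumption made for illustration: the `Flow` class, the `threat_direction` helper, the element names "Attacker" and "ML system", and the "malicious input" label are not taken from the diagrams. Only the “write email” and “phishing response” labels come from the text above, and which end of those arrows is the attacker is also a guess.

```python
from dataclasses import dataclass

# Hypothetical name for the central element in both sketches.
ML_SYSTEM = "ML system"


@dataclass(frozen=True)
class Flow:
    """One data flow arrow in a sketch (illustrative, not a standard notation)."""
    source: str
    target: str
    label: str
    is_threat: bool = False


def threat_direction(flow: Flow) -> str:
    """Classify a threat arrow as 'to' or 'from' the ML system."""
    if not flow.is_threat:
        return "none"
    if flow.target == ML_SYSTEM:
        return "to"
    if flow.source == ML_SYSTEM:
        return "from"
    return "other"


# First sketch: only the threat arrow is drawn; the response is left implicit.
threats_to = [
    Flow("Attacker", ML_SYSTEM, "malicious input", is_threat=True),  # assumed label
]

# Second sketch: both the request and the response are drawn, without sequence numbers.
threats_from = [
    Flow("Attacker", ML_SYSTEM, "write email"),
    Flow(ML_SYSTEM, "Attacker", "phishing response", is_threat=True),
]

for flow in threats_to + threats_from:
    print(f"{flow.source} -> {flow.target} [{flow.label}]: {threat_direction(flow)}")
```

Keeping the same record shape for both lists does in code what copying the diagram layout does on paper: the only thing that changes is which way the threat arrow points.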
This second set is about training data, and where it comes from. These are intentionally similar on the left, to draw attention to differences on the right. (In fact, they’re copy/pasted.) This allows us to think about the training data, and about how design decisions about training affect it.
I’m pretty sure the model of how “racist garbage” from Twitter got to Tay is not accurate. Microsoft didn’t retrain a full model each time it got an @. What’s important is that there’s no ‘data quality process.’ I hope the relevant differences jump out.
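To make the missing step concrete, here is a minimal sketch of what a data quality process between a data source and training could look like. It is not a description of how Tay or any real system worked. The function names, the `BLOCKED_TERMS` set, and the whitespace-and-blocklist check are all placeholders I made up for illustration; a real process would need far more than a word list.

```python
from typing import Callable, Iterable, List, Tuple

# Placeholder blocklist; a stand-in for whatever a real review process would use.
BLOCKED_TERMS = {"blockedword"}


def looks_acceptable(text: str) -> bool:
    """Toy check: reject empty samples and samples containing a blocked term."""
    words = set(text.lower().split())
    return bool(text.strip()) and not (words & BLOCKED_TERMS)


def quality_gate(
    samples: Iterable[str],
    check: Callable[[str], bool] = looks_acceptable,
) -> Tuple[List[str], List[str]]:
    """Split incoming samples into (accepted, rejected) before any training step."""
    accepted: List[str] = []
    rejected: List[str] = []
    for sample in samples:
        (accepted if check(sample) else rejected).append(sample)
    return accepted, rejected


incoming = ["hello there", "some blockedword here", "   "]
accepted, rejected = quality_gate(incoming)
print("to training:", accepted)
print("held back:", rejected)
```

The point of the sketch is the shape, not the check: with the gate in place there is a spot in the data flow where someone can ask what should be allowed through, and without it there is nowhere to ask.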
So, this threat model Thursday is entirely focused on these diagrams to help people think about what can go wrong in different scenarios. (And hopefully obviously, the threats shown are intended to be illustrative, not complete.)