Threat Modeling the Genomic Data Sequencing Workflow (Threat Model Thursday)
An exciting new sample TM from MITRE
![A subsection of a dataflow diagram](/images/blog/img/2025/threat-modeling-the-genetic-sequencing-workflow-1000w.png)
For Threat Model Thursday, I want to provide some comments on NIST CSWP 35 ipd, Cybersecurity Threat Modeling the Genomic Data Sequencing Workflow (Initial Public Draft). As always, my goal is to offer helpful feedback.
This is a big, complex document. It’s 50 pages of real content with 13 listed authors, and is a subset of a larger project. The official goal is to “demonstrate how to conduct cybersecurity threat modeling...” (L148; in this post, I’ll use L to refer to lines, and § to refer to sections). The draft officially follows the Four Question Framework, and is... big and maybe intimidating. One question I had is “is NIST setting the bar for a published threat model too high?” In other words, could a simpler threat model serve some of the same purposes? The apparent complexity is exacerbated by the intermingling of ‘how to conduct’ with ‘sample output,’ and perhaps the document might be improved by breaking it into two: a ‘how to’ guide and a ‘sample output’ document or documents. Overall, this is one of the more interesting public threat modeling documents.
I’m concerned that this threat modeling is implicitly operationally focused, essentially taking as given many development and operational choices that may have been made at this point. I don’t see, for example, an evaluation of the security of two different sequencing machines and a choice being made, or a consideration of alternatives to ‘Globus’ for file transfer. This may be realistic, but is a choice that should be discussed.
Context
I believe that the “we” in this document is generally a “Genomic Sequencing Lab” and the “research partners” are the lab’s untrusted and untrustworthy customers, but I’m not sure.
The document has an interesting mix of a lot of detail, and a lot of references which imply “we could have done more.” What makes this level of detail right for this document? I’d like to see an explicit discussion, and statements that much lighter threat modeling could be appropriate.
- Why do “organizations” need to ‘select appropriate cyber capabilities’ (L207)? Why aren’t those built-in, and why are the sequencers not secure by design?
- How should an organization go about considering its goals and priorities (L209)? There’s a set of objectives in Table 1; why can’t an organization just use those? Please give specific advice (there is some; perhaps call forward to it).
- How should the organization periodically assess its cyber posture (L211)? How does that activity differ from what’s in this guide?
- The discussion of threats, risks and how those apply to specific organizations (§ 1.3) is excellent. The example of a DoS threat being high impact for a disease surveillance lab, and low impact to an agricultural researcher, is great.
- Building on that, the need to publish threat models so organizations can manage risks (L247-249) reminds me of Loren Kohnfelder’s recent essay, Flaunt Your Threat Models, and I think what the authors here are saying is that full threat models need to be either shared with prospects or published.
- I think reference 7 is pointing to the wrong thing; the best cite is [11]. (L253)
- The relationship between threats, risks, and possible mitigations as described starting at L272 is really good. It could be even better if the guide (or a related document) assessed how it does in relation to the needs of various stakeholders.
What are we working on
Generally, the set of diagrams doesn’t match the FDA’s pre-market guidance, which requires a multi-patient harm view and security use case view(s). I think there’s a case that there’s no need for a multi-patient harm view, but the absence should be discussed. I’m unsure if any of the diagrams are intended to act as security use case views.
- When discussing how “answering question 1 helps teams identify activities and language (L297),” I have several comments:
  - There’s an interplay of journey and reward here, perhaps separate them?
  - Perhaps start from the concrete. “When we answer question 1, we create models, in the forms of diagrams and explanatory or contextualizing text..”
  - The language in L298-300 assumes that threat modeling is done on a completed system, not as part of creating it.
  - That para also seems to imply that threat modeling is done by outsiders to the system.
- The idea of High Value Dataflows (L305) is fascinating. At first introduction, I noted “bad — assumes answers.” This is partially addressed in §2.1.3, with a specific list of reasons things are highlighted. There’s an implicit leap to ‘these things can go wrong,’ which is not bad, sometimes we know, but does marking these take attention away from other subsystems?
- Table 2 should show the HVD dataflow element, and the doc should show an example of an HVD right there. The next diagram (Figure 3) doesn’t use HVDs, which led me to wonder if their lines were sufficiently differentiated.
- Please also add stick figure external entities to Table 2 since they’re used in Fig 4 and beyond.
- I find Figure 3 hard to read. The elements are all differently sized, there’s no apparent rationale to placement, and I don’t know what the SaaS/PaaS components mean. I’d like to see the lab components grouped on the left in a labeled “lab” boundary. (We read both text and diagrams left to right and top to bottom.)
- In general, the diagrams are not easy to read. Elements move around arbitrarily between diagrams. For example, in Fig 3, Manufacturers are at 11 o’clock relative to the wet lab, but when that diagram is expanded (without any ‘See Figure 4’ in Fig 3), they’re now at 2 o’clock.
- The discussion of trust in external entities (L412-421) is interesting, but what should the reader do with that information?
- The process description at L495 could be substantially more secure. Consider changing it to an outbound request to the manufacturer, and having the binary file be signed and the signature validated.
- L582: if Globus treats encryption as optional in the year 2025, NIST should select a more secure example to reference, such as scp.
What can go wrong
- The discussion of STRIDE at L613 is good, but somewhat contradicted by the decision (L637) to categorize each threat, a step which I don’t see as worth the effort. A very small nit: I might call it “A STRIDE methodology,” since there are several, such as STRIDE per element or the EoP deck.
- L624 discusses “improving brainstorming.” I consider brainstorming to be unstructured, and so thinking that STRIDE improves things confuses things by breaking an inherent property of brainstorming.
- I like the table in Figure 16, and often use similar ones, with the addition of a “misc” column.
- I consider prioritization to be part of answering “What are we going to do about it,” and so § 2.2.2, which starts with ‘attacks that target the most valuable assets,’ seems premature here. It is also somewhat at odds with the framing in § 1.3, which I complimented above for pointing out that different organizations will have different prioritizations. Here I’ll add that not only do they have different impacts, but the cost of managing issues can be quite different, and so again, “who are ‘we’?”
- MITRE’s addition of ATT&CK mappings “for completeness,” (L602) is additional work, and I’m not convinced that it, or the generation of attack trees, is obviously worthwhile.
- Relatedly, there are claims (L670) that trees “effectively tell the story” and “helping those less skilled understand the risks.” Both of those claims could be subjected to usability testing.
- I’m not going to further evaluate the trees in the interests of time.
What are we going to do?
- § 2.3 starts with risk management and suggests a broad approach. I tend to start from mitigation, and tell people to consider risk management approaches only when mitigation doesn’t work. For example, moving your files from /tmp to a private local scratch directory is nearly free and defends your files from tampering by other local users.
- I would like to have seen a set of secure by design, secure by default and secure in development (SSDF) mitigations included. For example, are the tools written in memory-safe languages?
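To make the /tmp example above concrete, here is a minimal sketch of one way a tool could keep its intermediate files in a private scratch directory. The function names, the `seq-scratch-` prefix and the `reads.tmp` file name are hypothetical and chosen for illustration; nothing here is taken from the NIST draft or from any sequencing tool. It assumes a POSIX system, where permission bits control access by other local users.

```python
import os
import shutil
import stat
import tempfile


def make_private_scratch(base=None, prefix="seq-scratch-"):
    """Create a scratch directory that only the creating user can access.

    `base` is a hypothetical parent directory (for example, somewhere under
    the service account's home); None means the system default temp location.
    """
    # tempfile.mkdtemp creates the directory readable, writable and
    # searchable only by the creating user, with an unpredictable name.
    return tempfile.mkdtemp(prefix=prefix, dir=base)


def write_private_file(directory, name, data):
    """Write bytes to a new file that other local users cannot read or modify."""
    path = os.path.join(directory, name)
    # O_EXCL makes creation fail if the name already exists, so a file
    # planted in advance is never silently reused.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    scratch = make_private_scratch()
    try:
        out = write_private_file(scratch, "reads.tmp", b"ACGT")
        mode = stat.S_IMODE(os.stat(out).st_mode)
        print("wrote", out, "with mode", oct(mode))
    finally:
        # Remove the scratch directory and its contents when done.
        shutil.rmtree(scratch)
```

The design choice is to rely on directory permissions rather than on hard-to-guess file names: once the directory is private, other local users cannot list, replace or pre-create files inside it.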
Did we do a good job
Yes.
This is one of the most comprehensive published threat models I’ve seen and I think it’s a helpful benchmark. On reflection, I’d like to see it split in two: a guide to threat modeling and an implementation sample. I am concerned that its length and depth may be intimidating, and in places, for example, L1093, I am concerned that “comprehensiveness” is the bar they’ve set, rather than usefulness or accessibility.
When I say splitting, I would prefer to see multiple documents, with one on process and the other on sample output, because there is no better way than being short to signal that the output is digestible.
Note: I have no opinion on Globus, but section 6 of their user guide does seem to support the idea that encryption is an option.