12 Tips for Designing an InfoSec Risk Scorecard (it’s harder than it looks)
A few months ago on the Securitymetrics.org mailing list, someone bravely posted their draft of an Information Security (InfoSec) Risk scorecard, asking for feedback. I sent feedback via private email, and then forwarded it to specific people who asked for a copy. Several of those folks, including the original poster, said I should generalize the feedback and post it some place to help anyone who is trying to design an InfoSec risk scorecard. Here it is in the form of “12 tips”.
Why is it important to get the design right? A risk scorecard is often the first step an organization takes toward the risk management approach to InfoSec. If it’s done poorly, it might be their last step, too.
(For the tips, read on…)
As a preface, I should note that spreadsheet scorecards are both very pervasive in business and also very seductive in appearance. All sorts of experts trot out their scorecard process for evaluating software packages, evaluating vendors, evaluating acquisition targets, and even for employee and team performance evaluations. They exude an odor of rigor and discipline, and even a whiff of quantitative precision and power. Thus, it’s no surprise that many managers and executives would want a scorecard approach to information security risk (InfoSec risk). Your bosses might hand this project to you, saying “How hard can it be? You can do it in your spare time! And… by the way… I need it before the next executive staff meeting.”
Before you charge into the project, read through these 12 tips and you will gain new appreciation for the complexities and pitfalls, and also ways to deal with them.
Now for a couple of definitions:
- “Scorecard” is a method for scoring, rating, or ranking individual components or categories, then aggregating those component scores into an overall score.
- “InfoSec risk” is the probabilistic, forward-looking estimate of costs related to information security for a given time period, including loss events, mitigation, remediation, recovery, etc. In this discussion, I assume we are talking about aggregated InfoSec risk for an organization. The component categories can include threats, vulnerabilities, controls, mitigations, assets, and so on.
To be clear, the difference between a “risk scorecard” and any other InfoSec scorecard is that a risk scorecard attempts to include all the factors that drive risk – threats, vulnerabilities, controls, mitigations, assets, etc. Other scorecards tend to focus on a subset (mostly controls and vulnerabilities), are aimed at answering a different set of questions (e.g. “Are we compliant?”) and thus don’t face all of the difficulties listed below.
1. Scorecards are a crude approximation of risk. Be very aware of this limitation. Risk is a probabilistic notion, and risk management depends on understanding causation, at least to some degree. But for simplicity, InfoSec risk scorecards don’t include any probabilistic models, causal models, or the like; they can only roughly approximate risk under simplifying assumptions. By analogy, this is like using arithmetic and simple geometry, instead of calculus, to predict the dynamics of a spring-weight-damper system. That might not be too hard if the mathematical functions between input and output were smooth, as they are for the spring-weight-damper. But the mathematical landscape of InfoSec risk is not smooth; it is rugged. The rest of these tips are aimed at helping you avoid the worst problems with using a crude approximation in a rugged landscape.
2. Pick the right components and the right scales. Though it might be tempting to define components using the “usual suspects” — threats, vulnerabilities, controls, mitigations, assets, and so on — you should really spend some time thinking about what goes into each category, how they relate to each other, and how granular they need to be. You need to realize that you are mixing apples with oranges with kangaroos — meaning that these components have very different characteristics, with very different underlying data. You need to achieve some sort of balance, even if that means shedding light on some aspects of InfoSec risk where you have little or no data.
Then, for each scorecard component and also the aggregate score, you need to decide whether the scales will be ordinal, interval, or ratio. The ubiquitous “high – medium – low” scale is a crude ordinal scale. Ordinal scales can be very fine-grained (i.e. ranking all possibilities), but they don’t tell you anything about the magnitude of difference between #2 and #22, for instance. An interval scale is more informative, because you can talk about the relative differences or gaps between scores. You can add and subtract scores safely, but a score of “0” has no special meaning since the scale has no absolute zero point. If you need to analyze your scores in terms of ratios, % rates of increase/decrease, and so on, then you need a ratio scale. (For example, efficiency, productivity, and return on investment are all ratio calculations.) While it may be appealing to jump to a ratio scale, it’s the hardest to support and justify on the evidence. Basically, you need a very sophisticated measurement system to collect data, and then a very rigorous metric analysis system to roll up the data, plus a meaningful unit of measure for aggregate “Vulnerabilities” and so on. (Someone once called this mythical unit a “securiton”.) I don’t know of any organization that has this capability, and it’s probably beyond the state of the art even in theory. Therefore, your best choice is either an ordinal or interval scale.
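A tiny Python sketch makes the ordinal pitfall concrete (the label-to-number mapping below is an illustrative assumption, not a real measurement scale):

```python
# Hypothetical mapping of ordinal labels to numeric codes, for illustration only.
ORDINAL = {"low": 1, "medium": 2, "high": 3}

def mean_of_labels(labels):
    """Naive average of ordinal codes -- looks quantitative, but isn't,
    because the 'distance' between low and medium is undefined."""
    codes = [ORDINAL[label] for label in labels]
    return sum(codes) / len(codes)

# Two very different risk profiles collapse to the same "average" of 2.0:
profile_a = ["low", "high"]       # one severe finding, one minor
profile_b = ["medium", "medium"]  # two moderate findings
```

The equal averages hide exactly the information a decision-maker needs, which is why ordinal scores are best used for ranking, not arithmetic.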
Lastly, if you have an arithmetic weighting system for aggregating “ground truth data” or evaluations into each component score, you need to test the validity of the weights (e.g. 10% for “process controls”, 30% for “IT controls”, and 60% for “HR controls”, adding up to 100% for the “Controls” component score). You might also need some exception-handling system for corner cases where the weighting system needs to be overruled.
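As a sketch of that validity check, here is one way to make the weights fail loudly when they don’t sum to 100% (the component names and weights are hypothetical):

```python
# Hypothetical weights for rolling sub-scores up into a "Controls" score.
WEIGHTS = {"process_controls": 0.10, "it_controls": 0.30, "hr_controls": 0.60}

def controls_score(subscores, weights=WEIGHTS, tol=1e-9):
    """Weighted roll-up that rejects weights not summing to 100%,
    rather than silently deflating or inflating the aggregate."""
    if abs(sum(weights.values()) - 1.0) > tol:
        raise ValueError("weights must sum to 100%")
    return sum(weights[name] * subscores[name] for name in weights)

score = controls_score({"process_controls": 5, "it_controls": 7, "hr_controls": 4})
```

Putting the validation in the roll-up itself means a typo in a weight surfaces immediately instead of quietly skewing every report built on the scorecard.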
3. Scales must be normalized and have appropriate granularity. You need to normalize each score so that its contribution to the final result is proportional, relative to the other variables. For example, does a score of “9” on “Data” have the same contribution to total risk as if “Mitigation” score is a “9”? You’ll need to go through every combination of variables asking this question to appropriately scale each variable. Also think through your granularity. Is three levels enough? Is ten too many? Don’t just look at each component in isolation, but in combination with every other component.
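One minimal way to do that normalization, assuming each component has a known raw scoring range (the ranges below are invented for illustration):

```python
# Invented raw ranges per component: (min, max) of each raw scoring scale.
RAW_RANGES = {"data": (0, 9), "mitigation": (1, 5)}

def normalize(component, raw):
    """Rescale a raw score to 0..1 so components contribute proportionally."""
    lo, hi = RAW_RANGES[component]
    return (raw - lo) / (hi - lo)

# A "9" on Data and a "5" on Mitigation both normalize to 1.0 (the top
# of their respective scales), so neither silently dominates the total.
```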
4. Special care needs to be taken if you are mixing scales. Mixing ordinal, interval, and ratio scales is generally not safe if you use only arithmetic functions to calculate the aggregate score. There is too much chance of erroneous results or nonsense corner cases. It can take a lot of time to explore and validate each and every combination of component scores. However, mixing scales and also qualitative metrics is technically feasible if you have an inference system to combine all these pieces of “evidence” to produce the score. This is essentially an A.I. approach. Unfortunately, it may look like “black magic” to everyone besides the scorecard designer.
5. Pay close attention to the math. Is “sum” or “average” really the right/best function for combining the component scores? While that seems to be the obvious and simplest choice, you could be very wrong. What you are seeking is some way to roughly approximate the causal relationships among the components. This might lead you to a mixture of functions, including “min”, “max”, and even logical functions or inference rules. You will need to use trial and error to find the right combination of functions. The resulting spreadsheet may be harder to explain, maintain, and update, however. You might also be vulnerable to accusations of “gaming” the scorecard.
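To illustrate what a mixture of functions might look like, here is a deliberately simplified sketch. The specific rules (controls capping the vulnerability contribution, asset criticality setting a floor on exposure) are assumptions for demonstration, not recommended policy:

```python
def aggregate(components):
    """components: dict of name -> 0..10 score.

    Uses min/max instead of a plain sum to roughly mimic causal
    relationships between scorecard components."""
    # A strong control posture caps how much vulnerabilities can contribute.
    vuln_effect = min(components["vulnerabilities"],
                      10 - components["controls"])
    # Threat exposure never drops below a floor set by asset criticality.
    exposure = max(components["threats"], components["assets"] * 0.5)
    return vuln_effect + exposure

scores = {"vulnerabilities": 8, "controls": 7, "threats": 4, "assets": 6}
# vuln_effect = min(8, 3) = 3; exposure = max(4, 3.0) = 4; total = 7
```

Note how a plain sum of these four components would have produced 25 and obscured the interaction between controls and vulnerabilities entirely.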
6. Explicitly define rules for interpreting the aggregate score. To effectively guide decisions, you need to define the rules for interpretation, consistent with the scales, corner cases, etc. How sensitive is the aggregate score? How noisy is it? What degree of change in the score signals something “bad” or signals something “good”? Are trends meaningful, or just a product of drift, or the scoring system itself? Is a score of 2,000 twice as bad as a score of 1,000? Does that mean you should spend twice as much on it? Is a score of 3,000 three times as bad as a score of 1,000? Or do you just use it for ranking and prioritizing? Are there thresholds, where any score over X gets priority resources or any score below Y is essentially equal to zero? How would you use this aggregate InfoSec risk score? Would it be used in performance goals for the security team or IT team or CIO? How does this roll up to enterprise risk metrics or goals? How does it mesh with compliance goals or metrics? How do InfoSec risk scores relate to other business metrics (i.e. in a balanced scorecard framework)?
After poking your design with all these questions, you will probably end up revising your scoring system, which is good. You will find out the limitations of your scoring system, as well.
7. Watch out for interdependencies and nasty corner cases. As mentioned above, the landscape of InfoSec risk is rugged, in the mathematical sense. Much of this comes from interdependencies, corner cases, etc. This is especially a problem in scorecards that attempt to enumerate individual vulnerabilities, individual threats, etc., and then aggregate them using a simple sum or rank. The problem is that you are ignoring interdependencies between threats, threatening agents, vulnerabilities, assets, and so on. You can see examples in the Verizon Data Breach Report, where the really big security impacts were “caused” by a whole cluster of attacks (probes), vulnerabilities, and mistakes/errors. Patterns matter. History (a.k.a. path dependence) matters.
Do five or ten “low” vulnerabilities, added together, equal one “medium” or even one “large” vulnerability? The only way to determine this is to consider interdependencies.
It’s helpful to view these interdependencies from the viewpoint of attackers (i.e. through attack graphs or similar). In many cases, a class of vulnerabilities might be a case of “weakest link”, where the existence of any one is sufficient to allow attackers to succeed in some intermediate goal.
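A weakest-link score is one place where “min” clearly beats “sum”. This sketch assumes each step on an attack path carries a difficulty score, with lower meaning easier to exploit:

```python
def path_risk(step_difficulties):
    """The attacker succeeds via the easiest step on the path, so the
    path is only as strong as its weakest link. A simple sum or average
    would under-weight one trivially exploitable step among hard ones."""
    return min(step_difficulties)

# Five hard steps plus one trivial step: the trivial step sets the score.
weakest = path_risk([9, 8, 9, 7, 9, 1])
```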
I don’t know how you would incorporate all this analysis into a scorecard method other than by adding additional qualifiers and factors, plus rules of thumb and heuristics. Maybe you create an overlay scorecard (meta-level) that creates scores for low-probability, high-impact events of various types, based on the existence and likelihood of various interdependencies. However you handle it, decision-makers really need visibility on the likelihood of “Pearl Harbor” events, not just the run-of-the-mill incidents that arise weekly, monthly, and quarterly.
8. Have a method to deal with uncertainty, vagueness, missing information, imprecision, and contradictory information for each of the component scores. Executives need to know what level of confidence to place on the aggregate and component scores and how the “messiness” of the inputs might affect decision-making. If the end result is a single number with no qualifiers, it looks precise and accurate, but may be no better than a guess or hunch. Management should know this, and it should be visible.
For example, say you have three sources of risk, “A”, “B”, and “C”. Let’s say that “A” has a risk score in the range of 1,000 to 1,500, “B” has a risk score somewhere between 500 and 2,000, and “C” has a risk score of 1,250 exactly. Do you treat them the same in decision-making?
There are several ways to deal with “messiness” in the input data and scoring process, including:
- Adding one or more non-score values, such as “Don’t know”, “Not Applicable”, etc.
- Adding a confidence qualifier, either as a score (0% to 100%) or as descriptive labels.
- Allowing ranges (min and max), minimums (“at least”), or maximums (“not greater than”).
- Adding inference flags to indicate whether missing information is or is not critical in calculating the end metric.
- Adding default values based on business rules. This makes the template more fool-proof for many different users.
- Adding assumptions (“assumes new data warehouse implemented on schedule”) or linkages between line items.
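Pulling several of those ideas together, a component score could carry its range, confidence, and assumptions alongside the number itself. The field names and structure here are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ComponentScore:
    name: str
    low: Optional[float] = None     # "at least"
    high: Optional[float] = None    # "not greater than"
    confidence: float = 1.0         # 0.0 (pure guess) .. 1.0 (certain)
    assumptions: List[str] = field(default_factory=list)
    not_applicable: bool = False

def midpoint(s):
    """Point estimate for ranking; None when the score is unusable."""
    if s.not_applicable or s.low is None or s.high is None:
        return None
    return (s.low + s.high) / 2

a = ComponentScore("A", low=1000, high=1500, confidence=0.8)
b = ComponentScore("B", low=500, high=2000, confidence=0.4)
c = ComponentScore("C", low=1250, high=1250, confidence=1.0)
```

Note that “A” and “B” from the earlier example share the same midpoint of 1,250 even though their spreads differ wildly, which is exactly why the range and confidence should travel with the score instead of being discarded.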
Of course, adding these factors will make the calculation/inference process more complicated. For example, fuzzy logic, partial logic, and multi-valued logic can be used to reason about these factors. It’s more work but it’s worth the effort to better support decision-making.
9. The scorecard process should support organizational learning. I can’t emphasize this point enough. Your organization doesn’t just need the aggregate or component scores. Whatever the score is, it’s tentative at best, based on partial data. Your organization needs continuous learning to constantly tune and improve its understanding of InfoSec risk. It’s very important to understand “how much do we know now?” and “what do we need to learn more about?”. Your scorecard should highlight gaps in knowledge and encourage learning. Ideally, it will place some value on new or better information regarding key inputs or relationships.
This suggestion may cause you to scratch your head, since it is absent from almost all scoring methods. But this is what separates information security risk management from, say, managing flood risk. The risk landscape is changing rapidly and we will never have sufficient or complete information on the current state, let alone future states.
To make this a tool for organizational learning, you’d start by incorporating uncertainty, missing information, contradictions, etc., as mentioned above. You’d also want to create a wiki or other knowledge repository to collect supporting information behind each variable, and also to support discussion and debate. Contrary opinions, heuristics, and clarifications are extremely important in the organizational learning process. This can be your bridge to processes like log analysis, intrusion detection (true vs. false positives/negatives), forensic investigations, business cases, and so on.
10. Make the “ground truth data” and the business rules behind the scorecard visible and open to peer review/audit. If you’ve followed the advice of the previous nine tips, you’ll be accumulating lots of business rules about what “ground truth data” to use, how to evaluate it, how to normalize it, how to score it, how to combine and aggregate scores, and how to deal with “messiness” of all sorts. You should put all these business rules in one place, along with the ground truth data you are drawing from, so that anyone or everyone can understand it, challenge it, and even improve on it. Don’t hide it in spreadsheet formulas, label it “top secret”, or draw other curtains around it.
11. Stick with standard terminology and definitions – This may seem like a small issue compared to the previous points, but it still merits mentioning. Regardless of the scoring system you use, you should draw on one or more of the risk assessment methodologies for clear, consistent terminology. While there is no single authoritative source of terminology, there are many good attempts from NIST, ISO, OCTAVE, FAIR, and others.
12. If your bosses really need a good InfoSec Risk Scorecard, then they should be prepared to pay for it. Maybe your boss handed you this assignment as one more thing to do “in your spare time”. How hard can it be to create a spreadsheet with a scoring system?? But if you’ve read through all these tips and find yourself thinking “Holy crap! I have no idea how to do all this, let alone in my spare time”, then you are in a no-win situation. Go back to your bosses and show them this list. Ask them, “How much is a good InfoSec Risk Scorecard worth to you — in time, money, or scarce resources?” and “What is currently on our priority list that you would bump off in favor of this project?”. If you don’t get good answers to these questions, then your bosses aren’t serious about the project. If they are serious, they should be ready to put real money into it, whether for your time, for other people in your organization, or for outside consultants. If you think doing it right is costly, you should consider the long-term cost of doing it wrong.