Metricon 1 — The Inaugural Event

September 20, 2006

Metricon 1.0 was held 1 August 2006 in Vancouver, British Columbia, Canada, coincident and co-located with the 15th USENIX Security Symposium. This page has the final agenda, copies of all presentation materials, and a digest summary of the meeting itself. (As is both typical and appropriate, let me hasten to say as the scribe for the affair that all errors are mine.)

The Metricon 1.0 Agenda follows, with presentation materials from each author. A prose Digest of the meeting’s conversation is here.

Agenda #

Program Committee #

Chair: Andrew Jaquith, Yankee Group

Members:

  • Adam Shostack, emergentchaos.org
  • Gunnar Peterson, Artec Group
  • Elizabeth Nichols, ClearPoint Metrics
  • Pete Lindstrom, Spire Security
  • Dan Geer, Verdasys

Original Call for Participation #

Overview

Announcement and Call for Participation
First Workshop on Security Metrics (Metricon 1.0)
August 1, 2006, Vancouver, B.C., Canada

Ever feel like Chicken Little? Wonder if letter grades, color codes, and/or duct tape are even a tiny bit useful? Cringe at the subjectivity applied to security in every manner? If so, Metricon 1.0 may be your antidote: a chance to change security from an artistic “matter of opinion” into an objective, quantifiable science. The time for adjectives and adverbs has gone; the time for numbers has come.

Metricon 1.0 is intended as a forum for lively, practical discussion in the area of security metrics. It is a forum for quantifiable approaches and results to problems afflicting information security today, with a bias towards practical, specific implementations. Topics and presentations will be selected for their potential to stimulate discussion in the Workshop.

Workshop format

Metricon 1.0 will be a one-day event on Tuesday, August 1, 2006, co-located with the 15th USENIX Security Symposium in Vancouver, B.C., Canada. It will begin first thing in the morning, with meals taken in the meeting room, and extend into the evening.

Attendance will be by invitation and limited to 50 participants. All participants will be expected to “come with opinions” and be willing to address the group in some fashion, formally or not. Preference will be given to authors of position papers or presentations describing actual work in progress.

Each presenter will have 10-15 minutes to present his or her idea, followed by 15-20 minutes of discussion with the workshop participants. Panels may be convened to present different approaches to related topics, and will be steered by the proposals received in response to this Call.

Goals & topics

The goal of the workshop is to stimulate discussion of and thinking about security metrics and to do so in ways that lead to realistic, early results of lasting value. Potential attendees are invited to submit position papers to be shared with all. Such position papers are expected to address security metrics in one of the following categories:

  • Benchmarking
  • Empirical Studies
  • Metrics Definitions
  • Financial Planning
  • Security/Risk Modeling
  • Visualization

Practical implementations, real world case studies, and detailed models will be preferred over broader models or general ideas.

How to participate

Submit a short position paper or description of work done/ongoing. Your submission must be no longer than five (5) paragraphs or presentation slides. Author names and affiliations should appear first in/on the submission. Submissions may be in PDF, PowerPoint, HTML, or plaintext email and must be submitted to Metricon@securitymetrics.org.

Presenters will be notified of acceptance by June 15, 2006 and expected to provide materials for distribution by July 15, 2006. All slides and position papers will be made available to participants at the workshop. No formal proceedings are intended.

Simultaneous submission of the same work to multiple venues, submission of previously published work, and plagiarism constitute dishonesty. The organizers of this Workshop as well as USENIX prohibit these practices and will take appropriate action if dishonesty of this sort is found.

Location

Metricon 1.0 will be co-located with the 15th USENIX Security Symposium (Security ’06).

Cost

$200 all-inclusive of meeting space, materials preparation, and meals for the day.

Important dates

  • Requests to participate: by May 15, 2006
  • Notification of acceptance: by June 15, 2006
  • Materials for distribution: by July 15, 2006

Keynotes #

Nominal text of keynote speeches at Metricon 1 in Vancouver, BC on August 1, 2006.

Andrew Jaquith, Yankee Group: Metrics Are Nifty #

Good morning everyone, and welcome to the first conference on security metrics, dubbed Metricon 1.0.

My confederates urged that I do the keynote for this. I consented under heavy duress and after a few tactically-executed bribes.

I had hoped to make a speech that combined the rhetorical force of Cicero, the abstract symmetry of Euclid, the solidity and timelessness of Hadrian’s Wall, and the sparse lyricism of Robert Frost.

In truth, though, I’ve probably only succeeded in evoking the bombast of Jacques Chirac and the byzantine impenetrability of his compatriot, Jacques Derrida. I’ll let you be the judge of that.

Keen observers will note from the agenda announcement that I’ve billed the initial slot as a parliamentary-style debate between Steve Bellovin and me. I proposed to take the position that “Metrics are Nifty” — and proceed to hold forth on that topic — and he would take the contrary position.

I am sorry to disappoint you, but the prospects of me debating Steve in public in a serious way about anything at all are pretty close to zero. I’m a relative newcomer to the field of information security, with no formal training and nary an academic paper to my name. Up against the pedigree of Mr. Bellovin, my answers to anything are likely to be not just wrong, but not even wrong. So I’ve shied away from that, primarily because I’m chicken, and partly because I didn’t think ahead and rent the powdered barrister’s wigs that we’d need to do the debate properly.

That said, I would like to take a few minutes and speak on the goals for this conference, and in particular of what metrics ought to be.

The need for a conference like this seems plain to me. Security is one of the few areas of management that does not possess a well-understood canon of techniques for measurement. In logistics, for example, metrics like “freight cost per mile” and “inventory warehouse turns” help operators understand how efficiently trucking fleets and warehouses run. In finance, “value at risk” techniques calculate the amount of money a firm could lose on a given day based on historical pricing volatilities. By contrast, in security… there is exactly nothing. No consensus on key indicators for security exists.
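
To make the finance comparison concrete, one common parametric form of value at risk, assuming normally distributed daily returns with zero mean, is

\[ \mathrm{VaR}_{\alpha} \approx V \cdot z_{\alpha} \cdot \sigma_{\text{daily}}, \]

where \(V\) is the portfolio value, \(\sigma_{\text{daily}}\) is the historical volatility of daily returns, and \(z_{\alpha}\) is the standard-normal quantile for the chosen confidence level (roughly 1.65 at 95%). Security has no comparably agreed-upon, closed-form expression.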

The lack of consensus on security metrics is, in part, due to the fact that the culture surrounding security is largely one of shame. Firms that get hacked tend not to talk about security incidents in public. Likewise, firms that are doing the right things tend not to talk either, lest giant red bull’s-eyes appear on their firewalls’ flanks. When they do talk, it is typically under NDA, or at small gatherings of like-minded people. Therefore, today we will, as a secondary objective, discuss effective practices of firms that have taken the responsibility of measuring their security activities seriously.

I’ve been curious to see if I could find a consensus definition of what a “metric” is. According to Oxford’s American Dictionary, a metric is “a system or standard of measurement.” In mathematics and physics, it is “a binary function of a topological space that gives, for any two points of the space, a value equal to the distance between them, or to a value treated as analogous to distance for the purpose of analysis.”
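
Written out, the mathematical sense amounts to the familiar distance axioms: a metric on a set \(X\) is a function \(d : X \times X \to \mathbb{R}\) such that, for all \(x, y, z \in X\),

\[ d(x,y) \ge 0, \qquad d(x,y) = 0 \iff x = y, \qquad d(x,y) = d(y,x), \qquad d(x,z) \le d(x,y) + d(y,z). \]

Security “metrics” in the sense we use the term here rarely aspire to that level of precision; the dictionary’s first sense, a standard of measurement, is the one that matters for this workshop.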

Specific to IT metrics, Maizlish and Handler further discriminate between metrics used for quantifying value versus those used to measure performance:

There are two fundamental types of metrics that must be considered before commencing with IT portfolio management: value delivery and process improvement. Value delivery consists of cost reduction, increase in revenue, increase in productivity, reduction of cycle time, and reduction in downside risk. Process improvement refers to improvements in the IT portfolio management process.

These two definitions certainly help, but like most definitions they grant us a rather wide scope for discussion. Just about anything that quantifies a problem space and results in a value could be considered a metric. Perhaps we ought to re-focus the discussion on what a metric should help an organization do. The primary goal of metrics is to quantify data to facilitate insight. Metrics do this by:

  • Helping an analyst diagnose a particular subject area, or understand its performance
  • Quantifying particular characteristics of the chosen subject area
  • Facilitating “before-and-after,” “what-if” and “why/why not” inquiries
  • Focusing discussion about the metrics themselves on causes, means and outcomes rather than on methodologies used to derive them

As an analyst, I’m keenly interested in making sure that persons examining a “metric” for the first time should see it for what it is — a standard of measurement — rather than as something confusing that prompts a dissection of the measurer’s methods.

Metrics suffer when readers perceive them to be vague. To keep organizations from trapping themselves in tar-pits of hand-wavery and vagueness, metrics should be clear and unambiguous. Specifically, good metrics should be consistently measured, cheap to gather, expressed as a number or percentage, and expressed using at least one unit of measure. A “good metric” should also ideally be contextually specific.

  • Consistently measured. Metrics confer credibility when they can be measured in a consistent way. Different people should be able to apply the method to the same data set and come up with equivalent answers. “Metrics” that depend on the subjective judgments of those ever-so-reliable creatures, humans, aren’t metrics at all. They’re ratings. The litmus test is simple: if you asked two different persons the same measurement question, would they produce the same answer?
  • Cheap to gather. Every metric takes time to compute. All metrics start their lives as raw source data, then — through the magic of computation — become something more insightful. That means that somebody or something needs to obtain the data from a source, massage and transform the data as needed, and compute and format the results. For some metrics, these steps collapse into a single, fast process; a simple SQL statement or API method call delivers the goods (a minimal sketch of such a query follows this discussion). I firmly believe that metrics ought to be computed often, which is practical only when they are cheap to gather.
  • Expressed as a number. Good metrics should be expressed as a number or percentage. By “expressed as a number,” I mean a cardinal number — something that counts how many of something there are — rather than an ordinal number that denotes which position that something is in. For example, “number of application security defects” evaluates to a cardinal number that can be counted. By contrast, high-medium-low ratings that evaluate to 1, 2 and 3 are ordinal numbers that grade relative performance scores but don’t count anything. Metrics that aren’t expressed as numbers don’t qualify as good metrics. “Traffic lights” are not metrics at all. They contain neither a unit of measure nor a numerical scale.
  • Expressed using at least one unit of measure. Good metrics should evaluate to a number. They should also contain at least one associated unit of measure that characterizes the things being counted. For example, the metric “number of application security defects” expresses one unit of measure, namely defects. By using a unit of measure, the analyst knows how to consistently express results of a measurement process that looks for defects.
  • Contextually specific. Good metrics mean something to the persons looking at them: they shed light on an underperforming part of the infrastructure under their control, chronicle continuous improvement or demonstrate the value their people and processes bring to the organization. Although specificity is not required for all good metrics, it helps to keep each of them scoped in such a way that a reader could receive enough insight to make decisions based on the results.

“Contextually specific” is a shorthand way of saying that a good metric ought to pass the “smell test.” You don’t want managers wrinkling their noses and asking belligerent questions like “and this helps me exactly… how”? For example, defining an “average number of attacks” metric for an entire organization doesn’t help anybody do their jobs better — unless the indirect goal is an increased security budget. But scoping the same metric down to the level of a particular business unit’s e-commerce servers can help much more, because they can make specific decisions about security provisioning and staffing for these servers based on the data.
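
To make those criteria concrete, here is a minimal sketch, using an invented `defects` table and invented column names, of a metric that satisfies them: a consistently repeatable query, cheap to run, producing a cardinal count with a unit of measure (defects), scoped to a particular business unit.

```python
# Minimal sketch: "number of open application security defects" per business
# unit, computed from a hypothetical `defects` table. The schema, column
# names, and sample rows are illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE defects (
        id INTEGER PRIMARY KEY,
        business_unit TEXT,   -- scope: which unit owns the application
        severity TEXT,        -- e.g. 'high', 'medium', 'low'
        status TEXT           -- 'open' or 'closed'
    );
    INSERT INTO defects (business_unit, severity, status) VALUES
        ('ecommerce', 'high', 'open'),
        ('ecommerce', 'low',  'open'),
        ('payroll',   'high', 'closed');
""")

# The metric itself is one cheap, repeatable query: a cardinal count
# (unit of measure: defects), grouped by business unit for context.
rows = conn.execute("""
    SELECT business_unit, COUNT(*) AS open_defects
    FROM defects
    WHERE status = 'open'
    GROUP BY business_unit
""").fetchall()

for unit, open_defects in rows:
    print(f"{unit}: {open_defects} open defects")
```

Two different analysts running the same query against the same data will get the same numbers, which is exactly the “consistently measured” property.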

These, of course, are my opinions. I could be wrong.

Not everything you see or hear today will necessarily conform to my personal definition of what metrics ought to be — nor should it, because there’s a healthy vibrancy in the way this group approaches metrics.

Most metrics discussions break into discussions of “models” and of “measures”. Modeling relates quite naturally to measuring. In the information security world, most observers who speak about “security metrics” generally think about them from the point of view of modeling threats, risk (or “risk”) and losses. A vocal minority cares less about the modeling aspects per se, and would rather just measure things. At the risk of being simplistic, modelers think about risk equations, loss expectancy, economic incentives and why things happen. Measurers think more about empirical data, correlation, data sharing and causality. These two viewpoints harken back to the classic division between scientific theorists and experimentalists.

What’s most interesting about the securitymetrics.org metrics discussion list that I run is that the preponderance of comments concerns risk modeling rather than risk measurement. Our group has spun many long threads about vulnerability disclosure models, but very few about developing a “top 10” list of favorite metrics.

I’m in the latter camp. I’ve not got the math background to come up with a model to explain how, why or how much risk a company has or should bear. I figure that if I can collect enough data on enough things, perhaps I can put everything in a blender and tease out the relationships via correlation.

That said, I appreciate and understand the importance of modeling in driving forward security measurement. In fact, it’s hard to do a good job with measuring without knowledge of modeling. My colleague John Leach reminds us that correlations on their own are sterile.
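
In that measuring spirit, here is a minimal sketch of the blender approach, using invented monthly figures for two hypothetical metrics. The code computes nothing more than a Pearson correlation; as John Leach’s caution suggests, the number by itself says nothing about why the two series move together.

```python
# Minimal sketch: correlate two hypothetical monthly metric series.
# The figures are invented for illustration only.
from math import sqrt

patch_latency_days = [12, 30, 25, 8, 40, 15]   # avg. days to patch, per month
incident_count     = [1,  4,  3,  0, 6,  2]    # confirmed incidents, per month

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(f"r = {pearson(patch_latency_days, incident_count):.2f}")
```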

Today, we’re going to explore both camps: modeling and measuring, with a slight bias towards measurement.

(review of agenda)

I thank you for coming.

Steven Bellovin, Columbia University: On the Brittleness of Software and the Infeasibility of Security Metrics #

How secure is a computer system? Bridges have a load limit; this isn’t determined (as “Calvin and Hobbes” would have it) by building an identical bridge and running trucks over it until it collapses. In a more relevant vein, safes are rated for how long they’ll resist attack under given circumstances. Can we do the same for software?

I’ve sometimes quoted Lord Kelvin:

If you can not measure it, you can not improve it.

When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be.

But I’ve reluctantly concluded that current architectures are not amenable to metrics of the sort I want. Here’s why.

It’s well known that any single defense can fail. More precisely, it’s well known that any piece of software can be buggy, including security software — that list is alarmingly long. This means that whatever the defense, a single well-placed blow can shatter it. We can layer defenses, but once a layer is broken the next layer is exposed; it, of course, has the same problem.

Karger and Schell noted more than 30 years ago the difficulty of defending against an attacker who had purchased a copy of your system and could probe each layer offline. In most situations, we need to defend against threats of precisely this nature. And this is why metrics are so hard: the attacker’s effort is linear in the number of layers, rather than exponential, and each effort is low-cost.

What we need are defense systems with this exponential property. That is, we need a system where the difficulty of getting through two layers is proportional to the product of the difficulties of the individual layers, rather than their sum. If our defense systems do have this property, we have some hope of measuring their strength. The constant for any one layer may remain small — this is, after all, software that we’re dealing with — but we would somehow have found a compositional principle that negates the linear effect.
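
Stated in symbols, with \(w_i\) denoting the work an attacker must spend to defeat layer \(i\) on its own, today’s layered defenses compose roughly additively,

\[ W_{\text{total}} \approx w_1 + w_2 + \cdots + w_n, \]

whereas the property we want is multiplicative,

\[ W_{\text{total}} \approx w_1 \cdot w_2 \cdots w_n, \]

so that even modest per-layer constants compound into a strength that grows rapidly with the number of layers.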

Unfortunately, we do not know how to do this. One possibility might be to use randomization techniques to increase the attack constant for a particular layer. Unfortunately, we’re still dealing with software, and the attacker may go around our randomization. Thus, we can use random stack frame layouts, similar to OpenBSD’s, to defend against buffer overflows; the attacker, though, could launch an SQL injection attack instead. Perhaps we could use intrusion detection and repair at each layer. If we can do that, the holes won’t stay open for long; the attacker will have to continually relaunch the attack. That presupposes that we can build such self-repairing software — research on it is still at a very early stage — and that it won’t be subject to the same brittleness. In any event, the repair would have to succeed before the next layer was penetrated, or some autonomous attack code could continue the attack on inner layers even though the outer layers had been repaired.

The problem, of course, is that such systems do not exist today. The strength of each layer approximates zero; adding these together doesn’t help. We need layers of assured strength; we don’t have them. I thus very reluctantly conclude that security metrics are chimeras for the foreseeable future. We can develop probabilities of vulnerability, based on things like Microsoft’s Relative Attack Surface Quotient, the effort expended in code audits, and the like, but we cannot measure strength until we overcome brittleness. And until we can measure security, we can’t improve it.
