For computer software, our security models, policies, mechanisms, and means of assurance were primarily conceived and developed before the end of the 1970s. Since that time, however, software has changed radically: it is thousands of times larger, comprises countless libraries, layers, and services, and is used for more purposes, in far more complex ways. It is therefore worthwhile to revisit our core computer security concepts. For example, it is unclear whether the Principle of Least Privilege can help dictate security policy when software is too complex for either its developers or its users to explain its intended behavior.
One possibility is to take an empirical, data-driven approach to modern software, and determine its exact, concrete behavior via comprehensive, online monitoring. Such an approach can be a practical, effective basis for security, as demonstrated by its success in spam and abuse fighting, but its use to constrain software behavior raises many questions. In particular, three questions seem critical. First, can we efficiently monitor the details of how software is behaving, in the large? Second, is it possible to learn those details without intruding on users’ privacy? Third, are those details a good foundation for security policies that constrain how software should behave?
This paper outlines what a data-driven model for software security could look like, and describes how the above three questions can be answered affirmatively. Specifically, it briefly describes methods for efficient, detailed software monitoring; methods for learning detailed software statistics while providing differential privacy for the software's users; and, finally, machine learning methods that can help discover users’ expectations for intended software behavior, and thereby help set security policy. Those methods can be adopted in practice, even at very large scales, and demonstrate that data-driven software security models can provide real-world benefits.
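As a concrete illustration of the second kind of method, the sketch below uses randomized response, a classic mechanism that satisfies local differential privacy, to estimate how common a binary software behavior is across a population of users. It is not taken from this paper and is not claimed to be the method the paper describes: the function names, the choice of epsilon, and the simulated population are all assumptions made for illustration.

```python
import math
import random


def truth_probability(epsilon: float) -> float:
    """Probability of reporting the true bit under epsilon-local differential privacy."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize_bit(true_bit: bool, epsilon: float, rng: random.Random) -> bool:
    """Client side (hypothetical): report the true bit with probability
    e^eps / (1 + e^eps), otherwise report its negation."""
    if rng.random() < truth_probability(epsilon):
        return true_bit
    return not true_bit


def estimate_fraction(reports: list[bool], epsilon: float) -> float:
    """Server side (hypothetical): invert the known noise to get an unbiased
    estimate of the fraction of users whose true bit is set.

    If f is the true fraction and p the truth probability, the expected
    observed fraction is f*p + (1 - f)*(1 - p); solving for f gives the
    expression returned below.
    """
    p = truth_probability(epsilon)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed so the simulation is repeatable
    epsilon = math.log(3.0)     # assumed privacy parameter; truth probability is 0.75
    # Simulated population (an assumption): 30% of users exhibit the behavior.
    true_bits = [i < 30_000 for i in range(100_000)]
    reports = [randomize_bit(b, epsilon, rng) for b in true_bits]
    print(f"estimated fraction: {estimate_fraction(reports, epsilon):.3f}")
```

No individual report reveals a user's true bit with certainty, yet the aggregate estimate converges on the population fraction as the number of reports grows. Smaller values of epsilon give stronger privacy at the cost of a noisier estimate.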