The Business Value of Security

What is the business value of security? If a product owner is making decisions about how to prioritize user stories, features, or work items, how does he or she decide where work items that improve security belong in that order? Perhaps this is the wrong question to be asking.

We must start with the question of how to compare the business value of a functional requirement to the business value of a security requirement. Conversations about this topic in the agile world often make two critical assumptions: (1) that there is a decision-maker, say a product owner, not trained in information security, who can weigh the two types of requirements and make prioritization decisions between them, and (2) that there is some measure of business value that allows him or her to make such decisions rationally – a “common currency” of business value, so to speak. Return on Investment (ROI) is often mentioned in this connection. So the traditional agile approach, as I interpret it, has the technical team explaining to the product owner what the ROI of the security feature is, and then the product owner making a good trade-off by comparing ROIs.

It doesn’t work well, and when it doesn’t, the technical team is blamed for not making clear what the ROI is (“they don’t think in business terms!”). But as I show in The Art of Business Value, there is no such common currency of business value, particularly not ROI. The product owner comparing user features to security tasks is comparing apples and oranges, and has a bias for apples. Business cases for security features – for all features, come to think of it – are hopelessly sensitive to assumptions about probability (exactly how probable is it that a bad guy will try a SQL injection on this particular web page?). For many business features, we can reduce uncertainty through experimentation. But for security features? Do you want to leave firewall ports open to find out how big a threat is out there?

No, the problem here is that security is not a feature. True, the company needs to make decisions about how much to invest in it and what exactly to invest in. But the decision needs to be made at a different level in the organization, I think, and in a different way. The company as a whole needs to have a risk management strategy, and to invest in it. And those risk decisions around information security will probably be made by a CISO or CIO.

Information security decisions are largely about the quality of the information product. A security vulnerability is a kind of defect. Security is about how we create our features, not about what features we create. The good news is that to the extent that “quality is free,” security is free as well. This, I think, is the crucial idea behind Rugged DevOps. We should commit to building and deploying rugged software, rugged networks, rugged infrastructure. It is how we do our jobs – simply a matter of professionalism. We can set up a testing regimen that finds security defects the way we find any other defects – automated tests, static code analysis, dynamic testing, penetration testing (the security equivalent of exploratory testing). More importantly, we can “shift left” and make sure developers know how to avoid vulnerabilities and actually avoid them. It is part of their everyday job to avoid SQL injections and buffer overflows. It is part of being a professional software engineer.
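Here is a small illustration of what avoiding SQL injection looks like day to day. It uses Python's built-in sqlite3 module, and the table and function names are hypothetical. The unsafe version splices user input into the SQL text. The safe version passes the input as a bound parameter, so the database treats it strictly as data.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical users table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "user")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str):
    # DEFECT: user input is concatenated into the SQL text, so input
    # such as "x' OR '1'='1" changes the meaning of the query.
    query = "SELECT name, role FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str):
    # Fix: the "?" placeholder makes the driver bind the value as data.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    attack = "x' OR '1'='1"
    print(len(find_user_unsafe(conn, attack)))  # 2: every row leaks
    print(len(find_user_safe(conn, attack)))    # 0: no such user
```

The check in the `__main__` block can just as easily be an automated test. Written that way, the injection is caught like any other defect.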

The important thing is that security decisions cannot be made simply by having a product owner compare ROIs or any other business value metrics. Security itself is a value, and a way of judging the value of a product. This value must be articulated and incentivized by the highest layers of management, in the context of a vision of overall risk management. The Art of Business Value, so to speak, is for leaders to establish and articulate values and direct the enterprise in achieving them.


BI-Driven Development: Testing Outcomes Rather Than Outputs

In my book The Art of Business Value, I – almost jokingly – tossed out the term BI-driven development (BiDD). This was a play on the idea of Test Driven Development (TDD), where developers first write automated tests for their code, and only then write the code itself. We use BI, or Business Intelligence systems, to report on business metrics. My proposal was that we first decide how to measure the results we want our new features or systems to achieve, then build a way of measuring those results (probably in the company’s BI system), and only then start developing those new features. Since the book came out, a number of people have expressed interest in the idea, so I wanted to explain a little further what I have in mind and why it is important to our practice.

Historically, IT projects have been based on a set of requirements – essentially instructions on what should be built. With the introduction of Agile development approaches, we began to view these requirements as things that could be juggled, added to, or subtracted from. But this misses the real point of the Agile approach. The Agile idea is to deliver business value – outcomes – rather than particular features. In a sense, the requirements are not just flexible, but almost irrelevant. Any set of requirements that generates the desired business outcomes (at a high level of quality and a low cost) is just as good as any other set of requirements. Autonomous teams are truly autonomous when we charge them with delivering outcomes, rather than charging them with delivering a particular set of user stories or requirements.

In many formulations of Agile, teams are charged with delivering a product that meets the needs established by users and a product owner – that is, they are charged with delivering outputs. But with DevOps, where the team is responsible for a much broader subset of the value stream and uses feedback from its code’s performance in production to improve the product, we can instead give the team responsibility for outcomes. Test-driven development measures outputs: does the code I wrote do what I was told it needs to do? BiDD is intended to measure outcomes. The “requirement” that flows to the team is to improve a business metric; the test of whether the team has done so is created through a BI dashboard that measures that metric; and the team’s progress and success can be measured by watching that dashboard over time. Arguably, the so-called “requirements” don’t matter – the team determines its own requirements with the goal of achieving the business’s desired outcome. Now that is autonomy!
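To make the TDD analogy concrete, here is a sketch of a BiDD "test" written before any feature work begins. Everything in it is hypothetical:

- the metric (an online completion rate),
- the baseline and target,
- the `fetch_metric` callable, which stands in for a query against the company's BI system.

The test keeps failing until the outcome improves, whichever features the team chooses to build.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class OutcomeTest:
    """A business outcome defined before development starts (hypothetical)."""
    metric_name: str
    baseline: float
    target: float

    def passes(self, observed: float) -> bool:
        # The test is about the business metric, not about any feature.
        return observed >= self.target

    def progress(self, observed: float) -> float:
        # Fraction of the way from baseline to target, clamped to [0, 1].
        span = self.target - self.baseline
        return max(0.0, min(1.0, (observed - self.baseline) / span))


def dashboard(test: OutcomeTest,
              fetch_metric: Callable[[], Sequence[float]]) -> List[str]:
    # fetch_metric is a stand-in for the BI query; it returns one
    # observation per period (for example, per week).
    lines = []
    for period, value in enumerate(fetch_metric(), start=1):
        status = "PASS" if test.passes(value) else "not yet"
        lines.append(f"period {period}: {test.metric_name}={value:.2f} "
                     f"progress={test.progress(value):.0%} {status}")
    return lines


if __name__ == "__main__":
    outcome = OutcomeTest("online_completion_rate", baseline=0.60, target=0.75)
    # Invented weekly readings, for illustration only.
    for line in dashboard(outcome, lambda: [0.60, 0.66, 0.71, 0.76]):
        print(line)
```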

Who’s overseeing the overseers?

The government provides oversight of projects and programs. Interestingly, this oversight often happens outside of the normal reporting structure of the government agencies. It is considered important for these overseers to be independent – not part of the organization that is sponsoring or administering the project. While this allows for some objectivity, it also means that the overseers have little “skin in the game” – they do not have to live with the consequences of their decisions. The team running the program does.

Now suppose – this is theoretical, of course, and would never happen in any situation I am familiar with, ahem – suppose that the oversight body imposed substantial burdens on the programs it oversaw. Suppose that it demanded extensive documentation that no one ever read, nit-picked the format of that documentation, imposed supposed “best practices” that were not actually best practices, and frequently asked for data or status updates that distracted those managing the program. Suppose further that the overseers themselves were not always efficient: they held up the programs while they tried to schedule review meetings, gave the programs contradictory direction, argued amongst themselves, or prepared inadequately for review meetings. The problem could be exacerbated if the overseers did not themselves have the experience of running programs, and therefore their understanding of best practices was at best theoretical, at worst superstitious.

If that – ahem – ever happened, then given the power oversight bodies have, they would essentially be ordering the programs to waste money. They might also add risk to programs. Since the job of the oversight body is ostensibly the opposite – to prevent waste and mismanagement – this could be a critical issue. What controls are in place to prevent this? Is the oversight body perhaps incentivized to make “corrections” to the programs to demonstrate its own usefulness?

Because the oversight bodies are not “inline” with the management structure over the program, they have no obligation to cultivate the program team as employees. They do not need to encourage program staff, deal with any issues of demoralization, provide positive feedback, make a comfortable work environment that will attract more great performers into program management. Oversight in such an environment runs the danger of focusing on negativity and control, rather than on successful execution.

How can we improve this? Oversight bodies must be measured by the success of the programs they oversee, not by their willingness to cancel failing programs. They must be composed of people who are experts – not in overseeing programs, but in executing them. They must work to optimize their processes and to minimize the waste they add to programs, and must solicit feedback from programs to understand what waste they are causing. What I am saying is that oversight must create value for programs, and only the programs can judge whether it does.

Cancel Successful Projects, Not Failing Ones

Government oversight bodies take great pride in canceling failing projects. They consider it a measure of success. What are oversight bodies for? Eliminating wasteful investments, of course. At first glance this might seem to be consistent with an agile mindset. Among the advantages of an agile approach are the transparency given to stakeholders and the ability to manage risk by working in increments. Only the current increment is at risk, since the project can be stopped before the next increment begins. Problems are exposed through transparency and oversight can take advantage of the incremental approach to stop a failing project.

This way of thinking is a mistake. The project was started because of a mission need. Canceling the project leaves that mission need unmet. Oversight has failed in two senses: (1) it failed to make the project successful, and (2) it did not allow a mission need to be met, one that was important enough to have invested in. If, in fact, the mission need is important, then a new project will have to be started to address that same need. The new project will have the overhead of starting up a new program, thereby wasting more money. Instead of canceling the program, the oversight body should shift the course of the current program to make it more successful.

But isn’t that throwing more money into a wasteful program? No – what has been spent so far is a sunk cost, and there may be some salvageable assets. Doesn’t the program’s failure to date mean that it is poorly managed? Not necessarily, but if it is, the oversight body should simply force the program management to change, not cancel the program. In many cases the problem is not management, but circumstances outside their control. The oversight body should help the program overcome those outside forces. Terminating the program is just a way to punish the program’s management, and adds waste for the government as a whole.

Why cancel a successful program? If a program has delivered substantial value to date, then the oversight body should consider whether the remaining work of the program is necessary. If the program’s work was appropriately prioritized, then there should be diminishing returns to continuing the program. Oversight should constantly reassess the value of the remaining work, and see if the agency’s needs have changed in a way that the remaining work is no longer worth the investment. If the oversight body decides that this is the case, it should cancel the remainder of the program and rejoice – for this it can legitimately claim an oversight success!

Paper Parity and Digital Services

The government needs to give up on the idea of parity between paper forms and electronic forms. This one concept alone is holding back Digital Services and public-centric offerings more than any other.

As I mentioned in my last post, The Government Paperwork Elimination Act (GPEA) of 1998 tried to establish electronic interactions as equivalent to paper. In 1998, this might have been forward-leaning, but in 2015 it just doesn’t go far enough. Because much of government policy has been based around paper forms, it requires a creative leap of the imagination to treat electronic forms and online interactions as something different and better. But the two channels have a major difference: electronic forms can be interactive, while paper forms can never be. Interactivity requires different ways of thinking. Instead, much of our government process, including the implementation of the Paperwork Reduction Act (PRA), effectively forces us to dispense with any interactivity, for the specious reason that it would make electronic interaction different from paper.

Let me give a few illustrations of how channel parity holds us back from what is considered normal customer service. Online forms generally validate user input as it is entered – they check for mistakes that would result in denied applications and database integrity issues. But validation is something that paper cannot do – the public can submit paper applications that have silly errors in them. So “parity thinking” requires that we allow them to make the same mistakes electronically. In fact, we could often go further than simple validation – if the user is applying for something that they are patently unqualified for, our online form could in some cases let them know immediately and save them the trouble and cost of applying. But paper applications do not do this, so it is disallowed or frowned upon.
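Here is a sketch of what "validating as input is entered" means in practice. The field names, the rules and the minimum-age eligibility check are hypothetical. They were chosen only to illustrate the two ideas in the paragraph above:

- catching simple errors before submission;
- telling an applicant up front when they plainly do not qualify.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical rule for one field of an online application form.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")


def validate_field(field: str, value: str) -> Optional[str]:
    """Check one field as it is entered; return an error message or None."""
    if not value.strip():
        return f"{field} is required"
    if field == "zip" and not ZIP_RE.fullmatch(value):
        return "ZIP code must look like 12345 or 12345-6789"
    if field == "birth_date":
        try:
            born = date.fromisoformat(value)
        except ValueError:
            return "Birth date must be a real date in YYYY-MM-DD form"
        if born > date.today():
            return "Birth date cannot be in the future"
    return None


def eligibility_warning(birth_date: str, minimum_age: int,
                        today: date) -> Optional[str]:
    """Hypothetical up-front check: warn before the applicant pays to apply."""
    born = date.fromisoformat(birth_date)
    # Subtract one year if this year's birthday has not happened yet.
    age = today.year - born.year - (
        (today.month, today.day) < (born.month, born.day))
    if age < minimum_age:
        return f"Applicants must be at least {minimum_age}; you appear to be {age}"
    return None
```

On paper, neither check can happen until someone reviews the submitted application.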

There are cases where we could help the applicant fill out an application by providing data that we already know or can look up for them. For example, if they are applying to replace a green card, we could look up the information from their previous card, ask what has changed, ask for proof of those changes, and produce the card. Instead, we have them enter from scratch much of the information on the card, as they would on paper, then check to see if it matches what we already know.
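The green card example can be sketched as "start from what we already know, and ask only what changed." The record fields, the in-memory `RECORDS` table standing in for agency data, and the rule that every reported change needs proof are all hypothetical.

```python
from dataclasses import dataclass, fields, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CardRecord:
    # Hypothetical subset of what is printed on the previous card.
    card_number: str
    full_name: str
    birth_date: str


# Stand-in for records the agency already holds (invented data).
RECORDS: Dict[str, CardRecord] = {
    "A123": CardRecord("A123", "Maria Lopez", "1985-03-02"),
}


def start_replacement(card_number: str,
                      changes: Dict[str, str]) -> Tuple[CardRecord, List[str]]:
    """Pre-fill from the old card, apply only the reported changes,
    and return the fields for which proof must be supplied."""
    previous = RECORDS[card_number]  # look it up instead of asking for it
    changeable = {f.name for f in fields(CardRecord)} - {"card_number"}
    bad = set(changes) - changeable
    if bad:
        raise ValueError(f"unknown or unchangeable fields: {sorted(bad)}")
    draft = replace(previous, **changes)
    return draft, sorted(changes)
```

An applicant with nothing to change submits an empty set of changes and owes no proof. Someone who changed their name supplies one field and one document.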

I understand that there are legal and policy issues involved here. I’m just suggesting that those reflect an old way of thinking that no longer aligns with a world that has changed. We should be looking for ways to change those laws and policies. Perhaps a “Government Interactivity Preference Act”?

The problematic Paperwork Reduction Act

The Paperwork Reduction Act (1980, 1995) and the Government Paperwork Elimination Act (1998) together suggest that the government wants to move away from burdensome, paper-only interactions with the public toward a 21st century approach that takes advantage of the online world. The Government Paperwork Elimination Act (GPEA) mandates that government agencies treat electronically submitted information the same as a paper version – even to the extent of recognizing electronic signatures – so that individuals can transact with the government electronically. The Paperwork Reduction Act (PRA) is intended to reduce the burden on the public resulting from information collections. Simply put, agencies should not require unnecessary information from the public and should make the best use of the information they have collected.

These goals are the right ones. As someone who has applied for visas for foreign countries and had to provide odd pieces of information that were clearly irrelevant, I am happy that the US has a mechanism to avoid such a thing. Unfortunately, the details of the legislation and its implementation are interfering with the goal, despite what are clearly the best intentions of all concerned.

One problem is process-related. The PRA sets up a process for both new forms and changes to existing forms that requires a 60-day public comment period followed by a second 30-day public comment period once feedback from the initial comment period has been incorporated. The form must then be approved by the chronically understaffed Office of Information and Regulatory Affairs (OIRA) at OMB. With the time required for preparation of the documents OIRA requires, the process can take one to two years for a change to an existing form.

The result is that agencies are discouraged from making improvements to their forms. Planning within agencies centers on how to avoid making changes that will trigger a PRA review. In an era when tech-savvy companies make continuous improvements to their user interactions, often testing two versions of the user interface at the same time (called A/B testing), this process interferes with the government’s ability to reduce burden and improve the public’s experience when transacting with the government.

A second issue is the existence of loopholes in the legislation. Government agencies are instructed to accept electronic signatures “where practicable.” In many cases the Department of Justice believes that such signatures are not “practicable” and agencies must require “wet” signatures even if a form is submitted electronically.

Perhaps the biggest issue, though, is the equating of paper and electronic versions of forms. OIRA requires parity between forms that are available both electronically and in print. This means that many of the features of electronic customer interaction are not allowed, since they would create a disparity between the channels. For example, online forms typically “validate” information as it is entered, flagging errors in the user’s input. Since paper allows the user to write anything they want, agencies are not allowed to stop an applicant from electronically submitting information that is clearly wrong. This denies agencies and the public one of the greatest benefits of electronic interactions.

There is a more subtle and insidious problem with this requirement. Electronic applications are generally – outside of the government – interactive; that is, as the user enters information the computer responds by providing related information. For example, once the applicant has been identified, the system can look up information it already has on the applicant and provide it as a “default” to reduce the burden on the applicant. But this would diverge from what is available on a paper application.

As a result, the government’s electronic applications are static, viewed as merely the equivalent of the paper application. As with paper, the applicant is expected to fill out information on a static page and submit it before the government can provide any help. The paperwork burden on the public is not reduced, and the agency receives bad data, which makes its processing less efficient.

The PRA requires that an agency “to the maximum extent practicable, uses information technology to reduce burden and improve data quality, agency efficiency and responsiveness to the public.” The Open Government Directive further requires that OIRA review the PRA for impediments to the use of new technologies. In my view, that means that we cannot treat electronic forms as if they were paper forms, but rather must take full advantage of what electronic interaction allows. Doing so would realize the spirit of the PRA and GPEA better than today’s process does.

The legacy system modernization trap

Government agencies have often failed at efforts to modernize their legacy systems. Even when the effort is not labeled a failure, it is expensive, time-consuming, and risky. There is a good reason for this: we should not be doing legacy system modernization projects at all.

Why not? To begin with, “modernization” is not a business goal. It does not in itself represent a business outcome that adds value to the enterprise. As a result, its scope of activities can never be clear, the business cannot feel a sense of urgency (and as a result wants to hold off releasing until the system is perfect), and good prioritization and trade-off decisions cannot be made. It is a project that is all risk and no return: there is already a functioning system, so the agency is only risking failure. A modernization effort can never end, since being up-to-date is a goal that slips away as we approach it.

Legacy modernization also does not lend itself to an agile approach, and that alone should be a giveaway that something is wrong. We cannot release the modernized product incrementally, beginning with a minimum viable product, because the business will not want to use something that does not at least match the functionality of the legacy system. This means that the scope of the first release is fixed and cannot be adjusted to fit within a time box. The legacy system must be replaced in a single release.

Yet there is a way to modernize a system. The first goal should be to bring contemporary technical practices to bear on the legacy system as is. And the first step in doing that is to bring the legacy system under automated test. A wonderful reference on doing so is Working Effectively with Legacy Code by Michael Feathers. Once the system is under automated test, regressions can be avoided and the code can be restructured and refactored at will. The second step is to re-architect (as little as possible) to make re-engineering pieces of the system efficient. In many cases this will involve creating elements of a service-oriented architecture – or at least finding ways to loosely couple different pieces of the system.
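Feathers calls the first tests you put around legacy code "characterization tests." They record what the code currently does, not what it should do. That way, later refactoring cannot change behavior without anyone noticing. Below is a sketch in which `legacy_fee` is a hypothetical stand-in for an untested legacy routine.

```python
import unittest


def legacy_fee(amount, code):
    # Hypothetical legacy routine whose rules nobody fully remembers.
    if code == "X":
        return round(amount * 0.015 + 2, 2)
    if amount > 1000:
        return 25
    return 10


class CharacterizeLegacyFee(unittest.TestCase):
    # Each expected value was obtained by running the existing code and
    # recording its answer: the test pins current behavior, right or wrong.
    CASES = [
        ((500, "A"), 10),
        ((1000, "A"), 10),   # boundary: 1000 is not "> 1000"
        ((1500, "A"), 25),
        ((200, "X"), 5.0),
    ]

    def test_current_behavior_is_preserved(self):
        for args, expected in self.CASES:
            with self.subTest(args=args):
                self.assertEqual(legacy_fee(*args), expected)


if __name__ == "__main__":
    unittest.main()
```

Once the behavior is pinned down, the code underneath can be restructured. If a test then breaks, one of two things has happened. Either the refactoring introduced a regression, or the team has found a behavior worth asking the business about.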

Now to the real point of the exercise. We need to work from real business needs – capabilities that the system does not have. For each capability we change the system to deliver it. We do so in an agile way, and we use our automated tests to feel comfortable that we are not breaking anything. Over time, the system becomes easier to change as components are replaced by more “modern” components. But we never do anything without a good business reason to do so.
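This capability-by-capability replacement resembles what is often called the "strangler fig" pattern; that label is mine, not the author's. A thin routing layer sends each capability either to the legacy system or to a newly built component. Capabilities move over one at a time, as business needs justify them. The sketch below uses hypothetical capability and handler names.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def legacy_system(request: dict) -> dict:
    # Stand-in for a call into the existing legacy code.
    return {"handled_by": "legacy", "capability": request["capability"]}


def new_status_lookup(request: dict) -> dict:
    # Hypothetical new component, built only because the business needed
    # a self-service status check the legacy system could not provide.
    return {"handled_by": "new", "capability": request["capability"]}


class Router:
    """Routes each capability to its replacement if one exists, else to legacy."""

    def __init__(self, fallback: Handler) -> None:
        self.fallback = fallback
        self.replacements: Dict[str, Handler] = {}

    def replace_capability(self, name: str, handler: Handler) -> None:
        self.replacements[name] = handler

    def handle(self, request: dict) -> dict:
        handler = self.replacements.get(request["capability"], self.fallback)
        return handler(request)


if __name__ == "__main__":
    router = Router(fallback=legacy_system)
    router.replace_capability("status_lookup", new_status_lookup)
    print(router.handle({"capability": "status_lookup"})["handled_by"])  # new
    print(router.handle({"capability": "file_claim"})["handled_by"])     # legacy
```

The automated tests from the first step are what make each switch safe.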

In other words, we’ve modernized our system without doing any such thing.

Spend more on delivery, less on risk mitigation

Let’s do a simple Lean analysis of government IT system delivery projects. How much of our spend is on activities that directly create value, and how much is additional overhead? What percentage of our spend is value-creating?

The value-creating part of a software development project is primarily the actual development and testing of the software. Add to that the cost of the infrastructure on which it is run, the cost of designing and building that infrastructure, and perhaps the cost of any software components from which it is built. I mean to include in these costs the salaries of everyone who is a hands-on contributor to those activities.

The non-direct-value-creating part is primarily management overhead and risk mitigation activities. Add to these the costs of the contracting process, documentation, and a bunch of other activities. Let’s call these overhead. A great deal of this overhead is for risk mitigation – oversight to make sure the project is under control; management to ensure that developers are doing a good job; contract terms to protect the government against non-performance.

No one would claim that these overhead categories are bad things to spend money on. The real question is what a reasonable ratio would be between the two. Let’s try a few scenarios here. An overhead:value ratio of 1:1 would mean that for every $10 we spend creating our product, we are spending an additional $10 to make sure the original $10 was well-spent. Sounds wrong. How about 3:1? For every $10 we spend, we spend $30 to make sure it is well spent? Unfortunately – admittedly without much concrete evidence to base it on – I think 3:1 is actually pretty close to the truth.
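The arithmetic behind these ratios is worth spelling out. With an overhead:value ratio of r:1, only 1/(1 + r) of total spend is value-creating. The ratios below are the scenarios from the paragraph above, not measured data.

```python
def value_share(overhead_per_value_dollar: float) -> float:
    """Fraction of total spend that creates value, given overhead:value = r:1."""
    return 1.0 / (1.0 + overhead_per_value_dollar)


for r in (1, 3):
    value = 10              # dollars spent creating the product
    overhead = r * value    # dollars spent making sure it was well spent
    total = value + overhead
    print(f"{r}:1 -> ${value} value + ${overhead} overhead = ${total}; "
          f"{value_share(r):.0%} of spend is value-creating")
```

At 1:1, half of every dollar creates value. At 3:1, only a quarter does, and three of every four dollars go to overhead.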

Why would the ratio be so lopsided? One reason is that we tend to outsource most of the value-add work. The government’s role is management overhead and the transactional costs of contracting. Management overhead is duplicative: the contractor manages the project and charges the government for it, and the government also provides program management. Another reason is the many layers of oversight and the diverse stakeholders involved. Oversight has a cost, as does all the documentation and risk mitigation activity that is tied to it. When something goes wrong, our tendency is to add more overhead to future projects.

A thought exercise. Let’s start with the amount we are currently spending on value-creating activity, and $0 for overhead. Now let’s add incremental dollars. For each marginal dollar, let’s decide whether it should be spent on overhead or on additional value creation (that is, programmers and testers). Clearly we will get benefit from directing some of those marginal dollars to overhead. But very soon we will start facing a difficult choice: investing in more programmers will allow us to produce more. Isn’t that better than adding more management or oversight?
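The thought exercise amounts to a greedy marginal allocation. The sketch below rests on two loudly hypothetical assumptions. The first is that overhead has diminishing returns (a square-root curve) while each added unit of delivery adds a fixed amount. These curves and weights are invented for illustration and are not estimates of real projects. The second is that each marginal unit of spend goes to whichever category gains more from it.

```python
import math
from typing import Tuple


def overhead_benefit(units: int) -> float:
    # Hypothetical: some oversight helps a lot; more helps less and less.
    return 10 * math.sqrt(units)


def delivery_benefit(units: int) -> float:
    # Hypothetical: each extra unit of programmers and testers adds a fixed amount.
    return 1.0 * units


def allocate(marginal_units: int) -> Tuple[int, int]:
    """Greedily assign each marginal unit of spend to the category that gains more."""
    overhead = delivery = 0
    for _ in range(marginal_units):
        gain_overhead = overhead_benefit(overhead + 1) - overhead_benefit(overhead)
        gain_delivery = delivery_benefit(delivery + 1) - delivery_benefit(delivery)
        if gain_overhead > gain_delivery:
            overhead += 1
        else:
            delivery += 1
    return overhead, delivery
```

Under these made-up curves, the first 25 units go to overhead. Every unit after that goes to delivery, so `allocate(100)` returns `(25, 75)`. The particular cutoff means nothing. The shape is the point: once another unit of oversight yields less than another programmer, more oversight is the wrong purchase.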

To produce better results, we need to maintain a strong focus on the value-creating activities – delivery, delivery, delivery.

Who needs requirements?

On my other blog, I posted an entry on how agile approaches in a way dispense with the idea of requirements; instead a business need is translated directly into code (skipping the requirements step), with tests providing an objective way to see whether the result is acceptable.

This idea disturbs many government IT and procurement professionals. It shouldn’t.

Perhaps it will ease people’s minds to think of an agile process as something like a procurement done with a Statement of Objectives. In place of system requirements the government, throughout the course of development, presents the contractor with business needs, and the contractor is free to provide a solution without constraints. For the same reason that this is often good practice in contracting, it is also good practice in software development. I am not saying that agile procurements should be done through a Statement of Objectives (a good idea in some cases), just pointing out the underlying similarity in concept.

One objection I hear is that without requirements, we cannot contract for services. Even if we could, how could we have a fair competition, since contractors bid on how they would address requirements? The trick here, I believe, is to distinguish between contractual requirements and system requirements. There is no rule that says that the contract or the RFP must include system requirements. Of course it must include some sort of requirements. The requirements depend on the basis for the competition – for example, if a procurement is for development services, we can state requirements for the services – required skills and experience, management approach, etc. Or we can state requirements for the business needs to be fulfilled. Perhaps the following comparison is in order: if I wanted security guard services I could specify that the security guards need to prevent people we don’t trust from entering the building. The solicitation does not need to list the names of the particular people we don’t trust.

A second objection is that we need the requirements to know whether the contractor or the project team has performed well. That seems to miss the point. If the requirements are satisfied but the product doesn’t meet the business need, then no one has been successful. We should gauge success by business value produced, business needs met, quality of work, customer service, and so on. Or we can judge the contractor’s success at meeting the business needs developed in the “conversations” with users. We don’t need system requirements in the solicitation to do this.

The main point to keep in mind is that better results are obtained by working from business needs directly to system development. Best results are what we want. We might have to change how we set up our contracts to get there. There is no conflict, from what I can see, with the Federal Acquisition Regulation (FAR).

DevOps and FISMA, part 2

In my last post I discussed how rapid feedback cycles from production can support FISMA goals of continuous monitoring and ongoing authorization. Today I’d like to discuss FISMA compliance and DevOps from another perspective.

In order to support frequent, rapid, small deployments to production, we must ensure – no surprise – that our system is always deployable, or “potentially shippable.” That means that our system must always be secure, not just in production, but also in the development pipeline. With a bit of effort, the DevOps pipeline can be set up so as to achieve this.

I find it helpful to think of security vulnerabilities or flaws as simply a particular kind of defect. I would treat privacy flaws, accessibility flaws (“section 508 compliance”), and other non-functional flaws the same way. I believe this is consistent with the ideas behind the Rugged DevOps movement. We want to move to a zero-defect mentality, and that includes all of these non-functional types of defects.

Clearly, then, we need to start development with a hardened system, and keep it hardened – that way it is always deployable and FISMA compliant. This, in turn, requires an automated suite of security tests (and privacy, accessibility, etc.). We can start by using a combination of automated functional tests and static code analysis that can check for typical programming errors. We can then use threat modeling and “abuser stories” to generate additional tests, perhaps adding infrastructure and network tests as well. This suite of security tests can be run as part of the build pipeline to prevent regressions and ensure deployability.
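Here is a sketch of security checks treated as ordinary tests, with the pipeline stage failing if any of them fails. The `AuthService` class, its lockout threshold and the abuser stories are hypothetical stand-ins for a real system and its threat model. A real pipeline would also run static analysis and infrastructure scans, which are not shown.

```python
import hashlib
import hmac
import os
import unittest

MAX_ATTEMPTS = 3  # hypothetical lockout threshold


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


class AuthService:
    """Hypothetical system under test."""

    def __init__(self) -> None:
        self.accounts = {}  # username -> (salt, password hash)
        self.failures = {}  # username -> consecutive failed attempts

    def register(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        self.accounts[username] = (salt, hash_password(password, salt))

    def login(self, username: str, password: str) -> bool:
        if self.failures.get(username, 0) >= MAX_ATTEMPTS:
            return False  # locked out, even with the right password
        account = self.accounts.get(username)
        if account and hmac.compare_digest(
                account[1], hash_password(password, account[0])):
            self.failures[username] = 0
            return True
        self.failures[username] = self.failures.get(username, 0) + 1
        return False


class AbuserStoryTests(unittest.TestCase):
    def setUp(self) -> None:
        self.auth = AuthService()
        self.auth.register("alice", "correct horse")

    def test_legitimate_user_can_log_in(self):
        self.assertTrue(self.auth.login("alice", "correct horse"))

    def test_attacker_guessing_passwords_gets_locked_out(self):
        # Abuser story: "As an attacker, I try many passwords for one account."
        for _ in range(MAX_ATTEMPTS):
            self.assertFalse(self.auth.login("alice", "guess"))
        self.assertFalse(self.auth.login("alice", "correct horse"))

    def test_injection_style_input_is_just_a_failed_login(self):
        # Abuser story: "As an attacker, I put SQL fragments in the username."
        self.assertFalse(self.auth.login("alice' OR '1'='1", "x"))


def security_gate() -> bool:
    """Run the security suite; the pipeline deploys only if it passes."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AbuserStoryTests)
    return unittest.TextTestRunner(verbosity=0).run(suite).wasSuccessful()


if __name__ == "__main__":
    # A non-zero exit status fails this pipeline stage and blocks deployment.
    raise SystemExit(0 if security_gate() else 1)
```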

How can we start with a hardened system, when we almost always need to develop security controls, and that takes time and effort? I don’t have a perfect answer, but our general strategy should be to use inherited controls – by definition, controls that are already in place when we start development. These controls may be inherited from a secure cloud environment, an ICAM system (Identity, Credential, and Access Management) that is already in place, libraries for error logging and pre-existing log analysis tools, and so on. These “plug and play” controls can be made to cover entire families of the controls described in NIST Special Publication 800-53.
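At its simplest, tracking which NIST SP 800-53 control families inherited controls already cover, and which remain for the team, is a set difference. The family identifiers and names below are real 800-53 families, though only a subset. The mapping of providers to families is an illustrative assumption, not an assessed authorization boundary. Real inheritance is often partial and is recorded control by control, not family by family.

```python
from typing import Dict, Iterable, List, Set

# A subset of NIST SP 800-53 control families used in this sketch.
FAMILIES: Dict[str, str] = {
    "AC": "Access Control",
    "AU": "Audit and Accountability",
    "CM": "Configuration Management",
    "IA": "Identification and Authentication",
    "PE": "Physical and Environmental Protection",
    "SC": "System and Communications Protection",
    "SI": "System and Information Integrity",
}

# Hypothetical mapping of pre-existing services to the families they cover.
INHERITED: Dict[str, Set[str]] = {
    "secure cloud environment": {"PE", "SC"},
    "agency ICAM system": {"AC", "IA"},
    "logging library and log analysis tools": {"AU"},
}


def remaining_families(required: Iterable[str],
                       inherited: Dict[str, Set[str]]) -> List[str]:
    """Families not covered by any inherited provider, in sorted order."""
    covered = set().union(*inherited.values())
    return sorted(set(required) - covered)


if __name__ == "__main__":
    for fam in remaining_families(FAMILIES, INHERITED):
        print(f"{fam} ({FAMILIES[fam]}) must be addressed by the team")
```

Under this assumed mapping, only Configuration Management and System and Information Integrity are left for the team to build before development starts.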

Start hardened. Stay hardened. Build rugged.