I consult, write, and speak on running better technology businesses (tech firms and IT captives) and the things that make it possible: good governance behaviors (activist investing in IT), what matters most (results, not effort), how we organize (restructure from the technologically abstract to the business concrete), how we execute and manage (replacing industrial with professional), how we plan (debunking the myth of control), and how we pay the bills (capital-intensive financing and budgeting in an agile world). I am increasingly interested in robustness over optimization.

I work for ThoughtWorks, the global leader in software delivery and consulting.

Tuesday, August 31, 2010

One-Way Risk and Robustness of IT Projects

Writing in the FT's Long View column, James Mackintosh makes the point that hedge fund managers “appeared smarter than they really were, because they were taking a risk they did not recognize.” That’s an apt description for a lot of what goes on in IT, too.

Despite all of the risks that commonly befall an IT project, we still deal with IT planning as an exercise in deterministic forecasting: if these people do these things in this sequence we will produce this software by this date. The plan is treated as a certainty. It then becomes something to be optimized through execution. As a result, management concerns itself with cost minimization and efficiency of expenditure.

Trouble is, an operations plan isn't a certainty. It's a guess. As Nassim Taleb observed in Errors, Robustness and the Fourth Quadrant:

Forecasting is a serious professional and scientific endeavor with a certain purpose, namely to provide predictions to be used in formulating decisions, and taking actions. The forecast translates into a decision, and, accordingly, the uncertainty attached to the forecast, i.e., the error, needs to be endogenous to the decision itself. This holds particularly true of risk decisions. In other words, the use of the forecast needs to be determined – or modified – based on the estimated accuracy of the forecast. This, in turn creates an interdependency about what we should or should not forecast – as some forecasts can be harmful to decision makers.

In an IT project context, the key phrase is: “This holds particularly true of risk decisions.” We take thousands of decisions over the course of an IT project. Each is a risk decision. Yet more often than not, we fail to recognize the uncertainty present in each decision we make.

This comes back to the notion that operations plans are deterministic. One of the more trite management phrases is “plan your work and work your plan.” No matter how diligently we plan our work in IT, we are constantly under siege while “working our plan”. Developers come and go. Business people come and go. Business needs change. The technology doesn’t work out as planned. The people responsible for the interfaces don’t understand them nearly as well as they believe they do. Other business priorities take people away from the project. Yet we still bake assumptions about these and many other factors into point projections – as opposed to probabilistic projections – of what we will do, when we will be done and how much it will cost.

Our risk management practices should shed light on this. But risk management in IT is typically limited to maintaining a “risks and issues” log, so it’s never more than an adjunct to our plan.

That most IT projects have only rudimentary risk management is quite surprising given the one-way nature of risks in IT. One-way risks are situations where we have massive exposure in one direction, but only limited exposure in the other. Taleb gives the example of trans-Atlantic flight times. It’s possible for an 8-hour flight to arrive 1 or possibly 2 hours early. It can’t arrive 6 hours early. However, it can arrive 6 hours, 8 hours, a day or even several days late. Clearly, the risks to flight duration are substantially in one direction. IT risks are much the same: we may aggressively manage scope or find some efficiency, but by and large these and many other factors will conspire to delay our projects.
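To make the contrast between a point projection and a probabilistic projection concrete, here is a minimal sketch of a Monte Carlo schedule forecast under one-way risk. Everything specific in it is an assumption made for illustration and does not come from Taleb or from any real project: the task estimates, the choice of a triangular distribution, and the bounds (a task can finish at most 10% early but can take up to three times its estimate).

```python
import random


def simulate_totals(estimates, runs=10_000, early=0.9, late=3.0, seed=1):
    """Simulate total project duration many times; return the sorted totals.

    Each task is drawn from a triangular distribution whose most likely
    value is the estimate itself. The bounds are deliberately lopsided
    (assumed values): little room to finish early, a lot of room to be late.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(runs):
        # random.triangular takes (low, high, mode) in that order
        total = sum(rng.triangular(e * early, e * late, e) for e in estimates)
        totals.append(total)
    totals.sort()
    return totals


def percentile(sorted_values, pct):
    """Return the value at the given percentile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(pct / 100 * len(sorted_values)))
    return sorted_values[index]


estimates = [10, 15, 20, 5, 30]  # hypothetical task estimates, in days
point_plan = sum(estimates)      # the deterministic "plan" date

totals = simulate_totals(estimates)
p50 = percentile(totals, 50)
p80 = percentile(totals, 80)
chance_on_plan = sum(t <= point_plan for t in totals) / len(totals)

print(f"Point plan: {point_plan} days")
print(f"50% of simulated projects finish within {p50:.0f} days")
print(f"80% of simulated projects finish within {p80:.0f} days")
print(f"Share of simulated projects that hit the plan: {chance_on_plan:.1%}")
```

Even though every task's most likely duration equals its estimate, the lopsided bounds push the median outcome well past the point plan. The exact figures depend entirely on the assumed bounds; the shape of the result is the point.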

The fact that risk in IT is substantially one-way brings a lot of our management and governance into serious doubt. Having a project plan distinct from the risk log makes the hubristic assumption that we will deliver on time and at cost, so we need only pay attention to the things that threaten the effort. Given that our risk is substantially one-way, we should make a more humble assumption: odds are that delivery will exceed our forecast time and cost, so what do we need to make sure goes right so that it doesn't? While such a pessimistic perspective may be in direct contrast to the cheerleading and bravado that all too often pass for "management", it makes risk the core activity of management decision making, not a peripheral activity dealt with as an exception.

In Convexity, Robustness and Model Error in the Fourth Quadrant, Taleb makes the point that one-way risk is best dealt with by robustness – for example, that we build redundancies into how we work. Efficiency, by comparison, makes us more vulnerable to one-way risk by introducing greater fragility into our processes. By way of example, think of the "factory floor" approach to IT, where armies of people are staffed in specialist roles. What happens to the IT "assembly line" when one or more role specialists exit, depriving the line of their situational knowledge? Without redundancy in capability, the entire line is put at risk.
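The assembly-line example can be put in rough numbers. The sketch below is an illustration under loudly stated assumptions, not a model taken from Taleb: each person leaves during the project with the same probability, departures are independent of one another, and a role's situational knowledge is lost only if everyone who holds it leaves. The function names and the figures (8 roles, a 10% chance of departure) are hypothetical.

```python
def chance_role_is_lost(p_leave, people_per_role):
    """Probability that everyone covering one role leaves (independence assumed)."""
    return p_leave ** people_per_role


def chance_line_breaks(p_leave, roles, people_per_role):
    """Probability that at least one role on the line loses all its knowledge."""
    p_role_survives = 1 - chance_role_is_lost(p_leave, people_per_role)
    return 1 - p_role_survives ** roles


p_leave = 0.10  # assumed chance that any one person exits during the project
roles = 8       # assumed number of specialist roles on the "assembly line"

lean = chance_line_breaks(p_leave, roles, people_per_role=1)
redundant = chance_line_breaks(p_leave, roles, people_per_role=2)

print(f"One specialist per role: {lean:.1%} chance the line breaks")
print(f"Two people per role:     {redundant:.1%} chance the line breaks")
```

The lean line is the cheaper one on paper, and under these assumptions it is also the one far more likely to stall. Real departures are rarely independent, so treat the numbers as a way to frame the trade-off and not as a forecast.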

Common sense and statistical analysis both conclude that an optimized system is sensitive to the tiniest of variations. This means that when risks are predominantly one-way – such as in IT projects – it behooves us to err on the side of robustness.

Robustness is the antithesis of efficiency. Maximum efficiency of execution against a plan calls for the fewest people delivering the most output to a predetermined set of architectural decisions. Building in robustness – for example, redundancy of people so that skills and knowledge aren’t resident in a single person, or pursuing multiple technical solutions as a means of mitigating the risk of missing non-functional requirements – will not come naturally to managers with a singular focus on minimizing cost, especially if, like the hedge fund managers James Mackintosh was referring to, they’re blissfully unaware of the risks.

So, what can we do?

First, we have to stop trafficking in the false precision of IT project management. This is no easy task, particularly in a business culture rooted in fixed budgets and rigid planning cycles, with buyers of industrial IT expecting that technology labor is interchangeable, and so forth. We won’t change the landscape all at once, but we can have tremendous influence by using current business examples that are relevant to sponsors and investors of IT projects. If we change the outlook of the people paying for IT projects, we can create the expectation that IT should provide probabilistic projections and take more robust – and therefore one-way risk tolerant – solution paths.

Second, we can introduce risk management that is more sophisticated than what we typically do, yet still easy to understand. If you haven’t read the book, or haven’t read it for a while, pick up Waltzing with Bears by DeMarco and Lister. Their statistical model for risk profiling is a good place to start, quick to work with and easy to understand. Nothing stops us from using it today. Now, the act of using the tool won’t make risk management the central activity of project managers or steering committees, but adding a compelling analysis to the weekly digest of project data will shift the balance in that direction. That, in turn, makes it easier to introduce robustness into IT delivery.

On that subject of robustness, Taleb observed:

Close to 1000 financial institutions have shut down in 2007 and 2008 from the underestimation of outsized market moves, with losses up to 3.6 trillion. Had their managers been aware of the unreliability of the forecasting methods (which were already apparent in the data), they would have requested a different risk profile, with more robustness in risk management …. and smaller dependence on complex derivatives.

Given the success rate of IT projects – still, according to the research organizations, less than 40% – IT project managers should similarly conclude that more robustness in risk management would be appropriate.