I consult, write, and speak on running better technology businesses (tech firms and IT captives) and the things that make it possible: good governance behaviors (activist investing in IT), what matters most (results, not effort), how we organize (restructure from the technologically abstract to the business concrete), how we execute and manage (replacing industrial with professional), how we plan (debunking the myth of control), and how we pay the bills (capital-intensive financing and budgeting in an agile world). I am increasingly interested in robustness over optimization.

I work for ThoughtWorks, the global leader in software delivery and consulting.

Sunday, July 23, 2006

Nothing Happens in Zero Time

One of the benefits of Agile / best practices is integrity of completion: everything from requirements gathering to development to build to acceptance testing that needs to be done to drive a particular unit of business functionality to completion is encapsulated by it's definintion as a story. That is, we drive more finely grained units of business functionality to a state where that functionality can be utilized; that provides a boolean state of completion: 'tis done or not. This is diametrically opposed to the way in which functionality has been delivered since Eisenhower was president and Churchill was prime minister for the second time round: we have restructured business needs into technical silos of analysis, development, build, curse, test, negotiate and live-with-it. By doing so, we mortgage our future by borrowing against time-boxes in the hope that there will be sufficient bandwidth to complete inaccurately encapsulated and measured activity.

The benefits of the Agile approach are documented elsewhere, obviously the return on technology investment is much greater, as well as the transparency in knowing exactly where you are at any given time. Transparency is worth underscoring specifically because of the greater degree of fact (as opposed to opinion) there is in status reporting: requirements are in fact done or they're not done: "done" means they're ready to go live; "not done" means they're not ready to go live. There's no middle ground, no "I'm xx% complete," no mystery 6-week-window-at-the-end-of-the-project-in-which-we'll-marshal-test-and-deploy-and-hope. Above all, we rely far less hope. In and of itself this is a rich area of discussion which we'll continue to dissect, but it's important to acknowledge a subtle admission in all of this that must not go un-noticed: nothing happens in zero time.

When making work estimates we tend to assume that 100% of the effort required to get something done rests with developing (coding) software. In the process we overlook - and devalue - requirements gathering, unit testing, marshaling/building, integration, QA, UAT and releasing to production. To some extent, in the silo/waterfall world we collect requirements and assume the time / effort /cost to build code in support of two sets of functionality is marginally longer than the time / effort / cost to build just one. This isn't true as each requirements is more an "exception" than a "rule." Each introduces complexity to the business solution, each is code composed by different people which by it's nature presents different challenges. The point is, non-development-technical activity doesn't happen of it's own volition, and in many cases increases exponentially with an increase in the number of requirements piled-on, again as each requirement is really an aberration (introducing quirks, problems and issues) more than a uniform evolution of the platform.

In the same way, our "windows" for silo'd activity - separate time-boxes specifically for requirements gathering, build, a QA / UAT, etc. tend to be characterized by activity pile-on without refactoring the underlying time-box:

  • In the abstract, ever notice how so many projects have a 6 week window at the end of the project for UAT? It's never 5 weeks or 7 weeks, it's 6. Teams tend to borrow against that 6 weeks all through requirements and development, over-mortgaging it to a point where when it's called - when the project is due - there's insufficient capital (time) to cover the positions. That's because we borrow on hope that we can cover our positions, not on fact.

  • By way of specific example, in many projects, sometime during development somebody is going to migrate data; this data needs to be maintained. It often ends up a "ghost" task, sucking time away from people that they otherwise have allocated to complete (value-added) development. The same can often be said for deployment and / or environments: somebody has to own environments and own the process for putting software into production.

The point is that critical albeit non-development-specific activity ends up "piling on" during our different windows: the lead developer is now not just realizing the software but carrying a team, managing an environment and a deploy process, and keeping a data set clean. Subsequently, delivery defaults to a hero-based model during the days/hours leading-up to QA, UAT or production events. This, in turn, introduces a lot of delivery risk around that person's bandwidth: it completely collapses in the event of resignation or burn-out or if that person becomes a victim of the regional transportation authority (i.e., kisses a bus). Clearly it's horrific for the individual; what makes it a tragedy is that it's completely unnecessary business risk.

Consider the criticality of the underlying application and the cost of the risk being introduced by the hero-based model. Suppose failure of a software delivery means a commercial product doesn't get launched and revenue won't come in. Suppose further the burn rate on the development team is $200k / month and the monthly revenue is on the order of $10mm / month. The probability of delivery failure multiplied by the business impact of that failure (total costs incurred, lost revenue, etc.) becomes the maximum amount of insurance the business is providing to itself that it will complete development. From here, it isn't hard to construct a risk-reduction formula to identify the maximum the business is willing to invest to reduce the probability of failure. This is a time-sensitive calculation, so the sooner we're working this math the more likely a smaller increment in cost (people, resources, scope) will create a greater reduction in risk.

Put this in a spreadsheet for any project you have in-flight. Looking at the total commercial picture - business impact including IT commitment - consider two stakeholder perspectives:

  • As the business decision maker, what would you want to know to take an informed decision at different points in the delivery lifecycle? There's less time to be responsive to changes in the business environment as the target delivery day approaches, and there's nothing worse than a negative change in the business environment that is self-inflicted.
  • Then take the IT perspective and ask yourself (a) if it's better to personally guarantee success of the team - especially if you don't have the ability to execute that delivery - when you suspect it to be false or hollow, or (b) if it's better for you to be transparent on behalf of the entire team consistently through the delivery lifecycle.

If you want IT and business to be in partnership, you'd better be prepared to have this conversation, specifically in commercial terms. Those who will be politically embarassed by a failure to deliver - the product manager, the CEO, somebody who is measured on these results - will have the authority to take necessary action to rectify the problem provided they have sufficient time to do something about it. This means they've got to know - in terms they understand (e.g., business terms) - there's a problem in the first place. Fact-based management enables this.

Of course, there might be fear that being the bearer of bad news - e.g., that success isn't guaranteed, that there are risks - might reflect poorly on a manager. In fact, the opposite is true: calling out risk shows you're master of your domain because you're identifying and mitigating delivery risk. Hoping that things magically work themselves out exposes an inability to understand, appreciate and manage success; acknowledging the universe of activity and effort - and subsequently what could go wrong - shows clear understanding of the solution domain. I know the manager to whom I would entrust my investment decisions.

This comes back to the initial intent of this posting. In the above example of piled-on technical tasks of data management and environments issues, there's no specific visibility into what's needs to be done because non-development activity are ghost tasks. In addition to everything else, this creates tremendous tension in the business-IT relationship: "When will you be done" can't be answered authoritatively because there's no authoritative catalogue of what needs to be done and what purpose each serves in delivering business value. Similarly, time spent pursuing things outside of coding drive the business to wonder, "Just what do you spend all your time doing?"

These are diffused with fact-based management; fact-based management inherantly requires acknowledgement that nothing happens in zero time.