I consult, write, and speak on running better technology businesses (tech firms and IT captives) and the things that make it possible: good governance behaviors (activist investing in IT), what matters most (results, not effort), how we organize (restructure from the technologically abstract to the business concrete), how we execute and manage (replacing industrial with professional), how we plan (debunking the myth of control), and how we pay the bills (capital-intensive financing and budgeting in an agile world). I am increasingly interested in robustness over optimization.

Tuesday, March 31, 2020

Autonomy Now

Distributed software development has been practiced for decades. Companies with global footprints were experimenting with this at least as far back as the 1970s. Skilled labor, global communication networks and collaborative tools made "offshore development" possible at scale from the mid-1990s onward. Improved skills, faster networks and more sophisticated collaboration tools have made distributed development practical for very complex software initiatives.

There can be significant differences in the way a team collaborates internally, and the way it collaborates with other teams across a program. Consider a distributed Agile program consisting of multiple teams based in different countries around the world. Under normal circumstances, individual teams of 8 to 12 people work day-in and day-out with colleagues in the same physical location. Intra-team events take advantage of the team's close proximity: the team room, collaborative practices like pair programming and desk checks, team ceremonies like stand-ups, and low-fidelity information radiators such as card walls are all high-bandwidth collaboration techniques. In-person communication is robust, spontaneous and fluid, so it makes sense to take full advantage of it. Conversely, inter-team events such as a Scrum-of-Scrums involve only key team members such as the project manager and lead developer, and are scheduled to take advantage of time zone overlap (or at least to minimize the inconvenience where there is little of it). In practice, any single team in a large program - even an Agile team - can function internally in a tightly coupled manner even though it is loosely coupled to other teams in the same program of work.

The COVID-19 pandemic has a lot of the global work force working in physical isolation from one another; this pushes distributed work models to their extreme. Yes, of course, it has long been possible for teams of individuals to work completely remotely from one another: e.g., tenured experts in the relevant technology who are also fluent in the business context and familiar with one another. But most teams don't consist of technology experts who know the domain and one another. In the commoditized IT industry, people are staffed as "resources" who are qualified based on their experience with relevant technologies. Domain expertise is a bonus, and interpersonal skills (much less familiarity with team-mates) never enter the equation. A good line manager and competent tech lead know how to compensate for this through spontaneous, high-bandwidth interaction: if somebody's work is going adrift, pull them aside, ask the business analyst or product owner to join you, whiteboard and code together for a bit, and get it fixed. A good line manager and tech lead compensate for a lot of the messiness intrinsic to a team of commodity-sourced people. The physical isolation much of the world is experiencing makes this compensation more difficult.

There are lots of companies and individuals self-publishing practical advice for remote working. Many are worth reading. Although the recommendations can look like mere hygiene, good remote collaboration hygiene reduces individual frustration and maximizes the potential communication bandwidth. A model in which everyone is 100% remote from one another has scale limitations, and poor hygiene will quickly erode whatever scale there is to be had.

My colleague Martin Fowler posted a two-part series on how to deal with the new normal. The posts have a lot of practical advice. But the concluding paragraphs of his second post address something more important: it is imperative to change management models.

Being independent while working remotely is not working remotely in an independent manner. The more tightly coupled the team, the more handoffs among team members; the more handoffs, the more people will have to engage in intra-team communication; the lower the fidelity of that communication, the higher the propensity for mistakes. More mistakes mean lower velocity, lower quality, and false-positive status reports. In practice, the lower the fidelity of intra-team collaboration within a tightly coupled team, the lower the fidelity of inter-team collaboration, regardless of whether the teams are tightly or loosely coupled to one another.
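To put some arithmetic behind that chain, here is a minimal sketch, not a measured model. It assumes each handoff is understood correctly with some independent probability (the `fidelity`), so a piece of work survives a given number of handoffs intact with probability `fidelity ** handoffs`. The function name and every number in it are illustrative assumptions, not data.

```python
def clean_delivery_probability(handoffs: int, fidelity: float) -> float:
    """Chance that every handoff for one piece of work is understood correctly.

    Assumes handoffs fail independently of one another, which is a
    simplification made purely for illustration.
    """
    if handoffs < 0:
        raise ValueError("handoffs must be zero or more")
    if not 0.0 <= fidelity <= 1.0:
        raise ValueError("fidelity must be between 0 and 1")
    return fidelity ** handoffs


if __name__ == "__main__":
    # Hypothetical figures: co-located communication loses less than remote.
    fidelities = {"co-located": 0.98, "remote": 0.90}
    # Hypothetical figures: a tightly coupled team hands work off more often.
    teams = {"tightly coupled team": 8, "autonomous pair": 2}

    for team, handoffs in teams.items():
        for setting, fidelity in fidelities.items():
            p = clean_delivery_probability(handoffs, fidelity)
            print(f"{team:22} {setting:11} {p:.2f}")
```

Under these assumed numbers, the loss of fidelity costs the tightly coupled team far more than it costs the autonomous pair, because the penalty compounds with every handoff.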

This is where a distributed program of truly Agile teams has a resiliency that Agile-in-name-only teams, command-and-control SAFe teams, and waterfall teams by their very nature cannot possess. A requirement written as a Story that fulfills the INVEST principle is an autonomous unit of production. A development pair that can deliver a Story with minimal consultation with others in the team and minimal dependencies on anybody else in the team is an autonomous delivery team. A Quality Assurance Analyst working from clear acceptance criteria for a Story can provide feedback to the development pair responsible for the development of the Story. Stories that adhere to the INVEST principle can be prioritized by a product owner and executed in a Kanban-like manner by the next available development pair.
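The pull model in that last sentence can be sketched in a few lines. This is a toy illustration under stated assumptions: Stories are independent of one another (the "I" in INVEST), the product owner's priority is a simple integer, and effort is known in advance. The `Story` class, the `schedule` function and the sample data are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    name: str
    priority: int      # set by the product owner; lower means more important
    effort_days: int   # assumed to be known up front, for illustration only


def schedule(stories, pairs):
    """Assign each Story, in priority order, to the next available pair.

    Returns a list of (story name, pair, start day, finish day) tuples.
    """
    if not pairs:
        raise ValueError("at least one development pair is required")
    backlog = sorted(stories, key=lambda s: s.priority)
    # Min-heap of (day the pair becomes free, pair name).
    free_at = [(0, pair) for pair in pairs]
    heapq.heapify(free_at)
    assignments = []
    for story in backlog:
        start, pair = heapq.heappop(free_at)   # whoever frees up first pulls
        finish = start + story.effort_days
        assignments.append((story.name, pair, start, finish))
        heapq.heappush(free_at, (finish, pair))
    return assignments


if __name__ == "__main__":
    backlog = [Story("C", 3, 2), Story("A", 1, 3), Story("B", 2, 1)]
    for row in schedule(backlog, ["pair-1", "pair-2"]):
        print(row)
```

Nothing in the sketch needs a manager to assign tasks or coordinate specialists: the only inputs are a prioritized backlog and whichever pair is free next. That only works because the Stories are assumed not to depend on one another.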

A tightly coupled team operating in a command-and-control style of management doesn't scale down to the more atomic level of the individual or pair. The program manager creates a schedule of work, down to the individual tasks that will fulfill that work and the specialist roles that will fulfill those tasks. Project managers coordinate task execution among individual specialists in their respective teams. One project manager is told by three people working on tasks for a requirement that their respective tasks are complete, yet the whole of their work is less than the sum of the parts. Now the manager must chase after them to crack their skulls together to get them to realize they are not done, and needs to loop in the tech lead to figure out where the alignment problem(s) are. This is difficult enough to do when people are in distributed teams in a handful of office buildings; it's that much more difficult when they are working in isolation from one another. Product quality, delivery velocity, and costs all suffer.

Command-and-control management creates the illusion of risk-managed delivery at large scale with low overheads. Forget about scaling up with efficiency; to be robust, a management paradigm needs to be able to scale down efficiently to deliver meaningful business results at the atomic level of the individual or pair. Micromanagement does not efficiently scale down because of the inherently high overheads. Self-directed autonomous teams do efficiently scale down because of the inherently low overheads.

In 2013, I spilled a few photons on the management revolution that never happened: for a variety of reasons in the 1980s, we believed we were on the cusp of a devolution of authority; instead, we got much denser concentration of authority. In 2018, I spilled a lot of photons on autonomous teams at enterprise scale being an undiscovered country worth the risk of exploring.

The COVID-19 pandemic is creating intense managerial challenges right now. It is important to note that there are likely to be long-term structural effects on businesses as well. Perhaps companies will encourage employees to work from home more regularly so the company can permanently reduce office square footage and therefore lease expense. Perhaps a new generation of secure mobile technologies will make it seem idiotic that large swaths of workers are office-based rather than home-based. Perhaps companies will revise their operating models and position specs, requiring greater individual role autonomy to maintain high degrees of productivity in regular and irregular operating conditions. Perhaps metrics for contract labor - metrics that are not attendance based - will emerge to satisfy expectations of value delivery.

Perhaps, with the potential for long-term effects looming, it is time to go explore that undiscovered country of autonomy.