I consult, write, and speak on running better technology businesses (tech firms and IT captives) and the things that make it possible: good governance behaviors (activist investing in IT), what matters most (results, not effort), how we organize (restructure from the technologically abstract to the business concrete), how we execute and manage (replacing industrial with professional), how we plan (debunking the myth of control), and how we pay the bills (capital-intensive financing and budgeting in an agile world). I am increasingly interested in robustness over optimization.

I work for ThoughtWorks, the global leader in software delivery and consulting.

Saturday, October 31, 2020

Playing the Cards You're Dealt

Some years ago, I was working with a company automating its customer contract renewal process. It had licensed a workflow technology and contracted a large number of people to code and configure a custom solution around it. This was no small task given the mismatch between a fine granularity of rules on the one hand and a coarse granularity of test cases on the other. The rules were implemented as IFTTT (if-this-then-that) statements in a low-code language that did not allow them to be tested in isolation. The test cases consisted of clients renewing anywhere from one to four different types of contracts, each of which had highly variable terms and interdependencies with the others.

At the nexus of this mismatch was the QA team, which consisted almost entirely of staff from an outsourcing firm. The vendor had sold the company on QA capacity at a volume of 7 test scripts executed per person per day. The vendor had staffed 50 people in total to the program, while the company had staffed four QA leads (one for each contract team). The outsourcing vendor was reporting no fewer than 350 test scripts executed by its staff every day, yet the QA managers were reporting very low test case acceptance, and the development team was reporting that the test case failures could not be replicated.

A little bit of investigation into one of the four teams exposed the mismatch. The outsourced staff on this one team consisted of 10 people, contractually obligated to execute 70 test scripts per day. The day I joined, the team reported 70 test scripts executed, of which 5 passed and 6 failed.

Eleven being a little short of seventy, I wanted to understand the discrepancy. The answer from the contracted testers was, "we have questions about the remaining 59." The lead QA analyst - an employee, not a contractor - spent the entire day plus overtime investigating and responding to the questions pertaining to the 59. And then the cycle would start all over again. The next day it was 70 executed with 3 passed and 4 failed. The day after it was 70 executed with 1 passed and 9 failed. And the lead QA would spend the day - always an overtime day - responding to the questions from the outsourced team.

Evidently, this cycle had been going on for some time before I arrived.

We investigated the test cases that had been declared passed and failed. It turned out that the tests reported as having passed hadn't really passed: the tester had misinterpreted the results and reported a false positive. And those reported as failed hadn't actually failed for the reason stated: the tester had misinterpreted those results as well. In some cases, the tester had used the wrong data for the scenario; in others, the test had failed, but only because a different rule should have executed. In just about every instance, the results were false. The outsourced testers were expending effort but yielding no results whatsoever. A brief discussion with the QA lead in each of the other three teams confirmed that they were experiencing exactly the same phenomenon.

After observing this for a week and concluding that no amount of interaction between the QA lead and the outsourced staff was going to improve either the volume of completions or the fidelity of the results, I asked the lead QA to stop working with the outsourced team and instead to see how many test cases she could disposition herself. The first day, she conclusively dispositioned 40 test scripts (that is, each had a conclusive pass or fail, and if it failed, it was for reasons of code and not of data or environment). The second day, she was up to 50. The third, she was just over 50. She produced higher fidelity and higher throughput at lower labor intensity and lower cost. And she wasn't working overtime to do so.

The outsourced testing capacity was net negative to team productivity. That model employed eleven people to do less than the work of one person.

This wasn't the answer that either the outsourcing vendor or the program office wanted. The vendor was selling snake oil - the appearance of testing capacity that simply did not exist in practice - and was about to lose a revenue stream. The program office was embarrassed at having managed to maximize staff utilization rather than outcomes (that is, at relying on effort as a proxy for results).

The reactions of the vendor and the program office weren't much of a surprise. What was a surprise was that nobody had called bullshit up to that point. Experimenting with change wasn't a big gamble. The program had nothing to lose except another day of frustration rewarded by completely useless outputs from the testing team. So why hadn't anybody audited the verifiable results? Or established a baseline of testing labor productivity without the participation of the outsourcing team?

This wasn't a case of learned helplessness. The QA leads knew they were on the hook for meaningful testing throughput. The program office believed they had a lot of testing capacity that was executing. The vendor believed the capacity they had sold was not properly engaged. Nobody was going through the motions, and everybody believed it would work. The trouble was, they were playing the cards they'd been dealt.

Some years later, I was working with a corporate IT department trying to contain increasing annual spend on ERP support. Although they had implemented SAP at a corporate level and within a number of their subordinate operating companies, some operating companies were still using a legacy homespun ERP, and all business units still relied on decades of downstream data warehouses and reporting systems. Needless to say, there were transaction reconciliation and data synchronization problems. The corporate IT function had entered into a contract with a vendor to resolve these problems. In the years following the SAP implementation, vendor support costs had not gone down but had gone up, in proportion to the increase in transaction volume. The company wanted to know why the support staff couldn't resolve more discrepancies, given their many years of experience resolving them.

It didn't take a stroke of genius to realize that the vendor stood to gain from their customer's pain: the greater the volume of discrepancies, the more billing opportunities there were for resolution. Worse still, the vendor benefited from the same type of failure recurring again and again and again. The buyer had unwittingly locked themselves into a one-way contract: their choices were to live with discrepancies or pay the vendor more money for more labor capacity to correct them. The obvious fix was to change the terms of the contract, rewarding the vendor for resolving discrepancies at their root cause rather than for solving the same problem over and over and over. This they did, and the net result was a massive reduction in recurring errors and a concomitant reduction in the contract labor necessary to resolve them.

This was, once again, a problem of playing the cards that had been dealt. For years, management had defined the problem as one of containing spend on defect / discrepancy resolution. They hadn't seen it as a problem of continuous improvement in which their vendor was a key partner rather than a cost center to be contained.

There are tools that can help liberate us from constraints, such as asking the Five Whys. But such tools are only as effective as the intellectual freedom we're allowed in pursuing them in the first place. If the root question is "why is test throughput so low given the high volume of test capacity and the high rate of test execution", or "how can the support staff resolve defects more quickly to create more capacity", the exercise begins with confirmation bias: in these cases, the assumption that the operating model (the test team composition, the defect containment team mission) is correct. The Five Whys are less likely to lead to an answer that existentially challenges the paradigm in place if the primary question is too narrowly phrased. When that happens, the exercise tends to yield nothing better than "less bad."

It's all well and good for me to write about how I saw through a QA problem or a support problem, but the fact of the matter is that we all fall victim to playing the cards we're dealt at one time or another. A vendor paradigm, a corporate directive, a program constraint, a funding model, or an operating condition can limit our understanding of the actual problem to be solved.

But reflecting on it is a reminder that we must always be looking for different cards to play. Perhaps now more than ever, as low-contact and automated interactions permanently replace high-contact and manual ones in all forms of business, we need to be less intellectually constrained before we can be more imaginative.