I consult, write, and speak on running better technology businesses (tech firms and IT captives) and the things that make it possible: good governance behaviors (activist investing in IT), what matters most (results, not effort), how we organize (restructure from the technologically abstract to the business concrete), how we execute and manage (replacing industrial with professional), how we plan (debunking the myth of control), and how we pay the bills (capital-intensive financing and budgeting in an agile world). I am increasingly interested in robustness over optimization.

I work for ThoughtWorks, the global leader in software delivery and consulting.

Friday, May 07, 2010

Digital Squalor

In the not too distant past, storage was limited and expensive. As recently as 1980, 1 megabyte of disk storage cost $200. But this is no longer the case. Today, you can buy 8,000 megabytes (a.k.a. 8 gigabytes) for $1. Storage capacity is now so abundant and compact that you can record every voice conversation you’ll ever have in a device that can fit into the palm of your hand.
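To put the two price points side by side, here is the arithmetic using only the figures quoted above. It is a worked illustration, not new pricing data; the variable names are invented for the example.

```python
from fractions import Fraction

# Figures quoted in the paragraph above; Fraction keeps the arithmetic exact.
cost_per_mb_1980 = Fraction(200)        # $200 per megabyte
cost_per_mb_today = Fraction(1, 8000)   # $1 buys 8,000 megabytes

# How many times cheaper a megabyte has become.
price_drop = cost_per_mb_1980 / cost_per_mb_today

# What $200 (the price of 1 MB in 1980) buys today, in gigabytes,
# using the post's convention that 8,000 MB = 8 GB (1 GB = 1,000 MB).
gb_for_200_dollars = Fraction(200) / cost_per_mb_today / 1000

print(f"A megabyte is {int(price_drop):,}x cheaper")
print(f"$200 now buys {int(gb_for_200_dollars):,} GB")
```

In other words, the money that once bought a single megabyte now buys about 1,600 gigabytes.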

What this means is that storage is no longer a physical (capacity) challenge, but a logical (organization) one. We’re maximizing the former, storing everything we can digitize. Unfortunately, we’re not making much progress on the latter, as “intelligence” eludes us in an ever-expanding swamp of “data.”

Let’s think about the characteristics of data, just on a personal level.

  • We have data everywhere. E-mails contain data. So do documents and spreadsheets. So do various applications, such as a local contact manager. So do subscription services, such as Salesforce.com. So do financial management tools (be it QuickBooks or Oracle Financials). So does Twitter. So do digital photos. So do news feed subscriptions. So do voicemails. So do podcasts and webinars, for that matter.
  • We have a lot of redundant data. How many different booking systems have your frequent flier numbers, know that you prefer an aisle to a window, and know that you prefer a vegetarian meal on long-haul flights? And how much of that has changed since you last edited your profile in each of those systems? Or think about contact information. In how many places are your co-worker's (multiple) contact details scattered: your mobile phone? The corporate directory? Google Contacts? Your personal e-mail box?
  • There is data in the inter-relationships among data. This document references this spreadsheet, and both were discussed in this meeting on this date with these people. Copies of drafts under discussion at the time may be attached to, or referenced in, the meeting invitation.
  • Our data is inconsistent. We have full contact information for some people who attended a meeting because they’re in the company directory; for others we may have only personal data because we’re connected to them via LinkedIn; and for still others all we have is an e-mail address.
  • Data has different meaning depending on the context. A contract from 2005 between one firm and another is a binding legal document in the context of that relationship. But that document is also a source of language that might be useful when we are drawing up a contract with the same people in that firm, with different people in that firm, or with a different firm altogether. Or a specific presentation from 5 years ago may have referenceable content, but at the moment we're only interested in the fact that it encapsulates a template with elements we want to reuse.
  • We lug this data around with us. Some of it we carry around with us in the file system paradigm, moving it from laptop to laptop. Some we have in our smart phones and media players. Some is stored in a managed service like LinkedIn. Some is managed for us in a service like iTunes. There have been attempts to corral and manage slices of this data: for example, consolidating contact details, e-mail history, and proposals in a single CRM system. None have been runaway successes. They’re either incomplete, inadequate, or simply too much work to sustain.
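The redundant and inconsistent contact data described in the list above can be made concrete with a toy consolidation step. This is a minimal sketch under loudly stated assumptions: the records, the source names, and the choice of e-mail address as the matching key are all hypothetical, and real CRM consolidation is far messier. It shows the point the bullets make: grouping redundant records is mechanical, but deciding which conflicting value is current still takes a person.

```python
from collections import defaultdict

# Hypothetical records for the same co-worker, held in three different places.
# Source names and field values are invented for illustration.
records = [
    {"source": "mobile phone", "email": "Pat.Lee@example.com", "phone": "+1 312 555 0100"},
    {"source": "corporate directory", "email": "pat.lee@example.com",
     "phone": "+1 312 555 0199", "title": "Director"},
    {"source": "LinkedIn", "email": " pat.lee@example.com", "title": "Senior Director"},
]

def consolidate(records, key="email"):
    """Group records by a normalized key and collect every distinct value per field."""
    merged = defaultdict(lambda: defaultdict(set))
    for rec in records:
        k = rec[key].strip().lower()  # naive normalization: trim whitespace, ignore case
        for field, value in rec.items():
            if field not in ("source", key):
                merged[k][field].add(value)
    return merged

def conflicts(merged):
    """Return {key: {field: sorted values}} for fields holding more than one value.

    These are the disagreements no rule in this sketch can resolve.
    """
    return {
        k: {f: sorted(v) for f, v in fields.items() if len(v) > 1}
        for k, fields in merged.items()
        if any(len(v) > 1 for v in fields.values())
    }

merged = consolidate(records)
print(conflicts(merged))
```

Keying on e-mail address is itself an assumption: a source that holds only personal data, with no e-mail address at all, could not be grouped by this sketch.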

And that’s just a recon of our personal data. The problem is amplified by several orders of magnitude at the corporate and societal levels. To wit: marketing departments seem perpetually engaged in contact list consolidation and clean-up. Then there are all those automatic feeds set up to bring in everything from bond prices to today’s weather to city council meeting notes.

The fact is, we already live in digital squalor. In a relatively short period of time, we’ve gone from having very little digitally stored to having a lot digitally stored. Only, along the way, we didn’t give much thought to good hygiene. We have data everywhere: some structured, some not; some readily accessible, some long forgotten, and some whose integrity we’re no longer entirely certain of. And the bad news is, we’re accumulating data at an exponentially increasing rate.

We tame the data monster through memory and our own synaptic processes. A memory or an idea triggers a recollection, so we know to go look for something and roughly where we might find it. Sometimes we're able to pull together distinct pieces of data, possibly squirreled away over a period of several years, to derive some useful information. But not all data is created equal, so when we go mining through data, we have to judge whether it has sufficient integrity for our purpose. Is it current enough? Is it from a credible source? Is it a final version or a draft? The bottom line is, it’s human intervention that allows us to bring order out of ever-increasing data chaos.

We're going to be living in digital squalor for quite some time. There are some interesting conclusions we can draw from that.

Our principal tool for managing the data bloat is search. Search is a blunt instrument. Search is really a simple attribute-based pattern matching tool that abdicates results processing to the individual. Meta-tagging is limited and narrow, so we don’t really have much in the way of digital synaptic processes. As the data behemoth grows, search will be decreasingly effective.
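As a deliberately naive sketch of search as characterized above (a toy model under stated assumptions, not how any particular search engine works), the function below matches a pattern against one attribute and returns every hit unjudged. Production engines do rank results, but a relevance score does not answer the questions raised earlier: is it current, is it credible, is it a draft or final? The documents and attributes here are hypothetical.

```python
# Hypothetical documents with a few attributes each.
documents = [
    {"title": "Acme master services agreement", "year": 2005, "status": "final"},
    {"title": "Acme agreement redline", "year": 2009, "status": "draft"},
    {"title": "Quarterly review deck", "year": 2005, "status": "final"},
]

def search(docs, term):
    """Return every document whose title contains term (case-insensitive).

    Matching is all it does: judging currency, credibility, and
    draft-vs-final status is left entirely to the person reading the results.
    """
    term = term.lower()
    return [d for d in docs if term in d["title"].lower()]

for hit in search(documents, "agreement"):
    print(hit["title"], hit["year"], hit["status"])
```

Both agreements come back, the 2005 final and the 2009 draft, and nothing in the result says which one the reader actually needs.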

But as our digital squalor expands, it presents an opportunity for those who can produce a clear, distinct signal from so much noise, e.g., by bringing data and analyses to bear on problems in ways never previously done. One example is FlightCaster, which applies complex analytics to publicly available data (such as the weather, current flight status, and historical flight data) to advise whether or not you should switch flights. It's a decision support tool providing an up-to-date analysis at the moment of decision, where none existed previously.

This marks a significant change in IT. We've spent most of the past 60 years in technology creating tools to automate and digitize tasks and transactions. We now have lots of tools. Because of the tools, we also have lots of data. For the first time in history, we can get powerful infrastructure with global reach for ridiculously little capital outlay:

  • the internet allows us to access vast amounts of specialized data;
  • cloud computing gives us virtually unlimited, pay-as-you-go computing power to analyze it;
  • smartphones on mobile internet give us a ubiquitous means to deliver our analyses.

Historically, Information Technology has focused on the "technology". Now, it's focused on the "information".

Digital squalor gives us the first broad-based tech-entrepreneurial opportunity of the 21st century. We're now able to pursue information businesses that wouldn't have been viable just a few years ago. We’re limited only by our imagination: what would I really like to know at a specific decision-making moment?

Answer that, and you've found your next start-up.