Rubicon Insight Social Consulting, LLC

Across the Rubicon

The only sustainable strategy is adaptation

Is it Data Yet?

Why we really do need to think about the science of data.


Data is all the rage. There’s a proliferation of data analysts, data scientists, data architects, data engineers, and data storytellers. Everywhere you look, businesses, organizations, policies, decisions, and processes are touted as data-driven. One starts to hear echoes of the old advertising tropes (“New & Improved! Now with XYZ…!”), as though simply being perceived as data-driven were itself an improvement. Were you not basing anything on data before and just making it all up as you went?


Well, obviously not. Data is not new, but our ability to handle it and derive insights from it has arguably been vastly improved. We now have incredibly sophisticated methods and far more computing power to wield them. There is, however, a subtle and often missed question lurking in the background. What, exactly, is data?


It may seem a pedantic question at first, but how we define what is and (especially) what is not data plays a large part in how successful our attempts to derive insights from it will be. This is so important that I’m going to skip right to the punchline:

It isn’t data until you have a question.

Almost anything can be data, but what actually is data ultimately depends on what question you are trying to answer. Consequently, for it to be good data, there also has to be a good question behind it. We all know that arriving at the right answers requires asking the right questions, but identifying the right data for a given question can be surprisingly difficult.


A data science parable from the early 1900s

To give an illustration of why data needs a question – a why to go with the what – I’ll use what may seem to be an odd example. Franz Boas is generally regarded as the father of modern anthropology in the United States. A fascinating and somewhat controversial historical figure, Boas nonetheless had an enormous influence on the study of human history and behavior.


Boas’ early career was in natural history as a museum curator, which may have contributed to his later methods for a scientific anthropology. In short, he collected everything related to a group or culture – artifacts, stories, languages, art, folklore… all of it. There are notebooks, tomes, and encyclopedic volumes just for the catalogs of all the things they collected. Boas and his contemporaries filled museums, and entire careers were made just from cataloguing it all – over decades!


The basic intuition was that a science of human behavior had to be empirical. If they gathered all the empirical data possible, then the universal patterns they were looking for should become obvious once enough had accumulated. It certainly seemed a reasonable approach. That’s not quite what happened, though.


Instead, most of that material ended up gathering dust, and anthropologists looked for a different way to go about being a science. Why? In no small part because there was too much of it to even begin looking for patterns. Since no real question was ever formulated beyond a broad “why are people different?”, there was no way to know what counted as a pattern. They had a great deal of potential data, but without a question they had no way to parse any of it as data.


Data is as data does

In an age where everyone is trying to hoover up as much data as possible in the hope that it will somehow drive value, understanding this difference becomes exceedingly important on multiple levels. It’s critical to know what questions are being asked, and whether these volumes of data are really worth the costs. Not just economically, but also ethically: should we be collecting data without a clear idea of why? And are there collective benefits to go with those costs?


For potential data to be useful data, there needs to be a question. Data scientists may sometimes refer to this as feature selection or feature engineering, but what it really means is that for a measurement or observation to be data, it has to be directly related to an outcome or question.


There has to be a context to make observations into data. All too often, in the rush to chase value, that context is an afterthought. In reality, to be data-driven means being question-driven from the outset. It’s the question that determines whether data can become information, which ultimately is the goal.
