Back in the day, astronomers studied galaxies one at a time.
Data about each metropolis of stars had to be pieced together slowly. These individual studies were then combined so that a broader understanding of galaxies and their histories as a whole could slowly emerge.
Then along came the Sloan Digital Sky Survey, and everything changed. Using a special-purpose telescope and computer-driven data collection, the Sloan Survey fire-hosed millions of objects into the laps of astronomers. That was the beginning of "Big Data" in astronomy, and it changed the way we understood our place in the universe.
Now, a visionary group of scientists believes it can do for society what Sloan did for galaxies. Using the ever-increasing capacities of Big Data, the goal is to change the way we understand the human universe.
The Kavli HUMAN Project is a collaboration between the Kavli Foundation, the Institute for the Interdisciplinary Study of Decision Making at NYU, and the Center for Urban Science and Progress at NYU. It's based on one simple goal — and an array of stunningly complex technologies required to get there. Here "HUMAN" stands for "Human Understanding through Measurement and Analysis." The project's vision is to "generat[e] a truly comprehensive longitudinal dataset that capture[s] nearly all aspects of a representative human population's biology, behavior, and environment."
But why is such a study needed, and what does Big Data have to do with it?
For questions associated with human health and well-being, there's always been a gap between scientific requirements for truly all-encompassing studies and the data that could be gathered for those studies. Consider a question like the relationship between getting older and the decline of various body functions. Studies have shown people experience radically different pathways in their aging. For example, one recent study showed that the "biological age" of a group of 38-year-olds (determined through different biomarkers) varied from 28 to 61. That was a huge insight, but it then raised the question: Why do some people "age" more quickly than others? Answering that question has huge ramifications for society, given the enormous costs of health care.
From the perspective of the HUMAN project, our inability to answer questions like these is first and fundamentally a "data problem." The human condition is an insanely complex mix of biology, behavior and environment. It's so complex that science simply hasn't been able to even see the complexity with enough resolution to begin getting answers. As a recent paper outlining the HUMAN project puts it:
"... Cutting-edge questions are unanswered because we lack the data related to the genetic regulators of aging processes; the impact of intrauterine growth restrictions and child maltreatment; the interaction of aging with cognitive stimulation in early, mid, and later life; the interaction of stress and physical activity; and the interaction of all of these with economic status."
The essence of the HUMAN project is an attempt to create, for the first time, a truly comprehensive view of the human condition through which comprehensive questions can be pursued. What does this look like in practice? It looks like a big honkin' dataset that will be large and amazing enough to put the viability of Big Data to the test.
The HUMAN project intends to follow the lives of 10,000 human beings in exquisite detail over 20 years. The data gathered would include "regular full genome sequencing (3 billion base pairs)," in-person assessments of well-being and cognitive status, and smartphone apps for gathering geo-location data at regular intervals. In addition (and with the participants' consent), data on bills, purchases and expenses would be continually gathered.
By centering the study in New York City — which has been an innovator in the development of public databases — the study would also "see" the human and physical environments of its participants. High-resolution data of the city's evolution would come in the form of census data, education and crime statistics and pollution records. Also, the Center for Urban Science and Progress (CUSP) is developing its own methods for "observing" the city in new and powerful ways. I spent some time there a bit more than a year ago and marveled at the "remote sensing" strategies CUSP and its leadership (like physicist and former DOE Undersecretary of Energy Steve Koonin) were developing.
There has been a lot of hype about Big Data over the past few years, with headlines ranging from "world savior" to "Big Brother." I fall on the cautiously optimistic side of the fence. The capacity to "see" big complex systems (like a city and its inhabitants) from many varied perspectives and at high resolution (in space and time) holds enormous promise. But with great promise comes great responsibility. The application of Big Data to the public good on a large scale must still be proved.
But that's what makes the Kavli HUMAN Project so exciting. It's exactly the kind of approach that can show us whether Big Data really works. And if it does work, then, perhaps, we will truly have entered a new era of science and begun gaining a new vision of the human condition.
Adam Frank is a co-founder of the 13.7 blog, an astrophysics professor at the University of Rochester, a book author and a self-described "evangelist of science." You can keep up with more of what Adam is thinking on Facebook and Twitter: @adamfrank4.