"Of the dozen cases of possible research misconduct I've looked into in the last ten years, I was able to retrieve the original data in exactly two." This from a colleague (I won't say with which university) bemoaning the state of current data management practices. When I quote this to some of the Data Wranglers here, they're not at all surprised.
On the other hand, when I mentioned it to another colleague, a historian, she was rather shocked. In her field, keeping meticulous records and clear documentation of every statement of fact that goes into an article or book is standard practice. That there's a research world that has such an apparently casual attitude towards the data is foreign to her.
But in the biomedical research world, as the hapless teddy bear researcher in the brilliant NYU Library video says, all the data you need is in the article. You do your experiment, you extract the data you need for your article, you move on, leaving your data behind. ("So many boxes!") Changing that mindset is just one of the fundamental hurdles.
Each investigator looks at the world through the lens of their own practices, as if all of science and scholarship behaves the same way. I move through it like an anthropologist, trying not to let my own biases about the world color my perceptions of what the natives are doing and why.
When I embarked on this full-time gig fourteen months ago as the mysteriously titled Director of Digital Data Curation Strategies, I believed I had a very good high-level understanding of the issues involved. I'd been dabbling in this space for many years, through the Open Access wars, my involvement with the Scholarly Publishing Roundtable, and an increasing understanding that open access to data held far more potential for revolutionizing science than open access to journal articles. I knew that addressing the challenges at the institutional level would require bringing people together from all across the institution, that it wasn't a library problem amenable to a library solution. Indeed, it wasn't a problem localized in any unit of the university. Given the way our research institutions are organized, there isn't a unit within the typical university that obviously has primary responsibility for figuring this out. Most often, it's librarians who have taken the lead, but they can only touch a portion of the problem.
I still believe that I was correct. I did have a very good high-level understanding. But I did not imagine how delightfully complex it would be once I started to dig in.
I'm starting to get to know some of the #datalibs and a fascinating, brilliant and passionate tribe they certainly are. I'm learning a lot and enjoying that tremendously.
My perspective is a little different, though. I remain the quasi-outsider, observing through my anthropological lens. Since I'm no longer in the library, I'm not preoccupied with building a library service and marketing it to my research community. In Charleston, one of the panelists in the "Making Institutional Repositories Work" session was explicit that once you have developed a solid institutional repository service, the next step is to engage with the faculty to see what problems the IR can solve.
At the monthly Data Wranglers sessions, and in the numerous conversations I have with individuals throughout the campus, I'm mostly trying to listen. I want to understand what the problems are first. What do investigators need in terms of services and infrastructure to comply with the data management requirements of funders and publishers? How do we develop institutional policies that assist researchers rather than creating more administrative headaches? How do the needs of the social scientists and historians differ from those of the epidemiologists and brain mappers?
If we can map that out, then we can start to identify roles. What can the Office of Sponsored Programs take on? How do the libraries contribute? What do we need in terms of IT infrastructure? How do we incorporate effective data management practices into the various graduate and post-doc training programs? We'll probably identify the need for an institutional data repository of some sort at some point. But we haven't gotten there yet. I have much more field work to do.