I'm On An Anthropological Expedition

"Of the dozen cases of possible research misconduct I've looked into in the last ten years, I was able to retrieve the original data in exactly two." This from a colleague (I won't say with which university) bemoaning the state of current data management practices. When I quote this to some of the Data Wranglers here they're not at all surprised.

On the other hand, when I mentioned it to another colleague, a historian, she was rather shocked. In her field, keeping meticulous records and clear documentation of every statement of fact that goes into an article or book is standard practice.  That there's a research world that has such an apparently casual attitude towards the data is foreign to her.

But in the biomedical research world, as the hapless teddy bear researcher in the brilliant NYU Library video says, all the data you need is in the article.  You do your experiment, you extract the data you need for your article, you move on, leaving your data behind.  ("So many boxes!")  Changing that mindset is just one of the fundamental hurdles.

Each investigator looks at the world through the lens of their own practices, as if all of science and scholarship behaves the same way. I move through it like an anthropologist, trying not to let my own biases about the world color my perceptions of what the natives are doing and why.

When I embarked on this full-time gig fourteen months ago as the mysteriously titled Director of Digital Data Curation Strategies, I believed I had a very good high-level understanding of the issues involved. I'd been dabbling in this space for many years, through the Open Access wars, my involvement with the Scholarly Publishing Roundtable and an increasing understanding that open access to data held far more potential for revolutionizing science than open access to journal articles. I knew that addressing the challenges at the institutional level would require bringing people together from all across the institution, that it wasn't a library problem amenable to a library solution. Indeed, it wasn't a problem localized in any unit of the university.  Given the way our research institutions are organized, there isn't a unit within the typical university that obviously has primary responsibility for figuring this out.  Most often, it's librarians who have taken the lead, but they can only touch a portion of the problem.

I still believe that I was correct. I did have a very good high-level understanding. But I did not imagine how delightfully complex it would be once I started to dig in.

I'm starting to get to know some of the #datalibs and a fascinating, brilliant and passionate tribe they certainly are.  I'm learning a lot and enjoying that tremendously.  

My perspective is a little different, though.  I remain the quasi-outsider, observing through my anthropological lens.  Since I'm no longer in the library, I'm not preoccupied with building a library service and marketing it to my research community.  In Charleston, one of the panelists in the "Making Institutional Repositories Work" session was explicit that once you have developed a solid institutional repository service, the next step is to engage with the faculty to see what problems the IR can solve.

At the monthly Data Wranglers sessions, and in the numerous conversations I have with individuals throughout the campus, I'm mostly trying to listen.  I want to understand what the problems are first.  What do investigators need in terms of services and infrastructure to comply with the data management requirements of funders and publishers?  How do we develop institutional policies that assist researchers rather than creating more administrative headaches?  How do the needs of the social scientists and historians differ from those of the epidemiologists and brain mappers?

If we can map that out, then we can start to identify roles.  What can the Office of Sponsored Programs take on?  How do the libraries contribute?  What do we need in terms of IT infrastructure?  How do we incorporate effective data management practices into the various graduate and post-doc training programs?  We'll probably identify the need for an institutional data repository of some sort at some point.  But we haven't gotten there yet.  I have much more field work to do.



Data Wranglers in the Edge of Chaos

I love these lines from Rex Sanders:

If the data you need still exists;

If you found the data you need;

If you understand the data you found;

If you trust the data you understand;

If you can use the data you trust;

Someone did a good job of data management.

It encapsulates the goal as well as anything I've seen.

I used it to lead into the first of what I intend to be more or less monthly informal discussion sessions with the folks I'm somewhat tongue-in-cheek referring to as Data Wranglers.  We gathered in the Café at the Edge of Chaos (conveniently just a few steps from my office).  I scheduled it for 4:00 with beer in the fridge and wine on the counter, gave a five-minute intro to some of the issues (essentially, what does the institution need to do to facilitate good data management) and opened it up for discussion.  These folks are not shy.

Included among the dozen who came were an Institute of Medicine member who is a staunch OA advocate and leads several biostatistics groups, the PI of a very large multi-institutional longitudinal study of stroke risk factors, a computer scientist who runs a multidisciplinary team engaged in brain mapping, the director of the clinical data warehouse, an expert in decision support systems, and a woman working with NASA to link satellite, EPA and public health data.  The others were equally diverse and distinguished.  A fascinating group, all of whom have a keen interest in how we manage research data.

We touched on a number of key themes:

  • Concerns about data sharing contrasted with the value of data sharing
  • The limitations of metadata in supplying sufficient context for data re-use
  • The dangers of one-size-fits-all policies
  • The need to provide good information support to investigators in response to imminent federal funder requirements for open data
  • Information sharing vs data sharing
  • Role of commercial interests

I have an ever-expanding list (currently about 40) of people from across the campus that I'm inviting to these sessions.  My overarching goal is to build a community of interest, make connections among people who have similar concerns but may not know each other, and use these discussions to drive priorities and strategy.  It's a Wicked Problem, which is what the Edge of Chaos is all about.

After 15 years of working on these issues around the demands of my day job as LHL director, for the past nine months or so I've been able to dig in full time.  It's become clearer than ever that it requires strong collaborative efforts that cross institutional boundaries.  That is very tough to do, given the way that research institutions are organized and the siloed culture of those institutions.

In most places, it's the librarians who have taken the lead, usually in developing services around DMP requirements and, increasingly, tracking the new federal funder requirements for public access to publications and data.  But this is much more than a library problem.

I have been quite struck by how much my perspective has been shifted by the fact that I am doing this out of the Provost's Office rather than out of the library.  My focus is on engaging intensively with researchers across disciplines, the folks in IT and OSP and compliance, and using a very organic approach to surface issues and needs.  Out of that, we'll try to identify the things that the various components of the university can do to help us all do a better job managing research data.  My monthly Data Wranglers discussions are a key component of that approach.

I've come to appreciate that the challenges in achieving Rex Sanders' vision across the entire institution are practically insurmountable.  I've always had a deep empathy for Don Quixote's battle with the windmills.  That must be why I'm having such a good time.