Research Data: Management

Resources for researchers eager to use and manage their data ethically and reproducibly.

Planning for Data

I will store all data on at least one, and possibly up to 50, hard drives in my lab. The directory structure will be custom, not self-explanatory, and in no way documented or described. Students working with the data will be encouraged to make their own copies and modify them as they please, in order to ensure that no one can ever figure out what the actual real raw data is.

Backups will rarely, if ever, be done.

Brown 2010

Projects are processes. Any research project requires a certain amount of planning at the outset, and a project involving research data is no different. Over the past decade, federal grant-funding agencies have begun to require the inclusion of Data Management Plans as part of grant proposals. But planning out one's data management is valuable even when no grant application requires it.

Why Manage Data?

Data that is well managed is data that lasts. With the increased focus on reproducibility and replicability in research, research data benefits from organization and management. Furthermore, well-managed data is often already FAIR (findable, accessible, interoperable, and reusable) data. Even if sharing is not an explicit goal of a research project, data management helps you share data with your future self. Whether you are a week, a year, or a decade removed from capturing the data, you will benefit from well-organized data and a good description of how it was captured.

Data Management Plans

Over the past decade, federal grant-funding agencies and other funders have begun requiring data management plans (DMPs) as part of the grant application process. Often, DMPs can be built from the templates provided by the DMPTool, which also offers some sample DMPs. However, the existence of a templating tool does not mean that DMPs should be boilerplate documents that feel like a nuisance. Instead, a DMP is often the best way for researchers and their teams to imagine the longevity of their project, especially with regard to reproducibility and FAIR principles.

What Questions Does a DMP Answer?

Whether filing a DMP with a funding agency or merely planning a research project, consider answering the following questions:

  1. What types of data will we be collecting or creating? This can include everything from instrument readings of physical samples, tabular numeric data, and geospatial data to text corpora, survey responses, and photographs. This encompasses everything that, as the federal government notes, is "the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: Preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This 'recorded' material excludes physical objects (e.g., laboratory samples)" (U.S. Office of Management and Budget 1999).
  2. What formats will we use to save our data? This can be lines in an Excel spreadsheet, index cards, a database in the cloud, or anything in between.
  3. How will we describe the data? Your research datasets need accompanying metadata that describes them. This could include a codebook that describes each column in a table, including what type of data (numeric, textual) will be found in each column. It can also include a narrative of how the datasets were built.
  4. Who has access to the data and how do they access it? Research is often done in teams, which can consist of a faculty member and a research assistant, a campus lab, or even a multi-institutional venture. Data needs to be shared. Furthermore, who has access to what data at any given time will change as the research project progresses. It is common for researchers to share data internally with the team and only open their datasets to the public once the datasets accompany publications. Finally, research data can be accessed on a shared drive, or it can be deposited in an institutional repository.
  5. How FAIR is the data? Once a dataset is ready for dissemination, we ask how "FAIR" it is. Is it findable? How? Is it accessible (understood broadly)? Is it interoperable? Is it reusable?
  6. Who manages the data's integrity?
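
The codebook mentioned in question 3 can be started with a short script. What follows is only a minimal sketch in Python, assuming the dataset is a CSV file with a header row. The function names (`infer_type`, `build_codebook`), the sample columns, and the two-way "numeric"/"textual" classification are hypothetical choices made for illustration; they are not part of any DMP template or funder requirement.

```python
import csv
import io


def infer_type(values):
    """Classify a column as 'numeric', 'textual', or 'empty' from its non-blank values."""
    non_blank = [v for v in values if v.strip()]
    if not non_blank:
        return "empty"
    try:
        for v in non_blank:
            float(v)  # raises ValueError if the value is not a number
    except ValueError:
        return "textual"
    return "numeric"


def build_codebook(csv_text, descriptions=None):
    """Return one codebook entry (a dict) per column of a CSV table.

    `descriptions` maps column names to human-written descriptions;
    columns without one get a placeholder to be filled in by the team.
    """
    descriptions = descriptions or {}
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    codebook = []
    for name in reader.fieldnames or []:
        # Short rows yield None for missing fields; treat those as blank.
        values = [row[name] or "" for row in rows]
        codebook.append({
            "column": name,
            "type": infer_type(values),
            "missing": sum(1 for v in values if not v.strip()),
            "description": descriptions.get(name, "TODO: describe this column"),
        })
    return codebook


# Hypothetical sample data, invented for this illustration only.
sample = "site_id,temperature_c,notes\nA1,21.5,clear\nA2,,cloudy\nA3,19.0,\n"
codebook = build_codebook(sample, {"site_id": "Hypothetical sampling site identifier"})
for entry in codebook:
    print(entry)
```

A script like this only drafts the skeleton. The descriptions, units, and the narrative of how the data was collected still have to be written by the people who know the project.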

Planning for Data Links