2017 – Wednesday

Good data examples

Message of the day

Good data are FAIR – Findable, Accessible, Interoperable, Re-usable

Things to consider

What makes data good?

  1. Data have to be readable and documented well enough for others (and a future you) to understand.
  2. Data have to be findable to keep them from being lost. Information scientists have started to call such data FAIR: Findable, Accessible, Interoperable, Re-usable. One of the most important things you can do to keep your data FAIR is to deposit them in a trusted digital repository. Do not use your personal website as your data archive.
  3. Tidy data are good data. Messy data are hard to work with.
  4. Data quality is a process that runs from planning through to curation of the data for deposit.

Source: http://www.phdcomics.com/comics/archive.php?comicid=1612
Remember! “Documentation is a love letter to your data”


Example: This dataset is still around and usable more than 50 years after the data were collected and more than 40 years after they were last used in a publication.

Counterexample: This article (http://www.sciencedirect.com/science/article/pii/S1751157709000881) promises:

“Statistical scripts and the raw dataset are included as supplemental data and are also available at http://www.researchremix.org.”

Alas: [image: researchremix.org site content]
(Used by recommendation of the author who has long since become enlightened. The data have made it into a trusted repository too.)

Hadley Wickham tells you how to tidy your data: http://vita.had.co.nz/papers/tidy-data.pdf
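To make "tidy" a little more concrete, here is a minimal sketch in plain Python of the reshaping the paper describes: each variable becomes a column and each observation becomes a row. The table, the column names (`site`, `year`, `count`) and the `tidy` helper are all made up for illustration; they are not taken from the paper.

```python
# Hypothetical "messy" table: one row per site, one column per year.
# The year columns are really values of a variable called "year".
wide_rows = [
    {"site": "A", "2015": 12, "2016": 15},
    {"site": "B", "2015": 7, "2016": 9},
]


def tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows into tidy rows: one observation per row."""
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every tidy row
            tidy_rows.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return tidy_rows


tidy_rows = tidy(wide_rows, id_column="site", variable_name="year", value_name="count")
for row in tidy_rows:
    print(row)
```

In the tidy version every row answers one question (how many were counted at this site in this year?), which is usually the shape that statistics and plotting tools expect.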


Project TIER teaches undergraduate students how to structure data for reproducible research: http://www.projecttier.org/tier-protocol/specifications/

The UK Data Archive has great instructions for how to document your data: http://www.data-archive.ac.uk/create-manage/document
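One small way to start documenting is to keep a machine-readable codebook next to the data file and check that every column is described. The sketch below is only an illustration of that idea, not a format prescribed by any of the guides linked here; the file name, variable names, and the `undocumented` helper are all hypothetical.

```python
import json

# Hypothetical codebook for an imaginary survey file; every name is made up.
codebook = {
    "dataset": "example_survey.csv",
    "variables": [
        {
            "name": "respondent_id",
            "type": "integer",
            "description": "Unique ID for each respondent",
            "missing_code": None,
        },
        {
            "name": "age",
            "type": "integer",
            "description": "Age in years at time of interview",
            "missing_code": -99,
        },
    ],
}


def undocumented(columns, codebook):
    """Return the columns that have no entry in the codebook."""
    documented = {variable["name"] for variable in codebook["variables"]}
    return [column for column in columns if column not in documented]


# Column names as they might appear in the data file's header row.
columns = ["respondent_id", "age", "q1"]
missing = undocumented(columns, codebook)
print("Undocumented columns:", missing)

# Serialise the codebook so it can be saved alongside the data.
codebook_text = json.dumps(codebook, indent=2)
```

A check like this will not write the descriptions for you, but it does flag the columns a future reader would have to guess at.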

If you want to go all in, look at the instructions for documenting data in ICPSR’s Guide to Social Science Data Preparation and Archiving: http://www.icpsr.umich.edu/files/deposit/dataprep.pdf

Example: Data can take many forms. This compilation of “Morale and Intelligence Reports” collected by the UK Government during and after the Second World War is a great example of qualitative historical data: https://discover.ukdataservice.ac.uk/catalogue/?sn=7465


What is your favorite data set? How/why is it good for your project? Try out the FAIR Principles to describe and share examples of good data for your discipline. Tell us on Twitter (#loveyourdata) or in the comments below!