Good data examples
Message of the day
Good data are FAIR – Findable, Accessible, Interoperable, Re-usable
Things to consider
What makes data good?
- Data have to be readable and documented well enough for others (and a future you) to understand.
- Data has to be findable to keep it from being lost. Information scientists have started to call such data FAIR — Findable, Accessible, Interoperable, Re-usable. One of the most important things you can do to keep your data FAIR is to deposit it in a trusted digital repository. Do not use your personal website as your data archive.
- Tidy data are good data. Messy data are hard to work with.
- Data quality is a process that runs from planning through to curation of the data for deposit.
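To make the "tidy data" point concrete, here is a minimal sketch of reshaping a messy table into a tidy one. The table, the column names (`site`, `year`, `count`) and the helper `tidy` are all made up for illustration; they do not come from any dataset mentioned in this post. The idea is the one Hadley Wickham describes: each variable gets its own column and each observation gets its own row.

```python
# Hypothetical "messy" table: one row per site, with one column per year.
# The years are really values of a variable, not variable names.
messy = [
    {"site": "A", "2019": 12, "2020": 15},
    {"site": "B", "2019": 7, "2020": 9},
]


def tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows into one observation per row (illustrative helper)."""
    out = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every tidy row
            out.append({id_key: row[id_key], var_name: key, value_name: value})
    return out


tidy_rows = tidy(messy, "site", "year", "count")
for r in tidy_rows:
    print(r)
```

In the tidy version every row is a single (site, year, count) observation, which tends to be much easier to filter, plot, and document than a table whose column headers hide data.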
Remember! “Documentation is a love letter to your data”
Example: This dataset is still around and usable more than 50 years after the data were collected and more than 40 years after it was last used in a publication.
Counterexample: This article (http://www.sciencedirect.com/science/article/pii/S1751157709000881) promises:
“Statistical scripts and the raw dataset are included as supplemental data and are also available at http://www.researchremix.org.”
(Used by recommendation of the author who has long since become enlightened. The data have made it into a trusted repository too.)
Hadley Wickham tells you how to tidy your data: http://vita.had.co.nz/papers/tidy-data.pdf
Project TIER teaches undergraduate students how to structure data for reproducible research: http://www.projecttier.org/tier-protocol/specifications/
The UK Data Archive has great instructions for how to document your data: http://www.data-archive.ac.uk/create-manage/document
If you want to go all in, look at the instructions for documenting data in ICPSR’s Guide to Social Science Data Preparation and Archiving: http://www.icpsr.umich.edu/files/deposit/dataprep.pdf
Example: Data can take many forms. This compilation of “Morale and Intelligence Reports” collected by the UK Government during and after the war is a great example of qualitative historical data: https://discover.ukdataservice.ac.uk/catalogue/?sn=7465
What is your favorite data set? How/why is it good for your project? Try out the FAIR Principles to describe and share examples of good data for your discipline. Tell us on Twitter (#loveyourdata) or in the comments below!