5  Good Data Practices

Learning Objectives

  • Assess and document the validity, accuracy, and origin of data sources.
  • Ensure the reproducibility of results using detailed data documentation.
  • Understand the importance of data provenance and its impact on data quality and reliability.

5.1 Data Provenance

It matters where you get you data! When starting your search for data to address a research question, think about:

Validity

  • Is this data trustworthy? Is it authentic?
  • Where did the data come from?
  • How has the data been changed / managed over time?
  • Is the data complete?

Comprehension

  • Is this data accurate?
  • Can you explain your results?
  • Is this the right data to answer your question?

Reproducibility

People should be able to fully replicate your results from your raw data and code.

5.2 Documenting Data Sources

Document your data source like a museum curator

Whenever you download data, you should at a minimum record the following:

  • The name of the file you are describing.
  • The date you downloaded it.
  • The original name of the downloaded file (in case you renamed it).
  • The url to the site you downloaded it from.
  • The source of the original data (sometimes different from the site you downloaded it from).
  • A short description of the data, maybe how they were collected (if available).
  • A dictionary for the data (e.g. a simple markdown table describing each variable).