Tidy data for librarians

Key Points

Using spreadsheet programs for data wrangling
  • We will discuss good practices for data entry, structuring and import/export

  • We will not discuss analysis or visualisation

Formatting data tables in Spreadsheets
  • Use one column for one variable

  • Use one row for one observation

  • Use one cell for one value

Formatting problems
  • Don’t use multiple tables in one sheet

  • Fill in zero when you mean zero

  • Use a blank cell to indicate a null value

  • Don’t use formatting to convey information or make the data look pretty

  • Don’t put units or comments in cells

  • Don’t combine several values in one cell

  • Take care over column names

  • Avoid including special characters

  • Put metadata (units, legends etc.) in a separate sheet

Dates as data
  • Excel understands many ways of entering dates, but if you format date columns beforehand you can detect any confusions.

  • Dates can alternatively be stored as YEAR, MONTH, DAY or YEAR, DAY-OF-YEAR in separate columns.

Basic quality assurance and control, and data manipulation in spreadsheets
  • Use data validation tools to minimise the possibility of input errors.

  • Use sorting and conditional formatting to identify possibly errors.

  • Use filtering as an alternative to tidying of data.

Importing and exporting data
  • Use the CSV file format for data storage and exchange

Caveats of popular data and file formats
  • Be careful when using commas in values