Using spreadsheet programs for data wrangling

Overview

Teaching: 10 min
Exercises: 20 min
Questions
  • What are your good and bad experiences in working with spreadsheets?

Objectives
  • Get an overview of good spreadsheet usage and potential issues

Spreadsheet usage is often useful and neccessary for librarians. We can use them for:


What these lessons will teach you

What these lessons will not teach you

These topics are outside of the scope of these lessons, if you’re looking to do this, a good reference is Head First Excel by O’Reilly Media.


Spreadsheet programs

Commands may differ a bit between programs, but the general idea is the same.

In this lesson, we will assume that you are most likely using Excel as your primary spreadsheet program - there are others (gnumeric, Calc from OpenOffice), and their functionality is similar, but Excel is the package you’re most likely to have available on your work computer. A few of the menu commands we use are excel specific, so bear that in mind if you aim to use another spreadsheet program. We will indicate these in the lesson material as far as we are aware of them.


Common usages and problems with spreadsheets

Spreadsheets are very useful for data entry, and viewing data.
Importing and exporting data with spreadsheet programs is common routine in many libraries, and often the data needs to be tidied up before export or after import.

Problems often arise when spreadsheets are used for generating tables for reports that are made for readability. Colors are used, border are added, cells are merged and tables are spilt up. Changes like these are not only cosmetic, they can make it difficult or downright impossible to do future work on the data, such as sorting and summarizing.

A good routine is to keep the original data separate from the prettified data, for example having two sheets, one with the table of original (raw) data, and one sheet with the table of prettified data:

alt text alt text

Optionally you can use word processors like Word to make pretty tables and use Excel for organizing the data.


Questions:

Discuss with your neighbour:

  • What kind of work do you do in spreadsheets? Try to discuss some practical examples where you structure data (sorting, filtering etc) in your work.
  • Have there been any smart discoveries you have made to save work time with spreadsheets?
  • What do you find frustating and what do want to get done better or faster?
  • Do you find it easy or hard to reverse something you did wrong?
  • Is it easy for you to find good information about your spreadsheet program where you work?
  • Where do you go to find help? Internet, colleague, IT-department, niece?

In this lesson, we’re going to talk about:

  1. Formatting data tables in spreadsheets
  2. Formatting problems
  3. Dates as data
  4. Basic quality control and data manipulation in spreadsheets
  5. Exporting data from spreadsheets
  6. Data export formats caveats

Key Points

  • We will discuss good practices for data entry, structuring and import/export

  • We will not discuss analysis or visualisation