Using spreadsheet programs for data wrangling
Overview
Teaching: 10 min
Exercises: 20 minQuestions
What are your good and bad experiences in working with spreadsheets?
Objectives
Get an overview of good spreadsheet usage and potential issues
Spreadsheet usage is often useful and neccessary for librarians. We can use them for:
- Viewing data
- Storing data
- Organizing data (categorizing and sorting)
- Visualizing data (charts and graphs)
- Importing and exporting data
What these lessons will teach you
- How to enter data in a structured way
- How to properly format data
- How to work with dates
- How to export and import data
-
How to sort and filter data
What these lessons will not teach you
- How to do plotting i.e. charts and graphs in a spreadsheet
- How to write code i.e. formulas and macros in spreadsheet programs
- How to do statistics in a spreadsheet
These topics are outside of the scope of these lessons, if you’re looking to do this, a good reference is Head First Excel by O’Reilly Media.
Spreadsheet programs
Commands may differ a bit between programs, but the general idea is the same.
- LibreOffice
- Microsoft Excel
- Gnumeric
- OpenOffice.org
In this lesson, we will assume that you are most likely using Excel as your primary spreadsheet program - there are others (gnumeric, Calc from OpenOffice), and their functionality is similar, but Excel is the package you’re most likely to have available on your work computer. A few of the menu commands we use are excel specific, so bear that in mind if you aim to use another spreadsheet program. We will indicate these in the lesson material as far as we are aware of them.
Common usages and problems with spreadsheets
Spreadsheets are very useful for data entry, and viewing data.
Importing and exporting data with spreadsheet programs is common routine
in many libraries, and often the data needs to be tidied up before export or after import.
Problems often arise when spreadsheets are used for generating tables for reports that are made for readability. Colors are used, border are added, cells are merged and tables are spilt up. Changes like these are not only cosmetic, they can make it difficult or downright impossible to do future work on the data, such as sorting and summarizing.
A good routine is to keep the original data separate from the prettified data, for example having two sheets, one with the table of original (raw) data, and one sheet with the table of prettified data:
Optionally you can use word processors like Word to make pretty tables and use Excel for organizing the data.
Questions:
Discuss with your neighbour:
- What kind of work do you do in spreadsheets? Try to discuss some practical examples where you structure data (sorting, filtering etc) in your work.
- Have there been any smart discoveries you have made to save work time with spreadsheets?
- What do you find frustating and what do want to get done better or faster?
- Do you find it easy or hard to reverse something you did wrong?
- Is it easy for you to find good information about your spreadsheet program where you work?
- Where do you go to find help? Internet, colleague, IT-department, niece?
In this lesson, we’re going to talk about:
- Formatting data tables in spreadsheets
- Formatting problems
- Dates as data
- Basic quality control and data manipulation in spreadsheets
- Exporting data from spreadsheets
- Data export formats caveats
Key Points
We will discuss good practices for data entry, structuring and import/export
We will not discuss analysis or visualisation