Introduction

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • What are the basic principles for using spreadsheets for good data organisation?

Objectives
  • Describe best practices for organising data so that computers can make the best use of it for data analysis.

Spreadsheet programs

Good data organisation is the foundation of any research project that relies on data analysis, so it is essential to get this first step right to avoid time-consuming “data wrangling” and “data cleaning” later on. Many researchers keep their data in spreadsheets or use spreadsheets for data entry. Spreadsheet programs are good for data entry and provide very useful graphical interfaces for basic data quality control. Spreadsheets can provide much of the functionality that researchers need.

Many spreadsheet programs are available. Many researchers use Excel as their primary spreadsheet program (this lesson uses Excel examples), but free alternatives exist, including LibreOffice, Gnumeric, OpenOffice.org and Google Sheets. Commands may differ a bit between programs, but the general ideas are the same.

What’s wrong with spreadsheets?

The intricacies of spreadsheets make it hard to reproduce analyses and very difficult to spot and correct errors. Sometimes this is due to human error (you will learn how to avoid some of these mistakes); at other times it is due to the spreadsheet program itself. How many times have you accidentally done something in a spreadsheet that caused a problem (or just made you frustrated)? You are not alone! For example, one study found that roughly one in five genetics papers with supplementary Excel gene lists contained errors caused by Excel automatically converting gene names (such as SEPT2) to calendar dates.
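To make the gene-name problem concrete, here is a minimal Python sketch of a quality-control check that flags entries in a gene-symbol column that look as if they have already been converted to dates. The function name `flag_date_like` and the "day-Mon" pattern it matches (for example `2-Sep` for SEPT2 or `1-Mar` for MARCH1) are assumptions for illustration. The form Excel actually displays depends on locale and cell formatting, so this sketch catches only one common variant.

```python
import re

# Assumed display form of an auto-converted gene symbol: "<day>-<Mon>",
# e.g. "2-Sep" (from SEPT2) or "1-Mar" (from MARCH1). Other locales or
# cell formats may produce different strings that this will not catch.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
DATE_LIKE = re.compile(rf"^\d{{1,2}}-({MONTHS})$", re.IGNORECASE)


def flag_date_like(values):
    """Return (row index, original value) pairs that look like converted dates.

    Hypothetical helper: surrounding whitespace is ignored when matching,
    but the value is returned unchanged so it can be found in the sheet.
    """
    return [(i, v) for i, v in enumerate(values) if DATE_LIKE.match(v.strip())]


genes = ["TP53", "2-Sep", "BRCA1", "1-Mar", "MARCH1"]
print(flag_date_like(genes))  # [(1, '2-Sep'), (3, '1-Mar')]
```

A check like this only detects damage after the fact; the more reliable fix is to stop the conversion happening in the first place, for example by formatting the column as text before entering or importing gene names.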

Using spreadsheets for data organisation, entry and cleaning

During this lesson, you will learn how to use spreadsheets for data organisation, entry and cleaning.

Key Points

  • Good data organisation is the foundation of any research project.