Best Practices in Data Organisation Using Spreadsheets

Credit: Data Carpentry

This lesson is based on Data Carpentry’s “Data Organisation In Spreadsheets” lessons from the ecology and social sciences curricula.

Good data organisation is the foundation of any research project. Most researchers have data in spreadsheets, so it is the place that many research projects start.

We often organise data in spreadsheets in the ways that we as humans want to work with data. However, in order to use tools that make computation and data analysis more efficient, reusable and reproducible, such as programming languages like R or Python, we need to structure our data in a particular way so that computers can “understand” and “make use of” the data. Since this is where most research projects start, this is where we want to start too!

In this lesson, you will learn:

All these will help you to get your current data into a good shape and plan new data collections so less “data cleaning” and “data wrangling” is needed in the future.

In this lesson, however, you will not learn:

There are many good tutorials available online on the topic of data analysis in spreadsheets. If you are looking to learn any of the above, a good reference is Head First Excel, published by O’Reilly.

Screenshots

Throughout this lesson we will show content and screenshots from Excel. As an actively developed software, Excel is adding new features or changing visual elements. Screenshots in this lesson may then become slightly out-of-sync and may not look identical as on your computer.

Getting Started

This lesson assumes no prior knowledge of the computational skills or tools apart from the basic usage of spreadsheet software.

To get started, follow the directions in the “Setup” section to download the lesson data to your computer.

Schedule

Setup Download files required for the lesson
00:00 1. Introduction What are the basic principles for using spreadsheets for good data organisation?
00:10 2. Organising data in spreadsheets How do we organise and format data in spreadsheets for effective data use?
00:35 3. Common spreadsheet errors What are some common challenges with formatting data in spreadsheets and how can we avoid them?
01:00 4. Dates as data What are good approaches for handling dates in spreadsheets?
01:25 5. Quality assurance and control How can we carry out basic quality control and quality assurance in spreadsheets?
01:50 6. Exporting data How can we export data from spreadsheets in a way that is useful for downstream applications?
02:00 Finish