University of Southampton

Aug 01-Aug 02, 2017

9:00 am - 4:30 pm

Instructors: Simon Hettrick, Alistair Bailey, Rob Blair

Helpers: Arshad Emmambux, Olivier Phillipe, Robin Wilson

General Information

Data Carpentry develops and teaches workshops on the fundamental data skills needed to conduct research. Our mission is to provide researchers with high-quality training covering the full lifecycle of data-driven research. Data Carpentry is a sibling organization of Software Carpentry. Where Software Carpentry teaches best practices in software development, our focus is on the introductory computational skills needed for data management and analysis in all domains of research. Our initial target audience is learners who have little to no prior computational experience. We create a friendly environment for learning to empower researchers and enable data-driven discovery. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

For more information on what we teach and why, please see our paper "Best Practices for Scientific Computing".

Who: The course is aimed at graduate students and other researchers. You don't need to have any previous knowledge of the tools that will be presented at the workshop.

Where: Room 2207, Building 85 (Biological Sciences Building). Get directions with OpenStreetMap or Google Maps.

When: Aug 01-Aug 02, 2017. Add to your Google Calendar.

Requirements: Participants must bring a laptop with a Mac, Linux, or Windows operating system (not a tablet, Chromebook, etc.) that they have administrative privileges on. They should have a few specific software packages installed (listed below). They are also required to abide by Software Carpentry's Code of Conduct.

Accessibility: We are committed to making this workshop accessible to everybody. The workshop organisers have checked that:

Materials will be provided in advance of the workshop, and large-print handouts are available if you notify the organisers in advance. If we can help make learning easier for you (e.g. sign-language interpreters, lactation facilities), please get in touch and we will attempt to provide them.

Contact: Please email rsg-info@soton.ac.uk for more information.


Schedule

Surveys

Please be sure to complete these surveys before and after the workshop.

Pre-workshop Survey

Post-workshop Survey

Day 1

09:00 Arrival and Setup
09:30 Welcome and Introduction to Data Carpentry
10:00 Data Organisation in Spreadsheets
11:00 Break
11:30 Data Cleaning with OpenRefine
12:30 Lunch
13:30 Introduction to R and RStudio
14:30 Break
15:00 Introduction to R and RStudio contd.
16:00 Wrap-up

Day 2

10:00 Data Analysis and Visualization in R
11:00 Break
11:30 Data Analysis and Visualization in R contd.
12:30 Lunch
13:30 Data Management with SQL
14:30 Break
15:00 Data Management with SQL contd.
16:00 Wrap-up

Syllabus

Data Organisation in Spreadsheets

  • Formatting data tables in Spreadsheets
  • Formatting Problems
  • Dates as data
  • Quality Control
  • Exporting data
  • Reference...

Data Cleaning with OpenRefine

  • Working with OpenRefine
  • Filtering and Sorting with OpenRefine
  • Examining Numbers in OpenRefine
  • Exporting and Saving Data from OpenRefine
  • Reference...

Introduction to R and RStudio

  • Before we start
  • Introduction to R
  • Starting with data
  • Reference...

Data Analysis and Visualization in R

  • Aggregating and analyzing data with dplyr
  • Data visualization with ggplot2
  • Reference...

Data Management with SQL

  • Databases using SQL
  • Basic Queries
  • SQL Aggregation
  • Joins and aliases
  • R and Databases
  • Reference...

Setup

To participate in a Data Carpentry workshop, you will need to bring a laptop with the software described below.

Spreadsheet

To work with spreadsheets, we can use Microsoft Excel, OpenOffice.org, or other programs. Commands may differ a bit between programs, but the general ideas for thinking about spreadsheets are the same. For this lesson, if you don’t have a spreadsheet program already, you can use LibreOffice. It’s a free, open-source spreadsheet program.

Windows

Only if you don't have MS Excel installed. Install LibreOffice by going to the download page. Your download should begin automatically. You will go to a page that asks about a donation, but you don’t need to make one.

Mac OS X

Only if you don't have MS Excel installed. Install LibreOffice by going to the download page. Your download should begin automatically. You will go to a page that asks about a donation, but you don’t need to make one.

Linux

Install LibreOffice by going to the download page. The version for Linux should automatically be selected. Click Download Version 5.3.X. You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.

OpenRefine

For this lesson you will need OpenRefine (formerly Google Refine) and a web browser.

Note: this is a program that runs on your machine (not in the cloud). It is accessed via your browser, but no web connection is needed.

Windows

  • OpenRefine uses the Java Runtime Environment. If you don't already have it installed, install it from here.
  • Download the OpenRefine 2.7 Windows Kit from http://openrefine.org
  • Unzip the downloaded file into a directory by right-clicking and selecting “Extract…”. Name that directory something like 'OpenRefine'. Remember where you extracted it.
  • Go to your newly created OpenRefine directory using File Explorer.
  • Double-click "openrefine" (the icon is a blue diamond). A black console window will appear, and your default browser will open shortly afterwards.
  • If OpenRefine does not automatically open for you, point your web browser at http://127.0.0.1:3333/ or http://localhost:3333.

Mac OS X

  • OpenRefine uses the Java Runtime Environment. To check whether you have Java installed, open System Preferences and look for a Java icon. If you don't have it, download and install it.
  • Download the OpenRefine 2.7 Mac Kit from http://openrefine.org
  • Open the downloaded file and drag the OpenRefine icon to Applications as instructed.
  • Launch OpenRefine from Applications.
  • If you receive a warning about installing untrusted applications, open Terminal (Applications -> Utilities -> Terminal), type spctl --add /Applications/OpenRefine.app, and then try launching OpenRefine again.
  • If OpenRefine does not automatically open for you, point your web browser at http://127.0.0.1:3333/ or http://localhost:3333.

Linux

  • OpenRefine uses the Java Runtime Environment. To check if you have Java installed, open a terminal and type java -version. If you don't have it, then run sudo apt-get install default-jre (Ubuntu) or sudo dnf install java-1.8.0-openjdk (Fedora).
  • Download the OpenRefine 2.7 Linux kit from http://openrefine.org
  • Unzip the downloaded file into a directory. Name that directory something like "OpenRefine".
  • Go to your newly created OpenRefine directory.
  • Type ./refine into the terminal from within the OpenRefine directory.
  • If OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to launch the program.
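
If you would like to confirm the Linux steps from the terminal, the Java check and the address check can be combined into one short script. This is only a sketch: the have_cmd helper is a made-up name for illustration, and the second check assumes curl is installed, which this workshop does not require.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenRefine): succeeds if the
# named command can be found on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# OpenRefine needs Java, so report whether it is available.
# java -version writes to stderr, hence the 2>&1 redirection.
if have_cmd java; then
    echo "Java found:"
    java -version 2>&1 | head -n 1
else
    echo "Java not found; install it as described above."
fi

# Ask the default local address whether OpenRefine is answering.
# Assumption: curl is installed; the check is skipped if it is not.
if have_cmd curl; then
    if curl --silent --fail --max-time 5 --output /dev/null http://127.0.0.1:3333/; then
        echo "OpenRefine is running at http://127.0.0.1:3333/"
    else
        echo "OpenRefine is not responding; start it with ./refine first."
    fi
fi
```

Run the script in a second terminal while ./refine is still running in the first one, because closing that first terminal stops OpenRefine.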

R

R is a programming language that is especially powerful for data exploration, visualization, and statistical analysis. To interact with R, we use RStudio.

Windows

Video Tutorial
  • Install R by downloading and running this .exe file from CRAN.
  • Next, install the RStudio IDE. Note that if you have separate user and admin accounts, you should run the installers as administrator (right-click on the .exe file and select "Run as administrator" instead of double-clicking). Otherwise, problems may occur later, for example when installing R packages.

Mac OS X

Video Tutorial
  • Install R by downloading and running this .pkg file from CRAN.
  • Also, download and install the RStudio IDE. Open the downloaded file and drag the RStudio icon to Applications.

Linux

  • Install R using your package manager (e.g. for Debian/Ubuntu run sudo apt-get install r-base and for Fedora run sudo yum install R).
  • Also, please install the RStudio IDE.
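
After installing on Linux, you may want to confirm from the terminal that R can be found. The sketch below is one way to do that: check_r is a made-up helper name, and it assumes the package manager placed Rscript (which ships with R) on your PATH.

```shell
#!/bin/sh
# Hypothetical helper: print the installed R version, or a hint if R is missing.
check_r() {
    if command -v Rscript >/dev/null 2>&1; then
        # R.version.string is a built-in R variable holding the version text.
        Rscript -e 'cat(R.version.string, "\n")'
    else
        echo "R not found; install it with your package manager as described above."
    fi
}

check_r
```

If the version is printed, R is ready to use; RStudio should then detect it when it is first launched.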

SQLite

SQL is a specialized programming language used with databases. We use a very lightweight database system called SQLite in our lessons. On its own, it's so light, it doesn't even include a user interface! So, we use DB Browser for SQLite.

Windows

Download and install DB Browser for SQLite (Windows)

Mac OS X

Download and install DB Browser for SQLite (Mac)

Linux

Download and install DB Browser for SQLite (Linux)