Overview
Teaching: 10 min
Exercises: 10 min
Questions
Filtering and sorting data
Objectives
Filter to a subset of rows by text filter or include/exclude.
Sort table by a column.
Sort by multiple columns.
There are many entries in the rodent table. We can filter it to work on a subset of the data in the list for the next set of operations. Please ensure you perform this step to save time during the class.
Challenge
What scientific names (genus and species) are selected by this procedure? How would you restrict this to one of the species selected?
We could narrow the text filter by typing more letters or by ticking the case sensitive box, but another way to filter is to include and/or exclude entries in a facet. If you still have your facet for scientificName, you can use it; otherwise use the drop-down menu > Facet > Text facet to create a new one. Only the names that agree with your text filter will remain.
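To make the difference between the two approaches concrete, here is a minimal Python sketch of the same idea outside OpenRefine. The rows, the filter text "dipo", and the helper names `text_filter` and `exclude` are all invented for illustration; they are not part of OpenRefine or of the lesson data.

```python
# Hypothetical rows standing in for the rodent table (illustration only).
rows = [
    {"scientificName": "Dipodomys merriami"},
    {"scientificName": "Dipodomys ordii"},
    {"scientificName": "Perognathus flavus"},
    {"scientificName": "Neotoma albigula"},
]

def text_filter(rows, column, text, case_sensitive=False):
    """Keep rows whose value in `column` contains `text`, like a text filter."""
    if case_sensitive:
        return [r for r in rows if text in r[column]]
    return [r for r in rows if text.lower() in r[column].lower()]

def exclude(rows, column, value):
    """Drop rows whose value in `column` equals `value`, like 'exclude' in a facet."""
    return [r for r in rows if r[column] != value]

# Step 1: text filter (case-insensitive unless asked otherwise).
filtered = text_filter(rows, "scientificName", "dipo")
# Step 2: exclude one of the names that survived the filter.
remaining = exclude(filtered, "scientificName", "Dipodomys ordii")

print([r["scientificName"] for r in filtered])
print([r["scientificName"] for r in remaining])
```

Note that `rows` itself is never modified: each step returns a new, smaller list, much as a filter in OpenRefine only changes which rows you see.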
Challenge
Use include / exclude to exclude one of the scientific names. Below are some suggested steps.
You can sort the data by a column by using the drop-down menu in that column. There you can sort by text, numbers, dates or booleans (logical expressions). You can specify what order to put Blanks and Errors.
If this is your first time sorting this table, the drop-down menu for the selected column shows > Sort…. Select the way to sort (such as numbers).
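The choice between sorting as text and sorting as numbers matters more than it might seem. The short Python sketch below uses made-up values stored as text, with one blank, to show how the two orders differ. Placing blanks last is just the choice made in this sketch; the variable names are hypothetical.

```python
# Hypothetical column values stored as text, including one blank.
values = ["9", "10", "", "2", "100"]

# Separate blanks so we can decide where they go (here: at the end).
non_blank = [v for v in values if v != ""]
blanks = [v for v in values if v == ""]

# Sorting as text compares character by character, so "10" comes before "2".
as_text = sorted(non_blank) + blanks

# Sorting as numbers compares the numeric values instead.
as_numbers = sorted(non_blank, key=int) + blanks

print(as_text)
print(as_numbers)
```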
Challenge
Sort by month. How can you ensure that months are in order?
If you try to re-sort a column that you have already used, the drop-down menu changes slightly, to > Sort without the …, to remind you that you have already used this column. It will give you additional options:
You can sort by multiple columns by sorting on additional columns. The result depends on the order in which you select the columns to sort. To restart the sorting process with a particular column, check the sort by this column alone box in the sort pop-up menu.
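As a rough illustration of why the order of columns matters, the Python sketch below sorts the same made-up records two ways. It assumes that the first column chosen acts as the primary key and later columns only break ties; the column names `yr` and `mo` and the values are hypothetical.

```python
# Hypothetical records; the column names and values are invented.
records = [
    {"yr": 1990, "mo": 7},
    {"yr": 1989, "mo": 12},
    {"yr": 1990, "mo": 1},
    {"yr": 1989, "mo": 3},
]

# Month chosen first, then year: months are in order, years are interleaved.
by_month_then_year = sorted(records, key=lambda r: (r["mo"], r["yr"]))

# Year chosen first, then month: rows are grouped by year, months ordered within each year.
by_year_then_month = sorted(records, key=lambda r: (r["yr"], r["mo"]))

print([(r["mo"], r["yr"]) for r in by_month_then_year])
print([(r["yr"], r["mo"]) for r in by_year_then_month])
```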
Challenge
Try sorting by year after you have sorted by month. What happens to the ordering?
Try sorting first by year and then by month. Be sure to check the sort by this column alone box when sorting by year to remove earlier sorts.
If you go back to one of the already sorted columns and select > Sort > Remove sort, that column is removed from your multiple sort. If it is the only column sorted, then the data reverts to its original order.
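One way to picture this is as a list of active sort columns that can grow and shrink. The Python sketch below is only a model of that behaviour, not how OpenRefine is implemented; the helper `apply_sorts`, the column names and the rows are all hypothetical.

```python
def apply_sorts(rows, sort_columns):
    """Return rows ordered by the given columns; with no columns, keep the original order."""
    if not sort_columns:
        return list(rows)
    return sorted(rows, key=lambda r: tuple(r[c] for c in sort_columns))

# Hypothetical rows in their original order.
rows = [
    {"yr": 1990, "mo": 7, "dy": 2},
    {"yr": 1989, "mo": 12, "dy": 15},
    {"yr": 1990, "mo": 1, "dy": 30},
    {"yr": 1989, "mo": 12, "dy": 1},
]

active = ["yr", "mo", "dy"]
three_columns = apply_sorts(rows, active)

# Removing the sort on the middle column leaves year, then day.
active.remove("mo")
two_columns = apply_sorts(rows, active)

# With every sort removed, the rows come back in their original order.
no_sort = apply_sorts(rows, [])

print([(r["yr"], r["mo"], r["dy"]) for r in three_columns])
print([(r["yr"], r["mo"], r["dy"]) for r in two_columns])
```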
Challenge
Sort by year, month and day in some order. Be creative: try sorting as numbers or text, and in reverse order (largest to smallest or z to a).
Use > Sort > Remove sort to remove the sort on the second of three columns. Notice how that changes the order.
Key Points
OpenRefine provides a way to sort and filter data without affecting the raw data.