Introduction
|
|
Formatting data tables in spreadsheets
|
Never modify your raw data. Always make a copy before making any changes.
Keep track of all of the steps you take to clean your data.
Organize your data according to tidy data principles.
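As a rough illustration of the first two points, the sketch below copies a raw file into a working folder and keeps a plain-text log of each cleaning step. The file names (`survey.csv`, `cleaning_log.txt`), the folder layout, and the helper names `start_cleaning` and `log_step` are hypothetical, chosen only for this example; the same record could just as well be kept by hand in a separate tab or text file.

```python
import shutil
import tempfile
from pathlib import Path


def start_cleaning(raw_path, work_dir):
    """Copy the raw file into work_dir and start a plain-text log next to the copy."""
    raw_path, work_dir = Path(raw_path), Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    working_copy = work_dir / raw_path.name
    shutil.copy2(raw_path, working_copy)  # the raw file itself is never edited
    log_path = work_dir / "cleaning_log.txt"  # hypothetical log file name
    log_path.write_text(f"Copied {raw_path.name} from raw data; raw file left unchanged.\n")
    return working_copy, log_path


def log_step(log_path, description):
    """Append one line describing a cleaning step to the log."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(description + "\n")


# Demonstration in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw" / "survey.csv"
    raw.parent.mkdir()
    raw.write_text("plot_id,weight_g\n1,42\n")

    copy, log = start_cleaning(raw, Path(tmp) / "working")
    log_step(log, "Renamed column 'wgt' to 'weight_g'.")

    log_lines = log.read_text().splitlines()
    raw_unchanged = raw.read_text() == "plot_id,weight_g\n1,42\n"
```

All later edits would then be made to the working copy, with one `log_step` call per change.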
|
Formatting problems
|
Avoid using multiple tables within one spreadsheet.
Avoid spreading data across multiple tabs (but do use a new tab to record data cleaning or manipulations).
Record zeros as zeros.
Use an appropriate null value (for example a blank cell or NA, applied consistently) to record missing data.
Don’t use formatting to convey information or to make your spreadsheet look pretty.
Place comments in a separate column.
Record units in column headers.
Include only one piece of information in a cell.
Avoid spaces, numbers and special characters in column headers.
Avoid special characters in your data.
Record metadata in a separate plain text file.
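The advice on column headers can be checked mechanically. The sketch below flags headers that contain spaces or special characters or that start with a digit. The pattern is an assumption made for this example: it tolerates digits after the first character, so tighten it if you want to exclude numbers altogether. The sample header names are invented.

```python
import re

# Assumption: a "clean" header starts with a letter and contains only
# letters, digits and underscores (units can be appended, e.g. weight_g).
CLEAN_HEADER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def problem_headers(headers):
    """Return the headers that break the assumed naming rule."""
    return [h for h in headers if not CLEAN_HEADER.match(h)]


# Invented example headers: the last three each break the rule in a different way.
headers = ["plot_id", "weight_g", "hindfoot length", "2013_count", "temp(C)"]
flagged = problem_headers(headers)
print(flagged)
```

A check like this is easiest to run on the header row of an exported file, before the data go into analysis software.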
|
Dates as data
|
|
Quality control
|
Always copy your original spreadsheet file and work on the copy, so that the raw data stay untouched.
Use data validation to prevent accidentally entering invalid data.
Use sorting to check for invalid data.
Use conditional formatting (cautiously) to check for invalid data.
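The same kind of check that data validation performs inside a spreadsheet can be sketched in code. The example below reports values that are not numbers within an expected range. The column name `weight_g`, the range 0 to 500, and the use of `NA` as the missing-value marker are assumptions for illustration, not rules from the lesson.

```python
def find_invalid(rows, column, low, high, missing="NA"):
    """Return (row_number, value) pairs whose value is not a number in [low, high]."""
    invalid = []
    for number, row in enumerate(rows, start=2):  # row 1 is the header row
        value = row[column]
        if value == missing:  # assumed missing-value marker: skipped, not flagged
            continue
        try:
            ok = low <= float(value) <= high
        except ValueError:  # not a number at all, e.g. a typo such as "4o"
            ok = False
        if not ok:
            invalid.append((number, value))
    return invalid


# Invented example data: a true zero, a missing value, an outlier and a typo.
rows = [
    {"plot_id": "1", "weight_g": "42"},
    {"plot_id": "2", "weight_g": "0"},
    {"plot_id": "3", "weight_g": "NA"},
    {"plot_id": "4", "weight_g": "4200"},
    {"plot_id": "5", "weight_g": "4o"},
]
invalid = find_invalid(rows, "weight_g", 0, 500)
print(invalid)
```

Note that the recorded zero passes the check while the missing value is skipped, which only works because zeros and missing data are recorded differently.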
|
Exporting data
|
Data stored in common spreadsheet formats will often not be read correctly into data analysis software, introducing errors into your data.
Exporting data from spreadsheets to formats like .csv or .tsv puts it in a format that can be used consistently by most programs.
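To show what the exported formats look like, the sketch below writes a small invented table as comma-separated and tab-separated text using Python's standard `csv` module, then reads the CSV back to confirm nothing changed. The helper name `to_delimited` and the sample rows are hypothetical. In practice the export is usually done from the spreadsheet program's own "save as" or "export" menu.

```python
import csv
import io

# Invented example rows; every value is kept as text, exactly as it would be exported.
rows = [
    {"plot_id": "1", "species": "DM", "weight_g": "42", "comments": "tail missing, recaptured"},
    {"plot_id": "2", "species": "NA", "weight_g": "0", "comments": ""},
]


def to_delimited(rows, delimiter=","):
    """Serialize a list of dicts as delimited text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), delimiter=delimiter)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_delimited(rows)                   # comma-separated (.csv)
tsv_text = to_delimited(rows, delimiter="\t")   # tab-separated (.tsv)

# Reading the CSV text back gives the same rows, including the comment
# that contains a comma (the writer quotes it automatically).
round_trip = list(csv.DictReader(io.StringIO(csv_text, newline="")))
print(csv_text)
```

The comment containing a comma is wrapped in quotes in the CSV output, which is one reason a tab-separated export can be simpler when free-text columns contain commas.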
|