The Bash Shell
Exercise Answers
Files and Directories
Relative path resolution
- Incorrect, since `backup` exists in the `Users` parent directory, which is the directory the preceding `..` refers to.
- Incorrect, since `..` is used before `backup` to reference the `Users` parent directory first, so `ls` will run on `/Users/backup`.
- Incorrect as per 2; in addition, for the directory names to be shown with a trailing `/` we’d also need to use the `-F` flag.
- Correct.
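The way `..` is resolved can be checked in a scratch directory. This is a minimal sketch, assuming a layout in which both `Users/thing/backup` and `Users/backup` exist; the subdirectory names `original` and `pnas_final` are hypothetical placeholders.

```shell
# Build a throwaway tree; the names under Users/backup are hypothetical.
workdir=$(mktemp -d)
mkdir -p "$workdir/Users/thing/backup" \
         "$workdir/Users/backup/original" \
         "$workdir/Users/backup/pnas_final"
cd "$workdir/Users/thing"

# `..` resolves to Users first, so this lists Users/backup,
# not Users/thing/backup.
plain_listing=$(ls ../backup)

# -F appends a trailing / to directory names.
flagged_listing=$(ls -F ../backup)

echo "$plain_listing"
echo "$flagged_listing"

cd - >/dev/null && rm -rf "$workdir"
```

Without `-F` the names appear bare; with `-F` each directory name ends in `/`.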
ls reading comprehension
- This will attempt to list the contents of a file or directory called `pwd`, which does not exist.
- Partially correct, but 3 is also correct.
- Partially correct, but 2 is also correct.
- Correct.
Creating Things
Renaming files
- Incorrect, since this would copy the mistakenly named file to another file with the desired name, but the original mistakenly named file will still exist.
- Correct.
- Incorrect, since this would not rename the file.
- Incorrect, since `mv` is used to rename a file and `cp` is used to copy a file.
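The difference between copying and renaming can be seen directly. The sketch below uses hypothetical filenames (`statstics.txt` as the misspelled name, `statistics.txt` as the intended one) in a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical misspelled file.
touch statstics.txt

# cp leaves the misspelled original in place.
cp statstics.txt statistics.txt
after_cp=$(ls)

# Start again, this time renaming with mv.
rm statistics.txt
mv statstics.txt statistics.txt
after_mv=$(ls)

echo "after cp:"; echo "$after_cp"
echo "after mv:"; echo "$after_mv"

cd - >/dev/null && rm -rf "$workdir"
```

After `cp` both names are listed; after `mv` only the corrected name remains.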
Moving and Copying
- Incorrect, since `proteins-saved.dat` was created in the directory above, because `..` was used before its filename when copying `recombine/proteins.dat`.
- Correct.
- Incorrect, since `proteins.dat` was moved into the `recombine` directory.
- Incorrect, since the `recombine` directory was created in the current directory.
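The sequence of commands implied by these answers can be replayed as a sketch. The exact commands are reconstructed from the explanations above and are an assumption, as is the starting directory name `data`.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/data"          # hypothetical starting directory
cd "$workdir/data"
touch proteins.dat

mkdir recombine                                   # created in the current directory
mv proteins.dat recombine/                        # proteins.dat leaves the current directory
cp recombine/proteins.dat ../proteins-saved.dat   # the copy lands one level up

current_listing=$(ls)
parent_listing=$(ls ..)

echo "current directory: $current_listing"
echo "parent directory:"; echo "$parent_listing"

cd - >/dev/null && rm -rf "$workdir"
```

Only `recombine` is left in the current directory, while `proteins-saved.dat` sits in the parent.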
Organizing Directories and Files
`mv fructose.dat sucrose.dat analyzed` will move the two `.dat` files into the `analyzed` directory.
Copy with Multiple Filenames
- With several filenames and a directory, `cp` will copy the given files into the given directory.
- When given three or more filenames, `cp` will return an error, since with more than two arguments `cp` assumes the last argument is a directory.
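Both behaviours can be tried out as follows. This is a sketch in a scratch directory; the directory name `backup` is a hypothetical choice.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch fructose.dat glucose.dat sucrose.dat
mkdir backup                   # hypothetical target directory

# Several filenames followed by a directory: the files are copied into it.
cp fructose.dat sucrose.dat backup/
copied=$(ls backup)

# Three filenames and no directory: cp refuses, because with more than
# two arguments the last one must be a directory.
if cp fructose.dat sucrose.dat glucose.dat 2>/dev/null
then
    cp_status="succeeded"
else
    cp_status="failed"
fi

echo "$copied"
echo "cp with three filenames: $cp_status"

cd - >/dev/null && rm -rf "$workdir"
```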
Pipes and Filters
What does sort -n do?
`sort -n` will perform a sort interpreting numerical digits as proper numbers, i.e. a numerical sort. `sort` on its own will perform a sort assuming any numerical digits are just sequences of characters, e.g. `10` will come before `2` since `1` comes before `2`.
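A short sketch makes the difference visible. The four input numbers are arbitrary, and `LC_ALL=C` is set on the plain `sort` only to make the character-by-character ordering predictable across systems.

```shell
numbers='10
2
19
6'

# Character-by-character comparison: "10" and "19" sort before "2".
text_sorted=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# Numerical comparison.
numeric_sorted=$(printf '%s\n' "$numbers" | sort -n)

echo "sort:";    echo "$text_sorted"
echo "sort -n:"; echo "$numeric_sorted"
```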
What does >> mean?
`>` will redirect the output `hello` to the file `testfile01.txt`, replacing the contents of that file if it already exists. `>>` will redirect the output `hello` to the file `testfile01.txt`, appending to any existing content if that file already exists.
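The two operators can be compared by counting lines after each step. This is a small sketch run in a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two overwrites: the file still holds a single line.
echo hello > testfile01.txt
echo hello > testfile01.txt
lines_after_overwrite=$(grep -c hello testfile01.txt)

# Two appends: each one adds a line to what is already there.
echo hello >> testfile01.txt
echo hello >> testfile01.txt
lines_after_append=$(grep -c hello testfile01.txt)

echo "after > twice:  $lines_after_overwrite line(s)"
echo "after >> twice: $lines_after_append line(s)"

cd - >/dev/null && rm -rf "$workdir"
```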
Piping commands together
- Incorrect, since we cannot redirect to a command; we should use a `|` (pipe) instead.
- Incorrect, since `1-3` when passed to `head` is interpreted as a file, not a range of line numbers.
- Incorrect, since we need to sort the line counts before extracting the top three (otherwise we get them in whatever order `wc` gives them to `head`).
- Correct.
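A pipeline of the kind these answers describe might look like the sketch below, which reports the three files with the fewest lines. The exact command from the exercise is not reproduced here, so treat `wc -l *.txt | sort -n | head -n 3` as an assumed form; the four files are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical files with 4, 1, 3 and 2 lines respectively.
printf 'x\nx\nx\nx\n' > a.txt
printf 'x\n'          > b.txt
printf 'x\nx\nx\n'    > c.txt
printf 'x\nx\n'       > d.txt

# Count lines, sort the counts numerically, keep the three smallest.
# awk strips the counts so only the filenames remain.
shortest=$(wc -l *.txt | sort -n | head -n 3 | awk '{print $2}')

echo "$shortest"

cd - >/dev/null && rm -rf "$workdir"
```

Sorting before `head` is what guarantees the three shortest files; without it, `head` would simply keep the first three lines in the order `wc` printed them.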
Why does uniq only remove adjacent duplicates?
- For efficiency. If it were to work across non-adjacent lines it would need to keep the whole file in memory in some way to know whether it had already encountered a line. This would need considerable memory with very large files, and searching for duplicate lines would take much longer to run.
- You could use `sort` first in a pipe to sort the file contents to ensure duplicate lines are adjacent, e.g. `sort salmon.txt | uniq`.
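The effect of sorting first can be seen with a small input. The contents of `salmon.txt` below are a hypothetical stand-in for the real data file.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical contents with non-adjacent duplicates.
printf 'coho\ncoho\nsteelhead\ncoho\nsteelhead\nsteelhead\n' > salmon.txt

# uniq alone only collapses runs of identical neighbouring lines.
adjacent_only=$(uniq salmon.txt)

# Sorting first makes every duplicate adjacent.
fully_unique=$(sort salmon.txt | uniq)

echo "uniq:";        echo "$adjacent_only"
echo "sort | uniq:"; echo "$fully_unique"

cd - >/dev/null && rm -rf "$workdir"
```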
Pipe reading comprehension
`cat animals.txt` will output the contents of `animals.txt`. `head -5` accepts the output from `cat` and outputs the first 5 lines of that. `tail -3` accepts the 5 lines from `head` and outputs the last 3 lines of that. `sort -r` accepts the 3 lines from `tail` and outputs those lines in reverse sort order. `> final.txt` will take the output from `sort` and redirect it into a file called `final.txt`.
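The stages above can be traced on sample input. The contents of `animals.txt` below are hypothetical, and the sketch writes `head -n 5` and `tail -n 3`, which are equivalent to the `-5` and `-3` forms.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical eight-line input file.
cat > animals.txt <<'EOF'
2012-11-05,deer
2012-11-05,rabbit
2012-11-05,raccoon
2012-11-06,rabbit
2012-11-06,deer
2012-11-06,fox
2012-11-07,rabbit
2012-11-07,bear
EOF

# First 5 lines, then the last 3 of those, then reverse sort, then save.
cat animals.txt | head -n 5 | tail -n 3 | LC_ALL=C sort -r > final.txt
final_contents=$(cat final.txt)

echo "$final_contents"

cd - >/dev/null && rm -rf "$workdir"
```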
Shell Scripts
Variables in shell scripts
- Incorrect: since `-1` is passed to `head` in the script, it will output the first line of each `.pdb` file, whilst the `-1` passed to `tail` will output the last line of each `.pdb` file.
- Correct.
- Incorrect, since `*.pdb` is passed into the script and used by `head` and `tail`, so only `.pdb` files will be used.
- Incorrect, since the quotes only mean that `*.pdb` will be passed into the script without expansion.
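To see how the quoted pattern travels into the script, here is a sketch under stated assumptions: the script body (`head $2 $1` followed by `tail $3 $1`), the script name `script.sh`, and the file names and contents are all reconstructed or hypothetical rather than taken from the exercise.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical data files; notes.txt should never be read.
printf 'first\nmiddle\nlast\n' > cubane.pdb
printf 'alpha\nbeta\nomega\n'  > ethane.pdb
printf 'ignored\n'             > notes.txt

# Assumed script body. $1 is deliberately unquoted so that the pattern
# is expanded inside the script, not on the command line.
cat > script.sh <<'EOF'
head $2 $1
tail $3 $1
EOF

# The quotes stop the calling shell from expanding *.pdb.
output=$(bash script.sh '*.pdb' -1 -1)

echo "$output"

cd - >/dev/null && rm -rf "$workdir"
```

With more than one file, `head` and `tail` also print a header naming each file, so the output shows the first and last line of every `.pdb` file and nothing from `notes.txt`.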
Script reading comprehension
- Script 1 will output a list of files that match the `*.*` pattern, i.e. `fructose.dat`, `glucose.dat`, and `sucrose.dat`.
- Script 2 will take in three arguments on the command line and, for each of them, print out the contents of the named file.
- Script 3 will print out all arguments as passed to the script on a single line and append `.dat` to that output.
Loops
Variables in Loops
- The first loop will print “fructose.dat glucose.dat sucrose.dat” three times, since we are running `ls *.dat` three separate times and are not making use of the loop variable `$datafile`. The second loop will produce “fructose.dat”, “glucose.dat”, and “sucrose.dat” (each on a separate line) since we’re passing `$datafile` to `ls`.
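The two loops can be compared by counting how many names each one prints. This sketch captures the output in variables, so `ls` prints one name per line rather than in columns.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch fructose.dat glucose.dat sucrose.dat

# First loop: the loop variable is ignored, so all three names are
# listed on every pass.
first_loop=$(for datafile in *.dat
do
    ls *.dat
done)

# Second loop: one name per pass.
second_loop=$(for datafile in *.dat
do
    ls $datafile
done)

first_count=$(printf '%s\n' "$first_loop" | grep -c '')
second_count=$(printf '%s\n' "$second_loop" | grep -c '')

echo "first loop printed $first_count names"
echo "second loop printed $second_count names"

cd - >/dev/null && rm -rf "$workdir"
```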
Saving to a File in a Loop - Part One
- Correct.
- Incorrect, since we’re using the `>` redirect operator, which will overwrite any previous contents of `xylose.dat`.
- Incorrect, since the file `xylose.dat` would not have existed when `*.dat` was expanded.
- Incorrect.
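The overwriting behaviour can be confirmed with a sketch of such a loop. The loop body and the file contents are assumptions made for illustration; only the file names come from the answers above.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical one-line contents.
echo "fructose data" > fructose.dat
echo "glucose data"  > glucose.dat
echo "sucrose data"  > sucrose.dat

# *.dat is expanded once, before xylose.dat exists, so the loop visits
# only the three original files.
visited=$(for sugar in *.dat
do
    echo "$sugar"
    cat "$sugar" > xylose.dat   # > replaces the file on every pass
done)

final_contents=$(cat xylose.dat)

echo "$visited"
echo "xylose.dat holds: $final_contents"

cd - >/dev/null && rm -rf "$workdir"
```

Only the text from the last file visited survives in `xylose.dat`.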
Saving to a File in a Loop - Part Two
- Correct.
- Incorrect, since we’re looping through each of the other `.dat` files (`fructose.dat` and `glucose.dat`) whose contents would also be included.
- Incorrect, since `maltose.txt` has a `.txt` extension and not a `.dat` extension, so won’t match on `*.dat` and won’t be included in the loop.
- Incorrect, since the `>>` operator redirects all output to the `sugar.dat` file, so we won’t see any screen output.
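An appending loop of this kind can be sketched as follows. The loop body and file contents are assumptions for illustration; the file names are the ones mentioned in the answers.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical one-line contents.
echo "fructose data" > fructose.dat
echo "glucose data"  > glucose.dat
echo "sucrose data"  > sucrose.dat
echo "maltose data"  > maltose.txt   # does not match *.dat

# Everything cat prints goes into sugar.dat, so nothing reaches the screen.
screen_output=$(for datafile in *.dat
do
    cat "$datafile" >> sugar.dat
done)

sugar_contents=$(cat sugar.dat)

echo "screen output: '$screen_output'"
echo "$sugar_contents"

cd - >/dev/null && rm -rf "$workdir"
```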
Doing a dry run
- Version 2 is the one that successfully acts as a dry run. In version 1, since the `>` file redirect is not within quotes, the script will create three files, `analyzed-basilisk.dat`, `analyzed-minotaur.dat`, and `analyzed-unicorn.dat`, which is not what we want.
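The difference between the two versions comes down to where the quotes sit. The sketch below assumes loops of the form `echo "analyze $file > analyzed-$file"` (version 2) and `echo analyze $file > analyzed-$file` (version 1); `analyze` is only ever printed, never run.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch basilisk.dat minotaur.dat unicorn.dat

# Version 2: the > is inside the quotes, so the command is only printed.
dry_run_output=$(for file in *.dat
do
    echo "analyze $file > analyzed-$file"
done)
files_after_v2=$(ls analyzed-* 2>/dev/null | wc -l | tr -d ' ')

# Version 1: the > is outside any quotes, so the shell really redirects
# echo's output into new files.
for file in *.dat
do
    echo analyze $file > analyzed-$file
done
files_after_v1=$(ls analyzed-* 2>/dev/null | wc -l | tr -d ' ')

echo "$dry_run_output"
echo "files created by version 2: $files_after_v2"
echo "files created by version 1: $files_after_v1"

cd - >/dev/null && rm -rf "$workdir"
```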
Finding Things
Using grep
- Incorrect, since it will find lines that contain `of`, including those where it is not a complete word, such as “Software is like that.”
- Incorrect: `-E` (which enables extended regular expressions in `grep`) won’t change the behaviour, since the given pattern contains no special regular expression characters. So the results will be the same as 1.
- Correct, since we have supplied `-w` to indicate that we are looking for a complete word, hence only “and the presence of absence:” is found.
- Incorrect. `-i` indicates we wish to do a case-insensitive search, which isn’t required. The results are the same as 1.
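The effect of each flag can be checked against the two lines quoted above. This sketch assumes a file named `haiku.txt` holding only those two lines; the real file is longer.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Only the two lines quoted in the answers; the file name is an assumption.
printf '%s\n' 'and the presence of absence:' 'Software is like that.' > haiku.txt

plain_matches=$(grep "of" haiku.txt)       # matches "of" inside "Software" too
extended_matches=$(grep -E "of" haiku.txt) # same result as plain grep
word_matches=$(grep -w "of" haiku.txt)     # whole word only

echo "grep:";    echo "$plain_matches"
echo "grep -w:"; echo "$word_matches"

cd - >/dev/null && rm -rf "$workdir"
```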
find pipeline reading comprehension
- Find all files (in this directory and all subdirectories) that have a filename that ends in `.dat`, count the number of files found, and sort the result. Note that the `sort` here is unnecessary, since it is only sorting one number.
Matching ose.dat but not temp
- Incorrect, since the first `grep` will find all filenames that contain `ose` wherever it may occur, and also because the use of `grep` as a following pipe command will only match on filenames output from `find` and not their contents.
- Incorrect, since it will only find those files that match `ose.dat` exactly, and also because the use of `grep` as a following pipe command will only match on filenames output from `find` and not their contents.
- Correct answer. It first executes the `find` command to find those files matching the `'*ose.dat'` pattern, which will match on exactly those that end in `ose.dat`, and then `grep` will search those files for “temp” and only report those that don’t contain it, since it’s using the `-v` flag to invert the results.
- Incorrect.
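The distinction between filtering filenames and filtering file contents can be sketched as below. The exact commands from the exercise are not reproduced in these answers, so both forms are assumptions, as are the directory `data` and the file contents.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir data

# Hypothetical files and contents.
echo "mass 342" > data/sucrose.dat
echo "temp 20"  > data/maltose.dat
echo "mass 1"   > data/notes.txt

# grep after a pipe sees only the filenames that find prints, so it
# filters on names, never on what the files contain.
name_filter=$(find data -name '*ose.dat' | grep -v "temp" | sort)

# With $(...), the filenames become arguments to grep, which then reads
# the files themselves. -v selects the lines that do not contain "temp".
content_filter=$(grep -v "temp" $(find data -name '*ose.dat'))

echo "filtering names:";    echo "$name_filter"
echo "filtering contents:"; echo "$content_filter"

cd - >/dev/null && rm -rf "$workdir"
```

When given several files, `grep` prefixes each reported line with the name of the file it came from.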
Additional Exercises
Copying files with new filenames
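- Assuming the output directory is named `copied`:
# Build a date prefix such as 31-12-24 (day-month-year).
today_date=$(date +"%d-%m-%y")
for file in data/*.csv
do
    # Strip the leading directory so only the filename remains.
    base_file=$(basename "$file")
    cp "$file" "copied/$today_date-$base_file"
done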
Filtering our output
- The `Max_temp_jul_F` column is the fourth column in each data file.
- Assuming the input directory is named `copied` and the output directory is named `filtered`:
for file in copied/*.csv
do
    base_file=$(basename "$file")
    # Keep only the fourth comma-separated column.
    cut -d "," -f 4 "$file" > "filtered/$base_file"
done