Most of the tools you’ll be using in the terminal, including those presented in the previous chapter, were designed by the people who created UNIX back in the late sixties and early seventies. They did an amazing job. People like Ken Thompson and Dennis Ritchie are heroes in the computing world: the system they created is at the heart of most servers running the internet, and at the core of all macOS and Linux computers. Dennis also invented the C programming language, with which most of the rest has been written.
The tools you’ll be running are not the ones they wrote; there have been plenty of rewrites over the years. Yours most likely come from the GNU project, the brainchild of Richard Stallman (another legendary programmer), or from the BSD UNIX from Berkeley. When looking at the man page of any program, type G to go to the end, and you’ll usually see something about the history of the program and its authors there.
Let’s get started. Open the terminal and go to the changek directory that you built in the previous chapter. And check out what’s in there, just in case.
cd ~/changek ; ls
index.html
(Note how we can have more than one command per line if we use a ; to separate them.)
Just as we left it in the previous chapter. We can also see what’s in index.html,
cat index.html
Hi there
How's that going?
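Going back to the ; separator for a moment: a close relative is &&, which runs the second command only if the first one succeeded. The sketch below contrasts the two. The directory /no-such-dir is a made-up name that is assumed not to exist on your machine.

```shell
# With ';' the second command runs even though the cd failed.
with_semicolon=$(cd /no-such-dir 2>/dev/null ; echo ran)

# With '&&' the second command is skipped because the cd failed.
with_and=$(cd /no-such-dir 2>/dev/null && echo ran)

echo "semicolon: '$with_semicolon'  and-and: '$with_and'"
# prints: semicolon: 'ran'  and-and: ''
```

This is why cd somewhere && ls is often the safer habit: you never list (or delete) things in the wrong directory by accident.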
Downloading files from the web
The two main command-line programs to download from the web are curl and wget. Most likely you’ll have curl installed if you are on macOS, and wget if you are on Linux. Both work similarly.
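Since you may have one or the other, a small wrapper can pick whichever is installed. This is only a sketch: the function name fetch is made up for illustration, and it assumes that wget’s default behaviour (follow redirects, keep the remote file name) is a good enough stand-in for the curl options used in this chapter.

```shell
# fetch URL: download URL into the current directory, keeping its file name.
# 'fetch' is a hypothetical helper name, not a standard command.
fetch() {
    if command -v curl >/dev/null 2>&1; then
        # -L follow redirects, -O keep the remote file name, -s silent
        curl -LOs "$1"
    elif command -v wget >/dev/null 2>&1; then
        # -q quiet; wget follows redirects and keeps the remote name by default
        wget -q "$1"
    else
        echo "neither curl nor wget is installed" >&2
        return 1
    fi
}

# Usage (needs network access):
# fetch https://raw.githubusercontent.com/datasets/population/master/data/population.csv
```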
Let’s practice by downloading a sample data file. We don’t want to litter our working directory with downloads, so let’s first build a temporary directory:
mkdir ~/tmp
and move to it:
cd ~/tmp
Now let’s download a sample CSV file:
curl -LOs https://raw.githubusercontent.com/datasets/population/master/data/population.csv
The options mean:
-L: follows redirects
-O: saves with the original filename
-s: silent mode
And make sure it came:
ls -lh population.csv
-rw-r--r-- 1 user staff 1.2M Jan 15 10:00 population.csv
You can peek at the contents:
head population.csv
Country Name,Country Code,Year,Value
Arab World,ARB,1960,92490932
Caribbean small states,CSS,1960,4190810
Central Europe and the Baltics,CEB,1960,91401874
East Asia & Pacific (all income levels),EAS,1960,1042550110
East Asia & Pacific (developing only),EAP,1960,896708266
Euro area,EMU,1960,260385009
Europe & Central Asia (all income levels),ECS,1960,667039992
Europe & Central Asia (developing only),ECA,1960,168260282
European Union,EUU,1960,406749670
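Two quick variations on peeking, as a sketch: printing only the header, and counting the data rows. To keep it runnable without a download, it builds a tiny stand-in file from a few of the rows shown above; the temporary file is an assumption of this example, not part of the chapter’s setup.

```shell
# Build a tiny stand-in for population.csv (rows copied from the output above).
sample=$(mktemp)
printf '%s\n' \
    'Country Name,Country Code,Year,Value' \
    'Arab World,ARB,1960,92490932' \
    'Caribbean small states,CSS,1960,4190810' \
    'Euro area,EMU,1960,260385009' > "$sample"

# head -n 1 prints only the first line: the header.
header=$(head -n 1 "$sample")

# tail -n +2 prints from line 2 onwards, skipping the header;
# wc -l counts the remaining lines (tr strips the padding some systems add).
rows=$(tail -n +2 "$sample" | wc -l | tr -d ' ')

echo "header: $header"
echo "data rows: $rows"
rm -f "$sample"
```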
Bundling files with tar
The tar program bundles many files into one, usually named with the .tar extension, and extracts files from a tar bundle.
Files can be compressed in various formats. The .tar.gz or .tgz extension indicates a tar archive that has been compressed with gzip. The .zip format is another common archive format, which bundles and compresses in one step. Let’s create an example tar archive:
echo "example content" > example.txt
tar czf example.tar.gz example.txt
You should always check the contents of a tar file before unpacking it. You do it with the t option (plus v for a detailed listing), as in
tar tzvf example.tar.gz
-rw-r--r-- 0 user staff 16 Jan 15 10:00 example.txt
(With a bigger archive you can pipe the output of tar to head, a program that shows the first lines of its input and ignores the rest, so the listing doesn’t clutter the screen.)
The content of the tar file looks good, so let’s unpack it, replacing the t in the options with an x:
tar xzf example.tar.gz
(This always makes me think of the classic xkcd joke about tar.)
For zip files, you would use the unzip command:
unzip example.zip
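The habit of checking before unpacking can also be scripted. The sketch below lists the names stored in an archive and refuses to unpack if any of them is an absolute path or climbs out with .. (the two things that would let an archive write outside the directory you expect). The temporary directory and the unpacked folder name are made up for this example.

```shell
work=$(mktemp -d)
echo "example content" > "$work/example.txt"
# -C DIR makes tar change to DIR first, so the archive stores the bare name.
tar -czf "$work/example.tar.gz" -C "$work" example.txt

# List the member names and look for absolute paths or '..' components.
if tar -tzf "$work/example.tar.gz" | grep -E '^/|(^|/)\.\.(/|$)' >/dev/null; then
    verdict="suspicious"
else
    verdict="ok"
    # Unpack into a directory of its own rather than the current one.
    mkdir "$work/unpacked"
    tar -xzf "$work/example.tar.gz" -C "$work/unpacked"
fi
echo "archive looks $verdict"
```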
Creating tar files
You’ll certainly want to build tar files. You do it by replacing the x in the options with a c, and by specifying a file name for the bundle. We could, for example, pack the content of our working directory with:
cd ~ ; tar zcvf changek.tgz changek
a changek
a changek/index.html
Summary of tar
The tar program has many more options and interesting use cases, but basic usage is not so bad. You can certainly remember the three main incantations:
- Create a file bundle with zcvf,
- Check the contents of a bundle with ztvf,
- And unpack a bundle with zxvf.
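Here are the three incantations run end to end on a throwaway directory, as a sketch. The directory names site and restored are invented for the example, and the v is left out so the commands stay quiet.

```shell
work=$(mktemp -d)
mkdir "$work/site"
echo "Hi there" > "$work/site/index.html"

# Create: c = create, z = gzip, f = name of the archive file.
tar -czf "$work/site.tgz" -C "$work" site

# Check: t = list the contents without unpacking anything.
listing=$(tar -tzf "$work/site.tgz")
echo "$listing"

# Unpack: x = extract, here into a separate directory chosen with -C.
mkdir "$work/restored"
tar -xzf "$work/site.tgz" -C "$work/restored"

# diff -r compares two directory trees; it succeeds only if they match.
diff -r "$work/site" "$work/restored/site" && roundtrip="identical"
echo "round trip: $roundtrip"
```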
Move things in place
After unpacking an archive, you typically want to move the extracted files to their final location:
mv ~/tmp/extracted-files ~/changek/data ; cd ~/changek ; ls
data index.html
Finding files with find
This is another tool that you’ll probably find yourself using all the time. The basic invocation is:
find . -name "*.txt"
The first argument is the directory where you want to search. The -name option is the search condition: a pattern that the file name must match. You can use wildcards in your searches. For example, to find all Python files:
find . -name "*.py"
./scripts/setup.py
./scripts/test.py
When using wildcards with find, it’s safer to quote them:
find . -name "*.py" # Good - quotes protect the wildcard
find . -name *.py # Bad - shell might expand this
Without quotes, the shell might expand *.py to actual filenames in the current directory before find sees it, which is not what we wanted.
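To see the difference for yourself, here is a small sketch in a throwaway directory. The file names setup.py and scripts/test.py are just placeholders; the point is that one Python file sits in the current directory and another one level down.

```shell
work=$(mktemp -d)
mkdir "$work/scripts"
touch "$work/setup.py" "$work/scripts/test.py"

# Quoted: find receives the pattern *.py itself and matches at every depth.
quoted=$(cd "$work" && find . -name "*.py" | wc -l | tr -d ' ')

# Unquoted: the shell first replaces *.py with setup.py (the only match in
# the current directory), so find only ever looks for that one name.
unquoted=$(cd "$work" && find . -name *.py | wc -l | tr -d ' ')

echo "quoted pattern found $quoted files, unquoted found $unquoted"
# prints: quoted pattern found 2 files, unquoted found 1
```

With two or more matching files in the current directory the unquoted version gets worse: find receives several names after -name and stops with an error.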
We can call find with all sorts of interesting arguments. For example, if we want to limit the search to regular files (leaving out directories), we can say
find . -name "*.txt" -type f
Or we can find the files that have been modified in the last minute (the 1m unit is BSD and macOS syntax; GNU find on Linux wants -mmin -1 instead),
find . -name "*.txt" -type f -mtime -1m
We get nothing, because none of the files has been modified in the last minute. Let’s force it by using touch on one of the files. With touch you set the file’s access and modification times to now (and you create the file if it didn’t exist):
touch ./example.txt
And now search again,
find . -name "*.txt" -type f -mtime -1m
./example.txt
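If you move between macOS and Linux, the -mmin test is understood by both the GNU and BSD versions of find. The sketch below uses a five-minute window rather than one minute, because the two implementations round minutes slightly differently; the file names old.txt and new.txt are made up for the example.

```shell
work=$(mktemp -d)
# touch -t sets an explicit timestamp (here 1 January 2020, 00:00): an old file.
touch -t 202001010000 "$work/old.txt"
# A plain touch stamps the file with the current time: a fresh file.
touch "$work/new.txt"

# -mmin -5 matches files modified within roughly the last five minutes.
recent=$(cd "$work" && find . -name "*.txt" -type f -mmin -5)
echo "$recent"
# prints: ./new.txt
```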
Looking for differences between files
The diff program returns the difference between two files, using a clever but easy to understand syntax. Let’s take two identical files: the index.html file, and an exact copy:
cp index.html another.html ; ls
another.html data index.html
Let’s run diff on them:
diff index.html another.html
Nothing. Good. When two files are identical there is no difference. Remember what was in index.html,
cat index.html
Hi there
How's that going?
Let’s append another line to another.html,
echo "Yet another line" >> another.html
and another one, just for fun,
echo "This is the last line" >> another.html
Now check the contents,
cat another.html
Hi there
How's that going?
Yet another line
This is the last line
Nice. Let’s check the output of diff again,
diff index.html another.html
2a3,4
> Yet another line
> This is the last line
Here it is. It tells you that, after line 2 of the first file, lines 3 to 4 of the second file have been added, and it lists the new lines. This is something that you’ll use all the time to answer questions like: did I change this file? Is it the same as that other file?
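Two more things about diff that are handy in practice, shown as a sketch on throwaway copies of our two files: its exit status tells you whether the files differ, so you can use it in an if, and the -u option prints the more readable unified format.

```shell
work=$(mktemp -d)
printf "Hi there\nHow's that going?\n" > "$work/index.html"
cp "$work/index.html" "$work/another.html"
echo "Yet another line" >> "$work/another.html"

# diff exits with 0 when the files match and 1 when they differ,
# so it can drive an if statement directly.
if diff "$work/index.html" "$work/index.html" > /dev/null; then
    same_check="same"
fi
if ! diff "$work/index.html" "$work/another.html" > /dev/null; then
    changed_check="different"
fi
echo "$same_check / $changed_check"

# -u prints the unified format: removed lines start with -, added ones with +.
# The grep keeps the added lines and drops the '+++' header line.
added=$(diff -u "$work/index.html" "$work/another.html" | grep '^+[^+]')
echo "$added"
# prints: +Yet another line
```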
Find text in files
The grep program can find text in files. For example, to extract from index.html the line that contains the word that you can do
grep that index.html
How's that going?
You can call it with several files, and it will tell you to which file the line or lines it found belong:
grep there *.html
another.html:Hi there
index.html:Hi there
If you want to match words ignoring differences between capital and non-capital letters you can use the -i option,
grep -i yet *.html
another.html:Yet another line
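A few more grep options that I find myself reaching for, sketched on throwaway copies of our two files: -n shows line numbers, -c counts matching lines, and -l prints only the names of the files that match.

```shell
work=$(mktemp -d)
printf "Hi there\nHow's that going?\n" > "$work/index.html"
printf "Hi there\nYet another line\nThis is the last line\n" > "$work/another.html"

# -n prefixes each match with its line number.
numbered=$(grep -n line "$work/another.html")
echo "$numbered"

# -c prints how many lines matched instead of the lines themselves
# ("Hi there" and "This is the last line" both contain hi, ignoring case).
count=$(grep -c -i hi "$work/another.html")
echo "$count"

# -l prints only the names of the files that contain a match.
files=$(cd "$work" && grep -l another *.html)
echo "$files"
```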
Finding words in files of a particular type
This is another problem that comes up very often. Say you want to find which among your Python files (ending in .py) include a particular word, and that your files are spread over several subdirectories. (We’ll learn more about Python in a later chapter.) Or, as we are going to do, which among your .html files contain the word there. Let’s first move one of the files to a directory,
mv another.html data ; ls
data index.html
The first thing we need to do is to find all the .html files, and we know how to do that:
find . -name \*.html
./data/another.html
./index.html
Now we would like to pipe these results to grep, but we have a problem: the output of find is just text; it happens to represent file names, but if we send it to grep as is, grep will never know. It will think it is plain old text, and it will search for whatever we want to find within it. For example,
find . -name \*.html | grep another
./data/another.html
We’ve found the line that contains another, but we’ve done nothing to the contents of the files. This is useful when you want to find a file whose name contains a word, but now we want something else: we want to peek inside the files.
In order to do that we need another program: xargs, which is kind of tricky: it takes standard input and a program, and arranges things so that whatever arrives on standard input is passed to that program as arguments (here, file names). For example, let’s send the name of a file to standard output, to be piped:
ls *.html
index.html
Now we pipe it to xargs, so that it goes to its standard input:
ls *.html | xargs grep -i hi
Hi there
Whatever xargs received in standard input (in this case, the output of ls) it sent as a parameter to the program grep -i hi.
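A trick that helps to demystify xargs: put echo in front of the command, and xargs will print the command line it would have run instead of running it. A tiny sketch, with the two file names fed in by hand:

```shell
# printf sends the two names, one per line, into the pipe; xargs appends them
# to "echo grep -i hi", so echo prints the full command line that was built.
built=$(printf '%s\n' index.html another.html | xargs echo grep -i hi)
echo "$built"
# prints: grep -i hi index.html another.html
```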
Knowing this, we can refine our incantation so that it does search inside files, as
find . -name \*.html | xargs grep -i hi
./data/another.html:Hi there
./data/another.html:This is the last line
./index.html:Hi there
Do you see why it found two lines in ./data/another.html? Remember that -i stands for ignore case.
It turns out there is another way of running a program on all the files found by find. I think it is messier, so I only use it on the one occasion in which the above command gets messed up: when your file names include spaces. You do it with the -exec argument to find, followed by the command, ended in \;. In the place where you want the file names you put {}:
find . -name \*.html -exec grep -i hi {} \;
Hi there
This is the last line
Hi there
This sort of works, but it does not print the file name where the line was found. This is because grep has been called once per file, every time a file was found, instead of one time with all the files as before. And when you call grep with only one file it assumes you know what file you sent, and it does not write it back. In this case we don’t know it, because it was find doing the calling, so we ask grep to output the file name as well with the -H option:
find . -name \*.html -exec grep -i -H hi {} \;
./data/another.html:Hi there
./data/another.html:This is the last line
./index.html:Hi there
Much better. Another thing to know is that you can usually group arguments. In this case, the -i -H can become -iH, and it should still work:
find . -name \*.html -exec grep -iH hi {} \;
./data/another.html:Hi there
./data/another.html:This is the last line
./index.html:Hi there
In fact, this is what we were doing when calling tar (remember the zcvf and zxvf?). But tar is special in that it lets you leave out the - in front of its options.
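For the file-names-with-spaces problem there are two more variations worth knowing, sketched below on a throwaway file whose name, my page.html, is invented for the example. Both rely on features found in the GNU and BSD versions of these tools; if your system has something more exotic, check its man pages first.

```shell
work=$(mktemp -d)
echo "Hi there" > "$work/my page.html"   # a file name with a space in it

# -print0 ends each name with a NUL byte instead of a newline, and
# xargs -0 splits on NUL, so the space no longer breaks the name in two.
via_xargs=$(cd "$work" && find . -name "*.html" -print0 | xargs -0 grep -iH hi)
echo "$via_xargs"

# Ending -exec with + instead of \; hands many file names to a single grep call.
via_exec=$(cd "$work" && find . -name "*.html" -exec grep -iH hi {} +)
echo "$via_exec"
# both print: ./my page.html:Hi there
```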
Looking for help
This section might be a bit overwhelming. Don’t worry: you don’t have to remember it all. You know how to look for help, and you will develop an intuition that tells you “I am sure there’s a way to tell this program to behave like this”. For example, I didn’t remember about the -H argument to grep, but I knew it had to be there. So I checked in the man page, and there it is. The things that you use all the time (and this will include find piped to xargs with grep) you will remember without problems.