This is the online version of The Hacker Ways.
Get your copy: epub, mobi and PDF, DRM-free.

§3 — Basic UNIX tools

Most of the tools you'll be using in the terminal, including those presented in the previous chapter, were designed by the people who created UNIX in the late sixties and early seventies. They did an amazing job. People like Ken Thompson and Dennis Ritchie are heroes in the computing world: the system they created is at the heart of most of the servers that run the internet, and its design is at the core of every OSX and Linux computer. Dennis also created the C programming language, in which most of the rest has been written.

The tools you'll be running are not the ones they wrote; there have been plenty of rewrites over the years. Yours most likely come from the GNU project, the brainchild of Richard Stallman (another legendary programmer), or from BSD UNIX, developed at Berkeley. When looking at the man page of any program, type G to go to the end; you'll usually find some notes about the history of the program and its authors there.

Let's get started. Open the terminal, go to the changek directory that you built in the previous chapter, and check what's in there, just in case.

cd ~/changek ; ls
Makefile                editor.org              index.org               programming.org         speaking-python.org
README.md               git.org                 introduction.org        project.org             terminal.org
computing.org           images/                 manuscript/             res/                    tools.org

(Note how we can have more than one command per line if we use a ; to separate them.)
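The ; runs the next command no matter what happened to the previous one. If you want the second command to run only when the first one worked, there is &&. Here is a small sketch contrasting the two; false is a standard command that always fails, standing in for something like a cd into a directory that doesn't exist, and the variable names are made up:

```shell
# With ; the second command runs even though the first one failed.
semi=$(false ; echo "ran anyway")
echo "$semi"        # prints: ran anyway

# With && the second command runs only if the first one succeeded.
both=$(false && echo "ran anyway")
echo "[$both]"      # prints: [] because echo never ran
```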

Just as we left it in the previous chapter. We can also see what's in index.html,

cat index.html

Bring in some content

We'll be using Bootstrap to help us style our page. Bootstrap is a library: a bunch of files intended to work together in order to provide some functionality to programmers. In this case, the functionality we'll get from it is an easy way to give a professional look to your pages and to make them adapt to different screen sizes.

You could download the library from your browser, but as we are learning to work with the terminal we might as well download it from there. The two main command-line programs to download from the web are curl and wget. Most likely you'll have curl installed if you are on OSX, and wget if you are on Linux.

We don't want to litter our working directory with downloads, so let's first create a temporary directory,

mkdir ~/tmp

and move to it,

cd ~/tmp

And we are ready to download. Now let me take a small detour. What you would typically do is go to the Bootstrap page, right-click on the download link, copy the address, and paste it into the command line to download. But Bootstrap is distributed as a zip file, and I want to show you the UNIX way, which is tar. So I have repacked it and made it available on my own server. Here's the command line to download it:

curl -Os http://juanreyero.com/bootstrap-3.0.3-dist.tgz

And make sure it came,

ls
Makefile                        editor.org                      introduction.org                res/
README.md                       git.org                         manuscript/                     speaking-python.org
bootstrap-3.0.3-dist.tgz        images/                         programming.org                 terminal.org
computing.org                   index.org                       project.org                     tools.org

Bundling files with tar

The tar program bundles many files into one, usually named with the extension .tar, and extracts files from such a bundle.

The tgz extension in the example is a very common contraction of the extensions tar and gz. So the file we've just downloaded could have been named

bootstrap-3.0.3-dist.tar.gz

and it would mean the same: that it has been compressed with gzip (hence the .gz) after being packaged with tar. You can uncompress it to get the tar,

gunzip bootstrap-3.0.3-dist.tgz ; ls
Makefile                        editor.org                      introduction.org                res/
README.md                       git.org                         manuscript/                     speaking-python.org
bootstrap-3.0.3-dist.tar        images/                         programming.org                 terminal.org
computing.org                   index.org                       project.org                     tools.org
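To see the two steps in isolation, here is a minimal sketch that bundles and compresses a tiny made-up directory (demo, with a single demo/css/site.css inside) in a throwaway directory created by mktemp, and then undoes the compression:

```shell
# Work in a throwaway directory so nothing else gets touched.
work=$(mktemp -d)
cd "$work"
mkdir -p demo/css
echo "body {}" > demo/css/site.css

tar cf demo.tar demo      # bundle the directory into demo.tar
gzip demo.tar             # compress it: demo.tar becomes demo.tar.gz
ls                        # lists demo and demo.tar.gz
gunzip demo.tar.gz        # and back: demo.tar.gz becomes demo.tar again
ls                        # lists demo and demo.tar
```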

You should always check the contents of a tar file before unpacking it. You do it with the tvf options (t to list the contents, v for verbose output, and f followed by the name of the file), as in

tar tvf bootstrap-3.0.3-dist.tar | head -n 4
drwxr-xr-x  0 juanre staff       0 Dec  5  2013 dist/
drwxr-xr-x  0 juanre staff       0 Dec  5  2013 dist/css/
drwxr-xr-x  0 juanre staff       0 Dec  5  2013 dist/fonts/
drwxr-xr-x  0 juanre staff       0 Dec  5  2013 dist/js/

(Note how I piped the output of tar to head, a program that shows the first lines of the input and ignores the rest, so I didn't have to clutter the page too much.)
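In case head is new to you, here is a tiny sketch using seq to produce ten numbered lines; tail, its sibling, keeps the last lines instead of the first:

```shell
# seq prints the numbers 1 to 10, one per line; head keeps only the first 4.
first=$(seq 1 10 | head -n 4)
echo "$first"     # 1, 2, 3 and 4, one per line

# tail does the opposite and keeps the last lines.
last=$(seq 1 10 | tail -n 2)
echo "$last"      # 9 and 10, one per line
```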

The contents of the tar file look reasonable, so let's unpack for good by replacing the t in the options with an x:

tar xvf bootstrap-3.0.3-dist.tar
x dist/
x dist/css/
x dist/fonts/
x dist/js/
x dist/js/bootstrap.js
x dist/js/bootstrap.min.js
x dist/fonts/glyphicons-halflings-regular.eot
x dist/fonts/glyphicons-halflings-regular.svg
x dist/fonts/glyphicons-halflings-regular.ttf
x dist/fonts/glyphicons-halflings-regular.woff
x dist/css/bootstrap-theme.css
x dist/css/bootstrap-theme.min.css
x dist/css/bootstrap.css
x dist/css/bootstrap.min.css

The unpacking of the tar file has created a dist directory,

ls
Makefile                        dist/                           index.org                       project.org                     tools.org
README.md                       editor.org                      introduction.org                res/
bootstrap-3.0.3-dist.tar        git.org                         manuscript/                     speaking-python.org
computing.org                   images/                         programming.org                 terminal.org

The tar program can also deal with compressed files when you add the z option, so you don't need to uncompress before unpacking. Let's try it out. First compress it back,

gzip bootstrap-3.0.3-dist.tar ; ls
Makefile                        dist/                           index.org                       project.org                     tools.org
README.md                       editor.org                      introduction.org                res/
bootstrap-3.0.3-dist.tar.gz     git.org                         manuscript/                     speaking-python.org
computing.org                   images/                         programming.org                 terminal.org

then check the contents again, but now adding the z option

tar ztvf bootstrap-3.0.3-dist.tar.gz | head -n 4
drwxr-xr-x  0 juanre staff       0 Dec  5  2013 dist/
drwxr-xr-x  0 juanre staff       0 Dec  5  2013 dist/css/
drwxr-xr-x  0 juanre staff       0 Dec  5  2013 dist/fonts/
drwxr-xr-x  0 juanre staff       0 Dec  5  2013 dist/js/
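The z option is just a shortcut. As a sketch, assuming a tar that reads the archive from standard input when given f - (both GNU tar and the BSD tar on OSX do), the two listings below should be identical; demo and demo.tgz are made-up names:

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p demo/js
echo "// empty" > demo/js/app.js
tar zcf demo.tgz demo                        # create a gzip-compressed bundle

with_z=$(tar ztf demo.tgz)                   # let tar decompress on the fly
by_hand=$(gunzip -c demo.tgz | tar tf -)     # decompress to stdout; tar reads stdin
[ "$with_z" = "$by_hand" ] && echo "same listing"
```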

Creating tar files

You'll certainly want to build your own tar files. You do it by replacing the x in the options with a c (for create) and by specifying a file name for the bundle. We could, for example, pack the contents of our working directory with:

cd ~ ; tar zcvf changek.tgz changek
(Image: the xkcd comic about tar.)

Summary of tar

The tar program has many more options and interesting use cases, but basic usage is not so bad. You can certainly remember the three main incantations:

  • Create a file bundle with zcvf,
tar zcvf changek.tgz changek
  • Check the contents of a bundle with ztvf
tar ztvf changek.tgz
  • And unpack a bundle with zxvf,
tar zxvf changek.tgz
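Putting the three incantations together, here is a sketch of a full round trip on a made-up project directory inside a throwaway directory, so you can try it without touching your real files:

```shell
work=$(mktemp -d)
cd "$work"
mkdir project
echo "hello" > project/notes.txt

tar zcvf project.tgz project     # create the bundle
tar ztvf project.tgz             # check what is inside
mkdir restored
cd restored
tar zxvf ../project.tgz          # unpack into the current directory
cat project/notes.txt            # prints: hello
```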

Move things in place

After unpacking we have a dist directory with the files that came with the Bootstrap download. Let's move it to its final location,

mv ~/tmp/dist ~/changek/bootstrap ; cd ~/changek ; ls
Makefile                        computing.org                   images/                         programming.org                 terminal.org
README.md                       dist/                           index.org                       project.org                     tools.org
bootstrap-3.0.3-dist.tar.gz     editor.org                      introduction.org                res/
changek.tgz                     git.org                         manuscript/                     speaking-python.org

Finding files with find

This is another tool that you'll probably find yourself using all the time. The basic invocation is:

find . -name bootstrap.css
./dist/css/bootstrap.css

The first argument is the directory where you want to search. The -name option sets the search condition. You can use wildcards in your searches; for example, to find the names that start with bootstrap, do:

find . -name bootstrap\*
./bootstrap-3.0.3-dist.tar.gz
./dist/css/bootstrap-theme.css
./dist/css/bootstrap-theme.min.css
./dist/css/bootstrap.css
./dist/css/bootstrap.min.css
./dist/js/bootstrap.js
./dist/js/bootstrap.min.js

Note the \ before the *: it tells the shell not to treat the following * as a wildcard, and to pass it untouched to the program being invoked (the find program, in this case). It is called an escape, and it is a trick used throughout UNIX. If we had written the * without the escape, this is what would have happened:

find . -name bootstrap*
./bootstrap-3.0.3-dist.tar.gz

The shell has expanded the bootstrap* to the existing bootstrap directory, and thus has called find as

find . -name bootstrap

which is not what we wanted.
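You can watch the shell do this expansion by using echo, which simply prints its arguments. In this sketch the two bootstrap-*.txt files are made up, and single quotes are shown as an alternative to the backslash:

```shell
work=$(mktemp -d)
cd "$work"
touch bootstrap-one.txt bootstrap-two.txt

expanded=$(echo bootstrap*)      # the shell expands the pattern before echo runs
escaped=$(echo bootstrap\*)      # the backslash keeps the * literal
quoted=$(echo 'bootstrap*')      # single quotes do the same job
echo "$expanded"                 # bootstrap-one.txt bootstrap-two.txt
echo "$escaped"                  # bootstrap*
echo "$quoted"                   # bootstrap*
```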

We can call find with all sorts of interesting arguments. For example, if we want to limit the search to regular files (leaving out directories) we can add -type f:

find . -name bootstrap\* -type f
./bootstrap-3.0.3-dist.tar.gz
./dist/css/bootstrap-theme.css
./dist/css/bootstrap-theme.min.css
./dist/css/bootstrap.css
./dist/css/bootstrap.min.css
./dist/js/bootstrap.js
./dist/js/bootstrap.min.js

Or we can find the files that have been modified in the last minute. (The -1m suffix is understood by the BSD find that ships with OSX; GNU find, the one you'll usually have on Linux, counts -mtime in days only, so there you would write -mmin -1 instead.)

find . -name bootstrap\* -type f -mtime -1m

We get nothing, because none of the files has been modified in the last minute. Let's force it by using touch on one of the files. With touch you set the file's access and modification times to now (and you create the file if it didn't exist):

touch ./bootstrap/js/bootstrap.js

And now search again,

find . -name bootstrap\* -type f -mtime -1m
./bootstrap/js/bootstrap.js
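If you want to experiment without touching your Bootstrap files, here is a sketch with two made-up files, one of them given an old timestamp with touch -t. It uses -mmin, which both GNU find and the BSD find on OSX understand, and asks for files changed in the last five minutes rather than one, just to leave some margin:

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p js
touch -t 202001010000 js/old.js   # create it with an old timestamp (Jan 1, 2020, 00:00)
touch js/new.js                   # created with the current time

# -type f keeps regular files only; -mmin -5 keeps those modified less than 5 minutes ago.
recent=$(find . -type f -mmin -5)
echo "$recent"                    # ./js/new.js
```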

Looking for differences between files

The diff program shows the differences between two files, using a clever but easy-to-understand syntax. Let's take two identical files: the index.html file and an exact copy:

cp index.html another.html ; ls
Makefile                        computing.org                   images/                         programming.org                 terminal.org
README.md                       dist/                           index.org                       project.org                     tools.org
bootstrap-3.0.3-dist.tar.gz     editor.org                      introduction.org                res/
changek.tgz                     git.org                         manuscript/                     speaking-python.org

Let's run diff on them:

diff index.html another.html

Nothing. Good: when two files are identical, diff has nothing to report. Remember what was in index.html,

cat index.html

Let's append another line to another.html,

echo "Yet another line" >> another.html

and another one, just for fun,

echo "This is the last line" >> another.html

Now check the contents,

cat another.html
Yet another line
This is the last line

Nice. Let's check again the output of diff,

diff index.html another.html
2a3,4
> Yet another line
> This is the last line

Here it is. It tells you that, after line 2, lines 3 to 4 have been added, and it lists the new lines. This is something that you'll use all the time to answer questions like "did I change this file?" or "is it the same as that other file?"
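Here is the same experiment as a self-contained sketch in a throwaway directory, with made-up file names. The first line of the output, 2a3,4, reads: after line 2 of the first file, add lines 3 to 4 of the second one.

```shell
work=$(mktemp -d)
cd "$work"
printf 'first line\nsecond line\n' > original.txt
cp original.txt copy.txt

same=$(diff original.txt copy.txt)      # identical files: diff prints nothing
printf 'third line\nfourth line\n' >> copy.txt
changes=$(diff original.txt copy.txt)   # now there is something to report
echo "$changes"
# 2a3,4
# > third line
# > fourth line
```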

Find text in files

The grep program can find text in files. For example, to extract from index.html the lines that contain the word "that", you can do

grep that index.html

You can call it with several files, and it will tell you to which file the line or lines it found belong:

grep there *.html

If you want to match words ignoring differences between capital and non-capital letters you can use the -i option,

grep -i yet *.html
Yet another line
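When you give grep several files it prefixes each match with the name of the file it came from. A sketch with three made-up files, which also shows how -i changes what is found:

```shell
work=$(mktemp -d)
cd "$work"
echo "Hello there" > one.html
echo "Nothing here" > two.html
echo "There it is" > three.html

exact=$(grep there *.html)        # case matters: "There" is not a match
anycase=$(grep -i there *.html)   # with -i it is
echo "$exact"
# one.html:Hello there
echo "$anycase"
# one.html:Hello there
# three.html:There it is
```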

Finding words in files of a particular type

This is another problem that pops up very often. Say you want to find which of your Python files (ending in .py) include a particular word, and that your files are spread across several subdirectories. Or, as we are going to do, which of your .html files contain the word there. Let's first move one of the files to a subdirectory,

mv another.html bootstrap ; ls
Makefile                        changek.tgz                     git.org                         manuscript/                     speaking-python.org
README.md                       computing.org                   images/                         programming.org                 terminal.org
bootstrap                       dist/                           index.org                       project.org                     tools.org
bootstrap-3.0.3-dist.tar.gz     editor.org                      introduction.org                res/

The first thing we need to do is to find all the .html files, and we know how to do that:

find . -name \*.html
./res/subscribe.html

Now we would like to pipe these results to grep, but we have a problem: the output of find is just text. It happens to represent file names, but if we send it to grep as is, grep will never know. It will treat it as plain old text and search for whatever we want to find within it. For example,

find . -name \*.html | grep another

We've found the line that contains another, but we've done nothing to the contents of the files. This is useful when you want to find a file whose name contains a word, but now we want something else: we want to peek inside the files.

In order to do that we need another program: xargs, which is a bit tricky. It reads words from its standard input and passes them as arguments to the program you give it, so that they become, in our case, the files that program works on. For example, let's send the name of a file to standard output, to be piped:

ls *.html

Now we pipe it to xargs, so that it goes to its standard input:

ls *.html | xargs grep -i hi

xargs took whatever it received on standard input (in this case, the output of ls) and passed it as arguments to the command grep -i hi.
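You can see what xargs does without any files at all. In this sketch echo plays the role of grep, and the words are made up; -n 1, a standard xargs option, asks it to call the command once per word instead of once with all of them:

```shell
# All the words end up as arguments of a single echo.
all=$(printf 'alpha\nbeta\ngamma\n' | xargs echo Got:)
echo "$all"      # Got: alpha beta gamma

# With -n 1, echo is called once per word.
each=$(printf 'alpha\nbeta\ngamma\n' | xargs -n 1 echo Got:)
echo "$each"     # Got: alpha, Got: beta and Got: gamma, one per line
```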

Knowing this, we can refine our incantation so that it does search inside files, as

find . -name \*.html | xargs grep -i hi

/* Add your own MailChimp form style overrides in your site stylesheet or in this style block.
           We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */

Do you see why it found two lines in ./bootstrap/another.html? Remember that -i stands for ignore case.
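The find piped to xargs combination gets confused by file names that contain spaces. Besides the -exec approach, both GNU and BSD find offer -print0, to be paired with xargs -0. A sketch with made-up files, one of which has a space in its name:

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p site/pages
echo "hi there" > site/index.html
echo "Hi again" > "site/pages/about us.html"   # note the space in the name

# Plain xargs splits "about us.html" into two words, so grep complains
# that site/pages/about and us.html do not exist.
find site -name \*.html | xargs grep -i hi

# -print0 ends each name with a NUL byte and xargs -0 splits only on NULs,
# so names with spaces arrive in one piece.
find site -name \*.html -print0 | xargs -0 grep -i hi

# The same, listing only the names of the matching files (-l).
matches=$(find site -name \*.html -print0 | xargs -0 grep -il hi)
```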

It turns out there is another way of running a program on all the files found by find. I think it is messier, so I only use it in the one case in which the command above breaks: when your file names include spaces. You do it with the -exec argument to find, followed by the command and terminated by \;. Where you want the file name to go, you put {}:

find . -name \*.html -exec grep -i hi {} \;

/* Add your own MailChimp form style overrides in your site stylesheet or in this style block.
           We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */

This sort of works, but it does not print the name of the file where each line was found. This is because grep has been called once per file, every time a file was found, instead of once with all the files as before. When you call grep with only one file it assumes you know which file you sent, and it does not print the name. In this case we don't know it, because find did the calling, so we ask grep to output the file name as well with the -H option:

find . -name \*.html -exec grep -i -H hi {} \;

./res/subscribe.html:   /* Add your own MailChimp form style overrides in your site stylesheet or in this style block.
./res/subscribe.html:      We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */

Much better. Another thing to know is that you can usually group arguments. In this case, the -i -H can become -iH, and it should still work:

find . -name \*.html -exec grep -iH hi {} \;

./res/subscribe.html:   /* Add your own MailChimp form style overrides in your site stylesheet or in this style block.
./res/subscribe.html:      We recommend moving this block and the preceding CSS link to the HEAD of your HTML file. */
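A sketch of the -exec approach in a throwaway directory with made-up files, next to a variant you may also come across: ending -exec with {} + instead of {} \; makes find pass all the names to a single grep call, much as xargs does, while still coping with spaces:

```shell
work=$(mktemp -d)
cd "$work"
echo "hi" > a.html
echo "Hi" > "b c.html"

# {} \; runs grep once per file found; -H makes it print each file name anyway.
find . -name \*.html -exec grep -iH hi {} \;

# {} + hands all the names to a single grep, and names with spaces arrive intact.
# With several files, grep prints the names by itself.
together=$(find . -name \*.html -exec grep -i hi {} +)
echo "$together"
```

If find only turns up one file, the {} + form calls grep with a single name, so you would still want -H there.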

In fact, this is what we were doing when calling tar (remember zcvf and zxvf?). But tar is special in that it lets you omit the - before its options.

Looking for help

This section might be a bit overwhelming. Don't worry: you don't have to remember it all. You know how to look for help, and you will develop an intuition that tells you "I am sure there's a way to tell this program to behave like this". For example, I didn't remember the -H option to grep, but I knew it had to be there, so I checked the man page, and there it was. The things that you use all the time (and this will include find piped to xargs with grep) you will remember without problems.
