This is the online version of The Hacker Ways.
Get your copy: epub, mobi and PDF, DRM-free.

§8 — Managing your digital life

When writing or programming you soon feel the need to keep track of your work: being able to go back in time to your book as it was three weeks ago, for example, or seeing the differences between yesterday's working version of a program and today's failing one. Programs that help you do these things are called revision control software, and will be a key part of your toolset. Anything that you create and that will change over time, either as part of the creation process or as part of its working life, should be managed by a revision control system.

The need to keep track of the different versions of files has been felt by programmers since the very beginning of the craft. The first solution cited in Wikipedia dates from 1972 (I was, for the record, two years old), and was written by Marc J. Rochkind. The first revision control program that I actually used was RCS, written back in 1982 by Walter F. Tichy. It spawned CVS, which could work on full projects instead of only individual files, and which in turn inspired Subversion in 2000.

Subversion, or svn, is still in use today by some open source projects. But most of the world is moving over to Git, which is what you need to learn.

Git was written in 2005 by Linus Torvalds, the creator of Linux, in order to manage the source code of the Linux kernel. It is a radical departure from everything that had been done before, and a much better solution in many ways.

Git is not a simple program: it can do many things, and it might be overwhelming. However, normal life with Git does boil down to a handful of commands that you need to know.

Figuring Git out

The most important thing you need to know about Git is that

A cursory look at them seems to suggest that several hundred are actually tutorials

cd ~/changek ; ls
index.html

(Note how we can have more than one command per line if we use a ; to separate them.)

Just as we left it in the previous chapter. We can also see what's in index.html,

cat index.html
Hi there
How's that going?

Bring in some content

We'll be using Bootstrap to help us style our page. Bootstrap is a library; a bunch of files intended to work together in order to provide some functionality to programmers. In this case, the functionality we'll get from it is an easy way to give a professional look to your pages and to make them adapt to different screen sizes.

You may download the library from your browser, but as we are learning to work with the terminal we might as well download from there. The two main command-line programs to download from the web are curl and wget. Most likely you'll have curl installed if you are on OSX, and wget if you are on Linux.

We don't want to litter our working directory with downloads, so let's first build a temporary directory

mkdir ~/tmp

and move to it,

cd ~/tmp

And we are ready to download. Now let me take a small detour. What you would typically do is go to the Bootstrap page, right-click on the download link, copy the address, and paste it in the command line to download. But it is distributed as a zip file, and I want to show you the UNIX way, which is tar. So I have re-packed it, and made it available on my own server. Here's the command line to download it:

curl -Os http://juanreyero.com/bootstrap-3.0.3-dist.tgz

And make sure it came,

ls
bootstrap-3.0.3-dist.tgz
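
If you are not sure which of the two download programs your machine has, a small shell function can pick whichever is installed. This is only a sketch: the name download_here is made up for this example, and it assumes that the quiet options -s (curl) and -q (wget) are what you want.

```shell
# Hypothetical helper: download a URL into the current directory
# with curl if it is installed, and with wget otherwise.
download_here() {
    url=$1
    if command -v curl > /dev/null 2>&1; then
        curl -Os "$url"      # -O keeps the remote file name, -s is silent
    elif command -v wget > /dev/null 2>&1; then
        wget -q "$url"       # -q is quiet; wget keeps the remote name by default
    else
        echo "neither curl nor wget is installed" >&2
        return 1
    fi
}

# Usage (commented out so that pasting the block does not hit the network):
# download_here http://juanreyero.com/bootstrap-3.0.3-dist.tgz
```

The command -v test asks the shell whether a program exists without running it.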

Bundling files with tar

The tar program bundles many files into one, usually named with the extension tar, and extracts files from a tar bundle.

The tgz extension in the example is a very common contraction of the extensions tar and gz. So the file we've just downloaded could have been named

bootstrap-3.0.3-dist.tar.gz

and it would mean the same: that it has been compressed with gzip (hence the .gz) after being packaged with tar. You can uncompress it to get the tar,

gunzip bootstrap-3.0.3-dist.tgz ; ls
bootstrap-3.0.3-dist.tar

You should always check the contents of a tar file before unpacking it. You do it with the tvf options, as in

tar tvf bootstrap-3.0.3-dist.tar | head -n 4
drwxr-xr-x  0 juanre staff       0 Dec  5 17:40 dist/
drwxr-xr-x  0 juanre staff       0 Dec  5 17:40 dist/css/
drwxr-xr-x  0 juanre staff       0 Dec  5 17:40 dist/fonts/
drwxr-xr-x  0 juanre staff       0 Dec  5 17:40 dist/js/

(Note how I piped the output of tar to head, a program that shows the first lines of the input and ignores the rest, so I didn't have to clutter the page too much.)

The content of the tar file looks reasonable, so let's unpack for good. Replacing the t in the options with an x,

tar xvf bootstrap-3.0.3-dist.tar
x dist/
x dist/css/
x dist/fonts/
x dist/js/
x dist/js/bootstrap.js
x dist/js/bootstrap.min.js
x dist/fonts/glyphicons-halflings-regular.eot
x dist/fonts/glyphicons-halflings-regular.svg
x dist/fonts/glyphicons-halflings-regular.ttf
x dist/fonts/glyphicons-halflings-regular.woff
x dist/css/bootstrap-theme.css
x dist/css/bootstrap-theme.min.css
x dist/css/bootstrap.css
x dist/css/bootstrap.min.css

The unpacking of the tar file has created a dist directory,

ls
bootstrap-3.0.3-dist.tar        dist/

The tar program is able to deal with compressed files as well when you add the z option, so you didn't have to uncompress before unpacking. Let's try it out. First compress it back,

gzip bootstrap-3.0.3-dist.tar ; ls
bootstrap-3.0.3-dist.tar.gz     dist/

then check the contents again, but now adding the z option

tar ztvf bootstrap-3.0.3-dist.tar.gz | head -n 4
drwxr-xr-x  0 juanre staff       0 Dec  5 17:40 dist/
drwxr-xr-x  0 juanre staff       0 Dec  5 17:40 dist/css/
drwxr-xr-x  0 juanre staff       0 Dec  5 17:40 dist/fonts/
drwxr-xr-x  0 juanre staff       0 Dec  5 17:40 dist/js/

Creating tar files

You'll certainly want to build tar files. You do it by replacing the x in the options with a c, and by specifying a file name for the bundle. We could, for example, pack the content of our working directory with:

cd ~ ; tar zcvf changek.tgz changek
a changek
a changek/index.html

[Image: xkcd comic about tar]

Summary of tar

The tar program has many more options and interesting use cases, but basic usage is not so bad. You can certainly remember the three main incantations:

  • Create a file bundle with zcvf,
tar zcvf changek.tgz changek
  • Check the contents of a bundle with ztvf
tar ztvf changek.tgz
  • And unpack a bundle with zxvf,
tar zxvf changek.tgz
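
If you want to try the three incantations without touching your real files, the following sketch runs them in a scratch directory. The directory and file names are made up for the example; I have dropped the v option to keep the output short, and I use -C (not covered above) to tell tar where to unpack.

```shell
# Work in a scratch directory so nothing in your home is touched.
work=$(mktemp -d)
cd "$work"
mkdir demo
echo "Hi there" > demo/index.html

tar zcf demo.tgz demo           # create a compressed bundle
listing=$(tar ztf demo.tgz)     # check its contents without unpacking
mkdir unpacked
tar zxf demo.tgz -C unpacked    # unpack inside the unpacked directory
extracted=$(cat unpacked/demo/index.html)

echo "$listing"
echo "$extracted"
```

The file that comes out of the bundle should have the same content as the one that went in.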

Move things in place

After unpacking we have a dist directory with the files that came along when we downloaded bootstrap. Let's move it to its final location,

mv ~/tmp/dist ~/changek/bootstrap ; cd ~/changek ; ls
bootstrap/      index.html

Finding files with find

This is another tool that you'll probably find yourself using all the time. The basic invocation is:

find . -name bootstrap.css
./bootstrap/css/bootstrap.css

The first argument is the directory where you want to search. The -name is the search condition. You can use wildcards in your searches. For example, to find the names that start with bootstrap do:

find . -name bootstrap\*
./bootstrap
./bootstrap/css/bootstrap-theme.css
./bootstrap/css/bootstrap-theme.min.css
./bootstrap/css/bootstrap.css
./bootstrap/css/bootstrap.min.css
./bootstrap/js/bootstrap.js
./bootstrap/js/bootstrap.min.js

Note the \ before the *: it tells the shell not to treat the * that follows as a wildcard, and to pass it untouched to the program being invoked (the find program, in this case). It is called an escape, and it is a trick used throughout the shell. If we had written the * without the escape this is what would have happened:

find . -name bootstrap*
./bootstrap

The shell has expanded the bootstrap* to the existing bootstrap directory, and thus has called find as

find . -name bootstrap

which is not what we wanted.
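
An alternative to the backslash is to wrap the pattern in single quotes, which also keeps the shell from expanding it. The sketch below, run in a scratch directory with made-up empty files, checks that both spellings give find the same pattern.

```shell
work=$(mktemp -d)
cd "$work"
mkdir -p bootstrap/css
touch bootstrap/css/bootstrap.css bootstrap/css/bootstrap.min.css

# Quoted pattern: the shell passes bootstrap* to find untouched.
quoted=$(find . -name 'bootstrap*' | sort)
# Escaped pattern, as in the text: find receives exactly the same argument.
escaped=$(find . -name bootstrap\* | sort)

echo "$quoted"
```

Both searches find the bootstrap directory and the two files inside it.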

We can call find with all sorts of interesting arguments. For example, if we want to limit the search to files we can say

find . -name bootstrap\* -type f
./bootstrap/css/bootstrap-theme.css
./bootstrap/css/bootstrap-theme.min.css
./bootstrap/css/bootstrap.css
./bootstrap/css/bootstrap.min.css
./bootstrap/js/bootstrap.js
./bootstrap/js/bootstrap.min.js

Or we can find the files that have been modified in the last minute,

find . -name bootstrap\* -type f -mtime -1m

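We get nothing, because none of the files has been modified in the last minute. (The -1m form, with a unit after the number, is understood by the find that ships with OSX; GNU find, the one you'll usually have on Linux, rejects it, and the equivalent there is -mmin -1.) Let's force a match by using touch on one of the files. With touch you set the file's access and modification times to now (and you create the file if it didn't exist):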

touch ./bootstrap/js/bootstrap.js

And now search again,

find . -name bootstrap\* -type f -mtime -1m
./bootstrap/js/bootstrap.js
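
Here is the same experiment as a sketch that should behave the same on OSX and on Linux. It assumes a find that supports -mmin (both the GNU and the BSD versions do), and it uses -mmin -2 instead of a one-minute window because the two versions round minutes differently. The file names are made up.

```shell
work=$(mktemp -d)
cd "$work"
echo "old" > stale.js
echo "old" > fresh.js

# Push both modification times back to January 1st, 2000.
touch -t 200001010000 stale.js fresh.js
before=$(find . -type f -mmin -2)

# touch without options sets the times to now.
touch fresh.js
after=$(find . -type f -mmin -2)

echo "before: [$before]"
echo "after:  [$after]"
```

Only the file that was touched again shows up in the second search.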

Looking for differences between files

The diff program returns the difference between two files, using a clever but easy to understand syntax. Let's take two identical files: the index.html file, and an exact copy:

cp index.html another.html ; ls
another.html    bootstrap/      index.html

Let's run diff on them:

diff index.html another.html

Nothing. Good. When two files are identical there is no difference. Remember what was on index.html,

cat index.html
Hi there
How's that going?

Let's append another line in another.html,

echo "Yet another line" >> another.html

and another one, just for fun,

echo "This is the last line" >> another.html

Now check the contents,

cat another.html
Hi there
How's that going?
Yet another line
This is the last line

Nice. Let's check again the output of diff,

diff index.html another.html
2a3,4
> Yet another line
> This is the last line

Here it is. It tells you that, after line 2, lines 3 to 4 have been added, and it lists the new lines. This is something that you'll use all the time to answer questions like did I change this file? Is it the same as that other file?
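
Two more things about diff are worth a quick sketch. Its exit status tells you whether the files differ (0 when they are identical, 1 when they are not), which is handy in scripts. And the -u option prints the unified format, which is the one Git uses when it shows you changes. The file names below are made up.

```shell
work=$(mktemp -d)
cd "$work"
printf 'Hi there\n' > a.txt
cp a.txt b.txt

# Identical files: no output, exit status 0.
same=0
diff a.txt b.txt > /dev/null || same=$?

echo "Yet another line" >> b.txt

# Different files: exit status 1, and -u prints the unified format.
changed=0
diff -u a.txt b.txt || changed=$?

echo "identical: $same, after the edit: $changed"
```

In the unified output the added line is marked with a + at the start.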

Find text in files

The grep program can find text in files. For example, to extract from index.html the line that contains the word "that" you can do

grep that index.html
How's that going?

You can call it with several files, and it will tell you to which file the line or lines it found belong:

grep there *.html
another.html:Hi there
index.html:Hi there

If you want to match words ignoring differences between capital and non-capital letters you can use the -i option,

grep -i yet *.html
another.html:Yet another line
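
Two other grep options that combine well with the ones above, sketched on made-up files: -n prints the line number of each match, and -l prints only the names of the files that contain a match.

```shell
work=$(mktemp -d)
cd "$work"
printf 'Hi there\nHow is that going?\n' > index.html
printf 'Hi there\nYet another line\n' > another.html

# -n prefixes each matching line with its line number.
numbered=$(grep -n another another.html)
# -l lists the files with a match; combined with -i it ignores case.
names=$(grep -il yet *.html)

echo "$numbered"   # 2:Yet another line
echo "$names"      # another.html
```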

Finding words in files of a particular type

This is another problem that pops up very often. Say you want to find which among your Python files (ending in .py) include a particular word, and that your files are spread across several subdirectories. Or, as we are going to do, which among your .html files contain the word "there". Let's first move one of the files to a directory,

mv another.html bootstrap ; ls
bootstrap/      index.html

The first thing we need to do is to find all the .html files, and we know how to do that:

find . -name \*.html
./bootstrap/another.html
./index.html

Now we would like to pipe these results to grep, but we have a problem: the output of find is just text; it happens to represent file names, but if we send it to grep as is, grep will never know. It will think it is plain old text, and it will search for whatever we want to find within it. For example,

find . -name \*.html | grep another
./bootstrap/another.html

We've found the line that contains another, but we've done nothing to the contents of the files. This is useful when you want to find a file whose name contains a word, but now we want something else: we want to peek inside the files.

In order to do that we need another program: xargs, which is kind of tricky: it takes standard input and a program, and arranges things so that what arrives on standard input is passed to that program as arguments (here, the names of the files to search). For example, let's send the name of a file to standard output, to be piped:

ls *.html
index.html

Now we pipe it to xargs, so that it goes to its standard input:

ls *.html | xargs grep -i hi
Hi there

Whatever xargs received in standard input (in this case, the output of ls) it sent as a parameter to the program grep -i hi.
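
A simple way to see what xargs is about to run is to put echo in front of the command: instead of running grep, it prints the command line that xargs has built. In this sketch the two file names are just text fed in with printf; the files do not need to exist.

```shell
# echo stands in for the real command, so we only see what xargs would run.
built=$(printf 'index.html\nanother.html\n' | xargs echo grep -i hi)
echo "$built"   # grep -i hi index.html another.html
```

Each line that arrived on standard input has become one more argument at the end of the command.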

Knowing this, we can refine our incantation so that it does search inside files, as

find . -name \*.html | xargs grep -i hi
./bootstrap/another.html:Hi there
./bootstrap/another.html:This is the last line
./index.html:Hi there

Do you see why it found two lines in ./bootstrap/another.html? Remember that -i stands for ignore case.

It turns out there is another way of running a program on all the files found by find. I think it is messier, so I only use it on the one occasion in which the above command gets messed up: when your file names include spaces (xargs splits its input on whitespace, so a name with a space in it would be seen as two files). You do it with the -exec argument to find, followed by the command, ended in \;. In the place where you want the file names you put {}:

find . -name \*.html -exec grep -i hi {} \;
Hi there
This is the last line
Hi there

This sort of works, but it does not print the file name where the line was found. This is because grep has been called once per file, every time a file was found, instead of one time with all the files as before. And when you call grep with only one file it assumes you know what file you sent, and it does not write it back. In this case we don't know it, because it was find doing the calling, so we ask grep to output the file name as well with the -H option:

find . -name \*.html -exec grep -i -H hi {} \;
./bootstrap/another.html:Hi there
./bootstrap/another.html:This is the last line
./index.html:Hi there

Much better. Another thing to know is that you can usually group arguments. In this case, the -i -H can become -iH, and it should still work:

find . -name \*.html -exec grep -iH hi {} \;
./bootstrap/another.html:Hi there
./bootstrap/another.html:This is the last line
./index.html:Hi there

In fact, this is what we were doing when calling tar (remember the zcvf and zxvf?). But tar is special in that it lets you leave out the - in front of its options.
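
For file names with spaces there are two more variants worth knowing, sketched here on made-up files. The first keeps the pipe to xargs but separates the names with a NUL byte instead of whitespace, using -print0 on the find side and -0 on the xargs side (both are supported by the GNU and BSD versions, as far as I know). The second ends -exec with + instead of \;, so that grep is called with many files at once.

```shell
work=$(mktemp -d)
cd "$work"
echo "Hi there" > "my page.html"
echo "Hi again" > index.html

# -print0 and -0: names are separated by a NUL byte, so spaces are safe.
with_xargs=$(find . -name '*.html' -print0 | xargs -0 grep -iH hi | sort)
# -exec ... {} +: find itself hands many file names to a single grep.
with_exec=$(find . -name '*.html' -exec grep -iH hi {} + | sort)

echo "$with_xargs"
```

Both forms report the match inside my page.html, with the file name intact.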

Looking for help

This section might be a bit overwhelming. Don't worry: you don't have to remember it all. You know how to look for help, and you will develop an intuition that tells you "I am sure there's a way to tell this program to behave like this". For example, I didn't remember about the -H argument to grep, but I knew it had to be there. So I checked in the man page, and there it is. The things that you use all the time —and this will include the find piped to xargs with grep— you will remember without problems.
