Do You Speak Tar?
For a lot of people, the GNU tar command’s options seem obscure and hard to
use. The most common ones are almost always written in their short form,
grouped in the same order and often without a leading hyphen, e.g.
tar xzvf archive.tgz and not tar -v -z -x -f archive.tgz. Additionally, tar
doesn’t work without any “option”.
These options, or rather these commands, can be seen as a (small) language that
you can learn to speak, write or read. Each command has its own
meaning that sometimes depends on which other commands are used with it.
The Grammar
tar’s sentences start with a verb. There’s no subject, because you’re giving
an order to tar. This verb is followed by zero or more modifiers that give
more context to the action. The last part is the object(s) on which the action
is performed. Spaces are not needed between tar’s words because they all
consist of one letter.
Actions
The two most common actions are “create” (c) and “extract” (x). The first one
is used to create an archive from some files; the second is used to extract
that archive in order to get back those files.
All tar implementations support one more action: “list” (t), which lists an
archive’s content without extracting it. Some implementations support two
variants of “create”: “append” (r) and “update” (u). The former appends files
to an existing archive; the latter adds files to the archive only when they are
more recent than the copy already in it.
We now know the main tar actions, but unfortunately we can’t do much with them
without knowing how to apply them to an object. Let’s dive into objects first;
we’ll see the modifiers later.
Objects
tar has a very limited set of objects: archives. Each tar command operates on
one archive, which is given by f (for “file”) followed by its path. Files added
to or extracted from archives are simply given as extra arguments to these
commands without needing any special word.
We’re now ready to write our first meaningful sentences.
“Hey tar, please create an archive file foo.tar with file1 and file2” is written as tar cf foo.tar file1 file2.
“Extract archive file foo.tar” is written as tar xf foo.tar.
“List archive file foo.tar” is written as tar tf foo.tar. You get the idea.
Note that actions like “extract” or “list” accept additional arguments for the
file patterns you want to extract/list. Say you have a big archive from which
you only want to extract one important.txt file. Just give this information to
tar and it’ll kindly extract it for you:
tar xf big-archive.tar important.txt
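To see these first sentences in action, here is a small sketch you can paste into a shell. The scratch directory, the extracted directory and the file contents are made up for the illustration; only the tar sentences themselves come from the text above.

```shell
set -eu
workdir=$(mktemp -d)   # hypothetical scratch directory, so nothing real is touched
cd "$workdir"

# Three throwaway files to archive (contents are arbitrary).
printf 'one\n' > file1
printf 'two\n' > file2
printf 'keep me\n' > important.txt

# "Create archive file foo.tar with file1, file2 and important.txt"
tar cf foo.tar file1 file2 important.txt

# "List archive file foo.tar"
listing=$(tar tf foo.tar)
echo "$listing"

# "Extract important.txt from archive file foo.tar", done in an empty
# directory so it is obvious that only that one file comes out.
mkdir extracted
cd extracted
tar xf ../foo.tar important.txt
ls
```

The extra argument after the archive name is what narrows the extraction down; without it, all three files would be written.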
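You might wonder what this “file” word is for if we always need it.
Well, we can remove it. But if we do so, our tar command doesn’t have any
object left, so it’ll look at something else: STDIN or STDOUT (that is GNU
tar’s usual built-in default; the TAPE environment variable or a differently
configured build can point it elsewhere).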
Actions that read archives operate on STDIN if you don’t give them a file
object. You can also be explicit by giving - to f:
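As an illustration, the sketch below feeds an archive to tar through STDIN. It sticks to the explicit f - form, since that one doesn’t depend on how your tar was built; the directory and file names are invented for the example.

```shell
set -eu
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"
printf 'one\n' > file1
printf 'two\n' > file2
tar cf foo.tar file1 file2

# "List the archive found on STDIN": f - names STDIN as the archive.
# On a tar whose default archive is STDIN, `tar t < foo.tar` should do the same.
from_stdin=$(tar tf - < foo.tar)
echo "$from_stdin"

# Same idea through a pipe, extracting this time.
mkdir out
cd out
cat ../foo.tar | tar xf -
ls
```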
The “create” action will output the archive on STDOUT if you don’t give it a
name (or use f -). You still need to give it the names of the files to put in
that archive:
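A possible way to try this out, again with made-up file and directory names: the archive is written to STDOUT with f -, first redirected to a file and then piped straight into a second tar that extracts it.

```shell
set -eu
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"
printf 'one\n' > file1
printf 'two\n' > file2

# "Create an archive on STDOUT with file1 and file2", redirected to a file.
tar cf - file1 file2 > foo.tar

# What landed in foo.tar is a regular archive.
streamed_listing=$(tar tf foo.tar)
echo "$streamed_listing"

# One tar creates on STDOUT, the other extracts from STDIN in another directory.
mkdir copy
tar cf - file1 file2 | (cd copy && tar xf -)
ls copy
```

The last pipeline is a classic way to copy a set of files somewhere else without ever writing the archive to disk.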
Note that you can’t extract an archive to STDOUT without a modifier.
tar operates on files, not on data streams. By default it doesn’t compress
its content, so creating a tar archive for one file doesn’t make much sense.
Now that we know how to write basic sentences, let’s add some modifiers to them.
Modifiers
In my experience the most used modifiers are v and z. The first one is the
“verbose” flag and makes tar more chatty. When creating or extracting an
archive it’ll print each file’s name as it’s (un)archiving it. When listing an
archive it’ll print more info about each file. Compare both outputs below:
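Here is one way to produce the two listings side by side. The files are throwaway examples, and the exact columns of the verbose listing (permissions, owner, size, date) vary a little between tar implementations, so no sample output is shown.

```shell
set -eu
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"
printf 'one\n' > file1
printf 'two\n' > file2
tar cf foo.tar file1 file2

# "List archive file foo.tar": names only.
plain=$(tar tf foo.tar)
echo "$plain"

# "Verbosely list archive file foo.tar": one detailed line per file,
# in the spirit of `ls -l`, with the file name at the end of the line.
verbose=$(tar tvf foo.tar)
echo "$verbose"
```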
The v modifier can be combined with any other one mentioned below.
z will tell tar to use Gzip to (de)compress the archive. Nowadays tar-ing with
no compression is rarely used, and Gzip is ubiquitous. Just add z to your
modifiers and tar will create compressed archives and extract them. The
convention is to use .tar.gz or .tgz for such archives:
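A minimal round trip with z could look like the following. The names are illustrative, and it assumes gzip is installed, which tar relies on for this modifier.

```shell
set -eu
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"
printf 'one\n' > file1
printf 'two\n' > file2

# "Create a gzipped archive file foo.tar.gz with file1 and file2"
tar czf foo.tar.gz file1 file2

# "List gzipped archive file foo.tar.gz"
tar tzf foo.tar.gz

# "Extract gzipped archive file foo.tar.gz", in a fresh directory.
mkdir out
cd out
tar xzf ../foo.tar.gz
ls
```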
Other common modifiers include j, which works exactly like z but (de)compresses
using Bzip2 instead of Gzip. Such archives usually end with .tar.bz2 or .tbz2.
It’s not named b or B because those were already taken when this modifier was
introduced.
Similarly to j, one can use its capital friend, J. This one compresses using xz
instead of Gzip or Bzip2. These archives use the extensions .tar.xz or .txz.
Note you can also (de)compress archives by yourself if you don’t remember these modifiers:
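For instance, something along these lines should work with any tar and a separate gzip; it is only a sketch with invented file names, combining the STDIN/STDOUT forms seen earlier with an external compressor.

```shell
set -eu
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"
printf 'one\n' > file1
printf 'two\n' > file2

# Create on STDOUT and let gzip do the compression.
tar cf - file1 file2 | gzip > foo.tar.gz

# Let gzip decompress to STDOUT and have tar extract from STDIN.
mkdir out
cd out
gzip -dc ../foo.tar.gz | tar xf -
ls
```

Swapping gzip for another compressor that reads STDIN and writes STDOUT follows the same pattern.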
There are a dozen other modifiers you can find in the manpage, but let’s
mention two more: O and k. You may remember from the Objects section that I
wrote you can’t extract an archive to STDOUT without a modifier. Well, that
modifier is called O:
Using O when extracting will print the content of the archive on STDOUT. This
is the same output as you would get by calling cat on all the files in it.
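A quick way to check that claim, with example files whose contents are easy to recognise (all names here are made up):

```shell
set -eu
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"
printf 'one\n' > file1
printf 'two\n' > file2
tar cf foo.tar file1 file2

# Work from an empty directory to show that nothing gets written to disk.
mkdir empty
cd empty

# "Extract archive file foo.tar to STDOUT": the contents of every file, in order.
all=$(tar xOf ../foo.tar)
echo "$all"

# It combines with a file argument: only file2 is printed.
only_two=$(tar xOf ../foo.tar file2)
echo "$only_two"

ls   # still empty
```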
The last modifier I wanted to mention is k, which tells tar not to overwrite
existing files when extracting an archive. That is, if you already have a file
called important.txt in your directory and you un-tar an archive using the k
modifier, you can be sure it won’t overwrite your existing important.txt file.
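The sketch below stages exactly that situation. The file contents are invented, and one behaviour is worth knowing: recent GNU tar versions report the refused overwrite as an error and exit with a non-zero status, so the example tolerates a failing exit code.

```shell
set -eu
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Archive one version of the file, then change the copy on disk.
printf 'archived version\n' > important.txt
tar cf foo.tar important.txt
printf 'my local edits\n' > important.txt

# "Extract archive file foo.tar, keeping existing files".
# Some tar versions exit non-zero here; `||` keeps `set -e` from stopping the script.
tar xkf foo.tar || echo "tar left the existing file alone"

cat important.txt
```

Running the same extraction without k would silently put the archived version back.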
I hope this post helped you have a better understanding of tar commands, and
how they’re not that complicated. I put a few (valid) commands below, just so
you can see if you understand what they’re doing: