MAKEPP_TUTORIAL_COMPILATION(1)    Makepp    MAKEPP_TUTORIAL_COMPILATION(1)
NAME
makepp_tutorial_compilation -- Unix compilation commands
DESCRIPTION
Skip this manual page if you already have a good grasp of what the
compilation commands do.
I find that distressingly few people seem to be taught, in their
programming classes, how to go about compiling programs once they've
written them. Novices rely either on a single memorized command or on
the built-in rules in make. I have been surprised by extremely
computer-literate people who learned to compile without optimization
because they were simply never told how important it is. Rudimentary
knowledge of how compilation commands work may make your programs run
twice as fast or more, so it's worth at least five minutes. This page
describes just about everything you'll need to know to compile C or C++
programs on nearly any variant of Unix.
The examples will be mostly for C, since C++ compilation is identical
except that the name of the compiler is different. Suppose you're
compiling source code in a file called "xyz.c" and you want to build a
program called "xyz". What must happen?
You may know that you can build your program in one step, using a
command like this:
cc -g xyz.c -o xyz
This will work, but it conceals a two-step process that you must
understand if you are writing makefiles. (Actually, there are more
than two steps, but you only have to understand two of them.) For a
program of more than one module, the two steps are usually explicitly
separated.
Compilation
The first step is the translation of your C or C++ source code into a
binary file called an object file. Object files usually have an
extension of ".o". (For some more recent projects, ".lo" is also used
for a slightly different kind of object file.)
The command to produce an object file on Unix looks something like
this:
cc -g -c xyz.c -o xyz.o
"cc" is the C compiler. Sometimes alternate C compilers are used; a
very common one is called "gcc". A common C++ compiler is the GNU
compiler, usually called "g++". Virtually all C and C++ compilers on
Unix have the same syntax for the rest of the command (at least for
basic operations), so the only difference would be the first word.
We'll explain what the "-g" option does later.
The "-c" option tells the C compiler to produce a ".o" file as output.
(If you don't specify "-c", then it performs the second compilation
step automatically.)
The "-o xyz.o" option tells the compiler what to name the object file.
You can omit it if the object file should have the same name as the
source file, with ".o" in place of the ".c" extension.
For the most part, the order of the options and the file names does not
matter. One important exception is that the output file must
immediately follow "-o".
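The "-c" step does not need a complete program. As a minimal sketch, a
hypothetical file "util.c" containing only the function below (the names
"util.c" and "util_square" are invented for this example) compiles
cleanly with "cc -g -c util.c -o util.o". Trying to link "util.o" by
itself into a program would normally fail, because it contains no "main"
function for the program to start at.

```c
/* util.c: a hypothetical module with no main(). It can be compiled with
 * "cc -c" into util.o, but not linked into a program on its own. */

/* Return the square of n. */
int util_square(int n)
{
    return n * n;
}
```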
Linking
The second step of building a program is called linking. An object
file cannot be run directly; it's an intermediate form that must be
linked to other components in order to produce a program. Other
components might include:
· Libraries. A library, roughly speaking, is a collection of object
modules that are included as necessary. For example, if your
program calls the "printf" function, then the definition of the
"printf" function must be included from the system C library. Some
libraries are automatically linked into your program (e.g., the one
containing "printf") so you never need to worry about them.
· Object files derived from other source files in your program. If
you write your program so that it actually has several source
files, normally you would compile each source file to a separate
object file and then link them all together.
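To see why the C library has to be linked in, consider this hypothetical
module (the function "format_greeting" is invented for illustration).
The header <stdio.h> only declares "snprintf"; compiling with "-c"
leaves an unresolved reference to it in the object file, and the actual
definition comes from the system C library, which "cc" links in
automatically.

```c
#include <stddef.h>
#include <stdio.h>   /* declares snprintf; its definition lives in the C library */

/* Write "hello, NAME" into buf, storing at most size bytes including the
 * terminating NUL. Returns what snprintf returns: the length the full
 * string would have had, or a negative value on error. */
int format_greeting(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "hello, %s", name);
}
```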
The linker is the program responsible for taking a collection of object
files and libraries and linking them together to produce an executable
file. The executable file is the program you actually run.
The command to link the program looks something like this:
cc -g xyz.o -o xyz
It may seem odd, but we usually run the same program ("cc") to perform
the linking. What happens under the surface is that the "cc" program
immediately passes off control to a different program (the linker,
sometimes called the loader, or "ld") after adding a number of complex
pieces of information to the command line. For example, "cc" tells
"ld" where the system library is that includes the definition of
functions like "printf". Until you start writing shared libraries, you
usually do not need to deal directly with "ld".
If you do not specify "-o xyz", then the output file will be called
"a.out", which seems to me to be a completely useless and confusing
convention. So always specify "-o" on the linking step.
If your program has more than one object file, you should specify all
the object files on the link command.
Why you need to separate the steps
Why not just use the simple, one-step command, like this:
cc -g xyz.c -o xyz
instead of the more complicated two-stage compilation
cc -g -c xyz.c -o xyz.o
cc -g xyz.o -o xyz
if internally the first is converted into the second? The difference
is important only if there is more than one module in your program.
Suppose we have an additional module, "abc.c". Now our compilation
looks like this:
# One-stage command.
cc -g xyz.c abc.c -o xyz
or
# Two-stage command.
cc -g -c xyz.c -o xyz.o
cc -g -c abc.c -o abc.o
cc -g xyz.o abc.o -o xyz
The first method, of course, is converted internally into the second
method. This means that both "xyz.c" and "abc.c" are recompiled each
time the command is run. But if you only changed "xyz.c", there's no
need to recompile "abc.c", so the second line of the two-stage commands
does not need to be done. This can make a huge difference in
compilation time, especially if you have many modules. For this
reason, virtually all makefiles keep the two compilation steps
separate.
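For concreteness, here is a sketch of what the second module "abc.c"
might contain; the function "abc_add" and the header name "abc.h" are
hypothetical. "xyz.c" would declare the function and call it, but
"xyz.o" refers to "abc_add" only by name. That reference is resolved at
link time, which is why editing "xyz.c" requires only recompiling
"xyz.c" and relinking, while the existing "abc.o" is reused unchanged.

```c
/* abc.c: a hypothetical second module.
 *
 * xyz.c would use it through a declaration such as
 *     int abc_add(int a, int b);
 * usually kept in a header (say, abc.h, not shown here). The call in
 * xyz.o is matched to this definition only when the linker runs. */

int abc_add(int a, int b)
{
    return a + b;
}
```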
That's pretty much the basics, but there are a few more little details
you really should know about.
Debugging vs. optimization
Usually programmers compile a program either for debugging or for
speed. Compilation for speed is called optimization; compiling with
optimization can make your code run as much as 5 times faster, sometimes
more, depending on your code, your processor, and your compiler.
With such dramatic gains possible, why would you ever not want to
optimize? The most important answer is that optimization makes use of
a debugger much more difficult (sometimes impossible). (If you don't
know anything about a debugger, it's time to learn. The half hour or
hour you'll spend learning the basics will be repaid many times
over in the time you'll save later when debugging. I'd recommend
starting with a GUI debugger like "kdbg", "ddd", or "gdb" run from
within emacs (see the info pages on gdb for instructions on how to do
this).) Optimization reorders and combines statements, removes
unnecessary temporary variables, and generally rearranges your code so
that it's very tough to follow inside a debugger. The usual procedure
is to write your code, compile it without optimization, debug it, and
then turn on optimization.
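As an illustration, consider this hypothetical function. Compiled with
"-g" and no optimization, a debugger can step through it line by line
and show "sq" and "total" on every iteration. Compiled with "-O2", the
compiler is free to keep "total" in a register, eliminate "sq"
entirely, or restructure the loop, so "gdb" may report such variables as
"<optimized out>" and may appear to jump around between source lines.

```c
/* Sum of i*i for i = 1..n. The temporary "sq" exists only to make the
 * code easy to follow in a debugger; with optimization enabled the
 * compiler will typically eliminate it. */
int sum_of_squares(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++) {
        int sq = i * i;
        total += sq;
    }
    return total;
}
```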
In order for the debugger to work, the compiler has to cooperate not
only by not optimizing, but also by putting debugging information (the
names and types of the symbols, and which source line each piece of
code came from) into the object file, so the debugger knows what things
are called and where they are. This is what the "-g" compilation option
does.
If you're done debugging, and you want to optimize your code, simply
replace "-g" with "-O". For many compilers, you can specify increasing
levels of optimization by appending a number after "-O". You may also
be able to specify other options that increase the speed under some
circumstances (possibly trading off with increased memory usage). See
your compiler's man page for details. For example, here is an
optimizing compile command that I use frequently with the "gcc"
compiler:
gcc -O6 -malign-double -c xyz.c -o xyz.o
You may have to experiment with different optimization options for the
absolute best performance, and you may need different options for
different pieces of code. Generally speaking, a simple optimization
flag like "-O2" works with most compilers and usually produces pretty
good results. (With "gcc", any level above "-O3", such as the "-O6"
above, is treated the same as "-O3".)
Warning: on rare occasions, your program doesn't actually do exactly
the same thing when it is compiled with optimization. This may be due
to (1) an invalid assumption you made in your code that was harmless
without optimization, but causes problems because the compiler takes
the liberty of rearranging things when you optimize; or (2) sadly,
compilers have bugs too, including bugs in their optimizers. For a
stable compiler like "gcc" on a common platform like a Pentium,
optimization bugs are seldom a problem (as of the year 2000--there were
problems a few years ago).
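Here is a classic instance of case (1), as a sketch with hypothetical
function names. In C, signed integer overflow is undefined behavior, so
an optimizing compiler may assume that "x + 1" never wraps around and
reduce the first function below to "return 0". Without optimization it
usually happens to work on common hardware, which is why the bug can go
unnoticed. The second function tests the same condition without relying
on overflow.

```c
#include <limits.h>

/* WRONG: relies on x + 1 wrapping around when x == INT_MAX. Signed
 * overflow is undefined behavior, so with optimization the compiler
 * may reduce this whole test to "return 0". */
int increment_overflows_unsafe(int x)
{
    return x + 1 < x;
}

/* Correct: checks the limit before doing any arithmetic. */
int increment_overflows(int x)
{
    return x == INT_MAX;
}
```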
If you don't specify either "-g" or "-O" in your compilation command,
the resulting object file is suitable neither for debugging nor for
running fast. For some reason, this is the default. So always specify
either "-g" or "-O".
On some systems, you must supply "-g" on both the compilation and
linking steps; on others (e.g. Linux), it needs to be supplied only on
the compilation step. On some systems, "-O" actually does something
different in the linking phase, while on others, it has no effect. In
any case, it's always harmless to supply "-g" or "-O" for both
commands.
Warnings
Most compilers are capable of catching a number of common programming
errors (e.g., forgetting to return a value from a function that's
supposed to return a value). Usually, you'll want to turn on warnings.
How you do this depends on your compiler (see the man page), but with
the "gcc" compiler, I usually use something like this:
gcc -g -Wall -c xyz.c -o xyz.o
(Sometimes I also add "-Wno-uninitialized" after "-Wall", because when
optimizing, "gcc" sometimes warns that a variable may be used
uninitialized even when it cannot be.)
These warnings have saved me many hours of debugging.
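For example, suppose the hypothetical function below had been written
without its final "return 0;", so that nothing is returned when x is 0.
With "-Wall", "gcc" flags that mistake with a warning along the lines of
"control reaches end of non-void function". Without warnings enabled it
compiles silently, and the value the caller receives is undefined.

```c
/* Return 1, -1, or 0 according to the sign of x.
 *
 * Without the final return, x == 0 would fall off the end of the
 * function; "gcc -Wall" reports that as "control reaches end of
 * non-void function". The last line below is the fix. */
int sign(int x)
{
    if (x > 0)
        return 1;
    if (x < 0)
        return -1;
    return 0;
}
```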
Other useful compilation options
Often, necessary include files are stored in some directory other than
the current directory or the system include directory (/usr/include).
This frequently happens when you are using a library that comes with
include files to define the functions or classes.
Suppose, for example, you are writing an application that uses the Qt
libraries. You've installed a local copy of the Qt library in
/home/users/joe/qt, which means that the include files are stored in
the directory /home/users/joe/qt/include. In your code, you want to be
able to do things like this:
#include <qwidget.h>
instead of
#include "/home/users/joe/qt/include/qwidget.h"
You can tell the compiler to look for include files in a different
directory by using the "-I" compilation option:
g++ -I/home/users/joe/qt/include -g -c mywidget.cpp -o mywidget.o
There is usually no space between the "-I" and the directory name.
When the C++ compiler is looking for the file qwidget.h, it will look
in /home/users/joe/qt/include before looking in the system include
directory. You can specify as many "-I" options as you want.
Using libraries
You will often have to tell the linker to link with specific external
libraries, if you are calling any functions that aren't part of the
standard C library. The "-l" (lowercase L) option says to link with a
specific library:
cc -g xyz.o -o xyz -lm
"-lm" says to link with the system math library, which you will need if
you are using functions like "sqrt".
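Beware: if you specify more than one "-l" option, the order can make a
difference on some systems. The linker typically searches each library
only once, at the point where it appears on the command line, and takes
from it only what is needed by the files that come before it. If you
get errors about undefined symbols even though you know you have
included the library that defines them, try moving that library to the
end of the command line, or even including it a second time at the end
of the command line.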
Sometimes the libraries you will need are not stored in the default
place for system libraries. "-labc" searches for a file called
libabc.a or libabc.so or libabc.sa in the system library directories
(/usr/lib and usually a few other places too, depending on what kind of
Unix you're running). The "-L" option specifies an additional
directory to search for libraries. To take the above example again,
suppose you've installed the Qt libraries in /home/users/joe/qt, which
means that the library files are in /home/users/joe/qt/lib. Your link
step for your program might look something like this:
g++ -g test_mywidget.o mywidget.o -o test_mywidget -L/home/users/joe/qt/lib -lqt
(On some systems, if you link in Qt you will need to add other
libraries as well (e.g., "-L/usr/X11R6/lib -lX11 -lXext"). What you
need to do will depend on your system.)
Note that there is no space between "-L" and the directory name. The
"-L" option usually goes before any "-l" options it's supposed to
affect.
How do you know which libraries you need? In general, this is a hard
question, and varies depending on what kind of Unix you are running.
The documentation for the functions or classes you are using should say
what libraries you need to link with. If you are using functions or
classes from an external package, there is usually a library you need
to link with; if you need to add a "-labc" option, the library will
usually be a file called "libabc.a", "libabc.so", or "libabc.sa".
Some other confusing things
You may have noticed that it is possible to specify options which
normally apply to compilation on the linking step, and options which
normally apply to linking on the compilation step. For example, the
following commands are valid:
cc -g -L/usr/X11R6/lib -c xyz.c -o xyz.o
cc -g -I/somewhere/include xyz.o -o xyz
The irrelevant options are ignored; the above commands are exactly
equivalent to this:
cc -g -c xyz.c -o xyz.o
cc -g xyz.o -o xyz
perl v5.20.3 2012-02-07 MAKEPP_TUTORIAL_COMPILATION(1)