Assembly language work is a departure from C work, and gcc is first and foremost a C compiler. Therefore, we need to look first at the process of building C code. On the surface, building a C program for Linux using the GNU tools is pretty simple. Behind the scenes, however, it's a seriously hairy business. While it looks like gcc does all the work, what gcc really does is act as master controller for several GNU tools, supervising a code assembly line that you don't need to see unless you specifically want to.
Theoretically, this is all you need to do to generate an executable binary file from C source code:
gcc eatc.c -o eatc
Here, gcc takes the file eatc.c (which is a C source code file) and crunches it to produce the executable file eatc. (The -o option tells gcc what to name the executable output file.) However, there's more going on here than meets the eye. Take a look at Figure 12-1 as we go through it. In the figure, shaded arrows indicate movement of information. Blank arrows indicate program control.
The programmer invokes gcc from the shell command line, typically in a terminal window. Gcc takes control and immediately invokes a utility called the C preprocessor, cpp. The preprocessor takes the original C source code file and handles directives such as #include and #define. It can be thought of as a sort of macro expansion pass on the C source code file.
When cpp is finished with its work, gcc takes over in earnest. From the preprocessed C source code file, gcc generates an assembly language source code file with an .s file extension. This is literally the assembly code equivalent of the C statements in the original .c file, in human-readable form. If you develop any skill in reading AT&T assembly syntax and mnemonics (more on which a little later) you can learn a lot from inspecting the .s files produced by gcc.
When gcc has completed generating the assembly language equivalent of the C source code file, it invokes the GNU assembler, gas (installed under the command name as), to assemble the .s file into object code. This object code is written out in a file with an .o extension.
The final step involves the linker, ld. The .o file contains binary code, but it's only the binary code generated from statements in the original .c file. The .o file does not contain the code from the standard C libraries that are so important in C programming. Those libraries have already been compiled and simply need to be linked into your application. The linker, ld, does this work at gcc's direction. The good part is that gcc knows precisely which of the standard C libraries need to be linked to your application to make it work, and it always includes the right libraries in their right versions. So, although gcc doesn't actually do the linking, it knows what needs to be linked—and that is valuable knowledge indeed, as your programs grow more and more complex.
At the end of the line, ld spits out the fully linked and executable program file. At that point, the build is done, and gcc returns control to the Linux shell. Note that all of this is typically done with one simple command to gcc!