Assemblers read your source code files and generate an object code file containing the machine instructions that the CPU understands, plus any data you've defined in the source code.
There's no reason at all why an assembler could not read a source code file and write out a finished, executable program as its object code file, but this is almost never done. The assembler I'm teaching in this book, NASM, can do that for DOS programs, and can write out .COM executable files for the real mode flat model. More modern operating systems such as Linux and Windows are too complex for that, and in truth, there's no real payoff in such one-step assembly except when you're first learning to write assembly language.
So the object code files produced by modern assemblers are a sort of intermediate step between source code and executable program. This intermediate step is a type of binary file called an object module, or simply an object code file.
Object code files cannot themselves be run as programs. An additional step, called linking, is necessary to turn object code files into executable program files.
The reason for object code files as intermediate steps is that a single large source code file may be divided up into numerous smaller source code files to keep the files manageable in size and complexity. The assembler assembles the various fragments separately, and the several resulting object code files are then woven together into a single executable program file. This process is shown in Figure 5-8.
[Figure 5-8: The assembler and linker. Labels in the figure: source code text files; object code binary files.]
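The two-stage process described above can be sketched as a small Python helper that only builds the command lines, without running them. This is an illustration under assumptions: the file names are hypothetical, and the `elf64` output format assumes a 64-bit Linux target, which the text does not specify. The `nasm -f ... -o ...` and `ld -o ...` invocations are the standard forms for NASM and the GNU linker.

```python
from pathlib import PurePath


def build_commands(sources, executable, obj_format="elf64"):
    """Return the command lines that assemble each source file and link the results.

    sources     -- list of assembly source file names (hypothetical examples)
    executable  -- name of the finished program file
    obj_format  -- NASM output format; elf64 is an assumption for 64-bit Linux
    """
    commands = []
    objects = []
    for src in sources:
        obj = str(PurePath(src).with_suffix(".o"))
        # One assembler run per source file yields one object code file.
        commands.append(["nasm", "-f", obj_format, src, "-o", obj])
        objects.append(obj)
    # A single linker run weaves all the object files into one executable.
    commands.append(["ld", "-o", executable] + objects)
    return commands


if __name__ == "__main__":
    for cmd in build_commands(["main.asm", "strings.asm"], "demo"):
        print(" ".join(cmd))
```

Note that even a program with a single source file still needs both steps: one assembler run followed by one linker run.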
When you're first learning assembly programming, it's unlikely that you'll be writing programs spread out across several source code files. This may make the linker seem extraneous, as there's only one piece to your program and nothing to link together. Not so: the linker does more than just stitch lumps of object code together into a single piece. It ensures that function calls out of one object module arrive at the target object module, and that all the many memory references actually reference what they're supposed to reference. The assembler's job is obvious; the linker's job is subtle. Both are necessary to produce a finished, working executable file.
Besides, you'll very quickly get to the point where you begin extracting frequently used portions of your programs into your own personal code libraries. There are two reasons for doing this:
1. You can move tested, proven routines into separate libraries and link them into any program you write that might need them. This way, you can reuse code and not build the same old wheels every time you begin a new programming project in assembly language.
2. Once portions of a program are tested and found to be correct, there's no need to waste time assembling them over and over again along with newer, untested portions of a program. Once a major program grows to tens of thousands of lines of code (and you'll get there sooner than you might think!), you can save a significant amount of time by assembling only the portion you are currently working on, and then linking it with the already-assembled object files of the finished portions.
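The time saving in the second point comes from skipping any module whose object file is already up to date. A minimal sketch of that check follows, assuming the common convention (used by tools such as make) of comparing file modification times. The function names are hypothetical and are not part of NASM or any linker.

```python
import os


def needs_assembly(source, obj):
    """Return True if the object file is missing or older than its source file."""
    if not os.path.exists(obj):
        return True
    return os.path.getmtime(source) > os.path.getmtime(obj)


def stale_sources(pairs):
    """Given (source, object) file name pairs, list the sources to reassemble."""
    return [src for src, obj in pairs if needs_assembly(src, obj)]
```

Only the sources returned by `stale_sources` need another assembler run. The linker still runs over all the object files, old and new, to produce the executable.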
The linker's job is complex and not easily described. Each object module may contain the following:
■ Program code, including named procedures
■ References to named procedures lying outside the module
■ Named data objects such as numbers and strings with predefined values
■ Named data objects that are just empty space "set aside" for the program's use later
■ References to data objects lying outside the module
■ Debugging information
■ Other, less common odds and ends that help the linker create the executable file
To process several object modules into a single executable module, the linker must first build an index called a symbol table, with an entry for every named item in every object module it links, with information about what name (called a symbol) refers to what location within the module. Once the symbol table is complete, the linker builds an image of how the executable program will be arranged in memory when the operating system loads it. This image is then written to disk as the executable file.
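To make the idea of a symbol table concrete, here is a toy model in Python. It is a sketch under simplifying assumptions, not how any real linker is implemented: each object module is represented as a dictionary with a hypothetical layout (`name`, `size`, `defines`), and modules are simply placed end to end in the image, whereas real linkers group code and data by section. The module and symbol names are invented for illustration.

```python
def build_symbol_table(modules, base_address=0):
    """Lay modules out one after another and record where every symbol lands.

    Each module is a dict with:
      "name"    -- the object file name
      "size"    -- how many bytes the module occupies in the image
      "defines" -- mapping of symbol name to its offset inside the module
    Returns (symbol_table, module_bases).
    """
    symbol_table = {}
    module_bases = {}
    next_address = base_address
    for module in modules:
        module_bases[module["name"]] = next_address
        for symbol, offset in module["defines"].items():
            if symbol in symbol_table:
                # Two modules may not define the same name.
                raise ValueError(f"symbol {symbol!r} defined more than once")
            symbol_table[symbol] = next_address + offset
        next_address += module["size"]
    return symbol_table, module_bases


if __name__ == "__main__":
    modules = [
        {"name": "main.o", "size": 32, "defines": {"_start": 0, "Counter": 24}},
        {"name": "strings.o", "size": 16, "defines": {"WriteString": 0, "Banner": 8}},
    ]
    table, bases = build_symbol_table(modules, base_address=0x1000)
    for name, address in table.items():
        print(f"{name:12} {address:#06x}")
```

In this model, a symbol's final address is its module's starting address in the image plus its offset within the module, which is why the table cannot be completed until every module has been placed.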
The most important thing about the image that the linker builds relates to addresses. Object modules are allowed to refer to symbols in other object modules. During assembly, these external references are left as holes to be filled later—naturally enough, because the module in which these external symbols exist may not have been assembled or even written yet. As the linker builds an image of the eventual executable program file, it learns where all of the symbols are located within the image, and thus can drop real addresses into all of the external reference holes.
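Filling the external reference holes can be sketched in the same toy style. The assumptions here are loud ones: each hole is taken to be 4 bytes wide and to receive an absolute little-endian address, while real x86 code often uses relative displacements and other relocation types. The `relocations` list of (offset, symbol) pairs is a hypothetical stand-in for the relocation records an object file carries.

```python
def patch_external_references(image, relocations, symbol_table):
    """Write each symbol's address into the 4-byte hole recorded for it.

    image        -- bytearray holding the program image, holes zero-filled
    relocations  -- list of (hole_offset, symbol_name) pairs
    symbol_table -- mapping of symbol name to its address in the image
    """
    for hole_offset, symbol in relocations:
        if symbol not in symbol_table:
            # A hole that cannot be filled is an unresolved external reference.
            raise KeyError(f"unresolved external symbol: {symbol}")
        address = symbol_table[symbol]
        image[hole_offset:hole_offset + 4] = address.to_bytes(4, "little")
    return image


if __name__ == "__main__":
    image = bytearray(16)                    # all zeros: the hole is empty
    symbol_table = {"WriteString": 0x1020}   # as learned while building the image
    relocations = [(5, "WriteString")]       # the hole starts at byte 5
    patch_external_references(image, relocations, symbol_table)
    print(image.hex())
```

A symbol that is referenced but never defined in any module leaves a hole that cannot be filled, which is the situation a linker reports as an unresolved external.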
Debugging information is, in a sense, a step backward: portions of the source code, all of which were stripped out early in the assembly process, are put back in the object module by the assembler. These portions of the source code are mostly the names of data items and procedures, and they're embedded in the object file to make it easier for the programmer (you!) to see the names of data items when you debug the program. (I'll go into this more deeply later.) Debugging information is optional; that is, the linker does not need it to build a proper executable file. You choose to embed debugging information in the object file while you're still working on the program. Once the program is finished and debugged to the best of your ability, you run the assembler and linker one more time, without requesting debugging information. This is important, because debugging information can make your executable files far larger than they otherwise would be, and generally a little slower.