The Mechanics of Globals and Externals

The hexdump2 program in Listing 10-1 contains several procedures. Let's pull those procedures out of the main program module and create a separately assembled library module from them, so that we can see how it all works.

Figure 10-3: Connecting globals and externals (diagram: UTILS.ASM and PROG.ASM each pass through the assembler to produce UTILS.O and PROG.O; the GLOBAL and EXTERN declarations of MyVar and MyProc in the two source files are matched up by the linker)

I've described the source code requirements of assembly language programs in detail in the last few chapters. Separately assembled library modules are similar to programs and may have all three of the sections (.text, .data, and .bss) that program modules may have. There are two major differences, however, related to things that library modules lack:

■ External modules do not contain a main program and hence have no start address. That is, no label _start: exists in a library to indicate to the linker that this is the point at which code execution is to begin. Library modules are not intended to be run by themselves, so a _start: label in a library module is both unnecessary and grounds for a fatal linker error if _start: already exists in the main program module.

■ External modules do not return to Linux. If only the main program module contains a _start: label, then only the main program module should contain the code to make the required sys_exit int 80h call shutting down the program and giving control back to Linux. As a general rule of thumb, never make a call to sys_exit from within a procedure, whether it's a procedure located in the same module as the main program, or a procedure located in an external library module.
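As a minimal sketch of those two rules, here is what a bare-bones library module might look like. The file name (mylib.asm), the procedure name (ShowMsg), and the message text are all hypothetical, invented purely for illustration. Note what is absent: there is no _start: label and no sys_exit call. The procedure simply ends with RET, handing control back to whatever code called it.

```nasm
; mylib.asm : hypothetical library module, for illustration only
; Assemble it, but don't try to link it by itself:
;   nasm -f elf -g -F stabs mylib.asm

SECTION .data                   ; Library modules may have their own data

Msg:    db "Hello from the library!",10
MSGLEN  EQU $-Msg

SECTION .text                   ; ...and their own code

GLOBAL ShowMsg                  ; Export the procedure to other modules

; No _start: label here; only the main program module has one.

ShowMsg:
        pushad                  ; Save all caller's registers
        mov eax,4               ; Specify sys_write call
        mov ebx,1               ; Specify File Descriptor 1: Standard output
        mov ecx,Msg             ; Pass offset of the message
        mov edx,MSGLEN          ; Pass length of the message
        int 80h                 ; Make kernel call
        popad                   ; Restore all caller's registers
        ret                     ; Return to the caller; no sys_exit here
```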

First, take a look at Listing 10-2, which contains a program called hexdump3. It does precisely the same things as hexdump2. It's a lot shorter than hexdump2 from a source code standpoint, because most of its machinery has been outsourced. Outsourced where? You don't know yet—and you don't have to. NASM will put off resolving the addresses of the missing procedures as long as you list all the missing procedures using the EXTERN directive.

Listing 10-2: hexdump3.asm

;  Executable name : hexdump3
;  Version         : 1.0
;  Created date    : 4/15/2009
;  Last update     : 4/20/2009
;  Author          : Jeff Duntemann
;  Description     : A simple hex dump utility demonstrating the use of
;                    separately assembled code libraries via EXTERN
;
;  Build using these commands:
;    nasm -f elf -g -F stabs hexdump3.asm
;    ld -o hexdump3 hexdump3.o <path>/textlib.o

SECTION .bss                    ; Section containing uninitialized data

        BUFFLEN EQU 10
        Buff    resb BUFFLEN

SECTION .data                   ; Section containing initialised data

SECTION .text                   ; Section containing code

EXTERN ClearLine, DumpChar, PrintLine

GLOBAL _start

_start:
        nop                     ; This no-op keeps gdb happy...
        nop
        xor esi,esi             ; Clear total chars counter to 0

; Read a buffer full of text from stdin:
Read:
        mov eax,3               ; Specify sys_read call
        mov ebx,0               ; Specify File Descriptor 0: Standard Input
        mov ecx,Buff            ; Pass offset of the buffer to read to
        mov edx,BUFFLEN         ; Pass number of bytes to read at one pass
        int 80h                 ; Call sys_read to fill the buffer
        mov ebp,eax             ; Save # of bytes read from file for later
        cmp eax,0               ; If eax=0, sys_read reached EOF on stdin
        je Done                 ; Jump If Equal (to 0, from compare)

; Set up the registers for the process buffer step:
        xor ecx,ecx             ; Clear buffer pointer to 0

; Go through the buffer and convert binary values to hex digits:
Scan:
        xor eax,eax             ; Clear EAX to 0
        mov al,byte [Buff+ecx]  ; Get a char from the buffer into AL
        mov edx,esi             ; Copy total counter into EDX
        and edx,0000000Fh       ; Mask out all but lowest 4 bits of char counter
        call DumpChar           ; Call the char poke procedure

; Bump the buffer pointer to the next character and see if buffer's done:
        inc ecx                 ; Increment buffer pointer
        inc esi                 ; Increment total chars processed counter
        cmp ecx,ebp             ; Compare with # of chars in buffer
        jae Read                ; If we've done the buffer, go get more

; See if we're at the end of a block of 16 and need to display a line:
        test esi,0000000Fh      ; Test 4 lowest bits in counter for 0
        jnz Scan                ; If counter is *not* modulo 16, loop back
        call PrintLine          ; ...otherwise print the line
        call ClearLine          ; Clear hex dump line to 0's
        jmp Scan                ; Continue scanning the buffer

Done:
        call PrintLine          ; Print the "leftovers" line
        mov eax,1               ; Code for Exit Syscall
        mov ebx,0               ; Return a code of zero
        int 80H                 ; Make kernel call

External declarations of multiple items may be put on a single line, separated by commas, as in hexdump3:

EXTERN ClearLine, DumpChar, PrintLine

You are not limited to a single EXTERN directive. Several may exist in a module; each external identifier, in fact, may have its own extern directive. It's up to you. When you have a longish list of external identifiers, however, don't make this mistake:

EXTERN InitBlock, ReadBlock, ValidateBlock, WriteBlock, CleanUp,
       ShowStats, PrintSummary ; ERROR!

extern declarations do not span line boundaries. (In fact, almost nothing in assembly language spans line boundaries, especially with NASM. Pascal and C programmers run up against this peculiarity fairly often.) If you have too many external declarations to fit on a single line with a single extern, place additional extern directives on subsequent lines. There is no limit to the number of extern directives in a single module.
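A legal way to declare that same list might look like the following sketch. The seven identifiers are the illustrative names from the failed declaration above, not procedures from any real library, and how you group them across lines is purely a matter of taste.

```nasm
; Each EXTERN directive is complete on its own line:
EXTERN InitBlock, ReadBlock, ValidateBlock
EXTERN WriteBlock, CleanUp
EXTERN ShowStats, PrintSummary
```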

To make hexdump3 link into a functioning executable program, we have to create an external library module for all of its procedures. All that's needed are the procedures and their data in the proper sections, and the necessary GLOBAL declarations, as shown in Listing 10-3.

Listing 10-3: textlib.asm

;  Library name    : textlib
;  Version         : 1.0
;  Created date    : 4/10/2009
;  Last update     : 4/20/2009
;  Author          : Jeff Duntemann
;  Description     : A linkable library of text-oriented procedures and tables
;
;  Build using these commands:
;    nasm -f elf -g -F stabs textlib.asm

SECTION .bss                    ; Section containing uninitialized data

        BUFFLEN EQU 10
        Buff    resb BUFFLEN

SECTION .data                   ; Section containing initialised data

; Here we have two parts of a single useful data structure, implementing the
; text line of a hex dump utility. The first part displays 16 bytes in hex
; separated by spaces. Immediately following is a 16-character line delimited
; by vertical bar characters. Because they are adjacent, they can be
; referenced separately or as a single contiguous unit. Remember that if
; DumpLin is to be used separately, you must append an EOL before sending it
; to the Linux console.

DumpLin:        db " 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 "
DUMPLEN         EQU $-DumpLin
ASCLin:         db "|................|",10
ASCLEN          EQU $-ASCLin
FULLLEN         EQU $-DumpLin

; The HexDigits table is used to convert numeric values to their hex
; equivalents. Index by nybble without a scale: [HexDigits+eax]
HexDigits:      db "0123456789ABCDEF"

; This table allows us to generate text equivalents for binary numbers.
; Index into the table by the nybble using a scale of 4:
; [BinDigits + ecx*4]
BinDigits:      db "0000","0001","0010","0011"
                db "0100","0101","0110","0111"
                db "1000","1001","1010","1011"
                db "1100","1101","1110","1111"

; This table is used for ASCII character translation, into the ASCII
; portion of the hex dump line, via XLAT or ordinary memory lookup.
; All printable characters "play through" as themselves. The high 128
; characters are translated to ASCII period (2Eh). The non-printable
; characters in the low 128 are also translated to ASCII period, as is
; char 127.
DotXlat:
        db 2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh
        db 2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh
        db 20h,21h,22h,23h,24h,25h,26h,27h,28h,29h,2Ah,2Bh,2Ch,2Dh,2Eh,2Fh
        db 30h,31h,32h,33h,34h,35h,36h,37h,38h,39h,3Ah,3Bh,3Ch,3Dh,3Eh,3Fh
        db 40h,41h,42h,43h,44h,45h,46h,47h,48h,49h,4Ah,4Bh,4Ch,4Dh,4Eh,4Fh
        db 50h,51h,52h,53h,54h,55h,56h,57h,58h,59h,5Ah,5Bh,5Ch,5Dh,5Eh,5Fh
        db 60h,61h,62h,63h,64h,65h,66h,67h,68h,69h,6Ah,6Bh,6Ch,6Dh,6Eh,6Fh
        db 70h,71h,72h,73h,74h,75h,76h,77h,78h,79h,7Ah,7Bh,7Ch,7Dh,7Eh,2Eh
        db 2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh
        db 2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh
        db 2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh
        db 2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh
        db 2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh
        db 2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh
        db 2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh
        db 2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh,2Eh

SECTION .text                   ; Section containing code

GLOBAL ClearLine, DumpChar, Newlines, PrintLine ; Procedures
GLOBAL DumpLin, HexDigits, BinDigits            ; Data items

; ClearLine:    Clear a hex dump line string to 16 0 values
; UPDATED:      4/13/2009
; IN:           Nothing
; RETURNS:      Nothing
; MODIFIES:     EAX
; CALLS:        DumpChar
; DESCRIPTION:  The hex dump line string is cleared to binary 0.

ClearLine:
        push edx                ; Save caller's EDX
        mov edx,15              ; We're going to go 16 pokes, counting from 0
.Poke:  mov eax,0               ; Tell DumpChar to poke a '0'
        call DumpChar           ; Insert the '0' into the hex dump string
        sub edx,1               ; DEC doesn't affect CF!
        jae .Poke               ; Loop back until EDX drops below 0
        pop edx                 ; Restore caller's EDX
        ret                     ; Go home

; DumpChar:     "Poke" a value into the hex dump line string.
; UPDATED:      4/13/2009
; IN:           Pass the 8-bit value to be poked in EAX.
;               Pass the value's position in the line (0-15) in EDX
; RETURNS:      Nothing
; MODIFIES:     EAX
; CALLS:        Nothing
; DESCRIPTION:  The value passed in EAX will be placed in both the hex dump
;               portion and in the ASCII portion, at the position passed
;               in EDX, represented by a period where it is not a
;               printable character.

DumpChar:
        push ebx                ; Save caller's EBX so we don't trash it
        push edi                ; Save caller's EDI as well
; First we insert the input char into the ASCII portion of the dump line
        mov bl,byte [DotXlat+eax]       ; Translate nonprintables to '.'
        mov byte [ASCLin+edx+1],bl      ; Write to ASCII portion
; Next we insert the hex equivalent of the input char in the hex portion
; of the hex dump line:
        mov ebx,eax             ; Save a second copy of the input char
        lea edi,[edx*2+edx]     ; Calc offset into line string (EDX X 3)
; Look up low nybble character and insert it into the string:
        and eax,0000000Fh       ; Mask out all but the low nybble
        mov al,byte [HexDigits+eax]     ; Look up the char equivalent of nybble
        mov byte [DumpLin+edi+2],al     ; Write char equivalent to line string
; Look up high nybble character and insert it into the string:
        and ebx,000000F0h       ; Mask out all but the second-lowest nybble
        shr ebx,4               ; Shift high 4 bits of char into low 4 bits
        mov bl,byte [HexDigits+ebx]     ; Look up char equivalent of nybble
        mov byte [DumpLin+edi+1],bl     ; Write the char equiv. to line string
; Done! Let's go home:
        pop edi                 ; Restore caller's EDI register value
        pop ebx                 ; Restore caller's EBX register value
        ret                     ; Return to caller

; Newlines:     Sends between 1 and 15 newlines to the Linux console
; UPDATED:      4/13/2009
; IN:           Number of newlines to send (1 to 15) in EDX
; RETURNS:      Nothing
; MODIFIES:     Nothing
; CALLS:        Kernel sys_write
; DESCRIPTION:  The number of newline characters (0Ah) specified in EDX
;               is sent to stdout using INT 80h sys_write. This
;               procedure demonstrates placing constant data in the
;               procedure definition itself, rather than in the .data or
;               .bss sections.

Newlines:
        pushad                  ; Save all caller's registers
        cmp edx,15              ; Make sure caller didn't ask for more than 15
        ja .exit                ; If so, exit without doing anything
        mov ecx,EOLs            ; Put address of EOLs table into ECX
        mov eax,4               ; Specify sys_write
        mov ebx,1               ; Specify stdout
        int 80h                 ; Make the kernel call
.exit:  popad                   ; Restore all caller's registers
        ret                     ; Go home!
EOLs:   db 10,10,10,10,10,10,10,10,10,10,10,10,10,10,10

; PrintLine:    Displays the hex dump line string via INT 80h sys_write
; UPDATED:      4/13/2009
; IN:           Nothing
; RETURNS:      Nothing
; MODIFIES:     Nothing
; CALLS:        Kernel sys_write
; DESCRIPTION:  The hex dump line string is displayed to stdout using
;               INT 80h sys_write.

PrintLine:
        pushad                  ; Push all GP registers
        mov eax,4               ; Specify sys_write call
        mov ebx,1               ; Specify File Descriptor 1: Standard output
        mov ecx,DumpLin         ; Pass offset of line string
        mov edx,FULLLEN         ; Pass size of the line string
        int 80h                 ; Make kernel call to display line string
        popad                   ; Pop all GP registers
        ret                     ; Go home!
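Nothing ties textlib to hexdump3. Any program that names the right identifiers in an EXTERN directive and links against textlib.o can call the library's procedures. Here is a sketch of a hypothetical little test program (nltest.asm is not one of the numbered listings; the name and the program are invented for illustration) that borrows PrintLine and Newlines. As in the hexdump3 build commands, <path> stands for wherever your copy of textlib.o lives.

```nasm
; nltest.asm : hypothetical test program, for illustration only
; Build using these commands:
;   nasm -f elf -g -F stabs nltest.asm
;   ld -o nltest nltest.o <path>/textlib.o

SECTION .text                   ; Section containing code

EXTERN Newlines, PrintLine      ; Both live in textlib.o

GLOBAL _start

_start:
        nop                     ; This no-op keeps gdb happy...
        call PrintLine          ; Display the dump line in its initial state
        mov edx,3               ; Ask for 3 newlines
        call Newlines           ; Send them to stdout
        mov eax,1               ; Code for Exit Syscall
        mov ebx,0               ; Return a code of zero
        int 80h                 ; Make kernel call
```

The main program module still owns the _start: label and the sys_exit call; the library supplies only the procedures.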

Listing 10-3 contains two lines of global identifier declarations, each line with its own global directive. As a convention in my own work, I separate declarations of procedures and named data items, and give each their own line:

GLOBAL ClearLine, DumpChar, Newlines, PrintLine ; Procedures
GLOBAL DumpLin, HexDigits, BinDigits            ; Data items

Any procedure or data item that is to be exported (that is, made available outside the module) must be named in a global directive. You don't have to declare everything in a module global. In fact, one way to manage complexity and prevent certain kinds of bugs is to think hard about, and strictly limit, what other modules can "see" in their fellow modules. A module can have "private" procedures and named data items that can be referenced only from inside the module. Making these items private is in fact the default: Just don't declare them global.
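To make the public/private distinction concrete, here is a sketch of a hypothetical module (the file name shout.asm and both procedure names are invented for illustration). Only UpCaseBuf is declared global, so it is the only identifier another module can reach with EXTERN. UpCaseChar is a private helper; an attempt to link against it from outside should fail with an undefined-symbol error.

```nasm
; shout.asm : hypothetical module, for illustration only

SECTION .text

GLOBAL UpCaseBuf                ; Exported: other modules may call this

; UpCaseChar is NOT declared global, so it is private to this module.
; IN: character in AL.  RETURNS: uppercased character in AL.
UpCaseChar:
        cmp al,'a'              ; Below lowercase 'a'?
        jb .done                ; Yes: not a lowercase letter
        cmp al,'z'              ; Above lowercase 'z'?
        ja .done                ; Yes: not a lowercase letter
        sub al,20h              ; Convert lowercase to uppercase
.done:  ret

; UpCaseBuf: force a buffer to uppercase.
; IN: buffer address in ESI, length in ECX.  MODIFIES: EAX
UpCaseBuf:
        push esi                ; Save caller's ESI
        push ecx                ; Save caller's ECX
        jecxz .exit             ; Nothing to do for an empty buffer
.next:  mov al,byte [esi]       ; Fetch a character
        call UpCaseChar         ; Private helper, reachable only from here
        mov byte [esi],al       ; Store the result back
        inc esi                 ; Point to the next character
        dec ecx                 ; Count it off
        jnz .next               ; Loop until the count reaches 0
.exit:  pop ecx                 ; Restore caller's ECX
        pop esi                 ; Restore caller's ESI
        ret                     ; Return to caller
```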

Note well that all items declared global must be declared global before they are defined in the source code. In practice, this means that you need to declare global procedures at the top of the .text section, before any of the procedures are actually defined. Similarly, all global named data items must be declared in the .data section before the data items are defined.
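That ordering can be sketched in a few lines. The names Greeting and SayHi are hypothetical, and the procedure body is left empty because only the placement of the directives matters here.

```nasm
; Hypothetical fragment showing where the GLOBAL directives go

SECTION .data

GLOBAL Greeting                 ; Declared global first...
Greeting: db "Hi!",10           ; ...and defined afterward

SECTION .text

GLOBAL SayHi                    ; Same order for procedures
SayHi:
        ret                     ; (Body omitted in this sketch)
```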

Equates can be exported from modules, though this is an innovation of the NASM assembler and not necessarily true of all assemblers. Just place the label associated with an equate in a global declaration in the module that defines it, and in a list of extern definitions in each module that uses it, and those other modules will be able to see and use the equate, though equates are necessarily considered read-only values, like Pascal constants.
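Here is a sketch of what an exported equate might look like, assuming the same NASM and ELF toolchain used for the listings in this chapter. Banner and BANNERLEN are hypothetical names. The defining module names both the string and its length equate in a GLOBAL directive. The commented lines show what a module using them would contain instead.

```nasm
; In the module that defines the equate (hypothetical fragment):

SECTION .data

GLOBAL Banner, BANNERLEN        ; Export the string and its length equate
Banner: db "textlib 1.0",10
BANNERLEN EQU $-Banner

; In a module that uses them, the declaration would instead read:
;       EXTERN Banner, BANNERLEN
;       ...
;       mov ecx,Banner          ; Address, resolved by the linker
;       mov edx,BANNERLEN       ; Value, also filled in by the linker
```

Because the importing module learns the equate's value only at link time, it can use that value as an operand in instructions like the MOV shown, but not where the assembler itself needs the number, such as the count in a RESB directive.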
