An Interrupt That Doesn't Interrupt Anything

As one new to the x86 family of processors back in 1981, the notion of a software interrupt drove me nuts. I kept looking and looking for the interrupter and interruptee. Nothing was being interrupted.

The name is unfortunate, though I admit that there is some reason for calling software interrupts by that name. They are in fact courteous interrupts, if you can still call an interrupt an interrupt when it is so courteous that it does no interrupting at all.

The nature of software interrupts and Linux services is best explained by a real example illustrated twice in eatsyscall.asm. As I hinted previously, Linux keeps library routines—sequences of machine instructions focused on a single task—tucked away within itself. Each sequence does something useful—read something from a file, send something to a file, fetch the current time, access the network port, and so on. Linux uses these to do its own work, and it also makes them available (with its troll hat on) to you, the programmer, to access from your own programs.

Well, here is the critical question: how do you find something tucked away inside of Linux? All sequences of machine instructions, of course, have addresses, so why not just publish a list of the addresses of all these useful routines?

There are two problems here: first, allowing user space programs intimate access to operating system internals is dangerous. Malware authors could modify key components of the OS to spy on user activities, capture keystrokes and forward them elsewhere, and so on. Second, the address of any given sequence of instructions changes from one installation to another—nay, from one day to another, as software is installed and configured and removed from the PC. Linux is evolving and being improved and repaired on an ongoing basis. Ubuntu Linux releases two major updates every year in the spring and in the fall, and minor automatic updates are brought down to your PC regularly through the Update Manager. Repairing and improving code involves adding, changing, and removing machine instructions, which changes the size of those hidden code sequences—and, as a consequence, their location.

The solution is ingenious. There is a way to call service routines inside Linux that doesn't depend on knowing the addresses of anything. Most people refer to it as the kernel services call gate, and it represents a heavily guarded gateway between user space, where your programs run, and kernel space, where god/troll Linux does its work. The call gate is implemented via an x86 software interrupt.

At the very start of x86 memory, down at segment 0, offset 0, is a special lookup table with 256 entries. Each entry is a complete memory address including segment and offset portions, for a total of 4 bytes per entry. The first 1,024 bytes of memory in any x86 machine are reserved for this table, and no other code or data may be placed there.

Each of the addresses in the table is called an interrupt vector. The table as a whole is called the interrupt vector table. Each vector has a number, from 0 to 255. The vector occupying bytes 0 through 3 in the table is vector 0. The vector occupying bytes 4 through 7 is vector 1, and so on, as shown in Figure 8-4.
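The numbering rule boils down to one multiplication: a vector's entry begins at its number times 4. The short NASM sketch below works that out for vector 3. Treat it as an illustration under a few assumptions: the file name vecoffset.asm and the VECTOR constant are made up for this example, the build commands assume a 32-bit-capable Linux toolchain, and the program never touches the real table (it only does the arithmetic). It hands the result back as its exit status, using the int 80h mechanism that the rest of this section describes.

```nasm
; vecoffset.asm - hypothetical sketch: at what byte offset does vector N begin?
; Build (32-bit Linux target assumed):
;   nasm -f elf32 vecoffset.asm -o vecoffset.o
;   ld -m elf_i386 vecoffset.o -o vecoffset

VECTOR  equ 3           ; the vector number to look up (illustrative choice)

section .text
global _start

_start:
    mov ebx,VECTOR      ; EBX = vector number
    shl ebx,2           ; shift left by 2 = multiply by 4 bytes per entry
                        ; EBX now holds the entry's byte offset in the table
    mov eax,1           ; kernel service 1: sys_exit
    int 80h             ; exit, passing the offset in EBX as the exit status
```

After building, running ./vecoffset and then echo $? should display 12, which is 0Ch, the starting address of vector 3 in Figure 8-4.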

None of the addresses is burned into permanent memory the way the PC BIOS routines are. When your machine starts up, Linux and the BIOS fill many of the slots in the interrupt vector table with addresses of certain service routines within themselves. Each version of Linux knows the location of its innermost parts, and when you upgrade to a new version of Linux, that new version will fill the appropriate slots in the interrupt vector table with upgraded and accurate addresses.

Figure 8-4: The interrupt vector table. Vector 0 sits at 00000000h, the lowest location in x86 memory; vectors 1, 2, and 3 follow at 00000004h, 00000008h, and 0000000Ch, and the next vector begins at 00000010h.

What doesn't change from Linux version to Linux version is the number of the interrupt that holds a particular address. In other words, since the very first Linux release, interrupt number 80h has pointed the way into darkest Linux to the services dispatcher, a sort of multiple-railway switch with spurs heading out to the many (almost 200) individual Linux kernel service routines. The address of the dispatcher is different with most Linux distributions and versions, but regardless of which Linux distro or which version of a distro that you have, programs can access the dispatcher by way of slot 80h in the interrupt vector table.

Furthermore, programs don't have to go snooping through the table for the address themselves. In fact, that's forbidden under the restrictions of protected mode. The table belongs to the operating system, and you can't even go down there and look at it. However, you don't have to access addresses in the table directly. The x86 CPUs include a machine instruction that has special powers to make use of the interrupt vector table. The int (INTerrupt) instruction is used by eatsyscall.asm to request the services of Linux in displaying its ad slogan string on the screen. At two places, eatsyscall.asm has an int 80h instruction. When an int 80h instruction is executed, the CPU goes down to the interrupt vector table, fetches the address from slot 80h, and then jumps execution to that address. The transition from user space to kernel space is clean and completely controlled. On the other side of the address stored in table slot 80h, the dispatcher picks up execution and performs the service that your program requests.

The process is shown in Figure 8-5. When Linux loads at boot time, one of the many things it does to prepare the machine for use is put correct addresses in several of the vectors in the interrupt vector table. One of these addresses is the address of the kernel services dispatcher, which goes into slot 80h.

Later, when you type the name of your program eatsyscall on the Linux console command line, Linux loads the eatsyscall executable into user space memory and allows it to execute. To gain access to kernel services, eatsyscall executes int 80h instructions as needed. Nothing in your program needs to know anything more about the Linux kernel services dispatcher than its number in the interrupt vector table. Given that single number, eatsyscall is content to remain ignorant and simply let the int 80h instruction and interrupt vector 80h take it where it needs to go.

On the northwest side of Chicago, where I grew up, there was a bus that ran along Milwaukee Avenue. All Chicago bus routes have numbers, and the Milwaukee Avenue route is number 56. It started somewhere in the tangled streets just north of downtown, and ended up in a forest preserve just inside the city limits. The Forest Preserve District ran a swimming pool called Whelan Pool in that forest preserve. Kids all along Milwaukee Avenue could not necessarily have told you the address of Whelan Pool, but they could tell you in a second how to get there: Just hop on bus number 56 and take it to the end of the line. It's like that with software interrupts. Find the number of the vector that reliably points to your destination and ride that vector to the end of the line, without worrying about the winding route or the precise address of your destination.

Behind the scenes, the int 80h instruction does something else: it pushes the address of the next instruction (that is, the instruction immediately following the int 80h instruction) onto the stack before it follows vector 80h into the Linux kernel. Like Hansel and Gretel, the int 80h instruction pushes some breadcrumbs onto the stack as a way of helping the CPU find its way back to the eatsyscall program after the excursion down into Linux, but more on that later.
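The breadcrumb idea is easy to watch in user space, because the ordinary CALL instruction leaves the same kind of breadcrumb: it pushes the address of the instruction after it. The sketch below is an analogy only, not a picture of what happens inside the kernel; the real interrupt mechanism saves additional state alongside the return address. The file name retaddr.asm and the labels Peek, AfterCall, and Done are hypothetical names chosen for this example.

```nasm
; retaddr.asm - hypothetical sketch: CALL pushes the next instruction's address
; Build (32-bit Linux target assumed):
;   nasm -f elf32 retaddr.asm -o retaddr.o
;   ld -m elf_i386 retaddr.o -o retaddr

section .text
global _start

_start:
    call Peek           ; CALL pushes the address of the next instruction...
AfterCall:              ; ...which is the address of this label
    xor ebx,ebx         ; assume a match: exit status 0
    cmp eax,AfterCall   ; did Peek find AfterCall at the top of the stack?
    je Done             ; yes: keep status 0
    mov ebx,1           ; no: report status 1
Done:
    mov eax,1           ; kernel service 1: sys_exit
    int 80h             ; exit with the status in EBX

Peek:
    mov eax,[esp]       ; copy the return address from the top of the stack
    ret                 ; RET pops that same address and jumps back to it
```

If the breadcrumb is where the text says it is, the program exits with status 0, which you can check with echo $? after running it.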

Now, the Linux kernel services dispatcher controls access to almost 200 individual service routines. How does it know which one to execute? You have to tell the dispatcher which service you need, which you do by placing the service's number in register EAX. The dispatcher may require other information as well, and will expect you to provide that information in the correct place (almost always in various registers) before it begins its job.
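One way to picture a dispatcher that selects a routine by number is a table of code addresses indexed by the value in EAX. The sketch below is a user-space model of that idea and nothing more: it makes no claim about how the Linux dispatcher is actually written, it does no range checking on the number, and every name in it (dispatch.asm, ServiceTable, Dispatcher, ServiceZero and friends) is invented for illustration.

```nasm
; dispatch.asm - hypothetical user-space model of a numbered dispatcher
; Build (32-bit Linux target assumed):
;   nasm -f elf32 dispatch.asm -o dispatch.o
;   ld -m elf_i386 dispatch.o -o dispatch

section .data
ServiceTable:                       ; one 4-byte code address per service number
    dd ServiceZero                  ; service 0
    dd ServiceOne                   ; service 1
    dd ServiceTwo                   ; service 2

section .text
global _start

_start:
    mov eax,2                       ; request service number 2
    call Dispatcher                 ; the chosen service leaves its result in EBX
    mov eax,1                       ; kernel service 1: sys_exit
    int 80h                         ; exit with the service's result as status

Dispatcher:
    jmp dword [ServiceTable+eax*4]  ; fetch the address for this number and go;
                                    ; the service's RET returns to our caller

ServiceZero:
    mov ebx,10                      ; each toy service just reports a value
    ret
ServiceOne:
    mov ebx,20
    ret
ServiceTwo:
    mov ebx,30
    ret
```

With EAX set to 2, the jump lands in ServiceTwo, so the program's exit status should be 30. Changing the number loaded into EAX changes which routine runs, without the caller knowing any routine's address.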

Figure 8-5: Riding an interrupt vector into Linux. The int 80h instruction first pushes the address of the instruction after it onto the stack, and then jumps to whatever address is stored in vector 80h. The address at vector 80h takes execution out of user space and into the Linux system call dispatcher in kernel space.

Look at the following lines of code from eatsyscall.asm:

    mov eax,4         ; Specify sys_write syscall
    mov ebx,1         ; Specify File Descriptor 1: Standard Output
    mov ecx,EatMsg    ; Pass offset of the message
    mov edx,EatLen    ; Pass the length of the message
    int 80H           ; Make syscall to output the text to stdout

This sequence of instructions requests that Linux display a text string on the console. The first line sets up a vital piece of information: the number of the service that we're requesting. In this case, it's sys_write, service number 4, which writes data to a Linux file. Remember that in Linux, just about everything is a file, and that includes the console. The second line tells Linux which file to write to: standard output. Every open file has a numeric file descriptor, and the first three (0, 1, and 2) are standard and never change. The file descriptor for standard output is 1.

The third line places the address of the string to be displayed in ECX. That's how Linux knows what it is that you want to display. The dispatcher expects the address to be in ECX, but the address is simply where the string begins. Linux also needs to know the string's length, and we place that value in register EDX.

With the kernel service number, the address of the string, and the string's length tucked into their appropriate registers, we take a trip to the dispatcher by executing int 80h. The int instruction is all it takes. Boom! Execution crosses the bridge into kernel space, where Linux the troll reads the string at ECX and sends it to the console through mechanisms it keeps more or less to itself. Most of the time, that's a good thing: there can be too much information in descriptions of programming machinery, just as in descriptions of your personal life.
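To see those five lines in a setting where they can actually be assembled and run, here is a minimal stand-alone sketch built around them. It is not the text of eatsyscall.asm. The message string is a placeholder, the file name eatsketch.asm is invented, and the closing call to sys_exit (service number 1 in the 32-bit int 80h convention) is my assumption about what a program's second int 80h is typically for.

```nasm
; eatsketch.asm - minimal sketch modeled on the lines quoted from eatsyscall.asm
; Build (32-bit Linux target assumed):
;   nasm -f elf32 eatsketch.asm -o eatsketch.o
;   ld -m elf_i386 eatsketch.o -o eatsketch

section .data
EatMsg: db "Placeholder slogan",10  ; stand-in text ending in a newline
EatLen: equ $-EatMsg                ; length, calculated by the assembler

section .text
global _start

_start:
    mov eax,4           ; Specify sys_write syscall
    mov ebx,1           ; Specify File Descriptor 1: Standard Output
    mov ecx,EatMsg      ; Pass offset of the message
    mov edx,EatLen      ; Pass the length of the message
    int 80h             ; Make syscall to output the text to stdout

    mov eax,1           ; Specify sys_exit syscall (assumed second call)
    mov ebx,0           ; Return a code of zero
    int 80h             ; Make syscall to terminate the program
```

Letting the assembler compute EatLen from the current position ($) keeps the length in step with the string if you edit the message later.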

So much for getting into Linux. How does execution get back home again? The address in vector 80h took execution into the kernel services dispatcher, but how does Linux know where to go to pass execution back into eatsyscall? Half of the cleverness of software interrupts is knowing how to get there, and the other half—just as clever—is knowing how to get back.

To continue execution where it left off prior to the int 80h instruction, Linux has to look in a completely reliable place for the return address, and that completely reliable place is none other than the top of the stack.

I mentioned earlier (without much emphasis) that the int 80h instruction pushes an address to the top of the stack before it launches off into the unknown. This address is the address of the next instruction in line for execution: the instruction immediately following the int 80h instruction. This location is completely reliable because, just as there is only one interrupt vector table in the machine, there is only one stack in operation at any one time. This means that there is only one top of the stack, that is, at the address pointed
