...the blinders force the CPU to read and write memory in chunks no more than 65,536 bytes in size.

Figure 4-3: Seeing a megabyte through 64K blinders

The CPU's view of memory in real mode segmented model is peculiar. It is constrained to look at memory in chunks, where no chunk is larger than 65,536 bytes in length (again, what we call "64K"). Making use of those chunks (that is, knowing which one is currently in use and how to move from one to another) is the real challenge of real mode segmented model programming. It's time to take a closer look at what segments are and how they work.

The Nature of Segments

We've spoken informally of segments so far as chunks of memory within the larger memory space that the CPU can see and use. In the context of real mode segmented model, a segment is a region of memory that begins on a paragraph boundary and extends for some number of bytes. In real mode segmented model, this number is less than or equal to 64K (65,536). You've seen the number 64K before, but paragraphs?

Time out for a lesson in old-time 86-family trivia. A paragraph is a measure of memory equal to 16 bytes. It is one of numerous technical terms used to describe various quantities of memory. We've looked at some of them before, and all of them are even multiples of 1 byte. Bytes are data atoms, remember; loose memory bits are more like subatomic particles, and they never exist in the absence of a byte (or more) of memory to contain them. Some of these terms are used more than others, but you should be aware of all of them, which are provided in Table 4-1.

Table 4-1: Collective Terms for Memory

NAME           VALUE IN DECIMAL    VALUE IN HEX
Byte           1                   01H
Word           2                   02H
Double word    4                   04H
Quad word      8                   08H
Ten byte       10                  0AH
Paragraph      16                  10H
Page           256                 100H
Segment        65,536              10000H

Some of these terms, such as ten byte, occur very rarely, and others, such as page, occur almost never. The term paragraph was never common to begin with, and for the most part was used only in connection with the places in memory where segments may begin.

Any memory address evenly divisible by 16 is called a paragraph boundary. The first paragraph boundary is address 0. The second is address 10H; the third address 20H, and so on. (Remember that 10H is equal to decimal 16.) Any paragraph boundary may be considered the start of a segment.
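The divisibility test is easy to try out. What follows is a minimal sketch in Python, which is my choice for illustration only; the name is_paragraph_boundary is made up for this example and does not come from any library.

```python
PARAGRAPH = 16  # bytes in one paragraph (10H)

def is_paragraph_boundary(address: int) -> bool:
    """Return True if the address is evenly divisible by 16."""
    return address % PARAGRAPH == 0

# Collect the paragraph boundaries among the first 64 memory addresses.
first_four = [addr for addr in range(0x40) if is_paragraph_boundary(addr)]

# Show them in hex, the way the text writes them.
print([f"{addr:02X}H" for addr in first_four])  # ['00H', '10H', '20H', '30H']
```

In hex the test is even easier to do by eye: an address sits on a paragraph boundary exactly when its last hex digit is 0.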

This doesn't mean that a segment actually starts every 16 bytes up and down throughout that megabyte of memory. A segment is like a shelf in one of those modern adjustable bookcases. On the back face of the bookcase are a great many little slots spaced one-half inch apart. A shelf bracket can be inserted into any of the little slots. However, there aren't hundreds of shelves, but only four or five. Nearly all of the slots are empty and unused. They exist so that a much smaller number of shelves may be adjusted up and down the height of the bookcase as needed.

In a very similar manner, paragraph boundaries are little slots at which a segment may be begun. In real mode segmented model, a program may make use of only four or five segments, but each of those segments may begin at any of the 65,536 paragraph boundaries existing in the megabyte of memory available in the real mode segmented model.

There's that number again: 65,536—our beloved 64K. There are 64K different paragraph boundaries where a segment may begin. Each paragraph boundary has a number. As always, the numbers begin from 0, and go to 64K minus one; in decimal 65,535, or in hex 0FFFFH. Because a segment may begin at any paragraph boundary, the number of the paragraph boundary at which a segment begins is called the segment address of that particular segment.

We rarely, in fact, speak of paragraphs or paragraph boundaries at all. When you see the term segment address in connection with real mode segmented model, keep in mind that each segment address is 16 bytes (one paragraph) farther along in memory than the segment address before it. In Figure 4-4, each shaded bar is a segment address, and segments may begin every sixteen bytes. The highest segment address is 0FFFFH, which is 16 bytes from the very top of real mode's 1 megabyte of memory.

In summary: segments may begin at any segment address. There are 65,536 segment addresses evenly distributed across real mode's full megabyte of memory, sixteen bytes apart. A segment address is more a permission than a compulsion; for all the 64K possible segment addresses, only five or six are ever actually used to begin segments at any one time. Think of segment addresses as slots where segments may be placed.
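The relationship between a segment address and the memory address where that segment begins can be sketched in a few lines of Python. This is an illustration under my own naming (segment_start and the two constants are hypothetical), not code from any toolchain.

```python
PARAGRAPH = 16        # bytes between one segment address and the next
MEGABYTE = 0x100000   # 1,048,576 bytes of real mode memory

def segment_start(segment_address: int) -> int:
    """Memory address of the paragraph boundary numbered segment_address."""
    return segment_address * PARAGRAPH

# How many paragraph boundaries fit in the megabyte?
print(MEGABYTE // PARAGRAPH)             # 65536

# Segment address 0001H begins at memory address 00010H.
print(f"{segment_start(0x0001):05X}H")   # 00010H

# The highest segment address begins 16 bytes below the top of memory.
print(f"{segment_start(0xFFFF):05X}H")   # FFFF0H
print(MEGABYTE - segment_start(0xFFFF))  # 16
```

Multiplying by 16 is the same as appending one hex zero, which is why segment address 0FFFFH corresponds to memory address 0FFFF0H.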

So much for segment addresses; now, what of segments themselves? The most important thing to understand about a segment is that it may be up to 64K bytes in size, but it doesn't have to be. A segment may be only one byte long, or 256 bytes long, or 21,378 bytes long, or any length at all up to 64K bytes.

Figure 4-4: Memory addresses versus segment addresses (memory addresses in the range 00000H-0FFFFFH; segment addresses in the range 0000H-0FFFFH)

A Horizon, Not a Place

You define a segment primarily by stating where it begins. What, then, defines how long a segment is? Nothing, really—and we get into some really tricky semantics here. A segment is more a horizon than a place. Once you define where a segment begins, that segment can encompass any location in memory between that starting place and the horizon—which is 65,536 bytes down the line.

Nothing dictates, of course, that a segment must use all of that memory. In most cases, when a segment is defined at some segment address, a program considers only the next few hundred or perhaps few thousand bytes as part of that segment, unless it's a really world-class program. Most beginners reading about segments think of them as some kind of memory allocation, a protected region of memory with walls on both sides, reserved for some specific use.

This is about as far from true as you can get. In real mode nothing is protected within a segment, and segments are not reserved for any specific register or access method. Segments can overlap. (People often don't think about or realize this.) In a very real sense, segments don't really exist, except as horizons beyond which a certain type of memory reference cannot go. It comes back to that set of 64K blinders that the CPU wears, as I drew in Figure 4-3. I think of it this way: A segment is the location in memory at which the CPU's 64K blinders are positioned. In looking at memory through the blinders, you can see bytes starting at the segment address and going on until the blinders cut you off, 64K bytes down the way.
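The blinders idea can be put into a short Python sketch. The function names here are hypothetical, and the sketch makes one simplifying assumption that I want to state loudly: it does not model what happens when the horizon of a segment near the top of memory runs past the end of the megabyte.

```python
PARAGRAPH = 16     # segments begin on 16-byte boundaries
HORIZON = 0x10000  # the blinders are 65,536 bytes wide

def visible_range(segment_address: int) -> tuple[int, int]:
    """First and last memory address visible from this segment address.

    Assumption: the top-of-memory case is ignored (no wraparound modeled).
    """
    start = segment_address * PARAGRAPH
    return start, start + HORIZON - 1

def segments_overlap(seg_a: int, seg_b: int) -> bool:
    """True if the two 64K views share at least one byte."""
    a_start, a_end = visible_range(seg_a)
    b_start, b_end = visible_range(seg_b)
    return a_start <= b_end and b_start <= a_end

start, end = visible_range(0x0001)
print(f"{start:05X}H to {end:05X}H")     # 00010H to 1000FH

print(segments_overlap(0x0000, 0x0001))  # True: they share almost every byte
print(segments_overlap(0x0000, 0x1000))  # False: 1000H begins exactly 64K up
```

Nothing in the sketch reserves or protects anything. A segment address only fixes where the 64K view starts, which is why two views placed 16 bytes apart overlap almost completely.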

The key to understanding this admittedly metaphysical definition of a segment is knowing how segments are used—and understanding that finally requires a detailed discussion of registers.

Making 20-Bit Addresses out of 16-Bit Registers

A register, as I've mentioned informally in earlier chapters, is a memory location inside the CPU chip, rather than outside the CPU in a memory bank somewhere. The 8088, 8086, and 80286 are often called 16-bit CPUs because their internal registers are almost all 16 bits in size. The 80386 and its twenty years' worth of successors are called 32-bit CPUs because most of their internal registers are 32 bits in size. Since the mid-2000s, many of the new x86 CPUs are 64 bits in design, with registers that are 64 bits wide. (More about this at the end of the chapter.) The x86 CPUs have a fair number of registers, and they are an interesting crew indeed.

Registers do many jobs, but perhaps their most important single job is holding addresses of important locations in memory. If you recall, the 8086 and 8088 have 20 address pins, and their megabyte of memory (which is the real mode segmented memory we're talking about) requires addresses 20 bits in size.

How do you put a 20-bit memory address in a 16-bit register? You don't.

You put a 20-bit address in two 16-bit registers.

What happens is this: all memory locations in real mode's megabyte of memory have not one address but two. Every byte in memory is assumed to reside in a segment. A byte's complete address, then, consists of the address of its segment, along with the distance of the byte from the start of that segment. Recall that the address of the segment is the byte's segment address. The byte's distance from the start of the segment is the byte's offset address. Both addresses must be specified to completely describe any single byte's location within the full megabyte of real mode memory. When written out, the segment address comes first, followed by the offset address. The two are separated with a colon. Segment:offset addresses are always written in hexadecimal.
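Because segment addresses count paragraph boundaries from 0, the segment numbered N begins at memory address N times 16, and the byte's full 20-bit address is that starting point plus the offset. Here is a minimal Python sketch of the arithmetic; the function names and the sample address 1234:0010 are my own, chosen only for illustration, and the sketch does not model what the hardware does when the sum runs past the top of the megabyte.

```python
def parse_segment_offset(text: str) -> tuple[int, int]:
    """Split 'SSSS:OOOO' (hex, no trailing H) into two integers."""
    segment_text, offset_text = text.split(":")
    return int(segment_text, 16), int(offset_text, 16)

def linear_address(segment: int, offset: int) -> int:
    """Combine two 16-bit values into one 20-bit memory address.

    Shifting left by 4 bits multiplies the segment address by 16.
    """
    return (segment << 4) + offset

segment, offset = parse_segment_offset("1234:0010")
print(f"{linear_address(segment, offset):05X}H")  # 12350H

# The highest segment with offset 000F reaches the last byte of the megabyte.
print(f"{linear_address(0xFFFF, 0x000F):05X}H")   # FFFFFH
```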

I've drawn Figure 4-5 to help make this a little clearer. A byte of data we'll call "MyByte" exists in memory at the location marked. Its address is given as 0001:0019. This means that MyByte falls within segment 0001H and is located 0019H bytes from the start of that segment. It's a convention in x86 programming that when two numbers are used to specify an address with a colon between them, you do not end each of the two numbers with an H for hexadecimal. Addresses written in segment:offset form are assumed to be in hexadecimal.

The universe is perverse, however, and clever eyes will perceive that MyByte can have two other perfectly legal addresses: 0000:0029 and 0002:0009. How so? Keep in mind that a segment may start every 16 bytes throughout the full megabyte of real memory. A segment, once begun, embraces all bytes from its origin to 65,535 bytes further up in memory. There's nothing wrong with segments overlapping, and in Figure 4-5 we have three overlapping segments. MyByte is 29H bytes into the first segment, which begins at segment address 0000H. MyByte is 19H bytes into the second segment, which begins at segment address 0001H. And MyByte is 9H bytes into the third segment, which begins at segment address 0002H. It's not that MyByte is in two or three places at once. It's in only one place, but that one place may be described in any of three ways.

It's a little like Chicago's street-numbering system. Howard Street is 76 blocks north of Chicago's "origin," Madison Street. Howard Street is also four blocks north of Touhy Avenue. You can describe Howard Street's location relative to either Madison Street or Touhy Avenue, depending on what you want to do.

An arbitrary byte somewhere in the middle of real mode's megabyte of memory may fall within literally thousands of different segments. Which segment the byte is actually in is strictly a matter of convention.
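You can check "literally thousands" for yourself by brute force. The Python sketch below tries every one of the 65,536 segment addresses and keeps those from which a given byte lies within the 64K horizon. The function name aliases is hypothetical, and the sketch ignores any wraparound at the top of memory.

```python
def aliases(address: int) -> list[tuple[int, int]]:
    """Every (segment, offset) pair that names the given memory address."""
    pairs = []
    for segment in range(0x10000):        # all 65,536 segment addresses
        offset = address - segment * 16   # distance from that segment's start
        if 0 <= offset <= 0xFFFF:         # must fit in a 16-bit offset
            pairs.append((segment, offset))
    return pairs

# MyByte lives at memory address 00029H, very near the bottom of memory,
# so only three segments can begin at or below it.
for segment, offset in aliases(0x29):
    print(f"{segment:04X}:{offset:04X}")
# 0000:0029
# 0001:0019
# 0002:0009

# A byte in the middle of the megabyte has far more names.
print(len(aliases(0x80000)))  # 4096
```

The count tops out at 4,096 because that is how many paragraph boundaries fit inside one 64K span (65,536 divided by 16).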

In summary: to express a 20-bit address in two 16-bit registers is to put the segment address into one 16-bit register, and the offset address into another 16-bit register. The two registers taken together identify one byte among all 1,048,576 bytes in real mode's megabyte of memory.

Figure 4-5: Segments and offsets. MyByte could have any of three possible addresses: 0000:0029 (29H bytes into segment 0000H), 0001:0019 (19H bytes into segment 0001H), or 0002:0009 (9H bytes into segment 0002H).

Is this awkward? You bet, but it was the best we could do for a good many years.
