Endianness

In the lower-left corner of the bottom pane of the Bless editor is a check box marked "Show little endian decoding." By default the box is not checked, but in almost all cases it should be. The box tells Bless whether to interpret sequences of bytes as numeric values in "big endian" order or in "little endian" order. If you check and uncheck the box, the values displayed in the lower pane will change radically, even if you don't move the cursor. When you change the state of that check box, you are changing the way that the Bless editor interprets a sequence of bytes in a file as some sort of number.

If you recall from Chapter 4, a single byte can represent numbers from 0 to 255. If you want to represent a number larger than 255, you must use more than one byte to do it. A sequence of two bytes in a row can represent any number from 0 to 65,535. However, once you have more than one byte representing a numeric value, the order of the bytes becomes crucial.
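The ranges quoted above follow from the number of bits available: each byte contributes 8 bits, so n bytes can hold unsigned values from 0 to 2^(8n) - 1. A minimal sketch of that arithmetic (the helper name `max_unsigned` is only illustrative):

```python
def max_unsigned(num_bytes: int) -> int:
    """Largest unsigned value that fits in num_bytes bytes of 8 bits each."""
    return 2 ** (8 * num_bytes) - 1

# One byte, two bytes, and four bytes.
for n in (1, 2, 4):
    print(f"{n} byte(s): 0 to {max_unsigned(n):,}")
```

For one and two bytes this gives 255 and 65,535, matching the figures in the text.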

Let's go back to the first two bytes in either of the two files we loaded earlier into Bless. They're nominally the letters "S" and "a," but that is simply another interpretation. The hexadecimal sequence 53 61H may also be interpreted as a number. The 53H appears first in the file. The 61H appears after it (see Figures 5-1 and 5-2). So, taken together as a single 16-bit value, the two bytes become the hex number 53 61H.

Or do they? Perhaps a little weirdly, it's not that simple. See Figure 5-4. The left part of the figure is a little excerpt of the information shown in the Bless hex display pane for our example text file. It shows only the first two bytes and their offsets from the beginning of the file. The right portion of the figure is the very same information, but reversed left-for-right, as though seen in a mirror. It's the same bytes in the same order, but we see them differently. What we assumed at first was the 16-bit hex number 53 61H now appears to be 61 53H.

Did the number change? Not from the computer's perspective. All that changed was the way we printed it on the page of this book. By custom, people reading English start at the left and read toward the right. The layout of the Bless hex editor display reflects that. But many other languages in the world, including Hebrew and Arabic, start at the right margin and read toward the left. An Arabic programmer's first impulse might be to see the two bytes as 61 53H, especially if he or she is using software designed for the Arabic language conventions, displaying file contents from right to left.

It's actually more confusing than that. Western languages (including English) are a little schizoid, in that they read text from left to right, but evaluate numeric columns from right to left. The number 426 consists of four hundreds, two tens, and six ones, not four ones, two tens, and six hundreds. By convention here in the West, the least significant column is at the right, and the values of the columns increase from right to left. The most significant column is the leftmost.

[Figure 5-4: Differences in display order vs. differences in evaluation order. Left panel, reading from left to right (English and most European languages): offsets 00 01 hold bytes 53 61, with offsets increasing to the right. Right panel, reading from right to left (Hebrew and Arabic): offsets 01 00 hold bytes 61 53, with offsets increasing to the left. So is it "53 61H" or "61 53H"?]

Confusion is a bad idea in computing. So whether or not a sequence of bytes is displayed from left to right or from right to left, we all have to agree on which of those bytes represents the least significant figure in a multibyte number, and which the most significant figure. In a computer, we have two options:

We can agree that the least significant byte of a multibyte value is at the lowest offset, and the most significant byte is at the highest offset.

We can agree that the most significant byte of a multibyte value is at the lowest offset, and the least significant byte is at the highest offset.

These two choices are mutually exclusive. A computer must operate using one choice or the other; they cannot both be used at the same time at the whim of a program. Furthermore, this choice is not limited to the operating system, or to a particular program. The choice is baked right into the silicon of the CPU and its instruction set. A computer architecture that stores the least significant byte of a multibyte value at the lowest offset is called little endian. A computer architecture that stores the most significant byte of a multibyte value at the lowest offset is called big endian.

Figure 5-5 should make this clearer. In big endian systems, a multibyte value begins with its most significant byte. In little endian systems, a multibyte value begins with its least significant byte. Think: big endian, big end first; little endian, little end first.

[Figure 5-5: Big endian vs. little endian for a 16-bit value. Bytes in storage: 53H at offset 00, 61H at offset 01. Big endian: the most significant byte is at offset 00, giving the 16-bit hexadecimal value 5361H, unsigned decimal equivalent 21345. Little endian: the least significant byte is at offset 00, giving the 16-bit hexadecimal value 6153H, unsigned decimal equivalent 24915.]

There are big differences at stake here! The two bytes that begin our example text file represent the decimal number 21,345 in a big endian system, but 24,915 in a little endian system.
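Both results can be checked by hand, since in a 16-bit value the most significant byte is worth 256 times the least significant one. The following is a minimal Python sketch of that arithmetic (Python is used here only for illustration; the variable names are not from the text), cross-checked against the built-in `int.from_bytes`:

```python
# The first two bytes of the example file: 'S' at offset 0, 'a' at offset 1.
first_two = bytes([0x53, 0x61])

# Big endian: the byte at the lowest offset is the most significant.
big = first_two[0] * 256 + first_two[1]

# Little endian: the byte at the lowest offset is the least significant.
little = first_two[1] * 256 + first_two[0]

# The built-in conversion agrees with the hand calculation.
assert big == int.from_bytes(first_two, "big")
assert little == int.from_bytes(first_two, "little")

print(f"big endian:    {big:04X}H = {big}")
print(f"little endian: {little:04X}H = {little}")
```

The same two bytes, in the same order in storage, yield 5361H (21,345) under one convention and 6153H (24,915) under the other.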

It's possible to do quite a bit of programming without being aware of a system's "endianness." If you program in higher-level languages like Visual Basic, Delphi, or C, most of the consequences of endianness are hidden by the language and the language compiler, at least until something goes wrong at a low level. Once you start reading files at a byte level, you have to know how to read them; and if you're programming in assembly language, you had better be comfortable with endianness going in.

Reading hex displays of numeric data in big endian systems is easy, because the digits appear in the order that Western people expect, with the most significant digits on the left. In little endian systems, everything is reversed; and the more bytes used to represent a number, the more confusing it can become. Figure 5-6 shows the endian differences between evaluations of a 32-bit value. Little endian programmers have to read hex displays of multibyte values as though they were reading Hebrew or Arabic, from right to left.
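To see the 32-bit case concretely, take the first four bytes of the example file, 53 61 6D 0A, and decode them both ways. This is a minimal sketch using Python's standard `struct` module, where `">I"` requests a big endian unsigned 32-bit integer and `"<I"` a little endian one; the variable names are illustrative only:

```python
import struct

# Four bytes as they sit in storage, lowest offset first.
raw = bytes([0x53, 0x61, 0x6D, 0x0A])

(big,) = struct.unpack(">I", raw)     # most significant byte first
(little,) = struct.unpack("<I", raw)  # least significant byte first

print(f"big endian:    {big:08X}H = {big}")
print(f"little endian: {little:08X}H = {little}")

# Reading the hex display "right to left" is the same as reversing the bytes.
assert raw[::-1].hex().upper() == f"{little:08X}"
```

The final check mirrors the advice in the text: the little endian value is what you get by reading the stored bytes from the highest offset down to the lowest.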

Remember that endianness differences apply not only to bytes stored in files but also to bytes stored in memory. When (as I'll explain later) you inspect numeric values stored in memory with a debugger, all the same rules apply.

[Figure 5-6: Big endian vs. little endian for a 32-bit value. Bytes in storage: 53H, 61H, 6DH, 0AH at offsets 00 through 03. Big endian: the most significant byte is at offset 00, giving the hexadecimal value 53616D0AH, unsigned decimal equivalent 1398893834. Little endian: the least significant byte is at offset 00, giving the hexadecimal value 0A6D6153H, unsigned decimal equivalent 174940499.]

So, which "endianness" do Linux systems use? Both! (Though not at the same time...) Again, it's not about operating systems. The entire x86 hardware architecture, from the lowly 8086 up to the latest Core 2 Quad, is little endian. Other hardware architectures, such as Motorola's 68000 and the original PowerPC, and most IBM mainframe architectures like System/370, are big endian. More recent hardware architectures have been designed as bi-endian, meaning they can be configured (with some difficulty) to interpret numeric values one way or the other at the hardware level. Alpha, MIPS, and Intel's Itanium architecture are bi-endian.

If (as is most likely) you're running Linux on an ordinary x86 CPU, you'll be little endian, and you should check the box on the Bless editor labeled "Show little endian decoding." Other programming tools may offer you the option of selecting big endian display or little endian display. Make sure that whatever tools you use, you have the correct option selected.

Linux, of course, can be made to run on any hardware architecture, so using Linux doesn't guarantee that you will be facing a big endian or little endian system, and that's one reason I've gone on at some length about endianness here. You have to know from studying the system what endianness is currently in force, though you can learn it by inspection: store a 32-bit integer to memory and then look at it with a debugger or a hex editor like Bless. If you know your hex (and you had better!) the system's endianness will jump right out at you.
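The inspection test described above can also be done in a few lines of code rather than with a debugger. The sketch below is one possible way to do it in Python, not a method taken from the text: it stores a known 32-bit integer in the machine's native byte order (the `"="` prefix in `struct.pack`) and looks at which byte landed at the lowest offset. The function name `detect_endianness` is hypothetical.

```python
import struct
import sys

def detect_endianness() -> str:
    """Store a known 32-bit integer natively and inspect the first byte."""
    stored = struct.pack("=I", 0x53616D0A)  # "=" selects native byte order
    print("bytes as stored:", " ".join(f"{b:02X}" for b in stored))
    if stored[0] == 0x0A:
        return "little"  # least significant byte at the lowest offset
    if stored[0] == 0x53:
        return "big"     # most significant byte at the lowest offset
    return "unknown"

result = detect_endianness()
print("this system is", result, "endian")

# Python also reports the native byte order directly.
print("sys.byteorder says:", sys.byteorder)
```

On an x86 machine the stored bytes should read 0A 6D 61 53, the reverse of the way the number is written.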
