Manipulating floating point unit registers

Later models of the i386 architecture have more FPU registers than do earlier ones. The streaming SIMD extensions were introduced with the Pentium III. They are of use in areas such as image processing and word recognition. SSE uses 128-bit registers, called XMM registers. There is also an MXCSR register, containing control and status bits for operating the XMM registers.

11.10.2.1 Initialising floating point unit registers

When the current process uses the FPU for the first time, the function shown in Figure 11.27, from arch/i386/kernel/i387.c, is called. It initialises the registers and, if SSE is supported, sets the MXCSR to its default value. It also notes that the current process now has valid values in the FPU.

33 void init_fpu(void)
36     if (cpu_has_xmm)
37         load_mxcsr(0x1f80);
39     current->used_math = 1;

Figure 11.27 Initialising the hardware floating point unit registers

35 this executes the machine instruction to initialise the hardware FPU registers to default values, without checking for error conditions.

36-37 the macro to determine if the CPU has SSE capability is from Section 11.10.1. If it does, this code sets the MXCSR register to its default value, 0x1f80 (0001 1111 1000 0000). The six bits set are the exception mask bits (bits 7 to 12), so all six SIMD floating point exception types are masked. This is considered in more detail in the discussion of SIMD error conditions in the FPU (see Section 11.9.2). The load_mxcsr() macro is from Section

39 this initialisation routine should be run only once for each process, the first time it attempts to use the FPU. The used_math field in the task_struct notes that this initialisation has been done.
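The default MXCSR value can be checked bit by bit. The following is a minimal sketch, not kernel code: the helper names mxcsr_masked_count() and mxcsr_flags_pending() are hypothetical, and the bit positions (flags in bits 0 to 5, masks in bits 7 to 12) are taken from the processor documentation rather than from the figure above.

```c
#include <assert.h>
#include <stdint.h>

/* MXCSR layout assumed here: bits 0-5 are the sticky exception flags,
 * bits 7-12 are the corresponding exception mask bits. */
#define MXCSR_FLAG_BITS 0x003fu   /* IE, DE, ZE, OE, UE, PE */
#define MXCSR_MASK_BITS 0x1f80u   /* IM, DM, ZM, OM, UM, PM */
#define MXCSR_DEFAULT   0x1f80u   /* the value passed to load_mxcsr() */

/* Hypothetical helper: count how many of the six exception types
 * are masked in the given MXCSR value. */
int mxcsr_masked_count(uint32_t mxcsr)
{
    int count = 0;
    for (int bit = 7; bit <= 12; bit++)
        if (mxcsr & (1u << bit))
            count++;
    assert(count >= 0 && count <= 6);
    return count;
}

/* Hypothetical helper: nonzero if any exception flag is recorded. */
int mxcsr_flags_pending(uint32_t mxcsr)
{
    return (mxcsr & MXCSR_FLAG_BITS) != 0;
}
```

Under these assumptions, mxcsr_masked_count(MXCSR_DEFAULT) counts all six mask bits and mxcsr_flags_pending(MXCSR_DEFAULT) reports no recorded exceptions.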

Checking if a process has used the floating point unit

Whenever a process uses the FPU, the PF_USEDFPU flag is set in the task_struct. The macro shown in Figure 11.28, from <asm-i386/i387.h>, checks this flag and, if it is set, saves the FPU context to the thread structure. For this it uses the save_init_fpu() function, from Section

Figure 11.28 Checking if a process has used the floating point unit
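The check-then-save behaviour can be modelled in ordinary user-space C. This is a sketch under stated assumptions, not the macro from the figure: struct task, model_hw_fpu, model_save_init_fpu() and save_fpu_if_used() are hypothetical stand-ins, and the value given to PF_USEDFPU is illustrative only. The clearing of the flag after a save follows the description of the save function later in this section.

```c
#include <assert.h>
#include <string.h>

#define PF_USEDFPU 0x1u   /* illustrative value, not the kernel's */

/* Stand-in for the hardware FPU register contents. */
struct fpu_regs { unsigned char bytes[16]; };

/* Stand-in for the parts of task_struct used here. */
struct task {
    unsigned int flags;
    struct fpu_regs saved;   /* plays the role of the thread structure */
    int save_count;          /* how many times a save really happened */
};

struct fpu_regs model_hw_fpu;   /* the "live" registers in this model */

/* Model of the save step: copy the live registers, then clear the flag. */
void model_save_init_fpu(struct task *tsk)
{
    assert(tsk->flags & PF_USEDFPU);
    memcpy(&tsk->saved, &model_hw_fpu, sizeof model_hw_fpu);
    tsk->flags &= ~PF_USEDFPU;
    tsk->save_count++;
}

/* Model of the check: save only if the task has used the FPU. */
void save_fpu_if_used(struct task *tsk)
{
    if (tsk->flags & PF_USEDFPU)
        model_save_init_fpu(tsk);
}
```

Calling save_fpu_if_used() twice in a row on the same task saves only once in this model, because the first call clears the flag.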

Loading the MXCSR register

The macro shown in Figure 11.29, from <asm-i386/i387.h>, loads the supplied value into the MXCSR register.

57 unsigned long __mxcsr = ((unsigned long)(val) & 0xffbf); \
58 asm volatile( "ldmxcsr %0" : : "m" (__mxcsr) ); \

Figure 11.29 Loading the MXCSR register

57 0xffbf is 1111 1111 1011 1111, so this line ensures that bit 6 of the supplied value is clear. This bit was introduced in the Pentium 4, and attempts to set it on any other type of CPU will cause a general protection exception.

58 this loads the value from __mxcsr into the MXCSR register.
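The effect of the 0xffbf mask is easy to verify in isolation. The sketch below reproduces only the masking step from line 57; the function name mxcsr_sanitise() and the MXCSR_BIT6 constant are hypothetical, and the LDMXCSR instruction itself is left out so that the code runs anywhere.

```c
#include <assert.h>

#define MXCSR_BIT6 (1ul << 6)   /* the bit cleared by the 0xffbf mask */

/* Apply the same mask as line 57: keep the low 16 bits of the
 * supplied value, except bit 6, which is always cleared. */
unsigned long mxcsr_sanitise(unsigned long val)
{
    unsigned long masked = val & 0xffbf;
    assert((masked & MXCSR_BIT6) == 0);
    return masked;
}
```

Note that the mask also discards anything above bit 15, so a value such as 0x10040 is reduced to 0, while the default 0x1f80 passes through unchanged.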

Saving state of floating point unit registers

The state of FPU registers is saved to the thread structure by the functions shown in Figure 11.30, from arch/i386/kernel/i387.c.

46 static inline void __save_init_fpu( struct task_struct *tsk )
asm volatile( "fxsave %0 ; fnclex"
asm volatile( "fnsave %0 ; fwait"
void save_init_fpu( struct task_struct *tsk ) {

Figure 11.30 Saving floating point unit register values

this function saves values from the FPU registers to the thread field of the specified task_struct.

this checks whether the CPU supports the FXSAVE instruction or not (see Section 11.10.1).

the FXSAVE instruction saves the x87 state, along with the state of the MMX, XMM, and MXCSR registers. The information is copied to parameter 0, which is the architecture-specific thread structure of the process, where it can be used by system software. The actual values in the registers are not altered. The FNCLEX instruction clears the floating point exception flags in the FPU status word, without checking for error conditions.

for older CPUs, FNSAVE is used. This instruction stores x87 FPU state information, including data registers, in thread, where it can be used by system software. It also initialises the FPU to the same default values as FNINIT. Then the FWAIT instruction checks for and handles any pending unmasked x87 FPU exceptions immediately, rather than leaving them to be reported at the next FPU instruction, as would normally happen.

in all cases, we clear the PF_USEDFPU bit in the flags field of the task_struct. The registers have been saved and, as long as this bit remains clear, they will not be saved again at a context switch.

58-62 this is just a wrapper function which calls __save_init_fpu() from line 46 and sets the TS bit in CR0.

61 this macro, described in Section, sets the TS bit in the CR0 register, whether set beforehand or not. This bit allows context switches to be sped up by swapping the FPU registers only when necessary. The CPU sets it to 1 every time a context switch occurs. An FPU instruction will raise an exception if it is set to 1. Setting it at this point means that a subsequent FPU instruction will raise a 'device not available' exception. Ideally, by the time this happens, the signal handler will have fixed up the saved values in the thread structure, so that, when the handler for the 'device not available' exception reloads the hardware registers from there, execution should proceed correctly.

Restoring floating point unit register values

Figure 11.31, from arch/i386/kernel/i387.c, shows the function that restores values to the FPU registers from the thread structure of the current process. It is not called on every context switch but only when required.

75 void restore_fpu( struct task_struct *tsk )
78 asm volatile( "fxrstor %0"
79 : : "m" (tsk->thread.i387.fxsave) );
81 asm volatile( "frstor %0"

Figure 11.31 Restoring the hardware floating point unit registers

77 depending on whether the CPU supports the FXSAVE and FXRSTOR instructions or not, this code executes the appropriate machine instruction to restore the register values from the thread structure of the current process.

78 this restores FPU state, as well as the XMM registers and the MXCSR register, from parameter 0. This is the appropriate element of the i387_union in thread.

81 this restores the simpler FPU state from parameter 0. This is the appropriate element of the i387_union in thread.

Manipulating the CR0 register

Several macros are defined to operate on the CR0 register. These are shown in Figure 11.32, from <asm-i386/system.h>.

104 #define clts() __asm__ __volatile__ ("clts")
106 unsigned int __dummy
108 "movl %%cr0,%0\n\t" :"=r" (__dummy));
112 #define write_cr0(x)
113 __asm__("movl %0,%%cr0": :"r" (x));

104 the CLTS machine instruction clears the TS bit in CR0. Here it is converted to a C macro of the same name. This bit is set by the CPU on every context switch.

105-111 the macro evaluates to the value in the CR0 register.

108 the value in the CR0 register is moved to parameter 0, the local variable __dummy.

110 the macro evaluates to the value of __dummy.

112-113 this macro copies the value of parameter 0 (x) to the CR0 register.

124 this macro reads the value in CR0, using the macro from line 105. It then sets bit 3 (TS), whether it was set beforehand or not, and writes the result back to CR0, using the macro from line 112.
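The real macros use privileged instructions and cannot run in user space, but their bit manipulation can be modelled with an ordinary variable. The following is a sketch only: model_cr0, the model_* functions and model_traps are hypothetical names, and the assumption that the 'device not available' handler clears TS before reloading the registers is part of the model rather than something shown in the figure.

```c
#include <assert.h>

#define CR0_TS 0x8u   /* bit 3 of CR0, the task switched flag */

unsigned int model_cr0;   /* stand-in for the real CR0 register */
int model_traps;          /* simulated 'device not available' exceptions */

/* Models of the read and write macros. */
unsigned int model_read_cr0(void) { return model_cr0; }
void model_write_cr0(unsigned int x) { model_cr0 = x; }

/* Model of clts(): clear TS, leave every other bit alone. */
void model_clts(void) { model_cr0 &= ~CR0_TS; }

/* Model of the line 124 macro: read CR0, set bit 3, write it back. */
void model_stts(void) { model_write_cr0(CR0_TS | model_read_cr0()); }

/* Simulated FPU instruction. If TS is set it "traps" first; the
 * modelled handler clears TS (and would reload the saved registers). */
void model_fpu_instruction(void)
{
    if (model_read_cr0() & CR0_TS) {
        model_traps++;
        model_clts();
    }
    assert((model_read_cr0() & CR0_TS) == 0);
}
```

In this model, setting TS twice has the same effect as setting it once, and only the first FPU instruction after TS is set incurs a trap; later ones run without one until TS is set again.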
