Creating the Layout

Last Updated on Wed, 06 Jan 2021 | Linux Kernel Architecture

The address space of a task is laid out when an ELF binary is loaded with load_elf_binary — recall that the function is used by the exec system call. Loading an ELF file is cluttered with numerous technical details that are not interesting for our purposes, so the code flow diagram in Figure 4-3 concentrates on the steps required to set up the virtual memory region.

load_elf_binary
    Set PF_RANDOMIZE if required
    arch_pick_mmap_layout
    setup_arg_pages

Figure 4-3: Code flow diagram for load_elf_binary.

Address space randomization is enabled if the global variable randomize_va_space is set to 1. This is usually the case, but the feature is disabled for Transmeta CPUs because it slows such machines down. In addition, the user can disable it via /proc/sys/kernel/randomize_va_space.
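The current setting can be inspected from userspace by reading the proc file mentioned above. The following is a minimal sketch, not kernel code; the helper name read_sysctl_int is made up for this illustration, and it simply returns whatever integer the file contains.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: return the integer stored in a sysctl-style file,
 * or -1 if the file cannot be opened or does not start with a number.
 */
int read_sysctl_int(const char *path)
{
    FILE *f;
    int value = -1;

    assert(path != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    /* The file holds a single decimal number followed by a newline. */
    if (fscanf(f, "%d", &value) != 1)
        value = -1;

    fclose(f);
    return value;
}
```

Calling read_sysctl_int("/proc/sys/kernel/randomize_va_space") on a Linux system yields the value of the global variable; on systems without this file the helper returns -1.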

The address space layout is selected in arch_pick_mmap_layout. If the architecture does not provide a specific function, the kernel's default routine sets up the address space as shown in Figure 4-1. It is, however, more interesting to observe how IA-32 chooses between the classical and the new alternative:

arch/x86/mm/mmap_32.c
void arch_pick_mmap_layout(struct mm_struct *mm)
{
    /*
     * Fall back to the standard layout if the personality
     * bit is set, or if the expected stack growth is unlimited:
     */
    if (sysctl_legacy_va_layout ||
            (current->personality & ADDR_COMPAT_LAYOUT) ||
            current->signal->rlim[RLIMIT_STACK].rlim_cur == RLIM_INFINITY) {
        mm->mmap_base = TASK_UNMAPPED_BASE;
        mm->get_unmapped_area = arch_get_unmapped_area;
        mm->unmap_area = arch_unmap_area;
    } else {
        mm->mmap_base = mmap_base(mm);
        mm->get_unmapped_area = arch_get_unmapped_area_topdown;
        mm->unmap_area = arch_unmap_area_topdown;
    }
}

The old layout is chosen in three cases: if the user has explicitly requested it via /proc/sys/vm/legacy_va_layout, if the executed binary was compiled for a different Unix flavor that requires the old layout, or, most importantly, if the stack may grow infinitely. An unlimited stack makes it difficult to find a bound for the stack below which the mmap region can start.
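The three conditions can be isolated in a small userspace predicate to make the decision explicit. This is only a sketch: the SKETCH_ constants are stand-ins for the kernel's ADDR_COMPAT_LAYOUT and RLIM_INFINITY, their values are assumptions for illustration, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Stand-ins for kernel definitions; the values are illustrative only. */
#define SKETCH_ADDR_COMPAT_LAYOUT 0x0200000UL
#define SKETCH_RLIM_INFINITY      (~0UL)

/*
 * Return true if the classical (bottom-up) layout would be chosen:
 * the sysctl is set, the personality bit is set, or the stack limit
 * is unlimited.
 */
bool sketch_use_legacy_layout(int legacy_sysctl,
                              unsigned long personality,
                              unsigned long stack_limit)
{
    return legacy_sysctl != 0 ||
           (personality & SKETCH_ADDR_COMPAT_LAYOUT) != 0 ||
           stack_limit == SKETCH_RLIM_INFINITY;
}
```

With a finite stack limit of 8 MiB, a cleared sysctl, and no personality bit, the predicate is false and the new top-down layout is selected.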

In the classical case, the start of the mmap area is at TASK_UNMAPPED_BASE, which resolves to 0x40000000 (1 GiB), and the standard function arch_get_unmapped_area is used to grow new mappings from bottom to top. Despite its name, this function is not necessarily architecture-specific: the kernel also provides a standard implementation.

When the new layout is used, memory mappings grow from top to bottom. The standard function arch_get_unmapped_area_topdown (which I will not consider in detail) is responsible for this. More interesting is how the base address for memory mappings is chosen:

arch/x86/mm/mmap_32.c
#define MIN_GAP (128*1024*1024)
#define MAX_GAP (TASK_SIZE/6*5)

static inline unsigned long mmap_base(struct mm_struct *mm)
{
    unsigned long gap = current->signal->rlim[RLIMIT_STACK].rlim_cur;
    unsigned long random_factor = 0;

    if (current->flags & PF_RANDOMIZE)
        random_factor = get_random_int() % (1024*1024);

    if (gap < MIN_GAP)
        gap = MIN_GAP;
    else if (gap > MAX_GAP)
        gap = MAX_GAP;

    return PAGE_ALIGN(TASK_SIZE - gap - random_factor);
}

The lowest possible stack location that can be computed from the maximal stack size can be used as the start of the mmap area. However, the kernel reserves at least 128 MiB for the stack. If a gigantic stack limit is specified, the gap is capped at five sixths of TASK_SIZE so that at least a small portion of the address space is not taken up by the stack.

If address space randomization is requested, the position is modified by a random offset of maximally 1 MiB. Additionally, the kernel ensures that the region is aligned along the page frame size because this is required by the architecture.
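To see the arithmetic with concrete numbers, the computation can be replayed in userspace. The sketch below assumes the common IA-32 configuration with 4 KiB pages and a 3 GiB user address space (TASK_SIZE = 0xC0000000); all SK_ names and the function name are made up for this illustration, and the random offset is passed in as a parameter instead of being drawn from get_random_int.

```c
#include <assert.h>

/* Assumed values for a typical IA-32 setup; not taken from kernel headers. */
#define SK_PAGE_SIZE   4096UL
#define SK_PAGE_ALIGN(a) (((a) + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1))
#define SK_TASK_SIZE   0xC0000000UL
#define SK_MIN_GAP     (128UL * 1024 * 1024)
#define SK_MAX_GAP     (SK_TASK_SIZE / 6 * 5)

/*
 * Compute the top of the mmap area from the stack limit and a random
 * offset of less than 1 MiB, following the same steps as mmap_base.
 */
unsigned long sketch_mmap_base(unsigned long stack_limit,
                               unsigned long random_factor)
{
    unsigned long gap = stack_limit;

    assert(random_factor < 1024UL * 1024);

    /* Clamp the gap reserved for the stack to [MIN_GAP, MAX_GAP]. */
    if (gap < SK_MIN_GAP)
        gap = SK_MIN_GAP;
    else if (gap > SK_MAX_GAP)
        gap = SK_MAX_GAP;

    return SK_PAGE_ALIGN(SK_TASK_SIZE - gap - random_factor);
}
```

Under these assumptions, a stack limit of 8 MiB without randomization gives a gap of 128 MiB and thus a base of 0xB8000000, whereas a limit above MAX_GAP (0xA0000000) pushes the base down no further than 0x20000000.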

At first glance, one could assume that life is easier for 64-bit architectures because they should not have to choose between different address layouts: the virtual address space is so large that collisions of heap and mmap region are nearly impossible.

However, the definition of arch_pick_mmap_layout for the AMD64 architecture shows that another complication arises:

arch/x86_64/mmap.c
void arch_pick_mmap_layout(struct mm_struct *mm)
{
#ifdef CONFIG_IA32_EMULATION
    if (current_thread_info()->flags & _TIF_IA32)
        return ia32_pick_mmap_layout(mm);
#endif
    mm->mmap_base = TASK_UNMAPPED_BASE;
    if (current->flags & PF_RANDOMIZE) {
        /* Add 28bit randomness which is about 40bits of address space
           because mmap base has to be page aligned,
           or ~1/128 of the total user VM
           (total user address space is 47bits) */
        unsigned rnd = get_random_int() & 0xfffffff;
        mm->mmap_base += ((unsigned long)rnd) << PAGE_SHIFT;
    }
    mm->get_unmapped_area = arch_get_unmapped_area;
    mm->unmap_area = arch_unmap_area;
}

If binary emulation for 32-bit applications is enabled, any process that runs in compatibility mode should see the same address space as it would encounter on a native machine. Therefore, ia32_pick_mmap_layout is used to lay out the address space for 32-bit applications. The function is an identical copy of arch_pick_mmap_layout for IA-32 systems, as discussed above.

The classic layout for virtual address space is always used on AMD64 systems so that there is no need to distinguish between the various options. Address space randomization is performed by shifting the otherwise fixed mmap_base if the PF_RANDOMIZE flag is set.
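The figures in the kernel comment can be checked with a few lines of userspace C. The sketch below assumes 4 KiB pages (a page shift of 12); the SKETCH_ names and the function are hypothetical, and the base address is left as a parameter because the value of TASK_UNMAPPED_BASE is not shown here.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12          /* assumes 4 KiB pages */
#define SKETCH_RND_MASK   0xfffffffU  /* 28 random bits, as in the excerpt */

/*
 * Shift a base address upward by a random number of pages, using only
 * the low 28 bits of the random value.
 */
uint64_t sketch_randomized_base(uint64_t base, uint32_t rnd)
{
    uint64_t pages = rnd & SKETCH_RND_MASK;

    return base + (pages << SKETCH_PAGE_SHIFT);
}
```

The largest possible shift is 0xfffffff pages, or 0xFFFFFFF000 bytes, which stays just below 2^40. Relative to a 47-bit user address space this is 2^40 / 2^47 = 1/128, matching the comment.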

Let us go back to load_elf_binary. Finally, the function needs to create the stack at the appropriate location:

static int load_elf_binary(struct linux_binprm *bprm, struct pt_regs *regs)
{
...
    retval = setup_arg_pages(bprm, randomize_stack_top(STACK_TOP),
                             executable_stack);
...
}

The standard function setup_arg_pages is used for this purpose. I will not discuss it in detail because it is purely technical. The function requires the top of the stack as a parameter. This is given by the architecture-specific constant STACK_TOP, but randomize_stack_top ensures that the address is changed by a random amount if address space randomization is required.
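The body of randomize_stack_top is not shown here, so the following is only a sketch of how such a helper could work for a stack that grows downward, not the kernel's implementation. The mask of 0x7ff pages, the 4 KiB page size, and all SK_ names are assumptions made for this illustration; the random value and the randomization flag are passed in as parameters.

```c
/* Assumed values for illustration; not taken from kernel headers. */
#define SK_PAGE_SHIFT     12
#define SK_PAGE_SIZE      (1UL << SK_PAGE_SHIFT)
#define SK_PAGE_ALIGN(a)  (((a) + SK_PAGE_SIZE - 1) & ~(SK_PAGE_SIZE - 1))
#define SK_STACK_RND_MASK 0x7ffU   /* hypothetical: up to 2047 pages */

/*
 * Lower the page-aligned stack top by a random number of pages if
 * randomization is requested; otherwise return it unchanged.
 */
unsigned long sketch_randomize_stack_top(unsigned long stack_top,
                                         unsigned int rnd,
                                         int randomize)
{
    unsigned long offset = 0;

    if (randomize)
        offset = (unsigned long)(rnd & SK_STACK_RND_MASK) << SK_PAGE_SHIFT;

    return SK_PAGE_ALIGN(stack_top) - offset;
}
```

Because the offset is a whole number of pages, the result stays page-aligned, and with these assumed values the stack top moves by less than 8 MiB.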
