In this post we will cover a few essential topics of Heap Memory. We will learn about the heap creation, the structure of heap memory and different exploitation techniques. This post is mainly based on glibc's heap implementation. We recommend reading “Doug Lee malloc implementation’” to have a better understanding of 'dlmalloc()'. To avoid confusion, we have articulated the post in layman's terms instead of professional jargon.
What is a Heap?
Heap is a region of memory allotted to every program at the runtime to allocate variables whose size are not deterministic at compile time. This memory is global which means it can be accessed and modified from anywhere within a program using 'pointers'.
Wait…. why do we even need a Heap when we have ‘Stack’ segment?
Whenever we run a program, each thread will have a stack region where all the local variables are stored. But what if the variables are too large to be stored on stack? This may leave your program unstable with a possibility of crashing so we need another section of writable memory to store large variables. To manipulate Heap, there are two main functions: 'malloc()' and 'free()'. Pointers are used to allocate the memory using these functions.
Observing the above source code, we see that the function allocates 25 bytes of memory dynamically using the function 'malloc()' and then frees the memory at the end of the routine using the function 'free()'. As was previously mentioned, we need a pointer to allocate memory on heap (which is '*buffer' in this case).
Malloc: void* malloc(size_t size) Allocates a reasonable amount of memory, creates a heap segment and returns the caller a pointer to a memory region.As the 'size_t' is an unsigned type, the value of 'size' cannot be a negative value. However, when we pass a negative value to the function, it requests a huge amount of memory which is not possible in most cases. This leads to an undefined behaviorFree: void free(void *ptr) Releases the chunk of memory pointed to by ptr, that had been previously allocated using malloc or a related routine such as realloc(). The most common issues are Double Free, Use-After-Free due to the incorrect usage of this function in the program.
How the Heap works
Before diving into heap internals, we should know the definitions of some significant terms that will be used throughout the post:
- start_brk: pointer which denotes the beginning of the heap segment.
- brk: pointer that denotes the end of the heap segment.
- end_data: pointer which sets the end of the data segment.
Reference:https://elixir.bootlin.com/linux/v3.8/source/include/linux/mm_types.h#L364
The system calls brk(), sbrk() and mmap() are used by malloc to request memory from the OS.
<pbrk(): int brk(void *addr) Initializes 'end_data' to the value specified by addr. 'start_brk' to the next available segment after the data segment. And 'brk' = start_brk, initially start (start_brk) and end of heap segment (brk) would point to same location.
In order to expand the dimensions of heap segment, an additional glibc function 'sbrk()' is used.
sbrk(): void *sbrk(intptr_t increment)This is a wrapper function around brk() and increments the program break by increment bytes. Also, this function can be used to find the current location of the program break by passing argument '0' to the function.
mmap(): void * mmap (void *address, size_t length, int protect, int flags, int filedes, off_t offset) mmap is used by malloc() to create a private anonymous mapping segment. The primary purpose of private anonymous mapping is to allocate new memory (zero filled). The mmap uses the flags 'MAP_ANON' & 'MAP_PRIVATE'.
Now, let's try to understand how glibc manages memory internally. Knowing a few key concepts/data structures will be helpful to perform Heap Exploitation which will be covered in the next article.
Arena: Arena is a contiguous piece of memory. There are two types of arenas and the major difference between the main arena and a dynamically created arena is how they acquire memory from kernel. Only one main arena exists for a process and it calls sbrk() which usually starts immediately after the executable’s data section. On the other hand, a dynamic arena (Non-main arena) uses mmap() to manage the memory fetched from kernel.
Chunk: The Chunk is the bottom entity and it is the smallest unit of memory management in dynamic allocation. A chunk divides a large region of memory ('heap') into chunks of various sizes. When a 'free()' function is called, the chunk is marked as free and its reference will be placed in one of the data structures 'fastbinY' and 'bin'. As per the glibc wiki, "freeing" memory does not actually return it to the operating system for other applications to use instead it recycles these chunks to serve future malloc calls.
fastbinsY: This is an array of LIFO single linked lists holding recently freed chunks denominated fast chunks (chunks of size 16 - 80 bytes known as fast chunk)
bins: This array holds unsorted, small and large bins. The bins holding fast chunks are known as fast bins. Among all the bins, fast bins are faster in memory allocation and deal location Bins are the data structures which are used to hold free chunks of specific sizes depending on index. Based on chunk sizes, different bins are available
- Fast bin
- Unsorted bin
- Small bin
- Large bin
Fast bin: There are 10 fast bins. Each of these bins maintains a single linked list. Insertion and deletion happen from the top of this list.
Unsorted bin: The primary purpose of this bin is improve the efficiency of allocating and deallocating requests. There is only 1 unsorted bin. On freeing, the chunks end up in this bin.
Small bin: Small bins are faster than large bins but slower than fast bins. Each bin maintains a doubly-linked list and there are 62 small bins.
Large bin: A particular large bin has chunks of different sizes, sorted in decreasing order There are 63 large bins. like small bin, it maintains a doubly-linked list.
The internals of memory allocation and deallocation varies from one OS to the other. There are few good books to refer for the better understanding of Heap and its implementations. In the next few posts we will learn about Heap Overflows and few exploitation techniques.
Reference: https://sourceware.org/glibc/wiki/MallocInternals
Credit: ACE Team - Loginsoft