How the core file size limit has non-deterministic effects on processes

I'm running a custom 2.6.27 kernel and I just noticed the core files produced during a segfault are larger than the hard core file size limit set for processes.

And what makes it weirder is that the core file is only sometimes truncated (but not to the limit set by ulimit).

For example, this is the program I will try to crash below:

#include <stdlib.h>
#include <sys/resource.h>

int main(int argc, char **argv)
{
    // The limit (in bytes) must be given on the command line
    if (argc < 2)
        return 3;

    // Get the hard and soft limit from command line
    struct rlimit new = {atoi(argv[1]), atoi(argv[1])};

    // Create some memory so as to beef up the core file size
    void *p = malloc(10 * 1024 * 1024);

    if (!p)
        return 1;

    if (setrlimit(RLIMIT_CORE, &new)) // Set the hard and soft limit
        return 2;                     // for core files produced by this
                                      // process

    while (1);                        // Spin until killed by a signal

    return 0;
}
And here's the execution:

Linux# ./a.out 1446462 &    ## Set hard and soft limit to ~1.4 MB
[1] 14802
Linux# ./a.out 1446462 &
[2] 14803
Linux# ./a.out 1446462 &
[3] 14804
Linux# ./a.out 1446462 &
[4] 14807

Linux# cat /proc/14802/limits | grep core
Max core file size        1446462              1446462              bytes

Linux# killall -QUIT a.out

Linux# ls -l
total 15708
-rwxr-xr-x 1 root root     4624 Aug  1 18:28 a.out
-rw------- 1 root root 12013568 Aug  1 18:39 core.14802         <=== truncated core
-rw------- 1 root root 12017664 Aug  1 18:39 core.14803
-rw------- 1 root root 12013568 Aug  1 18:39 core.14804         <=== truncated core
-rw------- 1 root root 12017664 Aug  1 18:39 core.14807
[1]   Quit                    (core dumped) ./a.out 1446462
[2]   Quit                    (core dumped) ./a.out 1446462
[3]   Quit                    (core dumped) ./a.out 1446462
[4]   Quit                    (core dumped) ./a.out 1446462

So multiple things happened here. I set the hard limit for each process to be about 1.4 MB.

  1. The core files produced well exceed this set limit. Why?
  2. And 2 of the 4 core files produced are truncated, but by exactly 4096 bytes. What's going on here?

I know the core file contains, among other things, the full stack and heap memory allocated. Shouldn't that be pretty much constant for such a simple program (give or take a few bytes at the most), hence producing a consistent core between multiple instances?


1 The requested output of du

Linux# du core.*
1428    core.14802
1428    core.14803
1428    core.14804
1428    core.14807

Linux# du -b core.*
12013568    core.14802
12017664    core.14803
12013568    core.14804
12017664    core.14807

2 Adding memset() after malloc() definitely reined things in, in that the core files are now all truncated to 1449984 bytes (still 3522 bytes over the limit).

So why were the cores so big before, what did they contain? Whatever it was, it wasn't subjected to the process' limits.

3 The new program shows some interesting behaviour as well:

Linux# ./a.out 12017664 &
[1] 26586
Linux# ./a.out 12017664 &
[2] 26589
Linux# ./a.out 12017664 &
[3] 26590
Linux# ./a.out 12017663 &        ## 1 byte smaller
[4] 26653
Linux# ./a.out 12017663 &        ## 1 byte smaller
[5] 26666
Linux# ./a.out 12017663 &        ## 1 byte smaller
[6] 26667

Linux# killall -QUIT a.out

Linux# ls -l
total ..
-rwxr-xr-x 1 root root     4742 Aug  1 19:47 a.out
-rw------- 1 root root 12017664 Aug  1 19:47 core.26586
-rw------- 1 root root 12017664 Aug  1 19:47 core.26589
-rw------- 1 root root 12017664 Aug  1 19:47 core.26590
-rw------- 1 root root  1994752 Aug  1 19:47 core.26653           <== ???
-rw------- 1 root root  9875456 Aug  1 19:47 core.26666           <== ???
-rw------- 1 root root  9707520 Aug  1 19:47 core.26667           <== ???
[1]   Quit                    (core dumped) ./a.out 12017664
[2]   Quit                    (core dumped) ./a.out 12017664
[3]   Quit                    (core dumped) ./a.out 12017664
[4]   Quit                    (core dumped) ./a.out 12017663
[5]   Quit                    (core dumped) ./a.out 12017663
[6]   Quit                    (core dumped) ./a.out 12017663

Answer #1

The implementation of core dumping can be found in fs/binfmt_elf.c. I'll follow the code in 3.12 and above (it changed with commit 9b56d5438) but the logic is very similar.

The code initially decides how much to dump of a VMA (virtual memory area) in vma_dump_size. For an anonymous VMA such as the brk heap, it returns the full size of the VMA. During this step, the core limit is not involved.

The first phase of writing the core dump then writes a PT_LOAD header for each VMA. This is basically a pointer that says where to find the data in the remainder of the ELF file. The actual data is written by a for loop, and is actually a second phase.

During the second phase, elf_core_dump repeatedly calls get_dump_page to get a struct page pointer for each page of the program address space that has to be dumped. get_dump_page is a common utility function found in mm/gup.c. The comment to get_dump_page is helpful:

 * Returns NULL on any kind of failure - a hole must then be inserted into
 * the corefile, to preserve alignment with its headers; and also returns
 * NULL wherever the ZERO_PAGE, or an anonymous pte_none, has been found -
 * allowing a hole to be left in the corefile to save diskspace.

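and in fact elf_core_dump calls a function in fs/coredump.c (dump_seek in your kernel, dump_skip in 3.12+) if get_dump_page returns NULL. This function calls lseek to leave a hole in the dump (actually, since this is the kernel, it calls file->f_op->llseek directly on a struct file pointer). The main difference is that dump_seek was indeed not obeying the ulimit, while the newer dump_skip does. This is what happened to the malloc() region in your first program: its pages were never touched, so they were dumped as holes that did not count against the limit. That also explains why du reports only 1428 blocks for a file that ls shows as roughly 12 MB.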

As to why the second program has the weird behavior, it's probably because of ASLR (address space layout randomization). Which VMA is truncated depends on the relative order of the VMAs, which is random. You could try disabling it with

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

and see if your results are more homogeneous. To reenable ASLR, use

echo 2 | sudo tee /proc/sys/kernel/randomize_va_space