Why does the data segment start at a non-page boundary?

I am trying to understand how a compiled program in Linux relates to the way it is loaded into main memory.

I understand that when a program is loaded in memory, all its virtual pages go in some 'page frames' of the main memory.

Below is the snippet of readelf output for my program.

readelf --segments a.out

Elf file type is DYN (Shared object file)
Entry point 0x1060
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
......

  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000600 0x0000000000000600  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x0000000000000fc5 0x0000000000000fc5  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x0000000000000190 0x0000000000000190  R      0x1000
  LOAD           0x0000000000002db8 0x0000000000003db8 0x0000000000003db8
                 0x00000000000002cc 0x00000000000002d0  RW     0x1000

.......

Here,

(1) First segment (R) starts at the virtual address '0x0000000000000000' and has size '0x600'. Consumes 1 Page.

(2) Second segment (R,E, text segment) starts at the virtual address '0x0000000000001000' and has size '0xfc5'. Consumes 1 Page.

(3) Third segment (R) starts at the virtual address '0x0000000000002000' and has size '0x190'. Consumes 1 Page.

(4) Fourth segment (RW, data segment) starts at address '0x0000000000003db8'. Why?

I expected the fourth segment to start at '0x0000000000003000', since the third segment is only '0x190' bytes long and the Align for this segment is 0x1000 (4096), which is a page boundary.

LOAD           0x0000000000002db8 0x0000000000003db8 0x0000000000003db8
                 0x00000000000002cc 0x00000000000002d0  RW     0x1000

After loading this program, the memory mappings of the process are:

cat /proc/4125/maps
56028da61000-56028da62000 r--p 00000000 08:05 1442315 a.out
56028da62000-56028da63000 r-xp 00001000 08:05 1442315 a.out
56028da63000-56028da64000 r--p 00002000 08:05 1442315 a.out
56028da64000-56028da65000 r--p 00002000 08:05 1442315 a.out
56028da65000-56028da66000 rw-p 00003000 08:05 1442315 a.out


Solution 1:[1]

The linker has put segment 4 starting in a page shared with segment 3, to save 0x248 bytes of file space. That page is to be mapped into virtual memory in two different places, once at relative address 0x2000 (read-only) and again at relative address 0x3000 (read-write). The addresses in the program headers are relative to a load base. Taking that base to be 0x1000 for illustration (the zeroth page of virtual memory stays unmapped to catch null pointer dereferences; the real base is chosen at load time, and is 0x56028da61000 in the /proc output above), the program's virtual memory will contain:

0x3000 - 0x3190: read-only version of segment 3
0x3db8 - 0x4000: read-only version of part of segment 4 (will not be used)
0x4000 - 0x4190: read-write version of segment 3 (will not be used)
0x4db8 - 0x5084: read-write version of segment 4
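
The page arithmetic behind this layout can be sketched in a few lines of Python. This is only an illustration, not the kernel's loader: the names PAGE, LOAD_SEGMENTS and mapping_for are made up for this example, the numbers are copied from the readelf output in the question, and the zero-filling of the bytes between FileSiz and MemSiz is ignored. Addresses come out relative to the load base; pass base=0x1000 to reproduce the numbering used in the list above.

```python
PAGE = 0x1000

# (file offset, virtual address, MemSiz, flags), copied from the question
LOAD_SEGMENTS = [
    (0x0000, 0x0000, 0x600, "R"),
    (0x1000, 0x1000, 0xFC5, "R E"),
    (0x2000, 0x2000, 0x190, "R"),
    (0x2DB8, 0x3DB8, 0x2D0, "RW"),
]


def page_down(x):
    """Round x down to a page boundary."""
    return x & ~(PAGE - 1)


def page_up(x):
    """Round x up to a page boundary."""
    return (x + PAGE - 1) & ~(PAGE - 1)


def mapping_for(offset, vaddr, memsz, base=0):
    """Return (start, end, file_offset) of the whole-page mapping
    that covers one LOAD segment (simplified sketch)."""
    start = base + page_down(vaddr)
    end = base + page_up(vaddr + memsz)
    return start, end, page_down(offset)


for offset, vaddr, memsz, flags in LOAD_SEGMENTS:
    start, end, file_off = mapping_for(offset, vaddr, memsz)
    print(f"{start:#07x}-{end:#07x} {flags:4} file offset {file_off:#06x}")
```

Segments 3 and 4 both come out with file offset 0x2000, which is the shared page described above: the same 4 KiB of the file is mapped once for each segment.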

Now, even though I said the page at 0x4000 would be read-write, the output from /proc/NNN/maps shows it as read-only. It looks like something in the C startup code actually calls mprotect at runtime to change the permissions on that page; you can see it happening with strace.
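
To compare the /proc output with the readelf addresses, it helps to subtract the load base from each mapping. The sketch below does that for the lines pasted in the question. The names MAPS, parse_maps and relative are made up for this example, and the base calculation assumes that the first mapping is file offset 0 loaded at relative address 0, which holds for this binary.

```python
# Lines copied from the /proc/4125/maps output in the question
MAPS = """\
56028da61000-56028da62000 r--p 00000000 08:05 1442315 a.out
56028da62000-56028da63000 r-xp 00001000 08:05 1442315 a.out
56028da63000-56028da64000 r--p 00002000 08:05 1442315 a.out
56028da64000-56028da65000 r--p 00002000 08:05 1442315 a.out
56028da65000-56028da66000 rw-p 00003000 08:05 1442315 a.out
"""


def parse_maps(text):
    """Parse maps lines into (start, end, perms, file_offset) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        start_s, end_s = fields[0].split("-")
        rows.append((int(start_s, 16), int(end_s, 16), fields[1], int(fields[2], 16)))
    return rows


rows = parse_maps(MAPS)

# Assumption: the first mapping holds file offset 0 at relative address 0
base = rows[0][0] - rows[0][3]

relative = [(start - base, end - base, perms, off) for start, end, perms, off in rows]

for start, end, perms, off in relative:
    print(f"{start:#07x}-{end:#07x} {perms} file offset {off:#06x}")
```

Relative to the real base, the read-only page with file offset 0x2000 that appears twice is the shared page, and its second copy is the one whose permissions were changed after loading. Add 0x1000 to these relative addresses to match the numbering used in this answer.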

If you dump the section headers (not segment), you should see that the address range 0x4db8-0x5000 corresponds to the sections .init_array, .fini_array, .dynamic and .got, with the normal writable .data section starting at 0x5000. I haven't looked into it, but I presume that for some reason .init_array and such need to be writable for initialization, but can be read-only during the rest of the program's execution.


The alignment field doesn't refer to the alignment of the first byte of segment 4. Rather, the segment's starting address is rounded down to a multiple of the alignment (0x3db8 down to 0x3000), and the file contents starting at the correspondingly rounded-down offset (0x2db8 down to 0x2000) are mapped at that page-aligned address. As explained at What is p_align in elf header?, the rule is that Offset and VirtAddr must be congruent mod Align. Which they are: when either one is divided by 0x1000, the remainder is 0xdb8.
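
The congruence rule is easy to check by hand or with a short script. The following is a minimal sketch using the values of the fourth LOAD segment from the question; the function name satisfies_p_align is made up for this example.

```python
def satisfies_p_align(offset, vaddr, align):
    """True if the file offset and the virtual address are congruent mod align."""
    return offset % align == vaddr % align


# Fourth LOAD segment from the readelf output in the question
offset, vaddr, align = 0x2DB8, 0x3DB8, 0x1000

print(hex(offset % align), hex(vaddr % align))
print(satisfies_p_align(offset, vaddr, align))
```

Because the two remainders are equal, rounding both values down to a page boundary keeps every byte of the segment at the same distance from the start of its page in the file and in memory, which is what lets the loader map the file page directly.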

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
