C fibers crashing on printf

I am in the process of creating a fiber threading system in C, following https://graphitemaster.github.io/fibers/ . I have functions to set and restore a context, and what I am trying to accomplish is launching a function as a fiber with its own stack. Linux, x86_64 SysV ABI.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

extern void restore_context(struct fiber_context*);
extern void create_context(struct fiber_context*);

void foo_fiber(void)
{
    printf("Called as a fiber");
    exit(0);
}

int main(void)
{
    const uint32_t stack_size = 4096 * 16;
    const uint32_t red_zone_abi = 128;

    char* stack = aligned_alloc(16, stack_size);
    char* sp = stack + stack_size - red_zone_abi;

    struct fiber_context c = {0};
    c.rip = (void*)foo_fiber;
    c.rsp = (void*)sp;

    restore_context(&c);
}

where restore_context is implemented as follows:

.type restore_context, @function
.global restore_context
restore_context:
  # Load the saved RIP into a scratch register.
  movq 8*0(%rdi), %r8

  # Load new stack pointer.
  movq 8*1(%rdi), %rsp

  # Load preserved registers.
  movq 8*2(%rdi), %rbx
  movq 8*3(%rdi), %rbp
  movq 8*4(%rdi), %r12
  movq 8*5(%rdi), %r13
  movq 8*6(%rdi), %r14
  movq 8*7(%rdi), %r15

  # Push RIP to stack for RET.
  pushq %r8

  xorl %eax, %eax
  ret

So basically I am creating a new stack on the heap, and since the stack grows downwards, I take the end address minus 128 bytes of red zone (which, as I understand it, the ABI requires). restore_context simply switches %rsp to my new stack, pushes the address of foo_fiber onto it, and then uses ret to jump into foo_fiber. (It also loads some registers from the fiber_context structure, but that should not matter here.)

From what I see in GDB, the program correctly jumps to foo_fiber and into printf, and then crashes in __vfprintf_internal on movaps %xmm1, 0x10(%rsp).

   0x7ffff7e2f389 <__vfprintf_internal+153>    movdqu (%rax),%xmm1
   0x7ffff7e2f38d <__vfprintf_internal+157>    movups %xmm1,0x128(%rsp)
   0x7ffff7e2f395 <__vfprintf_internal+165>    mov    0x10(%rax),%rax
  >0x7ffff7e2f399 <__vfprintf_internal+169>    movaps %xmm1,0x10(%rsp)

I find that extremely odd, since it successfully executed movups %xmm1, 0x128(%rsp), which is at a much higher offset from the stack pointer. What is going on there?

If I change foo_fiber to do something else, for example allocate a char[100] and fill it with random values, it works.

I am at a loss about what is going on. At first I thought I might have alignment issues, since it is the vector (XMM) instructions that crash, so I changed malloc to aligned_alloc. The crash I am getting is a SIGSEGV, but 0x10



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
