ARM Cortex M4 silently resets on recursion though fault handlers ARE OK (sometimes)

I recently went through an exercise to implement and test fault handlers on a bare metal ARM Cortex M4 platform. Having implemented handlers, I also wrote a pretty brutal app to trigger various faults and see how the handler responded.

By the way, the chip is a Nordic nRF52832.

One test I wrote was the following function (the call to it is simplified, you actually need to send a command over the main UART to make it kill itself):

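#include <inttypes.h>
#include <stdio.h>

static volatile uint32_t dummy = 1;

uint32_t evil_recursive(uint32_t i)
{
  /* __get_MSP() (CMSIS) returns the main stack pointer as a uint32_t */
  printf("N:%" PRIu32 " MSP: 0x%08" PRIx32, i, __get_MSP());
  return evil_recursive(i + dummy);
}

int main(void)
{
  uint32_t i = evil_recursive(1);
}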

This was surprisingly difficult to get to misbehave; we needed to use the compiler flag

CFLAGS += -fno-optimize-sibling-calls

to get it to create a new stack frame for each call. The compiler seems to unroll the function so that one frame covers 9 calls (you can see MSP decrease by 32 every 9 print statements). But misbehave it did, after two minutes, with a bus fault. All OK so far.

As printf() here goes to a slow, bit-banged, blocking UART, I removed it to speed things up (it was only there for debugging). To my surprise, calling the evil function now causes an immediate reset of the processor, and no fault handler is triggered.

Has anyone any idea why this might be? I would expect this to eat memory and cause some kind of handler to kick in.

MORE INFO : I wondered whether the simple presence of the print statement was changing the way things compile, or the behaviour. So I tried this, which runs much faster but contains basically the same call:

uint32_t evil_recursive(uint32_t i)
{
  static char c = 'A';
  debug_uart_putc(c);
  c++;
  if (c > 'Z') c = 'A';
  return evil_recursive(i + dummy);
}

This also resets without triggering any handler.

LATER : this is how the two functions were compiled.

First the one that DOES trigger the handler:

00026678 <evil_recursive>:
   26678:   b5f0        push    {r4, r5, r6, r7, lr}
   2667a:   4e4c        ldr r6, [pc, #304]  ; (267ac <evil_recursive+0x134>)
   2667c:   b083        sub sp, #12
   2667e:   4604        mov r4, r0
   26680:   212f        movs    r1, #47 ; 0x2f
   26682:   4630        mov r0, r6
   26684:   f004 fb8f   bl  2ada6 <strrchr>
   26688:   f3ef 8308   mrs r3, MSP
   2668c:   4f48        ldr r7, [pc, #288]  ; (267b0 <evil_recursive+0x138>)
   2668e:   4d49        ldr r5, [pc, #292]  ; (267b4 <evil_recursive+0x13c>)
   26690:   9301        str r3, [sp, #4]
   26692:   1c41        adds    r1, r0, #1
   26694:   463b        mov r3, r7
   26696:   9400        str r4, [sp, #0]
   26698:   223c        movs    r2, #60 ; 0x3c
   2669a:   2000        movs    r0, #0
   2669c:   f004 fa14   bl  2aac8 <project_log>
   266a0:   682b        ldr r3, [r5, #0]
   266a2:   212f        movs    r1, #47 ; 0x2f
   266a4:   4630        mov r0, r6
   266a6:   441c        add r4, r3
   266a8:   f004 fb7d   bl  2ada6 <strrchr>
   266ac:   f3ef 8308   mrs r3, MSP
   266b0:   223c        movs    r2, #60 ; 0x3c
   266b2:   9301        str r3, [sp, #4]
   266b4:   1c41        adds    r1, r0, #1
   266b6:   9400        str r4, [sp, #0]
   266b8:   463b        mov r3, r7
   266ba:   2000        movs    r0, #0
   266bc:   f004 fa04   bl  2aac8 <project_log>
   266c0:   682b        ldr r3, [r5, #0]
   266c2:   212f        movs    r1, #47 ; 0x2f
   266c4:   4630        mov r0, r6
   266c6:   441c        add r4, r3
   266c8:   f004 fb6d   bl  2ada6 <strrchr>
   266cc:   f3ef 8308   mrs r3, MSP
   266d0:   223c        movs    r2, #60 ; 0x3c
   266d2:   9301        str r3, [sp, #4]
   266d4:   1c41        adds    r1, r0, #1
   266d6:   9400        str r4, [sp, #0]
   266d8:   463b        mov r3, r7
   266da:   2000        movs    r0, #0
   266dc:   f004 f9f4   bl  2aac8 <project_log>
   266e0:   682b        ldr r3, [r5, #0]
   266e2:   212f        movs    r1, #47 ; 0x2f
   266e4:   4630        mov r0, r6
   266e6:   441c        add r4, r3
   266e8:   f004 fb5d   bl  2ada6 <strrchr>
   266ec:   f3ef 8308   mrs r3, MSP
   266f0:   223c        movs    r2, #60 ; 0x3c
   266f2:   9301        str r3, [sp, #4]
   266f4:   1c41        adds    r1, r0, #1
   266f6:   9400        str r4, [sp, #0]
   266f8:   463b        mov r3, r7
   266fa:   2000        movs    r0, #0
   266fc:   f004 f9e4   bl  2aac8 <project_log>
   26700:   682b        ldr r3, [r5, #0]
   26702:   212f        movs    r1, #47 ; 0x2f
   26704:   4630        mov r0, r6
   26706:   441c        add r4, r3
   26708:   f004 fb4d   bl  2ada6 <strrchr>
   2670c:   f3ef 8308   mrs r3, MSP
   26710:   223c        movs    r2, #60 ; 0x3c
   26712:   9301        str r3, [sp, #4]
   26714:   1c41        adds    r1, r0, #1
   26716:   9400        str r4, [sp, #0]
   26718:   463b        mov r3, r7
   2671a:   2000        movs    r0, #0
   2671c:   f004 f9d4   bl  2aac8 <project_log>
   26720:   682b        ldr r3, [r5, #0]
   26722:   212f        movs    r1, #47 ; 0x2f
   26724:   4630        mov r0, r6
   26726:   441c        add r4, r3
   26728:   f004 fb3d   bl  2ada6 <strrchr>
   2672c:   f3ef 8308   mrs r3, MSP
   26730:   223c        movs    r2, #60 ; 0x3c
   26732:   9301        str r3, [sp, #4]
   26734:   1c41        adds    r1, r0, #1
   26736:   9400        str r4, [sp, #0]
   26738:   463b        mov r3, r7
   2673a:   2000        movs    r0, #0
   2673c:   f004 f9c4   bl  2aac8 <project_log>
   26740:   682b        ldr r3, [r5, #0]
   26742:   212f        movs    r1, #47 ; 0x2f
   26744:   4630        mov r0, r6
   26746:   441c        add r4, r3
   26748:   f004 fb2d   bl  2ada6 <strrchr>
   2674c:   f3ef 8308   mrs r3, MSP
   26750:   223c        movs    r2, #60 ; 0x3c
   26752:   9301        str r3, [sp, #4]
   26754:   1c41        adds    r1, r0, #1
   26756:   9400        str r4, [sp, #0]
   26758:   463b        mov r3, r7
   2675a:   2000        movs    r0, #0
   2675c:   f004 f9b4   bl  2aac8 <project_log>
   26760:   682b        ldr r3, [r5, #0]
   26762:   212f        movs    r1, #47 ; 0x2f
   26764:   4630        mov r0, r6
   26766:   441c        add r4, r3
   26768:   f004 fb1d   bl  2ada6 <strrchr>
   2676c:   f3ef 8308   mrs r3, MSP
   26770:   223c        movs    r2, #60 ; 0x3c
   26772:   9301        str r3, [sp, #4]
   26774:   1c41        adds    r1, r0, #1
   26776:   9400        str r4, [sp, #0]
   26778:   463b        mov r3, r7
   2677a:   2000        movs    r0, #0
   2677c:   f004 f9a4   bl  2aac8 <project_log>
   26780:   682b        ldr r3, [r5, #0]
   26782:   4630        mov r0, r6
   26784:   212f        movs    r1, #47 ; 0x2f
   26786:   441c        add r4, r3
   26788:   f004 fb0d   bl  2ada6 <strrchr>
   2678c:   f3ef 8308   mrs r3, MSP
   26790:   223c        movs    r2, #60 ; 0x3c
   26792:   1c41        adds    r1, r0, #1
   26794:   e9cd 4300   strd    r4, r3, [sp]
   26798:   2000        movs    r0, #0
   2679a:   463b        mov r3, r7
   2679c:   f004 f994   bl  2aac8 <project_log>
   267a0:   6828        ldr r0, [r5, #0]
   267a2:   4420        add r0, r4
   267a4:   f7ff ff68   bl  26678 <evil_recursive>
   267a8:   b003        add sp, #12
   267aa:   bdf0        pop {r4, r5, r6, r7, pc}
   267ac:   0002bd28    .word   0x0002bd28
   267b0:   0002bd3c    .word   0x0002bd3c
   267b4:   20003310    .word   0x20003310

and here is the one that causes the reset (with no printing at all):

00026678 <evil_recursive>:
   26678:   b510        push    {r4, lr}
   2667a:   4a0b        ldr r2, [pc, #44]   ; (266a8 <evil_recursive+0x30>)
   2667c:   6813        ldr r3, [r2, #0]
   2667e:   6811        ldr r1, [r2, #0]
   26680:   440b        add r3, r1
   26682:   6811        ldr r1, [r2, #0]
   26684:   6814        ldr r4, [r2, #0]
   26686:   440b        add r3, r1
   26688:   6811        ldr r1, [r2, #0]
   2668a:   4423        add r3, r4
   2668c:   440b        add r3, r1
   2668e:   6811        ldr r1, [r2, #0]
   26690:   440b        add r3, r1
   26692:   6811        ldr r1, [r2, #0]
   26694:   440b        add r3, r1
   26696:   6811        ldr r1, [r2, #0]
   26698:   6812        ldr r2, [r2, #0]
   2669a:   440b        add r3, r1
   2669c:   4413        add r3, r2
   2669e:   4418        add r0, r3
   266a0:   f7ff ffea   bl  26678 <evil_recursive>
   266a4:   bd10        pop {r4, pc}
   266a6:   bf00        nop
   266a8:   20003310    .word   0x20003310

Obviously it is a lot smaller. I don't know much ARM assembler; I guess I am going to learn some ...



Solution 1:[1]

Your second disassembled function recurses with a stack frame of 8 bytes for every 9 add operations. It should overflow the stack exactly as you expect; there is no compiler funny business apart from unrolling the loop.
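To make the frame sizes concrete, here is a rough sketch of the arithmetic read off the two prologues in the question. The helper names and the 8 KiB stack size are my own assumptions for illustration; take the real stack size from your linker map. It also ignores stack used by callees such as the logging function, so the printing version will run out slightly earlier than this suggests.

```c
#include <stdint.h>

/* Bytes pushed by a Thumb PUSH of n_regs 32-bit registers. */
static uint32_t push_bytes(uint32_t n_regs)
{
    return 4u * n_regs;
}

/*
 * Number of whole frames that fit in stack_bytes.
 * stack_bytes is an ASSUMPTION supplied by the caller, not a value
 * taken from the question.
 */
static uint32_t frames_until_overflow(uint32_t stack_bytes,
                                      uint32_t frame_bytes)
{
    return stack_bytes / frame_bytes;
}

/*
 * Printing version: push {r4, r5, r6, r7, lr} then sub sp, #12
 *   frame = push_bytes(5) + 12
 * Bare version: push {r4, lr}
 *   frame = push_bytes(2)
 */
```

The printing version comes out at 32 bytes per frame, which matches the MSP steps you saw, and the bare version at 8 bytes. With an assumed 8 KiB stack that is 256 frames against 1024 frames, but the bare version gets through its frames far faster because nothing slow happens in between.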

I would look at what is next in memory below the stack. Look at the linker map output and the memory map in the datasheet and see if anything is assigned there. Try creating a function that just reads or writes from one or more bytes or words below that known address directly and see what happens.
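A probe along those lines could look something like the sketch below. The function name is made up, and the address you pass in has to come from your own linker map; I am not suggesting a specific one. On the target, the first read may fault before the function returns, which is exactly what you are trying to find out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: write the bitwise inverse of the current value
 * to *addr, read it back, then restore the original value.
 * Returns true if the location behaved like ordinary read/write memory.
 * An access to an unmapped address may fault instead of returning.
 */
static bool probe_word(volatile uint32_t *addr)
{
    uint32_t saved = *addr;            /* this read may already fault */
    uint32_t pattern = ~saved;         /* guaranteed to differ from saved */

    *addr = pattern;
    bool ok = (*addr == pattern);
    *addr = saved;                     /* put the old contents back */

    return ok;
}
```

Call it from your UART command handler with the address just below the stack region and see whether you get a fault, a silent failure (returns false), or a successful write to something that is not supposed to be stack.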

The first disassembly has a bigger frame and so might be clobbering a different address. A read or write to an unknown address doesn't always guarantee a bus fault.

The other thing I would consider is whether the Nordic SoftDevice is interfering with you. The nRF52 isn't really bare metal if you have that enabled.

Solution 2:[2]

So it seems that a real stack overflow was causing a lockup condition, probably because function calls within the handler cannot work without a usable stack. Lockup is what the core enters when a fault happens inside the fault handler itself, and this chip resets on lockup. You can test whether this is going on like this:

uint32_t reset_reason = nrf_power_resetreas_get();
printf("RESET REASON: 0x%08x", reset_reason);
if (reset_reason & POWER_RESETREAS_RESETPIN_Msk ) printf("  - reset pin");
if (reset_reason & POWER_RESETREAS_DOG_Msk ) printf("  - watchdog");
if (reset_reason & POWER_RESETREAS_SREQ_Msk ) printf("  - soft reset");
if (reset_reason & POWER_RESETREAS_LOCKUP_Msk ) printf("  - lockup");
if (reset_reason & POWER_RESETREAS_OFF_Msk ) printf("  - GPIO wakeup");
if (reset_reason & POWER_RESETREAS_LPCOMP_Msk ) printf("  - LPCOMP wakeup");
if (reset_reason & POWER_RESETREAS_DIF_Msk ) printf("  - DEBUG wakeup");
if (reset_reason & POWER_RESETREAS_NFC_Msk ) printf("  - NFC wakeup");

and it seems like a good idea to implement such a check on boot. Quite why the version with the longer print statement causes a bus fault first, I don't know, but at least this explains the reset. To have this device handle a stack overflow well, we would probably have to use the MPU (the Cortex-M4 has no MMU) so that a fault is raised when the stack is dangerously low, but before it actually corrupts other areas of memory.
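As a cheaper stopgap than an MPU region, a software guard pattern at the bottom of the stack can at least detect that the stack has grown too far. This is a different technique from the MPU approach: it only notices the damage when you poll it, it does not trap the offending access, and a runaway recursion like the one above will blow straight through it between checks. The names, the pattern value and the guard size below are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_GUARD_PATTERN 0xDEADBEEFu  /* arbitrary, assumed value */
#define STACK_GUARD_WORDS   8u           /* assumed guard size */

/* Fill the lowest words of the stack region with a known pattern. */
static void stack_guard_paint(volatile uint32_t *stack_bottom)
{
    for (size_t i = 0; i < STACK_GUARD_WORDS; i++) {
        stack_bottom[i] = STACK_GUARD_PATTERN;
    }
}

/* Returns true while every guard word still holds the pattern. */
static bool stack_guard_intact(const volatile uint32_t *stack_bottom)
{
    for (size_t i = 0; i < STACK_GUARD_WORDS; i++) {
        if (stack_bottom[i] != STACK_GUARD_PATTERN) {
            return false;
        }
    }
    return true;
}
```

You would paint the guard once at startup, using the stack limit symbol from your linker script, and check it from the main loop or a timer interrupt, logging or resetting deliberately if it has been overwritten.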

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tom V
Solution 2 danmcb