ARM Cortex M4 silently resets on recursion though fault handlers ARE OK (sometimes)
I recently went through an exercise to implement and test fault handlers on a bare metal ARM Cortex M4 platform. Having implemented handlers, I also wrote a pretty brutal app to trigger various faults and see how the handler responded.
By the way, the chip is a Nordic nRF52832.
One test I wrote was the following function (the call to it is simplified here; in reality you need to send a command over the main UART to make it kill itself):
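#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
/* __get_MSP() comes from the CMSIS core header pulled in by the device header. */

static volatile uint32_t dummy = 1;

uint32_t evil_recursive(uint32_t i)
{
    printf("N:%" PRIu32 " MSP: 0x%08" PRIx32 "\n", i, __get_MSP());
    return evil_recursive(i + dummy);
}

int main(void)
{
    uint32_t i = evil_recursive(1);
    (void)i;
    return 0;
}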
This was surprisingly difficult to get to misbehave; we needed to use the compiler flag
CFLAGS += -fno-optimize-sibling-calls
to get it to create new stack frames. The compiler seems to unroll the function into a loop of 9 calls (you can see MSP decrease by 32 every 9 print statements). But misbehave it did, after two minutes, with a bus fault. All OK so far.
As printf() here goes to a slow, bit-banged, blocking UART, I removed it to speed things up (it was just for debug). To my surprise, calling the evil function now causes an immediate reset of the processor, and no fault handler is triggered.
Has anyone any idea why this might be? I would expect this to eat memory and cause some kind of handler to kick in.
MORE INFO : I wondered whether the simple presence of the print statement was changing the way things compile, or the behaviour. So I tried this, which runs much faster but contains basically the same call:
uint32_t evil_recursive(uint32_t i)
{
    static char c = 'A';
    debug_uart_putc(c);
    c++;
    if (c > 'Z') c = 'A';
    return evil_recursive(i + dummy);
}
This also resets without triggering any handler.
LATER : this is how the two functions have been compiled.
First, the one that DOES trigger the handler:
00026678 <evil_recursive>:
26678: b5f0 push {r4, r5, r6, r7, lr}
2667a: 4e4c ldr r6, [pc, #304] ; (267ac <evil_recursive+0x134>)
2667c: b083 sub sp, #12
2667e: 4604 mov r4, r0
26680: 212f movs r1, #47 ; 0x2f
26682: 4630 mov r0, r6
26684: f004 fb8f bl 2ada6 <strrchr>
26688: f3ef 8308 mrs r3, MSP
2668c: 4f48 ldr r7, [pc, #288] ; (267b0 <evil_recursive+0x138>)
2668e: 4d49 ldr r5, [pc, #292] ; (267b4 <evil_recursive+0x13c>)
26690: 9301 str r3, [sp, #4]
26692: 1c41 adds r1, r0, #1
26694: 463b mov r3, r7
26696: 9400 str r4, [sp, #0]
26698: 223c movs r2, #60 ; 0x3c
2669a: 2000 movs r0, #0
2669c: f004 fa14 bl 2aac8 <project_log>
266a0: 682b ldr r3, [r5, #0]
266a2: 212f movs r1, #47 ; 0x2f
266a4: 4630 mov r0, r6
266a6: 441c add r4, r3
266a8: f004 fb7d bl 2ada6 <strrchr>
266ac: f3ef 8308 mrs r3, MSP
266b0: 223c movs r2, #60 ; 0x3c
266b2: 9301 str r3, [sp, #4]
266b4: 1c41 adds r1, r0, #1
266b6: 9400 str r4, [sp, #0]
266b8: 463b mov r3, r7
266ba: 2000 movs r0, #0
266bc: f004 fa04 bl 2aac8 <project_log>
266c0: 682b ldr r3, [r5, #0]
266c2: 212f movs r1, #47 ; 0x2f
266c4: 4630 mov r0, r6
266c6: 441c add r4, r3
266c8: f004 fb6d bl 2ada6 <strrchr>
266cc: f3ef 8308 mrs r3, MSP
266d0: 223c movs r2, #60 ; 0x3c
266d2: 9301 str r3, [sp, #4]
266d4: 1c41 adds r1, r0, #1
266d6: 9400 str r4, [sp, #0]
266d8: 463b mov r3, r7
266da: 2000 movs r0, #0
266dc: f004 f9f4 bl 2aac8 <project_log>
266e0: 682b ldr r3, [r5, #0]
266e2: 212f movs r1, #47 ; 0x2f
266e4: 4630 mov r0, r6
266e6: 441c add r4, r3
266e8: f004 fb5d bl 2ada6 <strrchr>
266ec: f3ef 8308 mrs r3, MSP
266f0: 223c movs r2, #60 ; 0x3c
266f2: 9301 str r3, [sp, #4]
266f4: 1c41 adds r1, r0, #1
266f6: 9400 str r4, [sp, #0]
266f8: 463b mov r3, r7
266fa: 2000 movs r0, #0
266fc: f004 f9e4 bl 2aac8 <project_log>
26700: 682b ldr r3, [r5, #0]
26702: 212f movs r1, #47 ; 0x2f
26704: 4630 mov r0, r6
26706: 441c add r4, r3
26708: f004 fb4d bl 2ada6 <strrchr>
2670c: f3ef 8308 mrs r3, MSP
26710: 223c movs r2, #60 ; 0x3c
26712: 9301 str r3, [sp, #4]
26714: 1c41 adds r1, r0, #1
26716: 9400 str r4, [sp, #0]
26718: 463b mov r3, r7
2671a: 2000 movs r0, #0
2671c: f004 f9d4 bl 2aac8 <project_log>
26720: 682b ldr r3, [r5, #0]
26722: 212f movs r1, #47 ; 0x2f
26724: 4630 mov r0, r6
26726: 441c add r4, r3
26728: f004 fb3d bl 2ada6 <strrchr>
2672c: f3ef 8308 mrs r3, MSP
26730: 223c movs r2, #60 ; 0x3c
26732: 9301 str r3, [sp, #4]
26734: 1c41 adds r1, r0, #1
26736: 9400 str r4, [sp, #0]
26738: 463b mov r3, r7
2673a: 2000 movs r0, #0
2673c: f004 f9c4 bl 2aac8 <project_log>
26740: 682b ldr r3, [r5, #0]
26742: 212f movs r1, #47 ; 0x2f
26744: 4630 mov r0, r6
26746: 441c add r4, r3
26748: f004 fb2d bl 2ada6 <strrchr>
2674c: f3ef 8308 mrs r3, MSP
26750: 223c movs r2, #60 ; 0x3c
26752: 9301 str r3, [sp, #4]
26754: 1c41 adds r1, r0, #1
26756: 9400 str r4, [sp, #0]
26758: 463b mov r3, r7
2675a: 2000 movs r0, #0
2675c: f004 f9b4 bl 2aac8 <project_log>
26760: 682b ldr r3, [r5, #0]
26762: 212f movs r1, #47 ; 0x2f
26764: 4630 mov r0, r6
26766: 441c add r4, r3
26768: f004 fb1d bl 2ada6 <strrchr>
2676c: f3ef 8308 mrs r3, MSP
26770: 223c movs r2, #60 ; 0x3c
26772: 9301 str r3, [sp, #4]
26774: 1c41 adds r1, r0, #1
26776: 9400 str r4, [sp, #0]
26778: 463b mov r3, r7
2677a: 2000 movs r0, #0
2677c: f004 f9a4 bl 2aac8 <project_log>
26780: 682b ldr r3, [r5, #0]
26782: 4630 mov r0, r6
26784: 212f movs r1, #47 ; 0x2f
26786: 441c add r4, r3
26788: f004 fb0d bl 2ada6 <strrchr>
2678c: f3ef 8308 mrs r3, MSP
26790: 223c movs r2, #60 ; 0x3c
26792: 1c41 adds r1, r0, #1
26794: e9cd 4300 strd r4, r3, [sp]
26798: 2000 movs r0, #0
2679a: 463b mov r3, r7
2679c: f004 f994 bl 2aac8 <project_log>
267a0: 6828 ldr r0, [r5, #0]
267a2: 4420 add r0, r4
267a4: f7ff ff68 bl 26678 <evil_recursive>
267a8: b003 add sp, #12
267aa: bdf0 pop {r4, r5, r6, r7, pc}
267ac: 0002bd28 .word 0x0002bd28
267b0: 0002bd3c .word 0x0002bd3c
267b4: 20003310 .word 0x20003310
and here is the one that causes the reset (with no printing at all):
00026678 <evil_recursive>:
26678: b510 push {r4, lr}
2667a: 4a0b ldr r2, [pc, #44] ; (266a8 <evil_recursive+0x30>)
2667c: 6813 ldr r3, [r2, #0]
2667e: 6811 ldr r1, [r2, #0]
26680: 440b add r3, r1
26682: 6811 ldr r1, [r2, #0]
26684: 6814 ldr r4, [r2, #0]
26686: 440b add r3, r1
26688: 6811 ldr r1, [r2, #0]
2668a: 4423 add r3, r4
2668c: 440b add r3, r1
2668e: 6811 ldr r1, [r2, #0]
26690: 440b add r3, r1
26692: 6811 ldr r1, [r2, #0]
26694: 440b add r3, r1
26696: 6811 ldr r1, [r2, #0]
26698: 6812 ldr r2, [r2, #0]
2669a: 440b add r3, r1
2669c: 4413 add r3, r2
2669e: 4418 add r0, r3
266a0: f7ff ffea bl 26678 <evil_recursive>
266a4: bd10 pop {r4, pc}
266a6: bf00 nop
266a8: 20003310 .word 0x20003310
Obviously it is a lot smaller. I don't know much ARM assembler; I guess I am going to learn some ...
Solution 1:[1]
Your second disassembled function recurses with a stack frame of 8 bytes for every 9 add operations. It should overflow the stack exactly as you expect; there is no compiler funny business apart from unrolling the loop.
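As a rough check on those numbers, the frame size of each version can be read off its prologue: push {r4, r5, r6, r7, lr} followed by sub sp, #12 gives 32 bytes, while push {r4, lr} gives 8. The sketch below only does that arithmetic. The helper names are made up for illustration, and the stack size in the usage note is an assumption, not a value from the question.

```c
#include <stdint.h>

/* Bytes of stack consumed per call: 4 per pushed register,
 * plus whatever the prologue subtracts from sp for locals. */
uint32_t frame_bytes(uint32_t pushed_regs, uint32_t sub_sp_bytes)
{
    return 4u * pushed_regs + sub_sp_bytes;
}

/* How many nested calls fit into a stack of the given size.
 * Returns 0 for a zero-sized frame to avoid dividing by zero. */
uint32_t calls_until_overflow(uint32_t stack_bytes, uint32_t frame)
{
    return (frame == 0u) ? 0u : stack_bytes / frame;
}
```

With a hypothetical 8192-byte stack, the printing version fits 256 nested calls (each one doing 9 slow prints) and the non-printing version fits 1024 calls with almost no work in between. That would be consistent with one taking minutes to fail and the other failing more or less instantly.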
I would look at what comes next in memory below the stack. Check the linker map output and the memory map in the datasheet to see whether anything is assigned there. Try creating a function that simply reads from or writes to one or more bytes or words below that known address directly, and see what happens.
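A minimal sketch of such a probe might look like the following. The function names are hypothetical, and the address to pass in has to come from your own linker map; nothing here is specific to the nRF52.

```c
#include <stdint.h>

/* Read one 32-bit word from an arbitrary address. The volatile
 * qualifier stops the compiler from optimising the access away. */
uint32_t probe_read32(uintptr_t addr)
{
    return *(volatile uint32_t *)addr;
}

/* Write one 32-bit word to an arbitrary address. */
void probe_write32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}
```

On the target you would call these with an address just below the lowest stack address reported in the map file and watch whether a fault handler fires, the access is silently ignored, or something else gets corrupted. Doing the read and the write as separate experiments helps, since the two can behave differently.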
The first disassembly has a bigger frame and so might be clobbering a different address. A read or write to an unknown address doesn't always guarantee a bus fault.
The other thing I would think about is whether the Nordic SoftDevice is messing with you. The nRF52 isn't really quite bare metal if you have that thing enabled.
Solution 2:[2]
It seems that a real stack overflow was causing a lockup condition, probably because function calls within the handler could not work with no usable stack. Lockup is what the core enters when a further fault happens inside the fault handler, and on this chip it results in a reset. You can test whether this is going on like this:
/* Read the latched reset reason bits and print each one that is set. */
uint32_t reset_reason = nrf_power_resetreas_get();
printf("RESET REASON: 0x%08lx", (unsigned long)reset_reason);
if (reset_reason & POWER_RESETREAS_RESETPIN_Msk ) printf(" - reset pin");
if (reset_reason & POWER_RESETREAS_DOG_Msk ) printf(" - watchdog");
if (reset_reason & POWER_RESETREAS_SREQ_Msk ) printf(" - soft reset");
if (reset_reason & POWER_RESETREAS_LOCKUP_Msk ) printf(" - lockup");
if (reset_reason & POWER_RESETREAS_OFF_Msk ) printf(" - GPIO wakeup");
if (reset_reason & POWER_RESETREAS_LPCOMP_Msk ) printf(" - LPCOMP wakeup");
if (reset_reason & POWER_RESETREAS_DIF_Msk ) printf(" - DEBUG wakeup");
if (reset_reason & POWER_RESETREAS_NFC_Msk ) printf(" - NFC wakeup");
printf("\n");
It seems like a good idea to implement such a check on boot. Quite why the version with the longer print statement causes a bus fault first, I don't know, but at least this explains the reset. To have this device handle a stack overflow well, we would probably have to do something with the MPU (the Cortex-M4 has a memory protection unit rather than an MMU) so that an error is raised when the stack is dangerously low, but before it actually corrupts other areas of memory.
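One way such a guard could look on a Cortex-M4 is a small no-access MPU region placed at the lowest addresses of the stack, so that an overflow raises a MemManage fault (or a HardFault, if MemManage is not enabled) instead of silently overwriting whatever lies below. The sketch below only computes the region attribute value. The field positions are my reading of the ARMv7-M MPU_RASR layout and the function names are hypothetical, so check both against the ARM documentation and your CMSIS headers before relying on them.

```c
#include <stdint.h>

/* MPU_RASR field positions as I understand the ARMv7-M layout (assumption). */
#define RASR_ENABLE       (1u << 0)   /* region enable */
#define RASR_SIZE_SHIFT   1u          /* SIZE field, region is 2^(SIZE+1) bytes */
#define RASR_AP_SHIFT     24u         /* access permission field */
#define RASR_AP_NO_ACCESS 0u          /* no access, privileged or unprivileged */
#define RASR_XN           (1u << 28)  /* execute never */

/* Attribute word for a no-access guard region of 2^size_log2 bytes.
 * The smallest region is 32 bytes (size_log2 == 5). Returns 0 if the
 * requested size is out of range. */
uint32_t guard_rasr(uint32_t size_log2)
{
    if (size_log2 < 5u || size_log2 > 31u) {
        return 0u;
    }
    return RASR_XN
         | (RASR_AP_NO_ACCESS << RASR_AP_SHIFT)
         | ((size_log2 - 1u) << RASR_SIZE_SHIFT)
         | RASR_ENABLE;
}

/* An MPU region base must be aligned to the region size. */
int guard_base_ok(uint32_t base, uint32_t size_log2)
{
    if (size_log2 < 5u || size_log2 > 31u) {
        return 0;
    }
    return (base & ((1u << size_log2) - 1u)) == 0u;
}

/* On target (CMSIS names, not compiled here) this would be applied roughly as:
 *   MPU->RNR  = region_number;
 *   MPU->RBAR = guard_base;
 *   MPU->RASR = guard_rasr(size_log2);
 *   MPU->CTRL = MPU_CTRL_PRIVDEFENA_Msk | MPU_CTRL_ENABLE_Msk;
 *   __DSB(); __ISB();
 */
```

Even with a guard in place, the fault handler still needs a usable stack of its own, for example by pointing MSP back at a known-good address on entry. Otherwise the handler's first push can fault again and you are back to lockup and a silent reset.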
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Tom V |
| Solution 2 | danmcb |
