Tuesday, March 5, 2013

Hard Fault Debugging for Cortex-M MCUs

______________________________________________

1. Why is a HARD_FAULT exception raised?
    1.1. Uncontrolled/unexpected memory accesses.
    1.2. Accesses to unclocked peripherals.
    1.3. Executing Flash commands from code in Flash
2. Finding HARD_FAULT exception.
    2.1 General notes about the code
    2.2 Take special care with....
    2.3 The code
______________________________________________


This post explains how to debug HARD_FAULT exceptions on Cortex-M processors and how to locate the code that triggered the exception.

There are lots of sites describing this problem, and most of them show exactly the same code.

Of course, that code did not compile properly for my target processor (Kinetis / Cortex-M4), so I made some small changes to get it working.

Finally, I modified the code so that it compiles with the GNU GCC ARM Toolchain for many Cortex-M targets. It should therefore compile with any GCC-compatible toolchain (or at least it is supposed to).

Before going into the code, let's say a few words about the causes of HARD_FAULT exceptions.


1. Why is a HARD_FAULT exception raised?

The most common causes of HARD_FAULT exceptions are the following:

1.1. Uncontrolled/unexpected memory accesses.

This happens due to programming errors that result in uncontrolled accesses to restricted or forbidden memory locations.

In the best case, the HARD_FAULT exception is caused by silly mistakes that the compiler does not catch: mistyping the name of a variable that has a similar name (e.g. swapping 'u8aux' and 'u8idx' when both variables are declared in the same scope), array indexes going out of bounds, and so on.

In the worst case, the exception is raised by problems that are harder to debug, such as a wrong return from a function or ISR caused by stack corruption.
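As a minimal sketch of the out-of-bounds case, the helpers below refuse any index outside the array instead of touching memory they do not own. The names `u8buf`, `BUF_LEN`, `bufWrite` and `bufRead` are made up for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 8u                      /* hypothetical buffer size */

static uint8_t u8buf[BUF_LEN];          /* hypothetical buffer */

/* Stores value at idx. Returns 1 on success, 0 if idx is out of bounds. */
int bufWrite(size_t idx, uint8_t value)
{
    if (idx >= BUF_LEN) {
        return 0;                       /* reject instead of corrupting memory */
    }
    u8buf[idx] = value;
    return 1;
}

/* Reads the element at idx into *out. Returns 1 on success, 0 otherwise. */
int bufRead(size_t idx, uint8_t *out)
{
    if (idx >= BUF_LEN || out == NULL) {
        return 0;
    }
    *out = u8buf[idx];
    return 1;
}
```

A check like this costs a compare and a branch per access, which is usually cheaper than the hours spent chasing a corrupted stack.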

1.2. Accesses to unclocked peripherals.

Some MCUs gate the clock to each system module or peripheral, so that only the modules actually in use consume power.

On this kind of MCU you must enable the clock to a peripheral module before accessing any of its registers. Otherwise, you can be pretty sure that a HARD_FAULT exception will be raised.
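The ordering can be sketched as follows, assuming a clock-gate register with one enable bit per peripheral. The helper name `clockGateEnable` is made up for this example, and the register address and bit number must come from your MCU reference manual.

```c
#include <stdint.h>

/* Sets one enable bit in a clock-gate register.
 * gateReg and bit are placeholders: take the real register address and
 * bit position from the reference manual of your MCU. */
static inline void clockGateEnable(volatile uint32_t *gateReg, unsigned bit)
{
    *gateReg |= (uint32_t)1u << bit;
}
```

On a Kinetis part this would be called with the address of the relevant SIM clock gating register before the first access to the peripheral. Only after that call is it safe to read or write the peripheral registers.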

1.3. Executing Flash commands from code in Flash

When the MCU does not support the "read while write" flash feature, a HARD_FAULT exception may be raised when flash commands (such as erasing or programming a sector) are executed from code that is itself stored in flash, even if that code is apparently not affected by the flash operation.

The easiest way to solve this problem is to run the flash command code from RAM instead of flash. To do this, set up your linker configuration file so that the flash driver code is loaded into RAM on startup. (Visit this post to get an example of this issue).
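A rough sketch of the C side of this, under several assumptions: the section name `.ramfunc` is hypothetical and must match a section that your linker file places in RAM and that your startup code copies there, and the status register is modelled on a Kinetis-style flash controller where writing 1 to a "command complete" flag launches the command and the hardware sets the flag again when it has finished. The names `RAMFUNC` and `flashLaunchAndWait` are made up for this example.

```c
#include <stdint.h>

/* On ARM GCC, place the function in a RAM-resident section.
 * ".ramfunc" is an assumed name, not something GCC defines. */
#if defined(__GNUC__) && defined(__arm__)
#define RAMFUNC __attribute__((section(".ramfunc"), noinline, long_call))
#else
#define RAMFUNC                         /* host build: ordinary function */
#endif

/* Launches a previously prepared flash command and waits for completion.
 * statusReg and doneMask are placeholders for the status register and
 * "command complete" flag of your flash controller. */
RAMFUNC void flashLaunchAndWait(volatile uint8_t *statusReg, uint8_t doneMask)
{
    *statusReg = doneMask;              /* write 1 to the flag: launch command */
    while ((*statusReg & doneMask) == 0u) {
        /* spin in RAM: no instruction fetch from flash while it is busy */
    }
}
```

Everything the function calls while the flash is busy has to live in RAM as well, which is why the wait loop is kept free of function calls.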


2. Finding HARD_FAULT exception.

The main goal of the code shown below is to recover the processor execution context that was present when the exception was raised (program counter, stack pointers, register contents, etc.).

To understand the code below, keep in mind that Cortex-M MCUs handle a HARD_FAULT exception like a high-priority interrupt: they push the processor execution context onto the stack before jumping to the HARD_FAULT handler vector. This way, we can extract from the stack the execution context that raised the exception.
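The layout of that stacked context can be sketched as a plain struct: eight 32-bit words in the order R0, R1, R2, R3, R12, LR, PC, xPSR, starting at the stack pointer that was active when the fault happened. The names `StackedFrame` and `readStackedFrame` are made up for this illustration, and the sketch ignores the extra floating point registers that a Cortex-M4 with FPU may push after these eight words.

```c
#include <stdint.h>

/* Basic exception stack frame, lowest address first. */
typedef struct {
    uint32_t r0;
    uint32_t r1;
    uint32_t r2;
    uint32_t r3;
    uint32_t r12;
    uint32_t lr;    /* LR of the interrupted code, not EXC_RETURN */
    uint32_t pc;    /* address of the instruction that was executing */
    uint32_t psr;
} StackedFrame;

/* Copies the eight stacked words found at sp into a StackedFrame. */
StackedFrame readStackedFrame(const uint32_t *sp)
{
    StackedFrame f;
    f.r0  = sp[0];
    f.r1  = sp[1];
    f.r2  = sp[2];
    f.r3  = sp[3];
    f.r12 = sp[4];
    f.lr  = sp[5];
    f.pc  = sp[6];
    f.psr = sp[7];
    return f;
}
```

The stacked PC is usually the first value to look up in the map file or disassembly, since it points at or near the faulting instruction.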

The code below declares a HARD_FAULT exception handler, "hardfaultHandler", that determines which stack (PSP or MSP) was active when the exception happened, and calls another function, "hardfaultGetContext", that extracts the stacked context into local variables.
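The stack selection can be expressed in C as a small sketch. On exception entry the LR register holds an EXC_RETURN value, and bit 2 of that value tells which stack received the context: 0 for MSP, 1 for PSP. The names `ActiveStack` and `activeStackFromExcReturn` are made up for this example. The real handler has to do this test in assembler, before any compiler-generated code touches LR or the stack.

```c
#include <stdint.h>

typedef enum { STACK_MSP = 0, STACK_PSP = 1 } ActiveStack;

/* Returns which stack holds the stacked context, based on bit 2 of
 * the EXC_RETURN value found in LR at exception entry. */
ActiveStack activeStackFromExcReturn(uint32_t excReturn)
{
    return (excReturn & 0x4u) ? STACK_PSP : STACK_MSP;
}
```

For instance, an EXC_RETURN of 0xFFFFFFFD (return to thread mode using PSP) selects PSP, while 0xFFFFFFF9 and 0xFFFFFFF1 select MSP.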

2.1 General notes about the code

The code shown below has been compiled using GNU GCC ARM Toolchain running from Eclipse IDE.
It successfully compiles without errors targeting the following Cortex-M processors:
  • Cortex-M0
  • Cortex-M0+
  • Cortex-M1
  • Cortex-M3
  • Cortex-M4 (without floating point processor)
  • Cortex-M4 (with software floating point)
  • Cortex-M4 (with hardware floating point processor)

2.2 Take special care with....

The code below deals with one particular issue that I have not found covered in other blogs or websites, and that cost me a couple of painful days to work through.

Most hardfault code published on the Internet calls the hardfaultGetContext function from inline assembler (using the BL, branch with link, instruction). The following code shows an example of wrong code found on the Internet...

// WRONG CODE: The code below generates compiler errors
void hardfaultGetContext(unsigned long* stackedContextPtr)
{
...some code...
}
void __attribute__((naked, interrupt)) hardfaultHandler(void)
{
__asm__ volatile (
...some asm code...
" BL hardfaultGetContext \n"
or
" BL _hardfaultGetContext \n"
...some asm code...
:: );
}


By default, the code above produces an assembler error, because the assembler knows neither the symbol "hardfaultGetContext" nor the symbol "_hardfaultGetContext".

To avoid the error you must assign an assembler label to the entry point of "hardfaultGetContext" function and branch to that label from inline assembly. In the following code snippet you can see how to assign an assembler label to a C function...

/*!
* \note The following declaration is mandatory to avoid compiler errors. \n
* In the declaration below, we are assigning the assembler label __label_hardfaultGetContext__
* to the entry point of __hardfaultGetContext__ function
*/
void hardfaultGetContext(unsigned long* stackedContextPtr) asm("label_hardfaultGetContext");
/* The code below calls "hardfaultGetContext" from inline assembler using the assembler label "label_hardfaultGetContext" */
void __attribute__((naked, interrupt)) hardfaultHandler(void)
{
__asm__ volatile (
...some asm code...
" BL label_hardfaultGetContext \n"
...some asm code...
:: );
}


2.3 The code

And finally, here is the complete code. Enjoy it!

/*!
* \file cortex_hardfault_handler.c
* \brief The code below implements a mechanism to discover HARD_FAULT sources in Cortex-M embedded applications.
* \version mcufreaks.blogspot.com
*/
/*!
* \note The following declaration is mandatory to avoid compiler errors. \n
* In the declaration below, we are assigning the assembler label __label_hardfaultGetContext__
* to the entry point of __hardfaultGetContext__ function
*/
void hardfaultGetContext(unsigned long* stackedContextPtr) asm("label_hardfaultGetContext");
/*!
* \fn void hardfaultGetContext(unsigned long* stackedContextPtr)
* \brief Copies system stacked context into function local variables. \n
* This function is called from asm-coded Interrupt Service Routine associated to HARD_FAULT exception
* \param stackedContextPtr : Address of stack containing stacked processor context.
*/
void hardfaultGetContext(unsigned long* stackedContextPtr)
{
volatile unsigned long stacked_r0;
volatile unsigned long stacked_r1;
volatile unsigned long stacked_r2;
volatile unsigned long stacked_r3;
volatile unsigned long stacked_r12;
volatile unsigned long stacked_lr;
volatile unsigned long stacked_pc;
volatile unsigned long stacked_psr;
volatile unsigned long _CFSR;
volatile unsigned long _HFSR;
volatile unsigned long _DFSR;
volatile unsigned long _AFSR;
volatile unsigned long _BFAR;
volatile unsigned long _MMAR;
stacked_r0 = stackedContextPtr[0];
stacked_r1 = stackedContextPtr[1];
stacked_r2 = stackedContextPtr[2];
stacked_r3 = stackedContextPtr[3];
stacked_r12 = stackedContextPtr[4];
stacked_lr = stackedContextPtr[5];
stacked_pc = stackedContextPtr[6];
stacked_psr = stackedContextPtr[7];
// Configurable Fault Status Register
// Consists of MMFSR, BFSR and UFSR
_CFSR = (*((volatile unsigned long *)(0xE000ED28))) ;
// Hard Fault Status Register
_HFSR = (*((volatile unsigned long *)(0xE000ED2C))) ;
// Debug Fault Status Register
_DFSR = (*((volatile unsigned long *)(0xE000ED30))) ;
// Auxiliary Fault Status Register
_AFSR = (*((volatile unsigned long *)(0xE000ED3C))) ;
// Read the Fault Address Registers. These may not contain valid values.
// Check BFARVALID/MMARVALID to see if they are valid values
// MemManage Fault Address Register
_MMAR = (*((volatile unsigned long *)(0xE000ED34))) ;
// Bus Fault Address Register
_BFAR = (*((volatile unsigned long *)(0xE000ED38))) ;
__asm("BKPT #0\n") ; // Break into the debugger
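// The following code avoids compiler warning [-Wunused-but-set-variable]
// Note: the fault status bits are write-one-to-clear, so writing the values back also clears the recorded fault flags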
stackedContextPtr[0] = stacked_r0;
stackedContextPtr[1] = stacked_r1;
stackedContextPtr[2] = stacked_r2;
stackedContextPtr[3] = stacked_r3;
stackedContextPtr[4] = stacked_r12;
stackedContextPtr[5] = stacked_lr;
stackedContextPtr[6] = stacked_pc;
stackedContextPtr[7] = stacked_psr;
(*((volatile unsigned long *)(0xE000ED28))) = _CFSR;
(*((volatile unsigned long *)(0xE000ED2C))) = _HFSR;
(*((volatile unsigned long *)(0xE000ED30))) = _DFSR;
(*((volatile unsigned long *)(0xE000ED3C))) = _AFSR;
(*((volatile unsigned long *)(0xE000ED34))) = _MMAR;
(*((volatile unsigned long *)(0xE000ED38))) = _BFAR;
}
/*!
* \fn void hardfaultHandler(void)
* \brief HARD_FAULT interrupt service routine. Selects between the PSP and MSP stacks and \n
* calls \ref hardfaultGetContext passing the selected stack pointer address as parameter.
* \note __naked__ attribute avoids generating prologue and epilogue code sequences generated \n
* for C-functions, and only pure asm instructions should be included into the function body.
*/
void __attribute__((naked, interrupt)) hardfaultHandler(void)
{
__asm__ volatile (
" MOVS R0, #4 \n" /* Determine if processor uses PSP or MSP by checking bit 2 of the LR register (EXC_RETURN). */
" MOV R1, LR \n"
" TST R0, R1 \n"
" BEQ _IS_MSP \n" /* Jump to '_IS_MSP' if processor uses MSP stack. */
" MRS R0, PSP \n" /* Prepare PSP content as parameter to the calling function below. */
" BL label_hardfaultGetContext \n" /* Call 'hardfaultGetContext' passing PSP content as stackedContextPtr value. */
" B . \n" /* Stay here if the call returns, instead of falling through into the MSP path. */
"_IS_MSP: \n"
" MRS R0, MSP \n" /* Prepare MSP content as parameter to the calling function below. */
" BL label_hardfaultGetContext \n" /* Call 'hardfaultGetContext' passing MSP content as stackedContextPtr value. */
" B . \n" /* Stay here if the call returns: a naked function has no epilogue to return through. */
:: );
}
//Uhhg...well... this is a dummy main function to test handler functionality
int main()
{
hardfaultHandler();
return 0;
}
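Once the debugger stops at the BKPT, the _CFSR and _HFSR values still have to be interpreted. The sketch below splits CFSR into its three sub-registers and extracts the flags that matter first, using the bit positions defined for ARMv7-M cores (Cortex-M3/M4). The names `FaultSummary` and `decodeFault` are made up for this example and are not part of the handler above.

```c
#include <stdint.h>

#define HFSR_VECTTBL    (1ul << 1)   /* fault while reading the vector table */
#define HFSR_FORCED     (1ul << 30)  /* escalated from a configurable fault */
#define CFSR_MMARVALID  (1ul << 7)   /* MMAR holds a valid fault address */
#define CFSR_BFARVALID  (1ul << 15)  /* BFAR holds a valid fault address */

typedef struct {
    uint8_t  mmfsr;      /* MemManage Fault Status, CFSR bits 7..0 */
    uint8_t  bfsr;       /* Bus Fault Status, CFSR bits 15..8 */
    uint16_t ufsr;       /* Usage Fault Status, CFSR bits 31..16 */
    int      mmarValid;
    int      bfarValid;
    int      forced;
    int      vecttbl;
} FaultSummary;

/* Splits raw CFSR and HFSR values into their parts. */
FaultSummary decodeFault(uint32_t cfsr, uint32_t hfsr)
{
    FaultSummary s;
    s.mmfsr     = (uint8_t)(cfsr & 0xFFu);
    s.bfsr      = (uint8_t)((cfsr >> 8) & 0xFFu);
    s.ufsr      = (uint16_t)((cfsr >> 16) & 0xFFFFu);
    s.mmarValid = (cfsr & CFSR_MMARVALID) != 0;
    s.bfarValid = (cfsr & CFSR_BFARVALID) != 0;
    s.forced    = (hfsr & HFSR_FORCED) != 0;
    s.vecttbl   = (hfsr & HFSR_VECTTBL) != 0;
    return s;
}
```

If `forced` is set, the real cause is in one of the three CFSR parts. Only trust _BFAR or _MMAR when the matching valid flag is set.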



1 comment:

  1. Thank you ! Very valuable - I just had an issue with hard fault exception, and no clue what was happening. It resolved by itself (wtf?!? - via cold reset), but if I have it again - your code will help me.