Wednesday 20 January 2016

How to debug stack corruption on FreeRTOS

When the stack is corrupted, the value that overwrote the saved LR is usually loaded into the PC when the function returns. The processor then enters an exception handler because it cannot execute code from, let's say, "0xff80dddd" or some other garbage address. When debugging such issues, you usually get a useless backtrace like this:

(gdb) bt
#0  _hang () at startup/startup.s:136
#1  <signal handler called>
#2  0x0001c918 in prvPortStartFirstTask () at portable/GCC/ARM_CM4F/port.c:303
#3  0x0001ca02 in xPortStartScheduler () at portable/GCC/ARM_CM4F/port.c:395
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

How to approach such issues?

Let's make some assumptions before going further:
  • The processor is a 32-bit ARM.
  • You're using the GCC toolchain.
  • The OS is FreeRTOS.
  • The stack is configured to grow downwards (the standard way).
  • You don't have a memory-dumping mechanism and/or true post-mortem analysis tools/scripts.
  • You can catch the exception using GDB.

With the above assumptions in mind, this is what I do to find the root cause:

1. Connect with GDB and wait until the bug reproduces.

2. When GDB catches the exception, print the current task name:

(gdb) p pxCurrentTCB->pcTaskName
$6 = "Bug task\000\000"

3. Find in the source code what stack size was allocated for this task. Example:

#define BUG_TASK_SIZE 256
xTaskCreate(bug_task, "BUG", BUG_TASK_SIZE, NULL, 1, NULL);

To find out how many bytes are reserved for this task's stack in FreeRTOS, multiply the "BUG_TASK_SIZE" value by the word size, because xTaskCreate takes the stack depth in words, not bytes. A 32-bit ARM has a 4-byte word, so the actual stack size is 256*4 = 1024 bytes (1kB).

4. Find the lowest possible stack address:

(gdb) p pxCurrentTCB->pxStack
$7 = (StackType_t *) 0x20010400

5. Add the stack size to get stack range:

0x20010400 + 0x400 (1kB) = 0x20010800;

The stack of this task is between 0x20010400 and 0x20010800.
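
Steps 3 to 5 are plain arithmetic, so they are easy to script. Below is a minimal Python sketch; the function name `stack_range` is made up for illustration, and the 4-byte word size is the assumption stated earlier for 32-bit ARM.

```python
def stack_range(px_stack, depth_words, word_size=4):
    """Return (lowest address, one past the highest address) of a task stack.

    px_stack    -- value of pxCurrentTCB->pxStack read in GDB
    depth_words -- stack depth passed to xTaskCreate (in words, not bytes)
    word_size   -- bytes per stack word; 4 is assumed for 32-bit ARM
    """
    size_bytes = depth_words * word_size
    return px_stack, px_stack + size_bytes


low, high = stack_range(0x20010400, 256)
print(f"stack: {low:#010x} .. {high:#010x} ({high - low} bytes)")
# prints: stack: 0x20010400 .. 0x20010800 (1024 bytes)
```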

6. Read current top of the stack:

(gdb) p pxCurrentTCB->pxTopOfStack
$8 = (volatile StackType_t *) 0x2001073c

So far, we get:

0x20010800   <- beginning of stack
|
|
0x2001073c   <- current top of stack
...
0x20010400   <- end of stack

7. Calculate how many bytes of the stack are in use:

0x20010800 - 0x2001073c = 0xC4 (196 bytes which are 49 words)
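
The same calculation as a small Python sketch, which also builds the GDB dump command for the next step. The helper name `stack_usage` is hypothetical, and the 4-byte word size is again an assumption.

```python
def stack_usage(stack_high, top_of_stack, word_size=4):
    """Return (used bytes, used words) for a stack that grows downwards."""
    used_bytes = stack_high - top_of_stack
    return used_bytes, used_bytes // word_size


top = 0x2001073c
used_bytes, used_words = stack_usage(0x20010800, top)
gdb_command = f"x/{used_words}wx {top:#x}"
print(used_bytes, used_words, gdb_command)
# prints: 196 49 x/49wx 0x2001073c
```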

8. Dump the stack:

(gdb) x/49wx 0x2001073c
0x2001073c:     0x2000e41c      0x200107a8      0x2000e3f8      0x00000000
0x2001074c:     0x00000000      0x20010758      0x20010758      0x00000000
0x2001075c:     0x00000001      0x200107a8      0x00000000      0x00000000
0x2001076c:     0x00000000      0x20015f90      0x2001626c      0x0102fea9
0x2001077c:     0x20015ed4      0x0000000a      0x20010798      0x20010798
0x2001078c:     0x00021869      0x00020cf8      0x01000000      0x0102fea9
0x2001079c:     0x00000000      0x00000000      0x00000000      0x200107b0
0x200107ac:     0x00020e89      0x200107b8      0x00020f99      0x00020e71
0x200107bc:     0x00020e81      0x200107c8      0x00020bf5      0x00020e71
0x200107cc:     0x00020e81      0x00000000      0x00000000      0x00020f71
0x200107dc:     0x02000000      0x200107e8      0x0002101b      0x00020e71
0x200107ec:     0x00020e81      0x00000000      0x0001c8a5      0x00000000
0x200107fc:     0x00000000

9. Pass it to arm-none-eabi-addr2line:

You can pass the values one by one, write some sort of script, or format them as one column and just paste:

arm-none-eabi-addr2line -e <path-to-elf>
<paste stack data>
0x2000e41c
0x200107a8
0x2000e3f8
0x00000000
0x00000000
0x20010758
0x20010758
0x00000000
0x00000001
0x200107a8
0x00000000
0x00000000
0x00000000
0x20015f90
0x2001626c
0x0102fea9
0x20015ed4
0x0000000a
0x20010798
0x20010798
0x00021869
0x00020cf8
0x01000000
0x0102fea9
0x00000000
0x00000000
0x00000000
0x200107b0
0x00020e89
0x200107b8
0x00020f99
0x00020e71
0x00020e81
0x200107c8
0x00020bf5
0x00020e71
0x00020e81
0x00000000
0x00000000
0x00020f71
0x02000000
0x200107e8
0x0002101b
0x00020e71
0x00020e81
0x00000000
0x0001c8a5
0x00000000
0x00000000
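
If you don't want to build the one-column list by hand, a few lines of Python can do it. This is only a sketch: the function name `dump_to_column` is made up, and it assumes the dump looks like the GDB output above, with an address label followed by a colon and then the data words.

```python
import re


def dump_to_column(gdb_dump):
    """Flatten GDB 'x/NNwx' output into a list of 0x-prefixed words."""
    words = []
    for line in gdb_dump.splitlines():
        # Everything before the first ':' is the address label; skip it.
        _, sep, data = line.partition(":")
        if not sep:
            continue  # not a dump line
        words.extend(re.findall(r"0x[0-9a-fA-F]+", data))
    return words


sample = """\
0x200107ec:     0x00020e81      0x00000000      0x0001c8a5      0x00000000
0x200107fc:     0x00000000
"""
print("\n".join(dump_to_column(sample)))
```

The printed column can be pasted into addr2line or piped into it, since addr2line reads addresses from standard input when none are given on the command line.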

The addr2line tool will try to interpret each value as a code address. Some of those values are data, so for them you'll get garbage that you can ignore. Values that match one of your source files will be printed with a specific line number. All in all, you'll get output something like this:

heap_4.c:?
heap_4.c:?
:?
:?
heap_4.c:?
heap_4.c:?
:?
:?
heap_4.c:?
:?
:?
:?
zzzz_sd.c:?
zzzz_sd.c:?
??:0
main.c:?
:?
heap_4.c:?
heap_4.c:?
/home/yyy/devel/xxx/app/zz/src/zzzz_sd.c:331   <<--- CHECK THIS LINE
heap_4.c:?
/home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:395
/home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:324
/home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:329
heap_4.c:?
/home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:227
/home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:324
/home/yyy/devel/xxx/app/zzz/src/zzzz_sd.c:329
:?
:?
/home/yyy/devel/xxx/app/zzz/src/zzz_sd.c:387
??:0
heap_4.c:?
/home/yyy/devel/xxx/app/zzz/src/zzz_sd.c:422
/home/yyy/devel/xxx/app/zzz/src/zzz_sd.c:324
/home/kowyyyalmic/devel/xxx/app/zzz/src/zzz_sd.c:329
:?
/home/kowalmic/devel/xxx/portable/GCC/ARM_CM4F/port.c:269
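
Most of the dumped words are data, so it can help to pre-filter them before reading the addr2line output. The Python sketch below is a heuristic, not a rule: it keeps only words that fall inside the code region and have bit 0 set, which is how return addresses to Thumb code look on Cortex-M. The function name and the code region bounds (0x00000000 to 0x00080000) are assumptions for illustration; take the real bounds from your linker script or map file.

```python
def likely_return_addresses(words, code_start, code_end):
    """Keep words that look like Thumb return addresses in the code region."""
    result = []
    for word in words:
        in_code = code_start <= word < code_end
        is_thumb = (word & 1) == 1  # bit 0 set marks a Thumb-state address
        if in_code and is_thumb:
            result.append(word)
    return result


# A few words taken from the dump above; the region bounds are hypothetical.
words = [0x2000e41c, 0x00021869, 0x00020cf8, 0x0102fea9, 0x0001c8a5]
for addr in likely_return_addresses(words, 0x00000000, 0x00080000):
    print(f"{addr:#010x}")
# prints:
# 0x00021869
# 0x0001c8a5
```

Stale values left over from earlier calls can still pass the filter, so treat the result as a list of candidates.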

10. Analyze:

At this point there are no strict rules on how to proceed. Check the source lines that addr2line resolved (note that the addresses are LR values, that is, return addresses, not PC values); they should point you as close as possible to the offending function. There is a good chance that the topmost resolved source line is just after some kind of misused memcpy/memset.

For example, one of the last LRs put on stack was parsed by addr2line as:

/home/yyy/devel/xxx/app/zz/src/zzzz_sd.c:331

If you check the source line of zzzz_sd.c you can see for instance:

328: void xxx_get_ipv4_addr_raw(char *addr)
329: {
330:     memcpy(addr, m_xxx_iface->ip_addr, IPV6_LENGTH);
331: }

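The address points to the instruction right after the branch to memcpy. Now look at the memcpy call above and the bug becomes obvious: a function that returns an IPv4 address copies IPV6_LENGTH bytes, which overruns the caller's buffer if that buffer was sized for an IPv4 address. Got it!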



Sunday 3 January 2016

The Pong Year

It just so happens that every time I want to check out a new technology, I use pong as the example. Some time ago, when I was learning DirectX and C#, I created a 3D pong. I don't have this project anymore, but the game had an isometric view, while the pads and the ball actually moved in only two dimensions.

During this year, however, I created two more pong implementations. The first one is based on the ncurses library. Actually, the idea was to create a generic framework for terminal-based games and use it to implement a pong game as an example.

Here are the screenshots:



Another goal of this project was networked gameplay. I started on it, but for now it's abandoned. Although the multiplayer mode isn't ready, the project still has features that are valuable to me:
  • Simple image-to-ascii converter
  • Doxygen documentation
  • Unit tests written with Check
  • API for creating simple menus
You can check out the full project here. See readme for build instructions.

Later this year, I decided to learn Unity 3D. The goal was to create an Android game and publish it on Google Play to see what the process looks like. Again, I failed with the networked gameplay. I have a working implementation for WiFi-based LAN, but I decided to exclude it from the final release because of numerous bugs in the reconnection handling.
Although the multiplayer mode wasn't released, the project still had some benefits. At the time I was creating it, Unity 3D didn't support LAN discovery, so I wrote my own module that can advertise a host and search for one. The module is based on broadcasting UDP packets. It uses one quite ugly hack for Android: it checks for the wlan0 or eth0 interfaces to determine whether a connection is available. In most cases, having one of those interfaces up means that WiFi (or Ethernet) is enabled, but in theory that doesn't always have to be true. You can check out the module here. I don't develop the LAN discovery module anymore, because the latest Unity 3D has native support for it. On the other hand, local discovery is planned to be available only in the premium version of Unity in the future.
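
To give an idea of how advertising and searching over UDP broadcast can work, here is a minimal Python sketch. It is not the Unity module itself: the packet format, the `MAGIC` prefix, the port number and all function names are made up for illustration.

```python
import socket

MAGIC = b"PONG-HOST"      # hypothetical packet prefix
DISCOVERY_PORT = 47777    # hypothetical UDP port used for discovery


def encode_advert(game_port):
    """Build the advertisement payload: prefix, space, game port."""
    return MAGIC + b" " + str(game_port).encode("ascii")


def decode_advert(data):
    """Return the advertised game port, or None for foreign packets."""
    prefix, _, port = data.partition(b" ")
    if prefix != MAGIC or not port.isdigit():
        return None
    return int(port)


def advertise_once(game_port):
    """Host side: broadcast one advertisement on the local network."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(encode_advert(game_port),
                    ("255.255.255.255", DISCOVERY_PORT))


def find_host(timeout=3.0):
    """Client side: wait for an advertisement, return (ip, port) or None."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", DISCOVERY_PORT))
        sock.settimeout(timeout)  # applies to each recvfrom call
        try:
            while True:
                data, (host_ip, _) = sock.recvfrom(1024)
                game_port = decode_advert(data)
                if game_port is not None:
                    return host_ip, game_port
        except socket.timeout:
            return None
```

In this sketch the host would call `advertise_once` periodically while waiting for a player, and the client would call `find_host` and then connect to the returned address.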

In terms of the single-player mode, I'm quite satisfied with the AI algorithm. The speed of the pad is randomized (within a range that can be set by parameters), and after each strike the pad moves back to the middle of the room. I also created (in one or two evenings) a short soundtrack using Reaper, MT Power Drum Kit, and 4Front Bass. All of those tools are really great.
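
The AI behaviour described above can be sketched in a few lines of Python. This is an illustration, not the game's actual code: the class and method names are made up, and two details are assumptions, namely that the speed is re-rolled on every strike and that the pad starts tracking the ball again once the ball is heading back towards it.

```python
import random


class AiPaddle:
    def __init__(self, middle_y, min_speed, max_speed, rng=None):
        self.y = middle_y
        self.middle_y = middle_y
        self.min_speed = min_speed
        self.max_speed = max_speed
        self.rng = rng or random.Random()
        self.returning = False
        self.speed = self.rng.uniform(min_speed, max_speed)

    def on_strike(self):
        """The pad hit the ball: pick a new speed and head for the middle."""
        self.speed = self.rng.uniform(self.min_speed, self.max_speed)
        self.returning = True

    def on_ball_incoming(self):
        """The ball is coming back (assumed trigger): track it again."""
        self.returning = False

    def update(self, ball_y, dt):
        """Move towards the target, limited by the current speed."""
        target = self.middle_y if self.returning else ball_y
        max_step = self.speed * dt
        delta = target - self.y
        # Clamp the move so the pad never exceeds its current speed.
        self.y += max(-max_step, min(max_step, delta))
```

Because the speed is drawn from a range, the pad sometimes fails to reach the ball in time, which is what makes the opponent beatable.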

Unity itself seems to be a good engine. However, for small Android games it has quite a significant size overhead. There are two native libraries provided: one for ARM and one for Intel architecture. If both are included in the final build, an empty application will be almost 20MB. If you drop the Intel target, you'll start with a ~9MB application. AFAIK this can be tuned in the premium version.

Here is what the end result looks like:



You can try it yourself on Google Play.

The conclusion from those projects for me is: no more pongs! I'm really bored of creating pong implementations; I need to come up with a different template theme :)