“Historically, software verification has had little impact on the real world of software development. Despite the plethora of specification and verification technologies, the problem has been in applying these techniques and theories to full-scale, real-world programs. Any fully detailed specification must, by its very nature, be as complex as the actual program. Any simplification or abstraction may hide details that may be critical to the correct operation of the program.”
(Hailpern 2002, pg 9)

What is it?

Embedded debugging is testing and fault-finding on embedded systems, or, put simply, small microprocessors. Definitions of "small" differ, but we mean any memory-constrained device that lacks the storage or display features of a desktop PC. Software is usually developed on a separate PC or workstation, although some embedded devices do support "self-hosted" Linux software development.

Our targets are for industrial applications, increasingly moving towards small automation tasks in the home. These might be as simple as switches for LED lighting that communicate with another device to handle the PWM power stage. Our designs are generally prototypes for customers or small production runs, with development being the largest cost. To help with earlier release dates or demonstration of a working prototype, we use 32-bit devices, as they are mostly below $5 in low volumes. We program on "bare metal" and use a small tick-based RTOS. Our debugging is generally to verify that the tasks are executing in the correct order within their deadlines. Once deployed, some communication is useful to monitor the health of the target or change parameters. This is done from a host PC or a more powerful embedded target with a display. Our debugging involves JTAG probes, limited instrumentation, event logging and code coverage.
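To make the "tasks executing in the correct order within their deadlines" check concrete, here is a minimal host-runnable sketch of a tick-based scheduler that counts deadline misses. The task table, tick source and field names are invented for illustration; they are not a real RTOS API.

```c
/* Minimal sketch of a tick-based scheduler with a deadline-miss counter.
   Assumes a 1 ms tick; all names are illustrative, not a real RTOS API. */
#include <stdint.h>

#define NUM_TASKS 2

typedef struct {
    void (*run)(void);
    uint32_t period_ticks;   /* how often the task should be released */
    uint32_t deadline_ticks; /* worst-case allowed lateness after release */
    uint32_t last_release;   /* tick when the task last became ready */
} task_t;

static uint32_t tick;              /* advanced by the timer ISR on a target */
static uint32_t missed_deadlines;  /* instrumentation: count for the host */

static void task_a(void) { /* e.g. scan switches */ }
static void task_b(void) { /* e.g. update the LED PWM setpoint */ }

static task_t tasks[NUM_TASKS] = {
    { task_a, 10, 2, 0 },   /* every 10 ms, at most 2 ms late */
    { task_b, 50, 5, 0 },   /* every 50 ms, at most 5 ms late */
};

/* Called once per tick; runs released tasks and counts deadline misses. */
void scheduler_tick(void)
{
    tick++;
    for (int i = 0; i < NUM_TASKS; i++) {
        if (tick - tasks[i].last_release >= tasks[i].period_ticks) {
            uint32_t lateness =
                (tick - tasks[i].last_release) - tasks[i].period_ticks;
            if (lateness > tasks[i].deadline_ticks)
                missed_deadlines++;
            tasks[i].last_release = tick;
            tasks[i].run();
        }
    }
}
```

A host tool (or the health-monitoring channel mentioned above) would then read `missed_deadlines` to confirm the schedule holds.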

Debugging will always be a "work in progress", so menu links to new material are added below to minimise broken links.

In theory

In theory, the debug phase of a computer or software project is one-sixth of the total time. The other areas are design, coding, testing, documentation and maintenance.

In practice, the embedded coders are the furthest from the enlightened designers, who consume almost all the time and resources on something that cannot be properly nailed down without machine-readable specifications. See (Hailpern 2002, pg 9) for motivation of why a specification needs to be as complex as the actual program.

Embedded testing takes a little longer because development is not "self-hosted": you work close to the target, on severely constrained hardware (compared to a desktop PC).

Debugging, testing and verification activities can easily range from 50 to 75% of total development costs according to (Hailpern 2002, pg 4), making it worthwhile to dedicate a processor to debugging. For cost sensitive targets, suitable connectors and a trace port will allow external hardware for debugging that does not need to ship to end-users once a system is in production.

Academic work used to centre around supercomputers, or territory where novel equipment made it easier to publish (academics being measured by publications). Small embedded devices only became academically interesting once their capabilities improved and supercomputer budgets dried up. Many architecture innovations first appeared in academia — simultaneous multi-threading, RISC, real-time scheduling with a theoretical or mathematical grounding, and so on. Industry is not that interested in publishing, preferring to obtain patents. Within both the published and patent literature, there are plenty of ideas for simple debugging aids. We have tried to highlight a few of our favourites in the References section further down. (We will add to the list as and when we get a chance to make the jumble of notes presentable as a web page.)

In practice

We have been debugging embedded devices since the early 1980s. Yes, things have improved; however, the tools are either very primitive or very expensive. Much was discovered by "trial and error" and, at the time, was the obvious solution. Since then, patents have appeared long after embedded developers routinely used widely known methods.

The one-sixth project time only exists in shelfware written by authors who have less than a 40-hour week of contact time per semester with students. They might not even have to write more than 1000 lines of embedded software to show how bad the C language is, or they change languages as fashion and publishing pressure herd them into various camps. There may be six slices of the software life-cycle pie, but they are not equal in size, time or effort.

We saw the C++ debate rage and then fade as Java tried gallantly to run at a decent speed as an interpreted language. C#, Objective-C and others have less to do with performance than with marketing or a computer landscape/land-grab.

For small targets, the most popular language is likely to remain C. C++ with dynamic allocation and other features is too tricky for a SoC with limited memory for a stack.

Costs will always determine what to put into a device. We are unlikely to see debug improve beyond what we have. Trace formats have remained the secret of a select few, and even the ARM camp does not have cheap trace hardware.

Hardware, software or hybrid debug?

Hardware debugging involves logic analyzers, oscilloscopes, or FPGA-based instrumentation for viewing signals. Software debug ranges from the printf() style of marker being emitted when executing a section of code, to simulation with replay (forward or backward). Unless the program is trivial, visualisation is essential to grasp the huge quantities of data gathered from running it. Hybrid debug is a combination of hardware and software instrumentation. There is not much pure hardware-based debug with modern devices, as their internals are no longer visible to external probes, requiring some software to assist in selecting which signals to monitor from an internal buffer.
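The printf()-style marker can be made far cheaper than printf() itself: record a one-byte event id into a RAM ring buffer and let a host tool drain and visualise it later. A minimal sketch, with buffer size and event names chosen purely for illustration:

```c
/* Hypothetical software trace point: instead of printf(), store a
   one-byte event id in RAM; a host tool drains and visualises it later. */
#include <stdint.h>

#define TRACE_BUF_SIZE 256u  /* power of two so the index wraps cheaply */

static volatile uint8_t  trace_buf[TRACE_BUF_SIZE];
static volatile uint32_t trace_head;

static inline void trace_event(uint8_t id)
{
    trace_buf[trace_head & (TRACE_BUF_SIZE - 1u)] = id;
    trace_head++;
}

/* Usage: sprinkle ids at points of interest in the code under test. */
enum { EV_ISR_ENTER = 1, EV_ISR_EXIT, EV_TASK_SWITCH };
```

The cost per marker is a couple of stores, so it can stay in shipping code; the head counter also tells the host how many events were lost if it drains too slowly.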


The workstations of the 1980s and 1990s were based on devices that are less powerful than current offerings costing a few dollars. Hard drive capacities (now SD cards), memory, clock speeds and other metrics have improved by several orders of magnitude over the past thirty years, yet the early trace research managed nicely on a few MIPS. Even the small Cortex-M0 32-bit core at 48 MHz is good for 20-plus MIPS at under $3 in small quantities. Monitoring pushbuttons and industrial interfaces has not changed much since the days of the 8 MHz 68000, so why not use some of the spare capacity and insert software instrumentation into software that ships with the final product? Other than the highest-volume items, development and testing costs should justify using a slightly more capable device to counter the additional overheads. As multi-core becomes mainstream in low-cost embedded, here is an ideal opportunity to take the ideas that were popular several years ago — dedicate one of the cores to debug, as it is likely to have access to the on-chip peripherals or at least have a channel to an adjacent core, plus the chances of stepping on a patent "landmine" are greatly reduced.
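The "channel to an adjacent core" can be as simple as a one-word mailbox in shared SRAM. A minimal single-producer/single-consumer sketch, assuming shared memory between the application and debug cores; the names are hypothetical, and a real port would need the architecture's memory barriers:

```c
/* Sketch of a one-way mailbox from an application core to a debug core,
   assuming shared SRAM and single-producer/single-consumer use.
   All names are hypothetical stand-ins. */
#include <stdint.h>

typedef struct {
    volatile uint32_t data;
    volatile uint32_t full;   /* 0 = empty, 1 = data valid */
} mailbox_t;

static mailbox_t dbg_mbox;

/* Application core: drop a debug word, never block the real-time path. */
int dbg_post(uint32_t word)
{
    if (dbg_mbox.full)
        return -1;            /* debug core has not drained yet: drop it */
    dbg_mbox.data = word;
    dbg_mbox.full = 1;        /* a real port needs a memory barrier here */
    return 0;
}

/* Debug core: poll and consume. */
int dbg_poll(uint32_t *word)
{
    if (!dbg_mbox.full)
        return -1;
    *word = dbg_mbox.data;
    dbg_mbox.full = 0;
    return 0;
}
```

Dropping rather than blocking keeps the debug path from disturbing the timing it is meant to observe.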

We have opted to use a standard RTOS as the effort to establish or document anything that could compete with the numerous free offerings is unlikely for small teams (particularly unfunded). In a previous academic setting, there would be little to gain from not trying something new. The support was not an issue as there were no products to ship or documentation to generate. Customers are unwilling to fund another RTOS as the lifetime of a device is measured in quarters rather than years. For the developer, just wading through the 2000 to 4000 page PDF manuals is bad enough for most embedded SoC devices, even if based on a popular core. The peripherals take all the effort to program, and there is little outside help if nothing toggles or changes a scope trace.

Evaluation boards are great, but vendors seldom put in the effort you would expect beyond the blinky examples. Even the delay() functions are just swinging around a for loop. ARM tried to create the CMSIS standard, which they believe is the answer to the array of peripherals out there, and offered an RTOS late in the day, after they had gained enough confidence to challenge their own ecosystem and "eat their young".
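For contrast, here is the "swing around a for loop" delay next to a tick-counted one. The tick counter is a stand-in for a hypothetical 1 ms timer interrupt; on the host it is advanced inside the loop purely so the sketch runs.

```c
/* The for-loop delay common in vendor examples versus a tick-counted
   delay. g_ticks stands in for a hypothetical 1 ms timer tick. */
#include <stdint.h>

static volatile uint32_t g_ticks;  /* incremented by a timer ISR on target */

/* Crude: duration depends on clock, optimiser and flash wait states. */
void delay_loop(volatile uint32_t n)
{
    while (n--) { /* burn cycles */ }
}

/* Better: bounded by the tick, independent of CPU speed. */
void delay_ms(uint32_t ms)
{
    uint32_t start = g_ticks;
    while ((uint32_t)(g_ticks - start) < ms) {
        /* on a real target: sleep or yield to the scheduler here */
        g_ticks++;   /* host-side stand-in for the timer ISR */
    }
}
```

The subtraction `g_ticks - start` also survives counter wrap-around, which the for-loop version cannot even express.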

We have bought several compilers, many evaluation boards, tried plenty of RTOS kernels, and now just accept what a customer uses. Sometimes we might even change between devices within a month. We are not publishing source code of our instrumentation due to the increased chance of being sued compared to not publishing. Where we can purchase instrumentation without having to maintain the host side, we take the easy option.

Look around these pages as they slowly fill up. We try to peg dates to prior art in the hope that we can reuse our code, particularly when projects range from a few days to six months.

From the ground up

The order we choose to develop has changed over the years. In the 1980s, you started off with a schematic, laid out the board, then wrote a simple monitor to exercise the hardware. Fast forward thirty years and the whole board is within a sub-$5 SoC. There are no chip selects to trigger a scope for a NOP loop through the address space. To program any code now requires an interface to the internal Flash, which is usually via JTAG. The easiest debug path starts with a semiconductor vendor's evaluation board that has an on-board probe, a toolchain that supports the probe, and source-level stepping through code until the target can be “free run”. Example code helps to see the blinky application flash a LED. This verifies the target connection, the Flash programming, source-level debug, peripheral initialisation, correct clock gating for ports, and something visual that can be used as you progress. Inputs are next — hopefully there are a few switches for the simple loop of scanning, testing for the switch state, then driving an output. By now you also discover that to progress any further, you need to open up the 2000 to 4000 page PDF datasheet.
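The scan-test-drive loop described above can be sketched in a few lines. The memory-mapped GPIO registers are replaced by plain variables here so the shape is visible and testable on a host; on a real part they would be volatile pointers taken from the vendor header, and the masks would match your schematic.

```c
/* The simple scan loop: test a switch, drive an output. The register
   names and bit positions are hypothetical stand-ins. */
#include <stdint.h>

static volatile uint32_t GPIO_IN;   /* stand-in for the input data register */
static volatile uint32_t GPIO_OUT;  /* stand-in for the output data register */

#define SWITCH_MASK (1u << 0)
#define LED_MASK    (1u << 4)

/* One pass of the loop: if the switch reads active, light the LED. */
void scan_once(void)
{
    if (GPIO_IN & SWITCH_MASK)
        GPIO_OUT |= LED_MASK;
    else
        GPIO_OUT &= ~LED_MASK;
}
```

On target, this body sits inside the main loop (or a scheduler task), and a debounce filter is usually the next refinement.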

The chosen architecture will dictate where the stack resides, how it is initialised and how you proceed out of reset. This is tedious reading, but it only needs to be done once, then tested and put in a safe place for reuse. Each vendor offers some header files with copyright notices and threats of punishment if used elsewhere — as if the code will run on anything else. As the herd follows ARM, we are blessed/cursed with the new CMSIS from ARM. We generally prefer to write to ports in 32-bit quantities and set clocks in a single blow, but CMSIS writes single bits as part of elaborate functions with many nested include files. In our opinion, not progress, but as with the C++ versus C on "bare metal" debate, not many hands go up.
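The difference is easy to see side by side. A minimal sketch, with a hypothetical direction register standing in for a real port; both routines end up with the same register contents, but the first is one write that reads directly against the schematic, while the second is a read-modify-write per pin:

```c
/* 32-bit "single blow" port configuration versus bit-at-a-time style.
   PORT_DIR is a hypothetical direction register (1 = output). */
#include <stdint.h>

static volatile uint32_t PORT_DIR;

/* One write configures all 32 pins at once. */
void port_config_whole(uint32_t dir_mask)
{
    PORT_DIR = dir_mask;
}

/* Bit-at-a-time: many read-modify-write cycles for the same result. */
void port_config_bitwise(const uint8_t *out_pins, int n)
{
    PORT_DIR = 0;
    for (int i = 0; i < n; i++)
        PORT_DIR |= (1u << out_pins[i]);
}
```

On real hardware the read-modify-write version is also a window for races if an interrupt touches the same register, which is another reason we prefer the single write.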

By this stage, you have agreed to countless license terms on tick boxes, but must now refactor the code as we used to in the 1980s, when not many header files were supplied. As an example, a header file for the Kinetis K60 from Freescale carries a copyright date of 1997 (we are not sure there were any Kinetis devices on the planet at that stage) and is 812,283 bytes long; sent through the pr print utility, that gives 242 pages of A4 or letter output. What if you prefer to split this up, or want to supply it with your work to a customer? Are you allowed to split up these files and redistribute them?

Your toolchain will have other restrictions, depending on GNU or other licenses, if you use its libraries. A desktop printf() will pull in so much code that it will blow the 128 kByte limit of a small processor's internal Flash before you add any application code, so you might want to buy a library or build one from scratch. We did this many years ago and tested the functions on a desktop (32-bit in those days); either way, you will need to know what is in a library if you are going to use one. Without source code, don't bother. We have spent long international phone calls with per-second billing arguing with support because we did not have source code, and vowed never to get caught in that situation again. We hope that is a lesson you don't have to repeat.
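The "build one from scratch" route is less work than it sounds: most embedded logging needs little more than characters and decimal numbers. A minimal sketch of an unsigned decimal print; `dbg_putc()` is a hypothetical hook you would point at a UART or trace buffer (here it fills a RAM buffer so the sketch runs on a host):

```c
/* Minimal printf() replacement piece: print an unsigned 32-bit value in
   decimal. dbg_putc() is a hypothetical output hook; on a target it
   would write a UART data register or a trace buffer. */
#include <stdint.h>

static char out_buf[64];   /* host stand-in for the output channel */
static int  out_len;

static void dbg_putc(char c)
{
    if (out_len < (int)sizeof out_buf - 1)
        out_buf[out_len++] = c;
}

void dbg_put_u32(uint32_t v)
{
    char tmp[10];          /* 4294967295 has at most 10 digits */
    int n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10u);   /* collect digits in reverse */
        v /= 10u;
    } while (v);
    while (n--)
        dbg_putc(tmp[n]);  /* emit in the correct order */
}
```

A handful of such routines (decimal, hex, string) replaces printf() for debug output at a tiny fraction of the Flash cost, and you know exactly what is in it.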

Instrumentation will require visualisation on a host or a logic analyzer. Start to think about this early, as your customers might not be as keen as you are to collect numerous packages and then still have to learn how they work — just because your target misses a couple of heartbeats. Software instrumentation will always be cheaper, as any additional hardware costs extra. A customer without an Ethernet connection on a laptop is rare if they have some automation. In the 1980s, serial ports were useful, but they are near extinct on the desktop. You will also need to be able to look at some kind of log if your product fails unexpectedly. This implies connecting to a running target without a reset and possibly getting a memory dump or stack trace.
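One cheap way to keep a post-mortem log available is a fault record in RAM that survives a reset: a magic word marks it valid, and the next boot reports it before anything overwrites the memory. A minimal sketch; the structure, magic value and section placement are illustrative assumptions:

```c
/* Sketch of a reset-surviving fault record. On a real target the struct
   would live in a linker section that startup code does not zero, e.g.
   __attribute__((section(".noinit"))) with GCC. Names are illustrative. */
#include <stdint.h>

#define CRASH_MAGIC 0xDEADBEEFu

typedef struct {
    uint32_t magic;   /* CRASH_MAGIC when the record is valid */
    uint32_t pc;      /* program counter captured in the fault handler */
    uint32_t count;   /* faults recorded since the log was last cleared */
} crash_log_t;

static crash_log_t crash_log;

/* Called from the fault handler with the faulting address. */
void crash_record(uint32_t pc)
{
    if (crash_log.magic != CRASH_MAGIC) {
        crash_log.magic = CRASH_MAGIC;
        crash_log.count = 0;
    }
    crash_log.pc = pc;
    crash_log.count++;
}

/* Called early at boot, before the log can be overwritten.
   Returns 1 and fills *pc if a fault was recorded. */
int crash_check(uint32_t *pc)
{
    if (crash_log.magic != CRASH_MAGIC || crash_log.count == 0)
        return 0;
    *pc = crash_log.pc;
    return 1;
}
```

The boot report can then be pushed over whatever health-monitoring channel the product already has, so a field failure leaves at least one clue without a JTAG probe attached.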


References

B. Hailpern and P. Santhanam, "Software debugging, testing, and verification," IBM Systems Journal, vol. 41, no. 1, pp. 4–12, 2002.


Other work written several years ago on debug and tracing:
Debugging Embedded Systems (some early 32-bit work) (PDF, 7.9 MB)
Trace and Profile Survey (PDF, 684 kB)