Texas Instruments 6201 DSP Project

Evaluation

The Texas Instruments TMS320C6201, an integer VLIW DSP, was evaluated for imaging algorithms. The previous Texas Instruments processor, the MVP-80, included four DSPs and one 32-bit RISC controller on a single die. The future of the MVP-80 was not assured, and Texas Instruments were promoting their VLIW devices.

The aim was to compare some common imaging algorithms on one of these newer devices. A PCI board with the 6201 was available. The processor ran at 133 or 200 MHz, with a jumper selecting between the two crystals. The examples from TI assumed Windows NT as the development platform. We spent a week attempting to load the DSP memory over PCI from a Linux platform. As this did not relate to the imaging evaluation, we stayed with the Windows NT code. The images were converted into arrays using some custom code on Linux, then linked into the Code Composer modules and downloaded to the DSP. The DSP had a decent 32-bit timer, which was read before a test and again on completion. After a run, the output images were uploaded and compared to the expected results on a Linux platform.
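The timing approach, reading the 32-bit timer before and after a test, can be sketched in C as follows. This is a minimal illustration, not the original test code: the function names `elapsed_ticks` and `ticks_to_us` are hypothetical, the timer is assumed to be a free-running up-counter, and the tick rate is passed in as a parameter because the actual timer clock is not stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running 32-bit up-counter.
 * Unsigned subtraction wraps modulo 2^32, so one counter rollover
 * between the two reads still gives the correct difference. */
uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Convert a tick count to microseconds.
 * timer_hz is an assumed tick rate supplied by the caller. */
double ticks_to_us(uint32_t ticks, double timer_hz)
{
    assert(timer_hz > 0.0);
    return (double)ticks * 1.0e6 / timer_hz;
}
```

A test harness would store the counter value before the algorithm runs, read it again afterwards, and pass both values to `elapsed_ticks`. The result is only valid if the test finishes within one full counter period.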

The evaluation included detailed examination of the compiler output, modifying the assembly to run more instructions in parallel, and debugging at the assembler level. The 62xx and 6701 could execute eight instructions simultaneously, and when the data fitted into on-chip memory, the throughput was indeed impressive.

Support Framework

Initial tests used images from well-known libraries; however, the output had to be converted from a dump to an image and displayed on a separate desktop PC. With the quad DSPs in the MVP-80, the image was split into overlapping segments, but with the 6201 we used a single memory space (and processor) for the image. Comparing intermediate results by eye (peering at the screen) was difficult, so some additional software was written under Linux to compare expected and actual images. Later, the C code was tested under Linux first, then “ported” to the 6201. Some tweaking was needed to get better parallelism at the assembler level, but the Texas Instruments VLIW compiler was very impressive.
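A comparison tool of the kind described can be quite small. The sketch below is an assumption about how such a check might look, not the software that was actually written: it assumes 8-bit greyscale images stored row by row in memory, and the `image_diff` type and `compare_images` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary of the differences between two images. */
typedef struct {
    size_t mismatches;   /* number of pixels that differ */
    size_t first_x;      /* column of the first differing pixel */
    size_t first_y;      /* row of the first differing pixel */
    int    max_abs_diff; /* largest absolute pixel difference */
} image_diff;

/* Compare two 8-bit images of identical size, stored row by row. */
image_diff compare_images(const uint8_t *expected, const uint8_t *actual,
                          size_t width, size_t height)
{
    image_diff d = {0, 0, 0, 0};

    assert(expected != NULL && actual != NULL);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            size_t i = y * width + x;
            int diff = (int)expected[i] - (int)actual[i];
            if (diff < 0)
                diff = -diff;
            if (diff != 0) {
                if (d.mismatches == 0) {
                    d.first_x = x;
                    d.first_y = y;
                }
                d.mismatches++;
                if (diff > d.max_abs_diff)
                    d.max_abs_diff = diff;
            }
        }
    }
    return d;
}
```

Reporting the first mismatch position and the largest difference, rather than a plain pass or fail, makes it easier to tell a rounding difference from an indexing bug.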

Examples for the imaging libraries were supplied by TI, together with cycle times. These were also used to build up an imaging library. A problem with the first C6xxx devices was the small amount of on-chip memory. If the code did not fit inside on-chip memory, instructions were fetched as sequential 32-bit accesses to build up each 256-bit VLIW instruction. This meant that large imaging programs could not take advantage of the high throughput of eight simultaneous 32-bit instructions. (There were six ALUs and two multipliers, plus two memory banks.) Even so, the processor was certainly faster than other general-purpose processors of the time for imaging work.
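The cost of fetching code from off-chip memory can be illustrated with some simple arithmetic. The sketch below is a deliberately crude model, not a description of the real memory interface: it assumes each 32-bit access takes a fixed number of cycles (the hypothetical parameter `cycles_per_word`) and ignores any overlap between fetching and execution.

```c
#include <assert.h>

enum { FETCH_PACKET_BITS = 256, WORD_BITS = 32 };

/* Number of 32-bit accesses needed to assemble one 256-bit packet. */
unsigned words_per_packet(void)
{
    return FETCH_PACKET_BITS / WORD_BITS;
}

/* Upper bound on instructions per cycle when every packet has to be
 * assembled from sequential 32-bit accesses.
 * cycles_per_word is an assumed cost per access, not a measured figure. */
double peak_ipc_external(unsigned cycles_per_word)
{
    assert(cycles_per_word > 0);
    unsigned fetch_cycles = words_per_packet() * cycles_per_word;
    return (double)words_per_packet() / (double)fetch_cycles;
}
```

Under this model, even a one-cycle access limits the processor to at most one instruction per cycle, against eight when the code runs from on-chip memory.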

The compiler and tools from TI were excellent. Almost ten years later, we tested the OMAP-L137 and again found the tools to be of an excellent standard. Bought separately from an evaluation board, the tools were a bit expensive, but in 2011 there are some special offers for some of the multi-core devices.