Xilinx 405 PPC Project

 

Avnet Virtex-II Pro evaluation board with Xilinx JTAG Parallel-III cable plugged in at top right.

 

2003 — 2005.

Avnet Virtex-II Pro board

PowerPC in an FPGA

When this website was originally populated in 2011, three pages were dedicated to the Virtex-II Pro: one under the PowerPC, and two under Xilinx FPGAs, one page per board. In 2013 the processor hardware was grouped into 8, 16, 32 and 64 bits, and the three Virtex-II Pro pages were merged to remove duplication. Enough time was dedicated to the Virtex-II Pro for it to take up a page in our cyber home.

None of this was paid work, which might have changed the outlook a little; it was done by students and for my own research. Finding topics and coercing students to fill in gaps in your research is much like “herding cats”.

Thank you to Warwick and Basil for all the effort, plus apologies for putting you through all this for your M.Techs.

 

All of us stepped into this with little or no VHDL experience, some with Freescale PowerPC experience, and no logic analyzer for debug. Gathering references while updating this page made it clear how large the gulf was between the marketing claims and the shipping product.

The journey described here is not one to repeat unless you are well funded and have plenty of time, or have no alternative for throughput. Pricing is unlikely to favour FPGAs for even modest volumes. When devices and software are leading edge, they are likely to contain more than a few errata. You also have to move quickly, as FPGA devices become obsolete quickly. Xilinx user guides (several versions on) no longer recommend the PowerPC devices for new designs.


As Advertised

Hardcore PowerPC processor, multiple 3Gbps serial ports, plenty of internal memory, trace port for the 405 core, ChipScope for built-in logic analyzer, one port for configuring the FPGA bitstream and processor code, some devices with 4 cores, etc.

This was certainly well ahead of other FPGA offerings at the time.

An Xcell Journal article [1] made it look easy to dig into any internal signal or bus. It was followed by a second article [2]. Fast forward fifty or so pages to page 86, where the Virtex-II Pro was described well enough to make a gamble on a blind purchase. (This was not our money, but a research grant.)

 

This would be ideal for tracing a real-time multi-processor on a single board without spending a year developing hardware before being able to start research. It might be possible to compress a continuous trace and send it over a serial link. Our tracing literature survey can be found here (PDF 684 kBytes).


Evaluation Board

In 2004, two $499 Avnet/Xilinx Virtex-II Pro evaluation boards were purchased for student projects and for tracing the PowerPC 405 core. (Unfortunately, Avnet did not bring out the trace port.) We spent a short time on the evaluation boards before giving them to students for their projects. The initial tools were Xilinx ISE and EDK 6.1. There were a couple of Parallel-IV cables. The first Xilinx USB JTAG probe caused the laptop’s port to go into current limit; it was kindly swapped out.

Once the I/O tests of the DIP switches, LEDs and 7-segment LEDs were completed, there was little to demonstrate any real form of life. Two $300 Audio/Visual boards later, we discovered they were designed for a Spartan board or something else — although advertised on the same page as the Virtex-II Pro boards with matching connectors.

Avnet Virtex-II Pro PowerPC FPGA evaluation board

Close up of Virtex-II Pro evaluation board


Development Board

We managed to test enough code on the Evaluation Boards to bravely purchase the $1995 Development Board when the research grant arrived. The ideal was to have a logic analyzer core and compression in the FPGA logic, inspired by Tarari's design [3] for gzip compression (five times the then PC rate).

The development board looked like a good option, as perhaps the evaluation boards could plug into it. It would have been cheaper to plug in separate boards for multi-processor tests than to use a single-chip multi-core FPGA. (The quad-core PowerPC Virtex-II Pro device was reported to be $30,000; possibly a yield problem, or did these even exist at the time?) The PCI bridge looked like an option for developing large programs or tracing on a PC. The brochures and notes showed the SelectMAP interface coming from the Spartan device, so we guessed that we might be able to program the board from the PCI connector.

The working VHDL for the bridge was supplied, but again there was no scaffolding for any decent debug. Xilinx might not have wanted to lose out on their ACE bootloader, but we thought they would fill in the gaps later; it was, after all, a new product.

Avnet Virtex-II Pro PowerPC FPGA board

Virtex-II Pro Development Board with the Xilinx Parallel-IV cable.


Challenges

The PowerPC from IBM “mainframe land” was not the same as the Freescale “embedded land” device. The 405 and 603 cores used different vocabulary, and getting out of the FPGA fabric looked impossible from the first thousand or so pages we read. IBM reversed the order of the address bus, so A0 was the most significant address bit rather than the least significant. The data sheets were not bedtime reading: they were full of terms like Processor Local Bus and Instruction Cache Unit, and plenty of new acronyms. You had to delve into a level of the internal processor architecture beyond what would normally be expected, even if you only wanted to write software.
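The reversed bit numbering is easy to trip over when reading the IBM documents next to a conventional schematic. A minimal sketch of the conversion follows, assuming a 32-bit bus; the helper names are made up for illustration and do not come from any Xilinx or IBM tool.

```python
# Sketch only: helper names are hypothetical, and a 32-bit bus is assumed.
# IBM numbering:          bit 0 is the MOST significant bit.
# Conventional numbering: bit 0 is the LEAST significant bit.

def ibm_to_conventional(bit: int, width: int = 32) -> int:
    """Map an IBM-style bit index to the conventional index."""
    if not 0 <= bit < width:
        raise ValueError(f"bit {bit} is outside a {width}-bit bus")
    return width - 1 - bit


def ibm_bit_mask(bit: int, width: int = 32) -> int:
    """Return the mask that selects a single IBM-numbered bit."""
    return 1 << ibm_to_conventional(bit, width)


def ibm_field(value: int, first: int, last: int, width: int = 32) -> int:
    """Extract the field covering IBM bits first..last inclusive.

    In IBM numbering `first` is the more significant end, so first <= last.
    """
    if not 0 <= first <= last < width:
        raise ValueError("expected 0 <= first <= last < width")
    shift = width - 1 - last          # distance of the field from the LSB
    nbits = last - first + 1
    return (value >> shift) & ((1 << nbits) - 1)
```

With these helpers, IBM bit 0 of a 32-bit address maps to conventional bit 31, and `ibm_field(addr, 0, 15)` returns the upper half-word. The same mapping applies to any register field quoted in IBM bit positions.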

The academic licenses were great and the device had potential, but the approach to everything was from the FPGA and VHDL side, areas new to us. As there was so little emphasis on the software side, we assumed it would be trivial. After reading many IBM RISCWatch documents, and knowing that the 405 was in the Virtex-II Pro device, we took it for granted that RISCWatch would be supported on a device designed specifically for prototyping or hardware/software co-design. Bad mistake!

Other factors affecting the decision to buy were the 0,1 inch connector and the SMD connector, which we expected to ensure a vibrant ecosystem without having to resort to “first principles” to make useful ADC interfaces. We would find out later that the SMD connectors were unavailable in low volume.

The only standard output on the evaluation board was a serial port which ran some polling demo code “out of the box”. The include files had to be linked to files generated by the Xilinx software for the address maps, device IDs etc.

No sooner had the solder dried than Avnet moved on to the next product and abandoned their “not fully debugged” hardware. We are not sure whether there ever were any boards that could be mated to the SMD connectors that were so closely spaced. That is the evaluation board industry. It is still much better than starting out from scratch on your own!

The Xilinx Parallel-III JTAG probe was used to download both VHDL and PowerPC code, but the tools assumed that the code would run in BlockRAM. The VHDL side was fine, but the C code examples were trivial and lacked any starting examples for the Avnet audio/video boards (or even compatible UCF files for seemingly similar connectors). On the evaluation board circuit diagrams, there were connectors for the Parallel-III and Parallel-IV JTAG download cables, as well as a debug port for the PowerPC. The JTAG port could be used for both bitstream download and PowerPC source-level debug, but it was not ready for prime time and regularly crashed during single-step or register dumps.

 

Perhaps we should not have expected the same Parallel-III JTAG cable to be used for both code and configuration stream downloads. The GHS Slingshot was never tested on the Xilinx Virtex-II Pro, as we were using the Slingshots on 5200 Freescale evaluation boards and on custom hardware with 5200 processors (Warwick Smith’s MTech project). The Slingshot had to be reflashed to connect to different PowerPC cores, which was not as convenient as the GHS brochures suggested.

The demo code for the Development Board relied on the ACE interface, but Xilinx were not giving much away. We examined some detailed debug and loading designs, but in the meantime, see JTAG Debug Probes, which are not linked into the other menus yet.

The debug capability was poor when the board was plugged into a PCI slot, and we don’t recall any code for loading up the memory via the Spartan that controlled the PCI interface. We would like to have used Linux on a host with the PCI card connected via a JTAG umbilical cord to a Windows PC for bitstream downloads. (PCI was not configured before the PC booted, and trying to reload the driver under Windows or Linux during development was not a simple affair.) Cross compiling on the PC and then transferring code via dual-port memory would have been more civilized than trying to stitch together small chunks of BlockRAM. Even though the datasheets advertise huge amounts of embedded RAM in these devices, it is not contiguous.

You would need to dedicate months to archeology, sacrifice many small animals to the digital gods, and pray for good weather before discovering how the BlockRAM memory could be used as per the processors in the wild.

Well, it would take a big chunk of the year just to get through the documentation and the samples, plus more money than we had to get ChipScope going over a logic analyzer channel or into more than a small block RAM.

Want Ethernet?

There was a PHY on board (Nat Semi, I think, and 1 Gbps capable), but where do you get an Ethernet core just to download code? And you thought it would be cheap, or that you could modify it. The $5000 price tag for testing Ethernet was impractical. We said as much.


Beware the FPGA up first

Almost ten years later, the Xilinx Zynq-7000 ARM-based FPGAs appear to have learned some of the bitter lessons we had discovered many years prior. No doubt folks with more influence than us had bent their ears a little. The PowerPC PLB bus and hardware interconnect were pretty complex and, without a dedicated Ethernet core, not much use for industrial work. But most importantly with the Zynq, the processor comes up before the FPGA, and can even load up the FPGA (I believe).

Aurora protocol

No SMA cables or loopback connector were provided. Trying to get small quantities through the academic supply chain was almost impossible. The Aurora protocol did not ship when promised, and even later, it could not reach the 3,2 Gbps advertised. There was an article [4] that described back drilling vias and other esoteric feats to improve on 2,5 Gbps out of the early Rocket ports. Without a decent scope on academic budgets, we were not able to test them anyway.


Lessons Learned

After the LEDs had been flashed, some attempts at the serial port (much wading through documentation), and challenges at getting some companion boards from Avnet, we stuck to Xilinx and Freescale boards. The students kept busy, but I will wait for a couple of years before venturing back to FPGA based processor boards that rely on the FPGA fabric coming up first. As a software developer, I expected to be able to load up the large on-board DRAMs with some sort of scaffolding to bootstrap the processor, configure the DRAM interface, then load up the memory via DMA while keeping the processor in reset ready to execute the first instruction. If that was not possible, then have some dual-port static RAM that could be loaded up like an EPROM emulator. Not difficult stuff, really!

Xilinx support on the C embedded software side was poor. WindRiver was their software partner—unfortunately redefining int and others, changing function calls to IMPORT and STATIC, plus nested include files. This was not much of a concern as we had been using C for over twenty years in anger (and VxWorks on occasions). On updating this page almost ten years later, we still have the original demo code, and we stand by our claims that the code quality was poor.

IBM was possibly not the best choice for a simple CPU—a while back Altera announced MIPS and ARM hardcores. They ended up only shipping an ARM hardcore with no reasons for abandoning the MIPS device. IBM manufactured the Virtex-II Pro chips, so there was probably a very big peace pipe smoked at the same table that was used for the Apple/ Motorola/ IBM folk who defined the PowerPC. It was too complicated for a 266 MHz core when embedded devices in the wild were well beyond 1GHz, plus 64-bit in simple packages as seen in our MIPS/IDT574 pages.

During one of many upgrades, the designs could no longer be migrated. The diagnostics and error messages were utterly useless. We were quite far down the road and had to backtrack until the student projects were completed. It was not the first time a software upgrade would diverge from a marketing department’s version of reality.

Have a look at a Xilinx application note [5], which lists problems with the Agilent trace analyzer (Agilent worked with Xilinx to develop ChipScope Pro), problems reading gprof for a PowerPC host, and not being able to debug Linux because RISCWatch (a product from IBM, who supplied the PowerPC) was not able to deal with TLB addresses. Now how do mere mortals outside of Xilinx debug this stuff? A pretty poor effort after all the marketing hype, and for an application note that did not actually help. This confirmed our findings. The application note was also based on a trivial toy example, even by academic standards.

References

[1] New ChipScope Pro Integrated Bus Analyzer, sub-titled Powerful Debugging Tools for Virtex-II Pro FPGAs, by Brent Przybus, Product Marketing Manager at Xilinx, Xilinx Xcell Journal, Issue 44, pp 24–25 (Winter 2002 edition).

[2] Deep Memory Yields Effective In-System Debugging, by Adrian M. Hernandez, an R&D Engineer at Agilent Technologies, Xilinx Xcell Journal, Issue 44, pp 26–28 (Winter 2002 edition).

[3] Tarari and Celoxica Deliver Fast and Easy Algorithm Acceleration, by Dale Riley, Systems Engineer at Tarari, Xilinx Xcell Journal, Issue 46, pp 38–42 (Summer 2003 edition).

[4] Interfacing SMA Connectors to Virtex-II Pro MGTs, by Warren Miller and Vince Gavagan, Xilinx Xcell Journal, Issue 49, pp 12–15 (Summer 2004). URL: www.xilinx.com/publications/archives/xcell/Xcell49.pdf. They describe how SMA connector choice has a surprising effect on signal integrity.

[5] Statistical Profiler for Embedded IBM PowerPC, by Njuguna Njoroge, Xilinx Application Note XAPP545 (V1.0), 15 Sep 2004, pages 1 to 7.