THE "PCI GENERIC CARD": HARDWARE RECONFIGURATION USING AN FPGA-BASED PCI ADD-ON BOARD

Stéphane HAURADOU Thierry LEJEALLE Sébastien HAEZEBROUCK Olivier MEULLEMEESTRE Arnaud GALISSON

École Nationale Supérieure des Télécommunications
Département Électronique - Laboratoire ELEC-num
46, rue Barrault 75634 Paris Cedex 13 - FRANCE
contact : Stéphane Hauradou ; E-mail : hauradou@e...


ABSTRACT

Although SRAM-based FPGAs have been available for years, recent advances in process technology have led to the availability of high-density components (up to 100,000 usable gates). Feature size reductions have also increased the speeds available. Such devices can then be seen as "hardware reserves" that provide a source of reconfigurable logic. The ability to continually and dynamically redefine that hardware is the key to developing user-reconfigurable applications. Using software to reconfigure hardware gives great flexibility, and thus encourages the development of CPLD-based generic architectures that are easily adapted to specific applications.

This paper presents the "PCI Generic Card", an add-on board developed in the E.N.S.T. electronic department laboratories. The card uses an ALTERA EPLD to implement a PCI master/slave interface that connects to a high-density SRAM-based FPGA (typically 50,000 to 100,000 gates). The FPGA is software reconfigurable (within 200 ms at the maximum rate supported) and therefore allows the implementation of a wide variety of applications.

1. INTRODUCTION

While the realization of a fully "generic card" remains a distant goal, mainly because of the diversity of the input/output formats required, it is still possible to implement on-board generality for specified fields of application (image/signal processing, data compression, data encryption, ...). The "PCI Generic Card" discussed in this paper relies strongly on that approach, as it is divided into a specific part and a generic part, consisting respectively of a PCI add-on board and a daughterboard.

The add-on board is in charge of the communications between the computer bus and the backend application and is therefore connected to the computer mainboard through a bus slot. A high-speed EEPROM-based EPLD is used to implement both the computer bus interface and the backend application interface. The PCI bus was chosen because it is on the way to becoming a new computer standard: it is present on most Pentium-equipped PC motherboards, and more recently on MAC and even workstation motherboards. A working knowledge of the PCI bus fundamentals [1] is therefore strongly recommended. Two types of interface have been developed: to validate the concept, a PCI slave interface was implemented first, followed by a master/slave version of that interface.

The daughterboard is in charge of the application and will support the high-density SRAM-based CPLD. Its architecture is fully application dependent and therefore, the device can be reconfigured to implement any type of application in a given field.

This paper first presents a brief description of both the add-on board and the daughterboard, as well as the evolution of the project in terms of past, new and commercial developments.

The second part of this paper is dedicated to the hardware architecture. The PCI interface is detailed for both the slave and the master/slave developments. On the application side, the reconfiguration and communication mechanisms are described, as well as the software control over the card.

To conclude this presentation, some of the possible applications and future developments for the "PCI Generic Card" will be discussed.

2. HARDWARE DESCRIPTION

Figure 1 shows the system configuration:



Fig. 1 - System configuration

2.1. The add-on board

The PCI interface is handled by an EEPROM-based MAX EPM7256ERC208 from ALTERA. The choice of speed grade depends on the type of interface required (slave or master/slave) and will be discussed in section 3. The backend application interface is also implemented in this circuit. 4x8Kbyte of fast asynchronous SRAM (10 ns) are on the board to provide buffer memory between PCI and the backend FPGA. The SRAM was used during debug to test memory accesses but is mostly part of the FPGA reconfiguration mechanism. The system clock is distributed through a PLL, as the PCI specification does not allow any PCI signal to be directly connected to more than one input pin [2]. To help testing, a 50-pin connector is present on the board, as well as two other 50-pin connectors dedicated to the daughterboard. Figure 2 shows the add-on board configuration.



Fig. 2 - The add-on board configuration

2.2. The daughterboard

Using a daughterboard to support the backend application allows great flexibility as that board is fully application dependent. As a first development, the board is equipped with a high-density SRAM-based FLEX EPF10K50 from ALTERA (50,000 usable gates). It was decided to have two 4 Mbyte DRAM banks on board to implement a simple image processing application (color filtering). Other applications would of course require other memory capabilities. Figure 3 presents the daughterboard configuration.



Fig. 3 - The daughterboard configuration

2.3. The "PCI Generic Card" evolution

The first development used two EPM7256 devices for the slave and master/slave versions of the add-on board, since the main objective was to avoid resource saturation. One EPM7256 was dedicated to the PCI bus and backend interface, and the other served as a source of additional hardware in case the master/slave interface would not fit in a single device. The second circuit was also planned to support the EPF10K reconfiguration mechanism [3]. In fact, the objective of this first development was to evaluate the complexity of both the slave and the master/slave PCI interfaces.

This paper describes the second generation "PCI Generic Card". A single EPM7256 supports both the PCI/backend interface, in either slave or master/slave mode, and the EPF10K reconfiguration mechanism. This latest version naturally requires most of the available resources of the EPM7256 device, a point discussed further in section 3.

Figure 4 shows the project evolution.



Fig. 4 - Evolution of the "PCI Generic Card"

PLDA (Powerful Logic Design Applications), a French start-up company, recently launched a commercial development with the initial objective of presenting a demonstration prototype (Q4 96). In this case, the main goal is to promote the EPLD-based PCI interfaces (with an In-System Programmable device - ISP) rather than the CPLD reconfiguration feature. Therefore, instead of having a daughterboard, the EPF10K50 is placed on the add-on board, which reduces costs and improves integration.

A demonstration application is under development. This application will implement a PCI bus analysis (under WINDOWS NT and WINDOWS 95), with a graphic display of the results on the computer screen.

Figure 5 shows the card architecture.



Fig. 5 - PLDA prototype

3. HARDWARE ARCHITECTURE

The overall hardware architecture is shown in figure 6. All required PCI signals except the system clock are hard-wired to the EPM7256 device. The system clock is distributed through a Phase Locked Loop but that circuit can be removed if the backend application device (namely the EPF10K) implements its own local clock.

A few output pins from the EPM7256 are connected to the test connector to visualize internal or even PCI signals (those are of course delayed due to the on-chip delay of the device). All other I/Os are connected to the two 50-pin connectors and will feed the EPF10K50 on the daughterboard. The 4x8Kbyte static RAM is shared between the PCI interface (EPM7256) and the application interface (EPF10K).

The next section presents the internal architecture of both the PCI interface and the backend application interface in more technical detail.



Fig. 6 - The add-on board global architecture

3.1. The PCI interface

The PCI specification defines both 33 MHz and 66 MHz bus operation. In each case, the address/data bus width can be either 32 or 64 bits.

The project was developed on a Pentium-based PC for a 33 MHz, 32-bit PCI bus.

Table 1 presents different factors which were considered before choosing the host device for the PCI interface.

Table 1 - Development factors

Factor | Note
Interface complexity | Estimated at 2500 gates for the slave interface, 3500 gates for the master/slave interface
Number of I/O pins required | 50 pins reserved for PCI signals; 80 pins dedicated to backend application signals (demultiplexed address/data bus); 20 pins for test/extra signals; for a total of 150 I/O pins
Cycle time available | 30 ns at 33 MHz
Setup time on all PCI signals | 7 ns, imposed by the PCI specification [2]
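
The figures in Table 1 can be cross-checked with a few lines of arithmetic. The Python sketch below is illustrative only: the names are hypothetical, and the "margin" it prints is simply the cycle time minus the quoted setup time, that is, the share of each clock period left for clock-to-output delay and signal propagation. It is not a timing analysis of the actual device.

```python
# Cross-check of the Table 1 figures (illustrative names, not from the design files).
PCI_CLOCK_HZ = 33e6      # 33 MHz PCI bus, as stated in the paper
SETUP_TIME_NS = 7.0      # setup time quoted from the PCI specification

# I/O pin groups listed in Table 1
IO_PINS = {"pci": 50, "backend": 80, "test_extra": 20}


def cycle_time_ns(clock_hz):
    """Clock period in nanoseconds."""
    return 1e9 / clock_hz


def total_io_pins(groups):
    """Sum of all I/O pin groups."""
    return sum(groups.values())


cycle = cycle_time_ns(PCI_CLOCK_HZ)
margin = cycle - SETUP_TIME_NS   # what remains once the setup window is reserved

print("I/O pins :", total_io_pins(IO_PINS))
print("cycle    : %.1f ns" % cycle)
print("margin   : %.1f ns" % margin)
```

The exact period at 33 MHz is slightly above the rounded 30 ns used in the table.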

To respond to these constraints, the EPM7256ERC208 offered :

Moreover, the EPM internal architecture is particularly well suited to the implementation of state machines [4]. Both the slave and the master/slave PCI interfaces were designed precisely in the form of a macro-state machine.

3.1.1. The slave macro-function

The slave macro-function is implemented in an EPM7256ERC208-12P and is fully compatible with the PCI specification version 2.1. The macro-function supports all PCI commands, and all required accesses in configuration space [2] are handled. Among all the features supported, here are some of the most characteristic:

The state machine is shown in figure 7. Five states are implemented: IDLE, CONF, MEM, BUS_BUSY and TURNAROUND. A transaction is always initiated in the IDLE state and terminated in the TURNAROUND state. All states except TURNAROUND are macro-states: several clock cycles may be spent in each of them to accomplish multiple operations.



Fig. 7 - The slave macro-function state machine

Table 2 presents a brief description of each state.

Table 2 - Description of the slave interface states

IDLE
  The device waits for a transaction to be initiated on the bus.
  - If the device recognizes itself as the target of a memory access and the backend is ready to handle the transaction, the next state is MEM.
  - If the PCI command indicates a configuration access addressed to the device, the next state is CONF.
  Two other cases can occur:
  - the initiated transaction is not addressed to the device, or
  - the transaction is addressed to the device but the backend is not ready to serve that access.
  In both cases, the next state is BUS_BUSY.

CONF
  The device responds to configuration accesses. After the first data has been transmitted, the device disconnects itself, as burst transactions are not allowed in configuration space [2].

MEM
  The device responds to memory accesses and supports bursts of any length. No wait-states are inserted, as the static RAM and the backend (EPF10K50) can handle the transfer rates. However, the device response can be adapted (by inserting wait-states) to the timing constraints of the backend application.
  The MEM state is active until the end of the current transaction. The next state is then TURNAROUND.

BUS_BUSY
  This state has two functions:
  - for a transaction not addressed to the card, the device stays in the BUS_BUSY state until the end of that transaction,
  - for a transaction addressed to the card while the backend is not ready to serve any access, the device asserts a retry towards the initiator [2].

TURNAROUND
  This state is used to release the PCI bus lines before returning to the IDLE state. One clock cycle is required for this operation [2].
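
The transitions listed in Table 2 can be summarized as a small transition function. The Python sketch below is a behavioural illustration, not the EPLD implementation: the input names (`start`, `conf_hit`, `mem_hit`, `backend_ready`, `end`) are hypothetical, and the exit from BUS_BUSY to TURNAROUND is an assumption drawn from the statement that a transaction always terminates in TURNAROUND.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    CONF = auto()
    MEM = auto()
    BUS_BUSY = auto()
    TURNAROUND = auto()


def next_state(state, *, start=False, conf_hit=False, mem_hit=False,
               backend_ready=False, end=False):
    """Return the next macro-state for one clock; all input names are hypothetical.

    start         -- a transaction is being initiated on the bus
    conf_hit      -- configuration access addressed to this device
    mem_hit       -- memory access addressed to this device
    backend_ready -- the backend can serve the access
    end           -- the current transaction ends on this clock
    """
    if state is State.IDLE:
        if not start:
            return State.IDLE
        if conf_hit:
            return State.CONF
        if mem_hit and backend_ready:
            return State.MEM
        # Not addressed to us, or addressed but backend not ready (retry case).
        return State.BUS_BUSY
    if state in (State.CONF, State.MEM, State.BUS_BUSY):
        # Macro-states: stay until the transaction ends.
        # (BUS_BUSY -> TURNAROUND is an assumption, see lead-in.)
        return State.TURNAROUND if end else state
    # TURNAROUND lasts a single clock, then the bus lines are released.
    return State.IDLE


# Usage: a three-clock memory burst followed by the turnaround cycle.
state = State.IDLE
trace = [state.name]
for inputs in (dict(start=True, mem_hit=True, backend_ready=True), {}, dict(end=True), {}):
    state = next_state(state, **inputs)
    trace.append(state.name)
print(" -> ".join(trace))  # IDLE -> MEM -> MEM -> TURNAROUND -> IDLE
```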

The slave interface occupies 75% of an EPM7256 device. This includes:

3.1.2. The master/slave macro-function

This macro-function is also fully compatible with the PCI specification version 2.1, but this time it is implemented in an EPM7256SRC208-10 to meet the tighter timing constraints that result from the increased complexity of the state machine. The body of the state machine does not change, except for the MEM state, which is divided to handle master transactions. Figure 8 shows the master/slave state machine.



Fig. 8 - The master/slave macro-function state machine

MEM_S is identical to the previous slave state MEM. The additional capability beyond the slave interface is the possibility for the backend application to request possession of the PCI bus. The PCI interface simply transmits that request to the host bridge. When the bus is granted (ADR_M state) the transaction can be initiated (MEM_M state) by the application.
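
As an illustration of the master path just described, the Python sketch below lists the macro-state occupied on each clock of one master transaction. It is a simplified model under stated assumptions: the function and parameter names are hypothetical, one clock is spent in ADR_M, each data phase takes one clock in MEM_M (no wait-states), and the transaction closes through TURNAROUND as in the slave case.

```python
def master_sequence(wait_for_grant, data_phases):
    """Macro-state per clock for one master transaction (simplified model).

    wait_for_grant -- clocks spent in IDLE while the request is pending (hypothetical input)
    data_phases    -- number of data phases requested by the backend
    """
    if wait_for_grant < 0 or data_phases < 1:
        raise ValueError("need wait_for_grant >= 0 and data_phases >= 1")
    trace = ["IDLE"] * wait_for_grant       # request forwarded, bus not yet granted
    trace.append("ADR_M")                   # bus granted: address phase
    trace.extend(["MEM_M"] * data_phases)   # data phases, assumed one clock each
    trace.append("TURNAROUND")              # release the bus lines
    trace.append("IDLE")
    return trace


print(master_sequence(2, 3))
```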
To complete the description of the master/slave macro-function, table 3 lists the supported features against all the required and optional features described in the PCI specification. (This table is intended mainly for readers with a good understanding of the PCI constraints and specification.)

Table 3 - The master/slave PCI interface features

PCI Transactions | Device generation | Device response | Supported | Note
Memory Read | Required | Optional | yes |
Memory Write | Required | Optional | yes |
Memory Read BURST mode | Optional | Optional | yes | No wait-state inserted.
Memory Write BURST mode | Optional | Optional | yes | No wait-state inserted.
Configuration Read | n/a | Required | yes |
Configuration Write | n/a | Required | yes |
I/O Read | Optional | Optional | | No I/O space implemented. No response.
I/O Write | Optional | Optional | | No I/O space implemented. No response.
Special Cycle | Optional | Optional | | No response.
Interrupt Acknowledge | Optional | Optional | | No response.
Memory Read Multiple | Optional | Optional | note | Treated as MEMORY READ.
Dual Access Cycle | Optional | Optional | | No response.
Memory Read Line | Optional | Optional | note | Treated as MEMORY READ.
Mem. Write & Invalidate | Optional | Optional | note | Treated as MEMORY WRITE.
Delayed Transaction | Optional | Optional | |
Fast Back-to-Back Type 1 | Optional | Required | yes |
Fast Back-to-Back Type 2 | Optional | Required | yes |
Arbitration Parking | n/a | Required | yes |
Exclusive Access | Optional | Optional | |
Address/Data Stepping | Optional | Optional | | Required if device deals with system/executable memory.
Parity Generation | Required | n/a | yes |
Parity Error Reporting | note | Optional | yes | Required only if risks of system integrity failure.
System Error Reporting | note | n/a | | Required only if risks of system integrity failure.
Cache Support | Optional | n/a | |
Interrupt Support | Optional | n/a | yes |
Master Time-out | Required | n/a | yes |
Master Abort | Required | n/a | yes |
Retry | Optional | Required | yes |
Disconnect with data A | Optional | Optional | yes |
Disconnect with data B | Optional | Optional | yes |
Disconnect without data 1 | Optional | Optional | yes |
Disconnect without data 2 | Optional | Optional | yes |
Target Abort | Optional | Required | yes |
Byte Enable Support | Optional | Optional | | Device is declared PREFETCHABLE.
Class Code/Revision ID | Required | n/a | yes |
Header Type | Required | n/a | yes |
Device ID/Vendor ID | Required | n/a | yes | Not registered.
Status/Command | Required | n/a | yes |
Base Address Register | Required | n/a | yes | Only one BAR.
Latency Timer | Required | n/a | yes | 8 bit.
Cache Line Size | Optional | n/a | |
Built-in Self Test (BIST) | Optional | n/a | |
CardBus CIS Pointer | Optional | n/a | |
Interrupt Line/Interrupt Pin | Optional | n/a | |
MIN_GNT/MAX_LAT | Optional | n/a | |
Subsystem V. ID/Subsys. ID | Optional | n/a | |
Expansion ROM BAR | Optional | n/a | |

The master/slave interface occupies 75% of the EPM7256 device, the same as the slave interface. This is explained by the fact that the complexity evaluation tool uses macro-logical cells [5] as base elements for computing these rates; in terms of actual logic, the master/slave interface requires approximately 30% more resources than the slave interface.

3.2. The application interface

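Interactions between the backend application (EPF10K) and the host environment are of three types: reconfiguration of the EPF10K, communications between the EPM7256 and the EPF10K, and control by the software drivers. Each is covered in one of the following subsections.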

3.2.1. The EPF10K reconfiguration mechanism

Depending on the project version (see figure 4), the EPF10K configuration mechanism is supported either in:
- the EPM7256 dedicated to interfacing the PCI bus, or
- another on-board EPM7256, not solely dedicated to EPF reconfiguration.

When this feature is implemented in the device dedicated to the PCI interface, resource utilization rises from 75% to 89%. Special care should then be taken when fitting the project into the device.

In the single-EPM7256 architecture, device reconfiguration is handled by adding three states to the global state machine. Another feature that must be implemented is the division of the PCI clock frequency to obtain the 10 MHz frequency required to configure the EPF10K device [3] (passive serial mode).
The two following solutions were considered for the programming of the EPF10K device:

The second solution was developed, although a full EPF10K configuration requires multiple iterations: the 4x8Kbyte on-board static RAM is too small to hold the whole configuration at once. The software drivers therefore have to handle this paging technique, which could be avoided by using larger RAM components. As an illustration, the EPF10K100 requires 144 Kbyte of configuration data, 4.5 times the 32 Kbyte of implemented SRAM. Therefore, 5 pages are needed to fully reconfigure the EPF10K device.
In terms of speed, the EPF10K100 is reconfigured in less than 120 ms (with a 10 MHz local clock frequency in passive serial mode).
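
The page count and the configuration time quoted above follow from simple arithmetic, sketched below in Python. The function names are hypothetical, and two assumptions are made: "Kbyte" means 1024 bytes, and passive serial mode shifts one configuration bit per clock cycle. Software overhead between pages is not modelled.

```python
import math

KBYTE = 1024  # assumption: 1 Kbyte = 1024 bytes


def pages_needed(config_kbytes, sram_kbytes):
    """Number of SRAM pages required to hold the whole configuration stream."""
    return math.ceil(config_kbytes / sram_kbytes)


def config_time_ms(config_kbytes, clock_hz):
    """Serial shift time in ms, assuming one bit per clock cycle."""
    bits = config_kbytes * KBYTE * 8
    return bits / clock_hz * 1e3


print(pages_needed(144, 32))                # 5
print(round(config_time_ms(144, 10e6), 1))  # 118.0
```

With these assumptions the raw shift time for the EPF10K100 is about 118 ms, which agrees with the figure of less than 120 ms.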

3.2.2. Communications between EPM7256 and EPF10K

Approximately 10 control signals have been defined for backend communication. A simple handshake mechanism controls the transfers between the EPM7256 and the EPF10K. In addition to these signals, the address/data bus is either multiplexed (as for the PCI bus) or demultiplexed, depending on the project version. A maximum of 64 signals are required to handle demultiplexing (the address bus width can be reduced to match a smaller address space).
The implemented communication protocol can be adapted to the requirements of the backend application (number and meaning of the signals, address/data bus width and demultiplexing, ...).
Figure 9 shows the backend interface as implemented in a master/slave EPM7256SRC208-10 device with multiplexed address/data signals.



Fig. 9 - The master/slave backend interface

Table 4 presents a brief description of each signal.
Table 4 - The backend interface signal description

Signal | Type | Description
flex_slave_en | EPM input | When the backend asserts this signal, only slave transactions are allowed.
flex_bus_req | EPM input | The backend asserts this signal to request a master transaction.
flex_master_RW | EPM input | The backend uses this signal to indicate the master access direction.
flex_master_last | EPM input | The backend asserts this signal to indicate the last data phase.
flex_master_en | EPM output | If actively driven, the EPM device supports master transactions.
flex_slave_access | EPM output | Indicates a slave transaction in progress.
flex_slave_RW | EPM output | Indicates the slave access direction (read or write).
flex_address | EPM output | Indicates the possible presence of an address on the flex_AD bus.
flex_valid_data | EPM output | Indicates that valid data is on the flex_AD bus.
flex_target_abort | EPM output | Used to report a target abort in a master transaction (PCI specification).
flex_reset | EPM output | Connected to the card RESET.
flex_AD[31..0] | bi-directional | The 32-bit multiplexed address/data bus.
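
To illustrate how a backend might use flex_address and flex_valid_data to demultiplex the flex_AD bus during a slave write, here is a minimal Python sketch. It is not the protocol as implemented: the signals are modelled as active-high booleans sampled once per clock, and the address is assumed to advance by 4 bytes per data phase, as in a linear burst of 32-bit words. The function name is hypothetical.

```python
def demux_slave_write(samples):
    """Rebuild (address, data) pairs from per-clock samples of the multiplexed bus.

    Each sample is a dict with the keys flex_address and flex_valid_data
    (booleans) and flex_AD (integer value on the bus).
    """
    writes = []
    address = None
    for s in samples:
        if s["flex_address"]:
            address = s["flex_AD"]            # latch the address phase
        elif s["flex_valid_data"] and address is not None:
            writes.append((address, s["flex_AD"]))
            address += 4                      # assumption: linear burst of 32-bit words
    return writes


# Usage: one address phase, two data phases separated by an idle clock.
samples = [
    {"flex_address": True,  "flex_valid_data": False, "flex_AD": 0x1000},
    {"flex_address": False, "flex_valid_data": True,  "flex_AD": 0xAAAA},
    {"flex_address": False, "flex_valid_data": False, "flex_AD": 0x0000},
    {"flex_address": False, "flex_valid_data": True,  "flex_AD": 0xBBBB},
]
print([(hex(a), hex(d)) for a, d in demux_slave_write(samples)])
```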

3.2.3. Software drivers control

Communication between the backend application and the software drivers is handled through a 10-bit status register implemented inside the EPM7256 device (in its memory address space). In addition, an interrupt is available for signaling. The status register is especially useful during the EPF10K reconfiguration process. Table 5 describes each bit of the status register.

Table 5 - The EPM7256 internal status register description

Bit | Access | Meaning
0 | read only | The EPF device is reconfigured (programmed).
1 | read only | The EPF device is under reconfiguration (programming mode).
2 | read only | The "PCI Generic Card" is ready to handle a memory transaction.
3 | read/write | The interrupt has been asserted (the bit will be cleared after completion of the interrupt routine).
4 | read only | An interrupt is in progress.
5 | read/write | The static RAM is locked and cannot be allocated to the EPF device.
6 | read only | An error occurred during EPF configuration.
7 | read only | A data page has been transferred to the EPF device for configuration.
8-10 | read/write | Reserved for future use.
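
A driver would typically decode this register with bit masks. The Python sketch below mirrors the bit positions of Table 5; the flag names and the helper `page_transferred` are hypothetical and do not come from the card's actual C++/assembler drivers. The reserved bits are ignored.

```python
from enum import IntFlag


class Status(IntFlag):
    """Bit positions from Table 5; the names are illustrative."""
    CONFIGURED = 1 << 0        # EPF device is programmed
    CONFIGURING = 1 << 1       # EPF device is in programming mode
    READY = 1 << 2             # card ready for a memory transaction
    IRQ_ASSERTED = 1 << 3      # interrupt has been asserted
    IRQ_IN_PROGRESS = 1 << 4   # interrupt in progress
    SRAM_LOCKED = 1 << 5       # static RAM cannot be allocated to the EPF device
    CONFIG_ERROR = 1 << 6      # error during EPF configuration
    PAGE_TRANSFERRED = 1 << 7  # a data page has been sent to the EPF device


def decode(raw):
    """Decode a raw register value, ignoring the reserved upper bits."""
    return Status(raw & 0xFF)


def page_transferred(raw):
    """True once the current page has been transferred; raises on a configuration error."""
    status = decode(raw)
    if status & Status.CONFIG_ERROR:
        raise RuntimeError("EPF configuration error reported by the status register")
    return bool(status & Status.PAGE_TRANSFERRED)


# Usage: programming mode active and a page just transferred.
print(page_transferred(0b10000010))
```

In a paged reconfiguration, a driver could poll such a helper after writing each page to the SRAM before loading the next one.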

3.3. Fields of application for the "PCI Generic Card"

One of the major interests of the "PCI Generic Card" is certainly in the field of ASIC prototyping, where it becomes a useful development tool that is easily and rapidly set up to help ASIC designers. The most interesting features cover:

From the user's point of view, the interest of an application built from reconfigurable logic lies in its superiority over a software implementation in terms of speed and CPU utilization. Therefore, the "PCI Generic Card" is dedicated to fields of application demanding high computational power, as for example:

In that sense, the "PCI Generic Card" can be compared to other existing systems [6] using reconfigurable logic (for example, the Wild-one computer from Annapolis Micro Systems, the Spectrum platform from Giga Operations or the EVC1 from Virtual Computer Corp). However, its advantage lies in the EEPROM-based PCI interface device, which can also be reconfigured (using an ISP device, as shown in figure 5) to implement either a slave or a master/slave interface with user-programmable additions to or adaptations of the PCI macro-functions.

4. FUTURE DEVELOPMENTS

The card software drivers have been developed in C++ and assembler for the DOS operating system. It is planned to port that software to the WINDOWS 95 and WINDOWS NT operating systems. A daughterboard is under development and will be dedicated to image processing. The first application in that field should lead to an image filtering architecture for the FLEX EPF10K50. The final objective is to move both the PCI interface and the application into a single SRAM-based CPLD, increasing integration and reducing costs.

5. CONCLUSION

This paper presented the "PCI Generic Card", a project intended to demonstrate a PCI-based reconfigurable coprocessor along with EPLD-based PCI interfaces. The project was conducted thanks to the support of ALTERA FRANCE, and led to the design of two prototypes. At this point, here are the results of the six months of development on the project:

In parallel, a French start-up company (PLDA) recently launched a commercial development with the support of the E.N.S.T. electronic department laboratories. A fully FPGA-based prototype PCI card is under test and a first demonstration application (PCI bus analyzer) should be ready by the end of the year (Q4 96).

REFERENCES

  1. Tom Shanley & Don Anderson, "PCI SYSTEM ARCHITECTURE", 3rd ed., Addison-Wesley
  2. PCI Special Interest Group, "PCI LOCAL BUS SPECIFICATION REV. 2.1", June 95
  3. ALTERA, "CONFIGURING FLEX10K DEVICES", Application Note 59, Dec. 95
  4. ALTERA, "MAX+plusII USER GUIDE"
  5. ALTERA, "DATA BOOK", 1996
  6. EDN Europe, "RECONFIGURABLE LOGIC", Design Feature, July 96