Main characteristics and design philosophy

F-TTA0 is a completely new architecture, based loosely on the work previously done by groups researching the general TTA concept and previous work on other F-CPU architecture proposals, but otherwise completely independent of any other architecture. The basic idea of the proposed TTA is to simplify the hardware as much as possible while still providing maximum generality and parallelism. This is facilitated by the use of a TTA, which effectively exposes more of the pipeline to the software, and allows many of the instruction scheduling and control operations to be performed by the compiler instead of the hardware. Unfortunately, this also creates certain difficulties in terms of requiring that the compiler know many details of the internal architecture, and eliminating these dependencies often forces a less optimistic approach to execution (instruction reordering at run time is impossible with current techniques, for example). However, the use of a TTA does allow greater thread-level parallelism when combined with SMT techniques, which should help compensate for the reduced ILP. Consequently, the F-TTA0 will be an SMT architecture.

In order to support SMT, the F-TTA0 will be capable of storing information for multiple contexts on-chip. These contexts will have separate external and internal representations; that is, contexts may not be identified in hardware by the OS-defined PID, TID, etc. Whenever an instruction generates a bus transaction, control signals on the bus will indicate which context is issuing the instruction, so that functional units may act accordingly.

The F-TTA0 is composed of several different functional units, grouped into two categories: control units and execution units. Execution units are the obvious case: adders, FPUs, and the like; the things that actually DO the computation. Control units include the hardware that does the more obvious control activities such as instruction fetch, but also things like the register file. These units will be covered in more detail in later chapters. All functional units, control or execution, will be connected by one or more busses (one per move instruction in the instruction word).

Constants will be implemented as words in the instruction stream. One read-only address will be used as a constant generator; any reads from this address will cause the instruction fetch unit to place the data in the next instruction word in the stream onto the bus as a constant.
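The constant mechanism can be illustrated with a minimal sketch. It assumes one 16-bit move per instruction word, a constant that occupies exactly one following word, and the source/destination field positions given in the encoding section; conditional cancellation is ignored. The names `run`, `ports`, and `CONST_ADDR` are hypothetical and exist only for this illustration.

```python
CONST_ADDR = 0b000000  # logical address 000000: constant as source, NOP as destination

def run(stream, ports):
    """Execute a list of 16-bit words, one move per word (an assumption).

    `ports` is a hypothetical dict mapping a logical address to its current value.
    """
    pc = 0
    while pc < len(stream):
        word = stream[pc]
        pc += 1
        src = (word >> 6) & 0x3F   # six-bit source address
        dst = word & 0x3F          # six-bit destination address
        if src == CONST_ADDR:
            value = stream[pc]     # the next word in the stream is data, not a move
            pc += 1                # skip it so it is never executed as an instruction
        else:
            value = ports.get(src, 0)
        if dst != CONST_ADDR:      # a write to 000000 acts as a NOP
            ports[dst] = value
    return ports
```

In this model, a three-word stream consisting of "move constant to address 001100", the data word 0x1234, and "move 001100 to 010000" leaves 0x1234 at both addresses.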

Instructions and encoding

The F-TTA0 will not use traditional instructions. As a TTA, it will have only one instruction: move. "Instructions" will consist of four overhead bits and two six-bit addresses: source and destination. This will result in 16-bit move instructions, which will be grouped to allow multiple move executions in a single cycle, making the F-TTA0 pseudo-VLIW. The four overhead bits will be grouped into two conditional cancellation bits and two operand size bits, covered below.

Instruction words can be any length, depending on the capabilities of the control units. An F-TTA0 implementation requiring minimal die space could even issue a single move per cycle, at the expense (obviously) of very low performance.

The four high-order bits of every move instruction will be the two conditional cancellation bits, immediately followed by the two operand size bits. The F-TTA0 will have two conditional cancellation units. The output of each unit will be ANDed with the respective conditional cancellation bit in the instruction. If the result for either unit is 1, the instruction will be squashed. If both results are 0, the instruction will be executed. The operand size bits will reference one of four special-purpose registers, which will determine the size of the operands to be computed, up to the maximum capability of the implementation.

The control bits will be followed by the six-bit source operand, then the six-bit destination operand. These operands will specify logical addresses, which may (depending on the implementation) be translated by the appropriate control unit into physical bus addresses. These addresses will specify specific ports on functional units; some ports may be read-only or write-only. Read-only and write-only ports may share addresses on the bus (e.g. a write-only integer operand register may share an address with a read-only integer result register).
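The field layout described above (two cancellation bits, two size bits, six-bit source, six-bit destination, from high-order to low-order) can be sketched as a pair of pack/unpack helpers. The names `Move`, `encode`, and `decode` are hypothetical; only the field order and widths come from the text.

```python
from typing import NamedTuple

class Move(NamedTuple):
    cancel: int  # two conditional cancellation bits (bits 15-14)
    size: int    # two operand size bits (bits 13-12)
    src: int     # six-bit logical source address (bits 11-6)
    dst: int     # six-bit logical destination address (bits 5-0)

def encode(m: Move) -> int:
    """Pack a move into a 16-bit word, rejecting out-of-range fields."""
    if not (0 <= m.cancel < 4 and 0 <= m.size < 4
            and 0 <= m.src < 64 and 0 <= m.dst < 64):
        raise ValueError("field out of range")
    return (m.cancel << 14) | (m.size << 12) | (m.src << 6) | m.dst

def decode(word: int) -> Move:
    """Unpack a 16-bit word into its four fields."""
    return Move((word >> 14) & 0x3, (word >> 12) & 0x3,
                (word >> 6) & 0x3F, word & 0x3F)
```

For example, `encode(Move(cancel=0b10, size=0b01, src=0b001000, dst=0b010000))` yields `0x9210`, and `decode(0x9210)` returns the same four fields.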

Logical addresses implemented in F-TTA0 will be reserved for functional units as follows:

Address range     Assignment
000000            Constant as source, NOP as destination
000001            Invalid address
000010-000111     BC/reserved
001000-001111     RF
010000-010011     IB
010100-010111     RN
011000-011011     LS
011100-100011     Reserved for future use
100100-100111     R0
101000-101011     R1
101100-101111     U0
110000-110011     U1
111000-111011     C0
111100-111111     C1

Abbreviations are defined below.
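As an illustration, the address map can be expressed as a small lookup. The ranges are copied from the table; the names `ADDRESS_MAP` and `unit_for` are hypothetical, and addresses the table does not list (110100-110111) are reported as `None` rather than guessed.

```python
# (first, last, unit) ranges copied from the table above.
ADDRESS_MAP = [
    (0b000000, 0b000000, "constant/NOP"),
    (0b000001, 0b000001, "invalid"),
    (0b000010, 0b000111, "BC/reserved"),
    (0b001000, 0b001111, "RF"),
    (0b010000, 0b010011, "IB"),
    (0b010100, 0b010111, "RN"),
    (0b011000, 0b011011, "LS"),
    (0b011100, 0b100011, "reserved"),
    (0b100100, 0b100111, "R0"),
    (0b101000, 0b101011, "R1"),
    (0b101100, 0b101111, "U0"),
    (0b110000, 0b110011, "U1"),
    (0b111000, 0b111011, "C0"),
    (0b111100, 0b111111, "C1"),
]

def unit_for(addr):
    """Return the unit that owns a six-bit logical address, or None if unlisted."""
    if not 0 <= addr < 64:
        raise ValueError("logical addresses are six bits wide")
    for first, last, unit in ADDRESS_MAP:
        if first <= addr <= last:
            return unit
    return None
```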

Branching and conditional execution

The F-TTA0 will have two conditional cancellation units. Each will have a single-bit output not present as a logical address on the bus, and each will have three writable addresses: two data operands and one control operand specifying the comparison to be made by the unit (a<b, a<=0, etc). Whenever the output of a cancellation unit and the corresponding cancellation bit in an instruction are both 1, the instruction will be squashed. The program counter will also be readable and writable on the bus, allowing branches to be implemented as conditional or unconditional writes to the PC.
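The squash rule and its use for branching can be sketched as follows. This is a simplified model, not a description of the hardware: it assumes bit i of the two-bit cancellation field pairs with cancellation unit i, that the PC advances by one per move, and the names `squashed` and `branch` are hypothetical.

```python
def squashed(cancel_bits, unit_outputs):
    """Return True if a move is squashed.

    cancel_bits:  two-bit cancellation field from the move instruction.
    unit_outputs: two-bit value, one output bit per cancellation unit
                  (the bit-to-unit pairing is an assumption).
    """
    return (cancel_bits & unit_outputs) != 0

def branch(pc, cancel_bits, unit_outputs, target):
    """Model a move whose destination is the PC, i.e. a branch to `target`."""
    if squashed(cancel_bits, unit_outputs):
        return pc + 1   # move squashed: execution falls through
    return target       # move executed: the PC is overwritten
```

With both cancellation bits clear, the move can never be squashed, which gives an unconditional branch; setting a bit makes the branch depend on the corresponding unit's output.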

A traditional branch predictor will not be implemented; instead, a squash predictor may be implemented, using any algorithm. The instruction fetch unit will monitor for writes to the PC and, if a squash predictor is available, will use its prediction to speculatively fetch instructions. The fetch unit will monitor the conditional cancellation units when issuing a branch to the bus, and will squash speculatively fetched instructions based on the branch instruction word and the outputs of the conditional cancellation units. The control unit may also take other optimistic measures based on squash predictions of other conditional instructions, such as preemptive actions on constants.

Control units

The F-TTA0 will have the following functional control units:

BC: Bus Control

The bus control unit is similar to the traditional "control unit" of an OTA (operation-triggered architecture). It is the arbiter of all internal busses, but also performs instruction fetch (and decode, to the limited extent that a TTA does instruction decode), squash prediction, exception handling, power-on BIST, etc. Note that the BC will NOT actually transfer data on the bus, except as requested through its bus ports. It will simply send the necessary control signals to other units, which will read from or write to bus data lines as required.

The BC will have three ports: control, status, and immediate. The control port will be write-only and will provide for any miscellaneous control actions not controlled by special-purpose registers, as well as for requesting machine configuration and status information from the BC. The status port will provide machine status and configuration information, will be read-only, and will be aliased to the control port. The immediate port will provide immediate data read from the instruction stream when requested by an instruction. It will be read-write, but writes to this address will be ignored (acting as a redundant NOP).

RF: Register File

The register file will contain all registers, whether special-purpose or general-purpose. The number of general-purpose registers will be determined by the implementation, but will normally be 64. The RF will have multiple input ports to specify internal registers; the data written to these ports will specify special- or general-purpose registers, and the specific register to read or write. Each register specifier port will specify the register to be connected to one bus port, both read and write. To accommodate larger implementations, with as many as eight internal busses, the RF may need to have a very large number of read and write ports. Since implementing this tends to create a rather distended RF, the RF may be split into segments, with each segment containing a subset of the registers and read/write ports.

The RF will have eight ports assigned to it: four address/status ports and four data ports. The address ports will be used to specify the specific register to connect to a data port and will be write-only. The data ports will allow for reading from and writing to specific registers, and will be read-write. The status ports will be read-only and will be aliased with address ports.
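The address/data port pairing can be illustrated with a minimal model. It assumes a flat register index (the text does not define how special-purpose registers are distinguished from general-purpose ones), omits the status ports, and uses the hypothetical names `RegisterFile`, `write_address`, `write_data`, and `read_data`.

```python
class RegisterFile:
    """Sketch of an RF with paired address and data ports."""

    def __init__(self, num_regs=64, num_ports=4):
        self.regs = [0] * num_regs
        # Register currently connected to each data port.
        self.selected = [0] * num_ports

    def write_address(self, port, reg):
        """Write to an address port: connect register `reg` to data port `port`."""
        if not 0 <= reg < len(self.regs):
            raise IndexError("no such register")
        self.selected[port] = reg

    def write_data(self, port, value):
        """Write to a data port: store into the register it is connected to."""
        self.regs[self.selected[port]] = value

    def read_data(self, port):
        """Read from a data port: fetch from the register it is connected to."""
        return self.regs[self.selected[port]]
```

In this model, a register access takes two moves: one to an address port to select the register, and one to or from the matching data port to transfer the value.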

C0/1: Conditional Cancellation Units

Although these could also be considered execution units, since they perform decision-making tasks, they are heavily involved in instruction flow and bus control, so they are considered functional control units instead. Each will have five ports allocated: an operand port, specifying the comparison to make; a status port, giving software access to information about the unit; and three data inputs. The data and operand ports will be write-only. The status port will be read-only and will be aliased with the operand port.

Execution units

The F-TTA0 will have the following functional execution units:

Integer/bitwise unit

This unit will perform all generic integer and bitwise operations: add, shift, etc. It will have eight ports: opcode, three operand ports, status, and three result ports. The opcode port will specify the operation to perform on the operands and will be write-only. The status port will provide information on the current status of the IB, will be read-only, and will be aliased to the opcode port. The three operand ports will be write-only and will provide data on which to operate. The three result ports will provide the results of the operation, will be read-only, and will be aliased to the operand ports.

Real-number unit

This unit will perform computations on more complex data formats such as floating-point representations. It will have the same ports as the IB. It will be capable of standard IEEE representations of floating-point data. It may also be capable of alternative representations such as logarithmic number systems. When other number systems are supported, the RN will support conversion between the supported number systems.

Load/Store unit

The LS will perform all interactions with caches and memory. It will have five ports: an opcode port, a status port, an address port, a data port, and a result port. The opcode port will specify the operation to perform (load, store, prefetch, cache block invalidate, etc.) and will be write-only. The status port will provide information about the status of the LS unit, will be read-only, and will be aliased with the opcode port. The address port will be write-only and will specify the address of the operation to perform. The data port will be write-only and will specify additional data (such as the data to write to the specified address for a store instruction). The result port will provide the result of the operation performed (such as the data fetched from memory for a load operation), will be read-only, and will be aliased to the data port.
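A load or store then becomes a short sequence of moves to and from the LS ports. The sketch below makes several assumptions the text does not state: the opcode write is what triggers the operation, the opcode values `LOAD` and `STORE` are invented for illustration, memory is a simple dict, and the status port is omitted. The class and method names are hypothetical.

```python
LOAD, STORE = 0, 1  # hypothetical opcode values; the text defines no encodings

class LoadStore:
    """Sketch of the LS unit with an aliased data (write) / result (read) address."""

    def __init__(self):
        self.memory = {}
        self.address = 0
        self.data = 0     # write side of the shared data/result address
        self.result = 0   # read side of the shared data/result address

    def write(self, port, value):
        if port == "address":
            self.address = value
        elif port == "data":
            self.data = value
        elif port == "opcode":
            # Assumption: writing the opcode triggers the operation.
            if value == STORE:
                self.memory[self.address] = self.data
            elif value == LOAD:
                self.result = self.memory.get(self.address, 0)
            else:
                raise ValueError("unknown opcode")
        else:
            raise KeyError(port)

    def read(self, port):
        if port == "data":
            # Reads of the shared address see the result, not the last data written.
            return self.result
        raise KeyError(port)
```

Under these assumptions, a store is three moves (address, data, opcode) and a load is two moves followed by a read of the shared data/result address.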

User-defined units

These units will be defined by a particular F-TTA0 implementation. No guarantees of functionality or the ability to uniquely identify one particular user-defined unit will be made. These will be for implementations designed for a specific purpose, and any software which uses one of these units should be designed in conjunction with the implementation itself. The user-defined units will have four to eight ports. Four addresses will be reserved for each unit for reading, and four for writing. They may be read-only, write-only, read-write, aliased or not, at the discretion of the designer. In short, use these at your own risk and for your own benefit.

Re-programmable units

These units are general special-purpose units. They will consist of re-programmable logic, to be used by any software. Each will have five to eight ports: control and status, and three definable addresses. The control port will allow for control and reprogramming of the unit, and will be write-only. Status will provide information about the status of the unit, will be read-only, and will be aliased with the control port. The remaining three addresses assigned to the re-programmable units will be usable at the discretion of the software programming the unit.

Issues, comments, and analysis