\chapter{The exceptions}

 A processor of any kind (CISC, RISC or any other architecture) generates many 
exceptions, interrupts, traps and system calls (context switches are not the point here). 
Each pipeline stage can raise several errors that the OS must handle, after which 
the application must either restart the trapped instruction or continue after it. This implies 
that the whole context must be saved, but which context?

~

 Control can be transferred to the OS, an interrupt handler or a trap handler at any time, at any 
stage of the pipeline. For example, the stages of a classic RISC pipeline can raise the following exceptions:

\begin{itemize}
 \item  IF (Instruction Fetch): page fault
 \item  ID (Instruction Decode): invalid instruction, trap instruction, privileged instruction
 \item  EX (EXecute): divide by zero, overflow, any IEEE floating-point math error, etc.
 \item  MEM (MEMory access): page fault, protection error
\end{itemize}

~

 Not only must the processor trigger the correct handler (several errors can occur in 
the same cycle), it must also preserve or flush the correct stages of the pipeline. Since 
FC0 completes operations out of order (OOO), handling faults inside the pipeline would require 
buffers everywhere as well as sophisticated bookkeeping, which we cannot afford. We still need 
precise exceptions, and the ability to stop the pipeline at any time without losing data 
or having to reexecute code. We need a simple, predictable yet efficient
pipeline whose architecture is not shaped by faults.

~

 The simplest solution to this problem is dictated by common sense: 
make all exceptions occur in one place, before the potentially faulting instructions enter
the pipeline and require additional hardware. This means: \textit{NO INSTRUCTION IS ISSUED IF IT CAN 
TRIGGER AN EXCEPTION} or, in other words, \textit{ALL EXCEPTIONS MUST BE CHECKED AT DECODE TIME SO AS TO 
PREVENT THEM FROM OCCURRING IN THE EXECUTION PIPELINE}. Keep this rule in mind,
since it also influences how the instruction set is designed.
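
~

 The rule above can be pictured as a single gate in the decoder. The following is only a 
minimal sketch in C, not the FC0 implementation: the structure \texttt{decode\_info}, its fields 
and the function \texttt{decode\_gate} are hypothetical names chosen for illustration, and the 
list of checks is limited to the causes discussed in this chapter.

\begin{verbatim}
#include <stdbool.h>

/* Hypothetical summary of what the decoder knows about an instruction.
   Field names are illustrative, not taken from the FC0 design. */
typedef struct {
    bool invalid_opcode;    /* opcode not recognized                */
    bool privileged;        /* instruction needs supervisor mode    */
    bool user_mode;         /* current task runs in user mode       */
    bool is_divide;         /* integer divide                       */
    bool divisor_zero;      /* ZERO property bit of the divisor     */
    bool is_memory_access;  /* load or store                        */
    bool page_present;      /* result of the early page table check */
} decode_info;

typedef enum { ISSUE, TRAP } decode_action;

/* Returns ISSUE only when the instruction cannot fault once it is in
   the execution pipeline; otherwise the trap is taken at decode. */
decode_action decode_gate(const decode_info *d)
{
    if (d->invalid_opcode)                        return TRAP;
    if (d->privileged && d->user_mode)            return TRAP;
    if (d->is_divide && d->divisor_zero)          return TRAP;
    if (d->is_memory_access && !d->page_present)  return TRAP;
    return ISSUE;
}
\end{verbatim}

 Because the tests are evaluated in a fixed order at a single place, the ordering of 
simultaneous causes is implicit in this sketch.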

~

 The good side of this choice is that there is no need for a ``trap source'' register as in the MIPS CPUs. 
All exceptions are caught in the same place and are disambiguated and ordered implicitly. 
Another important consequence is that there are no temporary buffers or ``rename registers'', 
as they are called in the PowerPC. The previously described OOOC pipeline is not changed at all and the 
critical datapath does not suffer from additional buffers. There is no register allocation 
bookkeeping and no added control logic.

~

 The other side, the constraints, is discussed here. 
Most of the obvious limitations have simple workarounds. The first question is: can we detect all 
exceptions at decode time, and how?

~

 \underline{First cause:} page fault at instruction fetch time.

First, we are not even sure that the next instruction will be decoded, since the
last instruction of a page could be a jump or a similar instruction. So why trigger the trap 
now? The easy workaround is to ``tag'' the instruction as faulting or, better, to replace
it with a trap instruction (which requires less hardware). If the instruction is ever
executed, it will trap. Of course, when a page fault is triggered by the instruction 
prefetch unit, it is good practice to prefetch the necessary code right away, before it is needed,
as a precaution.
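
~

 The substitution can be sketched in a few lines of C. This is an illustration only: the 
function \texttt{fetch\_word} and the encoding \texttt{TRAP\_FETCH\_FAULT} are assumptions made 
for the example, not the real FC0 opcode or fetch logic.

\begin{verbatim}
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of a trap instruction reserved for fetch
   faults; the real opcode is not specified in this chapter. */
#define TRAP_FETCH_FAULT 0xFFFFFFFFu

/* What the fetch stage hands to the decoder: the fetched word when
   the page is present, the trap encoding otherwise. Nothing traps
   here; the trap fires only if the substituted word is decoded. */
uint32_t fetch_word(bool page_present, uint32_t word_in_memory)
{
    return page_present ? word_in_memory : TRAP_FETCH_FAULT;
}
\end{verbatim}

 If a jump at the end of the previous page redirects the fetch, the substituted word is 
simply discarded and no trap is ever taken.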

~

 \underline{Second cause:} invalid instruction, privileged instruction...

Why bother? It traps. Depending on the type of trap, we advance the 
instruction pointer or not, fetch the code needed to handle it, and begin to back up the 
registers with the SRB mechanism. The preceding instructions do not need to be flushed from the 
pipeline, because the SRB communicates with the scoreboard to back up the registers in the 
correct order. Once the pipeline has been ``naturally'' drained of the old application's instructions,
the registers are saved and the faulting application can restart later without any loss or 
reexecution.

~

 \underline{Third cause:} math fault.

The saturation (or overflow) exception (\`a la MIPS) is not implemented. The IEEE floating-point 
instructions have a ``compliance'' flag that stalls instruction issue until the result is known to be ``safe'';
otherwise the result saturates and does not trigger any trap. The ``division by zero'' condition is 
easily detected at the decode stage with the ZERO property bit of the divisor register. At the same time,
we can detect whether the result will be zero (for instance, when the ZERO bit of the dividend is set) 
and issue a ``clear'' operation instead of the divide operation.
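
~

 The decision taken by the decoder for a divide can be sketched as follows. The names 
\texttt{decode\_divide} and \texttt{div\_decision} are hypothetical, and the sketch assumes that 
a zero result is recognized only through the ZERO bit of the dividend.

\begin{verbatim}
#include <stdbool.h>

typedef enum { OP_DIVIDE, OP_CLEAR, OP_TRAP } div_decision;

/* zero_dividend and zero_divisor stand for the ZERO property bits of
   the two source registers, as seen by the decoder. */
div_decision decode_divide(bool zero_dividend, bool zero_divisor)
{
    if (zero_divisor)  return OP_TRAP;   /* division by zero: trap now */
    if (zero_dividend) return OP_CLEAR;  /* 0 / x == 0: clear the dest */
    return OP_DIVIDE;                    /* safe to issue the divide   */
}
\end{verbatim}

 The divisor is tested first, so that 0/0 traps instead of being turned into a ``clear''.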

~

 \underline{Fourth cause:} page fault, invalid address fault.

We can consider that memory is protected with page granularity, so a page fault 
triggers protection-checking code before the page is loaded. Detecting a page fault is very 
simple: we check the address against the entries of the page table. If the address 
does not belong to any available page, it is a page fault and we trap.
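
~

 As an illustration, the check can be written as a lookup of the virtual page number in a 
small table. This is a sketch under assumptions that are not in the text: 4~KiB pages, a flat 
table searched linearly, and the hypothetical names \texttt{page\_entry} and 
\texttt{page\_is\_present}. Real hardware would search all the entries in parallel.

\begin{verbatim}
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u   /* assumption: 4 KiB pages */

typedef struct {
    uint64_t virtual_page;  /* virtual page number       */
    bool     present;       /* page is mapped and loaded */
} page_entry;

/* Returns true when addr falls in a present page, false when the
   access would cause a page fault. */
bool page_is_present(const page_entry *table, size_t n, uint64_t addr)
{
    uint64_t vpn = addr >> PAGE_SHIFT;
    for (size_t i = 0; i < n; i++) {
        if (table[i].present && table[i].virtual_page == vpn)
            return true;
    }
    return false;  /* no matching page: the decoder must trap */
}
\end{verbatim}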

~

 Now the problem is to have the status (page present or not?) available at decode time. This 
deserves care, because memory accesses are almost half of the executed instructions!

The solution is to use a mechanism similar to the ZERO ``property'' bit of each register. 
When a value is written to the register set through the Xbar, some ports of the 
Xbar also communicate the value to the page table. One or two cycles later, the result is ready 
for the ID stage. This is a speculative check that is transparent to the instruction set 
architecture. During this page check we can also check the address range, verify whether the 
value is in the L1 cache and, if so, indicate which bank holds it, prefetch the cache line, etc.

~

 An obvious problem, though, is that we cannot reasonably check every value flowing through the 
Xbar to the register set. Not only is this not always useful, it also consumes power. The simplest way
(for the prototype) is to check only the results of pointer updates, since they are the most likely to be 
reused soon as pointers.

~

 For more sophisticated architectures, another ``transparent tag'', saying that the register is used 
as a pointer, can be very useful. We can allow, for example, only a few registers to hold this tag, 
something like 16 (64/4 sounds reasonable), and the flag would be set each time a memory access is 
performed with the register. The flags would be allocated with an LRU mechanism using a 4-bit down 
counter. This way, when the ID stage recognizes a memory read/write instruction, it checks the pointer 
flag. If the flag is set, it sends the associated information to the L/S unit (such as which 
L1 bank or which buffer holds the data), or it traps if the page table lookup returned a 
negative result. If the pointer flag is not set, the ID stage pauses for a page table lookup and sets 
the pointer flag. Of course, like all transparent flags, these are not saved during context 
switches and are regenerated automatically as soon as the registers are used. In the absence of explicit 
flags in the instructions, this is a rather simple way to reduce the table lookup overhead, and 
the address can be checked BEFORE it is needed. The L/S unit is only in charge of buffering 
the data that flows to/from memory and caches. This last detail invalidates the drawing of  
figure 2.2, where the page table was stored in the L/S unit.
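
~

 One possible reading of the pointer flags and their LRU allocation is sketched below in C. 
It is an interpretation, not the FC0 design: the names \texttt{ptr\_tag}, \texttt{tag\_file}, 
\texttt{tags\_init} and \texttt{decode\_mem\_access} are hypothetical, the result of the page 
table lookup is passed in as a parameter to keep the example self-contained, and the 
invalidation of a flag when its register is overwritten is left out.

\begin{verbatim}
#include <stdbool.h>
#include <stdint.h>

#define NUM_TAGS 16   /* 64 registers / 4, as suggested in the text */

typedef struct {
    uint8_t reg;      /* register number used as a pointer         */
    uint8_t age;      /* 4-bit counter: 15 = newest, 0 = victim    */
    bool    valid;    /* pointer flag is set for this register     */
    bool    page_ok;  /* cached result of the page table lookup    */
} ptr_tag;

typedef struct { ptr_tag slot[NUM_TAGS]; } tag_file;

void tags_init(tag_file *t)
{
    for (int i = 0; i < NUM_TAGS; i++) {
        t->slot[i].reg     = 0;
        t->slot[i].age     = (uint8_t)i;  /* distinct ages 0..15 */
        t->slot[i].valid   = false;
        t->slot[i].page_ok = false;
    }
}

/* Mark slot s as most recently used: every slot that was more recent
   counts down by one, then s takes the top value. The ages stay a
   permutation of 0..15, so exactly one slot has age 0. */
static void touch(tag_file *t, int s)
{
    uint8_t old = t->slot[s].age;
    for (int i = 0; i < NUM_TAGS; i++)
        if (t->slot[i].age > old)
            t->slot[i].age--;
    t->slot[s].age = NUM_TAGS - 1;
}

/* Called by the decoder for a memory access through register reg.
   lookup_result is what the page table would answer if asked.
   *stalled is set when the decoder had to pause for the lookup.
   Returns true if the access may be issued, false if it must trap. */
bool decode_mem_access(tag_file *t, uint8_t reg,
                       bool lookup_result, bool *stalled)
{
    int victim = 0;
    for (int i = 0; i < NUM_TAGS; i++) {
        if (t->slot[i].valid && t->slot[i].reg == reg) {
            *stalled = false;        /* flag set: no lookup needed */
            touch(t, i);
            return t->slot[i].page_ok;
        }
        if (t->slot[i].age == 0)
            victim = i;              /* least recently used slot */
    }
    *stalled = true;                 /* flag not set: pause for lookup */
    t->slot[victim].reg     = reg;
    t->slot[victim].valid   = true;
    t->slot[victim].page_ok = lookup_result;
    touch(t, victim);
    return lookup_result;
}
\end{verbatim}

 With sixteen slots, a 4-bit counter per slot is enough to keep an exact LRU order: the 
slot whose counter has reached zero is always the next one to be reallocated.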

~

 With this, almost all the exception causes raised by the ``exception-less'' execution pipeline 
of the FC0 are covered and their workarounds have been explained. There is
no visible impact on the ISA, but the coding rules get tighter, as in a superscalar processor.
New exceptions will probably reuse the idea behind the existing 
ones: a dynamic flag. This way, programming the FC0 looks almost like programming a 
normal RISC CPU with some additional coding rules.