f-cpu/c/jaap.txt
notes about fcpusim by: Jaap Stolk (JWS) jwstolk@yahoo.com
version:
Sun Jul 21 04:12:04 CEST 2002 JWS: updated.
Wed Jul 24 23:49:49 CEST 2002 JWS: updated.

-notes are in no particular order.
-feel free to comment


to do:

- check for possible swapped xbar write, when latency_w0 != latency_w0 ??
- swap port nr / reg nr in fcpusim.png
- add 2x delay in fcpusim.png
- decoder operand output and delayed fetcher operands are the same ?
  (well, the registers used are the same.)





status: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 - most test_xxxx.c files are broken (but not needed for fcpusim)
 - very limited working of some units
 - the input signals (for the next cycle) and the output signals
   (of the last cycle) of all the units can be viewed
   (except: the x-bar ports, memory > 130 bytes, etc.)
 - instructions are now read from (simulated) memory
   (no memory stalls or page system yet)
   memory is loaded from a binary file (use YGASM)
 - ADD instruction works, not much else
 - bypass and delayed bypass work
 - scheduler works, it handles register and write bus stalls.
 - ... 

compile the simulator:  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 - the files in this snapshot should be all that is needed to
   compile this f-cpu simulator.
 - run: runme.sh in the /f-cpu/c/fcpusim/  directory

 - if you need to modify the configuration files,
   you will have to use the scripts from the original YG snapshot

   i will add the scripts to this snapshot soon...

use the simulator:  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 run: /f-cpu/c/fcpusim/fcpusim <binary_filename>
 
 you can run it without a binary file; it will leave the memory
 empty, but you can still look around (and execute NOPs ...),
 or use a nice random file, like a JPG or a GZ file :-)
 
 the simulator uses a text based interface (for now).
 you can look at different units by pressing one of the keys
 indicated at the bottom of the screen.

 NOTE:  the status is shown AFTER a cycle is completed.
 the inputs shown are the input values that are going to be used
 in the NEXT cycle.
 the outputs shown are the values of the just finished cycle.

 press <ENTER> to simulate the next cycle.
 (the selected view will be updated)

 press q to quit.

 it should be possible to follow an instruction from memory, to
 the point where the result is saved in the register.

 also the working of the scheduler can be viewed.
 (note that the scoreboard and write queue are shown after a
  cycle, thus as they will be used in the next cycle )


 some ramblings about a simulator: 

I seem to have confused the D-latches between the pipeline stages
with flip-flops, sorry, i'll try and correct it later.

how it works:  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

   see also the /c/fcpusim.png  picture
  
the simulator simulates each unit of the f-cpu. every unit has input and
output values. the simulator runs all units and then
connects all units by copying output values to input values. this
copying-stage simulates the Flip-Flops in the f-cpu. if a unit takes more
than one stage, it has internal Flip-Flops/values as well.

if the pipeline is stalled, some units are stopped, i.e. their input
or output is not clocked (in the simulation: "copied")
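
a minimal sketch of the run/copy scheme and the stall rule described above.
this is NOT the fcpusim code: the two toy units, the names (unit_t, sim_t,
run_units, copy_stage, sim_cycle) and the 64 bit width are all assumptions
made for illustration. for simplicity a stall only gates the input copy here.

```c
#include <stdint.h>
#include <string.h>

/* hypothetical two-unit loop; names and widths are illustrative,
   not taken from the fcpusim sources */
typedef struct {
    uint64_t in;   /* value that will be used in the NEXT cycle */
    uint64_t out;  /* value produced by the cycle just finished */
} unit_t;

enum { UNIT_A, UNIT_B, N_UNITS };

typedef struct {
    unit_t unit[N_UNITS];
    int    stalled[N_UNITS];  /* non-zero: this unit's input is not clocked */
} sim_t;

static void sim_init(sim_t *s)
{
    memset(s, 0, sizeof *s);
}

/* phase 1: every unit computes its output from its current input */
static void run_units(sim_t *s)
{
    s->unit[UNIT_A].out = s->unit[UNIT_A].in + 1;  /* toy "increment" unit */
    s->unit[UNIT_B].out = s->unit[UNIT_B].in * 2;  /* toy "double" unit */
}

/* phase 2: copy outputs to inputs; this plays the role of the flip-flops.
   a stalled unit is not "clocked" and keeps its old input */
static void copy_stage(sim_t *s)
{
    uint64_t a_out = s->unit[UNIT_A].out;
    uint64_t b_out = s->unit[UNIT_B].out;

    if (!s->stalled[UNIT_B]) s->unit[UNIT_B].in = a_out;  /* A feeds B */
    if (!s->stalled[UNIT_A]) s->unit[UNIT_A].in = b_out;  /* B feeds back to A */
}

/* one simulated clock cycle */
static void sim_cycle(sim_t *s)
{
    run_units(s);
    copy_stage(s);
}
```

because all units are run before anything is copied, the order in which the
units are run inside run_units() does not matter. calling sim_cycle() once
per <ENTER> would match the "status is shown AFTER a cycle" behaviour.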

detecting / counting / visualising pipeline stalls would be handy for
optimising code. (it also needs to show the reason that caused the stall)
this information could even be used for automatic optimisation ( in a
compiler ??)
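
counting stalls per reason could look roughly like the sketch below. the
two reasons are the ones the scheduler already handles (register and write
bus stalls); the type and function names (stall_reason_t, stall_stats_t,
stall_record, stall_total, stall_report) are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* hypothetical stall bookkeeping, not part of fcpusim */
typedef enum {
    STALL_NONE,       /* cycle issued without a stall */
    STALL_REGISTER,   /* a needed register was not ready */
    STALL_WRITE_BUS,  /* no write bus slot was free */
    STALL_REASONS
} stall_reason_t;

typedef struct {
    unsigned long cycles;                /* all cycles seen so far */
    unsigned long count[STALL_REASONS];  /* cycles per reason */
} stall_stats_t;

/* call once per simulated cycle with the reason reported for that cycle */
static void stall_record(stall_stats_t *st, stall_reason_t why)
{
    assert(why < STALL_REASONS);
    st->cycles++;
    st->count[why]++;
}

/* number of cycles lost to any kind of stall */
static unsigned long stall_total(const stall_stats_t *st)
{
    unsigned long sum = 0;
    for (int r = STALL_NONE + 1; r < STALL_REASONS; r++)
        sum += st->count[r];
    return sum;
}

/* print a small per-reason summary */
static void stall_report(const stall_stats_t *st, FILE *f)
{
    static const char *name[STALL_REASONS] = { "none", "register", "write bus" };
    for (int r = 0; r < STALL_REASONS; r++)
        fprintf(f, "%-10s %lu\n", name[r], st->count[r]);
    fprintf(f, "%-10s %lu of %lu cycles\n", "stalled", stall_total(st), st->cycles);
}
```

stall_report(&st, stdout) at the end of a run would give the totals; a
per-instruction breakdown would need the instruction address recorded as well.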

 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 m = memory
 f = fetcher
 i = inc
 2 = rop2
 a = asu
   = shl
 s = scheduler 
 p = popc
 b = bist
 v = div
   = mul
   = sr's (i really want to use "s" for shuffle and "r" for registers)
 x = xbar
 r = registers ( R/W ports)
   = data unit/mem
 c = code unit/mem
 t = tlb's (data and code?)
 d = decoder

debug interface:
can we use an SR register as serial I/O, and connect it to a terminal ?
this way we could easily print test results.

speed:
i might optimise the code a bit more, but the only reason will be to be able to
test run programs in less than an hour.
this simulator is not intended to be fast, it's written to test/try different
configurations of the f-cpu, and find bottlenecks, optimal TLB size, etc.
it will be possible to add/remove optional EU's, and it should be possible to
run it with 128 / 256 bit registers.
also testing different configurations of the X-bar would be interesting
(sharing Xbar ports between different EU's), or even experimenting with the
number of read/write busses of the Xbar ?

every simulated unit must be cross tested with the corresponding VHDL unit

it should be possible to run the simulator at maximum speed by not selecting
any particular status view. also run until next call/ret would be nice.
(or even put breakpoints on the use of register / memory / units / etc)

the copying stage that simulates the ff might be removed, if the units are
run in the correct order, but at some point (xbar?) they are still needed
to close the loop.

the actual f-cpu has no ff for the register unit, i will change the simulation
to work the same way.

it would be nice to mix C and VHDL units, this could be done if the VHDL units
read/write their input/output ff from/to files that are read by the simulator.

a "save_state" function might be nice, if it's saved in a compact but readable
text format, it could be e-mailed, and someone else could look at that
situation? ( -> copy-pasting the terminal text would do the same job ?)
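
one possible shape for such a text dump, as a sketch only. the line format
("name in=... out=..."), the ff_t struct and the save_state() signature are
assumptions made for this example, nothing like this exists in fcpusim yet.

```c
#include <stdint.h>
#include <stdio.h>

/* hypothetical view of one unit's flip-flops */
typedef struct {
    const char *name;  /* unit name, e.g. "asu" */
    uint64_t    in;    /* input for the next cycle */
    uint64_t    out;   /* output of the last cycle */
} ff_t;

/* write a "cycle N" header plus one line per unit into buf.
   returns the number of characters written (without the final '\0'),
   or -1 if buf is too small */
static int save_state(char *buf, size_t size, unsigned long cycle,
                      const ff_t *ff, int n)
{
    int used = snprintf(buf, size, "cycle %lu\n", cycle);
    if (used < 0 || (size_t)used >= size)
        return -1;

    for (int i = 0; i < n; i++) {
        size_t room = size - (size_t)used;
        int w = snprintf(buf + used, room, "%s in=%016llx out=%016llx\n",
                         ff[i].name,
                         (unsigned long long)ff[i].in,
                         (unsigned long long)ff[i].out);
        if (w < 0 || (size_t)w >= room)
            return -1;
        used += w;
    }
    return used;
}
```

fixed width hex keeps the lines aligned, so two dumps can be compared with
a plain diff. a matching load function would parse the same lines back.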

what programs i would like to run on this simulator:
- things like the Winograd DCT algorithm, and other critical routines
  i.e. things that today's (and tomorrow's) programs spend most cycles on
- L4 (or other micro kernels)
- programs to test different IRQ I/O designs
- 

i could add a history buffer (ff status for the last 100 cycles), so we
could trace backward and find the cause of a pipeline stall.
(when walking back, the ff states are only changed, the units are not run)
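
the history buffer could be a simple ring buffer of ff snapshots, sketched
below. the depth of 100 comes from the note above; FF_WORDS, ff_state_t and
the hist_* functions are hypothetical, the real ff state would be much larger.

```c
#include <stdint.h>
#include <string.h>

enum {
    HIST_DEPTH = 100,  /* cycles kept, as suggested in the note */
    FF_WORDS   = 4     /* illustrative size of one ff snapshot */
};

typedef struct {
    uint64_t ff[FF_WORDS];
} ff_state_t;

typedef struct {
    ff_state_t slot[HIST_DEPTH];
    unsigned   head;   /* slot where the next snapshot will be stored */
    unsigned   count;  /* valid snapshots, at most HIST_DEPTH */
} history_t;

static void hist_init(history_t *h)
{
    memset(h, 0, sizeof *h);
}

/* store the ff state after a cycle; the oldest entry is overwritten
   once the buffer is full */
static void hist_push(history_t *h, const ff_state_t *now)
{
    h->slot[h->head] = *now;
    h->head = (h->head + 1) % HIST_DEPTH;
    if (h->count < HIST_DEPTH)
        h->count++;
}

/* fetch the state from steps_back cycles ago (0 = most recent).
   only ff values are copied, no unit is run.
   returns 0 on success, -1 if that cycle is no longer in the buffer */
static int hist_peek(const history_t *h, unsigned steps_back, ff_state_t *out)
{
    if (steps_back >= h->count)
        return -1;
    unsigned idx = (h->head + HIST_DEPTH - 1 - steps_back) % HIST_DEPTH;
    *out = h->slot[idx];
    return 0;
}
```

walking back is then just hist_peek() with an increasing steps_back, and the
selected view is redrawn from the returned snapshot.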

the simulator needs to show the ASM (and C?) code that is executed, as well as
(parts of) memory.

would it be possible to add a <script> </script> tag to the c files and turn
it into an on-line simulator ?? (i.e. type a few binary instructions and see
them flow through the f-cpu ?).

at some point the simulator needs to be connected to (simulated?) I/O
to start with: a serial port (for console).

the simulator also needs to test the power up (random start) sequence!
this should be done by the BIST unit.

also show a TLB's status screen.

