Easily the most controversial and misunderstood feature of the proposed F-CPU architecture is, to date, the choice of a memory-to-memory architecture. Nevertheless, it is simpler to think of the F-CPU in terms of a standard RISC machine - without registers!
"What? No registers? Is this a joke?" or "Well, if it's not a RISC CPU, it will perform like a dog." are the typical remarks. Some people with more experience point out that the TMS9900 microprocessor (one of the first 16-bit microprocessors) was a memory-to-memory machine, but they also observe that this was in the good old days when CPUs were slow and DRAM was fast. Today we have exactly the opposite: CPUs are very fast and DRAM is slow.
We know all that.
Now please turn to page C-0 of Hennessy and Patterson's Computer Architecture: A Quantitative Approach (H&P). It says:
"RISC: any computer announced after 1985.", quoting Steven Przybylski, a designer of the Stanford MIPS. Why do H&P begin their description of various RISC CPUs with this humorous remark? Because it's true: you can't design a microprocessor architecture nowadays without taking into account the work done by the early RISC pioneers.
The term m2m/RISC was coined to describe the F-CPU architecture, so let's see how it differs from a standard RISC.
The rationale behind the concept of registers is, as we know, that of a memory hierarchy. This concept is discussed extensively in H&P, section 1.7. As we go from disk storage to main memory, to L3 cache (if any), to L2 cache (if any), to L1 cache (all modern CPUs have one) and finally to registers (all modern CPUs have these), each level of storage is smaller, has faster access times and greater bandwidth, and has lower latency.
The sole memory-to-memory microprocessor architecture known to the authors was the Texas Instruments TMS9900. The idea sounds counter-intuitive given H&P's description of memory hierarchies in modern RISC CPUs.
However, the choice of an m2m/RISC architecture stems from a few observations of modern CPU designs:
The design objectives of the Freedom CPU architecture with respect to its data path organization were:
The proposed Freedom architecture has banks of virtual registers, which are used to shadow 256-byte regions of memory (earlier on we used the terms "cache" and "mirror", but these led to confusion and are now deprecated). How does that work, and why is it good?
The idea is really very simple: each memory window (virtual register bank) has an associated 64-bit special register which defines the (virtual, as opposed to physical) base address in memory of a block of 32 64-bit words. We'll call this register MWBn, where 0 <= n <= (number of windows in the implementation - 1). The MWB registers are control registers, which means their value can only be changed in supervisor mode.
A single MW (memory window) is active at any time, meaning that a user program only ever sees 32 memory addresses that can be directly operated upon (other memory addresses must be accessed indirectly). We'll call the virtual registers in the active memory window VR0 - VR31. VR0 is always set to 0, following standard RISC tradition.
The active memory window is selected by the least significant bits of register AMW, which is one of the control registers.
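A minimal sketch of the address arithmetic described above, assuming VRi shadows the 64-bit word at byte offset 8*i from the base held in MWBn. The text only fixes the window size (32 words of 8 bytes, i.e. 256 bytes); the word ordering and the names `vr_address` and `write_mwb` are hypothetical.

```python
WORD_BYTES = 8          # each virtual register is a 64-bit word
WORDS_PER_WINDOW = 32   # VR0 - VR31
WINDOW_BYTES = WORD_BYTES * WORDS_PER_WINDOW  # 256 bytes per memory window
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def vr_address(mwb_base: int, i: int) -> int:
    """Virtual address shadowed by VRi (assumes words are laid out in VR order)."""
    if not 0 <= i < WORDS_PER_WINDOW:
        raise ValueError("VR index out of range")
    return (mwb_base + i * WORD_BYTES) & MASK64  # wrap to a 64-bit address


def write_mwb(mwb: list, n: int, base: int, supervisor: bool) -> None:
    """Hypothetical control-register write: MWBn may only change in supervisor mode."""
    if not supervisor:
        raise PermissionError("MWB registers are writable only in supervisor mode")
    mwb[n] = base & MASK64
```

For example, with MWBn = 0x1000, VR31 shadows the word at 0x10F8, the last 8 bytes of the 256-byte window.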
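Seen from the hardware side, the windows can be pictured as one flat register file of 32 x n entries, indexed by the window number taken from the low bits of AMW and the VR number. The sketch below assumes n is a power of two (32 in the F1 implementation) and that writes to VR0 are silently discarded, as with MIPS's hardwired zero register; the class and method names are hypothetical.

```python
class RegisterFile:
    """n windows x 32 VRs stored in one flat array (n assumed to be a power of two)."""

    def __init__(self, n_windows: int = 32):
        if n_windows <= 0 or n_windows & (n_windows - 1):
            raise ValueError("n_windows is assumed to be a power of two")
        self.n_windows = n_windows
        self.regs = [0] * (32 * n_windows)
        self.amw = 0  # control register; only its low bits select the window

    def _index(self, i: int) -> int:
        window = self.amw & (self.n_windows - 1)  # least significant bits of AMW
        return window * 32 + i

    def read_vr(self, i: int) -> int:
        return 0 if i == 0 else self.regs[self._index(i)]  # VR0 always reads as 0

    def write_vr(self, i: int, value: int) -> None:
        if i != 0:  # writes to VR0 are discarded (assumed, MIPS-style)
            self.regs[self._index(i)] = value & 0xFFFF_FFFF_FFFF_FFFF
```

With 32 windows only the five low bits of AMW matter, so AMW = 33 selects the same window as AMW = 1.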
MW0 is reserved for the operating system kernel. Switching between memory windows (we call this a context switch) is achieved by changing the value of AMW. Since AMW is one of the control registers, it is only accessible to the OS kernel.
Whenever MWBn is loaded with a new value (a new base address), a 256-byte block is brought in from memory into the corresponding 32 internal VRs, and the previous values in the VRs are simultaneously written back to the 256-byte block (memory window) previously pointed to. This we define as a process switch. Note that the OS kernel must also save paging information during a process switch. We'll see how this can be achieved.
After a process switch, the contents of the VRs are _not_ kept coherent with their corresponding memory addresses: memory is only updated when the window is written back at the next process switch.
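The process switch can be modelled as a write-back of the 32 VRs to the old window followed by a fill from the new one. This sketch assumes a little-endian word layout and a flat byte-addressable memory; the hardware performs both transfers at once, whereas the sketch does the write-back first (which only matters if the two windows overlap). The function name `process_switch` is hypothetical.

```python
WORD_BYTES = 8
N_VRS = 32
WINDOW_BYTES = WORD_BYTES * N_VRS  # 256 bytes


def process_switch(memory: bytearray, vrs: list, old_base: int, new_base: int) -> None:
    """Write the VRs back to the old 256-byte window, then load them from the new one."""
    for i, value in enumerate(vrs):
        off = old_base + i * WORD_BYTES
        # write-back of the previous window contents (little-endian assumed)
        memory[off:off + WORD_BYTES] = value.to_bytes(WORD_BYTES, "little")
    for i in range(N_VRS):
        off = new_base + i * WORD_BYTES
        # fill the VRs from the block now pointed to by MWBn
        vrs[i] = int.from_bytes(memory[off:off + WORD_BYTES], "little")
```

Between two such calls nothing copies VR values back to memory, which is exactly why the VRs and their memory block are not kept coherent.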
So, as can be seen, each register set is in fact a "memory window", in the sense that its contents reflect the state of a memory block.
How does the hardware work? Just like any RISC CPU. Are there additional delays to use any of the internal virtual registers? No. Is this slower in any way compared to a RISC? No. Does this add complexity to the data path, compared to a RISC? No.
What is gained, compared to a standard RISC? Mainly lower context switch latencies: assuming the OS wants to switch from one MW to another, all it takes is to load a new value into the corresponding bits of the AMW register. This can be done in a single clock cycle, and obviously has a very low latency compared to saving and restoring 32 registers.
For the compiler, the F-CPU is very much like many RISC CPUs: 32 general-purpose registers (in reality special memory locations).
From the software point of view, the F-CPU is a pure memory-to-memory machine. From the hardware point of view, the F-CPU is a standard RISC machine with (32 x n) registers, where n is the number of register sets (32 in the F1 CPU implementation), and each register set is made up of 32 registers. The F-CPU doesn't have a flags register. A mechanism similar to that of DLX/MIPS is used to test for condition codes.
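To make the comparison concrete, the sketch below contrasts the two approaches: the context switch only rewrites the window-select bits of AMW (kernel only), while a conventional register-file RISC needs one store and one load per register. Preserving the upper bits of AMW is an assumption, since the text does not say what they hold; both function names are hypothetical.

```python
def context_switch_amw(amw: int, new_window: int, n_windows: int, supervisor: bool) -> int:
    """Return the new AMW value; only the OS kernel may write this control register."""
    if not supervisor:
        raise PermissionError("AMW is a control register")
    mask = n_windows - 1  # n_windows assumed to be a power of two
    # replace only the window-select bits; other bits are assumed to be preserved
    return (amw & ~mask) | (new_window & mask)


def conventional_switch_transfers(n_regs: int = 32) -> int:
    """Memory operations a register-file RISC needs: save all registers, restore all."""
    return 2 * n_regs
```

The AMW path is one register write with no memory traffic, against 64 memory operations for saving and restoring 32 registers.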
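Without a flags register, a DLX/MIPS-style machine computes a comparison result into an ordinary register and then branches on whether that register is zero (DLX's SLT and BNEZ instructions, for instance). The sketch below models this pattern on 64-bit values; the F-CPU's actual mnemonics and encodings are not given here, so `slt` and `bnez` are borrowed names used for illustration only.

```python
MASK64 = (1 << 64) - 1


def to_signed64(x: int) -> int:
    """Interpret a 64-bit register value as two's-complement signed."""
    x &= MASK64
    return x - (1 << 64) if x & (1 << 63) else x


def slt(a: int, b: int) -> int:
    """Set-on-less-than: the condition becomes a 0/1 value in a general-purpose register."""
    return 1 if to_signed64(a) < to_signed64(b) else 0


def bnez(cond: int, taken_pc: int, fallthrough_pc: int) -> int:
    """Branch if the register is non-zero; returns the next PC."""
    return taken_pc if cond != 0 else fallthrough_pc
```

For example, `VR3 = slt(VR1, VR2)` followed by `bnez(VR3, ...)` replaces a compare-then-branch-on-flag sequence.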
The only conventional register is the 64-bit Program Counter register. Interestingly, during a context switch, the value of the PC is saved in VR0. Context switches can be achieved without any use of a stack to save the CPU state.