Part 1 :
1.1 Description of the F-CPU project
1.2 Frequently Asked Questions
1.2.1 Philosophy
1.2.2 Tools
1.2.3 Architecture
1.2.4 Performance
1.2.5 Compatibility
1.2.6 Cost/Price/Purchasing
1.3 The genesis of the F-CPU Project
1 History
2 The Freedom GNU/GPL'ed architecture
3 Developing the Freedom architecture : issues and challenges
4 Tools
5 Conclusion
6 Appendix A Ideas for a GPL'ed 64-bit high performance processor design
7 Appendix B Freedom-F1 die area / cost / packaging physical characteristics
8 Appendix C Legal issues / financial issues
1.4 A bit of F-CPU history
1.4.1 M2M
1.4.2 TTA
1.4.3 Traditional RISC
1.5 The design constraints
1.6 The project's roadmap
The F-CPU group is one of the many projects that try to follow the example set by the GNU/Linux project, which proved that non-commercial products can surpass expensive, proprietary ones. The F-CPU group tries to apply this "recipe" to the hardware and computer design world, starting with the "holy grail" of any computer architect : the microprocessor.
This utopian project was only a dream at the beginning, but after two group splits and much effort, we have reached rather stable ground for a truly scalable and clean architecture that does not sacrifice performance. Let's hope that the third attempt is the right one and that a prototype will be created soon.
The F-CPU project can be split into several (approximate and non-exhaustive) parts or layers that provide compatibility and interoperability during the life of the project (from hardware to software) :
- F-CPU Peripherals and Interfaces (bus, chipset, bridges...)
- F-CPU Core Implementations (individual chips, or revisions) [for example, F1, F2, F3...]
- F-CPU Cores (generations, or families) [for example, FC0, FC1, etc]
- F-CPU Instruction Set and User-visible resources
- F-CPU Application Binary Interface
- Operating System (aimed at Linux-likes)
- Drivers
- Applications
Every layer depends directly or indirectly on every other. The most important part is the Instruction Set Architecture, because it can't be changed at will : it is not a material part that can evolve when the technology/cost ratio changes. On the other hand, the hardware must provide binary compatibility, but its constraints are less stringent. That is why the instructions should run on a wide range of processor microarchitectures, or "CPU cores", that can be changed or swapped when the budget changes.
All core families will be binary compatible with one another : they will execute the same applications, run under the same operating systems and deliver the same results, with different instruction scheduling rules, special registers, prices and performance. Each core family can be implemented in several "flavours", with, for example, a different number of instructions executed per cycle, different memory sizes or different word sizes, and the software should directly benefit from these features without (many) changes.
This document is a study and a working basis for the definition of the F-CPU architecture, aimed at prototyping and at the first commercial chip generation (codenamed "F1"). It explains the architectural and technical background that led to the current state of the "FC0" core, so as to reduce the amount of basic discussion on the mailing list and to introduce newcomers (or those who come back from vacation) to the most recent concepts that have been discussed.
This manual describes the F-CPU family through its first implementation and core. The FC0 core is not exclusive to the F-CPU project, which can and will use other cores as the project grows and mutates. Conversely, the FC0 core can be used for almost any similar RISC architecture with some adaptations.
The document will (hopefully) evolve rapidly and incorporate more and more advanced discussions and techniques. This is not a definitive manual : it is open to any modification that the mailing list agrees to make. It is not exhaustive either, and may lag behind as the contributors' free time fluctuates. You are strongly encouraged to contribute to the discussion, because nobody will do it for you.
Some development rules :
Last modified : 31/05/99
modified by Whygee, 9/11/1999
Q1 : What does the F in F-CPU stand for ?
A : It stands for Freedom, which is the original name of the architecture, or Free, in the GNU/GPL sense.
The F does not stand for free in a monetary sense. You will have to pay for the F1 chip, just as you have to pay nowadays for a copy of a GNU/Linux distribution on CD-ROMs. Of course, you're free to take the design and masks to your favorite fab and have a few batches manufactured for your own use.
Q2 : Why not call it an O-CPU (where O stands for Open) ?
A : There are some fundamental philosophical differences between the Open Source movement and the original Free Software movement. We abide by the latter, hence the F.
The fact that a piece of code is labeled Open Source doesn't mean that your freedom to use it, understand it and improve upon it is guaranteed. Further discussion of these matters can be found on the GNU project's website (http://www.gnu.org).
A licence similar to the GPL (the GNU General Public License from the Free Software Foundation) is being drafted. In the absence of a definitive licence adapted to "hardware intellectual property", you can read the GPL and replace the word "software" with the words "intellectual property". Specifically, there are at least three levels of freedom that must be preserved at any cost :
- Freedom to use the intellectual property : no restriction must exist on using the IP of the F-CPU project. This means no fee to access the data, and ALL the information necessary to recreate a chip.
- Freedom to reverse-engineer, understand and modify the Intellectual Property at will.
- Freedom to redistribute the IP.
This is NOT public domain. The F-CPU group owns the IP that it produces. It chooses to make it freely available to anybody, by any means. Every file or piece of hardware generated from the description files and the intellectual property of the F-CPU team keeps the copyright of the F-CPU team. You can read more about this at http://www.gnu.org.
Q1 : Which EDA tools will you use ?
A : There has been a lot of debate on this subject. It's mainly a war between Verilog and VHDL. We'll probably use a combination of both.
We will first begin with software architecture simulators written in C (++). We could also use some of the new "free" EDA tools that are appearing ; the use of Alliance (http://www-asim.lip6.fr/alliance/) is being considered. We'll have to use commercial products at one point or another, because the chip makers use proprietary software. In any case, a pen, paper and a brain help.
Q1 : What's that memory-to-memory architecture I heard about ? Or this TTA engine ? Why not a register-to-register architecture like all other RISC processors ?
A : M2M was an idea that was discussed for the F-CPU at its beginning. It had several supposed advantages over register-to-register architectures, such as very low context-switch latency (no registers to save and restore).
YG : That's what they thought. The SRB mechanism included in the FC0 solves this problem.
TTA is another architecture that was explored before the current design (FC0) started.
Q2 : You're thinking about an external FPU ?
A : Maybe.
No. Bandwidth and pin count problems.
Q3 : Why don't you support SMP ?
A : SMP is an Intel-proprietary implementation of Symmetric Multi-Processing.
We'll probably try. If not in F1, in F2 :).
The "F1" will be like a "proof of concept" chip. It will not even support IEEE floating point numbers, so we can't support a classical SMT system from the beginning. Anyway, memory coherency will be enforced on the F1 with an OS-based paging mechanism where only one chip at a time in a system can cache a page : this avoids the bus snoops and the waste of bandwidth. Anyway, multi-processing is not exactly a problem with the CPU core but it depends on the implementor's needs and budget, so it is currently left out of the design scope. See the roadmap for further details.
Q1 : What can we expect in terms of performance from the F1 CPU ?
A : A Merced-killer :-). No, seriously : we hope to get some serious performance.
We think we can achieve good performance because we start from scratch (the x86 is slower because it has to remain compatible with older models). We intend to have gcc/egcs as the main compiler for the F-CPU, and to port Linux too.
LINUX and GCC are not the best guarantees of performance in themselves. For example, GCC doesn't handle SIMD data. We will certainly create a compiler that is better adapted to the F-CPU, and GCC will be used as a "bootstrap" at the beginning. The ongoing work on GNL will probably allow developers to create better code than GCC would ever do.
Objectively, the FC0 core family aims to achieve the best possible MOPS/MIPS ratio, around 1 (and maybe a bit more). The superpipeline guarantees that the best clock frequency is reached for any silicon technology, and the memory bandwidth can be virtually increased with different hint strategies. So we can predict that a 100 MHz chip decoding 1 instruction per cycle can easily achieve 100 million operations per second. This is not bad at all, because it can be achieved with an "old" (cheap) silicon technology that couldn't reach 100 MOPS with an x86 architecture. Add to that the unconstrained SIMD data width, and you get a picture of the peak MOPS it can reach. If you really want screaming numbers : with a 64-bit version, SIMD operations on bytes yield 8 operations per cycle, or 800 MOPS peak.
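To make that byte-SIMD arithmetic concrete, here is a minimal C sketch (generic sub-word "SWAR" code, not F-CPU-specific) of how one 64-bit operation can carry eight byte-wide additions at once, which is where the 8-operations-per-cycle figure comes from :

    #include <stdint.h>
    #include <stdio.h>

    /* Packed addition of eight unsigned bytes held in one 64-bit word.
       The high bit of each byte is masked off so that carries cannot
       propagate from one byte lane into the next, then patched back in. */
    static uint64_t add8x8(uint64_t a, uint64_t b)
    {
        const uint64_t H = 0x8080808080808080ULL;  /* high bit of each lane */
        return ((a & ~H) + (b & ~H)) ^ ((a ^ b) & H);
    }

    int main(void)
    {
        uint64_t a = 0x0102030405060708ULL;
        uint64_t b = 0x1010101010101010ULL;
        /* prints 1112131415161718 : eight independent byte sums in one go */
        printf("%016llx\n", (unsigned long long)add8x8(a, b));
        return 0;
    }

At 100 MHz, one such word-wide operation per cycle is exactly the 8 x 100 = 800 MOPS peak quoted above.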
Q1 : Will the F-CPU be compatible with x86 ?
A : No.
There will be NO binary compatibility between the F-CPU and x86 processors.
It should, however, run Windows emulators that include an x86 CPU emulator, such as Twin, as well as Windows itself under whole-PC emulators such as Bochs. In either case you will need to run another operating system, such as GNU/Linux, as the host, and emulation will likely be fairly slow.
And what would be the point of using Windblows when you can run Linux/FreeBSD instead ? ;-D
Q2 : Will I be able to plug the F-CPU in a standard Socket 7, Super 7, Slot 1, Slot 2, Slot A motherboard ?
A : It's an ongoing debate.
Chances are that no early version of the F-CPU will be available for Socket 7 or x86 motherboards.
Reason 1 : the BIOS would have to be rewritten, the chipsets would have to be analysed, and there are way too many chipset/motherboard combinations around.
Reason 2 : socket/pins/bandwidth : the x86 chips are really "memory bound", the bandwidth is too low, some pins are not useful for a non-x86 chip, and supporting all the functions of the x86 interface would make the chip (and its design and debugging) too complex.
Reason 3 : we don't want to pay the fees for the use of proprietary slots.
ALPHA- or MIPS-like slots will probably be supported ; we might include an EV-4 interface in the F-CPU standard. In any case, a custom socket and interface will avoid compatibility and misunderstanding problems.
Q3 : What OS kernels will the F-CPU support ?
A : Linux will be ported first. Other ports may follow. The port of Linux will be developed simultaneously with the F-CPU development.
But first we must have a working software development tool that simulates the architecture and creates binaries, so we must first define the F-CPU...
Q4 : What programs will I be able to run on the F-CPU ?
A : We will port gcc/egcs to the F architecture. Basically the F-CPU will run all the software available for a standard GNU/Linux distribution.
GCC is not perfectly adapted to fifth-generation CPUs. We will probably adapt it to the F-CPU, but a plain GCC backend will be enough to compile LINUX and other software. GNL will allow anyone to write better and faster code.
Q1 : Will I be able to buy a F-CPU someday ?
A : We hope so.
That's the whole point of the project ! But be patient, and take part in the discussions.
Q2 : How much will the F-CPU cost ?
A : We don't know. It depends on how many are made.
There was an early slightly optimistic estimate that an F-CPU would cost approximately US$100, if 10000 were made.
This also depends on a lot of factors, such as the desired performance, the size of the cache memory, the number of pins and, most of all, the possibility of combining all these factors in the available technology. The latest estimate for a first limited version was around $60 each for a batch of 1,000 ASICs.
A lot of things have happened since the following document was written. The motivation has not changed, though, and the method is still the same. The original authors are unreachable now, but we have kept working more and more seriously on the project. At the time of writing, several questions asked in the following text have been answered ; now that the group is structuring itself, the other questions become more important because we really have to face them : it's not utopia anymore, the fiction is slowly becoming reality.
Also, don't forget that the technical features described here are NOT realistic and don't correspond to anything real. This was more a dream than a coherent analysis.
The first generation was a "memory-to-memory" (M2M) architecture that disappeared with the original F-CPU team members. It was believed that context switches consumed much time, so they mapped memory regions to the register set, switching register sets by simply changing a base register. I have not tracked down the reasons why this was abandoned ; I came later to the group. Anyway, they launched the F-CPU project, with the goals that we now know, and the dream of creating a "Merced Killer". Actually, I believe that we should compete with the ALPHA directly ;-)
The second generation was a "Transfer-Triggered Architecture" (TTA), where computations are triggered by transfers between the different execution units. The instructions mainly consist of the source and destination "register" numbers, which can also be the input or output ports of the execution units. As soon as the needed input ports are written to, the operation is performed and the result is readable on the output port. This architecture was promoted by the anonymous AlphaRISC, now known as AlphaGhost. He did a lot of work on it, but he left the list and the group lost track of the project without him.
Brian Fuhs explained TTA on the mailing list this way :
TTA stands for Transfer-Triggered Architecture. The basic idea is that you don't tell the CPU what to do with your data, you tell it where to put it. Then, by putting your data in the right places, you magically end up with new data in other places that consists of some operation performed on your old data. Whereas in a traditional OTA (operation-triggered architecture) machine you might say ADD R3, R1, R2, in a TTA you would say MOV R1, add ; MOV R2, add ; MOV add, R3. The focus of the instruction set (if you can call it that, since a TTA has only one instruction : MOV) is on the data itself, as opposed to the operations you are performing on that data. You specify only addresses, then map addresses to functions like ADD or DIV.
That's the basic idea. I should start by specifying that I'm focusing on general processing here, and temporarily ignoring things like interrupts. It is possible to handle real-world cases like that, since people have already done so ; for now, I'm more interested in the theory. Any CPU pipeline can be broken down into three basic stages : fetch and decode, execute, and store. Garbage in, garbage processing, garbage out :). With OTAs, this is all done in hardware. You say ADD R3, R1, R2, and the hardware does the rest. It handles internal communication devices to get data from R1 and R2 to the input of the adder, lets the adder do its thing, then gets the data from the output of the adder back into the register file, in R3. In most modern architectures, it checks for hazards, forwards data so the rest of the pipeline can use it earlier, and might even do more complicated things like reordering instructions. The software only knows 32 bits ; the hardware does everything else.
The IF/ID stage of a TTA is very different : all of the burden is placed on software. The instruction is not specified as ADD (something), but as a series of SRC, DEST address pairs. All the hardware needs to do is control internal busses to get the data where it is supposed to go. All verification of hazards, optimal instruction order, etc. should be done by the compiler. The key here is that a TTA, to achieve IPC figures comparable to an OTA, must be VLIW : you MUST be able to specify multiple moves in a single cycle, so that you can move all of your source data to the appropriate places and still move the results back to your register file (or wherever you want them to go). In summary, to do an ADD R3, R1, R2, the hardware will do the following :
    TTA                     OTA
    ----------------------  --------------------------------------
    MOV R1, add             ADD R3, R1, R2
      Move R1 -> adder        Check for hazards
    MOV R2, add               Check for available adder
      Move R2 -> adder        Select internal busses and move data
          (adder now does its thing in both cases)
    MOV add, R3               Check for hazards
      Move adder -> R3        Schedule instruction for retire
                              Select internal busses and move data
                              Retire instruction
The compiler, of course, becomes much more complicated, because it has to do all of the scheduling work at compile time. But the hardware in a TTA doesn't need to worry about much of anything... About all it does, in the simple cases, is fetch instructions and generate control signals for all of the busses.
Execution is identical between TTA and OTA. Crunch the bits. Period.
Instruction completion is again simplified in a TTA : if you want correct behavior, make sure your compiler generates the right sequence of moves. Compare this to an OTA, where you at least have to figure out which write ports to use, etc.
Basically, a TTA and an OTA are functionally identical. The main differences are that a TTA pretty much has to be VLIW, and that it requires more of the compiler. However, if the "smart compiler and dumb machine" philosophy is really the way to go, TTA should rule. It exposes more of the pipeline to software, reducing the hardware needed and giving the compiler more room to optimize. Of course, there are issues, like code bloat and constant generation, but these can be covered later. The basic ideas have been covered here (albeit in a somewhat rambling fashion... I had this email all composed in my head, and had some very clear explanations, right up until I sat down and started typing). For more information, see http://www.cs.uregina.ca/~bayko/design/design.html and http://cardit.et.tudelft.nl/MOVE. These two have a lot more information on the details of TTA. I'm still hopeful that we can pull one of these off, and I think it would be good for performance, generality, cost, and simplicity. Plus, it's revolutionary enough that it might turn some heads - and that might get us more of a user (and developer) base, and make the project much more stable.
Send me questions, I know there will be plenty...
Brian
To understand the TTA concept further : the difference is in the philosophy ; it is as if you had instructions to code a dataflow machine on the fly. Notice also that fewer registers are needed. Registers are required to store the temporary results of operations between the instructions of a code sequence ; here, the results are stored directly by the units, so less "temporary storage" is needed and register pressure is lower.
To envision this difference, think about a data dependency graph : in OTA, an instruction is a node, while in TTA a mov instruction is an edge (a branch of the graph). Once this is understood, there is not much work to do on an existing (yet simple) compiler to make it generate TTA instructions.
Let's examine S = (a+b) * (c-d), for example, where a, b, c and d are known "ports" : registers or TTA addresses.
     a   b    c   d
     1\ /2    3\ /4
       +        -
       5\      /6
         \    /
           *
           |7
           S
In TTA there is one "port" in each unit for each incoming branch. This means that ADD, having two operands, has two ports. There is also one result port, which uses the address of one of the ports but is read, not written. Another detail is that this result port can be static : it holds the result until another operation is triggered. We can code :
mv ADD1,a
mv SUB1,c
mv ADD2,b (this triggers the a+b operation)
mv SUB2,d (this triggers the c-d operation)
mv MUL1,ADD
mv MUL2,SUB (this triggers the * operation)
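As a minimal illustration (plain C with invented port names ; a sketch of the concept, not of any real F-CPU or MOVE implementation), the whole scheme can be modelled by a function that latches operands and fires a unit when its trigger port is written :

    #include <stdint.h>
    #include <stdio.h>

    /* Toy transfer-triggered machine. Writing the second (trigger) port
       of a unit performs its operation ; result ports are static and
       hold their value until the unit fires again. */
    enum { ADD1, ADD2, SUB1, SUB2, MUL1, MUL2, NPORTS };

    static int64_t port[NPORTS];                /* operand latches */
    static int64_t add_out, sub_out, mul_out;   /* static result ports */

    static void mv(int dst, int64_t value)
    {
        port[dst] = value;
        switch (dst) {                  /* trigger ports fire the unit */
        case ADD2: add_out = port[ADD1] + port[ADD2]; break;
        case SUB2: sub_out = port[SUB1] - port[SUB2]; break;
        case MUL2: mul_out = port[MUL1] * port[MUL2]; break;
        }
    }

    int main(void)  /* S = (a+b) * (c-d) with a=2, b=3, c=7, d=4 */
    {
        mv(ADD1, 2);
        mv(SUB1, 7);
        mv(ADD2, 3);        /* triggers a+b */
        mv(SUB2, 4);        /* triggers c-d */
        mv(MUL1, add_out);  /* read ADD's static result port */
        mv(MUL2, sub_out);  /* triggers the multiply */
        printf("S = %lld\n", (long long)mul_out);  /* prints S = 15 */
        return 0;
    }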
TTA is not "better", it's not "worse", it's just completely different while the problem will always be the same. If the instructions are 16 bit wide, it takes 96 bits, just as the OTA example would do. In some cases, it can be better as it was shown long ago on the list. TTA has some interesting properties, but unfortunately, in the very near future, it's not probable that a TTA will enter inside a big computer as RISC or CISC do. A TTA core can be as efficient as the ARM core, for example, it suits well to this scale of die size, but too few studies have been made, compared to the existing studies on OTA. Because the solution of its scaling up are not (yet) known, this leads to the discussions that shaked the mailing list near december 1998 : the problem of where to map the registers, how would the ports be remapped on the fly, etc. When additional instructions are needed, this jeopardizes the whole balance of the CPU and evolutivity is more constraining than for RISC or OTA in general.
The physical problem of the busses has also been raised : if we have, say, 8 busses of 64 bits, that makes 512 wires, which takes around one millimeter of width in a 0.5u process. Of course, we could use a crossbar instead.
As discussed a few times long ago, because of its scalability problems (the assignment of the ports and its flexibility), TTA is not the perfect choice for a long-lasting CPU family, even though its performance/complexity ratio is good. It is therefore possible that the F-CPU team will put a RISC-to-TTA translator in front of a TTA core, which would avoid most of the scalability problems. This would be called the "FC1" (FC0 being the RISC core). Of course, time will show how the TTA ghosts of the F-CPU group evolve.
But TTA's problem is probably that it is too specialized, whereas an OTA can change its core and still use the same binaries. This is one of the points that "killed" the previous F-CPU attempt : each TTA implementation could not be completely compatible with another, because of the instruction format, the assignment of the "ports" and other similar details ; the notion of "instruction" is bound to the notion of "register".
I am not trying to prove the advantage of one technique over another ; I am trying to show the difference between points of view that ultimately address the same problem. Scalability, which is necessary for such a project, is more important than we thought, and the group finally showed interest in a more classical technology.
The third generation arose from the mailing-list members, who naturally studied basic RISC architectures : the first-generation MIPS processors, the DLX described by Patterson & Hennessy, the MMIX, the MISC CPUs and other similar, simple projects. From a simple RISC project, the design grew in complexity and gained independence from other existing architectures, mainly because of the lessons learnt from their history and the specific needs of the group, which led to adapted choices and particular characteristics. This is what we will discuss in the next parts of this document.
The F-CPU group is rather heterogeneous, but each member shares the same hope that the project will come true, because we are convinced that it is not impossible and is therefore feasible. Let's recall the Freedom CPU Project Constitution :
" To develop and make freely available an architecture, and all other intellectual property necessary to fabricate one or more implementations of that architecture, with the following priorities, in decreasing order of importance: 1. Versatility and usefulness in as wide a range of applications as possible 2. Performance, emphasizing user-level parallelism and derived through intelligent architecture rather than advanced silicon process 3. Architecture lifespan and forward compatibility 4. Cost, including monetary and thermal considerations "
We could add as goal #5 : be successful !
This text sums up many aspects of the project : this is "free intellectual property", meaning that anybody can make money with it without worrying, as long as the product complies with the general rules and standards and all its characteristics remain freely available (similarly to the GNU General Public License). Just as with the LINUX OS project, the team members hope that the free availability of this intellectual property will benefit everybody, by reducing the cost of the products (since most of the intellectual work is already done) and by providing an open and flexible standard that anyone can influence at will without signing an NDA. It is also a testbench for new techniques and the "first CPU" of a lot of hobbyists, who can build it easily at home. Of course, the other expected result is that the F-CPU will be used in everybody's home computer as well as in the specialized markets (embedded/real-time, portable/wearable computers, parallel machines for scientific number crunching...).
In this situation, it is clear that one chip cannot fit all needs. Economic constraints also influence the technological decisions, and not everybody can access the most advanced silicon fabrication units. The reality of the F-CPU "for and by everybody" lies more in the realm of reconfigurable FPGAs, low-cost sea-of-gates and ASICs fabricated in low volumes. Even though the final goal is to use full-custom technologies, there is a strong limitation for prototyping and low-volume production. The complexity is therefore limited for the early generations and FC0 : the estimated transistor count for the first chips is around 1 million, including some cache memory. This is rather tight compared to current CPUs, but it is huge if one remembers the ARM core or the early RISC CPUs.
The "Intellectual Property" will be available as VHDL or VERILOG files that anyone can read, compile and modify. A schematic view is also often needed to understand the function of a circuit at the first sight. The processor will therefore exist more in the form of a software description than a hardware circuit. This will help the processor families to evolve faster and better than other commercial ones, and this polymorphism will garantee that anyone finds the core needed in any case. And since the development software will be common to all the chips, freely available through the GPL, porting any software to any platform will be eased to the maximum.
The interoperability of the software on any member of the family is a very strong constraint, and probably the most important design rule of the project : "NO RESOURCE MUST BE BOUND". This led to a CPU with an "undetermined" data width : an F-CPU chip can implement a characteristic data width of any size above 32 bits. Portable software will respect some simple rules so that it runs as fast as the chip allows, independently of algorithmic considerations. In fact, the speed of a given CPU is determined by the economic constraints, and a designer will build a CPU as wide as the budget and the technology allow. This way, there is no "roadmap" other than the users' needs, since each user is his own funder. The project is not bound by technology and is flexible enough to last... as long as we want.
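As a purely hypothetical illustration of such a rule (the query function below is invented for the example and is NOT a defined F-CPU interface), width-independent code manipulates whole machine words of whatever size the chip implements, instead of hard-coding 32 or 64 bits :

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical placeholder : a real F-CPU program would discover the
       implemented word width through the architecture or the ABI, not
       through sizeof() on a host type. */
    static size_t fcpu_word_bytes(void) { return sizeof(uintmax_t); }

    /* Number of machine words needed to cover n bytes on this chip. */
    size_t words_for(size_t nbytes)
    {
        size_t w = fcpu_word_bytes();
        return (nbytes + w - 1) / w;
    }

    /* Copy n machine words ; the same source moves 32, 64 or 128 bits
       per iteration, depending on the width of the chip it runs on. */
    void wide_copy(uintmax_t *dst, const uintmax_t *src, size_t nwords)
    {
        for (size_t i = 0; i < nwords; i++)
            dst[i] = src[i];            /* one full-width register move */
    }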
Here are the steps that the project intends to follow in the near future. There is NO SCHEDULE, as this is a naturally growing project, not a commercially oriented product ; we are more concerned with the pertinence and efficiency of the chip than with time-to-market, and several "coopetitors" can change the priorities of the F-CPU team. The following milestones are very important, though, and show that this is an EVOLUTIVE project rather than a truly ground-breaking utopia.
Generation : Prototype | Pre-series | Commercial class

Codename :
- Prototype : "POC" (Proof Of Concept).
- Pre-series : "TOY" (need I say more ? :-)).
- Commercial : F1, F2, F3... (other nicknames will be found, and trademarked).

Goal :
- Prototype : have a "chip" that can be shown and demonstrated at trade shows and conferences ; make the FC0 core work and test it ; explore the memory interface and its performance impact ; make a first chip that works ; prove the initial architectural assumptions and that the F-CPU concept is possible. It is NOT intended to be a commercial chip, because of the very limited functions it provides, nor a design from which other architectures should be directly derived ; it is also a way to learn to design ASICs and to gain publicity, press coverage and hype.
- Pre-series : provide the first users with an advanced, yet limited, platform for testing the F-CPU for real ; allow people to write real-world software and gain experience with the instruction set and the programming habits, in order to further modify the instruction set and the architecture for the commercial class ; reassign the opcode map ; define a hardware platform from which other pin-compatible chips can be derived.
- Commercial : scaled-down F-CPUs should be derived from these more advanced designs. The "motherboard" and the I/O interfaces should leave as much free room as possible for further enhancements, so that "coopetitors" have a common ground from which to develop efficient chips. The main problem being the memory bandwidth, the memory interface will be VERY wide, so as to keep the following chips from being memory-starved. At that time, a first stable version of the reference architecture will be officially released ; it will then evolve naturally.

Technology :
- Prototype : CMP / Europractice / Chip Express / ATMEL / HITACHI, depending on the opportunities and available budget. Probably 0.35u, 5V.
- Pre-series : Chip Express / ATMEL / HITACHI, depending on the opportunities and available budget. Probably 0.35u or 0.25u, mixed 5V/3.3V ?
- Commercial : depends on anyone's whims...

Speed :
- Prototype : ? Sorry, my crystal ball core-dumped... One of the fun things to do will be to clock it with an external PLL : since the memory will be asynchronous to the core, we will be able to test the capacity of the core to stand very high and very low working frequencies. I have absolutely no idea of the frequencies we can get this way, probably more than a few hundred MHz.
- Pre-series : ? At least more than the proto.
- Commercial : ? If you have a good fan, why not 1 GHz ?...

Number :
- Prototype : half a dozen.
- Pre-series : a few hundreds or thousands.
- Commercial : a lot more !

Price :
- Prototype : very expensive (low-volume prototyping), not for sale ; thousands of dollars each.
- Pre-series : rather expensive compared to a CPU of the same class, if you forget that you have the sources... Provided with a full-custom "motherboard" ; hundreds of dollars without SDRAM (and fan ?).
- Commercial : competitive (high volume, consumer market), maybe around $50 or $100 alone (without PCB and fan ;-D).

Word size :
- Prototype : 64 bits. Pre-series : 64 bits. Commercial : 64 bits or more (any power of 2 above 32 bits).

Package :
- Prototype : PGA 144.
- Pre-series : PGA 299, or Socket 7-like (315 pins, interstitial spacing) if possible (NOT compatible with PC motherboards ; the problem being to find a decent and cheap Zero Insertion Force (ZIF) socket).
- Commercial : PGA or BGA, more than 500 pins.

Memory addressing range :
- Prototype : logical 64 bits ; physical 20 (+5) bits (economic...).
- Pre-series : logical 64 bits ; physical 32 (+5) bits off-chip + 4 SDRAM slots (mux[10+12] (+5) = 27 bits) of private memory (comfortable...).
- Commercial : logical 64 bits or more ; physical 64 (+5) bits off-chip, 4 or 8 SDRAM slots (28 bits) (ready to replace the ALPHAs in the CRAYs...).

External memory bus widths :
- Prototype : 64 bits (private asynchronous SRAM) + 8 bits (debug port).
- Pre-series : 128 bits + 16 ECC bits for private SDRAM, + 32 bits of multiplexed, bursted, asynchronous "I/O" bus (memory-mapped).
- Commercial : 256 bits + 32 ECC bits of external memory bus (DDR-SDRAM ?), + 64 bits of memory-mapped "I/O" (multiplexed, bursted, asynchronous).

JTAG / onsite debug :
- Prototype : custom byte-wide interface.
- Pre-series : JTAG (or similar) + I/O bus (used as a quick examination / debug port).
- Commercial : JTAG + I/O port.

Cache :
- Prototype : on-chip data + instruction, 2 KB each.
- Pre-series : on-chip data + instruction, 4 or 8 KB each.
- Commercial : on-chip data + instruction, 8 KB or more each ; external cache with the data bus shared with the SDRAM, on-chip TAG SRAM.

Instructions per cycle :
- Prototype : 1. Pre-series : 1. Commercial : more.

Core :
- Prototype : FC0. Pre-series : FC0. Commercial : FC0 and others.

Lifespan :
- Prototype : short (months). Pre-series : short (not more than a few years). Commercial : much longer :-)

Evolutivity / compatibility :
- Prototype : none (proto). Pre-series : none. Commercial : yes.

"Motherboard" (CPU module) :
- Prototype : breadboard or 2-layer PCB, interfaced with the ISA bus or similar.
- Pre-series : high-quality 5-layer PCB + home-made (breadboard or 1-layer) I/O PCB.
- Commercial : high-quality, high-volume, production-class PCB + I/O + intercom + EEPROM PCB (a PCI, AGP, IDE/SCSI bridge will be needed).

Target / users :
- Prototype : F-CPU team and advanced users.
- Pre-series : programmers / developers / advanced integrators.
- Commercial : anybody (>10 years old).
We hope that this table answers most of your questions. If not, do NOT hesitate to ask.