http://f-cpu.seul.org/new/F-CPU_boot.txt
created Sat Oct 12 01:15:53 CEST 2002 by whygee@f-cpu.org

************** In case you didn't know **************

F-CPU is a set of specifications that describe a family of microprocessors and their reference implementations. The Freedom CPU project is a community of volunteers who work on defining these specifications, write the reference source code, and develop all the files necessary for F-CPU to become a serious and long-lasting alternative to the existing microprocessor families.

************** Introduction **************

This file contains a preliminary overview of the mechanisms used by F-CPU for "booting".
- It covers BIST-time, monitor-time and kernel setup-time libraries, communications and troubleshooting.
- It applies to a single CPU with a minimum number of external devices. Multi-CPU booting is not addressed yet and must be handled at a higher level.
- The goal of this "spec" is to be very simple to understand, to implement and to use on any implementation of F-CPU, whether it is a software simulation, an FPGA or a full-custom chip, and whatever the core family (not limited to FC0).
- It covers a "least common denominator" way to start an operating system, providing enough features to allow extensions.

************** Requirements **************

That being said, here are the minimum requirements. They should be relatively easy to meet, whether in software, FPGA or ASIC.
- a working F-CPU core (huh, not ready yet)
- one or more external RAM interfaces (typically SDRAM, or any available or necessary technology)
- EEPROM (probably FLASH, with any controller necessary to handle fine-grained access)

The interconnection is not specified. However, access to the EEPROM must be transparent (it must not require any preliminary configuration).

In short, to implement this specification, you need only a few components that are easy to find and assemble.
An FPGA starter kit board should contain them, and they are also the common parts of a minimum "F-CPU module". Finally, these different implementations should only need a single common debug and troubleshooting tool (this also reduces the coding effort).

************** boot-time I/O channel **************

Starting an operating system (even a minimal one) on F-CPU requires 3 steps after power-up and/or reset :
- BIST
- initialisation from EEPROM
- kernel initialisation and startup

Before this finishes, there is _no_ way to communicate with outside devices or with the user, as no other device has been initialised. Support of peripherals is not to be standardised : it is unwanted, as it would bloat the EEPROM and make this spec so complex that it would be unimplementable. Support of video, disk, keyboard or network must be handled by additional, user-provided software, as these peripherals might not be implemented or can evolve.

Yet, the 3 "init" steps require a means to report status and get commands from external tools (when necessary). This can be achieved by mapping a very simple character-based interface in the Special Register ("SR") space. This avoids the use of memory-mapped communication (difficult to track with an external probe, since the F-CPU core tends to cache things a lot) and is straightforward to implement. It also remains independent from the external architecture and its evolutions.

2 SRs are created :
  [RO, 2 bits] SR_CONS_STATUS : contains the handshaking bits
  [RW, 8 bits] SR_CONS_DATA   : where the CPU reads or puts a byte

The chosen protocol is a handshake with some limited HW support. After BIST is successful, these special registers are reset to 0. The protocol is the same in both directions : the "sender" waits for the "data ready" flag to be cleared, then writes a byte to the data register. This action sets the "data ready" flag. The "receiver" waits for this flag to be set, then reads the data register : this resets the flag to 0.
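One direction of this handshake can be modelled in a few lines of Python. This is only a sketch of the rule stated above, not a description of the real hardware : the class name HandshakeChannel and the helpers try_send(), try_receive() and transfer() are made up for the illustration.

```python
class HandshakeChannel:
    """One direction of the channel: an 8-bit buffer plus one 'data ready' flag.

    Hypothetical model for illustration only; the names are not part of the spec.
    """

    def __init__(self):
        self.data = 0        # registers are reset to 0 after a successful BIST
        self.ready = False   # "data ready" flag, cleared at reset

    def try_send(self, byte):
        """Sender: write only when the flag is cleared; the write sets the flag."""
        if self.ready:
            return False     # previous byte not consumed yet, the caller retries
        self.data = byte & 0xFF
        self.ready = True    # side effect of the write
        return True

    def try_receive(self):
        """Receiver: read only when the flag is set; the read clears the flag."""
        if not self.ready:
            return None      # nothing to read yet, the caller retries
        self.ready = False   # side effect of the read
        return self.data


def transfer(message):
    """Push bytes through the channel one at a time, alternating both ends."""
    channel = HandshakeChannel()
    received = bytearray()
    pending = list(message)
    while pending or channel.ready:
        if pending and channel.try_send(pending[0]):
            pending.pop(0)
        byte = channel.try_receive()
        if byte is not None:
            received.append(byte)
    return bytes(received)
```

For example, transfer(b"F-CPU") hands the five bytes over one by one, and a second try_send() before the receiver has read the first byte is refused.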
The set and reset are handled in hardware, thus reducing the protocol complexity a bit.

Data in SR_CONS_DATA are "multiplexed" (from the core's point of view) : reading SR_CONS_DATA always returns the contents of the receive buffer, writing to it writes to the output buffer. These two buffers are independent and have a single handshake flag each. The two handshake flags are visible from both ends of the channel.

                    |
                ----|----< DIN
               |    |
 INTERNAL _____|    |
 DATA BUS      |    |
                ----|----> DOUT
                    |

From the F-CPU core's point of view, this is used with a few instructions :

read_char :
  loopentry r1;            define the start of the wait loop
  get SR_CONS_STATUS, r2;  read the handshaking flags
  andi 1, r2, r2;          isolate the "data in ready" flag
  jmp.0 r2, r1;            if nothing ready, try again
                           (ok, i could have used the LSB condition)
  get SR_CONS_DATA, r2;    read the input character (and clear the flag)

write_char :
  loopentry r1;            define the start of the wait loop
  get SR_CONS_STATUS, r2;  read the handshaking flags
  andi 2, r2, r2;          isolate the "data out ready" flag
  jmp.1 r2, r1;            if buffer busy, try again
  put SR_CONS_DATA, r3;    write the output character (and set the flag)

This code implies that :
- SR_CONS_DATA is 8-bit only
- bit 0 of SR_CONS_STATUS is the "data in ready" flag and it is set to 1 when there is something to read
- bit 1 of SR_CONS_STATUS is the "data out ready" flag and one can write when it is cleared (0).

Other behaviours are undetermined and you are not encouraged to play with them (though i guess that this protocol will be enhanced, but it will then lose its simplicity).

The protocol is roughly the same for the "host", or whatever is connected to the other end of the channel. From there, it is easy to write some more code that handles character buffers like a UNIX console, or whatever.
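As an illustration of such "some more code", here is a small Python model of the two SRs with a puts()-like routine built on the two wait loops. It is a sketch under assumptions : the class ConsoleSR, the host_poll callback (which stands for whatever sits at the other end of the channel) and the function names are invented for the example. Only the bit assignments come from the text above.

```python
DATA_IN_READY = 1   # bit 0 of SR_CONS_STATUS: there is something to read
DATA_OUT_BUSY = 2   # bit 1 of SR_CONS_STATUS: the output buffer is still full


class ConsoleSR:
    """Toy model of SR_CONS_STATUS / SR_CONS_DATA (hypothetical, not the real HW)."""

    def __init__(self):
        self.status = 0
        self.rx = 0   # receive buffer (host -> core)
        self.tx = 0   # output buffer (core -> host)

    def get_data(self):                  # core side: "get SR_CONS_DATA"
        self.status &= ~DATA_IN_READY    # reading clears the "data in ready" flag
        return self.rx

    def put_data(self, byte):            # core side: "put SR_CONS_DATA"
        self.tx = byte & 0xFF
        self.status |= DATA_OUT_BUSY     # writing sets the "data out ready" flag

    def host_send(self, byte):           # host side: provide an input byte
        self.rx = byte & 0xFF
        self.status |= DATA_IN_READY

    def host_collect(self):              # host side: take the output byte
        self.status &= ~DATA_OUT_BUSY
        return self.tx


def read_char(sr, host_poll):
    while not (sr.status & DATA_IN_READY):   # spins forever if the host is silent
        host_poll()
    return sr.get_data()


def write_char(sr, byte, host_poll):
    while sr.status & DATA_OUT_BUSY:         # spins forever if nobody collects
        host_poll()
    sr.put_data(byte)


def puts(sr, text, host_poll):
    for byte in text.encode("ascii"):
        write_char(sr, byte, host_poll)


def demo():
    sr = ConsoleSR()
    printed = bytearray()
    keys = list(b"ok")                       # bytes the simulated host will "type"

    def host_poll():
        if sr.status & DATA_OUT_BUSY:
            printed.append(sr.host_collect())
        if keys and not (sr.status & DATA_IN_READY):
            sr.host_send(keys.pop(0))

    puts(sr, "boot\n", host_poll)
    host_poll()                              # let the host collect the last byte
    typed = bytes(read_char(sr, host_poll) for _ in range(2))
    return bytes(printed), typed
```

In a single-threaded simulation the wait loops must give the other end a chance to run, which is why they call host_poll ; on real hardware the loop would simply poll the status register.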
Adding support for a timer will provide asynchronous communications, but that is out of the scope of this spec : the most important goal is that any software can interact with a user, or at least display booting information, before the classical I/O peripherals are initialised by an operating system.

The hardware implementation is very simple from the SR side. It provides an 8+8+2-bit interface to the outside world, which can then be connected to a host using many kinds of links, including :
- "parallel printer port" cable
- RS-232
- JTAG
- a named pipe or a /dev entry (thus providing a single interface to simulated, emulated or built versions)
or it can simply remain disconnected. Otherwise, it provides a simple means to
- output boot messages
- debug low-level drivers
- select a kernel (if a multiboot utility is written)
- upload or download kernel images to/from EEPROM
- or simply connect a dumb alphanumeric LCD + keypad

Now that we can examine and control the CPU's activity, let's proceed to the real stuff : booting to some kernel, or whatever.

************** boot environment **************

Among all the golden rules that are necessary for F-CPU to never suffer from compatibility issues, one is : never define a fixed memory address map. If a device were defined as mapped at a certain address, the addition of devices would make software and hardware more complex in the future. All device mapping addresses and control registers are mapped in the Special Register space, which does not communicate with the memory addressing space, thus ensuring fine-grained protection, simplifying the address decoders and keeping the pipelines free from complex interactions with configuration changes. Using SRs to define the address map also helps when devices are hotswapped, for example.

There is one exception, though : the instruction stream must start somewhere in the memory space and it is logical to start at address 0.
Some architectures start at 0xFFFFFsomething, but F-CPU pointers have no fixed MSB, since the register width depends on the implementation. Starting at address 0 ensures that any F-CPU compliant core can boot the same code without porting effort. The EEPROM is mapped from address 0x0 and there is no size limit. However, the code it contains must know this size. Needless to say, all the protection bits are cleared and all the resources are available to the boot code.

Another important fact concerning the booting environment is that upon booting, the core has no temporary storage other than the register set. The EEPROM is read-only at this time, the cache is probably working but useless, and the external private RAM is not initialised.

So the first thing to do, when starting at address 0, is to configure and initialise the RAM controller(s). This depends a lot on the available technology, so it won't be described precisely. Let's hope that it is not too complex, though, so that the 63 usable registers are enough.

The boot code must detect the available SDRAM controllers through the SRs. There can be more than one SDRAM port, and some parts can be performed in parallel (for example when scanning the chips for integrity checking). For each controller, the boot code reads the HW parameters off the SDRAM chips and configures the controller to match them : size, number of banks, wait cycles, interleaving, precharge and most importantly : the base address. If there are several controllers, the base addresses must be contiguous. Then, turn the Dcache off and start writing and reading the RAM to check its integrity.

Both the SRs involved and the boot code can change in the future, so the priority is to keep the interface clean and simple, rather than add more and more definitions. The goal is to have a cachable memory area at the end of this process.

However, i have mentioned in the introduction that this document doesn't address multi-CPU configuration.
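For the single-CPU case, the two steps described above (giving several controllers contiguous base addresses, then writing and reading the RAM to check it) can be sketched in Python. Everything here is an assumption made for the illustration : the function names, the first_base parameter (this spec does not say where the RAM area starts), and the two test patterns. The real code would program the controllers through SRs, which are not modelled, and a real integrity check would be more thorough.

```python
def assign_bases(sizes, first_base):
    """Return (base, size) pairs so that each RAM block starts where the previous ends.

    'sizes' are the sizes detected on each controller; 'first_base' is a
    hypothetical starting address chosen by the boot code.
    """
    layout = []
    base = first_base
    for size in sizes:
        layout.append((base, size))
        base += size              # the next block is contiguous with this one
    return layout


def check_ram(ram):
    """Write two complementary patterns and read them back.

    Returns the first failing offset, or None if every cell held both patterns.
    """
    for pattern in (0x55, 0xAA):
        for offset in range(len(ram)):
            ram[offset] = pattern
        for offset in range(len(ram)):
            if ram[offset] != pattern:
                return offset
    return None


class StuckBitRam:
    """Hypothetical faulty RAM for the example: bit 0 of one cell is stuck at 0."""

    def __init__(self, size, bad_offset):
        self.cells = bytearray(size)
        self.bad_offset = bad_offset

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, offset):
        return self.cells[offset]

    def __setitem__(self, offset, value):
        if offset == self.bad_offset:
            value &= 0xFE         # the stuck bit never stores a 1
        self.cells[offset] = value
```

With three blocks of 0x1000, 0x4000 and 0x2000 bytes and a first base of 0x10000, assign_bases() places them at 0x10000, 0x11000 and 0x15000, leaving no hole. check_ram() reports the offset of the stuck cell in a StuckBitRam and None on a healthy bytearray.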
But the base address of the private memory areas must not collide with those of other processors ! There are several workarounds :
- include the multi-CPU setup in the boot code
  ==> this would be superfluous because it's the kernel's job, and the boot code would become overly large.
- assign a unique CPU number to each processor in the system (à la SHARC) to compute the base address
  ==> there would be collisions or holes if the system is heterogeneous (not the same amount of RAM for all CPUs), and we would like the memory space to be contiguous (all the RAMs form a single block)
- include the memory configuration in the EEPROM (the base address would be computed by the kernel, then written to EEPROM)
  ==> the system configuration could change between 2 boots, which would force the addresses to be recomputed (though it's unlikely)
  ==> another problem is that the boot EEPROM could be used and read by several CPUs at a time : the cores can't boot in parallel, at the risk of mapping their RAMs to the same address
  --> boot must be serialised
- some inter-CPU communication channel could be created and mapped to the SRs
  ==> the protocol could be too complex and not portable, as it depends on the system's topology and the available HW

Choose your camp, according to the system's design and environment.

************** bootstrapping some software **************

Now, the CPU can access the EEPROM and a contiguous area of faster RAM. The most important features of the core are configured (IRQs are off, protection is disabled, etc.) and it's time to dig into what's left in the EEPROM.

This whole process can be punctuated by messages sent to SR_CONS_DATA. This means that the EEPROM has some code that knows how to do this, a kind of library that manages a dumb microconsole. To make life easier, this code can be reused by other parts of the software remaining in the EEPROM.

Another library manages the allocation of blocks in the private RAM.
It's a kind of low-level malloc() and free() that can be used by other software, and that can use the microconsole code, for example, to output debug messages.

The last library is a set of ROMFS-like low-level handling routines that can read files inside a simplified file system located in the EEPROM. This library requires the malloc() functions provided by the memory library in order to load "files" to the RAM before executing binaries from there (the files in the FROMFS can be unaligned to save some space, and 256-bit versions will be rather strict about code and data alignment). The FROMFS code has been started already, but is not complete.

All these things will certainly require trap handlers during development and debug, for example to catch invalid addresses, invalid opcodes or alignment faults, to name a few possible coding errors. They can be removed later but are recommended in case a binary run off FROMFS contains flaws => the user will be happy to know why the computer hangs. These handlers can be replaced by other code before or when the kernel installs itself.

When the RAM initialisation is completed, the code calls a function from the FROMFS library, asking it to execute a file off the EEPROM, for example "runme.first". Then, the user's choice prevails.

To sum up : the EEPROM contains 5 parts :
- the initialisation code (init trap handlers+IRQ..., init SDRAM controller(s), then call FROMFS code)
- the message printing/reading library
- the memory allocation library
- the FROMFS library
- the FROMFS image

The first 4 parts are provided by the F-CPU project, as well as some debugging tools and FROMFS image manipulation software. They come more or less in that order. Since they are provided by F-CPU, they are "free software" and can be compiled by any user, so the entry points are known and can even be controlled.
This means that the addresses of the functions depend on the version of the software, but this is not important because the symbol table can easily be exported and reused during the kernel's compilation.

************** other software **************

The FROMFS specification is described in a different file and it can change anyway, but it is primarily a dumb file system : each "file" is described by an entry in the file table. A name, some attributes, a size and an offset in the image are the minimal properties. From there, the provided FROMFS library can locate, open and seek into a file given its name. Using the malloc library, a file can be loaded into RAM and then executed. This software can in turn execute some other software located in the FROMFS, load data files, allocate more memory, communicate with the user through the microconsole and, more importantly, add new features and detect more devices to extend the reach of the software.

The first possibility is to simply link a Linux kernel with the existing "libraries". The first messages will be output to the microconsole and the kernel will enumerate all the known devices before redirecting the messages to them. The rest of the story is well known. The same works with microkernels as well. Just name the kernel "runme.first" or hardlink it.

Another possibility is to choose between several kernels or kernel parameters, with software like GRUB or LILO. This would use the provided facilities to select the boot parameters and fetch the correct "file" from the EEPROM. The multiboot utility would be hardlinked to "runme.first" and each kernel image can keep a distinct name.

However, in case no "microconsole" is connected, this might be less practical than expected. Some HW detection software, or "device driver", must be installed to allow GRUB to use the screen, the keyboard and any mass storage device. Then the device driver would be named "runme.first", but it is getting a bit complex now !
Some basic command or script interpreter can be programmed to run the desired software, in the order specified in a "file".

Another possibility is to exploit the microconsole as a communication link, and download an image to execute. Though it's a bit slow (the link is not designed for high-speed communication, with a maximum of 1 Mbit/s), it can spare some FLASH room in a large multi-CPU system, or it can be used when developing new kernels (instead of writing to the EEPROM each time).

There are certainly other possible uses. It is even possible to design a boot system like those of SPARC or ALPHA, but if this is not needed, it still works.

************** conclusion **************

This specification is very important both for the software and hardware development of the F-CPU project, which is still in its infancy and not completely determined. Defining a minimal "console port" and the necessary SRs is important when designing the core. This specification can influence the existing files, but much care is taken to avoid any impact.

The definition of the bootstrap procedure is also critical for the development of the first SW tools : simulator, emulator, debuggers, compilers and so much more. Making these tools independent from the target (SW or HW), and providing a flexible, powerful and simple interface for booting a CPU, lowers the coding effort and makes F-CPU suitable for more applications, while keeping the architecture independent from them so that it can evolve without compatibility issues.

Finally, these guidelines are open enough that somebody can code and boot whatever software they want, whether it is a monolithic kernel, a microkernel, a custom application or simply a toy program.