Processor Overview

The heart of the microcomputer is the microprocessor unit. The microprocessor unit contains the microprocessor itself, as well as related memory subsystems. The design of the microprocessor was extremely critical, and went through many revisions until a suitable design was developed. Many different considerations were taken into account: power, flexibility, and simplicity of construction.

In order to fully illustrate the design trade-offs, it is useful to review both the final design and those that preceded it. The primary goal was simplicity of construction: modern processors incorporate millions of transistors, a degree of complexity impossible to rival with the low- and moderate-scale integration devices available. A simpler design had to be developed.

First considered was an extremely simple Reduced Instruction Set Computer (RISC) design. Since memory was at a premium (and memory throughput even more so, due to the 8-bit data bus), such a design would have to be limited to instructions of at most 16 bits; 8 bits would be better. With a significant number of these bits reserved for register addresses (at least two register addresses are required, each needing several bits), very few bits remain to encode the operation to perform on those registers. Complex instructions would have to be forgone in favor of simpler instructions that could be combined to provide the desired functionality, which in turn requires additional memory. Further, a large address space is desirable (at least 16 bits, corresponding to 64K), so even a 16-bit instruction cannot encode both a full address and an operation to perform on that address. With memory at such a premium, a constant instruction width is therefore difficult to work around. Another drawback is that a conventional RISC design, while simple in structure, has a large number of relatively wide data paths, requiring extensive, laborious, and error-prone wiring.

The obvious modification is to move to a variable instruction width, which is consistent with a Complex Instruction Set Computer (CISC) design. A clear advantage is the ability to specify addresses for an arbitrarily large address space and to provide a rich and powerful instruction set while conserving memory. Unfortunately, a CISC design is rather complicated and involves the intimate interaction of several interrelated units. Offsetting this, the use of common busses can potentially eliminate much of the extensive wiring. A design of this type was pursued for this project.

Processor Design

The processor consists of a handful of subunits which act in exquisitely coordinated concert (see Figure 1). As is typical of a CISC design, each instruction in the program maps to a subroutine of microcode. In this processor, each instruction is mapped to a section of 16 microinstructions, though it may use only a fraction of them. An incoming instruction from memory, always encoded as a single byte, is multiplied by 16 to yield the microcode ROM address; the microcode can then read any arguments the instruction requires. This microcode ROM address is held in the microprogram counter (ROMPC), which is incremented every clock cycle. To load the next instruction, its opcode is loaded into the ROMPC (multiplied by 16) and the lower 4 bits, the phase, are cleared. Unlike some more complex designs, the microcode has no branching, jumping, or looping facilities; it merely scripts the flow of information. The instructions available to applications, implemented by the microcode, are listed and described in Appendix J. The microcode listing itself is in Appendix K.
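The addressing scheme above can be sketched behaviorally as follows. This is an illustrative model, not the PAL equations; the function names are hypothetical, and what the hardware does if the phase counts past 15 is not described, so the sketch simply wraps.

```python
# Behavioral sketch (not the actual PAL equations) of how an opcode byte
# and the 4-bit phase form the microcode ROM address (ROMPC).

PHASES_PER_OPCODE = 16  # each opcode owns 16 microinstruction slots


def rompc(opcode: int, phase: int) -> int:
    """Microcode ROM address for an 8-bit opcode and a 4-bit phase."""
    if not (0 <= opcode <= 0xFF and 0 <= phase < PHASES_PER_OPCODE):
        raise ValueError("opcode must fit in 8 bits and phase in 4 bits")
    return opcode * PHASES_PER_OPCODE + phase  # equivalently (opcode << 4) | phase


def step(opcode: int, phase: int, fetch: bool, next_opcode: int = 0) -> tuple[int, int]:
    """Advance the microprogram by one clock cycle.

    fetch=True models loading the next instruction: the new opcode replaces
    the old one and the phase is cleared. Otherwise only the phase counts up.
    Wrapping past phase 15 is an assumption of this sketch; the microcode is
    expected to fetch before that happens.
    """
    if fetch:
        return next_opcode & 0xFF, 0
    return opcode, (phase + 1) % PHASES_PER_OPCODE
```

Since every opcode reserves 16 slots whether or not it uses them, a full 256-opcode space needs 256 × 16 = 4096 words in each control ROM.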

Three ROMs are addressed by the ROMPC, providing a total of 24 control bits per cycle. Each of these bits drives an assert line. These lines regulate all of the muxes, tristate enables, and mode settings (e.g., which ALU function to perform). It was a design goal to minimize the number of control lines in order to simplify wiring. This required compact, non-redundant signals: for example, for a bus with two devices on it, a single line that selects between the two is more efficient than a separate tristate enable for each. This occasionally requires extra decoding logic, but as it turned out, many of the devices used are implemented in programmable array logic (PAL) devices, and the additional logic could be programmed into those devices with no external logic.

A program counter (PC) stores the memory location of the next byte of the instruction stream. The PC is completely separate from the microcode ROMPC. On any given cycle, the PC can hold its present value, increment, or load a new value from the address bus, a 16-bit bus. The other main device on the address bus is the effective address register, which supplies memory addresses separate from the instruction stream. For instance, a load-from-memory instruction sets the effective address register to the desired memory location, and the memory is then accessed at the address held in that register. The PC, meanwhile, is unaffected and is ready to fetch the next instruction. One last device on the address bus provides a facility for putting an address onto the data bus. Some instructions, like branches and call-subroutine, need the current PC. However, since the address bus is 16 bits wide and the data bus only 8, only half of the address bus can be placed on the data bus during any cycle. This is implemented through a pair of three-state drivers, each connected to one half of the address bus, which can be controlled independently.
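A minimal behavioral sketch of the program counter's three modes and of the two half-bus drivers is shown here. The mode names are hypothetical (the real control encoding is in the PAL descriptions, Appendix G), and wrapping at $FFFF is an assumption.

```python
# Behavioral sketch of the program counter and the two half-bus drivers.
# Mode names are hypothetical stand-ins for the real control lines.

HOLD, INCREMENT, LOAD = "hold", "increment", "load"


def pc_next(pc: int, mode: str, address_bus: int = 0) -> int:
    """Next PC value for one clock cycle (16-bit; wrap at $FFFF assumed)."""
    if mode == HOLD:
        return pc
    if mode == INCREMENT:
        return (pc + 1) & 0xFFFF
    if mode == LOAD:
        return address_bus & 0xFFFF
    raise ValueError(f"unknown PC mode: {mode}")


def address_to_data_bus(address: int, high_half: bool) -> int:
    """Byte placed on the 8-bit data bus by one of the two tristate
    drivers: the upper half when high_half is True, else the lower half."""
    return (address >> 8) & 0xFF if high_half else address & 0xFF
```

Saving a PC of $C012 (for example, for a subroutine call) therefore takes two data-bus cycles, one carrying $C0 and one carrying $12; the order is up to the microcode.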

Processor Functional Block Diagram

The ALU subunit performs arithmetic and logic operations on data. Since the data bus is only 8 bits wide, it cannot carry two arguments simultaneously, so the ALU has two argument registers, named A and B, and can only operate on data loaded into them. The ALU supports operations such as increment, decrement, add, subtract, and, or, shift right, and exclusive-or. A left shift is obtained by adding a value to itself. The ALU also supports a slightly unusual "zero" operation, which ignores the A and B registers completely and simply outputs zero to the data bus; this is useful to several microprograms (see Appendix K). Finally, a "disable" operation causes the ALU to stop driving the data bus.

During an arithmetic operation, two flags can optionally be updated: the zero and carry flags. The zero flag goes high if every bit of the result is zero. The carry flag goes high if there was overflow or underflow during the last increment, decrement, addition, or subtraction operation. Flag updates are selectable because a microprogram may perform several operations, only one of which is of direct interest to the application. Lastly, a mux selects whether the zero or the carry flag is output. These flags can be used for conditional branching.

It should be mentioned that the carry flag does not feed into additions or subtractions, unlike in other microprocessors of this class (such as the Motorola 65K and 68K series). In those architectures, if an addition takes place while the carry is set, the result is incremented; during a subtraction with the carry clear (indicating a "borrow"), the result is decremented. Because this would significantly increase the size of the ALU, a "notifying" or "lazy" carry was chosen instead. It allows conditional program execution, but does not affect the results of ALU operations.
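To illustrate what a lazy carry costs in software, here is a sketch of a 16-bit addition built from 8-bit adds. It assumes the flag reports the unsigned carry out of bit 7; the program, not the ALU, must test the flag and apply it with a separate increment.

```python
# Sketch of multi-byte addition with a "lazy" carry. Assumption: the flag
# reports the unsigned carry out of bit 7 and is never added in by the ALU.


def add8(a: int, b: int) -> tuple[int, bool]:
    """8-bit add: result and carry flag. The carry is never added in."""
    total = a + b
    return total & 0xFF, total > 0xFF


def add16_lazy(x: int, y: int) -> int:
    """16-bit add built from 8-bit adds plus an explicit carry test."""
    low, carry = add8(x & 0xFF, y & 0xFF)
    high, _ = add8(x >> 8, y >> 8)
    if carry:                      # conditional branch on the carry flag
        high, _ = add8(high, 1)    # separate increment instruction
    return (high << 8) | low
```

Every additional byte of precision repeats the test-and-increment step, which is the overhead the active carry recommended later would remove.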

The data bus is also used by a single-ported register file. Eight registers are available: A, B, X, Y, S, T0, T1, and T2. (These A and B registers should not be confused with the A and B argument registers of the ALU.) Of these, T0, T1, and T2 are reserved for temporary storage by the microprograms and are not visible to applications. The A and B registers serve as general-purpose accumulators, while the X and Y registers serve as index registers to provide richer addressing. The S register is the stack pointer, which is managed entirely by the microprograms. Because the register is 8 bits wide, the stack is limited to 256 bytes. While in this implementation the stack must reside in the first memory page ($0000-$00FF), the microcode could easily be modified to relocate the stack to any page by converting a temporary register into a stack page register.
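The page-confined stack can be modeled as below. The direction of growth and the push/pop ordering are assumptions made for the sketch (the real behavior is defined by the microcode in Appendix K); the `page` parameter stands in for the hypothetical stack page register mentioned above.

```python
class PageStack:
    """Sketch of the S-register stack: an 8-bit pointer into one 256-byte
    page. Downward growth and post-decrement push are ASSUMPTIONS; the
    real behavior is defined by the microcode (Appendix K)."""

    def __init__(self, memory: bytearray, page: int = 0x00, s: int = 0xFF) -> None:
        self.memory = memory
        self.page = page          # fixed at $00 here; a stack page register could change it
        self.s = s & 0xFF         # 8-bit stack pointer

    def _address(self) -> int:
        return (self.page << 8) | self.s

    def push(self, value: int) -> None:
        self.memory[self._address()] = value & 0xFF
        self.s = (self.s - 1) & 0xFF   # wraps within the page

    def pop(self) -> int:
        self.s = (self.s + 1) & 0xFF
        return self.memory[self._address()]
```

Because S is only 8 bits wide, the pointer wraps within its page instead of spilling into the next one, which is why the stack is limited to 256 bytes.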

To simplify the project and reduce its size, programmable array logic (PAL) devices were used extensively to combine multiple functions in one device. For example, the program counter PAL incorporates a register, an incrementer, and a 3-way mux. PALs with unused pins were used to implement random decoding logic needed elsewhere; the rompcphase PAL, for instance, contains not only the phase register and incrementer, but also the branch logic and select logic for the rompc PAL. The considerable time spent minimizing the number of devices made the project practical, since a microprocessor implementation could easily exceed the space available on the protoboards. Space-saving measures such as these left about 20% of the protoboard empty.

Implementation

Most of the devices in the processor are implemented entirely within PALs, so their operation is best understood by examining the PAL descriptions alongside the processor Functional Block Diagram. Devices that require special explanation or are potentially tricky are described below. PAL descriptions can be found in Appendix G.

Clock Circuitry

A standard crystal oscillator running at 1.8 MHz is used. The processor is capable of running at speeds in excess of 5 MHz; however, the UART could not put data on the bus that quickly. Buffering could have resolved this problem, but was omitted in the interest of timely completion.

From the crystal oscillator, the clock passes through an inverter tree to avoid clock fan-out problems. All clock signals pass through similar paths, so clock skew is negligible.

ALU

The ALU and its two argument registers require the most extensive and complex logic of any system in the project, and were therefore implemented in a Complex Programmable Logic Device (CPLD). Original plans called for including the register file in the CPLD as well, but the CPLD lacked the required number of internal flip-flops.

The ALU takes a number of control inputs: A_LE and B_LE, which cause the A and B registers, respectively, to load values from the data bus; ALUFN, which specifies the operation to perform on A and B (or whether the ALU should stop driving the bus); CCRSEL, which selects which condition code (the zero flag or the carry flag) is sent to the CCR output line; and UPDATE_CCR, which enables modification of the carry and zero flags based on the result of the present operation. See Appendix K for a list of ALU operations and their function codes.

The zero flag goes high when every bit of the operation result is zero, and the carry flag goes high when overflow or underflow occurs on an additive operation. This is calculated by examining the most significant bits of the A, B, and result data lines. Since numbers are represented in two's complement, overflow of an addition can be detected when the two arguments have the same sign and the sign of the result differs from theirs. Strictly speaking, this sign-based test detects two's-complement (signed) overflow, which is not the same as an unsigned carry out of bit 7.
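The sign-based rule can be checked with the short sketch below, which contrasts it with an unsigned carry out of bit 7. The subtraction variant is the standard two's-complement counterpart and is included for comparison only; which rule each ALU operation actually uses is defined by the VHDL in Appendix H.

```python
def sign(value: int) -> int:
    """Most significant bit of an 8-bit value (1 = negative)."""
    return (value >> 7) & 1


def overflow_add(a: int, b: int) -> bool:
    """Sign rule from the text: same-sign operands, different-sign result."""
    result = (a + b) & 0xFF
    return sign(a) == sign(b) and sign(result) != sign(a)


def overflow_sub(a: int, b: int) -> bool:
    """Variant for a - b: operands of opposite sign, result sign != sign of a."""
    result = (a - b) & 0xFF
    return sign(a) != sign(b) and sign(result) != sign(a)


def carry_add(a: int, b: int) -> bool:
    """Unsigned carry out of bit 7, for comparison."""
    return a + b > 0xFF
```

For example, $7F + $01 (127 + 1) overflows as a signed sum but produces no unsigned carry, while $FF + $01 (-1 + 1) produces an unsigned carry but no signed overflow.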

Otherwise, the ALU is essentially a large mux that selects the desired operation result as a function of ALUFN. The ALU can add, subtract, increment, decrement, AND, OR, exclusive-OR, and shift right by one, as well as perform a number of boolean operations. The boolean operations all operate on the A argument only, and include identity, boolean inverse, and false (zero). ALUFN can also encode "turn the ALU off"; whenever a valid operation is specified instead, the result is driven directly onto the data bus. The VHDL description of the ALU can be found in Appendix H.
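The ALU's interface (A_LE, B_LE, ALUFN, UPDATE_CCR, CCRSEL) can be summarized in a behavioral sketch. The operation names are hypothetical stand-ins for the real ALUFN codes in Appendix K, and computing the carry as an unsigned carry or borrow (and leaving it unchanged on non-additive operations) is an assumption of this sketch.

```python
from typing import Callable, Optional

# Hypothetical operation names standing in for the real ALUFN codes
# (Appendix K). Each maps the A and B argument registers to a raw result.
OPERATIONS: dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "inc": lambda a, b: a + 1,
    "dec": lambda a, b: a - 1,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
    "shr": lambda a, b: a >> 1,
    "identity": lambda a, b: a,
    "invert": lambda a, b: ~a,
    "zero": lambda a, b: 0,
}
ADDITIVE = {"add", "sub", "inc", "dec"}


class ALU:
    def __init__(self) -> None:
        self.a = 0
        self.b = 0
        self.zero = False
        self.carry = False

    def latch(self, data_bus: int, a_le: bool = False, b_le: bool = False) -> None:
        """A_LE / B_LE: load the argument registers from the data bus."""
        if a_le:
            self.a = data_bus & 0xFF
        if b_le:
            self.b = data_bus & 0xFF

    def drive(self, alufn: str, update_ccr: bool = False) -> Optional[int]:
        """Value driven onto the data bus, or None when the ALU is off."""
        if alufn == "off":
            return None
        raw = OPERATIONS[alufn](self.a, self.b)
        result = raw & 0xFF
        if update_ccr:
            self.zero = result == 0
            if alufn in ADDITIVE:
                # Assumption: unsigned carry/borrow out of bit 7.
                self.carry = not 0 <= raw <= 0xFF
        return result

    def ccr(self, ccrsel: str) -> bool:
        """CCRSEL mux: report either the zero or the carry flag."""
        return self.carry if ccrsel == "carry" else self.zero
```

Loading the same byte into both A and B and selecting "add" reproduces the shift-left-by-addition trick described earlier.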

Branch Logic

If the microcode wishes the main instruction stream to branch, it must do so based solely on the state of the zero or carry flag. The microcode asserts BR_EN to enable a branch; to branch on "not zero" or "not carry", it also asserts BR_NEG. The condition code output from the ALU is exclusive-ORed with BR_NEG and ANDed with BR_EN; the result, BR_OK, is high if a branch should take place. Jumps (unconditional branches) are encoded by clearing BR_EN and setting BR_NEG, a combination that would otherwise go unused, to form the JUMP logic line. A branch is finally achieved when the program counter loads the effective address, which it does when BR_OK or JUMP is high. This logic is implemented in a PAL.
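The branch equations above translate directly into a few boolean expressions. The sketch below models them with Python booleans; the function name is illustrative, and the real logic lives in a PAL.

```python
def branch_logic(ccr: bool, br_en: bool, br_neg: bool) -> tuple[bool, bool, bool]:
    """Return (BR_OK, JUMP, load_effective_address) for one cycle."""
    br_ok = (ccr != br_neg) and br_en    # XOR with BR_NEG, AND with BR_EN
    jump = (not br_en) and br_neg        # otherwise-unused encoding
    return br_ok, jump, br_ok or jump    # PC loads the effective address
```

With BR_EN and BR_NEG both clear, the PC never loads the effective address regardless of the flag, which is the normal sequential case.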

Memory Subsystem

A microprocessor in a vacuum is a useless device. To provide greater functionality, a memory subsystem was added. This memory is in addition to, and completely separate from, the control ROMs. It provides ROM space for utility functions and start-up code, and RAM for run-time storage. Unallocated space in the memory map may be used for any purpose; this was taken advantage of to implement memory-mapped I/O (see Memory Map).

Memory Map

The memory subsystem has three ports: an address bus, a data bus, and a write enable. The address on the address bus identifies the device, and the selected device drives or reads the data bus in the same cycle. As a result, the data bus is glitch-prone and contains invalid data for a significant part of each cycle. The only requirement placed on the memory subsystem is that it provide data early enough before the next rising clock edge to meet the setup times of the flip-flops. Unlike in RISC processors, a memory read and an ALU operation cannot happen simultaneously, because the memory subsystem shares the data bus with the ALU; consequently, there is little need for the memory system to be extremely fast.

Each device within the memory system has a chip select input. A PAL computes chip select lines for each device within the memory subsystem as a function of the address lines, so that only one will access the databus during any given cycle.

The memory subsystem includes several devices: an Intel 28F256 ROM, of which 16K is used (a smaller ROM would have been used if available), a Hitachi 6246 SRAM, and an RS-232 serial I/O subsystem. The chip-select PAL generates a ROM select signal, a RAM select signal, and three RS-232 enable signals (send, receive, and status), strictly as a function of the address bus. The processor does not need to know any details about the memory subsystem; the subdivisions and allocations can be changed without any changes to the processor. Apart from the chip-select PAL itself, the only change required would be a software modification (see Memory Subsystem Block Diagram).
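Address decoding of this kind can be sketched as a table lookup. Only the ROM placement at $C000 (16K, hence $C000-$FFFF) follows from the text; the RAM range and the three UART addresses below are invented placeholders, since the actual assignments are given in the Memory Map figure.

```python
from typing import Optional

# HYPOTHETICAL memory map for illustration. Only the ROM placement at
# $C000 (16K, so $C000-$FFFF) follows from the text; the RAM range and
# the three UART addresses are invented placeholders.
MEMORY_MAP = [
    ("RAM", 0x0000, 0x1FFF),
    ("UART_SEND", 0x8000, 0x8000),
    ("UART_RECEIVE", 0x8001, 0x8001),
    ("UART_STATUS", 0x8002, 0x8002),
    ("ROM", 0xC000, 0xFFFF),
]


def chip_select(address: int) -> Optional[str]:
    """Name of the single device selected by a 16-bit address, if any.

    Because the ranges do not overlap, at most one select line is active,
    which is what keeps two devices from driving the data bus at once.
    """
    for name, low, high in MEMORY_MAP:
        if low <= address <= high:
            return name
    return None
```

Moving a device only means editing the table (in hardware, reprogramming the chip-select PAL), which mirrors the point that the processor itself needs no changes.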

The RAM device was wired with /CE connected to the clock and /OE connected to the RAM select signal. This was necessary because data on the bus would occasionally change before the RAM select signal went inactive, causing incorrect data to be stored in RAM. The same modification was necessary for the register file SRAM.

Unfortunately, this modification increased the minimum clock period, because other devices depend on the result of a RAM read operation, which is also delayed by this arrangement. A better solution might make the RAM chip enable a function of write-enable and the clock: a read is allowed immediately, while a write is allowed only while the clock is low. Another possible solution involves an asymmetrical clock with a shorter high period; the RAM would be disabled for a shorter time, so read delay would be less affected, while the RAM would still be deactivated quickly on the rising clock edge. Neither solution was pursued, since performance was already adequate.
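The difference between the implemented wiring and the first proposed alternative can be expressed as two enable functions. This sketch uses active-high booleans for readability (the real pins are active low) and treats "enabled" as the combined effect of /CE and the select signal, a simplification.

```python
def ram_enabled_current(ram_select: bool, clk: bool) -> bool:
    """Wiring described above: /CE tied to the clock, so the RAM is only
    enabled while the clock is low (for reads and writes alike)."""
    return ram_select and not clk


def ram_enabled_proposed(ram_select: bool, write: bool, clk: bool) -> bool:
    """Proposed alternative: reads enabled as soon as the RAM is selected;
    writes enabled only while the clock is low."""
    return ram_select and (not write or not clk)
```

Under the proposed scheme a read is already under way during the clock-high half of the cycle, which is exactly the time the current wiring loses.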

Memory Subsystem Block Diagram

Microcode

Writing microcode is a challenging task. Every instruction requires a number of resources, and allocating them efficiently is difficult. Achieving maximum performance by minimizing the number of cycles required for an instruction is also difficult. Perhaps even more difficult is deciding which instructions should be implemented.

Deciding which instructions to implement was made easier by the intent to be similar to the 65K and 68K series Motorola processors, though personal preference played a role, as did the use of a lazy carry flag in the ALU. The compromises made by using a lazy carry flag are discussed elsewhere.

Two main resources have to be allocated during each cycle: the address bus and the data bus. Most devices affect only one bus, but the memory subsystem requires both. Up to two operations can occur simultaneously, one on each bus, provided there is no contention. In the interest of rapid development and ease of validation, clarity of operation was favored over optimization. Consequently, the microcode is inefficient, but debugging it took very little time. An inspection of the microcode now reveals that many operations could occur concurrently. In particular, PC_COUNT is often asserted on a cycle of its own, but it can occur regardless of any bus activity. Some instructions could be shortened by a factor of two (with a corresponding increase in speed), and almost any instruction could be shortened by one or two cycles.

Assembler

Because the microprocessor is entirely unique, no assembler was available to generate code for it. A number of generalized assemblers claim to support proprietary hardware, but they lack the desired expressiveness and power. It was therefore necessary to implement an assembler. This assembler is fully generalizable, however, and should be useful to future students, hobbyists, and educators.

EdAsm, the assembler arising from this need, features unlimited constant and macro expansion, variable word size, and support for variable-length instructions. One feature in particular missing from other assemblers was the ability to generate an assembled file for a device whose first byte is not mapped to location zero. For example, in this microprocessor project, the ROM is installed such that ROM address $0000 corresponds to address $C000 on the address bus. Other assemblers required manually deleting the first $C000 bytes of output, among other manual adjustments, while EdAsm supports this through a simple assembler directive.
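The origin offset amounts to subtracting the device's base address from every bus address. The sketch below is not EdAsm's code, and EdAsm's actual directive syntax is not reproduced; `rom_image` is a hypothetical helper, and the 0xFF fill is an assumption (the erased state of EPROM and flash parts).

```python
def rom_image(assembled: dict[int, int], origin: int, size: int,
              fill: int = 0xFF) -> bytes:
    """Build a device image whose byte 0 corresponds to bus address `origin`.

    assembled maps bus addresses to byte values. Unused locations take the
    erased-EPROM value 0xFF (an assumption; any fill value works).
    """
    image = bytearray([fill] * size)
    for address, value in assembled.items():
        offset = address - origin
        if not 0 <= offset < size:
            raise ValueError(f"address ${address:04X} is outside the device")
        image[offset] = value & 0xFF
    return bytes(image)
```

For this project's ROM, `origin=0xC000` and `size=0x4000` produce a 16K image whose first byte is the code assembled for $C000, with no leading $C000 bytes to delete by hand.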

EdAsm is remarkably simple, consisting of four C files: a Table object, a Tokenizer object, the EdAsm assembler itself, and a "scrubbing" utility. The EdAsm assembler uses the Tokenizer to parse the source, and Table objects to keep track of symbol definitions during its two-pass assembly process. The scrub utility is designed as a modular, replaceable back end that reformats EdAsm's intermediate output to suit the user's needs; the current scrub utility generates output in the ".dat" format.

Assembly takes place in five phases:

1. Precompilation. Expansion of macros and constant replacement.
2. Symbol definitions. This constitutes the first pass of assembly, resolving the address of every label used in the source file.
3. Assembly. This is the second assembly pass. The source file is assembled again, this time generating output code using the symbol table generated in step 2.
4. Postcompilation. Simplification of arithmetic expressions.
5. Formatting or "scrubbing". The output is reformatted to meet the needs of the user.
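Phases 2 and 3 (the two passes) can be illustrated with a small two-pass assembler for variable-length instructions. This is not EdAsm's code: the three-mnemonic instruction set is hypothetical, the high-byte-first operand order is an assumption, and the m4-based phases 1 and 4 and the scrubbing phase are omitted.

```python
# HYPOTHETICAL three-instruction set: mnemonic -> (opcode, operand bytes).
INSTRUCTIONS = {"NOP": (0x00, 0), "LDA": (0x10, 1), "JMP": (0x20, 2)}


def _statements(lines: list[str]) -> list[str]:
    """Strip blank lines and surrounding whitespace."""
    return [line.strip() for line in lines if line.strip()]


def assemble(lines: list[str], origin: int) -> tuple[bytes, dict[str, int]]:
    statements = _statements(lines)

    # Pass 1 (symbol definitions): walk the program, summing instruction
    # lengths, and record the address of every label.
    symbols: dict[str, int] = {}
    address = origin
    for stmt in statements:
        if stmt.endswith(":"):
            symbols[stmt[:-1]] = address
        else:
            address += 1 + INSTRUCTIONS[stmt.split()[0]][1]

    # Pass 2 (assembly): emit opcodes and operands, resolving labels
    # (including forward references) from the completed symbol table.
    output = bytearray()
    for stmt in statements:
        if stmt.endswith(":"):
            continue
        mnemonic, *operand = stmt.split()
        opcode, width = INSTRUCTIONS[mnemonic]
        output.append(opcode)
        if width:
            value = symbols.get(operand[0])
            if value is None:
                value = int(operand[0], 0)   # numeric literal such as 0x05
            for shift in reversed(range(width)):   # high byte first (assumed)
                output.append((value >> (8 * shift)) & 0xFF)
    return bytes(output), symbols
```

The first pass is what makes forward references work: in `JMP end` the label is not yet defined when the jump is read, but its address is known by the time the second pass emits the operand.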

The code is very short due to its extensive use of the standard UNIX utility m4, a rich macro-processing language. EdAsm uses m4 to expand macro definitions during precompilation (step 1) and to simplify arithmetic expressions during postcompilation (step 4).

EdAsm may be useful to future students, but was written quickly to get a job done. It consequently lacks several error-reporting functions (although many errors are caught and well reported by m4). The five-phase procedure could doubtless be optimized, but it succeeds in being extremely simple, easy to understand, and easy to modify. Source code is included in Appendix I. Even without optimization, assembling the test and demo code written for this project took only a couple of seconds.

Processor Testing and Validation

The design of the processor unit took a considerable amount of time. Because the subunits are so intricately arranged, and can accomplish virtually nothing on their own, incremental testing was an unrealizable fantasy.

While subsystems were validated independently, to whatever degree possible, the degree of confidence arising from the very limited pool of tests was hardly reassuring. Construction of the circuit continued for days at a time without even applying power to the system.

Of course, the clock circuitry was easily testable. However, the first useful and nontrivial test occurred when the system was fully assembled. The microcode ROMs were outfitted with a small repertoire of commands and the instruction ROM was filled with an infinite loop.

Debugging time had clearly been reduced by a coherent, well-planned architecture. Diagrams like Figure 1 were the first produced and were heavily relied upon during every stage. Surprisingly, the infinite loop code was running remarkably soon after wiring was completed.

At this point, testing could become more careful and reassuringly methodical. Test code was written line by line and verified cycle-by-cycle with the logic analyzer. Lingering bugs were quickly exposed through detailed logic analyzer traces and eliminated. Fortunately, there were very few bugs, but the bugs took a while to overcome due to the time required to reburn the PROMs. A major revision to the microcode would require burning four PROMs. A change to the program ROM required burning the remaining PROM.

Test programs became more and more ambitious, following a progression conceived during wiring. Additional instructions were implemented to test different devices in the microprocessor until every device was tested. The hardware worked; all that remained was debugging the microcode.

The final test consisted of a series of loops, conditional jumps, memory loads and stores, and, as a grand finale, the computation of a Fibonacci number through a simple recursive program. Subroutine invocation was by far the most challenging microcode to write, and once it worked, testing was complete. Integration with the other kit, the video and keyboard systems, took place with only small noise-related problems to be overcome.

Once basic functionality had been achieved, several instructions were added without problems to make coding the demonstration software easier. Attesting to the robustness and simplicity of the system, a change to the ALU adding a right-shift capability, along with instructions to exploit it, was made on the final day without any complications whatsoever.

Recommendations for Future Improvements

Since the construction of an entire microprocessor was relatively untried, the architecture implemented is relatively simple. Given the success of this project, future projects can aim to be more ambitious. Several features in particular are missed and should be present in any future design.

The current ALU's "lazy" carry flag should become active; if the carry flag is set, the result of an addition should be incremented. During subtraction, if the carry flag (acting as a "borrow" flag) is clear, the result should be decremented. This allows arbitrary precision arithmetic without the need for tests of the carry flag and appropriate branching after every byte. Moreover, since the microcode lacks the ability to branch altogether, the microcode is unable to correctly calculate the sum of two numbers if either of the numbers is larger than 8 bits. This adversely affects offset-encoded branches, since the effective address is the sum of the 8-bit offset and the 16-bit program counter. It also eliminates the possibility of the more expressive addressing modes found on many processors such as indirect, indirect by index, etc. Two additional instructions would be necessitated by an active carry flag: set carry and clear carry, so that the condition of the carry flag can be specified in order to avoid undesired incrementing or decrementing.
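The proposed active carry can be sketched as add-with-carry and subtract-with-borrow operations. The borrow convention follows the text (carry clear means "borrow"); the function names are hypothetical, and setting the carry to False before the first byte plays the role of the proposed "clear carry" instruction.

```python
def adc(a: int, b: int, carry: bool) -> tuple[int, bool]:
    """Active-carry add: the incoming carry is added to the result."""
    total = a + b + int(carry)
    return total & 0xFF, total > 0xFF


def sbc(a: int, b: int, carry: bool) -> tuple[int, bool]:
    """Active-carry subtract: carry clear means "borrow", which subtracts
    one more. The returned carry is set when no borrow occurred."""
    total = a - b - (0 if carry else 1)
    return total & 0xFF, total >= 0


def add_multibyte(x: bytes, y: bytes) -> tuple[bytes, bool]:
    """Add two equal-length little-endian numbers one byte at a time."""
    carry = False                      # "clear carry" before the first byte
    result = bytearray()
    for a, b in zip(x, y):
        byte, carry = adc(a, b, carry)  # no test-and-branch between bytes
        result.append(byte)
    return bytes(result), carry
```

The same loop handles any precision, and the microcode could use `adc` to add an 8-bit offset to the two halves of the 16-bit program counter without branching.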

The ALU lacks several operations that could be very useful. While multiplication and division would be extremely useful, it is unlikely that the CPLD could contain all of the necessary logic. A very reasonable compromise would be a set of variable shift instructions that shift an argument by several bit positions at once, effecting multiplication and division by a power of two. The carry flag should be used to indicate overflow or underflow.
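A sketch of the proposed variable shifts follows. Interpreting "overflow" as any set bit shifted out of the top, and "underflow" as any set bit shifted out of the bottom (a truncated quotient), is an assumption of this sketch.

```python
def shift_left(value: int, count: int) -> tuple[int, bool]:
    """Multiply an 8-bit value by 2**count; carry flags lost high bits."""
    shifted = value << count
    return shifted & 0xFF, shifted > 0xFF


def shift_right(value: int, count: int) -> tuple[int, bool]:
    """Divide an 8-bit value by 2**count (truncating); carry flags lost low bits."""
    lost = value & ((1 << count) - 1)
    return value >> count, lost != 0
```

Shifting $03 left by 2 gives $0C (3 × 4) with the carry clear, while shifting $81 left by 1 sets the carry because the top bit falls off.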

Another useful upgrade requiring very little additional work would be to widen the registers to 16 bits. At a minimum, the index registers could become 16 bits wide to allow the extremely useful "index by X" addressing mode found on the 68K family. Much of the architecture, including bus widths, could remain the same; this would necessitate only new microcode and a larger register file (16 entries would suffice). Almost any modification to the current design would require another ROM for control signals, as the current design uses all 24 available control bits.

AT Keyboard Interface

The keyboard is the basic asynchronous link between the user and the computer. Early computers, such as the Apple II and the TRS-80, used custom-mapped keyboards and software polling routines that were often inextensible and confusing.

AT Keyboard Timing

Today, however, most computers use some form of the standard AT keyboard. The AT keyboard transmits data serially in an 11-bit format: one start bit, eight data bits, one parity bit, and one stop bit (see AT Keyboard Timing). The keyboard accepts +5 V and GND from the computer and provides a 10-20 kHz CLK and a single serial DATA line. Because the AT keyboard was designed as a bidirectional device (i.e., capable of accepting messages from its host), the CLK and DATA lines are both open-collector. If the keyboard is used only as an input device, however, this fact can be ignored.
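The 11-bit frame can be checked in software as sketched below: the start bit is 0, the data follows least significant bit first, the parity bit makes the total number of ones in the data and parity odd (the AT keyboard uses odd parity), and the stop bit is 1. The functions are illustrative, not the translator's PAL logic.

```python
def encode_at_frame(code: int) -> list[int]:
    """Bits of one keyboard-to-host frame in transmission order."""
    data = [(code >> i) & 1 for i in range(8)]      # LSB first
    parity = (sum(data) + 1) % 2                     # odd parity
    return [0] + data + [parity] + [1]               # start, data, parity, stop


def decode_at_frame(bits: list[int]) -> int:
    """Check framing and odd parity, then reassemble the scan code."""
    if len(bits) != 11:
        raise ValueError("an AT frame has 11 bits")
    start, data, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
    if start != 0 or stop != 1:
        raise ValueError("framing error")
    if (sum(data) + parity) % 2 != 1:
        raise ValueError("parity error")
    return sum(bit << i for i, bit in enumerate(data))
```

The hardware translator described later does not necessarily check parity; the decoder here does so only to show the full frame format.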

The AT keyboard does not transmit standard ASCII codes in response to keypresses. Rather, it transmits a series of semi-standard scan codes (see Appendix B). When a key is depressed, the 8-bit scan code associated with that key is sent to the host. On most AT keyboards, if the key is held down for a certain length of time, the code is sent repeatedly. When the key is released, the same scan code is sent again, but preceded by the special BREAK scan code, 0xF0.

Pressing "a", for instance, results in the transmission of three scan codes: "a" (0x1C), BREAK (0xF0), and "a" (0x1C). Pressing SHIFT-"a", on the other hand, results in the transmission of six scan codes: SHIFT (0x12), "a" (0x1C), BREAK (0xF0), SHIFT (0x12), BREAK (0xF0), and "a" (0x1C). Typical AT Keyboard Transaction depicts a trace of the latter transaction.

This protocol allows the host to recognize practically every combination of keys on the keyboard. Standard low ASCII, however, utilizes only the CTRL and SHIFT modifiers. Consequently, the algorithm for translating AT keyboard depressions into standard ASCII is particularly straightforward.

Typical AT Keyboard Transaction

AT Keyboard Scan Code to ASCII Translator

The translator itself is a finite state machine (FSM) implemented with two 22V10 PALs and a scan code-to-ASCII ROM. The FSM is clocked by the keyboard itself. Since the CLK line is active for only eleven cycles when a keyboard event occurs and stays high otherwise, the translator draws close to zero power when idle.

AT Keyboard Interface: Block Diagram

When CLK goes low and the start bit is placed on the DATA line, translation begins. The translator uses the first active CLK cycle to reset all of its internal registers. For the next eight cycles, the FSM simply shifts the serial data into its registers; because scan codes are sent LSB first, the shift is to the right. During the tenth cycle, the FSM decides whether SHIFT or CTRL has been depressed and, if so, sets the SHIFT or CTRL flag, respectively. When SHIFT or CTRL is later released, the corresponding flag is cleared. The FSM raises the VALID line for all other key depressions and ignores key releases. The AT Keyboard Finite State Machine Flowchart, as well as the commented fsmc code in Appendix E, provides a more detailed description of this process.
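The translator's decisions can be sketched at the byte level, one scan code at a time, rather than bit by bit as the PALs do. The SHIFT code 0x12 comes from the text; right SHIFT (0x59), left CTRL (0x14), and the 0x1C make code for "a" are scan code set 2 values that should be checked against Appendix B.

```python
from typing import Optional

BREAK = 0xF0
SHIFT_CODES = {0x12, 0x59}   # left and right SHIFT (scan code set 2)
CTRL_CODE = 0x14             # left CTRL (scan code set 2)


class ScanCodeTranslator:
    """Byte-level sketch of the translator FSM: tracks SHIFT/CTRL and
    reports other key presses; all key releases are ignored."""

    def __init__(self) -> None:
        self.shift = False
        self.ctrl = False
        self.after_break = False   # previous byte was the BREAK code

    def feed(self, code: int) -> Optional[tuple[int, bool, bool]]:
        """Return (scan_code, shift, ctrl) when VALID would be raised."""
        if code == BREAK:
            self.after_break = True
            return None
        if self.after_break:                 # this byte names a released key
            self.after_break = False
            if code in SHIFT_CODES:
                self.shift = False
            elif code == CTRL_CODE:
                self.ctrl = False
            return None
        if code in SHIFT_CODES:
            self.shift = True
            return None
        if code == CTRL_CODE:
            self.ctrl = True
            return None
        return code, self.shift, self.ctrl
```

Feeding the six-code SHIFT-"a" sequence produces exactly one VALID event, for "a" with SHIFT set, and leaves both flags clear afterward.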

A 2716 2K x 8 EPROM translates the scan code on the FSM registers to standard ASCII. Technically, the ROM should accept a 10-bit address--eight bits for the scan code, one for the SHIFT flag, and one for the CTRL flag--but all practical keys on the AT keyboard are in the low half of the scan code set. The MSB of the scan code is thus discarded and only nine bits worth of data, or 512 locations, are programmed into the ROM. See Appendix B for detailed mappings of keys to scan codes and scan codes to ASCII.
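The 9-bit ROM address can be formed as sketched below. The bit assignment (scan code in bits 0-6, SHIFT at bit 7, CTRL at bit 8) is hypothetical; the actual wiring is in the keyboard interface schematics. The example entries use the set 2 make code 0x1C for "a".

```python
def translator_rom_address(scan_code: int, shift: bool, ctrl: bool) -> int:
    """9-bit address: scan code bits 0-6, SHIFT at bit 7, CTRL at bit 8
    (hypothetical bit assignment). The scan code MSB is discarded."""
    return (int(ctrl) << 8) | (int(shift) << 7) | (scan_code & 0x7F)


# Programming three entries of a 2716-sized (2K x 8) image for the "a" key,
# using its scan code set 2 make code, 0x1C.
rom = bytearray(2048)
rom[translator_rom_address(0x1C, False, False)] = ord("a")
rom[translator_rom_address(0x1C, True, False)] = ord("A")
rom[translator_rom_address(0x1C, False, True)] = 0x01   # CTRL-A
```

Every address stays below 512, so only a quarter of the 2716 is programmed.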

AT Keyboard Finite State Machine Flowchart

Serial Interface

As in many distributed computer systems, the link between the CPU and the terminal follows the RS-232 standard. This standard specifies bipolar signals of up to ±15 V on the data lines, where a positive voltage corresponds to logical "0" and a negative voltage to logical "1". Like AT keyboard data, RS-232 data is sent serially, with each character framed by a START bit and STOP bits. Unlike the AT protocol, however, RS-232 provides no shared clock: the receiver and transmitter operate on separate physical clocks, so both sides must be configured for the same baud rate.

RS-232 Timing

A pair of AY-3-1015D universal asynchronous receiver/transmitters (UARTs) handle the serial transactions between the CPU and the terminal. Both UARTs are configured for eight data bits, no parity, two stop bits, and 9600 baud operation. MAX233s convert the 5 V logic levels of the AY-3-1015Ds to bipolar levels of roughly ±10 V, sufficient for compliance with RS-232.
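The configuration above fixes the character frame and throughput. This sketch builds the logical bit sequence for one character (before the MAX233 inverts it onto the line) and computes the character rate; the names are illustrative.

```python
def frame_length(data_bits: int = 8, parity: bool = False, stop_bits: int = 2) -> int:
    """Bits per character on the line: start + data + parity + stop."""
    return 1 + data_bits + int(parity) + stop_bits


def rs232_frame(byte: int, stop_bits: int = 2) -> list[int]:
    """Logical bit sequence for one 8N2 character: start (0), data LSB
    first, then stop bits (1). Line voltages are inverted by the driver."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1] * stop_bits


BAUD = 9600
CHARS_PER_SECOND = BAUD / frame_length()   # about 873 characters per second
```

With 11 bits per character, each character takes about 1.15 ms, so 9600 baud carries roughly 873 characters per second, not 9600.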

Terminal UART: Transmission

To transfer a new ASCII code from the keyboard interface to the CPU, a one-shot synchronizer first converts the VALID output of the keyboard interface into a strobe suitable for the DS input of the UART. Because keyboard events occur far less often than the UART can transmit characters at 9600 baud, the TBMT (transmission buffer empty) status signal can be safely ignored. When DS goes low (i.e., when a valid key or key combination is depressed), the UART reads the 8-bit ASCII code on the keyboard interface bus and clocks it serially over the transmission line.
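The exact one-shot circuit is not described here, so the following is a sketch of one common design, assumed for illustration: two flip-flops synchronize VALID to the sampling clock, and a third holds the previous value so that each rising edge produces a single active-low /DS pulse.

```python
def one_shot(valid_samples: list[bool]) -> list[bool]:
    """Per-clock /DS (active low) for a sampled VALID signal.

    Two flip-flops synchronize VALID to the sampling clock; a third holds
    the previous synchronized value so that a rising edge produces /DS low
    for exactly one clock, however long VALID stays high.
    """
    sync1 = sync2 = previous = False
    ds_n = []
    for valid in valid_samples:
        previous, sync2, sync1 = sync2, sync1, valid   # one clock edge
        ds_n.append(not (sync2 and not previous))      # low on a rising edge
    return ds_n
```

The two synchronizing stages add two clocks of latency, which is negligible next to the spacing of keyboard events.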

Terminal UART: Reception

Reception on the terminal side is somewhat more complicated than transmission. Because the terminal's video memory is not continually available, a buffer must store characters that arrive while access to the video memory is disabled. A complex programmable logic device (CPLD), described in more detail under Video Display, handles the interaction between the serial interface and the video display.

CPU UART

The CPU accesses the serial port through memory-mapped I/O, generating three control signals when it requires access: SEND, RECEIVE, and STATUS. The SEND command and the transmission data on the CPU bus are both registered before being placed on the UART bus. The RECEIVE command enables the receive bus on the UART and, one cycle later, resets the UART's internal "data available" (DAV) flip-flop. The STATUS command returns DAV and TBMT (transmission buffer empty) in the most significant bits of the CPU data bus.
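A program would typically poll STATUS before issuing SEND or RECEIVE. In this sketch, callbacks stand in for the memory-mapped loads and stores, and the bit positions of DAV and TBMT are hypothetical, since the text says only that they occupy the most significant bits.

```python
from typing import Callable, Optional

# HYPOTHETICAL bit positions: the text says only that DAV and TBMT occupy
# the most significant bits of the status byte.
DAV_MASK = 0x80
TBMT_MASK = 0x40


def uart_send(read_status: Callable[[], int], write_send: Callable[[int], None],
              byte: int, max_polls: int = 10_000) -> bool:
    """Poll STATUS until the transmit buffer is empty, then issue SEND."""
    for _ in range(max_polls):
        if read_status() & TBMT_MASK:
            write_send(byte)
            return True
    return False   # gave up; a real program might simply keep polling


def uart_receive(read_status: Callable[[], int],
                 read_receive: Callable[[], int]) -> Optional[int]:
    """Return a received byte if DAV is set, otherwise None."""
    if read_status() & DAV_MASK:
        return read_receive()
    return None
```

In the processor itself, each callback would be a single load or store to the address decoded as SEND, RECEIVE, or STATUS.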

Video Display

The MC6847 video display generator (VDG) is a one-chip solution for applications requiring limited graphics capabilities. It features several modes, the most useful of which for a video terminal is its External Alphanumerics mode. In this mode, the display area is divided into 16 rows of 32 characters each. Each character is 8 pixels wide by 12 pixels high (see VDG Display Area Timing).
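The mode's dimensions fit together neatly, which is why a 512-byte video RAM is enough. The sketch below works through the arithmetic and also maps a (row, column) cell to a RAM address; the row-major layout and the function name are assumptions chosen to match a display swept left to right and top to bottom.

```python
# External Alphanumerics mode geometry, as given in the text.
COLS, ROWS = 32, 16          # characters per row, rows per screen
CHAR_W, CHAR_H = 8, 12       # pixels per character

width_px = COLS * CHAR_W     # 256 pixels across
height_px = ROWS * CHAR_H    # 192 active lines
cells = COLS * ROWS          # 512 cells -> one byte each fills a 512 x 8 RAM


def cell_address(row: int, col: int) -> int:
    """Hypothetical 9-bit video RAM address of a cell, assuming row-major order."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("cell outside the 32 x 16 display")
    return row * COLS + col   # equivalently (row << 5) | col
```

Because 32 is a power of two, the row number simply occupies the top four of the nine address bits.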

Display System Block Diagram

Block diagrams of the display system are shown in Display System Block Diagram and Display Output Circuit Block Diagram. In the scheme of Display System Block Diagram, a 512 × 8 RAM stores the state of the screen, one byte per character cell. The RAM holds standard ASCII codes, which are used to address a 4K × 8 character generator ROM. This ROM contains the printable representation of every ASCII character and generates the actual patterns written to the screen. A counter provides the low 4 bits of the ROM address, while the ASCII code from the RAM supplies the upper 8 bits. The counter is synchronized to the VDG so that it selects the proper row of pixels as the VDG sweeps the screen.
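Since 4K = 2^12, the ROM address splits into 8 bits from the RAM byte and 4 bits from the row counter, so each character owns a 16-byte slot. A minimal sketch of that address formation (the function name is hypothetical, and the RAM byte is assumed to drive the upper address lines):

```python
def rom_address(ascii_code: int, line: int) -> int:
    """12-bit character-generator ROM address: code on A11-A4, line on A3-A0."""
    if not 0 <= ascii_code <= 0xFF:
        raise ValueError("RAM data is one byte")
    if not 0 <= line <= 0xF:
        raise ValueError("row counter is 4 bits")
    return (ascii_code << 4) | line
```

For example, the twelve pixel rows of "A" (0x41) would live at ROM addresses 0x410 through 0x41B, with 0x41C to 0x41F left unused.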

Pixel Representation of ASCII Character "A"

On the output side, the VDG supplies vertical blanking pulses, which must be converted to the vertical sync required by RGB monitors. Its analog video outputs (luminance Y and chroma φA and φB) must also be converted to digital R, G, and B signals.

Display Input

A complex programmable logic device (CPLD) is used to control the flow of data from the serial UART to the VDG. A microprogrammed control unit (MCU) could also conceivably be used, but the large number of inputs and outputs needed to address the RAMs makes the CPLD the preferred choice.

The block diagram in Display System Block Diagram illustrates the basic idea (see Appendix F for a detailed schematic). The VDG requires exclusive access to the RAM while FS is high (see Field Sync Timing), approximately 14 out of every 17 ms. Because the serial UART may pass data at up to 9600 baud, incoming data must be buffered for the time that the RAM is locked: 9600 bit/s × 0.014 s ≈ 134 bits, or about 12 characters at 11 bits per character. A 128-character buffer is therefore ample and is implemented with a 6264 static RAM. When FS goes low, the buffer must be emptied into the RAM (see note 1); for the rest of the time that the RAM is not in use by the VDG, the UART may put data onto the bus as soon as it arrives. The CPLD manages the enable signals of the UART and the buffer so that tri-state bus contention is avoided.
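The buffering policy can be summarized as a behavioral model. This is a hypothetical sketch, not the CPLD code in Appendix D: each call to `step()` stands for one CPLD clock, the bus-enable and tri-state details are omitted, and raising an error on overflow is an assumption (the text does not say what happens if the 128-character buffer fills).

```python
from collections import deque
from typing import Optional

BUFFER_DEPTH = 128


class DisplayInputModel:
    """Behavioral sketch of the CPLD's buffer-or-bypass policy.

    `fs` is the VDG field-sync level (1 = VDG owns the RAM) and `incoming`
    is a character the UART has ready on this clock, or None. step() returns
    the character written to video RAM on this clock, or None.
    """

    def __init__(self) -> None:
        self.buffer = deque()   # stands in for the 6264-based character buffer

    def step(self, fs: int, incoming: Optional[int]) -> Optional[int]:
        if incoming is not None and (fs or self.buffer):
            # Buffer the character if the RAM is locked, or if older characters
            # are still waiting, so that display order is preserved.
            if len(self.buffer) >= BUFFER_DEPTH:
                raise OverflowError("character buffer full")
            self.buffer.append(incoming)
            incoming = None
        if fs:
            return None                    # VDG owns the RAM: no writes
        if self.buffer:
            return self.buffer.popleft()   # drain the oldest buffered character
        return incoming                    # buffer empty: pass straight through
```

Draining one character per clock mirrors the assumption behind the timing estimate in note 1.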

Field Sync Timing

Besides buffer management, the CPLD performs two other functions: cursor position management and special-character recognition. It keeps track of the cursor position in its "row" and "col" internal registers (see Appendix E). When a regular, printable character arrives, the CPLD writes it into RAM at the cursor position and increments "col". Control characters such as carriage return (Ctrl-M, 0x0D), line feed (Ctrl-J, 0x0A), and backspace (Ctrl-H, 0x08) are also handled: carriage return returns the cursor to the beginning of the line, line feed moves the cursor down one line, and backspace moves the cursor one column to the left.
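A sketch of this cursor bookkeeping follows. The text does not say how the screen edges are handled, so wrapping at column 32 and row 16, clamping backspace at column 0, ignoring other control characters, and the row-major RAM layout are all assumptions, as is the `Cursor` class itself.

```python
COLS, ROWS = 32, 16
CR, LF, BS = 0x0D, 0x0A, 0x08


class Cursor:
    """Hypothetical model of the CPLD's "row"/"col" registers and video RAM."""

    def __init__(self) -> None:
        self.row = 0
        self.col = 0
        self.ram = bytearray(COLS * ROWS)   # stands in for the 512 x 8 video RAM

    def put(self, code: int) -> None:
        if code == CR:
            self.col = 0                        # back to the start of the line
        elif code == LF:
            self.row = (self.row + 1) % ROWS    # down one line (wrap assumed)
        elif code == BS:
            self.col = max(self.col - 1, 0)     # left one column (clamp assumed)
        elif 0x20 <= code <= 0x7E:              # printable ASCII
            self.ram[self.row * COLS + self.col] = code
            self.col += 1
            if self.col == COLS:                # wrap to next line (assumed)
                self.col = 0
                self.row = (self.row + 1) % ROWS
        # Any other control character is ignored in this sketch.
```

Feeding it `b"Hi\r\nA"` leaves "H" and "i" in cells 0 and 1, "A" in cell 32, and the cursor at row 1, column 1.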

The flowchart in CPLD Process Flowchart and the commented Verilog hardware description language (HDL) code in Appendix D describe the process in more detail.

CPLD Process Flowchart

Display Output

When the CPLD is not updating the video RAM, the VDG reads it and draws its contents on the screen. As described earlier, an external character generator ROM first converts the ASCII data in the RAM to pixel data.

VDG Display Area Timing

The VDG draws 16 × 12 = 192 horizontal lines from top to bottom. It asserts HS at the end of every line, RP every 12 lines, and FS every 192 lines. One way to address the ROM is therefore to tie HS to the row counter's CLK input, RP to its CLR input, and FS to its LD input. Because the counter is cleared after every 12 rows, each character occupies a 16-byte slot in the ROM of which only 12 bytes are used, leaving 16 - 12 = 4 bytes of padding per character. Loading the counter with 9 after every screenful ensures that it starts at 0 once the borders, blanking, and retrace have been drawn (a total of 70 extra lines during which RP is not asserted; 15 - (70 mod 16) = 9).
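The effect of clocking the counter with HS and clearing it with RP can be checked with a short simulation over the 192 active lines. The sketch assumes the counter already reads 0 at the first active line (which the LD arrangement is meant to guarantee) and that CLR overrides counting when RP and HS coincide; the blanking interval itself is not modeled, and the function name is hypothetical.

```python
def row_counter_trace(active_lines: int = 192, rp_period: int = 12) -> list[int]:
    """Value on the ROM's low 4 address bits while each active line is drawn."""
    counter = 0
    trace = []
    for line in range(active_lines):
        trace.append(counter)                       # value used for this line
        rp = (line + 1) % rp_period == 0            # RP after every 12th line
        counter = 0 if rp else (counter + 1) % 16   # HS clocks; CLR wins on RP
    return trace
```

The result is the sequence 0 to 11 repeated 16 times, so ROM offsets 12 to 15 of each character slot are never selected during the active display, which is why they can be left as padding.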

After reading the pixel data from the character ROM, the VDG outputs an analog luminance signal (Y) and two analog chroma signals (φA and φB). Without the MC6847's companion device, the MC1372 Chroma/RF Modulator, the chroma and sync signals of the MC6847 must be decoded manually into digital red, green, and blue channels for use with an RGB monitor. Some calibration of the comparison voltages in Appendix F is generally necessary. The values and logic equations used to decode Y, φA, and φB into R, G, and B are listed in Table 1 and in Appendix F.

Display Output Circuit Block Diagram

Chroma Decoding Table
COLOR	Y (V)	φA (V)	φB (V)	L	Ah	Al	Bh	Bl	R	G	B
GRN	0.687	1.413	1.403		0	1	0	1	0	1	0
YEL	0.641	1.586	1.389	1	0	0	0	1	1	1	0
BLU	0.732	1.574	1.752		0	0	1	0	0	0	1
RED	0.732	1.762	1.564		1	0	0	0	1	0	0
BUFF	0.641	1.588	1.578	1	0	0	0	0	1	1	1
CYAN	0.687	1.399	1.576		0	1	0	0	0	1	1
MAG	0.687	1.758	1.748		1	0	1	0	1	0	1
ORN	0.687	1.763	1.382		1	0	0	1	0	0.5	0
BLK	0.764	1.589	1.579	0	0	0	0	0	0	0	0
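The table translates directly into a decoder. In this sketch the comparator thresholds are hypothetical values placed between the measured levels in the table (the actual comparison voltages are set in Appendix F and need calibration), and chroma combinations absent from the table are assumed to decode to black.

```python
# Hypothetical thresholds (volts) chosen between the measured table values.
PHI_HIGH = 1.68   # chroma above this sets the "h" comparator
PHI_LOW = 1.49    # chroma below this sets the "l" comparator
Y_BRIGHT = 0.66   # luminance below this sets L

# (Ah, Al, Bh, Bl) -> (R, G, B), transcribed from the decoding table.
RGB_BY_CHROMA = {
    (0, 1, 0, 1): (0, 1, 0),     # GRN
    (0, 0, 0, 1): (1, 1, 0),     # YEL
    (0, 0, 1, 0): (0, 0, 1),     # BLU
    (1, 0, 0, 0): (1, 0, 0),     # RED
    (0, 1, 0, 0): (0, 1, 1),     # CYAN
    (1, 0, 1, 0): (1, 0, 1),     # MAG
    (1, 0, 0, 1): (0, 0.5, 0),   # ORN, as listed in the table
}


def comparators(y: float, phi_a: float, phi_b: float) -> tuple[int, int, int, int, int]:
    """Return the (L, Ah, Al, Bh, Bl) comparator outputs for one sample."""
    return (
        int(y < Y_BRIGHT),
        int(phi_a > PHI_HIGH), int(phi_a < PHI_LOW),
        int(phi_b > PHI_HIGH), int(phi_b < PHI_LOW),
    )


def decode_rgb(y: float, phi_a: float, phi_b: float) -> tuple:
    l, ah, al, bh, bl = comparators(y, phi_a, phi_b)
    key = (ah, al, bh, bl)
    if key == (0, 0, 0, 0):
        # No chroma offset: BUFF and BLK differ only in luminance.
        return (1, 1, 1) if l else (0, 0, 0)
    return RGB_BY_CHROMA.get(key, (0, 0, 0))   # unlisted combinations: black
```

Only BUFF and BLK share a chroma signature, which is why the L comparator matters for just those two rows.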

1. A CPLD operating at 1.8432 MHz, transferring one character per clock cycle, takes only 128 / 1,843,200 Hz ≈ 69 µs to empty an entire 128-character buffer. At full 9600 baud, each bit lasts 1/9600 s ≈ 104 µs, so a complete 11-bit character takes about 1.15 ms to arrive, making it unlikely for the CPLD to miss even one character.