Dave wrote:
Is there an art to changing the clock speed of a running CPU? As the address decoder already knows when we're accessing the 8-bit space, dropping the clock from [FAST] to [STOCK] needs to be instant.
My 68EC020FG25 runs happily at almost twice the listed 25 MHz.
Don't even try switching the clock of a running CPU on the fly; it's quite a nightmare. It can be done, but only if the two speeds are whole-number multiples of each other, or if the clock is stopped for a whole number of cycles during the switch. Switching between asynchronous clocks with simple logic and 100% reliability is near impossible. But you are not supposed to do this anyway; this is why the DTACKL signal is provided on the CPU!
On a standard QL, the CPU actually can run asynchronously with the rest of the system, but only at slower, or fractionally faster, speeds.
As you say, the 8301 is the real bottleneck. There are differences in how the various versions (ZX8101/CLAxxxx) react, but normally you can run the CPU asynchronously at some 9.5MHz reliably; around 10MHz it breaks down. How far you can go mainly depends on the PCB issue: on older ones the ZX8302 access is handled by the ZX8301, and a slowdown may apply depending on the state of screen RAM access, while on later issues the HAL chip decodes it and it runs at full bus speed, asynchronously. The ZX8302 itself can actually be accessed at speeds even faster than 10MHz.
The ZX8301 (and CLA counterparts) do not actually rely on the CPU clock, but rather only on DSL and to an extent RDWL. That being said, the internal logic knows some things about the CPU to make the accesses a bit faster.
The basic way the 68008 accesses anything is: generate the address (and, on a write, the data), pull DSL low to signal the system that the address and data are stable, then wait for the system to pull DTACKL low to signal it has finished reading or writing as the CPU requested.
There is some delay before the CPU internally synchronizes and recognizes the DTACKL signal; on a read, the CPU will then latch in the data on the bus at the next clock. There is a specified data set-up time which requires that the read data is stable some nanoseconds before that particular latching clock edge, to guarantee the CPU reads correct data. The 8301 knows how long the CPU needs to recognize the DTACKL signal, and from that point it has a whole clock cycle, less the set-up time, to provide data. Since it knows exactly when the data will arrive from the RAM (it is the device that controls the RAM), it pulls DTACKL low before the data is actually present on the bus, to avoid adding a further one-clock delay to an access that is already slowed down a lot.
However, when the CPU is run at a faster clock, the ULA still assumes it has one cycle of the slower clock, less the set-up time, in hand. The set-up time is rather generous, so the ULA still provides data in time even when the CPU is somewhat faster and running ahead - but at some point, as the CPU speed is increased, the data starts arriving too late, the CPU reads wrong data, and the system crashes. A similar thing happens on writes, except there the CPU removes the data too quickly, violating another parameter called 'hold time', which specifies how long written data must remain on the bus after DTACKL is recognized and before the next access is started.
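To get a feel for the numbers, here is a back-of-envelope sketch of the read margin described above. Apart from the 7.5MHz stock clock, all figures are illustrative assumptions of mine, not measured ZX8301 parameters; the point is only that a fixed data delay, calibrated for the stock clock, eats into a margin that shrinks with the CPU clock period.

```python
# Rough model of the read set-up margin. DATA_DELAY_NS and T_SETUP_NS
# are assumed/illustrative values, not measured ZX8301 figures.

STOCK_CLK_MHZ = 7.5    # standard QL 68008 clock (the one firm number)
DATA_DELAY_NS = 80.0   # assumed: data valid this long after DTACKL asserts
T_SETUP_NS    = 15.0   # assumed CPU data set-up requirement

def read_margin_ns(cpu_clk_mhz):
    """Positive margin: data is stable early enough before the CPU's
    latching edge, one CPU clock after DTACKL is recognised."""
    t_cpu = 1e3 / cpu_clk_mhz          # CPU clock period, ns
    return t_cpu - T_SETUP_NS - DATA_DELAY_NS

for mhz in (7.5, 9.5, 10.0, 11.0):
    print(f"{mhz:5.1f} MHz: margin {read_margin_ns(mhz):+6.1f} ns")
```

With these made-up numbers the margin goes negative just above 10MHz, which is at least consistent with where real machines stop working.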
On the address side of things, the ULA expects a stable address when DSL goes low, indicating the start of a bus access cycle (ASL is not used on the QL; there is a subtle difference, but let's just say a stable address is guaranteed when DSL goes low). This is what tells the address decoding inside the ZX8301 it's time to decode an address. As with data, there is also a set-up time for addresses, i.e. the length of time they must be stable before DSL goes low, as well as a hold time, which is the length of time the address lines must remain stable after DSL goes high, indicating the end of the current cycle.
So, what happens when a faster CPU is connected to the ZX8301?
First, because everything is scaled to the CPU clock, the address set-up time becomes shorter, and the address lines may not be stable long enough before DSL goes low. Hold time does not seem to apply as strictly; the ULA appears to expect a very short hold time for addresses (which usually means the address is internally latched on the falling edge of DSL). The solution to this problem is to generate a bus DSL line from the actual CPU DSL, delaying the bus line's going low relative to the real one; on the other hand, when the real DSL goes high, so should the bus version, immediately. The effect is that the CPU supplies the address (and, on a write, the data) on the bus, but the devices on the bus only see DSL going low after the delay. In other words, the set-up time is increased by the added delay.
Second, when the system returns DTACKL - assuming it does so 'in advance' of the actual data - the assumption no longer holds, because the faster CPU means less advance. The system will provide DTACKL too quickly: on a read, the data will appear too late and random bus hash will be read instead, and on a write, the CPU will remove the data before it's actually written into the external device. The solution is to create a CPU DTACKL line from the actual bus DTACKL line, so that when the bus DTACKL goes low, there is a delay before the CPU DTACKL goes low; when bus DTACKL goes high, however, the CPU DTACKL goes high with no delay. This way the delay in DTACKL going low compensates for the assumed advance, so the CPU does not recognize bus DTACKL too quickly.
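Both fixes are the same trick: delay one edge of an active-low signal and pass the other edge through immediately. A minimal cycle-level sketch (my own illustration, not the GC/SGC schematic) of such an asymmetric delay, modelled as a shift register sampled at the fast CPU clock:

```python
from collections import deque

def delay_falling_edge(samples, n_delay):
    """Per-clock samples of an active-low signal (1 = high, 0 = low).
    The output goes low only after the input has been low for n_delay
    clocks; it goes high again immediately with the input."""
    history = deque([1] * n_delay, maxlen=n_delay)  # shift register
    out = []
    for s in samples:
        # Assert (low) only if low now AND low for the last n_delay
        # clocks; deassertion (high) propagates at once.
        out.append(0 if s == 0 and all(v == 0 for v in history) else 1)
        history.append(s)
    return out

# Input falls at clock 2; the output falls two clocks later, but the
# rising (deasserting) edge at clock 7 passes straight through.
print(delay_falling_edge([1, 1, 0, 0, 0, 0, 0, 1, 1], 2))
# -> [1, 1, 1, 1, 0, 0, 0, 1, 1]
```

Applied to DSL this stretches the address set-up time as seen by the bus; applied to DTACKL it restores the advance the ULA assumes.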
Now, it should be noted that the original 68008 has around 40ns set-up time for addresses, and for data on write, but because DSL trails ASL on write by a full clock cycle, the address set-up time on write is very large - so the shorter of the two is what matters. At 20MHz the set-up time is much shorter, around 10ns. That being said, it seems the ZX8301 does its own internal recognition of DSL, and maybe even synchronization to the clock, so it may tolerate even zero set-up time. From what I remember from discussions with Stuart H of Miracle Systems, slowing down the DSL signal may not be needed. However, signal integrity is still a concern - although no real set-up time may be needed, it may take some time for the signals to stabilize on the actual bus lines, so at some point, as the CPU clock goes up, a delay may become necessary. As far as I know, none was used on the SGC, which has a 68EC020 running at 24MHz.
On the other front, members of the 68k family require no hold time on read, and offer about half a clock of hold time on write. In most cases, devices on the bus expect data to be stable when DSL goes low, as this is when they latch it on a write; on read, the CPU needs no hold time after DSL goes high. These conditions should already be satisfied by the fact that DSL is not used directly but through a decoder, so the time the decoder takes to decode is added to the set-up and hold times. This means that delaying only the falling edge of DTACKL may be enough to get a fast CPU talking to a ZX8301. As far as I remember, this is the method the GC and SGC use - I am not sure exactly, but I seem to remember something like a 7 clock cycle delay at 24MHz?
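Taking that half-remembered figure at face value, a quick conversion shows what a 7-clock delay at 24MHz amounts to in stock-QL terms (the 7.5MHz stock clock is the only firm number here):

```python
# Convert the remembered ~7-clock DTACKL delay on a 24 MHz SGC into
# nanoseconds and stock-QL clock periods. The 7-clock figure is quoted
# from memory above, so treat the result as a rough sanity check.

SGC_CLK_MHZ   = 24.0
STOCK_CLK_MHZ = 7.5
DELAY_CLOCKS  = 7

delay_ns = DELAY_CLOCKS * 1e3 / SGC_CLK_MHZ
print(f"{delay_ns:.1f} ns ~= "
      f"{delay_ns * STOCK_CLK_MHZ / 1e3:.2f} stock clock periods")
```

So the delay buys a bit over two stock clocks of extra margin, which is in the right ballpark for an advance that was originally scaled to one stock clock plus decoder and bus delays.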
SGC also buffers the CPU lines and only enables the buffers on the QL side when the relevant addresses are actually accessed. This is because you do not want the fast changing signals when fast RAM or similar is addressed to leak to the QL bus.
When this sort of thing is done, one must take care to include the buffer delays in the calculation, or at least buffer all the lines the same way (including DSL) so that relative timing remains unchanged. If the lines are not buffered the same, we are back to the possibility that a delay in the bus version of DSL with respect to the CPU DSL may become necessary.