In the previous long post, a signal trace was shown of some CPU and 8301 signals, and this trace shows one of the 40 timing chunks the 8301 generates for each line of display it sends to the screen. In actual fact, it shows one of the 32 'active' chunks during which the 8301 is reading data from RAM that will be output as visible pixels.
The next picture shows a trace of one of the equivalent timing chunks that occur when the 8301 is generating completely blank lines, in our case lines 256 to 311 of the 312 (0 to 311) that make up the whole screen. It's actually a very interesting one, as will be shown soon.
Just for a minute, let us look at the first picture, namely at the state of the address lines shown on the trace.
There are only 4, because this is basically enough to figure out what part of the motherboard hardware is being addressed.
* Note, the traces were taken on a 'bare' QL.
So, the 8301 disregards A18 and A19 from the CPU which means it reduces the address map to 256k and this is then repeated 4 times within the 1M total.
The decoding table is actually very simple:
A17:A16=00 and /WR=1 decodes the ROM. ROMOEH goes high and /DTACK is a copy of /DS which makes the CPU perform the fastest possible access in ROM.
A17:A16=00 and /WR=0 just generates /DTACK as a copy of /DS which makes the CPU perform a write cycle that effectively does nothing.
A17:A16=01 accesses the IO area. There are some differences between ULA versions it seems. There are only a few addresses used but the OS will use $18000 as the base address so, for this one also A15=1. It also uses A6 (and on some ULA versions A5) to select control registers within 8302 (A6=0), or 8301 (A6=1 or A6:A5=11).
When the 8302 is selected, the pin /PCENL on the 8301 also goes low; this is the chip select output that drives the equally named input pin on the 8302.
A17=1 accesses the RAM, and when A17=1 and A16=0, /CAS0 will be activated, when A16=1, /CAS1 will be activated, i.e. A16 selects the RAM bank.
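The decoding table above can be written down as a small lookup, which makes it easy to check an address against a trace. This is only a sketch of the rules as described in this post, not of the real logic inside the 8301: the function name and the returned labels are made up for illustration, only the A6 rule is modelled for the IO area (the A5 variant and the A15 detail are left out), and `write=True` stands for /WR=0.

```python
def decode_8301(addr: int, write: bool) -> str:
    """Classify a CPU address the way the decoding table in the post describes.

    Hypothetical helper: the labels are illustrative, not signal names.
    """
    # A19 and A18 are ignored, so the 256K map repeats 4 times within 1M.
    addr &= 0x3FFFF
    a17 = (addr >> 17) & 1
    a16 = (addr >> 16) & 1
    a6 = (addr >> 6) & 1

    if a17:
        # A17=1 is RAM, A16 picks the bank.
        return "RAM /CAS1" if a16 else "RAM /CAS0"
    if a16:
        # A17:A16=01 is the IO area, A6 picks the chip (A5 variant not modelled).
        return "IO 8301" if a6 else "IO 8302"
    # A17:A16=00 is ROM; a write only gets /DTACK and does nothing.
    return "ROM write (ignored)" if write else "ROM read"


if __name__ == "__main__":
    for a, w in [(0x00000, False), (0x18000, True), (0x18063, True),
                 (0x20000, False), (0x30000, False), (0x60000, False)]:
        print(f"${a:05X} write={w}: {decode_8301(a, w)}")
```

The last address in the list, $60000, lands on the same RAM bank as $20000, which is the mirroring caused by ignoring A18 and A19.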
If we look at the previous trace, we can see that the CPU is addressing the first 32k of the RAM (A17:A15=100), i.e. screen 0.
In the picture above, we see something curious - it is addressing the IO area with A6=0, which are the registers inside the 8302, and the /WR signal being low tells us that it is writing data.
Also, we can see CSYNCH is high, and there is an even more curious thing within the portion of the access chunk that is normally used to read screen data: only /RAS goes low there, with no /CAS signal. If we look into the RAM datasheet, this means no data is transferred at all (which is OK since there is nothing to display, these lines are all black) and the RAM is actually being refreshed. So, the CPU is writing a value into the 8302 on an idle QL during vertical retrace. What it is actually doing is servicing the frame interrupt, which occurs every time a vertical retrace begins and is generated from the VSYNCH signal.
The trace however immediately poses two questions:
1) Why is the 8301 using 8 clock cycles just to refresh the RAM when it could use just 4, freeing twice the usual time for CPU access? And, further (though not shown in the diagram) why does it do it for every one of the 32 chunks that would normally be used for visible pixels in a line?
2) What does the 8302 have to do with this, as the 8301 is obviously letting data onto the RAM data bus (/TXOE=0) even though the RAM is not using it?
Well, let us start with the easy one first - and that's question (2).
If I could have squeezed the signal /PCEN (or PCENL as it is written on the QL schematic) into the trace, it would be shown going low at the same time /TXOE goes low, and going back high a bit before /TXOE goes high. In other words, the 8302 is being selected.
Well, the explanation is that the 8302 was connected to the RAM data bus on all issues of the QL motherboard up to 5 (or in any case up to whichever one has the HAL chip on it). The 8301 represents one more small load on the bus along with 16 already existing loads on it. On the CPU side, there are two ROMs and the data buffer chip on the data bus, so connecting the 8302 there would really have caused no problem at all. However, it appears that the logic inside the 8301 counts on the 8302 being connected to its side of the data bus, and the only logical explanation is that these two started as a single design, with a single internal data bus - and the logic was just split without correcting it for the new situation.
And yes - this implies that accessing the 8302 in this case does incur the same slowdown as accessing RAM.
In later versions the 8302 data bus was connected directly to the CPU data bus, but the internal decoding logic of the 8301 had to be circumvented by a piece of decoding logic inside the HAL chip.
Now, question (1) has an easy and a more complex answer.
The easy part is that the logic was made the simplest possible, though... I think Albert Einstein is credited with a saying that goes like this: things should be as simple as possible but not simpler than that. And this may be an example of 'simpler than that', but without knowing how complicated the logic inside of the 8301 is, it is difficult to say for sure. If it was a CPLD or FPGA it would have needed very little added logic to free 8 out of 12 total cycles during vertical retrace, so the access to RAM would have been twice as fast. In the grand scheme of things, the difference is not massive, but a small calculation will reveal it is still just a bit more than what one would get by upping the CPU clock to the full 8MHz.
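For anyone who wants to try that small calculation, here is one rough way to set it up. It counts the clock cycles per frame in which the CPU could get at the RAM, which is not the same thing as measured speed. The 40 chunks, 32 active chunks, 12 cycles per chunk and 312/56 line counts come from these posts; the 7.5MHz stock CPU clock is the standard QL figure; how many cycles the CPU gets in the 8 non-active chunks of a line is not stated here, so it is a parameter and both extremes are tried. All names are made up for illustration.

```python
CHUNKS_PER_LINE = 40    # timing chunks per display line
ACTIVE_CHUNKS = 32      # chunks that normally fetch pixel data
CYCLES_PER_CHUNK = 12   # clock cycles in one chunk
TOTAL_LINES = 312       # lines per frame
BLANK_LINES = 56        # lines 256..311, nothing displayed


def cpu_cycles_per_frame(blank_active_cpu_cycles: int,
                         idle_chunk_cpu_cycles: int = CYCLES_PER_CHUNK) -> int:
    """Cycles per frame in which the CPU may access RAM.

    blank_active_cpu_cycles: CPU share of an 'active' chunk on a blank line
        (4 as the 8301 behaves now, 8 if refresh only took 4 cycles).
    idle_chunk_cpu_cycles: ASSUMED CPU share of the 8 non-active chunks.
    """
    idle = (CHUNKS_PER_LINE - ACTIVE_CHUNKS) * idle_chunk_cpu_cycles
    visible_line = ACTIVE_CHUNKS * 4 + idle
    blank_line = ACTIVE_CHUNKS * blank_active_cpu_cycles + idle
    return (TOTAL_LINES - BLANK_LINES) * visible_line + BLANK_LINES * blank_line


# Gain from a shorter refresh slot, under two assumptions for the idle chunks.
gain_idle_full = cpu_cycles_per_frame(8, 12) / cpu_cycles_per_frame(4, 12)
gain_idle_shared = cpu_cycles_per_frame(8, 4) / cpu_cycles_per_frame(4, 4)

# Gain from simply clocking the CPU at 8MHz instead of 7.5MHz.
clock_gain = 8.0 / 7.5

if __name__ == "__main__":
    print(f"shorter refresh, idle chunks all CPU: x{gain_idle_full:.3f}")
    print(f"shorter refresh, idle chunks shared:  x{gain_idle_shared:.3f}")
    print(f"8MHz clock instead of 7.5MHz:         x{clock_gain:.3f}")
```

Under either assumption the gain from the shorter refresh slot comes out above the gain from the faster clock, which is in line with the claim above.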
The more complex part has to do with refreshing of the RAM. As I mentioned before, the QL uses DRAM and it requires refresh. The requirement is that all 256 rows should be refreshed (by one of several methods on offer) or at least guaranteed to be read, every 2ms. In other words, the complete set of row addresses must be cycled through at least once every 2ms. Rather than perform explicit refresh, the 8301 relies on the sequential reading of the RAM when it's doing screen data reads, to go through all of the row addresses often enough. In the previous long post I have shown that there are a total of 56 unused lines, and since all 312 take roughly 20ms (19.96 to be exact), 56 take about 3.6ms, which is longer than 2ms, so reading or explicit refresh must not stop during the invisible lines or the RAM will lose data.
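The arithmetic behind that last point can be checked in a few lines. The numbers are the ones quoted in the post (19.96ms per frame, 312 lines, 56 blank lines, 2ms refresh limit); the variable names are just for illustration.

```python
FRAME_MS = 19.96         # time for one complete frame
TOTAL_LINES = 312        # lines per frame (0 to 311)
BLANK_LINES = 56         # lines 256 to 311
REFRESH_LIMIT_MS = 2.0   # every row must be visited at least this often

line_ms = FRAME_MS / TOTAL_LINES              # duration of one line
blank_ms = BLANK_LINES * line_ms              # duration of the blank region
lines_within_limit = REFRESH_LIMIT_MS / line_ms  # lines that fit in 2ms

if __name__ == "__main__":
    print(f"one line:           {line_ms * 1000:.1f} us")
    print(f"56 blank lines:     {blank_ms:.2f} ms")
    print(f"lines within 2 ms:  {lines_within_limit:.1f}")
    print("refresh needed during blank lines:", blank_ms > REFRESH_LIMIT_MS)
```

So the blank region lasts well beyond the 2ms limit, and a full pass through the row addresses has to fit within roughly 31 lines' worth of time.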