Peter wrote:Nasta wrote:Even at 20MHz with an 8-bit bus it should work around GC speed, if a decent memory controller is implemented.
At first sight this looks wrong to me. I have not actually measured timing on the GC, but it has 80 ns RAS access time DRAM chips, so I would guess DRAM cycle time around 150 ns, without using page mode. If we assume the real cycle time 30% slower, the GC would be under 200 ns cycle time for 16 Bit.
Because of the 8 Bit bus, the 68SEC000 would need 100 ns cycle time, which is two 20 MHz clock cycles, to break even. But it has minimum 4 clock cycles!
Well, I think the GC DRAM timing is based on a 24MHz clock. On my GC (some time ago, as I traded it in for an SGC), on the SGC and on the QXL the same 80ns DRAM was used, so let's say roughly 2x that for cycle time. No page mode, the logic would never fit into the CPLD. At 24MHz the ideal timing would be 4 clock cycles, as that's around 166ns, but it would require using half-clocks (I don't think the CPLD was capable of dual edge triggering of synchronous logic), so I am thinking more along the lines of 5 or 6 cycles, roughly 208 or 250ns. The latter is exactly 4 clock cycles of the 16MHz CPU and is synched to it, so that seems to be the winner. In any case, we could say it just manages near 0 wait states at 16MHz, but it also has a 16-bit bus.
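To put numbers on those guesses, here is a quick Python sketch of the arithmetic. The 24MHz controller clock and the candidate cycle counts are just the assumptions from this post, not measured GC timing, and the names (cycle_ns and the constants) are made up for illustration.

```python
# Sketch only: the 24MHz controller clock and the candidate cycle counts are
# guesses taken from the post, not measured Gold Card timing.

def cycle_ns(clock_mhz, clocks):
    """Duration in ns of `clocks` periods of a clock running at `clock_mhz`."""
    return clocks * 1000.0 / clock_mhz

DRAM_CLOCK_MHZ = 24.0  # assumed GC DRAM controller clock
CPU_CLOCK_MHZ = 16.0   # GC CPU clock
CPU_BUS_CLOCKS = 4     # minimum 68000 bus cycle, i.e. zero wait states

cpu_cycle = cycle_ns(CPU_CLOCK_MHZ, CPU_BUS_CLOCKS)
print(f"CPU bus cycle at {CPU_CLOCK_MHZ:g}MHz: {cpu_cycle:.1f} ns")

for clocks in (4, 5, 6):
    dram_cycle = cycle_ns(DRAM_CLOCK_MHZ, clocks)
    note = " (same length as the CPU bus cycle)" if dram_cycle == cpu_cycle else ""
    print(f"{clocks} clocks at {DRAM_CLOCK_MHZ:g}MHz: {dram_cycle:.1f} ns{note}")
```

Only the 6-clock case lines up with a zero wait 68000 bus cycle at 16MHz, which is why it looks like the likely choice.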
However, when compared to a 68008 at zero wait states, the 68000 is not 2x faster at the same clock. The reason is that it's not as sophisticated as the 020: internal processing (effective address calculations etc.) can take quite a lot of cycles, and this is exactly the same on the 68008 as on the 68000, as the internal architecture and bus widths are the same, while the architecture is not advanced enough to hide most of it behind pipelining.
There are, however, some QL specifics, and these should be taken into account: the GC is indeed >4x faster than the bare-bones QL, but that's because the 68008 gets further bogged down by the ZX8301 when accessing RAM (*), whereas the GC shadows this in 16-bit RAM. Even so, when both are running zero wait, the GC is around 3-3.5x the 68008 speed.
The little 68SEC000 project would use one single 16Mx8 DRAM chip with a 50ns access time. The timing is based on a clock which is 2x the CPU clock, basically to be able to use both CPU clock edges. It can run near 0 wait up to a bit over 20MHz. The idea, in fact, was to use a simple clock synth chip to get 3x 7.5MHz. So it's really an apples to apples comparison: as long as you keep the 68SEC configured as a 68008 running 0 wait at 3x the clock frequency, it runs 3x faster, so into GC territory.
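The clock scaling argument can be checked with a few lines of Python. This is only a rough sketch under the assumptions stated in the post (same 8-bit 68008 configuration, 4-clock minimum bus cycle, zero wait on both sides); the function names are made up for illustration, and the baseline is an ideal zero wait 68008, not a bare QL slowed down by the ZX8301.

```python
# Sketch only: compares an ideal zero wait 68008 at 7.5MHz with a 68SEC000
# in 8-bit mode at 3x that clock. A bare QL is slower than this baseline.

QL_CLOCK_MHZ = 7.5  # base clock the post wants to triple
BUS_CLOCKS = 4      # minimum 68000-family bus cycle in CPU clocks

def bus_cycle_ns(cpu_mhz, wait_states=0):
    """Length of one bus cycle in ns, one CPU clock added per wait state."""
    return (BUS_CLOCKS + wait_states) * 1000.0 / cpu_mhz

def speedup_over_ql(cpu_mhz, wait_states=0):
    """Bus cycle ratio against a zero wait 68008 at the QL clock."""
    return bus_cycle_ns(QL_CLOCK_MHZ) / bus_cycle_ns(cpu_mhz, wait_states)

sec_clock = 3 * QL_CLOCK_MHZ      # CPU clock from the clock synth chip
controller_clock = 2 * sec_clock  # DRAM timing clock, 2x the CPU clock

print(f"68SEC000 clock: {sec_clock:g}MHz, controller clock: {controller_clock:g}MHz")
print(f"Bus cycle at 0 wait: {bus_cycle_ns(sec_clock):.1f} ns")
print(f"Speedup at 0 wait: {speedup_over_ql(sec_clock):.2f}x")
print(f"Speedup at 1 wait: {speedup_over_ql(sec_clock, wait_states=1):.2f}x")
```

Because the internal architecture is identical in both cases, the ratio applies to internal processing as well as bus cycles, as long as the part really runs at zero wait. A single wait state already pulls the ratio down noticeably.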
Oh, and of course screen access would be shadowed. Not only that, unlike the SGC, it can run the Aurora at full speed, which is near 0 wait at 200ns/cycle, realistically around 2.2-2.5x QL bus speed. And this is HIGHLY desirable with nearly 8x the screen RAM and 8-bit access. The SGC can only access Aurora screen RAM at QL speed and does no shadowing, since there was no Aurora to take into account when the SGC was made, while the 68SEC thingy can, and has the extra RAM to do it too.
Now, one could make the DRAM controller run asynchronously to the CPU. The problem here is that you can't really get true zero wait bus operation in this case, and the paradox is that it gets closer to >1 wait the closer the CPU frequency is to the DRAM controller frequency, because the CPU needs to synch on every access. But we can continue to up the CPU frequency as far as the CPU will go. The only thing is, the external bus stays at 15-20MHz levels while internal processing still scales up as the clock goes up, so it's a trade-off. One thing that does not benefit in this mode is shadowing Aurora RAM: the asynch DRAM controller would in fact work exactly the same way the Aurora does (it is based on 8 cycles at 20MHz), and any speed benefit in clock would likely be eaten up by the added time needed to synch the CPU and RAM on every access. This is why I said that at some point upping the CPU speed shows diminishing returns in the average case, although some cases where most of the work is done inside the CPU would show a marked speed improvement.
Now, with some good static RAM available, for high capacity SRAM one expects sub-80ns access and cycle times, 55ns is not uncommon. Given the 68k bus timing, this would be good to over 50MHz without wait states
but forget cheap 16M of RAM then.
There is a case to be made for more RAM as soon as you do anything to increase the screen resolution. Even before that, as soon as you put PE into the mix. With 32k per screen, things are not too constricted even with 'only' 2M, but when it's 192k like with the Aurora... although probably half that on average in mode 4, one may find 2M wanting. The idea to put a full 16M on board was simply there because it's a single chip (or 2 chips maximum).
There is also one other consideration: this sort of a thingy could run SMSQ/E. And... there are 256 color drivers there for the Aurora. All of a sudden we may be speaking about 240k of screen a pop... and then you really do want all the RAM you can get.
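For a feel of where the screen sizes come from, here is a small Python sketch. The resolutions and bit depths are assumptions picked because they reproduce the 32k, 192k and 240k figures mentioned above; they are not taken from Aurora or SMSQ/E documentation, and the names are made up for illustration.

```python
# Sketch only: resolutions and bit depths are assumptions chosen to match the
# 32k / 192k / 240k figures in the post, not values from Aurora documentation.

def screen_bytes(width, height, bits_per_pixel):
    """Bytes needed for one full screen at the given size and colour depth."""
    return width * height * bits_per_pixel // 8

ASSUMED_MODES = [
    ("QL mode 4, 512x256, 4 colours", 512, 256, 2),
    ("Aurora mode 4, 1024x768, 4 colours", 1024, 768, 2),
    ("Aurora 256 colours, 512x480", 512, 480, 8),
]

for name, width, height, bpp in ASSUMED_MODES:
    size = screen_bytes(width, height, bpp)
    print(f"{name}: {size} bytes = {size // 1024}k")
```

Anything that keeps full-screen copies around multiplies these figures, which is where 2M starts to feel tight.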
(*) In the old days of the bare-bones QL, RAM expansion speed was something one wanted, and there were full speed designs out there, or near enough. But things can be made faster still by shadowing the screen RAM and replacing the top 64k (or 96k if you do not need the second screen) of the internal RAM with fast external RAM. Funny no-one thought about it on commercial products back then, until the GC arrived, where it was obvious to have it. My first RAM expansion was something I did myself, and it had 768k total when fitted with all 3 256k banks. Which means when the last 256k bank was fitted, it replaced the original 128k and implemented screen shadowing, plus added 128k at the top of 640k, which was supposed to be the upper limit. Believe me, I was very happy when it worked, because I only assumed it could; there was nothing definite in the scarce available literature to indicate it would. And it was a zero-wait design, because one could then get 120ns DRAM for PCs, whereas most QL expansions used 150ns. I actually reverse-engineered Miracle's Expanderam to figure out how the bus worked, but did not like the way they treated RAM access, so I did my own thing. I was also delighted when it measured faster than the fastest RAM around, which back then I think was the Sandy SuperQBoard.