ULA replacement - could it work in 16-bit mode ?

janbredenbeek
Super Gold Card
Posts: 629
Joined: Wed Jan 21, 2015 4:54 pm
Location: Hilversum, The Netherlands

Re: ULA replacement - could it work in 16-bit mode ?

Post by janbredenbeek »

Brane2 wrote: I've just taken a peek at an old 41256 DRAM datasheet.

Setup times are short enough to allow interleaving two 8-bit banks, even at word level.
Which means that one could use the existing circuitry on the PCB to transfer data between the CPU socket, video DRAM and the ULA1 socket 16 bits at a time, provided that they are implemented to use that.

Also, they are pin-compatible with the existing 4164s, which means that a simple replacement would up the video RAM 4x, to 512K.
Just one line would have to be drawn to connect pin 1 of all the chips, and a simple extra multiplexer would be needed...
This has already been done by members of our user group in the '80s. The 4164s were replaced by 41256s (which means soldering out the old chips, not a job for the uninitiated) and an extra 74LS257 added plus a PROM for the extra address decoding. In fact, my first QL has been modified for 512K this way. Using a PROM for address decoding seems a bit overkill but does allow for other extensions, e.g. extra extension ROMs in the upper memory area. It also allows the memory extension to be disabled using a small switch at the back of the QL.

Unfortunately, as the 8301 still has to access both memory banks for building the video and DRAM refresh, RAM timings aren't any faster than with the original 128K though.


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Post by Nasta »

Having looked at the ZX8301 timings a while ago, I was more than a bit perplexed why it did not use interleaved bank access, since it actually always has two 64k banks at its disposal. It also has two CAS lines. The only difference pin-wise would have been that the address lines A16 and A17 would have to become A0 and A17, and of course the multiplexers would have to be moved one address line up (A1 to A16 rather than A0 to A15). The other address lines would have been accessible the same way they are now, by using the TTL multiplexer outputs to drive the ULA RAM address pins as inputs. Then the whole thing could have worked at a 16MHz base frequency, which is exactly what you need to get the video timings spot on without overscan and the CPU working at the full 8MHz it's capable of.
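As a rough illustration of the interleaving described above, here is a minimal sketch of how a byte address could be split so that A0 picks the bank and A1 to A16 go to the multiplexers. The row/column split and the function name are assumptions made for the example, not something taken from the ZX8301 design.

```python
# Sketch only: which half of A1..A16 becomes the row and which the column
# is an assumption; the real board wiring may differ.
ROW_BITS = 8  # a 4164 takes 8 row address bits...
COL_BITS = 8  # ...and 8 column address bits


def interleaved_map(addr: int) -> tuple[int, int, int]:
    """Split a 17-bit byte address into (bank, row, column).

    A0 selects the bank, so consecutive bytes alternate between banks.
    A1..A16 form the cell address that feeds the row/column multiplexers.
    """
    if not 0 <= addr < 1 << (1 + ROW_BITS + COL_BITS):
        raise ValueError("address outside the 128K on-board RAM")
    bank = addr & 1
    cell = addr >> 1                     # A1..A16
    col = cell & ((1 << COL_BITS) - 1)   # low half -> column (assumed)
    row = cell >> COL_BITS               # high half -> row (assumed)
    return bank, row, col
```

With this mapping the two bytes of an aligned word share the same row and column and differ only in the bank, which is what would let both banks be strobed with one RAS and the two CAS lines.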
One thing which may be marginal already on the stock machine is DRAM refresh, so knowing that 41256 upgrades were offered, I wonder how good the refresh was on these given there are twice as many rows to refresh.
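The refresh concern can be put into numbers with a small sketch. The row counts and refresh periods used below (128 rows in 2 ms for a 4164, 256 rows in 4 ms for a 41256) are the figures usually quoted for these part families and are assumptions here; check the datasheet of the chips actually fitted.

```python
def row_refresh_interval_us(rows: int, period_ms: float) -> float:
    """Average time allowed between refresh cycles, in microseconds,
    if every one of `rows` rows must be refreshed once per `period_ms`."""
    return period_ms * 1000.0 / rows


# Assumed datasheet figures, not measured on a QL:
interval_4164 = row_refresh_interval_us(rows=128, period_ms=2.0)
interval_41256 = row_refresh_interval_us(rows=256, period_ms=4.0)
```

Under those assumptions the required rate of refresh cycles is the same for both parts. Whether the 8301's accesses actually cover all the rows of a 41256 in time is the open question in the post, and this arithmetic does not answer it.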
The thing with the ULA is, its biggest ball and chain around the leg is the soldered-in old DRAM, followed by the ultra-long tracks with unbuffered video signals going almost the entire length of the machine. The latter can be defeated by adding a separately wired connector (as mechanically impractical as that is). The DRAM could be just kept alive by periodic RASL pulses but it still uses up power and generates heat. So, we are back to a replacement motherboard...
The 8301 could work with 16-bit RAM using a multiplexer and some glue logic to decode the high or low byte, and interface to a full 68k with a 16-bit bus. That would still slow the CPU down a lot, but not as much as the 68008 with half the data bus and the need to do twice as many cycles to read instructions and data.
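The high/low byte decode mentioned above boils down to selecting a byte lane. A minimal sketch, assuming the usual 68000 big-endian convention (even byte address on D15-D8, odd byte address on D7-D0); the function names are made up for the example:

```python
def read_byte(word: int, a0: int) -> int:
    """Return the byte an 8-bit device would see from a 16-bit RAM word.
    a0 == 0 selects the high byte (D15-D8), a0 == 1 the low byte (D7-D0)."""
    return (word >> 8) & 0xFF if a0 == 0 else word & 0xFF


def write_byte(word: int, a0: int, value: int) -> int:
    """Return the 16-bit word after an 8-bit write to one lane,
    leaving the other lane untouched."""
    if a0 == 0:
        return (word & 0x00FF) | ((value & 0xFF) << 8)
    return (word & 0xFF00) | (value & 0xFF)
```

In hardware this is the multiplexer on the read path plus separate write enables (or CAS lines) per lane on the write path.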


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Post by Nasta »

PALCE/GAL were not available for a Sinclair-compatible price back then... but I do agree, not using a HAL or PAL right from the start seriously protracted the agony of the QL's initial release. I don't remember exactly who said it of the creators of the QL, that it was one of the most mismanaged projects they ever worked on.

Even on the latest motherboards there are things that work by the grace of god, like the interrupt acknowledge. The 68k manual clearly states that autovectored interrupt acknowledge should be done by pulling VPAL low, and vectored interrupt acknowledge is done by providing a vector and pulling DTACKL low. It also says that pulling both DTACKL and VPAL low at the same time will produce undefined operation (which is what the QL does), so it's mad luck that this resolves to VPA and autovectoring in the case of the QL, especially given various sources for the CPU, which do not have the exact same mask set.
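The rule quoted from the manual can be written down as a tiny decision function. This is only a sketch of the documented behaviour as described above; the function names and the "waiting" label are assumptions for illustration, and the autovector numbering (vector 24 + level) is the standard 68000 assignment.

```python
def iack_response(vpa_low: bool, dtack_low: bool) -> str:
    """Classify how an interrupt acknowledge cycle is terminated."""
    if vpa_low and dtack_low:
        return "undefined"    # both asserted: what the QL does
    if vpa_low:
        return "autovector"   # CPU generates the vector itself
    if dtack_low:
        return "vectored"     # device supplies the vector on the data bus
    return "waiting"          # neither asserted yet (label is ours)


def autovector_number(level: int) -> int:
    """68000 autovector number for interrupt levels 1..7."""
    if not 1 <= level <= 7:
        raise ValueError("interrupt level must be 1..7")
    return 24 + level
```

The point of the post is that a real CPU has to resolve the "undefined" case one way or the other, and nothing guarantees that every mask set resolves it the same way.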

Using a HAL (if one wanted the minimum number of components) or slightly different decode logic would have freed at least two pins on the 8301 and one on the 8302, as well as reduced the logic slightly, perhaps enough to add support for 4 screens instead of 2 on the 8301, and maybe 16 colors.

One thing that would be interesting to know is exactly which ULA types were used for the various versions of the 8301. Obviously some space was left or made available in later chips to implement NTSC mode. The largest ULA die photo I have seen has 990 elements organized in 3x3 groups, each containing an array of 11x10 logic cells, however it still has a total of 44 IO+power pads... It is also a sort of irony that the Spectrum 128 used a 48 pin ULA, so larger plastic DIP cases did become available. Also of note is that Sinclair used a simpler 40 pin ULA type chip to replace a whole lot of TTL multiplexing and buffering logic in the Spectrum 128. It is a great pity that the QL motherboard was never re-spun with these advances incorporated, as it would have finally produced a really decent machine, perhaps even one capable of a full 640k on board. At the time 41256 DRAM was already available, as was 4464 64kx4 DRAM, which would have reduced the 128k RAM chip count to 4 (and if I were re-creating a motherboard now, that is what I would be using, or maybe, if I felt a bit adventurous, a larger x16 organized single chip dual CAS DRAM, or, slightly more difficult to get, 256kx8, these were widely used in older CD ROM drives).
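The chip counts and the cell count quoted above are easy to check. A minimal sketch; the helper name is made up for the example:

```python
K = 1024


def chips_needed(total_bytes: int, words: int, bits_per_word: int) -> int:
    """Number of DRAM chips of organisation words x bits_per_word
    needed to hold total_bytes (ceiling division)."""
    total_bits = total_bytes * 8
    chip_bits = words * bits_per_word
    return -(-total_bits // chip_bits)


stock_4164 = chips_needed(128 * K, 64 * K, 1)      # 64k x 1
with_4464 = chips_needed(128 * K, 64 * K, 4)       # 64k x 4
upgrade_41256 = chips_needed(512 * K, 256 * K, 1)  # 256k x 1
ula_cells = 3 * 3 * 11 * 10                        # 3x3 groups of 11x10 cells
```

This gives 16 chips for the stock 128k with 4164s, 4 chips with 4464s, the same 16 sockets holding 512k with 41256s, and 990 cells for the ULA die.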

It should be noted that with well established technology, mass producing chips makes the silicon itself on the same order of cost as encapsulation into a case, with ceramic cases being a LOT more expensive than plastic ones (often easily by an order of magnitude) and often a lot more expensive than the actual chip inside, so it begs the question: what about all these ceramic 8301s, they are not that uncommon either...

Then, there is the question of using DSL as the main bus strobe, which actually complicates an optimum design of the 8301, since DSL comes quite late in the write cycle. Since the 8301 internally operates at 2x the CPU clock (which is also what the CPU does, actually!), using ASL and RDWL, the latter being stable even before the address and ASL, would give the 8301 an exact idea of when to expect data, so it could prepare the proper RAM timing ahead of DSL. This would have shortened the CPU write access by one cycle; as it is now, it appears one wait cycle is added to each write even under ideal circumstances (the access is lengthened from 4 to 5 clock cycles). This is in fact ONLY relevant to DRAM access, where some sub-types of 4164 chips latch the data if write is low when CAS goes low, to shorten the write cycle. The 8302 latches data on the rising edge of DSMC or PCEN (hence one is actually superfluous), and there are no other write capable devices on the board. It should again be noted here that using an external gate (either part of a TTL chip or HAL logic) to buffer RDWL for the DRAM array would have saved another pin on the 8301.
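To put the extra wait cycle into perspective, here is a small sketch of the best-case arithmetic. The 8 MHz clock in the example is an assumption chosen for round numbers; contention with video access is ignored entirely.

```python
def access_time_ns(clocks: int, cpu_mhz: float) -> float:
    """Duration of a bus access of `clocks` CPU clock cycles."""
    return clocks * 1000.0 / cpu_mhz


def slowdown(base_clocks: int, actual_clocks: int) -> float:
    """Fractional increase in access time, e.g. 0.25 means 25% longer."""
    return actual_clocks / base_clocks - 1.0


ideal_write = access_time_ns(4, 8.0)   # write without the wait cycle
actual_write = access_time_ns(5, 8.0)  # write with one wait cycle
penalty = slowdown(4, 5)
```

Under these assumptions each write takes 625 ns instead of 500 ns, i.e. 25% longer, before any video contention is counted.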


M68008
Trump Card
Posts: 223
Joined: Sat Jan 29, 2011 1:55 am

Post by M68008 »

Brane2 wrote:I've just noticed that China has heaps of fast static RAMs ( 128k*8 ) and 2Mx16/60ns IIRC for literally under $1/piece.
Link?


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Post by Nasta »

I am surprised no one is considering one of the following:

MC68SZ328 which has the fastest running 68k core on board (66+ MHz no overclocking), tons of peripherals on board including an LCD controller. This might be a very interesting base of a re-do of a QL if one can get around the 192 ball MAPBGA case. These are also available from various sources for sensible money.

ColdFire V4, which is also highly integrated and beats 060 speedwise, but does not (directly) support the full 68k instruction set. That being said, the official emulation pack is now free and can be customized to best reflect the actual specific core one needs. There is ONE remaining difference which requires changes to the OS (and possibly some system software but likely not regular apps), which is the single stack pointer. On the other hand, 020 and up have 3 instead of 2 and it was handled, so...


Derek_Stewart
Font of All Knowledge
Posts: 3928
Joined: Mon Dec 20, 2010 11:40 am
Location: Sunny Runcorn, Cheshire, UK

Post by Derek_Stewart »

Nasta wrote:I am surprised no one is considering one of the following:

MC68SZ328 which has the fastest running 68k core on board (66+ MHz no overclocking), tons of peripherals on board including an LCD controller. This might be a very interesting base of a re-do of a QL if one can get around the 192 ball MAPBGA case. These are also available from various sources for sensible money.

ColdFire V4, which is also highly integrated and beats 060 speedwise, but does not (directly) support the full 68k instruction set. That being said, the official emulation pack is now free and can be customized to best reflect the actual specific core one needs. There is ONE remaining difference which requires changes to the OS (and possibly some system software but likely not regular apps), which is the single stack pointer. On the other hand, 020 and up have 3 instead of 2 and it was handled, so...
Hi

Is there a development board with the MC68SZ328 soldered to the board?


Regards,

Derek


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Post by Nasta »

Derek_Stewart wrote:
Nasta wrote:I am surprised no one is considering one of the following:

MC68SZ328 which has the fastest running 68k core on board (66+ MHz no overclocking), tons of peripherals on board including an LCD controller. This might be a very interesting base of a re-do of a QL if one can get around the 192 ball MAPBGA case. These are also available from various sources for sensible money.

ColdFire V4, which is also highly integrated and beats 060 speedwise, but does not (directly) support the full 68k instruction set. That being said, the official emulation pack is now free and can be customized to best reflect the actual specific core one needs. There is ONE remaining difference which requires changes to the OS (and possibly some system software but likely not regular apps), which is the single stack pointer. On the other hand, 020 and up have 3 instead of 2 and it was handled, so...
Hi

Is there a development board with the MC68SZ328 soldered to the board?
There might have been a long time ago but it was not freely available. The SZ328 was really custom made for the PalmPilot devices, though Mot. tried to sell it elsewhere. Palm migrated to ARM well before the SZ328 was obsoleted, but there are plenty available and they are actually cheaper than they used to be when new.
One could, in theory, build a 'breakout board' from the BGA to something more manageable like a PGA, but consider that there are some parts like SDRAM you would likely want to use with it that require close coupling and good ground planes given the signal frequency.
I looked at the SZ328 a long time ago, as a possible platform for games (not the PC style but rather ones you would see in a casino or betting place), but what killed the project ultimately was limited graphics capability due to the relatively slow bus (66MHz SDRAM). The actual resolution x colors used was not so much the limit as the rapid reduction in available monitors and also the move to 16:9 aspect ratio at the time. These days one can hardly find anything else. The SZ328 is limited to 512 vertically, which is not a huge limitation in QL terms, but it is a problem due to monitor timing requirements for some resolutions one would like to target, and I think it can't do double scan (I may be wrong there, have to look into the datasheet).
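For a feel of how display refresh eats into a shared memory bus, here is a rough sketch. The display mode, the 16-bit bus width and the function names are all assumptions for illustration. Blanking, SDRAM row activation, refresh and CPU traffic are ignored, so this is an optimistic lower bound on the share of the bus taken by the display, and it does not claim to reproduce the limit that project actually hit.

```python
def scanout_bytes_per_second(width: int, height: int,
                             bits_per_pixel: int, refresh_hz: int) -> int:
    """Bytes per second read from RAM just to keep the picture on screen
    (active pixels only)."""
    return width * height * bits_per_pixel * refresh_hz // 8


def peak_bus_bytes_per_second(clock_hz: int, bus_bits: int) -> int:
    """Theoretical peak: one transfer of bus_bits on every clock."""
    return clock_hz * bus_bits // 8


# Hypothetical example: 640x480, 8 bits per pixel, 60 Hz,
# on a 66 MHz bus assumed to be 16 bits wide.
scanout = scanout_bytes_per_second(640, 480, 8, 60)
peak = peak_bus_bytes_per_second(66_000_000, 16)
share = scanout / peak
```

Real sustained SDRAM throughput is well below the theoretical peak, so the display's actual share of the bus would be noticeably higher than this figure.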
The SZ328 is a very complex and also very flexible device, and some work is needed to configure it to look like a piece of QL compatible hardware. For one, there needs to be external logic to produce QL compatible graphics. It is all doable but in the end you get something close to the Q68...
There are ways to configure the SZ328 somewhat more creatively as a fast 68k CPU with extra peripherals. One could think of it as a reversed Raspberry Pi idea, where the 'hat' headers are actually 68k compatible pins that plug into a board instead of a 68k CPU, and the connectors on the carrier board are used for IO of various kinds.


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Post by Nasta »

Brane2 wrote:
Nasta wrote:I am surprised no one is considering one of the following:

MC68SZ328 which has the fastest running 68k core on board (66+ MHz no overclocking), tons of peripherals on board including an LCD controller. This might be a very interesting base of a re-do of a QL if one can get around the 192 ball MAPBGA case. These are also available from various sources for sensible money.

ColdFire V4, which is also highly integrated and beats 060 speedwise, but does not (directly) support the full 68k instruction set...
For me, it's either about:
  • staying true to 68K form as it was and taking the trip through what-if lane to show what the original could have looked like. For that, one kind of needs the original 68000 (or eventually let's stretch it to 68030)
  • staying true to Sinclair's philosophy. But once you step out of 68K, there is little reason to remain with it. Why not ARM, RISC-V or something entirely different?
WRT 68000, part of the appeal would be to show that having multiple cores was very doable in an efficient ( 0WS or close) way. And it could do something close to 32-bit addressing.
Same with 68030, except that the thing would be beefier.

WRT DragonBall - it doesn't offer any of those. It's not something that Sinclair could use at the time. It also comes prepackaged, so not much space for "Sinclair magic".
(BTW, its datasheet seems to be universally unavailable. Same with ColdFire series.)
Well, the SZ328 does indeed have a 68k core (NOT CPU32).
But if you are stretching this to 030, then that's already significantly enhanced. 020+ offer a true 32-bit ALU, so for long word operations they can be twice as quick, and they have some prefetch stuff. There is, however, a middle way. Although it is an integrated processor of the 68300 series, the amount of integration is fairly small but useful, and the core is CPU32, which is 020 derived, without caches but with the fully 32-bit stuff you want, albeit running on a 16-bit bus (which can also work faster than the 68k 16-bit bus) AND with higher speed (legally up to 25MHz, but it has an on-board, relatively fine-grained clock synth, so one can be loose with that spec). It has some system integration, two serial ports and DMA. There is no (D)RAM controller, so you get to build your own, and it does have auto bus sizing. Look up the 68340.
Last edited by Nasta on Mon Aug 09, 2021 11:16 am, edited 1 time in total.


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Post by Nasta »

Brane2 wrote: WRT DragonBall - it doesn't offer any of those. It's not something that Sinclair could use at the time. It also comes prepackaged, so not much space for "Sinclair magic".
(BTW, its datasheet seems to be universally unavailable. Same with ColdFire series.)
In case you want a look:
https://www.nxp.com/docs/en/reference-m ... Z328RM.pdf


Nasta
Gold Card
Posts: 443
Joined: Sun Feb 12, 2012 2:02 am
Location: Zapresic, Croatia

Post by Nasta »

So, if one wanted to re-create a QL-like hardware machine, there seem to be several options:

1. Use as close as possible to original hardware, perhaps expanded to what was available then assuming that (then) money would be no object. So in this group we have a machine based on the original 68k, and this means we would be limited to something like the 68SEC000.
On the + side:
* Can work in a 3V3 environment which makes it possible to use more modern cheaper memory components
* Can be made as a drop-in for 68008
* MUCH higher clock rates available
* very low power
* Reasonable price
* Fully compatible assuming some minor things (like treatment of interrupt lines, AVEC and lack of E and VPA... good riddance to those).
On the - side:
* Not as instruction-per-clock efficient as more modern members of 68k
* Not 5V tolerant when used at 3V3
* No dynamic bus sizing for attaching older peripherals so some external logic has to be used (like on the GC)
Undecided:
* 24 bit addressing
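
The external logic needed in place of dynamic bus sizing (the "No dynamic bus sizing" point above) essentially turns one wide access into a sequence of byte cycles. A minimal sketch of that sequencing for a write, assuming 68k big-endian byte order; the function name is made up and no real bus timing is modelled:

```python
def split_write(addr: int, value: int, size: int) -> list[tuple[int, int]]:
    """Break a big-endian write of `size` bytes into 8-bit bus cycles.
    Returns (address, byte) pairs in the order they would be issued,
    most significant byte first at the lowest address."""
    return [
        (addr + i, (value >> (8 * (size - 1 - i))) & 0xFF)
        for i in range(size)
    ]


word_cycles = split_write(0x1000, 0x1234, 2)
long_cycles = split_write(0x2000, 0xDEADBEEF, 4)
```

A CPU with dynamic bus sizing does this by itself when the port reports its width; without it, a CPLD or similar has to hold the CPU off while it runs the extra cycles.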

2. Two steps up, use of a more advanced and faster CPU, but at a reasonable price. Availability and price means we are practically limited to 68EC030, which is a fine option.
On the + side:
* Has dynamic bus sizing so using an 8 bit bus for slower peripherals is not a problem.
* Speed improved far over the original QL CPU, 50MHz capable parts are not uncommon
* Very reasonable price/performance ratio
* More instructions per clock compared to original 68k
* 32bit bus width makes things even faster.
* Full 32 bit addressing
On the - side:
* OS needs patching to cater for changes in exception behaviour, extra stack pointer and caches.
* 5V only, no direct interfacing to 3V3 unless the 3V3 parts are 5V tolerant. This means extra parts for interfacing modern memory, crucial for low cost. But see note at the end...
* Bus protocol(s) are different and need extra hardware to make QL compatible; this can be largely avoided if most traditional peripherals are replaced with more modern equivalents.
* To get maximum performance, some advanced bus functions need to be used which requires a fair amount of external logic, so count on a decent CPLD to get a complete system at a reasonable cost and speed.

3. The middle way between 1 and 2, using an integrated 68k series MCU. Several candidates exist, most notably 68340 and 68SZ328. The former may be more interesting in something more like the vintage machine.
On the + side:
* 68k compatible cores, no OS changes needed past adding extra features (if wanted)
* Highly integrated, which can help put together a capable system with very few parts
* SZ328 is 3V3 capable, has built in LCD controller which can double as CRT base controller but for limited resolution (VGA 640 x 480).
* Much faster than original CPU
* 68340 is CPU32 based which is basically 68020 without the extra stack pointer and cache, and 16 bit bus, runs about 1.6x faster at the same clock compared to original 68k. This also means it has dynamic bus sizing!
* Can be had for fairly reasonable pricing, although the actual MCU price may seem high, it is compensated by many on-board features.
On the - side:
* 68340 is 5V only, 68SZ328 is 3V3 only - both may imply extra hardware for parts of the system
* 68340 runs at slower clock but with better efficiency but does not have a cache so to get the maximum performance, some really fast memory is needed. With fastest bus cycles, there is basically a single cycle of write data available per bus transaction which means either a decent SDRAM interface running at 2x clock (requires conversion of 5V to 3V3 logic) or 40ns or less static RAM which is expensive.
* 68340 has fairly low capability serial ports by today's standards, though far better than the original machine
* 68SZ328 has so many options it is difficult to decide on a practical configuration
* 68SZ328 uses a 0.8mm pitch MAPBGA case which may require a 6 layer PCB, though 4 layer could be possible with a 'breakout' adapter or VERY fine track geometry.

4. Top of the line performance. Here it's almost only the 68060 and derivatives. For QL like applications the 68EC060 is perfectly fine. An FPU would be nice but an MMU is 95% wasted.
On the + side:
* FAR faster than the original machine, especially if overclocked.
* 3V3 compatible and 5V tolerant which may simplify hardware
* Full 32-bit implementation of the bus
* VERY easily shares a common bus with other 68060(s?) which is a whole new area of exploration.
On the - side:
* Sky high prices and fake parts
* Top speed requires good memory sub-system, preferably SDRAM, but since that is 3V3 only, it basically negates 5V tolerance.
* No automatic bus sizing so interfacing to narrower buses either requires external logic or different address mapping. This may not be a big minus as interfacing to slow buses would require some sort of buffering between the fast and slow bus portions anyway.
* Has extra features over 68k that need to be catered for in the OS, like 68020 and higher plus extras.
* Quite large and potentially (very) hot running chip (especially overclocked)
* Expect decent size CPLD or two or FPGA to implement system glue logic efficiently.

5. The FPGA way. This is quite open ended as it can take the shape of any of the above propositions depending on FPGA programming. But in this case I will limit this to an FPGA re-implementation of a fully 68k compatible core (likely with speed enhancements) packaged to look like a sort of 68k MCU variant. Think something like a small PCB which may be a drop-in replacement for an old style 68k CPU.
On the + side:
* Immense flexibility. One can envision a 68k 'chip' with a faster internal clock and cache, plus extra hardware depending on actual application, or something like the Q68 and then, depending on amount of work or money available, all the way to something very involved with a '68080' core.
* Reprogrammability means bugs can be fixed and the whole design streamlined and improved over time, up to the limit of the chosen FPGA's capacity, but even if that is reached, a re-design with a more modern one is possible.
* Implementation may range from very 'original' to very advanced, depending what one means under 'fidelity to the original design's soul'.
On the - side:
* Extra learning curve or VHDL/Verilog experience and a very wide understanding of CPU operation and native hardware required
* Lots of hours to be invested in perfecting and optimizing the design
* FPGA may get obsolete by the time the design is finished since it is practically an impossibility to make this a widely co-operative project. Usually it is a one-man band thing so it will require time, and FPGA manufacturers always want to sell you their latest and greatest.
* As a rule the FPGA is 3V3 only which means that some implementations, like replacing an older style 5V part, require external hardware. This may, however, also protect the FPGA from 'accidents' with clumsy users.

6. The path not traveled yet. This is about retargeting the hardware to one of the latest ColdFire chips, of which V4 is the most interesting.
On the + side:
* Hard to beat performance, perhaps a very advanced FPGA implementation like the 68080 core could get close
* A TON of extra hardware and features integrated on the chip as Coldfire were always intended as MCUs.
On the - side:
* Hard to get chips and can be unreasonably expensive. Also may be prone to faking.
* not completely 68k compatible although V4 gets very close. Most of this can be fixed by the public domain emulation package.
* On the OS level, patching is needed to cater for single stack pointer. So here the problem is one less, rather than 020+ where there is one extra :P
* The bewildering array of extra on-chip hardware makes it difficult to choose a sensible QL style configuration, and most of the added high level functions will likely (sadly) remain unused. This also has repercussions on the hardware, these are all BGA chips with tons of pins...
* Never been done before so no existing code base for the hard non-compatible stuff, plus lots of time needed to fix problems that will arise.

6.5 A special mention, Coldfire V1. The MCF5102 Coldfire 'bridge chip' and the only V1 chip is essentially a compiled version of a slightly reduced 68EC040, with only the multiplexed bus mode available and smaller caches. This makes it fit into a small 144 pin TQFP case. This is the only Coldfire that is fully 68k compatible (or more precisely, as much as 68020 or 68030).
On the + side:
* Low power and small form factor, 3V3 operation
* Essentially it's an 68EC040 so OS patching stuff is available from Q40
* Very good performance, fully 32-bit, fastest part was 40MHz, internally clock doubled so simple instructions approach 1 clock cycle per instruction.
* Expects SDRAM, multiplexed bus makes the system somewhat simpler to implement on a board basis but requires some more resources inside the inevitable glue logic CPLD or FPGA that needs to be used to make an efficient hardware design
On the - side:
* No dynamic bus sizing, so the same considerations apply as for the 060: a CPLD solution is needed to interface to a narrower bus if one wants to make this a QL compatible board or a drop-in replacement for an older 68k style CPU
* Not 5V tolerant (If I remember right)
* Has some bus quirks which actually prevent it from working as fast as theoretically possible
* Used to be fairly cheap, now it is unreasonably expensive and may also be faked (lower grade re-marked as higher grade)

Special note for Brane2 :)
There is a nice Sinclair aspect to the potential use of a 68EC030. Like the full 030, this one too has dedicated power pins for the core and various pin drivers, so the challenge would be figuring out if supplying a different voltage to the pins that power the bus and control signal drivers can get the 68030 to be 3V3 compatible... Assuming the threshold voltages of the P and N MOSFETs in the buffers don't cross when a lower voltage is applied, resulting in excess current through the buffer output stage, at least outputs should be no problem; inputs might be tricky. So... maybe consider it a challenge :P

