Fun things to do with an MC68EC020....

dilwyn · Post by **dilwyn** » Wed Apr 24, 2013 9:55 am

Nasta wrote:Now, a finer point on this is that tasks are NOT jobs. This is one definition that appears nowhere in the QDOS/SMSQ documentation, assuming the difference is known.

Unfortunately, the distinction pretty much vanished early on in the life of the QL largely through terminology used in DP's BASIC compilers. While Q-Liberator used the term 'job' (hence the default filename extension of _obj which can be either 'object' or 'job' depending on who you speak to), Supercharge and Turbo always used the term '_task'. So the term '_task' became largely (and incorrectly) synonymous with 'job'.

However, Nasta has now set the record straight!

A very interesting post...

Dave · Post by **Dave** » Wed Apr 24, 2013 5:07 pm

Once again Nasta knocks it out of the park

One of the great things is that there's a great deal of technical 'meat' to these posts. It increases my knowledge in small spurts as concepts become clear. It's also a problem because I do get frustrated when I don't learn instantly, and some of the concepts are quite advanced.

I'll be re-reading a lot.

One of the MANY ways Nasta has me beat is that besides knowing every inside technical detail, he also knows in great detail how to implement things in hardware, where I don't. For example, he explained that when using a faster clocked 020, a signal needs to be delayed, and I have been delaying that signal by chaining the gates of a 74LS00, adding about 70-120ns (my oscilloscope isn't that accurate).

I think what this is pushing me towards is dividing the job into two tasks by drawing an imaginary line on the board. First, do the combo expansion board. Get all the expansion parts of the board working correctly, refine the schematic and logic, the drivers etc. and settle on a ROM image. Put this out there so people can get their RAM, floppy, IDE, ethernet interface. Then work on the right side of the expansion connector and redo the QL board as-is, and wrap the interfaces around that.

I already accept that the expansion card will need 32k of ROM, and I've never seen another card go over 16k. I will just have to implement it as two consecutive 16k areas, I believe.

One thing Nasta said in an earlier post is certainly true:- a QL with a 68020 at 7.5MHz is noticeably faster than a 68008 at 7.5 MHz. I can't give a figure though, because it's not a consistent percentage faster, and depends on the type of task it's doing. Loops involving reads/writes and moving blocks of data get a lesser increase over tight loops that fit in the cache, which can speed up a LOT.

Nasta · Post by **Nasta** » Thu Apr 25, 2013 1:21 am

dilwyn wrote:
Nasta wrote:Now, a finer point on this is that tasks are NOT jobs. This is one definition that appears nowhere in the QDOS/SMSQ documentation, assuming the difference is known.
Unfortunately, the distinction pretty much vanished early on in the life of the QL largely through terminology used in DP's BASIC compilers. While Q-Liberator used the term 'job' (hence the default filename extension of _obj which can be either 'object' or 'job' depending on who you speak to), Supercharge and Turbo always used the term '_task'. So the term '_task' became largely (and incorrectly) synonymous with 'job'.

However, Nasta has now set the record straight!

A very interesting post...

That is precisely right. And, in fact the consequence of not having a proper reference manual.
Interestingly, I find the original user's manual was better organized than the technical guide.
QDOS did and still largely does miss a nice 'concept' write-up, as in the basic ideas and concepts, how they were realized and why in that particular way.
Having been exposed in recent years to various things that pass as 'real-time OS' I had quite a number of occasions to laugh out loud knowing how old QDOS worked, never mind more advanced versions or what was proposed for Stella (still have Tony T's write-up on this!)

Nasta · Post by **Nasta** » Thu Apr 25, 2013 3:51 am

Dave wrote:Once again Nasta knocks it out of the park

Well, I am aware that lots of stuff in the messages above sounds like tooting my own horn rather without foundation as it's about a product that never materialized. I wish I had more time to do things QL, but life is what it is, so at least I tried to do a sort-of 'memory dump' hoping there will be enough useful info for someone. I've been reading posts here and there is actually a lot of information around that answers many questions, but it's either disorganized or in the heads of people that are not easily available, or - unfortunately - not at all.

One of the great things is that there's a great deal of technical 'meat' to these posts. It increases my knowledge in small spurts as concepts become clear. It's also a problem because I do get frustrated when I don't learn instantly, and some of the concepts are quite advanced. I'll be re-reading a lot.

Don't get frustrated, write out the questions. I may not be available often but at least if you send me a private message I get notified by email, so I will try to get an answer if I know it, when I get some time.

One of the MANY ways Nasta has me beat is that besides knowing every inside technical detail, he also knows in great detail how to implement things in hardware, where I don't. For example, he explained that when using a faster clocked 020, a signal needs to be delayed, and I have been delaying that signal by chaining the gates of a 74LS00, adding about 70-120ns (my oscilloscope isn't that accurate).

Well, the thing here is... oh lord, I know I had that piece of paper with a drawing done by Stuart H somewhere, but it's been 3 moves... anyway, the first bit of hardware where that was required was obviously the GC, as this was the first instance of a fast CPU being used. So, the first implementation of the 'delay hardware' was in the GC's INGOT PLD. It's actually a counter which counts clock cycles after the CPU activates its DS line, before it lets the DTACKL signal through from the motherboard to the CPU. It's done the same way on the SGC except it needs more cycles, as each cycle is shorter at 24MHz than at the GC's 16MHz. At 24MHz each cycle is about 42ns so a few of them are needed.
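As a rough illustration of why the SGC has to count more cycles than the GC to get the same real-time delay, here is a minimal Python sketch. The 150ns target delay is an assumed figure picked only for the example; it is not the value used in the INGOT.

```python
import math

def clock_period_ns(freq_mhz):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / freq_mhz

def cycles_for_delay(delay_ns, freq_mhz):
    """Smallest whole number of clock cycles that covers delay_ns."""
    return math.ceil(delay_ns / clock_period_ns(freq_mhz))

TARGET_DELAY_NS = 150  # assumed target, for illustration only

for name, mhz in (("GC", 16), ("SGC", 24)):
    print(name, round(clock_period_ns(mhz), 1), "ns per cycle,",
          cycles_for_delay(TARGET_DELAY_NS, mhz), "cycles needed")
```

The same delay costs one more counted cycle at 24MHz than at 16MHz with this assumed target.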

DTACKL is in any case one of the more problematic signals on the QL because it's only pull down. A resistor pulls it up to logic 1, and the actual chip driving the signal to zero has to overcome the various parasitic components of the wiring AND the resistor. However, any type of logic chip (or output of a PLD) is quite good at this, because its internal equivalent resistance when the signal is active is very low - less than a few tens of ohms. For most modern PLDs it can be only a few ohms - and (as hard as it is to believe) it can be too fast, but that's another story. When DTACK needs to go back from low to high, it's the resistor that has to pull the line high, and unfortunately, the closer the line is to a high state, the less of a pull the resistor has. This results in the actual signal starting off rather quickly from logic zero (close to 0V) towards logic 1 (close to 5V), but then it tapers off and gets slower and slower. When a CPU monitoring this line is very fast, it may detect DTACK low, start a new cycle and immediately still detect DTACK low because it has not gone sufficiently high in voltage in the available time to be recognized as high.
There is a solution to this problem if you plan your hardware in advance, making provisions for what one can expect to be connected to the system. In the original QL all signals were simply routed out to the connector and whatever fixes were to be made would have to be made externally. When one builds a new system, knowing the shortcomings of certain components in advance makes it possible to circumvent them.
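The 'tapering off' of a resistor pull-up can be put into numbers with the usual RC charging formula. This is only a sketch: the 2.2k pull-up and 50pF of line capacitance are assumed values for illustration, not measurements from a QL.

```python
import math

def rise_time_ns(r_ohms, c_farads, v_target, v_supply=5.0):
    """Time for an RC pull-up to lift a line from 0 V to v_target."""
    return -r_ohms * c_farads * math.log(1.0 - v_target / v_supply) * 1e9

R_PULLUP = 2200.0   # ohms, assumed
C_LINE = 50e-12     # farads, assumed wiring plus input capacitance

for volts in (1.0, 2.0, 3.0, 4.0, 4.5):
    print(f"{volts:.1f} V reached after "
          f"{rise_time_ns(R_PULLUP, C_LINE, volts):.0f} ns")
```

Each further volt takes longer than the one before it, which is why a fast CPU can still sample the line as low at the start of its next cycle.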

Regarding DTACK delay, although the method used (series gates) works, it's quite unpredictable due to component tolerances. A different specimen of the same chip, even from the same manufacturer, may provide a different delay. So will a different logic family, and gate loading and temperature can sometimes make dramatic differences to performance. This is the reason why one will often find specs for delays of a simple gate expressed as 3 values: minimum, maximum, and typical. Quite often the difference between minimum and maximum can be 100%, and 200% is not uncommon.
In some cases chips can be unexpectedly fast - the HCMOS family comes to mind. Lightly loaded lines (and it's the capacitive load that is the problem) and low temperature (as in closer to room, since HCMOS generally use a fraction of LS TTL power and don't generate as much of their own heat!) can make simple logic gates in HCMOS chips rated for 10s of ns delay have delays in the single ns range. This can seriously upset things.
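To see how that spread carries through a chain of series gates, here is a small sketch. The per-gate minimum, typical and maximum figures are made-up round numbers for illustration, not taken from any datasheet.

```python
def chain_delay_ns(n_gates, per_gate_ns):
    """Scale (min, typ, max) per-gate delays to n_gates in series."""
    return tuple(n_gates * t for t in per_gate_ns)

# Assumed, illustrative per-gate delays in ns: (minimum, typical, maximum).
PER_GATE_NS = (5, 10, 15)

low, typ, high = chain_delay_ns(4, PER_GATE_NS)
spread_percent = 100 * (high - low) / low
print(f"4 gates: {low}-{high} ns, typical {typ} ns, "
      f"spread {spread_percent:.0f}% of the minimum")
```

The percentage spread does not shrink by adding gates; the absolute uncertainty in nanoseconds just grows with the chain.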

Side story: I once used a HCMOS version of the Z80 chip connected to a regular Z80 PIO chip. The latter uses an internal mechanism to convert the edge of the write signal to a short pulse to internally write its registers. On the old Z80 the speed of the internal logic was such that you could actually see edges of signals being very noticeably slanted as it took a rather long time for them to change state. On the HCMOS version it was so fast (because it was only driving a ROM and the PIO chip) that the PIO chip did not even detect the edge! I actually had to put a capacitor between the write line and ground to simulate about as much load as perhaps 20 or more chips, to make it work.

Moral of the story: do not take chip delays for granted. The simplest way to generate short delays (up to a few hundred ns) is to use an RC filter - series R, C to ground. Put a gate as a buffer in front so it drives the R (up to some 330 ohms or so - HCMOS gives lots more freedom here), and a gate after to shape the signal that gets slowed down by the RC combination into something more 'digital' with proper edges. The R and C usually have up to 10% tolerance, which is rather precise compared to 100-200% of the gates themselves - and, you can put a trimmer for the R or C and play with the time constant (R x C) and adjust the delay. You can also make the delay one-way, i.e. delay only the 1 to 0 transition, or the 0 to 1 transition, with some play of gate input or by adding a diode across the R (1N4148 work beautifully and you can get 100 for a buck), to either make the C charge fast (so 1 to zero is delayed less) or discharge fast (so 0 to 1 is delayed less). Once the required delay is known, it can be implemented by more precise means should that be needed.
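A minimal sketch of sizing the capacitor for such an RC delay. It assumes the shaping gate switches at half the supply voltage (roughly true for HCMOS, an assumption here) and ignores the gates' own delays; the 100ns target is just an example.

```python
import math

def capacitor_for_delay(delay_ns, r_ohms, threshold_fraction=0.5):
    """C in farads so the RC node crosses the gate threshold after delay_ns."""
    time_constants = math.log(1.0 / (1.0 - threshold_fraction))
    return delay_ns * 1e-9 / (r_ohms * time_constants)

def delay_range_ns(delay_ns, tolerance=0.10):
    """Delay spread when R and C are each off by +/- tolerance."""
    return (delay_ns * (1 - tolerance) ** 2,
            delay_ns * (1 + tolerance) ** 2)

c = capacitor_for_delay(100, 330)   # assumed target: 100 ns with 330 ohms
low, high = delay_range_ns(100)
print(f"C = {c * 1e12:.0f} pF, delay between {low:.0f} and {high:.0f} ns")
```

Even with both parts at the wrong end of a 10% tolerance, the delay stays within about a fifth of the target, far tighter than the gate chain.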

This brings me to one aside to using PLDs that's not obvious. Some things that are easy to do with 74xxx logic chips are not so easy to do with a PLD, but also, some things that are simple to do in a PLD one would never go about doing the same way with 74xxx logic chips. So, while it's doubtful one would use a counter to generate a delay using discrete 74xxx logic chips, it's the most logical thing to do inside a PLD - and that's why it was done this way in the GC and SGC INGOT. The good thing about that implementation is that once you have paid for a PLD and the logic fits, it costs you nothing to put it in there, and also, because it's clock driven, the delay is very precise and synchronized with the workings of the CPU.
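The counter idea can be modelled clock by clock. This is a behavioural sketch only, not the INGOT equations: signals are written as True when asserted (the real DS and DTACKL are active low), and the function name and the wait count of 3 are hypothetical.

```python
def delayed_dtack(ds_active, dtack_in, wait_cycles):
    """Pass dtack_in to the CPU only after DS has been active for
    wait_cycles full clocks; the counter resets when DS goes inactive."""
    out = []
    count = 0
    for ds, dtack in zip(ds_active, dtack_in):
        if not ds:
            count = 0                       # bus idle: reset the counter
        out.append(bool(ds and dtack and count >= wait_cycles))
        if ds and count < wait_cycles:
            count += 1                      # count clocks while DS is active
    return out

# Motherboard DTACK stuck asserted (slow to release), two CPU cycles.
ds    = [0, 1, 1, 1, 1, 1, 0, 1, 1]
dtack = [1, 1, 1, 1, 1, 1, 1, 1, 1]
print(delayed_dtack(ds, dtack, wait_cycles=3))
```

Because the counter runs from the CPU clock, the blocked window is an exact number of cycles, and the stale DTACK left over at the start of the second cycle is not passed on to the CPU.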

I think what this is pushing me towards is dividing the job into two tasks by drawing an imaginary line on the board. First, do the combo expansion board. Get all the expansion parts of the board working correctly, refine the schematic and logic, the drivers etc. and settle on a ROM image. Put this out there so people can get their RAM, floppy, IDE, ethernet interface. Then work on the right side of the expansion connector and redo the QL board as-is, and wrap the interfaces around that.
I already accept that the expansion card will need 32k of ROM, and I've never seen another card go over 16k. I will just have to implement it as two consecutive 16k areas, I believe.

Since you are designing almost all of it anew, you can actually do some things as they best fit you, not the old specs. Sticking to them 'to the letter' holds you back to the very things you are trying to improve, and those specs should be read with a bit of pragmatism.
For instance, while the spec says each ROM should be up to 16k, and there are up to 16 permitted, and all code within must be position independent since there are no guarantees that a ROM will be mapped into a specific address - read the concept behind it. And that is the following: The OS checks for a ROM header at every 16k boundary from C0000h up (and in the ROM slot, a couple of others if it's a Minerva). Since the IO part is probably in the end not going to be removable, take as much ROM as you need, just be certain that it starts at a 16k boundary starting from whatever address is used as the expansion area, and construct your ROM accordingly. If you look at the specs for that (how the tables are built), as far as I remember it's all 32-bit offsets. And even if it's 16 bit, you can always continue with an additional ROM image as long as it starts on a 16k boundary so the OS can detect it. It stands to reason that you will not stop at the original spec of up to 256k of expansion space - after all you are doing an 'upgrade'.
Now, I seem to remember that Minerva was especially helpful here - from my expansions using the 68008FN which has a 4M byte address range instead of the original 1M. Minerva correctly figures out the address range of the CPU and starts looking for ROM images at 0C0000h as usual, but continues from there and scans the whole remaining address space. It gladly detected and initialized a Cumana floppy controller at 3C0000h. In fact, one could just as well scan the entire address space, starting from the ROM slot, and check at all 16k boundaries. The only problem would be a case where one wanted to store ROM images within a larger ROM somewhere in the address map, to activate when needed: they should not be stored at 16k boundaries or the OS will find and attempt to link them all, which is not necessarily what we want.
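A minimal sketch of that scan, run against a byte array standing in for the address space. The header marker used here (the long word 4AFB0001h at the start of the ROM) is an assumption that should be checked against the QL Technical Guide, and the function and variable names are hypothetical.

```python
ROM_FLAG = bytes.fromhex("4AFB0001")  # assumed header marker, verify before use
BOUNDARY = 0x4000                     # 16k

def find_rom_headers(memory, base, start, end):
    """Return every 16k-aligned address in [start, end) where a header
    marker is found. memory[0] corresponds to address base."""
    found = []
    first = (start + BOUNDARY - 1) // BOUNDARY * BOUNDARY
    for addr in range(first, end, BOUNDARY):
        offset = addr - base
        if bytes(memory[offset:offset + 4]) == ROM_FLAG:
            found.append(addr)
    return found

BASE = 0xC0000
memory = bytearray(0x40000)                  # the original 256k expansion area
for addr in (0xC0000, 0xC8000, 0xC4100):     # last one is NOT on a boundary
    memory[addr - BASE:addr - BASE + 4] = ROM_FLAG

found = find_rom_headers(memory, BASE, 0xC0000, 0x100000)
print([hex(a) for a in found])
```

The image placed off a 16k boundary is never seen by the scan, which is the trick for keeping spare ROM images hidden until they are wanted.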

Then, there is also the question of where to locate the various registers of the IO hardware, such as floppy controller chips, or parallel ports. When one designs an expansion card, normally one part of the ROM space (16k) would be used for this, with special hardware to decode it instead of the actual ROM - for instance the last 256 bytes of the 16k available for each expansion. It would be quite odd to emulate this sort of a system for every IO device one ultimately wants to have 'on-board', as it's a lot of decoding. It's often far easier to patch the driver code. Because all references to everything on an expansion board have to be relative to its base address (where the system found the ROM), there is a provision for an offset in the driver linkage block where one can put data like this. This can be found and patched. It's not trivial but also not terribly difficult. The OS passes this to the driver code in an address register so there is usually only one place where it has to be patched, for each device the ROM code (driver, extensions) caters for. Doing this makes it possible for the hardware decoding to be simplified, and - sometimes more importantly - makes it possible to locate various devices at addresses which were never used before, so others that have more universal use can be freed.

That being said, if one is designing a system from the ground up and knows in advance certain hardware is always going to be present, and the software to drive it, it makes no sense to occupy configurable resources such as the original expansion space with this hardware since it's fixed - not optional, not movable, not configurable in the sense that someone would decide to put it all at a different address. Under these conditions it may also be feasible to store the driver code somewhere outside the areas used for more configurable things, at some fixed address, and patch the OS to link it in outside the usual ROM search procedure. This is exactly what the GC/SGC does, and it lets it use a 64k byte EPROM for all its code (with quite a bit of spare space left over) without actually using the usual addresses. The QL is quite flexible here because one can start with the usual expansion ROM method, and then using patches etc. migrate to using areas outside those normally dedicated for expansions.

On a more general note, using a mix of slow and fast hardware on a fast CPU is always a problem because in several ways the slow bits interfere with optimal operation of the fast ones and the other way around. Slow stuff loads down the buses making them slower which is a problem for the fast bits. Fast signals for the fast bits may upset the slow bits etc. So, when deciding where to draw delimiter lines, separating the slow and the fast (and sometimes speed categories between those two extremes) must be one of the major considerations.

Again an aside here: designing boards to be small, as well as a few other techniques such as properly using ground planes, is a way to more or less automatically ensure proper signal integrity. As much as one wants to try to avoid this, one can't - and one 'avoids' it normally only when there is no other way, because THEN things get really problematic.
A very important thing to understand here is that clock speed does not directly tell you how difficult it is to get a signal along a line of a PCB (or a wire for that matter). This is because you can have chips capable of extremely high speeds operate at a very low speed. Without special considerations, you have to design for the highest speed the chips are capable of, even when they are used at only a fraction of it. The reason for this is simple but not obvious - it's not how often a signal changes state that is 'speed' in this context, rather how fast the change of state itself happens - in other words, the steepness of the 'edge' of a signal created by a change of state. So, while a signal can change states once a second, the actual change itself can be extremely fast, say one nanosecond - so in theory, the maximum rate of change would be as fast (if it takes one nanosecond to change, then in theory one could make a change every nanosecond) - in clock speeds this would mean 1GHz. It is EXTREMELY difficult to design for this 'off the top of one's head'. Wires stop behaving like wires, and in an electrical sense behave more like suspension bridges (you know, the kind Indiana Jones would run over and then cut the suspending ropes to prevent his foes catching him). If you walk over them slowly, stepping softly, they appear solid and secure, but if you run in quick footsteps, pounding down, they buckle and vibrate and bend until you might fall off. This is precisely what a PCB trace will do to an electric signal. You take the maximum theoretical rate of change, calculate the wavelength from the speed of propagation of the signal down the trace (for typical line width, PCB material, etc, about 1/3 of the speed of light) and if your trace is longer than 1/4 of that from end to end, you will have problems unless the trace has appropriate termination - and this is WITH a ground plane. Without it, it depends on trace shape to the point you might not be able to make the signal pass and be recognizable on the other end at all.
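The quarter-wavelength rule of thumb above can be turned into a quick calculation. This sketch follows the recipe exactly as described (edge time gives the maximum rate, rate gives the wavelength, a quarter of that is the limit) and uses the 1/3 speed-of-light figure from the text; the function name is hypothetical.

```python
C_M_PER_S = 299_792_458.0   # speed of light in vacuum

def critical_length_cm(edge_time_ns, velocity_factor=1 / 3):
    """Trace length (cm) above which termination becomes necessary,
    using the quarter-wavelength rule of thumb."""
    f_max_hz = 1.0 / (edge_time_ns * 1e-9)           # max theoretical rate
    wavelength_m = velocity_factor * C_M_PER_S / f_max_hz
    return wavelength_m / 4 * 100

for edge_ns in (1, 3, 10):
    print(f"{edge_ns} ns edge: about {critical_length_cm(edge_ns):.1f} cm")
```

Note that the clock frequency does not appear anywhere in the calculation; only the edge time does.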
The problem here is that chips that are commonly available now such as memory and PLDs are quite a bit faster than they used to be because they need to keep pace with the demands for ever faster hardware. So, in a way one is forced to design using fast logic and therefore one has to obey certain rules that were not so strict before. Fortunately a few simple rules go a LONG way.
1) Keep lines short!
2) Use ground planes PROPERLY (there are a few simple rules on what properly means and if you can make PCBs with sufficiently small features and clearances, it's not too difficult to obey them).
3) If you have fast logic generate slow signals, be sure to configure the logic for that particular signal to be slow (if possible - this mostly applies to PLDs), or slow it down before it connects to a long trace on the PCB. Fortunately, slowing it down is usually as simple as adding a small value resistor in series. (Note that rule 1 obviously solves this problem too - if lines are short for the speed in question, or shorter, there is no need to slow anything down.)

Side story: I once spent half a day debugging a problem with some hardware made on a two-layer board, that runs at 100MHz (yes, it's possible, and in fact the key to that particular hardware was short lines), because I could not figure out how a signal generated by another can appear BEFORE the signal that generates it. At some point one is bound to think 'back in time' or 'time machine', but the solution came when I accidentally exchanged the scope probes I was using to look at the signals, and had them the other way around. Suddenly, the signal generated from another signal appeared as it should, after the signal it's generated from, but quite a bit later than expected. The solution became obvious when I noticed one probe had a wire about half a meter longer - this added a 3ns delay to the signal from that probe compared to the shorter wire one. It should be noted that a full period of a 100MHz clock - one transition from 1 to 0, followed by one transition from 0 back to 1 - takes 10ns. Logic guaranteed to work at 100MHz is routinely capable of sub-ns delays in typical load and temperature situations, so 3ns (30% error between same direction edges, 60% error between two successive edges) is HUGE. If you think this is far removed from the QL, consider this: GAL1 on the QUBIDE is specified for 15ns speed, and is guaranteed to work at 66MHz clock rates. However, typical speeds when configured as simple logic tend to be about twice that. And - how many years ago was that?
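The percentages in that story are easy to check, and the same numbers also give the propagation speed the probe lead must have had. A small sketch using only the figures quoted above; the function names are hypothetical.

```python
C_M_PER_S = 299_792_458.0   # speed of light in vacuum

def skew_fractions(skew_ns, clock_mhz):
    """Skew as a fraction of a full clock period and of a half period."""
    period_ns = 1000.0 / clock_mhz
    return skew_ns / period_ns, skew_ns / (period_ns / 2)

def implied_velocity_factor(extra_length_m, delay_ns):
    """Signal speed in the extra cable, as a fraction of light speed."""
    return extra_length_m / (delay_ns * 1e-9) / C_M_PER_S

full, half = skew_fractions(3, 100)
print(f"3 ns is {full:.0%} of a period and {half:.0%} of a half period")
print(f"0.5 m in 3 ns means about "
      f"{implied_velocity_factor(0.5, 3):.2f} of light speed")
```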

A 68020 capable of operating at 33MHz must have signal edge speeds multiples of that to satisfy all its timing parameters - although 33MHz would suggest roughly 15ns edges, in reality it's more like 3-4ns. Passing this signal further than about 50cm requires a perfect ground plane and RF techniques, and with realistic ground planes and traces, 15-20cm is about the limit.
This is why you have to take a 'speed-tiered' approach to things. What needs to be fastest must be closest to the CPU, and obviously without buffers as they add to the delays. In this case, it's the RAM that must be the fastest as, second only to the CPU interior, the RAM is the place where most stuff is going on. ROM (Flash, EPROM...) is generally at least somewhat slower but the idea is to (eventually) run ROM code from its copy in RAM, mainly for reasons of memory width. Why not wide (32-bit) ROM? Although of course possible, adding more pins to every signal, with the grand majority of them going in parallel from one chip to the next (all data and address lines), lengthens the signal lines that should be kept short, and also loads them more, although loading makes them slower so there is a compensation effect of sorts. Caution - designing for lots of RAM and ROM can end up in an unstable system when only some of the RAM or ROM is fitted. This is because signal lines are lengthened to reach un-populated spaces for chips, but no chips are fitted that would load them and slow them down to compensate for the added length. Hence the advent of memory modules - they add their own line lengths only when fitted into a socket, and it's not a cumulative function because they do not extend lines, but rather add branches off of a short main line. When implementing say expandable static RAM, think about modules that stack vertically on top of each other, or something like old style 72-pin SIMMs.

If the contents of a ROM are copied to RAM, one could use 8-bit wide ROM. Since it's only copied once, slower speed grades (but possibly higher capacity) may be feasible. Also, some peripherals (a good example may be an IDE interface) can approach speeds of a typical EPROM or Flash chip. One way to handle these is by providing some buffer chips between the fast portion and these chips. However, in the case of a ROM, one might use sufficiently large sizes so that only a single chip is needed - so there is only one additional load per each CPU signal regardless of the ROM capacity (which means we limit ROM size to some sensible capacity to satisfy this). On the other hand, an IDE interface connects to a hard drive via a rather long cable, which implies buffering - so no need to buffer something only to have it buffered again. What I'm trying to say is that sometimes the speed separation problem can 'solve itself'.

On the lowest speed end (and I am speaking relative to a fast CPU, which may still be faster than the old QL bus!), we have comparatively slow peripherals, either by their nature (serial, parallel ports, floppy controller, keyboard, mouse interfaces etc) or by them being re-used from the original QL (ZX 8301/8302 ULA). All of these are usually 8-bit. Also, if an 8-bit expansion bus is implemented (with some enhancements), it's really a sort of 'universal' slow 8-bit peripheral. Speed is sacrificed for signal integrity, because here we are looking at signal lines which are outside the control of the designer, and will possibly be long and less than perfect at signal integrity, so we need to plan in terms of a 'worst case'. This is not far removed from actuality even if we completely disregard the expansion bus. Most peripherals from this category must have certain positions on the board, close to relevant cut-outs on the case, connectors, or other mating parts that we need to be compatible with. So, we can expect that at least some signal lines will be quite long - the 8-bit data bus would be one example. Perhaps fewer address lines will be long (maybe 8 or so) if we handle decoding for these devices centrally. Data strobes and read-write signals will also be long lines. In any case these then must be properly buffered (explicitly by buffers or by things like decoder logic that generates them so they do not come directly from the CPU), and slowed down with series termination to account for such a state of affairs. It's also probable that these will extend from end to end of the board or close enough.

More on details of how to implement this later...

One thing Nasta said in an earlier post is certainly true:- a QL with a 68020 at 7.5MHz is noticeably faster than a 68008 at 7.5 MHz. I can't give a figure though, because it's not a consistent percentage faster, and depends on the type of task it's doing. Loops involving reads/writes and moving blocks of data get a lesser increase over tight loops that fit in the cache, which can speed up a LOT.

I think the cache is initially disabled. However, the 020 has a 'look back' feature independent of the cache: it uses a multi-stage pipeline approach to instruction decoding and execution. Because of this, it may be possible to fit up to 3 instructions in the pipeline, and if a loop jumps back one or two instructions, they can be executed from within the pipeline rather than being fetched from memory again. This is sometimes referred to as '020 loop mode'. Something like MOVE.L (Ax)+, (Ay)+ then DBRA back to the MOVE will appear only to read data from (Ax) and write it to (Ay), and only read the MOVE.L and DBRA instructions once at the beginning of the loop. Even if the 020 was made to precisely emulate the way the 68008 uses the bus (including 8-bit width) it would take less than 2/3 of the time a 68008 takes at the same clock (because the 68020 also overlaps Ax and Ay increments with the actual reading and writing of the data). In its native bus mode (32 bit and fewer cycles per access) the 020 would be about 12x as fast as the 68008 at the same clock speed for this particular case! Still, even in the general case it's faster precisely because it's better at overlapping various stages of instruction execution between successive instructions, and the hardware has been significantly enhanced to speed up complex instructions and address calculations. Enabling the cache will speed it up significantly as reading instructions from the cache takes up only a single cycle and the cache is 32 bits wide. However, as I mentioned in the previous long posts, simply enabling the cache is not a complete solution: cache integrity has to be taken into account when new code is loaded into locations previously occupied by code that may be left over in the cache.
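A back-of-envelope count of bus accesses per loop iteration shows where the gain comes from. This is a simplified sketch under stated assumptions: it counts bus accesses only, ignores internal execution time on both CPUs, assumes long-word aligned data, and uses the minimum bus cycle lengths of 4 clocks (68008) and 3 clocks (68020). It is not expected to reproduce the 12x figure exactly.

```python
def accesses(n_bytes, bus_width_bytes):
    """Bus accesses needed to transfer n_bytes over the given bus width."""
    return -(-n_bytes // bus_width_bytes)   # ceiling division

MOVE_BYTES = 2   # MOVE.L (Ax)+,(Ay)+ is a single opcode word
DBRA_BYTES = 4   # DBRA is an opcode word plus a displacement word
LONG = 4         # bytes read, and bytes written, per iteration

# 68008: fetches both instructions and moves all data over 8 bits.
m68008 = accesses(MOVE_BYTES + DBRA_BYTES, 1) + 2 * accesses(LONG, 1)
# 020 forced onto an 8-bit bus, instructions not re-fetched.
m020_8bit = 2 * accesses(LONG, 1)
# 020 on its native 32-bit bus, instructions not re-fetched.
m020_native = 2 * accesses(LONG, 4)

print("accesses per iteration:", m68008, m020_8bit, m020_native)
print("8-bit 020 vs 68008:", round(m020_8bit / m68008, 2))
print("bus clocks, 68008 vs native 020:", m68008 * 4, "vs", m020_native * 3)
```

Dropping the instruction fetches alone brings the access count under 2/3 of the 68008's; the wide bus and shorter bus cycle account for the rest.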

Nasta · Post by **Nasta** » Wed May 01, 2013 1:42 am

I've had a short time to look into the bus ownership arbitration implemented on the 020, and sadly it makes setting up two CPUs to use the same bus quite complicated, and impractical.
The issue is that the 020, like its predecessors, expects to implicitly own the bus initially, and an external device to ask for its use - and with two CPUs, there would initially be two masters, which is an obvious problem. However, a bigger problem is that once the CPU gives up the bus there is no way for external hardware to detect that it needs it again, unless it gives the CPU bus ownership and waits and sees what it will do. The wait is the problem - it may take many cycles before the CPU actually does something, during which the CPU we just took bus ownership from might have done useful work.
There might be a way to get around this but it only works on the full 020, not the EC020.
The 060 has a much more friendly nature on the bus - it does not assume ownership, rather requests it immediately after reset and patiently waits for the bus to be granted to it. This makes it very easy to implement a fair bus sharing scheme - simply start with one CPU and if at the end of its present cycle the other is requesting the bus, stop granting it to the present owner CPU and grant it to the other one. If it's not, do nothing. In a single CPU setup, the bus request signal is tied directly to the bus grant signal so the CPU grants itself the bus by requesting it.
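The fair sharing rule described for the 060 is simple enough to model in a few lines. This is a behavioural sketch with hypothetical names; each list entry stands for the two request lines sampled at the end of one bus cycle.

```python
def arbitrate(requests, owner=0):
    """requests: iterable of (req_cpu0, req_cpu1) sampled at the end of
    each bus cycle. Returns the bus owner for each following cycle."""
    owners = []
    for req in requests:
        other = 1 - owner
        if req[other]:        # the other CPU wants the bus: hand it over
            owner = other
        owners.append(owner)  # otherwise do nothing, owner keeps the bus
    return owners

# Both CPUs always requesting: ownership alternates every cycle.
print(arbitrate([(1, 1)] * 4))
# Only CPU 0 requesting: it keeps the bus without interruption.
print(arbitrate([(1, 0)] * 4))
```

When only one CPU is active it loses no cycles at all, and when both are busy each gets every other cycle.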

However, the other bits and pieces mentioned can be implemented with relative ease, and I would venture to say, with nothing more complicated re PLDs than various GAL chips.

Peter · Post by **Peter** » Sat Nov 30, 2013 6:48 pm

I was searching this thread for specific info on the Gold Card ROM shadowing (which I didn't find). But here's a little remark on high speed interrupts.

Nasta wrote:Fortunately, some of this work was done before, on the Q40/60. It implemented a fast poll interrupt (20kHz IIRC!) and it showed that the CPU could handle this approach with clever programming. And since the programming was after all done by Tony Tebby, you knew it was REALLY clever. The purpose of a fast poll interrupt is to support hardware that requires very quick but fairly simple interrupt processing - for instance, emptying or filling a FIFO, with a small overhead to test if it needs to be done at all.

Yes 20 kHz, or optionally 10 kHz. The Q40/Q60 sound handler was just a dedicated Level 4 Interrupt, independent from the poll interrupt, and with a higher priority. As far as I know, Tony Tebby did not implement anything special here - I had already written the routines and Mark Swift also supported it before.

The idea was just simplicity by brute CPU force. During design, I roughly calculated the interrupt handling overhead, if only the minimum of registers was saved. And to my surprise, a 40 MHz 68040 would only lose a small percentage in speed, as long as things were kept separate from the normal poll interrupt. So I decided not to implement FIFO or DMA for sound, and simply have a timer that triggers a higher priority interrupt level.
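That rough overhead estimate is a one-line calculation. In this sketch the 100 clocks per interrupt is an assumed figure for illustration, not a measured value for the Q40 sound handler.

```python
def interrupt_overhead(rate_hz, clocks_per_interrupt, cpu_hz):
    """Fraction of CPU time spent servicing a periodic interrupt."""
    return rate_hz * clocks_per_interrupt / cpu_hz

CPU_HZ = 40_000_000          # 40 MHz 68040
CLOCKS_PER_INTERRUPT = 100   # assumed: entry, minimal register save, exit

for rate in (20_000, 10_000):
    share = interrupt_overhead(rate, CLOCKS_PER_INTERRUPT, CPU_HZ)
    print(f"{rate // 1000} kHz: {share:.1%} of CPU time, "
          f"{CPU_HZ // rate} clocks between interrupts")
```

At 20 kHz there are 2000 clocks between interrupts on a 40 MHz CPU, so a handler that stays short only takes a few percent of the machine.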

On the Q68, I went back to a FIFO buffered implementation because the CPU runs slower (the usual QL benchmarks are somewhere between Super Gold Card and QXL at the moment).

Dave · Post by **Dave** » Sun Mar 12, 2017 1:25 am

I'm reviving this thread because I have picked this up to continue working on as a hobby. This is not a project any more, just being worked on for leisure. Any results will be openly published. I have a low confidence anything will come of it, and you should too.

I pulled my QL2 prototype out of storage a few weeks ago when I was retrieving the microdrives to mail to Italy. I've played with it a little. I am happy with how far I'd come, but unhappy with many of the omissions and things I Just Don't Understand(tm).

What I have right now is a 68EC020 that has 4MB of SRAM being read as 32-bits, that works with a BBQL. It looks like when I put it down due to illness I was working on having Minerva moved to a 32 bit wide ROM image (4 of 16kx8 ROMs side by side) and was working on the logic for telling the CPU the width of the memory map at different addresses. I didn't complete that work, but it looks like I created the ROM images.
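Creating the images for four 8-bit ROMs side by side amounts to de-interleaving the original image by byte lane. A minimal sketch with hypothetical function names; on the big-endian 68020 the byte at the lowest address of each long word sits on D31-D24, so lane 0 here is the ROM wired to those data lines.

```python
def split_into_lanes(image, lanes=4):
    """De-interleave a ROM image into one byte stream per 8-bit ROM."""
    if len(image) % lanes:
        raise ValueError("image length must be a multiple of the lane count")
    return [bytes(image[i::lanes]) for i in range(lanes)]

def merge_lanes(lanes):
    """Inverse of split_into_lanes, useful for checking the result."""
    out = bytearray(len(lanes[0]) * len(lanes))
    for i, lane in enumerate(lanes):
        out[i::len(lanes)] = lane
    return bytes(out)

image = bytes(range(16))                 # stand-in for the real ROM image
parts = split_into_lanes(image)
print([p.hex() for p in parts])
assert merge_lanes(parts) == image       # round trip must be lossless
```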

Obviously this is too complex for me to work on alone. It was also hard to fund that kind of development budget out of pocket, and obviously I failed.

So, from here, I'm just looking at what I can re-use, what I can do and what I need to drop to make it fit within my budget.

Paul · Post by **Paul** » Sun Mar 12, 2017 11:21 am

Hello Dave
I also have some 68EC020 in my storage and would be pleased to discuss ideas and solutions with you.
I'm no guru in this but I can try many things.
Please pm me if you have questions or ideas to discuss.
I will send you a pm as well.
kind regards Paul

Peter · Post by **Peter** » Sun Mar 12, 2017 5:49 pm

Dave wrote:What I have right now is a 68EC020 that has 4MB of SRAM being read as 32-bits, that works with a BBQL.

If that already worked, it is a lot. Reminds me of my first QL speedup project, long before the SGC came out.
It was wire-wrapped, plugged directly into the 68008 socket, so I had the expansion slot free.

Minerva had no 68020 support back then, so I patched QDOS for 68020.

Dave wrote:It looks like when I put it down due to illness I was working on having Minerva moved to a 32 bit wide ROM image (4 of 16kx8 ROMs side by side) [...]

To save the hardware effort with 4 ROMs, have you considered 32 Bit shadow RAM for the 8 bit ROM area, in connection with a little loader that copies ROM contents to RAM?
Or do you prefer a bit more hardware complexity so you don't have to touch the firmware?
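The shadow RAM idea can be modelled as a copy followed by a mapping switch. This is a behavioural sketch only; the class and its names are hypothetical and do not describe any existing QL hardware or firmware.

```python
class ShadowedRom:
    """Model of an 8-bit ROM area that can be shadowed by faster RAM."""

    def __init__(self, rom_image):
        self.rom = bytes(rom_image)
        self.ram = bytearray(len(self.rom))
        self.shadow_enabled = False

    def read(self, offset):
        """Reads come from ROM until the shadow is switched in."""
        source = self.ram if self.shadow_enabled else self.rom
        return source[offset]

    def load_shadow(self):
        """The 'little loader': copy ROM to RAM once, then remap."""
        self.ram[:] = self.rom
        self.shadow_enabled = True

area = ShadowedRom(b"\x4e\x71\x4e\x75")   # stand-in contents
before = [area.read(i) for i in range(4)]
area.load_shadow()
after = [area.read(i) for i in range(4)]
print(before == after, area.shadow_enabled)
```

The slow ROM is read exactly once, during the copy, so its speed and width stop mattering after boot.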

Good luck!

Dave · Post by **Dave** » Sun Mar 12, 2017 6:32 pm

I don't really know that it works with any authority - I just know that it boots. There's no storage, serial, network, so lots of places it has opportunities to fall over are just not implemented. I am using the latest 1.98 Minerva image - I don't know how modded that is to accommodate the '020. I can type. A Hello World loop runs in BASIC.

Like Sinclair, I'm a firm believer in "do it in software" - if he had stayed closer to his roots with the QL it would be a much simpler machine. I know what the trickery saved him, but at what cost?

The Sinclair QL Forum
