FGPA Anyone?

daniel_baum
Bent Pin Expansion Port
Posts: 90
Joined: Sat Aug 26, 2017 11:58 am

Post by daniel_baum »

Hi Derek,

My problems are with the core on the Mister, not the Q68. The Q68 works very nicely, both with SMSQ/E and Minerva.

I had some problems getting the QL-SD on the Mister to work properly. I solved that, but now find that the system is very unstable.

Further investigation has shown that it is much more stable without the Pointer Environment. Not sure what to make of that.

One more thing: the most common lock-ups I am getting happen when quitting applications. Does this ring a bell with anyone?

D.


mk79
QL Wafer Drive
Posts: 1349
Joined: Sun Feb 02, 2014 10:54 am
Location: Esslingen/Germany

Post by mk79 »

I did a lot of work on the MiSTer core a year ago. Because of some Greek guy I happen to know :-P I resurrected it a week or two ago and a few minutes ago I finally put the stuff online:

https://www.kilgus.net/ql/mister/

I don't have time to test everything but I'm pretty sure I uploaded the right files. Enjoy.

Marcel


M68008
Trump Card
Posts: 223
Joined: Sat Jan 29, 2011 1:55 am

Post by M68008 »

Good job!
I was thinking about the same these days... glad I don't have to even be tempted since it's already done! :)

I'll probably end up getting a Mister some day to play with it and learn some FPGA rudiments, but every time I'm ready to buy I'm turned off by the price of the pre-built IO boards (I understand the VGA output is needed to recompile the HDL without a paid Quartus licence). At that price point it's almost tempting to go with the DE10-Standard instead, but I don't think Mister supports it.

I've been working on cycle-precise timings for Q-emuLator for years, simulating the ZX8301 and running micro-benchmarks on the QL, and I'm almost there (within about 2%), but the reverse engineering of the 68k microcode and the new CPU emulations based on it are fantastic recent developments and make my task easier. Still not sure about the IPC timings, though. I suspect the 8049 disassembly online and the firmware in my QL are different versions, as I'm getting different timings... I'll try to dump mine and compare.


mk79
QL Wafer Drive
Posts: 1349
Joined: Sun Feb 02, 2014 10:54 am
Location: Esslingen/Germany

Post by mk79 »

M68008 wrote:I'll probably end up getting a Mister some day to play with it and learn some FPGA rudiments, but every time I'm ready to buy I'm turned off by the price of the pre-built IO boards
Yeah, I built my board myself. But if you count the time it's probably not worth it ;)
M68008 wrote:(I understand the VGA output is needed to recompile the HDL without a paid Quartus licence).
This is not true anymore, somebody implemented an open source scaler so there are no more commercial IP blocks. But you need the I/O board for the SD-card slot (or maybe a smaller SD-card only board if that exists).
M68008 wrote:I've been working on cycle-precise timings for Q-emuLator for years, simulating the ZX8301 and running micro-benchmarks on the QL, and I'm almost there (within about 2%), but the reverse engineering of the 68k microcode and the new CPU emulations based on it are fantastic recent developments and make my task easier.
Cool. I've implemented the DRAM simulation the way Nasta described it here and get well within the ballpark of the QL speed, but it's not perfect yet and I'm not sure why.
M68008 wrote:Still not sure about the IPC timings, though, I suspect the 8049 disassembly online and the firmware in my QL are different versions as I'm getting different timings... I'll try to dump mine and compare.
MiSTer comes with a real MCS48 and executes a ROM dump, you'll find ipc8049.hex in the source. Of course no idea where the dump is from, but then where else could it be from than an original chip...

Marcel


Pr0f
QL Wafer Drive
Posts: 1298
Joined: Thu Oct 12, 2017 9:54 am

Post by Pr0f »

mk79 wrote:MiSTer comes with a real MCS48 and executes a ROM dump, you'll find ipc8049.hex in the source. Of course no idea where the dump is from, but then where else could it be from than an original chip...

Marcel
There were a couple of versions of the IPC. The one in the docs folder in "The Distribution" seems to be a relatively late version and is the same version as in 2 of the IPC chips I have. I am not sure if the code was version stamped in any way; Hermes apparently is.


daniel_baum
Bent Pin Expansion Port
Posts: 90
Joined: Sat Aug 26, 2017 11:58 am

Post by daniel_baum »

mk79 wrote: But you need the I/O board for the SD-card slot
With many cores, the need for the second SD-card has been replaced with a virtual hard disk file on the main SD-card. AFAIK, the only cores that still actually need the second SD-Card reader are the Multicomp and the QL. Could this be done for the QL too?

D.


M68008
Trump Card
Posts: 223
Joined: Sat Jan 29, 2011 1:55 am

Post by M68008 »

mk79 wrote:This is not true anymore, somebody implemented an open source scaler so there are no more commercial IP blocks. But you need the I/O board for the SD-card slot (or maybe a smaller SD-card only board if that exists).
Good to know!
mk79 wrote:I've implemented the DRAM simulation the way Nasta described it here and get well within the ballpark of the QL speed, but it's not perfect yet and I'm not sure why
I can replicate the QL speed of some microbenchmarks I wrote to within a fraction of a cycle, so the logic I use is likely either correct or very close. Here it is, from reverse engineering my notes and prototype (hopefully without too many inaccuracies):
  • 312 scanlines
  • the ZX8301 grabs the bus 32 times on each of the 256 video lines, and 8 times for RAM refresh on each of the other lines
  • 480 cycles per line. Each time the bus is grabbed, the VDA signal is active (VDA=1) for 8 cycles and the distance between the grabs (VDA=0) is 4 cycles
  • Now things become interesting with the interaction with the 68K. Assume the 68K starts a memory access:
    • After 2 cycles of the memory access for reads and after 3 cycles for writes, determine whether the access can be completed immediately:
    • If VDA is 0 and will be 0 for the next 3 cycles, then the memory access completes immediately, which takes 2 cycles.
    • If the access can't be completed immediately, wait for the next video/refresh slot to complete (VDA changing to 1 if it isn't already, then going back to 0). Wait an additional cycle. Finally complete the 68K memory access (2 cycles).
  • (To delay CPU memory accesses, the ZX8301 uses the DTACK signal)
This algorithm seems to describe the behaviour as seen from the CPU well and should result in the correct timings. It may not be 100% correct in terms of signals. For example, I don't remember if VDA is actually half a cycle longer/shorter than I described (the video logic uses a 15 MHz clock), and it's possible that the ZX8301 'decides' earlier whether to assert DTACK, etc. The different duration of 68K memory reads and writes may be due to the ZX8301 only looking at AS and not DS. These timings are based on a single QL; it would be interesting to know if different revisions work differently (e.g. by running the microbenchmark).

Of course with FX68K you would need to treat each 16 bit access as two independent 8 bit accesses (each could be delayed by a different amount depending on where we are on the scanline).
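
The rules listed above can be turned into a small cycle-counting model. The Python sketch below is only an illustration and rests on several assumptions that are not in the description: each grab sits at the start of a 12-cycle slot, the 8 refresh grabs occupy the first 8 slots of a non-video line, video lines are numbered 0..255, "0 for the next 3 cycles" is read as the decision cycle plus the 3 following ones, and the second half of a 16-bit access starts exactly when the first half ends. The names `vda`, `access_end` and `word_access_end` are made up for this example.

```python
CYCLES_PER_LINE = 480
LINES_PER_FRAME = 312
VIDEO_LINES = 256
SLOT_CYCLES = 12   # 8 cycles with VDA=1 followed by 4 cycles with VDA=0
GRAB_CYCLES = 8


def vda(t: int) -> bool:
    """VDA level at absolute cycle t (assumed slot placement, see lead-in)."""
    line = (t // CYCLES_PER_LINE) % LINES_PER_FRAME
    cycle = t % CYCLES_PER_LINE
    grabs = 32 if line < VIDEO_LINES else 8
    slot, phase = divmod(cycle, SLOT_CYCLES)
    return slot < grabs and phase < GRAB_CYCLES


def access_end(start: int, write: bool = False) -> int:
    """Cycle at which an 8-bit access that begins at `start` completes."""
    # The decision point is 2 cycles into a read, 3 cycles into a write.
    decide = start + (3 if write else 2)
    # Assumed reading: VDA must be low now and for the 3 following cycles.
    if not any(vda(decide + k) for k in range(4)):
        return decide + 2
    t = decide
    while not vda(t):   # wait for the next grab to begin, if not already in one
        t += 1
    while vda(t):       # wait for that grab to finish
        t += 1
    return t + 1 + 2    # one extra wait cycle, then 2 cycles to complete


def word_access_end(start: int, write: bool = False) -> int:
    """A 16-bit access modelled as two back-to-back 8-bit accesses."""
    return access_end(access_end(start, write), write)
```

Under these assumptions a read that starts at cycle 6 of a video line lands its decision point exactly on the 4 free cycles and completes without delay, while a read starting one cycle later is pushed past the next grab.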

mk79 wrote:MiSTer comes with a real MCS48 and executes a ROM dump, you'll find ipc8049.hex in the source.
I know, I compared that MiSTer dump to the one in MESS and the 8049 disassembly... they are all identical. I bet other versions of the firmware exist, but nobody seems to have dumped them. Once I get a component I ordered from China, I'll try dumping the one in my QL, hoping it's different (and that I don't burn it! :D )
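
A comparison like this takes only a few lines of Python once both images are available as raw bytes. This is just a sketch: the helper names are hypothetical, loading the files (for example converting ipc8049.hex to binary) is left out because the file format is not stated here, and the 2048-byte size assumes the 8049's 2 KB internal ROM.

```python
import hashlib

ROM_SIZE = 2048  # assumption: size of the 8049's internal ROM in bytes


def same_dump(a: bytes, b: bytes) -> bool:
    """True if both dumps have identical contents (compared by SHA-256)."""
    return hashlib.sha256(a).digest() == hashlib.sha256(b).digest()


def diff_dumps(a: bytes, b: bytes, limit: int = 16):
    """Return up to `limit` (offset, byte_a, byte_b) tuples where the dumps differ.

    Only the common length is compared; a size mismatch already makes
    same_dump() return False.
    """
    diffs = []
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            diffs.append((offset, x, y))
            if len(diffs) == limit:
                break
    return diffs


# Demo with synthetic data standing in for two real dumps.
reference = bytes(ROM_SIZE)
candidate = bytearray(reference)
candidate[0x123] = 0xFF
for offset, x, y in diff_dumps(reference, bytes(candidate)):
    print(f"offset {offset:#06x}: {x:#04x} != {y:#04x}")
```

If a freshly dumped IPC really is a different firmware version, the offsets reported by `diff_dumps` show where to look in the disassembly.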


mk79
QL Wafer Drive
Posts: 1349
Joined: Sun Feb 02, 2014 10:54 am
Location: Esslingen/Germany

Post by mk79 »

M68008 wrote:
mk79 wrote:I've implemented the DRAM simulation the way Nasta described it here and get well within the ballpark of the QL speed, but it's not perfect yet and I'm not sure why
I can replicate the QL speed of some microbenchmarks I wrote to within a fraction of a cycle, so the logic I use is likely either correct or very close. Here it is, from reverse engineering my notes and prototype (hopefully without too many inaccuracies):
  • 312 scanlines
  • the ZX8301 grabs the bus 32 times each of the 256 video lines, and 8 times for RAM refresh during the other lines
  • 480 cycles per line. Each time the bus is grabbed, the VDA signal is active (VDA=1) for 8 cycles and the distance between the grabs (VDA=0) is 4 cycles
  • Now things become interesting with the interaction with the 68K. Assume the 68K starts a memory access:
    • After 2 cycles of the memory access for reads and after 3 cycles for writes, determine whether the access can be completed immediately:
    • if VDA is 0 and will be 0 for the next 3 cycles, then the memory access completes immediately, which takes 2 cycles.
    • If the access can't be completed immediately, wait for the next video/refresh slot to complete (VDA changing to 1 if it isn't already, then going back to 0). Wait an additional cycle. Finally complete the 68K memory access (2 cycles).
  • (To delay CPU memory accesses, the ZX8301 uses the DTACK signal)
Thanks a lot, this sounds more or less the same as I have implemented from Nasta's description.


reg [5:0] chunk;					// We got 40 chunks per display line...
reg [3:0] chunkCycle;			// ...with 12 cycles per chunk

// In chunks 0..31 the CPU only gets the last 4 cycles in the 12 cycle chunk.
// And as an access takes 4 cycles only cycle 8 can start a new access
// Similarly in chunk 39 an access can only start at cycle 8 at the latest
wire could_start = 
		chunk < 6'd32 && chunkCycle == 4'd8
	|| chunk >= 6'd32 && chunk < 6'd39
	|| chunk == 6'd39 && chunkCycle <= 4'd8;

// 00 01 02 03 04 05 06 07 08 09 10 11 00 01 02 03 04 05 06 07 08 09 10 11
//                       I W4 D1  I
// ________________________------_________________________________________  8-bit access allowed, delay dtack 2 cycles
//              I WS WS WS WS D1  I
// _______________---------------_________________________________________  8-bit access must wait, delay dtack to 3rd cycle in next slot
//                       I W4 S3 S2 S1 WN D1  I
// ________________________------------------_____________________________  16-bit access allowed, chunk 32+
//              I WS WS WS WS S3 S2 S1 WN WN WN WN WN WN WN WN WN D1  I
// _______________---------------------------------------------------_____  16-bit access must wait, chunk < 32
(Diagram: I = idle, WS = wait for slot, W4 = STATE 4 (check UDS/LDS), S3/S2/S1 = waste slot cycles for 16-bit access, WN = wait for next slot, D1 = allow DTACK)
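
For checking the start condition away from the FPGA tools, the `could_start` expression can be mirrored in Python. This is a plain transliteration and not part of the core; it assumes `chunk` runs over 0..39 and `chunkCycle` over 0..11, as the comments in the Verilog suggest.

```python
def could_start(chunk: int, chunk_cycle: int) -> bool:
    """Python mirror of the Verilog `could_start` wire."""
    if chunk < 32:
        # Display chunks: only the last 4 of 12 cycles belong to the CPU,
        # so a 4-cycle access can only begin at cycle 8.
        return chunk_cycle == 8
    if chunk < 39:
        # Chunks 32..38: the CPU may start at any cycle.
        return True
    # Chunk 39: the access must start at cycle 8 at the latest.
    return chunk_cycle <= 8


# Count how many of the 480 cycle positions in a line allow an access to start.
start_positions = sum(
    could_start(chunk, cycle) for chunk in range(40) for cycle in range(12)
)
print(start_positions, "of", 40 * 12)
```

With these ranges, 125 of the 480 cycle positions in a line allow a new access to begin: 32 in the display chunks, 84 in chunks 32..38 and 9 in chunk 39.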

One major difference, however, is the handling of the invisible lines. Your description contradicts Nasta's here, I think:
Nasta wrote:But wait, you might say, did I not say that also not all display lines are used for actual visible pixels? Yes I did, and you would be right - 256 are used out of 312. So, along with 80% of any line being used for visible pixels, roughly 82% of all lines are used for visible pixels, so it would logically follow that along with 20% of each line time being accessible to the CPU full speed, the same full speed could be had for 18% of all display lines. Alas - the cleverness of the 8301, such as it is, does not stretch that far. Sadly, it still does the same even during the invisible lines, just does no actual reading of data.
So I thought that the invisible lines have the same speed penalty (and generally I have the problem that I'm too fast and not too slow ;) ).
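
To put a number on the difference between the two readings, the share of bus cycles not taken by the ZX8301 can be worked out per frame. The sketch below assumes 8 cycles per grab on a 480-cycle line for both cases (those figures come from M68008's description, not from Nasta's text) and only counts cycles with the bus free. It does not model how many CPU accesses actually fit, so it shows the size of the gap rather than predicting speed. The function name is made up for this example.

```python
from fractions import Fraction

LINES_PER_FRAME = 312
VIDEO_LINES = 256
CYCLES_PER_LINE = 480
GRAB_CYCLES = 8


def free_share(grabs_video_line: int, grabs_other_line: int) -> Fraction:
    """Fraction of cycles in a frame during which the ZX8301 is off the bus."""
    other_lines = LINES_PER_FRAME - VIDEO_LINES
    busy = (VIDEO_LINES * grabs_video_line
            + other_lines * grabs_other_line) * GRAB_CYCLES
    total = LINES_PER_FRAME * CYCLES_PER_LINE
    return Fraction(total - busy, total)


# Reading 1: only 8 refresh grabs on each of the 56 invisible lines.
refresh_only = free_share(32, 8)
# Reading 2: invisible lines are treated exactly like video lines.
same_penalty = free_share(32, 32)

print(refresh_only, float(refresh_only))
print(same_penalty, float(same_penalty))
```

Under these assumptions the free share comes out as 7/13 (about 54%) with refresh-only invisible lines and 7/15 (about 47%) when every line pays the full penalty.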
M68008 wrote:This algorithm seems to work well to describe the behaviour as seen from the CPU and should result in the correct timings. It may not be 100% correct in terms of signals, for example I don't remember if VDA is actually half cycle longer/shorter than I described (the video logic uses a 15 MHz clock), it's possible that the ZX8301 'decides' earlier whether to assert DTACK, etc. The different duration of 68K memory reads and writes may be due to the ZX8301 only looking at AS and not DS. These timings are based on a single QL, it would be interesting to know if different revisions work differently (e.g. by running the microbenchmark).
I'd gladly give it a go if you want.

Cheers, Marcel


mk79
QL Wafer Drive
Posts: 1349
Joined: Sun Feb 02, 2014 10:54 am
Location: Esslingen/Germany

Post by mk79 »

daniel_baum wrote:With many cores, the need for the second SD-card has been replaced with a virtual hard disk file on the main SD-card. AFAIK, the only cores that still actually need the second SD-Card reader are the Multicomp and the QL. Could this be done for the QL too?
Once again I put way too much time into this thing, but yes, I was able to make it work. Even better, I have changed the QL-SD driver so that it not only accepts SD-card images (which are difficult to change from a PC) but can also mount .WIN files directly. Just select them from the menu. So basically no second SD slot is needed anymore. I described it in a bit more detail on my page.


M68008
Trump Card
Posts: 223
Joined: Sat Jan 29, 2011 1:55 am

Post by M68008 »

mk79 wrote:I'd gladly give it a go if you want.
At some point I started rewriting the benchmark and never completed that, but I've found an old copy (from 2011!):
http://www.terdina.net/ql/soft/benchmark.zip

At QL speed, it should run for 20+ minutes (dots in channel #0 show progress), then spit cryptic numbers to a file.

