mk79 wrote:On the unexpanded QL I currently guess that streaming whole frames directly from SD to screen memory might even be the fastest approach overall.
Well, almost the fastest. The only approach I found faster than whole frames (with an unmodified QL-SD) is to store "half" frames with 1 bit per pixel instead of R and G separately, then use a specialized SD card routine which writes every byte from the SD card immediately to two consecutive screen bytes, expanding on-the-fly from 16 KB per frame on the card to 32 KB on the screen.
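To illustrate the byte doubling, here is a minimal sketch in portable C. It is not the actual routine: the real one reads each byte straight from the SD card port, while this sketch takes the bytes from a buffer so it can run anywhere. The function name and the two size constants are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes from the post: 16 KB per frame on the card, 32 KB on the screen. */
#define HALF_FRAME_BYTES 16384u
#define FULL_FRAME_BYTES 32768u

/*
 * Hypothetical helper: expand n source bytes (1 bit per pixel) into
 * 2 * n screen bytes by writing every source byte to two consecutive
 * screen bytes. The caller must provide room for 2 * n bytes.
 */
void expand_half_frame(const uint8_t *src, uint8_t *screen, size_t n)
{
    assert(src != NULL && screen != NULL);
    for (size_t i = 0; i < n; i++) {
        uint8_t b = src[i];
        screen[2 * i]     = b;  /* first byte of the pair  */
        screen[2 * i + 1] = b;  /* second byte, same value */
    }
}
```

Calling `expand_half_frame(card_data, screen, HALF_FRAME_BYTES)` would fill one complete 32 KB frame; `card_data` and `screen` are placeholders here.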
This way I streamed an unfragmented file outside the QL filesystem container, with modified versions of my original FAT32 / SD card routines. Those were still "only" written in C, before Adrian and Wolfgang translated them to assembler for the actual QL-SD and Q68 drivers. Still, the result did not look too bad on the original machine. Keeping the file outside the QLWA container made streaming long files possible with 128K RAM. This way, I could use a small 8 MB QL filesystem container. With large containers, the driver consumed (too) much memory.
By the way, the other demo scenes, like the Spectrum one, were using original machines with just mass storage added! It would be nice if that could be done for the QL as well - otherwise people might blame the success on the SuperGoldCard, which is an almost completely different machine from the QL, two CPU generations newer, with cache and 32 bit RAM.
I was mentioning "unmodified QL-SD" because there is a relatively simple trick to improve the QL-SD hardware in a way that the "whole frames" approach becomes the fastest possible, even faster than "half frames", despite the doubled video file size. QL-SD reads are equivalent to ROM accesses - and as such faster than instruction fetches from contended 128 KB QL DRAM. To exploit this, I tricked QL-SD into delivering two bytes at once by a word access to $FEE4, simply by ignoring the lowest address bit when decoding. Should be fully backward compatible, although I have no time for any testing. Unfortunately, I decoded transfer speed switching at $FEE6, limiting this trick to word size. Otherwise, even longword access would be an option.
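For illustration only, the word-access idea could look roughly like this in C. This is a sketch under assumptions, not driver code: the function name is invented, the SD card is assumed to be already set up for reading, and the port is passed in as a pointer so the loop can be tried on any machine with a dummy variable. Only the address $FEE4 is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address from the post; only meaningful on a QL with the modified QL-SD. */
#define QLSD_DATA_PORT_ADDR 0xFEE4u

/*
 * Hypothetical helper: copy `words` 16-bit words from the data port to
 * screen memory. Each word read is assumed to deliver two SD card bytes
 * at once, which is what ignoring the lowest address bit would allow.
 */
void stream_frame_words(volatile const uint16_t *sd_port,
                        uint16_t *screen, size_t words)
{
    assert(sd_port != NULL && screen != NULL);
    for (size_t i = 0; i < words; i++) {
        screen[i] = *sd_port;  /* volatile: a real port read per iteration */
    }
}
```

On the real hardware the first argument would presumably be `(volatile const uint16_t *)QLSD_DATA_PORT_ADDR`, and a whole 32 KB frame would need 16384 word reads instead of 32768 byte reads.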
It is a bit sad that I didn't implement this hardware trick when I designed QL-SD in the first place. Looking back, I even had the idea early enough - but forgot it, because I worked on the Q68 at the same time, where it was not needed. (The Q68 is theoretically fast enough to allow undelayed consecutive SD card reads even at byte access width.)