Minimig improvements? (Tentative discussion)

Samurai_Crow
Posts: 13
Joined: Tue Jun 07, 2022 8:27 pm
Has thanked: 3 times
Been thanked: 6 times

Development Idea

Someday there could be a "Maximig" core (tentative name) that throws out the Amiga's round-robin timing and makes a real out-of-order processor out of the Blitter and Copper instruction sets.

As much as I liked Amiga though, I should walk before I run. (Anyone willing to sell a high-quality Terasic DE10 Nano for cheap aftermarket price? Otherwise, I'd better wait until I see how the Intel Cyclone series FPGA stacks up using a knock-off brand card.)

Phases of development

Step 1: 68040 CPU core, in-order, 4-stage single-issue pipeline

Step 2: 68060 CPU core, in-order, 4-stage dual-issue pipelines

Step 3: 68080-equivalent core, in-order, 4-stage dual-issue pipelines including an integer vector unit

Step 4: 68090-class core, out-of-order, 5-stage triple-issue, hyperthreaded SoC using the second thread to execute GPU instructions, reducing the Minimig chipset footprint to squeeze it all in.

Step 5 (dreaming): 68100-class multi-core design using a hypervisor to replace WHDLoad and manage inter-core communication, plus improved MMU/Memory Protection Unit integration to lock down the operating system.
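
As a rough way to compare the pipeline shapes in the steps above, here is a minimal Python sketch of the textbook ideal-pipeline cycle count. It assumes no stalls, hazards or cache misses (which real 680x0 code would certainly have), so it only gives an upper bound on throughput. The function name, parameters and the 1000-instruction workload are purely illustrative.

```python
import math

def ideal_cycles(instructions: int, stages: int, issue_width: int) -> int:
    """Cycles for an ideal pipeline with no stalls or hazards.

    The first issue group needs `stages` cycles to drain; every further
    group of up to `issue_width` instructions completes one cycle later.
    """
    if instructions <= 0:
        return 0
    groups = math.ceil(instructions / issue_width)
    return stages + groups - 1

# Hypothetical workload of 1000 instructions on the Step 1, 2 and 4 shapes.
# For Step 4 this is only the issue-width bound, not an out-of-order model.
for name, stages, width in [("single-issue", 4, 1),
                            ("dual-issue", 4, 2),
                            ("triple-issue", 5, 3)]:
    print(name, ideal_cycles(1000, stages, width))
```

Real code rarely pairs perfectly, so actual gains from the second and third issue slots would be smaller than this bound suggests.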

Notes

  • Steps 1-3 follow in the footsteps of the Apollo team whose Vampire v4 design is used in the Apollo Standalone Computer.

  • Step 5 requires a bigger FPGA than a MiSTer has and would likely only be sold as an ASIC.

  • Step 4+ would need a total rework of the Minimig chipset core.

  • Open-source RISC-V SoC cores are already past step 3 but might be too big to use.

  • Converting a RISC-V SoC core to use an enhanced Minimig core might be too much work for too little reward.

  • Duplicating the Apollo Team's efforts only amounts to being able to run ApolloOS on open-source hardware, since their SAGA chipset core and 68080 CPU are closed-source.

Conclusions

I'm not able to do this yet, but I'm collecting thoughts to figure out what's worth the effort to learn and what is not. (No promises, IOW.) Also, I'd be interested in other core developers' thoughts on how much effort this would take. Finally, there is an open-source, cut-down version of a 68030 core in VHDL that could be built back up with another open-source VHDL 4-way set-associative cache, plus the MOVE16 opcode, to make a 68EC040 clone. Even so, newer HDLs like Chisel could make the long haul easier to navigate.
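
For anyone unfamiliar with the term, a 4-way set-associative cache can be modelled in a few lines of Python before committing to any HDL. This is only a behavioural sketch that tracks tags, not data; the 64 sets and 16-byte lines are assumptions chosen to give a 4 KiB cache, and the class and method names are made up for illustration.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Toy N-way set-associative cache model with LRU replacement (tags only)."""

    def __init__(self, num_sets: int = 64, ways: int = 4, line_bytes: int = 16):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One OrderedDict per set; key order tracks recency (oldest first).
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Look up an address; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(ways) >= self.ways:
            ways.popitem(last=False)   # evict the least recently used line
        ways[tag] = True
        self.misses += 1
        return False

# Two passes over a 4 KiB region, one longword at a time.
cache = SetAssociativeCache()
for _ in range(2):
    for addr in range(0, 4096, 4):
        cache.access(addr)
print(cache.hits, cache.misses)
```

Feeding a model like this with an address trace from a real program is a cheap way to check whether more ways or more sets would help before writing any VHDL.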

rhester72
Top Contributor
Posts: 1323
Joined: Thu Jun 11, 2020 2:31 am
Has thanked: 15 times
Been thanked: 213 times

First, what you're suggesting would be a titanic amount of effort for a single, very experienced FPGA developer with intimate knowledge of the processors in question (along with CPU design in general). For someone who's never touched FPGA before, it's a noble dream, but unlikely to be realizable in the practical lifetime of the MiSTer.

All that being said, Sergey (the project owner) has long been opposed to 'fantasy' systems, feeling they have no place (at least as official cores), so take that for what it is. MiSTer is largely a preservation project versus a blank canvas for systems-that-never-were (nor is it fast enough at a gate-switching level to attain anywhere near the sort of speeds you are envisioning, which are more akin to WinUAE-style JIT implementations).

Samurai_Crow

Thanks for the warning. I'll just try making a 68040 core for the existing Minimig chipset based on the cut-down 68030 open core on GitHub. That'll replace the tiny TG68C core.

cursedverses
Posts: 180
Joined: Sun May 24, 2020 9:13 pm
Has thanked: 186 times
Been thanked: 34 times

That's a hell of a utopia for FPGA Amiga!
I think really the next logical step would be to get the timings right on AGA/68020 (I think A500/68000 is already accurate) then expand on that.
Honestly though, while I appreciate any improvements, Minimig for me has just worked.

Samurai_Crow

Thanks, cursedverses. I just wanted to see how far the envelope could be pushed. Speaking of which, how much DDR3 is left over once Linux has loaded on the ARM cores?

LamerDeluxe
Top Contributor
Posts: 1239
Joined: Sun May 24, 2020 10:25 pm
Has thanked: 887 times
Been thanked: 284 times

cursedverses wrote: Sun Sep 01, 2024 12:25 am

That's a hell of a utopia for FPGA Amiga!
I think really the next logical step would be to get the timings right on AGA/68020 (I think A500/68000 is already accurate) then expand on that.
Honestly though, while I appreciate any improvements, Minimig for me has just worked.

The A500/68000 is still not quite there. There's still a glitch in the game Hybris, which does a lot of racing the beam, that indicates chipset versus CPU timing is not completely right yet.

An FPU for the Minimig core would be great, though. I used my Amiga a lot for ray tracing back in the day, and it would be cool if software like Sculpt 4D, Turbo Silver, Imagine (I programmed a number of procedural textures for that) and LightWave worked well on the core, as well as my own Candy Factory Pro software, which runs a lot better with an FPU.

niallquinn
Posts: 136
Joined: Wed Jun 05, 2024 4:54 pm
Has thanked: 140 times
Been thanked: 33 times

Wow, that's some list, but I think the 68040 would be enough. I know some machines used it, including the Amiga 4000.

A 68030 could open up the Atari Falcon.

But, and it's a big but, the 68040 has 1.2 million transistors, so I'm guessing it's a no.

So whichever one you pick, it would be a monumental task, and for not much gain. That's just my 2p worth. Well, a gain for us, yes, but not for the dev who spends a year or two doing it.

virtuali
Posts: 124
Joined: Mon Feb 01, 2021 10:41 pm
Has thanked: 2 times
Been thanked: 37 times

niallquinn wrote: Sun Sep 01, 2024 9:23 am

A 68030 could open up the Atari Falcon

That would require adding support for a whole extra chip, the 56001 DSP, and I have no idea how complex it is. Alternatively, if you restrict it to games only (I guess the DSP was used only in audio apps), a possible Falcon core might work without the DSP.

Hodor
Posts: 142
Joined: Mon May 25, 2020 8:29 am
Has thanked: 378 times
Been thanked: 30 times

cursedverses wrote: Sun Sep 01, 2024 12:25 am

That's a hell of a utopia for FPGA Amiga!
I think really the next logical step would be to get the timings right on AGA/68020 (I think A500/68000 is already accurate) then expand on that.
Honestly though, while I appreciate any improvements, Minimig for me has just worked.

Agreed. A500/68000 is not fully accurate, though. There are some demos, like Absolute Inebriation, that don't work, for instance.

Samurai_Crow

Timing irregularities in the Minimig chipset aren't within my abilities yet. Also, my original plan was to make a different core that would prioritize performance over precise timing, gutting the original Minimig core and keeping it mainly as an open-source reference, taking only a few bits for the parts that couldn't benefit from performance improvements.

That said, improving the CPU core is much easier to do than trying to make a beefy computer out of an old one. The 68EC040 lacks floating point and MMU and the open source 68030 core is hobbled on purpose. I know how to un-hobble it but don't expect miracles yet.

Re: Atari Falcon, the original author of the 68030 core was making a Falcon, and the registered version of the 68030 core already has an MMU and caches. If you like, I'll get you a link or PM you his email address in case you want to pass the hat around and take up a collection. I'll see if I still have his website in my bookmarks.

Samurai_Crow

http://experiment-s.de/en/progress/ is the link to the Atari Falcon project but it is stalled. The CPU core would still be useful though.

I've mirrored the unmodified CPU sources at https://github.com/AmigaSchool/wf68k30L if anyone wants to check it out.

Bas
Top Contributor
Posts: 623
Joined: Fri Jan 22, 2021 4:36 pm
Has thanked: 80 times
Been thanked: 324 times

Interesting. I'd say the only practical reason to go beyond the 68020-workalike that we have right now, from a pure CPU-only viewpoint, would be the addition of an FPU and MMU (in no particular order). A fully decked-out Amiga 3000 still feels like the ultimate non-fantasy machine in the Amiga pedigree from before Commodore's demise.

If anything, a fully qualified 68030 with FPU and MMU also enables bigger Macs and things like 68k Linux/NetBSD on these ancient systems. SGI IRIS would also be fun. ;-)

Samurai_Crow

68030 vs. 68040 vs. 68060

The fully registered version of the WF68K30 has a full MMU but no FPU. The 68882 FPU was not pipelined so the 68040 added some FPU ops. The 68060, in addition to the second pipeline and stack cache, added single-cycle 16-bit multiplies and more primitive FPU ops.

I feel the days of multi-dozen-clock multiplies and half-finished FPUs are painful to remember. If I make an 040, I'll likely include a full 68882 FPU eventually, an 040-class MMU (smaller and simpler than the 030 MMU) and a set-associative cache. That's probably all I'll have room for without drastic changes.

Fantasy World

The out-of-order core I was thinking about for phase 4 (and am growing ever more skeptical of) would have had to use microcoded operations throughout the instruction crackers just to keep things small. Of course, basing it on the Berkeley Out-of-Order Machine (BOOM V1) would already be too big, so I'd have to whittle the 64-bit ops down to 32-bit RISC-V and retest before switching the instructions to 68060 and chipset ops.

For the record, I knew it wouldn't exceed 78 MHz or so; I was just going for parallelism and vectorization of the 16-bit GPU ops.

robinsonb5
Posts: 130
Joined: Fri Jun 19, 2020 8:54 pm
Has thanked: 13 times
Been thanked: 58 times

Samurai_Crow wrote: Sat Aug 31, 2024 7:46 pm

(Anyone willing to sell a high-quality Terasic DE10 Nano for cheap aftermarket price? Otherwise, I'd better wait until I see how the Intel Cyclone series FPGA stacks up using a knock-off brand card.)

[I've based these comments on the assumption that you're a relative newcomer to HDL - so my apologies if I've underestimated your experience level, but I hope this is helpful anyway.]

You can actually make a start without hardware - the most useful tool you'll be using when developing a CPU core is a simulator. GHDL is very good for VHDL code, and Verilator is good for Verilog / SystemVerilog. Both Quartus and Vivado have simulation tools, too - and unlike the free-software options, they can do mixed-language simulation.

I have testbenches that run both TG68 and WF68L under GHDL. Spoiler-alert: WF68L's performance was disappointing bearing in mind it was running from a zero-wait-state 'ROM' - it was slightly slower than TG68 for the same clock speed.

Phases of development

Step 1: 68040 CPU core, in-order, 4-stage single-issue pipeline

Speaking as someone who's written a simple CPU core [https://github.com/robinsonb5/EightThirtyTwo], could I recommend some preliminary steps if you're serious about writing a CPU from scratch?

Step -3: Learn to create testbenches and run simulations of VHDL and Verilog designs, and read output traces.

Step -2: Study and understand some simple CPU cores. I'd recommend looking at F32C (MIPS and RISC-V, fairly small, pipelined with branch prediction), ZPU, and maybe the OnePageComputing OPC series of CPUs - they're absurdly tiny.
[https://github.com/f32c/f32c]
[https://github.com/robinsonb5/ZPUFlex]
[https://revaldinho.github.io/opc/]

Step -1: Write a simple CPU core of your own design, maybe RISC-V, or design your own instruction set.

Step 0: (This isn't something I've done, but it would be my next step if I were about to attempt what you're describing) Study and learn one or more of the higher-level HDLs, such as Chisel (as you suggested), SpinalHDL, Amaranth or Migen. I think the use of such tools will help a lot to keep the complexity under control. SpinalHDL looks particularly inviting because of what's being done with the VexRiscv and NaxRiscv cores. If you don't want to use higher-level HDLs, then learn to make good use of VHDL record types or SystemVerilog structs to keep the complexity under control.
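
One way to combine Steps -3 and -1 is to keep a tiny software reference model and compare its trace against the simulator output. The sketch below is a hypothetical four-instruction accumulator machine, not any of the cores named above; the opcodes, the 16-bit accumulator and the trace format are all invented for illustration.

```python
from typing import List, Tuple

# Hypothetical opcodes for a toy accumulator machine (not a real ISA).
LOAD_IMM, ADD_IMM, JUMP_NZ, HALT = range(4)

def run(program: List[Tuple[int, int]], max_steps: int = 1000) -> List[Tuple[int, int]]:
    """Execute the program and return a trace of (pc, acc) after each step."""
    pc, acc = 0, 0
    trace = []
    for _ in range(max_steps):
        opcode, operand = program[pc]
        if opcode == HALT:
            break
        if opcode == LOAD_IMM:
            acc = operand & 0xFFFF
            pc += 1
        elif opcode == ADD_IMM:
            acc = (acc + operand) & 0xFFFF   # 16-bit wrap-around
            pc += 1
        elif opcode == JUMP_NZ:
            pc = operand if acc != 0 else pc + 1
        else:
            raise ValueError(f"unknown opcode {opcode}")
        trace.append((pc, acc))
    return trace

# Count down from 3 to 0, then halt.
countdown = [(LOAD_IMM, 3), (ADD_IMM, -1), (JUMP_NZ, 1), (HALT, 0)]
golden_trace = run(countdown)
for step in golden_trace:
    print(step)
```

In practice the trace from a GHDL or Verilator run would be dumped to a file and diffed against the model's trace; that glue is omitted here.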

Also, I'd be interested in other core developers' thoughts on how much effort this would take. Finally, there is an open-source, cut-down version of a 68030 core in VHDL that could be built back up with another open-source VHDL 4-way set-associative cache, plus the MOVE16 opcode, to make a 68EC040 clone.

It'll be a phenomenal amount of work, especially if you're not already a wizard with HDL.

You're also making an assumption that a better cache will yield a big speedup - I recently made similar assumptions about performance on the SiDi128 platform, and was proved wrong; the current cache is working better than I thought, and there's very little scope for improvement there. (Not the same cache used on MiSTer, but I'd be astonished if the MiSTer core's cache wasn't as good!) [https://retroramblings.net/?p=2020]
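
The diminishing returns can be illustrated with the standard average-memory-access-time estimate. The figures below are made up for illustration and are not measurements from the SiDi128 or MiSTer cores; the function name is likewise just a placeholder.

```python
def average_access_cycles(hit_rate: float, hit_cycles: float, miss_penalty: float) -> float:
    """Average memory access time: hit cost plus the miss rate times the miss penalty."""
    return hit_cycles + (1.0 - hit_rate) * miss_penalty

# Assumed figures: 1-cycle hit, 10-cycle miss penalty (illustrative only).
current = average_access_cycles(0.95, 1, 10)   # cache that already hits 95% of the time
better = average_access_cycles(0.98, 1, 10)    # hypothetical improved cache
print(f"speedup on memory accesses: {current / better:.2f}x")
```

Even that ratio applies only to the memory-access portion of execution time, so the overall speedup would be smaller still once the cycles spent inside the CPU are counted.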

I hope those thoughts are helpful.

Samurai_Crow

robinsonb5 wrote: Sun Sep 01, 2024 9:44 pm
Samurai_Crow wrote: Sat Aug 31, 2024 7:46 pm

(Anyone willing to sell a high-quality Terasic DE10 Nano for cheap aftermarket price? Otherwise, I'd better wait until I see how the Intel Cyclone series FPGA stacks up using a knock-off brand card.)

[I've based these comments on the assumption that you're a relative newcomer to HDL - so my apologies if I've underestimated your experience level, but I hope this is helpful anyway.]

You assume correctly. My experience in my first college degree (2 years of electronic engineering technology) was with TTL gates in small circuits. My second college degree was 4 years of computer science.

I have testbenches that run both TG68 and WF68L under GHDL. Spoiler-alert: WF68L's performance was disappointing bearing in mind it was running from a zero-wait-state 'ROM' - it was slightly slower than TG68 for the same clock speed.

It was deliberately hobbled by the author. Did you try replacing the repeat shifter with a 3-stage barrel shifter in the instruction fetcher?
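
For readers unfamiliar with the distinction, here is a rough Python model of the two approaches. A repeat shifter moves one bit per clock, so its latency grows with the shift count; a 3-stage barrel shifter uses fixed stages of 1, 2 and 4 bits, each enabled by one bit of the count. The 16-bit width and the 0-7 shift range are assumptions made for the sketch and are not taken from the WF68K30L sources.

```python
WIDTH_MASK = 0xFFFF  # assumed 16-bit datapath for illustration

def repeat_shift_left(value: int, count: int):
    """Shift one bit per iteration; returns (result, iterations used)."""
    steps = 0
    for _ in range(count):
        value = (value << 1) & WIDTH_MASK
        steps += 1
    return value, steps

def barrel_shift_left(value: int, count: int) -> int:
    """Three fixed stages (1, 2, 4 bits), each selected by one bit of count (0-7)."""
    if not 0 <= count <= 7:
        raise ValueError("three stages only cover shift counts 0-7")
    for stage in range(3):
        if (count >> stage) & 1:
            value = (value << (1 << stage)) & WIDTH_MASK
    return value

# Both give the same result, but the repeat shifter needs `count` iterations.
print(repeat_shift_left(0x0001, 7), barrel_shift_left(0x0001, 7))
```

In hardware the three stages would be a chain of multiplexers that settles in a fixed time, regardless of the shift count.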

It'll be a phenomenal amount of work, especially if you're not already a wizard with HDL.

You're also making an assumption that a better cache will yield a big speedup - I recently made similar assumptions about performance on the SiDi128 platform, and was proved wrong; the current cache is working better than I thought, and there's very little scope for improvement there. (Not the same cache used on MiSTer, but I'd be astonished if the MiSTer core's cache wasn't as good!) [https://retroramblings.net/?p=2020]

I hope those thoughts are helpful.

Very! I traded off my SiDi original model when I found it had too small of an FPGA to hold a pipelined CPU. The MiST version of the Minimig core was the biggest thing it could handle. It also used the TG68C core.

Since I'm not really looking to become a professional chip designer at my age (born Sept. 29, 1974, so I'm almost 50!), I just thought I'd modify an existing core. I don't want to spend the next 10 years trying to debug a complex CPU from scratch.

Thanks for your simulator recommendations!
