Lets actually try Hybrid Emulation

foft
Posts: 344
Joined: Thu Dec 03, 2020 11:05 am
Has thanked: 29 times
Been thanked: 125 times

Re the latency question: it probably comes down to how long each request waits to be handled by the slower-clocked FPGA side.
foft

So continuing the dev... the actual component is defined in:
quartus/libraries/vhdl/wysiwyg/cyclonev_components.vhd

Code: Select all

component cyclonev_hps_interface_clocks_resets
        generic (
                h2f_user0_clk_freq      :       natural := 100;
                h2f_user1_clk_freq      :       natural := 100;
                h2f_user2_clk_freq      :       natural := 100;
                lpm_type        :       string := "cyclonev_hps_interface_clocks_resets"        );
        port(
                f2h_cold_rst_req_n      :       in std_logic := '0';
                f2h_dbg_rst_req_n       :       in std_logic := '0';
                f2h_pending_rst_ack     :       in std_logic := '0';
                f2h_periph_ref_clk      :       in std_logic := '0';
                f2h_sdram_ref_clk       :       in std_logic := '0';
                f2h_warm_rst_req_n      :       in std_logic := '0';
                h2f_cold_rst_n  :       out std_logic;
                h2f_pending_rst_req_n   :       out std_logic;
                h2f_rst_n       :       out std_logic;
                h2f_user0_clk   :       out std_logic;
                h2f_user1_clk   :       out std_logic;
                h2f_user2_clk   :       out std_logic;
                ptp_ref_clk     :       in std_logic := '0'
        );
end component;

It looks like the generics are not set in either case.

So, I need to instantiate in sysmem and just plumb h2f_rst_n back out to the qsys I think.

Code: Select all

signal                 dir  type        sysmem              hps fpga bridge  merged
f2h_cold_rst_req_n     in   std_logic   f2h_cold_rst_req_n  1                f2h_cold_rst_req_n
f2h_dbg_rst_req_n      in   std_logic   1                   1                1
f2h_pending_rst_ack    in   std_logic   1                   1                1
f2h_periph_ref_clk     in   std_logic
f2h_sdram_ref_clk      in   std_logic
f2h_warm_rst_req_n     in   std_logic   f2h_warm_rst_req_n  1                f2h_warm_rst_req_n
h2f_cold_rst_n         out  std_logic
h2f_pending_rst_req_n  out  std_logic
h2f_rst_n              out  std_logic   h2f_rst_n           h2f_rst_n        h2f_rst_n
h2f_user0_clk          out  std_logic   h2f_user0_clk                        h2f_user0_clk
h2f_user1_clk          out  std_logic
h2f_user2_clk          out  std_logic
ptp_ref_clk            in   std_logic

DONE: fingers crossed that it now works!
foft

So this is reading the first 100 words...

1114:4ef9:00f8:00d2:0000:ffff:0028:0044:0028:000a:ffff:ffff:0041:4d49:4741:2052:4f4d:204f:7065:7261:7469:6e67:2053:7973:7465:6d20:616e:6420:4c69:6272:6172:6965:7300:436f:7079:7269:6768:7420:a920:3139:3835:2d31:3939:3320:0043:6f6d:6d6f:646f:7265:2d41:6d69:6761:2c20:496e:632e:2000:416c:6c20:5269:6768:7473:2052:6573:6572:7665:642e:0033:2e31:2052:4f4d:2000:6578:6563:2e6c:6962:7261:7279:0065:7865:6320:3430:2e31:3020:2831:352e:372e:3933:290d:0a00:4e71:4e71:4afc:00f8:00b6:00f8:3706:0228:0969:00f8:008e:

Which seems to match the kickstart rom:
00000000 11 14 4e f9 00 f8 00 d2 00 00 ff ff 00 28 00 44 |..N..........(.D|
00000010 00 28 00 0a ff ff ff ff 00 41 4d 49 47 41 20 52 |.(.......AMIGA R|
00000020 4f 4d 20 4f 70 65 72 61 74 69 6e 67 20 53 79 73 |OM Operating Sys|
00000030 74 65 6d 20 61 6e 64 20 4c 69 62 72 61 72 69 65 |tem and Librarie|
00000040 73 00 43 6f 70 79 72 69 67 68 74 20 a9 20 31 39 |s.Copyright . 19|
00000050 38 35 2d 31 39 39 33 20 00 43 6f 6d 6d 6f 64 6f |85-1993 .Commodo|
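For anyone wanting to automate this comparison rather than eyeball it, here is a minimal sketch. It assumes the words read over the bridge are already held in a buffer and that the Kickstart image is available as a byte array; the function name `words_match_rom` is made up for illustration. Each 16-bit word is compared against the corresponding big-endian byte pair from the ROM, which is how the dump above lines up (0x1114 against bytes 11 14).

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if every 16-bit word equals the matching big-endian byte
   pair from the ROM image, 0 at the first mismatch.
   (Hypothetical helper, not part of any existing code base.) */
int words_match_rom(const uint16_t *words, const uint8_t *rom, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        uint16_t expected = (uint16_t)((rom[2 * i] << 8) | rom[2 * i + 1]);
        if (words[i] != expected)
            return 0;
    }
    return 1;
}
```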

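So then I did this on the ARM:
for (;;)
{
    /* 0xdff180 is COLOR00, the background colour register */
    int i = 0xdff180;
    /* virtual_base16 is indexed in 16-bit words, hence i/2 */
    virtual_base16[i/2] = rand();
}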

and I get this:
[Attachment: IMG_8644.JPG]
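The byte-address to word-index step in the loop above can be pulled into a small helper. This is only a sketch: `poke16` and `AMIGA_COLOR00` are illustrative names, and `window` stands in for whatever pointer `virtual_base16` was set to (presumably an mmap of the HPS-to-FPGA bridge, which is an assumption since the mapping code is not shown in the post).

```c
#include <stdint.h>

/* COLOR00 (background colour) register in the Amiga custom chip space. */
#define AMIGA_COLOR00 0xdff180u

/* Write one 16-bit value at an Amiga byte address through a window that
   is indexed in 16-bit words, hence the divide by two.
   The pointer is volatile so the compiler does not drop or merge writes. */
static inline void poke16(volatile uint16_t *window, uint32_t byte_addr,
                          uint16_t value)
{
    window[byte_addr / 2] = value;
}
```

With a real mapping, the loop body would then read `poke16(virtual_base16, AMIGA_COLOR00, rand());`.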
foft

So, it works... Now I need a 68K emulator to plug in, and also to plumb interrupts. I could add another Avalon slave for polling interrupts, though it'd be better with real interrupts. I'll try the slave method for now, then improve it after.

Doing some timing, I seem to get ~2MB/s. So I'd better fix that!
Blitzwing
Posts: 103
Joined: Sat Sep 05, 2020 9:52 pm
Has thanked: 11 times
Been thanked: 24 times

foft wrote: Sat Apr 10, 2021 9:34 am So, it works... Now need a 68K emulator to plug in and also to plumb interrupts. I could add another avalon slave for polling interrupts though it'd be better with real interrupts. I'll try the slave method for now then improve after.

Doing some timing and seem to get ~2MB/s. So better fix that!

Congrats, great to see something come to life.
foft

I'm only supposed to achieve 3.5MB/s on A500 and 7MB/s on A1200 right?

It seems that there is delay as follows:
i) 70% waiting for chip ram.
ii) 30% waiting for avalon to start next transfer.

I think tackling (ii) by making use of waitrequestAllowance will get it up to 3.5MB/s.

Does anyone have to hand actual MB/s for AGA and OCS chip reads and writes on the Minimig?
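As a back-of-envelope check on the estimate above, here is a small sketch. It assumes the ~2MB/s figure and the 70/30 split are taken at face value, and that removing a share of the per-transfer time leaves the rest unchanged; the function name is made up for illustration.

```c
/* Throughput after eliminating a fraction of the time spent per transfer,
   assuming the remaining time per transfer stays the same.
   (Illustrative helper; the inputs are the rough figures from the post.) */
double throughput_after_removing(double current_mb_s, double removed_fraction)
{
    return current_mb_s / (1.0 - removed_fraction);
}
```

Under those assumptions, `throughput_after_removing(2.0, 0.30)` comes out at roughly 2.86MB/s, so getting all the way to 3.5MB/s may also need some gain on (i), or the measured figures are simply rougher than the split suggests.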
Sorgelig
Site Admin
Posts: 890
Joined: Thu May 21, 2020 9:49 pm
Has thanked: 2 times
Been thanked: 214 times

Qsys is a heavy and cluttered system.
I suggest generating the code in Qsys and then just taking the *fpga_interfaces module, which will include all the bridges you've configured in Qsys.
You don't even need to add it to the framework. You can instantiate the bridges right in your code. Just make sure you don't use modules that are already in use, such as DDR.
foft

It seems to have a fair bit of interconnect logic in there between the axi mm bridge and the slaves (hps_fpga_bridge_mm_interconnect_0). Perhaps removing that will cut some latency. I'm checking the signals on the logic analyzer to see where they appear on the master side.

Intel do seem to want everyone to use qsys, given the gui does not allow instantiating these lower level entities and there isn't proper documentation that I can find.
foft

Seems worth clocking the Avalon slave a little higher.

I had been using the 28MHz system clock. If I make the slave response immediate (i.e. never set waitrequest), with a tight loop on the HPS side doing 16-bit reads, I get an access every 10 cycles, i.e. 2.8MHz, so 5.6MB/s.
With the clock at 4x (114MHz), I get an access every 16 cycles, so 7.1MHz or 14.2MB/s.

Of course to get more throughput I could use 32-bit/64-bit or even 128-bit (giving 112MB/s max). However this is chipram we're talking about, so even on A1200 7MB/s will cut it.

Also of course I'm still talking about an immediate chipram response, which is far from true.
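The arithmetic behind these figures is simple enough to sketch. This assumes an immediate slave response (as stated above) and uses the rounded clock values from the post; the function name is illustrative only.

```c
/* Bus throughput in MB/s for a given clock (MHz), cycles per access and
   access width in bytes. Assumes every access takes the same number of
   cycles, i.e. the slave never stalls. */
double bus_throughput_mb_s(double clock_mhz, double cycles_per_access,
                           unsigned bytes_per_access)
{
    return clock_mhz / cycles_per_access * bytes_per_access;
}
```

For example, `bus_throughput_mb_s(28.0, 10.0, 2)` gives 5.6, and `bus_throughput_mb_s(114.0, 16.0, 2)` gives about 14.25, in line with the 14.2MB/s quoted (the small difference is down to rounding the clock). Passing 16 bytes per access gives the 128-bit case, which lands at about 114 by this formula against the 112MB/s quoted.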
robinsonb5
Posts: 130
Joined: Fri Jun 19, 2020 8:54 pm
Has thanked: 13 times
Been thanked: 58 times

On MiST and TC64 the SDRAM controller and CPU logic runs at 114MHz - necessary since on those platforms everything has to run from a single SDRAM - MiSTer uses a simpler SDRAM controller and lower clock since it only has to deal with Chip RAM.

Making it 32 bits wide makes sense, since AGA machines had that bus width. If it turns out that you can't match AGA chip RAM bus speeds it won't be completely disastrous, because (a) for games it makes more sense to use the FPGA-side CPUs, and (b) for the kinds of things where the speed of the hybrid emulation makes more sense, chances are you'll be using RTG. (Also, until recently MiST Minimig's chip RAM speed was somewhat lower than a real AGA machine, and in practice most games were still fine.)
Sorgelig

I think it's better to drop the idea of having both FPGA and HPS CPUs in a single core. It will require a lot of interconnects and non-optimal work. We can still have 2 cores: the original Minimig and a hybrid one. They may share the same internal name and use the same folders and files, so there won't be much difference between changing the CPU inside the core and loading another core.
Sorgelig

Actually, the ao486 core is more suitable for a first hybrid attempt. It requires less integration work, as there is no special ChipRAM there, nor a timing-accurate bus. It also grew out of an Avalon system bus: it was using real Avalon when it was based on a Qsys design.
foft

For either ao486 or Minimig it's the same setup that we need to get right, i.e. understand and connect up the HPS-FPGA bridge and get it working efficiently. Then figure out the interrupt plumbing to the ARM: e.g. a polled Avalon slave, or using the fpga2hps interrupts and writing a kernel driver to handle them.

How much throughput do we need on ao486? In some ways minimig is not demanding because the cpu-chip ram interface is so slow on the A500 and A1200 anyway! We might need to push it further on ao486?

Last night I did some tests on the bridge. With a 16-bit bridge I checked 8-bit accesses, misaligned 16-bit accesses and 32-bit accesses (aligned and misaligned 1 byte,2 bytes). All of these 'just work' which is convenient.

As I said earlier, I tried clocking the AXI bridge higher and got much better performance in terms of (nop) transactions per second. I used the x4 clock since it was there, though it could be clocked higher. Of course this means we need to add clock domain crossing. Since the clocks are aligned from the same PLL, this is just a case of putting a register in to make meeting timing workable (just about, anyway...). I found that Qsys has a handy component to do the cross-domain sync for me, so I thought I'd save the work by including the Avalon-MM clock crossing bridge (which works using a FIFO). However it didn't work and I don't know why, so it's back to plan A of just adding this myself. As a reminder, it took 10 cycles per transaction at 28MHz and 16 cycles at 4x, which is 10 vs 4 when measured in 28MHz cycles. So adding a 2 cycle delay will get us to 10 vs 6 (worst case).
edit: small note: I put the AXI bridge itself on SignalTap to see when the AXI levels are triggered, to see if stripping that layer is worthwhile. I see maybe 2-3 (AXI clock) cycles here. AXI looks much more complex though, so I think it's not worth it.

I also made a start on cross compiling the CPU code so I can try running some software soon.

First things first though, I have to fix the springs on my dishwasher door!
chanunnaki
Posts: 105
Joined: Tue Jul 07, 2020 1:33 am
Been thanked: 19 times

Forget about the dishwasher, this is far more urgent. :P
lordoftime79
Posts: 111
Joined: Sun Feb 14, 2021 6:29 pm
Has thanked: 1 time
Been thanked: 5 times

This is amazing progress! Really well done!
foft

Status:
i) Musashi cross compiled/wired up in the most basic way!
ii) Interrupts exposed, for now polled every 100 cycles!
iii) Bus speed up completed
edit iv) Dishwasher door fixed!

So, fingers crossed... here we go!
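For readers wondering what "polled every 100 cycles" might look like in the emulator loop, here is a rough sketch. Everything in it is an assumption: the hook types and `run_polled` are invented for illustration, and the `fake_*` functions are stand-ins so the sketch runs without hardware. In a real build the hooks would presumably wrap the 68K emulator's execute and set-IRQ calls and a read of the interrupt level exposed by the FPGA side.

```c
/* Hypothetical hook types, not taken from any real API. */
typedef int (*execute_fn)(int cycles);        /* run the CPU, return cycles used */
typedef unsigned (*read_level_fn)(void);      /* current interrupt level, 0-7 */
typedef void (*set_irq_fn)(unsigned level);   /* forward the level to the CPU */

/* Run for at least total_cycles, sampling the interrupt level once per
   slice before executing that slice. Returns the number of polls made. */
long run_polled(execute_fn execute, read_level_fn read_level,
                set_irq_fn set_irq, long total_cycles, int slice)
{
    long polls = 0;
    long done = 0;
    while (done < total_cycles) {
        set_irq(read_level());
        polls++;
        int used = execute(slice);
        done += (used > 0) ? used : slice;  /* guard against a stalled CPU */
    }
    return polls;
}

/* Stand-in hooks so the sketch can be exercised on its own. */
static unsigned last_level;
static int fake_execute(int cycles) { return cycles; }
static unsigned fake_read_level(void) { return 2; }
static void fake_set_irq(unsigned level) { last_level = level; }
```

Calling `run_polled(fake_execute, fake_read_level, fake_set_irq, 1000, 100)` polls ten times. The trade-off is the usual one for polling: a smaller slice lowers interrupt latency but costs a bridge read more often.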
Neocaron
Top Contributor
Posts: 375
Joined: Sun Sep 27, 2020 10:16 am
Has thanked: 209 times
Been thanked: 87 times

Good luck :D
foft

Code: Select all

bool works = false;
while(!works)
{
  fix();
  works = test();
}

Getting closer... Think I just fixed one last nasty bug and, fingers crossed, it'll work this time!
robinsonb5

foft wrote: Mon Apr 12, 2021 5:52 pm Getting closer... Think I just fixed one last nasty bug and, fingers crossed, it'll work this time!

"Hopefully this will now not work less than it didn't work before!"
foft

Diagrom is now running...

I was just running into endian issues, which I fixed fairly quickly. The HPS-FPGA bridge is little endian, so I need to do the conversion on both sides. I missed one of the places on the hardware side (there are two writedata signals for some reason), which confused me quite a lot until I found it.
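For reference, the software half of that conversion is just a byte swap, since the 68K is big endian and the ARM side is little endian. A minimal sketch follows; the helper names are illustrative, and where exactly the swap belongs (ARM side, FPGA side, or both) depends on the design, as described above.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value (big endian <-> little endian). */
static inline uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Reverse all four bytes of a 32-bit value. */
static inline uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}
```

Applying either helper twice returns the original value, which makes a handy sanity check when hunting for a missed conversion.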
zakk4223
Posts: 289
Joined: Sun May 24, 2020 10:55 pm
Been thanked: 120 times

foft wrote: Mon Apr 12, 2021 5:52 pm

Code: Select all

bool works = false;
while(!works)
{
  fix();
  works = test();
}

Getting closer... Think I just fixed one last nasty bug and, fingers crossed, it'll work this time!

I can't help but notice your test function is undefined...
foft

zakk4223 wrote: Mon Apr 12, 2021 6:27 pm I can't help but notice your test function is undefined...

Ah, that's the problem. I thought it was the lack of a fix function all along!

I've got something weird still with the chipmem accesses. Sometimes it responds much more slowly and sometimes not at all.
foft

I added fast ram. Oddly I'm getting spurious memory failures when using it. This is straight from the arm, so no fpga side involved (unless the fpga-hps bridge is writing here?).

I'm mmapping 0x20000000-0x20000000+384MB. That is the correct area reserved in the DDR for fast ram right?

edit: changed to 0x10000000 and it seems happier - be good to know the 'correct' address though.
zakk4223

Code: Select all

#define MISTER_SCALER_BASEADDR     0x20000000

That might explain things ;)
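A quick overlap check makes the clash concrete. This is a sketch under assumptions: the size of the scaler buffer is not given in the thread, so the check below only treats it as occupying at least its first byte, and `FAST_RAM_SIZE` and `regions_overlap` are illustrative names.

```c
#include <stdint.h>

#define MISTER_SCALER_BASEADDR 0x20000000u             /* from the define above */
#define FAST_RAM_SIZE          (384ull * 1024 * 1024)  /* 384MB window from the post */

/* Returns 1 if the half-open ranges [a_base, a_base + a_len) and
   [b_base, b_base + b_len) share at least one byte. */
int regions_overlap(uint64_t a_base, uint64_t a_len,
                    uint64_t b_base, uint64_t b_len)
{
    return a_base < b_base + b_len && b_base < a_base + a_len;
}
```

`regions_overlap(0x20000000, FAST_RAM_SIZE, MISTER_SCALER_BASEADDR, 1)` is true, as expected. Note that by the same arithmetic a full 384MB window starting at 0x10000000 ends at 0x28000000, so it would still reach the scaler base; whether that matters in practice depends on how much of the window is actually used, so it may be worth checking.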
foft

Haha, that'd do it! Do you know where the fast ram should go?

I can get to the boot screen and start loading some stuff now, though it's rather crashy! It's probably this chip RAM timing issue that I need to dig into.
lordoftime79

It seems you have made amazing progress!!!
foft

I think/hope tonight I'll get it booted to workbench.
lordoftime79

^^^ This I wanna see!
foft

OK, perhaps another day :D

Been getting confused by a few things. For example, the 68k bus dtack_n is supposed to be asserted when the data is ready for the CPU to read. However, I see in the logic that dtack triggers the SDRAM controller to START a read. Then chip_ready arrives MUCH later, way after the data is actually ready. Anyway, this looks like it's because minimig.v was written to use SRAM, which can return a result on the same cycle, and here we are using SDRAM. However the timing was never adjusted to wait for the SDRAM. I'm not sure how fx68k works though, since it uses dtack_n and ... works.

Anyway I'm starting to agree with @Sorgelig that I should go with a separate core for the hybrid. Then I can strip out a lot of the extra complexity that will be taken care of on the arm side: cache, fastram, etc etc. Then it becomes just an instance of minimig.v (modified for sdram rather than sram) and the hps-fpga interface logic. Anyway will sleep on it and think what to do tomorrow!
robinsonb5

foft wrote: Tue Apr 13, 2021 8:41 pm Been getting confused by a few things. For example the 68k bus dtack_n is supposed to be asserted when the data is ready for the cpu to read. However I see in the logic that dtack triggers the sdram controller to START a read. So then there is MUCH later chip_ready, way after the data is actually ready.

What might be confusing you is that there are two different paths into Chip RAM. I'm not as familiar with MiSTer's Minimig core as I am the MiST/TC64 variant, so I'm not sure whether it's the D-Cache or AGA settings (or both) which determine which path is taken, but one roughly equates to A500 speed, the other to A1200 speed.

When using the slower path, the CPU sends requests to the custom chips, to the kickstart and to chip RAM all via Minimig.v which in turn sends the access to the SDRAM controller's "chip" port. In this mode, the TG68's cycle is terminated by the chipready signal which the CPU wrapper generates from DTACK.
FX68 has a more traditional 68k interface so uses DTACK directly.

In turbo mode the CPU bypasses minimig.v and accesses the SDRAM directly for Chip RAM and Kickstart. These cycles are terminated by the ramready signal instead.

foft wrote: Anyway this looks like its because minimig.v was written to use sram, which can get a result on the same cycle and here we are using sdram. However the timing was never adjusted to wait for the sdram. I'm not sure how fx68k works though since it uses dtack_n and ... works.

Minimig.v is designed to operate on the Amiga's original four 7MHz ticks per cycle, and the SDRAM controller has a fixed cycle, completing one round per 7MHz tick with deterministic timing, so it's guaranteed to respond on schedule.