Lets actually try Hybrid Emulation
Re: Lets actually try Hybrid Emulation
So continuing the dev... the actual component is defined in:
quartus/libraries/vhdl/wysiwyg/cyclonev_components.vhd
It looks like the generics are not set in either case.
So, I need to instantiate in sysmem and just plumb h2f_rst_n back out to the qsys I think.
DONE: fingers crossed that it now works!
Code:
component cyclonev_hps_interface_clocks_resets
    generic (
        h2f_user0_clk_freq : natural := 100;
        h2f_user1_clk_freq : natural := 100;
        h2f_user2_clk_freq : natural := 100;
        lpm_type : string := "cyclonev_hps_interface_clocks_resets"
    );
    port (
        f2h_cold_rst_req_n : in std_logic := '0';
        f2h_dbg_rst_req_n : in std_logic := '0';
        f2h_pending_rst_ack : in std_logic := '0';
        f2h_periph_ref_clk : in std_logic := '0';
        f2h_sdram_ref_clk : in std_logic := '0';
        f2h_warm_rst_req_n : in std_logic := '0';
        h2f_cold_rst_n : out std_logic;
        h2f_pending_rst_req_n : out std_logic;
        h2f_rst_n : out std_logic;
        h2f_user0_clk : out std_logic;
        h2f_user1_clk : out std_logic;
        h2f_user2_clk : out std_logic;
        ptp_ref_clk : in std_logic := '0'
    );
end component;
Code:
signal dir type sysmem hps fpga bridge merged
f2h_cold_rst_req_n in std_logic f2h_cold_rst_req_n 1 f2h_cold_rst_req_n
f2h_dbg_rst_req_n in std_logic 1 1 1
f2h_pending_rst_ack in std_logic 1 1 1
f2h_periph_ref_clk in std_logic
f2h_sdram_ref_clk in std_logic
f2h_warm_rst_req_n in std_logic f2h_warm_rst_req_n 1 f2h_warm_rst_req_n
h2f_cold_rst_n out std_logic;
h2f_pending_rst_req_n out std_logic;
h2f_rst_n out std_logic; h2f_rst_n h2f_rst_n h2f_rst_n
h2f_user0_clk out std_logic; h2f_user0_clk h2f_user0_clk
h2f_user1_clk out std_logic;
h2f_user2_clk out std_logic;
ptp_ref_clk in std_logic
Re: Lets actually try Hybrid Emulation
So this is reading the first 100 words...
1114:4ef900d2:0000:ffff0044000a:ffff:ffff4d49:47414f4d7065:7261:7469:6e677973:7465:6d20:616e:6420:4c69:6272:6172:6965:7300:436f:7079:7269:6768:7420:a920383539390043:6f6d:6d6f:646f:72656d69:6761496e:632e416c:6c20:5269:6768:74736573:6572:7665:642e2e314f4d6578:65636962:7261:72797865:63202e312831372e290d4e71:4e71:4afc00b637060969008e:
Which seems to match the kickstart rom:
00000000 11 14 4e f9 00 f8 00 d2 00 00 ff ff 00 28 00 44 |..N..........(.D|
00000010 00 28 00 0a ff ff ff ff 00 41 4d 49 47 41 20 52 |.(.......AMIGA R|
00000020 4f 4d 20 4f 70 65 72 61 74 69 6e 67 20 53 79 73 |OM Operating Sys|
00000030 74 65 6d 20 61 6e 64 20 4c 69 62 72 61 72 69 65 |tem and Librarie|
00000040 73 00 43 6f 70 79 72 69 67 68 74 20 a9 20 31 39 |s.Copyright . 19|
00000050 38 35 2d 31 39 39 33 20 00 43 6f 6d 6d 6f 64 6f |85-1993 .Commodo|
so then I did this on the arm:
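for (;;)
{
    /* 0xdff180 is COLOR00, the background colour register */
    int i = 0xdff180;
    /* byte address to 16-bit word index, hence the divide by 2 */
    virtual_base16[i/2] = rand();
}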
and I get this:
- Attachment: IMG_8644.JPG (3.33 MiB)
Re: Lets actually try Hybrid Emulation
So, it works... Now need a 68K emulator to plug in and also to plumb interrupts. I could add another avalon slave for polling interrupts, though it'd be better with real interrupts. I'll try the slave method for now then improve after.
Doing some timing and seem to get ~2MB/s. So better fix that!
Re: Lets actually try Hybrid Emulation
foft wrote: ↑Sat Apr 10, 2021 9:34 am
So, it works... Now need a 68K emulator to plug in and also to plumb interrupts. I could add another avalon slave for polling interrupts, though it'd be better with real interrupts. I'll try the slave method for now then improve after.
Doing some timing and seem to get ~2MB/s. So better fix that!
Congrats, great to see something come to life.
Re: Lets actually try Hybrid Emulation
I'm only supposed to achieve 3.5MB/s on A500 and 7MB/s on A1200, right?
The delay seems to break down as follows:
i) 70% waiting for chip ram.
ii) 30% waiting for avalon to start the next transfer.
I think tackling (ii) by making use of waitrequestAllowance will get it up to 3.5MB/s.
Does anyone have actual MB/s figures to hand for AGA and OCS chip reads and writes on the Minimig?
- Sorgelig
Re: Lets actually try Hybrid Emulation
qsys is a heavy and cluttered system.
I suggest generating the code in qsys and then just taking the *fpga_interfaces module, which will include all the bridges you've configured in qsys.
You don't even need to add it to the framework. You can instantiate the bridges right in your code. Just make sure you don't use modules that are already in use, such as DDR.
Re: Lets actually try Hybrid Emulation
It seems to have a fair bit of interconnect logic in there between the axi mm bridge and the slaves (hps_fpga_bridge_mm_interconnect_0). Perhaps removing that will cut some latency. I'm checking the signals on the logic analyzer to see where they appear on the master side.
Intel do seem to want everyone to use qsys, given the gui does not allow instantiating these lower level entities and there isn't proper documentation that I can find.
Re: Lets actually try Hybrid Emulation
Seems worth clocking the Avalon slave a little higher.
I had been using the 28MHz system clock. If I make the slave response immediate (i.e. never set waitrequest) with a tight loop on the HPS side doing 16-bit reads I get an access every 10 cycles. i.e. 2.8MHz, so 5.6MB/s.
With the clock at 4x (114MHz), I get an access every 16 cycles so 7.1MHz or 14.2MB/s.
Of course to get more throughput I could use 32-bit/64-bit or even 128-bit (giving 112MB/s max). However this is chipram we're talking about, so even on A1200 7MB/s will cut it.
Also of course I'm still talking about an immediate chipram response, which is far from true.
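The throughput figures above follow from clock rate, cycles per access and access width. A minimal sketch of that arithmetic, assuming back-to-back accesses with no chip RAM wait; the function name is made up for illustration and the clock values are the rounded ones from the post:

```c
/* Bytes per second for back-to-back accesses on the bridge.
 * clock_hz:          clock driving the Avalon slave
 * cycles_per_access: measured clock cycles between successive accesses
 * bus_bits:          width of each access in bits */
double bridge_bytes_per_sec(double clock_hz, unsigned cycles_per_access,
                            unsigned bus_bits)
{
    double accesses_per_sec = clock_hz / cycles_per_access;
    return accesses_per_sec * (bus_bits / 8.0);
}
```

With 28e6 Hz, 10 cycles and 16 bits this gives 5.6MB/s; with 114e6 Hz and 16 cycles it gives 14.25MB/s at 16 bits and 114MB/s at 128 bits, in line with the rounded numbers quoted.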
Re: Lets actually try Hybrid Emulation
On MiST and TC64 the SDRAM controller and CPU logic run at 114MHz - necessary since on those platforms everything has to run from a single SDRAM - MiSTer uses a simpler SDRAM controller and lower clock since it only has to deal with Chip RAM.
Making it 32-bits wide makes sense, since AGA machines had that bus width. If it turns out that you can't match AGA chip RAM bus speeds it won't be completely disastrous, because (a) for games it makes more sense to use the FPGA-side CPUs, and (b) for the kinds of things where the speed of the hybrid emulation makes more sense, chances are you'll be using RTG. (Also, until recently MiST Minimig's chip RAM speed was somewhat lower than a real AGA machine, and in practice most games were still fine.)
- Sorgelig
Re: Lets actually try Hybrid Emulation
I think it's better to drop the idea of having both FPGA and HPS CPUs in a single core. It would require a lot of interconnect and work non-optimally. We can still have 2 cores - the original Minimig and a hybrid one. They may share the same internal name and use the same folders and files, so there won't be much difference between changing the CPU inside the core and loading another core.
- Sorgelig
Re: Lets actually try Hybrid Emulation
Actually, the ao486 core is more suitable for a first hybrid attempt. It requires less integration work, as it has neither special ChipRAM nor a timing-accurate bus. It also grew out of an avalon system bus: it used real avalon when it was based on a qsys design.
Re: Lets actually try Hybrid Emulation
For either ao486 or minimig it's the same setup that we need to get right, i.e. understand and connect up the hps-fpga bridge and get it working efficiently. Then figure out the interrupt plumbing to the arm: e.g. a polled avalon slave, or using the fpga2hps interrupts and writing a kernel driver to handle them.
How much throughput do we need on ao486? In some ways minimig is not demanding because the cpu-chip ram interface is so slow on the A500 and A1200 anyway! We might need to push it further on ao486?
Last night I did some tests on the bridge. With a 16-bit bridge I checked 8-bit accesses, misaligned 16-bit accesses and 32-bit accesses (aligned, and misaligned by 1 byte and 2 bytes). All of these 'just work', which is convenient.
As I said earlier, I tried clocking the axi bridge higher and got much better performance in terms of (nop) transactions per second. I used the x4 clock since it was there, though it could be clocked higher. Of course this means we need to add clock domain crossing. Since the clocks are aligned from the same pll, this is just a case of putting a register in to make meeting timing workable (just about anyway...). I found that qsys has a handy component to do the cross-domain sync for me, so I thought I'd save the work by including the avalon mm clock crossing bridge (works using a fifo). However, it didn't work and I don't know why, so it's back to plan A of just adding this myself. As a reminder, it took 10 cycles per transaction at 28MHz and 16 cycles at 4x. 16 cycles at 4x is 4 cycles at 28MHz, so 10 vs 4. Adding a 2 cycle delay will get us to 10 vs 6 (worst case).
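The "10 vs 6" comparison can be written out as a small helper. This is only a sketch of the bookkeeping, assuming the 2 cycle synchroniser delay is counted in 28MHz system cycles as the post implies; the function name is hypothetical:

```c
/* Convert a cycle count measured on the fast bridge clock into cycles of
 * the slower system clock, then add the synchroniser delay (already
 * expressed in system cycles). clock_ratio is fast clock / system clock,
 * so 4 for the x4 clock. */
unsigned sys_cycles_per_access(unsigned fast_cycles, unsigned clock_ratio,
                               unsigned sync_delay_sys_cycles)
{
    /* Round up: a partial system cycle still costs a whole one. */
    unsigned base = (fast_cycles + clock_ratio - 1) / clock_ratio;
    return base + sync_delay_sys_cycles;
}
```

For the measured values, 16 fast cycles at a ratio of 4 give 4 system cycles without the synchroniser and 6 with it, against 10 when the bridge runs directly from the 28MHz clock.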
edit: small note: I put the axi bridge itself on signaltap to see when the axi levels are triggered, to see if stripping that layer is worthwhile. I see maybe 2-3 (axi clock) cycles here. AXI looks much more complex though, so I think it's not worth it.
I also made a start on cross compiling the cpu code so I can try running some software soon.
First things first though, I have to fix the springs on my dishwasher door!
Re: Lets actually try Hybrid Emulation
Status:
i) Musashi cross compiled/wired up in the most basic way!
ii) Interrupts exposed, for now polled every 100 cycles!
iii) Bus speed-up completed
edit iv) Dishwasher door fixed!
So, fingers crossed... here we go!
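The "polled every 100 cycles" item in the status list could be sketched roughly as below. This is an illustration only, not the code from the project: the struct, the function name and the idea that the low 3 bits of a polled avalon slave register hold the 68k interrupt level are all assumptions.

```c
#include <stdint.h>

#define POLL_INTERVAL_CYCLES 100u  /* from the post: poll every 100 cycles */

/* Hypothetical bookkeeping for the polled interrupt scheme. */
typedef struct {
    unsigned since_poll;  /* emulated cycles since the last poll */
    unsigned irq_level;   /* last sampled 68k interrupt level, 0..7 */
} irq_poller;

/* Account for `cycles` emulated CPU cycles. Once the interval has elapsed,
 * sample the (assumed) interrupt register and restart the count.
 * Returns 1 if a poll happened on this call, 0 otherwise. */
int irq_poller_tick(irq_poller *p, unsigned cycles,
                    volatile const uint16_t *irq_reg)
{
    p->since_poll += cycles;
    if (p->since_poll < POLL_INTERVAL_CYCLES)
        return 0;
    p->since_poll = 0;
    p->irq_level = *irq_reg & 0x7u;  /* keep only the 3-bit level */
    return 1;
}
```

The emulator loop would call this after each execution slice and pass any changed level on to the CPU emulator. Real fpga2hps interrupts would remove the polling latency, at the cost of a kernel driver.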
Re: Lets actually try Hybrid Emulation
Good luck
Re: Lets actually try Hybrid Emulation
Code:
bool works = false;
while(!works)
{
    fix();
    works = test();
}
Re: Lets actually try Hybrid Emulation
"Hopefully this will now not work less than it didn't work before!"
Re: Lets actually try Hybrid Emulation
DiagROM is now running...
I was just running into endian issues, which I fixed fairly quickly. The HPS-FPGA bridge is little endian, so I need to do conversion on both sides. I missed one of the places on the hardware side - there are two writedata's for some reason - which confused me quite a lot until I found it.
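On the arm side, the conversion might look something like the sketch below. It assumes the raw bytes of 68k memory appear in address order through the bridge, so a little-endian 16-bit load sees each big-endian word byte-swapped. The helper names are made up, and where the swap actually belongs depends on what the hardware side already does.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit word (big-endian <-> little-endian). */
uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Read a big-endian 68k word through a little-endian 16-bit window.
 * byte_addr is the 68k byte address; the window is indexed in words. */
uint16_t read_be16(const volatile uint16_t *base, uint32_t byte_addr)
{
    return swap16(base[byte_addr / 2]);
}

/* Write a big-endian 68k word through the same window. */
void write_be16(volatile uint16_t *base, uint32_t byte_addr, uint16_t value)
{
    base[byte_addr / 2] = swap16(value);
}
```

For example, the first Kickstart bytes 11 14 would be loaded as 0x1411 by a little-endian host, and swap16 turns that back into the 68k word 0x1114.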
Re: Lets actually try Hybrid Emulation
foft wrote: ↑Mon Apr 12, 2021 5:52 pm
Getting closer... Think I just fixed one last nasty bug and, fingers crossed, it'll work this time!
Code:
bool works = false; while(!works) { fix(); works = test(); }
I can't help but notice your test function is undefined...
Re: Lets actually try Hybrid Emulation
Ah, that's the problem. I thought it was the lack of a fix function all along!
I've got something weird still with the chipmem accesses. Sometimes it responds much more slowly and sometimes not at all.
Re: Lets actually try Hybrid Emulation
I added fast ram. Oddly I'm getting spurious memory failures when using it. This is straight from the arm, so no fpga side involved (unless the fpga-hps bridge is writing here?).
I'm mmapping 0x20000000-0x20000000+384MB. That is the correct area reserved in the DDR for fast ram right?
edit: changed to 0x10000000 and it seems happier - be good to know the 'correct' address though.
Re: Lets actually try Hybrid Emulation
Code:
#define MISTER_SCALER_BASEADDR 0x20000000
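A candidate fast RAM window can be checked against that base with a simple interval test. This is a sketch: the function name is made up, and because the thread does not give the scaler buffer's size, a 1-byte length is used as a minimal placeholder.

```c
#include <stdint.h>

#define MISTER_SCALER_BASEADDR 0x20000000u
#define MIB (1024u * 1024u)

/* True if [a, a+a_len) and [b, b+b_len) share at least one byte. */
int ranges_overlap(uint64_t a, uint64_t a_len, uint64_t b, uint64_t b_len)
{
    return a < b + b_len && b < a + a_len;
}
```

A 384MB window at 0x20000000 obviously collides. A 384MB window starting at 0x10000000 ends at 0x28000000, so it would also reach the scaler base if the size is unchanged. Only the first 256MB (0x10000000 to 0x20000000) stay clear of it under this check.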
Re: Lets actually try Hybrid Emulation
Haha, that'd do it! Do you know where the fast ram should go?
I can get to the boot screen and start loading some stuff now. Though it's rather crashy! Probably this chip ram timing issue I need to dig into.
Re: Lets actually try Hybrid Emulation
OK, perhaps another day
Been getting confused by a few things. For example the 68k bus dtack_n is supposed to be asserted when the data is ready for the cpu to read. However I see in the logic that dtack triggers the sdram controller to START a read. So then there is MUCH later chip_ready, way after the data is actually ready. Anyway this looks like it's because minimig.v was written to use sram, which can get a result on the same cycle, and here we are using sdram. However the timing was never adjusted to wait for the sdram. I'm not sure how fx68k works though, since it uses dtack_n and ... works.
Anyway I'm starting to agree with @Sorgelig that I should go with a separate core for the hybrid. Then I can strip out a lot of the extra complexity that will be taken care of on the arm side: cache, fastram, etc. Then it becomes just an instance of minimig.v (modified for sdram rather than sram) and the hps-fpga interface logic. Anyway, will sleep on it and think what to do tomorrow!
Re: Lets actually try Hybrid Emulation
foft wrote: ↑Tue Apr 13, 2021 8:41 pm
Been getting confused by a few things. For example the 68k bus dtack_n is supposed to be asserted when the data is ready for the cpu to read. However I see in the logic that dtack triggers the sdram controller to START a read. So then there is MUCH later chip_ready, way after the data is actually ready.
What might be confusing you is that there are two different paths into Chip RAM. I'm not as familiar with MiSTer's Minimig core as I am the MiST/TC64 variant, so I'm not sure whether it's the D-Cache or AGA settings (or both) which determine which path is taken, but one roughly equates to A500 speed, the other to A1200 speed.
When using the slower path, the CPU sends requests to the custom chips, to the kickstart and to chip RAM all via Minimig.v, which in turn sends the access to the SDRAM controller's "chip" port. In this mode, the TG68's cycle is terminated by the chipready signal which the CPU wrapper generates from DTACK.
FX68 has a more traditional 68k interface so uses DTACK directly.
In turbo mode the CPU bypasses minimig.v and accesses the SDRAM directly for Chip RAM and Kickstart. These cycles are terminated by the ramready signal instead.
foft wrote:
Anyway this looks like its because minimig.v was written to use sram, which can get a result on the same cycle and here we are using sdram. However the timing was never adjusted to wait for the sdram. I'm not sure how fx68k works though since it uses dtack_n and ... works.
Minimig.v is designed to operate on the Amiga's original four 7MHz ticks per cycle, and the SDRAM controller has a fixed cycle, completing one round per 7MHz tick with deterministic timing, so it's guaranteed to respond on schedule.