Lets actually try Hybrid Emulation
Re: Lets actually try Hybrid Emulation
I make something and it works for everyone but me!
I tried the same drive image with 68040 on FS-UAE. It runs fine, no crashes. So it does seem to be core related, or related to my DE10.
Re: Lets actually try Hybrid Emulation
Here is a new 68000.zip for anyone interested.
The script is the same, but I've updated MiSTer to include the latest changes, which include a reorganization of the Minimig OSD.
Does whdload work for you?
Re: Lets actually try Hybrid Emulation
It's a bit big to share, so I'll cut it down first.
It's really just a vanilla 3.1.4 rom and 3.1.4.1 workbench install. The only changes were that I copied 68040old.library into libs:68040.library and have now installed Peter K's icon_68020.library as libs:icon.library.
- Caldor
Re: Lets actually try Hybrid Emulation
Maybe someone should share one of their Workbench HDF files so we have a comparison here? Probably not on this forum directly, but testing whether this is a hardware issue will require that we have exactly the same settings and software setups.
Re: Lets actually try Hybrid Emulation
This is the image I'm using. It is basically vanilla except:
i) mister shared filesystem (mount share:)
ii) Peter K's icon library from Aminet
iii) 68040.library from the phase V Aminet link posted (68040old.library as 68040.library)
http://www.64kib.com/HDDSmallTestOnly.hdf.gz
It's just a 10MB image, compressed down to a couple of MB. I'm running minimig in AGA mode with 2MB chip and 384MB fast. I'm using the 3.1.4 A1200 rom.
Definitely crashes with qemu when I double click icons. I double clicked WB2: then closed that and double clicked RAM: -> boom. Though it varies how many double clicks I can do, sometimes first time, sometimes 10-20.
Re: Lets actually try Hybrid Emulation
Finally some progress on the icon double click crash issue.
Seems to be due to something like this:
i) IRQ processing starts
ii) cpu writes to disable irqs, immediately after I poll the irq line
iii) cpu writes to enable irqs, immediately after I poll the irq line
iv) RTE
Usually at step (ii) I read IPL2:0 of 15 -> i.e. no IRQ
Sometimes at step (iii) I read IPL2:0 of 12 -> this is when it blows up
I wonder if this is related to immediately polling the irq status and perhaps the minimig logic has not yet cancelled it due to some timing details.
I made a change:
i) ioctl irq -> always handle
ii) hardware write to irq regs -> only handle disable
Which leads to no crashes...
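The two-rule change above can be sketched in C roughly as follows. This is only an illustration of the policy as described, not the actual qemu patch: the enum, the function name and the boolean "pending" model are all hypothetical, and the real code works on the IPL value rather than a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names: what caused us to re-check the irq line. */
enum irq_poll_source {
    POLL_FROM_IOCTL,      /* the kernel module reported an irq via ioctl */
    POLL_AFTER_REG_WRITE  /* the cpu just wrote to the irq registers */
};

/*
 * Decide which irq state the emulated cpu should see next.
 *   current_pending: state the cpu model currently holds
 *   polled_pending:  state just read back from the irq line
 */
static bool next_irq_pending(enum irq_poll_source src,
                             bool current_pending, bool polled_pending)
{
    assert(src == POLL_FROM_IOCTL || src == POLL_AFTER_REG_WRITE);

    if (src == POLL_FROM_IOCTL) {
        return polled_pending;  /* rule (i): always handle */
    }
    /* Rule (ii): right after a register write the line may not have
       settled yet, so the poll may clear an irq but never raise one. */
    return current_pending && polled_pending;
}
```

With this rule a stale "irq active" reading taken immediately after the enable write in step (iii) is ignored, and the irq is only raised once the ioctl path reports it.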
Re: Lets actually try Hybrid Emulation
Onto the next question... performance.
In musashi this program takes 1 minute 50 seconds.
In qemu 'mister' this program takes > 6 minutes (I gave up timing it, but I checked that it did finish, probably within about 10 mins)
From linux m68k qemu it takes <2 seconds.
/media/fat# ./run_m68k ./dhrystone_m68k -l 1
duration: 0 seconds
number of threads: 1
number of loops: 1000000
delay between starting threads: 0 seconds
Dhrystone(1.1) time for 1000000 passes = 1.8
This machine benchmarks at 550149 dhrystones/second
313 DMIPS
Total dhrystone run time: 1.854414 seconds.
...
So why?
I thought perhaps it's spending a lot of time accessing chip memory instead of fast memory, so I instrumented musashi to show the % fast/chip: >99% of accesses are to fast ram in the loop of running this.
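For anyone wanting to repeat the measurement, a minimal sketch of this kind of instrumentation is below. It is not the actual musashi change: the struct and function names are made up, and it assumes chip ram is the first 2MB of the address space (matching the 2MB chip setting mentioned earlier). Everything else, including ROM and IO, is lumped into "other", so a real version would want more buckets.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2MB of chip ram mapped from address 0. */
#define CHIP_RAM_BYTES 0x200000u

/* Hypothetical counters, bumped from the emulator's memory access hooks. */
typedef struct {
    uint64_t chip;   /* accesses that hit chip ram */
    uint64_t other;  /* everything else (fast ram, rom, io, ...) */
} access_stats;

static void count_access(access_stats *s, uint32_t addr)
{
    assert(s);
    if (addr < CHIP_RAM_BYTES) {
        s->chip++;
    } else {
        s->other++;
    }
}

/* Percentage of counted accesses that went to chip ram. */
static double chip_percent(const access_stats *s)
{
    assert(s);
    uint64_t total = s->chip + s->other;
    return total ? 100.0 * (double)s->chip / (double)total : 0.0;
}
```

Calling count_access() from each read/write callback and printing chip_percent() at the end of the run gives the split.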
Re: Lets actually try Hybrid Emulation
I think I might have asked this before, but what about ROM functions? Would your calcs show accesses in the E0/F8 ranges as Fast or Chip?
Re: Lets actually try Hybrid Emulation
I copied rom into fast ram, so that doesn't go over the bridge.
But 0xe0 isn't rom is it? ... checks memory map, hmmm perhaps I should put that first 512k here too.
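int rom_bytes = 1*1024*1024;
/* Map the 1MB at 0xf00000 through the hps-fpga bridge. */
void * rom_addr_orig = mmap(NULL,rom_bytes,(PROT_READ|PROT_WRITE),MAP_SHARED,fdcached,hpsbridgeaddr+0xf00000);
/* Copy it into ordinary malloc'ed memory so rom reads do not cross the bridge. */
void * rom_addr_fast = malloc(rom_bytes);
for (int i=0;i!=rom_bytes;i+=4)
{
    /* Cast to char * first: arithmetic on void * is only a GNU extension. */
    *((unsigned int *)((char *)rom_addr_fast+i)) = *((unsigned int *)((char *)rom_addr_orig+i));
}
...
/* Expose the copy to the guest as read-only ram at 0xf00000. */
memory_region_init_ram_ptr(rom, NULL, "mister_minimig.rom", rom_bytes, rom_addr_fast);
rom->readonly = true;
memory_region_add_subregion(address_space_mem, 0xf00000, rom);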
Re: Lets actually try Hybrid Emulation
Not usually, but on systems with 1 Meg ROMs it can be. (The CD32's extended ROM lives there, for instance.)
Anyhow, if you're copying the ROM to Fast then it's not that.
Is the dhrystone_m68k prebuilt, or are you compiling it? If the latter, which compiler?
Re: Lets actually try Hybrid Emulation
The compiler is different.
Latest m68k gcc for Debian and this one for Amiga:
https://github.com/AmigaPorts/m68k-amigaos-gcc
I’ll have a look at the code they produce.
Re: Lets actually try Hybrid Emulation
foft wrote: ↑Sat May 15, 2021 9:37 am
The compiler is different.
Latest m68k gcc for Debian and this one for Amiga:
https://github.com/AmigaPorts/m68k-amigaos-gcc
I’ll have a look at the code they produce.
Would it be worthwhile trying to compile with SAS/C?
Re: Lets actually try Hybrid Emulation
Here is the updated qemu that changes the irq handling, making it more stable:
http://www.64kib.com/qemu_system_testv9.tar.xz
As a reminder, here is the kernel module that you need to set up on boot:
http://www.64kib.com/minimig_irq_core.tar.gz
Hopefully v10 will be stable and faster...
Re: Lets actually try Hybrid Emulation
So I run this simple code:
00000000 <_start>:
0: 4e56 fffc linkw %fp,#-4
4: 42ae fffc clrl %fp@(-4)
8: 6004 bras e <_start+0xe>
a: 52ae fffc addql #1,%fp@(-4)
e: 0cae 00ff ffff cmpil #16777215,%fp@(-4)
14: fffc
16: 66f2 bnes a <_start+0xa>
18: 4e71 nop
1a: 4e71 nop
1c: 4e5e unlk %fp
1e: 4e75 rts
In C:
for (int i=0;i!=0xffffff;++i)
{
if ((i&0xffff)==0)
{
}
}
On entry the regs are this:
D0 = ffffffff A0 = 8006afa8 F0 = 7fff ffffffffffffffff ( nan)
D1 = 8006afa8 A1 = 00000000 F1 = 7fff ffffffffffffffff ( nan)
D2 = 8006afc9 A2 = 00000000 F2 = 7fff ffffffffffffffff ( nan)
D3 = 00000000 A3 = 00000000 F3 = 7fff ffffffffffffffff ( nan)
D4 = 00000000 A4 = 00000000 F4 = 7fff ffffffffffffffff ( nan)
D5 = 00000000 A5 = 00000000 F5 = 7fff ffffffffffffffff ( nan)
D6 = 00000000 A6 = 40800c5c F6 = 7fff ffffffffffffffff ( nan)
D7 = 00000000 A7 = 40800c44 F7 = 7fff ffffffffffffffff ( nan)
& the code is placed at 0x401ae964 (i.e. malloc-ed ram, no fpga involved)
Note that the frame pointer (A6) and stack pointer (A7) are also in fast ram.
...
I have a binary that loads the same code to a malloc'ed array then executes it:
User mode: 0.6 seconds
my 'Mister' machine (from newcli): 5 mins, 15 seconds
official 'Virtual 68k' machine: 1.3 seconds
The only reasons I can think of for that are:
i) The qemu machine has some throttling settings and I need to tell it to go flat out?
ii) It accesses the chip memory or io area all the time, despite the code not telling it to. MMU tables, or something like that?
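For comparing the three environments above, a small native harness along these lines can be used. It is only a sketch: the function names are made up, it relies on C11 timespec_get() (which may not exist in an Amiga toolchain), and the counter is volatile so that it stays in memory on every iteration, as it does in the disassembly.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Same shape as the test loop: count up to limit with the counter
   kept in memory (volatile) rather than in a register. */
static uint32_t run_loop(uint32_t limit)
{
    volatile uint32_t i = 0;
    while (i != limit) {
        ++i;
    }
    return i;
}

/* Wall-clock seconds taken by run_loop(limit). */
static double time_loop_seconds(uint32_t limit)
{
    struct timespec t0, t1;
    int ok0 = timespec_get(&t0, TIME_UTC);  /* returns TIME_UTC on success */
    uint32_t n = run_loop(limit);
    int ok1 = timespec_get(&t1, TIME_UTC);

    assert(ok0 == TIME_UTC && ok1 == TIME_UTC);
    assert(n == limit);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_loop_seconds(0xffffff) runs the same number of iterations as the test code.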
Re: Lets actually try Hybrid Emulation
What happens if you surround the test program with a Disable() / Enable() pair?
Re: Lets actually try Hybrid Emulation
robinsonb5 wrote: ↑Sun May 16, 2021 5:47 pm
What happens if you surround the test program with a Disable() / Enable() pair?
That didn't seem to change it.
Though, something interesting: I ran it several times and it sometimes went at full speed! (To be clear, that was with the unchanged build, where I hadn't added Disable/Enable.)
Re: Lets actually try Hybrid Emulation
It normally seems to say something like this:
(qemu) info profile
async time 814716981 (0.815)
qemu time 754223562 (0.754)
When I run the test app I see this:
(qemu) info profile
async time 34714891552 (34.715)
qemu time 0 (0.000)
Re: Lets actually try Hybrid Emulation
After some red herrings with icount etc... It seems to be a single chain of translation blocks. Which is what I'd expect. So I guess one of them accesses the hardware area, otherwise I really don't understand.
Time to signaltap...
Re: Lets actually try Hybrid Emulation
So, found out a few more things...
i) The irq implementation was still incorrect
ii) The slow code is running completely locally in fast ram, with no irqs and no hps-fpga bridge access.
The problem with the irqs was an off-by-one error, plus my not understanding edge triggered irqs properly.
I thought 'edge triggered' meant that I'd get an irq on any edge, so I had just wired up the irq lines directly, thinking I'd get an irq whenever they changed. I've now changed this to an xor of the old/new irq flags, or'ed together to give a single irq on any change.
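The any-change detection described above lives in the FPGA logic, but the idea can be sketched in C. The names here are hypothetical, and the three-line mask is an assumption made for illustration (one bit per IPL line).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: three irq level lines, one bit each. */
#define IRQ_LINE_MASK 0x7u

/* Hypothetical detector state: the previously sampled line values. */
typedef struct {
    uint8_t prev;
} irq_change_detector;

/* Returns true when any masked line differs from the previous sample,
   i.e. a single "irq" pulse on any rising or falling edge. */
static bool irq_lines_changed(irq_change_detector *d, uint8_t sample)
{
    assert(d);
    uint8_t cur = (uint8_t)(sample & IRQ_LINE_MASK);
    uint8_t diff = (uint8_t)(d->prev ^ cur);  /* per-line xor of old/new */
    d->prev = cur;
    return diff != 0;                         /* or the bits together */
}
```

Feeding it one sample per clock gives one pulse for each change of the lines in either direction, which is what an interrupt input that only reacts to one edge polarity needs to see.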
The off-by-one error was a mistake in the .dts file. I'm actually pretty shocked it worked at all and passed all the diagrom tests like this. Anyway, it's fixed now.
For the 'slow loop' code I now know it's all in one tb (translation block) chain. I have the 68k code and the arm code logged. While it's running nothing further is logged, since it's all in the (previously logged) dynamically compiled arm code. While it was running I had signal tap up to check for irqs and any hps avalon slave access: no access, no irqs (since I call Disable/Enable now). So, the next step is trying to run this block of arm machine code to figure out why it doesn't work.