Re: Release DE10 Nano Overclock Kernel BETA
Munt is really awesome at 1.2 GHz, but I'm sure the OC kernel is gonna bring a lot more cool stuff.
rhester72 wrote: ↑Tue Oct 25, 2022 11:41 am
I think it's important to temper expectations here.
a) Cores will NEVER _depend_ on overclocking...if they don't function properly without it, meh.
b) It's a nice little speed bump...but that's what it is. It makes the UI a bit snappier and takes Munt from sort-of-working to more-or-less-serviceable. Overclocking the ARM core is _not_ what I would call a 'game changer', considering it's STILL slower than the _previous_ generation Raspberry Pi.
Having Munt work as well as it does with this is one, though.
Considering that you can't get Raspberry Pis, 3s or otherwise, this isn't as compelling an argument as it would otherwise be.
Yeah I was in for a shock when I was looking into building an MT32-Pi earlier this year. Luckily I had a Pi 3b+ sitting in a box that I had replaced with a 4 for my Octoprint server thinking it was broken. Turned out it wasn't broken. Had similar good luck with an NVMe drive I thought had gotten bent and that I had replaced. Came in handy just when I needed it. It's a good idea to always double check old parts that one thinks went on the fritz, and not just throw them away. If it's small, toss it in a junk box and revisit it with a fresh outlook when it might come in handy.
FoxbatStargazer wrote: ↑Wed Nov 02, 2022 3:02 pm
So if I put the modded kernel in now, will update_all override it even if there hasn't been a main kernel update yet? Getting impatient for it to come down officially...
Wohoo!! Is this coming up officially?
FoxbatStargazer wrote: ↑Wed Nov 02, 2022 3:02 pm
So if I put the modded kernel in now, will update_all override it even if there hasn't been a main kernel update yet? Getting impatient for it to come down officially...
update_all will not override your custom kernel until an official Linux update comes, I believe.
vanfanel wrote: ↑Wed Nov 02, 2022 3:39 pm
FoxbatStargazer wrote: ↑Wed Nov 02, 2022 3:02 pm
So if I put the modded kernel in now, will update_all override it even if there hasn't been a main kernel update yet? Getting impatient for it to come down officially...
Wohoo!! Is this coming up officially?
This will be coming officially at some point but you will still need those scripts (or run the equivalent shell commands).
I will be using this to underclock the ARM and have a cooler system without need of a fan!
With the announcement of the N64 core I'm going to start looking into if I can do anything with the DDR3.
Well that didn't take long. Here are the commands to overclock DDR3. Try at your own risk. These commands must be run sequentially - that is, you must overclock to 900 MHz before setting 950 MHz, as large jumps will cause the system to crash. I could make clock transitions smooth with a kernel driver like I did with the CPUfreq one.
The SDRAM part Terasic uses is rated for DDR3-1066, but they downrate it to DDR3-800 as the Cyclone V SoC FPGA's memory controller is only rated for DDR3-800.
I've verified that there is a performance increase - just using dd to /tmp, but I haven't done extensive stress testing. I'm guessing the FPGA side will see the same benefit as it's all shared.
Code:
# Overclock DDR3 to 900 MHz
devmem 0xFFD040C0 w 0x0000011A
# Overclock DDR3 to 950 MHz
devmem 0xFFD040C0 w 0x0000012A
# Overclock DDR3 to 1000 MHz
devmem 0xFFD040C0 w 0x0000013A
# Overclock DDR3 to 1050 MHz
devmem 0xFFD040C0 w 0x0000014A
# Overclock DDR3 to 1100 MHz - MY MISTER CRASHES HERE
devmem 0xFFD040C0 w 0x0000015A
To set it back to stock: just power cycle the MiSTer or run: devmem 0xFFD040C0 w 0x000000FA
If it crashes, just power cycle MiSTer.
Reference: https://www.intel.com/content/www/us/en ... 20482.html
Bits 3-15 of that register control the DDR3 multiplier.
25 MHz * ((value in bits 3-15) + 1) in that register equals the DDR3 frequency.
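As a rough sketch of that arithmetic in shell: the helper names below are made up for illustration, and the low three bits are assumed to stay at 0b010, which is what every value posted above has in common.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any MiSTer tool.
# Assumes bits 0-2 stay at 0b010, as in every value posted above.

# DDR3 frequency in MHz -> register value
ddr_mhz_to_reg() {
    mult=$(( $1 / 25 - 1 ))                # value for bits 3-15
    printf '0x%08X\n' $(( (mult << 3) | 0x2 ))
}

# Register value -> DDR3 frequency in MHz
ddr_reg_to_mhz() {
    echo $(( 25 * ( (($1 >> 3) & 0x1FFF) + 1 ) ))
}

ddr_mhz_to_reg 1000        # 0x0000013A
ddr_reg_to_mhz 0x000000FA  # 800
```

This only does the math and never touches the register, so it is safe to run anywhere to double check a value before typing it into devmem.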
Very good to see! 1000 MHz could be the sweet spot for that purpose; combined with the 1.2 GHz OC of the ARM, we should have a winner ^^
I'll try the DDR OC when I have some time. You should bring this up to Robert, I'm sure he could find some use for it during the N64 exploration phase.
Two hours and 5min
Mine crashes at 1050 MHz, but is great at 1000.
Thank you!
FYI, for anyone who'd like to see the actual effect, you can use the below shell command over ssh before and after the change. It's 50 cycles of 200MB writes to /tmp (which is a RAM disk), which, to the OP's point, is a quick-and-dirty way to see the roughly 20% throughput difference:
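Code:
TIMES=50; LOW=9999; HIGH=0; TOT=0
for i in $(seq 1 $TIMES); do
    # Flush and drop caches so each run starts from the same state
    sync; echo 3 > /proc/sys/vm/drop_caches
    # Write 200MB to the RAM disk; field 10 of dd's summary line is taken as the throughput figure
    CURR=$(dd if=/dev/zero of=/tmp/testme bs=200MB count=1 2>&1 | grep bytes | awk '{print $10}')
    if [[ $CURR -lt $LOW ]]; then LOW=$CURR; fi
    if [[ $CURR -gt $HIGH ]]; then HIGH=$CURR; fi
    TOT=$((TOT+CURR))
done
rm -f /tmp/testme   # free the RAM the test file was using
echo "LOW=$LOW, HIGH=$HIGH, AVG=$((TOT/TIMES))"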
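To turn the before and after AVG figures from that command into a percentage, a small helper along these lines would do. The function name and the sample numbers in the usage line are made up for illustration; they are not measurements.

```shell
#!/bin/sh
# Sketch only: pct_change is a hypothetical helper, and the numbers in the
# usage line are placeholders, not measurements from this thread.

# Usage: pct_change BEFORE AFTER  (integer percentage, rounded toward zero)
pct_change() {
    echo $(( ($2 - $1) * 100 / $1 ))
}

pct_change 400 480   # prints 20
```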
Managed to test it and it crashed at 1100 MHz, so I went for 1000 MHz to have a bit of margin, and so far so good.
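Stepping up to a chosen target in order, rather than jumping straight to it, can be sketched as below. This is a dry run that only prints the devmem commands posted earlier in the thread, up to the target. The function name is an assumption for illustration, and nothing is written to hardware.

```shell
#!/bin/sh
# Sketch only: prints the devmem commands posted earlier in this thread, in
# order, up to a target frequency. print_ddr_steps is a hypothetical name.
# Nothing is executed; run the printed lines yourself, at your own risk.

print_ddr_steps() {
    target=$1
    # frequency:value pairs exactly as posted earlier in the thread
    for step in 900:0x0000011A 950:0x0000012A 1000:0x0000013A \
                1050:0x0000014A 1100:0x0000015A; do
        mhz=${step%%:*}
        val=${step##*:}
        [ "$mhz" -le "$target" ] || break
        echo "devmem 0xFFD040C0 w $val   # $mhz MHz"
    done
}

print_ddr_steps 1000
```

Keeping it as a printer instead of running devmem directly means a typo in the target cannot crash the system by itself.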
I messaged Robert and he's now aware of it and will keep it in mind as he's developing the N64 core. Actually I came back to this because I heard the big news about the N64 core and that he was using the DDR3.
Apparently the DDR3 has plenty of bandwidth for the N64 core, but latency might be an issue, especially given the shared nature of the memory. Of course running at a higher frequency at the same timings as we're doing here should help with latency.
It would be great if all MiSTers can run at 1000 MHz - it's a straight 25% improvement.
Coolbho3k wrote: ↑Sun Apr 16, 2023 11:24 pm
I messaged Robert and he's now aware of it and will keep it in mind as he's developing the N64 core. Actually I came back to this because I heard the big news about the N64 core and that he was using the DDR3.
Apparently the DDR3 has plenty of bandwidth for the N64 core, but latency might be an issue, especially given the shared nature of the memory. Of course running at a higher frequency at the same timings as we're doing here should help with latency.
It would be great if all MiSTers can run at 1000 MHz - it's a straight 25% improvement.
It's probably more likely to benefit the Saturn core (at least the single ram version of it) and some of the arcade cores coming up like CPS3.
I also found where to adjust timings: https://www.intel.com/content/www/us/en ... 68421.html
I.e., to set the CAS latency (CL) from 7 to 6, the command is devmem 0xFFC25004 w 0x783CCC07
Timings can also be adjusted in u-boot, though: https://github.com/MiSTer-devel/u-boot_ ... m_config.h
There also seem to be some settings to prioritize burst operations from the FPGA rather than the HPS? This could make a difference and is not overclocking.
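For a rough feel of what the CL change and the overclock each do to absolute CAS latency, the arithmetic can be sketched as below. It assumes the MHz figures in this thread are data rates in MT/s, so the actual memory clock is half of that (DDR3-800 runs a 400 MHz clock), and it only looks at CL, ignoring every other timing. The function name is made up.

```shell
#!/bin/sh
# Sketch only. Assumes the thread's "MHz" figures are DDR data rates (MT/s),
# i.e. the memory clock is half that value. cl_ns is a hypothetical helper.

# Usage: cl_ns CL DATA_RATE  -> CAS latency in nanoseconds
cl_ns() {
    awk -v cl="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", cl * 2000 / rate }'
}

cl_ns 7 800    # CL 7 at the stock 800 setting
cl_ns 6 800    # CL 6 at the stock 800 setting
cl_ns 7 1000   # CL 7 at the 1000 overclock
```

Under those assumptions this prints 17.5, 15.0 and 14.0, so on paper the 1000 overclock at unchanged CL shaves a bit more off CAS latency than dropping CL to 6 at stock speed.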
It would be great to have a stable 1000 MHz frequency combined with tighter timings. It would also be helpful if the frequency could be triggered based on the core's needs. For example, if a core performs better with tighter timings rather than higher frequencies, it would be awesome to adjust this based on the core. Anyway, I'm glad to hear you were able to inform Robert about this. Any extra performance found will be welcomed I'm sure. An SDRAM OC could be nice too for dedicated cores built around it as well. Cool stuff!
Poking around some other settings. One of the complaints about DDR3 for cores is that the latency can be inconsistent due to contention with the HPS/Linux side of things. But it seems the priority of memory access can be customized. By default it seems the memory controller prioritizes HPS and FPGA memory access equally. These tweaks are not overclocking and should work on all MiSTers no matter what.
devmem 0xFFC250AC w 0x0003FFFF
sets the FPGA-to-HPS memory ports (https://www.intel.com/content/www/us/en ... 78395.html) to the highest priority, leaving the L3 and CPU ports at the lowest priority.
When I do this, I get slower memory throughput in Linux when the PSX core is running, but about the same in menu. Presumably this makes memory access from FPGA faster.
devmem 0xFFC250AC w 0x3FFC0000
is the inverse of the above, setting the L3 and CPU ports at the highest priority.
This improves the performance of memory throughput in Linux when the PSX core is running!
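The two values above can be rebuilt from per-port priorities, which makes it easier to try mixes in between. This sketch assumes the register holds ten 3-bit priority fields, port 0 in the lowest bits, with ports 0-5 being the FPGA ports and ports 6-9 the L3 and CPU ports. That layout matches the two values posted here, but check it against the Intel page before relying on it. The function name is made up.

```shell
#!/bin/sh
# Sketch only. Assumed layout: ten 3-bit priority fields, port 0 in bits 0-2,
# ports 0-5 = FPGA, ports 6-9 = L3/CPU. pack_priority is a hypothetical name.

# Usage: pack_priority P0 P1 ... P9   (each priority 0-7, 7 = highest)
pack_priority() {
    val=0
    shift_by=0
    for prio in "$@"; do
        val=$(( val | ((prio & 7) << shift_by) ))
        shift_by=$(( shift_by + 3 ))
    done
    printf '0x%08X\n' "$val"
}

pack_priority 7 7 7 7 7 7 0 0 0 0   # FPGA ports highest: 0x0003FFFF
pack_priority 0 0 0 0 0 0 7 7 7 7   # L3/CPU ports highest: 0x3FFC0000
```

The printed value is what would go after `devmem 0xFFC250AC w`, in the same form as the two commands above.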
One more register that may be helpful: https://www.intel.com/content/www/us/en ... 75739.html
This allows you to configure the memory ports to disable "auto-precharge" which improves "highly random accesses" - maybe this could improve latency at the cost of throughput for cores?
devmem 0xFFC2507C w 0x0000FC00
disables auto-precharge.
I can get some of these memory tweaks to have an effect on the PSX core if I use this GPU benchmark: https://github.com/JaCzekanski/ps1-test ... /benchmark if you turn on transparency and increase the number of triangles.
Many of the tweaks I have mentioned actually seem to have a small incremental improvement - like fractions of an FPS on this test, but I think the real improvement might be much better taken advantage of by core developers if they accounted for it.
Is there a test you can run to see if it improves PSX core sound latency? It's sort of crazy these settings have never been played with until now.
You mean the 0.1 ms latency when all 24 channels are in use at the same time? Which occurs almost never? Which can be solved by using the dual SDRAM core?
Robert did some initial tests in the MiSTer Discord with some of these tweaks. Overclocking the RAM did make a difference, the other tweaks not so much, including changing CL. One of the tweaks made the performance worse. One of the other tweaks didn't make a difference in the benchmark - I think it might actually be prioritizing FPGA memory accesses, but according to Intel's docs it may only really affect burst performance.
But Robert was testing with an existing core. Maybe we'll see a bigger difference once he writes the DDR3 test core during the N64 development.
I think there's potential in playing around with combinations of these tweaks once he writes that, as we'll be able to get instant feedback on how effective something ends up being.
Awesome stuff!