All Posts (blitter)




#1
Re: gccVB optimization options and assembly code
Posted on: Yesterday 22:57
Nintendoid!
Joined 2007/12/14
111 Posts
Coder, Long Time User (6 Years), App Coder
Quote:

cr1901 wrote:
I HATE to be the one to bring this up, but perhaps it's time that some of us take a look at GCC internals to see what's going wrong? I'm taking a bit of a break from VB coding (call it "guilt that I'm letting my other code rot") anyway, and I probably could take a look if I had some code that is known to generate bad jumps.


If somebody wanted to take a look at the GCC 4 patches, I know of a few spots that look suspicious:

- The line 'return "movhi hi(%1),%.,%0\n\tmovea lo(%1),%0,%0";' in output_move_single, around line 2255 of gcc-4.4.2-vb.patch. 32-bit loads are always encoded this way, even if the high word doesn't change between consecutive loads. The same line shows up elsewhere in that function to handle other loads of this kind.

- The line 'sprintf (buff, "mov r31,r10\n\tmovhi hi(%s), r0, r11\n\tmovea lo(%s), r11, r11\n\tjal .+4\n\tadd 4, r31\n\tjmp r11", name, name);' in construct_save_jarl, around line 3797, also in gcc-4.4.2-vb.patch. This isn't bogus code (it works), but I don't think we need to be doing long jumps this way: jal takes a signed 26-bit displacement, which reaches roughly ±32 MByte (a 64 MByte span), well more than we need on the VB.

- Prologue and epilogue function generation. Building with -O3 or -Os in gccVB 4.8 (part of dasi's devkitV810 WIP) is totally broken here, generating unnecessary epilogue functions that clobber lp, leading to subroutines that in my testing return to address 0 (the first framebuffer), causing a crash. This can also happen occasionally in gccVB 4.4.2, though I haven't been able to create a minimal example yet.

- Lines 837-849 in binutils-2.20-vb.patch, beginning with "HOWTO (R_V810_9_PCREL,": this might be what's causing the bad jump logic, producing relative jump addresses that are off by multiples of 0x400000, eventually wrapping around to the beginning of the address space and crashing. I'm not sure why the entry for 9-bit branches uses 26 for the bitsize and 'long' for the type. I posted some code that exhibits this bug a while ago (http://www.planetvb.com/modules/newbb ... t_id=26069#forumpost26069) and would be happy if somebody took a good hard look at what's going on. (EDIT: It might just be a linker order problem, but it would be nice to know for sure.)
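
For checking the displacement arithmetic behind the jal and R_V810_9_PCREL points: a signed N-bit field holds byte offsets from -2^(N-1) to 2^(N-1)-1, so a 26-bit displacement reaches about ±32 MByte, while a 9-bit one only covers -256 to +255 bytes (+254 in practice, since V810 instructions sit on even addresses). The C sketch below assumes the displacement is measured from the branch instruction's own address; the function names are made up for illustration and are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a byte displacement fits a signed field `bits` wide, i.e. lies in
 * [-2^(bits-1), 2^(bits-1) - 1]. Alignment is not checked here. */
bool fits_signed(int64_t disp, unsigned bits) {
    assert(bits > 0 && bits < 63);
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return disp >= lo && disp <= hi;
}

/* Displacement a PC-relative branch at `site` needs to reach `target`
 * (assumption: measured from the branch instruction itself). */
int64_t pcrel(uint32_t site, uint32_t target) {
    return (int64_t)target - (int64_t)site;
}
```

With bits = 26 the limits come out to -0x2000000 and 0x1FFFFFF; with bits = 9 they are -256 and 255. A relocation entry that describes the field with the wrong width would let out-of-range values through, or mask them, instead of reporting an overflow.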

:)


#2
Re: gccVB optimization options and assembly code
Posted on: Yesterday 5:54
Some of it, yes; some of it, no. I've thought about this. In fact, I wrote a tool that post-processes my ROM to poke addresses directly into the assembly, saving runtime lookups. But any processing that adds or removes instructions would be non-trivial, since it shifts everything after the change and throws every other instruction that deals with addresses (branch displacements, address loads) completely out of whack.
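
To make the "out of whack" problem concrete: deleting even one instruction moves everything after it, so every PC-relative branch that spans the cut needs a new displacement, and every absolute address (such as a movhi/movea pair) pointing past the cut needs a new value. Below is a minimal C sketch of just the branch half, assuming you already have a list of branch sites and targets; the Branch type and the helper names are hypothetical, not part of any existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one PC-relative branch found in the ROM image. */
typedef struct {
    uint32_t site;   /* address of the branch instruction */
    uint32_t target; /* address the branch jumps to */
} Branch;

/* New location of `addr` after `len` bytes starting at `cut` are deleted.
 * Nothing may sit inside the deleted range itself. */
uint32_t relocate(uint32_t addr, uint32_t cut, uint32_t len) {
    assert(addr < cut || addr >= cut + len);
    return addr >= cut + len ? addr - len : addr;
}

/* Displacement the branch must encode, measured from its own address. */
int64_t displacement(const Branch *b) {
    return (int64_t)b->target - (int64_t)b->site;
}

/* Move every branch site and target past the cut back by `len` bytes.
 * Re-encoding the displacement fields is left out of this sketch. */
void remove_bytes(Branch *branches, size_t n, uint32_t cut, uint32_t len) {
    for (size_t i = 0; i < n; i++) {
        branches[i].site = relocate(branches[i].site, cut, len);
        branches[i].target = relocate(branches[i].target, cut, len);
    }
}
```

For example, deleting 4 bytes at 0x07000020 turns a forward branch from 0x07000010 to 0x07000040 into one whose displacement is 0x2C instead of 0x30, and the instruction bytes have to be rewritten to match.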


#3
Re: Getting started... on Linux?
Posted on: 10/18 6:23
Quote:

Greg Stevens wrote:
I'm pretty sure blitter was referring to the "precompiled version" that DaVince mentioned, which is 2.95. But he can correct me if I am wrong. However, I would like to note that the version with VBDE, as blitter and I have pointed out in other posts, doesn't compile to the most optimal code for whatever reason. Whether it's the patches or something inherent in gccvb 4 is probably still an outstanding question. I set up a Windows VM and installed Cygwin just to compile with the 2.95 version, because the compiled code was roughly 5 times faster than with the newer version. Things like using "inline" on a function, which should cause the compiler to inject the function inline, still produce normal function jump and return code in VBDE. Of course, I don't know enough about gcc to even guess at where to look for that kind of stuff.


By default I don't think gccVB 4 builds with any optimizations (GCC itself defaults to -O0). Building with -O3 or sometimes -Os can generate really performant code. However, as we've mentioned, there are still some outstanding bugs. gccVB 4 can also strip out code that isn't used, resulting in a smaller .text section and therefore more room for other goodies in .data and .rodata.
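
On the inline complaint in the quote: GCC does not inline any functions when not optimizing unless they carry the always_inline attribute, which would explain plain "inline" still producing ordinary call and return code at the default -O0. A minimal sketch of both spellings, assuming the v810 port honors the standard GCC attribute; the clamp functions are hypothetical examples.

```c
#include <assert.h>

/* Plain "inline" is only a hint: at -O0 GCC normally still emits
 * an out-of-line call to it. */
static inline int clamp_plain(int v, int lo, int hi) {
    return v < lo ? lo : (v > hi ? hi : v);
}

/* always_inline asks GCC to inline the body even without -O. */
static inline __attribute__((always_inline))
int clamp_forced(int v, int lo, int hi) {
    assert(lo <= hi);
    return v < lo ? lo : (v > hi ? hi : v);
}
```

Building with -O2, -O3 or -Os lets GCC inline small functions on its own; the attribute is mainly useful for the few hot helpers you want inlined regardless of the build flags.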


#4
Re: Getting started... on Linux?
Posted on: 10/18 6:18
Quote:

DaVince wrote:
I guess you're talking about this? I cloned the Git repo and at least now I understand what libgccVB is for. And that I need v810-gcc, which fails to compile (at least the version included in the gccVB 2.95 source).


I was referring to the compiler suite, but since you mention it: the libgccvb headers I use are based on a really old set. I basically only use them for the equates and constant mappings. Setting up the column table is done in my crt0.s, and I rewrite the remaining functionality to fit whatever project I'm working on.
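
As an illustration of what "equates and constant mappings" look like in practice, here is a small libgccvb-style header fragment. The names follow the HW_REGS[TCR] snippet from the "WRAM access optimizations" thread and the values are read off its -Os listing (movhi 0x200, movea 0x18/0x1C/0x20, ori 0x10); the HW_REGS_BASE macro and the u8 typedef are assumptions for this sketch, not copied from any particular libgccvb release.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8; /* assumed typedef, in the style of libgccvb */

/* Base of the VB hardware (I/O) registers (hypothetical macro name). */
#ifndef HW_REGS_BASE
#define HW_REGS_BASE 0x02000000u
#endif

/* Byte-addressed view of the register block; volatile so every access
 * really reaches the hardware. */
#define HW_REGS ((volatile u8 *)HW_REGS_BASE)

/* Byte offsets of the timer registers within HW_REGS. */
enum { TLR = 0x18, THR = 0x1C, TCR = 0x20 };

/* TCR bit that the listing sets with "ori 0x10". */
#define TIMER_20US 0x10
```

On hardware, a line such as HW_REGS[TCR] |= TIMER_20US; then reads and writes the byte at 0x02000020.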


#5
Re: Getting started... on Linux?
Posted on: 10/17 4:13
I can't speak for that ancient version of GCC since I've never bothered with it, but the executables for gccVB 4 are prefixed with "v810-" so they don't interfere with the system version. In any case, the make_v810.sh script installs everything to /opt/gccvb so it's contained in its own directory, but that's easily changed by editing the script yourself.

There's a bit of an effort to stabilize gccVB 4 (it's functional but has a few outstanding issues), which is probably why it's not in the Tools area yet. If you're feeling adventurous, you can search the forums for the patches and build it yourself...


#6
Re: WRAM access optimizations
Posted on: 10/14 8:18
Took a look tonight at the gcc 4.4.2 patch that's floating out there, and I think I might have an idea of what's causing this: in output_move_single...


return "movhi hi(%1),%.,%0\n\tmovea lo(%1),%0,%0";


That line appears in several places in the function, one for each kind of 32-bit load, and it always emits those two instructions as a couplet. So the compiler never gets a chance to optimize away the extra instruction. It looks to me like either a bug or a deliberate decision not to optimize that case. I'm leaning toward the former, as it's clearly suboptimal code. Anybody with knowledge of GCC have any ideas how to fix it?
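
Why the pair exists in the first place: movea sign-extends its 16-bit immediate, so hi() has to be bumped by one whenever the low half is 0x8000 or more. The C sketch below assumes the v810 assembler's hi()/lo() follow the same convention as the V850 assembler's (construct_save_jarl and output_move_single share their names with functions in GCC's V850 backend); the helper names are made up. The same_hi() check marks the case the fixed couplet can't exploit: two addresses with equal hi() values could share one movhi.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Immediate for movea: the low 16 bits, as the CPU will sign-extend them. */
int32_t lo16(uint32_t addr) {
    int32_t lo = (int32_t)(addr & 0xFFFFu);
    return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* Immediate for movhi: the high 16 bits, plus one when lo16() is negative,
 * so that (hi << 16) + sign_extend(lo) == addr. */
uint16_t hi16(uint32_t addr) {
    return (uint16_t)((addr + 0x8000u) >> 16);
}

/* What "movhi hi, r0, rX; movea lo, rX, rX" leaves in rX. */
uint32_t rebuild(uint16_t hi, int32_t lo) {
    assert(lo >= -32768 && lo <= 32767);
    return ((uint32_t)hi << 16) + (uint32_t)lo;
}

/* Two addresses can be built from one movhi result if their hi16() match. */
bool same_hi(uint32_t a, uint32_t b) {
    return hi16(a) == hi16(b);
}
```

For 0x02000020 this gives hi 0x0200 and lo 0x20, matching the movhi 0x200 / movea 0x20 pair GCC emits. Note that 0x02007FFF and 0x02008000 do not share a hi() value even though their top 16 bits are equal.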


#7
Re: WRAM access optimizations
Posted on: 10/14 5:43
Thanks M.K., that should work for WRAM accesses.

Upon closer inspection I see that this pattern is also applied to other areas of memory. I found a simple example using hardware registers:


movhi 0x200, r0, r10
movea 0x20, r10, r10
ld.b [r10], r11
mov 5, r12
andi 0xFF, r11, r11
ori 0x10, r11, r11
st.b r11, [r10]
movhi 0x200, r0, r11
movea 0x18, r11, r11
st.b r12, [r11]
movhi 0x200, r0, r11
movea 0x1C, r11, r11
st.b r0, [r11]


This is the assembly generated with -Os for:


HW_REGS[TCR] |= TIMER_20US;
HW_REGS[TLR] = 0x05;
HW_REGS[THR] = 0x00;


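The instruction 'movhi 0x200, r0, r11' is executed twice (and a third movhi 0x200 loads r10) even though every address here shares the same high half. r10 already holds 0x02000020 after the first two instructions and is never changed afterward, so the last two stores could simply use -8[r10] and -4[r10]. This is when compiled with -Os, for code size. Is this something that can be worked around (without writing it by hand in asm) or a bug in GCC/v810?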
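
Working the numbers from the listing: once r10 holds 0x02000020 (TCR), TLR at 0x02000018 and THR at 0x0200001C are only -8 and -4 bytes away, well inside the signed 16-bit displacement that ld/st accept. A small C sketch of that reachability check follows; the helper names are hypothetical, and the -32768..32767 range assumes the V810 load/store displacement is a sign-extended 16-bit field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of `addr` from an address already held in a base register. */
int64_t disp_from(uint32_t base, uint32_t addr) {
    return (int64_t)addr - (int64_t)base;
}

/* ld/st displacement field: signed 16 bits. */
bool fits_disp16(int64_t disp) {
    return disp >= INT16_MIN && disp <= INT16_MAX;
}

/* True if every address in `addrs` can be reached as disp16[base]. */
bool all_reachable(uint32_t base, const uint32_t *addrs, size_t n) {
    assert(addrs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!fits_disp16(disp_from(base, addrs[i])))
            return false;
    }
    return true;
}
```

When all_reachable() holds for a group of accesses, one movhi/movea pair is enough for the whole group, which is what the -Os output above fails to take advantage of.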


#8
Re: Linux Support
Posted on: 9/30 4:55
I've built gccVB 4 under OS X, both PPC and Intel, and that, combined with Eclipse, is how I do all my VB development. The FlashBoy software, though, is relegated to a PC with what I believe is a flaky motherboard, so I'll have to find a solution for that one of these days. As far as I know the FlashBoy software is Windows-only, so you'd either have to use WINE/CrossOver or write your own. It seems somebody has figured out the protocol... http://www.planetvb.com/modules/newbb ... ost_id=8666#forumpost8666 ...


#9
Re: Oculus Rift +VB emulator
Posted on: 9/28 18:53
Quote:

HorvatM wrote:
I'm surprised there isn't the same amount of myths/criticism/superstition surrounding it as the VB, which, IMO, is for now simply a better product. Maybe because it's got John Carmack behind it? Gunpei Yokoi apparently wasn't a good enough celebrity.


For one thing, Oculus is making a big deal out of the fact that these are developer kits, and there are warnings and guidelines everywhere about keeping latency down, using the proper projections, crafting the experience to minimize disorientation, etc. They have a whole team at Oculus dedicated to this kind of cognitive research, whereas for the VB, Nintendo just put together documentation basically saying "This is how 3D works on the VB, good luck!"

The default IPD on the Rift is pretty reasonable-- 64mm, which according to statistical research is the military average-- and unlike the VB, the field of view is much *much* larger. That last bit alone is a big part of why the Rift is getting such widespread praise-- no other consumer-focused headset before it has been able to achieve such immersion. I love the VB too, but there's no denying that the Rift provides a much better VR experience. The VB by comparison is just a toy. There really shouldn't be a comparison.


#10
Re: Oculus Rift +VB emulator
Posted on: 9/28 9:41
retronintendonerd, are you a developer? If you're not, getting a DK2 now would be a waste of money, as the new Crescent Bay makes it obsolete in every way (I played with it at OC last weekend). Odds are, given what happened with the DK1->DK2 switch, another SDK refactoring is coming, and by that time the DK2 won't be compatible with anything anymore. I'd hold off until CV1.



