
#11
Re: A newer GCC compiler.
Posted on: 2016/3/26 4:24
Nintendoid!
Joined 2007/12/14
166 Posts
CoderLong Time User (11 Years) App Coder
I also don't know if I've mentioned this anywhere else here, but I do *not* know GCC's internals. At all. So, my patches are more hacks or bandaids to work around problems I encounter than anything else. I share them just in case they might help other devs, but please don't accept them as attempts to properly fix any problems (though if I happen to fix anything then AFAIC that's purely a coincidence. :) )
Top

#12
Re: A newer GCC compiler.
Posted on: 2016/3/26 5:24
VB Gamer
Joined 2016/3/13
42 Posts
Long Time User (2 Years)
Quote:

blitter wrote:
Yes, and yes. It has been quite a while but as I recall either r29 was ignored when I specified it in the clobber list or I got some kind of error.

Thank you, that's the kind of information that I can use!

So, if I'm understanding you correctly, you are using GCC's "inline-assembly" to do the string instructions, rather than a separate assembly function. Is that correct?


Quote:
Anything I'm doing from non-inline assembly the compiler should not touch, period, other than to assemble it. But for what it's worth I use -fomit-frame-pointer in my Makefiles. Again, it's been a while so I don't remember the exact problem moving the frame register solved, but it was definitely related to the bitstring instructions.

Thanks, again. If you're using "-fomit-frame-pointer" then the compiler should be using R29 as a general-purpose callee-saved register.

If it doesn't let you "clobber" it in inline assembly, just because it *might* be used as a frame-pointer ... then that's really helpful information.


Quote:
Frame pointers and backtraces in my experience are pretty useless in VB homebrew since source-level debugging is pretty nonexistent above the assembly code level.

Ah ... on the contrary ... IMHO that's exactly when a good backtrace is the most-useful.

If you've got a good source-level debugger with full DWARF information about the process, then it doesn't need a frame-pointer ... it already has all the information from the compiler-emitted debugging-info.

A good "backtrace", complete with actual function names, can be done on the target hardware, without a debugger, if the frame-pointer exists, and if the stack-frame-layout is sensible.

This lets you get the "context" of any error message, and lets you implement sophisticated in-engine memory debugging.
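
To make the idea concrete, here is a minimal sketch in C of walking a frame-pointer chain. The `frame_record` layout and the `walk_frames` name are my own assumptions for illustration; they are not the actual V810 stack-frame layout, which depends on what the compiler's prologue really stores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frame record (an assumption, not the real V810 ABI):
   each frame stores the caller's frame pointer and a return address. */
struct frame_record {
    const struct frame_record *caller_fp; /* NULL marks the outermost frame */
    const void *return_address;           /* where this function returns to */
};

/* Follow the saved frame pointers starting at `fp`, storing up to `max`
   return addresses in `out`. Returns the number of entries written. */
size_t walk_frames(const struct frame_record *fp, const void **out, size_t max)
{
    size_t count = 0;

    assert(out != NULL || max == 0);
    while (fp != NULL && count < max) {
        out[count++] = fp->return_address;
        fp = fp->caller_fp;
    }
    return count;
}
```

On real hardware the starting `fp` would come from the frame-pointer register, and each return address would be looked up in a symbol table to print the function name.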

It really helps to have extra RAM available when these things are enabled ... which is why Nintendo (and everyone else) shipped their "development-kits" with more RAM than the "retail" kits (up until the last generation, when things got more complex).

You can simulate an environment like this in Mednafen just by modifying the amount of memory that the virtual VirtualBoy sees (it's a source-level hack to Mednafen).

It's not useful for "final-testing", but it's a godsend for 90% of development.


Quote:
I do all my VB dev in Mac OS X. Specifically, I build the toolchain in 10.6 with an older version of GCC installed via macports. The build products continue to work in the latest version of OS X El Capitan, plus as a bonus I can build PPC versions too.

That's cool to know. I mainly run Windows on my MacPro, but I think that I may still have a 10.6.8 partition somewhere.


Quote:
I also don't know if I've mentioned this anywhere else here, but I do *not* know GCC's internals. At all. So, my patches are more hacks or bandaids to work around problems I encounter than anything else. I share them just in case they might help other devs, but please don't accept them as attempts to properly fix any problems (though if I happen to fix anything then AFAIC that's purely a coincidence. :) )

No problem ... the point is that you've tried to improve things, and so did M.K. when the GCC 4.4.2 patches were created. That's wonderful!

It took me about 6 months of agony to get the GCC 2.9.5 patches updated to GCC 4.7.4, and that included lots of flailing-around inside complex source code that I barely understood ... and still mostly-don't.
Top

#13
Re: A newer GCC compiler.
Posted on: 2016/3/28 2:33
VB Gamer
Joined 2016/3/13
42 Posts
Long Time User (2 Years)
Here's my proposal for a new stack-frame layout, together with the one that everyone is using now, and the "new" GCC ABI from 2010.

Basically ... the "old" ABI reserved 16-bytes of stack space for storing the first-4 function arguments just-in-case you call a function with variable-arguments.

In the years since that time, "stdarg.h" has replaced "varargs.h", and that space is no longer needed.

So the V850 guys got rid of it in 2010.

I'm proposing adding back 4-bytes to use for storing the Frame Pointer, so that backtraces are possible.

Reordering the output of the "saved" registers should also radically reduce the amount of space used by function-prologues ... which should help speed them up by keeping them in the instruction cache.


Any comments?


*****************************

GCC 1999-ABI V850 STACK FRAME

CALLER
          incoming-arg0
ap->      16-bytes-reserved

CALLEE
          saved-lp
          saved-??
fp->      saved-fp
          local-variables
          outgoing-arg?
          outgoing-arg0
sp->      16-bytes-reserved

*****************************

GCC 2010-ABI V850 STACK FRAME

CALLER
ap->      incoming-arg0

CALLEE
          saved-lp
          saved-??
fp->      saved-fp
          local-variables
          outgoing-arg?
sp->      outgoing-arg0

*****************************

GCC 2016-ABI V810 STACK FRAME

CALLER
          incoming-arg0
ap-> fp-> saved-fp

CALLEE
          saved-lp
          saved-??
          local-variables
          outgoing-arg?
          outgoing-arg0
sp->      4-bytes-reserved

*****************************
Edited by ElmerPCFX on 2016/3/28 2:43
Top

#14
Re: A newer GCC compiler.
Posted on: 2016/3/31 1:34
VB Gamer
Joined 2016/3/13
42 Posts
Long Time User (2 Years)
I took a quick look at the libgccvb source code, and was surprised to see so many uses of "u8" and "u16" in the code.

The V810 CPU was designed to handle 32-bit variables ... and it doesn't do any arithmetic operations on 16-bit or 8-bit values.

That means that the compiler needs to do a lot of masking/sign-extending when it's asked to deal with 16-bit or 8-bit variables, just so that it keeps the results correct within the limits of 16-bit or 8-bit rounding.

You really should be using "int" and "unsigned" as much as possible, and avoid "short" and "char" variables.
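
As a rough illustration of where the extra masking comes from, here is a small C sketch. The function names are made up for this example, and "u16" is assumed to be a plain 16-bit unsigned typedef like the one in libgccvb. Incrementing a 16-bit counter in a 32-bit register has to behave like the masked version, which is what an "andi 65535" does; a full-width counter needs no such step.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t u16; /* assumed to match libgccvb's "u16" */

/* Increment with 16-bit semantics: the value is promoted, incremented,
   then truncated back to 16 bits when stored in the u16. */
uint32_t next_u16(uint32_t i)
{
    u16 narrow = (u16)i;
    narrow++;
    return narrow;
}

/* The same result written the way a 32-bit-only CPU has to compute it:
   a full-width add followed by an explicit mask. */
uint32_t next_u16_masked(uint32_t i)
{
    uint32_t next = (i + 1u) & 0xFFFFu;

    assert(next <= 0xFFFFu);
    return next;
}

/* A full-width counter: just the add, no mask needed. */
uint32_t next_unsigned(uint32_t i)
{
    return i + 1u;
}
```

The two 16-bit versions wrap from 65535 back to 0, while the full-width version simply carries on to 65536.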

I thought that it would be interesting to see how the different GCC compiler versions compile a couple of simple C functions.

In each case, the original libgccvb version is first, and then 1 or 2 versions replacing the "u16" and "u8" variables with "unsigned" instead.

It seems strange to me that GCC 4.4.2 is doing such a relatively-poor job compared to GCC 2.9.5 or GCC 4.7.4, I wonder what went wrong?

All examples are compiled with "-O2 -fomit-frame-pointer".



****************************************************************************************
****************************************************************************************

void copymem (u8 *dest, const u8 *src, u16 num)
{
  u16 i;
  for (i = 0; i < num; i++) {
    *dest++ = *src++;
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_copymem: andi 65535,r8,r8    _copymem: andi 65535,r8,r8    _copymem: andi 65535,r8,r8
          be .L1                        mov 0,r10                     be .L4
          addi -1,r8,r11                cmp r8,r10                    mov 0,r10
          andi 65535,r11,r11            bnl .L4             .L3:      mov r7,r11
          add 1,r11           .L6:      add 1,r10                     add r10,r11
          add r6,r11                    ld.b 0[r7],r11                ld.b 0[r11],r12
.L3:      ld.b 0[r7],r10                andi 65535,r10,r10            mov r6,r11
          add 1,r7                      add 1,r7                      add r10,r11
          st.b r10,0[r6]                st.b r11,0[r6]                add 1,r10
          add 1,r6                      add 1,r6                      st.b r12,0[r11]
          cmp r11,r6                    cmp r8,r10                    andi 65535,r10,r11
          bne .L3                       bl .L6                        cmp r11,r8
.L1:      jmp [r31]           .L4:      jmp [r31]                     bh .L3
                                                            .L4:      jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void copymem2 (u8 *dest, const u8 *src, unsigned num)
{
  unsigned i;
  for (i = 0; i < num; i++) {
    *dest++ = *src++;
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_copymem2:mov r6,r11          _copymem2:mov 0,r11           _copymem2:cmp r0,r8
          add r8,r11                    cmp r8,r11                    be .L10
          cmp 0,r8                      bnl .L10                      mov 0,r10
          be .L7              .L12:     ld.b 0[r7],r10      .L9:      mov r7,r11
.L11:     ld.b 0[r7],r10                add 1,r11                     add r10,r11
          add 1,r7                      add 1,r7                      ld.b 0[r11],r12
          st.b r10,0[r6]                st.b r10,0[r6]                mov r6,r11
          add 1,r6                      add 1,r6                      add r10,r11
          cmp r11,r6                    cmp r8,r11                    st.b r12,0[r11]
          bne .L11                      bl .L12                       add 1,r10
.L7:      jmp [r31]           .L10:     jmp [r31]                     cmp r10,r8
                                                                      bh .L9
                                                            .L10:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void addmem (u8 *dest, const u8 *src, u16 num, u8 offset)
{
  u16 i;
  for (i = 0; i < num; i++) {
    *dest++ = (*src++ + offset);
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_addmem:  andi 65535,r8,r8    _addmem:  andi 65535,r8,r8    _addmem:  andi 65535,r8,r8
          andi 255,r9,r9                mov 0,r11                     andi 255,r9,r9
          cmp 0,r8                      andi 255,r9,r9                cmp r0,r8
          be .L13                       cmp r8,r11                    be .L20
          addi -1,r8,r11                bnl .L22                      mov 0,r10
          andi 65535,r11,r11  .L24:     mov r9,r10          .L19:     mov r7,r11
          add 1,r11                     add 1,r11                     add r10,r11
          add r6,r11                    ld.b 0[r7],r12                ld.b 0[r11],r12
.L15:     ld.b 0[r7],r10                andi 65535,r11,r11            mov r6,r11
          add 1,r7                      add r12,r10                   add r10,r11
          add r9,r10                    add 1,r7                      add r9,r12
          st.b r10,0[r6]                st.b r10,0[r6]                add 1,r10
          add 1,r6                      add 1,r6                      st.b r12,0[r11]
          cmp r11,r6                    cmp r8,r11                    andi 65535,r10,r11
          bne .L15                      bl .L24                       cmp r11,r8
.L13:     jmp [r31]           .L22:     jmp [r31]                     bh .L19
                                                            .L20:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void addmem2 (u8 *dest, const u8 *src, unsigned num, u8 offset)
{
  unsigned i;
  for (i = 0; i < num; i++) {
    *dest++ = (*src++ + offset);
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_addmem2: mov r6,r11          _addmem2: mov 0,r12           _addmem2: andi 255,r9,r9
          andi 255,r9,r9                andi 255,r9,r9                cmp r0,r8
          add r8,r11                    cmp r8,r12                    be .L20
          cmp 0,r8                      bnl .L22                      mov 0,r10
          be .L18             .L24:     mov r9,r10          .L19:     mov r7,r11
.L22:     ld.b 0[r7],r10                ld.b 0[r7],r11                add r10,r11
          add 1,r7                      add 1,r12                     ld.b 0[r11],r12
          add r9,r10                    add r11,r10                   mov r6,r11
          st.b r10,0[r6]                add 1,r7                      add r10,r11
          add 1,r6                      st.b r10,0[r6]                add r9,r12
          cmp r11,r6                    add 1,r6                      st.b r12,0[r11]
          bne .L22                      cmp r8,r12                    add 1,r10
.L18:     jmp [r31]                     bl .L24                       cmp r10,r8
                              .L22:     jmp [r31]                     bh .L19
                                                            .L20:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************

void addmem3 (u8 *dest, const u8 *src, unsigned num, unsigned offset)
{
  unsigned i;
  for (i = 0; i < num; i++) {
    *dest++ = (*src++ + offset);
  }
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_addmem3: cmp 0,r8            _addmem3: mov 0,r12           _addmem3: cmp r0,r8
          be .L24                       cmp r8,r12                    be .L25
          andi 255,r9,r9                bnl .L28                      andi 255,r9,r9
          add r6,r8           .L30:     mov r9,r10                    mov 0,r10
.L26:     ld.b 0[r7],r10                ld.b 0[r7],r11      .L24:     mov r7,r11
          add 1,r7                      add 1,r12                     add r10,r11
          add r9,r10                    add r11,r10                   ld.b 0[r11],r12
          st.b r10,0[r6]                add 1,r7                      mov r6,r11
          add 1,r6                      st.b r10,0[r6]                add r10,r11
          cmp r8,r6                     add 1,r6                      add r9,r12
          bne .L26                      cmp r8,r12                    st.b r12,0[r11]
.L24:     jmp [r31]                     bl .L30                       add 1,r10
                              .L28:     jmp [r31]                     cmp r10,r8
                                                                      bh .L24
                                                            .L25:     jmp [r31]

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********


****************************************************************************************
****************************************************************************************
Top

#15
Re: A newer GCC compiler.
Posted on: 2016/3/31 5:00
Nintendoid!
Joined 2007/12/14
166 Posts
CoderLong Time User (11 Years) App Coder
Quote:

ElmerPCFX wrote:
I took a quick look at the libgccvb source code, and was surprised to see so many uses of "u8" and "u16" in the code.

The V810 CPU was designed to handle 32-bit variables ... and it doesn't do any arithmetic operations on 16-bit or 8-bit values.

That means that the compiler needs to do a lot of masking/sign-extending when it's asked to deal with 16-bit or 8-bit variables, just so that it keeps the results correct within the limits of 16-bit or 8-bit rounding.

You really should be using "int" and "unsigned" as much as possible, and avoid "short" and "char" variables.


According to David Tucker's unofficial Virtual Boy specification:

Quote:
The external data buss [sic] supports both a 32-bit data mode and a 16-bit mode, but the VB only utilizes the 16-bit mode.


Now, I don't know where he got that info, since I can't find mention in the official Nintendo docs of the width of the data bus at all, but in the symposium PDFs there is sample code that copies data in memory using short* and char* pointers, so that suggests to me that the VB uses the V810's 16-bit bus mode. 32-bit pointers are not used in Nintendo's sample code. So, while arithmetic operations probably should operate on 32-bit values for best performance, is it efficient to load 32-bit values from RAM/ROM on a 16-bit wide data bus (assuming this is how the VB is configured)?
Top

#16
Re: A newer GCC compiler.
Posted on: 2016/3/31 18:46
VB Gamer
Joined 2016/3/13
42 Posts
Long Time User (2 Years)
Quote:

blitter wrote:

So, while arithmetic operations probably should operate on 32-bit values for best performance, is it efficient to load 32-bit values from RAM/ROM on a 16-bit wide data bus (assuming this is how the VB is configured)?


OK, I found a copy of the SDK (which is just the docs) online and confirmed that the VB is using a 16-bit data bus.

Ouch! Nintendo really wanted to make things difficult for their developers, didn't they?

That has a huge effect on everything ... most particularly the importance of running code from the instruction-cache as much as possible.

The compiler doesn't really seem to understand that "ld.*" is automatically sign-extending a 16-bit/8-bit read from memory.

The compiler doesn't know that it can use "in.*" on the VirtualBoy to zero-extend reads from memory (that trick won't work on the PC-FX).

That means that any code that does arithmetic on 16-bit/8-bit values is usually going to generate one or more extra instructions to sign-extend/mask the values when it reads them.

That is 4-bytes of code that are going to take 1 or 2 cycles to execute, and require 2 memory reads, usually from ROM, and potentially with 2 wait-states per read.

That is going to be no-better than the extra 2-cycle memory-read to get the high 16-bits of a 32-bit variable, and quite-possibly worse.
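
Here is a small host-side sketch of why a 32-bit load costs a second access on a 16-bit bus. It is only a simulation: the `bus_read16`, `load16` and `load32` names and the `bus_reads` counter are hypothetical, a little-endian byte order is assumed, and nothing here models wait-states or real cycle counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of simulated 16-bit bus accesses (hypothetical instrumentation). */
static unsigned bus_reads;

/* One 16-bit bus access from a little-endian byte image. */
static uint16_t bus_read16(const uint8_t *mem, uint32_t addr)
{
    assert(mem != NULL);
    bus_reads++;
    return (uint16_t)(mem[addr] | (mem[addr + 1] << 8));
}

/* A 16-bit load needs a single bus access. */
uint32_t load16(const uint8_t *mem, uint32_t addr)
{
    return bus_read16(mem, addr);
}

/* A 32-bit load needs two accesses: the low half, then the high half. */
uint32_t load32(const uint8_t *mem, uint32_t addr)
{
    uint32_t low = bus_read16(mem, addr);
    uint32_t high = bus_read16(mem, addr + 2);

    return low | (high << 16);
}
```

Counting accesses this way makes it easy to compare a loop that reads 32-bit values against one that reads 16-bit values and then pays for extra masking instructions fetched over the same bus.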

So I think that I'd still recommend that folks stick with 32-bit variables in C as much as possible, but it's definitely a less clear situation than it is on the PC-FX, and I'd suggest that folks actually look at the assembly code that the compiler generates in order to see what it's doing.

If you're programming in assembly, then you can just use ld.h/in.h, and you can write efficient code because you have a better understanding of the CPU architecture and the VirtualBoy than the compiler does.

BTW ... the "advice" may change in the future if I can get the compiler to understand that "ld.*" is automatically sign-extending the value, and that it doesn't need to generate its own code to do it.

But that won't apply to unsigned variables, which are still going to be masked.
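
For anyone unsure what the two kinds of extension actually compute, here is a portable C sketch for the 8-bit case. The helper names are my own. `sign_extend8` gives the same result as a "shl 24" / "sar 24" pair (or a signed byte load), and `zero_extend8` gives the same result as "andi 255".

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 8 bits of `raw` to 32 bits without relying on
   implementation-defined right shifts of negative values. */
int32_t sign_extend8(uint32_t raw)
{
    uint32_t low = raw & 0xFFu;
    int32_t result = (int32_t)(low ^ 0x80u) - 0x80;

    assert(result >= -128 && result <= 127);
    return result;
}

/* Zero-extend the low 8 bits of `raw`: a plain mask. */
uint32_t zero_extend8(uint32_t raw)
{
    return raw & 0xFFu;
}
```

The byte 0xFF comes out as -1 when sign-extended and as 255 when zero-extended, which is why the compiler cannot swap one for the other.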

Whatever happens ... it still goes to show that the VirtualBoy is another one of the old machines where an assembly-language programmer can generate better code than a compiler.
Top

#17
Re: A newer GCC compiler.
Posted on: 2016/4/1 5:20
Nintendoid!
Joined 2007/12/14
166 Posts
CoderLong Time User (11 Years) App Coder
Quote:

ElmerPCFX wrote:
The compiler doesn't really seem to understand that "ld.*" is automatically sign-extending a 16-bit/8-bit read from memory.

The compiler doesn't know that it can use "in.*" on the VirtualBoy to zero-extend reads from memory (that trick won't work on the PC-FX).


That is a cool trick! I hadn't thought to investigate the in.* instructions to see what they actually do. I'll have to use that in my projects now, thanks. :)
Top

#18
Re: A newer GCC compiler.
Posted on: 2016/4/1 19:18
VB Gamer
Joined 2016/3/13
42 Posts
Long Time User (2 Years)
Quote:

blitter wrote:

That is a cool trick! I hadn't thought to investigate the in.* instructions to see what they actually do. I'll have to use that in my projects now, thanks. :)


It's a nice trick since Nintendo made the I/O address space just a copy of the normal address space ... but note that you don't save the extra cycle on multiple loads that you do with the "ld" instruction.

Because the V810 sign-extends any constants for math and comparison, I suspect that it's still probably best to just use signed variables, rather than unsigned variables wherever possible.
Top

#19
Re: A newer GCC compiler.
Posted on: 2016/4/1 19:50
VB Gamer
Joined 2016/3/13
42 Posts
Long Time User (2 Years)
I think that I have figured out how to let GCC know that the "ld" instruction sign-extends variables into an int.

Here are a couple of examples of how it affects the code with newlib's "strlen" function, and then some variations on it.

The variations modify "strlen" so that the comparison is against something other than zero, which stops the compiler from short-cutting the check, and show how the generated code changes as things get a little bit more complex.

The thing to pay particular attention to is the number of instructions in the inner loop.

It shows, again, that if you choose to use C on a processor like the V810, then there are definitely tricks to know that will improve the code-generation.


****************************************************************************************
****************************************************************************************

ORIGINAL FUNCTION FROM NEWLIB 2.2.0

size_t strlen (const char *str)
{
  const char *start = str;
  while (*str)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen:  ld.b 0[r6],r10      _strlen:  ld.b 0[r6],r10      _strlen:  ld.b 0[r6],r10
          cmp 0,r10                     mov r6,r11                    shl 24,r10
          be .L42                       cmp r0,r10                    sar 24,r10
          mov r6,r10                    be .L46                       be .L39
.L41:     add 1,r10           .L47:     add 1,r6                      mov r6,r10
          ld.b 0[r10],r11               ld.b 0[r6],r10      .L40:     add 1,r10
          cmp 0,r11                     cmp r0,r10                    ld.b 0[r10],r11
          bne .L41                      bne .L47                      shl 24,r11
          sub r6,r10          .L46:     mov r6,r10                    bne .L40
          jmp [r31]                     sub r11,r10                   sub r6,r10
.L42:     mov 0,r10                     jmp [r31]           .L39:     jmp [r31]
          jmp [r31]


****************************************************************************************
****************************************************************************************

MARK THE END-OF-STRING WITH A NON-ZERO CONSTANT

size_t strlen2 (const char *str)
{
  const char *start = str;
  while (*str != 1)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen2: ld.b 0[r6],r10      _strlen2: ld.b 0[r6],r10      _strlen2: ld.b 0[r6],r11
          cmp 1,r10                     mov r6,r11                    shl 24,r11
          be .L47                       cmp 1,r10                     sar 24,r11
          mov r6,r10                    be .L51                       cmp 1,r11
.L46:     add 1,r10           .L52:     add 1,r6                      be .L49
          ld.b 0[r10],r11               ld.b 0[r6],r10                mov r6,r10
          cmp 1,r11                     cmp 1,r10           .L46:     add 1,r10
          bne .L46                      bne .L52                      ld.b 0[r10],r11
          sub r6,r10          .L51:     mov r6,r10                    shl 24,r11
          jmp [r31]                     sub r11,r10                   sar 24,r11
.L47:     mov 0,r10                     jmp [r31]                     cmp 1,r11
          jmp [r31]                                                   bne .L46
                                                                      sub r6,r10
                                                                      jmp [r31]
                                                            .L49:     mov 0,r10
                                                                      jmp [r31]


****************************************************************************************
****************************************************************************************

PASS THE END-OF-STRING MARKER IN AS "char" PARAMETER

int strlen3 (const char *str, char eos)
{
  const char *start = str;
  while (*str != eos)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen3: shl 24,r7           _strlen3: shl 24,r7           _strlen3: ld.b 0[r6],r10
          sar 24,r7                     sar 24,r7                     shl 24,r7
          ld.b 0[r6],r10                ld.b 0[r6],r10                mov r7,r12
          cmp r7,r10                    mov r6,r11                    shl 24,r10
          be .L52                       cmp r7,r10                    sar 24,r12
          mov r6,r10                    be .L56                       cmp r7,r10
.L51:     add 1,r10           .L57:     add 1,r6                      be .L56
          ld.b 0[r10],r11               ld.b 0[r6],r10                mov r6,r10
          cmp r7,r11                    cmp r7,r10          .L53:     add 1,r10
          bne .L51                      bne .L57                      ld.b 0[r10],r11
          sub r6,r10          .L56:     mov r6,r10                    shl 24,r11
          jmp [r31]                     sub r11,r10                   sar 24,r11
.L52:     mov 0,r10                     jmp [r31]                     cmp r12,r11
          jmp [r31]                                                   bne .L53
                                                                      sub r6,r10
                                                                      jmp [r31]
                                                            .L56:     mov 0,r10
                                                                      jmp [r31]


****************************************************************************************
****************************************************************************************

PASS THE END-OF-STRING MARKER IN AS AN "int" PARAMETER

int strlen4 (const char *str, int eos)
{
  const char *start = str;
  while (*str != eos)
    str++;
  return str - start;
}

********* GCC 4.7.4 ******************* GCC 2.9.5 ******************* GCC 4.4.2 ********

_strlen4: ld.b 0[r6],r10      _strlen4: ld.b 0[r6],r10      _strlen4: ld.b 0[r6],r10
          cmp r7,r10                    mov r6,r12                    shl 24,r10
          be .L57                       cmp r7,r10                    sar 24,r10
          mov r6,r10                    be .L61                       cmp r7,r10
.L56:     add 1,r10           .L62:     add 1,r6                      be .L63
          ld.b 0[r10],r11               ld.b 0[r6],r10                mov r6,r10
          cmp r7,r11                    mov r10,r11         .L60:     add 1,r10
          bne .L56                      cmp r7,r11                    ld.b 0[r10],r11
          sub r6,r10                    bne .L62                      shl 24,r11
          jmp [r31]           .L61:     mov r6,r10                    sar 24,r11
.L57:     mov 0,r10                     sub r12,r10                   cmp r7,r11
          jmp [r31]                     jmp [r31]                     bne .L60
                                                                      sub r6,r10
                                                                      jmp [r31]
                                                            .L63:     mov 0,r10
                                                                      jmp [r31]


****************************************************************************************
****************************************************************************************
Top

#20
Re: A newer GCC compiler.
Posted on: 2016/4/2 7:19
Nintendoid!
Joined 2007/12/14
166 Posts
CoderLong Time User (11 Years) App Coder
Quote:

ElmerPCFX wrote:
Quote:

blitter wrote:

That is a cool trick! I hadn't thought to investigate the in.* instructions to see what they actually do. I'll have to use that in my projects now, thanks. :)


It's a nice trick since Nintendo made the I/O address space just a copy of the normal address space ... but note that you don't save the extra cycle on multiple loads that you do with the "ld" instruction.


Do you mean grouping "ld" instructions together to speed up the data fetch pipeline? "in" doesn't follow those rules?
Top
