diff mbox

[RFC] rationale for systematic elimination of OP_SYMADDR instructions

Message ID 20170309142044.96408-1-luc.vanoostenryck@gmail.com (mailing list archive)
State Not Applicable, archived
Headers show

Commit Message

Luc Van Oostenryck March 9, 2017, 2:20 p.m. UTC
While investigating some problems related to code generation
I realized that OP_SYMADDR are systematically eliminated,
the target address are simply replaced by the symbol itself.

While it's not wrong per se as it all depends to the semantic
we want to give to pseudos and the instructions and how high-
or low-level we want to IR, I don't think it was the intention
to remove them and more importantly I don't think it's desirable.

Those OP_SYMADDR allowed to make a clear separation between a symbol
(a name with a type and info for storage & linkage) and its address
(which can be stored in memory or in a register and on which
arithmetic operations can then be done on it). Once these addresses
are replaced by the symbol itself, those symbols can appears almost
everywhere in the linearized code:
- in calls' arguments,
- in adds and subs (while doing pointer arithmetic),
- in casts,
- in load & stores,
- ...
and they complicate things considerably once you begin to be
interested concretly in things after linearization & simplification
since soon or later you will need the address anyway.

So my question is:
	"is there a good reason to eliminate those instructions?",
and ultimately:
	"is there any objections to the following patch?".


-- Luc Van Oostenryck


---
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Christopher Li April 25, 2017, 7:20 p.m. UTC | #1
On Thu, Mar 9, 2017 at 10:20 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> While investigating some problems related to code generation
> I realized that OP_SYMADDR are systematically eliminated,
> the target address are simply replaced by the symbol itself.
>
> While it's not wrong per se as it all depends to the semantic
> we want to give to pseudos and the instructions and how high-
> or low-level we want to IR, I don't think it was the intention
> to remove them and more importantly I don't think it's desirable.
>
> Those OP_SYMADDR allowed to make a clear separation between a symbol
> (a name with a type and info for storage & linkage) and its address
> (which can be stored in memory or in a register and on which
> arithmetic operations can then be done on it). Once these addresses
> are replaced by the symbol itself, those symbols can appears almost
> everywhere in the linearized code:
> - in calls' arguments,
> - in adds and subs (while doing pointer arithmetic),
> - in casts,
> - in load & stores,
> - ...
> and they complicate things considerably once you begin to be
> interested concretly in things after linearization & simplification
> since soon or later you will need the address anyway.
>
> So my question is:
>         "is there a good reason to eliminate those instructions?",

This change is introduce in 962279e8 by Linus:

    Remove OP_SETVAL after symbol-pseudo simplification.

    We can just replace all users with the symbol pseudo
    directly.

    This means that we can no longer re-do symbol simplification
    after CSE, and we need to rely on the generic memop simplification.

I can see the reason to do that is simplify the CSE. Before this change,
every reference to the symbol will do a OP_SETVAL (or OP_SYMADDR now
days) to get the address into a new pseudo. That is extra work for the
CSE to discover that:  "Oh,  all those different pseudo are actually the same
address for the same symbol. Let's replace it with the same pseudo."

I haven't understand why things are more complicate after linearization
if we replace all the symbol pseudo into one? Even if we don't do it here,
wouldn't the CSE should do that any way?

The way I see it, the pseudo of the symbol *is* the address of the symbol,
I don't see a problem using the address of the symbol.

Maybe you have some specific usage case in mind. Can you give some
example?

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luc Van Oostenryck April 26, 2017, 2:49 a.m. UTC | #2
On Tue, Apr 25, 2017 at 9:20 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Thu, Mar 9, 2017 at 10:20 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>> While investigating some problems related to code generation
>> I realized that OP_SYMADDR are systematically eliminated,
>> the target address are simply replaced by the symbol itself.
>>
>> While it's not wrong per se as it all depends to the semantic
>> we want to give to pseudos and the instructions and how high-
>> or low-level we want to IR, I don't think it was the intention
>> to remove them and more importantly I don't think it's desirable.
>>
>> Those OP_SYMADDR allowed to make a clear separation between a symbol
>> (a name with a type and info for storage & linkage) and its address
>> (which can be stored in memory or in a register and on which
>> arithmetic operations can then be done on it). Once these addresses
>> are replaced by the symbol itself, those symbols can appears almost
>> everywhere in the linearized code:
>> - in calls' arguments,
>> - in adds and subs (while doing pointer arithmetic),
>> - in casts,
>> - in load & stores,
>> - ...
>> and they complicate things considerably once you begin to be
>> interested concretly in things after linearization & simplification
>> since soon or later you will need the address anyway.
>>
>> So my question is:
>>         "is there a good reason to eliminate those instructions?",
>
> This change is introduce in 962279e8 by Linus:
>
>     Remove OP_SETVAL after symbol-pseudo simplification.
>
>     We can just replace all users with the symbol pseudo
>     directly.
>
>     This means that we can no longer re-do symbol simplification
>     after CSE, and we need to rely on the generic memop simplification.
>
> I can see the reason to do that is simplify the CSE. Before this change,
> every reference to the symbol will do a OP_SETVAL (or OP_SYMADDR now
> days) to get the address into a new pseudo. That is extra work for the
> CSE to discover that:  "Oh,  all those different pseudo are actually the same
> address for the same symbol. Let's replace it with the same pseudo."
>
> I haven't understand why things are more complicate after linearization
> if we replace all the symbol pseudo into one? Even if we don't do it here,
> wouldn't the CSE should do that any way?
>
> The way I see it, the pseudo of the symbol *is* the address of the symbol,
> I don't see a problem using the address of the symbol.
>
> Maybe you have some specific usage case in mind. Can you give some
> example?

Roughly, once you begin to play with code generation, something like
OP_SYMADDR is an operation that you will really do.
Depending on the relocation, it can even be something relatively costly:
I'm thinking static code on a 64bit machine where you can only generate
16bit constants, others cases may be not at all cheaper.
So it's something that soon or later will need to be exposed and
doing CSE on the address is good.
For example, with code as simple as
	extern int a;
	
	void foo(void)
	{
		a = a + 1;
	}

compiled for ARM with GCC:
	foo:
	        movw    r3, #:lower16:a
	        movt    r3, #:upper16:a
	        ldr     r2, [r3]
	        add     r2, r2, #1
	        str     r2, [r3]
	        bx      lr

The first 2 instructions correspond at taking the address of 'a',
it would be the very direct translation of sparse's:
	foo:
	.L0:
		<entry-point>
		symaddr     %r1
		load.32     %r2 <- 0[%r1]
		add.32      %r3 <- %r2, $1
		store.32    %r3 -> 0[%r1]
		ret

If we would have kept the OP_SYMADRR and doing CSE on it.
But for now we have:
	foo:
	.L0:
		<entry-point>
		load.32     %r2 <- 0[a]
		add.32      %r3 <- %r2, $1
		store.32    %r3 -> 0[a]
		ret

whose translation would be:
	        movw    r3, #:lower16:a
	        movt    r3, #:upper16:a
	        ldr     r2, [r3]
	        add     r2, r2, #1
!	        movw    r3, #:lower16:a
!	        movt    r3, #:upper16:a
	        str     r2, [r3]
	        bx      lr
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christopher Li April 26, 2017, 11:33 a.m. UTC | #3
On Tue, Apr 25, 2017 at 10:49 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> Roughly, once you begin to play with code generation, something like
> OP_SYMADDR is an operation that you will really do.
> Depending on the relocation, it can even be something relatively costly:
> I'm thinking static code on a 64bit machine where you can only generate
> 16bit constants, others cases may be not at all cheaper.
> So it's something that soon or later will need to be exposed and
> doing CSE on the address is good.

I see your point. Let me rephrase your claim. Sparse assume getting
the symbol address
is trivial. But on some architecture that is non trivial to generate
symbol address.
Typical case would be, the machine instruction size is smaller than
the address size.
It will take a few machine instruction to generate a symbol address.

I agree with this part.

Now the second part of the claim is that, it would be better to left
the OP_SYMADDR
untouched and let CSE to remove it. That part I disagree.

The reason is that, currently CSE operate on the same basic block. It only
eliminate instruction but it does not relocate instructions.

A very common case is that, the symbol address was referenced in different
basic blocks.

extern int a, d;

if (...)
     a = d;
else if (...)
     a = d + 2;

CSE would not be able to simply remove the OP_SYMADDR for "a",
because they are not in the same basic block. The best result should be,
for all the usage of that symbol in a function, find the closest
common parent basic
block and put the OP_SYMADDR there.

Because getting symbol address is used very often. Let CSE go through a hash
to re-discover that all symbol address can be combined is costly and
not necessary.
And CSE does not cover all the case commit 962279e8 covers.


> If we would have kept the OP_SYMADRR and doing CSE on it.
> But for now we have:
>         foo:
>         .L0:
>                 <entry-point>
>                 load.32     %r2 <- 0[a]
>                 add.32      %r3 <- %r2, $1
>                 store.32    %r3 -> 0[a]
>                 ret
>
> whose translation would be:
>                 movw    r3, #:lower16:a
>                 movt    r3, #:upper16:a
>                 ldr     r2, [r3]
>                 add     r2, r2, #1
> !               movw    r3, #:lower16:a
> !               movt    r3, #:upper16:a

That is because you assume every time "a" was referenced
you need to redo the generate of 32 bit address "a". That is
not necessary true. In sparse IR, "a" already translate into
pseudo, which has the user chain.  It is relative simple for
the back end to find the best place to insert OP_SYMADDR.
The end of the day, it is the back end knows the machine
register allocation and reorder of the instruction if necessary.

So the proper  solution shouldn't be let CSE to discover OP_SYMADDR
yet another time. It should be find the best place to insert OP_SYMADDR.

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luc Van Oostenryck April 26, 2017, 12:17 p.m. UTC | #4
On Wed, Apr 26, 2017 at 1:33 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Tue, Apr 25, 2017 at 10:49 PM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>> Roughly, once you begin to play with code generation, something like
>> OP_SYMADDR is an operation that you will really do.
>> Depending on the relocation, it can even be something relatively costly:
>> I'm thinking static code on a 64bit machine where you can only generate
>> 16bit constants, others cases may be not at all cheaper.
>> So it's something that soon or later will need to be exposed and
>> doing CSE on the address is good.
>
> I see your point. Let me rephrase your claim. Sparse assume getting
> the symbol address
> is trivial. But on some architecture that is non trivial to generate
> symbol address.
> Typical case would be, the machine instruction size is smaller than
> the address size.
Not especially. The case I showed is related to the ability for the machine
to generate constants corresponding to an address size (like ARM here
where instruction = addresses = 32bits but constants can only generated
16 bits at a time), this is a simple example you can find on static code.
The exact same problematic is present on all architecture once you have
less simple relocations (think -fpic, shared libraries and such).

> It will take a few machine instruction to generate a symbol address.
>
> I agree with this part.
>
> Now the second part of the claim is that, it would be better to left
> the OP_SYMADDR
> untouched and let CSE to remove it. That part I disagree.
>
> The reason is that, currently CSE operate on the same basic block. It only
> eliminate instruction but it does not relocate instructions.

That's not true.
The capability of CSE to move code around is limited, CSE
doesn't only operate on the same BB. It relocates instructions in simple cases.

And even if CSE would be limited to work on the same BB, it would
already be beneficial.

> A very common case is that, the symbol address was referenced in different
> basic blocks.
>
> extern int a, d;
>
> if (...)
>      a = d;
> else if (...)
>      a = d + 2;
>
> CSE would not be able to simply remove the OP_SYMADDR for "a",
> because they are not in the same basic block. The best result should be,
> for all the usage of that symbol in a function, find the closest
> common parent basic
> block and put the OP_SYMADDR there.

I invite you to look at the output of:
        extern int use(int);

        int foo(int a)
        {
                int r;

                if (a)
                        r = a + 1;
                else {
                        use(0);
                        r = a + 1;
                }

                return r;
        }

> Because getting symbol address is used very often. Let CSE go through a hash
> to re-discover that all symbol address can be combined is costly and
> not necessary.
> And CSE does not cover all the case commit 962279e8 covers.
>
>
>> If we would have kept the OP_SYMADRR and doing CSE on it.
>> But for now we have:
>>         foo:
>>         .L0:
>>                 <entry-point>
>>                 load.32     %r2 <- 0[a]
>>                 add.32      %r3 <- %r2, $1
>>                 store.32    %r3 -> 0[a]
>>                 ret
>>
>> whose translation would be:
>>                 movw    r3, #:lower16:a
>>                 movt    r3, #:upper16:a
>>                 ldr     r2, [r3]
>>                 add     r2, r2, #1
>> !               movw    r3, #:lower16:a
>> !               movt    r3, #:upper16:a
>
> That is because you assume every time "a" was referenced
> you need to redo the generate of 32 bit address "a". That is
> not necessary true. In sparse IR, "a" already translate into
> pseudo, which has the user chain.  It is relative simple for
> the back end to find the best place to insert OP_SYMADDR.
> The end of the day, it is the back end knows the machine
> register allocation and reorder of the instruction if necessary.

No, it's not the job of the backend to do this sort of things, nor
is it "relatively simple". Why? because it's the exact same problem
as CSE. If don't put this OP_SYMADDR in CSE here, you will
need to reimplement something that is equivalent to CSE later at
code generation, which is pretty stupid.
What is done, at codegen & register allocation is exactly the opposite:
when you're short on registers and need to spill some you will (maybe)
first chose the ones that were used for this sort of values because
you can recalculate/regenerate them easily (mainly because very often
getting the address will simply be a load itself).

-- Luc
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds April 26, 2017, 4:15 p.m. UTC | #5
On Tue, Apr 25, 2017 at 7:49 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Roughly, once you begin to play with code generation, something like
> OP_SYMADDR is an operation that you will really do.
> Depending on the relocation, it can even be something relatively costly:
> I'm thinking static code on a 64bit machine where you can only generate
> 16bit constants, others cases may be not at all cheaper.
> So it's something that soon or later will need to be exposed and
> doing CSE on the address is good.

Ack. I think the OP_SYMADDR elimination was wrong, and your patch is
the right thing to do.

It was done to roughly approximate the 32-bit x86 code, making the
pseudo-asm that the linearizer outputs look more like x86 address
generation, but you're right, it's bogus.

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christopher Li April 26, 2017, 9:02 p.m. UTC | #6
On Wed, Apr 26, 2017 at 8:17 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> Not especially. The case I showed is related to the ability for the machine
> to generate constants corresponding to an address size (like ARM here
> where instruction = addresses = 32bits but constants can only generated
> 16 bits at a time), this is a simple example you can find on static code.
> The exact same problematic is present on all architecture once you have
> less simple relocations (think -fpic, shared libraries and such).

OK. Points well taken.


>> The reason is that, currently CSE operate on the same basic block. It only
>> eliminate instruction but it does not relocate instructions.
>
> That's not true.
> The capability of CSE to move code around is limited, CSE
> doesn't only operate on the same BB. It relocates instructions in simple cases.
>
> And even if CSE would be limited to work on the same BB, it would
> already be beneficial.
>
>> A very common case is that, the symbol address was referenced in different
>> basic blocks.
>>
>> extern int a, d;
>>
>> if (...)
>>      a = d;
>> else if (...)
>>      a = d + 2;
>>
>> CSE would not be able to simply remove the OP_SYMADDR for "a",
>> because they are not in the same basic block. The best result should be,
>> for all the usage of that symbol in a function, find the closest
>> common parent basic
>> block and put the OP_SYMADDR there.
>
> I invite you to look at the output of:
>         extern int use(int);
>
>         int foo(int a)
>         {
>                 int r;
>
>                 if (a)
>                         r = a + 1;
>                 else {
>                         use(0);
>                         r = a + 1;
>                 }
>
>                 return r;
>         }

I take a look at the output:
foo:
.L0:
<entry-point>
add.32      %r3(r) <- %arg1, $1
cbr         %arg1, .L4, .L2

.L2:
call.32     %r4 <- use, $0
br          .L4

.L4:
ret.32      %r3(r)

So you are right, I am wrong about the CSE did not cross basic block
boundary.

>
> No, it's not the job of the backend to do this sort of things, nor
> is it "relatively simple". Why? because it's the exact same problem
> as CSE. If don't put this OP_SYMADDR in CSE here, you will
> need to reimplement something that is equivalent to CSE later at
> code generation, which is pretty stupid.

I notice Linus ACK the patch as well. I still have a question. Let me
ask this, may be a silly one.

Does the address of the symbol ever change inside a function?
I assume it does not change. If that is the case, can we skip the CSE
and replace all the symbol address reference to one OP_SYMADDR?

For example, for each symbol access in the function we insert OP_SYMADDR
after the entry:
foo:
<entry-point>
        %r1 <- a
        %r2 <- b
...

Then all reference of symbol address of "a" and "b" inside the function
foo will use %r1 and %r2.  Notice that we still keep the OP_SYMADDR
instruction, just move to function entry.

Is that illegal or bad?

Thanks

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luc Van Oostenryck April 26, 2017, 11:02 p.m. UTC | #7
On Wed, Apr 26, 2017 at 11:02 PM, Christopher Li <sparse@chrisli.org> wrote:
> On Wed, Apr 26, 2017 at 8:17 AM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
>> Not especially. The case I showed is related to the ability for the machine
>> to generate constants corresponding to an address size (like ARM here
>> where instruction = addresses = 32bits but constants can only generated
>> 16 bits at a time), this is a simple example you can find on static code.
>> The exact same problematic is present on all architecture once you have
>> less simple relocations (think -fpic, shared libraries and such).
>
> OK. Points well taken.
>
>
>>> The reason is that, currently CSE operate on the same basic block. It only
>>> eliminate instruction but it does not relocate instructions.
>>
>> That's not true.
>> The capability of CSE to move code around is limited, CSE
>> doesn't only operate on the same BB. It relocates instructions in simple cases.
>>
>> And even if CSE would be limited to work on the same BB, it would
>> already be beneficial.
>>
>>> A very common case is that, the symbol address was referenced in different
>>> basic blocks.
>>>
>>> extern int a, d;
>>>
>>> if (...)
>>>      a = d;
>>> else if (...)
>>>      a = d + 2;
>>>
>>> CSE would not be able to simply remove the OP_SYMADDR for "a",
>>> because they are not in the same basic block. The best result should be,
>>> for all the usage of that symbol in a function, find the closest
>>> common parent basic
>>> block and put the OP_SYMADDR there.
>>
>> I invite you to look at the output of:
>>         extern int use(int);
>>
>>         int foo(int a)
>>         {
>>                 int r;
>>
>>                 if (a)
>>                         r = a + 1;
>>                 else {
>>                         use(0);
>>                         r = a + 1;
>>                 }
>>
>>                 return r;
>>         }
>
> I take a look at the output:
> foo:
> .L0:
> <entry-point>
> add.32      %r3(r) <- %arg1, $1
> cbr         %arg1, .L4, .L2
>
> .L2:
> call.32     %r4 <- use, $0
> br          .L4
>
> .L4:
> ret.32      %r3(r)
>
> So you are right, I am wrong about the CSE did not cross basic block
> boundary.
>
>>
>> No, it's not the job of the backend to do this sort of things, nor
>> is it "relatively simple". Why? because it's the exact same problem
>> as CSE. If don't put this OP_SYMADDR in CSE here, you will
>> need to reimplement something that is equivalent to CSE later at
>> code generation, which is pretty stupid.
>
> I notice Linus ACK the patch as well. I still have a question. Let me
> ask this, may be a silly one.
>
> Does the address of the symbol ever change inside a function?
> I assume it does not change. If that is the case, can we skip the CSE
> and replace all the symbol address reference to one OP_SYMADDR?
>
> For example, for each symbol access in the function we insert OP_SYMADDR
> after the entry:
> foo:
> <entry-point>
>         %r1 <- a
>         %r2 <- b
> ...
>
> Then all reference of symbol address of "a" and "b" inside the function
> foo will use %r1 and %r2.  Notice that we still keep the OP_SYMADDR
> instruction, just move to function entry.
>
> Is that illegal or bad?

The address of a symbol will of course not change.
So yes, all the OP_SYMADDR could move to the top of the function.
It wouldn't be illegal and it could be advantageous in some cases.
It would be bad, though, if these addresses are in fact not used
(because of a conditional). I'm thinking to something like:
        if (unlikely(some cond)) a++;
Of course, doing so would also need a register to hold these addresses
so pre-calculated. What if the function access a lot of symbols?

In my opinion, we should handle these OP_SYMADDR just like
any other instructions (in other words: near where they are used).
And if one day, sparse will become smart enough to things like
'loop-invariant code motion' then these symbol addresses will
gain from it like any other calculated values will (but we're not yet there).

-- Luc
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luc Van Oostenryck April 26, 2017, 11:04 p.m. UTC | #8
On Wed, Apr 26, 2017 at 6:15 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:

> Ack. I think the OP_SYMADDR elimination was wrong, and your patch is
> the right thing to do.
>
> It was done to roughly approximate the 32-bit x86 code, making the
> pseudo-asm that the linearizer outputs look more like x86 address
> generation, but you're right, it's bogus.

OK, thanks.
Now I just need to go hunt the regression I saw when I made and used
the patch. But that's another story.

-- Luc
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christopher Li Aug. 10, 2017, 3:01 p.m. UTC | #9
On Wed, Apr 26, 2017 at 7:02 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>> Does the address of the symbol ever change inside a function?
>> I assume it does not change. If that is the case, can we skip the CSE
>> and replace all the symbol address reference to one OP_SYMADDR?
>>
>> For example, for each symbol access in the function we insert OP_SYMADDR
>> after the entry:
>> foo:
>> <entry-point>
>>         %r1 <- a
>>         %r2 <- b
>> ...
>>
>> Then all reference of symbol address of "a" and "b" inside the function
>> foo will use %r1 and %r2.  Notice that we still keep the OP_SYMADDR
>> instruction, just move to function entry.
>>
>> Is that illegal or bad?
>
> The address of a symbol will of course not change.
> So yes, all the OP_SYMADDR could move to the top of the function.
> It wouldn't be illegal and it could be advantageous in some cases.
> It would be bad, though, if these addresses are in fact not used
> (because of a conditional). I'm thinking to something like:
>         if (unlikely(some cond)) a++;
> Of course, doing so would also need a register to hold these addresses
> so pre-calculated. What if the function access a lot of symbols?

Sorry for reply to a very old email. Still catching up the my backlogs.

I think the function access a lot of symbols will need to have more
OP_SYMADDR, at least one instruction per symbol if you want to
have the OP_SYMADDR. You can't do better than that.


> In my opinion, we should handle these OP_SYMADDR just like
> any other instructions (in other words: near where they are used).

I think in the current IR, how close it is to be used is not a big issue.

If you want to get it close and have one OP_SYMADDR per symbol.
The OP_SYMADDR should be place at immediate D(N), where N is the
block that use the symbol. In other words, place the OP_SYMADDR
at where N blocks join in the dominator tree. You can't do better than
hat.

Please correct me if I am wrong.  I still think:
1)  OP_SYMADDR can be generated from the symbol node and where
     to place them (immediate D(N)).

2) If one of the back end needs them. Go ahead and generate OP_SYMADDR.
   However, I don't think sparse checker need to use the OP_SYMADDR
   so it will be more optimal  for sparse checker to avoid OP_SYMADDR.

   Am I missing some thing?

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luc Van Oostenryck Aug. 10, 2017, 10:16 p.m. UTC | #10
On Thu, Aug 10, 2017 at 5:01 PM, Christopher Li <sparse@chrisli.org> wrote:
>
> Please correct me if I am wrong.  I still think:
> 1)  OP_SYMADDR can be generated from the symbol node and where
>      to place them (immediate D(N)).
>
> 2) If one of the back end needs them. Go ahead and generate OP_SYMADDR.
>    However, I don't think sparse checker need to use the OP_SYMADDR
>    so it will be more optimal  for sparse checker to avoid OP_SYMADDR.
>
>    Am I missing some thing?

I'm not sure to understand exactly what you mean here above, sorry.
But yes, I think you're missing something.

All this have already been explained during the initial discussion but:
* those OP_SYMADDR represent the *calculation* that is needed to get
the address of a symbol.
* depending on the architecture, the kind/linkage/visibility of the symbol,
  compiler flag, ... this calculation can be very simple, like:
  - just a constant (but a big one)
  - just a load at a constant address & a constant address
  - or much more complicated
* the OP_SYMADDR *abstract* away the *details* of this calculation
   but it is important to expose it
* of the reasons we want to expose it is that we *want* to do CSE on
  these calculations
* there is absolutely no reasons why we should process them differently
  than any other instructions. In particular, it would be absurd to:
  - try to put all of them at the top
  - try to place them ourself at some other special place (common parent,
    dominator, ...)
  because CSE will place them automatically at the best place: as near as
  possible from where they are needed.
* it would also be a very bad idea to *not* generate them and only let the
  backend generate them because it would then mean that iether:
  - no CSE on them
  - backend need to reimplement CSE for them
  The backend, just need to *expose the details*, maybe change those
  SYMADDR in the appropriate loads & adds & loads or whatever but
  they need to be already there and processed by the middle-end.

-- Luc
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christopher Li Aug. 11, 2017, 1:17 a.m. UTC | #11
First of all, I want to clarify that I ask this question is
to satisfy my curiously why it is done this way,
either it is the best optimal way to do it.

It is not reason to reject your patch. I trust you have good
reason to do it, I will accept the patch even if I don't fully
understand it. I still want to understand it after applying
it.

On Thu, Aug 10, 2017 at 6:16 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> All this have already been explained during the initial discussion but:
> * those OP_SYMADDR represent the *calculation* that is needed to get
> the address of a symbol.

That is fine, nod.

> * depending on the architecture, the kind/linkage/visibility of the symbol,
>   compiler flag, ... this calculation can be very simple, like:
>   - just a constant (but a big one)
>   - just a load at a constant address & a constant address
>   - or much more complicated

Agree so far.

> * the OP_SYMADDR *abstract* away the *details* of this calculation
>    but it is important to expose it

OK. I have the question that sparse checker don't need it.
I will defer to later part of email to ask.

> * of the reasons we want to expose it is that we *want* to do CSE on
>   these calculations

Do CSE do you mean just CSE OP_SYMADDR or CSE the complicated
instruction expand from OP_SYMADDR.

Let say you have code like this, the symbol is assign in different
scope block.

Before CSE there is 3 OP_SYMADDR "a".  After CSE where is those 3
OP_SYMADDR "a" is likely to be combine into one? Where would the
CSE place the OP_SYMADDR if it is not one of the beginning or dominator
etc?


int a;
void fool (void)
{
                ...
               { a = 1 };
           ...
           {a = 2;}
                     ....
                     {a = 3};
}


> * there is absolutely no reasons why we should process them differently
>   than any other instructions. In particular, it would be absurd to:
>   - try to put all of them at the top
>   - try to place them ourself at some other special place (common parent,
>     dominator, ...)
>   because CSE will place them automatically at the best place: as near as
>   possible from where they are needed.

In that case, you might just place it before load or store. You don't seem
to need CSE to do it. Sorry I feel that I am still missing some big picture.
I am desperately trying to fill it.

I also think that you make the OP_SYMADDR a lower level IR than
what I original have in mind writing linearizer. I expect there is a separate
phase where you translate the sparse IR into machine level code, that
is where you can insert the equivalent of OP_SYMADDR. gcc also
have tree-ssa-sink.c which move the instruction closer to the place it is
used. So it is not sparse's IR's job to keep it close. For the same reason
sparse IR don't try to make the warm blocks close together etc. I consider
those are for the level level back end to do.

What I original have in mind is that, IR should contain all information for
back end to generate machine code. The IR should also be minimalism.
Which means that, if any information can be deduced from the IR, the IR don't
have to store them (as on disk byte code format). I am not saying we should
stay that way. I am just saying what I have in mind previously.

> * it would also be a very bad idea to *not* generate them and only let the
>   backend generate them because it would then mean that iether:
>   - no CSE on them
>   - backend need to reimplement CSE for them
>   The backend, just need to *expose the details*, maybe change those
>   SYMADDR in the appropriate loads & adds & loads or whatever but
>   they need to be already there and processed by the middle-end.

OK. For the sparse checker question. The sparse checker does not use
OP_SYMADDR, can it have a global option default to on. Sparse checker
before running any file, turn that option to off? I also feel it strange that
sparse checker don't need the OP_SYMADDR while we force it to see
OP_SYMADDR just to ignore it.

Right now we don't have a real compiler using sparse(yet). However the
checker is the most common usage case. I do want to squeeze any
bit of performance out of sparse checker. So having OP_SYMADDR
can perennially slowdown sparse while nobody is benefiting from it (yet).
It is not speeding up sparse isn't it?

Can we enable OP_SYMADDR when some back end actually start to
use it? That is actually the same as the global option thing. Who needs
OP_SYMADDR turn the option on. It is no extra step as a different pass.

We can still turn the option on for checker for testing purpose. Is that
acceptable to you?

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luc Van Oostenryck Aug. 11, 2017, 12:25 p.m. UTC | #12
On Thu, Aug 10, 2017 at 09:17:44PM -0400, Christopher Li wrote:
> First of all, I want to clarify that I ask this question is
> to satisfy my curiously why it is done this way,
> either it is the best optimal way to do it.
> 
> It is not reason to reject your patch. I trust you have good
> reason to do it, I will accept the patch even if I don't fully
> understand it. I still want to understand it after applying
> it.

OK, I'll try to explain it the best I can.
 
> Do CSE do you mean just CSE OP_SYMADDR or CSE the complicated
> instruction expand from OP_SYMADDR.

Just the OP_SYMADDR, there is no expansion for them at this level.

> Let say you have code like this, the symbol is assign in different
> scope block.
> 
> Before CSE there is 3 OP_SYMADDR "a".  After CSE where is those 3
> OP_SYMADDR "a" is likely to be combine into one? Where would the
> CSE place the OP_SYMADDR if it is not one of the beginning or dominator
> etc?

Exactly like any other expression that is CSEed.

Yes, I elude your question, on purpose, because the question
about *where* those instructions need to be is unrelated to the
question "must these instructions be present or not".

> In that case, you might just place it before load or store. You don't seem
> to need CSE to do it. Sorry I feel that I am still missing some big picture.
> I am desperately trying to fill it.

Thing is that the current linearizer already *create* them, all of them,
just before the load or store that need them.
There are also a few other situations where they are needed: when doing
address calculation for symbols. For example:
	extern int a[];
	int *ptr;
	...
	ptr = &a[i];

But in all cases, the OP_SYMADDRs are correctly *created*.

The problme is that (in most cases) these OP_SYMADDR are *directly* 
optimized away or more exactly back-folded with their symbol.
For example:
	extern int a;
	int v;
	...
	v = a;

The linearized code is something like:
	symaddr	%r1, a
	load	%r2, %r1[0]
And this is *exactly* what is needed. Among others things
the load take as operand an *address* and not a symbol
which is something that belong more to the source language.

But after the very first pass of simplify_instruction() the
symaddr is 'simplified away' and we get:
	load	%r2, a[0]

It's just this simplification that should be removed.

> I also think that you make the OP_SYMADDR a lower level IR than
> what I original have in mind writing linearizer.

I didn't create them :)

> I expect there is a separate
> phase where you translate the sparse IR into machine level code, that
> is where you can insert the equivalent of OP_SYMADDR.

Of course.
But again, look at the title of the thread: the question has never been
about the creation/insertion of these OP_SYMADDRs but about their
non-elimination.

> OK. For the sparse checker question.

I won't answer now to this question because I think, if it is still
pertinant, it needs to be reformulated taking in account the fact
that *all* needed OP_SYMADDR are already created during linearization.

-- Luc
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/simplify.c b/simplify.c
index 5d00937f1..a84e4787f 100644
--- a/simplify.c
+++ b/simplify.c
@@ -1159,7 +1159,7 @@  int simplify_instruction(struct instruction *insn)
 	case OP_SYMADDR:
 		if (dead_insn(insn, NULL, NULL, NULL))
 			return REPEAT_CSE | REPEAT_SYMBOL_CLEANUP;
-		return replace_with_pseudo(insn, insn->symbol);
+		return 0;
 	case OP_CAST:
 	case OP_SCAST:
 	case OP_FPCAST: