diff mbox series

experimental: code sinking

Message ID 20201204163315.68538-1-luc.vanoostenryck@gmail.com (mailing list archive)
State New
Headers show
Series experimental: code sinking | expand

Commit Message

Luc Van Oostenryck Dec. 4, 2020, 4:33 p.m. UTC
A lot of the false 'context imbalance' warnings are caused by
a potential jump-threading being blocked between 2 conditional
branches on the same condition because the second CBR belong
to a non-empty BB. Often the offending instructions can be moved
to some other BB, sometimes even with some added advantages.

This patch help a bit with these false warnings by doing a limited
form of code sinking: blocking instructions with a single user
are moved in the BB where they're used, possibly making the
original BB empty and thus making the jump-threading possible.

Note: It's not the intention to use the patch as is.
      Ideally, it should first be checked if the original BB
      can be made empty before moving the instructions around,
      but this should be coordinated with other ways of moving
      these instructions.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 Makefile    |  1 +
 code-sink.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 optimize.c  |  2 ++
 optimize.h  |  3 ++
 4 files changed, 98 insertions(+)
 create mode 100644 code-sink.c

Comments

Linus Torvalds Dec. 4, 2020, 5:13 p.m. UTC | #1
On Fri, Dec 4, 2020 at 8:34 AM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> A lot of the false 'context imbalance' warnings are caused by
> a potential jump-threading being blocked between 2 conditional
> branches on the same condition because the second CBR belong
> to a non-empty BB. Often the offending instructions can be moved
> to some other BB, sometimes even with some added advantages.

I hope you don't just use the context imbalance as a sign of this
being a good idea, because a lot of the context imbalances are likely
real and valid.

I do think moving the instruction to the (single) user sounds like a
good idea in some cases, but I'm a bit worried about doing it quite
this mindlessly. It can expand on liveness a lot - while the liveness
of the result of the sunk instruction shrinks, the liveness of the
sources to the sunk instruction can grow a lot.

That's obviously a non-issue for the use of sparse as an analysis tool
(and that's clearly the primary use), but I'd still like to think that
code generation might matter.

So I think this might be better with more heuristics. And explaining
them. Right now you have one heuristic: you only sink instructions
from bb's that end with a conditional branch. I'm not entirely sure
that I understand the reason for that heuristic, it smells a bit
arbitrary to me (I suspect it was the case you saw when looking at
examples).

On that note: would also be lovely to actually see examples of what
this results in - and not necessarily about just the context imbalance
again.

There might be cases where instruction sinking makes sense even
outside the "can we empty this bb entirely" issue. Not that I can
think of any, but I wonder if this could be used to actually shrink
liveness regions (if both the inputs to the sunk instruction are live
_anyway_ at the target, then sinking the instruction should actually
improve liveness in general, for example).

             Linus
Luc Van Oostenryck Dec. 4, 2020, 5:45 p.m. UTC | #2
On Fri, Dec 04, 2020 at 09:13:50AM -0800, Linus Torvalds wrote:
> On Fri, Dec 4, 2020 at 8:34 AM Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > A lot of the false 'context imbalance' warnings are caused by
> > a potential jump-threading being blocked between 2 conditional
> > branches on the same condition because the second CBR belong
> > to a non-empty BB. Often the offending instructions can be moved
> > to some other BB, sometimes even with some added advantages.
> 
> I hope you don't just use the context imbalance as a sign of this
> being a good idea, because a lot of the context imbalances are likely
> real and valid.

Not exactly, but I confess that currently I'm focusing a lot on the
'*false* context imbalance' (those I can see in the C code or in
the IR that should be OK but sparse warning on them nevertheless). 
I've not a very clear idea of how much of those warnings are real
(but I'm begin to think more and more than most are real).

By far, the most common cause of these false warnings is a CBR-CBR
on the same condition that is not simplified away because the second
one is not empty.

> I do think moving the instruction to the (single) user sounds like a
> good idea in some cases, but I'm a bit worried about doing it quite
> this mindlessly. It can expand on liveness a lot - while the liveness
> of the result of the sunk instruction shrinks, the liveness of the
> sources to the sunk instruction can grow a lot.
> 
> That's obviously a non-issue for the use of sparse as an analysis tool
> (and that's clearly the primary use), but I'd still like to think that
> code generation might matter.

Yes, I agree for both points.

> So I think this might be better with more heuristics. And explaining
> them. Right now you have one heuristic: you only sink instructions
> from bb's that end with a conditional branch. I'm not entirely sure
> that I understand the reason for that heuristic, it smells a bit
> arbitrary to me (I suspect it was the case you saw when looking at
> examples).

Yes, indeed, it's arbitrary because it's the only case I'm interested
about for these 'false context imbalance'. But no worries, as explained
in the patch description, it's not my intention to merge this, certainly
not as-is. It's more a kind of experiment to play with, a kind of
exploratory tool.
 
> On that note: would also be lovely to actually see examples of what
> this results in - and not necessarily about just the context imbalance
> again.

Yes, I'll add this (but I'm not sure to have anything significant
not related to emptying a BB ending with a CBR).

> There might be cases where instruction sinking makes sense even
> outside the "can we empty this bb entirely" issue. Not that I can
> think of any, but I wonder if this could be used to actually shrink
> liveness regions (if both the inputs to the sunk instruction are live
> _anyway_ at the target, then sinking the instruction should actually
> improve liveness in general, for example).

I don't think I understand this. In the case of an UNOP, sinking it
increase the liveness and decrease the liveness of the output, so
it should not matter much. In the case of an BINOP or select, sinking
it will decrease the liveness of the unique output but increase the
liveness of the inputs. So, it seems to me that sinking would
globally increase the liveness (contrary to moving up instructions).
Am I missing something?

-- Luc
Linus Torvalds Dec. 4, 2020, 6:15 p.m. UTC | #3
On Fri, Dec 4, 2020 at 9:45 AM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> > There might be cases where instruction sinking makes sense even
> > outside the "can we empty this bb entirely" issue. Not that I can
> > think of any, but I wonder if this could be used to actually shrink
> > liveness regions (if both the inputs to the sunk instruction are live
> > _anyway_ at the target, then sinking the instruction should actually
> > improve liveness in general, for example).
>
> I don't think I understand this. In the case of an UNOP, sinking it
> increase the liveness and decrease the liveness of the output, so
> it should not matter much.

Right. The UNOP case should be a no-op from a liveness perspective, but:

> In the case of an BINOP or select, sinking
> it will decrease the liveness of the unique output but increase the
> liveness of the inputs. So, it seems to me that sinking would
> globally increase the liveness (contrary to moving up instructions).
> Am I missing something?

No, moving a binop could actually *shrink* liveness under the right
circumstances - namely when the sources of the binop are live
regardless.

Completely stupid example that makes no sense, and only exists to
illustrate the issue:

    int diff(int x, int y);
    int fn2(int x, int y, int sum, int diff);

    int test(int x, int y)
    {
        int sum = x+y;

        return fn2(x, y, sum, diff(x,y));
    }

which generates

    add.32      %r3 <- %arg1, %arg2
    call.32     %r9 <- fn1, %arg1, %arg2
    call.32     %r10 <- fn2, %arg1, %arg2, %r3, %r9
    ret.32      %r10

but it would actually improve liveness if that "add" was moved down -
because even though it "expands" the liveness of %arg1/arg2 by moving
the use of those down, both of those argument pseudos have later uses
_anyway_. So that expansion of liveness is a non-issue.

Instead, it shrinks the liveness region of %r3. Ergo, it actually
shrinks liveness region in the big picture.

Now, the above stupid example is one single bb, so in that sense it's
not really relevant for your inter-bb movement, but that doesn't
actually change the argument at all. Insert a conditional in there to
get a multi-bb case.

              Linus
Luc Van Oostenryck Dec. 4, 2020, 8:23 p.m. UTC | #4
On Fri, Dec 04, 2020 at 10:15:30AM -0800, Linus Torvalds wrote:
> On Fri, Dec 4, 2020 at 9:45 AM Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > > There might be cases where instruction sinking makes sense even
> > > outside the "can we empty this bb entirely" issue. Not that I can
> > > think of any, but I wonder if this could be used to actually shrink
> > > liveness regions (if both the inputs to the sunk instruction are live
> > > _anyway_ at the target, then sinking the instruction should actually
> > > improve liveness in general, for example).
> >
> > I don't think I understand this. In the case of an UNOP, sinking it
> > increase the liveness and decrease the liveness of the output, so
> > it should not matter much.
> 
> Right. The UNOP case should be a no-op from a liveness perspective, but:
> 
> > In the case of an BINOP or select, sinking
> > it will decrease the liveness of the unique output but increase the
> > liveness of the inputs. So, it seems to me that sinking would
> > globally increase the liveness (contrary to moving up instructions).
> > Am I missing something?
> 
> No, moving a binop could actually *shrink* liveness under the right
> circumstances - namely when the sources of the binop are live
> regardless.

Hmm yes, indeed. I thought about that but I also implicitly thought
there was a dual effect for the output, but there isn't. And so the
'cost' is not the same before the 'last occurrence of other uses' than
after it. In short: "moving down is good but only when not too low,
unless it's an unop".

Anyway, the idea would be to do such moves only if they would effectively
empty and even then it's clear what is the exact advantage (other than
for imbalance). Also, moving it to the point where it is needed was very
easy. Moving it just past the CBR (or somewhere in the middle) will be more
complicated. I'll see.

-- Luc
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index 313664467151..5ba54659f625 100644
--- a/Makefile
+++ b/Makefile
@@ -35,6 +35,7 @@  LIB_OBJS :=
 LIB_OBJS += allocate.o
 LIB_OBJS += builtin.o
 LIB_OBJS += char.o
+LIB_OBJS += code-sink.o
 LIB_OBJS += compat-$(OS).o
 LIB_OBJS += cse.o
 LIB_OBJS += dissect.o
diff --git a/code-sink.c b/code-sink.c
new file mode 100644
index 000000000000..566ddec028a0
--- /dev/null
+++ b/code-sink.c
@@ -0,0 +1,92 @@ 
+#include "optimize.h"
+#include "lib.h"
+#include "linearize.h"
+
+
+static inline struct instruction *get_user(pseudo_t p)
+{
+	struct pseudo_user *pu;
+
+	FOR_EACH_PTR(p->users, pu) {
+		if (!pu)
+			continue;
+		return pu->insn;
+	} END_FOR_EACH_PTR(pu);
+	return NULL;
+}
+
+static bool sink_insn(struct instruction *insn, struct basic_block *bb)
+{
+	struct instruction *curr;
+
+	FOR_EACH_PTR(bb->insns, curr) {
+		if (!curr->bb)
+			continue;
+		if (curr->opcode == OP_PHI)
+			continue;
+		INSERT_CURRENT(insn, curr);
+		insn->bb = bb;
+		return true;
+	} END_FOR_EACH_PTR(curr);
+	return false;
+}
+
+static int code_sink_bb(struct basic_block *bb)
+{
+	struct instruction *insn;
+	int changed = 0;
+
+	FOR_EACH_PTR_REVERSE(bb->insns, insn) {
+		struct instruction *user;
+		pseudo_t target;
+
+		if (!insn->bb)
+			continue;
+		switch (insn->opcode) {
+		case OP_BINARY ... OP_BINCMP_END:
+		case OP_UNOP ... OP_UNOP_END:
+		case OP_SYMADDR:
+		case OP_SLICE:
+		case OP_SEL: case OP_FMADD:
+		case OP_LABEL: case OP_SETVAL: case OP_SETFVAL:
+			break;
+		case OP_CBR:
+		case OP_INLINED_CALL:
+		case OP_NOP:
+			continue;
+		default:
+			continue;
+		}
+
+		target = insn->target;
+		if (!one_use(target))
+			continue;
+		user = get_user(target);
+		if (!user || !user->bb || user->bb == bb)
+			continue;
+		if (!sink_insn(insn, user->bb))
+			continue;
+		DELETE_CURRENT_PTR(insn);
+		changed = 1;
+	} END_FOR_EACH_PTR_REVERSE(insn);
+	return changed;
+}
+
+int code_sink(struct entrypoint *ep)
+{
+	struct basic_block *bb;
+	int changed = 0;
+
+	FOR_EACH_PTR(ep->bbs, bb) {
+		struct instruction *last;
+
+		if (!bb->ep)
+			continue;
+		last = last_instruction(bb->insns);
+		switch (last->opcode) {
+		case OP_CBR:
+			changed |= code_sink_bb(bb);
+		}
+	} END_FOR_EACH_PTR(bb);
+	return changed;
+}
diff --git a/optimize.c b/optimize.c
index 3351e67b9d5e..b652b0e76d2a 100644
--- a/optimize.c
+++ b/optimize.c
@@ -105,6 +105,8 @@  repeat:
 		pack_basic_blocks(ep);
 		if (repeat_phase & REPEAT_CFG_CLEANUP)
 			cleanup_cfg(ep);
+		if (code_sink(ep))
+			repeat_phase |= REPEAT_CSE;
 	} while (repeat_phase);
 
 	vrfy_flow(ep);
diff --git a/optimize.h b/optimize.h
index 31e2cf081704..d9ac9cd48ea2 100644
--- a/optimize.h
+++ b/optimize.h
@@ -6,4 +6,7 @@  struct entrypoint;
 /* optimize.c */
 void optimize(struct entrypoint *ep);
 
+/* sink.c */
+int code_sink(struct entrypoint *ep);
+
 #endif