diff mbox

fix: kill unreachable BBs after killing a child

Message ID 20170508185710.22021-1-luc.vanoostenryck@gmail.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Luc Van Oostenryck May 8, 2017, 6:57 p.m. UTC
When simplifying a switch into a simple branch all the now
unused children of the current BB must be removed.
If one of these children become now orphaned, it is directly
killed (it will need to be killed soon or later since it is
unreachable).

However, if one of the killed children is the header of a loop
where some variables are updated this may cause problems.
Indeed, by killing the header (which contains the phisrc of
the entry value of the variable) the whole loop may become
unreachable but is not killed yet, OTOH simplification of
the associated OP_PHI may create a cycle which may then be
detected later by simplify_one_memop() which will issue a
"crazy programmer" warning while the programmer was innocent.

This situation can be seen in code like:
	int *p;
	switch (i - i) {	// will be optimized to 0
	case 0:			// will be the simple branch
		return 0;
	case 1:			// will be optimized away
		p = ptr;
		do {		// will be an unreachable loop
			*p++ = 123;
		} while (--i);
	}

Fix this by calling kill_unreachable_bbs() after having
simplified the switch into a branch. This will avoid to
create a cycle with because of the removed phisrc in the
header and as an added benefit will avoid to waste time
trying to simplify BBs that are unreachable.

In addition, it's now useless to call kill_bb() for each
removed switch's children as kill_unreachable_bbs() will
do that too.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 linearize.c                 |  3 +--
 validation/crazy02-not-so.c | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+), 2 deletions(-)
 create mode 100644 validation/crazy02-not-so.c

Comments

Christopher Li May 9, 2017, 10:38 a.m. UTC | #1
On Mon, May 8, 2017 at 11:57 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> Fix this by calling kill_unreachable_bbs() after having
> simplified the switch into a branch. This will avoid to
> create a cycle with because of the removed phisrc in the
> header and as an added benefit will avoid to waste time
> trying to simplify BBs that are unreachable.
>
> In addition, it's now useless to call kill_bb() for each
> removed switch's children as kill_unreachable_bbs() will
> do that too.

I thin the fix is correct. Have some very minor comment.

>
> Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
> ---
>  linearize.c                 |  3 +--
>  validation/crazy02-not-so.c | 22 ++++++++++++++++++++++
>  2 files changed, 23 insertions(+), 2 deletions(-)
>  create mode 100644 validation/crazy02-not-so.c
>
> diff --git a/linearize.c b/linearize.c
> index a9f36b823..ee9591897 100644
> --- a/linearize.c
> +++ b/linearize.c
> @@ -642,8 +642,6 @@ static void set_activeblock(struct entrypoint *ep, struct basic_block *bb)
>  static void remove_parent(struct basic_block *child, struct basic_block *parent)
>  {
>         remove_bb_from_list(&child->parents, parent, 1);
> -       if (!child->parents)
> -               kill_bb(child);

This makes every caller of remove_parent() need to clean up the
unreachable basic blocks. Currently there is only one caller  "insert_branch()"
which is fine. But if developer write a new code call this function, he might
forget to clean up the unreachable basic blocks. Kind of like a trap.


>  }
>
>  /* Change a "switch" into a branch */
> @@ -670,6 +668,7 @@ void insert_branch(struct basic_block *bb, struct instruction *jmp, struct basic
>                 remove_parent(child, bb);
>         } END_FOR_EACH_PTR(child);
>         PACK_PTR_LIST(&bb->children);
> +       kill_unreachable_bbs(bb->ep);

It is correct to do so. The kill unreachable is relative expensive.
Need to go through
all the basic block in a function. If there function has more than one
switch statement
get simplified, then each simplification will go through each basic block again.
Preferably  kill_unreachable() only run once at the finial stage.

I don't know how feasible to have remove_parent() mark the "ep" needs to clean.
Then at some later point kill off the dead basic block all together.

Currently simplify a switch statement should be relative rare. Have two switch
statement in same function get simplify at the same time should be
even rarer.
Might not worth the effort to optimize for that.

As it is, the patch is acceptable.

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luc Van Oostenryck May 9, 2017, 3:27 p.m. UTC | #2
On Tue, May 09, 2017 at 03:38:43AM -0700, Christopher Li wrote:
> On Mon, May 8, 2017 at 11:57 AM, Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> > Fix this by calling kill_unreachable_bbs() after having
> > simplified the switch into a branch. This will avoid to
> > create a cycle with because of the removed phisrc in the
> > header and as an added benefit will avoid to waste time
> > trying to simplify BBs that are unreachable.
> >
> > In addition, it's now useless to call kill_bb() for each
> > removed switch's children as kill_unreachable_bbs() will
> > do that too.
> 
> I thin the fix is correct. Have some very minor comment.
> 
> > diff --git a/linearize.c b/linearize.c
> > index a9f36b823..ee9591897 100644
> > --- a/linearize.c
> > +++ b/linearize.c
> > @@ -642,8 +642,6 @@ static void set_activeblock(struct entrypoint *ep, struct basic_block *bb)
> >  static void remove_parent(struct basic_block *child, struct basic_block *parent)
> >  {
> >         remove_bb_from_list(&child->parents, parent, 1);
> > -       if (!child->parents)
> > -               kill_bb(child);
> 
> This makes every caller of remove_parent() need to clean up the
> unreachable basic blocks. Currently there is only one caller  "insert_branch()"
> which is fine. But if developer write a new code call this function, he might
> forget to clean up the unreachable basic blocks. Kind of like a trap.

Yes, true. I didn't liked it very much myself and hesitated to
remove the helper and directly calling remove_bb_from_list()
from insert_branch().  I'll do this now.
 
but note that in 'normal' situation, it's doesn't really matter
if the BB is killed now or not because:
1) it's only the direct descendant that is killed, the indirect
   ones have to wait anyway.
2) kill_unreachable_bbs() is called after each cleanup loops

> >
> >  /* Change a "switch" into a branch */
> > @@ -670,6 +668,7 @@ void insert_branch(struct basic_block *bb, struct instruction *jmp, struct basic
> >                 remove_parent(child, bb);
> >         } END_FOR_EACH_PTR(child);
> >         PACK_PTR_LIST(&bb->children);
> > +       kill_unreachable_bbs(bb->ep);
> 
> It is correct to do so. The kill unreachable is relative expensive.
> Need to go through
> all the basic block in a function.

I don't really agree with this.
When doing the patch I also at first thought the same but then
I changed my mind because:
1) it *only* need to run through all the BB (in fact the cost is
   O(nbr of BB edges) while there is a lot of places where we 
   loop through all *instructions*
2) the fact that the unneeded BB are killed as soon as possible
   means that the normal CSE+simplification won't wastly run
   anymore on the indirect descendant blocks (I can redo the
   measurements but I think I saw a measurable speedup because
   of this).

> If there function has more than one
> switch statement
> get simplified, then each simplification will go through each basic block again.
> Preferably  kill_unreachable() only run once at the finial stage.
> 
> I don't know how feasible to have remove_parent() mark the "ep" needs to clean.
> Then at some later point kill off the dead basic block all together.

Don't forget that the only reason for this patch is a *correctness*
issue when the removed BB is followed by a loop (if it has no other
entry point (which is generaly the case) and which thus become
unreachable).

We can mark the ep (the easiest is the reuse repeat_phase and add
REPEAT_FLOW (which I'll will need to do anyway for something else,
I'm polishing the patch)) but it won't change anything as we *must
not* allow simplifications to be done once there is such a dead
cycle and the only way to detect such cycles is via something
like the marker algorithm used by kill_unreachable_bbs().

> Currently simplify a switch statement should be relative rare.

Like a lot of things, it depends.
For example, the kernel use a lot inline functions or macro with
code like:
	...
	switch (sizeof(x)) {
	case 1: ...
	case 2: ...
	case 4: ...
	case 8: ...
	default: issue a build error
	}
Also, the simplification concerned here is 'insert_branch()'
which, like I explained in the commit message, is used by the
switch simplification but is also used by other branch
simplifications.

> Have two switch
> statement in same function get simplify at the same time should be
> even rarer.
> Might not worth the effort to optimize for that.
> 
> As it is, the patch is acceptable.

I really also would prefer that we would not have to walk the BBs
but I really think we don't have good others choices.
 
If we would really really want to avoid to call kill_unreachable_bss()
we could in simplify_one_memop(), once we detect a possible loop in the
addresses calculation (which is the core of the problem here: avoid
to detect such a false loop), instead of issuing a message is to first
call kill_unreachable_bbs() and then redo the memop simplification if
some BBs were killed and the concerned instruction still exist).
But:
- it's more complicated than needed
- the advanatge would only be based on the principle that
  kill_unreachable_bbs() is costly (which is not really)
- calling kill_unreachable_bbs() early has the advantage
  to avoid doing memop and instruction simplification and CSE
  on code that is in fact unreacheble (and those are more costly 
  because they are done on every instructions)
  I don't have the number anymore, but when testing what
  I saw (and I was a bit surprised at first) was that with the
  patch, compiling the kernel with sparse and calling test-linearize
  on GCC's testsuite and such was in fact slightly faster (but
  admitingly, the difference was small and it could have been
  within the accuracy of the measurementi. I can redo the 
  measurement if needed).

Also, when talking about performance, if you do some profiling
you very quickly realize:
- CSE is quite costly (it's not new, cfr.
   http://marc.info/?l=linux-sparse&m=111616763219436&w=4
  and commit b5a8032aaeaf2121a642ead32653d4288fa2983d)
  I have looked at this more than once but I don't see what we
  can do but:
  - avoid to call CSE on dead code
  - call CSE less often, possibly to not trying to reach the
    fix point, of course this impact the quality of the code
- memory allocation eats a lot of memory cycles too (about
  16-20%, not especially in userland but also by things
  like kernel's clear_page())
  But, see how you can easily win a speed up of 5%:
  http://marc.info/?l=linux-sparse&m=149198627609282&w=4
  There is a lot that can be done here. I have a few things
  in preparation but it won't be for soon.
- __add_ptr_list() is also surprinsingly high but I think that it's
  simply because it's really called a lot.

-- Luc
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christopher Li May 9, 2017, 4:08 p.m. UTC | #3
On Tue, May 9, 2017 at 8:27 AM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> Like a lot of things, it depends.
> For example, the kernel use a lot inline functions or macro with
> code like:
>         ...
>         switch (sizeof(x)) {
>         case 1: ...
>         case 2: ...
>         case 4: ...
>         case 8: ...
>         default: issue a build error
>         }

Ha, that is good to know.

> Also, the simplification concerned here is 'insert_branch()'
> which, like I explained in the commit message, is used by the
> switch simplification but is also used by other branch
> simplifications.
>
>> Have two switch
>> statement in same function get simplify at the same time should be
>> even rarer.
>> Might not worth the effort to optimize for that.
>>
>> As it is, the patch is acceptable.
>
> I really also would prefer that we would not have to walk the BBs
> but I really think we don't have good others choices.

Yes, that is what I am afraid as well, it might not worth the complexity
try to optimist for it.

I have not problem walk the BBs, I just don't want to walk it more than
once if easy avoidable. Looks like there is no easy way to get it.

Let's leave it as it is.  You convinced me.  ACK.

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luc Van Oostenryck May 11, 2017, 10:26 p.m. UTC | #4
The goal of this series is to get rid of false "crazy programmer"
warnings. This warning is issued when a cycle is detected in
the address calculation of memory operations. Such cycle can
happen when simplifying phi-nodes dealing with uninitialized
variables.

However the very same diagnostic can also be emitted if we try
to simplify phi-nodes inside an unreachable loop (because localy
it's exactly the same situation: since the loop is unreachable
it's very possible that the induction variables have no more
an initial value). Of course, since this situation is only
the consequence of trying to simplify code that is not reachable
while the original may very well not contain any unitialized var,
this diagnostic should not be emitted in this situation.

The solution is to kill potential unreachable BBs before
trying to simplify them, each time a parent is removed
from a BB.

---

Changes since v1:
- split the patch in 3, the first 2 being small preparatory
  patches, the third being the meat.
- try to avoid to have to call kill_unreachable_bbs() in the
  rare situations where the BB having a parent removed still
  contains another parent and thus can't be unreachable.

Luc Van Oostenryck (3):
  introduce REPEAT_CFG_CLEANUP
  let kill_unreachable_bbs() clear REPEAT_CFG_CLEANUP
  fix: kill unreachable BBs after killing a child

 flow.c                      |  2 ++
 flow.h                      |  1 +
 linearize.c                 |  5 ++++-
 validation/crazy02-not-so.c | 22 ++++++++++++++++++++++
 4 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 validation/crazy02-not-so.c
diff mbox

Patch

diff --git a/linearize.c b/linearize.c
index a9f36b823..ee9591897 100644
--- a/linearize.c
+++ b/linearize.c
@@ -642,8 +642,6 @@  static void set_activeblock(struct entrypoint *ep, struct basic_block *bb)
 static void remove_parent(struct basic_block *child, struct basic_block *parent)
 {
 	remove_bb_from_list(&child->parents, parent, 1);
-	if (!child->parents)
-		kill_bb(child);
 }
 
 /* Change a "switch" into a branch */
@@ -670,6 +668,7 @@  void insert_branch(struct basic_block *bb, struct instruction *jmp, struct basic
 		remove_parent(child, bb);
 	} END_FOR_EACH_PTR(child);
 	PACK_PTR_LIST(&bb->children);
+	kill_unreachable_bbs(bb->ep);
 }
 	
 
diff --git a/validation/crazy02-not-so.c b/validation/crazy02-not-so.c
new file mode 100644
index 000000000..fe7133587
--- /dev/null
+++ b/validation/crazy02-not-so.c
@@ -0,0 +1,22 @@ 
+int foo(int *ptr, int i)
+{
+	int *p;
+
+	switch (i - i) {		// will be optimized to 0
+	case 0:
+		return 0;
+	case 1:				// will be optimized away
+		p = ptr;
+		do {			// will be an unreachable loop
+			*p++ = 123;
+		} while (--i);
+		break;
+	}
+
+	return 1;
+}
+
+/*
+ * check-name: crazy02-not-so.c
+ * check-command: sparse -Wno-decl $file
+ */