diff mbox

[v2,6/8] transform (A & M) >> S to (A >> S) & (M >> S)

Message ID 20170807191205.86590-7-luc.vanoostenryck@gmail.com (mailing list archive)
State Superseded, archived
Headers show

Commit Message

Luc Van Oostenryck Aug. 7, 2017, 7:12 p.m. UTC
This is especially usefull when simplifying code
accessing bitfields.

Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---
 simplify.c                 | 29 ++++++++++++++++++++++++++++-
 validation/bitfield-size.c |  4 +---
 2 files changed, 29 insertions(+), 4 deletions(-)

Comments

Christopher Li Aug. 8, 2017, 12:22 a.m. UTC | #1
On Mon, Aug 7, 2017 at 3:12 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> This is especially usefull when simplifying code
> accessing bitfields.
>
>
> +static int simplify_lsr(struct instruction *insn, pseudo_t pseudo, long long value)
> +{
> +       struct instruction *def;
> +       unsigned long long mask;
> +
> +       if (!value)
> +               return replace_with_pseudo(insn, pseudo);
> +       switch (def_opcode(insn->src1)) {
> +       case OP_AND:
> +               // replace (A & M) >> S
> +               // by      (A >> S) & (M >> S)

Why (A>>S) & (M >>S) is easier to simplify?

The original has two instruction but the result has three.

Is that that because A>>S is more likely to find match in
the CSE?

What if there is no match found, does that mean it is
not useful to do this transformation?

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luc Van Oostenryck Aug. 8, 2017, 12:29 a.m. UTC | #2
On Tue, Aug 8, 2017 at 2:22 AM, Christopher Li <sparse@chrisli.org> wrote:

> Why (A>>S) & (M >>S) is easier to simplify?
>
> The original has two instruction but the result has three.

Only 2 because M & S are constants and so (M >> S) is already evaluated.

As described succinctly in the patch description, even with 2 instructions
the real motivation is only because it's one of the three patterns that arise
when storing a bitfield and reloading it later.

But isolating a AND (smaller) mask is always a good thing because there
is a lot of other opportunities for more simplifications that can be
done on them.

-- Luc
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds Aug. 8, 2017, 1 a.m. UTC | #3
On Mon, Aug 7, 2017 at 5:22 PM, Christopher Li <sparse@chrisli.org> wrote:
>
> Why (A>>S) & (M >>S) is easier to simplify?
>
> The original has two instruction but the result has three.

No, M and S are constants, so the end result also has just two ("lsr"
and "and").

And the simplified form has a smaller constant.

So I think that optimization makes a lot of sense.

It *might* result in further simplifications just because the final
'and' might be simpler, and even the "A >> S" might now be something
you can simplify (for example "A >> 16" could be turned into a 16-bit
load if the source of A is a 32-bit load).

But even in the absence of such further simplification, the smaller
constant thing likely makes it worth it. Almost all architectures have
an easier time with smaller constants, and doing

   (a >> 24) & 255;

is often noticeably more efficient than

   (a & 0xff000000) >> 24;

just because of the constant issue.

                Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Luc Van Oostenryck Aug. 8, 2017, 1:38 a.m. UTC | #4
On Tue, Aug 8, 2017 at 3:00 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> It *might* result in further simplifications just because the final
> 'and' might be simpler, and even the "A >> S" might now be something
> you can simplify (for example "A >> 16" could be turned into a 16-bit
> load if the source of A is a 32-bit load).
>
> But even in the absence of such further simplification, the smaller
> constant thing likely makes it worth it. Almost all architectures have
> an easier time with smaller constants, and doing
>
>    (a >> 24) & 255;
>
> is often noticeably more efficient than
>
>    (a & 0xff000000) >> 24;
>
> just because of the constant issue.

In truth, my real motivation is only apparent when looking at this patch
and the following. The original complete pattern was:
   ((X << S) & (M' << S)) >> S
with A = (X << S) and M = (M' << S)
which now give:
   ((X << S) >> S) & M'
and with the next patch will give
   (X & M'') & M'
which will simplify to
   X & M'''

But yes, these optimizations are worthwhile by themselves too
because they create small(er) masks.

-- Luc

(A & M) >> S to

(A >> S) & (M >> S)
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christopher Li Aug. 8, 2017, 1:48 a.m. UTC | #5
On Mon, Aug 7, 2017 at 8:29 PM, Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
> On Tue, Aug 8, 2017 at 2:22 AM, Christopher Li <sparse@chrisli.org> wrote:
>
>> Why (A>>S) & (M >>S) is easier to simplify?
>>
>> The original has two instruction but the result has three.
>
> Only 2 because M & S are constants and so (M >> S) is already evaluated.

Ah, thanks for point it out the obvious. Your M means Mask.
I did not catch that.

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christopher Li Aug. 8, 2017, 1:50 a.m. UTC | #6
On Mon, Aug 7, 2017 at 9:00 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> But even in the absence of such further simplification, the smaller
> constant thing likely makes it worth it. Almost all architectures have
> an easier time with smaller constants, and doing
>
>    (a >> 24) & 255;
>
> is often noticeably more efficient than
>
>    (a & 0xff000000) >> 24;
>
> just because of the constant issue.

That totally make sense.

Thanks

Chris
--
To unsubscribe from this list: send the line "unsubscribe linux-sparse" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/simplify.c b/simplify.c
index 9893613da..f1a898700 100644
--- a/simplify.c
+++ b/simplify.c
@@ -412,6 +412,32 @@  static int simplify_asr(struct instruction *insn, pseudo_t pseudo, long long val
 	return 0;
 }
 
+static int simplify_lsr(struct instruction *insn, pseudo_t pseudo, long long value)
+{
+	struct instruction *def;
+	unsigned long long mask;
+
+	if (!value)
+		return replace_with_pseudo(insn, pseudo);
+	switch (def_opcode(insn->src1)) {
+	case OP_AND:
+		// replace (A & M) >> S
+		// by      (A >> S) & (M >> S)
+		def = insn->src1->def;
+		if (!constant(def->src2))
+			break;
+		if (nbr_pseudo_users(insn->src1) > 1)
+			break;
+		mask = def->src2->value;
+		def->opcode = OP_LSR;
+		def->src2 = value_pseudo(value);
+		insn->opcode = OP_AND;
+		insn->src2 = value_pseudo(mask >> value);
+		return REPEAT_CSE;
+	}
+	return 0;
+}
+
 static int simplify_mul_div(struct instruction *insn, long long value)
 {
 	unsigned long long sbit = 1ULL << (insn->size - 1);
@@ -562,13 +588,14 @@  static int simplify_constant_rightside(struct instruction *insn)
 	case OP_ADD:
 	case OP_OR: case OP_XOR:
 	case OP_SHL:
-	case OP_LSR:
 	case_neutral_zero:
 		if (!value)
 			return replace_with_pseudo(insn, insn->src1);
 		return 0;
 	case OP_ASR:
 		return simplify_asr(insn, insn->src1, value);
+	case OP_LSR:
+		return simplify_lsr(insn, insn->src1, value);
 
 	case OP_MODU: case OP_MODS:
 		if (value == 1)
diff --git a/validation/bitfield-size.c b/validation/bitfield-size.c
index c8c94bb15..34400479a 100644
--- a/validation/bitfield-size.c
+++ b/validation/bitfield-size.c
@@ -36,8 +36,6 @@  unsigned int get_pbfi_b(struct bfi *bf) { return bf->b; }
  * check-output-ignore
  *
  * check-output-excludes: cast\\.4
- * check-output-pattern-6-times: cast\\.
  * check-output-pattern-6-times: lsr\\..*\\$6
- * check-output-pattern-6-times: and\\..*\\$15
- * check-output-pattern-6-times: and\\..*\\$960
+ * check-output-pattern-12-times: and\\..*\\$15
  */