
[v2,07/19] arm64: insn: Add encoder for bitwise operations using litterals

Message ID 20171211144937.4537-8-marc.zyngier@arm.com (mailing list archive)
State New, archived

Commit Message

Marc Zyngier Dec. 11, 2017, 2:49 p.m. UTC
We lack a way to encode operations such as AND, ORR, EOR that take
an immediate value. Doing so is quite involved, and is all about
reverse engineering the decoding algorithm described in the
pseudocode function DecodeBitMasks().

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm64/include/asm/insn.h |   9 +++
 arch/arm64/kernel/insn.c      | 137 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 146 insertions(+)

Comments

James Morse Dec. 12, 2017, 6:32 p.m. UTC | #1
Hi Marc,

On 11/12/17 14:49, Marc Zyngier wrote:
> We lack a way to encode operations such as AND, ORR, EOR that take
> an immediate value. Doing so is quite involved, and is all about
> reverse engineering the decoding algorithm described in the
> pseudocode function DecodeBitMasks().


As this is over my head, I've been pushing random encodings through gas/objdump
and then tracing them through here.... can this encode 0xf80000000fffffff?

gas thinks this is legal:
|   0:   92458000        and     x0, x0, #0xf80000000fffffff

I make that N=1, S=0x20, R=0x05.
(I'm still working out what 'S' means)


> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 7e432662d454..326b17016485 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c

> +static u32 aarch64_encode_immediate(u64 imm,
> +				    enum aarch64_insn_variant variant,
> +				    u32 insn)
> +{
> +	unsigned int immr, imms, n, ones, ror, esz, tmp;
> +	u64 mask;

[...]

> +	/* N is only set if we're encoding a 64bit value */
> +	n = esz == 64;
> +
> +	/* Trim imm to the element size */
> +	mask = BIT(esz - 1) - 1;
> +	imm &= mask;

Won't this lose the top bit of a 64bit immediate?

(but then you put it back later, so something funny is going on)

This becomes 0x780000000fffffff,


> +
> +	/* That's how many ones we need to encode */
> +	ones = hweight64(imm);

meaning we're short a one here,


> +
> +	/*
> +	 * imms is set to (ones - 1), prefixed with a string of ones
> +	 * and a zero if they fit. Cap it to 6 bits.
> +	 */
> +	imms  = ones - 1;
> +	imms |= 0xf << ffs(esz);
> +	imms &= BIT(6) - 1;

so imms is 0x1f, not 0x20.
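
(Working that through with esz == 64:

   mask = BIT(63) - 1        = 0x7fffffffffffffff
   imm &= mask               -> 0x780000000fffffff
   ones = hweight64(imm)     = 32
   imms = ((32 - 1) | (0xf << ffs(64))) & (BIT(6) - 1) = 0x1f
)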


> +	/* Compute the rotation */
> +	if (range_of_ones(imm)) {
> +		/*
> +		 * Pattern: 0..01..10..0
> +		 *
> +		 * Compute how many rotate we need to align it right
> +		 */
> +		ror = ffs(imm) - 1;

(how come range_of_ones() uses __ffs64() on the same value?)


> +	} else {
> +		/*
> +		 * Pattern: 0..01..10..01..1
> +		 *
> +		 * Fill the unused top bits with ones, and check if
> +		 * the result is a valid immediate (all ones with a
> +		 * contiguous ranges of zeroes).
> +		 */

> +		imm |= ~mask;

but here we put the missing one back,


> +		if (!range_of_ones(~imm))
> +			return AARCH64_BREAK_FAULT;

meaning we pass this check and carry on (even though 0x780000000fffffff isn't a
legal value)
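
(working through the arithmetic:

   imm |= ~mask  ->  0x780000000fffffff | 0x8000000000000000 = 0xf80000000fffffff
   ~imm          =   0x07fffffff0000000

so ~imm is one contiguous run of ones and range_of_ones(~imm) is satisfied.)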


(this next bit I haven't worked out yet)
> +		/*
> +		 * Compute the rotation to get a continuous set of
> +		 * ones, with the first bit set at position 0
> +		 */
> +		ror = fls(~imm);
> +	}
> +
> +	/*
> +	 * immr is the number of bits we need to rotate back to the
> +	 * original set of ones. Note that this is relative to the
> +	 * element size...
> +	 */
> +	immr = (esz - ror) & (esz - 1);


If I've followed this through correctly, this results in:
|   0:   92457c00        and     x0, x0, #0xf800000007ffffff

... which wasn't the immediate I started with.


Unless I've gone wrong, I think the 'Trim imm to the element size' code needs to
move up into the esz-reducing loop so it doesn't happen for a 64bit immediate.
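
(For what it's worth, if the top bit survives the trim, the numbers line up with
what gas produced:

   ones = hweight64(0xf80000000fffffff) = 33
   imms = ((33 - 1) | (0xf << ffs(64))) & (BIT(6) - 1) = 0x20

i.e. the S=0x20 from the encoding above.)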



Thanks,

James
Peter Maydell Dec. 12, 2017, 6:56 p.m. UTC | #2
On 11 December 2017 at 14:49, Marc Zyngier <marc.zyngier@arm.com> wrote:
> We lack a way to encode operations such as AND, ORR, EOR that take
> an immediate value. Doing so is quite involved, and is all about
> reverse engineering the decoding algorithm described in the
> pseudocode function DecodeBitMasks().

Is it possible to borrow the existing tested implementation
which a compiler surely must have for this, rather than having
to reinvent this rather complicated wheel? Here's LLVM's version:

https://github.com/llvm-mirror/llvm/blob/93e6e5414ded14bcbb233baaaa5567132fee9a0c/lib/Target/AArch64/MCTargetDesc/AArch64AddressingModes.h#L209

(confirming that the LLVM license is GPLv2 compatible is
left as an exercise for the reader, but I'm pretty sure it is)

PS: typo in subject: 'literal'.

thanks
-- PMM
Peter Maydell Dec. 12, 2017, 11:40 p.m. UTC | #3
On 12 December 2017 at 18:32, James Morse <james.morse@arm.com> wrote:
> As this is over my head, I've been pushing random encodings through gas/objdump
> and then tracing them through here.... can this encode 0xf80000000fffffff?
>
> gas thinks this is legal:
> |   0:   92458000        and     x0, x0, #0xf80000000fffffff
>
> I make that N=1, S=0x20, R=0x05.
> (I'm still working out what 'S' means)

This comment from QEMU (describing the decode direction, i.e.
immn,imms,immr => immediate) might assist:

    /* The bit patterns we create here are 64 bit patterns which
     * are vectors of identical elements of size e = 2, 4, 8, 16, 32 or
     * 64 bits each. Each element contains the same value: a run
     * of between 1 and e-1 non-zero bits, rotated within the
     * element by between 0 and e-1 bits.
     *
     * The element size and run length are encoded into immn (1 bit)
     * and imms (6 bits) as follows:
     * 64 bit elements: immn = 1, imms = <length of run - 1>
     * 32 bit elements: immn = 0, imms = 0 : <length of run - 1>
     * 16 bit elements: immn = 0, imms = 10 : <length of run - 1>
     *  8 bit elements: immn = 0, imms = 110 : <length of run - 1>
     *  4 bit elements: immn = 0, imms = 1110 : <length of run - 1>
     *  2 bit elements: immn = 0, imms = 11110 : <length of run - 1>
     * Notice that immn = 0, imms = 11111x is the only combination
     * not covered by one of the above options; this is reserved.
     * Further, <length of run - 1> all-ones is a reserved pattern.
     *
     * In all cases the rotation is by immr % e (and immr is 6 bits).
     */

so N=1 S=0x20 means run length 33, element size 64 (and
indeed your immediate has a run of 33 set bits).

(The Arm ARM pseudocode is confusing here because it merges
the handling of logical-immediates and bitfield instructions
together, which is nice if you're a hardware engineer. For
software you're much better off keeping the two separate.)
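
For completeness, the decode direction in kernel-style C comes out roughly as
below (a sketch following the comment above; untested, and not QEMU's or the
kernel's actual code):

#include <linux/bitops.h>	/* fls(), GENMASK_ULL() */
#include <linux/types.h>

static bool aarch64_logic_imm_decode(unsigned int n, unsigned int imms,
				     unsigned int immr, u64 *out)
{
	unsigned int val, len, esz, s, r, i;
	u64 elem;

	/* The element size e comes from the highest set bit of n:NOT(imms) */
	val = (n << 6) | (~imms & 0x3f);
	if (val < 2)
		return false;	/* n = 0, imms = 0b11111x: reserved */
	len = fls(val) - 1;
	esz = 1 << len;

	s = imms & (esz - 1);
	r = immr & (esz - 1);
	if (s == esz - 1)
		return false;	/* a run of e ones is reserved */

	/* A run of (s + 1) ones, rotated right by r within the element */
	elem = GENMASK_ULL(s, 0);
	if (r)
		elem = ((elem >> r) | (elem << (esz - r))) &
		       GENMASK_ULL(esz - 1, 0);

	/* Replicate() the element to fill the 64-bit pattern */
	*out = 0;
	for (i = 0; i < 64; i += esz)
		*out |= elem << i;

	return true;
}

With your example (n=1, imms=0x20, immr=0x05) this gives a run of 33 ones
rotated right by 5 within a 64-bit element, i.e. 0xf80000000fffffff.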

thanks
-- PMM

Patch

diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
index 21fffdd290a3..815b35bc53ed 100644
--- a/arch/arm64/include/asm/insn.h
+++ b/arch/arm64/include/asm/insn.h
@@ -315,6 +315,10 @@  __AARCH64_INSN_FUNCS(eor,	0x7F200000, 0x4A000000)
 __AARCH64_INSN_FUNCS(eon,	0x7F200000, 0x4A200000)
 __AARCH64_INSN_FUNCS(ands,	0x7F200000, 0x6A000000)
 __AARCH64_INSN_FUNCS(bics,	0x7F200000, 0x6A200000)
+__AARCH64_INSN_FUNCS(and_imm,	0x7F800000, 0x12000000)
+__AARCH64_INSN_FUNCS(orr_imm,	0x7F800000, 0x32000000)
+__AARCH64_INSN_FUNCS(eor_imm,	0x7F800000, 0x52000000)
+__AARCH64_INSN_FUNCS(ands_imm,	0x7F800000, 0x72000000)
 __AARCH64_INSN_FUNCS(b,		0xFC000000, 0x14000000)
 __AARCH64_INSN_FUNCS(bl,	0xFC000000, 0x94000000)
 __AARCH64_INSN_FUNCS(cbz,	0x7F000000, 0x34000000)
@@ -424,6 +428,11 @@  u32 aarch64_insn_gen_logical_shifted_reg(enum aarch64_insn_register dst,
 					 int shift,
 					 enum aarch64_insn_variant variant,
 					 enum aarch64_insn_logic_type type);
+u32 aarch64_insn_gen_logical_immediate(enum aarch64_insn_logic_type type,
+				       enum aarch64_insn_variant variant,
+				       enum aarch64_insn_register Rn,
+				       enum aarch64_insn_register Rd,
+				       u64 imm);
 u32 aarch64_insn_gen_prefetch(enum aarch64_insn_register base,
 			      enum aarch64_insn_prfm_type type,
 			      enum aarch64_insn_prfm_target target,
diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 7e432662d454..326b17016485 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1485,3 +1485,140 @@  pstate_check_t * const aarch32_opcode_cond_checks[16] = {
 	__check_hi, __check_ls, __check_ge, __check_lt,
 	__check_gt, __check_le, __check_al, __check_al
 };
+
+static bool range_of_ones(u64 val)
+{
+	/* Doesn't handle full ones or full zeroes */
+	int x = __ffs64(val) - 1;
+	u64 sval = val >> x;
+
+	/* One of Sean Eron Anderson's bithack tricks */
+	return ((sval + 1) & (sval)) == 0;
+}
+
+static u32 aarch64_encode_immediate(u64 imm,
+				    enum aarch64_insn_variant variant,
+				    u32 insn)
+{
+	unsigned int immr, imms, n, ones, ror, esz, tmp;
+	u64 mask;
+
+	/* Can't encode full zeroes or full ones */
+	if (!imm || !~imm)
+		return AARCH64_BREAK_FAULT;
+
+	switch (variant) {
+	case AARCH64_INSN_VARIANT_32BIT:
+		if (upper_32_bits(imm))
+			return AARCH64_BREAK_FAULT;
+		esz = 32;
+		break;
+	case AARCH64_INSN_VARIANT_64BIT:
+		insn |= AARCH64_INSN_SF_BIT;
+		esz = 64;
+		break;
+	default:
+		pr_err("%s: unknown variant encoding %d\n", __func__, variant);
+		return AARCH64_BREAK_FAULT;
+	}
+
+	/*
+	 * Inverse of Replicate(). Try to spot a repeating pattern
+	 * with a pow2 stride.
+	 */
+	for (tmp = esz; tmp > 2; tmp /= 2) {
+		u64 emask = BIT(tmp / 2) - 1;
+
+		if ((imm & emask) != ((imm >> (tmp / 2)) & emask))
+			break;
+
+		esz = tmp;
+	}
+
+	/* N is only set if we're encoding a 64bit value */
+	n = esz == 64;
+
+	/* Trim imm to the element size */
+	mask = BIT(esz - 1) - 1;
+	imm &= mask;
+
+	/* That's how many ones we need to encode */
+	ones = hweight64(imm);
+
+	/*
+	 * imms is set to (ones - 1), prefixed with a string of ones
+	 * and a zero if they fit. Cap it to 6 bits.
+	 */
+	imms  = ones - 1;
+	imms |= 0xf << ffs(esz);
+	imms &= BIT(6) - 1;
+
+	/* Compute the rotation */
+	if (range_of_ones(imm)) {
+		/*
+		 * Pattern: 0..01..10..0
+		 *
+		 * Compute how many rotate we need to align it right
+		 */
+		ror = ffs(imm) - 1;
+	} else {
+		/*
+		 * Pattern: 0..01..10..01..1
+		 *
+		 * Fill the unused top bits with ones, and check if
+		 * the result is a valid immediate (all ones with a
+		 * contiguous ranges of zeroes).
+		 */
+		imm |= ~mask;
+		if (!range_of_ones(~imm))
+			return AARCH64_BREAK_FAULT;
+
+		/*
+		 * Compute the rotation to get a continuous set of
+		 * ones, with the first bit set at position 0
+		 */
+		ror = fls(~imm);
+	}
+
+	/*
+	 * immr is the number of bits we need to rotate back to the
+	 * original set of ones. Note that this is relative to the
+	 * element size...
+	 */
+	immr = (esz - ror) & (esz - 1);
+
+	insn = aarch64_insn_encode_immediate(AARCH64_INSN_IMM_N, insn, n);
+	insn = aarch64_insn_encode_immediate(AARCH64_INSN_IMM_R, insn, immr);
+	return aarch64_insn_encode_immediate(AARCH64_INSN_IMM_S, insn, imms);
+}
+
+u32 aarch64_insn_gen_logical_immediate(enum aarch64_insn_logic_type type,
+				       enum aarch64_insn_variant variant,
+				       enum aarch64_insn_register Rn,
+				       enum aarch64_insn_register Rd,
+				       u64 imm)
+{
+	u32 insn;
+
+	switch (type) {
+	case AARCH64_INSN_LOGIC_AND:
+		insn = aarch64_insn_get_and_imm_value();
+		break;
+	case AARCH64_INSN_LOGIC_ORR:
+		insn = aarch64_insn_get_orr_imm_value();
+		break;
+	case AARCH64_INSN_LOGIC_EOR:
+		insn = aarch64_insn_get_eor_imm_value();
+		break;
+	case AARCH64_INSN_LOGIC_AND_SETFLAGS:
+		insn = aarch64_insn_get_ands_imm_value();
+		break;
+	default:
+		pr_err("%s: unknown logical encoding %d\n", __func__, type);
+		return AARCH64_BREAK_FAULT;
+	}
+
+	insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RD, insn, Rd);
+	insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn, Rn);
+	return aarch64_encode_immediate(imm, variant, insn);
+}