diff mbox

KVM: arm/arm64: vgic: Prevent VGIC_ADDR_TO_INTID from emiting divisions

Message ID 20161029111901.16668-1-marc.zyngier@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Marc Zyngier Oct. 29, 2016, 11:19 a.m. UTC
Using non-constant number of bits for VGIC_ADDR_TO_INTID() leads
to gcc 6.1 emiting calls to __aeabi_uldivmod, which the kernel
does not implement.

As we really don't want to implement complex division in the kernel,
the only other option is to prove to the compiler that there is only
a few values that are possible for the number of bits per IRQ, and
that they are all power of 2.

We turn the VGIC_ADDR_TO_INTID macro into a switch that looks for
the supported set of values (1, 2, 8, 64), and perform the computation
accordingly. When "bits" is a constant, the compiler optimizes
away the other cases. If not, we end-up with a small number of cases
that GCC optimises reasonably well. Out of range values are detected
both at build time (constants) and at run time (variables).

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
This should be applied *before* Andre's patch fixing out of bound SPIs.

 virt/kvm/arm/vgic/vgic-mmio.h | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

Comments

Christoffer Dall Nov. 1, 2016, 3:28 p.m. UTC | #1
On Sat, Oct 29, 2016 at 12:19:01PM +0100, Marc Zyngier wrote:
> Using non-constant number of bits for VGIC_ADDR_TO_INTID() leads
> to gcc 6.1 emiting calls to __aeabi_uldivmod, which the kernel
> does not implement.
> 
> As we really don't want to implement complex division in the kernel,
> the only other option is to prove to the compiler that there is only
> a few values that are possible for the number of bits per IRQ, and
> that they are all power of 2.
> 
> We turn the VGIC_ADDR_TO_INTID macro into a switch that looks for
> the supported set of values (1, 2, 8, 64), and perform the computation
> accordingly. When "bits" is a constant, the compiler optimizes
> away the other cases. If not, we end-up with a small number of cases
> that GCC optimises reasonably well. Out of range values are detected
> both at build time (constants) and at run time (variables).
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
> This should be applied *before* Andre's patch fixing out of bound SPIs.
> 
>  virt/kvm/arm/vgic/vgic-mmio.h | 33 ++++++++++++++++++++++++++++++++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
> index 4c34d39..a457282 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.h
> +++ b/virt/kvm/arm/vgic/vgic-mmio.h
> @@ -57,10 +57,41 @@ extern struct kvm_io_device_ops kvm_io_gic_ops;
>   * multiplication with the inverted fraction, and scale up both the
>   * numerator and denominator with 8 to support at most 64 bits per IRQ:
>   */
> -#define VGIC_ADDR_TO_INTID(addr, bits)  (((addr) & VGIC_ADDR_IRQ_MASK(bits)) * \
> +#define __VGIC_ADDR_INTID(addr, bits)  (((addr) & VGIC_ADDR_IRQ_MASK(bits)) * \
>  					64 / (bits) / 8)
>  
>  /*
> + * Perform the same computation, but also handle non-constant number
> + * of bits. We only care about the few cases that are required by
> + * GICv2/v3.
> + */
> +#define VGIC_ADDR_TO_INTID(addr, bits)				\
> +	({							\
> +		u32 __v;					\
> +		switch((bits)) {				\
> +		case 1:						\
> +			__v = __VGIC_ADDR_INTID((addr), 1);	\
> +			break;					\
> +		case 2:						\
> +			__v = __VGIC_ADDR_INTID((addr), 2);	\
> +			break;					\
> +		case 8:						\
> +			__v = __VGIC_ADDR_INTID((addr), 8);	\
> +			break;					\
> +		case 64:					\
> +			__v = __VGIC_ADDR_INTID((addr), 64);	\
> +			break;					\
> +		default:					\
> +			if (__builtin_constant_p((bits)))	\
> +				BUILD_BUG();			\
> +			else					\
> +				BUG();				\
> +		}						\
> +								\
> +		__v;						\
> +	})
> +
> +/*
>   * Some VGIC registers store per-IRQ information, with a different number
>   * of bits per IRQ. For those registers this macro is used.
>   * The _WITH_LENGTH version instantiates registers with a fixed length
> -- 
> 2.9.3
> 

Looks functionally correct, just wondering if it's cleaner to turn the
whole thing into a static inline, or if it can be rewritten to use
shifts with any benefit.

In any case, if you like this version:

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Andre Przywara Nov. 1, 2016, 4:50 p.m. UTC | #2
Hej,

On 01/11/16 15:28, Christoffer Dall wrote:
> On Sat, Oct 29, 2016 at 12:19:01PM +0100, Marc Zyngier wrote:
>> Using non-constant number of bits for VGIC_ADDR_TO_INTID() leads
>> to gcc 6.1 emiting calls to __aeabi_uldivmod, which the kernel
>> does not implement.
>>
>> As we really don't want to implement complex division in the kernel,
>> the only other option is to prove to the compiler that there is only
>> a few values that are possible for the number of bits per IRQ, and
>> that they are all power of 2.
>>
>> We turn the VGIC_ADDR_TO_INTID macro into a switch that looks for
>> the supported set of values (1, 2, 8, 64), and perform the computation
>> accordingly. When "bits" is a constant, the compiler optimizes
>> away the other cases. If not, we end-up with a small number of cases
>> that GCC optimises reasonably well. Out of range values are detected
>> both at build time (constants) and at run time (variables).
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
>> ---
>> This should be applied *before* Andre's patch fixing out of bound SPIs.
>>
>>  virt/kvm/arm/vgic/vgic-mmio.h | 33 ++++++++++++++++++++++++++++++++-
>>  1 file changed, 32 insertions(+), 1 deletion(-)
>>
>> diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
>> index 4c34d39..a457282 100644
>> --- a/virt/kvm/arm/vgic/vgic-mmio.h
>> +++ b/virt/kvm/arm/vgic/vgic-mmio.h
>> @@ -57,10 +57,41 @@ extern struct kvm_io_device_ops kvm_io_gic_ops;
>>   * multiplication with the inverted fraction, and scale up both the
>>   * numerator and denominator with 8 to support at most 64 bits per IRQ:
>>   */
>> -#define VGIC_ADDR_TO_INTID(addr, bits)  (((addr) & VGIC_ADDR_IRQ_MASK(bits)) * \
>> +#define __VGIC_ADDR_INTID(addr, bits)  (((addr) & VGIC_ADDR_IRQ_MASK(bits)) * \
>>  					64 / (bits) / 8)

I remember we discussed this in length some months ago, but I was
wondering if this isn't simply:
	((addr & mask) * 8) / bits
and thus can be written as:
	((addr & mask) * 8) >> ilog2(bits)
We require <bits> to be a power of two anyway for the MASK macro.

ilog2(constant) is nicely optimized at compile time, but even at runtime
on both ARM variants it boils down to "31 - clz(bits)", which are two or
three instructions AFAICS.

Does that make sense or am I missing something here?

I changed this in my patch and adjusted the comment, quick testing seems
to be fine on Midway and Juno.

Will send it out in a minute, if no-one objects.

Cheers,
Andre.

>>  
>>  /*
>> + * Perform the same computation, but also handle non-constant number
>> + * of bits. We only care about the few cases that are required by
>> + * GICv2/v3.
>> + */
>> +#define VGIC_ADDR_TO_INTID(addr, bits)				\
>> +	({							\
>> +		u32 __v;					\
>> +		switch((bits)) {				\
>> +		case 1:						\
>> +			__v = __VGIC_ADDR_INTID((addr), 1);	\
>> +			break;					\
>> +		case 2:						\
>> +			__v = __VGIC_ADDR_INTID((addr), 2);	\
>> +			break;					\
>> +		case 8:						\
>> +			__v = __VGIC_ADDR_INTID((addr), 8);	\
>> +			break;					\
>> +		case 64:					\
>> +			__v = __VGIC_ADDR_INTID((addr), 64);	\
>> +			break;					\
>> +		default:					\
>> +			if (__builtin_constant_p((bits)))	\
>> +				BUILD_BUG();			\
>> +			else					\
>> +				BUG();				\
>> +		}						\
>> +								\
>> +		__v;						\
>> +	})
>> +
>> +/*
>>   * Some VGIC registers store per-IRQ information, with a different number
>>   * of bits per IRQ. For those registers this macro is used.
>>   * The _WITH_LENGTH version instantiates registers with a fixed length
>> -- 
>> 2.9.3
>>
> 
> Looks functionally correct, just wondering if it's cleaner to turn the
> whole thing into a static inline, or if it can be rewritten to use
> shifts with any benefit.
> 
> In any case, if you like this version:
> 
> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoffer Dall Nov. 1, 2016, 7:31 p.m. UTC | #3
On Tue, Nov 01, 2016 at 04:50:26PM +0000, Andre Przywara wrote:
> Hej,
> 
> On 01/11/16 15:28, Christoffer Dall wrote:
> > On Sat, Oct 29, 2016 at 12:19:01PM +0100, Marc Zyngier wrote:
> >> Using non-constant number of bits for VGIC_ADDR_TO_INTID() leads
> >> to gcc 6.1 emiting calls to __aeabi_uldivmod, which the kernel
> >> does not implement.
> >>
> >> As we really don't want to implement complex division in the kernel,
> >> the only other option is to prove to the compiler that there is only
> >> a few values that are possible for the number of bits per IRQ, and
> >> that they are all power of 2.
> >>
> >> We turn the VGIC_ADDR_TO_INTID macro into a switch that looks for
> >> the supported set of values (1, 2, 8, 64), and perform the computation
> >> accordingly. When "bits" is a constant, the compiler optimizes
> >> away the other cases. If not, we end-up with a small number of cases
> >> that GCC optimises reasonably well. Out of range values are detected
> >> both at build time (constants) and at run time (variables).
> >>
> >> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> >> ---
> >> This should be applied *before* Andre's patch fixing out of bound SPIs.
> >>
> >>  virt/kvm/arm/vgic/vgic-mmio.h | 33 ++++++++++++++++++++++++++++++++-
> >>  1 file changed, 32 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
> >> index 4c34d39..a457282 100644
> >> --- a/virt/kvm/arm/vgic/vgic-mmio.h
> >> +++ b/virt/kvm/arm/vgic/vgic-mmio.h
> >> @@ -57,10 +57,41 @@ extern struct kvm_io_device_ops kvm_io_gic_ops;
> >>   * multiplication with the inverted fraction, and scale up both the
> >>   * numerator and denominator with 8 to support at most 64 bits per IRQ:
> >>   */
> >> -#define VGIC_ADDR_TO_INTID(addr, bits)  (((addr) & VGIC_ADDR_IRQ_MASK(bits)) * \
> >> +#define __VGIC_ADDR_INTID(addr, bits)  (((addr) & VGIC_ADDR_IRQ_MASK(bits)) * \
> >>  					64 / (bits) / 8)
> 
> I remember we discussed this in length some months ago, but I was
> wondering if this isn't simply:
> 	((addr & mask) * 8) / bits

that's just dividing 8 into the 64, so that should be fine, yes.

> and thus can be written as:
> 	((addr & mask) * 8) >> ilog2(bits)

right, I follow that.

> We require <bits> to be a power of two anyway for the MASK macro.
> 
> ilog2(constant) is nicely optimized at compile time, but even at runtime
> on both ARM variants it boils down to "31 - clz(bits)", which are two or
> three instructions AFAICS.

cool with the ilog2 macro.

> 
> Does that make sense or am I missing something here?

makes sense I think.  Good luck writing a comment so that I can
understand this calculation later ;)

> 
> I changed this in my patch and adjusted the comment, quick testing seems
> to be fine on Midway and Juno.
> 
> Will send it out in a minute, if no-one objects.
> 
I don't object.

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/virt/kvm/arm/vgic/vgic-mmio.h b/virt/kvm/arm/vgic/vgic-mmio.h
index 4c34d39..a457282 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.h
+++ b/virt/kvm/arm/vgic/vgic-mmio.h
@@ -57,10 +57,41 @@  extern struct kvm_io_device_ops kvm_io_gic_ops;
  * multiplication with the inverted fraction, and scale up both the
  * numerator and denominator with 8 to support at most 64 bits per IRQ:
  */
-#define VGIC_ADDR_TO_INTID(addr, bits)  (((addr) & VGIC_ADDR_IRQ_MASK(bits)) * \
+#define __VGIC_ADDR_INTID(addr, bits)  (((addr) & VGIC_ADDR_IRQ_MASK(bits)) * \
 					64 / (bits) / 8)
 
 /*
+ * Perform the same computation, but also handle non-constant number
+ * of bits. We only care about the few cases that are required by
+ * GICv2/v3.
+ */
+#define VGIC_ADDR_TO_INTID(addr, bits)				\
+	({							\
+		u32 __v;					\
+		switch((bits)) {				\
+		case 1:						\
+			__v = __VGIC_ADDR_INTID((addr), 1);	\
+			break;					\
+		case 2:						\
+			__v = __VGIC_ADDR_INTID((addr), 2);	\
+			break;					\
+		case 8:						\
+			__v = __VGIC_ADDR_INTID((addr), 8);	\
+			break;					\
+		case 64:					\
+			__v = __VGIC_ADDR_INTID((addr), 64);	\
+			break;					\
+		default:					\
+			if (__builtin_constant_p((bits)))	\
+				BUILD_BUG();			\
+			else					\
+				BUG();				\
+		}						\
+								\
+		__v;						\
+	})
+
+/*
  * Some VGIC registers store per-IRQ information, with a different number
  * of bits per IRQ. For those registers this macro is used.
  * The _WITH_LENGTH version instantiates registers with a fixed length