diff mbox series

[1/2] xen: make include/xen/unaligned.h usable on all architectures

Message ID 20231205100756.18920-2-jgross@suse.com (mailing list archive)
State Superseded
Headers show
Series xen: have a more generic unaligned.h header | expand

Commit Message

Jürgen Groß Dec. 5, 2023, 10:07 a.m. UTC
Instead of defining get_unaligned() and put_unaligned() in a way that
is only supporting architectures allowing unaligned accesses, use the
same approach as the Linux kernel and let the compiler do the
decision how to generate the code for probably unaligned data accesses.

Update include/xen/unaligned.h from include/asm-generic/unaligned.h of
the Linux kernel.

The generated code has been checked to be the same on x86.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 803f4e1eab7a
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/include/xen/unaligned.h | 121 +++++++++++++++++++++++++-----------
 1 file changed, 86 insertions(+), 35 deletions(-)

Comments

Julien Grall Dec. 5, 2023, 11:53 a.m. UTC | #1
Hi Juergen,

On 05/12/2023 10:07, Juergen Gross wrote:
> Instead of defining get_unaligned() and put_unaligned() in a way that
> is only supporting architectures allowing unaligned accesses, use the
> same approach as the Linux kernel and let the compiler do the
> decision how to generate the code for probably unaligned data accesses.
> 
> Update include/xen/unaligned.h from include/asm-generic/unaligned.h of
> the Linux kernel.
> 
> The generated code has been checked to be the same on x86.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 803f4e1eab7a
> Signed-off-by: Juergen Gross <jgross@suse.com>

Can you outline your end goal? At least on arm32, I believe this will 
result to abort because event if the architecture support unaligned 
access, we are preventing them on Arm32.

Cheers,
Jürgen Groß Dec. 5, 2023, 12:39 p.m. UTC | #2
On 05.12.23 12:53, Julien Grall wrote:
> Hi Juergen,
> 
> On 05/12/2023 10:07, Juergen Gross wrote:
>> Instead of defining get_unaligned() and put_unaligned() in a way that
>> is only supporting architectures allowing unaligned accesses, use the
>> same approach as the Linux kernel and let the compiler do the
>> decision how to generate the code for probably unaligned data accesses.
>>
>> Update include/xen/unaligned.h from include/asm-generic/unaligned.h of
>> the Linux kernel.
>>
>> The generated code has been checked to be the same on x86.
>>
>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>> 803f4e1eab7a
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> Can you outline your end goal? At least on arm32, I believe this will result to 
> abort because event if the architecture support unaligned access, we are 
> preventing them on Arm32.

I need something like that in Xen tools for supporting packed data accesses
on the 9pfs ring page, so I looked into the hypervisor for related support.

I guess for arm32 using -mno-unaligned-access when building should avoid any
unaligned accesses?


Juergen
Julien Grall Dec. 5, 2023, 1:31 p.m. UTC | #3
Hi Juergen,

On 05/12/2023 12:39, Juergen Gross wrote:
> On 05.12.23 12:53, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 05/12/2023 10:07, Juergen Gross wrote:
>>> Instead of defining get_unaligned() and put_unaligned() in a way that
>>> is only supporting architectures allowing unaligned accesses, use the
>>> same approach as the Linux kernel and let the compiler do the
>>> decision how to generate the code for probably unaligned data accesses.
>>>
>>> Update include/xen/unaligned.h from include/asm-generic/unaligned.h of
>>> the Linux kernel.
>>>
>>> The generated code has been checked to be the same on x86.
>>>
>>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>>> Origin: 
>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>>> 803f4e1eab7a
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
>> Can you outline your end goal? At least on arm32, I believe this will 
>> result to abort because event if the architecture support unaligned 
>> access, we are preventing them on Arm32.
> 
> I need something like that in Xen tools for supporting packed data accesses
> on the 9pfs ring page, so I looked into the hypervisor for related support.

Did we really introduce an ABI requiring unaligned access??? Or is this 
something you are coming up with?

Anyway, IIRC Linux allows unaligned access. So the problem I am 
describing is only for the hypervisor. Although, I would like to point 
out that unaligned access has no atomicity guarantee. I assume this is 
not going to be a concern for you?

> I guess for arm32 using -mno-unaligned-access when building should avoid 
> any
> unaligned accesses?

I am not sure. This is implies the compiler will be able to infer that 
the access will be unaligned. Is this always the case?

Anyway, given you don't seem to have a use-case yet, I would simply to 
consider to surround the declaration with an a config which can be 
selected if unaligned access is supported.

Cheers,
Jürgen Groß Dec. 5, 2023, 1:41 p.m. UTC | #4
On 05.12.23 14:31, Julien Grall wrote:
> Hi Juergen,
> 
> On 05/12/2023 12:39, Juergen Gross wrote:
>> On 05.12.23 12:53, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 05/12/2023 10:07, Juergen Gross wrote:
>>>> Instead of defining get_unaligned() and put_unaligned() in a way that
>>>> is only supporting architectures allowing unaligned accesses, use the
>>>> same approach as the Linux kernel and let the compiler do the
>>>> decision how to generate the code for probably unaligned data accesses.
>>>>
>>>> Update include/xen/unaligned.h from include/asm-generic/unaligned.h of
>>>> the Linux kernel.
>>>>
>>>> The generated code has been checked to be the same on x86.
>>>>
>>>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>>>> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>>>> 803f4e1eab7a
>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>>> Can you outline your end goal? At least on arm32, I believe this will result 
>>> to abort because event if the architecture support unaligned access, we are 
>>> preventing them on Arm32.
>>
>> I need something like that in Xen tools for supporting packed data accesses
>> on the 9pfs ring page, so I looked into the hypervisor for related support.
> 
> Did we really introduce an ABI requiring unaligned access??? Or is this 
> something you are coming up with?

This is the 9pfs protocol (see [1]).

> Anyway, IIRC Linux allows unaligned access. So the problem I am describing is 
> only for the hypervisor. Although, I would like to point out that unaligned 
> access has no atomicity guarantee. I assume this is not going to be a concern 
> for you?

Correct.

> 
>> I guess for arm32 using -mno-unaligned-access when building should avoid any
>> unaligned accesses?
> 
> I am not sure. This is implies the compiler will be able to infer that the 
> access will be unaligned. Is this always the case?

This should happen through the "__packed" attribute on the access macros. As
e.g. MIPS doesn't support unaligned accesses, but is working with those access
macros in the Linux kernel, I suspect the attribute is doing its job.

> Anyway, given you don't seem to have a use-case yet, I would simply to consider 
> to surround the declaration with an a config which can be selected if unaligned 
> access is supported.

Like in xen/common/lzo.c et al? Those are compiled with CONFIG_X86 only today,
but I guess other archs might need the decompressors in future, too.


Juergen

[1]: http://ericvh.github.io/9p-rfc/rfc9p2000.html
Julien Grall Dec. 5, 2023, 1:46 p.m. UTC | #5
On 05/12/2023 13:41, Juergen Gross wrote:
> On 05.12.23 14:31, Julien Grall wrote:
>> Hi Juergen,
>>
>> On 05/12/2023 12:39, Juergen Gross wrote:
>>> On 05.12.23 12:53, Julien Grall wrote:
>>>> Hi Juergen,
>>>>
>>>> On 05/12/2023 10:07, Juergen Gross wrote:
>>>>> Instead of defining get_unaligned() and put_unaligned() in a way that
>>>>> is only supporting architectures allowing unaligned accesses, use the
>>>>> same approach as the Linux kernel and let the compiler do the
>>>>> decision how to generate the code for probably unaligned data 
>>>>> accesses.
>>>>>
>>>>> Update include/xen/unaligned.h from include/asm-generic/unaligned.h of
>>>>> the Linux kernel.
>>>>>
>>>>> The generated code has been checked to be the same on x86.
>>>>>
>>>>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>>>>> Origin: 
>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>>>>> 803f4e1eab7a
>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>
>>>> Can you outline your end goal? At least on arm32, I believe this 
>>>> will result to abort because event if the architecture support 
>>>> unaligned access, we are preventing them on Arm32.
>>>
>>> I need something like that in Xen tools for supporting packed data 
>>> accesses
>>> on the 9pfs ring page, so I looked into the hypervisor for related 
>>> support.
>>
>> Did we really introduce an ABI requiring unaligned access??? Or is 
>> this something you are coming up with?
> 
> This is the 9pfs protocol (see [1]).

Urgh :(.

> 
>> Anyway, IIRC Linux allows unaligned access. So the problem I am 
>> describing is only for the hypervisor. Although, I would like to point 
>> out that unaligned access has no atomicity guarantee. I assume this is 
>> not going to be a concern for you?
> 
> Correct.
> 
>>
>>> I guess for arm32 using -mno-unaligned-access when building should 
>>> avoid any
>>> unaligned accesses?
>>
>> I am not sure. This is implies the compiler will be able to infer that 
>> the access will be unaligned. Is this always the case?
> 
> This should happen through the "__packed" attribute on the access 
> macros. As
> e.g. MIPS doesn't support unaligned accesses, but is working with those 
> access
> macros in the Linux kernel, I suspect the attribute is doing its job.

Someone will need to dig a bit deeper to confirm and also the impact on 
the rest of the hypervisor.

> 
>> Anyway, given you don't seem to have a use-case yet, I would simply to 
>> consider to surround the declaration with an a config which can be 
>> selected if unaligned access is supported.
> 
> Like in xen/common/lzo.c et al?

Just to clarify, I am suggesting to add in unaligned.h:

#ifdef CONFIG_HAS_UNALIGNED_ACCESS

your definitions

#endif

And then for X86, select CONFIG_HAS_UNALIGNED_ACCESS.

  Those are compiled with CONFIG_X86 only
> today,
> but I guess other archs might need the decompressors in future, too.
Possibly yes. But my point is that you don't have to solve the problem 
today. Yet I don't think it is wise to allow the header to be used on 
arm32 until we have done some investigation.

And to clarify, I am not asking you to do the investigation.

Cheers,
Jan Beulich Dec. 5, 2023, 1:55 p.m. UTC | #6
On 05.12.2023 11:07, Juergen Gross wrote:
> @@ -15,67 +7,126 @@
>  #include <asm/byteorder.h>
>  #endif
>  
> -#define get_unaligned(p) (*(p))
> -#define put_unaligned(val, p) (*(p) = (val))
> +/*
> + * This is the most generic implementation of unaligned accesses
> + * and should work almost anywhere.
> + */
> +
> +#define __get_unaligned_t(type, ptr) ({						\
> +	const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);	\
> +	__pptr->x;								\
> +})
> +
> +#define __put_unaligned_t(type, val, ptr) do {					\
> +	struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);		\
> +	__pptr->x = (val);							\
> +} while (0)

Please can we avoid gaining new (even double) leading underscore symbols,
especially when they're in helper (i.e. not intended to be used directly)
constructs? No reason to introduce further issues wrt Misra adoption.

> +#define get_unaligned(ptr)	__get_unaligned_t(typeof(*(ptr)), (ptr))
> +#define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr))

May I ask to omit excess parentheses where possible?

> +static inline u16 get_unaligned_le16(const void *p)
> +{
> +	return le16_to_cpu(__get_unaligned_t(__le16, p));
> +}
> +
> +static inline u32 get_unaligned_le32(const void *p)
> +{
> +	return le32_to_cpu(__get_unaligned_t(__le32, p));
> +}
> +
> +static inline u64 get_unaligned_le64(const void *p)
> +{
> +	return le64_to_cpu(__get_unaligned_t(__le64, p));
> +}
> +
> +static inline void put_unaligned_le16(u16 val, void *p)
> +{
> +	__put_unaligned_t(__le16, cpu_to_le16(val), p);
> +}
> +
> +static inline void put_unaligned_le32(u32 val, void *p)
> +{
> +	__put_unaligned_t(__le32, cpu_to_le32(val), p);
> +}
> +
> +static inline void put_unaligned_le64(u64 val, void *p)
> +{
> +	__put_unaligned_t(__le64, cpu_to_le64(val), p);
> +}
> +
> +static inline u16 get_unaligned_be16(const void *p)
> +{
> +	return be16_to_cpu(__get_unaligned_t(__be16, p));
> +}
> +
> +static inline u32 get_unaligned_be32(const void *p)
> +{
> +	return be32_to_cpu(__get_unaligned_t(__be32, p));
> +}
>  
> -static inline uint16_t get_unaligned_be16(const void *p)
> +static inline u64 get_unaligned_be64(const void *p)
>  {
> -	return be16_to_cpup(p);
> +	return be64_to_cpu(__get_unaligned_t(__be64, p));
>  }
>  
> -static inline void put_unaligned_be16(uint16_t val, void *p)
> +static inline void put_unaligned_be16(u16 val, void *p)
>  {
> -	*(__force __be16*)p = cpu_to_be16(val);
> +	__put_unaligned_t(__be16, cpu_to_be16(val), p);
>  }
>  
> -static inline uint32_t get_unaligned_be32(const void *p)
> +static inline void put_unaligned_be32(u32 val, void *p)
>  {
> -	return be32_to_cpup(p);
> +	__put_unaligned_t(__be32, cpu_to_be32(val), p);
>  }
>  
> -static inline void put_unaligned_be32(uint32_t val, void *p)
> +static inline void put_unaligned_be64(u64 val, void *p)
>  {
> -	*(__force __be32*)p = cpu_to_be32(val);
> +	__put_unaligned_t(__be64, cpu_to_be64(val), p);
>  }
>  
> -static inline uint64_t get_unaligned_be64(const void *p)
> +static inline u32 __get_unaligned_be24(const u8 *p)

Here and below - do you foresee use of these 24-bit helpers? They weren't
there before, and the description also doesn't justify their introduction.

Jan
Jan Beulich Dec. 5, 2023, 1:59 p.m. UTC | #7
On 05.12.2023 14:46, Julien Grall wrote:
> On 05/12/2023 13:41, Juergen Gross wrote:
>> On 05.12.23 14:31, Julien Grall wrote:
>>> Anyway, given you don't seem to have a use-case yet, I would simply to 
>>> consider to surround the declaration with an a config which can be 
>>> selected if unaligned access is supported.
>>
>> Like in xen/common/lzo.c et al?
> 
> Just to clarify, I am suggesting to add in unaligned.h:
> 
> #ifdef CONFIG_HAS_UNALIGNED_ACCESS
> 
> your definitions
> 
> #endif

But that would be wrong: HAS_UNALIGNED_ACCESS would be there to indicate
one does _not_ need any special accessors.

Jan
Julien Grall Dec. 5, 2023, 2:01 p.m. UTC | #8
Hi,

On 05/12/2023 13:59, Jan Beulich wrote:
> On 05.12.2023 14:46, Julien Grall wrote:
>> On 05/12/2023 13:41, Juergen Gross wrote:
>>> On 05.12.23 14:31, Julien Grall wrote:
>>>> Anyway, given you don't seem to have a use-case yet, I would simply to
>>>> consider to surround the declaration with an a config which can be
>>>> selected if unaligned access is supported.
>>>
>>> Like in xen/common/lzo.c et al?
>>
>> Just to clarify, I am suggesting to add in unaligned.h:
>>
>> #ifdef CONFIG_HAS_UNALIGNED_ACCESS
>>
>> your definitions
>>
>> #endif
> 
> But that would be wrong: HAS_UNALIGNED_ACCESS would be there to indicate
> one does _not_ need any special accessors.

I am guessing you are disagreeing on the name rather than the concept? 
If so, what about CONFIG_UNALIGNED_ACCESS_ALLOWED?

Cheers,
Arnd Bergmann Dec. 5, 2023, 2:10 p.m. UTC | #9
On Tue, Dec 5, 2023, at 15:01, Julien Grall wrote:
> On 05/12/2023 13:59, Jan Beulich wrote:
>> On 05.12.2023 14:46, Julien Grall wrote:
>>> On 05/12/2023 13:41, Juergen Gross wrote:
>>>> On 05.12.23 14:31, Julien Grall wrote:
>>>>> Anyway, given you don't seem to have a use-case yet, I would simply to
>>>>> consider to surround the declaration with an a config which can be
>>>>> selected if unaligned access is supported.
>>>>
>>>> Like in xen/common/lzo.c et al?
>>>
>>> Just to clarify, I am suggesting to add in unaligned.h:
>>>
>>> #ifdef CONFIG_HAS_UNALIGNED_ACCESS
>>>
>>> your definitions
>>>
>>> #endif
>> 
>> But that would be wrong: HAS_UNALIGNED_ACCESS would be there to indicate
>> one does _not_ need any special accessors.
>
> I am guessing you are disagreeing on the name rather than the concept? 
> If so, what about CONFIG_UNALIGNED_ACCESS_ALLOWED?

This would repeat the mistake that we had in Linux in the
past (and still do in some drivers): Simply dereferencing
a misaligned pointer is always a bug, even on architectures
that have efficient unaligned load/store instructions,
because C makes this undefined behavior and gcc has
optimizations that assume e.g. 'int *' to never have
the lower two bits set [1].

The helpers that Jürgen copied from Linux is what we
use to abstract accesses to objects that we know may
be misaligned. If the underlying ISA allows a direct
access (e.g. on arm64 and x86), this is as efficient
as a normal pointer access but prevents the dangerous
optimizations in gcc. On architectures without these
instructions, the access will be turned into safe
bytewise access.

This is similar to a __packed annotation on a
data structure, but also works in cases where such
an annotation wouldn't work for other reasons.

     Arnd

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363
Jürgen Groß Dec. 5, 2023, 2:11 p.m. UTC | #10
On 05.12.23 14:55, Jan Beulich wrote:
> On 05.12.2023 11:07, Juergen Gross wrote:
>> @@ -15,67 +7,126 @@
>>   #include <asm/byteorder.h>
>>   #endif
>>   
>> -#define get_unaligned(p) (*(p))
>> -#define put_unaligned(val, p) (*(p) = (val))
>> +/*
>> + * This is the most generic implementation of unaligned accesses
>> + * and should work almost anywhere.
>> + */
>> +
>> +#define __get_unaligned_t(type, ptr) ({						\
>> +	const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);	\
>> +	__pptr->x;								\
>> +})
>> +
>> +#define __put_unaligned_t(type, val, ptr) do {					\
>> +	struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);		\
>> +	__pptr->x = (val);							\
>> +} while (0)
> 
> Please can we avoid gaining new (even double) leading underscore symbols,
> especially when they're in helper (i.e. not intended to be used directly)
> constructs? No reason to introduce further issues wrt Misra adoption.
> 
>> +#define get_unaligned(ptr)	__get_unaligned_t(typeof(*(ptr)), (ptr))
>> +#define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr))
> 
> May I ask to omit excess parentheses where possible?
> 
>> +static inline u16 get_unaligned_le16(const void *p)
>> +{
>> +	return le16_to_cpu(__get_unaligned_t(__le16, p));
>> +}
>> +
>> +static inline u32 get_unaligned_le32(const void *p)
>> +{
>> +	return le32_to_cpu(__get_unaligned_t(__le32, p));
>> +}
>> +
>> +static inline u64 get_unaligned_le64(const void *p)
>> +{
>> +	return le64_to_cpu(__get_unaligned_t(__le64, p));
>> +}
>> +
>> +static inline void put_unaligned_le16(u16 val, void *p)
>> +{
>> +	__put_unaligned_t(__le16, cpu_to_le16(val), p);
>> +}
>> +
>> +static inline void put_unaligned_le32(u32 val, void *p)
>> +{
>> +	__put_unaligned_t(__le32, cpu_to_le32(val), p);
>> +}
>> +
>> +static inline void put_unaligned_le64(u64 val, void *p)
>> +{
>> +	__put_unaligned_t(__le64, cpu_to_le64(val), p);
>> +}
>> +
>> +static inline u16 get_unaligned_be16(const void *p)
>> +{
>> +	return be16_to_cpu(__get_unaligned_t(__be16, p));
>> +}
>> +
>> +static inline u32 get_unaligned_be32(const void *p)
>> +{
>> +	return be32_to_cpu(__get_unaligned_t(__be32, p));
>> +}
>>   
>> -static inline uint16_t get_unaligned_be16(const void *p)
>> +static inline u64 get_unaligned_be64(const void *p)
>>   {
>> -	return be16_to_cpup(p);
>> +	return be64_to_cpu(__get_unaligned_t(__be64, p));
>>   }
>>   
>> -static inline void put_unaligned_be16(uint16_t val, void *p)
>> +static inline void put_unaligned_be16(u16 val, void *p)
>>   {
>> -	*(__force __be16*)p = cpu_to_be16(val);
>> +	__put_unaligned_t(__be16, cpu_to_be16(val), p);
>>   }
>>   
>> -static inline uint32_t get_unaligned_be32(const void *p)
>> +static inline void put_unaligned_be32(u32 val, void *p)
>>   {
>> -	return be32_to_cpup(p);
>> +	__put_unaligned_t(__be32, cpu_to_be32(val), p);
>>   }
>>   
>> -static inline void put_unaligned_be32(uint32_t val, void *p)
>> +static inline void put_unaligned_be64(u64 val, void *p)
>>   {
>> -	*(__force __be32*)p = cpu_to_be32(val);
>> +	__put_unaligned_t(__be64, cpu_to_be64(val), p);
>>   }
>>   
>> -static inline uint64_t get_unaligned_be64(const void *p)
>> +static inline u32 __get_unaligned_be24(const u8 *p)
> 
> Here and below - do you foresee use of these 24-bit helpers? They weren't
> there before, and the description also doesn't justify their introduction.

I have just applied the patch from the kernel (see Origin: tag).

I can change the patch according to your comments, of course.


Juergen
Jan Beulich Dec. 5, 2023, 2:11 p.m. UTC | #11
On 05.12.2023 15:01, Julien Grall wrote:
> On 05/12/2023 13:59, Jan Beulich wrote:
>> On 05.12.2023 14:46, Julien Grall wrote:
>>> On 05/12/2023 13:41, Juergen Gross wrote:
>>>> On 05.12.23 14:31, Julien Grall wrote:
>>>>> Anyway, given you don't seem to have a use-case yet, I would simply to
>>>>> consider to surround the declaration with an a config which can be
>>>>> selected if unaligned access is supported.
>>>>
>>>> Like in xen/common/lzo.c et al?
>>>
>>> Just to clarify, I am suggesting to add in unaligned.h:
>>>
>>> #ifdef CONFIG_HAS_UNALIGNED_ACCESS
>>>
>>> your definitions
>>>
>>> #endif
>>
>> But that would be wrong: HAS_UNALIGNED_ACCESS would be there to indicate
>> one does _not_ need any special accessors.
> 
> I am guessing you are disagreeing on the name rather than the concept? 
> If so, what about CONFIG_UNALIGNED_ACCESS_ALLOWED?

Not really, no. Of course the name needs to properly express the purpose.
But I don't see why a Kconfig control would be appropriate here. You simply
want to provide accessors to unaligned data. Nobody needs to use them, but
when you have to, they ought to be there. A Kconfig control (of the name
you suggested first) would be helpful to not penalize architectures which
can do unaligned accesses without any helpers (in case the code generated
through the helpers turned out sub-optimal).

Jan
Jürgen Groß Dec. 5, 2023, 2:16 p.m. UTC | #12
On 05.12.23 14:46, Julien Grall wrote:
> 
> 
> On 05/12/2023 13:41, Juergen Gross wrote:
>> On 05.12.23 14:31, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 05/12/2023 12:39, Juergen Gross wrote:
>>>> On 05.12.23 12:53, Julien Grall wrote:
>>>>> Hi Juergen,
>>>>>
>>>>> On 05/12/2023 10:07, Juergen Gross wrote:
>>>>>> Instead of defining get_unaligned() and put_unaligned() in a way that
>>>>>> is only supporting architectures allowing unaligned accesses, use the
>>>>>> same approach as the Linux kernel and let the compiler do the
>>>>>> decision how to generate the code for probably unaligned data accesses.
>>>>>>
>>>>>> Update include/xen/unaligned.h from include/asm-generic/unaligned.h of
>>>>>> the Linux kernel.
>>>>>>
>>>>>> The generated code has been checked to be the same on x86.
>>>>>>
>>>>>> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>>>>>> Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
>>>>>> 803f4e1eab7a
>>>>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>>>
>>>>> Can you outline your end goal? At least on arm32, I believe this will 
>>>>> result to abort because event if the architecture support unaligned access, 
>>>>> we are preventing them on Arm32.
>>>>
>>>> I need something like that in Xen tools for supporting packed data accesses
>>>> on the 9pfs ring page, so I looked into the hypervisor for related support.
>>>
>>> Did we really introduce an ABI requiring unaligned access??? Or is this 
>>> something you are coming up with?
>>
>> This is the 9pfs protocol (see [1]).
> 
> Urgh :(.
> 
>>
>>> Anyway, IIRC Linux allows unaligned access. So the problem I am describing is 
>>> only for the hypervisor. Although, I would like to point out that unaligned 
>>> access has no atomicity guarantee. I assume this is not going to be a concern 
>>> for you?
>>
>> Correct.
>>
>>>
>>>> I guess for arm32 using -mno-unaligned-access when building should avoid any
>>>> unaligned accesses?
>>>
>>> I am not sure. This is implies the compiler will be able to infer that the 
>>> access will be unaligned. Is this always the case?
>>
>> This should happen through the "__packed" attribute on the access macros. As
>> e.g. MIPS doesn't support unaligned accesses, but is working with those access
>> macros in the Linux kernel, I suspect the attribute is doing its job.
> 
> Someone will need to dig a bit deeper to confirm and also the impact on the rest 
> of the hypervisor.
> 
>>
>>> Anyway, given you don't seem to have a use-case yet, I would simply to 
>>> consider to surround the declaration with an a config which can be selected 
>>> if unaligned access is supported.
>>
>> Like in xen/common/lzo.c et al?
> 
> Just to clarify, I am suggesting to add in unaligned.h:
> 
> #ifdef CONFIG_HAS_UNALIGNED_ACCESS
> 
> your definitions
> 
> #endif
> 
> And then for X86, select CONFIG_HAS_UNALIGNED_ACCESS.
> 
>   Those are compiled with CONFIG_X86 only
>> today,
>> but I guess other archs might need the decompressors in future, too.
> Possibly yes. But my point is that you don't have to solve the problem today. 
> Yet I don't think it is wise to allow the header to be used on arm32 until we 
> have done some investigation.
> 
> And to clarify, I am not asking you to do the investigation.

I've done a quick verification using gcc 7.5.

Using -mno-unaligned-access for 32-bit Arm seems to do the job:

#include <xen/unaligned.h>
int tst(const unsigned short *in)
{
     return get_unaligned(in);
}

results in:

00000000 <tst>:
    0:   e52db004        push    {fp}            @ (str fp, [sp, #-4]!)
    4:   e28db000        add     fp, sp, #0
    8:   e5d03000        ldrb    r3, [r0]
    c:   e5d00001        ldrb    r0, [r0, #1]
   10:   e1830400        orr     r0, r3, r0, lsl #8
   14:   e28bd000        add     sp, fp, #0
   18:   e49db004        pop     {fp}            @ (ldr fp, [sp], #4)
   1c:   e12fff1e        bx      lr

Without the -mno-unaligned-access I'm getting:

00000000 <tst>:
    0:   e52db004        push    {fp}            @ (str fp, [sp, #-4]!)
    4:   e28db000        add     fp, sp, #0
    8:   e1d000b0        ldrh    r0, [r0]
    c:   e28bd000        add     sp, fp, #0
   10:   e49db004        pop     {fp}            @ (ldr fp, [sp], #4)
   14:   e12fff1e        bx      lr


Juergen
Julien Grall Dec. 5, 2023, 2:19 p.m. UTC | #13
Hi Arnd,

Thanks for the answer.

On 05/12/2023 14:10, Arnd Bergmann wrote:
> On Tue, Dec 5, 2023, at 15:01, Julien Grall wrote:
>> On 05/12/2023 13:59, Jan Beulich wrote:
>>> On 05.12.2023 14:46, Julien Grall wrote:
>>>> On 05/12/2023 13:41, Juergen Gross wrote:
>>>>> On 05.12.23 14:31, Julien Grall wrote:
>>>>>> Anyway, given you don't seem to have a use-case yet, I would simply to
>>>>>> consider to surround the declaration with an a config which can be
>>>>>> selected if unaligned access is supported.
>>>>>
>>>>> Like in xen/common/lzo.c et al?
>>>>
>>>> Just to clarify, I am suggesting to add in unaligned.h:
>>>>
>>>> #ifdef CONFIG_HAS_UNALIGNED_ACCESS
>>>>
>>>> your definitions
>>>>
>>>> #endif
>>>
>>> But that would be wrong: HAS_UNALIGNED_ACCESS would be there to indicate
>>> one does _not_ need any special accessors.
>>
>> I am guessing you are disagreeing on the name rather than the concept?
>> If so, what about CONFIG_UNALIGNED_ACCESS_ALLOWED?
> 
> This would repeat the mistake that we had in Linux in the
> past (and still do in some drivers): Simply dereferencing
> a misaligned pointer is always a bug, even on architectures
> that have efficient unaligned load/store instructions,
> because C makes this undefined behavior and gcc has
> optimizations that assume e.g. 'int *' to never have
> the lower two bits set [1].

Just to clarify, I haven't suggested to use 'int *'. My point was more 
that I don't think that the helpers would work as-is on arm32 because
even if the ISA allows a direct access, we are setting the bit in SCTLR 
to disable unaligned access.

As Juergen is proposing a common header, then I could ask him to do the 
work to confirm that the helpers properly work on arm32. But I think 
this is unfair.

I also don't have the time to chase to look at it and this is not yet 
used in code reached by Arm. Hence my suggestion for now to protect the 
code so if someone tries to use them, then they know that it doesn't 
(yet) work on Arm.

Cheers,
Arnd Bergmann Dec. 5, 2023, 2:37 p.m. UTC | #14
On Tue, Dec 5, 2023, at 15:19, Julien Grall wrote:
> On 05/12/2023 14:10, Arnd Bergmann wrote:
>> On Tue, Dec 5, 2023, at 15:01, Julien Grall wrote:
>>> On 05/12/2023 13:59, Jan Beulich wrote:
>>>> On 05.12.2023 14:46, Julien Grall wrote:
>> This would repeat the mistake that we had in Linux in the
>> past (and still do in some drivers): Simply dereferencing
>> a misaligned pointer is always a bug, even on architectures
>> that have efficient unaligned load/store instructions,
>> because C makes this undefined behavior and gcc has
>> optimizations that assume e.g. 'int *' to never have
>> the lower two bits set [1].
>
> Just to clarify, I haven't suggested to use 'int *'. My point was more 
> that I don't think that the helpers would work as-is on arm32 because
> even if the ISA allows a direct access, we are setting the bit in SCTLR 
> to disable unaligned access.
>
> As Juergen is proposing a common header, then I could ask him to do the 
> work to confirm that the helpers properly work on arm32. But I think 
> this is unfair.

When I introduced the helpers in Linux, I showed that these
produce the best output on all modern compilers (at least gcc-5,
probably earlier) for both architectures that allow unaligned
access and for those that don't. We used to have architecture
specific helpers depending on what each architecture could
do, but all the other variants we had were either wrong or
less efficient.

If for some reason an Arm system is configured to trap
all unaligned access, then you must already pass
-mno-unaligned-access to the compiler to prevent certain
optimizations, and then the helpers will still behave
correctly (the same way they do on armv5, which never has
unaligned access). On armv7 with -munaligned-access, the
same functions only prevent the use of stm/ldm and strd/ldrd
but still use ldr/str.

     Arnd
Julien Grall Dec. 5, 2023, 4:29 p.m. UTC | #15
Hi Arnd,

On 05/12/2023 14:37, Arnd Bergmann wrote:
> On Tue, Dec 5, 2023, at 15:19, Julien Grall wrote:
>> On 05/12/2023 14:10, Arnd Bergmann wrote:
>>> On Tue, Dec 5, 2023, at 15:01, Julien Grall wrote:
>>>> On 05/12/2023 13:59, Jan Beulich wrote:
>>>>> On 05.12.2023 14:46, Julien Grall wrote:
>>> This would repeat the mistake that we had in Linux in the
>>> past (and still do in some drivers): Simply dereferencing
>>> a misaligned pointer is always a bug, even on architectures
>>> that have efficient unaligned load/store instructions,
>>> because C makes this undefined behavior and gcc has
>>> optimizations that assume e.g. 'int *' to never have
>>> the lower two bits set [1].
>>
>> Just to clarify, I haven't suggested to use 'int *'. My point was more
>> that I don't think that the helpers would work as-is on arm32 because
>> even if the ISA allows a direct access, we are setting the bit in SCTLR
>> to disable unaligned access.
>>
>> As Juergen is proposing a common header, then I could ask him to do the
>> work to confirm that the helpers properly work on arm32. But I think
>> this is unfair.
> 
> When I introduced the helpers in Linux, I showed that these
> produce the best output on all modern compilers (at least gcc-5,
> probably earlier) for both architectures that allow unaligned
> access and for those that don't. We used to have architecture
> specific helpers depending on what each architecture could
> do, but all the other variants we had were either wrong or
> less efficient.
> 
> If for some reason an Arm system is configured to trap
> all unaligned access, then you must already pass
> -mno-unaligned-access to the compiler to prevent certain
> optimizations, and then the helpers will still behave
> correctly (the same way they do on armv5, which never has
> unaligned access). On armv7 with -munaligned-access, the
> same functions only prevent the use of stm/ldm and strd/ldrd
> but still use ldr/str.

Unfortunately we don't explicitely do. This would explain why I saw some 
issues with certain compiler [1].

So I agree that adding -mno-unaligned-access for arm32 makes sense.

@Juergen, do you want me to send a patch?

Cheers,

[1] c71163f6-2646-6fae-cb22-600eb0486539@xen.org
Jürgen Groß Dec. 5, 2023, 4:31 p.m. UTC | #16
On 05.12.23 17:29, Julien Grall wrote:
> Hi Arnd,
> 
> On 05/12/2023 14:37, Arnd Bergmann wrote:
>> On Tue, Dec 5, 2023, at 15:19, Julien Grall wrote:
>>> On 05/12/2023 14:10, Arnd Bergmann wrote:
>>>> On Tue, Dec 5, 2023, at 15:01, Julien Grall wrote:
>>>>> On 05/12/2023 13:59, Jan Beulich wrote:
>>>>>> On 05.12.2023 14:46, Julien Grall wrote:
>>>> This would repeat the mistake that we had in Linux in the
>>>> past (and still do in some drivers): Simply dereferencing
>>>> a misaligned pointer is always a bug, even on architectures
>>>> that have efficient unaligned load/store instructions,
>>>> because C makes this undefined behavior and gcc has
>>>> optimizations that assume e.g. 'int *' to never have
>>>> the lower two bits set [1].
>>>
>>> Just to clarify, I haven't suggested to use 'int *'. My point was more
>>> that I don't think that the helpers would work as-is on arm32 because
>>> even if the ISA allows a direct access, we are setting the bit in SCTLR
>>> to disable unaligned access.
>>>
>>> As Juergen is proposing a common header, then I could ask him to do the
>>> work to confirm that the helpers properly work on arm32. But I think
>>> this is unfair.
>>
>> When I introduced the helpers in Linux, I showed that these
>> produce the best output on all modern compilers (at least gcc-5,
>> probably earlier) for both architectures that allow unaligned
>> access and for those that don't. We used to have architecture
>> specific helpers depending on what each architecture could
>> do, but all the other variants we had were either wrong or
>> less efficient.
>>
>> If for some reason an Arm system is configured to trap
>> all unaligned access, then you must already pass
>> -mno-unaligned-access to the compiler to prevent certain
>> optimizations, and then the helpers will still behave
>> correctly (the same way they do on armv5, which never has
>> unaligned access). On armv7 with -munaligned-access, the
>> same functions only prevent the use of stm/ldm and strd/ldrd
>> but still use ldr/str.
> 
> Unfortunately we don't explicitely do. This would explain why I saw some issues 
> with certain compiler [1].
> 
> So I agree that adding -mno-unaligned-access for arm32 makes sense.
> 
> @Juergen, do you want me to send a patch?

Yes, will do.


Juergen
diff mbox series

Patch

diff --git a/xen/include/xen/unaligned.h b/xen/include/xen/unaligned.h
index 0a2b16d05d..325d9f875f 100644
--- a/xen/include/xen/unaligned.h
+++ b/xen/include/xen/unaligned.h
@@ -1,12 +1,4 @@ 
-/*
- * This header can be used by architectures where unaligned accesses work
- * without faulting, and at least reasonably efficiently.  Other architectures
- * will need to have a custom asm/unaligned.h.
- */
-#ifndef __ASM_UNALIGNED_H__
-#error "xen/unaligned.h should not be included directly - include asm/unaligned.h instead"
-#endif
-
+/* SPDX-License-Identifier: GPL-2.0 */
 #ifndef __XEN_UNALIGNED_H__
 #define __XEN_UNALIGNED_H__
 
@@ -15,67 +7,126 @@ 
 #include <asm/byteorder.h>
 #endif
 
-#define get_unaligned(p) (*(p))
-#define put_unaligned(val, p) (*(p) = (val))
+/*
+ * This is the most generic implementation of unaligned accesses
+ * and should work almost anywhere.
+ */
+
+#define __get_unaligned_t(type, ptr) ({						\
+	const struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);	\
+	__pptr->x;								\
+})
+
+#define __put_unaligned_t(type, val, ptr) do {					\
+	struct { type x; } __packed *__pptr = (typeof(__pptr))(ptr);		\
+	__pptr->x = (val);							\
+} while (0)
+
+#define get_unaligned(ptr)	__get_unaligned_t(typeof(*(ptr)), (ptr))
+#define put_unaligned(val, ptr) __put_unaligned_t(typeof(*(ptr)), (val), (ptr))
+
+static inline u16 get_unaligned_le16(const void *p)
+{
+	return le16_to_cpu(__get_unaligned_t(__le16, p));
+}
+
+static inline u32 get_unaligned_le32(const void *p)
+{
+	return le32_to_cpu(__get_unaligned_t(__le32, p));
+}
+
+static inline u64 get_unaligned_le64(const void *p)
+{
+	return le64_to_cpu(__get_unaligned_t(__le64, p));
+}
+
+static inline void put_unaligned_le16(u16 val, void *p)
+{
+	__put_unaligned_t(__le16, cpu_to_le16(val), p);
+}
+
+static inline void put_unaligned_le32(u32 val, void *p)
+{
+	__put_unaligned_t(__le32, cpu_to_le32(val), p);
+}
+
+static inline void put_unaligned_le64(u64 val, void *p)
+{
+	__put_unaligned_t(__le64, cpu_to_le64(val), p);
+}
+
+static inline u16 get_unaligned_be16(const void *p)
+{
+	return be16_to_cpu(__get_unaligned_t(__be16, p));
+}
+
+static inline u32 get_unaligned_be32(const void *p)
+{
+	return be32_to_cpu(__get_unaligned_t(__be32, p));
+}
 
-static inline uint16_t get_unaligned_be16(const void *p)
+static inline u64 get_unaligned_be64(const void *p)
 {
-	return be16_to_cpup(p);
+	return be64_to_cpu(__get_unaligned_t(__be64, p));
 }
 
-static inline void put_unaligned_be16(uint16_t val, void *p)
+static inline void put_unaligned_be16(u16 val, void *p)
 {
-	*(__force __be16*)p = cpu_to_be16(val);
+	__put_unaligned_t(__be16, cpu_to_be16(val), p);
 }
 
-static inline uint32_t get_unaligned_be32(const void *p)
+static inline void put_unaligned_be32(u32 val, void *p)
 {
-	return be32_to_cpup(p);
+	__put_unaligned_t(__be32, cpu_to_be32(val), p);
 }
 
-static inline void put_unaligned_be32(uint32_t val, void *p)
+static inline void put_unaligned_be64(u64 val, void *p)
 {
-	*(__force __be32*)p = cpu_to_be32(val);
+	__put_unaligned_t(__be64, cpu_to_be64(val), p);
 }
 
-static inline uint64_t get_unaligned_be64(const void *p)
+static inline u32 __get_unaligned_be24(const u8 *p)
 {
-	return be64_to_cpup(p);
+	return p[0] << 16 | p[1] << 8 | p[2];
 }
 
-static inline void put_unaligned_be64(uint64_t val, void *p)
+static inline u32 get_unaligned_be24(const void *p)
 {
-	*(__force __be64*)p = cpu_to_be64(val);
+	return __get_unaligned_be24(p);
 }
 
-static inline uint16_t get_unaligned_le16(const void *p)
+static inline u32 __get_unaligned_le24(const u8 *p)
 {
-	return le16_to_cpup(p);
+	return p[0] | p[1] << 8 | p[2] << 16;
 }
 
-static inline void put_unaligned_le16(uint16_t val, void *p)
+static inline u32 get_unaligned_le24(const void *p)
 {
-	*(__force __le16*)p = cpu_to_le16(val);
+	return __get_unaligned_le24(p);
 }
 
-static inline uint32_t get_unaligned_le32(const void *p)
+static inline void __put_unaligned_be24(const u32 val, u8 *p)
 {
-	return le32_to_cpup(p);
+	*p++ = val >> 16;
+	*p++ = val >> 8;
+	*p++ = val;
 }
 
-static inline void put_unaligned_le32(uint32_t val, void *p)
+static inline void put_unaligned_be24(const u32 val, void *p)
 {
-	*(__force __le32*)p = cpu_to_le32(val);
+	__put_unaligned_be24(val, p);
 }
 
-static inline uint64_t get_unaligned_le64(const void *p)
+static inline void __put_unaligned_le24(const u32 val, u8 *p)
 {
-	return le64_to_cpup(p);
+	*p++ = val;
+	*p++ = val >> 8;
+	*p++ = val >> 16;
 }
 
-static inline void put_unaligned_le64(uint64_t val, void *p)
+static inline void put_unaligned_le24(const u32 val, void *p)
 {
-	*(__force __le64*)p = cpu_to_le64(val);
+	__put_unaligned_le24(val, p);
 }
 
 #endif /* __XEN_UNALIGNED_H__ */