diff mbox series

[RFC,14/25] qemu/bswap: Introduce load/store for aligned pointer

Message ID 20210518183655.1711377-15-philmd@redhat.com (mailing list archive)
State New, archived
Headers show
Series exec: Add load/store API for aligned pointers | expand

Commit Message

Philippe Mathieu-Daudé May 18, 2021, 6:36 p.m. UTC
When the pointer alignment is known to be safe, we can
directly swap the data in place, without having to rely
on the compiler builtin code.

Load/store methods expecting aligned pointer use the 'a'
infix. For example to read a 16-bit unsigned value stored
in little endianess at an unaligned pointer:

  val = lduw_le_p(&unaligned_ptr);

then to store it in big endianess at an aligned pointer:

  stw_be_ap(&aligned_ptr, val);

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
---
 docs/devel/loads-stores.rst | 27 ++++++++++++++++-----------
 include/qemu/bswap.h        | 22 ++++++++++++++++++++++
 2 files changed, 38 insertions(+), 11 deletions(-)

Comments

Peter Maydell May 18, 2021, 8:15 p.m. UTC | #1
On Tue, 18 May 2021 at 19:38, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>
> When the pointer alignment is known to be safe, we can
> directly swap the data in place, without having to rely
> on the compiler builtin code.
>
> Load/store methods expecting aligned pointer use the 'a'
> infix. For example to read a 16-bit unsigned value stored
> in little endianess at an unaligned pointer:
>
>   val = lduw_le_p(&unaligned_ptr);
>
> then to store it in big endianess at an aligned pointer:
>
>   stw_be_ap(&aligned_ptr, val);

It sounded from the bug report as if the desired effect
was "this access is atomic". Nothing in the documentation here
makes that guarantee of the implementation -- it merely imposes
an extra requirement on the caller that the pointer alignment
is "safe" (which term it does not define...) and a valid
implementation would be to implement the "aligned" versions
identically to the "unaligned" versions...

Q: should the functions at the bottom of this stack of APIs
be using something from the atomic.h header? If not, why not?
Do we need any of the other atomic primitives ?

thanks
-- PMM
Philippe Mathieu-Daudé May 19, 2021, 5:56 a.m. UTC | #2
On 5/18/21 10:15 PM, Peter Maydell wrote:
> On Tue, 18 May 2021 at 19:38, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
>>
>> When the pointer alignment is known to be safe, we can
>> directly swap the data in place, without having to rely
>> on the compiler builtin code.
>>
>> Load/store methods expecting aligned pointer use the 'a'
>> infix. For example to read a 16-bit unsigned value stored
>> in little endianess at an unaligned pointer:
>>
>>   val = lduw_le_p(&unaligned_ptr);
>>
>> then to store it in big endianess at an aligned pointer:
>>
>>   stw_be_ap(&aligned_ptr, val);
> 
> It sounded from the bug report as if the desired effect
> was "this access is atomic". Nothing in the documentation here
> makes that guarantee of the implementation -- it merely imposes
> an extra requirement on the caller that the pointer alignment
> is "safe" (which term it does not define...) and a valid
> implementation would be to implement the "aligned" versions
> identically to the "unaligned" versions...
> 
> Q: should the functions at the bottom of this stack of APIs
> be using something from the atomic.h header? If not, why not?
> Do we need any of the other atomic primitives ?

I'll defer this question to Stefan/Paolo...
Stefan Hajnoczi May 19, 2021, 9:08 a.m. UTC | #3
On Wed, May 19, 2021 at 07:56:51AM +0200, Philippe Mathieu-Daudé wrote:
> On 5/18/21 10:15 PM, Peter Maydell wrote:
> > On Tue, 18 May 2021 at 19:38, Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
> >>
> >> When the pointer alignment is known to be safe, we can
> >> directly swap the data in place, without having to rely
> >> on the compiler builtin code.
> >>
> >> Load/store methods expecting aligned pointer use the 'a'
> >> infix. For example to read a 16-bit unsigned value stored
> >> in little endianess at an unaligned pointer:
> >>
> >>   val = lduw_le_p(&unaligned_ptr);
> >>
> >> then to store it in big endianess at an aligned pointer:
> >>
> >>   stw_be_ap(&aligned_ptr, val);
> > 
> > It sounded from the bug report as if the desired effect
> > was "this access is atomic". Nothing in the documentation here
> > makes that guarantee of the implementation -- it merely imposes
> > an extra requirement on the caller that the pointer alignment
> > is "safe" (which term it does not define...) and a valid
> > implementation would be to implement the "aligned" versions
> > identically to the "unaligned" versions...
> > 
> > Q: should the functions at the bottom of this stack of APIs
> > be using something from the atomic.h header? If not, why not?
> > Do we need any of the other atomic primitives ?
> 
> I'll defer this question to Stefan/Paolo...

Stricly speaking qatomic_read() and qatomic_set() are necessary for
__ATOMIC_RELAXED semantics.

The comment in atomic.h mentions this generally has no effect on
generated code. That's probably because aligned 16/32/64-bit accesses
don't tear on modern CPUs so no special instructions are needed. This is
why not using atomic.h usually works.

Even qatomic_read()/qatomic_set() is too strong since the doc comment
mentions it prevents the compiler moving other loads/stores past the
atomic operation. The hw/virtio/ code already has explicit memory
barriers and doesn't need the additional compiler barrier.

For safety I suggest using qatomic_read()/qatomic_set().

Stefan
diff mbox series

Patch

diff --git a/docs/devel/loads-stores.rst b/docs/devel/loads-stores.rst
index 568274baec0..88493ba1293 100644
--- a/docs/devel/loads-stores.rst
+++ b/docs/devel/loads-stores.rst
@@ -13,20 +13,21 @@  documentation of each API -- for that you should look at the
 documentation comments in the relevant header files.
 
 
-``ld*_p and st*_p``
-~~~~~~~~~~~~~~~~~~~
+``ld*_[a]p and st*_[a]p``
+~~~~~~~~~~~~~~~~~~~~~~~~~
 
 These functions operate on a host pointer, and should be used
 when you already have a pointer into host memory (corresponding
 to guest ram or a local buffer). They deal with doing accesses
 with the desired endianness and with correctly handling
-potentially unaligned pointer values.
+potentially unaligned pointer values. If the pointer alignment
+is known to be safe, then the aligned functions can be used.
 
 Function names follow the pattern:
 
-load: ``ld{sign}{size}_{endian}_p(ptr)``
+load: ``ld{sign}{size}_{endian}_{aligned}p(ptr)``
 
-store: ``st{size}_{endian}_p(ptr, val)``
+store: ``st{size}_{endian}_{aligned}p(ptr, val)``
 
 ``sign``
  - (empty) : for 32 or 64 bit sizes
@@ -49,24 +50,28 @@  The ``_{endian}`` infix is omitted for target-endian accesses.
 The target endian accessors are only available to source
 files which are built per-target.
 
+By using the ``_{aligned}`` infix, unsafe optimizations might be used,
+however unaligned pointer might trigger an exception and abort the
+process.
+
 There are also functions which take the size as an argument:
 
-load: ``ldn{endian}_p(ptr, sz)``
+load: ``ldn{endian}_{aligned}p(ptr, sz)``
 
 which performs an unsigned load of ``sz`` bytes from ``ptr``
 as an ``{endian}`` order value and returns it in a uint64_t.
 
-store: ``stn{endian}_p(ptr, sz, val)``
+store: ``stn{endian}_{aligned}p(ptr, sz, val)``
 
 which stores ``val`` to ``ptr`` as an ``{endian}`` order value
 of size ``sz`` bytes.
 
 
 Regexes for git grep
- - ``\<ld[us]\?[bwlq]\(_[hbl]e\)\?_p\>``
- - ``\<st[bwlq]\(_[hbl]e\)\?_p\>``
- - ``\<ldn_\([hbl]e\)?_p\>``
- - ``\<stn_\([hbl]e\)?_p\>``
+ - ``\<ld[us]\?[bwlq]\(_[hbl]e\)\?_a?p\>``
+ - ``\<st[bwlq]\(_[hbl]e\)\?_a?p\>``
+ - ``\<ldn_\([hbl]e\)?_a?p\>``
+ - ``\<stn_\([hbl]e\)?_a?p\>``
 
 ``cpu_{ld,st}*_mmuidx_ra``
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index 4cd120ca014..3f272c3cb46 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -350,25 +350,47 @@  static inline void st ## size ## _he_p(void *ptr, vtype v)\
     __builtin_memcpy(ptr, &v, sizeof(v));\
 }
 
+#define LD_CONVERT_ALIGNED(bits, rtype, vtype, size)\
+static inline rtype ld ## size ## _he_ap(const void *ptr)\
+{\
+    return *(vtype *)ptr;\
+}
+
+#define ST_CONVERT_ALIGNED(bits, vtype, size)\
+static inline void st ## size ## _he_ap(void *ptr, vtype v)\
+{\
+    *(vtype *)ptr = v;\
+}
+
 #define LD_CONVERT_END(endian, bits, rtype, vtype, size)\
 static inline rtype ld ## size ## _ ## endian ## _p(const void *ptr)\
 {\
     return (vtype)glue(endian, _bswap)(ld ## size ## _he_p(ptr), bits);\
+}\
+static inline rtype ld ## size ## _ ## endian ## _ap(const void *ptr)\
+{\
+    return (vtype)glue(endian, _bswap)(ld ## size ## _he_ap(ptr), bits);\
 }
 
 #define ST_CONVERT_END(endian, bits, vtype, size)\
 static inline void st ## size ## _ ## endian ## _p(void *ptr, vtype v)\
 {\
     st ## size ## _he_p(ptr, glue(endian, _bswap)(v, bits));\
+}\
+static inline void st ## size ## _ ## endian ## _ap(void *ptr, vtype v)\
+{\
+    st ## size ## _he_ap(ptr, glue(endian, _bswap)(v, bits));\
 }
 
 #define LD_CONVERT(bits, rtype, vtype, size)\
     LD_CONVERT_UNALIGNED(bits, rtype, vtype, size)\
+    LD_CONVERT_ALIGNED(bits, rtype, vtype, size)\
     LD_CONVERT_END(le, bits, rtype, vtype, size)\
     LD_CONVERT_END(be, bits, rtype, vtype, size)
 
 #define ST_CONVERT(bits, vtype, size)\
     ST_CONVERT_UNALIGNED(bits, vtype, size)\
+    ST_CONVERT_ALIGNED(bits, vtype, size)\
     ST_CONVERT_END(le, bits, vtype, size)\
     ST_CONVERT_END(be, bits, vtype, size)