diff mbox

[0/7] Improve buffer_is_zero

Message ID CALicx6vZiE7mDabb5i+nrOK3J+LBQMMJbCiLY1ORvRwyEN6O-Q@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Vijay Kilari Aug. 25, 2016, 8:04 a.m. UTC
On Thu, Aug 25, 2016 at 12:07 PM, Vijay Kilari <vijay.kilari@gmail.com> wrote:
> Hi Richard,
>
>   Migration fails on arm64 with these patches.
> On the destination VM, follow errors are appearing.
>
> qemu-system-aarch64: VQ 0 size 0x400 Guest index 0x0 inconsistent with
> Host index 0x1937: delta 0xe6c9
> qemu-system-aarch64: error while loading state for instance 0x0 of
> device 'virtio-mmio@000000000a003e00/virtio-net'
> qemu-system-aarch64: load of migration failed: Operation not permitted
> qemu-system-aarch64: network script /etc/qemu-ifdown failed with status 256

With below changes, migration is working fine on arm64.

         VECTYPE t;                                              \
@@ -185,7 +186,7 @@ NAME(const void *buf, size_t len)
             \
         } else {                                                \
             link_error();                                       \
         }                                                       \
-        if (unlikely(!ZERO(t))) {                               \
+        if (unlikely(!ZERO(t, zero))) {                         \
             return false;                                       \
         }                                                       \
         buf += SIZE;                                            \
@@ -227,7 +228,7 @@ buffer_zero_base(const void *buf, size_t len)
     return true;
 }
-#define IDENT_ZERO(X)  (X)
+#define IDENT_ZERO(X1, X2)  (X1 == X2)
 ACCEL_BUFFER_ZERO(buffer_zero_int, 4*sizeof(long), long, IDENT_ZERO)

 static bool select_accel_int(const void *buf, size_t len)
@@ -511,7 +512,9 @@ static bool select_accel_fn(const void *buf, size_t len)
 #elif defined(__aarch64__)
 #include "arm_neon.h"

-#define DO_ZERO(X)  (vgetq_lane_u64((X), 0) | vgetq_lane_u64((X), 1))
+#define DO_ZERO(X1, X2) \
+        ((vgetq_lane_u64(X1, 0) == vgetq_lane_u64(X2, 0)) && \
+         (vgetq_lane_u64(X1, 1) == vgetq_lane_u64(X2, 1)))
 ACCEL_BUFFER_ZERO(buffer_zero_neon_64, 64, uint64x2_t, DO_ZERO)
 ACCEL_BUFFER_ZERO(buffer_zero_neon_128, 128, uint64x2_t, DO_ZERO)

@@ -526,7 +529,7 @@ static void __attribute__((constructor))
init_buffer_zero_accel(void)
        since the later is not available to userspace.  This seems
        to work in practice for existing implementations.  */
     asm("mrs %0, dczid_el0" : "=r"(t));
-    if ((t & 15) * 16 >= 128) {
+    if (pow(2, (t & 0xf)) * 4 >= 128) {
         buffer_zero_line_mask = 128 - 1;
         buffer_zero_accel = buffer_zero_neon_128;
     } else {


>
> Regards
> Vijay
>
>
> On Wed, Aug 24, 2016 at 9:47 AM, Richard Henderson <rth@twiddle.net> wrote:
>> Patches 1-3 remove the use of ifunc from the implementation.
>>
>> Patch 5 adjusts the x86 implementation a bit more to take
>> advantage of ptest (in sse4.1) and unaligned accesses (in avx1).
>>
>> Patches 2 and 6 are the result of my conversation with Vijaya
>> Kumar with respect to ThunderX.
>>
>> Patch 7 is the result of seeing some really really horrible code
>> produced for ppc64le (gcc 4.9 and mainline).
>>
>> This has had limited testing.  What I don't know is the best way
>> to benchmark this -- the only way I know to trigger this is via
>> the console, by hand, which doesn't make for reasonable timing.
>>
>>
>> r~
>>
>>
>> Richard Henderson (7):
>>   cutils: Remove SPLAT macro
>>   cutils: Export only buffer_is_zero
>>   cutils: Rearrange buffer_is_zero acceleration
>>   cutils: Add generic prefetch
>>   cutils: Rewrite x86 buffer zero checking
>>   cutils: Rewrite aarch64 buffer zero checking
>>   cutils: Rewrite ppc buffer zero checking
>>
>>  configure             |  21 +-
>>  include/qemu/cutils.h |   2 -
>>  migration/ram.c       |   2 +-
>>  migration/rdma.c      |   5 +-
>>  util/cutils.c         | 526 +++++++++++++++++++++++++++++++++-----------------
>>  5 files changed, 352 insertions(+), 204 deletions(-)
>>
>> --
>> 2.7.4
>>
diff mbox

Patch

diff --git a/util/cutils.c b/util/cutils.c
index 30fac02..9bbf31f 100644
--- a/util/cutils.c
+++ b/util/cutils.c
@@ -170,6 +170,7 @@  static bool __attribute__((noinline))
             \
 NAME(const void *buf, size_t len)                               \
 {                                                               \
     const void *end = buf + len;                                \
+    const VECTYPE zero = (VECTYPE){0};                          \
     do {                                                        \
         const VECTYPE *p = buf;                                 \