mbox series

[v3,0/6] Optimize buffer_is_zero

Message ID 20240206204809.9859-1-amonakov@ispras.ru (mailing list archive)
Headers show
Series Optimize buffer_is_zero | expand

Message

Alexander Monakov Feb. 6, 2024, 8:48 p.m. UTC
I am posting a new revision of buffer_is_zero improvements (v2 can be found at
https://patchew.org/QEMU/20231027143704.7060-1-mmromanov@ispras.ru/ ).

In our experiments buffer_is_zero took about 40%-50% of overall qemu-img run
time, even though Glib I/O is not very efficient. Hence, it remains an important
routine to optimize.

We substantially improve its performance in typical cases, mostly by introducing
an inline wrapper that samples three bytes from head/middle/tail, avoid call
overhead when any of those is non-zero. We also provide improvements for SIMD
and portable scalar variants.

Changed for v3:

- separate into 6 patches
- fix an oversight which would break the build on non-x86 hosts
- properly avoid out-of-bounds pointers in the scalar variant

Alexander Monakov (6):
  util/bufferiszero: remove SSE4.1 variant
  util/bufferiszero: introduce an inline wrapper
  util/bufferiszero: remove AVX512 variant
  util/bufferiszero: remove useless prefetches
  util/bufferiszero: optimize SSE2 and AVX2 variants
  util/bufferiszero: improve scalar variant

 include/qemu/cutils.h |  28 ++++-
 util/bufferiszero.c   | 280 +++++++++++++++---------------------------
 2 files changed, 128 insertions(+), 180 deletions(-)