[v2,0/5] migration: improve multithreads

Message ID	20181106122025.3487-1-xiaoguangrong@tencent.com (mailing list archive)
Headers	show Return-Path: <kvm-owner@kernel.org> From: guangrong.xiao@gmail.com To: pbonzini@redhat.com, mst@redhat.com, mtosatti@redhat.com Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, dgilbert@redhat.com, peterx@redhat.com, wei.w.wang@intel.com, jiang.biao2@zte.com.cn, eblake@redhat.com, quintela@redhat.com, cota@braap.org, Xiao Guangrong <xiaoguangrong@tencent.com> Subject: [PATCH v2 0/5] migration: improve multithreads Date: Tue, 6 Nov 2018 20:20:20 +0800 Message-Id: <20181106122025.3487-1-xiaoguangrong@tencent.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk
Series	migration: improve multithreads \| expand [v2,0/5] migration: improve multithreads [v2,1/5] bitops: introduce change_bit_atomic [v2,2/5] util: introduce threaded workqueue [v2,3/5] migration: use threaded workqueue for compression [v2,4/5] migration: use threaded workqueue for decompression [v2,5/5] tests: add threaded-workqueue-bench

Message ID

20181106122025.3487-1-xiaoguangrong@tencent.com (mailing list archive)

Headers

From: guangrong.xiao@gmail.com
To: pbonzini@redhat.com, mst@redhat.com, mtosatti@redhat.com
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, dgilbert@redhat.com,
        peterx@redhat.com, wei.w.wang@intel.com, jiang.biao2@zte.com.cn,
        eblake@redhat.com, quintela@redhat.com, cota@braap.org,
        Xiao Guangrong <xiaoguangrong@tencent.com>
Subject: [PATCH v2 0/5] migration: improve multithreads
Date: Tue,  6 Nov 2018 20:20:20 +0800
Message-Id: <20181106122025.3487-1-xiaoguangrong@tencent.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk

Series

migration: improve multithreads | expand

Message

Xiao Guangrong Nov. 6, 2018, 12:20 p.m. UTC

From: Xiao Guangrong <xiaoguangrong@tencent.com>

Changelog in v2:
These changes are based on Paolo's suggestion:
1) rename the lockless multithreads model to threaded workqueue
2) hugely improve the internal design, that make all the request be
   a large array, properly partition it, assign requests to threads
   respectively and use bitmaps to sync up threads and the submitter,
   after that ptr_ring and spinlock are dropped
3) introduce event wait for the submitter

These changes are based on Emilio's review:
4) make more detailed description for threaded workqueue
5) add a benchmark for threaded workqueue

The previous version can be found at
	https://marc.info/?l=kvm&m=153968821910007&w=2

There's the simple performance measurement comparing these two versions,
the environment is the same as we listed in the previous version.

Use 8 threads to compress the data in the source QEMU
- with compress-wait-thread = off


      total time        busy-ratio
--------------------------------------------------
v1    125066            0.38
v2    120444            0.35

- with compress-wait-thread = on
         total time    busy-ratio
--------------------------------------------------
v1    164426            0
v2    142609            0

The v2 win slightly.

Xiao Guangrong (5):
  bitops: introduce change_bit_atomic
  util: introduce threaded workqueue
  migration: use threaded workqueue for compression
  migration: use threaded workqueue for decompression
  tests: add threaded-workqueue-bench

 include/qemu/bitops.h             |  13 +
 include/qemu/threaded-workqueue.h |  94 +++++++
 migration/ram.c                   | 538 ++++++++++++++------------------------
 tests/Makefile.include            |   5 +-
 tests/threaded-workqueue-bench.c  | 256 ++++++++++++++++++
 util/Makefile.objs                |   1 +
 util/threaded-workqueue.c         | 466 +++++++++++++++++++++++++++++++++
 7 files changed, 1030 insertions(+), 343 deletions(-)
 create mode 100644 include/qemu/threaded-workqueue.h
 create mode 100644 tests/threaded-workqueue-bench.c
 create mode 100644 util/threaded-workqueue.c

Comments

no-reply@patchew.org Nov. 7, 2018, 2:52 a.m. UTC | #1

Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20181106122025.3487-1-xiaoguangrong@tencent.com
Subject: [Qemu-devel] [PATCH v2 0/5] migration: improve multithreads

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
Switched to a new branch 'test'
64aabd11be tests: add threaded-workqueue-bench
e9efc70edd migration: use threaded workqueue for decompression
7753a846ad migration: use threaded workqueue for compression
f6933b5132 util: introduce threaded workqueue
810caaa566 bitops: introduce change_bit_atomic

=== OUTPUT BEGIN ===
Checking PATCH 1/5: bitops: introduce change_bit_atomic...
Checking PATCH 2/5: util: introduce threaded workqueue...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#41: 
new file mode 100644

total: 0 errors, 1 warnings, 566 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 3/5: migration: use threaded workqueue for compression...
Checking PATCH 4/5: migration: use threaded workqueue for decompression...
Checking PATCH 5/5: tests: add threaded-workqueue-bench...
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#36: 
new file mode 100644

WARNING: line over 80 characters
#237: FILE: tests/threaded-workqueue-bench.c:197:
+    printf("   -r:       the number of requests handled by each thread (default %d).\n",

WARNING: line over 80 characters
#239: FILE: tests/threaded-workqueue-bench.c:199:
+    printf("   -m:       the size of the memory (G) used to test (default %dG).\n",

ERROR: line over 90 characters
#285: FILE: tests/threaded-workqueue-bench.c:245:
+    printf("Run the benchmark: threads %d requests-per-thread: %d memory %ldG repeat %d.\n",

total: 1 errors, 3 warnings, 273 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

Xiao Guangrong Nov. 12, 2018, 3:07 a.m. UTC | #2

Hi,

Ping...

On 11/6/18 8:20 PM, guangrong.xiao@gmail.com wrote:
> From: Xiao Guangrong <xiaoguangrong@tencent.com>
> 
> Changelog in v2:
> These changes are based on Paolo's suggestion:
> 1) rename the lockless multithreads model to threaded workqueue
> 2) hugely improve the internal design, that make all the request be
>     a large array, properly partition it, assign requests to threads
>     respectively and use bitmaps to sync up threads and the submitter,
>     after that ptr_ring and spinlock are dropped
> 3) introduce event wait for the submitter
> 
> These changes are based on Emilio's review:
> 4) make more detailed description for threaded workqueue
> 5) add a benchmark for threaded workqueue
> 
> The previous version can be found at
> 	https://marc.info/?l=kvm&m=153968821910007&w=2
> 
> There's the simple performance measurement comparing these two versions,
> the environment is the same as we listed in the previous version.
> 
> Use 8 threads to compress the data in the source QEMU
> - with compress-wait-thread = off
> 
> 
>        total time        busy-ratio
> --------------------------------------------------
> v1    125066            0.38
> v2    120444            0.35
> 
> - with compress-wait-thread = on
>           total time    busy-ratio
> --------------------------------------------------
> v1    164426            0
> v2    142609            0
> 
> The v2 win slightly.
> 
> Xiao Guangrong (5):
>    bitops: introduce change_bit_atomic
>    util: introduce threaded workqueue
>    migration: use threaded workqueue for compression
>    migration: use threaded workqueue for decompression
>    tests: add threaded-workqueue-bench
> 
>   include/qemu/bitops.h             |  13 +
>   include/qemu/threaded-workqueue.h |  94 +++++++
>   migration/ram.c                   | 538 ++++++++++++++------------------------
>   tests/Makefile.include            |   5 +-
>   tests/threaded-workqueue-bench.c  | 256 ++++++++++++++++++
>   util/Makefile.objs                |   1 +
>   util/threaded-workqueue.c         | 466 +++++++++++++++++++++++++++++++++
>   7 files changed, 1030 insertions(+), 343 deletions(-)
>   create mode 100644 include/qemu/threaded-workqueue.h
>   create mode 100644 tests/threaded-workqueue-bench.c
>   create mode 100644 util/threaded-workqueue.c
>

Paolo Bonzini Nov. 20, 2018, 6:27 p.m. UTC | #3

On 12/11/18 04:07, Xiao Guangrong wrote:
> 
> Hi,
> 
> Ping...

Hi Guangrong, I think this isn't being reviewed because we're in freeze.

Paolo

> On 11/6/18 8:20 PM, guangrong.xiao@gmail.com wrote:
>> From: Xiao Guangrong <xiaoguangrong@tencent.com>
>>
>> Changelog in v2:
>> These changes are based on Paolo's suggestion:
>> 1) rename the lockless multithreads model to threaded workqueue
>> 2) hugely improve the internal design, that make all the request be
>>     a large array, properly partition it, assign requests to threads
>>     respectively and use bitmaps to sync up threads and the submitter,
>>     after that ptr_ring and spinlock are dropped
>> 3) introduce event wait for the submitter
>>
>> These changes are based on Emilio's review:
>> 4) make more detailed description for threaded workqueue
>> 5) add a benchmark for threaded workqueue
>>
>> The previous version can be found at
>>     https://marc.info/?l=kvm&m=153968821910007&w=2
>>
>> There's the simple performance measurement comparing these two versions,
>> the environment is the same as we listed in the previous version.
>>
>> Use 8 threads to compress the data in the source QEMU
>> - with compress-wait-thread = off
>>
>>
>>        total time        busy-ratio
>> --------------------------------------------------
>> v1    125066            0.38
>> v2    120444            0.35
>>
>> - with compress-wait-thread = on
>>           total time    busy-ratio
>> --------------------------------------------------
>> v1    164426            0
>> v2    142609            0
>>
>> The v2 win slightly.
>>
>> Xiao Guangrong (5):
>>    bitops: introduce change_bit_atomic
>>    util: introduce threaded workqueue
>>    migration: use threaded workqueue for compression
>>    migration: use threaded workqueue for decompression
>>    tests: add threaded-workqueue-bench
>>
>>   include/qemu/bitops.h             |  13 +
>>   include/qemu/threaded-workqueue.h |  94 +++++++
>>   migration/ram.c                   | 538
>> ++++++++++++++------------------------
>>   tests/Makefile.include            |   5 +-
>>   tests/threaded-workqueue-bench.c  | 256 ++++++++++++++++++
>>   util/Makefile.objs                |   1 +
>>   util/threaded-workqueue.c         | 466
>> +++++++++++++++++++++++++++++++++
>>   7 files changed, 1030 insertions(+), 343 deletions(-)
>>   create mode 100644 include/qemu/threaded-workqueue.h
>>   create mode 100644 tests/threaded-workqueue-bench.c
>>   create mode 100644 util/threaded-workqueue.c
>>