
[RFC] qht: Align sequence lock to cache line

Message ID: 20161025205558.GB22860@flamenco
State New, archived

Commit Message

Emilio Cota Oct. 25, 2016, 8:55 p.m. UTC
On Tue, Oct 25, 2016 at 11:35:06 -0400, Pranith Kumar wrote:
> Using perf, I see that sequence lock is being a bottleneck since it is
> being read by everyone. Giving it its own cache-line seems to help
> things quite a bit.
> 
> Using qht-bench, I measured the following for:
> 
> $ ./tests/qht-bench -d 10 -n 24 -u <x>
> 
> update %   base throughput   patch throughput   %change
> 0          8.07              13.33              +65%
> 10         7.10              8.90               +25%
> 20         6.34              7.02               +10%
> 30         5.48              6.11               +9.6%
> 40         4.90              5.46               +11.42%
> 
> I am not able to see any significant increases for lower thread counts though.

Honestly I don't know what you're measuring here.

Your results are low (I assume you're reporting per-thread throughput,
but even then they're low), and it makes no sense that increasing the
cache-line footprint would make a *read-only* workload (0% updates) run faster.
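
For context, a 0%-update run never stores to b->sequence at all; the
read side only loads it. A simplified sketch of the reader, adapted
from qht_lookup__slowpath() in util/qht.c (b, func, userp and hash as
in the real code):

    unsigned int version;
    void *ret;

    do {
        /* plain load of the sequence counter -- nothing is dirtied */
        version = seqlock_read_begin(&b->sequence);
        ret = qht_do_lookup(b, func, userp, hash);
    } while (seqlock_read_retry(&b->sequence, version));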

More below.

> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  include/qemu/seqlock.h | 2 +-
>  util/qht.c             | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
> index 8dee11d..954abe8 100644
> --- a/include/qemu/seqlock.h
> +++ b/include/qemu/seqlock.h
> @@ -21,7 +21,7 @@ typedef struct QemuSeqLock QemuSeqLock;
>  
>  struct QemuSeqLock {
>      unsigned sequence;
> -};
> +} QEMU_ALIGNED(64);
>  
>  static inline void seqlock_init(QemuSeqLock *sl)
>  {
> diff --git a/util/qht.c b/util/qht.c
> index ff4d2e6..4d82609 100644
> --- a/util/qht.c
> +++ b/util/qht.c
> @@ -101,14 +101,14 @@
>   * be grabbed first.
>   */
>  struct qht_bucket {
> -    QemuSpin lock;
>      QemuSeqLock sequence;
> +    QemuSpin lock;
>      uint32_t hashes[QHT_BUCKET_ENTRIES];
>      void *pointers[QHT_BUCKET_ENTRIES];
>      struct qht_bucket *next;
>  } QEMU_ALIGNED(QHT_BUCKET_ALIGN);

I understand this is a hack but this would have been more localized:

diff --git a/util/qht.c b/util/qht.c
index ff4d2e6..55db907 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -101,14 +101,16 @@
  * be grabbed first.
  */
 struct qht_bucket {
+    struct {
+        QemuSeqLock sequence;
+    } QEMU_ALIGNED(QHT_BUCKET_ALIGN);
     QemuSpin lock;
-    QemuSeqLock sequence;
     uint32_t hashes[QHT_BUCKET_ENTRIES];
     void *pointers[QHT_BUCKET_ENTRIES];
     struct qht_bucket *next;
 } QEMU_ALIGNED(QHT_BUCKET_ALIGN);

So I tested my change above vs. master on a 16-core (32-way) Intel machine
(Xeon E5-2690 @ 2.90GHz with turbo-boost disabled), making sure threads are
scheduled on separate cores, favouring same-socket ones.
Results: http://imgur.com/a/c4dTB

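(For reference, that kind of pinning can be done with
pthread_setaffinity_np(); a minimal Linux-only sketch, not the exact
harness used for these numbers:)

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling thread to logical CPU @cpu. */
    static int pin_self_to_cpu(int cpu)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }
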
So really I don't know what you're measuring.
The idea of decoupling the seqlock from the spinlock's cache line doesn't
make sense to me, because:
- Bucket lock holders are very likely to update the seqlock, so it makes sense
  to have them in the same cache line (exceptions to this are resizes or
  traversals, but those are very rare and we're not measuring those in
  qht-bench); see the write-path sketch after this list.
- Thanks to resizing + good hashing, bucket chains are very short, so
  a single seqlock per bucket is all we need.
- We can have *many* buckets (200K is not crazy for the TB htable), so
  anything that increases their size needs very good justification (see
  200K results above).
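
To put the first and last points in numbers: with QHT_BUCKET_ENTRIES == 4
on a 64-bit host, a bucket is lock (4) + sequence (4) + hashes (16) +
pointers (32) + next (8) = 64 bytes, i.e. exactly one cache line, which
util/qht.c asserts at build time:

    QEMU_BUILD_BUG_ON(sizeof(struct qht_bucket) > QHT_BUCKET_ALIGN);

Padding the seqlock out to its own line doubles that to 128 bytes --
roughly 12.8MB of extra bucket memory at 200K buckets. As for the write
path, a simplified sketch of qht_insert__locked() in util/qht.c (i and p
are placeholders for the free slot and the inserted pointer):

    qemu_spin_lock(&b->lock);
    seqlock_write_begin(&b->sequence);   /* stores to the same line... */
    b->hashes[i] = hash;
    atomic_set(&b->pointers[i], p);      /* ...the lock already dirtied */
    seqlock_write_end(&b->sequence);
    qemu_spin_unlock(&b->lock);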

		E.