
[RFC] qht: Align sequence lock to cache line

Message ID: 20161025205558.GB22860@flamenco
State New, archived

Commit Message

Emilio Cota Oct. 25, 2016, 8:55 p.m. UTC
On Tue, Oct 25, 2016 at 11:35:06 -0400, Pranith Kumar wrote:
> Using perf, I see that sequence lock is being a bottleneck since it is
> being read by everyone. Giving it its own cache-line seems to help
> things quite a bit.
> 
> Using qht-bench, I measured the following for:
> 
> $ ./tests/qht-bench -d 10 -n 24 -u <x>
> 
> update %   base throughput   patch throughput   %change
> 0          8.07              13.33              +65%
> 10         7.10              8.90               +25%
> 20         6.34              7.02               +10%
> 30         5.48              6.11               +9.6%
> 40         4.90              5.46               +11.42%
> 
> I am not able to see any significant increases for lower thread counts though.

Honestly I don't know what you're measuring here.

Your results are low (I assume you're reporting per-thread throughput,
but even then they're low), and it makes no sense that increasing the
cache-line footprint would make a *read-only* workload (0% updates) run faster.
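
For context, a 0%-update run never stores to b->sequence at all; the
read side only loads it. A simplified sketch of the reader, adapted
from qht_lookup__slowpath() in util/qht.c (b, func, userp and hash as
in the real code):

    unsigned int version;
    void *ret;

    do {
        /* plain load of the sequence counter -- nothing is dirtied */
        version = seqlock_read_begin(&b->sequence);
        ret = qht_do_lookup(b, func, userp, hash);
    } while (seqlock_read_retry(&b->sequence, version));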

More below.

> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  include/qemu/seqlock.h | 2 +-
>  util/qht.c             | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/qemu/seqlock.h b/include/qemu/seqlock.h
> index 8dee11d..954abe8 100644
> --- a/include/qemu/seqlock.h
> +++ b/include/qemu/seqlock.h
> @@ -21,7 +21,7 @@ typedef struct QemuSeqLock QemuSeqLock;
>  
>  struct QemuSeqLock {
>      unsigned sequence;
> -};
> +} QEMU_ALIGNED(64);
>  
>  static inline void seqlock_init(QemuSeqLock *sl)
>  {
> diff --git a/util/qht.c b/util/qht.c
> index ff4d2e6..4d82609 100644
> --- a/util/qht.c
> +++ b/util/qht.c
> @@ -101,14 +101,14 @@
>   * be grabbed first.
>   */
>  struct qht_bucket {
> -    QemuSpin lock;
>      QemuSeqLock sequence;
> +    QemuSpin lock;
>      uint32_t hashes[QHT_BUCKET_ENTRIES];
>      void *pointers[QHT_BUCKET_ENTRIES];
>      struct qht_bucket *next;
>  } QEMU_ALIGNED(QHT_BUCKET_ALIGN);

I understand this is a hack but this would have been more localized:

diff --git a/util/qht.c b/util/qht.c
index ff4d2e6..55db907 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -101,14 +101,16 @@
  * be grabbed first.
  */
 struct qht_bucket {
+    struct {
+        QemuSeqLock sequence;
+    } QEMU_ALIGNED(QHT_BUCKET_ALIGN);
     QemuSpin lock;
-    QemuSeqLock sequence;
     uint32_t hashes[QHT_BUCKET_ENTRIES];
     void *pointers[QHT_BUCKET_ENTRIES];
     struct qht_bucket *next;
 } QEMU_ALIGNED(QHT_BUCKET_ALIGN);

So I tested my change above vs. master on a 16-core (32-way) Intel machine
(Xeon E5-2690 @ 2.90GHz with turbo-boost disabled), making sure threads are
scheduled on separate cores, favouring same-socket ones.
Results: http://imgur.com/a/c4dTB

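(For reference, that kind of pinning can be done with
pthread_setaffinity_np(); a minimal Linux-only sketch, not the exact
harness used for these numbers:)

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling thread to logical CPU @cpu. */
    static int pin_self_to_cpu(int cpu)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }
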
So really I don't know what you're measuring.
The idea of decoupling the seqlock from the spinlock's cache line doesn't
make sense to me, because:
- Bucket lock holders are very likely to update the seqlock, so it makes sense
  to have them in the same cache line (exceptions to this are resizes or
  traversals, but those are very rare and we're not measuring those in
  qht-bench); see the write-path sketch after this list.
- Thanks to resizing + good hashing, bucket chains are very short, so
  a single seqlock per bucket is all we need.
- We can have *many* buckets (200K is not crazy for the TB htable), so
  anything that increases their size needs very good justification (see
  200K results above).
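
To put the first and last points in numbers: with QHT_BUCKET_ENTRIES == 4
on a 64-bit host, a bucket is lock (4) + sequence (4) + hashes (16) +
pointers (32) + next (8) = 64 bytes, i.e. exactly one cache line, which
util/qht.c asserts at build time:

    QEMU_BUILD_BUG_ON(sizeof(struct qht_bucket) > QHT_BUCKET_ALIGN);

Padding the seqlock out to its own line doubles that to 128 bytes --
roughly 12.8MB of extra bucket memory at 200K buckets. As for the write
path, a simplified sketch of qht_insert__locked() in util/qht.c (i and p
are placeholders for the free slot and the inserted pointer):

    qemu_spin_lock(&b->lock);
    seqlock_write_begin(&b->sequence);   /* stores to the same line... */
    b->hashes[i] = hash;
    atomic_set(&b->pointers[i], p);      /* ...the lock already dirtied */
    seqlock_write_end(&b->sequence);
    qemu_spin_unlock(&b->lock);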

		E.