[00/10] cputlb: track dirty tlbs and general cleanup

Message ID 20181023070253.6407-1-richard.henderson@linaro.org (mailing list archive)
Series cputlb: track dirty tlbs and general cleanup

Message

Richard Henderson Oct. 23, 2018, 7:02 a.m. UTC
The motivation here is reducing the total tlb flush overhead.

Before a few patches went into target-arm.next, I measured total
tlb flush overhead for aarch64 at 25%.  This appears to reduce the
total overhead to about 5% (I do need to re-run the control tests,
not just watch perf top as I'm doing now).

The final patch is somewhat of an RFC.  I'd like to know what
benchmark was used when putting in pending_tlb_flushes, and I
have not done any archaeology to find out.  I suspect that it
does not make any measurable difference beyond tlb_c.dirty, and I
think the code is a bit cleaner without it.
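
(For context, a rough sketch of the tlb_c.dirty filtering idea that
patches 8-9 introduce; the struct and function names below are
simplified stand-ins, not the actual QEMU code.)

#include <stddef.h>
#include <stdint.h>

/* One bit per mmu_idx whose TLB has had an entry installed since its
 * last flush.  A flush request that only covers clean TLBs can be
 * elided entirely; only the dirty subset needs the reset work.
 */
typedef struct TLBCommonSketch {
    uint16_t dirty;            /* mmu_idx bits with live entries */
    size_t full_flush_count;   /* flushes actually performed */
    size_t elide_flush_count;  /* flushes skipped because already clean */
} TLBCommonSketch;

/* Called whenever a TLB entry is installed for mmu_idx. */
static inline void tlb_set_dirty_sketch(TLBCommonSketch *c, int mmu_idx)
{
    c->dirty |= 1u << mmu_idx;
}

/* Flush only the requested TLBs that are actually dirty. */
static void tlb_flush_by_mmuidx_sketch(TLBCommonSketch *c, uint16_t asked)
{
    uint16_t to_clean = asked & c->dirty;

    if (to_clean == 0) {
        c->elide_flush_count++;   /* nothing to do, all requested TLBs clean */
        return;
    }
    c->dirty &= ~to_clean;
    c->full_flush_count++;
    /* ... reset the per-mmu_idx tables for each bit set in to_clean ... */
}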


r~


Richard Henderson (10):
  cputlb: Move tlb_lock to CPUTLBCommon
  cputlb: Remove tcg_enabled hack from tlb_flush_nocheck
  cputlb: Move cpu->pending_tlb_flush to env->tlb_c.pending_flush
  cputlb: Split large page tracking per mmu_idx
  cputlb: Move env->vtlb_index to env->tlb_d.vindex
  cputlb: Merge tlb_flush_nocheck into tlb_flush_by_mmuidx_async_work
  cputlb: Merge tlb_flush_page into tlb_flush_page_by_mmuidx
  cputlb: Count "partial" and "elided" tlb flushes
  cputlb: Filter flushes on already clean tlbs
  cputlb: Remove tlb_c.pending_flushes

 include/exec/cpu-defs.h   |  51 +++++-
 include/exec/cputlb.h     |   2 +-
 include/qom/cpu.h         |   6 -
 accel/tcg/cputlb.c        | 354 +++++++++++++++-----------------------
 accel/tcg/translate-all.c |   8 +-
 5 files changed, 184 insertions(+), 237 deletions(-)

Comments

Emilio Cota Oct. 23, 2018, 5:11 p.m. UTC | #1
On Tue, Oct 23, 2018 at 08:02:42 +0100, Richard Henderson wrote:
> The motivation here is reducing the total overhead.
> 
> Before a few patches went into target-arm.next, I measured total
> tlb flush overhead for aarch64 at 25%.  This appears to reduce the
> total overhead to about 5% (I do need to re-run the control tests,
> not just watch perf top as I'm doing now).

I'd like to see those absolute perf numbers; I ran a few Ubuntu aarch64
boots and the noise is just too high to draw any conclusions (I'm
using your tlb-dirty branch on github).

When booting the much smaller Debian image, though, these patches
are performance-neutral. So,
  Reviewed-by: Emilio G. Cota <cota@braap.org>
for the series.

(On a pedantic note: consider s/miniscule/minuscule/ in patches 6-7)

> The final patch is somewhat of an RFC.  I'd like to know what
> benchmark was used when putting in pending_tlb_flushes, and I
> have not done any archaeology to find out.  I suspect that it
> does not make any measurable difference beyond tlb_c.dirty, and I
> think the code is a bit cleaner without it.

I suspect that pending_tlb_flushes was a premature optimization.
Avoiding an async job sounds like a good idea, since it is very
expensive for the remote vCPU.
However, in most cases we take the lock (or, in the original code,
a full barrier) yet still cannot avoid the async job, because a race
when flushing other vCPUs is unlikely; we therefore waste cycles on
the lock (formerly the barrier).
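
(To make that concrete, a simplified sketch of the check-before-queue
pattern being discussed; the names are invented for this illustration
and pthreads stands in for QEMU's own locking, so this is not the
actual cputlb code.)

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Flush work already queued for a remote vCPU, one bit per mmu_idx. */
typedef struct RemoteFlushSketch {
    pthread_mutex_t lock;      /* stands in for the tlb lock */
    uint16_t pending_flush;    /* flush requests already in flight */
} RemoteFlushSketch;

/* Returns true if a new async flush job must be queued for idxmap.
 * The lock (a full barrier in the pre-lock code) is paid on every
 * call, but the early-out only helps when an identical flush is
 * already pending, which is a rare race; in the common case the job
 * is queued anyway and the check is pure overhead.
 */
static bool queue_flush_if_needed(RemoteFlushSketch *r, uint16_t idxmap)
{
    bool need_async;

    pthread_mutex_lock(&r->lock);
    need_async = (r->pending_flush & idxmap) != idxmap;
    r->pending_flush |= idxmap;
    pthread_mutex_unlock(&r->lock);

    return need_async;
}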

Thanks,

		Emilio