[v2,0/2] KVM: arm64: Optimize the wait for the completion of the VPT analysis


Message ID

20201128141857.983-1-lushenming@huawei.com (mailing list archive)

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EC55524684
From: Shenming Lu <lushenming@huawei.com>
To: Marc Zyngier <maz@kernel.org>, Thomas Gleixner <tglx@linutronix.de>,
 "Jason Cooper" <jason@lakedaemon.net>, <linux-kernel@vger.kernel.org>,
 <linux-arm-kernel@lists.infradead.org>, <kvmarm@lists.cs.columbia.edu>,
 <kvm@vger.kernel.org>, James Morse <james.morse@arm.com>, Julien Thierry
 <julien.thierry.kdev@gmail.com>, Suzuki K Poulose <suzuki.poulose@arm.com>,
 Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>,
 Eric Auger <eric.auger@redhat.com>, Christoffer Dall
 <christoffer.dall@arm.com>
Subject: [PATCH v2 0/2] KVM: arm64: Optimize the wait for the completion of
 the VPT analysis
Date: Sat, 28 Nov 2020 22:18:55 +0800
Message-ID: <20201128141857.983-1-lushenming@huawei.com>
MIME-Version: 1.0
Precedence: list
Cc: yuzenghui@huawei.com, wanghaibin.wang@huawei.com, lushenming@huawei.com
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: 
 linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

Series

KVM: arm64: Optimize the wait for the completion of the VPT analysis

Message

Shenming Lu Nov. 28, 2020, 2:18 p.m. UTC

Right after a vPE is made resident, the code starts polling the
GICR_VPENDBASER.Dirty bit until it becomes 0, with delay_us set
to 10. But in our measurements, it takes only hundreds of
nanoseconds, or 1~2 microseconds, to finish parsing the VPT in most
cases. What's more, we found that the MMIO delay on a GICv4.1 system
(HiSilicon) is about 10 times higher than that on a GICv4.0 system in
kvm-unit-tests (the specific data is as follows).

                        |   GICv4.1 emulator   |  GICv4.0 emulator
mmio_read_user (ns)     |        12811         |        1598

After analysis, this is mainly caused by the delay_us of 10, so it might
really hurt performance.
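
To illustrate why the polling interval dominates here, the following is a
minimal sketch and not the kernel code. It assumes a simplified model in
which the poller checks the Dirty bit at t = 0 and then once per interval,
and it ignores the cost of the MMIO read itself. The name poll_observed_ns
and the nanosecond units are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not kernel code: the poller checks the bit at t = 0
 * and then once every delay_ns. Returns the time at which a check first
 * sees the bit clear, given that the hardware clears it at ready_ns.
 * The cost of the MMIO read itself is ignored.
 */
static uint64_t poll_observed_ns(uint64_t ready_ns, uint64_t delay_ns)
{
	assert(delay_ns > 0);
	/* Round ready_ns up to the next multiple of delay_ns. */
	return (ready_ns + delay_ns - 1) / delay_ns * delay_ns;
}
```

Under this model, a parse that finishes after 1500 ns is only noticed at
10000 ns with a 10 us interval, but at 2000 ns with a 1 us interval.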

To avoid this, we can set delay_us to 1, which is more appropriate
in this situation and generally applicable. Besides, we can delay the
execution of the polling, giving the GIC a chance to work in parallel
with the CPU on the entry path.
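
The benefit of deferring the poll can be sketched with a small model. This
is an illustration under stated assumptions, not the patch itself: the
parse is assumed to start at t = 0, the CPU is assumed to have a fixed
amount of unrelated entry work, and the names poll_wait_ns and
entry_total_ns are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Time until a poller checking at t = 0 and every delay_ns sees completion. */
static uint64_t poll_wait_ns(uint64_t remaining_ns, uint64_t delay_ns)
{
	assert(delay_ns > 0);
	return (remaining_ns + delay_ns - 1) / delay_ns * delay_ns;
}

/*
 * Hypothetical model: the VPT parse starts at t = 0 and takes parse_ns;
 * the CPU also has work_ns of unrelated entry work to do.
 * defer == 0: poll first, then do the work.
 * defer != 0: do the work first, then poll for whatever is left.
 */
static uint64_t entry_total_ns(uint64_t parse_ns, uint64_t work_ns,
			       uint64_t delay_ns, int defer)
{
	uint64_t left;

	if (!defer)
		return poll_wait_ns(parse_ns, delay_ns) + work_ns;

	left = parse_ns > work_ns ? parse_ns - work_ns : 0;
	return work_ns + poll_wait_ns(left, delay_ns);
}
```

With a 1500 ns parse, 2000 ns of other work and a 1 us interval, the model
gives 4000 ns when polling immediately and 2000 ns when the poll is
deferred, because the parse has already finished by the time the CPU looks.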

Shenming Lu (2):
  irqchip/gic-v4.1: Reduce the delay time of the poll on the
    GICR_VPENDBASER.Dirty bit
  KVM: arm64: Delay the execution of the polling on the
    GICR_VPENDBASER.Dirty bit

 arch/arm64/kvm/vgic/vgic-v4.c      | 16 ++++++++++++++++
 arch/arm64/kvm/vgic/vgic.c         |  3 +++
 drivers/irqchip/irq-gic-v3-its.c   | 18 +++++++++++++-----
 drivers/irqchip/irq-gic-v4.c       | 11 +++++++++++
 include/kvm/arm_vgic.h             |  3 +++
 include/linux/irqchip/arm-gic-v4.h |  4 ++++
 6 files changed, 50 insertions(+), 5 deletions(-)

Comments

Marc Zyngier Dec. 11, 2020, 3:01 p.m. UTC | #1

On Sat, 28 Nov 2020 22:18:55 +0800, Shenming Lu wrote:
> Right after a vPE is made resident, the code starts polling the
> GICR_VPENDBASER.Dirty bit until it becomes 0, where the delay_us
> is set to 10. But in our measurement, it takes only hundreds of
> nanoseconds, or 1~2 microseconds, to finish parsing the VPT in most
> cases. What's more, we found that the MMIO delay on GICv4.1 system
> (HiSilicon) is about 10 times higher than that on GICv4.0 system in
> kvm-unit-tests (the specific data is as follows).
> 
> [...]

Applied to irq/irqchip-next, thanks!

[1/2] irqchip/gic-v4.1: Reduce the delay time of the poll on the GICR_VPENDBASER.Dirty bit
      commit: 0b39498230ae53e6af981141be99f4c7d5144de6

Patch 2 will be routed via the KVM/arm64 tree.

Cheers,

	M.