[0/8] KVM: arm/arm64: vgic: ITS translation cache


Message ID

20190606165455.162478-1-marc.zyngier@arm.com

Headers

From: Marc Zyngier <marc.zyngier@arm.com>
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu,
        kvm@vger.kernel.org
Cc: Julien Thierry <julien.thierry@arm.com>,
        James Morse <james.morse@arm.com>,
        Suzuki K Poulose <suzuki.poulose@arm.com>,
        Christoffer Dall <christoffer.dall@arm.com>,
        Eric Auger <eric.auger@redhat.com>,
        Zenghui Yu <yuzenghui@huawei.com>,
        "Raslan, KarimAllah" <karahmed@amazon.de>
Subject: [PATCH 0/8] KVM: arm/arm64: vgic: ITS translation cache
Date: Thu,  6 Jun 2019 17:54:47 +0100
Message-Id: <20190606165455.162478-1-marc.zyngier@arm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: kvm-owner@vger.kernel.org
Precedence: bulk

Series

KVM: arm/arm64: vgic: ITS translation cache

Message

Marc Zyngier June 6, 2019, 4:54 p.m. UTC

It recently became apparent[1] that our LPI injection path is not as
efficient as it could be when injecting interrupts coming from a VFIO
assigned device.

Although the proposed patch wasn't 100% correct, it outlined at least
two issues:

(1) Injecting an LPI from VFIO always results in a context switch to a
    worker thread: no good

(2) We have no way of amortising the cost of translating a DID+EID pair
    to an LPI number

The reason for (1) is that we may sleep when translating an LPI, so we
need to be in process context. One way to fix that is to implement a
small LPI translation cache that can be looked up from atomic context.
It would also solve (2).
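To illustrate the split between an atomic fast path and a sleeping slow path, here is a minimal user-space sketch. All names (`set_irq_inatomic`, `translate_slow`, `inject_msi`) and the DevID/EventID-to-LPI mapping are hypothetical stand-ins, not code from this series. The lookup only takes a non-sleeping spinlock (a C11 `atomic_flag` here) and returns `-EWOULDBLOCK` on a miss, which is the value KVM's irqfd code checks before falling back to its injection worker.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define XLATE_SLOTS 8

/* Hypothetical cache entry: (DevID, EventID) -> LPI INTID. */
struct xlate_entry {
    bool     valid;
    uint32_t devid, eventid, intid;
};

static struct xlate_entry xlate[XLATE_SLOTS];
static atomic_flag xlate_lock = ATOMIC_FLAG_INIT;   /* spinlock: never sleeps */
static unsigned fast_hits, slow_calls;

static void xlate_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&xlate_lock, memory_order_acquire))
        ;   /* busy-wait instead of sleeping */
}

static void xlate_lock_release(void)
{
    atomic_flag_clear_explicit(&xlate_lock, memory_order_release);
}

/* Fast path: usable in atomic context because it only searches the cache. */
static int set_irq_inatomic(uint32_t devid, uint32_t eventid, uint32_t *intid)
{
    int ret = -EWOULDBLOCK;   /* miss: caller must defer to process context */

    xlate_lock_acquire();
    for (int i = 0; i < XLATE_SLOTS; i++) {
        if (xlate[i].valid && xlate[i].devid == devid &&
            xlate[i].eventid == eventid) {
            *intid = xlate[i].intid;
            ret = 0;
            break;
        }
    }
    xlate_lock_release();
    return ret;
}

/* Slow-path stand-in: a real ITS walks guest tables here and may sleep. */
static int translate_slow(uint32_t devid, uint32_t eventid, uint32_t *intid)
{
    *intid = 8192 + devid * 32 + eventid;   /* made-up mapping for illustration */

    /* Cache the result in the first free slot; nothing is cached once full. */
    xlate_lock_acquire();
    for (int i = 0; i < XLATE_SLOTS; i++) {
        if (!xlate[i].valid) {
            xlate[i] = (struct xlate_entry){ true, devid, eventid, *intid };
            break;
        }
    }
    xlate_lock_release();
    return 0;
}

static int inject_msi(uint32_t devid, uint32_t eventid, uint32_t *intid)
{
    if (set_irq_inatomic(devid, eventid, intid) == 0) {
        fast_hits++;
        return 0;
    }
    slow_calls++;   /* real code would schedule a worker instead */
    return translate_slow(devid, eventid, intid);
}
```

The first `inject_msi(3, 5, &intid)` misses and goes through `translate_slow()`; a second call with the same pair is served entirely from the fast path. Eviction is deliberately left out of this sketch.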

This is what this small series proposes. It implements a very basic
LRU cache of pre-translated LPIs, which is used to implement
kvm_arch_set_irq_inatomic(). The size of the cache is currently
hard-coded at 4 times the number of vcpus, a number I have randomly
chosen with the utmost care.
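The LRU policy and the 4 * nr_vcpus sizing can be sketched as follows. This is a single-threaded illustration with hypothetical names (`struct lpi_cache`, `lpi_cache_lookup`, `lpi_cache_insert`), written in the style of the kernel's intrusive doubly linked lists; locking and ITS-specific keys such as the doorbell address are omitted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cache entry, linked into an LRU list. */
struct lpi_entry {
    struct lpi_entry *prev, *next;
    uint32_t devid, eventid, intid;
};

struct lpi_cache {
    struct lpi_entry head;      /* sentinel: head.next = MRU, head.prev = LRU */
    struct lpi_entry *slots;    /* preallocated entries */
    size_t capacity, used;
};

static void entry_unlink(struct lpi_entry *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

static void entry_push_front(struct lpi_cache *c, struct lpi_entry *e)
{
    e->prev = &c->head;
    e->next = c->head.next;
    c->head.next->prev = e;
    c->head.next = e;
}

int lpi_cache_init(struct lpi_cache *c, size_t nr_vcpus)
{
    if (nr_vcpus == 0)
        return -1;
    c->capacity = 4 * nr_vcpus;   /* sizing heuristic from the cover letter */
    c->used = 0;
    c->head.prev = c->head.next = &c->head;
    c->slots = calloc(c->capacity, sizeof(*c->slots));
    return c->slots ? 0 : -1;
}

void lpi_cache_destroy(struct lpi_cache *c)
{
    free(c->slots);
    c->slots = NULL;
}

/* On a hit, store the LPI in *intid and mark the entry most recently used. */
int lpi_cache_lookup(struct lpi_cache *c, uint32_t devid, uint32_t eventid,
                     uint32_t *intid)
{
    for (struct lpi_entry *e = c->head.next; e != &c->head; e = e->next) {
        if (e->devid == devid && e->eventid == eventid) {
            entry_unlink(e);
            entry_push_front(c, e);
            *intid = e->intid;
            return 1;
        }
    }
    return 0;
}

/* Add a translation the caller found missing, evicting the LRU entry if full. */
void lpi_cache_insert(struct lpi_cache *c, uint32_t devid, uint32_t eventid,
                      uint32_t intid)
{
    struct lpi_entry *e;

    if (c->used < c->capacity) {
        e = &c->slots[c->used++];
    } else {
        e = c->head.prev;         /* least recently used */
        entry_unlink(e);
    }
    e->devid = devid;
    e->eventid = eventid;
    e->intid = intid;
    entry_push_front(c, e);
}
```

A hit moves the entry to the front of the list, so the tail is always the least recently used translation and is the one recycled once all 4 * nr_vcpus slots are in use.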

Does it work? Well, it doesn't crash, and is thus perfect. More
seriously, I don't really have a way to benchmark it directly, so my
observations are indirect:

On a TX2 system, I run a 4-vcpu VM with an Ethernet interface passed
to it directly. From the host, I inject interrupts using debugfs. In
parallel, I look at the number of context switches and the number of
interrupts on the host. Without this series, both numbers are the same
(about half a million of each per second is pretty easy to reach). With
this series, the number of context switches drops to something pretty
small (in the low 2k), while the number of interrupts stays the same.
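The cover letter does not say which tool produced these figures. As one hypothetical way to collect the same two rates on the host, the sketch below samples the `intr` line (first field: total interrupts since boot) and the `ctxt` line (total context switches) of `/proc/stat` one second apart. The function names are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct stat_counters {
    unsigned long long intr;   /* "intr" line, first field: all interrupts since boot */
    unsigned long long ctxt;   /* "ctxt" line: context switches since boot */
};

/* Extract both counters from a /proc/stat snapshot; returns 0 if both were found. */
static int parse_stat(const char *buf, struct stat_counters *out)
{
    int found = 0;
    const char *line = buf;

    while (line && *line) {
        if (sscanf(line, "intr %llu", &out->intr) == 1)
            found |= 1;
        else if (sscanf(line, "ctxt %llu", &out->ctxt) == 1)
            found |= 2;
        line = strchr(line, '\n');
        if (line)
            line++;           /* move to the start of the next line */
    }
    return found == 3 ? 0 : -1;
}

static int read_stat(struct stat_counters *out)
{
    static char buf[1 << 20];   /* the intr line can be long on big systems */
    FILE *f = fopen("/proc/stat", "r");
    size_t n;

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_stat(buf, out);
}

/* Sample the host-wide interrupt and context-switch rates over one second. */
int sample_rates(struct stat_counters *per_sec)
{
    struct stat_counters a, b;

    if (read_stat(&a) != 0)
        return -1;
    sleep(1);
    if (read_stat(&b) != 0)
        return -1;
    per_sec->intr = b.intr - a.intr;
    per_sec->ctxt = b.ctxt - a.ctxt;
    return 0;
}
```

Calling `sample_rates()` in a loop while the assigned device is generating interrupts shows whether `ctxt` tracks `intr` (every MSI bounced through a worker) or stays well below it. Both counters are host-wide, so unrelated activity adds some noise.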

Yes, this is a pretty rubbish benchmark, what did you expect? ;-)

So I'm putting this out for people with real workloads to try out and
report what they see.

[1] https://lore.kernel.org/lkml/1552833373-19828-1-git-send-email-yuzenghui@huawei.com/

Marc Zyngier (8):
  KVM: arm/arm64: vgic: Add LPI translation cache definition
  KVM: arm/arm64: vgic: Add __vgic_put_lpi_locked primitive
  KVM: arm/arm64: vgic-its: Cache successful MSI->LPI translation
  KVM: arm/arm64: vgic-its: Add kvm parameter to
    vgic_its_free_collection
  KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on
    specific commands
  KVM: arm/arm64: vgic-its: Invalidate MSI-LPI translation cache on
    disabling LPIs
  KVM: arm/arm64: vgic-its: Check the LPI translation cache on MSI
    injection
  KVM: arm/arm64: vgic-irqfd: Implement kvm_arch_set_irq_inatomic

 include/kvm/arm_vgic.h           |  10 +++
 virt/kvm/arm/vgic/vgic-init.c    |  34 ++++++++
 virt/kvm/arm/vgic/vgic-irqfd.c   |  36 ++++++--
 virt/kvm/arm/vgic/vgic-its.c     | 143 +++++++++++++++++++++++++++++--
 virt/kvm/arm/vgic/vgic-mmio-v3.c |   4 +-
 virt/kvm/arm/vgic/vgic.c         |  26 ++++--
 virt/kvm/arm/vgic/vgic.h         |   6 ++
 7 files changed, 238 insertions(+), 21 deletions(-)