Message ID | 20180926163258.20218-1-suzuki.poulose@arm.com (mailing list archive) |
---|---|
Series | kvm: arm64: Dynamic IPA and 52bit IPA |
Hi Suzuki,

On 9/26/18 6:32 PM, Suzuki K Poulose wrote:
>
> The physical address space size for a VM (IPA size) on arm/arm64 is
> limited to a static limit of 40bits. This series adds support for
> using an IPA size specific to a VM, allowing the use of a size supported
> by the host (based on the host kernel configuration and CPU support).
> The default size is fixed to 40bits. On arm64, we can allow the limit
> to be lowered (limiting the number of levels in stage2 to 2, to prevent
> splitting the host PMD huge pages at stage2). We also add support for
> handling 52bit IPA addresses (where supported) added by Arm v8.2
> extensions.
>
> We need to set the IPA limit as early as the VM creation to keep the
> code simple and avoid sprinkling checks everywhere to ensure that the
> IPA is configured. We encode the IPA size in the machine_type
> argument to KVM_CREATE_VM ioctl. Bits [7-0] of the type are reserved
> for the IPA size. The availability of this feature is advertised by a
> new cap KVM_CAP_ARM_VM_IPA_SIZE. When supported, this capability
> returns the maximum IPA shift supported by the host. The supported IPA
> size on a host could be different from the system's PARange indicated
> by the CPUs (e.g., kernel limit on the PA size).
>
> Supporting different IPA sizes requires modification to the stage2 page
> table code. The arm64 page table level helpers are defined based on the
> page table levels used by the host VA. So, the accessors may not work
> if the guest uses more levels in stage2 than the stage1 of the host.
> The previous versions (v1 & v2) of this series refactored
> the stage1 page table accessors to reuse the low-level accessors for an
> independent stage2 table. However, due to the level folding in the
> generic code, the types are redefined as well. i.e., if the PUD is
> folded, the pud_t could be defined as :
>
>   typedef struct { pgd_t pgd; } pud_t;
>
> similarly for pmd_t. So, without stage1 independent page table entry
> types for stage2, we could be dealing with a different type for level
> 0-2 entries. This is practically fine on arm/arm64 as the entries
> have similar format and size and we always use the appropriate
> accessors to get the raw value (i.e., pud_val/pmd_val etc). But it is
> not ideal for a solution upstream. So, this version caps the stage2 page
> table levels to that of the stage1. This has the following impact on
> the IPA support for various pagesize/host-va combinations :
>
> x---------------------------------------------------x
> | host\ipa  | 40bit | 42bit | 44bit | 48bit | 52bit |
> -----------------------------------------------------
> | 39bit-4K  |   y   |   y   |   n   |   n   |  n/a  |
> -----------------------------------------------------
> | 48bit-4K  |   y   |   y   |   y   |   y   |  n/a  |
> -----------------------------------------------------
> | 36bit-16K |   y   |   n   |   n   |   n   |  n/a  |
> -----------------------------------------------------
> | 47bit-16K |   y   |   y   |   y   |   y   |  n/a  |
> -----------------------------------------------------
> | 48bit-16K |   y   |   y   |   y   |   y   |  n/a  |
> -----------------------------------------------------
> | 42bit-64K |   y   |   y   |   y   |   n   |   n   |
> -----------------------------------------------------
> | 48bit-64K |   y   |   y   |   y   |   y   |   y   |
> x---------------------------------------------------x
>
> Or the following list shows what cannot be supported :
>
>  39bit-4K host  | [44 - 48]
>  36bit-16K host | [41 - 48]
>  42bit-64K host | [47 - 52]
>
> which is not really bad. We can pursue the independent stage2
> page table support and lift the restriction once we get there.
> Given there is a proposal for new generic page table walker [0],
> it would make sense to keep our efforts in sync with it to avoid
> diverging from a common API.
>
> 52bit support is added for VGIC (including ITS emulation) and handling
> of PAR, HPFAR registers.
>
> The series applies on 4.19-rc4. A tree is available here:
>
>  git://linux-arm.org/linux-skp.git ipa52/v6
>
> Tested with
>  - Modified kvmtool, which can only be used for (patches included in
>    the series for reference / testing):
>     * with virtio-pci up to 44bit PA (Due to 4K page size for virtio-pci
>       legacy implemented by kvmtool)
>     * Up to 48bit PA with virtio-mmio, due to 32bit PFN limitation.
>  - Hacked Qemu (boot loader support for highmem, IPA size support)
>     * with virtio-pci GIC-v3 ITS & MSI up to 52bit on Foundation model.
>    Also see [1] for Qemu support.
>
> [0] https://lkml.org/lkml/2018/4/24/777
> [1] https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg05759.html
>
> Changes since V5:
>  - Don't raise the IPA Limit to 40bits on systems with lower PA size.
>    Doesn't break backward compatibility, we still allow KVM_CREATE_VM
>    to succeed with "0" as the IPA size (40bits). But prevent specifying
>    40bit explicitly, when the limit is lower.
>  - Rename CAP, KVM_CAP_ARM_VM_PHYS_SHIFT => KVM_CAP_ARM_VM_IPA_SIZE
>    and helper, KVM_VM_TYPE_ARM_VM_PHY_SHIFT => KVM_VM_TYPE_ARM_VM_IPA_SIZE
>  - Update Documentation of the API
>  - Update comments and commit description as reported by Eric
>  - Set the missing TCR_T0SZ in patch "kvm: arm64: Configure VTCR_EL2 per VM"
>  - Fix bits for CBASER_ADDRESS mask, GITS_CBASER_ADDRESS()
>
> Changes since V4:
>  - Rebased on v4.19-rc3
>  - Dropped virtio patches queued already by mst.
>  - Collect Acks from Christoffer
>  - Restrict IPA configuration support to arm64 only
>  - Use KVM_CAP_ARM_VM_PHYS_SHIFT for detecting the support for
>    IPA size configuration along with the limit on the IPA for the host.
>  - Update comments on __load_guest_stage2
>  - Add comment about the default value for unknown PARange values.
>  - Update Documentation of the API
>
> Changes since V3:
>  - Use per-VM VTCR instead of per-VM private VTCR bits
>  - Allow IPA less than 40bits
>  - Split the patch adding support for stage2 dynamic page tables
>  - Rearrange the series to keep the userspace API at the end, which
>    needs further discussion.
>  - Collect Reviews/Acks from Eric & Marc
>
> Changes since V2:
>  - Drop "refactoring of host page table helpers" and restrict the IPA size
>    to make sure stage2 doesn't use more page table levels than that of the host.
>  - Load VTCR for TLB operations on behalf of the VM (Pointed-by: James Morse)
>  - Split a couple of patches to make them easier to review.
>  - Fall back to normal (non-concatenated) entry level page table support if
>    possible.
>  - Bump the IOCTL number
>
> Changes since V1:
>  - Change the userspace API for configuring VM to encode the IPA
>    size in the VM type. (suggested by Christoffer)
>  - Expose the IPA limit on the host via ioctl on /dev/kvm
>  - Handle 52bit addresses in PAR & HPFAR
>  - Drop patch changing the life time of stage2 PGD
>  - Rename macros for 48-to-52 bit conversion for GIC ITS BASER.
>    (suggested by Christoffer)
>  - Split virtio PFN check patches and address comments.
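
For readers following the "stage2 must not use more levels than the host" restriction introduced in V2, here is a rough sketch of the arithmetic behind the support table in the cover letter. It is not code from the series. The function names are made up for illustration, and it assumes 8-byte descriptors and that stage2 can concatenate up to 16 entry-level tables (4 extra IPA bits without adding a level). It does not model the 52bit "n/a" column or the minimum-level limit.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helpers, not taken from the series.
 * A table page holds 2^(page_shift - 3) descriptors of 8 bytes each,
 * so every level resolves (page_shift - 3) bits of the address.
 */
static unsigned int pgtable_levels(unsigned int addr_bits,
				   unsigned int page_shift)
{
	unsigned int stride = page_shift - 3;

	assert(page_shift > 3 && addr_bits > page_shift);
	/* ceil((addr_bits - page_shift) / stride) */
	return (addr_bits - page_shift + stride - 1) / stride;
}

/*
 * Assumption: up to 16 concatenated tables at the stage2 entry level
 * absorb 4 bits of IPA, so only (ipa_bits - 4) need regular levels.
 */
static unsigned int stage2_levels(unsigned int ipa_bits,
				  unsigned int page_shift)
{
	return pgtable_levels(ipa_bits - 4, page_shift);
}

/* True if stage2 for this IPA size needs no more levels than the host. */
static bool ipa_fits_host(unsigned int ipa_bits, unsigned int host_va_bits,
			  unsigned int page_shift)
{
	return stage2_levels(ipa_bits, page_shift) <=
	       pgtable_levels(host_va_bits, page_shift);
}
```

With page_shift of 12, 14 and 16 for 4K, 16K and 64K pages, ipa_fits_host(43, 39, 12) holds while ipa_fits_host(44, 39, 12) does not, which lines up with the "39bit-4K host | [44 - 48]" entry in the unsupported list.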
>
>
> Kristina Martsenko (1):
>   vgic: Add support for 52bit guest physical address
>
> Suzuki K Poulose (17):
>   kvm: arm/arm64: Fix stage2_flush_memslot for 4 level page table
>   kvm: arm/arm64: Remove spurious WARN_ON
>   kvm: arm64: Add helper for loading the stage2 setting for a VM
>   arm64: Add a helper for PARange to physical shift conversion
>   kvm: arm64: Clean up VTCR_EL2 initialisation
>   kvm: arm/arm64: Allow arch specific configurations for VM
>   kvm: arm64: Configure VTCR_EL2 per VM
>   kvm: arm/arm64: Prepare for VM specific stage2 translations
>   kvm: arm64: Prepare for dynamic stage2 page table layout
>   kvm: arm64: Make stage2 page table layout dynamic
>   kvm: arm64: Dynamic configuration of VTTBR mask
>   kvm: arm64: Configure VTCR_EL2.SL0 per VM
>   kvm: arm64: Switch to per VM IPA limit
>   kvm: arm64: Add 52bit support for PAR to HPFAR conversoin
>   kvm: arm64: Set a limit on the IPA size
>   kvm: arm64: Limit the minimum number of page table levels
>   kvm: arm64: Allow tuning the physical address size for VM
>
>  Documentation/virtual/kvm/api.txt             |  31 +++
>  arch/arm/include/asm/kvm_arm.h                |   3 +-
>  arch/arm/include/asm/kvm_host.h               |   7 +
>  arch/arm/include/asm/kvm_mmu.h                |  15 +-
>  arch/arm/include/asm/stage2_pgtable.h         |  50 ++--
>  arch/arm64/include/asm/cpufeature.h           |  20 ++
>  arch/arm64/include/asm/kvm_arm.h              | 157 +++++++++---
>  arch/arm64/include/asm/kvm_asm.h              |   2 -
>  arch/arm64/include/asm/kvm_host.h             |  16 +-
>  arch/arm64/include/asm/kvm_hyp.h              |  10 +
>  arch/arm64/include/asm/kvm_mmu.h              |  42 +++-
>  arch/arm64/include/asm/stage2_pgtable-nopmd.h |  42 ----
>  arch/arm64/include/asm/stage2_pgtable-nopud.h |  39 ---
>  arch/arm64/include/asm/stage2_pgtable.h       | 236 +++++++++++++-----
>  arch/arm64/kvm/hyp/Makefile                   |   1 -
>  arch/arm64/kvm/hyp/s2-setup.c                 |  90 -------
>  arch/arm64/kvm/hyp/switch.c                   |   4 +-
>  arch/arm64/kvm/hyp/tlb.c                      |   4 +-
>  arch/arm64/kvm/reset.c                        | 103 ++++++++
>  include/linux/irqchip/arm-gic-v3.h            |   5 +
>  include/uapi/linux/kvm.h                      |  10 +
>  virt/kvm/arm/arm.c                            |   9 +-
>  virt/kvm/arm/mmu.c                            | 120 ++++-----
>  virt/kvm/arm/vgic/vgic-its.c                  |  36 +--
>  virt/kvm/arm/vgic/vgic-kvm-device.c           |   2 +-
>  virt/kvm/arm/vgic/vgic-mmio-v3.c              |   2 -
>  26 files changed, 648 insertions(+), 408 deletions(-)
>  delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopmd.h
>  delete mode 100644 arch/arm64/include/asm/stage2_pgtable-nopud.h
>  delete mode 100644 arch/arm64/kvm/hyp/s2-setup.c
>
> kvmtool changes:
>
> Suzuki K Poulose (4):
>   kvmtool: Allow backends to run checks on the KVM device fd
>   kvmtool: arm64: Add support for guest physical address size
>   kvmtool: arm64: Switch memory layout
>   kvmtool: arm: Add support for creating VM with PA size
>
>  arm/aarch32/include/kvm/kvm-arch.h        |  6 ++--
>  arm/aarch64/include/kvm/kvm-arch.h        | 15 ++++++++--
>  arm/aarch64/include/kvm/kvm-config-arch.h |  5 +++-
>  arm/include/arm-common/kvm-arch.h         | 17 ++++++++----
>  arm/include/arm-common/kvm-config-arch.h  |  1 +
>  arm/kvm.c                                 | 34 ++++++++++++++++++++++-
>  include/kvm/kvm.h                         |  4 +++
>  kvm.c                                     |  2 ++
>  8 files changed, 71 insertions(+), 13 deletions(-)
>

Feel free to add
Tested-by: Eric Auger <eric.auger@redhat.com>

I tested this series with QEMU, using a cold-plugged 4GB PC-DIMM at 2TB on
a Gigabyte machine. The VM is created with 43 IPA bits. I ran memtester
on the guest at 2TB using "memtester -p 20000000000 1G 1" and it succeeded.

Thanks

Eric
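
As a companion to the test report above, this is a minimal sketch of how a VMM might build and sanity-check the machine type passed to KVM_CREATE_VM, going only by the cover letter: bits [7-0] carry the IPA size, "0" means the 40bit default, and an explicit size above the host limit is rejected. The EX_ and ex_ names are hypothetical stand-ins rather than the uapi definitions, and the rejection of other type bits is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the uapi definitions; names are illustrative. */
#define EX_IPA_SIZE_MASK	0xffULL	/* bits [7-0] of the VM type */
#define EX_DEFAULT_IPA_BITS	40U	/* meaning of "0" per the cover letter */

/* Build the machine type for a VM with the requested IPA size. */
static uint64_t ex_vm_type_for_ipa(unsigned int ipa_bits)
{
	assert(ipa_bits <= EX_IPA_SIZE_MASK);
	return (uint64_t)ipa_bits & EX_IPA_SIZE_MASK;
}

/* IPA size a given type asks for; 0 selects the 40bit default. */
static unsigned int ex_ipa_bits_from_type(uint64_t type)
{
	unsigned int bits = (unsigned int)(type & EX_IPA_SIZE_MASK);

	return bits ? bits : EX_DEFAULT_IPA_BITS;
}

/*
 * Returns 0 if the type looks acceptable, -1 otherwise.
 * host_limit stands for the value reported for KVM_CAP_ARM_VM_IPA_SIZE.
 */
static int ex_check_vm_type(uint64_t type, unsigned int host_limit)
{
	unsigned int bits = (unsigned int)(type & EX_IPA_SIZE_MASK);

	if (type & ~EX_IPA_SIZE_MASK)
		return -1;	/* assumption: no other type bits in use */
	if (bits == 0)
		return 0;	/* legacy callers passing 0 keep working */
	return bits <= host_limit ? 0 : -1;
}
```

For the 43 IPA bit VM used in the test, ex_vm_type_for_ipa(43) gives the value a VMM would hand to KVM_CREATE_VM. On a host whose limit is below 40, ex_check_vm_type(0, limit) still passes while an explicit 40 does not, matching the V5 changelog entry. Any lower bound on the IPA size is not modelled here.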