Message ID: 20180706135529.88966-12-frankja@linux.ibm.com (mailing list archive)
State: New, archived
I'd still vote for turning this into a "KVM: s390x: " patch or
splitting the gmap parts off.

> 8. Other capabilities.
> ----------------------
>
> diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
> index f5ff9dbad8ac..652cd658e242 100644
> --- a/arch/s390/include/asm/mmu.h
> +++ b/arch/s390/include/asm/mmu.h
> @@ -24,6 +24,8 @@ typedef struct {
> 	unsigned int uses_skeys:1;
> 	/* The mmu context uses CMM. */
> 	unsigned int uses_cmm:1;
> +	/* The gmap associated with this context uses huge pages. */
> +	unsigned int uses_gmap_hpage:1;

Didn't we discuss that something like "allow_gmap_1mb_hpage" or
similar would be more appropriate? The "uses" part can easily be wrong.

> #include <linux/kernel.h>
> @@ -589,8 +591,8 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
> 		return -EFAULT;
> 	pmd = pmd_offset(pud, vmaddr);
> 	VM_BUG_ON(pmd_none(*pmd));
> -	/* large pmds cannot yet be handled */
> -	if (pmd_large(*pmd))
> +	/* Are we allowed to use huge pages? */
> +	if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage)
> 		return -EFAULT;
> 	/* Link gmap segment table entry location to page table. */
> 	rc = radix_tree_preload(GFP_KERNEL);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b6270a3b38e9..b955b986b341 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_GET_MSR_FEATURES 153
> #define KVM_CAP_HYPERV_EVENTFD 154
> #define KVM_CAP_HYPERV_TLBFLUSH 155
> +#define KVM_CAP_S390_HPAGE_1M 156
>
> #ifdef KVM_CAP_IRQ_ROUTING

I was wondering if we should add some safety net for gmap shadows when
hitting a huge pmd. Or when trying to create a shadow for a gmap with
huge pmds enabled (add a check in gmap_shadow()). So the GMAP kernel
interface remains consistent.

Apart from that, looks good to me!
On Thu, 12 Jul 2018 09:23:01 +0200
David Hildenbrand <david@redhat.com> wrote:
>
> > #include <linux/kernel.h>
> > @@ -589,8 +591,8 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
> > 		return -EFAULT;
> > 	pmd = pmd_offset(pud, vmaddr);
> > 	VM_BUG_ON(pmd_none(*pmd));
> > -	/* large pmds cannot yet be handled */
> > -	if (pmd_large(*pmd))
> > +	/* Are we allowed to use huge pages? */
> > +	if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage)
> > 		return -EFAULT;
> > 	/* Link gmap segment table entry location to page table. */
> > 	rc = radix_tree_preload(GFP_KERNEL);
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index b6270a3b38e9..b955b986b341 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt {
> > #define KVM_CAP_GET_MSR_FEATURES 153
> > #define KVM_CAP_HYPERV_EVENTFD 154
> > #define KVM_CAP_HYPERV_TLBFLUSH 155
> > +#define KVM_CAP_S390_HPAGE_1M 156
> >
> > #ifdef KVM_CAP_IRQ_ROUTING
>
> I was wondering if we should add some safety net for gmap shadows when
> hitting a huge pmd. Or when trying to create a shadow for a gmap with
> huge pmds enabled (add a check in gmap_shadow()). So the GMAP kernel
> interface remains consistent.

For now we don't have the sief2 so we don't end up in gmap_shadow()
anyway. Huge l3 in small l2 is currently supported via the fake tables.

So what do you want to keep consistent?

> Apart from that, looks good to me!
On 12.07.2018 10:09, frankja wrote:
> On Thu, 12 Jul 2018 09:23:01 +0200
> David Hildenbrand <david@redhat.com> wrote:
>>
>>> #include <linux/kernel.h>
>>> @@ -589,8 +591,8 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
>>> 		return -EFAULT;
>>> 	pmd = pmd_offset(pud, vmaddr);
>>> 	VM_BUG_ON(pmd_none(*pmd));
>>> -	/* large pmds cannot yet be handled */
>>> -	if (pmd_large(*pmd))
>>> +	/* Are we allowed to use huge pages? */
>>> +	if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage)
>>> 		return -EFAULT;
>>> 	/* Link gmap segment table entry location to page table. */
>>> 	rc = radix_tree_preload(GFP_KERNEL);
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index b6270a3b38e9..b955b986b341 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt {
>>> #define KVM_CAP_GET_MSR_FEATURES 153
>>> #define KVM_CAP_HYPERV_EVENTFD 154
>>> #define KVM_CAP_HYPERV_TLBFLUSH 155
>>> +#define KVM_CAP_S390_HPAGE_1M 156
>>>
>>> #ifdef KVM_CAP_IRQ_ROUTING
>>
>> I was wondering if we should add some safety net for gmap shadows when
>> hitting a huge pmd. Or when trying to create a shadow for a gmap with
>> huge pmds enabled (add a check in gmap_shadow()). So the GMAP kernel
>> interface remains consistent.
>
> For now we don't have the sief2 so we don't end up in gmap_shadow()
> anyway. Huge l3 in small l2 is currently supported via the fake tables.
>
> So what do you want to keep consistent?

gmap is just an internal kernel interface and KVM is the only user
right now. The logic that gmap shadow does not work with hpage gmaps is
right now buried in KVM, not in the GMAP. That's why I am asking if it
makes sense to also have safety nets in the gmap shadow code.

If you added the other safety net I suggested (PMD protection with
SHADOW notifier bit set -> -EINVAL), we are maybe already on the safe
side if gmap_shadow() were ever used. We would maybe fail in a nice
fashion. But I didn't check all call paths. Prohibiting gmap_shadow()
when huge pages are allowed in the GMAP right from the start would be
the big hammer.
On Thu, 12 Jul 2018 10:35:00 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 12.07.2018 10:09, frankja wrote:
> > On Thu, 12 Jul 2018 09:23:01 +0200
> > David Hildenbrand <david@redhat.com> wrote:
> >>
> >>> #include <linux/kernel.h>
> >>> @@ -589,8 +591,8 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
> >>> 		return -EFAULT;
> >>> 	pmd = pmd_offset(pud, vmaddr);
> >>> 	VM_BUG_ON(pmd_none(*pmd));
> >>> -	/* large pmds cannot yet be handled */
> >>> -	if (pmd_large(*pmd))
> >>> +	/* Are we allowed to use huge pages? */
> >>> +	if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage)
> >>> 		return -EFAULT;
> >>> 	/* Link gmap segment table entry location to page table. */
> >>> 	rc = radix_tree_preload(GFP_KERNEL);
> >>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> >>> index b6270a3b38e9..b955b986b341 100644
> >>> --- a/include/uapi/linux/kvm.h
> >>> +++ b/include/uapi/linux/kvm.h
> >>> @@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt {
> >>> #define KVM_CAP_GET_MSR_FEATURES 153
> >>> #define KVM_CAP_HYPERV_EVENTFD 154
> >>> #define KVM_CAP_HYPERV_TLBFLUSH 155
> >>> +#define KVM_CAP_S390_HPAGE_1M 156
> >>>
> >>> #ifdef KVM_CAP_IRQ_ROUTING
> >>
> >> I was wondering if we should add some safety net for gmap shadows
> >> when hitting a huge pmd. Or when trying to create a shadow for a
> >> gmap with huge pmds enabled (add a check in gmap_shadow()). So the
> >> GMAP kernel interface remains consistent.
> >
> > For now we don't have the sief2 so we don't end up in gmap_shadow()
> > anyway. Huge l3 in small l2 is currently supported via the fake
> > tables.
> >
> > So what do you want to keep consistent?
>
> gmap is just an internal kernel interface and KVM is the only user
> right now. The logic that gmap shadow does not work with hpage gmaps
> is right now buried in KVM, not in the GMAP. That's why I am asking
> if it makes sense to also have safety nets in the gmap shadow code.
>
> If you added the other safety net I suggested (PMD protection with
> SHADOW notifier bit set -> -EINVAL), we are maybe already on the safe
> side if gmap_shadow() were ever used. We would maybe fail in a nice
> fashion. But I didn't check all call paths. Prohibiting gmap_shadow()
> when huge pages are allowed in the GMAP right from the start would be
> the big hammer.

I'll add a BUG_ON(parent->mm->context.allow_gmap_hpage_1m); it's
removed later anyway...

For now I'll also add the notifier warning for the gmap invalidation,
it should only slow us down with thp, as hpages are otherwise never
touched after fault-in. And it will be useful for thp testing in a few
months.
On 12.07.2018 11:04, frankja wrote:
> On Thu, 12 Jul 2018 10:35:00 +0200
> David Hildenbrand <david@redhat.com> wrote:
>
>> On 12.07.2018 10:09, frankja wrote:
>>> On Thu, 12 Jul 2018 09:23:01 +0200
>>> David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>>> #include <linux/kernel.h>
>>>>> @@ -589,8 +591,8 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
>>>>> 		return -EFAULT;
>>>>> 	pmd = pmd_offset(pud, vmaddr);
>>>>> 	VM_BUG_ON(pmd_none(*pmd));
>>>>> -	/* large pmds cannot yet be handled */
>>>>> -	if (pmd_large(*pmd))
>>>>> +	/* Are we allowed to use huge pages? */
>>>>> +	if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage)
>>>>> 		return -EFAULT;
>>>>> 	/* Link gmap segment table entry location to page table. */
>>>>> 	rc = radix_tree_preload(GFP_KERNEL);
>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>>> index b6270a3b38e9..b955b986b341 100644
>>>>> --- a/include/uapi/linux/kvm.h
>>>>> +++ b/include/uapi/linux/kvm.h
>>>>> @@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt {
>>>>> #define KVM_CAP_GET_MSR_FEATURES 153
>>>>> #define KVM_CAP_HYPERV_EVENTFD 154
>>>>> #define KVM_CAP_HYPERV_TLBFLUSH 155
>>>>> +#define KVM_CAP_S390_HPAGE_1M 156
>>>>>
>>>>> #ifdef KVM_CAP_IRQ_ROUTING
>>>>
>>>> I was wondering if we should add some safety net for gmap shadows
>>>> when hitting a huge pmd. Or when trying to create a shadow for a
>>>> gmap with huge pmds enabled (add a check in gmap_shadow()). So the
>>>> GMAP kernel interface remains consistent.
>>>
>>> For now we don't have the sief2 so we don't end up in gmap_shadow()
>>> anyway. Huge l3 in small l2 is currently supported via the fake
>>> tables.
>>>
>>> So what do you want to keep consistent?
>>
>> gmap is just an internal kernel interface and KVM is the only user
>> right now. The logic that gmap shadow does not work with hpage gmaps
>> is right now buried in KVM, not in the GMAP. That's why I am asking
>> if it makes sense to also have safety nets in the gmap shadow code.
>>
>> If you added the other safety net I suggested (PMD protection with
>> SHADOW notifier bit set -> -EINVAL), we are maybe already on the safe
>> side if gmap_shadow() were ever used. We would maybe fail in a nice
>> fashion. But I didn't check all call paths. Prohibiting gmap_shadow()
>> when huge pages are allowed in the GMAP right from the start would be
>> the big hammer.
>
> I'll add a BUG_ON(parent->mm->context.allow_gmap_hpage_1m); it's
> removed later anyway...
>
> For now I'll also add the notifier warning for the gmap invalidation,
> it should only slow us down with thp, as hpages are otherwise never
> touched after fault-in. And it will be useful for thp testing in a few
> months.

Sounds good to me! Looking forward to the next version of this series :)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index d10944e619d3..cb8db4f9d097 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4391,6 +4391,22 @@ all such vmexits.
 
 Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
 
+7.14 KVM_CAP_S390_HPAGE_1M
+
+Architectures: s390
+Parameters: none
+Returns: 0 on success, -EINVAL if hpage module parameter was not set
+	 or cmma is enabled
+
+With this capability the KVM support for memory backing with 1m pages
+through hugetlbfs can be enabled for a VM. After the capability is
+enabled, cmma can't be enabled anymore and pfmfi and the storage key
+interpretation are disabled. If cmma has already been enabled or the
+hpage module parameter is not set to 1, -EINVAL is returned.
+
+While it is generally possible to create a huge page backed VM without
+this capability, the VM will not be able to run.
+
 8. Other capabilities.
 ----------------------
 
diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index f5ff9dbad8ac..652cd658e242 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -24,6 +24,8 @@ typedef struct {
 	unsigned int uses_skeys:1;
 	/* The mmu context uses CMM. */
 	unsigned int uses_cmm:1;
+	/* The gmap associated with this context uses huge pages. */
+	unsigned int uses_gmap_hpage:1;
 } mm_context_t;
 
 #define INIT_MM_CONTEXT(name) \
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index d16bc79c30bb..407165d805b9 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -32,6 +32,7 @@ static inline int init_new_context(struct task_struct *tsk,
 	mm->context.has_pgste = 0;
 	mm->context.uses_skeys = 0;
 	mm->context.uses_cmm = 0;
+	mm->context.uses_gmap_hpage = 0;
 #endif
 	switch (mm->context.asce_limit) {
 	case _REGION2_SIZE:
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 6acc46cc7f7f..60014ea9d3f5 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -171,7 +171,10 @@ struct kvm_s390_tod_clock_ext {
 static int nested;
 module_param(nested, int, S_IRUGO);
 MODULE_PARM_DESC(nested, "Nested virtualization support");
-
+/* allow 1m huge page guest backing, if !nested */
+static int hpage;
+module_param(hpage, int, 0444);
+MODULE_PARM_DESC(hpage, "1m huge page backing support");
 
 /*
  * For now we handle at most 16 double words as this is what the s390 base
@@ -475,6 +478,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_AIS_MIGRATION:
 		r = 1;
 		break;
+	case KVM_CAP_S390_HPAGE_1M:
+		r = 0;
+		if (hpage)
+			r = 1;
+		break;
 	case KVM_CAP_S390_MEM_OP:
 		r = MEM_OP_MAX_SIZE;
 		break;
@@ -672,6 +680,27 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 		VM_EVENT(kvm, 3, "ENABLE: CAP_S390_GS %s",
 			 r ? "(not available)" : "(success)");
 		break;
+	case KVM_CAP_S390_HPAGE_1M:
+		mutex_lock(&kvm->lock);
+		if (kvm->created_vcpus)
+			r = -EBUSY;
+		else if (!hpage || kvm->arch.use_cmma)
+			r = -EINVAL;
+		else {
+			r = 0;
+			kvm->mm->context.uses_gmap_hpage = 1;
+			/*
+			 * We might have to create fake 4k page
+			 * tables. To avoid that the hardware works on
+			 * stale PGSTEs, we emulate these instructions.
+			 */
+			kvm->arch.use_skf = 0;
+			kvm->arch.use_pfmfi = 0;
+		}
+		mutex_unlock(&kvm->lock);
+		VM_EVENT(kvm, 3, "ENABLE: CAP_S390_HPAGE %s",
+			 r ? "(not available)" : "(success)");
+		break;
 	case KVM_CAP_S390_USER_STSI:
 		VM_EVENT(kvm, 3, "%s", "ENABLE: CAP_S390_USER_STSI");
 		kvm->arch.user_stsi = 1;
@@ -722,7 +751,11 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
 		ret = -EBUSY;
 		VM_EVENT(kvm, 3, "%s", "ENABLE: CMMA support");
 		mutex_lock(&kvm->lock);
-		if (!kvm->created_vcpus) {
+		if (kvm->created_vcpus)
+			ret = -EBUSY;
+		else if (kvm->mm->context.uses_gmap_hpage)
+			ret = -EINVAL;
+		else {
 			kvm->arch.use_cmma = 1;
 			/* Not compatible with cmma. */
 			kvm->arch.use_pfmfi = 0;
@@ -4087,6 +4120,11 @@ static int __init kvm_s390_init(void)
 		return -ENODEV;
 	}
 
+	if (nested && hpage) {
+		pr_info("nested (vSIE) and hpage (huge page backing) can currently not be activated concurrently");
+		return -EINVAL;
+	}
+
 	for (i = 0; i < 16; i++)
 		kvm_s390_fac_base[i] |=
 			S390_lowcore.stfle_fac_list[i] & nonhyp_mask(i);
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index 8992f809e3a6..03ae469fc0b6 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2,8 +2,10 @@
 /*
  *  KVM guest address space mapping code
  *
- *    Copyright IBM Corp. 2007, 2016
+ *    Copyright IBM Corp. 2007, 2016, 2017
  *    Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
+ *               David Hildenbrand <david@redhat.com>
+ *               Janosch Frank <frankja@linux.vnet.ibm.com>
  */
 
 #include <linux/kernel.h>
@@ -589,8 +591,8 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
 		return -EFAULT;
 	pmd = pmd_offset(pud, vmaddr);
 	VM_BUG_ON(pmd_none(*pmd));
-	/* large pmds cannot yet be handled */
-	if (pmd_large(*pmd))
+	/* Are we allowed to use huge pages? */
+	if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage)
 		return -EFAULT;
 	/* Link gmap segment table entry location to page table. */
 	rc = radix_tree_preload(GFP_KERNEL);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6270a3b38e9..b955b986b341 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_GET_MSR_FEATURES 153
 #define KVM_CAP_HYPERV_EVENTFD 154
 #define KVM_CAP_HYPERV_TLBFLUSH 155
+#define KVM_CAP_S390_HPAGE_1M 156
 
 #ifdef KVM_CAP_IRQ_ROUTING