[v4,9/9] s390/mm: Enable gmap huge pmd support

Message ID	20180627135510.117945-10-frankja@linux.ibm.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> Gateway: Authorized Use Only! Violators will be prosecuted for <kvm@vger.kernel.org> from <frankja@linux.ibm.com>; Wed, 27 Jun 2018 14:56:16 +0100 Gateway: Authorized Use Only! Violators will be prosecuted; (version=TLSv1/SSLv3 cipher=AES256-GCM-SHA384 bits=256/256) Wed, 27 Jun 2018 14:56:14 +0100 From: Janosch Frank <frankja@linux.ibm.com> To: kvm@vger.kernel.org Cc: david@redhat.com, borntraeger@de.ibm.com, schwidefsky@de.ibm.com, dominik.dingel@gmail.com Subject: [PATCH v4 9/9] s390/mm: Enable gmap huge pmd support Date: Wed, 27 Jun 2018 14:55:10 +0100 In-Reply-To: <20180627135510.117945-1-frankja@linux.ibm.com> References: <20180627135510.117945-1-frankja@linux.ibm.com> Message-Id: <20180627135510.117945-10-frankja@linux.ibm.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk

Message ID

20180627135510.117945-10-frankja@linux.ibm.com (mailing list archive)

State

New, archived

Headers

From: Janosch Frank <frankja@linux.ibm.com>
To: kvm@vger.kernel.org
Cc: david@redhat.com, borntraeger@de.ibm.com, schwidefsky@de.ibm.com,
	dominik.dingel@gmail.com
Subject: [PATCH v4 9/9] s390/mm: Enable gmap huge pmd support
Date: Wed, 27 Jun 2018 14:55:10 +0100
In-Reply-To: <20180627135510.117945-1-frankja@linux.ibm.com>
References: <20180627135510.117945-1-frankja@linux.ibm.com>
Message-Id: <20180627135510.117945-10-frankja@linux.ibm.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk

Commit Message

Janosch Frank June 27, 2018, 1:55 p.m. UTC

From: Janosch Frank <frankja@linux.vnet.ibm.com>

Now that we have everything in place, let's allow huge (1m) pmds for
gmap linking, effectively allowing hugetlbfs backed guests. Transparent
huge pages and 2g huge pages are *not* supported through this change.

General KVM huge page support on s390 has to be enabled via the
kvm.hpage module parameter. Either nested or hpage can be enabled, as
we currently do not support vSIE for huge backed guests.

For a guest the feature has to be enabled through the new
KVM_CAP_S390_HPAGE capability and the hpage module parameter. Enabling
it will disable cmm, pfmf and storage key interpretation.

Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
---
 Documentation/virtual/kvm/api.txt   | 16 +++++++++++++++
 arch/s390/include/asm/mmu.h         |  2 ++
 arch/s390/include/asm/mmu_context.h |  1 +
 arch/s390/kvm/kvm-s390.c            | 39 +++++++++++++++++++++++++++++++++++--
 arch/s390/mm/gmap.c                 |  8 +++++---
 include/uapi/linux/kvm.h            |  1 +
 6 files changed, 62 insertions(+), 5 deletions(-)

Comments

David Hildenbrand June 28, 2018, 7:15 a.m. UTC | #1

On 27.06.2018 15:55, Janosch Frank wrote:

Wonder if this patch should rather be "KVM: s390: ..."

> From: Janosch Frank <frankja@linux.vnet.ibm.com>
> 
> Now that we have everything in place, let's allow huge (1m) pmds for
> gmap linking, effectively allowing hugetlbfs backed guests. Transparent
> huge pages and 2g huge pages are *not* supported through this change.
> 
> General KVM huge page support on s390 has to be enabled via the
> kvm.hpage module parameter. Either nested or hpage can be enabled, as
> we currently do not support vSIE for huge backed guests.

You might want to extend a little bit on how this might be handled in
the future like "Once vSIE supports huge (1mb) pages, we can either drop
the module parameter or enable it as default".

> 
> For a guest the feature has to be enabled through the new
> KVM_CAP_S390_HPAGE capability and the hpage module parameter. Enabling
> it will disable cmm, pfmf and storage key interpretation.

You might want to add how this would complicates things and why we
disable them (no pgste on huge pages, but we might have to create fake
4k page tables for huge pages, and we have to tell the HW that these
pgste are not to be used -> disable all assists that could make use of
the PGSTE and force it into the manual handler instead). CMM in general
not compatible.

> 
> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
> ---
>  Documentation/virtual/kvm/api.txt   | 16 +++++++++++++++
>  arch/s390/include/asm/mmu.h         |  2 ++
>  arch/s390/include/asm/mmu_context.h |  1 +
>  arch/s390/kvm/kvm-s390.c            | 39 +++++++++++++++++++++++++++++++++++--
>  arch/s390/mm/gmap.c                 |  8 +++++---
>  include/uapi/linux/kvm.h            |  1 +
>  6 files changed, 62 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index d10944e619d3..284acacea8c4 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4391,6 +4391,22 @@ all such vmexits.
>  
>  Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
>  
> +7.14 KVM_CAP_S390_HPAGE
> +
> +Architectures: s390
> +Parameters: none
> +Returns: 0 on success, -EINVAL if hpage module parameter was not set
> +	 or cmma is enabled
> +
> +With this capability the KVM support for memory backing with 1m pages
> +through hugetlbfs can be enabled for a VM. This will disable cmma,

It will never disable CMMA, as CMMA was not enabled before.

> +pfmfi and the storage key interpretation. If cmma has already been
> +enabled or the hpage module parameter is not set to 1, -EINVAL is
> +returned.
> +
> +While it is generally possible to create and start a huge page backed
> +VM without this capability, the VM will not be functional.

will crash when trying to use a huge page?

> +
>  8. Other capabilities.
>  ----------------------
>  
> diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
> index f5ff9dbad8ac..652cd658e242 100644
> --- a/arch/s390/include/asm/mmu.h
> +++ b/arch/s390/include/asm/mmu.h
> @@ -24,6 +24,8 @@ typedef struct {
>  	unsigned int uses_skeys:1;
>  	/* The mmu context uses CMM. */
>  	unsigned int uses_cmm:1;
> +	/* The gmap associated with this context uses huge pages. */
> +	unsigned int uses_gmap_hpage:1;

uses implies "that we've seen a huge page", right? Shouldn't this be
rather allow_gmap_hpage? (or to be more precise allow_gmap_1mb_hpage)

>  } mm_context_t;
>  
>  #define INIT_MM_CONTEXT(name)						   \
> diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
> index d16bc79c30bb..407165d805b9 100644
> --- a/arch/s390/include/asm/mmu_context.h
> +++ b/arch/s390/include/asm/mmu_context.h
> @@ -32,6 +32,7 @@ static inline int init_new_context(struct task_struct *tsk,
>  	mm->context.has_pgste = 0;
>  	mm->context.uses_skeys = 0;
>  	mm->context.uses_cmm = 0;
> +	mm->context.uses_gmap_hpage = 0;
>  #endif
>  	switch (mm->context.asce_limit) {
>  	case _REGION2_SIZE:
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index de8dec849b60..41d4bfa31be1 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -171,7 +171,10 @@ struct kvm_s390_tod_clock_ext {
>  static int nested;
>  module_param(nested, int, S_IRUGO);
>  MODULE_PARM_DESC(nested, "Nested virtualization support");
> -
> +/* allow 1m huge page guest backing, if !nested */
> +static int hpage;
> +module_param(hpage, int, 0444);

What's the latest trend, 0444 vs S_IRUGO?

> +MODULE_PARM_DESC(hpage, "1m huge page backing support");
>  
>  /*
>   * For now we handle at most 16 double words as this is what the s390 base
> @@ -475,6 +478,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_S390_AIS_MIGRATION:
>  		r = 1;
>  		break;
> +	case KVM_CAP_S390_HPAGE:
> +		r = 0;
> +		if (hpage)
> +			r = 1;
> +		break;
>  	case KVM_CAP_S390_MEM_OP:
>  		r = MEM_OP_MAX_SIZE;
>  		break;
> @@ -671,6 +679,24 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>  		VM_EVENT(kvm, 3, "ENABLE: CAP_S390_GS %s",
>  			 r ? "(not available)" : "(success)");
>  		break;
> +	case KVM_CAP_S390_HPAGE:
> +		mutex_lock(&kvm->lock);
> +		if (kvm->created_vcpus)
> +			r = -EBUSY;
> +		else if (!hpage || kvm->arch.use_cmma)

/* CMM relies on PGSTE to work, however huge pages have no PGSTE */

You should also check for "kvm->mm->context.uses_gmap_hpage" when
enabling CMMA using the capability.

> +			r = -EINVAL;
> +		else {
> +			r = 0;
> +			kvm->mm->context.uses_gmap_hpage = 1;
> +			/* They would complicate matters too much. */

Dito, please describe why this is complicated (or even impossible)

/* we might have to create fake 4k page tables and have to hinder the HW
   from accessing these ...

> +			kvm->arch.use_skf = 0;
> +			kvm->arch.use_cmma = 0;

Not necessary, you have an kvm->arch.use_cmma check above

> +			kvm->arch.use_pfmfi = 0;
> +		}
> +		mutex_unlock(&kvm->lock);
> +		VM_EVENT(kvm, 3, "ENABLE: CAP_S390_HPAGE %s",
> +			 r ? "(not available)" : "(success)");
> +		break;
>  	case KVM_CAP_S390_USER_STSI:
>  		VM_EVENT(kvm, 3, "%s", "ENABLE: CAP_S390_USER_STSI");
>  		kvm->arch.user_stsi = 1;
> @@ -721,7 +747,11 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
>  		ret = -EBUSY;
>  		VM_EVENT(kvm, 3, "%s", "ENABLE: CMMA support");
>  		mutex_lock(&kvm->lock);
> -		if (!kvm->created_vcpus) {
> +		if (kvm->created_vcpus)
> +			ret = -EBUSY;
> +		else if (kvm->mm->context.uses_gmap_hpage)
> +			ret = -EINVAL;
> +		else {
>  			kvm->arch.use_cmma = 1;
>  			/* Not compatible with cmma. */
>  			kvm->arch.use_pfmfi = 0;
> @@ -4086,6 +4116,11 @@ static int __init kvm_s390_init(void)
>  		return -ENODEV;
>  	}
>  
> +	if (nested && hpage) {
> +		pr_info("nested (vSIE) and hpage (huge page backing) can currently not be activated concurrently");
> +		return -EINVAL;
> +	}
> +
>  	for (i = 0; i < 16; i++)
>  		kvm_s390_fac_base[i] |=
>  			S390_lowcore.stfle_fac_list[i] & nonhyp_mask(i);
> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
> index c691b9d9d223..84c6bad06949 100644
> --- a/arch/s390/mm/gmap.c
> +++ b/arch/s390/mm/gmap.c
> @@ -2,8 +2,10 @@
>  /*
>   *  KVM guest address space mapping code
>   *
> - *    Copyright IBM Corp. 2007, 2016
> + *    Copyright IBM Corp. 2007, 2016, 2017
>   *    Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
> + *		 David Hildenbrand <david@redhat.com>
> + *		 Janosch Frank <frankja@linux.vnet.ibm.com>
>   */
>  
>  #include <linux/kernel.h>
> @@ -595,8 +597,8 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
>  		return -EFAULT;
>  	pmd = pmd_offset(pud, vmaddr);
>  	VM_BUG_ON(pmd_none(*pmd));
> -	/* large pmds cannot yet be handled */
> -	if (pmd_large(*pmd))
> +	/* Are we allowed to use huge pages? */
> +	if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage)
>  		return -EFAULT;
>  	/* Link gmap segment table entry location to page table. */
>  	rc = radix_tree_preload(GFP_KERNEL);
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index b6270a3b38e9..ea007e900acd 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_GET_MSR_FEATURES 153
>  #define KVM_CAP_HYPERV_EVENTFD 154
>  #define KVM_CAP_HYPERV_TLBFLUSH 155
> +#define KVM_CAP_S390_HPAGE 156

Should we even change that to KVM_CAP_S390_1MB_HPAGE ?

>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
>

Janosch Frank June 28, 2018, 3:24 p.m. UTC | #2

On 28.06.2018 09:15, David Hildenbrand wrote:
> On 27.06.2018 15:55, Janosch Frank wrote:
> 
> Wonder if this patch should rather be "KVM: s390: ..."
> 
>> From: Janosch Frank <frankja@linux.vnet.ibm.com>
>>
>> Now that we have everything in place, let's allow huge (1m) pmds for
>> gmap linking, effectively allowing hugetlbfs backed guests. Transparent
>> huge pages and 2g huge pages are *not* supported through this change.
>>
>> General KVM huge page support on s390 has to be enabled via the
>> kvm.hpage module parameter. Either nested or hpage can be enabled, as
>> we currently do not support vSIE for huge backed guests.
> 
> You might want to extend a little bit on how this might be handled in
> the future like "Once vSIE supports huge (1mb) pages, we can either drop
> the module parameter or enable it as default".
> 
>>
>> For a guest the feature has to be enabled through the new
>> KVM_CAP_S390_HPAGE capability and the hpage module parameter. Enabling
>> it will disable cmm, pfmf and storage key interpretation.
> 
> You might want to add how this would complicates things and why we
> disable them (no pgste on huge pages, but we might have to create fake
> 4k page tables for huge pages, and we have to tell the HW that these
> pgste are not to be used -> disable all assists that could make use of
> the PGSTE and force it into the manual handler instead). CMM in general
> not compatible.
> 
>>
>> Signed-off-by: Janosch Frank <frankja@linux.vnet.ibm.com>
>> ---
>>  Documentation/virtual/kvm/api.txt   | 16 +++++++++++++++
>>  arch/s390/include/asm/mmu.h         |  2 ++
>>  arch/s390/include/asm/mmu_context.h |  1 +
>>  arch/s390/kvm/kvm-s390.c            | 39 +++++++++++++++++++++++++++++++++++--
>>  arch/s390/mm/gmap.c                 |  8 +++++---
>>  include/uapi/linux/kvm.h            |  1 +
>>  6 files changed, 62 insertions(+), 5 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index d10944e619d3..284acacea8c4 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -4391,6 +4391,22 @@ all such vmexits.
>>  
>>  Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
>>  
>> +7.14 KVM_CAP_S390_HPAGE
>> +
>> +Architectures: s390
>> +Parameters: none
>> +Returns: 0 on success, -EINVAL if hpage module parameter was not set
>> +	 or cmma is enabled
>> +
>> +With this capability the KVM support for memory backing with 1m pages
>> +through hugetlbfs can be enabled for a VM. This will disable cmma,
> 
> It will never disable CMMA, as CMMA was not enabled before.

"After the capability is enabled, cmma can't be enabled anymore and
pfmfi and the storage key interpretation are disabled."

> 
>> +pfmfi and the storage key interpretation. If cmma has already been
>> +enabled or the hpage module parameter is not set to 1, -EINVAL is
>> +returned.
>> +
>> +While it is generally possible to create and start a huge page backed
>> +VM without this capability, the VM will not be functional.
> 
> will crash when trying to use a huge page?

Crash implies it ran at a certain point and that it'll burn down
horribly, but QEMU will only pause it after the EFAULT if I remember it
correctly.

"While it is generally possible to create a huge page backed
VM without this capability, the VM will not be able to run."

> 
>> +
>>  8. Other capabilities.
>>  ----------------------
>>  
>> diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
>> index f5ff9dbad8ac..652cd658e242 100644
>> --- a/arch/s390/include/asm/mmu.h
>> +++ b/arch/s390/include/asm/mmu.h
>> @@ -24,6 +24,8 @@ typedef struct {
>>  	unsigned int uses_skeys:1;
>>  	/* The mmu context uses CMM. */
>>  	unsigned int uses_cmm:1;
>> +	/* The gmap associated with this context uses huge pages. */
>> +	unsigned int uses_gmap_hpage:1;
> 
> uses implies "that we've seen a huge page", right? Shouldn't this be
> rather allow_gmap_hpage? (or to be more precise allow_gmap_1mb_hpage)

allow_gmap_hpage_1m


We might want to prefix all of them with kvm_ in the future, so it's
clear they belong to kvm.

> 
>>  } mm_context_t;
>>  
>>  #define INIT_MM_CONTEXT(name)						   \
>> diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
>> index d16bc79c30bb..407165d805b9 100644
>> --- a/arch/s390/include/asm/mmu_context.h
>> +++ b/arch/s390/include/asm/mmu_context.h
>> @@ -32,6 +32,7 @@ static inline int init_new_context(struct task_struct *tsk,
>>  	mm->context.has_pgste = 0;
>>  	mm->context.uses_skeys = 0;
>>  	mm->context.uses_cmm = 0;
>> +	mm->context.uses_gmap_hpage = 0;
>>  #endif
>>  	switch (mm->context.asce_limit) {
>>  	case _REGION2_SIZE:
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index de8dec849b60..41d4bfa31be1 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -171,7 +171,10 @@ struct kvm_s390_tod_clock_ext {
>>  static int nested;
>>  module_param(nested, int, S_IRUGO);
>>  MODULE_PARM_DESC(nested, "Nested virtualization support");
>> -
>> +/* allow 1m huge page guest backing, if !nested */
>> +static int hpage;
>> +module_param(hpage, int, 0444);
> 
> What's the latest trend, 0444 vs S_IRUGO?

checkpatch wants octals

> 
>> +MODULE_PARM_DESC(hpage, "1m huge page backing support");
>>  
>>  /*
>>   * For now we handle at most 16 double words as this is what the s390 base
>> @@ -475,6 +478,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>  	case KVM_CAP_S390_AIS_MIGRATION:
>>  		r = 1;
>>  		break;
>> +	case KVM_CAP_S390_HPAGE:
>> +		r = 0;
>> +		if (hpage)
>> +			r = 1;
>> +		break;
>>  	case KVM_CAP_S390_MEM_OP:
>>  		r = MEM_OP_MAX_SIZE;
>>  		break;
>> @@ -671,6 +679,24 @@ static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
>>  		VM_EVENT(kvm, 3, "ENABLE: CAP_S390_GS %s",
>>  			 r ? "(not available)" : "(success)");
>>  		break;
>> +	case KVM_CAP_S390_HPAGE:
>> +		mutex_lock(&kvm->lock);
>> +		if (kvm->created_vcpus)
>> +			r = -EBUSY;
>> +		else if (!hpage || kvm->arch.use_cmma)
> 
> /* CMM relies on PGSTE to work, however huge pages have no PGSTE */
> 
> You should also check for "kvm->mm->context.uses_gmap_hpage" when
> enabling CMMA using the capability.

You mean the mem ctrl? I already do that, or do I miss something here?

-		if (!kvm->created_vcpus) {
+		if (kvm->created_vcpus)
+			ret = -EBUSY;
+		else if (kvm->mm->context.uses_gmap_hpage)
+			ret = -EINVAL;
+		else {
 			kvm->arch.use_cmma = 1;

> 
>> +			r = -EINVAL;
>> +		else {
>> +			r = 0;
>> +			kvm->mm->context.uses_gmap_hpage = 1;
>> +			/* They would complicate matters too much. */
> 
> Dito, please describe why this is complicated (or even impossible)
> 
> /* we might have to create fake 4k page tables and have to hinder the HW
>    from accessing these ...

                        /*



                         * We might have to create fake 4k page



                         * tables. To avoid that the hardware works on



                         * them we emulate these instructions.



                         */

> 
>> +			kvm->arch.use_skf = 0;
>> +			kvm->arch.use_cmma = 0;
> 
> Not necessary, you have an kvm->arch.use_cmma check above
> 
>> +			kvm->arch.use_pfmfi = 0;
>> +		}
>> +		mutex_unlock(&kvm->lock);
>> +		VM_EVENT(kvm, 3, "ENABLE: CAP_S390_HPAGE %s",
>> +			 r ? "(not available)" : "(success)");
>> +		break;
>>  	case KVM_CAP_S390_USER_STSI:
>>  		VM_EVENT(kvm, 3, "%s", "ENABLE: CAP_S390_USER_STSI");
>>  		kvm->arch.user_stsi = 1;
>> @@ -721,7 +747,11 @@ static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
>>  		ret = -EBUSY;
>>  		VM_EVENT(kvm, 3, "%s", "ENABLE: CMMA support");
>>  		mutex_lock(&kvm->lock);
>> -		if (!kvm->created_vcpus) {
>> +		if (kvm->created_vcpus)
>> +			ret = -EBUSY;
>> +		else if (kvm->mm->context.uses_gmap_hpage)
>> +			ret = -EINVAL;
>> +		else {
>>  			kvm->arch.use_cmma = 1;
>>  			/* Not compatible with cmma. */
>>  			kvm->arch.use_pfmfi = 0;
>> @@ -4086,6 +4116,11 @@ static int __init kvm_s390_init(void)
>>  		return -ENODEV;
>>  	}
>>  
>> +	if (nested && hpage) {
>> +		pr_info("nested (vSIE) and hpage (huge page backing) can currently not be activated concurrently");
>> +		return -EINVAL;
>> +	}
>> +
>>  	for (i = 0; i < 16; i++)
>>  		kvm_s390_fac_base[i] |=
>>  			S390_lowcore.stfle_fac_list[i] & nonhyp_mask(i);
>> diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
>> index c691b9d9d223..84c6bad06949 100644
>> --- a/arch/s390/mm/gmap.c
>> +++ b/arch/s390/mm/gmap.c
>> @@ -2,8 +2,10 @@
>>  /*
>>   *  KVM guest address space mapping code
>>   *
>> - *    Copyright IBM Corp. 2007, 2016
>> + *    Copyright IBM Corp. 2007, 2016, 2017
>>   *    Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
>> + *		 David Hildenbrand <david@redhat.com>
>> + *		 Janosch Frank <frankja@linux.vnet.ibm.com>
>>   */
>>  
>>  #include <linux/kernel.h>
>> @@ -595,8 +597,8 @@ int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
>>  		return -EFAULT;
>>  	pmd = pmd_offset(pud, vmaddr);
>>  	VM_BUG_ON(pmd_none(*pmd));
>> -	/* large pmds cannot yet be handled */
>> -	if (pmd_large(*pmd))
>> +	/* Are we allowed to use huge pages? */
>> +	if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage)
>>  		return -EFAULT;
>>  	/* Link gmap segment table entry location to page table. */
>>  	rc = radix_tree_preload(GFP_KERNEL);
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index b6270a3b38e9..ea007e900acd 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -949,6 +949,7 @@ struct kvm_ppc_resize_hpt {
>>  #define KVM_CAP_GET_MSR_FEATURES 153
>>  #define KVM_CAP_HYPERV_EVENTFD 154
>>  #define KVM_CAP_HYPERV_TLBFLUSH 155
>> +#define KVM_CAP_S390_HPAGE 156
> 
> Should we even change that to KVM_CAP_S390_1MB_HPAGE ?

I could live with KVM_CAP_S390_HPAGE_1M

> 
>>  
>>  #ifdef KVM_CAP_IRQ_ROUTING
>>  
>>
> 
>

David Hildenbrand June 28, 2018, 3:33 p.m. UTC | #3

>> /* CMM relies on PGSTE to work, however huge pages have no PGSTE */
>>
>> You should also check for "kvm->mm->context.uses_gmap_hpage" when
>> enabling CMMA using the capability.
> 
> You mean the mem ctrl? I already do that, or do I miss something here?

I missed it, sorry for that.

> 
> -		if (!kvm->created_vcpus) {
> +		if (kvm->created_vcpus)
> +			ret = -EBUSY;
> +		else if (kvm->mm->context.uses_gmap_hpage)
> +			ret = -EINVAL;
> +		else {
>  			kvm->arch.use_cmma = 1;
> 
>>
>>> +			r = -EINVAL;
>>> +		else {
>>> +			r = 0;
>>> +			kvm->mm->context.uses_gmap_hpage = 1;
>>> +			/* They would complicate matters too much. */
>>
>> Dito, please describe why this is complicated (or even impossible)
>>
>> /* we might have to create fake 4k page tables and have to hinder the HW
>>    from accessing these ...
> 
>                         /*
> 
> 
> 
>                          * We might have to create fake 4k page
> 
> 
> 
>                          * tables. To avoid that the hardware works on
> 
> 
> 
>                          * them we emulate these instructions.
> 
> 
> 
>                          */
> 

... hardware works on the stale PGSTE ...

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index d10944e619d3..284acacea8c4 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -4391,6 +4391,22 @@  all such vmexits.
 
 Do not enable KVM_FEATURE_PV_UNHALT if you disable HLT exits.
 
+7.14 KVM_CAP_S390_HPAGE
+
+Architectures: s390
+Parameters: none
+Returns: 0 on success, -EINVAL if hpage module parameter was not set
+	 or cmma is enabled
+
+With this capability the KVM support for memory backing with 1m pages
+through hugetlbfs can be enabled for a VM. This will disable cmma,
+pfmfi and the storage key interpretation. If cmma has already been
+enabled or the hpage module parameter is not set to 1, -EINVAL is
+returned.
+
+While it is generally possible to create and start a huge page backed
+VM without this capability, the VM will not be functional.
+
 8. Other capabilities.
 ----------------------
 
diff --git a/arch/s390/include/asm/mmu.h b/arch/s390/include/asm/mmu.h
index f5ff9dbad8ac..652cd658e242 100644
--- a/arch/s390/include/asm/mmu.h
+++ b/arch/s390/include/asm/mmu.h
@@ -24,6 +24,8 @@  typedef struct {
 	unsigned int uses_skeys:1;
 	/* The mmu context uses CMM. */
 	unsigned int uses_cmm:1;
+	/* The gmap associated with this context uses huge pages. */
+	unsigned int uses_gmap_hpage:1;
 } mm_context_t;
 
 #define INIT_MM_CONTEXT(name)						   \
diff --git a/arch/s390/include/asm/mmu_context.h b/arch/s390/include/asm/mmu_context.h
index d16bc79c30bb..407165d805b9 100644
--- a/arch/s390/include/asm/mmu_context.h
+++ b/arch/s390/include/asm/mmu_context.h
@@ -32,6 +32,7 @@  static inline int init_new_context(struct task_struct *tsk,
 	mm->context.has_pgste = 0;
 	mm->context.uses_skeys = 0;
 	mm->context.uses_cmm = 0;
+	mm->context.uses_gmap_hpage = 0;
 #endif
 	switch (mm->context.asce_limit) {
 	case _REGION2_SIZE:
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index de8dec849b60..41d4bfa31be1 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -171,7 +171,10 @@  struct kvm_s390_tod_clock_ext {
 static int nested;
 module_param(nested, int, S_IRUGO);
 MODULE_PARM_DESC(nested, "Nested virtualization support");
-
+/* allow 1m huge page guest backing, if !nested */
+static int hpage;
+module_param(hpage, int, 0444);
+MODULE_PARM_DESC(hpage, "1m huge page backing support");
 
 /*
  * For now we handle at most 16 double words as this is what the s390 base
@@ -475,6 +478,11 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_S390_AIS_MIGRATION:
 		r = 1;
 		break;
+	case KVM_CAP_S390_HPAGE:
+		r = 0;
+		if (hpage)
+			r = 1;
+		break;
 	case KVM_CAP_S390_MEM_OP:
 		r = MEM_OP_MAX_SIZE;
 		break;
@@ -671,6 +679,24 @@  static int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap)
 		VM_EVENT(kvm, 3, "ENABLE: CAP_S390_GS %s",
 			 r ? "(not available)" : "(success)");
 		break;
+	case KVM_CAP_S390_HPAGE:
+		mutex_lock(&kvm->lock);
+		if (kvm->created_vcpus)
+			r = -EBUSY;
+		else if (!hpage || kvm->arch.use_cmma)
+			r = -EINVAL;
+		else {
+			r = 0;
+			kvm->mm->context.uses_gmap_hpage = 1;
+			/* They would complicate matters too much. */
+			kvm->arch.use_skf = 0;
+			kvm->arch.use_cmma = 0;
+			kvm->arch.use_pfmfi = 0;
+		}
+		mutex_unlock(&kvm->lock);
+		VM_EVENT(kvm, 3, "ENABLE: CAP_S390_HPAGE %s",
+			 r ? "(not available)" : "(success)");
+		break;
 	case KVM_CAP_S390_USER_STSI:
 		VM_EVENT(kvm, 3, "%s", "ENABLE: CAP_S390_USER_STSI");
 		kvm->arch.user_stsi = 1;
@@ -721,7 +747,11 @@  static int kvm_s390_set_mem_control(struct kvm *kvm, struct kvm_device_attr *att
 		ret = -EBUSY;
 		VM_EVENT(kvm, 3, "%s", "ENABLE: CMMA support");
 		mutex_lock(&kvm->lock);
-		if (!kvm->created_vcpus) {
+		if (kvm->created_vcpus)
+			ret = -EBUSY;
+		else if (kvm->mm->context.uses_gmap_hpage)
+			ret = -EINVAL;
+		else {
 			kvm->arch.use_cmma = 1;
 			/* Not compatible with cmma. */
 			kvm->arch.use_pfmfi = 0;
@@ -4086,6 +4116,11 @@  static int __init kvm_s390_init(void)
 		return -ENODEV;
 	}
 
+	if (nested && hpage) {
+		pr_info("nested (vSIE) and hpage (huge page backing) can currently not be activated concurrently");
+		return -EINVAL;
+	}
+
 	for (i = 0; i < 16; i++)
 		kvm_s390_fac_base[i] |=
 			S390_lowcore.stfle_fac_list[i] & nonhyp_mask(i);
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c
index c691b9d9d223..84c6bad06949 100644
--- a/arch/s390/mm/gmap.c
+++ b/arch/s390/mm/gmap.c
@@ -2,8 +2,10 @@ 
 /*
  *  KVM guest address space mapping code
  *
- *    Copyright IBM Corp. 2007, 2016
+ *    Copyright IBM Corp. 2007, 2016, 2017
  *    Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
+ *		 David Hildenbrand <david@redhat.com>
+ *		 Janosch Frank <frankja@linux.vnet.ibm.com>
  */
 
 #include <linux/kernel.h>
@@ -595,8 +597,8 @@  int __gmap_link(struct gmap *gmap, unsigned long gaddr, unsigned long vmaddr)
 		return -EFAULT;
 	pmd = pmd_offset(pud, vmaddr);
 	VM_BUG_ON(pmd_none(*pmd));
-	/* large pmds cannot yet be handled */
-	if (pmd_large(*pmd))
+	/* Are we allowed to use huge pages? */
+	if (pmd_large(*pmd) && !gmap->mm->context.uses_gmap_hpage)
 		return -EFAULT;
 	/* Link gmap segment table entry location to page table. */
 	rc = radix_tree_preload(GFP_KERNEL);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index b6270a3b38e9..ea007e900acd 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -949,6 +949,7 @@  struct kvm_ppc_resize_hpt {
 #define KVM_CAP_GET_MSR_FEATURES 153
 #define KVM_CAP_HYPERV_EVENTFD 154
 #define KVM_CAP_HYPERV_TLBFLUSH 155
+#define KVM_CAP_S390_HPAGE 156
 
 #ifdef KVM_CAP_IRQ_ROUTING

[v4,9/9] s390/mm: Enable gmap huge pmd support

Commit Message

Comments

Patch