
[RFC,v2,07/13] KVM: Handle page fault for fd based memslot

Message ID 20211119134739.20218-8-chao.p.peng@linux.intel.com (mailing list archive)
State New
Series KVM: mm: fd-based approach for supporting KVM guest private memory

Commit Message

Chao Peng Nov. 19, 2021, 1:47 p.m. UTC
Current code assumes the private memory is persistent, so KVM can check
with the backing store whether private memory already exists at the same
address by calling get_pfn(alloc=false).

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
---
 arch/x86/kvm/mmu/mmu.c | 75 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 73 insertions(+), 2 deletions(-)
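
For reference, the memfd_ops interface this patch relies on is introduced
earlier in the series. Inferred purely from the call sites below (the struct
and parameter names here are assumptions, not taken from this patch), it
looks roughly like:

	struct memfd_pfn_ops {
		/*
		 * Look up (alloc == false) or allocate (alloc == true) the page
		 * backing @gfn in @file, returning its pfn and mapping order,
		 * or a negative value if no page exists and none was allocated.
		 */
		kvm_pfn_t (*get_pfn)(struct kvm_memory_slot *slot,
				     struct file *file, gfn_t gfn,
				     bool alloc, int *order);
		/* Drop the reference taken by get_pfn(). */
		void (*put_pfn)(kvm_pfn_t pfn);
	};

A memfd-backed slot carries two files: slot->file backs shared memory and
slot->priv_file backs private memory, selected per fault in the code below.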

Comments

Yao Yuan Nov. 20, 2021, 1:55 a.m. UTC | #1
On Fri, Nov 19, 2021 at 09:47:33PM +0800, Chao Peng wrote:
> Current code assumes the private memory is persistent, so KVM can check
> with the backing store whether private memory already exists at the same
> address by calling get_pfn(alloc=false).
>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> ---
>  arch/x86/kvm/mmu/mmu.c | 75 ++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 73 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 40377901598b..cd5d1f923694 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -3277,6 +3277,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
>  	if (max_level == PG_LEVEL_4K)
>  		return PG_LEVEL_4K;
>
> +	if (memslot_is_memfd(slot))
> +		return max_level;
> +
>  	host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
>  	return min(host_level, max_level);
>  }
> @@ -4555,6 +4558,65 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>  				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
>  }
>
> +static bool kvm_faultin_pfn_memfd(struct kvm_vcpu *vcpu,
> +				  struct kvm_page_fault *fault, int *r)
> +{	int order;
> +	kvm_pfn_t pfn;
> +	struct kvm_memory_slot *slot = fault->slot;
> +	bool priv_gfn = kvm_vcpu_is_private_gfn(vcpu, fault->addr >> PAGE_SHIFT);
> +	bool priv_slot_exists = memslot_has_private(slot);
> +	bool priv_gfn_exists = false;
> +	int mem_convert_type;
> +
> +	if (priv_gfn && !priv_slot_exists) {
> +		*r = RET_PF_INVALID;
> +		return true;
> +	}
> +
> +	if (priv_slot_exists) {
> +		pfn = slot->memfd_ops->get_pfn(slot, slot->priv_file,
> +					       fault->gfn, false, &order);
> +		if (pfn >= 0)
> +			priv_gfn_exists = true;

Need "fault->pfn = pfn" here if actual pfn is returned in
get_pfn(alloc=false) case for private page case.
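
A minimal sketch of the suggested change, i.e. recording the pfn that
get_pfn(alloc=false) returned for an already-existing private page:

	if (priv_slot_exists) {
		pfn = slot->memfd_ops->get_pfn(slot, slot->priv_file,
					       fault->gfn, false, &order);
		if (pfn >= 0) {
			priv_gfn_exists = true;
			/* remember the private pfn so it actually gets mapped */
			fault->pfn = pfn;
		}
	}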

> +	}
> +
> +	if (priv_gfn && !priv_gfn_exists) {
> +		mem_convert_type = KVM_EXIT_MEM_MAP_PRIVATE;
> +		goto out_convert;
> +	}
> +
> +	if (!priv_gfn && priv_gfn_exists) {
> +		slot->memfd_ops->put_pfn(pfn);
> +		mem_convert_type = KVM_EXIT_MEM_MAP_SHARED;
> +		goto out_convert;
> +	}
> +
> +	if (!priv_gfn) {
> +		pfn = slot->memfd_ops->get_pfn(slot, slot->file,
> +					       fault->gfn, true, &order);

Need "fault->pfn = pfn" here, because he pfn for
share page is getted here only.
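
A minimal sketch of the suggested fix for the shared path; note that the
existing code tests fault->pfn without ever assigning it:

	if (!priv_gfn) {
		/* assign the shared pfn before checking it */
		fault->pfn = slot->memfd_ops->get_pfn(slot, slot->file,
						      fault->gfn, true, &order);
		if (fault->pfn < 0) {
			*r = RET_PF_INVALID;
			return true;
		}
	}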

> +		if (fault->pfn < 0) {
> +			*r = RET_PF_INVALID;
> +			return true;
> +		}
> +	}
> +
> +	if (slot->flags & KVM_MEM_READONLY)
> +		fault->map_writable = false;
> +	if (order == 0)
> +		fault->max_level = PG_LEVEL_4K;
> +
> +	return false;
> +
> +out_convert:
> +	vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
> +	vcpu->run->mem.type = mem_convert_type;
> +	vcpu->run->mem.u.map.gpa = fault->gfn << PAGE_SHIFT;
> +	vcpu->run->mem.u.map.size = PAGE_SIZE;
> +	fault->pfn = -1;
> +	*r = -1;
> +	return true;
> +}
> +
>  static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
>  {
>  	struct kvm_memory_slot *slot = fault->slot;
> @@ -4596,6 +4658,9 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
>  		}
>  	}
>
> +	if (memslot_is_memfd(slot))
> +		return kvm_faultin_pfn_memfd(vcpu, fault, r);
> +
>  	async = false;
>  	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
>  					  fault->write, &fault->map_writable,
> @@ -4660,7 +4725,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  	else
>  		write_lock(&vcpu->kvm->mmu_lock);
>
> -	if (fault->slot && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
> +	if (fault->slot && !memslot_is_memfd(fault->slot) &&
> +			mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
>  		goto out_unlock;
>  	r = make_mmu_pages_available(vcpu);
>  	if (r)
> @@ -4676,7 +4742,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
>  		read_unlock(&vcpu->kvm->mmu_lock);
>  	else
>  		write_unlock(&vcpu->kvm->mmu_lock);
> -	kvm_release_pfn_clean(fault->pfn);
> +
> +	if (memslot_is_memfd(fault->slot))
> +		fault->slot->memfd_ops->put_pfn(fault->pfn);
> +	else
> +		kvm_release_pfn_clean(fault->pfn);
> +
>  	return r;
>  }
>
> --
> 2.17.1
>
Chao Peng Nov. 22, 2021, 9:18 a.m. UTC | #2
On Sat, Nov 20, 2021 at 09:55:29AM +0800, Yao Yuan wrote:
> On Fri, Nov 19, 2021 at 09:47:33PM +0800, Chao Peng wrote:
> > Current code assumes the private memory is persistent, so KVM can check
> > with the backing store whether private memory already exists at the same
> > address by calling get_pfn(alloc=false).
> >
> > Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> > Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
> > ---
> >  arch/x86/kvm/mmu/mmu.c | 75 ++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 73 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > index 40377901598b..cd5d1f923694 100644
> > --- a/arch/x86/kvm/mmu/mmu.c
> > +++ b/arch/x86/kvm/mmu/mmu.c
> > @@ -3277,6 +3277,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
> >  	if (max_level == PG_LEVEL_4K)
> >  		return PG_LEVEL_4K;
> >
> > +	if (memslot_is_memfd(slot))
> > +		return max_level;
> > +
> >  	host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
> >  	return min(host_level, max_level);
> >  }
> > @@ -4555,6 +4558,65 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> >  				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
> >  }
> >
> > +static bool kvm_faultin_pfn_memfd(struct kvm_vcpu *vcpu,
> > +				  struct kvm_page_fault *fault, int *r)
> > +{	int order;
> > +	kvm_pfn_t pfn;
> > +	struct kvm_memory_slot *slot = fault->slot;
> > +	bool priv_gfn = kvm_vcpu_is_private_gfn(vcpu, fault->addr >> PAGE_SHIFT);
> > +	bool priv_slot_exists = memslot_has_private(slot);
> > +	bool priv_gfn_exists = false;
> > +	int mem_convert_type;
> > +
> > +	if (priv_gfn && !priv_slot_exists) {
> > +		*r = RET_PF_INVALID;
> > +		return true;
> > +	}
> > +
> > +	if (priv_slot_exists) {
> > +		pfn = slot->memfd_ops->get_pfn(slot, slot->priv_file,
> > +					       fault->gfn, false, &order);
> > +		if (pfn >= 0)
> > +			priv_gfn_exists = true;
> 
> Need "fault->pfn = pfn" here if actual pfn is returned in
> get_pfn(alloc=false) case for private page case.
> 
> > +	}
> > +
> > +	if (priv_gfn && !priv_gfn_exists) {
> > +		mem_convert_type = KVM_EXIT_MEM_MAP_PRIVATE;
> > +		goto out_convert;
> > +	}
> > +
> > +	if (!priv_gfn && priv_gfn_exists) {
> > +		slot->memfd_ops->put_pfn(pfn);
> > +		mem_convert_type = KVM_EXIT_MEM_MAP_SHARED;
> > +		goto out_convert;
> > +	}
> > +
> > +	if (!priv_gfn) {
> > +		pfn = slot->memfd_ops->get_pfn(slot, slot->file,
> > +					       fault->gfn, true, &order);
> 
> Need "fault->pfn = pfn" here, because he pfn for
> share page is getted here only.
> 
> > +		if (fault->pfn < 0) {
> > +			*r = RET_PF_INVALID;
> > +			return true;
> > +		}
> > +	}

Right, I actually had "fault->pfn = pfn" here but accidentally deleted it
during a code refactoring.

Chao
> > +
> > +	if (slot->flags & KVM_MEM_READONLY)
> > +		fault->map_writable = false;
> > +	if (order == 0)
> > +		fault->max_level = PG_LEVEL_4K;
> > +
> > +	return false;
> > +
> > +out_convert:
> > +	vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
> > +	vcpu->run->mem.type = mem_convert_type;
> > +	vcpu->run->mem.u.map.gpa = fault->gfn << PAGE_SHIFT;
> > +	vcpu->run->mem.u.map.size = PAGE_SIZE;
> > +	fault->pfn = -1;
> > +	*r = -1;
> > +	return true;
> > +}
> > +
> >  static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
> >  {
> >  	struct kvm_memory_slot *slot = fault->slot;
> > @@ -4596,6 +4658,9 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
> >  		}
> >  	}
> >
> > +	if (memslot_is_memfd(slot))
> > +		return kvm_faultin_pfn_memfd(vcpu, fault, r);
> > +
> >  	async = false;
> >  	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
> >  					  fault->write, &fault->map_writable,
> > @@ -4660,7 +4725,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
> >  	else
> >  		write_lock(&vcpu->kvm->mmu_lock);
> >
> > -	if (fault->slot && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
> > +	if (fault->slot && !memslot_is_memfd(fault->slot) &&
> > +			mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
> >  		goto out_unlock;
> >  	r = make_mmu_pages_available(vcpu);
> >  	if (r)
> > @@ -4676,7 +4742,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
> >  		read_unlock(&vcpu->kvm->mmu_lock);
> >  	else
> >  		write_unlock(&vcpu->kvm->mmu_lock);
> > -	kvm_release_pfn_clean(fault->pfn);
> > +
> > +	if (memslot_is_memfd(fault->slot))
> > +		fault->slot->memfd_ops->put_pfn(fault->pfn);
> > +	else
> > +		kvm_release_pfn_clean(fault->pfn);
> > +
> >  	return r;
> >  }
> >
> > --
> > 2.17.1
> >

Patch

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 40377901598b..cd5d1f923694 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3277,6 +3277,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
 	if (max_level == PG_LEVEL_4K)
 		return PG_LEVEL_4K;
 
+	if (memslot_is_memfd(slot))
+		return max_level;
+
 	host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
 	return min(host_level, max_level);
 }
@@ -4555,6 +4558,65 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
 }
 
+static bool kvm_faultin_pfn_memfd(struct kvm_vcpu *vcpu,
+				  struct kvm_page_fault *fault, int *r)
+{	int order;
+	kvm_pfn_t pfn;
+	struct kvm_memory_slot *slot = fault->slot;
+	bool priv_gfn = kvm_vcpu_is_private_gfn(vcpu, fault->addr >> PAGE_SHIFT);
+	bool priv_slot_exists = memslot_has_private(slot);
+	bool priv_gfn_exists = false;
+	int mem_convert_type;
+
+	if (priv_gfn && !priv_slot_exists) {
+		*r = RET_PF_INVALID;
+		return true;
+	}
+
+	if (priv_slot_exists) {
+		pfn = slot->memfd_ops->get_pfn(slot, slot->priv_file,
+					       fault->gfn, false, &order);
+		if (pfn >= 0)
+			priv_gfn_exists = true;
+	}
+
+	if (priv_gfn && !priv_gfn_exists) {
+		mem_convert_type = KVM_EXIT_MEM_MAP_PRIVATE;
+		goto out_convert;
+	}
+
+	if (!priv_gfn && priv_gfn_exists) {
+		slot->memfd_ops->put_pfn(pfn);
+		mem_convert_type = KVM_EXIT_MEM_MAP_SHARED;
+		goto out_convert;
+	}
+
+	if (!priv_gfn) {
+		pfn = slot->memfd_ops->get_pfn(slot, slot->file,
+					       fault->gfn, true, &order);
+		if (fault->pfn < 0) {
+			*r = RET_PF_INVALID;
+			return true;
+		}
+	}
+
+	if (slot->flags & KVM_MEM_READONLY)
+		fault->map_writable = false;
+	if (order == 0)
+		fault->max_level = PG_LEVEL_4K;
+
+	return false;
+
+out_convert:
+	vcpu->run->exit_reason = KVM_EXIT_MEMORY_ERROR;
+	vcpu->run->mem.type = mem_convert_type;
+	vcpu->run->mem.u.map.gpa = fault->gfn << PAGE_SHIFT;
+	vcpu->run->mem.u.map.size = PAGE_SIZE;
+	fault->pfn = -1;
+	*r = -1;
+	return true;
+}
+
 static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
 {
 	struct kvm_memory_slot *slot = fault->slot;
@@ -4596,6 +4658,9 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		}
 	}
 
+	if (memslot_is_memfd(slot))
+		return kvm_faultin_pfn_memfd(vcpu, fault, r);
+
 	async = false;
 	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
 					  fault->write, &fault->map_writable,
@@ -4660,7 +4725,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	else
 		write_lock(&vcpu->kvm->mmu_lock);
 
-	if (fault->slot && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
+	if (fault->slot && !memslot_is_memfd(fault->slot) &&
+			mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
 		goto out_unlock;
 	r = make_mmu_pages_available(vcpu);
 	if (r)
@@ -4676,7 +4742,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		read_unlock(&vcpu->kvm->mmu_lock);
 	else
 		write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(fault->pfn);
+
+	if (memslot_is_memfd(fault->slot))
+		fault->slot->memfd_ops->put_pfn(fault->pfn);
+	else
+		kvm_release_pfn_clean(fault->pfn);
+
 	return r;
 }