[v2,05/27] drm/i915/gvt: Verify VFIO-pinned page is THP when shadowing 2M gtt entry

Message ID 20230311002258.852397-6-seanjc@google.com (mailing list archive)
State New, archived
Series drm/i915/gvt: KVM: KVMGT fixes and page-track cleanups

Commit Message

Sean Christopherson March 11, 2023, 12:22 a.m. UTC
When shadowing a GTT entry with a 2M page, explicitly verify that the
first page pinned by VFIO is a transparent hugepage instead of assuming
that the page observed by is_2MB_gtt_possible() is the same page pinned by
vfio_pin_pages().  E.g. if userspace is doing something funky with the
guest's memslots, or if the page is demoted between is_2MB_gtt_possible()
and vfio_pin_pages().

This is more of a performance optimization than a bug fix as the check
for contiguous struct pages should guard against incorrect mapping (even
though assuming struct pages are virtually contiguous is wrong).

The real motivation for explicitly checking for a transparent hugepage
after pinning is that it will reduce the risk of introducing a bug in a
future fix for a page refcount leak (KVMGT doesn't put the reference
acquired by gfn_to_pfn()), and eventually will allow KVMGT to stop using
KVM's gfn_to_pfn() altogether.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 drivers/gpu/drm/i915/gvt/kvmgt.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Comments

Yan Zhao March 17, 2023, 5:33 a.m. UTC | #1
On Fri, Mar 10, 2023 at 04:22:36PM -0800, Sean Christopherson wrote:
> When shadowing a GTT entry with a 2M page, explicitly verify that the
> first page pinned by VFIO is a transparent hugepage instead of assuming
> that the page observed by is_2MB_gtt_possible() is the same page pinned by
> vfio_pin_pages().  E.g. if userspace is doing something funky with the
> guest's memslots, or if the page is demoted between is_2MB_gtt_possible()
> and vfio_pin_pages().
> 
> This is more of a performance optimization than a bug fix as the check
> for contiguous struct pages should guard against incorrect mapping (even
> though assuming struct pages are virtually contiguous is wrong).
> 
> The real motivation for explicitly checking for a transparent hugepage
> after pinning is that it will reduce the risk of introducing a bug in a
> future fix for a page refcount leak (KVMGT doesn't put the reference
> acquired by gfn_to_pfn()), and eventually will allow KVMGT to stop using
> KVM's gfn_to_pfn() altogether.
> 
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  drivers/gpu/drm/i915/gvt/kvmgt.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> index 8ae7039b3683..90997cc385b4 100644
> --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> @@ -159,11 +159,25 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
>  			goto err;
>  		}
>  
> -		if (npage == 0)
> -			base_page = cur_page;
> +		if (npage == 0) {
> +			/*
> +			 * Bail immediately to avoid unnecessary pinning when
> +			 * trying to shadow a 2M page and the host page isn't
> +			 * a transparent hugepage.
> +			 *
> +			 * TODO: support other type hugepages, e.g. HugeTLB.
> +			 */
> +			if (size == I915_GTT_PAGE_SIZE_2M &&
> +			    !PageTransHuge(cur_page))
Checking PageTransHuge(cur_page) and bailing out may not be necessary.
If a page is not a transparent hugepage, but there are 512 contiguous 4K
pages, I think it's still fine to map them in the IOMMU as 2M.
See vfio_pin_map_dma(), which does something similar.

> +				ret = -EIO;
> +			else
> +				base_page = cur_page;
> +		}
>  		else if (base_page + npage != cur_page) {
>  			gvt_vgpu_err("The pages are not continuous\n");
>  			ret = -EINVAL;
> +		}
> +		if (ret < 0) {
>  			npage++;
>  			goto err;
>  		}
> -- 
> 2.40.0.rc1.284.g88254d51c5-goog
>
Sean Christopherson May 4, 2023, 8:41 p.m. UTC | #2
On Fri, Mar 17, 2023, Yan Zhao wrote:
> On Fri, Mar 10, 2023 at 04:22:36PM -0800, Sean Christopherson wrote:
> > When shadowing a GTT entry with a 2M page, explicitly verify that the
> > first page pinned by VFIO is a transparent hugepage instead of assuming
> > that the page observed by is_2MB_gtt_possible() is the same page pinned by
> > vfio_pin_pages().  E.g. if userspace is doing something funky with the
> > guest's memslots, or if the page is demoted between is_2MB_gtt_possible()
> > and vfio_pin_pages().
> > 
> > This is more of a performance optimization than a bug fix as the check
> > for contiguous struct pages should guard against incorrect mapping (even
> > though assuming struct pages are virtually contiguous is wrong).
> > 
> > The real motivation for explicitly checking for a transparent hugepage
> > after pinning is that it will reduce the risk of introducing a bug in a
> > future fix for a page refcount leak (KVMGT doesn't put the reference
> > acquired by gfn_to_pfn()), and eventually will allow KVMGT to stop using
> > KVM's gfn_to_pfn() altogether.
> > 
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >  drivers/gpu/drm/i915/gvt/kvmgt.c | 18 ++++++++++++++++--
> >  1 file changed, 16 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > index 8ae7039b3683..90997cc385b4 100644
> > --- a/drivers/gpu/drm/i915/gvt/kvmgt.c
> > +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
> > @@ -159,11 +159,25 @@ static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
> >  			goto err;
> >  		}
> >  
> > -		if (npage == 0)
> > -			base_page = cur_page;
> > +		if (npage == 0) {
> > +			/*
> > +			 * Bail immediately to avoid unnecessary pinning when
> > +			 * trying to shadow a 2M page and the host page isn't
> > +			 * a transparent hugepage.
> > +			 *
> > +			 * TODO: support other type hugepages, e.g. HugeTLB.
> > +			 */
> > +			if (size == I915_GTT_PAGE_SIZE_2M &&
> > +			    !PageTransHuge(cur_page))
> Checking PageTransHuge(cur_page) and bailing out may not be necessary.
> If a page is not a transparent hugepage, but there are 512 contiguous 4K
> pages, I think it's still fine to map them in the IOMMU as 2M.
> See vfio_pin_map_dma(), which does something similar.

I agree that bailing isn't strictly necessary, and processing "blindly" should
Just Work for HugeTLB and other hugepage types.  I was going to argue that it
would be safer to add this and then drop it at the end, but I think that's a
specious argument.  If not checking the page type is unsafe, then the existing
code is buggy, and this changelog literally states that the check for contiguous
pages guards against any such problems.

I do think there's a (very, very theoretical) issue though.  For "CONFIG_SPARSEMEM=y
&& CONFIG_SPARSEMEM_VMEMMAP=n", struct pages aren't virtually contiguous with respect
to their pfns, i.e. it's possible (again, very theoretically) that two struct pages
could be virtually contiguous but physically discontiguous.  I suspect I'm being
ridiculously paranoid, but for the efficient cases where pages are guaranteed to
be contiguous, the extra page_to_pfn() checks should be optimized away by the
compiler, i.e. there's no meaningful downside to the paranoia.

TL;DR: My plan is to drop this patch and instead harden the continuity check.
Yan Zhao May 6, 2023, 6:35 a.m. UTC | #3
> > Checking PageTransHuge(cur_page) and bailing out may not be necessary.
> > If a page is not a transparent hugepage, but there are 512 contiguous 4K
> > pages, I think it's still fine to map them in the IOMMU as 2M.
> > See vfio_pin_map_dma(), which does something similar.
> 
> I agree that bailing isn't strictly necessary, and processing "blindly" should
> Just Work for HugeTLB and other hugepage types.  I was going to argue that it
> would be safer to add this and then drop it at the end, but I think that's a
> specious argument.  If not checking the page type is unsafe, then the existing
> code is buggy, and this changelog literally states that the check for contiguous
> pages guards against any such problems.
> 
> I do think there's a (very, very theoretical) issue though.  For "CONFIG_SPARSEMEM=y
> && CONFIG_SPARSEMEM_VMEMMAP=n", struct pages aren't virtually contiguous with respect
> to their pfns, i.e. it's possible (again, very theoretically) that two struct pages
> could be virtually contiguous but physically discontiguous.  I suspect I'm being
> ridiculously paranoid, but for the efficient cases where pages are guaranteed to
> be contiguous, the extra page_to_pfn() checks should be optimized away by the
> compiler, i.e. there's no meaningful downside to the paranoia.
To make sure I understand it correctly:
There are 3 conditions:
(1) Two struct pages aren't virtually contiguous, but their PFNs are contiguous.
(2) Two struct pages are virtually contiguous but their PFNs aren't contiguous.
    (Looks like this will not happen?)
(3) Two struct pages are virtually contiguous, and their PFNs are contiguous, too.
    But they have different backends, e.g.
    PFN 1 and PFN 2 are contiguous, while PFN 1 belongs to RAM, and PFN 2
    belongs to DEVMEM.

I think you mean condition (3) is problematic, am I right?
> 
> TL;DR: My plan is to drop this patch and instead harden the continuity check.

So you want to check the page zone?
Yan Zhao May 6, 2023, 10:57 a.m. UTC | #4
On Sat, May 06, 2023 at 02:35:41PM +0800, Yan Zhao wrote:
> > > Checking PageTransHuge(cur_page) and bailing out may not be necessary.
> > > If a page is not a transparent hugepage, but there are 512 contiguous 4K
> > > pages, I think it's still fine to map them in the IOMMU as 2M.
> > > See vfio_pin_map_dma(), which does something similar.
> > 
> > I agree that bailing isn't strictly necessary, and processing "blindly" should
> > Just Work for HugeTLB and other hugepage types.  I was going to argue that it
> > would be safer to add this and then drop it at the end, but I think that's a
> > specious argument.  If not checking the page type is unsafe, then the existing
> > code is buggy, and this changelog literally states that the check for contiguous
> > pages guards against any such problems.
> > 
> > I do think there's a (very, very theoretical) issue though.  For "CONFIG_SPARSEMEM=y
> > && CONFIG_SPARSEMEM_VMEMMAP=n", struct pages aren't virtually contiguous with respect
> > to their pfns, i.e. it's possible (again, very theoretically) that two struct pages
> > could be virtually contiguous but physically discontiguous.  I suspect I'm being
> > ridiculously paranoid, but for the efficient cases where pages are guaranteed to
> > be contiguous, the extra page_to_pfn() checks should be optimized away by the
> > compiler, i.e. there's no meaningful downside to the paranoia.
> To make sure I understand it correctly:
> There are 3 conditions:
> (1) Two struct pages aren't virtually contiguous, but their PFNs are contiguous.
> (2) Two struct pages are virtually contiguous but their PFNs aren't contiguous.
>     (Looks like this will not happen?)
> (3) Two struct pages are virtually contiguous, and their PFNs are contiguous, too.
>     But they have different backends, e.g.
>     PFN 1 and PFN 2 are contiguous, while PFN 1 belongs to RAM, and PFN 2
>     belongs to DEVMEM.
> 
> I think you mean condition (3) is problematic, am I right?
Oh, I get it now.
You are talking about condition (2), with "CONFIG_SPARSEMEM=y &&
CONFIG_SPARSEMEM_VMEMMAP=n".
Two struct pages can be virtually contiguous if one is at the tail of one
section and the other is at the head of another section, while the two
sections don't cover contiguous PFNs.

> > 
> > TL;DR: My plan is to drop this patch and instead harden the continuity check.
> 
> So you want to check the page zone?
Sean Christopherson May 8, 2023, 2:05 p.m. UTC | #5
On Sat, May 06, 2023, Yan Zhao wrote:
> On Sat, May 06, 2023 at 02:35:41PM +0800, Yan Zhao wrote:
> > > > Checking PageTransHuge(cur_page) and bailing out may not be necessary.
> > > > If a page is not a transparent hugepage, but there are 512 contiguous 4K
> > > > pages, I think it's still fine to map them in the IOMMU as 2M.
> > > > See vfio_pin_map_dma(), which does something similar.
> > > 
> > > I agree that bailing isn't strictly necessary, and processing "blindly" should
> > > Just Work for HugeTLB and other hugepage types.  I was going to argue that it
> > > would be safer to add this and then drop it at the end, but I think that's a
> > > specious argument.  If not checking the page type is unsafe, then the existing
> > > code is buggy, and this changelog literally states that the check for contiguous
> > > pages guards against any such problems.
> > > 
> > > I do think there's a (very, very theoretical) issue though.  For "CONFIG_SPARSEMEM=y
> > > && CONFIG_SPARSEMEM_VMEMMAP=n", struct pages aren't virtually contiguous with respect
> > > to their pfns, i.e. it's possible (again, very theoretically) that two struct pages
> > > could be virtually contiguous but physically discontiguous.  I suspect I'm being
> > > ridiculously paranoid, but for the efficient cases where pages are guaranteed to
> > > be contiguous, the extra page_to_pfn() checks should be optimized away by the
> > > compiler, i.e. there's no meaningful downside to the paranoia.
> > To make sure I understand it correctly:
> > There are 3 conditions:
> > (1) Two struct pages aren't virtually contiguous, but their PFNs are contiguous.
> > (2) Two struct pages are virtually contiguous but their PFNs aren't contiguous.
> >     (Looks like this will not happen?)
> > (3) Two struct pages are virtually contiguous, and their PFNs are contiguous, too.
> >     But they have different backends, e.g.
> >     PFN 1 and PFN 2 are contiguous, while PFN 1 belongs to RAM, and PFN 2
> >     belongs to DEVMEM.
> > 
> > I think you mean condition (3) is problematic, am I right?
> Oh, I get it now.
> You are talking about condition (2), with "CONFIG_SPARSEMEM=y &&
> CONFIG_SPARSEMEM_VMEMMAP=n".
> Two struct pages can be virtually contiguous if one is at the tail of one
> section and the other is at the head of another section, while the two
> sections don't cover contiguous PFNs.

Yep, exactly.
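
The section-boundary case confirmed above can be modeled in userspace.
This is a minimal sketch under stated assumptions, not the kernel's
implementation: toy_page_to_pfn(), the layout of struct mem_section, the
tiny PAGES_PER_SECTION, and the PFN values are hypothetical stand-ins
chosen only for illustration. It shows how a pointer-based check in the
style of "base_page + npage != cur_page" can accept two struct pages whose
PFNs are not adjacent, while a check that compares PFNs rejects them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's real definitions. */
#define PAGES_PER_SECTION 4UL	/* tiny on purpose, real sections are far larger */
#define NR_SECTIONS 2

struct page { unsigned long flags; };

struct mem_section {
	unsigned long start_pfn;	/* first PFN covered by this section */
	struct page *mem_map;		/* struct pages describing this section */
};

/*
 * One backing array, so the last struct page of section 0 is immediately
 * followed in memory by the first struct page of section 1.
 */
static struct page backing[NR_SECTIONS * PAGES_PER_SECTION];

/* The two sections cover PFN ranges that are NOT adjacent. */
static struct mem_section sections[NR_SECTIONS] = {
	{ .start_pfn = 0x1000, .mem_map = &backing[0] },
	{ .start_pfn = 0x8000, .mem_map = &backing[PAGES_PER_SECTION] },
};

/* Hypothetical model of a per-section page-to-PFN lookup. */
static unsigned long toy_page_to_pfn(const struct page *page)
{
	for (size_t i = 0; i < NR_SECTIONS; i++) {
		const struct page *start = sections[i].mem_map;

		if (page >= start && page < start + PAGES_PER_SECTION)
			return sections[i].start_pfn +
			       (unsigned long)(page - start);
	}
	assert(!"page does not belong to any section");
	return 0;
}

/* Pointer arithmetic only: the style of check the existing code uses. */
static bool contiguous_by_pointer(const struct page *base,
				  unsigned long npage,
				  const struct page *cur)
{
	return base + npage == cur;
}

/* Hardened variant: compare PFNs instead of struct page addresses. */
static bool contiguous_by_pfn(const struct page *base,
			      unsigned long npage,
			      const struct page *cur)
{
	return toy_page_to_pfn(base) + npage == toy_page_to_pfn(cur);
}
```

With this layout, the pair (&backing[3], &backing[4]) straddles the
section boundary: contiguous_by_pointer() returns true for it, while
contiguous_by_pfn() returns false. For a pair inside one section, such as
(&backing[0], &backing[1]), both checks agree.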

Patch

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 8ae7039b3683..90997cc385b4 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -159,11 +159,25 @@  static int gvt_pin_guest_page(struct intel_vgpu *vgpu, unsigned long gfn,
 			goto err;
 		}
 
-		if (npage == 0)
-			base_page = cur_page;
+		if (npage == 0) {
+			/*
+			 * Bail immediately to avoid unnecessary pinning when
+			 * trying to shadow a 2M page and the host page isn't
+			 * a transparent hugepage.
+			 *
+			 * TODO: support other type hugepages, e.g. HugeTLB.
+			 */
+			if (size == I915_GTT_PAGE_SIZE_2M &&
+			    !PageTransHuge(cur_page))
+				ret = -EIO;
+			else
+				base_page = cur_page;
+		}
 		else if (base_page + npage != cur_page) {
 			gvt_vgpu_err("The pages are not continuous\n");
 			ret = -EINVAL;
+		}
+		if (ret < 0) {
 			npage++;
 			goto err;
 		}