diff mbox series

x86/ept: fix shattering of special pages

Message ID 20220627100119.55363-1-roger.pau@citrix.com (mailing list archive)
State Superseded
Headers show
Series x86/ept: fix shattering of special pages | expand

Commit Message

Roger Pau Monne June 27, 2022, 10:01 a.m. UTC
The current logic in epte_get_entry_emt() will split any page marked
as special with order greater than zero, without checking whether the
super page is all special.

Fix this by only splitting the page only if it's not all marked as
special, in order to prevent unneeded super page shuttering.

Fixes: ca24b2ffdb ('x86/hvm: set 'ipat' in EPT for special pages')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Paul Durrant <paul@xen.org>
---
It would seem weird to me to have a super page entry in EPT with
ranges marked as special and not the full page.  I guess it's better
to be safe, but I don't see an scenario where we could end up in that
situation.

I've been trying to find a clarification in the original patch
submission about how it's possible to have such super page EPT entry,
but haven't been able to find any justification.

Might be nice to expand the commit message as to why it's possible to
have such mixed super page entries that would need splitting.
---
 xen/arch/x86/mm/p2m-ept.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

Comments

Tian, Kevin June 29, 2022, 8:41 a.m. UTC | #1
> From: Roger Pau Monne <roger.pau@citrix.com>
> Sent: Monday, June 27, 2022 6:01 PM
> 
> The current logic in epte_get_entry_emt() will split any page marked
> as special with order greater than zero, without checking whether the
> super page is all special.
> 
> Fix this by only splitting the page only if it's not all marked as
> special, in order to prevent unneeded super page shuttering.
> 
> Fixes: ca24b2ffdb ('x86/hvm: set 'ipat' in EPT for special pages')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Paul Durrant <paul@xen.org>
> ---
> It would seem weird to me to have a super page entry in EPT with
> ranges marked as special and not the full page.  I guess it's better
> to be safe, but I don't see an scenario where we could end up in that
> situation.
> 
> I've been trying to find a clarification in the original patch
> submission about how it's possible to have such super page EPT entry,
> but haven't been able to find any justification.
> 
> Might be nice to expand the commit message as to why it's possible to
> have such mixed super page entries that would need splitting.

Here is what I dig out.

When reviewing v1 of adding special page check, Jan suggested
that APIC access page was also covered hence the old logic for APIC
access page can be removed. [1]

Then when reviewing v2 he found that the order check in removed
logic was not carried to the new check on special page. [2] 

The original order check in old APIC access logic came from:

commit 126018f2acd5416434747423e61a4690108b9dc9
Author: Jan Beulich <jbeulich@suse.com>
Date:   Fri May 2 10:48:48 2014 +0200

    x86/EPT: consider page order when checking for APIC MFN

    This was overlooked in 3d90d6e6 ("x86/EPT: split super pages upon
    mismatching memory types").

    Signed-off-by: Jan Beulich <jbeulich@suse.com>
    Acked-by: Kevin Tian <kevin.tian@intel.com>
    Reviewed-by: Tim Deegan <tim@xen.org>

I suppose Jan may actually find such mixed super page entry case
to bring this fix in.

Not sure whether Jan still remember the history.

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg01648.html
[2] https://lore.kernel.org/all/a4856c33-8bb0-4afa-cc71-3af4c229bc27@suse.com/

> ---
>  xen/arch/x86/mm/p2m-ept.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> index b04ca6dbe8..b4919bad51 100644
> --- a/xen/arch/x86/mm/p2m-ept.c
> +++ b/xen/arch/x86/mm/p2m-ept.c
> @@ -491,7 +491,7 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn,
> mfn_t mfn,
>  {
>      int gmtrr_mtype, hmtrr_mtype;
>      struct vcpu *v = current;
> -    unsigned long i;
> +    unsigned long i, special_pgs;
> 
>      *ipat = false;
> 
> @@ -525,15 +525,17 @@ int epte_get_entry_emt(struct domain *d, gfn_t
> gfn, mfn_t mfn,
>          return MTRR_TYPE_WRBACK;
>      }
> 
> -    for ( i = 0; i < (1ul << order); i++ )
> -    {
> +    for ( special_pgs = i = 0; i < (1ul << order); i++ )
>          if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
> -        {
> -            if ( order )
> -                return -1;
> -            *ipat = true;
> -            return MTRR_TYPE_WRBACK;
> -        }
> +            special_pgs++;
> +
> +    if ( special_pgs )
> +    {
> +        if ( special_pgs != (1ul << order) )
> +            return -1;
> +
> +        *ipat = true;
> +        return MTRR_TYPE_WRBACK;

Did you actually observe an impact w/o this fix? 

>      }
> 
>      switch ( type )
> --
> 2.36.1
Roger Pau Monne June 29, 2022, 9:10 a.m. UTC | #2
On Wed, Jun 29, 2022 at 08:41:43AM +0000, Tian, Kevin wrote:
> > From: Roger Pau Monne <roger.pau@citrix.com>
> > Sent: Monday, June 27, 2022 6:01 PM
> > 
> > The current logic in epte_get_entry_emt() will split any page marked
> > as special with order greater than zero, without checking whether the
> > super page is all special.
> > 
> > Fix this by only splitting the page only if it's not all marked as
> > special, in order to prevent unneeded super page shuttering.
> > 
> > Fixes: ca24b2ffdb ('x86/hvm: set 'ipat' in EPT for special pages')
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > Cc: Paul Durrant <paul@xen.org>
> > ---
> > It would seem weird to me to have a super page entry in EPT with
> > ranges marked as special and not the full page.  I guess it's better
> > to be safe, but I don't see an scenario where we could end up in that
> > situation.
> > 
> > I've been trying to find a clarification in the original patch
> > submission about how it's possible to have such super page EPT entry,
> > but haven't been able to find any justification.
> > 
> > Might be nice to expand the commit message as to why it's possible to
> > have such mixed super page entries that would need splitting.
> 
> Here is what I dig out.
> 
> When reviewing v1 of adding special page check, Jan suggested
> that APIC access page was also covered hence the old logic for APIC
> access page can be removed. [1]

But the APIC access page is always added using set_mmio_p2m_entry(),
which will unconditionally create an entry for it in the EPT, hence
there's no explicit need to check for it's presence inside of higher
order pages.

> Then when reviewing v2 he found that the order check in removed
> logic was not carried to the new check on special page. [2] 
> 
> The original order check in old APIC access logic came from:
> 
> commit 126018f2acd5416434747423e61a4690108b9dc9
> Author: Jan Beulich <jbeulich@suse.com>
> Date:   Fri May 2 10:48:48 2014 +0200
> 
>     x86/EPT: consider page order when checking for APIC MFN
> 
>     This was overlooked in 3d90d6e6 ("x86/EPT: split super pages upon
>     mismatching memory types").
> 
>     Signed-off-by: Jan Beulich <jbeulich@suse.com>
>     Acked-by: Kevin Tian <kevin.tian@intel.com>
>     Reviewed-by: Tim Deegan <tim@xen.org>
> 
> I suppose Jan may actually find such mixed super page entry case
> to bring this fix in.

Hm, I guess theoretically it could be possible for contiguous entries
to be coalesced (if we ever implement that for p2m page tables), and
hence result in super pages being created from smaller entries.

It that case it would be less clear to assert that special pages
cannot be coalesced with other contiguous entries.

> Not sure whether Jan still remember the history.
> 
> [1] https://lists.xenproject.org/archives/html/xen-devel/2020-07/msg01648.html
> [2] https://lore.kernel.org/all/a4856c33-8bb0-4afa-cc71-3af4c229bc27@suse.com/
> 
> > ---
> >  xen/arch/x86/mm/p2m-ept.c | 20 +++++++++++---------
> >  1 file changed, 11 insertions(+), 9 deletions(-)
> > 
> > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> > index b04ca6dbe8..b4919bad51 100644
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -491,7 +491,7 @@ int epte_get_entry_emt(struct domain *d, gfn_t gfn,
> > mfn_t mfn,
> >  {
> >      int gmtrr_mtype, hmtrr_mtype;
> >      struct vcpu *v = current;
> > -    unsigned long i;
> > +    unsigned long i, special_pgs;
> > 
> >      *ipat = false;
> > 
> > @@ -525,15 +525,17 @@ int epte_get_entry_emt(struct domain *d, gfn_t
> > gfn, mfn_t mfn,
> >          return MTRR_TYPE_WRBACK;
> >      }
> > 
> > -    for ( i = 0; i < (1ul << order); i++ )
> > -    {
> > +    for ( special_pgs = i = 0; i < (1ul << order); i++ )
> >          if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
> > -        {
> > -            if ( order )
> > -                return -1;
> > -            *ipat = true;
> > -            return MTRR_TYPE_WRBACK;
> > -        }
> > +            special_pgs++;
> > +
> > +    if ( special_pgs )
> > +    {
> > +        if ( special_pgs != (1ul << order) )
> > +            return -1;
> > +
> > +        *ipat = true;
> > +        return MTRR_TYPE_WRBACK;
> 
> Did you actually observe an impact w/o this fix? 

Yes, the original change has caused a performance regression in some
GPU pass through workloads for XenServer.  I think it's reasonable to
avoid super page shattering if the resulting pages would all end up
using ipat and WRBACK cache attribute, as there's no reason for the
split in the first case.

Thanks, Roger.
Jan Beulich June 29, 2022, 4:21 p.m. UTC | #3
On 27.06.2022 12:01, Roger Pau Monne wrote:
> The current logic in epte_get_entry_emt() will split any page marked
> as special with order greater than zero, without checking whether the
> super page is all special.
> 
> Fix this by only splitting the page only if it's not all marked as
> special, in order to prevent unneeded super page shuttering.
> 
> Fixes: ca24b2ffdb ('x86/hvm: set 'ipat' in EPT for special pages')
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> Cc: Paul Durrant <paul@xen.org>
> ---
> It would seem weird to me to have a super page entry in EPT with
> ranges marked as special and not the full page.  I guess it's better
> to be safe, but I don't see an scenario where we could end up in that
> situation.
> 
> I've been trying to find a clarification in the original patch
> submission about how it's possible to have such super page EPT entry,
> but haven't been able to find any justification.

I think the loop there was added "just in case", whereas in reality
it was only single special pages that would ever be mapped. So ...

> Might be nice to expand the commit message as to why it's possible to
> have such mixed super page entries that would need splitting.

... there may not be anything to add. I don't mind this being re-done,
hence
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Yet I'm not sure I understand what case this actually fixes (and
hence why you did add a Fixes: tag) - are there cases where non-
order-0 special pages are mapped in reality?

As to the Fixes: tag - I think we mean to follow Linux there and
hence want 12-digit hashes to be specified. See also
docs/process/sending-patches.pandoc. (But no need to resend just
for this.)

Jan
Tian, Kevin June 30, 2022, 2:04 a.m. UTC | #4
> From: Roger Pau Monné <roger.pau@citrix.com>
> Sent: Wednesday, June 29, 2022 5:11 PM
> 
> On Wed, Jun 29, 2022 at 08:41:43AM +0000, Tian, Kevin wrote:
> > > From: Roger Pau Monne <roger.pau@citrix.com>
> > > Sent: Monday, June 27, 2022 6:01 PM
> > >
> > > The current logic in epte_get_entry_emt() will split any page marked
> > > as special with order greater than zero, without checking whether the
> > > super page is all special.
> > >
> > > Fix this by only splitting the page only if it's not all marked as
> > > special, in order to prevent unneeded super page shuttering.
> > >
> > > Fixes: ca24b2ffdb ('x86/hvm: set 'ipat' in EPT for special pages')
> > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > > ---
> > > Cc: Paul Durrant <paul@xen.org>
> > > ---
> > > It would seem weird to me to have a super page entry in EPT with
> > > ranges marked as special and not the full page.  I guess it's better
> > > to be safe, but I don't see an scenario where we could end up in that
> > > situation.
> > >
> > > I've been trying to find a clarification in the original patch
> > > submission about how it's possible to have such super page EPT entry,
> > > but haven't been able to find any justification.
> > >
> > > Might be nice to expand the commit message as to why it's possible to
> > > have such mixed super page entries that would need splitting.
> >
> > Here is what I dig out.
> >
> > When reviewing v1 of adding special page check, Jan suggested
> > that APIC access page was also covered hence the old logic for APIC
> > access page can be removed. [1]
> 
> But the APIC access page is always added using set_mmio_p2m_entry(),
> which will unconditionally create an entry for it in the EPT, hence
> there's no explicit need to check for it's presence inside of higher
> order pages.
> 
> > Then when reviewing v2 he found that the order check in removed
> > logic was not carried to the new check on special page. [2]
> >
> > The original order check in old APIC access logic came from:
> >
> > commit 126018f2acd5416434747423e61a4690108b9dc9
> > Author: Jan Beulich <jbeulich@suse.com>
> > Date:   Fri May 2 10:48:48 2014 +0200
> >
> >     x86/EPT: consider page order when checking for APIC MFN
> >
> >     This was overlooked in 3d90d6e6 ("x86/EPT: split super pages upon
> >     mismatching memory types").
> >
> >     Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >     Acked-by: Kevin Tian <kevin.tian@intel.com>
> >     Reviewed-by: Tim Deegan <tim@xen.org>
> >
> > I suppose Jan may actually find such mixed super page entry case
> > to bring this fix in.
> 
> Hm, I guess theoretically it could be possible for contiguous entries
> to be coalesced (if we ever implement that for p2m page tables), and
> hence result in super pages being created from smaller entries.
> 
> It that case it would be less clear to assert that special pages
> cannot be coalesced with other contiguous entries.

With Jan's confirmation I'm fine with this change too. Just below...

> >
> > Did you actually observe an impact w/o this fix?
> 
> Yes, the original change has caused a performance regression in some
> GPU pass through workloads for XenServer.  I think it's reasonable to
> avoid super page shattering if the resulting pages would all end up
> using ipat and WRBACK cache attribute, as there's no reason for the
> split in the first case.
> 

... I'd appreciate mentioning the regression case in the commit msg.

With that,

Reviewed-by: Kevin Tian <kevin.tian@intel.com>
diff mbox series

Patch

diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index b04ca6dbe8..b4919bad51 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -491,7 +491,7 @@  int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
 {
     int gmtrr_mtype, hmtrr_mtype;
     struct vcpu *v = current;
-    unsigned long i;
+    unsigned long i, special_pgs;
 
     *ipat = false;
 
@@ -525,15 +525,17 @@  int epte_get_entry_emt(struct domain *d, gfn_t gfn, mfn_t mfn,
         return MTRR_TYPE_WRBACK;
     }
 
-    for ( i = 0; i < (1ul << order); i++ )
-    {
+    for ( special_pgs = i = 0; i < (1ul << order); i++ )
         if ( is_special_page(mfn_to_page(mfn_add(mfn, i))) )
-        {
-            if ( order )
-                return -1;
-            *ipat = true;
-            return MTRR_TYPE_WRBACK;
-        }
+            special_pgs++;
+
+    if ( special_pgs )
+    {
+        if ( special_pgs != (1ul << order) )
+            return -1;
+
+        *ipat = true;
+        return MTRR_TYPE_WRBACK;
     }
 
     switch ( type )