
[v2,2/2] x86/dom0: only disable SMAP for the PV dom0 build

Message ID 20240730152855.48745-3-roger.pau@citrix.com (mailing list archive)
State New
Series x86/dom0: miscellaneous fixes for PV dom0 builder

Commit Message

Roger Pau Monné July 30, 2024, 3:28 p.m. UTC
The PVH dom0 builder doesn't switch page tables and has no need to run with
SMAP disabled.

Use stac() and clac(), as that's safe to use even if events would hit in the
middle of the region with SMAP disabled.  Nesting stac() and clac() calls is
not safe, so the stac() call is done strictly after elf_load_binary() because
that uses raw_{copy_to,clear}_guest() accessors which toggle SMAP.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/pv/dom0_build.c | 13 +++++++++++++
 xen/arch/x86/setup.c         | 17 -----------------
 2 files changed, 13 insertions(+), 17 deletions(-)

Comments

Jan Beulich July 31, 2024, 6:44 a.m. UTC | #1
On 30.07.2024 17:28, Roger Pau Monne wrote:
> The PVH dom0 builder doesn't switch page tables and has no need to run with
> SMAP disabled.
> 
> Use stac() and clac(), as that's safe to use even if events would hit in the
> middle of the region with SMAP disabled.  Nesting stac() and clac() calls is
> not safe, so the stac() call is done strictly after elf_load_binary() because
> that uses raw_{copy_to,clear}_guest() accessors which toggle SMAP.

And that was the main concern causing the CR4.SMAP override to be used instead,
iirc. While I'm sure you've properly audited all code paths, how can we be sure
there's not going to be another stac() or clac() added somewhere? Or setting of
EFLAGS as a whole, clearing EFLAGS.AC without that being explicit? I think we'd
be better off sticking to the fiddling with CR4.

> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Considering the bug Andrew pointed out on the code you remove from setup.c,
don't we want a Fixes: tag as well?

> --- a/xen/arch/x86/pv/dom0_build.c
> +++ b/xen/arch/x86/pv/dom0_build.c
> @@ -830,6 +830,15 @@ int __init dom0_construct_pv(struct domain *d,
>          printk("Failed to load the kernel binary\n");
>          goto out;
>      }
> +
> +    /*
> +     * Disable SMAP to allow user-accesses when running on dom0 page-tables.
> +     * Note this must be done after elf_load_binary(), as such helper uses
> +     * raw_{copy_to,clear}_guest() helpers which internally call stac()/clac()
> +     * and those calls would otherwise nest with the ones here.

Just in case you and Andrew would outvote me on which approach to take:
I'm okay with "helpers" here, but the earlier "such helper" reads a little
odd to me. Imo using "that" or "it" instead would be better. Not the least
because personally a function like elf_load_binary() goes beyond what I'd
call a mere "helper" (in that case dom0_construct_pv() too could be deemed
a helper, etc).

Jan
Roger Pau Monné July 31, 2024, 1:25 p.m. UTC | #2
On Wed, Jul 31, 2024 at 08:44:46AM +0200, Jan Beulich wrote:
> On 30.07.2024 17:28, Roger Pau Monne wrote:
> > The PVH dom0 builder doesn't switch page tables and has no need to run with
> > SMAP disabled.
> > 
> > Use stac() and clac(), as that's safe to use even if events would hit in the
> > middle of the region with SMAP disabled.  Nesting stac() and clac() calls is
> > not safe, so the stac() call is done strictly after elf_load_binary() because
> > that uses raw_{copy_to,clear}_guest() accessors which toggle SMAP.
> 
> And that was the main concern causing the CR4.SMAP override to be used instead,
> iirc. While I'm sure you've properly audited all code paths, how can we be sure
> there's not going to be another stac() or clac() added somewhere? Or setting of
> EFLAGS as a whole, clearing EFLAGS.AC without that being explicit? I think we'd
> be better off sticking to the fiddling with CR4.

One approach I didn't test would be to add ASSERTs in stac/clac
functions to ensure that the state is as intended.  IOW: for stac we
would assert that the AC flag is not set, while for clac we would do
the opposite and assert that it's set before clearing it.

That should detect nesting.
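
To make the idea concrete, here is a rough user-space sketch of such
checks.  It is only a model: a plain bool stands in for EFLAGS.AC, a counter
stands in for an ASSERT() firing, and every model_* name is hypothetical
rather than existing Xen code.

```c
#include <stdbool.h>

/* Hypothetical model: one flag standing in for EFLAGS.AC. */
static bool model_ac;
/* Number of detected misuses, standing in for an ASSERT() firing. */
static unsigned int model_violations;

/* stac() with the proposed check: AC must not already be set. */
static void model_stac(void)
{
    if ( model_ac )
        model_violations++;
    model_ac = true;
}

/* clac() with the proposed check: AC must be set before clearing it. */
static void model_clac(void)
{
    if ( !model_ac )
        model_violations++;
    model_ac = false;
}

/* Stand-in for an accessor that toggles AC internally. */
static void model_guest_access(void)
{
    model_stac();
    model_clac();
}

/* Outer stac()/clac() region wrapped around an accessor: nested use. */
unsigned int model_nested_region(void)
{
    model_ac = false;
    model_violations = 0;

    model_stac();
    model_guest_access();   /* inner stac sees AC set; inner clac clears it */
    model_clac();           /* outer clac finds AC already clear */

    return model_violations;
}

/* Accessor first, stac()/clac() region strictly afterwards: no nesting. */
unsigned int model_ordered_region(void)
{
    model_ac = false;
    model_violations = 0;

    model_guest_access();
    model_stac();
    model_clac();

    return model_violations;
}
```

In this model the nested layout trips both checks, while the ordering used
by the patch (accessor first, then the stac()/clac() region) trips neither.
It also shows why nesting is unsafe in the first place: after the inner
clac, AC is clear for the remainder of the outer region.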

> 
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Considering the bug Andrew pointed out on the code you remove from setup.c,
> don't we want a Fixes: tag as well?

No, I think the current code is correct, it was my change that was
incorrect.

> > --- a/xen/arch/x86/pv/dom0_build.c
> > +++ b/xen/arch/x86/pv/dom0_build.c
> > @@ -830,6 +830,15 @@ int __init dom0_construct_pv(struct domain *d,
> >          printk("Failed to load the kernel binary\n");
> >          goto out;
> >      }
> > +
> > +    /*
> > +     * Disable SMAP to allow user-accesses when running on dom0 page-tables.
> > +     * Note this must be done after elf_load_binary(), as such helper uses
> > +     * raw_{copy_to,clear}_guest() helpers which internally call stac()/clac()
> > +     * and those calls would otherwise nest with the ones here.
> 
> Just in case you and Andrew would outvote me on which approach to take:
> I'm okay with "helpers" here, but the earlier "such helper" reads a little
> odd to me. Imo using "that" or "it" instead would be better. Not the least
> because personally a function like elf_load_binary() goes beyond what I'd
> > call a mere "helper" (in that case dom0_construct_pv() too could be deemed
> a helper, etc).

Hm, yes, would be better to reword that.

Thanks, Roger.
Jan Beulich July 31, 2024, 1:39 p.m. UTC | #3
On 31.07.2024 15:25, Roger Pau Monné wrote:
> On Wed, Jul 31, 2024 at 08:44:46AM +0200, Jan Beulich wrote:
>> On 30.07.2024 17:28, Roger Pau Monne wrote:
>>> The PVH dom0 builder doesn't switch page tables and has no need to run with
>>> SMAP disabled.
>>>
>>> Use stac() and clac(), as that's safe to use even if events would hit in the
>>> middle of the region with SMAP disabled.  Nesting stac() and clac() calls is
>>> not safe, so the stac() call is done strictly after elf_load_binary() because
>>> that uses raw_{copy_to,clear}_guest() accessors which toggle SMAP.
>>
>> And that was the main concern causing the CR4.SMAP override to be used instead,
>> iirc. While I'm sure you've properly audited all code paths, how can we be sure
>> there's not going to be another stac() or clac() added somewhere? Or setting of
>> EFLAGS as a whole, clearing EFLAGS.AC without that being explicit? I think we'd
>> be better off sticking to the fiddling with CR4.
> 
> One approach I didn't test would be to add ASSERTs in stac/clac
> functions to ensure that the state is as intended.  IOW: for stac we
> would assert that the AC flag is not set, while for clac we would do
> the opposite and assert that it's set before clearing it.
> 
> That should detect nesting.

Yet it would also refuse non-paired uses which are in principle okay.
Plus it requires respective code paths to be taken for such assertions
to trigger.

>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>
>> Considering the bug Andrew pointed out on the code you remove from setup.c,
>> don't we want a Fixes: tag as well?
> 
> No, I think the current code is correct, it was my change that was
> incorrect.

Hmm, no I think there was an issue also before, from the cpu_has_smap
use in the restore-CR4 conditional: We'd enable SMAP there even if on
the command line there was "smap=hvm". While we clear FEATURE_SMAP
when "smap=off", we keep the feature available when "smap=hvm". Thus
we'd pointlessly write CR4 in the first if() and then enable SMAP in
the second one, even though it wasn't enabled earlier on.
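
The effect can be modelled in a few lines of plain C.  This is a sketch
only: CR4 is a plain integer, the model_* names are hypothetical, and the
saved-state variant is one possible way of avoiding the problem, not
something proposed in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR4.SMAP is bit 21. */
#define MODEL_CR4_SMAP (UINT64_C(1) << 21)

/*
 * Models the logic being removed from setup.c: both the clear and the
 * restore are keyed on the feature flag, not on what CR4 held beforehand.
 */
uint64_t model_toggle_by_feature(uint64_t cr4, bool cpu_has_smap)
{
    if ( cpu_has_smap )
        cr4 &= ~MODEL_CR4_SMAP;

    /* ... dom0 would be constructed here ... */

    if ( cpu_has_smap )
        cr4 |= MODEL_CR4_SMAP;

    return cr4;
}

/* Assumed alternative: remember the original value and put it back. */
uint64_t model_toggle_by_saved_state(uint64_t cr4)
{
    const uint64_t saved = cr4;

    cr4 &= ~MODEL_CR4_SMAP;

    /* ... dom0 would be constructed here ... */

    cr4 = saved;

    return cr4;
}
```

With the feature available but SMAP not enabled in CR4 (the "smap=hvm"
case), the feature-keyed variant returns with SMAP set, whereas the
saved-state variant leaves CR4 as it found it.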

Jan
Roger Pau Monné July 31, 2024, 2:10 p.m. UTC | #4
On Wed, Jul 31, 2024 at 03:39:35PM +0200, Jan Beulich wrote:
> On 31.07.2024 15:25, Roger Pau Monné wrote:
> > On Wed, Jul 31, 2024 at 08:44:46AM +0200, Jan Beulich wrote:
> >> On 30.07.2024 17:28, Roger Pau Monne wrote:
> >>> The PVH dom0 builder doesn't switch page tables and has no need to run with
> >>> SMAP disabled.
> >>>
> >>> Use stac() and clac(), as that's safe to use even if events would hit in the
> >>> middle of the region with SMAP disabled.  Nesting stac() and clac() calls is
> >>> not safe, so the stac() call is done strictly after elf_load_binary() because
> >>> that uses raw_{copy_to,clear}_guest() accessors which toggle SMAP.
> >>
> >> And that was the main concern causing the CR4.SMAP override to be used instead,
> >> iirc. While I'm sure you've properly audited all code paths, how can we be sure
> >> there's not going to be another stac() or clac() added somewhere? Or setting of
> >> EFLAGS as a whole, clearing EFLAGS.AC without that being explicit? I think we'd
> >> be better off sticking to the fiddling with CR4.
> > 
> > One approach I didn't test would be to add ASSERTs in stac/clac
> > functions to ensure that the state is as intended.  IOW: for stac we
> > would assert that the AC flag is not set, while for clac we would do
> > the opposite and assert that it's set before clearing it.
> > 
> > That should detect nesting.
> 
> Yet it would also refuse non-paired uses which are in principle okay.

While such non-paired uses could be fine, it would seem to point to
other issues, as I would expect stac/clac to always be paired unless
it's a non-return path (a panic or similar).

> Plus it requires respective code paths to be taken for such assertions
> to trigger.

It does.  It seems more reliable to me to use stac/clac, rather than
fiddling with %cr4, however there's the nesting issue.  I think we
need to reach consensus as to which approach is to be used.

> >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> >>
> >> Considering the bug Andrew pointed out on the code you remove from setup.c,
> >> don't we want a Fixes: tag as well?
> > 
> > No, I think the current code is correct, it was my change that was
> > incorrect.
> 
> Hmm, no I think there was an issue also before, from the cpu_has_smap
> use in the restore-CR4 conditional: We'd enable SMAP there even if on
> the command line there was "smap=hvm". While we clear FEATURE_SMAP
> when "smap=off", we keep the feature available when "smap=hvm". Thus
> we'd pointlessly write CR4 in the first if() and then enable SMAP in
> the second one, even though it wasn't enabled earlier on.

Oh yes, that one.  I was thinking about the one related to IST and
cr4_pv32_mask.  I will add the fixes tag.

Thanks, Roger.
Andrew Cooper July 31, 2024, 4:47 p.m. UTC | #5
On 30/07/2024 4:28 pm, Roger Pau Monne wrote:
> The PVH dom0 builder doesn't switch page tables and has no need to run with
> SMAP disabled.
>
> Use stac() and clac(), as that's safe to use even if events would hit in the
> middle of the region with SMAP disabled.  Nesting stac() and clac() calls is
> not safe, so the stac() call is done strictly after elf_load_binary() because
> that uses raw_{copy_to,clear}_guest() accessors which toggle SMAP.
>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Summarising a discussion on Matrix.

There are 3 options.

1) Simply reposition the write_cr4()/cr4_pv32_mask adjustments.
2) This form (use stac/clac instead of playing with cr4).
3) Delay the original set_in_cr4(SMAP).

As proved by the confusion thus far, cr4_pv32_mask adjustments are
fragile and can go unnoticed in the general case (need a lucky watchdog
NMI hit to trigger).  Hence I'd prefer to remove the adjustments.

Using stac()/clac() is much easier.  It is fragile because of nesting
(no AC save/restore infrastructure), but any mistake here will be picked
up reliably in Gitlab CI because both adl-* and zen3p-* support SMAP.

Personally I think option 2 is better than 1, hence why I suggested it. 
It's got a fragile corner case but will be spotted reliably.

However, it occurs to me that option 3 exists as well... just delay
setting SMAP until after dom0 is made.  We have form now with only
activating CET-SS on the way out of __start_xen().

Furthermore, option 3 would allow us to move the cr4_pv32_mask
adjustment into set_in_cr4() and never need to opencode it.

(Although this is a bit tricky given the current code layout. 
cr4_pv32_mask also wants to be __ro_after_init and non-existent in a
!PV32 build.)
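
A rough user-space model of what option 3 could look like, purely as an
illustration.  The struct, the model_* names and the assumption that the
PV32 mask tracks exactly the SMEP and SMAP bits are all mine, not taken
from the Xen sources.

```c
#include <stdint.h>

#define MODEL_CR4_SMEP  (UINT64_C(1) << 20)
#define MODEL_CR4_SMAP  (UINT64_C(1) << 21)
/* Assumption: the PV32 mask only ever tracks these two bits. */
#define MODEL_PV32_BITS (MODEL_CR4_SMEP | MODEL_CR4_SMAP)

/* Hypothetical stand-in for the CPU state and the cr4_pv32_mask variable. */
struct model_cpu {
    uint64_t cr4;
    uint64_t cr4_pv32_mask;
};

/* set_in_cr4() that keeps the PV32 mask in sync, so callers never do. */
static void model_set_in_cr4(struct model_cpu *cpu, uint64_t bits)
{
    cpu->cr4 |= bits;
    cpu->cr4_pv32_mask |= bits & MODEL_PV32_BITS;
}

/* Stand-in for dom0 construction: reports whether SMAP was fully off. */
static int model_build_dom0(const struct model_cpu *cpu)
{
    return !((cpu->cr4 | cpu->cr4_pv32_mask) & MODEL_CR4_SMAP);
}

/* Boot flow with SMAP activation delayed until after dom0 is built. */
int model_boot(struct model_cpu *cpu)
{
    int built_without_smap;

    model_set_in_cr4(cpu, MODEL_CR4_SMEP);      /* early features */
    built_without_smap = model_build_dom0(cpu); /* no toggling needed */
    model_set_in_cr4(cpu, MODEL_CR4_SMAP);      /* enabled on the way out */

    return built_without_smap;
}
```

In this model the dom0 builder needs neither stac()/clac() nor a temporary
CR4 override, and CR4 and the mask cannot drift apart because there is a
single place where both are updated.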

~Andrew
Jan Beulich Aug. 1, 2024, 7:12 a.m. UTC | #6
On 31.07.2024 18:47, Andrew Cooper wrote:
> On 30/07/2024 4:28 pm, Roger Pau Monne wrote:
>> The PVH dom0 builder doesn't switch page tables and has no need to run with
>> SMAP disabled.
>>
>> Use stac() and clac(), as that's safe to use even if events would hit in the
>> middle of the region with SMAP disabled.  Nesting stac() and clac() calls is
>> not safe, so the stac() call is done strictly after elf_load_binary() because
>> that uses raw_{copy_to,clear}_guest() accessors which toggle SMAP.
>>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Summarising a discussion on Matrix.
> 
> There are 3 options.
> 
> 1) Simply reposition the write_cr4()/cr4_pv32_mask adjustments.
> 2) This form (use stac/clac instead of playing with cr4).
> 3) Delay the original set_in_cr4(SMAP).
> 
> As proved by the confusion thus far, cr4_pv32_mask adjustments are
> fragile and can go unnoticed in the general case (need a lucky watchdog
> NMI hit to trigger).  Hence I'd prefer to remove the adjustments.
> 
> Using stac()/clac() is much easier.  It is fragile because of nesting
> (no AC save/restore infrastructure), but any mistake here will be picked
> up reliably in Gitlab CI because both adl-* and zen3p-* support SMAP.
> 
> Personally I think option 2 is better than 1, hence why I suggested it. 
> It's got a fragile corner case but will be spotted reliably.

... when code paths in question are always taken. Any such operation on
a rarely taken code path quite likely won't be spotted by mere testing.

Jan

> However, it occurs to me that option 3 exists as well... just delay
> setting SMAP until after dom0 is made.  We have form now with only
> activating CET-SS on the way out of __start_xen().
> 
> Furthermore, option 3 would allow us to move the cr4_pv32_mask
> adjustment into set_in_cr4() and never need to opencode it.
> 
> (Although this is a bit tricky given the current code layout. 
> cr4_pv32_mask also wants to be __ro_after_init and non-existent in a
> !PV32 build.)
> 
> ~Andrew

Patch

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 57e58a02e707..ad804579cb6f 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -830,6 +830,15 @@  int __init dom0_construct_pv(struct domain *d,
         printk("Failed to load the kernel binary\n");
         goto out;
     }
+
+    /*
+     * Disable SMAP to allow user-accesses when running on dom0 page-tables.
+     * Note this must be done after elf_load_binary(), as such helper uses
+     * raw_{copy_to,clear}_guest() helpers which internally call stac()/clac()
+     * and those calls would otherwise nest with the ones here.
+     */
+    stac();
+
     bootstrap_map(NULL);
 
     if ( UNSET_ADDR != parms.virt_hypercall )
@@ -837,6 +846,7 @@  int __init dom0_construct_pv(struct domain *d,
         if ( (parms.virt_hypercall < v_start) ||
              (parms.virt_hypercall >= v_end) )
         {
+            clac();
             mapcache_override_current(NULL);
             switch_cr3_cr4(current->arch.cr3, read_cr4());
             printk("Invalid HYPERCALL_PAGE field in ELF notes.\n");
@@ -978,6 +988,9 @@  int __init dom0_construct_pv(struct domain *d,
                                     : XLAT_start_info_console_dom0);
 #endif
 
+    /* Possibly re-enable SMAP. */
+    clac();
+
     /* Return to idle domain's page tables. */
     mapcache_override_current(NULL);
     switch_cr3_cr4(current->arch.cr3, read_cr4());
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index eee20bb1753c..bc387d96b519 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -955,26 +955,9 @@  static struct domain *__init create_dom0(const module_t *image,
         }
     }
 
-    /*
-     * Temporarily clear SMAP in CR4 to allow user-accesses in construct_dom0().
-     * This saves a large number of corner cases interactions with
-     * copy_from_user().
-     */
-    if ( cpu_has_smap )
-    {
-        cr4_pv32_mask &= ~X86_CR4_SMAP;
-        write_cr4(read_cr4() & ~X86_CR4_SMAP);
-    }
-
     if ( construct_dom0(d, image, headroom, initrd, cmdline) != 0 )
         panic("Could not construct domain 0\n");
 
-    if ( cpu_has_smap )
-    {
-        write_cr4(read_cr4() | X86_CR4_SMAP);
-        cr4_pv32_mask |= X86_CR4_SMAP;
-    }
-
     return d;
 }