diff mbox series

[3/8] x86/paging: move update_paging_modes() hook

Message ID bee2d51f-534e-f6e2-6385-70f8597eac1e@suse.com (mailing list archive)
State Superseded
Series x86/mm: address aspects noticed during XSA-410 work

Commit Message

Jan Beulich Dec. 21, 2022, 1:25 p.m. UTC
The hook isn't mode dependent, hence it's misplaced in struct
paging_mode. (Or alternatively I see no reason why the alloc_page() and
free_page() hooks don't also live there.) Move it to struct
paging_domain.

While there, rename the hook, as well as HAP's and shadow's hook
functions, to use the singular; I never understood why the plural was
used. (Renaming the wrapper in particular would mean touching quite a
lot of other code.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>

Comments

Andrew Cooper Dec. 21, 2022, 5:43 p.m. UTC | #1
On 21/12/2022 1:25 pm, Jan Beulich wrote:
> The hook isn't mode dependent, hence it's misplaced in struct
> paging_mode. (Or alternatively I see no reason why the alloc_page() and
> free_page() hooks don't also live there.) Move it to struct
> paging_domain.
>
> While there rename the hook and HAP's as well as shadow's hook functions
> to use singular; I never understood why plural was used. (Renaming in
> particular the wrapper would be touching quite a lot of other code.)

There are always two modes: Xen's, and the guest's.

This was probably more visible back in the 32-bit days, but remnants of
it are still around with the fact that struct vcpu needs to be below the
4G boundary for the PDPTRs for when the guest isn't in Long Mode.

HAP also hides it fairly well given the uniformity of EPT/NPT (and
always 4 levels in a 64-bit Xen), but I suspect it will become more
visible again when we start supporting LA57.


All of that said, I have debated the utility of this hook.  It mixes up
various concepts, including the singleton construction of monitor
tables and TLB flushing (which should not happen for a mov cr3 with
NOFLUSH set); with those concepts abstracted properly, the only
remaining action is caching the right ops pointer.

I'm not convinced it will survive a concerted attempt to fix the HVM p2m
vs guest TLB flushing confusion.

> --- a/xen/arch/x86/mm/shadow/none.c
> +++ b/xen/arch/x86/mm/shadow/none.c
> @@ -27,6 +32,9 @@ int shadow_domain_init(struct domain *d)
>      };
>  
>      paging_log_dirty_init(d, &sh_none_ops);
> +
> +    d->arch.paging.update_paging_mode = _update_paging_mode;
> +
>      return is_hvm_domain(d) ? -EOPNOTSUPP : 0;

I know you haven't changed the logic here, but this hook looks broken. 
Why do we fail it right at the end for HVM domains?

~Andrew
Jan Beulich Dec. 22, 2022, 8 a.m. UTC | #2
On 21.12.2022 18:43, Andrew Cooper wrote:
> On 21/12/2022 1:25 pm, Jan Beulich wrote:
>> The hook isn't mode dependent, hence it's misplaced in struct
>> paging_mode. (Or alternatively I see no reason why the alloc_page() and
>> free_page() hooks don't also live there.) Move it to struct
>> paging_domain.
>>
>> While there rename the hook and HAP's as well as shadow's hook functions
>> to use singular; I never understood why plural was used. (Renaming in
>> particular the wrapper would be touching quite a lot of other code.)
> 
> There are always two modes; Xen's, and the guest's.
> 
> This was probably more visible back in the 32-bit days, but remnants of
> it are still around with the fact that struct vcpu needs to be below the
> 4G boundary for the PDPTRs for when the guest isn't in Long Mode.
> 
> HAP also hides it fairly well given the uniformity of EPT/NPT (and
> always 4 levels in a 64-bit Xen), but I suspect it will become more
> visible again when we start supporting LA57.

So does this boil down to a request to undo the rename? Or undo it just
for shadow code (as the HAP function really does only one thing)? As to
LA57, I'm not convinced it'll become more visible again then, but of
course without actually doing that work it's all hand-waving anyway.

>> --- a/xen/arch/x86/mm/shadow/none.c
>> +++ b/xen/arch/x86/mm/shadow/none.c
>> @@ -27,6 +32,9 @@ int shadow_domain_init(struct domain *d)
>>      };
>>  
>>      paging_log_dirty_init(d, &sh_none_ops);
>> +
>> +    d->arch.paging.update_paging_mode = _update_paging_mode;
>> +
>>      return is_hvm_domain(d) ? -EOPNOTSUPP : 0;
> 
> I know you haven't changed the logic here, but this hook looks broken. 
> Why do we fail it right at the end for HVM domains?

It's been a long time, but I guess my thinking back then was that it's
better to put in place pointers which other code may rely on being non-
NULL.

Jan
Andrew Cooper Jan. 5, 2023, 7:53 p.m. UTC | #3
On 22/12/2022 8:00 am, Jan Beulich wrote:
> On 21.12.2022 18:43, Andrew Cooper wrote:
>> On 21/12/2022 1:25 pm, Jan Beulich wrote:
>>> The hook isn't mode dependent, hence it's misplaced in struct
>>> paging_mode. (Or alternatively I see no reason why the alloc_page() and
>>> free_page() hooks don't also live there.) Move it to struct
>>> paging_domain.
>>>
>>> While there rename the hook and HAP's as well as shadow's hook functions
>>> to use singular; I never understood why plural was used. (Renaming in
>>> particular the wrapper would be touching quite a lot of other code.)
>> There are always two modes; Xen's, and the guest's.
>>
>> This was probably more visible back in the 32-bit days, but remnants of
>> it are still around with the fact that struct vcpu needs to be below the
>> 4G boundary for the PDPTRs for when the guest isn't in Long Mode.
>>
>> HAP also hides it fairly well given the uniformity of EPT/NPT (and
>> always 4 levels in a 64-bit Xen), but I suspect it will become more
>> visible again when we start supporting LA57.
> So does this boil down to a request to undo the rename? Or undo it just
> for shadow code (as the HAP function really does only one thing)? As to
> LA57, I'm not convinced it'll become more visible again then, but of
> course without actually doing that work it's all hand-waving anyway.

With LA57, HAP really does become two things.  Using a 5-level walk at the
HAP level is a prerequisite for being able to VMEntry with CR4.LA57 set,
on both Intel and AMD hardware IIRC.

Then, for VMs which don't elect to enable LA57, we will be in a 4-on-5
(to borrow the shadow terminology) situation.

The comment by paging_update_paging_modes() is fairly clear about this
operation being strictly related to guest state, which further
demonstrates that it is wrong for the HAP version to play with the
monitor table.

>
>>> --- a/xen/arch/x86/mm/shadow/none.c
>>> +++ b/xen/arch/x86/mm/shadow/none.c
>>> @@ -27,6 +32,9 @@ int shadow_domain_init(struct domain *d)
>>>      };
>>>  
>>>      paging_log_dirty_init(d, &sh_none_ops);
>>> +
>>> +    d->arch.paging.update_paging_mode = _update_paging_mode;
>>> +
>>>      return is_hvm_domain(d) ? -EOPNOTSUPP : 0;
>> I know you haven't changed the logic here, but this hook looks broken. 
>> Why do we fail it right at the end for HVM domains?
> It's been a long time, but I guess my thinking back then was that it's
> better to put in place pointers which other code may rely on being non-
> NULL.

Any chance we could gain a /* set up pointers for safety, then fail */
then?

~Andrew

Patch

--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -235,6 +235,8 @@ struct paging_domain {
      * (used by p2m and log-dirty code for their tries) */
     struct page_info * (*alloc_page)(struct domain *d);
     void (*free_page)(struct domain *d, struct page_info *pg);
+
+    void (*update_paging_mode)(struct vcpu *v);
 };
 
 struct paging_vcpu {
--- a/xen/arch/x86/include/asm/paging.h
+++ b/xen/arch/x86/include/asm/paging.h
@@ -140,7 +140,6 @@ struct paging_mode {
 #endif
     void          (*update_cr3            )(struct vcpu *v, int do_locking,
                                             bool noflush);
-    void          (*update_paging_modes   )(struct vcpu *v);
     bool          (*flush_tlb             )(const unsigned long *vcpu_bitmap);
 
     unsigned int guest_levels;
@@ -316,7 +315,7 @@ static inline void paging_update_cr3(str
  * has changed, and when bringing up a VCPU for the first time. */
 static inline void paging_update_paging_modes(struct vcpu *v)
 {
-    paging_get_hostmode(v)->update_paging_modes(v);
+    v->domain->arch.paging.update_paging_mode(v);
 }
 
 #ifdef CONFIG_PV
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -443,6 +443,9 @@ static void hap_destroy_monitor_table(st
 /************************************************/
 /*          HAP DOMAIN LEVEL FUNCTIONS          */
 /************************************************/
+
+static void cf_check hap_update_paging_mode(struct vcpu *v);
+
 void hap_domain_init(struct domain *d)
 {
     static const struct log_dirty_ops hap_ops = {
@@ -453,6 +456,8 @@ void hap_domain_init(struct domain *d)
 
     /* Use HAP logdirty mechanism. */
     paging_log_dirty_init(d, &hap_ops);
+
+    d->arch.paging.update_paging_mode = hap_update_paging_mode;
 }
 
 /* return 0 for success, -errno for failure */
@@ -772,7 +777,7 @@ hap_paging_get_mode(struct vcpu *v)
                                       &hap_paging_protected_mode);
 }
 
-static void cf_check hap_update_paging_modes(struct vcpu *v)
+static void cf_check hap_update_paging_mode(struct vcpu *v)
 {
     struct domain *d = v->domain;
     unsigned long cr3_gfn = v->arch.hvm.guest_cr[3] >> PAGE_SHIFT;
@@ -842,7 +847,6 @@ static const struct paging_mode hap_pagi
     .gva_to_gfn             = hap_gva_to_gfn_real_mode,
     .p2m_ga_to_gfn          = hap_p2m_ga_to_gfn_real_mode,
     .update_cr3             = hap_update_cr3,
-    .update_paging_modes    = hap_update_paging_modes,
     .flush_tlb              = flush_tlb,
     .guest_levels           = 1
 };
@@ -853,7 +857,6 @@ static const struct paging_mode hap_pagi
     .gva_to_gfn             = hap_gva_to_gfn_2_levels,
     .p2m_ga_to_gfn          = hap_p2m_ga_to_gfn_2_levels,
     .update_cr3             = hap_update_cr3,
-    .update_paging_modes    = hap_update_paging_modes,
     .flush_tlb              = flush_tlb,
     .guest_levels           = 2
 };
@@ -864,7 +867,6 @@ static const struct paging_mode hap_pagi
     .gva_to_gfn             = hap_gva_to_gfn_3_levels,
     .p2m_ga_to_gfn          = hap_p2m_ga_to_gfn_3_levels,
     .update_cr3             = hap_update_cr3,
-    .update_paging_modes    = hap_update_paging_modes,
     .flush_tlb              = flush_tlb,
     .guest_levels           = 3
 };
@@ -875,7 +877,6 @@ static const struct paging_mode hap_pagi
     .gva_to_gfn             = hap_gva_to_gfn_4_levels,
     .p2m_ga_to_gfn          = hap_p2m_ga_to_gfn_4_levels,
     .update_cr3             = hap_update_cr3,
-    .update_paging_modes    = hap_update_paging_modes,
     .flush_tlb              = flush_tlb,
     .guest_levels           = 4
 };
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -45,6 +45,8 @@ static int cf_check sh_enable_log_dirty(
 static int cf_check sh_disable_log_dirty(struct domain *);
 static void cf_check sh_clean_dirty_bitmap(struct domain *);
 
+static void cf_check shadow_update_paging_mode(struct vcpu *);
+
 /* Set up the shadow-specific parts of a domain struct at start of day.
  * Called for every domain from arch_domain_create() */
 int shadow_domain_init(struct domain *d)
@@ -60,6 +62,8 @@ int shadow_domain_init(struct domain *d)
     /* Use shadow pagetables for log-dirty support */
     paging_log_dirty_init(d, &sh_ops);
 
+    d->arch.paging.update_paging_mode = shadow_update_paging_mode;
+
 #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
     d->arch.paging.shadow.oos_active = 0;
 #endif
@@ -2558,7 +2562,12 @@ static void sh_update_paging_modes(struc
     v->arch.paging.mode->update_cr3(v, 0, false);
 }
 
-void cf_check shadow_update_paging_modes(struct vcpu *v)
+/*
+ * Update all the things that are derived from the guest's CR0/CR3/CR4.
+ * Called to initialize paging structures if the paging mode has changed,
+ * and when bringing up a VCPU for the first time.
+ */
+static void cf_check shadow_update_paging_mode(struct vcpu *v)
 {
     paging_lock(v->domain);
     sh_update_paging_modes(v);
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -4198,7 +4198,6 @@ const struct paging_mode sh_paging_mode
     .gva_to_gfn                    = sh_gva_to_gfn,
 #endif
     .update_cr3                    = sh_update_cr3,
-    .update_paging_modes           = shadow_update_paging_modes,
     .flush_tlb                     = shadow_flush_tlb,
     .guest_levels                  = GUEST_PAGING_LEVELS,
     .shadow.detach_old_tables      = sh_detach_old_tables,
--- a/xen/arch/x86/mm/shadow/none.c
+++ b/xen/arch/x86/mm/shadow/none.c
@@ -18,6 +18,11 @@ static void cf_check _clean_dirty_bitmap
     ASSERT(is_pv_domain(d));
 }
 
+static void cf_check _update_paging_mode(struct vcpu *v)
+{
+    ASSERT_UNREACHABLE();
+}
+
 int shadow_domain_init(struct domain *d)
 {
     static const struct log_dirty_ops sh_none_ops = {
@@ -27,6 +32,9 @@ int shadow_domain_init(struct domain *d)
     };
 
     paging_log_dirty_init(d, &sh_none_ops);
+
+    d->arch.paging.update_paging_mode = _update_paging_mode;
+
     return is_hvm_domain(d) ? -EOPNOTSUPP : 0;
 }
 
@@ -57,11 +65,6 @@ static void cf_check _update_cr3(struct
     ASSERT_UNREACHABLE();
 }
 
-static void cf_check _update_paging_modes(struct vcpu *v)
-{
-    ASSERT_UNREACHABLE();
-}
-
 static const struct paging_mode sh_paging_none = {
     .page_fault                    = _page_fault,
     .invlpg                        = _invlpg,
@@ -69,7 +72,6 @@ static const struct paging_mode sh_pagin
     .gva_to_gfn                    = _gva_to_gfn,
 #endif
     .update_cr3                    = _update_cr3,
-    .update_paging_modes           = _update_paging_modes,
 };
 
 void shadow_vcpu_init(struct vcpu *v)
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -426,11 +426,6 @@ void cf_check sh_write_guest_entry(
 intpte_t cf_check sh_cmpxchg_guest_entry(
     struct vcpu *v, intpte_t *p, intpte_t old, intpte_t new, mfn_t gmfn);
 
-/* Update all the things that are derived from the guest's CR0/CR3/CR4.
- * Called to initialize paging structures if the paging mode
- * has changed, and when bringing up a VCPU for the first time. */
-void cf_check shadow_update_paging_modes(struct vcpu *v);
-
 /* Unhook the non-Xen mappings in this top-level shadow mfn.
  * With user_only == 1, unhooks only the user-mode mappings. */
 void shadow_unhook_mappings(struct domain *d, mfn_t smfn, int user_only);