Message ID | ad8ea1da-dda9-4567-8b15-55b65777e80f@suse.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | x86/MTRR: hook mtrr_bp_restore() back up | expand |
On Thu, Mar 27, 2025 at 12:20:47PM +0100, Jan Beulich wrote: > Unlike stated in the offending commit's description, > load_system_tables() wasn't the only thing left to retain from the > earlier restore_rest_processor_state(). > > While there also do Misra-related tidying for the function itself: The > function being used from assembly only means it doesn't need to have a > declaration, but wants to be asmlinkage. I wonder, maybe the intention was for the MTRR restoring on the BSP to also be done by the mtrr_aps_sync_end() call in enter_state()? AFAICT that will set the MTRRs uniformly on all CPUs, by calling mtrr_set_all() just like mtrr_bp_restore(), but later in the restore process. Thanks, Roger.
On 27.03.2025 12:38, Roger Pau Monné wrote: > On Thu, Mar 27, 2025 at 12:20:47PM +0100, Jan Beulich wrote: >> Unlike stated in the offending commit's description, >> load_system_tables() wasn't the only thing left to retain from the >> earlier restore_rest_processor_state(). >> >> While there also do Misra-related tidying for the function itself: The >> function being used from assembly only means it doesn't need to have a >> declaration, but wants to be asmlinkage. > > I wonder, maybe the intention was for the MTRR restoring on the BSP to > also be done by the mtrr_aps_sync_end() call in enter_state()? > > AFAICT that will set the MTRRs uniformly on all CPUs, by calling > mtrr_set_all() just like mtrr_bp_restore(), but later in the restore > process. Hmm, yes, that's possible. The comment in set_mtrr() is somewhat misleading then, though, as for the BP the writing then isn't just "okay" but necessary. Question is whether doing this so much later is actually good enough. Jan
On Thu, Mar 27, 2025 at 01:30:44PM +0100, Jan Beulich wrote: > On 27.03.2025 12:38, Roger Pau Monné wrote: > > On Thu, Mar 27, 2025 at 12:20:47PM +0100, Jan Beulich wrote: > >> Unlike stated in the offending commit's description, > >> load_system_tables() wasn't the only thing left to retain from the > >> earlier restore_rest_processor_state(). > >> > >> While there also do Misra-related tidying for the function itself: The > >> function being used from assembly only means it doesn't need to have a > >> declaration, but wants to be asmlinkage. > > > > I wonder, maybe the intention was for the MTRR restoring on the BSP to > > also be done by the mtrr_aps_sync_end() call in enter_state()? > > > > AFAICT that will set the MTRRs uniformly on all CPUs, by calling > > mtrr_set_all() just like mtrr_bp_restore(), but later in the restore > > process. > > Hmm, yes, that's possible. The comment in set_mtrr() is somewhat misleading > then, though, as for the BP the writing then isn't just "okay" but necessary. > Question is whether doing this so much later is actually good enough. Hm, no idea really. We do the device restore ahead of the MTRR restore, so I wonder whether we could have issues by using unexpected effective cache attributes for device memory accesses as a result of MTRRs not being initialized? Thanks, Roger.
On 27.03.2025 13:48, Roger Pau Monné wrote: > On Thu, Mar 27, 2025 at 01:30:44PM +0100, Jan Beulich wrote: >> On 27.03.2025 12:38, Roger Pau Monné wrote: >>> On Thu, Mar 27, 2025 at 12:20:47PM +0100, Jan Beulich wrote: >>>> Unlike stated in the offending commit's description, >>>> load_system_tables() wasn't the only thing left to retain from the >>>> earlier restore_rest_processor_state(). >>>> >>>> While there also do Misra-related tidying for the function itself: The >>>> function being used from assembly only means it doesn't need to have a >>>> declaration, but wants to be asmlinkage. >>> >>> I wonder, maybe the intention was for the MTRR restoring on the BSP to >>> also be done by the mtrr_aps_sync_end() call in enter_state()? >>> >>> AFAICT that will set the MTRRs uniformly on all CPUs, by calling >>> mtrr_set_all() just like mtrr_bp_restore(), but later in the restore >>> process. >> >> Hmm, yes, that's possible. The comment in set_mtrr() is somewhat misleading >> then, though, as for the BP the writing then isn't just "okay" but necessary. >> Question is whether doing this so much later is actually good enough. > > Hm, no idea really. We do the device restore ahead of the MTRR > restore, so I wonder whether we could have issues by using unexpected > effective cache attributes for device memory accesses as a result of > MTRRs not being initialized? That's just one of the possible problems. The father the MTRRs we run with diverged from what firmware puts in place, the bigger the possible trouble. I think the restoring better is done as being switched to here again. The absence of any discussion of MTRRs in that earlier change leaves me pretty certain that the behavioral change there wasn't intended. Andrew is usually pretty good at spelling out all intended effects. Jan
On Thu, Mar 27, 2025 at 02:28:42PM +0100, Jan Beulich wrote: > On 27.03.2025 13:48, Roger Pau Monné wrote: > > On Thu, Mar 27, 2025 at 01:30:44PM +0100, Jan Beulich wrote: > >> On 27.03.2025 12:38, Roger Pau Monné wrote: > >>> On Thu, Mar 27, 2025 at 12:20:47PM +0100, Jan Beulich wrote: > >>>> Unlike stated in the offending commit's description, > >>>> load_system_tables() wasn't the only thing left to retain from the > >>>> earlier restore_rest_processor_state(). > >>>> > >>>> While there also do Misra-related tidying for the function itself: The > >>>> function being used from assembly only means it doesn't need to have a > >>>> declaration, but wants to be asmlinkage. > >>> > >>> I wonder, maybe the intention was for the MTRR restoring on the BSP to > >>> also be done by the mtrr_aps_sync_end() call in enter_state()? > >>> > >>> AFAICT that will set the MTRRs uniformly on all CPUs, by calling > >>> mtrr_set_all() just like mtrr_bp_restore(), but later in the restore > >>> process. > >> > >> Hmm, yes, that's possible. The comment in set_mtrr() is somewhat misleading > >> then, though, as for the BP the writing then isn't just "okay" but necessary. > >> Question is whether doing this so much later is actually good enough. > > > > Hm, no idea really. We do the device restore ahead of the MTRR > > restore, so I wonder whether we could have issues by using unexpected > > effective cache attributes for device memory accesses as a result of > > MTRRs not being initialized? > > That's just one of the possible problems. The father the MTRRs we run with > diverged from what firmware puts in place, the bigger the possible trouble. > I think the restoring better is done as being switched to here again. The > absence of any discussion of MTRRs in that earlier change leaves me pretty > certain that the behavioral change there wasn't intended. Andrew is usually > pretty good at spelling out all intended effects. No objection, however for the BSP we now end up restoring the MTRRs twice, as we will also do it in mtrr_aps_sync_end(). Might be worth to mention in the commit message that the MTRR state was restored in mtrr_aps_sync_end() for the BSP also, but that it might be too late. Possibly with that somehow noted in the commit message: Acked-by: Roger Pau Monné <roger.pau@citrix.com> Thanks, Roger.
On 27.03.2025 15:10, Roger Pau Monné wrote: > On Thu, Mar 27, 2025 at 02:28:42PM +0100, Jan Beulich wrote: >> On 27.03.2025 13:48, Roger Pau Monné wrote: >>> On Thu, Mar 27, 2025 at 01:30:44PM +0100, Jan Beulich wrote: >>>> On 27.03.2025 12:38, Roger Pau Monné wrote: >>>>> On Thu, Mar 27, 2025 at 12:20:47PM +0100, Jan Beulich wrote: >>>>>> Unlike stated in the offending commit's description, >>>>>> load_system_tables() wasn't the only thing left to retain from the >>>>>> earlier restore_rest_processor_state(). >>>>>> >>>>>> While there also do Misra-related tidying for the function itself: The >>>>>> function being used from assembly only means it doesn't need to have a >>>>>> declaration, but wants to be asmlinkage. >>>>> >>>>> I wonder, maybe the intention was for the MTRR restoring on the BSP to >>>>> also be done by the mtrr_aps_sync_end() call in enter_state()? >>>>> >>>>> AFAICT that will set the MTRRs uniformly on all CPUs, by calling >>>>> mtrr_set_all() just like mtrr_bp_restore(), but later in the restore >>>>> process. >>>> >>>> Hmm, yes, that's possible. The comment in set_mtrr() is somewhat misleading >>>> then, though, as for the BP the writing then isn't just "okay" but necessary. >>>> Question is whether doing this so much later is actually good enough. >>> >>> Hm, no idea really. We do the device restore ahead of the MTRR >>> restore, so I wonder whether we could have issues by using unexpected >>> effective cache attributes for device memory accesses as a result of >>> MTRRs not being initialized? >> >> That's just one of the possible problems. The father the MTRRs we run with >> diverged from what firmware puts in place, the bigger the possible trouble. >> I think the restoring better is done as being switched to here again. The >> absence of any discussion of MTRRs in that earlier change leaves me pretty >> certain that the behavioral change there wasn't intended. Andrew is usually >> pretty good at spelling out all intended effects. > > No objection, however for the BSP we now end up restoring the MTRRs > twice, as we will also do it in mtrr_aps_sync_end(). > > Might be worth to mention in the commit message that the MTRR state > was restored in mtrr_aps_sync_end() for the BSP also, but that it > might be too late. I've added "Note that MTRR state was still reloaded via mtrr_aps_sync_end(), but that happens quite a bit later in the resume process." > Possibly with that somehow noted in the commit message: > > Acked-by: Roger Pau Monné <roger.pau@citrix.com> Thanks. Any chance of getting another one for the 3rd patch in this (split up) group? Maybe the duplicate one for the "don't hard-code" one was actually meant to go there? Jan
On 27/03/2025 2:20 pm, Jan Beulich wrote: > On 27.03.2025 15:10, Roger Pau Monné wrote: >> On Thu, Mar 27, 2025 at 02:28:42PM +0100, Jan Beulich wrote: >>> On 27.03.2025 13:48, Roger Pau Monné wrote: >>>> On Thu, Mar 27, 2025 at 01:30:44PM +0100, Jan Beulich wrote: >>>>> On 27.03.2025 12:38, Roger Pau Monné wrote: >>>>>> On Thu, Mar 27, 2025 at 12:20:47PM +0100, Jan Beulich wrote: >>>>>>> Unlike stated in the offending commit's description, >>>>>>> load_system_tables() wasn't the only thing left to retain from the >>>>>>> earlier restore_rest_processor_state(). >>>>>>> >>>>>>> While there also do Misra-related tidying for the function itself: The >>>>>>> function being used from assembly only means it doesn't need to have a >>>>>>> declaration, but wants to be asmlinkage. >>>>>> I wonder, maybe the intention was for the MTRR restoring on the BSP to >>>>>> also be done by the mtrr_aps_sync_end() call in enter_state()? >>>>>> >>>>>> AFAICT that will set the MTRRs uniformly on all CPUs, by calling >>>>>> mtrr_set_all() just like mtrr_bp_restore(), but later in the restore >>>>>> process. >>>>> Hmm, yes, that's possible. The comment in set_mtrr() is somewhat misleading >>>>> then, though, as for the BP the writing then isn't just "okay" but necessary. >>>>> Question is whether doing this so much later is actually good enough. >>>> Hm, no idea really. We do the device restore ahead of the MTRR >>>> restore, so I wonder whether we could have issues by using unexpected >>>> effective cache attributes for device memory accesses as a result of >>>> MTRRs not being initialized? >>> That's just one of the possible problems. The father the MTRRs we run with >>> diverged from what firmware puts in place, the bigger the possible trouble. >>> I think the restoring better is done as being switched to here again. The >>> absence of any discussion of MTRRs in that earlier change leaves me pretty >>> certain that the behavioral change there wasn't intended. Andrew is usually >>> pretty good at spelling out all intended effects. >> No objection, however for the BSP we now end up restoring the MTRRs >> twice, as we will also do it in mtrr_aps_sync_end(). >> >> Might be worth to mention in the commit message that the MTRR state >> was restored in mtrr_aps_sync_end() for the BSP also, but that it >> might be too late. > I've added "Note that MTRR state was still reloaded via mtrr_aps_sync_end(), > but that happens quite a bit later in the resume process." Ah yes, you got here too. Yes, I think I simply missed this part of the discussion from the commit message. The MTRR logic is a giant tangle, and lost of it (I'm pretty sure) is only relevant for early 32bit days. Also since then, I expect firmware has gotten better, considering that S3 is ubiquitous on laptops nowadays. I expect that we don't need to change MTRRs in most cases. However, if change to the MTRRs actually need to happen, then they probably want doing as part of the AP boot, rather than in a rendezvous later. That said, it would be a difference between the normal boot and S3 resume paths. ~Andrew
On 27.03.2025 16:23, Andrew Cooper wrote: > On 27/03/2025 2:20 pm, Jan Beulich wrote: >> On 27.03.2025 15:10, Roger Pau Monné wrote: >>> On Thu, Mar 27, 2025 at 02:28:42PM +0100, Jan Beulich wrote: >>>> On 27.03.2025 13:48, Roger Pau Monné wrote: >>>>> On Thu, Mar 27, 2025 at 01:30:44PM +0100, Jan Beulich wrote: >>>>>> On 27.03.2025 12:38, Roger Pau Monné wrote: >>>>>>> On Thu, Mar 27, 2025 at 12:20:47PM +0100, Jan Beulich wrote: >>>>>>>> Unlike stated in the offending commit's description, >>>>>>>> load_system_tables() wasn't the only thing left to retain from the >>>>>>>> earlier restore_rest_processor_state(). >>>>>>>> >>>>>>>> While there also do Misra-related tidying for the function itself: The >>>>>>>> function being used from assembly only means it doesn't need to have a >>>>>>>> declaration, but wants to be asmlinkage. >>>>>>> I wonder, maybe the intention was for the MTRR restoring on the BSP to >>>>>>> also be done by the mtrr_aps_sync_end() call in enter_state()? >>>>>>> >>>>>>> AFAICT that will set the MTRRs uniformly on all CPUs, by calling >>>>>>> mtrr_set_all() just like mtrr_bp_restore(), but later in the restore >>>>>>> process. >>>>>> Hmm, yes, that's possible. The comment in set_mtrr() is somewhat misleading >>>>>> then, though, as for the BP the writing then isn't just "okay" but necessary. >>>>>> Question is whether doing this so much later is actually good enough. >>>>> Hm, no idea really. We do the device restore ahead of the MTRR >>>>> restore, so I wonder whether we could have issues by using unexpected >>>>> effective cache attributes for device memory accesses as a result of >>>>> MTRRs not being initialized? >>>> That's just one of the possible problems. The father the MTRRs we run with >>>> diverged from what firmware puts in place, the bigger the possible trouble. >>>> I think the restoring better is done as being switched to here again. The >>>> absence of any discussion of MTRRs in that earlier change leaves me pretty >>>> certain that the behavioral change there wasn't intended. Andrew is usually >>>> pretty good at spelling out all intended effects. >>> No objection, however for the BSP we now end up restoring the MTRRs >>> twice, as we will also do it in mtrr_aps_sync_end(). >>> >>> Might be worth to mention in the commit message that the MTRR state >>> was restored in mtrr_aps_sync_end() for the BSP also, but that it >>> might be too late. >> I've added "Note that MTRR state was still reloaded via mtrr_aps_sync_end(), >> but that happens quite a bit later in the resume process." > > Ah yes, you got here too. > > Yes, I think I simply missed this part of the discussion from the commit > message. > > The MTRR logic is a giant tangle, and lost of it (I'm pretty sure) is > only relevant for early 32bit days. Also since then, I expect firmware > has gotten better, considering that S3 is ubiquitous on laptops nowadays. > > I expect that we don't need to change MTRRs in most cases. However, if > change to the MTRRs actually need to happen, then they probably want > doing as part of the AP boot, rather than in a rendezvous later. That > said, it would be a difference between the normal boot and S3 resume paths. So in summary - you think we don't need/want the patch here? It feels risky to me to run the BSP with not-yet-restored MTRRs for an extended period of time. Jan
--- a/xen/arch/x86/acpi/wakeup_prot.S +++ b/xen/arch/x86/acpi/wakeup_prot.S @@ -124,6 +124,8 @@ LABEL(s3_resume) mov STACK_CPUINFO_FIELD(cr4)(%rbx), %rax mov %rax, %cr4 + call mtrr_bp_restore + .Lsuspend_err: pop %r15 pop %r14 --- a/xen/arch/x86/cpu/mtrr/main.c +++ b/xen/arch/x86/cpu/mtrr/main.c @@ -610,7 +610,7 @@ void mtrr_aps_sync_end(void) hold_mtrr_updates_on_aps = 0; } -void mtrr_bp_restore(void) +void asmlinkage mtrr_bp_restore(void) { if (mtrr_if) mtrr_set_all(); --- a/xen/arch/x86/include/asm/mtrr.h +++ b/xen/arch/x86/include/asm/mtrr.h @@ -66,7 +66,6 @@ extern uint8_t pat_type_2_pte_flags(uint extern int hold_mtrr_updates_on_aps; extern void mtrr_aps_sync_begin(void); extern void mtrr_aps_sync_end(void); -extern void mtrr_bp_restore(void); extern bool mtrr_var_range_msr_set(struct domain *d, struct mtrr_state *m, uint32_t msr, uint64_t msr_content);
Unlike stated in the offending commit's description, load_system_tables() wasn't the only thing left to retain from the earlier restore_rest_processor_state(). While there also do Misra-related tidying for the function itself: The function being used from assembly only means it doesn't need to have a declaration, but wants to be asmlinkage. Fixes: 4304ff420e51 ("x86/S3: Drop {save,restore}_rest_processor_state() completely") Reported-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> --- Contextually assumed to go on top of "x86/MTRR: constrain AP sync and BSP restore". Functionally there's no dependency.