[v3,08/10] x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init

Message ID: 20220908084914.21703-9-jgross@suse.com
State New, archived
Series: x86: make pat and mtrr independent from each other

Commit Message

Jürgen Groß Sept. 8, 2022, 8:49 a.m. UTC
In order to prepare for decoupling MTRR and PAT, replace the MTRR-specific
mtrr_aps_delayed_init flag with a more generic cache_aps_delayed_init
one.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- new patch
---
 arch/x86/include/asm/cacheinfo.h |  2 ++
 arch/x86/include/asm/mtrr.h      |  2 --
 arch/x86/kernel/cpu/cacheinfo.c  |  2 ++
 arch/x86/kernel/cpu/mtrr/mtrr.c  | 17 ++++-------------
 arch/x86/kernel/smpboot.c        |  5 +++--
 5 files changed, 11 insertions(+), 17 deletions(-)

Comments

Borislav Petkov Sept. 26, 2022, 9:11 p.m. UTC | #1
On Thu, Sep 08, 2022 at 10:49:12AM +0200, Juergen Gross wrote:
> -void set_mtrr_aps_delayed_init(void)
> -{
> -	if (!cache_generic)
> -		return;
> -
> -	mtrr_aps_delayed_init = true;
> -}
> -

Except that you've removed the accessors and made that bool global.
Which is less pretty than it was before...
Jürgen Groß Sept. 27, 2022, 8:57 a.m. UTC | #2
On 26.09.22 23:11, Borislav Petkov wrote:
> On Thu, Sep 08, 2022 at 10:49:12AM +0200, Juergen Gross wrote:
>> -void set_mtrr_aps_delayed_init(void)
>> -{
>> -	if (!cache_generic)
>> -		return;
>> -
>> -	mtrr_aps_delayed_init = true;
>> -}
>> -
> 
> Except that you've removed the accessors and made that bool global.
> Which is less pretty than it was before...
> 

The accessor would now only need to set the bool, while it had at least
some logic before.

TBH I don't see the point of having an accessor which is just setting a
variable to "true". But if you like it better, I can keep it.
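
For reference, the accessor in question would boil down to this (a sketch
only; the name follows the new generic flag and is not part of the posted
patch):

void set_cache_aps_delayed_init(void)
{
	/* Nothing left to do but set the generic flag. */
	cache_aps_delayed_init = true;
}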

Another possibility would be to move the arch_thaw_secondary_cpus_begin()
and arch_thaw_secondary_cpus_end() functions to cacheinfo.c, resulting
in only a single place outside of cacheinfo.c setting the variable (in
theory the arch_thaw_secondary_cpus_*() functions could just be redefined
to the accessor and cache_aps_init(), but this would be rather hacky IMO).


Juergen
Borislav Petkov Sept. 27, 2022, 10:10 a.m. UTC | #3
On Tue, Sep 27, 2022 at 10:57:37AM +0200, Juergen Gross wrote:
> TBH I don't see the point of having an accessor which is just setting a
> variable to "true". But if you like it better, I can keep it.

Accessors are always better, no matter how silly. :)

But, in trying to grok your next patch - you really should split those
more complex ones because they're a pain to review - I'm starting to
wonder whether we could even remove mtrr_aps_delayed_init and make the
delayed init the default.

Because, AFAICT, set_mtrr_aps_delayed_init() is called by default
by native_smp_prepare_cpus(). Which is called by hyperv and
arch/x86/xen/smp_hvm.c.

The only one that's not calling it is arch/x86/xen/smp_pv.c but that
thing doesn't support MTRRs in the first place, right?

Which means, it doesn't need delayed MTRR init anyway.

Which would then mean that this would simplify this ugly logic even more.

Or am I missing an angle?

It is possible in this nuts code.

Thx.
Jürgen Groß Sept. 27, 2022, 10:14 a.m. UTC | #4
On 27.09.22 12:10, Borislav Petkov wrote:
> On Tue, Sep 27, 2022 at 10:57:37AM +0200, Juergen Gross wrote:
>> TBH I don't see the point of having an accessor which is just setting a
>> variable to "true". But if you like it better, I can keep it.
> 
> Accessors are always better, no matter how silly. :)

Okay, then I'll keep it.

> But, in trying to grok your next patch - you really should split those
> more complex ones because they're a pain to review - I'm starting to
> wonder whether we could even remove mtrr_aps_delayed_init and make the
> delayed init the default.
> 
> Because, AFAICT, set_mtrr_aps_delayed_init() is called by default
> by native_smp_prepare_cpus(). Which is called by hyperv and
> arch/x86/xen/smp_hvm.c.
> 
> The only one that's not calling it is arch/x86/xen/smp_pv.c but that
> thing doesn't support MTRRs in the first place, right?

Correct.

> Which means, it doesn't need delayed MTRR init anyway.
> 
> Which would then mean that this would simplify this ugly logic even more.
> 
> Or am I missing an angle?

Yes: cpu hotplug.


Juergen
Borislav Petkov Sept. 27, 2022, 11:19 a.m. UTC | #5
On Tue, Sep 27, 2022 at 12:14:42PM +0200, Juergen Gross wrote:
> Yes: cpu hotplug.

You might need to elaborate here.

Because I see mtrr_ap_init() on the AP hotplug path:

native_cpu_up->
do_boot_cpu->
start_secondary->
smp_callin->
smp_store_cpu_info->
identify_secondary_cpu->
mtrr_ap_init

Which then means that we could check in mtrr_ap_init() if we're on the
hotplug path and still get rid of that stupid bool...

Close?
Jürgen Groß Sept. 27, 2022, 11:25 a.m. UTC | #6
On 27.09.22 13:19, Borislav Petkov wrote:
> On Tue, Sep 27, 2022 at 12:14:42PM +0200, Juergen Gross wrote:
>> Yes: cpu hotplug.
> 
> You might need to elaborate here.
> 
> Because I see mtrr_ap_init() on the AP hotplug path:
> 
> native_cpu_up->
> do_boot_cpu->
> start_secondary->
> smp_callin->
> smp_store_cpu_info->
> identify_secondary_cpu->
> mtrr_ap_init
> 
> Which then means that we could check in mtrr_ap_init() if we're on the
> hotplug path and still get rid of that stupid bool...
> 
> Close?
> 

You mean by replacing it with "(system_state != SYSTEM_RUNNING)" ?


Juergen
Borislav Petkov Sept. 27, 2022, 12:13 p.m. UTC | #7
On Tue, Sep 27, 2022 at 01:25:47PM +0200, Juergen Gross wrote:
> You mean by replacing it with "(system_state != SYSTEM_RUNNING)" ?

Right, or maybe even something more elegant. I've been meaning to ask
tglx about it as I needed it for the microcode loader too.
Jürgen Groß Sept. 27, 2022, 12:21 p.m. UTC | #8
On 27.09.22 14:13, Borislav Petkov wrote:
> On Tue, Sep 27, 2022 at 01:25:47PM +0200, Juergen Gross wrote:
>> You mean by replacing it with "(system_state != SYSTEM_RUNNING)" ?
> 
> Right, or maybe even something more elegant. I've been meaning to ask
> tglx about it as I needed it for the microcode loader too.

So replacing the bool with "(system_state != SYSTEM_RUNNING)" is fine
with you right now? We can later switch that to the "more elegant"
solution when it shows up.
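
A minimal sketch of that replacement (an illustration, not the posted
patch; as worked out further down the thread, the catch is resume, where
system_state is already SYSTEM_RUNNING by the time the APs are thawed):

void mtrr_ap_init(void)
{
	if (!mtrr_enabled() || !use_intel())
		return;

	/* Still booting: mtrr_aps_init() programs the MTRRs later. */
	if (system_state != SYSTEM_RUNNING)
		return;

	set_mtrr_from_inactive_cpu(~0U, 0, 0, 0);
}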


Juergen
Borislav Petkov Sept. 27, 2022, 11:16 p.m. UTC | #9
On Tue, Sep 27, 2022 at 02:21:17PM +0200, Juergen Gross wrote:
> So replacing the bool with "(system_state != SYSTEM_RUNNING)" is fine
> with you right now? We can later switch that to the "more elegant"
> solution when it shows up.

Ok, I think I have something. And it was staring me straight in the
face but I didn't see it: the MTRR code needs a hotplug notifier. In
that notifier it can do the immediate, i.e., non-delayed init while the
delayed init becomes the default, see below.

And ignore the pr_info debugging gunk pls.

mtrr_ap_init() becomes the notifier callback. It doesn't need to be
called in identify_secondary_cpu() anymore as in the init case that
function doesn't do anything - delayed=true - and in the hotplug case
the notifier runs.

mtrr_aps_init() - "aps" in plural - does the delayed init after all CPUs
have been brought online after the box has booted. That might need some
renaming.

And yes, there's a lot more to cleanup after this. This code has grown
wart after wart over the years...

Fun.

---
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 76d726074c16..1a3dad244bba 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -42,8 +42,6 @@ extern int mtrr_add_page(unsigned long base, unsigned long size,
 extern int mtrr_del(int reg, unsigned long base, unsigned long size);
 extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
 extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
-extern void mtrr_ap_init(void);
-extern void set_mtrr_aps_delayed_init(void);
 extern void mtrr_aps_init(void);
 extern void mtrr_bp_restore(void);
 extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
@@ -83,8 +81,6 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
 static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
 {
 }
-#define mtrr_ap_init() do {} while (0)
-#define set_mtrr_aps_delayed_init() do {} while (0)
 #define mtrr_aps_init() do {} while (0)
 #define mtrr_bp_restore() do {} while (0)
 #  endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..deef1b5b27cc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1948,7 +1948,6 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
 #ifdef CONFIG_X86_32
 	enable_sep_cpu();
 #endif
-	mtrr_ap_init();
 	validate_apic_and_package_id(c);
 	x86_spec_ctrl_setup_ap();
 	update_srbds_msr();
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 2746cac9d8a9..abbf7cb8a430 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -69,7 +69,6 @@ unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
 
 u64 size_or_mask, size_and_mask;
-static bool mtrr_aps_delayed_init;
 
 static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
 
@@ -176,7 +175,7 @@ static int mtrr_rendezvous_handler(void *info)
 	if (data->smp_reg != ~0U) {
 		mtrr_if->set(data->smp_reg, data->smp_base,
 			     data->smp_size, data->smp_type);
-	} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
+	} else if (!cpu_online(smp_processor_id())) {
 		mtrr_if->set_all();
 	}
 	return 0;
@@ -784,13 +783,16 @@ void __init mtrr_bp_init(void)
 	}
 }
 
-void mtrr_ap_init(void)
+static int mtrr_ap_init(unsigned int cpu)
 {
+	pr_info("%s: single AP entry, use_intel: %d, mtrr_enabled: %d, mtrr_aps_delayed_init\n",
+		__func__, use_intel(), mtrr_enabled());
+
 	if (!mtrr_enabled())
-		return;
+		return 1;
 
-	if (!use_intel() || mtrr_aps_delayed_init)
-		return;
+	if (!use_intel())
+		return 1;
 
 	/*
 	 * Ideally we should hold mtrr_mutex here to avoid mtrr entries
@@ -806,6 +808,8 @@ void mtrr_ap_init(void)
 	 *      lock to prevent mtrr entry changes
 	 */
 	set_mtrr_from_inactive_cpu(~0U, 0, 0, 0);
+
+	return 0;
 }
 
 /**
@@ -820,37 +824,24 @@ void mtrr_save_state(void)
 		return;
 
 	first_cpu = cpumask_first(cpu_online_mask);
-	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
-}
 
-void set_mtrr_aps_delayed_init(void)
-{
-	if (!mtrr_enabled())
-		return;
-	if (!use_intel())
-		return;
+	pr_info("%s: first_cpu: %d\n", __func__, first_cpu);
 
-	mtrr_aps_delayed_init = true;
+	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
 }
 
 /*
- * Delayed MTRR initialization for all AP's
+ * Delayed MTRR initialization for all APs
  */
 void mtrr_aps_init(void)
 {
-	if (!use_intel() || !mtrr_enabled())
-		return;
+	pr_info("%s: entry, use_intel: %d, mtrr_enabled: %d, mtrr_aps_delayed_init\n",
+		__func__, use_intel(), mtrr_enabled());
 
-	/*
-	 * Check if someone has requested the delay of AP MTRR initialization,
-	 * by doing set_mtrr_aps_delayed_init(), prior to this point. If not,
-	 * then we are done.
-	 */
-	if (!mtrr_aps_delayed_init)
+	if (!use_intel() || !mtrr_enabled())
 		return;
 
 	set_mtrr(~0U, 0, 0, 0);
-	mtrr_aps_delayed_init = false;
 }
 
 void mtrr_bp_restore(void)
@@ -869,6 +860,10 @@ static int __init mtrr_init_finialize(void)
 	if (use_intel()) {
 		if (!changed_by_mtrr_cleanup)
 			mtrr_state_warn();
+
+		cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "x86/mtrr:online",
+				  mtrr_ap_init, NULL);
+
 		return 0;
 	}
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f24227bc3220..171acef35201 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1428,7 +1428,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 
 	uv_system_init();
 
-	set_mtrr_aps_delayed_init();
+	pr_info("%s: set_mtrr_aps_delayed_init\n", __func__);
 
 	smp_quirk_init_udelay();
 
@@ -1439,7 +1439,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 
 void arch_thaw_secondary_cpus_begin(void)
 {
-	set_mtrr_aps_delayed_init();
+	pr_info("%s: set_mtrr_aps_delayed_init\n", __func__);
 }
 
 void arch_thaw_secondary_cpus_end(void)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index bbad5e375d3b..fc14601b908c 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -591,6 +591,8 @@ static int bringup_cpu(unsigned int cpu)
 	struct task_struct *idle = idle_thread_get(cpu);
 	int ret;
 
+	pr_info("%s: CPU%d\n", __func__, cpu);
+
 	/*
 	 * Reset stale stack state from the last time this CPU was online.
 	 */
Jürgen Groß Sept. 28, 2022, 5:30 a.m. UTC | #10
On 28.09.22 01:16, Borislav Petkov wrote:
> On Tue, Sep 27, 2022 at 02:21:17PM +0200, Juergen Gross wrote:
>> So replacing the bool with "(system_state != SYSTEM_RUNNING)" is fine
>> with you right now? We can later switch that to the "more elegant"
>> solution when it shows up.
> 
> Ok, I think I have something. And it was staring me straight in the
> face but I didn't see it: the MTRR code needs a hotplug notifier. In
> that notifier it can do the immediate, i.e., non-delayed init while the
> delayed init becomes the default, see below.
> 
> And ignore the pr_info debugging gunk pls.
> 
> mtrr_ap_init() becomes the notifier callback. It doesn't need to be
> called in identify_secondary_cpu() anymore as in the init case that
> function doesn't do anything - delayed=true - and in the hotplug case
> the notifier runs.

Are you sure the hotplug notifier doesn't get called in the boot and in the
resume cases? I don't see how those calls would either be skipped or end up
doing nothing.


Juergen
Jürgen Groß Sept. 28, 2022, 6:16 a.m. UTC | #11
On 28.09.22 07:30, Juergen Gross wrote:
> On 28.09.22 01:16, Borislav Petkov wrote:
>> On Tue, Sep 27, 2022 at 02:21:17PM +0200, Juergen Gross wrote:
>>> So replacing the bool with "(system_state != SYSTEM_RUNNING)" is fine
>>> with you right now? We can later switch that to the "more elegant"
>>> solution when it shows up.
>>
>> Ok, I think I have something. And it was staring me straight in the
>> face but I didn't see it: the MTRR code needs a hotplug notifier. In
>> that notifier it can do the immediate, i.e., non-delayed init while the
>> delayed init becomes the default, see below.
>>
>> And ignore the pr_info debugging gunk pls.
>>
>> mtrr_ap_init() becomes the notifier callback. It doesn't need to be
>> called in identify_secondary_cpu() anymore as in the init case that
>> function doesn't do anything - delayed=true - and in the hotplug case
>> the notifier runs.
> 
> Are you sure the hotplug notifier doesn't get called in the boot and in the
> resume cases? I don't see how those calls would either be skipped or end up
> doing nothing.

In case my suspicion is correct: this can still be solved by adding the
hotplug notifier only in mtrr_aps_init(), and removing it again in
arch_thaw_secondary_cpus_begin().


Juergen
Borislav Petkov Sept. 28, 2022, 10:48 a.m. UTC | #12
On Wed, Sep 28, 2022 at 08:16:53AM +0200, Juergen Gross wrote:
> > Are you sure the hotplug notifier doesn't get called in the boot and in the

It doesn't because it gets registered after smp_init()...

> > resume cases?

... but it gets called during resume because by that time the notifier
has been registered already. See my notes at the end of this mail on
what the code does currently.

> In case my suspicion is correct: this can still be solved by adding the
> hotplug notifier only in mtrr_aps_init(), and removing it again in
> arch_thaw_secondary_cpus_begin().

Pretty much. Yeah, we still need a bool. ;-(

But that bool has a much smaller scope and it is perfectly clear what it
does. And I've added a comment. Could've used comments for the delayed
init thing.

Anyway, it gets set in a thaw callback (I mean, might as well, since
we call into the MTRR code anyway). I probably can make this even
cleaner and not do any bool if I could query in the notifier whether I'm
resuming...

Thx.

---
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 76d726074c16..86b8009d2429 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -42,9 +42,8 @@ extern int mtrr_add_page(unsigned long base, unsigned long size,
 extern int mtrr_del(int reg, unsigned long base, unsigned long size);
 extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
 extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
-extern void mtrr_ap_init(void);
-extern void set_mtrr_aps_delayed_init(void);
 extern void mtrr_aps_init(void);
+extern void mtrr_aps_thaw(void);
 extern void mtrr_bp_restore(void);
 extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
 extern int amd_special_default_mtrr(void);
@@ -83,9 +82,8 @@ static inline int mtrr_trim_uncached_memory(unsigned long end_pfn)
 static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
 {
 }
-#define mtrr_ap_init() do {} while (0)
-#define set_mtrr_aps_delayed_init() do {} while (0)
 #define mtrr_aps_init() do {} while (0)
+#define mtrr_aps_thaw() do {} while (0)
 #define mtrr_bp_restore() do {} while (0)
 #  endif
 
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 3e508f239098..deef1b5b27cc 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1948,7 +1948,6 @@ void identify_secondary_cpu(struct cpuinfo_x86 *c)
 #ifdef CONFIG_X86_32
 	enable_sep_cpu();
 #endif
-	mtrr_ap_init();
 	validate_apic_and_package_id(c);
 	x86_spec_ctrl_setup_ap();
 	update_srbds_msr();
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 2746cac9d8a9..c4089fd2b477 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -69,7 +69,7 @@ unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
 
 u64 size_or_mask, size_and_mask;
-static bool mtrr_aps_delayed_init;
+static bool ap_notifier_disabled;
 
 static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
 
@@ -176,7 +176,7 @@ static int mtrr_rendezvous_handler(void *info)
 	if (data->smp_reg != ~0U) {
 		mtrr_if->set(data->smp_reg, data->smp_base,
 			     data->smp_size, data->smp_type);
-	} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
+	} else if (!cpu_online(smp_processor_id())) {
 		mtrr_if->set_all();
 	}
 	return 0;
@@ -784,13 +784,16 @@ void __init mtrr_bp_init(void)
 	}
 }
 
-void mtrr_ap_init(void)
+static int mtrr_ap_init(unsigned int cpu)
 {
 	if (!mtrr_enabled())
-		return;
+		return 1;
 
-	if (!use_intel() || mtrr_aps_delayed_init)
-		return;
+	if (!use_intel())
+		return 1;
+
+	if (ap_notifier_disabled)
+		return 0;
 
 	/*
 	 * Ideally we should hold mtrr_mutex here to avoid mtrr entries
@@ -806,6 +809,8 @@ void mtrr_ap_init(void)
 	 *      lock to prevent mtrr entry changes
 	 */
 	set_mtrr_from_inactive_cpu(~0U, 0, 0, 0);
+
+	return 0;
 }
 
 /**
@@ -823,34 +828,26 @@ void mtrr_save_state(void)
 	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
 }
 
-void set_mtrr_aps_delayed_init(void)
-{
-	if (!mtrr_enabled())
-		return;
-	if (!use_intel())
-		return;
-
-	mtrr_aps_delayed_init = true;
-}
-
 /*
- * Delayed MTRR initialization for all AP's
+ * Delayed MTRR initialization for all APs
  */
 void mtrr_aps_init(void)
 {
 	if (!use_intel() || !mtrr_enabled())
 		return;
 
-	/*
-	 * Check if someone has requested the delay of AP MTRR initialization,
-	 * by doing set_mtrr_aps_delayed_init(), prior to this point. If not,
-	 * then we are done.
-	 */
-	if (!mtrr_aps_delayed_init)
-		return;
-
 	set_mtrr(~0U, 0, 0, 0);
-	mtrr_aps_delayed_init = false;
+	ap_notifier_disabled = false;
+}
+
+/*
+ * Disable the AP notifier temporarily during resume. It is supposed to be active only
+ * during CPU hotplug as during resume mtrr_aps_init() takes care of the MTRR
+ * programming on all CPUs.
+ */
+void mtrr_aps_thaw(void)
+{
+	ap_notifier_disabled = true;
 }
 
 void mtrr_bp_restore(void)
@@ -869,6 +866,10 @@ static int __init mtrr_init_finialize(void)
 	if (use_intel()) {
 		if (!changed_by_mtrr_cleanup)
 			mtrr_state_warn();
+
+		cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "x86/mtrr:online",
+				  mtrr_ap_init, NULL);
+
 		return 0;
 	}
 
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f24227bc3220..b90780dab88a 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1428,8 +1428,6 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 
 	uv_system_init();
 
-	set_mtrr_aps_delayed_init();
-
 	smp_quirk_init_udelay();
 
 	speculative_store_bypass_ht_init();
@@ -1439,7 +1437,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 
 void arch_thaw_secondary_cpus_begin(void)
 {
-	set_mtrr_aps_delayed_init();
+	mtrr_aps_thaw();
 }
 
 void arch_thaw_secondary_cpus_end(void)


Notes:
------

Boot sequence:

BSP:

[    0.272801] smpboot: native_smp_prepare_cpus: set_mtrr_aps_delayed_init

APs:

[    0.287190] mtrr_save_state: first_cpu: 0
[    0.287724] x86: Booting SMP configuration:
[    0.290292] .... node  #0, CPUs:        #1
[    0.061135] mtrr_ap_init: single AP entry, use_intel: 1, mtrr_enabled: 1, mtrr_aps_delayed_init: 1

 -> set_mtrr_from_inactive_cpu() gets skipped.

After all APs booted:

[    1.544506] mtrr_aps_init: entry, use_intel: 1, mtrr_enabled: 1, mtrr_aps_delayed_init: 1

 -> set_mtrr()

hotplug:

[  206.112651] smpboot: CPU 11 is now offline
[  208.286030] bringup_cpu: CPU11
[  208.286611] mtrr_save_state: first_cpu: 0
[  208.287416] smpboot: Booting Node 0 Processor 11 APIC 0xb
[  206.116567] mtrr_ap_init: single AP entry, use_intel: 1, mtrr_enabled: 1, mtrr_aps_delayed_init: 0

 -> set_mtrr_from_inactive_cpu()

suspend/resume:

BSP:

[  270.586643] smpboot: arch_thaw_secondary_cpus_begin: set_mtrr_aps_delayed_init

APs:

[  270.587947] bringup_cpu: CPU1
[  270.588470] mtrr_save_state: first_cpu: 0
[  270.589207] x86: Booting SMP configuration:
[  270.597971] smpboot: Booting Node 0 Processor 1 APIC 0x1
[  270.530418] mtrr_ap_init: single AP entry, use_intel: 1, mtrr_enabled: 1, mtrr_aps_delayed_init: 1

After all APs booted:

[  270.694168] mtrr_aps_init: entry, use_intel: 1, mtrr_enabled: 1, mtrr_aps_delayed_init: 1
[  270.696923] ACPI: PM: Waking up from system sleep state S3
Jürgen Groß Sept. 28, 2022, 11:14 a.m. UTC | #13
On 28.09.22 12:48, Borislav Petkov wrote:
> On Wed, Sep 28, 2022 at 08:16:53AM +0200, Juergen Gross wrote:
> >>> Are you sure the hotplug notifier doesn't get called in the boot and in the
> 
> It doesn't because it gets registered after smp_init()...
> 
>>> resume cases?
> 
> ... but it gets called during resume because by that time the notifier
> has been registered already. See my notes at the end of this mail of
> what the code does currently.
> 
>> In case my suspicion is correct: this can still be solved by adding the
>> hotplug notifier only in mtrr_aps_init(), and removing it again in
>> arch_thaw_secondary_cpus_begin().
> 
> Pretty much. Yeah, we still need a bool. ;-(

No, we don't.

Using basically your patch, but with

+	mtrr_online = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+						"x86/mtrr:online",
+						mtrr_ap_init, NULL);

moved to the end of mtrr_aps_init(), and:

+void mtrr_aps_thaw(void)
+{
+	cpuhp_remove_state_nocalls(mtrr_online);
+}


Juergen
Borislav Petkov Sept. 28, 2022, 11:22 a.m. UTC | #14
On Wed, Sep 28, 2022 at 01:14:11PM +0200, Juergen Gross wrote:
> No, we don't.
> 
> Using basically your patch, but with
> 
> +	mtrr_online = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
> +						"x86/mtrr:online",
> +						mtrr_ap_init, NULL);
> 
> moved to the end of mtrr_aps_init(), and:
> 
> +void mtrr_aps_thaw(void)
> +{
> +	cpuhp_remove_state_nocalls(mtrr_online);
> +}

Yes, so you said. I'm not sure I like toggling the notifier
registration on and off like that.

Optimally, I'd like to be able to query the suspend code whether it is
in the process of resuming.

This here:


static int resume_target_kernel(bool platform_mode)
{

...

 Enable_irqs:
        system_state = SYSTEM_RUNNING;
        local_irq_enable();
 
 Enable_cpus:
        pm_sleep_enable_secondary_cpus();


but being able to do:

        pm_sleep_enable_secondary_cpus();
	system_state = SYSTEM_RUNNING | SYSTEM_RUNNING_APS_UP;

which can't work, obviously. But something like that.
Jürgen Groß Sept. 28, 2022, 1:43 p.m. UTC | #15
On 28.09.22 13:22, Borislav Petkov wrote:
> On Wed, Sep 28, 2022 at 01:14:11PM +0200, Juergen Gross wrote:
>> No, we don't.
>>
>> Using basically your patch, but with
>>
>> +	mtrr_online = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
>> +						"x86/mtrr:online",
>> +						mtrr_ap_init, NULL);
>>
>> moved to the end of mtrr_aps_init(), and:
>>
>> +void mtrr_aps_thaw(void)
>> +{
>> +	cpuhp_remove_state_nocalls(mtrr_online);
>> +}
> 
> Yes, so you said. I'm not sure I like this toggling of notifier
> registration like that.

Yeah, especially with having to remember the slot.

Would you feel better with adding a new enum member CPUHP_AP_CACHECTRL_ONLINE?

This would avoid a possible source of failure during resume in case no slot
for CPUHP_AP_ONLINE_DYN is found (quite improbable, but in theory possible).
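
Illustratively (a sketch; CPUHP_AP_CACHECTRL_ONLINE and the helper are
hypothetical at this point):

/* A fixed slot cannot fail for lack of a free dynamic slot: */
static int __init cache_ap_register(void)
{
	return cpuhp_setup_state_nocalls(CPUHP_AP_CACHECTRL_ONLINE,
					 "x86/mtrr:online",
					 mtrr_ap_init, NULL);
}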

> Optimally, I'd like to be able to query the suspend code whether it is
> in the process of resuming.
> 
> This here:
> 
> 
> static int resume_target_kernel(bool platform_mode)
> {
> 
> ...
> 
>   Enable_irqs:
>          system_state = SYSTEM_RUNNING;
>          local_irq_enable();
>   
>   Enable_cpus:
>          pm_sleep_enable_secondary_cpus();
> 
> 
> but being able to do:
> 
>          pm_sleep_enable_secondary_cpus();
> 	system_state = SYSTEM_RUNNING | SYSTEM_RUNNING_APS_UP;
> 
> which can't work, obviously. But something like that.
> 

You wouldn't want to do that there, as there are multiple places where
pm_sleep_enable_secondary_cpus() is being called. Additionally not all
cases are coming in via pm_sleep_enable_secondary_cpus(), as there is
e.g. a call of suspend_enable_secondary_cpus() from kernel_kexec(),
which wants to have the same handling.

arch_thaw_secondary_cpus_begin() and arch_thaw_secondary_cpus_end() are
the functions to mark start and end of the special region where the
delayed MTRR setup should happen.


Juergen
Borislav Petkov Sept. 28, 2022, 4:12 p.m. UTC | #16
On Wed, Sep 28, 2022 at 03:43:56PM +0200, Juergen Gross wrote:
> Would you feel better with adding a new enum member CPUHP_AP_CACHECTRL_ONLINE?
> 
> This would avoid a possible source of failure during resume in case no slot
> for CPUHP_AP_ONLINE_DYN is found (quite improbable, but in theory possible).

Let's keep that in the bag for the time when we get to cross that bridge.

> You wouldn't want to do that there, as there are multiple places where
> pm_sleep_enable_secondary_cpus() is being called.

We want all of them, I'd say. They're all some sort of suspend AFAICT.
But yes, if we get to do it, that would need a proper audit.

> Additionally not all cases are coming in via
> pm_sleep_enable_secondary_cpus(), as there is e.g. a call of
> suspend_enable_secondary_cpus() from kernel_kexec(), which wants to
> have the same handling.

Which means, more hairy.

> arch_thaw_secondary_cpus_begin() and arch_thaw_secondary_cpus_end() are
> the functions to mark start and end of the special region where the
> delayed MTRR setup should happen.

Yap, it seems like the best solution at the moment. Want me to do a
proper patch and test it on real hw?

:-)

Thx.
Jürgen Groß Sept. 28, 2022, 4:32 p.m. UTC | #17
On 28.09.22 18:12, Borislav Petkov wrote:
> On Wed, Sep 28, 2022 at 03:43:56PM +0200, Juergen Gross wrote:
>> Would you feel better with adding a new enum member CPUHP_AP_CACHECTRL_ONLINE?
>>
>> This would avoid a possible source of failure during resume in case no slot
>> for CPUHP_AP_ONLINE_DYN is found (quite improbable, but in theory possible).
> 
> Let's keep that in the bag for the time when we get to cross that bridge.
> 
>> You wouldn't want to do that there, as there are multiple places where
>> pm_sleep_enable_secondary_cpus() is being called.
> 
> We want all of them, I'd say. They're all some sort of suspend AFAICT.
> But yes, if we get to do it, that would need a proper audit.
> 
>> Additionally not all cases are coming in via
>> pm_sleep_enable_secondary_cpus(), as there is e.g. a call of
>> suspend_enable_secondary_cpus() from kernel_kexec(), which wants to
>> have the same handling.
> 
> Which means, more hairy.
> 
>> arch_thaw_secondary_cpus_begin() and arch_thaw_secondary_cpus_end() are
>> the functions to mark start and end of the special region where the
>> delayed MTRR setup should happen.
> 
> Yap, it seems like the best solution at the moment. Want me to do a
> proper patch and test it on real hw?

I can do that.


Juergen
Borislav Petkov Sept. 28, 2022, 4:39 p.m. UTC | #18
On Wed, Sep 28, 2022 at 06:32:20PM +0200, Juergen Gross wrote:
> I can do that.

Thx!
Jürgen Groß Sept. 29, 2022, 8:26 a.m. UTC | #19
On 28.09.22 18:32, Juergen Gross wrote:
> On 28.09.22 18:12, Borislav Petkov wrote:
>> On Wed, Sep 28, 2022 at 03:43:56PM +0200, Juergen Gross wrote:
>>> Would you feel better with adding a new enum member CPUHP_AP_CACHECTRL_ONLINE?
>>>
>>> This would avoid a possible source of failure during resume in case no slot
>>> for CPUHP_AP_ONLINE_DYN is found (quite improbable, but in theory possible).
>>
>> Let's keep that in the bag for the time when we get to cross that bridge.
>>
>>> You wouldn't want to do that there, as there are multiple places where
>>> pm_sleep_enable_secondary_cpus() is being called.
>>
>> We want all of them, I'd say. They're all some sort of suspend AFAICT.
>> But yes, if we get to do it, that would need a proper audit.
>>
>>> Additionally not all cases are coming in via
>>> pm_sleep_enable_secondary_cpus(), as there is e.g. a call of
>>> suspend_enable_secondary_cpus() from kernel_kexec(), which wants to
>>> have the same handling.
>>
>> Which means, more hairy.
>>
>>> arch_thaw_secondary_cpus_begin() and arch_thaw_secondary_cpus_end() are
>>> the functions to mark start and end of the special region where the
>>> delayed MTRR setup should happen.
>>
>> Yap, it seems like the best solution at the moment. Want me to do a
>> proper patch and test it on real hw?
> 
> I can do that.

Okay, let's define what is meant by "that" just to be on the same page.

The idea to use a hotplug callback seems to be rather risky IMHO. At least
CPUHP_AP_ONLINE_DYN seems to be way too late, as there are several device
drivers hooking in with the same or lower priority already. And device
drivers might rely on PAT settings in PTEs or MTRRs being set up correctly.

Another problematic case is CPUHP_AP_MICROCODE_LOADER, which explicitly
does cache writeback and invalidation; that seems risky without a sane
PAT/MTRR state on the processor. It should be noted that the microcode
loader is registered via late_initcall(), so boot itself isn't affected
by the delayed MTRR/PAT init.

So the only safe way to use a hotplug callback would be to have a rather
early, pre-registered slot in enum cpuhp_state.

Regarding resume and kexec, I'm no longer sure doing the delayed MTRR/PAT
init is such a great idea. It might save some milliseconds, but the risks
mentioned above with e.g. microcode loading presumably apply there as well.

So right now I'm inclined to err on the safe side by not adding any
cpu hotplug hook, but to use just the same "delayed AP init" flag as today,
just renaming it. This would leave the delayed MTRR/PAT init in place for
resume and kexec cases, but deferring the MTRR/PAT cleanup due to this
potential issue seems not appropriate, as the cleanup isn't changing the
behavior here.

We should, however, have a discussion in parallel or later about whether the
whole thaw_secondary_cpus() handling is really okay or whether it should
be changed in some way.


Juergen
Borislav Petkov Sept. 30, 2022, 11:55 a.m. UTC | #20
On Thu, Sep 29, 2022 at 10:26:59AM +0200, Juergen Gross wrote:
> So right now I'm inclined to err on the safe side by not adding any
> cpu hotplug hook, but to use just the same "delayed AP init" flag as today,
> just renaming it. This would leave the delayed MTRR/PAT init in place for
> resume and kexec cases, but deferring the MTRR/PAT cleanup due to this
> potential issue seems not appropriate, as the cleanup isn't changing the
> behavior here.

Ok, what's wrong with adding a special hotplug level just for that thing
and running it very early? Practically pretty much where it was in time,
in identify_secondary_cpu()?

Having a special one is warranted, as you explain, I'd say.

Thx.
Jürgen Groß Sept. 30, 2022, 1:11 p.m. UTC | #21
On 30.09.22 13:55, Borislav Petkov wrote:
> On Thu, Sep 29, 2022 at 10:26:59AM +0200, Juergen Gross wrote:
>> So right now I'm inclined to err on the safe side by not adding any
>> cpu hotplug hook, but to use just the same "delayed AP init" flag as today,
>> just renaming it. This would leave the delayed MTRR/PAT init in place for
>> resume and kexec cases, but deferring the MTRR/PAT cleanup due to this
>> potential issue seems not appropriate, as the cleanup isn't changing the
>> behavior here.
> 
> Ok, what's wrong with adding a special hotplug level just for that thing
> and running it very early? Practically pretty much where it was in time,
> in identify_secondary_cpu()?

Yes, this can be done. It would practically have to be the first one just
after CPUHP_BRINGUP_CPU.
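
Illustratively, in include/linux/cpuhotplug.h (a sketch; the new member
name is hypothetical):

enum cpuhp_state {
	/* ... */
	CPUHP_BRINGUP_CPU,
	CPUHP_AP_CACHECTRL_ONLINE,	/* new: early MTRR/PAT init */
	/* ... all other AP states ... */
};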

The question is whether we really want to call the MTRR/PAT initialization
on hotplugged cpus only after enabling interrupts. Note that the callbacks
are activated only at the end of start_secondary(), while today MTRR/PAT
initialization is called some time earlier by:

   start_secondary()
     smp_callin()
       smp_store_cpu_info()
         identify_secondary_cpu()
           mtrr_ap_init()

I don't think this is a real problem, but I wanted to mention it.

The next question would be why MTRR/PAT init should be special (meaning:
why aren't all the other functions called that early realized via
callbacks, too)? Is it just because of the special handling during boot/resume?

It might be worth a discussion whether there shouldn't be a special group
of callbacks activated BEFORE interrupts are being enabled.

> Having a special one is warranted, as you explain, I'd say.

Thanks. I'll write a patch for that.


Juergen
Borislav Petkov Sept. 30, 2022, 1:25 p.m. UTC | #22
On Fri, Sep 30, 2022 at 03:11:07PM +0200, Juergen Gross wrote:
> Yes, this can be done. It would practically have to be the first one just
> after CPUHP_BRINGUP_CPU.

Right.

> The question is whether we really want to call the MTRR/PAT initialization
> on hotplugged cpus only after enabling interrupts. Note that the callbacks
> are activated only at the end of start_secondary(), while today MTRR/PAT
> initialization is called some time earlier by:
> 
>   start_secondary()
>     smp_callin()
>       smp_store_cpu_info()
>         identify_secondary_cpu()
>           mtrr_ap_init()
> 
> I don't think this is a real problem, but I wanted to mention it.

Yep, I saw that too but I don't think there will be a problem either.
I mean, it should be early enough as you point out not to need proper
MTRR/PAT settings yet.

But we'll make sure we test this real good too.

> The next question would be why MTRR/PAT init should be special
> (meaning: why aren't all the other functions called that early
> realized via callbacks, too)?

Well, our init code is crazy. Frankly, I don't see why more of the
"init stuff on the freshly hotplugged CPU" work isn't done there...

> Is it just because of the special handling during boot/resume?

... unless this is the case, ofc. Right.

> It might be worth a discussion whether there shouldn't be a special group
> of callbacks activated BEFORE interrupts are being enabled.

That's a good question. /me writes it down to ask tglx when he gets back.

I mean, that early I don't think it matters whether IRQs are enabled
or not. But this'll need to be audited on a case by case basis. As I
said, our boot code is nuts with stuff bolted on everywhere for whatever
reasons.

> Thanks. I'll write a patch for that.

Thanks too.

Patch

diff --git a/arch/x86/include/asm/cacheinfo.h b/arch/x86/include/asm/cacheinfo.h
index 563d9cb5fcf5..e80ed3c523c8 100644
--- a/arch/x86/include/asm/cacheinfo.h
+++ b/arch/x86/include/asm/cacheinfo.h
@@ -7,6 +7,8 @@  extern unsigned int cache_generic;
 #define CACHE_GENERIC_MTRR 0x01
 #define CACHE_GENERIC_PAT  0x02
 
+extern bool cache_aps_delayed_init;
+
 void cacheinfo_amd_init_llc_id(struct cpuinfo_x86 *c, int cpu);
 void cacheinfo_hygon_init_llc_id(struct cpuinfo_x86 *c, int cpu);
 
diff --git a/arch/x86/include/asm/mtrr.h b/arch/x86/include/asm/mtrr.h
index 986249a2b9b6..5d31219c8529 100644
--- a/arch/x86/include/asm/mtrr.h
+++ b/arch/x86/include/asm/mtrr.h
@@ -43,7 +43,6 @@  extern int mtrr_del(int reg, unsigned long base, unsigned long size);
 extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
 extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
 extern void mtrr_ap_init(void);
-extern void set_mtrr_aps_delayed_init(void);
 extern void mtrr_aps_init(void);
 extern void mtrr_bp_restore(void);
 extern int mtrr_trim_uncached_memory(unsigned long end_pfn);
@@ -87,7 +86,6 @@  static inline void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi)
 {
 }
 #define mtrr_ap_init() do {} while (0)
-#define set_mtrr_aps_delayed_init() do {} while (0)
 #define mtrr_aps_init() do {} while (0)
 #define mtrr_bp_restore() do {} while (0)
 #define mtrr_disable() do {} while (0)
diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 36378604ec61..c6e7c93e45e8 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1139,3 +1139,5 @@  void cache_cpu_init(void)
 	cache_enable();
 	local_irq_restore(flags);
 }
+
+bool cache_aps_delayed_init;
diff --git a/arch/x86/kernel/cpu/mtrr/mtrr.c b/arch/x86/kernel/cpu/mtrr/mtrr.c
index 956838bb4481..a47d46035240 100644
--- a/arch/x86/kernel/cpu/mtrr/mtrr.c
+++ b/arch/x86/kernel/cpu/mtrr/mtrr.c
@@ -65,7 +65,6 @@  unsigned int mtrr_usage_table[MTRR_MAX_VAR_RANGES];
 static DEFINE_MUTEX(mtrr_mutex);
 
 u64 size_or_mask, size_and_mask;
-static bool mtrr_aps_delayed_init;
 
 static const struct mtrr_ops *mtrr_ops[X86_VENDOR_NUM] __ro_after_init;
 
@@ -172,7 +171,7 @@  static int mtrr_rendezvous_handler(void *info)
 	if (data->smp_reg != ~0U) {
 		mtrr_if->set(data->smp_reg, data->smp_base,
 			     data->smp_size, data->smp_type);
-	} else if (mtrr_aps_delayed_init || !cpu_online(smp_processor_id())) {
+	} else if (cache_aps_delayed_init || !cpu_online(smp_processor_id())) {
 		cache_cpu_init();
 	}
 	return 0;
@@ -784,7 +783,7 @@  void __init mtrr_bp_init(void)
 
 void mtrr_ap_init(void)
 {
-	if (!cache_generic || mtrr_aps_delayed_init)
+	if (!cache_generic || cache_aps_delayed_init)
 		return;
 
 	/*
@@ -818,14 +817,6 @@  void mtrr_save_state(void)
 	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
 }
 
-void set_mtrr_aps_delayed_init(void)
-{
-	if (!cache_generic)
-		return;
-
-	mtrr_aps_delayed_init = true;
-}
-
 /*
  * Delayed MTRR initialization for all AP's
  */
@@ -839,11 +830,11 @@  void mtrr_aps_init(void)
 	 * by doing set_mtrr_aps_delayed_init(), prior to this point. If not,
 	 * then we are done.
 	 */
-	if (!mtrr_aps_delayed_init)
+	if (!cache_aps_delayed_init)
 		return;
 
 	set_mtrr(~0U, 0, 0, 0);
-	mtrr_aps_delayed_init = false;
+	cache_aps_delayed_init = false;
 }
 
 void mtrr_bp_restore(void)
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f24227bc3220..ef7bce21cbe8 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -58,6 +58,7 @@ 
 #include <linux/overflow.h>
 
 #include <asm/acpi.h>
+#include <asm/cacheinfo.h>
 #include <asm/desc.h>
 #include <asm/nmi.h>
 #include <asm/irq.h>
@@ -1428,7 +1429,7 @@  void __init native_smp_prepare_cpus(unsigned int max_cpus)
 
 	uv_system_init();
 
-	set_mtrr_aps_delayed_init();
+	cache_aps_delayed_init = true;
 
 	smp_quirk_init_udelay();
 
@@ -1439,7 +1440,7 @@  void __init native_smp_prepare_cpus(unsigned int max_cpus)
 
 void arch_thaw_secondary_cpus_begin(void)
 {
-	set_mtrr_aps_delayed_init();
+	cache_aps_delayed_init = true;
 }
 
 void arch_thaw_secondary_cpus_end(void)