| Message ID | 20200222162432.497201-1-srinivas.pandruvada@linux.intel.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | x86/mce/therm_throt: Handle case where throttle_active_work() is called on behalf of an offline CPU |
Quoting Srinivas Pandruvada (2020-02-22 16:24:32)
> During a cpu-hotplug test with CONFIG_PREEMPTION and CONFIG_DEBUG_PREEMPT
> enabled, Chris reported this error:
>
>   BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:0/17
>   caller is throttle_active_work+0x12/0x280
>
> Here throttle_active_work() is a workqueue callback scheduled with
> schedule_delayed_work_on(). Under normal conditions the use of
> smp_processor_id() does not trigger this error, because the debug check
> allows tasks bound to a single CPU ("current->nr_cpus_allowed == 1").
> But when the target CPU is offline, the work item becomes unbound. The
> callback can then be scheduled on another CPU, and the error is printed
> for the use of smp_processor_id() in preemptible context.
>
> When the callback is not running on the target CPU, simply return. This
> is done by adding a cpu field to struct _thermal_state and matching it
> against the current CPU id.
>
> Once the callback is running, prevent CPU offlining so that the log bits
> are checked and cleared on the correct CPU. Also use get_cpu() to get the
> current CPU id and disable preemption until processing is finished.
>
> Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
> Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>

I've pushed the patch to our CI, but it's not a frequent occurrence, so it
may be some time before I can state a t-b with any confidence.
-Chris
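The splat above comes from the preemption-safety check in lib/smp_processor_id.c. Below is a simplified sketch modeled on check_preemption_disabled() (details vary across kernel versions; this is illustrative, not the verbatim kernel source), showing why a worker bound to a single CPU normally passes and an unbound one does not:

```c
/*
 * Simplified sketch of check_preemption_disabled() from
 * lib/smp_processor_id.c -- illustrative only.
 */
static unsigned int check_preemption_disabled_sketch(void)
{
	int this_cpu = raw_smp_processor_id();

	if (likely(preempt_count()))	/* preemption disabled: safe */
		goto out;

	if (irqs_disabled())		/* hard interrupts off: safe */
		goto out;

	/*
	 * The check the commit message cites: a task allowed on exactly
	 * one CPU cannot migrate, so smp_processor_id() is stable. A
	 * worker on a per-CPU bound workqueue normally satisfies this --
	 * until its target CPU goes offline and the work item becomes
	 * unbound.
	 */
	if (current->nr_cpus_allowed == 1)
		goto out;

	printk(KERN_ERR "BUG: using smp_processor_id() in preemptible code\n");
out:
	return this_cpu;
}
```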
On Sat, Feb 22, 2020 at 08:24:32AM -0800, Srinivas Pandruvada wrote:
> During a cpu-hotplug test with CONFIG_PREEMPTION and CONFIG_DEBUG_PREEMPT
> enabled, Chris reported this error:
>
>   BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:0/17
>   caller is throttle_active_work+0x12/0x280
>
> Here throttle_active_work() is a workqueue callback scheduled with
> schedule_delayed_work_on(). Under normal conditions the use of
> smp_processor_id() does not trigger this error, because the debug check
> allows tasks bound to a single CPU ("current->nr_cpus_allowed == 1").
> But when the target CPU is offline, the work item becomes unbound. The
> callback can then be scheduled on another CPU, and the error is printed
> for the use of smp_processor_id() in preemptible context.

So what's wrong with simply doing:

	if (cpu_is_offline(this_cpu))
		return;

?

You don't need to run the callback on an offlined CPU anyway...
On Sat, 2020-02-22 at 18:51 +0100, Borislav Petkov wrote:
> On Sat, Feb 22, 2020 at 08:24:32AM -0800, Srinivas Pandruvada wrote:
> > During a cpu-hotplug test with CONFIG_PREEMPTION and
> > CONFIG_DEBUG_PREEMPT enabled, Chris reported this error:
> >
> >   BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:0/17
> >   caller is throttle_active_work+0x12/0x280
> >
> > Here throttle_active_work() is a workqueue callback scheduled with
> > schedule_delayed_work_on(). Under normal conditions the use of
> > smp_processor_id() does not trigger this error, because the debug
> > check allows tasks bound to a single CPU
> > ("current->nr_cpus_allowed == 1").
> > But when the target CPU is offline, the work item becomes unbound.
> > The callback can then be scheduled on another CPU, and the error is
> > printed for the use of smp_processor_id() in preemptible context.
>
> So what's wrong with simply doing:
>
> 	if (cpu_is_offline(this_cpu))
> 		return;
>
> ?

If the condition is false, will it prevent the CPU from going offline
before the next statement executes, with the work rescheduled on another
CPU? Although it will not cause any error or crash, in rare circumstances
it may print a premature warning/normal message based on the current
CPU's state.

So I can submit something like this:

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index b38010b541d6..7aa7c9d1df2a 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -239,11 +239,14 @@ static void throttle_active_work(struct work_struct *work)
 {
 	struct _thermal_state *state = container_of(to_delayed_work(work),
 						struct _thermal_state, therm_work);
-	unsigned int i, avg, this_cpu = smp_processor_id();
+	unsigned int i, avg, this_cpu = state->cpu;
 	u64 now = get_jiffies_64();
 	bool hot;
 	u8 temp;
 
+	if (cpu_is_offline(this_cpu))
+		return;
+
 	get_therm_status(state->level, &hot, &temp);
 	/* temperature value is offset from the max so lesser means hotter */
 	if (!hot && temp > state->baseline_temp) {

Thanks,
Srinivas

> You don't need to run the callback on an offlined CPU anyway...
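The residual race Srinivas describes is a time-of-check/time-of-use window: a bare cpu_is_offline() test at the top of the callback pins nothing, so the CPU can still go away before the rest of the function runs. A hypothetical sketch of that window (the function name and comments are illustrative, not a patch from this thread):

```c
static void throttle_active_work_sketch(struct work_struct *work)
{
	/* The check can pass while the CPU is still online... */
	if (cpu_is_offline(smp_processor_id()))
		return;

	/*
	 * ...but nothing holds the CPU online here. If it goes offline at
	 * this point, the work item becomes unbound and the remainder may
	 * run on a different CPU, reading and clearing that CPU's thermal
	 * log bits -- the "premature warning/normal message" mentioned
	 * above. Wrapping the body in get_online_cpus()/get_cpu() is what
	 * closes this window.
	 */

	/* ... read the MSR log bits, print throttle messages, re-arm ... */
}
```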
On Sat, Feb 22, 2020 at 04:25:59PM -0800, Srinivas Pandruvada wrote:
> If the condition is false, will it prevent the CPU from going offline
> before the next statement executes, with the work rescheduled on another
> CPU? Although it will not cause any error or crash, in rare
> circumstances it may print a premature warning/normal message based on
> the current CPU's state.

Why, offline CPU is offline CPU?

Btw, I'm asking whether you can do the simpler thing *instead* of your
patch. You basically don't run the workqueue callback on offlined CPUs:

	get_online_cpus();

	if (cpu_is_offline(smp_processor_id()))
		goto out;

	...

out:
	put_online_cpus();

Hmm?
Borislav Petkov <bp@alien8.de> writes:
> On Sat, Feb 22, 2020 at 04:25:59PM -0800, Srinivas Pandruvada wrote:
>> If the condition is false, will it prevent the CPU from going offline
>> before the next statement executes, with the work rescheduled on
>> another CPU? Although it will not cause any error or crash, in rare
>> circumstances it may print a premature warning/normal message based on
>> the current CPU's state.
>
> Why, offline CPU is offline CPU?
>
> Btw, I'm asking whether you can do the simpler thing *instead* of your
> patch. You basically don't run the workqueue callback on offlined CPUs:
>
> 	get_online_cpus();
>
> 	if (cpu_is_offline(smp_processor_id()))
> 		goto out;
>
> 	...
>
> out:
> 	put_online_cpus();

Which is wrong as well. Trying to "fix" it in the workqueue callback is
papering over the root cause.

Why is any work scheduled on an outgoing CPU after this CPU executed
thermal_throttle_offline()?

When thermal_throttle_offline() is invoked, the CPU-bound workqueues are
still functional, and thermal_throttle_offline() cancels outstanding work.

So no, please fix the root cause, not the symptom.

Thanks,

        tglx
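For reference, the online/offline callbacks are registered as a dynamic CPU hotplug state, roughly as in the sketch below (modeled on thermal_throttle_init_device() in therm_throt.c; exact details may differ between kernel versions). Callbacks in the application-processor (AP) range run on the CPU being brought up or taken down while its per-CPU bound workqueue is still functional, which is why the offline callback is the right place to drain pending work:

```c
static __init int thermal_throttle_init_device(void)
{
	int ret;

	if (!atomic_read(&therm_throt_en))
		return 0;

	/*
	 * thermal_throttle_offline() runs on the outgoing CPU before its
	 * per-CPU bound workqueue is torn down, so it can still cancel
	 * work pending on that CPU.
	 */
	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/therm:online",
				thermal_throttle_online,
				thermal_throttle_offline);
	return ret < 0 ? ret : 0;
}
```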
Thomas Gleixner <tglx@linutronix.de> writes:
> Which is wrong as well. Trying to "fix" it in the workqueue callback is
> papering over the root cause.
>
> Why is any work scheduled on an outgoing CPU after this CPU executed
> thermal_throttle_offline()?
>
> When thermal_throttle_offline() is invoked, the CPU-bound workqueues are
> still functional, and thermal_throttle_offline() cancels outstanding
> work.
>
> So no, please fix the root cause, not the symptom.

And if you look at thermal_throttle_online() then you'll notice that it
is asymmetric vs. thermal_throttle_offline().

Also you want to do cancel_delayed_work_sync() and not just
cancel_delayed_work(), because only the former guarantees that the work
is neither queued nor running anymore; the latter does not take running
or self-requeueing work into account.

Something like the untested patch below.

Thanks,

        tglx
---
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -487,8 +487,13 @@ static int thermal_throttle_offline(unsi
 	struct thermal_state *state = &per_cpu(thermal_state, cpu);
 	struct device *dev = get_cpu_device(cpu);
+	u32 l;
 
-	cancel_delayed_work(&state->package_throttle.therm_work);
-	cancel_delayed_work(&state->core_throttle.therm_work);
+	/* Mask the thermal vector before draining evtl. pending work */
+	l = apic_read(APIC_LVTTHMR);
+	apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
+
+	cancel_delayed_work_sync(&state->package_throttle.therm_work);
+	cancel_delayed_work_sync(&state->core_throttle.therm_work);
 
 	state->package_throttle.rate_control_active = false;
 	state->core_throttle.rate_control_active = false;
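The "asymmetric" remark points at the online side not mirroring this teardown. Below is a rough, untested sketch of what a symmetric thermal_throttle_online() could look like; moving the INIT_DELAYED_WORK() calls here and the exact unmask sequence are assumptions of this sketch, not part of the patch above:

```c
static int thermal_throttle_online(unsigned int cpu)
{
	struct thermal_state *state = &per_cpu(thermal_state, cpu);
	struct device *dev = get_cpu_device(cpu);
	u32 l;

	state->package_throttle.level = PACKAGE_LEVEL;
	state->core_throttle.level = CORE_LEVEL;

	/* (Re)initialize the work items on every online transition */
	INIT_DELAYED_WORK(&state->package_throttle.therm_work, throttle_active_work);
	INIT_DELAYED_WORK(&state->core_throttle.therm_work, throttle_active_work);

	/* Unmask the thermal vector only after the work items are set up */
	l = apic_read(APIC_LVTTHMR);
	apic_write(APIC_LVTTHMR, l & ~APIC_LVT_MASKED);

	return thermal_throttle_add_dev(dev, cpu);
}
```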
On Mon, 2020-02-24 at 20:25 +0100, Thomas Gleixner wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> > Which is wrong as well. Trying to "fix" it in the workqueue callback
> > is papering over the root cause.
> >
> > Why is any work scheduled on an outgoing CPU after this CPU executed
> > thermal_throttle_offline()?
> >
> > When thermal_throttle_offline() is invoked, the CPU-bound workqueues
> > are still functional, and thermal_throttle_offline() cancels
> > outstanding work.
> >
> > So no, please fix the root cause, not the symptom.
>
> And if you look at thermal_throttle_online() then you'll notice that it
> is asymmetric vs. thermal_throttle_offline().
>
> Also you want to do cancel_delayed_work_sync() and not just
> cancel_delayed_work(), because only the former guarantees that the work
> is neither queued nor running anymore; the latter does not take running
> or self-requeueing work into account.
>
> Something like the untested patch below.

Thanks Thomas. I am testing this patch.

-Srinivas

> Thanks,
>
>         tglx
> ---
> --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> @@ -487,8 +487,13 @@ static int thermal_throttle_offline(unsi
>  	struct thermal_state *state = &per_cpu(thermal_state, cpu);
>  	struct device *dev = get_cpu_device(cpu);
> +	u32 l;
>  
> -	cancel_delayed_work(&state->package_throttle.therm_work);
> -	cancel_delayed_work(&state->core_throttle.therm_work);
> +	/* Mask the thermal vector before draining evtl. pending work */
> +	l = apic_read(APIC_LVTTHMR);
> +	apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
> +
> +	cancel_delayed_work_sync(&state->package_throttle.therm_work);
> +	cancel_delayed_work_sync(&state->core_throttle.therm_work);
>  
>  	state->package_throttle.rate_control_active = false;
>  	state->core_throttle.rate_control_active = false;
On Mon, 2020-02-24 at 20:25 +0100, Thomas Gleixner wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> > Which is wrong as well. Trying to "fix" it in the workqueue callback
> > is papering over the root cause.
> >
> > Why is any work scheduled on an outgoing CPU after this CPU executed
> > thermal_throttle_offline()?
> >
> > When thermal_throttle_offline() is invoked, the CPU-bound workqueues
> > are still functional, and thermal_throttle_offline() cancels
> > outstanding work.
> >
> > So no, please fix the root cause, not the symptom.
>
> And if you look at thermal_throttle_online() then you'll notice that it
> is asymmetric vs. thermal_throttle_offline().
>
> Also you want to do cancel_delayed_work_sync() and not just
> cancel_delayed_work(), because only the former guarantees that the work
> is neither queued nor running anymore; the latter does not take running
> or self-requeueing work into account.
>
> Something like the untested patch below.

I tested this patch. After simulating 20 million thermal interrupts and
running the online/offline test for 12+ hours, I don't see the issue. So
this change fixed the issue.

I can send the change on your behalf, or you can add
Tested-by: Pandruvada, Srinivas <srinivas.pandruvada@linux.intel.com>

Thanks,
Srinivas

> Thanks,
>
>         tglx
> ---
> --- a/arch/x86/kernel/cpu/mce/therm_throt.c
> +++ b/arch/x86/kernel/cpu/mce/therm_throt.c
> @@ -487,8 +487,13 @@ static int thermal_throttle_offline(unsi
>  	struct thermal_state *state = &per_cpu(thermal_state, cpu);
>  	struct device *dev = get_cpu_device(cpu);
> +	u32 l;
>  
> -	cancel_delayed_work(&state->package_throttle.therm_work);
> -	cancel_delayed_work(&state->core_throttle.therm_work);
> +	/* Mask the thermal vector before draining evtl. pending work */
> +	l = apic_read(APIC_LVTTHMR);
> +	apic_write(APIC_LVTTHMR, l | APIC_LVT_MASKED);
> +
> +	cancel_delayed_work_sync(&state->package_throttle.therm_work);
> +	cancel_delayed_work_sync(&state->core_throttle.therm_work);
>  
>  	state->package_throttle.rate_control_active = false;
>  	state->core_throttle.rate_control_active = false;
diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index 58b4ee3cda77..4dab8a4558f9 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -61,6 +61,7 @@
  * @new_event:		Stores the last high/low status of the
  *			THERM_STATUS_PROCHOT or
  *			THERM_STATUS_POWER_LIMIT.
+ * @cpu:		CPU id for this instance.
  * @level:		Stores whether this _thermal_state instance is
  *			for a CORE level or for PACKAGE level.
  * @sample_index:	Index for storing the next sample in the buffer
@@ -86,6 +87,7 @@ struct _thermal_state {
 	unsigned long	total_time_ms;
 	bool		rate_control_active;
 	bool		new_event;
+	int		cpu;
 	u8		level;
 	u8		sample_index;
 	u8		sample_count;
@@ -239,11 +241,19 @@ static void __maybe_unused throttle_active_work(struct work_struct *work)
 {
 	struct _thermal_state *state = container_of(to_delayed_work(work),
 						struct _thermal_state, therm_work);
-	unsigned int i, avg, this_cpu = smp_processor_id();
+	unsigned int i, avg, this_cpu;
 	u64 now = get_jiffies_64();
 	bool hot;
 	u8 temp;
 
+	get_online_cpus();
+	this_cpu = get_cpu();
+
+	if (state->cpu != this_cpu) {
+		state->rate_control_active = false;
+		goto end;
+	}
+
 	get_therm_status(state->level, &hot, &temp);
 	/* temperature value is offset from the max so lesser means hotter */
 	if (!hot && temp > state->baseline_temp) {
@@ -254,7 +264,7 @@ static void __maybe_unused throttle_active_work(struct work_struct *work)
 			state->count);
 
 		state->rate_control_active = false;
-		return;
+		goto end;
 	}
 
 	if (time_before64(now, state->next_check) &&
@@ -296,6 +306,10 @@ static void __maybe_unused throttle_active_work(struct work_struct *work)
 re_arm:
 	clear_therm_status_log(state->level);
 	schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL);
+
+end:
+	put_cpu();
+	put_online_cpus();
 }
 
 /***
@@ -359,6 +373,7 @@ static void therm_throt_process(bool new_event, int event, int level)
 
 		state->baseline_temp = temp;
 		state->last_interrupt_time = now;
+		state->cpu = this_cpu;
 		schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL);
 	} else if (old_event && state->last_interrupt_time) {
 		unsigned long throttle_time;