Message ID | 20191111214312.81365-1-srinivas.pandruvada@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86, mce, therm_throt: Optimize notifications of thermal throttle | expand |
Quoting Srinivas Pandruvada (2019-11-11 21:43:12) > +static void throttle_active_work(struct work_struct *work) > +{ > + struct _thermal_state *state = container_of(to_delayed_work(work), > + struct _thermal_state, therm_work); > + unsigned int i, avg, this_cpu = smp_processor_id(); > + u64 now = get_jiffies_64(); > + bool hot; > + u8 temp; <6> [198.901895] [IGT] perf_pmu: starting subtest cpu-hotplug <4> [199.088851] IRQ 24: no longer affine to CPU0 <4> [199.088871] IRQ 25: no longer affine to CPU0 <6> [199.091679] smpboot: CPU 0 is now offline <6> [200.122204] smpboot: Booting Node 0 Processor 0 APIC 0x0 <6> [200.297267] smpboot: CPU 1 is now offline <3> [201.218812] BUG: using smp_processor_id() in preemptible [00000000] code: kworker/1:0/17 <4> [201.218974] caller is throttle_active_work+0x12/0x280 <4> [201.218985] CPU: 0 PID: 17 Comm: kworker/1:0 Tainted: G U 5.5.0-CI-CI_DRM_7867+ #1 <4> [201.218991] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016 <4> [201.219001] Workqueue: events throttle_active_work <4> [201.219009] Call Trace: <4> [201.219021] dump_stack+0x71/0x9b <4> [201.219035] debug_smp_processor_id+0xad/0xb0 <4> [201.219047] throttle_active_work+0x12/0x280 <4> [201.219063] process_one_work+0x26a/0x620 <4> [201.219087] worker_thread+0x37/0x380 <4> [201.219103] ? process_one_work+0x620/0x620 <4> [201.219110] kthread+0x119/0x130 <4> [201.219119] ? kthread_park+0x80/0x80 <4> [201.219134] ret_from_fork+0x3a/0x50 <6> [201.315866] x86: Booting SMP configuration: <6> [201.315880] smpboot: Booting Node 0 Processor 1 APIC 0x2 <4> [201.319814] ------------[ cut here ]------------ <3> [201.319832] ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x10 <4> [201.319971] WARNING: CPU: 1 PID: 14 at lib/debugobjects.c:484 debug_print_object+0x67/0x90 <4> [201.319977] Modules linked in: vgem snd_hda_codec_hdmi i915 mei_hdcp x86_pkg_temp_thermal coretemp snd_hda_codec_realtek crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel snd_intel_dspcfg snd_hda_codec ghash_clmulni_intel snd_hwdep snd_hda_core snd_pcm mei_me r8169 mei realtek lpc_ich prime_numbers <4> [201.320023] CPU: 1 PID: 14 Comm: cpuhp/1 Tainted: G U 5.5.0-CI-CI_DRM_7867+ #1 <4> [201.320029] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS V1.12 02/15/2016 <4> [201.320038] RIP: 0010:debug_print_object+0x67/0x90 <4> [201.320046] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 17 f7 8b 02 8b 53 10 4c 89 e6 48 c7 c7 b0 ce 31 82 48 8b 14 d5 00 37 07 82 e8 89 7b b8 ff <0f> 0b 5b 83 05 33 fb 21 01 01 5d 41 5c c3 83 05 28 fb 21 01 01 c3 <4> [201.320053] RSP: 0000:ffffc900000dbd40 EFLAGS: 00010286 <4> [201.320060] RAX: 0000000000000000 RBX: ffff888408665d68 RCX: 0000000000000001 <4> [201.320066] RDX: 0000000080000001 RSI: ffff88840d6e30f8 RDI: 00000000ffffffff <4> [201.320072] RBP: ffffffff826489e0 R08: ffff88840d6e30f8 R09: 0000000000000000 <4> [201.320078] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff822d7bd1 <4> [201.320084] R13: ffffffff826489e0 R14: ffff88840f898300 R15: 0000000000000202 <4> [201.320091] FS: 0000000000000000(0000) GS:ffff88840f880000(0000) knlGS:0000000000000000 <4> [201.320098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [201.320104] CR2: 0000000000000000 CR3: 0000000005610001 CR4: 00000000001606e0 <4> [201.320109] Call Trace: <4> [201.320125] __debug_object_init+0x359/0x510 <4> [201.320140] ? _raw_spin_unlock_irqrestore+0x34/0x60 <4> [201.320156] ? queue_work_node+0x70/0x70 <4> [201.320165] init_timer_key+0x25/0x140 <4> [201.320180] ? intel_thermal_supported+0x30/0x30 <4> [201.320191] thermal_throttle_online+0xb4/0x260 <4> [201.320204] ? unexpected_thermal_interrupt+0x20/0x20 <4> [201.320213] cpuhp_invoke_callback+0x9b/0x9d0 <4> [201.320235] cpuhp_thread_fun+0x1c8/0x220 <4> [201.320249] ? smpboot_thread_fn+0x23/0x280 <4> [201.320259] ? smpboot_thread_fn+0x6b/0x280 <4> [201.320271] smpboot_thread_fn+0x1d3/0x280 <4> [201.320288] ? sort_range+0x20/0x20 <4> [201.320295] kthread+0x119/0x130 <4> [201.320303] ? kthread_park+0x80/0x80 <4> [201.320317] ret_from_fork+0x3a/0x50 <4> [201.320348] irq event stamp: 4846 <4> [201.320358] hardirqs last enabled at (4845): [<ffffffff8112dcca>] console_unlock+0x4ba/0x5a0 <4> [201.320368] hardirqs last disabled at (4846): [<ffffffff81001ca0>] trace_hardirqs_off_thunk+0x1a/0x1c <4> [201.320379] softirqs last enabled at (4746): [<ffffffff81e00385>] __do_softirq+0x385/0x47f <4> [201.320388] softirqs last disabled at (4739): [<ffffffff810ba15a>] irq_exit+0xba/0xc0 <4> [201.320394] ---[ end trace 06576bf31ad2ac2b ]--- Are we otherwise relying on current->nr_cpus_allowed == 1 here? (As this section is not within a preempt_disable or local_irq_disable region.) -Chris
On Sat, 2020-02-08 at 22:24 +0000, Chris Wilson wrote: > Quoting Srinivas Pandruvada (2019-11-11 21:43:12) > > +static void throttle_active_work(struct work_struct *work) > > +{ > > + struct _thermal_state *state = > > container_of(to_delayed_work(work), > > + struct > > _thermal_state, therm_work); > > + unsigned int i, avg, this_cpu = smp_processor_id(); > > + u64 now = get_jiffies_64(); > > + bool hot; > > + u8 temp; > > <6> [198.901895] [IGT] perf_pmu: starting subtest cpu-hotplug > <4> [199.088851] IRQ 24: no longer affine to CPU0 > <4> [199.088871] IRQ 25: no longer affine to CPU0 > <6> [199.091679] smpboot: CPU 0 is now offline > <6> [200.122204] smpboot: Booting Node 0 Processor 0 APIC 0x0 > <6> [200.297267] smpboot: CPU 1 is now offline > <3> [201.218812] BUG: using smp_processor_id() in preemptible > [00000000] code: kworker/1:0/17 > <4> [201.218974] caller is throttle_active_work+0x12/0x280 > <4> [201.218985] CPU: 0 PID: 17 Comm: kworker/1:0 Tainted: > G U 5.5.0-CI-CI_DRM_7867+ #1 > <4> [201.218991] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS > V1.12 02/15/2016 > <4> [201.219001] Workqueue: events throttle_active_work > <4> [201.219009] Call Trace: > <4> [201.219021] dump_stack+0x71/0x9b > <4> [201.219035] debug_smp_processor_id+0xad/0xb0 > <4> [201.219047] throttle_active_work+0x12/0x280 > <4> [201.219063] process_one_work+0x26a/0x620 > <4> [201.219087] worker_thread+0x37/0x380 > <4> [201.219103] ? process_one_work+0x620/0x620 > <4> [201.219110] kthread+0x119/0x130 > <4> [201.219119] ? kthread_park+0x80/0x80 > <4> [201.219134] ret_from_fork+0x3a/0x50 > <6> [201.315866] x86: Booting SMP configuration: > <6> [201.315880] smpboot: Booting Node 0 Processor 1 APIC 0x2 > <4> [201.319814] ------------[ cut here ]------------ > <3> [201.319832] ODEBUG: init active (active state 0) object type: > timer_list hint: delayed_work_timer_fn+0x0/0x10 > <4> [201.319971] WARNING: CPU: 1 PID: 14 at lib/debugobjects.c:484 > debug_print_object+0x67/0x90 > <4> [201.319977] Modules linked in: vgem snd_hda_codec_hdmi i915 > mei_hdcp x86_pkg_temp_thermal coretemp snd_hda_codec_realtek > crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel > snd_intel_dspcfg snd_hda_codec ghash_clmulni_intel snd_hwdep > snd_hda_core snd_pcm mei_me r8169 mei realtek lpc_ich prime_numbers > <4> [201.320023] CPU: 1 PID: 14 Comm: cpuhp/1 Tainted: > G U 5.5.0-CI-CI_DRM_7867+ #1 > <4> [201.320029] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS > V1.12 02/15/2016 > <4> [201.320038] RIP: 0010:debug_print_object+0x67/0x90 > <4> [201.320046] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 17 f7 8b > 02 8b 53 10 4c 89 e6 48 c7 c7 b0 ce 31 82 48 8b 14 d5 00 37 07 82 e8 > 89 7b b8 ff <0f> 0b 5b 83 05 33 fb 21 01 01 5d 41 5c c3 83 05 28 fb > 21 01 01 c3 > <4> [201.320053] RSP: 0000:ffffc900000dbd40 EFLAGS: 00010286 > <4> [201.320060] RAX: 0000000000000000 RBX: ffff888408665d68 RCX: > 0000000000000001 > <4> [201.320066] RDX: 0000000080000001 RSI: ffff88840d6e30f8 RDI: > 00000000ffffffff > <4> [201.320072] RBP: ffffffff826489e0 R08: ffff88840d6e30f8 R09: > 0000000000000000 > <4> [201.320078] R10: 0000000000000000 R11: 0000000000000000 R12: > ffffffff822d7bd1 > <4> [201.320084] R13: ffffffff826489e0 R14: ffff88840f898300 R15: > 0000000000000202 > <4> [201.320091] FS: 0000000000000000(0000) > GS:ffff88840f880000(0000) knlGS:0000000000000000 > <4> [201.320098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > <4> [201.320104] CR2: 0000000000000000 CR3: 0000000005610001 CR4: > 00000000001606e0 > <4> [201.320109] Call Trace: > <4> [201.320125] __debug_object_init+0x359/0x510 > <4> [201.320140] ? _raw_spin_unlock_irqrestore+0x34/0x60 > <4> [201.320156] ? queue_work_node+0x70/0x70 > <4> [201.320165] init_timer_key+0x25/0x140 > <4> [201.320180] ? intel_thermal_supported+0x30/0x30 > <4> [201.320191] thermal_throttle_online+0xb4/0x260 > <4> [201.320204] ? unexpected_thermal_interrupt+0x20/0x20 > <4> [201.320213] cpuhp_invoke_callback+0x9b/0x9d0 > <4> [201.320235] cpuhp_thread_fun+0x1c8/0x220 > <4> [201.320249] ? smpboot_thread_fn+0x23/0x280 > <4> [201.320259] ? smpboot_thread_fn+0x6b/0x280 > <4> [201.320271] smpboot_thread_fn+0x1d3/0x280 > <4> [201.320288] ? sort_range+0x20/0x20 > <4> [201.320295] kthread+0x119/0x130 > <4> [201.320303] ? kthread_park+0x80/0x80 > <4> [201.320317] ret_from_fork+0x3a/0x50 > <4> [201.320348] irq event stamp: 4846 > <4> [201.320358] hardirqs last enabled at (4845): > [<ffffffff8112dcca>] console_unlock+0x4ba/0x5a0 > <4> [201.320368] hardirqs last disabled at (4846): > [<ffffffff81001ca0>] trace_hardirqs_off_thunk+0x1a/0x1c > <4> [201.320379] softirqs last enabled at (4746): > [<ffffffff81e00385>] __do_softirq+0x385/0x47f > <4> [201.320388] softirqs last disabled at (4739): > [<ffffffff810ba15a>] irq_exit+0xba/0xc0 > <4> [201.320394] ---[ end trace 06576bf31ad2ac2b ]--- > > Are we otherwise relying on current->nr_cpus_allowed == 1 here? No. I am checking internally, if I can use raw_smp_processor_id() instead. Thanks, Srinivas > (As this section is not within a preempt_disable or local_irq_disable > region.) > -Chris
On Sat, 2020-02-08 at 22:09 -0800, Srinivas Pandruvada wrote: > On Sat, 2020-02-08 at 22:24 +0000, Chris Wilson wrote: > > Quoting Srinivas Pandruvada (2019-11-11 21:43:12) > > > +static void throttle_active_work(struct work_struct *work) > > > +{ > > > + struct _thermal_state *state = > > > container_of(to_delayed_work(work), > > > + struct > > > _thermal_state, therm_work); > > > + unsigned int i, avg, this_cpu = smp_processor_id(); > > > + u64 now = get_jiffies_64(); > > > + bool hot; > > > + u8 temp; > > > > <6> [198.901895] [IGT] perf_pmu: starting subtest cpu-hotplug > > <4> [199.088851] IRQ 24: no longer affine to CPU0 > > <4> [199.088871] IRQ 25: no longer affine to CPU0 > > <6> [199.091679] smpboot: CPU 0 is now offline > > <6> [200.122204] smpboot: Booting Node 0 Processor 0 APIC 0x0 > > <6> [200.297267] smpboot: CPU 1 is now offline > > <3> [201.218812] BUG: using smp_processor_id() in preemptible > > [00000000] code: kworker/1:0/17 > > <4> [201.218974] caller is throttle_active_work+0x12/0x280 > > <4> [201.218985] CPU: 0 PID: 17 Comm: kworker/1:0 Tainted: > > G U 5.5.0-CI-CI_DRM_7867+ #1 > > <4> [201.218991] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS > > V1.12 02/15/2016 > > <4> [201.219001] Workqueue: events throttle_active_work > > <4> [201.219009] Call Trace: > > <4> [201.219021] dump_stack+0x71/0x9b > > <4> [201.219035] debug_smp_processor_id+0xad/0xb0 > > <4> [201.219047] throttle_active_work+0x12/0x280 > > <4> [201.219063] process_one_work+0x26a/0x620 > > <4> [201.219087] worker_thread+0x37/0x380 > > <4> [201.219103] ? process_one_work+0x620/0x620 > > <4> [201.219110] kthread+0x119/0x130 > > <4> [201.219119] ? kthread_park+0x80/0x80 > > <4> [201.219134] ret_from_fork+0x3a/0x50 > > <6> [201.315866] x86: Booting SMP configuration: > > <6> [201.315880] smpboot: Booting Node 0 Processor 1 APIC 0x2 > > <4> [201.319814] ------------[ cut here ]------------ > > <3> [201.319832] ODEBUG: init active (active state 0) object type: > > timer_list hint: delayed_work_timer_fn+0x0/0x10 > > <4> [201.319971] WARNING: CPU: 1 PID: 14 at lib/debugobjects.c:484 > > debug_print_object+0x67/0x90 > > <4> [201.319977] Modules linked in: vgem snd_hda_codec_hdmi i915 > > mei_hdcp x86_pkg_temp_thermal coretemp snd_hda_codec_realtek > > crct10dif_pclmul snd_hda_codec_generic crc32_pclmul snd_hda_intel > > snd_intel_dspcfg snd_hda_codec ghash_clmulni_intel snd_hwdep > > snd_hda_core snd_pcm mei_me r8169 mei realtek lpc_ich prime_numbers > > <4> [201.320023] CPU: 1 PID: 14 Comm: cpuhp/1 Tainted: > > G U 5.5.0-CI-CI_DRM_7867+ #1 > > <4> [201.320029] Hardware name: MSI MS-7924/Z97M-G43(MS-7924), BIOS > > V1.12 02/15/2016 > > <4> [201.320038] RIP: 0010:debug_print_object+0x67/0x90 > > <4> [201.320046] Code: 83 c2 01 8b 4b 14 4c 8b 45 00 89 15 17 f7 8b > > 02 8b 53 10 4c 89 e6 48 c7 c7 b0 ce 31 82 48 8b 14 d5 00 37 07 82 > > e8 > > 89 7b b8 ff <0f> 0b 5b 83 05 33 fb 21 01 01 5d 41 5c c3 83 05 28 fb > > 21 01 01 c3 > > <4> [201.320053] RSP: 0000:ffffc900000dbd40 EFLAGS: 00010286 > > <4> [201.320060] RAX: 0000000000000000 RBX: ffff888408665d68 RCX: > > 0000000000000001 > > <4> [201.320066] RDX: 0000000080000001 RSI: ffff88840d6e30f8 RDI: > > 00000000ffffffff > > <4> [201.320072] RBP: ffffffff826489e0 R08: ffff88840d6e30f8 R09: > > 0000000000000000 > > <4> [201.320078] R10: 0000000000000000 R11: 0000000000000000 R12: > > ffffffff822d7bd1 > > <4> [201.320084] R13: ffffffff826489e0 R14: ffff88840f898300 R15: > > 0000000000000202 > > <4> [201.320091] FS: 0000000000000000(0000) > > GS:ffff88840f880000(0000) knlGS:0000000000000000 > > <4> [201.320098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > <4> [201.320104] CR2: 0000000000000000 CR3: 0000000005610001 CR4: > > 00000000001606e0 > > <4> [201.320109] Call Trace: > > <4> [201.320125] __debug_object_init+0x359/0x510 > > <4> [201.320140] ? _raw_spin_unlock_irqrestore+0x34/0x60 > > <4> [201.320156] ? queue_work_node+0x70/0x70 > > <4> [201.320165] init_timer_key+0x25/0x140 > > <4> [201.320180] ? intel_thermal_supported+0x30/0x30 > > <4> [201.320191] thermal_throttle_online+0xb4/0x260 > > <4> [201.320204] ? unexpected_thermal_interrupt+0x20/0x20 > > <4> [201.320213] cpuhp_invoke_callback+0x9b/0x9d0 > > <4> [201.320235] cpuhp_thread_fun+0x1c8/0x220 > > <4> [201.320249] ? smpboot_thread_fn+0x23/0x280 > > <4> [201.320259] ? smpboot_thread_fn+0x6b/0x280 > > <4> [201.320271] smpboot_thread_fn+0x1d3/0x280 > > <4> [201.320288] ? sort_range+0x20/0x20 > > <4> [201.320295] kthread+0x119/0x130 > > <4> [201.320303] ? kthread_park+0x80/0x80 > > <4> [201.320317] ret_from_fork+0x3a/0x50 > > <4> [201.320348] irq event stamp: 4846 > > <4> [201.320358] hardirqs last enabled at (4845): > > [<ffffffff8112dcca>] console_unlock+0x4ba/0x5a0 > > <4> [201.320368] hardirqs last disabled at (4846): > > [<ffffffff81001ca0>] trace_hardirqs_off_thunk+0x1a/0x1c > > <4> [201.320379] softirqs last enabled at (4746): > > [<ffffffff81e00385>] __do_softirq+0x385/0x47f > > <4> [201.320388] softirqs last disabled at (4739): > > [<ffffffff810ba15a>] irq_exit+0xba/0xc0 > > <4> [201.320394] ---[ end trace 06576bf31ad2ac2b ]--- > > > > Are we otherwise relying on current->nr_cpus_allowed == 1 here? > No. > I am checking internally, if I can use raw_smp_processor_id() > instead. Let me correct my answer. Here the call is from a workqueue callback which is scheduled to execute on a specific CPU using schedule_delayed_work_on(). Meanwhile if the CPU is offline or dead, not sure if the thread can execute on another CPU. Thanks, Srinivas > > Thanks, > Srinivas > > > (As this section is not within a preempt_disable or > > local_irq_disable > > region.) > > -Chris
diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c index bc441d68d060..c102fcff23f3 100644 --- a/arch/x86/kernel/cpu/mce/therm_throt.c +++ b/arch/x86/kernel/cpu/mce/therm_throt.c @@ -40,15 +40,58 @@ #define THERMAL_THROTTLING_EVENT 0 #define POWER_LIMIT_EVENT 1 -/* - * Current thermal event state: +/** + * struct _thermal_state - Represent the current thermal event state + * @next_check: Stores the next timestamp, when it is allowed + * to log the next warning message. + * @last_interrupt_time: Stores the timestamp for the last threshold + * high event. + * @therm_work: Delayed workqueue structure + * @count: Stores the current running count for thermal + * or power threshold interrupts. + * @last_count: Stores the previous running count for thermal + * or power threshold interrupts. + * @max_time_ms: This shows the maximum amount of time CPU was + * in throttled state for a single thermal + * threshold high to low state. + * @total_time_ms: This is a cumulative time during which CPU was + * in the throttled state. + * @rate_control_active: Flag to set, when throttling message is logged. + * This is used for the purpose of rate-control. + * @new_event: Stores the last high/low status of the + * THERM_STATUS_PROCHOT or + * THERM_STATUS_POWER_LIMIT. + * @level: Stores whether this _thermal_state instance is + * for a CORE level or for PACKAGE level. + * @sample_index: Index for storage for the next sample in the + * buffer temp_samples[]. + * @sample_count: Total number of samples collected in the buffer + * temp_samples[]. + * @average: The last moving average of temperature samples + * @baseline_temp: Temperature at which thermal threshold high + * interrupt was generated. + * @temp_samples: Storage for temperature samples to calculate + * moving average. + * + * This structure is used to represent data related to thermal state for a CPU. + * There is a separate storage for core and package level for each CPU. */ struct _thermal_state { - bool new_event; - int event; u64 next_check; + u64 last_interrupt_time; + struct delayed_work therm_work; unsigned long count; unsigned long last_count; + unsigned long max_time_ms; + unsigned long total_time_ms; + bool rate_control_active; + bool new_event; + u8 level; + u8 sample_index; + u8 sample_count; + u8 average; + u8 baseline_temp; + u8 temp_samples[3]; }; struct thermal_state { @@ -121,8 +164,22 @@ define_therm_throt_device_one_ro(package_throttle_count); define_therm_throt_device_show_func(package_power_limit, count); define_therm_throt_device_one_ro(package_power_limit_count); +define_therm_throt_device_show_func(core_throttle, max_time_ms); +define_therm_throt_device_one_ro(core_throttle_max_time_ms); + +define_therm_throt_device_show_func(package_throttle, max_time_ms); +define_therm_throt_device_one_ro(package_throttle_max_time_ms); + +define_therm_throt_device_show_func(core_throttle, total_time_ms); +define_therm_throt_device_one_ro(core_throttle_total_time_ms); + +define_therm_throt_device_show_func(package_throttle, total_time_ms); +define_therm_throt_device_one_ro(package_throttle_total_time_ms); + static struct attribute *thermal_throttle_attrs[] = { &dev_attr_core_throttle_count.attr, + &dev_attr_core_throttle_max_time_ms.attr, + &dev_attr_core_throttle_total_time_ms.attr, NULL }; @@ -135,6 +192,105 @@ static const struct attribute_group thermal_attr_group = { #define CORE_LEVEL 0 #define PACKAGE_LEVEL 1 +#define THERM_THROT_POLL_INTERVAL HZ +#define THERM_STATUS_PROCHOT_LOG BIT(1) + +static void clear_therm_status_log(int level) +{ + int msr; + u64 msr_val; + + if (level == CORE_LEVEL) + msr = MSR_IA32_THERM_STATUS; + else + msr = MSR_IA32_PACKAGE_THERM_STATUS; + + rdmsrl(msr, msr_val); + wrmsrl(msr, msr_val & ~THERM_STATUS_PROCHOT_LOG); +} + +static void get_therm_status(int level, bool *proc_hot, u8 *temp) +{ + int msr; + u64 msr_val; + + if (level == CORE_LEVEL) + msr = MSR_IA32_THERM_STATUS; + else + msr = MSR_IA32_PACKAGE_THERM_STATUS; + + rdmsrl(msr, msr_val); + if (msr_val & THERM_STATUS_PROCHOT_LOG) + *proc_hot = true; + else + *proc_hot = false; + + *temp = (msr_val >> 16) & 0x7F; +} + +static void throttle_active_work(struct work_struct *work) +{ + struct _thermal_state *state = container_of(to_delayed_work(work), + struct _thermal_state, therm_work); + unsigned int i, avg, this_cpu = smp_processor_id(); + u64 now = get_jiffies_64(); + bool hot; + u8 temp; + + get_therm_status(state->level, &hot, &temp); + /* temperature value is offset from the max so lesser means hotter */ + if (!hot && temp > state->baseline_temp) { + if (state->rate_control_active) + pr_info("CPU%d: %s temperature/speed normal (total events = %lu)\n", + this_cpu, + state->level == CORE_LEVEL ? "Core" : "Package", + state->count); + + state->rate_control_active = false; + return; + } + + if (time_before64(now, state->next_check) && + state->rate_control_active) + goto re_arm; + + state->next_check = now + CHECK_INTERVAL; + + if (state->count != state->last_count) { + /* There was one new thermal interrupt */ + state->last_count = state->count; + state->average = 0; + state->sample_count = 0; + state->sample_index = 0; + } + + state->temp_samples[state->sample_index] = temp; + state->sample_count++; + state->sample_index = (state->sample_index + 1) % ARRAY_SIZE(state->temp_samples); + if (state->sample_count < ARRAY_SIZE(state->temp_samples)) + goto re_arm; + + avg = 0; + for (i = 0; i < ARRAY_SIZE(state->temp_samples); ++i) + avg += state->temp_samples[i]; + + avg /= ARRAY_SIZE(state->temp_samples); + + if (state->average > avg) { + pr_warn("CPU%d: %s temperature is above threshold, cpu clock is throttled (total events = %lu)\n", + this_cpu, + state->level == CORE_LEVEL ? "Core" : "Package", + state->count); + state->rate_control_active = true; + } + + state->average = avg; + +re_arm: + clear_therm_status_log(state->level); + schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL); +} + /*** * therm_throt_process - Process thermal throttling event from interrupt * @curr: Whether the condition is current or not (boolean), since the @@ -178,27 +334,33 @@ static void therm_throt_process(bool new_event, int event, int level) if (new_event) state->count++; - if (time_before64(now, state->next_check) && - state->count != state->last_count) + if (event != THERMAL_THROTTLING_EVENT) return; - state->next_check = now + CHECK_INTERVAL; - state->last_count = state->count; + if (new_event && !state->last_interrupt_time) { + bool hot; + u8 temp; + + get_therm_status(state->level, &hot, &temp); + /* + * Ignore short temperature spike as the system is not close + * to PROCHOT. 10C offset is large enough to ignore. It is + * already dropped from the high threshold temperature. + */ + if (temp > 10) + return; - /* if we just entered the thermal event */ - if (new_event) { - if (event == THERMAL_THROTTLING_EVENT) - pr_warn("CPU%d: %s temperature above threshold, cpu clock throttled (total events = %lu)\n", - this_cpu, - level == CORE_LEVEL ? "Core" : "Package", - state->count); - return; - } - if (old_event) { - if (event == THERMAL_THROTTLING_EVENT) - pr_info("CPU%d: %s temperature/speed normal\n", this_cpu, - level == CORE_LEVEL ? "Core" : "Package"); - return; + state->baseline_temp = temp; + state->last_interrupt_time = now; + schedule_delayed_work_on(this_cpu, &state->therm_work, THERM_THROT_POLL_INTERVAL); + } else if (old_event && state->last_interrupt_time) { + unsigned long throttle_time; + + throttle_time = jiffies_delta_to_msecs(now - state->last_interrupt_time); + if (throttle_time > state->max_time_ms) + state->max_time_ms = throttle_time; + state->total_time_ms += throttle_time; + state->last_interrupt_time = 0; } } @@ -244,20 +406,47 @@ static int thermal_throttle_add_dev(struct device *dev, unsigned int cpu) if (err) return err; - if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable) + if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable) { err = sysfs_add_file_to_group(&dev->kobj, &dev_attr_core_power_limit_count.attr, thermal_attr_group.name); + if (err) + goto del_group; + } + if (cpu_has(c, X86_FEATURE_PTS)) { err = sysfs_add_file_to_group(&dev->kobj, &dev_attr_package_throttle_count.attr, thermal_attr_group.name); - if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable) + if (err) + goto del_group; + + err = sysfs_add_file_to_group(&dev->kobj, + &dev_attr_package_throttle_max_time_ms.attr, + thermal_attr_group.name); + if (err) + goto del_group; + + err = sysfs_add_file_to_group(&dev->kobj, + &dev_attr_package_throttle_total_time_ms.attr, + thermal_attr_group.name); + if (err) + goto del_group; + + if (cpu_has(c, X86_FEATURE_PLN) && int_pln_enable) { err = sysfs_add_file_to_group(&dev->kobj, &dev_attr_package_power_limit_count.attr, thermal_attr_group.name); + if (err) + goto del_group; + } } + return 0; + +del_group: + sysfs_remove_group(&dev->kobj, &thermal_attr_group); + return err; } @@ -269,15 +458,29 @@ static void thermal_throttle_remove_dev(struct device *dev) /* Get notified when a cpu comes on/off. Be hotplug friendly. */ static int thermal_throttle_online(unsigned int cpu) { + struct thermal_state *state = &per_cpu(thermal_state, cpu); struct device *dev = get_cpu_device(cpu); + state->package_throttle.level = PACKAGE_LEVEL; + state->core_throttle.level = CORE_LEVEL; + + INIT_DELAYED_WORK(&state->package_throttle.therm_work, throttle_active_work); + INIT_DELAYED_WORK(&state->core_throttle.therm_work, throttle_active_work); + return thermal_throttle_add_dev(dev, cpu); } static int thermal_throttle_offline(unsigned int cpu) { + struct thermal_state *state = &per_cpu(thermal_state, cpu); struct device *dev = get_cpu_device(cpu); + cancel_delayed_work(&state->package_throttle.therm_work); + cancel_delayed_work(&state->core_throttle.therm_work); + + state->package_throttle.rate_control_active = false; + state->core_throttle.rate_control_active = false; + thermal_throttle_remove_dev(dev); return 0; }
Some modern systems have very tight thermal tolerances. Because of this they may cross thermal thresholds when running normal workloads (even during boot). The CPU hardware will react by limiting power/frequency and using duty cycles to bring the temperature back into normal range. Thus users may see a "critical" message about the "temperature above threshold" which is soon followed by "temperature/speed normal". These messages are rate-limited, but still may repeat every few minutes. This issue became worse starting with the Ivy Bridge generation of CPUs because they include a TCC activation offset in the MSR IA32_TEMPERATURE_TARGET. OEMs use this to provide alerts long before critical temperatures are reached. A test run on a laptop with Intel 8th Gen i5 core for two hours with a workload resulted in 20K+ thermal interrupts per CPU for core level and another 20K+ interrupts at package level. The kernel logs were full of throttling messages. The real value of these threshold interrupts, is to debug problems with the external cooling solutions and performance issues due to excessive throttling. So the solution here is the following: - In the current thermal_throttle folder, show the maximum time for one throttling event and total amount of time, the system was in throttling state. - Do not log short excursions. - Log only when, in spite of thermal throttling the temperature is rising. On the high threshold interrupt trigger a delayed workqueue, that monitors the threshold violation log bit (THERM_STATUS_PROCHOT_LOG). When the log bit is set, this workqueue callback calculates three point moving average and logs warning message when the temperature trend is rising. When this log bit is clear and temperature is below threshold temperature, then the workqueue callback logs "Normal" message". Once a high threshold event is logged, the logging is rate limited. With this patch, on the same test laptop, no warnings are printed in logs as the max time the processor could bring the temperature under control is only 280 ms. This implementation is done with the inputs from Alan Cox and Tony Luck. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> --- Version History: v1: Changes compared to RFC PATCH: Addressed comments from Boris Rebased to tip/master Added kenrel doc for struct Edits to the commit description Optimize storage for _thermal_state Ignore invalid sample for threshold high arch/x86/kernel/cpu/mce/therm_throt.c | 251 +++++++++++++++++++++++--- 1 file changed, 227 insertions(+), 24 deletions(-)