Message ID | 20240313214031.1658045-1-fenghua.yu@intel.com (mailing list archive) |
---|---|
State | Accepted |
Commit | f221033f5c24659dc6ad7e5cf18fb1b075f4a8be |
Series | dmaengine: idxd: Fix oops during rmmod on single-CPU platforms |
On 3/13/24 2:40 PM, Fenghua Yu wrote:
> During the removal of the idxd driver, registered offline callback is
> invoked as part of the clean up process. However, on systems with only
> one CPU online, no valid target is available to migrate the
> perf context, resulting in a kernel oops:
>
>     BUG: unable to handle page fault for address: 000000000002a2b8
>     #PF: supervisor write access in kernel mode
>     #PF: error_code(0x0002) - not-present page
>     PGD 1470e1067 P4D 0
>     Oops: 0002 [#1] PREEMPT SMP NOPTI
>     CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
>     Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
>     RIP: 0010:mutex_lock+0x2e/0x50
>     ...
>     Call Trace:
>     <TASK>
>     __die+0x24/0x70
>     page_fault_oops+0x82/0x160
>     do_user_addr_fault+0x65/0x6b0
>     __pfx___rdmsr_safe_on_cpu+0x10/0x10
>     exc_page_fault+0x7d/0x170
>     asm_exc_page_fault+0x26/0x30
>     mutex_lock+0x2e/0x50
>     mutex_lock+0x1e/0x50
>     perf_pmu_migrate_context+0x87/0x1f0
>     perf_event_cpu_offline+0x76/0x90 [idxd]
>     cpuhp_invoke_callback+0xa2/0x4f0
>     __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
>     cpuhp_thread_fun+0x98/0x150
>     smpboot_thread_fn+0x27/0x260
>     smpboot_thread_fn+0x1af/0x260
>     __pfx_smpboot_thread_fn+0x10/0x10
>     kthread+0x103/0x140
>     __pfx_kthread+0x10/0x10
>     ret_from_fork+0x31/0x50
>     __pfx_kthread+0x10/0x10
>     ret_from_fork_asm+0x1b/0x30
>     <TASK>
>
> Fix the issue by preventing the migration of the perf context to an
> invalid target.
>
> Fixes: 81dd4d4d6178 ("dmaengine: idxd: Add IDXD performance monitor support")
> Reported-by: Terrence Xu <terrence.xu@intel.com>
> Tested-by: Terrence Xu <terrence.xu@intel.com>
> Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>

Cc: Tom Zanussi

> ---
>  drivers/dma/idxd/perfmon.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
> index fdda6d604262..5e94247e1ea7 100644
> --- a/drivers/dma/idxd/perfmon.c
> +++ b/drivers/dma/idxd/perfmon.c
> @@ -528,14 +528,11 @@ static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
>  		return 0;
>  
>  	target = cpumask_any_but(cpu_online_mask, cpu);
> -
>  	/* migrate events if there is a valid target */
> -	if (target < nr_cpu_ids)
> +	if (target < nr_cpu_ids) {
>  		cpumask_set_cpu(target, &perfmon_dsa_cpu_mask);
> -	else
> -		target = -1;
> -
> -	perf_pmu_migrate_context(&idxd_pmu->pmu, cpu, target);
> +		perf_pmu_migrate_context(&idxd_pmu->pmu, cpu, target);
> +	}
>  
>  	return 0;
>  }
On Wed, 13 Mar 2024 14:40:31 -0700, Fenghua Yu wrote:
> During the removal of the idxd driver, registered offline callback is
> invoked as part of the clean up process. However, on systems with only
> one CPU online, no valid target is available to migrate the
> perf context, resulting in a kernel oops:
>
>     BUG: unable to handle page fault for address: 000000000002a2b8
>     #PF: supervisor write access in kernel mode
>     #PF: error_code(0x0002) - not-present page
>     PGD 1470e1067 P4D 0
>     Oops: 0002 [#1] PREEMPT SMP NOPTI
>     CPU: 0 PID: 20 Comm: cpuhp/0 Not tainted 6.8.0-rc6-dsa+ #57
>     Hardware name: Intel Corporation AvenueCity/AvenueCity, BIOS BHSDCRB1.86B.2492.D03.2307181620 07/18/2023
>     RIP: 0010:mutex_lock+0x2e/0x50
>     ...
>     Call Trace:
>     <TASK>
>     __die+0x24/0x70
>     page_fault_oops+0x82/0x160
>     do_user_addr_fault+0x65/0x6b0
>     __pfx___rdmsr_safe_on_cpu+0x10/0x10
>     exc_page_fault+0x7d/0x170
>     asm_exc_page_fault+0x26/0x30
>     mutex_lock+0x2e/0x50
>     mutex_lock+0x1e/0x50
>     perf_pmu_migrate_context+0x87/0x1f0
>     perf_event_cpu_offline+0x76/0x90 [idxd]
>     cpuhp_invoke_callback+0xa2/0x4f0
>     __pfx_perf_event_cpu_offline+0x10/0x10 [idxd]
>     cpuhp_thread_fun+0x98/0x150
>     smpboot_thread_fn+0x27/0x260
>     smpboot_thread_fn+0x1af/0x260
>     __pfx_smpboot_thread_fn+0x10/0x10
>     kthread+0x103/0x140
>     __pfx_kthread+0x10/0x10
>     ret_from_fork+0x31/0x50
>     __pfx_kthread+0x10/0x10
>     ret_from_fork_asm+0x1b/0x30
>     <TASK>
>
> [...]

Applied, thanks!

[1/1] dmaengine: idxd: Fix oops during rmmod on single-CPU platforms
      commit: f221033f5c24659dc6ad7e5cf18fb1b075f4a8be

Best regards,
diff --git a/drivers/dma/idxd/perfmon.c b/drivers/dma/idxd/perfmon.c
index fdda6d604262..5e94247e1ea7 100644
--- a/drivers/dma/idxd/perfmon.c
+++ b/drivers/dma/idxd/perfmon.c
@@ -528,14 +528,11 @@ static int perf_event_cpu_offline(unsigned int cpu, struct hlist_node *node)
 		return 0;
 
 	target = cpumask_any_but(cpu_online_mask, cpu);
-
 	/* migrate events if there is a valid target */
-	if (target < nr_cpu_ids)
+	if (target < nr_cpu_ids) {
 		cpumask_set_cpu(target, &perfmon_dsa_cpu_mask);
-	else
-		target = -1;
-
-	perf_pmu_migrate_context(&idxd_pmu->pmu, cpu, target);
+		perf_pmu_migrate_context(&idxd_pmu->pmu, cpu, target);
+	}
 
 	return 0;
 }
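
For illustration only, the guard introduced by the patch can be modelled in userspace C. This is a minimal sketch, not kernel code: NR_CPU_IDS, any_online_but() and offline_should_migrate() are hypothetical names invented here, the online mask is a plain unsigned int, and the helper always picks the lowest eligible CPU, whereas the kernel's cpumask_any_but() only promises some CPU other than the one given (or a value >= nr_cpu_ids if there is none). The point it shows is the control flow after the fix: when no other CPU is online, nothing is migrated and no target is written, instead of passing -1 on to perf_pmu_migrate_context().

```c
#include <stdbool.h>

/* Hypothetical CPU count for this model; stands in for nr_cpu_ids. */
#define NR_CPU_IDS 8u

/*
 * Hypothetical stand-in for cpumask_any_but(): return the lowest CPU set in
 * online_mask other than 'cpu', or NR_CPU_IDS if no such CPU exists.
 */
static unsigned int any_online_but(unsigned int online_mask, unsigned int cpu)
{
	for (unsigned int i = 0; i < NR_CPU_IDS; i++) {
		if (i != cpu && (online_mask & (1u << i)))
			return i;
	}
	return NR_CPU_IDS;
}

/*
 * Model of the fixed offline path: report whether a migration would be
 * attempted. *target is written only when a valid target exists, which
 * mirrors moving the migrate call inside the 'target < nr_cpu_ids' branch.
 */
static bool offline_should_migrate(unsigned int online_mask, unsigned int cpu,
				   unsigned int *target)
{
	unsigned int t = any_online_but(online_mask, cpu);

	if (t >= NR_CPU_IDS)
		return false;	/* single-CPU case: skip migration entirely */

	*target = t;
	return true;
}
```

With only CPU 0 online (mask 0x1) and CPU 0 going offline, offline_should_migrate() returns false and leaves the target untouched; with CPUs 0 and 1 online (mask 0x3) it returns true and selects CPU 1.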