Message ID | 20230921141516.520471-2-bigeasy@linutronix.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/amd/display: Remove migrate-disable and move memory allocation. | expand |
On 2023-09-21 10:15, Sebastian Andrzej Siewior wrote: > This is a revert of the commit mentioned below while it is not wrong, as > in the kernel will explode, having migrate_disable() here it is > complete waste of resources. > > Additionally commit message is plain wrong the review tag does not make Not sure I follow what's unhelpful about the review tag with 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable") I do wish the original patch showed the splat it's attempting to fix. It apparently made a difference for something, whether inadvertently or not. I wish I knew what that "something" was. Harry > it any better. The migrate_disable() interface has a fat comment > describing it and it includes the word "undesired" in the headline which > should tickle people to read it before using it. > Initially I assumed it is worded too harsh but now I beg to differ. > > The reviewer of the original commit, even not understanding what > migrate_disable() does should ask the following: > > - migrate_disable() is added only to the CONFIG_X86 block and it claims > to protect fpu_recursion_depth. Why are the other the architectures > excluded? > > - migrate_disable() is added after fpu_recursion_depth was modified. > Shouldn't it be added before the modification or referencing takes > place? > > Moving on. > Disabling preemption DOES prevent CPU migration. A task, that can not be > pushed away from the CPU by the scheduler (due to disabled preemption) > can not be pushed or migrated to another CPU. > > Disabling migration DOES NOT ensure consistency of per-CPU variables. It > only ensures that the task acts always on the same per-CPU variable. The > task remains preemptible meaning multiple tasks can access the same > per-CPU variable. This in turn leads to inconsistency for the statement > > *pcpu -= 1; > > with two tasks on one CPU and a preemption point during the RMW > operation: > > Task A Task B > read pcpu to reg # 0 > inc reg # 0 -> 1 > read pcpu to reg # 0 > inc reg # 0 -> 1 > write reg to pcpu # 1 > write reg to pcpu # 1 > > At the end pcpu reads 1 but should read 2 instead. Boom. > > get_cpu_ptr() already contains a preempt_disable() statement. That means > that the per-CPU variable can only be referenced by a single task which > is currently running. The only inconsistency that can occur if the > variable is additionally accessed from an interrupt. > > Remove migrate_disable/enable() from dc_fpu_begin/end(). > > Cc: Tianci Yin <tianci.yin@amd.com> > Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> > Fixes: 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable") > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 2 -- > 1 file changed, 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c > index 172aa10a8800f..86f4c0e046548 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c > @@ -91,7 +91,6 @@ void dc_fpu_begin(const char *function_name, const int line) > > if (*pcpu == 1) { > #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) > - migrate_disable(); > kernel_fpu_begin(); > #elif defined(CONFIG_PPC64) > if (cpu_has_feature(CPU_FTR_VSX_COMP)) { > @@ -132,7 +131,6 @@ void dc_fpu_end(const char *function_name, const int line) > if (*pcpu <= 0) { > #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) > kernel_fpu_end(); > - migrate_enable(); > #elif defined(CONFIG_PPC64) > if (cpu_has_feature(CPU_FTR_VSX_COMP)) { > disable_kernel_vsx();
On 2023-10-03 15:53:41 [-0400], Harry Wentland wrote: > On 2023-09-21 10:15, Sebastian Andrzej Siewior wrote: > > This is a revert of the commit mentioned below while it is not wrong, as > > in the kernel will explode, having migrate_disable() here it is > > complete waste of resources. > > > > Additionally commit message is plain wrong the review tag does not make > > Not sure I follow what's unhelpful about the review tag with > 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable") I explained it below with two points what the reviewer should have noticed why reading the commit message even if he does not know what migrate_disable() itself does. > I do wish the original patch showed the splat it's attempting > to fix. It apparently made a difference for something, whether > inadvertently or not. I wish I knew what that "something" was. As far as I can tell the patch does make a difference. > Harry Sebastian
On 10/3/23 15:53, Harry Wentland wrote: > On 2023-09-21 10:15, Sebastian Andrzej Siewior wrote: >> This is a revert of the commit mentioned below while it is not wrong, as >> in the kernel will explode, having migrate_disable() here it is >> complete waste of resources. >> >> Additionally commit message is plain wrong the review tag does not make > > Not sure I follow what's unhelpful about the review tag with > 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable") > > I do wish the original patch showed the splat it's attempting > to fix. It apparently made a difference for something, whether > inadvertently or not. I wish I knew what that "something" was. I did some digging, and it seems like the intention of that patch was to fix the following splat: WARNING: CPU: 5 PID: 1062 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/dc_fpu.c:71 dc_assert_fp_enabled+0x1a/0x30 [amdgpu] [...] CPU: 5 PID: 1062 Comm: Xorg Tainted: G OE 5.15.0-56-generic #62-Ubuntu Hardware name: ASUS System Product Name/ROG STRIX Z590-F GAMING WIFI, BIOS 1202 10/27/2021 RIP: 0010:dc_assert_fp_enabled+0x1a/0x30 [amdgpu] Code: ff 45 31 f6 0f 0b e9 ca fe ff ff e8 90 1c 1f f7 48 c7 c0 00 30 03 00 65 48 03 05 b1 aa 86 3f 8b 00 85 c0 7e 05 c3 cc cc cc cc <0f> 0b c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 RSP: 0000:ffffb89b82a8f118 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8c271cd00000 RCX: 0000000000000000 RDX: ffff8c2708025000 RSI: ffff8c270e8c0000 RDI: ffff8c271cd00000 RBP: ffffb89b82a8f1d0 R08: 0000000000000000 R09: 7fffff6affffffff R10: ffffb89b82a8f240 R11: 0000000000000000 R12: 0000000000000002 R13: ffff8c271cd00000 R14: ffff8c270e8c0000 R15: ffff8c2708025000 FS: 00007f0570019a80(0000) GS:ffff8c2e3fb40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005594858a0058 CR3: 000000010e204001 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> ? dcn20_populate_dml_pipes_from_context+0x47/0x1730 [amdgpu] ? __kmalloc_node+0x2cb/0x3a0 dcn32_populate_dml_pipes_from_context+0x2b/0x450 [amdgpu] dcn32_internal_validate_bw+0x15f9/0x2670 [amdgpu] dcn32_find_dummy_latency_index_for_fw_based_mclk_switch+0xd0/0x310 [amdgpu] dcn32_calculate_wm_and_dlg_fpu+0xe6/0x1e50 [amdgpu] dcn32_calculate_wm_and_dlg+0x46/0x70 [amdgpu] dcn32_validate_bandwidth+0x1b7/0x3e0 [amdgpu] dc_validate_global_state+0x32c/0x560 [amdgpu] dc_validate_with_context+0x6e6/0xd80 [amdgpu] dc_commit_streams+0x21b/0x500 [amdgpu] dc_commit_state+0xf3/0x150 [amdgpu] amdgpu_dm_atomic_commit_tail+0x60d/0x3120 [amdgpu] ? dcn32_internal_validate_bw+0xcf8/0x2670 [amdgpu] ? fill_plane_buffer_attributes+0x1e5/0x560 [amdgpu] ? dcn32_validate_bandwidth+0x1e0/0x3e0 [amdgpu] ? kfree+0x1f7/0x250 ? dcn32_validate_bandwidth+0x1e0/0x3e0 [amdgpu] ? dc_validate_global_state+0x32c/0x560 [amdgpu] ? __cond_resched+0x1a/0x50 ? __wait_for_common+0x3e/0x150 ? fill_plane_buffer_attributes+0x1e5/0x560 [amdgpu] ? usleep_range_state+0x90/0x90 ? wait_for_completion_timeout+0x1d/0x30 commit_tail+0xc2/0x170 [drm_kms_helper] ? drm_atomic_helper_swap_state+0x20f/0x370 [drm_kms_helper] drm_atomic_helper_commit+0x12b/0x150 [drm_kms_helper] amdgpu_dm_atomic_commit+0x11/0x20 [amdgpu] drm_atomic_commit+0x47/0x60 [drm] drm_mode_obj_set_property_ioctl+0x16b/0x420 [drm] ? mutex_lock+0x13/0x50 ? drm_mode_createblob_ioctl+0xf6/0x130 [drm] ? drm_mode_obj_find_prop_id+0x90/0x90 [drm] drm_ioctl_kernel+0xb0/0x100 [drm] drm_ioctl+0x268/0x4b0 [drm] ? drm_mode_obj_find_prop_id+0x90/0x90 [drm] ? ktime_get_mono_fast_ns+0x4a/0xa0 amdgpu_drm_ioctl+0x4e/0x90 [amdgpu] __x64_sys_ioctl+0x92/0xd0 do_syscall_64+0x59/0xc0 ? do_user_addr_fault+0x1e7/0x670 ? do_syscall_64+0x69/0xc0 ? exit_to_user_mode_prepare+0x37/0xb0 ? irqentry_exit_to_user_mode+0x9/0x20 ? irqentry_exit+0x1d/0x30 ? exc_page_fault+0x89/0x170 entry_SYSCALL_64_after_hwframe+0x61/0xcb RIP: 0033:0x7f05704a2aff Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 RSP: 002b:00007ffc8c45a3f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffc8c45a480 RCX: 00007f05704a2aff RDX: 00007ffc8c45a480 RSI: 00000000c01864ba RDI: 000000000000000e RBP: 00000000c01864ba R08: 0000000000000077 R09: 00000000cccccccc R10: 00007f05705a22f0 R11: 0000000000000246 R12: 0000000000000004 R13: 000000000000000e R14: 000000000000000f R15: 00007ffc8c45a8a8 </TASK> ---[ end trace 4deab30bb69df00f ]--- > > Harry > >> it any better. The migrate_disable() interface has a fat comment >> describing it and it includes the word "undesired" in the headline which >> should tickle people to read it before using it. >> Initially I assumed it is worded too harsh but now I beg to differ. >> >> The reviewer of the original commit, even not understanding what >> migrate_disable() does should ask the following: >> >> - migrate_disable() is added only to the CONFIG_X86 block and it claims >> to protect fpu_recursion_depth. Why are the other the architectures >> excluded? >> >> - migrate_disable() is added after fpu_recursion_depth was modified. >> Shouldn't it be added before the modification or referencing takes >> place? >> >> Moving on. >> Disabling preemption DOES prevent CPU migration. A task, that can not be >> pushed away from the CPU by the scheduler (due to disabled preemption) >> can not be pushed or migrated to another CPU. >> >> Disabling migration DOES NOT ensure consistency of per-CPU variables. It >> only ensures that the task acts always on the same per-CPU variable. The >> task remains preemptible meaning multiple tasks can access the same >> per-CPU variable. This in turn leads to inconsistency for the statement >> >> *pcpu -= 1; >> >> with two tasks on one CPU and a preemption point during the RMW >> operation: >> >> Task A Task B >> read pcpu to reg # 0 >> inc reg # 0 -> 1 >> read pcpu to reg # 0 >> inc reg # 0 -> 1 >> write reg to pcpu # 1 >> write reg to pcpu # 1 >> >> At the end pcpu reads 1 but should read 2 instead. Boom. >> >> get_cpu_ptr() already contains a preempt_disable() statement. That means >> that the per-CPU variable can only be referenced by a single task which >> is currently running. The only inconsistency that can occur if the >> variable is additionally accessed from an interrupt. >> >> Remove migrate_disable/enable() from dc_fpu_begin/end(). >> >> Cc: Tianci Yin <tianci.yin@amd.com> >> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> >> Fixes: 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable") >> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> >> --- >> drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 2 -- >> 1 file changed, 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c >> index 172aa10a8800f..86f4c0e046548 100644 >> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c >> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c >> @@ -91,7 +91,6 @@ void dc_fpu_begin(const char *function_name, const int line) >> >> if (*pcpu == 1) { >> #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) >> - migrate_disable(); >> kernel_fpu_begin(); >> #elif defined(CONFIG_PPC64) >> if (cpu_has_feature(CPU_FTR_VSX_COMP)) { >> @@ -132,7 +131,6 @@ void dc_fpu_end(const char *function_name, const int line) >> if (*pcpu <= 0) { >> #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) >> kernel_fpu_end(); >> - migrate_enable(); >> #elif defined(CONFIG_PPC64) >> if (cpu_has_feature(CPU_FTR_VSX_COMP)) { >> disable_kernel_vsx(); >
On 2023-10-04 08:10:35 [-0400], Hamza Mahfooz wrote: > I did some digging, and it seems like the intention of that patch was to > fix the following splat: > > WARNING: CPU: 5 PID: 1062 at > drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/dc_fpu.c:71 > dc_assert_fp_enabled+0x1a/0x30 [amdgpu] > [...] > CPU: 5 PID: 1062 Comm: Xorg Tainted: G OE 5.15.0-56-generic So it has hard to look this up with a upstream v5.15 kernel since the dcn32_populate_dml_pipes_from_context() was introduced in v6.0-rc1. Judging from v6.0-rc1 I don't see how that warning could occur other than using dc_assert_fp_enabled() without invoking DC_FP_START first. > Hamza Sebastian
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c index 172aa10a8800f..86f4c0e046548 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c @@ -91,7 +91,6 @@ void dc_fpu_begin(const char *function_name, const int line) if (*pcpu == 1) { #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) - migrate_disable(); kernel_fpu_begin(); #elif defined(CONFIG_PPC64) if (cpu_has_feature(CPU_FTR_VSX_COMP)) { @@ -132,7 +131,6 @@ void dc_fpu_end(const char *function_name, const int line) if (*pcpu <= 0) { #if defined(CONFIG_X86) || defined(CONFIG_LOONGARCH) kernel_fpu_end(); - migrate_enable(); #elif defined(CONFIG_PPC64) if (cpu_has_feature(CPU_FTR_VSX_COMP)) { disable_kernel_vsx();
This is a revert of the commit mentioned below while it is not wrong, as in the kernel will explode, having migrate_disable() here it is complete waste of resources. Additionally commit message is plain wrong the review tag does not make it any better. The migrate_disable() interface has a fat comment describing it and it includes the word "undesired" in the headline which should tickle people to read it before using it. Initially I assumed it is worded too harsh but now I beg to differ. The reviewer of the original commit, even not understanding what migrate_disable() does should ask the following: - migrate_disable() is added only to the CONFIG_X86 block and it claims to protect fpu_recursion_depth. Why are the other the architectures excluded? - migrate_disable() is added after fpu_recursion_depth was modified. Shouldn't it be added before the modification or referencing takes place? Moving on. Disabling preemption DOES prevent CPU migration. A task, that can not be pushed away from the CPU by the scheduler (due to disabled preemption) can not be pushed or migrated to another CPU. Disabling migration DOES NOT ensure consistency of per-CPU variables. It only ensures that the task acts always on the same per-CPU variable. The task remains preemptible meaning multiple tasks can access the same per-CPU variable. This in turn leads to inconsistency for the statement *pcpu -= 1; with two tasks on one CPU and a preemption point during the RMW operation: Task A Task B read pcpu to reg # 0 inc reg # 0 -> 1 read pcpu to reg # 0 inc reg # 0 -> 1 write reg to pcpu # 1 write reg to pcpu # 1 At the end pcpu reads 1 but should read 2 instead. Boom. get_cpu_ptr() already contains a preempt_disable() statement. That means that the per-CPU variable can only be referenced by a single task which is currently running. The only inconsistency that can occur if the variable is additionally accessed from an interrupt. Remove migrate_disable/enable() from dc_fpu_begin/end(). Cc: Tianci Yin <tianci.yin@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Fixes: 0c316556d1249 ("drm/amd/display: Disable migration to ensure consistency of per-CPU variable") Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- drivers/gpu/drm/amd/display/amdgpu_dm/dc_fpu.c | 2 -- 1 file changed, 2 deletions(-)