Message ID | 20240703141737.75378-1-qiuxu.zhuo@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2,1/1] drm/fb-helper: Don't schedule_work() to flush frame buffer during panic() | expand |
> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > To: maarten.lankhorst@linux.intel.com, mripard@kernel.org, > tzimmermann@suse.de, airlied@gmail.com, daniel@ffwll.ch > Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, > tony.luck@intel.com, qiuxu.zhuo@intel.com, yudong.wang@intel.com > Subject: [PATCH v2 1/1] drm/fb-helper: Don't schedule_work() to flush frame buffer during panic() > [...] > Acked-by: Thomas Zimmermann <tzimmermann@suse.de> > Reported-by: Yudong Wang <yudong.wang@intel.com> > Tested-by: Yudong Wang <yudong.wang@intel.com> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Hi Maarten and maintainers, A gentle ping :-). Could you please help push this v2 fix upstream? If you have any concerns, please let me know. Thanks! -Qiuxu
Hi Am 05.08.24 um 09:13 schrieb Qiuxu Zhuo: >> From: Qiuxu Zhuo <qiuxu.zhuo@intel.com> >> To: maarten.lankhorst@linux.intel.com, mripard@kernel.org, >> tzimmermann@suse.de, airlied@gmail.com, daniel@ffwll.ch >> Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, >> tony.luck@intel.com, qiuxu.zhuo@intel.com, yudong.wang@intel.com >> Subject: [PATCH v2 1/1] drm/fb-helper: Don't schedule_work() to flush frame buffer during panic() >> [...] >> Acked-by: Thomas Zimmermann <tzimmermann@suse.de> >> Reported-by: Yudong Wang <yudong.wang@intel.com> >> Tested-by: Yudong Wang <yudong.wang@intel.com> >> Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > Hi Maarten and maintainers, > > A gentle ping :-). > > Could you please help push this v2 fix upstream? > If you have any concerns, please let me know. I already acked this patch, but I still have a question: during a panic, will fbcon still print a panic message? I think that would likely require scheduling that worker. Best regards Thomas > > Thanks! > -Qiuxu
Hi Thomas, > From: Thomas Zimmermann <tzimmermann@suse.de> > Sent: Monday, August 5, 2024 3:31 PM > [...] > > Hi Maarten and maintainers, > > > > A gentle ping :-). > > > > Could you please help push this v2 fix upstream? > > If you have any concerns, please let me know. > > I already acked this patch, but I still have a question: during a panic, will fbcon Thanks for your kind review of this patch and ACK. > still print a panic message? I think that would likely require scheduling that > worker. During the error injection testing: 1) Without this v2 fix: 1.1) If panic() is not blocked on [1] (~99 times in 100 cycles), then the console/fbcon can print normal panic-related messages like [2], and the system can reboot successfully. 1.2) If panic() is blocked on [1] (~1 time in 100 cycles), then the console/fbcon is silent and the system gets hung without reboot. This is not the expected behavior. The system is expected to reboot. 2) With this v2 fix: 2.1) The console/fbcon can always print normal panic related messages like[2], and the system can reboot successfully. Same behavior to 1.1). [ we tested it ~1500 cycles. ] [1] panic() -> ... drm_fb_helper_damage() -> schedule_work(). For details, pls see the v2 commit message. [2] Panic messages: [ 133.900042] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 4: ba00000000000e0b [ 133.900046] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8229ebec> {intel_idle_xstate+0x6c/0xc0} [ 133.900055] mce: [Hardware Error]: TSC 9701dd289b MISC 29100000 PPIN 9000d7561bb0e340 [ 133.900057] mce: [Hardware Error]: PROCESSOR 0:a06d1 TIME 1715827713 SOCKET 0 APIC 0 microcode 810001d0 [ 133.900060] mce: [Hardware Error]: Run the above through 'mcelog --ascii' [ 134.053858] mce: [Hardware Error]: Machine check: Processor context corrupt [ 134.053866] Kernel panic - not syncing: Fatal machine check [ 134.075183] Kernel Offset: disabled [ 134.111372] pstore: backend (erst) writing error (-28) Thanks! -Qiuxu
Hi Am 05.08.24 um 10:10 schrieb Zhuo, Qiuxu: > Hi Thomas, > >> From: Thomas Zimmermann <tzimmermann@suse.de> >> Sent: Monday, August 5, 2024 3:31 PM >> [...] >>> Hi Maarten and maintainers, >>> >>> A gentle ping :-). >>> >>> Could you please help push this v2 fix upstream? >>> If you have any concerns, please let me know. >> I already acked this patch, but I still have a question: during a panic, will fbcon > Thanks for your kind review of this patch and ACK. > >> still print a panic message? I think that would likely require scheduling that >> worker. > During the error injection testing: > > 1) Without this v2 fix: > > 1.1) If panic() is not blocked on [1] (~99 times in 100 cycles), > then the console/fbcon can print normal panic-related messages like [2], > and the system can reboot successfully. > > 1.2) If panic() is blocked on [1] (~1 time in 100 cycles), > then the console/fbcon is silent and the system gets hung without reboot. > This is not the expected behavior. The system is expected to reboot. > > 2) With this v2 fix: > > 2.1) The console/fbcon can always print normal panic related messages like[2], > and the system can reboot successfully. Same behavior to 1.1). > [ we tested it ~1500 cycles. ] > > [1] panic() -> ... drm_fb_helper_damage() -> schedule_work(). > For details, pls see the v2 commit message. > > [2] Panic messages: > [ 133.900042] mce: [Hardware Error]: CPU 0: Machine Check Exception: 5 Bank 4: ba00000000000e0b > [ 133.900046] mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff8229ebec> {intel_idle_xstate+0x6c/0xc0} > [ 133.900055] mce: [Hardware Error]: TSC 9701dd289b MISC 29100000 PPIN 9000d7561bb0e340 > [ 133.900057] mce: [Hardware Error]: PROCESSOR 0:a06d1 TIME 1715827713 SOCKET 0 APIC 0 microcode 810001d0 > [ 133.900060] mce: [Hardware Error]: Run the above through 'mcelog --ascii' > [ 134.053858] mce: [Hardware Error]: Machine check: Processor context corrupt > [ 134.053866] Kernel panic - not syncing: Fatal machine check > [ 134.075183] Kernel Offset: disabled > [ 134.111372] pstore: backend (erst) writing error (-28) Thanks for the detailed reply. I've found that your patch has already been merged and should now be in v6.11-rc2. It'll probably be backported to older kernels as well. > dim cite 833cd3e9ad8360785b6c23c82dd3856df00732d9 833cd3e9ad83 ("drm/fb-helper: Don't schedule_work() to flush frame buffer during panic()") > git tag --contains 833cd3e9ad8360785b6c23c82dd3856df00732d9 drm-fixes-2024-08-02 drm-misc-fixes-2024-08-01 v6.11-rc2 Best regards Thomas > > Thanks! > -Qiuxu > > >
Hi Thomas, > From: Thomas Zimmermann <tzimmermann@suse.de> > [...] > > Thanks for the detailed reply. > > I've found that your patch has already been merged and should now be in > v6.11-rc2. It'll probably be backported to older kernels as well. > > > dim cite 833cd3e9ad8360785b6c23c82dd3856df00732d9 > 833cd3e9ad83 ("drm/fb-helper: Don't schedule_work() to flush frame buffer > during panic()") > git tag --contains > 833cd3e9ad8360785b6c23c82dd3856df00732d9 > drm-fixes-2024-08-02 > drm-misc-fixes-2024-08-01 > v6.11-rc2 Thank you so much for your review and support on this
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index d612133e2cf7..6ad75034c966 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -628,6 +628,17 @@ static void drm_fb_helper_add_damage_clip(struct drm_fb_helper *helper, u32 x, u static void drm_fb_helper_damage(struct drm_fb_helper *helper, u32 x, u32 y, u32 width, u32 height) { + /* + * This function may be invoked by panic() to flush the frame + * buffer, where all CPUs except the panic CPU are stopped. + * During the following schedule_work(), the panic CPU needs + * the worker_pool lock, which might be held by a stopped CPU, + * causing schedule_work() and panic() to block. Return early on + * oops_in_progress to prevent this blocking. + */ + if (oops_in_progress) + return; + drm_fb_helper_add_damage_clip(helper, x, y, width, height); schedule_work(&helper->damage_work);