Message ID | 20200424034016.42046-1-decui@microsoft.com (mailing list archive) |
---|---|
State | Mainlined, archived |
Series | PM: hibernate: Freeze kernel threads in software_resume() |
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.6.7, v5.4.35, v4.19.118, v4.14.177, v4.9.220, v4.4.220.

v5.6.7: Build OK!
v5.4.35: Build OK!
v4.19.118: Build OK!
v4.14.177: Build OK!
v4.9.220: Build OK!
v4.4.220: Failed to apply! Possible dependencies:
    ea00f4f4f00c ("PM / sleep: make PM notifiers called symmetrically")
    fe12c00d21bb ("PM / hibernate: Introduce test_resume mode for hibernation")

NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?
On Friday, April 24, 2020 5:40:16 AM CEST Dexuan Cui wrote:
> Currently the kernel threads are not frozen in software_resume(), so
> between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(),
> system_freezable_power_efficient_wq can still try to submit SCSI
> commands and this can cause a panic since the low level SCSI driver
> (e.g. hv_storvsc) has quiesced the SCSI adapter and can not accept
> any SCSI commands: https://lkml.org/lkml/2020/4/10/47
>
> At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) trying
> to resolve the issue from hv_storvsc, but with the help of
> Bart Van Assche, I realized it's better to fix software_resume(),
> since this looks like a generic issue, not only pertaining to SCSI.
>
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  kernel/power/hibernate.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index 86aba8706b16..30bd28d1d418 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -898,6 +898,13 @@ static int software_resume(void)
>  	error = freeze_processes();
>  	if (error)
>  		goto Close_Finish;
> +
> +	error = freeze_kernel_threads();
> +	if (error) {
> +		thaw_processes();
> +		goto Close_Finish;
> +	}
> +
>  	error = load_image_and_restore();
>  	thaw_processes();
>  Finish:
>

Applied as a fix for 5.7-rc4, thanks!
On 2020-04-26 09:24, Rafael J. Wysocki wrote:
> On Friday, April 24, 2020 5:40:16 AM CEST Dexuan Cui wrote:
>> Currently the kernel threads are not frozen in software_resume(), so
>> between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(),
>> system_freezable_power_efficient_wq can still try to submit SCSI
>> commands and this can cause a panic since the low level SCSI driver
>> (e.g. hv_storvsc) has quiesced the SCSI adapter and can not accept
>> any SCSI commands: https://lkml.org/lkml/2020/4/10/47
>>
>> At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) trying
>> to resolve the issue from hv_storvsc, but with the help of
>> Bart Van Assche, I realized it's better to fix software_resume(),
>> since this looks like a generic issue, not only pertaining to SCSI.
>>
>> Cc: Bart Van Assche <bvanassche@acm.org>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Dexuan Cui <decui@microsoft.com>
>> ---
>>  kernel/power/hibernate.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
>> index 86aba8706b16..30bd28d1d418 100644
>> --- a/kernel/power/hibernate.c
>> +++ b/kernel/power/hibernate.c
>> @@ -898,6 +898,13 @@ static int software_resume(void)
>>  	error = freeze_processes();
>>  	if (error)
>>  		goto Close_Finish;
>> +
>> +	error = freeze_kernel_threads();
>> +	if (error) {
>> +		thaw_processes();
>> +		goto Close_Finish;
>> +	}
>> +
>>  	error = load_image_and_restore();
>>  	thaw_processes();
>>  Finish:
>
> Applied as a fix for 5.7-rc4, thanks!

Hi Rafael,

What is not clear to me is how kernel threads are thawed after
load_image_and_restore() has finished? Should a comment perhaps be added
above the freeze_kernel_threads() call that explains how
thaw_kernel_threads() is invoked after load_image_and_restore() has
finished?

Thanks,

Bart.
> From: Bart Van Assche <bvanassche@acm.org>
> Sent: Sunday, April 26, 2020 11:34 AM
> To: Rafael J. Wysocki <rjw@rjwysocki.net>; Dexuan Cui <decui@microsoft.com>
> >> --- a/kernel/power/hibernate.c
> >> +++ b/kernel/power/hibernate.c
> >> @@ -898,6 +898,13 @@ static int software_resume(void)
> >>  	error = freeze_processes();
> >>  	if (error)
> >>  		goto Close_Finish;
> >> +
> >> +	error = freeze_kernel_threads();
> >> +	if (error) {
> >> +		thaw_processes();
> >> +		goto Close_Finish;
> >> +	}
> >> +
> >>  	error = load_image_and_restore();
> >>  	thaw_processes();
> >>  Finish:
> >
> > Applied as a fix for 5.7-rc4, thanks!
>
> Hi Rafael,
>
> What is not clear to me is how kernel threads are thawed after
> load_image_and_restore() has finished? Should a comment perhaps be added
> above the freeze_kernel_threads() call that explains how
> thaw_kernel_threads() is invoked after load_image_and_restore() has
> finished?
>
> Bart.

Hi Bart, Rafael,

I would suggest the below comment:

If load_image_and_restore() succeeds, it won't return, and the execution
will be restored from the 'old' kernel's
hibernate() -> hibernation_snapshot() -> create_image() -> swsusp_arch_suspend(),
and later hibernate() -> thaw_processes() will thaw every frozen kernel
process and userspace process of the 'old' kernel.

Thanks,
-- Dexuan
On Sun, Apr 26, 2020 at 8:34 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 2020-04-26 09:24, Rafael J. Wysocki wrote:
> > On Friday, April 24, 2020 5:40:16 AM CEST Dexuan Cui wrote:
> >> Currently the kernel threads are not frozen in software_resume(), so
> >> between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(),
> >> system_freezable_power_efficient_wq can still try to submit SCSI
> >> commands and this can cause a panic since the low level SCSI driver
> >> (e.g. hv_storvsc) has quiesced the SCSI adapter and can not accept
> >> any SCSI commands: https://lkml.org/lkml/2020/4/10/47
> >>
> >> At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) trying
> >> to resolve the issue from hv_storvsc, but with the help of
> >> Bart Van Assche, I realized it's better to fix software_resume(),
> >> since this looks like a generic issue, not only pertaining to SCSI.
> >>
> >> Cc: Bart Van Assche <bvanassche@acm.org>
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> >> ---
> >>  kernel/power/hibernate.c | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >>
> >> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> >> index 86aba8706b16..30bd28d1d418 100644
> >> --- a/kernel/power/hibernate.c
> >> +++ b/kernel/power/hibernate.c
> >> @@ -898,6 +898,13 @@ static int software_resume(void)
> >>  	error = freeze_processes();
> >>  	if (error)
> >>  		goto Close_Finish;
> >> +
> >> +	error = freeze_kernel_threads();
> >> +	if (error) {
> >> +		thaw_processes();
> >> +		goto Close_Finish;
> >> +	}
> >> +
> >>  	error = load_image_and_restore();
> >>  	thaw_processes();
> >>  Finish:
> >
> > Applied as a fix for 5.7-rc4, thanks!
>
> Hi Rafael,
>
> What is not clear to me is how kernel threads are thawed after
> load_image_and_restore() has finished? Should a comment perhaps be added
> above the freeze_kernel_threads() call that explains how
> thaw_kernel_threads() is invoked after load_image_and_restore() has
> finished?

It isn't, because that is not necessary. thaw_processes() will thaw them along with the user space. Cheers!
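
The point made in the reply above (thaw_processes() thaws kernel threads together with user space, so no separate thaw_kernel_threads() call is needed) can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct model_task and every model_* function are hypothetical stand-ins invented for illustration, and the real freezer works on the kernel's task list rather than on an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, NOT kernel code: a task is a user process or a kernel thread. */
struct model_task {
	bool is_kthread;
	bool frozen;
};

/* Stand-in for freeze_processes(): freezes user space tasks only. */
void model_freeze_processes(struct model_task *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!t[i].is_kthread)
			t[i].frozen = true;
}

/* Stand-in for freeze_kernel_threads(): freezes kernel threads only. */
void model_freeze_kernel_threads(struct model_task *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (t[i].is_kthread)
			t[i].frozen = true;
}

/* Stand-in for thaw_processes(): thaws every task, kernel threads included. */
void model_thaw_processes(struct model_task *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		t[i].frozen = false;
}

size_t model_count_frozen(const struct model_task *t, size_t n)
{
	size_t frozen = 0;

	for (size_t i = 0; i < n; i++)
		if (t[i].frozen)
			frozen++;
	return frozen;
}

/* Walks through the freeze/thaw sequence discussed in the thread. */
void model_self_check(void)
{
	struct model_task tasks[] = {
		{ .is_kthread = false },
		{ .is_kthread = true },
		{ .is_kthread = true },
	};
	size_t n = sizeof(tasks) / sizeof(tasks[0]);

	model_freeze_processes(tasks, n);
	assert(model_count_frozen(tasks, n) == 1);
	model_freeze_kernel_threads(tasks, n);
	assert(model_count_frozen(tasks, n) == 3);
	/* A single thaw call is enough to release both kinds of task. */
	model_thaw_processes(tasks, n);
	assert(model_count_frozen(tasks, n) == 0);
}
```

Calling model_self_check() from a test program exercises the whole sequence; the assertions only hold for this model and say nothing about timing or failure cases in the real freezer.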
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 86aba8706b16..30bd28d1d418 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -898,6 +898,13 @@ static int software_resume(void)
 	error = freeze_processes();
 	if (error)
 		goto Close_Finish;
+
+	error = freeze_kernel_threads();
+	if (error) {
+		thaw_processes();
+		goto Close_Finish;
+	}
+
 	error = load_image_and_restore();
 	thaw_processes();
 Finish:
Currently the kernel threads are not frozen in software_resume(), so
between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(),
system_freezable_power_efficient_wq can still try to submit SCSI
commands and this can cause a panic since the low level SCSI driver
(e.g. hv_storvsc) has quiesced the SCSI adapter and can not accept
any SCSI commands: https://lkml.org/lkml/2020/4/10/47

At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) trying
to resolve the issue from hv_storvsc, but with the help of
Bart Van Assche, I realized it's better to fix software_resume(),
since this looks like a generic issue, not only pertaining to SCSI.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 kernel/power/hibernate.c | 7 +++++++
 1 file changed, 7 insertions(+)
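
The control flow that the patch adds to software_resume() can be traced in a userspace sketch. Everything below is hypothetical scaffolding: struct resume_model and its fields replace the real kernel calls with injectable return values, and only the branch structure is taken from the diff. In the kernel, a successful load_image_and_restore() does not return, so the model only covers the case where it returns an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical knobs replacing the real kernel calls with canned results. */
struct resume_model {
	int freeze_processes_error;      /* result of freeze_processes() */
	int freeze_kernel_threads_error; /* result of freeze_kernel_threads() */
	int restore_error;               /* result of load_image_and_restore() */
	int thaw_processes_calls;        /* how often thaw_processes() ran */
	bool restore_attempted;
};

/* Mirrors only the branch structure of the patched hunk. */
int model_software_resume(struct resume_model *m)
{
	int error;

	error = m->freeze_processes_error;
	if (error)
		goto close_finish;

	error = m->freeze_kernel_threads_error;
	if (error) {
		/* New in the patch: undo freeze_processes() before bailing out. */
		m->thaw_processes_calls++;
		goto close_finish;
	}

	m->restore_attempted = true;
	error = m->restore_error;
	/* Reached only when the restore returned, i.e. when it failed. */
	m->thaw_processes_calls++;
close_finish:
	return error;
}

/* Checks the new error path: kernel thread freezing fails. */
void resume_model_self_check(void)
{
	struct resume_model m = { .freeze_kernel_threads_error = -EBUSY };

	assert(model_software_resume(&m) == -EBUSY);
	assert(m.thaw_processes_calls == 1);
	assert(!m.restore_attempted);
}
```

The -EBUSY value is an arbitrary choice for the injected failure; the model makes no claim about which error codes the real functions return.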