PM: hibernate: Freeze kernel threads in software_resume()

Message ID 20200424034016.42046-1-decui@microsoft.com (mailing list archive)
State Mainlined, archived
Series PM: hibernate: Freeze kernel threads in software_resume()

Commit Message

Dexuan Cui April 24, 2020, 3:40 a.m. UTC
Currently the kernel threads are not frozen in software_resume(), so
between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(),
work queued on system_freezable_power_efficient_wq can still try to
submit SCSI commands. This can cause a panic, since the low-level SCSI
driver (e.g. hv_storvsc) has quiesced the SCSI adapter and cannot accept
any SCSI commands: https://lkml.org/lkml/2020/4/10/47

At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) that tried
to resolve the issue in hv_storvsc, but with the help of
Bart Van Assche, I realized it's better to fix software_resume(),
since this looks like a generic issue that is not specific to SCSI.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: stable@vger.kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
---
 kernel/power/hibernate.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Sasha Levin April 26, 2020, 3:03 p.m. UTC | #1
Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.6.7, v5.4.35, v4.19.118, v4.14.177, v4.9.220, v4.4.220.

v5.6.7: Build OK!
v5.4.35: Build OK!
v4.19.118: Build OK!
v4.14.177: Build OK!
v4.9.220: Build OK!
v4.4.220: Failed to apply! Possible dependencies:
    ea00f4f4f00c ("PM / sleep: make PM notifiers called symmetrically")
    fe12c00d21bb ("PM / hibernate: Introduce test_resume mode for hibernation")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?
Rafael J. Wysocki April 26, 2020, 4:24 p.m. UTC | #2
On Friday, April 24, 2020 5:40:16 AM CEST Dexuan Cui wrote:
> Currently the kernel threads are not frozen in software_resume(), so
> between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(),
> system_freezable_power_efficient_wq can still try to submit SCSI
> commands and this can cause a panic since the low level SCSI driver
> (e.g. hv_storvsc) has quiesced the SCSI adapter and can not accept
> any SCSI commands: https://lkml.org/lkml/2020/4/10/47
> 
> At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) trying
> to resolve the issue from hv_storvsc, but with the help of
> Bart Van Assche, I realized it's better to fix software_resume(),
> since this looks like a generic issue, not only pertaining to SCSI.
> 
> Cc: Bart Van Assche <bvanassche@acm.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  kernel/power/hibernate.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index 86aba8706b16..30bd28d1d418 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -898,6 +898,13 @@ static int software_resume(void)
>  	error = freeze_processes();
>  	if (error)
>  		goto Close_Finish;
> +
> +	error = freeze_kernel_threads();
> +	if (error) {
> +		thaw_processes();
> +		goto Close_Finish;
> +	}
> +
>  	error = load_image_and_restore();
>  	thaw_processes();
>   Finish:
> 

Applied as a fix for 5.7-rc4, thanks!
Bart Van Assche April 26, 2020, 6:33 p.m. UTC | #3
On 2020-04-26 09:24, Rafael J. Wysocki wrote:
> On Friday, April 24, 2020 5:40:16 AM CEST Dexuan Cui wrote:
>> Currently the kernel threads are not frozen in software_resume(), so
>> between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(),
>> system_freezable_power_efficient_wq can still try to submit SCSI
>> commands and this can cause a panic since the low level SCSI driver
>> (e.g. hv_storvsc) has quiesced the SCSI adapter and can not accept
>> any SCSI commands: https://lkml.org/lkml/2020/4/10/47
>>
>> At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) trying
>> to resolve the issue from hv_storvsc, but with the help of
>> Bart Van Assche, I realized it's better to fix software_resume(),
>> since this looks like a generic issue, not only pertaining to SCSI.
>>
>> Cc: Bart Van Assche <bvanassche@acm.org>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Dexuan Cui <decui@microsoft.com>
>> ---
>>  kernel/power/hibernate.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
>> index 86aba8706b16..30bd28d1d418 100644
>> --- a/kernel/power/hibernate.c
>> +++ b/kernel/power/hibernate.c
>> @@ -898,6 +898,13 @@ static int software_resume(void)
>>  	error = freeze_processes();
>>  	if (error)
>>  		goto Close_Finish;
>> +
>> +	error = freeze_kernel_threads();
>> +	if (error) {
>> +		thaw_processes();
>> +		goto Close_Finish;
>> +	}
>> +
>>  	error = load_image_and_restore();
>>  	thaw_processes();
>>   Finish:
> 
> Applied as a fix for 5.7-rc4, thanks!

Hi Rafael,

What is not clear to me is how kernel threads are thawed after
load_image_and_restore() has finished. Should a comment perhaps be added
above the freeze_kernel_threads() call that explains how
thaw_kernel_threads() is invoked after load_image_and_restore() has
finished?

Thanks,

Bart.
Dexuan Cui April 27, 2020, 12:58 a.m. UTC | #4
> From: Bart Van Assche <bvanassche@acm.org>
> Sent: Sunday, April 26, 2020 11:34 AM
> To: Rafael J. Wysocki <rjw@rjwysocki.net>; Dexuan Cui <decui@microsoft.com>
> >> --- a/kernel/power/hibernate.c
> >> +++ b/kernel/power/hibernate.c
> >> @@ -898,6 +898,13 @@ static int software_resume(void)
> >>  	error = freeze_processes();
> >>  	if (error)
> >>  		goto Close_Finish;
> >> +
> >> +	error = freeze_kernel_threads();
> >> +	if (error) {
> >> +		thaw_processes();
> >> +		goto Close_Finish;
> >> +	}
> >> +
> >>  	error = load_image_and_restore();
> >>  	thaw_processes();
> >>   Finish:
> >
> > Applied as a fix for 5.7-rc4, thanks!
> 
> Hi Rafael,
> 
> What is not clear to me is how kernel threads are thawed after
> load_image_and_restore() has finished? Should a comment perhaps be added
> above the freeze_kernel_threads() call that explains how
> thaw_kernel_threads() is invoked after load_image_and_restore() has
> finished?
> 
> Bart.

Hi Bart, Rafael, I would suggest the below comment:

If load_image_and_restore() succeeds, it won't return. Execution
resumes in the 'old' kernel at hibernate() ->
hibernation_snapshot() -> create_image() -> swsusp_arch_suspend(),
and later hibernate() -> thaw_processes() thaws every frozen
kernel thread and userspace process of the 'old' kernel.

Thanks,
-- Dexuan
Rafael J. Wysocki April 27, 2020, 8:43 a.m. UTC | #5
On Sun, Apr 26, 2020 at 8:34 PM Bart Van Assche <bvanassche@acm.org> wrote:
>
> On 2020-04-26 09:24, Rafael J. Wysocki wrote:
> > On Friday, April 24, 2020 5:40:16 AM CEST Dexuan Cui wrote:
> >> Currently the kernel threads are not frozen in software_resume(), so
> >> between dpm_suspend_start(PMSG_QUIESCE) and resume_target_kernel(),
> >> system_freezable_power_efficient_wq can still try to submit SCSI
> >> commands and this can cause a panic since the low level SCSI driver
> >> (e.g. hv_storvsc) has quiesced the SCSI adapter and can not accept
> >> any SCSI commands: https://lkml.org/lkml/2020/4/10/47
> >>
> >> At first I posted a fix (https://lkml.org/lkml/2020/4/21/1318) trying
> >> to resolve the issue from hv_storvsc, but with the help of
> >> Bart Van Assche, I realized it's better to fix software_resume(),
> >> since this looks like a generic issue, not only pertaining to SCSI.
> >>
> >> Cc: Bart Van Assche <bvanassche@acm.org>
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> >> ---
> >>  kernel/power/hibernate.c | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >>
> >> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> >> index 86aba8706b16..30bd28d1d418 100644
> >> --- a/kernel/power/hibernate.c
> >> +++ b/kernel/power/hibernate.c
> >> @@ -898,6 +898,13 @@ static int software_resume(void)
> >>      error = freeze_processes();
> >>      if (error)
> >>              goto Close_Finish;
> >> +
> >> +    error = freeze_kernel_threads();
> >> +    if (error) {
> >> +            thaw_processes();
> >> +            goto Close_Finish;
> >> +    }
> >> +
> >>      error = load_image_and_restore();
> >>      thaw_processes();
> >>   Finish:
> >
> > Applied as a fix for 5.7-rc4, thanks!
>
> Hi Rafael,
>
> What is not clear to me is how kernel threads are thawed after
> load_image_and_restore() has finished? Should a comment perhaps be added
> above the freeze_kernel_threads() call that explains how
> thaw_kernel_threads() is invoked after load_image_and_restore() has
> finished?

It isn't, because that is not necessary.

thaw_processes() will thaw them along with the user space.

Cheers!

Patch

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 86aba8706b16..30bd28d1d418 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -898,6 +898,13 @@ static int software_resume(void)
 	error = freeze_processes();
 	if (error)
 		goto Close_Finish;
+
+	error = freeze_kernel_threads();
+	if (error) {
+		thaw_processes();
+		goto Close_Finish;
+	}
+
 	error = load_image_and_restore();
 	thaw_processes();
  Finish:
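
Illustrative sketch (not part of the patch): the ordering and error
handling added to software_resume() can be modeled in user space as
below. Everything here is an assumption made for illustration: struct
resume_model and the model_*() helpers are hypothetical stand-ins, not
kernel APIs, and the image load is made to fail every time so that the
function returns, which the real load_image_and_restore() does only on
error. The sketch mirrors two points from the thread: user space is
thawed again if freezing kernel threads fails, and a single
thaw_processes() call undoes both freeze steps.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for freezer state; not a kernel data structure. */
struct resume_model {
	bool user_frozen;         /* user space tasks are frozen */
	bool kthreads_frozen;     /* freezable kernel threads are frozen */
	bool fail_kthread_freeze; /* knob: make the second freeze step fail */
};

static int model_freeze_processes(struct resume_model *m)
{
	m->user_frozen = true;
	return 0;
}

static int model_freeze_kernel_threads(struct resume_model *m)
{
	if (m->fail_kthread_freeze)
		return -EBUSY;
	m->kthreads_frozen = true;
	return 0;
}

/* As noted in the thread, thaw_processes() thaws kernel threads as well. */
static void model_thaw_processes(struct resume_model *m)
{
	m->user_frozen = false;
	m->kthreads_frozen = false;
}

/* Always "fails" here so that the caller's cleanup path is exercised. */
static int model_load_image_and_restore(struct resume_model *m)
{
	/* What the patch ensures: nothing freezable is still running. */
	assert(m->user_frozen && m->kthreads_frozen);
	return -EIO;
}

/* Same step order as the patched hunk of software_resume(). */
int model_software_resume(struct resume_model *m)
{
	int error;

	error = model_freeze_processes(m);
	if (error)
		return error;

	error = model_freeze_kernel_threads(m);
	if (error) {
		/* Undo the first freeze step before bailing out. */
		model_thaw_processes(m);
		return error;
	}

	error = model_load_image_and_restore(m);
	model_thaw_processes(m);
	return error;
}
```

Calling model_software_resume() on a zero-initialized struct
resume_model returns -EIO with both flags cleared again; setting
fail_kthread_freeze first makes it return -EBUSY, also with nothing
left frozen.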