diff mbox series

[v2,2/4] PM: hibernate: call wait_for_device_probe() without system_transition_mutex held

Message ID 48d01ce7-e028-c103-ea7f-5a4ea4c8930b@I-love.SAKURA.ne.jp (mailing list archive)
State Changes Requested, archived
Headers show
Series [v2,1/4] char: misc: allow calling open() callback without misc_mtx held | expand

Commit Message

Tetsuo Handa July 10, 2022, 2:24 a.m. UTC
syzbot is reporting hung task at misc_open() [1], for there is a race
window of AB-BA deadlock which involves probe_count variable.

Even with "char: misc: allow calling open() callback without misc_mtx
held", wait_for_device_probe() (w_f_d_p() afterward) from
snapshot_open() can sleep forever if probe_count cannot become 0.

w_f_d_p() in snapshot_open() was added by commit c751085943362143
("PM/Hibernate: Wait for SCSI devices scan to complete during resume"),

   "In addition, if the resume from hibernation is userland-driven, it's
    better to wait for all device probes in the kernel to complete before
    attempting to open the resume device."

but that commit did not take into account possibility of unresponsive
hardware, for the timeout is supposed to come from the SCSI layer in the
general case. syzbot is reporting that USB storage, which is a very tiny
wrapper around the whole SCSI protocol, is failing to apply timeout.

Fortunately, holding system_transition_mutex is not required when waiting
for device probe. Therefore, as one of steps for making it possible to
recover from such situation, this patch changes snapshot_open() to call
w_f_d_p() before calling lock_system_sleep().

Note that the problem that w_f_d_p() can sleep too long to wait remains.
But how to fix that part deserves different patches.

Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Oliver Neukum <oneukum@suse.com>
Cc: Wedson Almeida Filho <wedsonaf@google.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Arjan van de Ven <arjan@linux.intel.com>
---
 kernel/power/user.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

Comments

Rafael J. Wysocki July 15, 2022, 5:42 p.m. UTC | #1
On Sun, Jul 10, 2022 at 4:25 AM Tetsuo Handa
<penguin-kernel@i-love.sakura.ne.jp> wrote:
>
> syzbot is reporting hung task at misc_open() [1], for there is a race
> window of AB-BA deadlock which involves probe_count variable.
>
> Even with "char: misc: allow calling open() callback without misc_mtx
> held", wait_for_device_probe() (w_f_d_p() afterward) from
> snapshot_open() can sleep forever if probe_count cannot become 0.
>
> w_f_d_p() in snapshot_open() was added by commit c751085943362143
> ("PM/Hibernate: Wait for SCSI devices scan to complete during resume"),
>
>    "In addition, if the resume from hibernation is userland-driven, it's
>     better to wait for all device probes in the kernel to complete before
>     attempting to open the resume device."
>
> but that commit did not take into account possibility of unresponsive
> hardware, for the timeout is supposed to come from the SCSI layer in the
> general case. syzbot is reporting that USB storage, which is a very tiny
> wrapper around the whole SCSI protocol, is failing to apply timeout.
>
> Fortunately, holding system_transition_mutex is not required when waiting
> for device probe. Therefore, as one of steps for making it possible to
> recover from such situation, this patch changes snapshot_open() to call
> w_f_d_p() before calling lock_system_sleep().
>
> Note that the problem that w_f_d_p() can sleep too long to wait remains.
> But how to fix that part deserves different patches.
>
> Link: https://syzkaller.appspot.com/bug?extid=358c9ab4c93da7b7238c [1]
> Reported-by: syzbot <syzbot+358c9ab4c93da7b7238c@syzkaller.appspotmail.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Oliver Neukum <oneukum@suse.com>
> Cc: Wedson Almeida Filho <wedsonaf@google.com>
> Cc: Rafael J. Wysocki <rjw@sisk.pl>
> Cc: Arjan van de Ven <arjan@linux.intel.com>
> ---
>  kernel/power/user.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/kernel/power/user.c b/kernel/power/user.c
> index 59912060109f..db98a028dfdd 100644
> --- a/kernel/power/user.c
> +++ b/kernel/power/user.c
> @@ -51,6 +51,18 @@ static int snapshot_open(struct inode *inode, struct file *filp)
>         if (!hibernation_available())
>                 return -EPERM;
>
> +       switch (filp->f_flags & O_ACCMODE) {
> +       case O_RDWR: /* Can't do both at the same time. */
> +               return -ENOSYS;

if ((filp->f_flags & O_ACCMODE) == O_RDWR)
              return -ENOSYS;

/* On resume, we may need to wait for the image device to appear. */
if ((filp->f_flags & O_ACCMODE) == O_WRONLY)
              wait_for_device_probe();

> +       case O_RDONLY: /* Hibernating */
> +               /* The image device should be already ready. */
> +               break;
> +       default: /* Resuming */
> +               /* We may need to wait for the image device to appear. */
> +               wait_for_device_probe();
> +               break;
> +       }
> +
>         lock_system_sleep();
>
>         if (!hibernate_acquire()) {
> @@ -58,28 +70,16 @@ static int snapshot_open(struct inode *inode, struct file *filp)
>                 goto Unlock;
>         }
>
> -       if ((filp->f_flags & O_ACCMODE) == O_RDWR) {
> -               hibernate_release();
> -               error = -ENOSYS;
> -               goto Unlock;
> -       }
>         nonseekable_open(inode, filp);
>         data = &snapshot_state;
>         filp->private_data = data;
>         memset(&data->handle, 0, sizeof(struct snapshot_handle));
>         if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
> -               /* Hibernating.  The image device should be accessible. */

No need to remove this comment.

>                 data->swap = swap_type_of(swsusp_resume_device, 0);
>                 data->mode = O_RDONLY;
>                 data->free_bitmaps = false;
>                 error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
>         } else {
> -               /*
> -                * Resuming.  We may need to wait for the image device to
> -                * appear.
> -                */
> -               wait_for_device_probe();
> -
>                 data->swap = -1;
>                 data->mode = O_WRONLY;
>                 error = pm_notifier_call_chain_robust(PM_RESTORE_PREPARE, PM_POST_RESTORE);
> --
> 2.18.4
>
Tetsuo Handa July 22, 2022, 4:24 a.m. UTC | #2
Thanks for comment on v2.

I sent v3 at https://lkml.kernel.org/r/04278747-41fe-3a0a-81bf-2935efaafac0@I-love.SAKURA.ne.jp .
This v2 will be discarded if v3 is acceptable.
diff mbox series

Patch

diff --git a/kernel/power/user.c b/kernel/power/user.c
index 59912060109f..db98a028dfdd 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -51,6 +51,18 @@  static int snapshot_open(struct inode *inode, struct file *filp)
 	if (!hibernation_available())
 		return -EPERM;
 
+	switch (filp->f_flags & O_ACCMODE) {
+	case O_RDWR: /* Can't do both at the same time. */
+		return -ENOSYS;
+	case O_RDONLY: /* Hibernating */
+		/* The image device should be already ready. */
+		break;
+	default: /* Resuming */
+		/* We may need to wait for the image device to appear. */
+		wait_for_device_probe();
+		break;
+	}
+
 	lock_system_sleep();
 
 	if (!hibernate_acquire()) {
@@ -58,28 +70,16 @@  static int snapshot_open(struct inode *inode, struct file *filp)
 		goto Unlock;
 	}
 
-	if ((filp->f_flags & O_ACCMODE) == O_RDWR) {
-		hibernate_release();
-		error = -ENOSYS;
-		goto Unlock;
-	}
 	nonseekable_open(inode, filp);
 	data = &snapshot_state;
 	filp->private_data = data;
 	memset(&data->handle, 0, sizeof(struct snapshot_handle));
 	if ((filp->f_flags & O_ACCMODE) == O_RDONLY) {
-		/* Hibernating.  The image device should be accessible. */
 		data->swap = swap_type_of(swsusp_resume_device, 0);
 		data->mode = O_RDONLY;
 		data->free_bitmaps = false;
 		error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION);
 	} else {
-		/*
-		 * Resuming.  We may need to wait for the image device to
-		 * appear.
-		 */
-		wait_for_device_probe();
-
 		data->swap = -1;
 		data->mode = O_WRONLY;
 		error = pm_notifier_call_chain_robust(PM_RESTORE_PREPARE, PM_POST_RESTORE);