[v3,1/1] scsi: Fix racing between dev init and dev reset

Message ID 20220415040446.26451-2-alice.chao@mediatek.com (mailing list archive)
State New, archived

Commit Message

Alice Chao April 15, 2022, 4:04 a.m. UTC
Device reset thread uses kobject_uevent_env() to get kobj.parent, and it
aces with device init thread which calls device_add() to add kobj.parent
before kobject_uevent_env().

Device init call:           Device reset call:
 scsi_probe_and_add_lun()    scsi_evt_thread()
  scsi_add_lun()             scsi_evt_emit()
   scsi_sysfs_add_sdev()      kobject_uevent_env() //get kobj.parent
    scsi_target_add()           kobject_get_path()
                                 len = get_kobj_path_length () // len=1 because parent hasn't created yet
    device_add() // add kobj.parent
      kobject_uevent_env()
       kobject_get_path()         path = kzalloc()
        fill_kobj_path()           fill_kobj_path() // --length; length -= cur is a negative value
                                    memcpy(path + length, kobject_name(parent), cur); // slab OOB!

The call flow above describes the problem: the device reset thread gets a
wrong kobj.parent when the device init thread has not added kobj.parent
yet. When this race happens, it triggers a KASAN dump on the final
iteration:

BUG: KASAN: slab-out-of-bounds in kobject_get_path+0xf8/0x1b8
Write of size 11 at addr ffffff80d6bb94f5 by task kworker/3:1/58
<snip>
Call trace:
 __kasan_report+0x124/0x1c8
 kasan_report+0x54/0x84
 kasan_check_range+0x200/0x208
 memcpy+0xb8/0xf0
 kobject_get_path+0xf8/0x1b8
 kobject_uevent_env+0x228/0xa88
 scsi_evt_thread+0x2d0/0x5b0
 process_one_work+0x570/0xf94
 worker_thread+0x7cc/0xf80
 kthread+0x2c4/0x388

These two jobs are scheduled asynchronously, so we can't guarantee that
kobj.parent will be created in the device init thread before the device
reset thread calls kobject_get_path().
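
The length mismatch behind the out-of-bounds write can be modelled in
userspace. The sketch below is not the kernel code: struct node,
path_length() and fill_end_offset() are hypothetical stand-ins for
struct kobject, get_kobj_path_length() and the offset arithmetic in
fill_kobj_path(), and the device names used with it are assumed for
illustration. It only shows that a length computed before the parent is
linked turns into a negative offset once the parent appears.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct kobject: a name and a parent link. */
struct node {
	const char *name;
	struct node *parent;
};

/* Bytes needed for "/a/b/c" plus the trailing NUL (the length pass). */
static int path_length(const struct node *n)
{
	int length = 1;

	for (; n; n = n->parent) {
		assert(n->name != NULL);
		length += (int)strlen(n->name) + 1;
	}
	return length;
}

/*
 * Offset at which the fill pass ends when it walks the chain with a
 * buffer sized by `length`. A negative result means the copy for the
 * topmost node would start before the buffer, i.e. out of bounds.
 */
static int fill_end_offset(const struct node *n, int length)
{
	--length;				/* trailing NUL */
	for (; n; n = n->parent) {
		length -= (int)strlen(n->name);	/* room for the name */
		--length;			/* room for the '/' */
	}
	return length;
}
```

With a node named "0:0:0:0", path_length() returns 9 while the parent is
unset. After linking a parent named "target0:0:0", fill_end_offset()
called with that stale length returns -12, whereas a freshly computed
length of 21 ends at offset 0.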

To resolve the racing issue between device init thread and device reset
thread, we use wait_event() in scsi_evt_emit() to wait for device_add()
to complete the creation of kobj.parent.

Device init call:                Device reset call:
ufshcd_async_scan()              scsi_evt_thread()
 scsi_scan_host()                 scsi_evt_emit() <- add wait_event()
  do_scsi_scan_host() <- add wake_up()
   scsi_scan_host_selected()
    scsi_scan_channel()
     scsi_probe_and_add_lun()
      scsi_target_add()
       device_add() // add kobj.parent
        kobject_uevent_env()
         kobject_get_path()
          fill_kobj_path()
  do_scan_async() <- wake_up()     kobject_uevent_env() // add kobj.parent
                                    kobject_get_path() // get valid kobj.parent
                                     fill_kobj_path()

After adding wake_up() to do_scsi_scan_host() in the device init thread,
we can ensure that the device reset thread gets the kobject only after the
device init thread has finished adding the parent.
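
The wait/wake pairing can be illustrated with a userspace analogue. This
is a minimal sketch using POSIX threads, not kernel code: struct
demo_host, reset_side(), init_side() and run_demo() are hypothetical
names, a condition variable stands in for the host_wait wait queue, and
the mutex exists only because pthread_cond_wait() requires one. As with
wait_event(), the waiter re-checks its condition after every wakeup.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct demo_host {
	pthread_mutex_t lock;
	pthread_cond_t host_wait;	/* stands in for shost->host_wait */
	const char *parent;		/* stands in for kobj.parent */
	const char *seen;		/* what the reset side observed */
};

/* Reset side: sleep until parent is set, re-checking after each wakeup. */
static void *reset_side(void *arg)
{
	struct demo_host *h = arg;

	pthread_mutex_lock(&h->lock);
	while (h->parent == NULL)
		pthread_cond_wait(&h->host_wait, &h->lock);
	assert(h->parent != NULL);
	h->seen = h->parent;
	pthread_mutex_unlock(&h->lock);
	return NULL;
}

/* Init side: publish the parent, then wake any waiter. */
static void init_side(struct demo_host *h)
{
	pthread_mutex_lock(&h->lock);
	h->parent = "target0:0:0";	/* assumed name, for illustration */
	pthread_cond_broadcast(&h->host_wait);
	pthread_mutex_unlock(&h->lock);
}

/* Start the waiter first, then run the init side in the caller. */
static const char *run_demo(void)
{
	struct demo_host h;
	pthread_t reset;

	h.parent = NULL;
	h.seen = NULL;
	pthread_mutex_init(&h.lock, NULL);
	pthread_cond_init(&h.host_wait, NULL);

	if (pthread_create(&reset, NULL, reset_side, &h) == 0) {
		init_side(&h);
		pthread_join(reset, NULL);
	}

	pthread_cond_destroy(&h.host_wait);
	pthread_mutex_destroy(&h.lock);
	return h.seen;
}
```

run_demo() returns the parent name seen by the waiter, which is non-NULL
whichever side happens to run first. Build with -pthread.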

Signed-off-by: Alice Chao <alice.chao@mediatek.com>

---

Change since v2
-Change commit: Describes the problem first and then the solution.
-Add commit: Add KASAN error log.

---
 drivers/scsi/scsi_lib.c  | 1 +
 drivers/scsi/scsi_scan.c | 1 +
 2 files changed, 2 insertions(+)

Comments

Miles Chen April 15, 2022, 5:52 a.m. UTC | #1
Hi Alice,

> Device reset thread uses kobject_uevent_env() to get kobj.parent, and it
> aces with device init thread which calls device_add() to add kobj.parent

"aces" may be "races"?

> before kobject_uevent_env().
> 
> Device init call:           Device reset call:
>  scsi_probe_and_add_lun()    scsi_evt_thread()
>   scsi_add_lun()             scsi_evt_emit()
>    scsi_sysfs_add_sdev()      kobject_uevent_env() //get kobj.parent
>     scsi_target_add()           kobject_get_path()
>                                  len = get_kobj_path_length () // len=1 because parent hasn't created yet
>     device_add() // add kobj.parent
>       kobject_uevent_env()
>        kobject_get_path()         path = kzalloc()
>         fill_kobj_path()           fill_kobj_path() // --length; length -= cur is a negative value
>                                     memcpy(path + length, kobject_name(parent), cur); // slab OOB!
> 
> Above backtrace describes the problem, device reset thread will get wrong
> kobj.parent when device init thread didn’t add kobj.parent yet. When this
> racing happened, it triggers the a KASAN dump on the final iteration:
> 
> BUG: KASAN: slab-out-of-bounds in kobject_get_path+0xf8/0x1b8
> Write of size 11 at addr ffffff80d6bb94f5 by task kworker/3:1/58
> <snip>
> Call trace:
>  __kasan_report+0x124/0x1c8
>  kasan_report+0x54/0x84
>  kasan_check_range+0x200/0x208
>  memcpy+0xb8/0xf0
>  kobject_get_path+0xf8/0x1b8
>  kobject_uevent_env+0x228/0xa88
>  scsi_evt_thread+0x2d0/0x5b0
>  process_one_work+0x570/0xf94
>  worker_thread+0x7cc/0xf80
>  kthread+0x2c4/0x388
> 
> These two jobs are scheduled asynchronously, we can't guaranteed that
> kobj.parent will be created in device init thread before device reset
> thread calls kobject_get_path().
> 
> To resolve the racing issue between device init thread and device reset
> thread, we use wait_event() in scsi_evt_emit() to wait for device_add()
> to complete the creation of kobj.parent.
> 
> Device init call:                Device reset call:
> ufshcd_async_scan()              scsi_evt_thread()
>  scsi_scan_host()                 scsi_evt_emit() <- add wait_event()
>   do_scsi_scan_host() <- add wake_up()
>    scsi_scan_host_selected()
>     scsi_scan_channel()
>      scsi_probe_and_add_lun()
>       scsi_target_add()
>        device_add() // add kobj.parent
>         kobject_uevent_env()
>          kobject_get_path()
>           fill_kobj_path()
>   do_scan_async() <- wake_up()     kobject_uevent_env() // add kobj.parent

There are no do_scan_async() changes in this patch. Is this a typo?
From the patch, the flow looks like:

Device init call                        Device reset call:
do_scsi_scan_host()                     scsi_evt_thread()
 scsi_scan_host_selected()               scsi_evt_emit() <- add wait_event()
  scsi_scan_channel()
   scsi_probe_and_add_lun()
    scsi_target_add()
     device_add() // add kobj.parent
      kobject_uevent_env()
       kobject_get_path()
        fill_kobj_path()
 //call wake_up() after scsi_scan_host_selected is done
                                        kobject_uevent_env()
                                         kobject_get_path() // get valid kobj.parent
                                          ...
                                           fill_kobj_path()

>                                     kobject_get_path() // get valid kobj.parent
>                                      fill_kobj_path()
> 
> After we add wake_up at do_scsi_scan_host() in device init thread, we can
> ensure that device reset thread will get kobject after device init thread
> finishes adding parent.
> 
> Signed-off-by: Alice Chao <alice.chao@mediatek.com>
> 
> ---
> 
> Change since v2
> -Change commit: Describes the preblem first and then the solution.
> -Add commit: Add KASAN error log.

Please keep all change history. For example, see
https://lore.kernel.org/lkml/20220326022728.2969-1-jianjun.wang@mediatek.com/


Thanks,
Miles

> 
> ---
>  drivers/scsi/scsi_lib.c  | 1 +
>  drivers/scsi/scsi_scan.c | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0a70aa763a96..abf9a71ed77c 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2461,6 +2461,7 @@ static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
>  		break;
>  	case SDEV_EVT_POWER_ON_RESET_OCCURRED:
>  		envp[idx++] = "SDEV_UA=POWER_ON_RESET_OCCURRED";
> +		wait_event(sdev->host->host_wait, sdev->sdev_gendev.kobj.parent != NULL);
>  		break;
>  	default:
>  		/* do nothing */
> diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
> index f4e6c68ac99e..431f229ac435 100644
> --- a/drivers/scsi/scsi_scan.c
> +++ b/drivers/scsi/scsi_scan.c
> @@ -1904,6 +1904,7 @@ static void do_scsi_scan_host(struct Scsi_Host *shost)
>  	} else {
>  		scsi_scan_host_selected(shost, SCAN_WILD_CARD, SCAN_WILD_CARD,
>  				SCAN_WILD_CARD, 0);
> +		wake_up(&shost->host_wait);
>  	}
>  }
>  
> -- 
> 2.18.0
> 
>
Patch

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0a70aa763a96..abf9a71ed77c 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2461,6 +2461,7 @@  static void scsi_evt_emit(struct scsi_device *sdev, struct scsi_event *evt)
 		break;
 	case SDEV_EVT_POWER_ON_RESET_OCCURRED:
 		envp[idx++] = "SDEV_UA=POWER_ON_RESET_OCCURRED";
+		wait_event(sdev->host->host_wait, sdev->sdev_gendev.kobj.parent != NULL);
 		break;
 	default:
 		/* do nothing */
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index f4e6c68ac99e..431f229ac435 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1904,6 +1904,7 @@  static void do_scsi_scan_host(struct Scsi_Host *shost)
 	} else {
 		scsi_scan_host_selected(shost, SCAN_WILD_CARD, SCAN_WILD_CARD,
 				SCAN_WILD_CARD, 0);
+		wake_up(&shost->host_wait);
 	}
 }