
[core-for-CI] scsi: sd: Move sd_read_cpr() out of the q->limits_lock region

Message ID 20240801082257.506006-1-luciano.coelho@intel.com (mailing list archive)
State New, archived

Commit Message

Luca Coelho Aug. 1, 2024, 8:22 a.m. UTC
From: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

Commit 804e498e0496 ("sd: convert to the atomic queue limits API")
introduced pairs of function calls to queue_limits_start_update() and
queue_limits_commit_update(). These two functions lock and unlock
q->limits_lock. In sd_revalidate_disk(), sd_read_cpr() is called after
queue_limits_start_update() call and before
queue_limits_commit_update() call. sd_read_cpr() locks q->sysfs_dir_lock
and &q->sysfs_lock. Then new lock dependencies were created between
q->limits_lock, q->sysfs_dir_lock and q->sysfs_lock, as follows:

sd_revalidate_disk
  queue_limits_start_update
    mutex_lock(&q->limits_lock)
  sd_read_cpr
    disk_set_independent_access_ranges
      mutex_lock(&q->sysfs_dir_lock)
      mutex_lock(&q->sysfs_lock)
      mutex_unlock(&q->sysfs_lock)
      mutex_unlock(&q->sysfs_dir_lock)
  queue_limits_commit_update
    mutex_unlock(&q->limits_lock)

However, the three locks already had reversed dependencies in other
places. Then the new dependencies triggered the lockdep WARN "possible
circular locking dependency detected" [1]. This WARN was observed by
running the blktests test case srp/002.

To avoid the WARN, move the sd_read_cpr() call in sd_revalidate_disk()
after the queue_limits_commit_update() call. In other words, move the
sd_read_cpr() call out of the q->limits_lock region.

[1] https://lore.kernel.org/linux-scsi/vlmv53ni3ltwxplig5qnw4xsl2h6ccxijfbqzekx76vxoim5a5@dekv7q3es3tx/

Fixes: 804e498e0496 ("sd: convert to the atomic queue limits API")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
---
 drivers/scsi/sd.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
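
The reordering described in the commit message can be sketched in userspace C with pthread mutexes. This is only an illustration under stated assumptions, not the kernel code: the three mutexes stand in for q->limits_lock, q->sysfs_dir_lock and q->sysfs_lock, and every function and counter name below (limits_start_update, read_cpr, revalidate_before, revalidate_after, limits_held, and so on) is hypothetical. The limits_held flag is a single-threaded bookkeeping aid that records whether the sysfs locks were taken while the limits lock was held, which is the dependency lockdep complained about.

```c
/* Userspace sketch only; hypothetical names, not the kernel implementation. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-ins for q->limits_lock, q->sysfs_dir_lock and q->sysfs_lock. */
static pthread_mutex_t limits_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sysfs_dir_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sysfs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Single-threaded bookkeeping: is limits_lock currently held? */
static bool limits_held;
/* Total calls to read_cpr(). */
static int cpr_calls;
/* Calls to read_cpr() made while limits_lock was held. */
static int cpr_calls_under_limits_lock;

/* Models queue_limits_start_update(): enter the limits_lock region. */
static void limits_start_update(void)
{
	pthread_mutex_lock(&limits_lock);
	limits_held = true;
}

/* Models queue_limits_commit_update(): leave the limits_lock region. */
static void limits_commit_update(void)
{
	assert(limits_held);
	limits_held = false;
	pthread_mutex_unlock(&limits_lock);
}

/* Models sd_read_cpr(): takes sysfs_dir_lock, then sysfs_lock. */
static void read_cpr(void)
{
	if (limits_held)
		cpr_calls_under_limits_lock++;

	pthread_mutex_lock(&sysfs_dir_lock);
	pthread_mutex_lock(&sysfs_lock);
	cpr_calls++;
	pthread_mutex_unlock(&sysfs_lock);
	pthread_mutex_unlock(&sysfs_dir_lock);
}

/* Ordering before the patch: sysfs locks nest inside limits_lock. */
static void revalidate_before(void)
{
	limits_start_update();
	read_cpr();
	limits_commit_update();
}

/* Ordering after the patch: limits_lock is released first. */
static void revalidate_after(void)
{
	limits_start_update();
	/* ... limit updates that do not need the sysfs locks ... */
	limits_commit_update();
	read_cpr();
}
```

Calling revalidate_after() leaves cpr_calls_under_limits_lock unchanged, whereas revalidate_before() increments it. In the sketch, as in the patch, the fix is purely a matter of call ordering; no lock is added or removed.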

Comments

Saarinen, Jani Aug. 1, 2024, 8:32 a.m. UTC | #1
There is also this, made by Luca: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11813

@Nikula, Jani, ok to merge. Already tested at trybot: https://patchwork.freedesktop.org/series/136776/

Jani Nikula Aug. 1, 2024, 8:48 a.m. UTC | #2
On Thu, 01 Aug 2024, "Saarinen, Jani" <jani.saarinen@intel.com> wrote:
> There is also this, made by Luca: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/11813
>
> @Nikula, Jani, ok to merge. Already tested at trybot: https://patchwork.freedesktop.org/series/136776/

Acked-by: Jani Nikula <jani.nikula@intel.com>

The full IGT results aren't in for the trybot submission though.


Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index adeaa8ab9951..08cbe3815006 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3753,7 +3753,6 @@  static int sd_revalidate_disk(struct gendisk *disk)
 			sd_read_block_limits_ext(sdkp);
 			sd_read_block_characteristics(sdkp, &lim);
 			sd_zbc_read_zones(sdkp, &lim, buffer);
-			sd_read_cpr(sdkp);
 		}
 
 		sd_print_capacity(sdkp, old_capacity);
@@ -3808,6 +3807,14 @@  static int sd_revalidate_disk(struct gendisk *disk)
 	if (err)
 		return err;
 
+	/*
+	 * Query concurrent positioning ranges after
+	 * queue_limits_commit_update() unlocked q->limits_lock to avoid
+	 * deadlock with q->sysfs_dir_lock and q->sysfs_lock.
+	 */
+	if (sdkp->media_present && scsi_device_supports_vpd(sdp))
+		sd_read_cpr(sdkp);
+
 	/*
 	 * For a zoned drive, revalidating the zones can be done only once
 	 * the gendisk capacity is set. So if this fails, set back the gendisk