Message ID | 20240801054234.540532-1-shinichiro.kawasaki@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Series | scsi: sd: Move sd_read_cpr() out of the q->limits_lock region |

On Thu, 2024-08-01 at 14:42 +0900, Shin'ichiro Kawasaki wrote:
> [...]
> To avoid the WARN, move the sd_read_cpr() call in sd_revalidate_disk()
> after the queue_limits_commit_update() call. In other words, move the
> sd_read_cpr() call out of the q->limits_lock region.
> [...]

This seems to do the trick! At least on our setups we're not seeing the
deadlock issue anymore.

Thanks, Shinichiro!

Tested-by: Luca Coelho <luciano.coelho@intel.com>

--
Cheers,
Luca.

On 8/1/24 2:42 PM, Shin'ichiro Kawasaki wrote:
> [...]
> To avoid the WARN, move the sd_read_cpr() call in sd_revalidate_disk()
> after the queue_limits_commit_update() call. In other words, move the
> sd_read_cpr() call out of the q->limits_lock region.
> [...]

Given that sd_read_cpr() does not change any limit, looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

Looks good, thanks!
Reviewed-by: Christoph Hellwig <hch@lst.de>

On 7/31/24 10:42 PM, Shin'ichiro Kawasaki wrote:
> To avoid the WARN, move the sd_read_cpr() call in sd_revalidate_disk()
> after the queue_limits_commit_update() call. In other words, move the
> sd_read_cpr() call out of the q->limits_lock region.

Reviewed-by: Bart Van Assche <bvanassche@acm.org>

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index adeaa8ab9951..08cbe3815006 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3753,7 +3753,6 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		sd_read_block_limits_ext(sdkp);
 		sd_read_block_characteristics(sdkp, &lim);
 		sd_zbc_read_zones(sdkp, &lim, buffer);
-		sd_read_cpr(sdkp);
 	}
 
 	sd_print_capacity(sdkp, old_capacity);
@@ -3808,6 +3807,14 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	if (err)
 		return err;
 
+	/*
+	 * Query concurrent positioning ranges after
+	 * queue_limits_commit_update() unlocked q->limits_lock to avoid
+	 * deadlock with q->sysfs_dir_lock and q->sysfs_lock.
+	 */
+	if (sdkp->media_present && scsi_device_supports_vpd(sdp))
+		sd_read_cpr(sdkp);
+
 	/*
 	 * For a zoned drive, revalidating the zones can be done only once
 	 * the gendisk capacity is set. So if this fails, set back the gendisk

Commit 804e498e0496 ("sd: convert to the atomic queue limits API")
introduced pairs of function calls to queue_limits_start_update() and
queue_limits_commit_update(). These two functions lock and unlock
q->limits_lock. In sd_revalidate_disk(), sd_read_cpr() is called after
queue_limits_start_update() call and before
queue_limits_commit_update() call. sd_read_cpr() locks q->sysfs_dir_lock
and &q->sysfs_lock. Then new lock dependencies were created between
q->limits_lock, q->sysfs_dir_lock and q->sysfs_lock, as follows:

sd_revalidate_disk
 queue_limits_start_update
  mutex_lock(&q->limits_lock)
 sd_read_cpr
  disk_set_independent_access_ranges
   mutex_lock(&q->sysfs_dir_lock)
   mutex_lock(&q->sysfs_lock)
   mutex_unlock(&q->sysfs_lock)
   mutex_unlock(&q->sysfs_dir_lock)
 queue_limits_commit_update
  mutex_unlock(&q->limits_lock)

However, the three locks already had reversed dependencies in other
places. Then the new dependencies triggered the lockdep WARN "possible
circular locking dependency detected" [1]. This WARN was observed by
running the blktests test case srp/002.

To avoid the WARN, move the sd_read_cpr() call in sd_revalidate_disk()
after the queue_limits_commit_update() call. In other words, move the
sd_read_cpr() call out of the q->limits_lock region.

[1] https://lore.kernel.org/linux-scsi/vlmv53ni3ltwxplig5qnw4xsl2h6ccxijfbqzekx76vxoim5a5@dekv7q3es3tx/

Fixes: 804e498e0496 ("sd: convert to the atomic queue limits API")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
 drivers/scsi/sd.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
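
For readers who want to see the reordering in isolation, here is a rough user-space sketch of the call sequence described in the commit message. It is not kernel code: the three pthread mutexes are stand-ins for q->limits_lock, q->sysfs_dir_lock and q->sysfs_lock, and every function name (model_start_update(), model_commit_update(), model_read_cpr(), revalidate_before_fix(), revalidate_after_fix()) is hypothetical. The sketch is single-threaded and only counts how often the sysfs locks are taken while the limits lock is held, which is the nesting that created the new dependency.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for the three queue mutexes. */
static pthread_mutex_t limits_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sysfs_dir_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sysfs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Single-threaded bookkeeping: is limits_lock currently held? */
static bool limits_held;
/* How many times the sysfs locks were taken under limits_lock. */
static int nested_count;

/* Models queue_limits_start_update(): takes the limits lock. */
void model_start_update(void)
{
	pthread_mutex_lock(&limits_lock);
	limits_held = true;
}

/* Models queue_limits_commit_update(): releases the limits lock. */
void model_commit_update(void)
{
	assert(limits_held);
	limits_held = false;
	pthread_mutex_unlock(&limits_lock);
}

/* Models sd_read_cpr(): takes and releases both sysfs locks. */
void model_read_cpr(void)
{
	if (limits_held)
		nested_count++;	/* limits_lock -> sysfs locks dependency */
	pthread_mutex_lock(&sysfs_dir_lock);
	pthread_mutex_lock(&sysfs_lock);
	pthread_mutex_unlock(&sysfs_lock);
	pthread_mutex_unlock(&sysfs_dir_lock);
}

/* Old ordering: the CPR read sits inside the limits_lock region. */
int revalidate_before_fix(void)
{
	nested_count = 0;
	model_start_update();
	model_read_cpr();
	model_commit_update();
	return nested_count;
}

/* New ordering: the CPR read runs after the limits lock is dropped. */
int revalidate_after_fix(void)
{
	nested_count = 0;
	model_start_update();
	model_commit_update();
	model_read_cpr();
	return nested_count;
}
```

In this model, revalidate_before_fix() records one nested acquisition and revalidate_after_fix() records none. The sketch does not reproduce the reversed dependencies that exist elsewhere in the kernel; it only shows that the patched ordering no longer takes the sysfs locks inside the limits_lock region.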