Message ID | 1553219154-91328-1-git-send-email-zhengbin13@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v4] blk-mq: fix a hung issue when set device state to blocked and restore running | expand |
On Fri, Mar 22, 2019 at 09:45:54AM +0800, zhengbin wrote: > When I use dd test a SCSI device which use blk-mq in the following steps: > 1.echo "blocked" >/sys/block/sda/device/state > 2.dd if=/dev/sda of=/mnt/t.log bs=1M count=10 > 3.echo "running" >/sys/block/sda/device/state > dd should finish this work after step 3, unfortunately, still hung. > > After step2, the key code process is like this: > blk_mq_dispatch_rq_list-->scsi_queue_rq-->prep_to_mq > > prep_to_mq will return BLK_STS_RESOURCE, and scsi_queue_rq will transter > it to BLK_STS_DEV_RESOURCE, which means that driver can guarantee that > IO dispatch will be triggered in future when the resource is available. > Need to follow the rule if we set the device state to running. > > Signed-off-by: zhengbin <zhengbin13@huawei.com> > --- > drivers/scsi/scsi_sysfs.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c > index 6a9040f..266e4c7 100644 > --- a/drivers/scsi/scsi_sysfs.c > +++ b/drivers/scsi/scsi_sysfs.c > @@ -772,6 +772,11 @@ store_state_field(struct device *dev, struct device_attribute *attr, > mutex_lock(&sdev->state_mutex); > ret = scsi_device_set_state(sdev, state); > mutex_unlock(&sdev->state_mutex); > + /* If device state changes to SDEV_RUNNING, we need to run hw queue > + * to avoid io hung. > + */ > + if (ret == 0 && state == SDEV_RUNNING) > + blk_mq_run_hw_queues(sdev->request_queue, true); > Thinking of further, this way isn't safe, given the caller should make sure that the queue is alive before calling blk_mq_run_hw_queues(). Now it is in sysfs write path, and the scsi device can become gone just before calling blk_mq_run_hw_queues(). Thanks, Ming
So blk_mq_run_hw_queues should still be in mutex_lock(&sdev->state_mutex)?? On 2019/3/22 10:11, Ming Lei wrote: > On Fri, Mar 22, 2019 at 09:45:54AM +0800, zhengbin wrote: >> When I use dd test a SCSI device which use blk-mq in the following steps: >> 1.echo "blocked" >/sys/block/sda/device/state >> 2.dd if=/dev/sda of=/mnt/t.log bs=1M count=10 >> 3.echo "running" >/sys/block/sda/device/state >> dd should finish this work after step 3, unfortunately, still hung. >> >> After step2, the key code process is like this: >> blk_mq_dispatch_rq_list-->scsi_queue_rq-->prep_to_mq >> >> prep_to_mq will return BLK_STS_RESOURCE, and scsi_queue_rq will transter >> it to BLK_STS_DEV_RESOURCE, which means that driver can guarantee that >> IO dispatch will be triggered in future when the resource is available. >> Need to follow the rule if we set the device state to running. >> >> Signed-off-by: zhengbin <zhengbin13@huawei.com> >> --- >> drivers/scsi/scsi_sysfs.c | 5 +++++ >> 1 file changed, 5 insertions(+) >> >> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c >> index 6a9040f..266e4c7 100644 >> --- a/drivers/scsi/scsi_sysfs.c >> +++ b/drivers/scsi/scsi_sysfs.c >> @@ -772,6 +772,11 @@ store_state_field(struct device *dev, struct device_attribute *attr, >> mutex_lock(&sdev->state_mutex); >> ret = scsi_device_set_state(sdev, state); >> mutex_unlock(&sdev->state_mutex); >> + /* If device state changes to SDEV_RUNNING, we need to run hw queue >> + * to avoid io hung. >> + */ >> + if (ret == 0 && state == SDEV_RUNNING) >> + blk_mq_run_hw_queues(sdev->request_queue, true); >> > > Thinking of further, this way isn't safe, given the caller should make > sure that the queue is alive before calling blk_mq_run_hw_queues(). > > Now it is in sysfs write path, and the scsi device can become gone > just before calling blk_mq_run_hw_queues(). > > > Thanks, > Ming > > . >
On Fri, Mar 22, 2019 at 10:18:54AM +0800, zhengbin (A) wrote: > So blk_mq_run_hw_queues should still be in mutex_lock(&sdev->state_mutex)?? I think so, this way should be safe given the pattern is used in __scsi_remove_device() already. BTW, please do not reply before the quoted text, and I'd suggest you to read some linux kernel development FAQ: http://vger.kernel.org/lkml/#s3-9 Thanks, Ming > > On 2019/3/22 10:11, Ming Lei wrote: > > On Fri, Mar 22, 2019 at 09:45:54AM +0800, zhengbin wrote: > >> When I use dd test a SCSI device which use blk-mq in the following steps: > >> 1.echo "blocked" >/sys/block/sda/device/state > >> 2.dd if=/dev/sda of=/mnt/t.log bs=1M count=10 > >> 3.echo "running" >/sys/block/sda/device/state > >> dd should finish this work after step 3, unfortunately, still hung. > >> > >> After step2, the key code process is like this: > >> blk_mq_dispatch_rq_list-->scsi_queue_rq-->prep_to_mq > >> > >> prep_to_mq will return BLK_STS_RESOURCE, and scsi_queue_rq will transter > >> it to BLK_STS_DEV_RESOURCE, which means that driver can guarantee that > >> IO dispatch will be triggered in future when the resource is available. > >> Need to follow the rule if we set the device state to running. > >> > >> Signed-off-by: zhengbin <zhengbin13@huawei.com> > >> --- > >> drivers/scsi/scsi_sysfs.c | 5 +++++ > >> 1 file changed, 5 insertions(+) > >> > >> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c > >> index 6a9040f..266e4c7 100644 > >> --- a/drivers/scsi/scsi_sysfs.c > >> +++ b/drivers/scsi/scsi_sysfs.c > >> @@ -772,6 +772,11 @@ store_state_field(struct device *dev, struct device_attribute *attr, > >> mutex_lock(&sdev->state_mutex); > >> ret = scsi_device_set_state(sdev, state); > >> mutex_unlock(&sdev->state_mutex); > >> + /* If device state changes to SDEV_RUNNING, we need to run hw queue > >> + * to avoid io hung. > >> + */ > >> + if (ret == 0 && state == SDEV_RUNNING) > >> + blk_mq_run_hw_queues(sdev->request_queue, true); > >> > > > > Thinking of further, this way isn't safe, given the caller should make > > sure that the queue is alive before calling blk_mq_run_hw_queues(). > > > > Now it is in sysfs write path, and the scsi device can become gone > > just before calling blk_mq_run_hw_queues(). > > > > > > Thanks, > > Ming > > > > . > > >
On 3/21/19 7:11 PM, Ming Lei wrote: > On Fri, Mar 22, 2019 at 09:45:54AM +0800, zhengbin wrote: >> + /* If device state changes to SDEV_RUNNING, we need to run hw queue >> + * to avoid io hung. >> + */ >> + if (ret == 0 && state == SDEV_RUNNING) >> + blk_mq_run_hw_queues(sdev->request_queue, true); >> > > Thinking of further, this way isn't safe, given the caller should make > sure that the queue is alive before calling blk_mq_run_hw_queues(). > > Now it is in sysfs write path, and the scsi device can become gone > just before calling blk_mq_run_hw_queues(). Hi Ming, What makes you think that this approach is unsafe? I think that __scsi_remove_device() waits until all sysfs callbacks have finished before it calls blk_cleanup_queue(). Bart.
On Thu, Mar 21, 2019 at 08:17:44PM -0700, Bart Van Assche wrote: > On 3/21/19 7:11 PM, Ming Lei wrote: > > On Fri, Mar 22, 2019 at 09:45:54AM +0800, zhengbin wrote: > > > + /* If device state changes to SDEV_RUNNING, we need to run hw queue > > > + * to avoid io hung. > > > + */ > > > + if (ret == 0 && state == SDEV_RUNNING) > > > + blk_mq_run_hw_queues(sdev->request_queue, true); > > > > > > > Thinking of further, this way isn't safe, given the caller should make > > sure that the queue is alive before calling blk_mq_run_hw_queues(). > > > > Now it is in sysfs write path, and the scsi device can become gone > > just before calling blk_mq_run_hw_queues(). > > Hi Ming, > > What makes you think that this approach is unsafe? I think that > __scsi_remove_device() waits until all sysfs callbacks have finished before > it calls blk_cleanup_queue(). You mean sysfs_remove_groups() will wait for completion of the current store? If sysfs would provide such guarantee, it is safe. Looks too tricky even though it is true, so I still suggest to run queue with holding the mutex of state_mutex, given this usage has been done by __scsi_remove_device() already. Thanks, Ming
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index 6a9040f..266e4c7 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -772,6 +772,11 @@ store_state_field(struct device *dev, struct device_attribute *attr, mutex_lock(&sdev->state_mutex); ret = scsi_device_set_state(sdev, state); mutex_unlock(&sdev->state_mutex); + /* If device state changes to SDEV_RUNNING, we need to run hw queue + * to avoid io hung. + */ + if (ret == 0 && state == SDEV_RUNNING) + blk_mq_run_hw_queues(sdev->request_queue, true); return ret == 0 ? count : -EINVAL; }
When I use dd test a SCSI device which use blk-mq in the following steps: 1.echo "blocked" >/sys/block/sda/device/state 2.dd if=/dev/sda of=/mnt/t.log bs=1M count=10 3.echo "running" >/sys/block/sda/device/state dd should finish this work after step 3, unfortunately, still hung. After step2, the key code process is like this: blk_mq_dispatch_rq_list-->scsi_queue_rq-->prep_to_mq prep_to_mq will return BLK_STS_RESOURCE, and scsi_queue_rq will transter it to BLK_STS_DEV_RESOURCE, which means that driver can guarantee that IO dispatch will be triggered in future when the resource is available. Need to follow the rule if we set the device state to running. Signed-off-by: zhengbin <zhengbin13@huawei.com> --- drivers/scsi/scsi_sysfs.c | 5 +++++ 1 file changed, 5 insertions(+) -- 2.7.4