Message ID | 20220718062928.335399-2-hch@lst.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] block: call blk_mq_exit_queue from disk_release for never added disks | expand |
Hi Christoph, On Mon, Jul 18, 2022 at 08:29:28AM +0200, Christoph Hellwig wrote: > This was just a rather broken way to paper over a core block layer bug > that is now fixed. > > This reverts commit cebbe577cb17ed9b04b50d9e6802a8bacffbadca. This change will break START_DEV/STOP_DEV, which is supposed to run multiple cycles after the device is added, especially this way can help to implement error recovery from userside, such as one ubq_daemon is crashed/hang, the device can be recovered by sending STOP_DEV/START_DEV commands again after new ubq_daemon is setup. So here we do need separated request_queue/disk, and the model is similar with scsi's, in which disk rebind needs to be supported and GD_OWNS_QUEUE can't be set. Thanks, Ming
On Tue, Jul 19, 2022 at 08:37:23PM +0800, Ming Lei wrote: > This change will break START_DEV/STOP_DEV, which is supposed to run > multiple cycles after the device is added, especially this way can > help to implement error recovery from userside, such as one ubq_daemon > is crashed/hang, the device can be recovered by sending STOP_DEV/START_DEV > commands again after new ubq_daemon is setup. What is broken in START_DEV/STOP_DEV? Please explain the semantics you want and what doesn't work. FYI, there is nothing in the test suite the complains. And besides the obvious block layer bug that Jens found you seemed to be perfectly happy with the semantics. > So here we do need separated request_queue/disk, and the model is > similar with scsi's, in which disk rebind needs to be supported > and GD_OWNS_QUEUE can't be set. SCSI needs it because it needs the request_queue to probe for what ULP to bind to, and it allows to unbind the ULP. None of that is the case here. And managing the lifetimes separately is a complete mess, so don't do it. Especially not in a virtual driver where you don't have to cater to a long set protocol like SCSI. > > Thanks, > Ming ---end quoted text---
On Wed, Jul 20, 2022 at 08:07:05AM +0200, Christoph Hellwig wrote: > On Tue, Jul 19, 2022 at 08:37:23PM +0800, Ming Lei wrote: > > This change will break START_DEV/STOP_DEV, which is supposed to run > > multiple cycles after the device is added, especially this way can > > help to implement error recovery from userside, such as one ubq_daemon > > is crashed/hang, the device can be recovered by sending STOP_DEV/START_DEV > > commands again after new ubq_daemon is setup. > > What is broken in START_DEV/STOP_DEV? Please explain the semantics you > want and what doesn't work. FYI, there is nothing in the test suite the > complains. And besides the obvious block layer bug that Jens found you > seemed to be perfectly happy with the semantics. START_DEV calls add_disk(), and STOP_DEV calls del_gendisk(), but if GD_OWNS_QUEUE is set, blk_mq_exit_queue() will be called in del_gendisk(), then the following START_DEV will stuck. > > > So here we do need separated request_queue/disk, and the model is > > similar with scsi's, in which disk rebind needs to be supported > > and GD_OWNS_QUEUE can't be set. > > SCSI needs it because it needs the request_queue to probe for what ULP > to bind to, and it allows to unbind the ULP. None of that is the case > here. And managing the lifetimes separately is a complete mess, so > don't do it. Especially not in a virtual driver where you don't have > to cater to a long set protocol like SCSI. If blk_mq_exit_queue is called in del_gendisk() for scsi, how can re-bind work as expected since it needs one completely workable request queue instead of partial exited one? Thanks, Ming
On Wed, Jul 20, 2022 at 03:47:27PM +0800, Ming Lei wrote: > > What is broken in START_DEV/STOP_DEV? Please explain the semantics you > > want and what doesn't work. FYI, there is nothing in the test suite the > > complains. And besides the obvious block layer bug that Jens found you > > seemed to be perfectly happy with the semantics. > > START_DEV calls add_disk(), and STOP_DEV calls del_gendisk(), but if > GD_OWNS_QUEUE is set, blk_mq_exit_queue() will be called in > del_gendisk(), then the following START_DEV will stuck. Uh, yeah. alloc_disk and add_disk are supposed to be paired and not split over different ioctls. The lifetime rules here are rather broken. > > > similar with scsi's, in which disk rebind needs to be supported > > > and GD_OWNS_QUEUE can't be set. > > > > SCSI needs it because it needs the request_queue to probe for what ULP > > to bind to, and it allows to unbind the ULP. None of that is the case > > here. And managing the lifetimes separately is a complete mess, so > > don't do it. Especially not in a virtual driver where you don't have > > to cater to a long set protocol like SCSI. > > If blk_mq_exit_queue is called in del_gendisk() for scsi, how can > re-bind work as expected since it needs one completely workable > request queue instead of partial exited one? For !GD_OWNS_QUEUE blk_mq_exit_queue is not called from del_gendisk().
On Wed, Jul 20, 2022 at 11:00:40AM +0200, Christoph Hellwig wrote: > On Wed, Jul 20, 2022 at 03:47:27PM +0800, Ming Lei wrote: > > > What is broken in START_DEV/STOP_DEV? Please explain the semantics you > > > want and what doesn't work. FYI, there is nothing in the test suite the > > > complains. And besides the obvious block layer bug that Jens found you > > > seemed to be perfectly happy with the semantics. > > > > START_DEV calls add_disk(), and STOP_DEV calls del_gendisk(), but if > > GD_OWNS_QUEUE is set, blk_mq_exit_queue() will be called in > > del_gendisk(), then the following START_DEV will stuck. > > Uh, yeah. alloc_disk and add_disk are supposed to be paired and > not split over different ioctls. The lifetime rules here are > rather broken. Can you explain what this way breaks? And why can't one disk be added/deleted multiple times? > > > > > similar with scsi's, in which disk rebind needs to be supported > > > > and GD_OWNS_QUEUE can't be set. > > > > > > SCSI needs it because it needs the request_queue to probe for what ULP > > > to bind to, and it allows to unbind the ULP. None of that is the case > > > here. And managing the lifetimes separately is a complete mess, so > > > don't do it. Especially not in a virtual driver where you don't have > > > to cater to a long set protocol like SCSI. > > > > If blk_mq_exit_queue is called in del_gendisk() for scsi, how can > > re-bind work as expected since it needs one completely workable > > request queue instead of partial exited one? > > For !GD_OWNS_QUEUE blk_mq_exit_queue is not called from del_gendisk(). That is why scsi can't set GD_OWNS_QUEUE, for any driver, if disk rebind or similar behavior is needed, GD_OWNS_QUEUE can't be set. That is why ublk_drv uses separated queue/disk. thanks, Ming
On Wed, Jul 20, 2022 at 11:00:40AM +0200, Christoph Hellwig wrote: > On Wed, Jul 20, 2022 at 03:47:27PM +0800, Ming Lei wrote: > > > What is broken in START_DEV/STOP_DEV? Please explain the semantics you > > > want and what doesn't work. FYI, there is nothing in the test suite the > > > complains. And besides the obvious block layer bug that Jens found you > > > seemed to be perfectly happy with the semantics. > > > > START_DEV calls add_disk(), and STOP_DEV calls del_gendisk(), but if > > GD_OWNS_QUEUE is set, blk_mq_exit_queue() will be called in > > del_gendisk(), then the following START_DEV will stuck. > > Uh, yeah. alloc_disk and add_disk are supposed to be paired and > not split over different ioctls. The lifetime rules here are > rather broken. Even though alloc_disk and add_disk is paired here, GD_OWNS_QUEUE still can't be set because request queue has to be workable for the new alloc/ added disk, just like scsi. So it is nothing to do with pair of alloc_disk/add_disk(). Not mention I don't see any thing wrong with adding/deleting disk multiple times. Thanks, Ming
On Wed, Jul 20, 2022 at 06:23:22PM +0800, Ming Lei wrote: > Even though alloc_disk and add_disk is paired here, GD_OWNS_QUEUE still > can't be set because request queue has to be workable for the new alloc/ > added disk, just like scsi. How so? dm has totally normall disk/request_queue lifetimes. The only caveat is that the blk-mq bits of the queue are added after the initial non-mq disk allocation. There is no newly added disk after the disk and queue are torn down.
On Wed, Jul 20, 2022 at 03:08:45PM +0200, Christoph Hellwig wrote: > On Wed, Jul 20, 2022 at 06:23:22PM +0800, Ming Lei wrote: > > Even though alloc_disk and add_disk is paired here, GD_OWNS_QUEUE still > > can't be set because request queue has to be workable for the new alloc/ > > added disk, just like scsi. > > How so? dm has totally normall disk/request_queue lifetimes. The only > caveat is that the blk-mq bits of the queue are added after the initial > non-mq disk allocation. There is no newly added disk after the disk > and queue are torn down. I meant that request queue is supposed to be low level stuff for implementing disk function, and request queue hasn't to be released and re-allocated after each disk whole lifetime(alloc disk, add disk, del_gendisk, release disk). IMO, the limit is just from GD_OWNS_QUEUE which moves releasing some queue resource into del_gendisk or disk release(as the fixes you posted), and this way is fragile, frankly speaking. In theory, allocating queue and releasing queue should be completely symmetrical, but GD_OWNS_QUEUE does change this way. And GD_OWNS_QUEUE can't be set for SCSI, so the two modes have to be supported by block layer. Thanks, Ming
On Wed, Jul 20, 2022 at 11:33:24PM +0800, Ming Lei wrote: > I meant that request queue is supposed to be low level stuff for implementing > disk function, and request queue hasn't to be released and re-allocated > after each disk whole lifetime(alloc disk, add disk, del_gendisk, release disk). It doesn't have to. But it is pretty damn convenient if we can. What we can't easily do is to re-add a disk after del_gendisk was called but before the final reference went away, which is what ublk does do currently. In other words it does not even follow the SCSI model, which works but is somewhat cumbersome, but comes up with yet another weird model that is probably going to fall apart pretty quickly when doing STAT_DEV, STOP_DEV loops.
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c index b633d268b90ae..4e0248ef6bec5 100644 --- a/drivers/block/ublk_drv.c +++ b/drivers/block/ublk_drv.c @@ -155,8 +155,6 @@ static DEFINE_MUTEX(ublk_ctl_mutex); static struct miscdevice ublk_misc; -static struct lock_class_key ublk_bio_compl_lkclass; - static inline bool ublk_can_use_task_work(const struct ublk_queue *ubq) { if (IS_BUILTIN(CONFIG_BLK_DEV_UBLK) && @@ -633,7 +631,7 @@ static void ublk_commit_rqs(struct blk_mq_hw_ctx *hctx) static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data, unsigned int hctx_idx) { - struct ublk_device *ub = driver_data; + struct ublk_device *ub = hctx->queue->queuedata; struct ublk_queue *ubq = ublk_get_queue(ub, hctx->queue_num); hctx->driver_data = ubq; @@ -1074,8 +1072,6 @@ static void ublk_cdev_rel(struct device *dev) { struct ublk_device *ub = container_of(dev, struct ublk_device, cdev_dev); - blk_mq_destroy_queue(ub->ub_queue); - put_disk(ub->ub_disk); blk_mq_free_tag_set(&ub->tag_set); @@ -1165,17 +1161,14 @@ static int ublk_add_dev(struct ublk_device *ub) if (err) goto out_deinit_queues; - ub->ub_queue = blk_mq_init_queue(&ub->tag_set); - if (IS_ERR(ub->ub_queue)) - goto out_cleanup_tags; - ub->ub_queue->queuedata = ub; - - disk = ub->ub_disk = blk_mq_alloc_disk_for_queue(ub->ub_queue, - &ublk_bio_compl_lkclass); + disk = ub->ub_disk = blk_mq_alloc_disk(&ub->tag_set, ub); if (IS_ERR(disk)) { err = PTR_ERR(disk); - goto out_free_request_queue; + goto out_cleanup_tags; } + ub->ub_queue = ub->ub_disk->queue; + + ub->ub_queue->queuedata = ub; blk_queue_logical_block_size(ub->ub_queue, bsize); blk_queue_physical_block_size(ub->ub_queue, bsize); @@ -1207,8 +1200,6 @@ static int ublk_add_dev(struct ublk_device *ub) return 0; -out_free_request_queue: - blk_mq_destroy_queue(ub->ub_queue); out_cleanup_tags: blk_mq_free_tag_set(&ub->tag_set); out_deinit_queues:
This was just a rather broken way to paper over a core block layer bug that is now fixed. This reverts commit cebbe577cb17ed9b04b50d9e6802a8bacffbadca. Signed-off-by: Christoph Hellwig <hch@lst.de> --- drivers/block/ublk_drv.c | 21 ++++++--------------- 1 file changed, 6 insertions(+), 15 deletions(-)