
[2/2] Revert "ublk_drv: fix request queue leak"

Message ID 20220718062928.335399-2-hch@lst.de (mailing list archive)
State New, archived
Series [1/2] block: call blk_mq_exit_queue from disk_release for never added disks

Commit Message

Christoph Hellwig July 18, 2022, 6:29 a.m. UTC
This was just a rather broken way to paper over a core block layer bug
that is now fixed.

This reverts commit cebbe577cb17ed9b04b50d9e6802a8bacffbadca.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/ublk_drv.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

Comments

Ming Lei July 19, 2022, 12:37 p.m. UTC | #1
Hi Christoph,

On Mon, Jul 18, 2022 at 08:29:28AM +0200, Christoph Hellwig wrote:
> This was just a rather broken way to paper over a core block layer bug
> that is now fixed.
> 
> This reverts commit cebbe577cb17ed9b04b50d9e6802a8bacffbadca.

This change will break START_DEV/STOP_DEV, which are supposed to run
for multiple cycles after the device is added. In particular, this way
helps to implement error recovery from userspace: if one ubq_daemon
crashes or hangs, the device can be recovered by sending STOP_DEV/START_DEV
commands again after a new ubq_daemon is set up.

So here we do need a separate request_queue/disk, and the model is
similar to scsi's, in which disk rebind needs to be supported
and GD_OWNS_QUEUE can't be set.
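
To make the intended protocol concrete, here is a minimal userspace-side
sketch of that recovery cycle.  ublk_ctrl_cmd() and restart_ubq_daemon()
are hypothetical stand-ins for the io_uring plumbing that submits control
commands to /dev/ublk-control and for restarting the daemon; only the
UBLK_CMD_* opcodes are real.

#include <linux/ublk_cmd.h>	/* UBLK_CMD_START_DEV, UBLK_CMD_STOP_DEV */

/* Hypothetical wrappers, not part of the kernel or liburing API. */
int ublk_ctrl_cmd(int dev_id, unsigned int cmd_op);
int restart_ubq_daemon(int dev_id);

/* Recover a device whose ubq_daemon has crashed or hung. */
static int ublk_recover(int dev_id)
{
	int ret;

	/* Driver side ends up in del_gendisk() for this device. */
	ret = ublk_ctrl_cmd(dev_id, UBLK_CMD_STOP_DEV);
	if (ret)
		return ret;

	/* Bring up a fresh daemon to serve the queues again. */
	ret = restart_ubq_daemon(dev_id);
	if (ret)
		return ret;

	/* Driver side calls add_disk() again on the same device. */
	return ublk_ctrl_cmd(dev_id, UBLK_CMD_START_DEV);
}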


Thanks,
Ming
Christoph Hellwig July 20, 2022, 6:07 a.m. UTC | #2
On Tue, Jul 19, 2022 at 08:37:23PM +0800, Ming Lei wrote:
> This change will break START_DEV/STOP_DEV, which are supposed to run
> for multiple cycles after the device is added. In particular, this way
> helps to implement error recovery from userspace: if one ubq_daemon
> crashes or hangs, the device can be recovered by sending STOP_DEV/START_DEV
> commands again after a new ubq_daemon is set up.

What is broken in START_DEV/STOP_DEV?  Please explain the semantics you
want and what doesn't work.  FYI, there is nothing in the test suite that
complains.  And besides the obvious block layer bug that Jens found, you
seemed to be perfectly happy with the semantics.

> So here we do need a separate request_queue/disk, and the model is
> similar to scsi's, in which disk rebind needs to be supported
> and GD_OWNS_QUEUE can't be set.

SCSI needs it because it needs the request_queue to probe for what ULP
to bind to, and it allows unbinding the ULP.  None of that is the case
here.  And managing the lifetimes separately is a complete mess, so
don't do it.  Especially not in a virtual driver where you don't have
to cater to a long-established protocol like SCSI.

> 
> Thanks,
> Ming
---end quoted text---
Ming Lei July 20, 2022, 7:47 a.m. UTC | #3
On Wed, Jul 20, 2022 at 08:07:05AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 19, 2022 at 08:37:23PM +0800, Ming Lei wrote:
> > This change will break START_DEV/STOP_DEV, which are supposed to run
> > for multiple cycles after the device is added. In particular, this way
> > helps to implement error recovery from userspace: if one ubq_daemon
> > crashes or hangs, the device can be recovered by sending STOP_DEV/START_DEV
> > commands again after a new ubq_daemon is set up.
> 
> What is broken in START_DEV/STOP_DEV?  Please explain the semantics you
> want and what doesn't work.  FYI, there is nothing in the test suite that
> complains.  And besides the obvious block layer bug that Jens found, you
> seemed to be perfectly happy with the semantics.

START_DEV calls add_disk(), and STOP_DEV calls del_gendisk(), but if
GD_OWNS_QUEUE is set, blk_mq_exit_queue() will be called in
del_gendisk(), and then the following START_DEV will get stuck.
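
For reference, a condensed sketch of that cycle as it looks with this
revert applied (paraphrased control flow, not the actual ublk_drv code;
the ublk_example_* names are made up):

#include <linux/blk-mq.h>

/* ADD_DEV: disk and queue are allocated together; GD_OWNS_QUEUE is set. */
static int ublk_example_add_dev(struct ublk_device *ub)
{
	ub->ub_disk = blk_mq_alloc_disk(&ub->tag_set, ub);
	if (IS_ERR(ub->ub_disk))
		return PTR_ERR(ub->ub_disk);
	ub->ub_queue = ub->ub_disk->queue;
	return 0;
}

/* START_DEV: make the disk visible. */
static int ublk_example_start_dev(struct ublk_device *ub)
{
	return add_disk(ub->ub_disk);
}

/*
 * STOP_DEV: because GD_OWNS_QUEUE is set, del_gendisk() also tears down
 * the blk-mq side of ub->ub_queue, so a second START_DEV cycle that
 * calls add_disk() on the same disk/queue has no working queue left and
 * gets stuck.
 */
static void ublk_example_stop_dev(struct ublk_device *ub)
{
	del_gendisk(ub->ub_disk);
}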

> 
> > So here we do need a separate request_queue/disk, and the model is
> > similar to scsi's, in which disk rebind needs to be supported
> > and GD_OWNS_QUEUE can't be set.
> 
> SCSI needs it because it needs the request_queue to probe for what ULP
> to bind to, and it allows unbinding the ULP.  None of that is the case
> here.  And managing the lifetimes separately is a complete mess, so
> don't do it.  Especially not in a virtual driver where you don't have
> to cater to a long-established protocol like SCSI.

If blk_mq_exit_queue is called in del_gendisk() for scsi, how can
re-bind work as expected, since it needs a completely workable
request queue instead of a partially exited one?


Thanks,
Ming
Christoph Hellwig July 20, 2022, 9 a.m. UTC | #4
On Wed, Jul 20, 2022 at 03:47:27PM +0800, Ming Lei wrote:
> > What is broken in START_DEV/STOP_DEV?  Please explain the semantics you
> > want and what doesn't work.  FYI, there is nothing in the test suite that
> > complains.  And besides the obvious block layer bug that Jens found, you
> > seemed to be perfectly happy with the semantics.
> 
> START_DEV calls add_disk(), and STOP_DEV calls del_gendisk(), but if
> GD_OWNS_QUEUE is set, blk_mq_exit_queue() will be called in
> del_gendisk(), and then the following START_DEV will get stuck.

Uh, yeah.  alloc_disk and add_disk are supposed to be paired and
not split over different ioctls.  The lifetime rules here are
rather broken.
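
For contrast, the conventional pairing looks roughly like this (a generic
sketch against the blk-mq API of this era, not any particular driver;
struct example_dev and the example_* functions are made up):

#include <linux/blk-mq.h>

struct example_dev {
	struct blk_mq_tag_set	tag_set;	/* assumed already set up */
	struct gendisk		*disk;
};

static int example_probe(struct example_dev *dev)
{
	int ret;

	/* Allocation pair: blk_mq_alloc_disk() ... put_disk(). */
	dev->disk = blk_mq_alloc_disk(&dev->tag_set, dev);
	if (IS_ERR(dev->disk))
		return PTR_ERR(dev->disk);

	/* ... set queue limits, disk->fops, capacity ... */

	/* Registration pair: add_disk() ... del_gendisk(). */
	ret = add_disk(dev->disk);
	if (ret)
		put_disk(dev->disk);	/* undo the allocation on failure */
	return ret;
}

static void example_remove(struct example_dev *dev)
{
	del_gendisk(dev->disk);		/* pairs with add_disk() */
	put_disk(dev->disk);		/* pairs with blk_mq_alloc_disk() */
}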

> > > similar to scsi's, in which disk rebind needs to be supported
> > > and GD_OWNS_QUEUE can't be set.
> > 
> > SCSI needs it because it needs the request_queue to probe for what ULP
> > to bind to, and it allows unbinding the ULP.  None of that is the case
> > here.  And managing the lifetimes separately is a complete mess, so
> > don't do it.  Especially not in a virtual driver where you don't have
> > to cater to a long-established protocol like SCSI.
> 
> If blk_mq_exit_queue is called in del_gendisk() for scsi, how can
> re-bind work as expected, since it needs a completely workable
> request queue instead of a partially exited one?

For !GD_OWNS_QUEUE blk_mq_exit_queue is not called from del_gendisk().
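
In other words, what del_gendisk() does with the queue depends on who owns
it, roughly (a paraphrase of the behaviour described in this thread, not
the literal block/genhd.c code):

	if (test_bit(GD_OWNS_QUEUE, &disk->state)) {
		/*
		 * The disk owns the queue: the blk-mq side is torn down
		 * together with the disk, so the queue cannot usefully
		 * host another disk afterwards.
		 */
		if (queue_is_mq(disk->queue))
			blk_mq_exit_queue(disk->queue);
	} else {
		/*
		 * The driver owns the queue (the SCSI model): it is left
		 * fully usable so another disk can be bound to it later.
		 */
	}
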
Ming Lei July 20, 2022, 10:16 a.m. UTC | #5
On Wed, Jul 20, 2022 at 11:00:40AM +0200, Christoph Hellwig wrote:
> On Wed, Jul 20, 2022 at 03:47:27PM +0800, Ming Lei wrote:
> > > What is broken in START_DEV/STOP_DEV?  Please explain the semantics you
> > > want and what doesn't work.  FYI, there is nothing in the test suite that
> > > complains.  And besides the obvious block layer bug that Jens found, you
> > > seemed to be perfectly happy with the semantics.
> > 
> > START_DEV calls add_disk(), and STOP_DEV calls del_gendisk(), but if
> > GD_OWNS_QUEUE is set, blk_mq_exit_queue() will be called in
> > del_gendisk(), and then the following START_DEV will get stuck.
> 
> Uh, yeah.  alloc_disk and add_disk are supposed to be paired and
> not split over different ioctls.  The lifetime rules here are
> rather broken.

Can you explain what this way breaks? And why can't one disk be
added/deleted multiple times?

> 
> > > > similar to scsi's, in which disk rebind needs to be supported
> > > > and GD_OWNS_QUEUE can't be set.
> > > 
> > > SCSI needs it because it needs the request_queue to probe for what ULP
> > > to bind to, and it allows unbinding the ULP.  None of that is the case
> > > here.  And managing the lifetimes separately is a complete mess, so
> > > don't do it.  Especially not in a virtual driver where you don't have
> > > to cater to a long-established protocol like SCSI.
> > 
> > If blk_mq_exit_queue is called in del_gendisk() for scsi, how can
> > re-bind work as expected, since it needs a completely workable
> > request queue instead of a partially exited one?
> 
> For !GD_OWNS_QUEUE blk_mq_exit_queue is not called from del_gendisk().

That is why scsi can't set GD_OWNS_QUEUE: for any driver, if disk rebind
or similar behavior is needed, GD_OWNS_QUEUE can't be set. That is why
ublk_drv uses a separate queue/disk.


thanks,
Ming
Ming Lei July 20, 2022, 10:23 a.m. UTC | #6
On Wed, Jul 20, 2022 at 11:00:40AM +0200, Christoph Hellwig wrote:
> On Wed, Jul 20, 2022 at 03:47:27PM +0800, Ming Lei wrote:
> > > What is broken in START_DEV/STOP_DEV?  Please explain the semantics you
> > > want and what doesn't work.  FYI, there is nothing in the test suite that
> > > complains.  And besides the obvious block layer bug that Jens found, you
> > > seemed to be perfectly happy with the semantics.
> > 
> > START_DEV calls add_disk(), and STOP_DEV calls del_gendisk(), but if
> > GD_OWNS_QUEUE is set, blk_mq_exit_queue() will be called in
> > del_gendisk(), and then the following START_DEV will get stuck.
> 
> Uh, yeah.  alloc_disk and add_disk are supposed to be paired and
> not split over different ioctls.  The lifetime rules here are
> rather broken.

Even though alloc_disk and add_disk are paired here, GD_OWNS_QUEUE still
can't be set, because the request queue has to stay workable for the newly
allocated/added disk, just like scsi.

So it has nothing to do with pairing alloc_disk()/add_disk().

Not to mention I don't see anything wrong with adding/deleting a disk
multiple times.

Thanks,
Ming
Christoph Hellwig July 20, 2022, 1:08 p.m. UTC | #7
On Wed, Jul 20, 2022 at 06:23:22PM +0800, Ming Lei wrote:
> Even though alloc_disk and add_disk are paired here, GD_OWNS_QUEUE still
> can't be set, because the request queue has to stay workable for the newly
> allocated/added disk, just like scsi.

How so?  dm has totally normal disk/request_queue lifetimes.  The only
caveat is that the blk-mq bits of the queue are added after the initial
non-mq disk allocation.  There is no newly added disk after the disk
and queue are torn down.
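
dm's ordering, heavily condensed (a paraphrase of drivers/md/dm.c and
dm-rq.c from around this time, not the literal code; error handling
trimmed):

	/* alloc_dev(): disk and queue are created up front, bio-based. */
	md->disk = blk_alloc_disk(md->numa_node_id);
	md->queue = md->disk->queue;

	/*
	 * dm_mq_init_request_queue(): only for request-based targets, the
	 * blk-mq bits are attached later to that already-allocated queue.
	 */
	err = blk_mq_alloc_tag_set(md->tag_set);
	if (err)
		goto out;
	err = blk_mq_init_allocated_queue(md->tag_set, md->queue);
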
Ming Lei July 20, 2022, 3:33 p.m. UTC | #8
On Wed, Jul 20, 2022 at 03:08:45PM +0200, Christoph Hellwig wrote:
> On Wed, Jul 20, 2022 at 06:23:22PM +0800, Ming Lei wrote:
> > Even though alloc_disk and add_disk are paired here, GD_OWNS_QUEUE still
> > can't be set, because the request queue has to stay workable for the newly
> > allocated/added disk, just like scsi.
> 
> How so?  dm has totally normal disk/request_queue lifetimes.  The only
> caveat is that the blk-mq bits of the queue are added after the initial
> non-mq disk allocation.  There is no newly added disk after the disk
> and queue are torn down.

I meant that the request queue is supposed to be low-level stuff for implementing
the disk function, and the request queue doesn't have to be released and re-allocated
after each whole disk lifetime (alloc disk, add disk, del_gendisk, release disk).

IMO, the limit just comes from GD_OWNS_QUEUE, which moves releasing some queue
resources into del_gendisk or disk release (as in the fixes you posted), and this
way is fragile, frankly speaking. In theory, allocating the queue and releasing the
queue should be completely symmetrical, but GD_OWNS_QUEUE changes that.

And GD_OWNS_QUEUE can't be set for SCSI, so both modes have to be
supported by the block layer.

Thanks,
Ming
Christoph Hellwig July 21, 2022, 5:12 a.m. UTC | #9
On Wed, Jul 20, 2022 at 11:33:24PM +0800, Ming Lei wrote:
> I meant that the request queue is supposed to be low-level stuff for implementing
> the disk function, and the request queue doesn't have to be released and re-allocated
> after each whole disk lifetime (alloc disk, add disk, del_gendisk, release disk).

It doesn't have to.  But it is pretty damn convenient if we can.

What we can't easily do is re-add a disk after del_gendisk was
called but before the final reference went away, which is what ublk
does currently.  In other words it does not even follow the SCSI
model, which works but is somewhat cumbersome, but comes up with
yet another weird model that is probably going to fall apart pretty
quickly when doing START_DEV, STOP_DEV loops.

Patch

diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index b633d268b90ae..4e0248ef6bec5 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -155,8 +155,6 @@  static DEFINE_MUTEX(ublk_ctl_mutex);
 
 static struct miscdevice ublk_misc;
 
-static struct lock_class_key ublk_bio_compl_lkclass;
-
 static inline bool ublk_can_use_task_work(const struct ublk_queue *ubq)
 {
 	if (IS_BUILTIN(CONFIG_BLK_DEV_UBLK) &&
@@ -633,7 +631,7 @@  static void ublk_commit_rqs(struct blk_mq_hw_ctx *hctx)
 static int ublk_init_hctx(struct blk_mq_hw_ctx *hctx, void *driver_data,
 		unsigned int hctx_idx)
 {
-	struct ublk_device *ub = driver_data;
+	struct ublk_device *ub = hctx->queue->queuedata;
 	struct ublk_queue *ubq = ublk_get_queue(ub, hctx->queue_num);
 
 	hctx->driver_data = ubq;
@@ -1074,8 +1072,6 @@  static void ublk_cdev_rel(struct device *dev)
 {
 	struct ublk_device *ub = container_of(dev, struct ublk_device, cdev_dev);
 
-	blk_mq_destroy_queue(ub->ub_queue);
-
 	put_disk(ub->ub_disk);
 
 	blk_mq_free_tag_set(&ub->tag_set);
@@ -1165,17 +1161,14 @@  static int ublk_add_dev(struct ublk_device *ub)
 	if (err)
 		goto out_deinit_queues;
 
-	ub->ub_queue = blk_mq_init_queue(&ub->tag_set);
-	if (IS_ERR(ub->ub_queue))
-		goto out_cleanup_tags;
-	ub->ub_queue->queuedata = ub;
-
-	disk = ub->ub_disk = blk_mq_alloc_disk_for_queue(ub->ub_queue,
-						 &ublk_bio_compl_lkclass);
+	disk = ub->ub_disk = blk_mq_alloc_disk(&ub->tag_set, ub);
 	if (IS_ERR(disk)) {
 		err = PTR_ERR(disk);
-		goto out_free_request_queue;
+		goto out_cleanup_tags;
 	}
+	ub->ub_queue = ub->ub_disk->queue;
+
+	ub->ub_queue->queuedata = ub;
 
 	blk_queue_logical_block_size(ub->ub_queue, bsize);
 	blk_queue_physical_block_size(ub->ub_queue, bsize);
@@ -1207,8 +1200,6 @@  static int ublk_add_dev(struct ublk_device *ub)
 
 	return 0;
 
-out_free_request_queue:
-	blk_mq_destroy_queue(ub->ub_queue);
 out_cleanup_tags:
 	blk_mq_free_tag_set(&ub->tag_set);
 out_deinit_queues: