diff mbox series

[V2,09/13] scsi: force unfreezing queue into atomic mode

Message ID 20220122111054.1126146-10-ming.lei@redhat.com (mailing list archive)
State Not Applicable
Headers show
Series block: don't drain file system I/O on del_gendisk | expand

Commit Message

Ming Lei Jan. 22, 2022, 11:10 a.m. UTC
In scsi_disk_release() request queue is frozen for clearing
disk->private_data, and there can't be any FS IO issued to
this queue, and only private passthrough request will be handled, so
force unfreezing queue into atomic mode.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 drivers/scsi/sd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Christoph Hellwig Jan. 24, 2022, 1:15 p.m. UTC | #1
On Sat, Jan 22, 2022 at 07:10:50PM +0800, Ming Lei wrote:
> In scsi_disk_release() request queue is frozen for clearing
> disk->private_data, and there can't be any FS IO issued to
> this queue, and only private passthrough request will be handled, so
> force unfreezing queue into atomic mode.
> 
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  drivers/scsi/sd.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 0e73c3f2f381..27f04c860f00 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3670,7 +3670,7 @@ static void scsi_disk_release(struct device *dev)
>  	 * in case multiple processes open a /dev/sd... node concurrently.
>  	 */
>  	blk_mq_freeze_queue(q);
> -	blk_mq_unfreeze_queue(q);
> +	__blk_mq_unfreeze_queue(q, true);

I think the right thing here is to drop the freeze/unfreeze pair.
Now that del_gendisk properly freezes the queue, we don't need this
protection as the issue that Bart fixed with it can't happen any more.
Ming Lei Jan. 24, 2022, 11:21 p.m. UTC | #2
On Mon, Jan 24, 2022 at 02:15:16PM +0100, Christoph Hellwig wrote:
> On Sat, Jan 22, 2022 at 07:10:50PM +0800, Ming Lei wrote:
> > In scsi_disk_release() request queue is frozen for clearing
> > disk->private_data, and there can't be any FS IO issued to
> > this queue, and only private passthrough request will be handled, so
> > force unfreezing queue into atomic mode.
> > 
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  drivers/scsi/sd.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> > index 0e73c3f2f381..27f04c860f00 100644
> > --- a/drivers/scsi/sd.c
> > +++ b/drivers/scsi/sd.c
> > @@ -3670,7 +3670,7 @@ static void scsi_disk_release(struct device *dev)
> >  	 * in case multiple processes open a /dev/sd... node concurrently.
> >  	 */
> >  	blk_mq_freeze_queue(q);
> > -	blk_mq_unfreeze_queue(q);
> > +	__blk_mq_unfreeze_queue(q, true);
> 
> I think the right thing here is to drop the freeze/unfreeze pair.
> Now that del_gendisk properly freezes the queue, we don't need this
> protection as the issue that Bart fixed with it can't happen any more.

As you see, the last patch removes freeze/unfreeze pair in del_gendisk(),
which looks not very useful: it can't drain IO on bio based driver, and
del_gendisk() is supposed to provide consistent behavior for both request
and bio based driver.


Thanks,
Ming
Christoph Hellwig Jan. 25, 2022, 7:27 a.m. UTC | #3
On Tue, Jan 25, 2022 at 07:21:55AM +0800, Ming Lei wrote:
> > > @@ -3670,7 +3670,7 @@ static void scsi_disk_release(struct device *dev)
> > >  	 * in case multiple processes open a /dev/sd... node concurrently.
> > >  	 */
> > >  	blk_mq_freeze_queue(q);
> > > -	blk_mq_unfreeze_queue(q);
> > > +	__blk_mq_unfreeze_queue(q, true);
> > 
> > I think the right thing here is to drop the freeze/unfreeze pair.
> > Now that del_gendisk properly freezes the queue, we don't need this
> > protection as the issue that Bart fixed with it can't happen any more.
> 
> As you see, the last patch removes freeze/unfreeze pair in del_gendisk(),
> which looks not very useful: it can't drain IO on bio based driver, and
> del_gendisk() is supposed to provide consistent behavior for both request
> and bio based driver.

So what is the advantage of trying to remove the freeze from where
it belongs (common unregister code) while keeping it where it is a bandaid
(driver specific unregister code)?
Ming Lei Jan. 25, 2022, 8:54 a.m. UTC | #4
On Tue, Jan 25, 2022 at 08:27:39AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 25, 2022 at 07:21:55AM +0800, Ming Lei wrote:
> > > > @@ -3670,7 +3670,7 @@ static void scsi_disk_release(struct device *dev)
> > > >  	 * in case multiple processes open a /dev/sd... node concurrently.
> > > >  	 */
> > > >  	blk_mq_freeze_queue(q);
> > > > -	blk_mq_unfreeze_queue(q);
> > > > +	__blk_mq_unfreeze_queue(q, true);
> > > 
> > > I think the right thing here is to drop the freeze/unfreeze pair.
> > > Now that del_gendisk properly freezes the queue, we don't need this
> > > protection as the issue that Bart fixed with it can't happen any more.
> > 
> > As you see, the last patch removes freeze/unfreeze pair in del_gendisk(),
> > which looks not very useful: it can't drain IO on bio based driver, and
> > del_gendisk() is supposed to provide consistent behavior for both request
> > and bio based driver.
> 
> So what is the advantage of trying to remove the freeze from where
> it belongs (common unregister code) while keeping it where it is a bandaid
> (driver specific unregister code)?

freeze in common unregister code is actually not good, because it provide
nothing for bio based driver, so we can't move blk-cgroup shutdown into
del_gendisk. Also we can't move elevator shutdown to del_gendisk for
similar reason.

Secondly freeze is pretty slow in percpu mode, so why slow down removing every
disk just for scsi's bandaid?


Thanks,
Ming
Christoph Hellwig Jan. 26, 2022, 8:15 a.m. UTC | #5
On Tue, Jan 25, 2022 at 04:54:43PM +0800, Ming Lei wrote:
> > So what is the advantage of trying to remove the freeze from where
> > it belongs (common unregister code) while keeping it where it is a bandaid
> > (driver specific unregister code)?
> 
> freeze in common unregister code is actually not good, because it provide
> nothing for bio based driver, so we can't move blk-cgroup shutdown into
> del_gendisk. Also we can't move elevator shutdown to del_gendisk for
> similar reason.
> 
> Secondly freeze is pretty slow in percpu mode, so why slow down removing every
> disk just for scsi's bandaid?

I'd frame this differently:

 - del_gendisk is the right place to freeze the queue, as that is where the
   gendisk is unregistered and all fs I/O needs to stop.  If we don't get
   all aspects right we need to fix.  As mentioned I'm already looking into
   keeping a reference for the bio life time for bio based drivers.
 - SCSI is the only driver that ever submits passthrough I/O without the
   gendisk.  So doing an additional freeze during request_queue teardown
   is only needed for SCSI and we can eventually remove that for everyone
   else.  And most of your series already does really good work towards that
   goal!

> 
> 
> Thanks,
> Ming
---end quoted text---
diff mbox series

Patch

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 0e73c3f2f381..27f04c860f00 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3670,7 +3670,7 @@  static void scsi_disk_release(struct device *dev)
 	 * in case multiple processes open a /dev/sd... node concurrently.
 	 */
 	blk_mq_freeze_queue(q);
-	blk_mq_unfreeze_queue(q);
+	__blk_mq_unfreeze_queue(q, true);
 
 	disk->private_data = NULL;
 	put_disk(disk);