
[V2,05/13] block: only account passthrough IO from userspace

Message ID 20220122111054.1126146-6-ming.lei@redhat.com (mailing list archive)
State Not Applicable
Series block: don't drain file system I/O on del_gendisk

Commit Message

Ming Lei Jan. 22, 2022, 11:10 a.m. UTC
A passthrough request from userspace has an active block_device/disk
associated with it, so it can be accounted via rq->q->disk. For other
passthrough requests, there may be no disk/block_device for the queue,
since either the queue never had a disk or the disk may already have
been deleted.

Add a BLK_MQ_REQ_USER_IO flag so that only passthrough requests from
userspace are accounted.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c            | 13 ++++++++++++-
 drivers/nvme/host/ioctl.c |  2 +-
 drivers/scsi/scsi_ioctl.c |  3 ++-
 include/linux/blk-mq.h    |  2 ++
 4 files changed, 17 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig Jan. 24, 2022, 1:05 p.m. UTC | #1
On Sat, Jan 22, 2022 at 07:10:46PM +0800, Ming Lei wrote:
> A passthrough request from userspace has an active block_device/disk
> associated with it, so it can be accounted via rq->q->disk. For other
> passthrough requests, there may be no disk/block_device for the queue,
> since either the queue never had a disk or the disk may already have
> been deleted.
> 
> Add a BLK_MQ_REQ_USER_IO flag so that only passthrough requests from
> userspace are accounted.

Please explain why you want to change this.

Also this is missing I/O from /dev/sg, CDROM CDDA BPC reading, the tape
drivers and bsg-lib.
Ming Lei Jan. 24, 2022, 11:09 p.m. UTC | #2
On Mon, Jan 24, 2022 at 02:05:55PM +0100, Christoph Hellwig wrote:
> On Sat, Jan 22, 2022 at 07:10:46PM +0800, Ming Lei wrote:
> > A passthrough request from userspace has an active block_device/disk
> > associated with it, so it can be accounted via rq->q->disk. For other
> > passthrough requests, there may be no disk/block_device for the queue,
> > since either the queue never had a disk or the disk may already have
> > been deleted.
> > 
> > Add a BLK_MQ_REQ_USER_IO flag so that only passthrough requests from
> > userspace are accounted.
> 
> Please explain why you want to change this.

Please see the following code:

        /* passthrough requests can hold bios that do not have ->bi_bdev set */
        if (rq->bio && rq->bio->bi_bdev)
                rq->part = rq->bio->bi_bdev;
        else if (rq->q->disk)
                rq->part = rq->q->disk->part0;

q->disk can be cleared by disk_release() right when the lines above are
referencing it, which causes a NULL pointer dereference; there is a similar
issue with any reference to rq->part for passthrough requests not sent from
userspace.
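
For clarity, here is the same snippet again with the racy window annotated
(the comments are added here purely for illustration):

        /* passthrough requests can hold bios that do not have ->bi_bdev set */
        if (rq->bio && rq->bio->bi_bdev)
                rq->part = rq->bio->bi_bdev;
        else if (rq->q->disk)                   /* disk_release() can clear q->disk here ...    */
                rq->part = rq->q->disk->part0;  /* ... so this can dereference NULL/freed memory */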

> 
> Also this is missing I/O from /dev/sg, CDROM CDDA BPC reading, the tape
> drivers and bsg-lib.

Except for CDROM CDDA BPC reading, the others don't have a gendisk associated,
so they don't need this change. And it looks easy to do that for CDROM CDDA BPC
reading.


Thanks,
Ming
Christoph Hellwig Jan. 25, 2022, 6:16 a.m. UTC | #3
On Tue, Jan 25, 2022 at 07:09:09AM +0800, Ming Lei wrote:
> > Please explain why you want to change this.
> 
> Please see the following code:

This needs to go into the commit log.

> 
>         /* passthrough requests can hold bios that do not have ->bi_bdev set */
>         if (rq->bio && rq->bio->bi_bdev)
>                 rq->part = rq->bio->bi_bdev;
>         else if (rq->q->disk)
>                 rq->part = rq->q->disk->part0;
> 
> q->disk can be cleared by disk_release() right when the lines above are
> referencing it, which causes a NULL pointer dereference; there is a similar
> issue with any reference to rq->part for passthrough requests not sent from
> userspace.

So why not key accounting off "rq->bio && rq->bio->bi_bdev"
and remove the need for the flag and the second half of the assignment
above?  That is much less error prone and reduces code size.
Christoph Hellwig Jan. 25, 2022, 7:19 a.m. UTC | #4
On Tue, Jan 25, 2022 at 07:16:34AM +0100, Christoph Hellwig wrote:
> So why not key accounting off "rq->bio && rq->bio->bi_bdev"
> and remove the need for the flag and the second half of the assignment
> above?  That is much less error prone and reduces code size.

Something like this, lightly tested:

---
From 5499d013341b492899d1fecde7680ff8ebd232e9 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Tue, 25 Jan 2022 07:29:06 +0100
Subject: block: remove the part field from struct request

All file system I/O and most userspace passthrough bios have bi_bdev set.
Switch I/O accounting to directly use the bio and stop copying it into a
separate struct request field.

This changes behavior in that e.g. /dev/sgX requests are not accounted
to the gendisk for the SCSI disk any more, which is the correct thing to
do as they never went through that gendisk, and fixes a potential race
when the disk driver is unbound while /dev/sgX I/O is in progress.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-merge.c      | 12 ++++++------
 block/blk-mq.c         | 32 +++++++++++++-------------------
 block/blk.h            |  6 +++---
 include/linux/blk-mq.h |  1 -
 4 files changed, 22 insertions(+), 29 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 4de34a332c9fd..43e46ea2f0152 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -739,11 +739,11 @@ void blk_rq_set_mixed_merge(struct request *rq)
 
 static void blk_account_io_merge_request(struct request *req)
 {
-	if (blk_do_io_stat(req)) {
-		part_stat_lock();
-		part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);
-		part_stat_unlock();
-	}
+	if (!blk_do_io_stat(req))
+		return;
+	part_stat_lock();
+	part_stat_inc(req->bio->bi_bdev, merges[op_stat_group(req_op(req))]);
+	part_stat_unlock();
 }
 
 static enum elv_merge blk_try_req_merge(struct request *req,
@@ -947,7 +947,7 @@ static void blk_account_io_merge_bio(struct request *req)
 		return;
 
 	part_stat_lock();
-	part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);
+	part_stat_inc(req->bio->bi_bdev, merges[op_stat_group(req_op(req))]);
 	part_stat_unlock();
 }
 
diff --git a/block/blk-mq.c b/block/blk-mq.c
index f3bf3358a3bb2..01b3862347965 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -132,10 +132,12 @@ static bool blk_mq_check_inflight(struct request *rq, void *priv,
 {
 	struct mq_inflight *mi = priv;
 
-	if ((!mi->part->bd_partno || rq->part == mi->part) &&
-	    blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
-		mi->inflight[rq_data_dir(rq)]++;
+	if (blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)
+		return true;
+	if (mi->part->bd_partno && rq->bio && rq->bio->bi_bdev != mi->part)
+		return true;
 
+	mi->inflight[rq_data_dir(rq)]++;
 	return true;
 }
 
@@ -331,7 +333,6 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
 	rq->tag = BLK_MQ_NO_TAG;
 	rq->internal_tag = BLK_MQ_NO_TAG;
 	rq->start_time_ns = ktime_get_ns();
-	rq->part = NULL;
 	blk_crypto_rq_set_defaults(rq);
 }
 EXPORT_SYMBOL(blk_rq_init);
@@ -368,7 +369,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		rq->start_time_ns = ktime_get_ns();
 	else
 		rq->start_time_ns = 0;
-	rq->part = NULL;
 #ifdef CONFIG_BLK_RQ_ALLOC_TIME
 	rq->alloc_time_ns = alloc_time_ns;
 #endif
@@ -687,11 +687,11 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
 
 static void blk_account_io_completion(struct request *req, unsigned int bytes)
 {
-	if (req->part && blk_do_io_stat(req)) {
+	if (blk_do_io_stat(req)) {
 		const int sgrp = op_stat_group(req_op(req));
 
 		part_stat_lock();
-		part_stat_add(req->part, sectors[sgrp], bytes >> 9);
+		part_stat_add(req->bio->bi_bdev, sectors[sgrp], bytes >> 9);
 		part_stat_unlock();
 	}
 }
@@ -859,11 +859,12 @@ EXPORT_SYMBOL_GPL(blk_update_request);
 static void __blk_account_io_done(struct request *req, u64 now)
 {
 	const int sgrp = op_stat_group(req_op(req));
+	struct block_device *bdev = req->bio->bi_bdev;
 
 	part_stat_lock();
-	update_io_ticks(req->part, jiffies, true);
-	part_stat_inc(req->part, ios[sgrp]);
-	part_stat_add(req->part, nsecs[sgrp], now - req->start_time_ns);
+	update_io_ticks(bdev, jiffies, true);
+	part_stat_inc(bdev, ios[sgrp]);
+	part_stat_add(bdev, nsecs[sgrp], now - req->start_time_ns);
 	part_stat_unlock();
 }
 
@@ -874,21 +875,14 @@ static inline void blk_account_io_done(struct request *req, u64 now)
 	 * normal IO on queueing nor completion.  Accounting the
 	 * containing request is enough.
 	 */
-	if (blk_do_io_stat(req) && req->part &&
-	    !(req->rq_flags & RQF_FLUSH_SEQ))
+	if (blk_do_io_stat(req) && !(req->rq_flags & RQF_FLUSH_SEQ))
 		__blk_account_io_done(req, now);
 }
 
 static void __blk_account_io_start(struct request *rq)
 {
-	/* passthrough requests can hold bios that do not have ->bi_bdev set */
-	if (rq->bio && rq->bio->bi_bdev)
-		rq->part = rq->bio->bi_bdev;
-	else if (rq->q->disk)
-		rq->part = rq->q->disk->part0;
-
 	part_stat_lock();
-	update_io_ticks(rq->part, jiffies, false);
+	update_io_ticks(rq->bio->bi_bdev, jiffies, false);
 	part_stat_unlock();
 }
 
diff --git a/block/blk.h b/block/blk.h
index 8bd43b3ad33d5..a7a5a5435e09d 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -320,12 +320,12 @@ int blk_dev_init(void);
 /*
  * Contribute to IO statistics IFF:
  *
- *	a) it's attached to a gendisk, and
- *	b) the queue had IO stats enabled when this request was started
+ *	a) the queue had IO stats enabled when this request was started, and
+ *	b) it has an assigned block_device
  */
 static inline bool blk_do_io_stat(struct request *rq)
 {
-	return (rq->rq_flags & RQF_IO_STAT) && rq->q->disk;
+	return (rq->rq_flags & RQF_IO_STAT) && rq->bio && rq->bio->bi_bdev;
 }
 
 void update_io_ticks(struct block_device *part, unsigned long now, bool end);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d319ffa59354a..81769c01e6e4b 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -99,7 +99,6 @@ struct request {
 		struct request *rq_next;
 	};
 
-	struct block_device *part;
 #ifdef CONFIG_BLK_RQ_ALLOC_TIME
 	/* Time that the first bio started allocating this request. */
 	u64 alloc_time_ns;
Ming Lei Jan. 25, 2022, 8:35 a.m. UTC | #5
On Tue, Jan 25, 2022 at 08:19:06AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 25, 2022 at 07:16:34AM +0100, Christoph Hellwig wrote:
> > So why not key accounting off "rq->bio && rq->bio->bi_bdev"
> > and remove the need for the flag and the second half of the assignment
> > above?  That is much less error prone and reduces code size.
> 
> Something like this, lightly tested:
> 
> ---
> From 5499d013341b492899d1fecde7680ff8ebd232e9 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Tue, 25 Jan 2022 07:29:06 +0100
> Subject: block: remove the part field from struct request
> 
> All file system I/O and most userspace passthrough bios have bi_bdev set.
> Switch I/O accounting to directly use the bio and stop copying it into a
> separate struct request field.
> 
> This changes behavior in that e.g. /dev/sgX requests are not accounted
> to the gendisk for the SCSI disk any more, which is the correct thing to
> do as they never went through that gendisk, and fixes a potential race
> when the disk driver is unbound while /dev/sgX I/O is in progress.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-merge.c      | 12 ++++++------
>  block/blk-mq.c         | 32 +++++++++++++-------------------
>  block/blk.h            |  6 +++---
>  include/linux/blk-mq.h |  1 -
>  4 files changed, 22 insertions(+), 29 deletions(-)
> 
> diff --git a/block/blk-merge.c b/block/blk-merge.c
> index 4de34a332c9fd..43e46ea2f0152 100644
> --- a/block/blk-merge.c
> +++ b/block/blk-merge.c
> @@ -739,11 +739,11 @@ void blk_rq_set_mixed_merge(struct request *rq)
>  
>  static void blk_account_io_merge_request(struct request *req)
>  {
> -	if (blk_do_io_stat(req)) {
> -		part_stat_lock();
> -		part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);
> -		part_stat_unlock();
> -	}
> +	if (!blk_do_io_stat(req))
> +		return;
> +	part_stat_lock();
> +	part_stat_inc(req->bio->bi_bdev, merges[op_stat_group(req_op(req))]);
> +	part_stat_unlock();
>  }
>  
>  static enum elv_merge blk_try_req_merge(struct request *req,
> @@ -947,7 +947,7 @@ static void blk_account_io_merge_bio(struct request *req)
>  		return;
>  
>  	part_stat_lock();
> -	part_stat_inc(req->part, merges[op_stat_group(req_op(req))]);
> +	part_stat_inc(req->bio->bi_bdev, merges[op_stat_group(req_op(req))]);
>  	part_stat_unlock();
>  }
>  
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index f3bf3358a3bb2..01b3862347965 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -132,10 +132,12 @@ static bool blk_mq_check_inflight(struct request *rq, void *priv,
>  {
>  	struct mq_inflight *mi = priv;
>  
> -	if ((!mi->part->bd_partno || rq->part == mi->part) &&
> -	    blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
> -		mi->inflight[rq_data_dir(rq)]++;
> +	if (blk_mq_rq_state(rq) != MQ_RQ_IN_FLIGHT)
> +		return true;
> +	if (mi->part->bd_partno && rq->bio && rq->bio->bi_bdev != mi->part)
> +		return true;
>  
> +	mi->inflight[rq_data_dir(rq)]++;
>  	return true;
>  }
>  
> @@ -331,7 +333,6 @@ void blk_rq_init(struct request_queue *q, struct request *rq)
>  	rq->tag = BLK_MQ_NO_TAG;
>  	rq->internal_tag = BLK_MQ_NO_TAG;
>  	rq->start_time_ns = ktime_get_ns();
> -	rq->part = NULL;
>  	blk_crypto_rq_set_defaults(rq);
>  }
>  EXPORT_SYMBOL(blk_rq_init);
> @@ -368,7 +369,6 @@ static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
>  		rq->start_time_ns = ktime_get_ns();
>  	else
>  		rq->start_time_ns = 0;
> -	rq->part = NULL;
>  #ifdef CONFIG_BLK_RQ_ALLOC_TIME
>  	rq->alloc_time_ns = alloc_time_ns;
>  #endif
> @@ -687,11 +687,11 @@ static void req_bio_endio(struct request *rq, struct bio *bio,
>  
>  static void blk_account_io_completion(struct request *req, unsigned int bytes)
>  {
> -	if (req->part && blk_do_io_stat(req)) {
> +	if (blk_do_io_stat(req)) {
>  		const int sgrp = op_stat_group(req_op(req));
>  
>  		part_stat_lock();
> -		part_stat_add(req->part, sectors[sgrp], bytes >> 9);
> +		part_stat_add(req->bio->bi_bdev, sectors[sgrp], bytes >> 9);
>  		part_stat_unlock();
>  	}
>  }
> @@ -859,11 +859,12 @@ EXPORT_SYMBOL_GPL(blk_update_request);
>  static void __blk_account_io_done(struct request *req, u64 now)
>  {
>  	const int sgrp = op_stat_group(req_op(req));
> +	struct block_device *bdev = req->bio->bi_bdev;

Then you need to move blk_account_io_done() to before the blk_update_request()
call, since rq->bio has already been completed and cleared by that point.
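
As a rough sketch of why (simplified from the usual completion path, not a
literal copy of the kernel code):

        void blk_mq_end_request(struct request *rq, blk_status_t error)
        {
                /* completes and drops every bio, so rq->bio is NULL afterwards */
                if (blk_update_request(rq, error, blk_rq_bytes(rq)))
                        BUG();
                /* ends up in blk_account_io_done(); with the patch above,
                 * req->bio->bi_bdev would now be a NULL pointer dereference */
                __blk_mq_end_request(rq, error);
        }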


thanks,
Ming
Ming Lei Jan. 25, 2022, 9:09 a.m. UTC | #6
On Tue, Jan 25, 2022 at 08:19:06AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 25, 2022 at 07:16:34AM +0100, Christoph Hellwig wrote:
> > So why not key accounting off "rq->bio && rq->bio->bi_bdev"
> > and remove the need for the flag and the second half of the assignment
> > above?  That is much less error prone and reduces code size.
> 
> Something like this, lightly tested:
> 

Here is another simple way: account all requests with a bio attached,
except for requests with a kernel buffer.


diff --git a/block/blk-map.c b/block/blk-map.c
index 4526adde0156..1210b51c62ae 100644
--- a/block/blk-map.c
+++ b/block/blk-map.c
@@ -630,6 +630,8 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf,
 	struct bio *bio;
 	int ret;
 
+	rq->rq_flags &= ~RQF_IO_STAT;
+
 	if (len > (queue_max_hw_sectors(q) << 9))
 		return -EINVAL;
 	if (!len || !kbuf)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 72ae9955cf27..eac589d2c340 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -903,7 +903,7 @@ static void __blk_account_io_start(struct request *rq)
 	/* passthrough requests can hold bios that do not have ->bi_bdev set */
 	if (rq->bio && rq->bio->bi_bdev)
 		rq->part = rq->bio->bi_bdev;
-	else if (rq->q->disk)
+	else if (rq->q->disk && rq->bio)
 		rq->part = rq->q->disk->part0;
 
 	part_stat_lock();


Thanks,
Ming
Christoph Hellwig Jan. 26, 2022, 5:50 a.m. UTC | #7
On Tue, Jan 25, 2022 at 05:09:42PM +0800, Ming Lei wrote:
> Here is another simple way: account all requests with a bio attached,
> except for requests with a kernel buffer.

> -	else if (rq->q->disk)
> +	else if (rq->q->disk && rq->bio)
>  		rq->part = rq->q->disk->part0;

Most passthrough requests will have a bio, so you'll still use e.g.
the sd gendisk for sg requests here.

I think the right way would be to just remove this branch entirely.
This means we only account bios with a block_device, which implies
they have a gendisk.
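
Roughly like this in __blk_account_io_start(), assuming blk_do_io_stat() is
also changed to require rq->bio && rq->bio->bi_bdev as in the patch earlier
in this thread (sketch only):

        static void __blk_account_io_start(struct request *rq)
        {
                /* only bios with a block_device get here and are accounted */
                rq->part = rq->bio->bi_bdev;

                part_stat_lock();
                update_io_ticks(rq->part, jiffies, false);
                part_stat_unlock();
        }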
Ming Lei Jan. 26, 2022, 7:21 a.m. UTC | #8
On Wed, Jan 26, 2022 at 06:50:03AM +0100, Christoph Hellwig wrote:
> On Tue, Jan 25, 2022 at 05:09:42PM +0800, Ming Lei wrote:
> > Here is another simple way: account all requests with a bio attached,
> > except for requests with a kernel buffer.
> 
> > -	else if (rq->q->disk)
> > +	else if (rq->q->disk && rq->bio)
> >  		rq->part = rq->q->disk->part0;
> 
> Most passthrough requests will have a bio, so you'll still use e.g.
> the sd gendisk for sg requests here.
> 
> I think the right way would be to just remove this branch entirely.
> This means we only account bios with a block_device, which implies
> they have a gendisk.

That will not account userspace IO, and people may complain.

We can just account passthrough requests from userspace with the patch
in my last email.


Thanks, 
Ming
Christoph Hellwig Jan. 26, 2022, 8:10 a.m. UTC | #9
On Wed, Jan 26, 2022 at 03:21:04PM +0800, Ming Lei wrote:
> > I think the right way would be to just remove this branch entirely.
> > This means we only account bios with a block_device, which implies
> > they have a gendisk.
> 
> That will not account userspace IO, and people may complain.
> 
> We can just account passthrough requests from userspace with the patch
> in my last email.

Let's take a step back:  what I/O do we want to account, and how
do we want to achieve that?

Assuming accounting is enabled:

 - current mainline accounts all I/O on queues that have a gendisk
 - your original patch accounts file system I/O and some passthrough I/O
   that has a special flag set

Dropping the conditional to grab a bdev from the queue leaves us with
the following rule:

 - all I/O that has a bio and bdev is accounted.  This requires
   passthrough I/O to explicitly set the bdev in case we haven't
   done so, and it requires them to have a bio at all
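
Expressed as code, that is just the check from the patch earlier in this
thread:

        static inline bool blk_do_io_stat(struct request *rq)
        {
                return (rq->rq_flags & RQF_IO_STAT) && rq->bio && rq->bio->bi_bdev;
        }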

I guess you are worried about the latter condition, in that we stop
accounting for passthrough commands with no data transfer?
Ming Lei Jan. 26, 2022, 8:33 a.m. UTC | #10
On Wed, Jan 26, 2022 at 09:10:52AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 26, 2022 at 03:21:04PM +0800, Ming Lei wrote:
> > > I think the right way would be to just remove this branch entirely.
> > > This means we only account bios with a block_device, which implies
> > > they have a gendisk.
> > 
> > That will not account userspace IO, and people may complain.
> > 
> > We can just account passthrough requests from userspace with the patch
> > in my last email.
> 
> Let's take a step back:  what I/O do we want to account, and how
> do we want to achieve that?

FS IO, and passthrough IO from userspace at least, since
that is what users care about.

Also, the bdev/disk is guaranteed to be live while such userspace
passthrough IO is in flight.

> 
> Assuming accounting is enabled:
> 
>  - current mainline accounts all I/O on queues that have a gendisk
>  - your original patch accounts file system I/O and some passthrough I/O
>    that has a special flag set

The special flag is just for recognizing userspace passthrough IO.

> 
> Dropping the conditional to grab a bdev from the queue leaves us with
> the following rule:
> 
>  - all I/O that has a bio and bdev is accounted.  This requires
>    passthrough I/O to explicitly set the bdev in case we haven't
>    done so, and it requires them to have a bio at all

That basically means making bio->bi_bdev point to part0, and the
problem is that you have to make sure that part0 won't be released
while this request is in flight.

> 
> I guess you are worried about the latter condition, in that we stop
> accounting for passthrough commands with no data transfer?

No, I meant that bio->bi_bdev isn't set up yet for passthrough requests,
and I am not sure that can be done easily.


Thanks,
Ming
Christoph Hellwig Jan. 26, 2022, 8:49 a.m. UTC | #11
On Wed, Jan 26, 2022 at 04:33:54PM +0800, Ming Lei wrote:
> > I guess you are worried about the latter condition, in that we stop
> > accounting for passthrough commands with no data transfer?
> 
> No, I meant that bio->bi_bdev isn't set up yet for passthrough requests,
> and I am not sure that can be done easily.

Take a look at e.g. nvme_submit_user_cmd and iblock_get_bio.
Ming Lei Jan. 26, 2022, 9:59 a.m. UTC | #12
On Wed, Jan 26, 2022 at 09:49:50AM +0100, Christoph Hellwig wrote:
> On Wed, Jan 26, 2022 at 04:33:54PM +0800, Ming Lei wrote:
> > > I guess you are worried about the latter condition, in that we stop
> > > accounting for passthrough commands with no data transfer?
> > 
> > No, I meant that bio->bi_bdev isn't set up yet for passthrough requests,
> > and I am not sure that can be done easily.
> 
> Take a look at e.g. nvme_submit_user_cmd and iblock_get_bio.

nvme just sets part0 on rq->bio, which is fine since nvme doesn't
support partial completion.

The simplest way could be to assign q->disk->part0 to bio->bi_bdev in both
bio_copy_user_iov() and bio_map_user_iov(), which should cover most cases,
given that user IO is always accounted on the whole device rather than the
partition, even though the command is sent via a partition bdev.
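
Something along these lines, for example (sketch only; the helper name is
made up here for illustration):

        /* would be called from blk-map.c where the passthrough bio is built */
        static void bio_set_default_bdev(struct request_queue *q, struct bio *bio)
        {
                if (!bio->bi_bdev && q->disk)
                        bio->bi_bdev = q->disk->part0;
        }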


Thanks,
Ming
Christoph Hellwig Jan. 26, 2022, 4:37 p.m. UTC | #13
On Wed, Jan 26, 2022 at 05:59:08PM +0800, Ming Lei wrote:
> nvme just sets part0 on rq->bio, which is fine since nvme doesn't
> support partial completion.
> 
> The simplest way could be to assign q->disk->part0 to bio->bi_bdev in both
> bio_copy_user_iov() and bio_map_user_iov(), which should cover most cases,
> given that user IO is always accounted on the whole device rather than the
> partition, even though the command is sent via a partition bdev.

This would be easiest, but it would also set bi_bdev for requests issued
through the SG driver.  And that means these I/Os could be in flight when
detaching a SCSI ULP.  At least without the freeze in del_gendisk
(or the local one in the case of the sd driver).

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index a6d4780580fc..0d25cc5778c9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -336,6 +336,17 @@  void blk_rq_init(struct request_queue *q, struct request *rq)
 }
 EXPORT_SYMBOL(blk_rq_init);
 
+static inline bool blk_mq_io_may_account(struct blk_mq_alloc_data *data)
+{
+	if (!blk_op_is_passthrough(data->cmd_flags))
+		return true;
+
+	if (data->flags & BLK_MQ_REQ_USER_IO)
+		return true;
+
+	return false;
+}
+
 static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 		struct blk_mq_tags *tags, unsigned int tag, u64 alloc_time_ns)
 {
@@ -351,7 +362,7 @@  static struct request *blk_mq_rq_ctx_init(struct blk_mq_alloc_data *data,
 
 	if (data->flags & BLK_MQ_REQ_PM)
 		data->rq_flags |= RQF_PM;
-	if (blk_queue_io_stat(q))
+	if (blk_queue_io_stat(q) && blk_mq_io_may_account(data))
 		data->rq_flags |= RQF_IO_STAT;
 	rq->rq_flags = data->rq_flags;
 
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 22314962842d..f94afc38a6e3 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -66,7 +66,7 @@  static int nvme_submit_user_cmd(struct request_queue *q,
 	void *meta = NULL;
 	int ret;
 
-	req = nvme_alloc_request(q, cmd, 0);
+	req = nvme_alloc_request(q, cmd, BLK_MQ_REQ_USER_IO);
 	if (IS_ERR(req))
 		return PTR_ERR(req);
 
diff --git a/drivers/scsi/scsi_ioctl.c b/drivers/scsi/scsi_ioctl.c
index e13fd380deb6..b262fe06dacc 100644
--- a/drivers/scsi/scsi_ioctl.c
+++ b/drivers/scsi/scsi_ioctl.c
@@ -440,7 +440,8 @@  static int sg_io(struct scsi_device *sdev, struct sg_io_hdr *hdr, fmode_t mode)
 
 	ret = -ENOMEM;
 	rq = scsi_alloc_request(sdev->request_queue, writing ?
-			     REQ_OP_DRV_OUT : REQ_OP_DRV_IN, 0);
+			     REQ_OP_DRV_OUT : REQ_OP_DRV_IN,
+			     BLK_MQ_REQ_USER_IO);
 	if (IS_ERR(rq))
 		return PTR_ERR(rq);
 	req = scsi_req(rq);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d319ffa59354..d2ad2ed11723 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -709,6 +709,8 @@  enum {
 	BLK_MQ_REQ_RESERVED	= (__force blk_mq_req_flags_t)(1 << 1),
 	/* set RQF_PM */
 	BLK_MQ_REQ_PM		= (__force blk_mq_req_flags_t)(1 << 2),
+	/* user IO request */
+	BLK_MQ_REQ_USER_IO	= (__force blk_mq_req_flags_t)(1 << 3),
 };
 
 struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op,