[1/2] block: add bio based rw helper for data buffer
diff mbox series

Message ID 20200406232440.4027-2-chaitanya.kulkarni@wdc.com
State New
Headers show
Series
  • block: add bio based read-write helper
Related show

Commit Message

Chaitanya Kulkarni April 6, 2020, 11:24 p.m. UTC
One of the common application for the file systems and drivers to map
the buffer to the bio and issue I/Os on the block device.

This is a prep patch which adds two helper functions for block layer
which allows file-systems and the drivers to map data buffer on to the
bios and issue bios synchronously using blkdev_issue_rw() or
asynchronously using __blkdev_issue_rw().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
---
 block/blk-lib.c        | 105 +++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |   7 +++
 2 files changed, 112 insertions(+)

Comments

Damien Le Moal April 7, 2020, 12:28 a.m. UTC | #1
On 2020/04/07 8:24, Chaitanya Kulkarni wrote:
> One of the common application for the file systems and drivers to map
> the buffer to the bio and issue I/Os on the block device.

Drivers generally do not do this. At least not lower ones (scsi, nvme, etc). I
guess you are referring to DM drivers. If yes, better be clear about this.
Also, "the buffer" is a little unclear. Better qualify that.

Another thing: "to map the buffer to the bio" is a little strange. The reverse
seems more natural: a bio maps a buffer.

> 
> This is a prep patch which adds two helper functions for block layer
> which allows file-systems and the drivers to map data buffer on to the
> bios and issue bios synchronously using blkdev_issue_rw() or
> asynchronously using __blkdev_issue_rw().

The function names are basically the same but do completely different things.
Better rename that. May be blkdev_issue_rw_sync() and blkdev_issue_rw_async() ?

> 
> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
> ---
>  block/blk-lib.c        | 105 +++++++++++++++++++++++++++++++++++++++++
>  include/linux/blkdev.h |   7 +++
>  2 files changed, 112 insertions(+)
> 
> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 5f2c429d4378..451c367fc0d6 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -405,3 +405,108 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>  	return ret;
>  }
>  EXPORT_SYMBOL(blkdev_issue_zeroout);
> +
> +/**
> + * __blkdev_ssue_rw - issue read/write bios from buffer asynchronously

s/__blkdev_ssue_rw/__blkdev_issue_rw

> + * @bdev:	blockdev to read/write
> + * @buf:	data buffer
> + * @sector:	start sector
> + * @nr_sects:	number of sectors
> + * @op:	REQ_OP_READ or REQ_OP_WRITE
> + * @opf:	flags
> + * @gfp_mask:	memory allocation flags (for bio_alloc)
> + * @biop:	pointer to anchor bio
> + *
> + * Description:
> + *  Generic helper function to map data buffer into bios for read and write ops.
> + *  Returns pointer to the anchored last bio for caller to submit asynchronously
> + *  or synchronously.
> + */
> +int __blkdev_issue_rw(struct block_device *bdev, char *buf, sector_t sector,
> +		      sector_t nr_sects, unsigned op, unsigned opf,
> +		      gfp_t gfp_mask, struct bio **biop)
> +{
> +	bool vm = is_vmalloc_addr(buf) ? true : false;

You do not need the clunky "? true : false" here. is_vmalloc_addr() returns a
bool already.

> +	struct bio *bio = *biop;
> +	unsigned int nr_pages;
> +	struct page *page;
> +	unsigned int off;
> +	unsigned int len;
> +	int bi_size;
> +
> +	if (!bdev_get_queue(bdev))
> +		return -ENXIO;
> +
> +	if (bdev_read_only(bdev))
> +		return -EPERM;

One can read a read-only device. So the check here is not correct. You need a
"&& op == REQ_OP_WRITE" in the condition.

> +
> +	if (!(op == REQ_OP_READ || op == REQ_OP_WRITE))
> +		return -EINVAL;

This probably should be checked before read-only.

> +
> +	while (nr_sects != 0) {
> +		nr_pages = __blkdev_sectors_to_bio_pages(nr_sects);
> +
> +		bio = blk_next_bio(bio, nr_pages, gfp_mask);
> +		bio->bi_iter.bi_sector = sector;
> +		bio_set_dev(bio, bdev);
> +		bio_set_op_attrs(bio, op, 0);
> +
> +		while (nr_sects != 0) {
> +			off = offset_in_page(buf);
> +			page = vm ? vmalloc_to_page(buf) : virt_to_page(buf);
> +			len = min((sector_t) PAGE_SIZE, nr_sects << 9);
> +
> +			bi_size = bio_add_page(bio, page, len, off);

The variable naming is super confusing. bio_add_page() returns 0 if nothing is
added and len if the page was added. So bi_size as a var name is not the best of
name in my opinion.

> +
> +			nr_sects -= bi_size >> 9;
> +			sector += bi_size >> 9;
> +			buf += bi_size;
> +
> +			if (bi_size < len)
> +				break;

You will get either 0 or len from bio_add_page. So the check here is not ideal.
I think it is better to move it up under bio_add_page() and make it:

			len = bioa_add_page(bio, page, len, off);
			if (!len)
				break;

> +		}
> +		cond_resched();
> +	}
> +	*biop = bio;
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(__blkdev_issue_rw);
> +
> +/**
> + * blkdev_execute_rw_sync - issue read/write bios from buffer synchronously
> + * @bdev:	blockdev to read/write
> + * @buf:	data buffer
> + * @sector:	start sector
> + * @count:	number of bytes
> + * @op:	REQ_OP_READ or REQ_OP_WRITE
> + * @opf:	flags
> + * @gfp_mask:	memory allocation flags (for bio_alloc)
> + *
> + * Description:
> + *  Generic helper function to map data buffer buffer into bios for read and
> + *  write requests.
> + */
> +int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector,
> +		     unsigned count, unsigned op, unsigned opf, gfp_t mask)

function name differs between description and declaration.
blkdev_execute_rw_sync() is better than blkdev_issue_rw() in my opinion.

> +{
> +	unsigned int is_vmalloc = is_vmalloc_addr(buf);
> +	sector_t nr_sects = count >> 9;
> +	struct bio *bio = NULL;
> +	int error;
> +
> +	if (is_vmalloc && op == REQ_OP_WRITE)
> +		flush_kernel_vmap_range(buf, count);
> +
> +	opf |= REQ_SYNC;

You can add this directly in the call below.

> +	error = __blkdev_issue_rw(b, buf, sector, nr_sects, op, opf, mask, &bio);
> +	if (!error && bio) {
> +		error = submit_bio_wait(bio);
> +		bio_put(bio);
> +	}
> +
> +	if (is_vmalloc && op == REQ_OP_READ)
> +		invalidate_kernel_vmap_range(buf, count);
> +
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(blkdev_issue_rw);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 32868fbedc9e..cb315b301ad9 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1248,6 +1248,13 @@ static inline int sb_issue_zeroout(struct super_block *sb, sector_t block,
>  				    gfp_mask, 0);
>  }
>  
> +extern int __blkdev_issue_rw(struct block_device *bdev, char *buf,
> +			     sector_t sector, sector_t nr_sects, unsigned op,
> +			     unsigned opf, gfp_t gfp_mask, struct bio **biop);
> +extern int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector,
> +			   unsigned count, unsigned op, unsigned opf,
> +			   gfp_t mask);

No need for the extern I think.

> +
>  extern int blk_verify_command(unsigned char *cmd, fmode_t mode);
>  
>  enum blk_default_limits {
>
Danil Kipnis April 7, 2020, 10:09 a.m. UTC | #2
In __blkdev_issue_rw():
> +               bio = blk_next_bio(bio, nr_pages, gfp_mask);

Doesn't this already submits the bio even before the pages are added?

> +       error = __blkdev_issue_rw(b, buf, sector, nr_sects, op, opf, mask, &bio);
> +       if (!error && bio) {
> +               error = submit_bio_wait(bio);

And then the bio is submitted again in blkdev_issue_rw()...

Or do I understand it wrong?
Chaitanya Kulkarni April 7, 2020, 8:01 p.m. UTC | #3
On 04/06/2020 05:28 PM, Damien Le Moal wrote:
> On 2020/04/07 8:24, Chaitanya Kulkarni wrote:
>> One of the common application for the file systems and drivers to map
>> the buffer to the bio and issue I/Os on the block device.
>
> Drivers generally do not do this. At least not lower ones (scsi, nvme, etc). I
> guess you are referring to DM drivers. If yes, better be clear about this.
> Also, "the buffer" is a little unclear. Better qualify that.
>
Agree most drivers deals with sg_list. Right now there is only one such
an example I know which is rnbd as driver as it is mentioned in the
cover-letter references.
> Another thing: "to map the buffer to the bio" is a little strange. The reverse
> seems more natural: a bio maps a buffer.
>
Make sense, will do.
>>
>> This is a prep patch which adds two helper functions for block layer
>> which allows file-systems and the drivers to map data buffer on to the
>> bios and issue bios synchronously using blkdev_issue_rw() or
>> asynchronously using __blkdev_issue_rw().
>
> The function names are basically the same but do completely different things.
> Better rename that. May be blkdev_issue_rw_sync() and blkdev_issue_rw_async() ?
>
 >

This is exactly I named functions to start with (see the function
documentation which I missed to update), but the pattern in blk-lib.c
uses no such sync and async postfix, which is used is sync and async
context all over the kernel :-

# grep EXPORT block/blk-lib.c
EXPORT_SYMBOL(__blkdev_issue_discard);
EXPORT_SYMBOL(blkdev_issue_discard);, since as it is nr_sects and sector 
calculation doesn't matter after break.
EXPORT_SYMBOL(blkdev_issue_write_same);
EXPORT_SYMBOL(__blkdev_issue_zeroout);
EXPORT_SYMBOL(blkdev_issue_zeroout);
EXPORT_SYMBOL_GPL(__blkdev_issue_rw);
EXPORT_SYMBOL_GPL(blkdev_issue_rw);

In absence of documentation for naming how about we just follow
existing naming convention ?

Which matches the pattern for new API.

>>
>> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
>> ---
>>   block/blk-lib.c        | 105 +++++++++++++++++++++++++++++++++++++++++
>>   include/linux/blkdev.h |   7 +++
>>   2 files changed, 112 insertions(+)
>>
>> diff --git a/block/blk-lib.c b/block/blk-lib.c
>> index 5f2c429d4378..451c367fc0d6 100644
>> --- a/block/blk-lib.c
>> +++ b/block/blk-lib.c
>> @@ -405,3 +405,108 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
>>   	return ret;
>>   }
>>   EXPORT_SYMBOL(blkdev_issue_zeroout);
>> +
>> +/**
>> + * __blkdev_ssue_rw - issue read/write bios from buffer asynchronously
>
> s/__blkdev_ssue_rw/__blkdev_issue_rw
Thanks, will fix.
>
>> + * @bdev:	blockdev to read/write
>> + * @buf:	data buffer
>> + * @sector:	start sector
>> + * @nr_sects:	number of sectors
>> + * @op:	REQ_OP_READ or REQ_OP_WRITE
>> + * @opf:	flags
>> + * @gfp_mask:	memory allocation flags (for bio_alloc)
>> + * @biop:	pointer to anchor bio
>> + *
>> + * Description:
>> + *  Generic helper function to map data buffer into bios for read and write ops.
>> + *  Returns pointer to the anchored last bio for caller to submit asynchronously
>> + *  or synchronously.
>> + */
>> +int __blkdev_issue_rw(struct block_device *bdev, char *buf, sector_t sector,
>> +		      sector_t nr_sects, unsigned op, unsigned opf,
>> +		      gfp_t gfp_mask, struct bio **biop)
>> +{
>> +	bool vm = is_vmalloc_addr(buf) ? true : false;
>
> You do not need the clunky "? true : false" here. is_vmalloc_addr() returns a
> bool already.
>
I will remove it.
>> +	struct bio *bio = *biop;
>> +	unsigned int nr_pages;
>> +	struct page *page;
>> +	unsigned int off;
>> +	unsigned int len;
>> +	int bi_size;
>> +
>> +	if (!bdev_get_queue(bdev))
>> +		return -ENXIO;
>> +
>> +	if (bdev_read_only(bdev))
>> +		return -EPERM;
>
> One can read a read-only device. So the check here is not correct. You need a
> "&& op == REQ_OP_WRITE" in the condition.
>
True, this shouldn't be here, also I'm thinking of lifting these checks 
to caller, we can add it later if needed.
>> +
>> +	if (!(op == REQ_OP_READ || op == REQ_OP_WRITE))
>> +		return -EINVAL;
>
> This probably should be checked before read-only.
>
Yes.
>> +
>> +	while (nr_sects != 0) {
>> +		nr_pages = __blkdev_sectors_to_bio_pages(nr_sects);
>> +
>> +		bio = blk_next_bio(bio, nr_pages, gfp_mask);
>> +		bio->bi_iter.bi_sector = sector;
>> +		bio_set_dev(bio, bdev);
>> +		bio_set_op_attrs(bio, op, 0);
>> +
>> +		while (nr_sects != 0) {
>> +			off = offset_in_page(buf);
>> +			page = vm ? vmalloc_to_page(buf) : virt_to_page(buf);
>> +			len = min((sector_t) PAGE_SIZE, nr_sects << 9);
>> +
>> +			bi_size = bio_add_page(bio, page, len, off);
>
> The variable naming is super confusing. bio_add_page() returns 0 if nothing is
> added and len if the page was added. So bi_size as a var name is not the best of
> name in my opinion.
>
Here bio, page, len, off variables are passed to the bio_add_page() 
function, which has the same names in the function parameter list.
(I'll fix the off to offset)

Regarding the bi_size, given that bio_add_page() returns 
bio->bi_iter.bi_size, isn't bi_size also maps to the what function is 
returning and explains what we are dealing with?

Also, bi_size is used in the blk-lib.c: __blkdev_issue_zero_pages(), if 
that is confusing then we need to change that, what is your preference?

I'll send a cleanup patch __blkdev_issue_zero_pages() also.

>> +
>> +			nr_sects -= bi_size >> 9;
>> +			sector += bi_size >> 9;
>> +			buf += bi_size;
>> +
>> +			if (bi_size < len)
>> +				break;
>
> You will get either 0 or len from bio_add_page. So the check here is not ideal.
> I think it is better to move it up under bio_add_page() and make it:
>
> 			len = bioa_add_page(bio, page, len, off);
> 			if (!len)
> 				break;
I'd still like to keep bi_size, I think I had received a comment about 
using same variable for function call and updating as a return value.

Regarding the check, will move it.
>
>> +		}
>> +		cond_resched();
>> +	}
>> +	*biop = bio;
>> +	return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(__blkdev_issue_rw);
>> +
>> +/**
>> + * blkdev_execute_rw_sync - issue read/write bios from buffer synchronously
>> + * @bdev:	blockdev to read/write
>> + * @buf:	data buffer
>> + * @sector:	start sector
>> + * @count:	number of bytes
>> + * @op:	REQ_OP_READ or REQ_OP_WRITE
>> + * @opf:	flags
>> + * @gfp_mask:	memory allocation flags (for bio_alloc)
>> + *
>> + * Description:
>> + *  Generic helper function to map data buffer buffer into bios for read and
>> + *  write requests.
>> + */
>> +int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector,
>> +		     unsigned count, unsigned op, unsigned opf, gfp_t mask)
>
> function name differs between description and declaration.
> blkdev_execute_rw_sync() is better than blkdev_issue_rw() in my opinion.
>
That was remained from initial version, will fix it, thanks for
pointing it out.
>> +{
>> +	unsigned int is_vmalloc = is_vmalloc_addr(buf);
>> +	sector_t nr_sects = count >> 9;
>> +	struct bio *bio = NULL;
>> +	int error;
>> +
>> +	if (is_vmalloc && op == REQ_OP_WRITE)
>> +		flush_kernel_vmap_range(buf, count);
>> +
>> +	opf |= REQ_SYNC;
>
> You can add this directly in the call below.
>
Breaks calling of __blkdev_issue_rw() to new line. This also adds a new
line which is unavoidable but keeps the function call smooth in one
line.
>> +	error = __blkdev_issue_rw(b, buf, sector, nr_sects, op, opf, mask, &bio);
>> +	if (!error && bio) {
>> +		error = submit_bio_wait(bio);
>> +		bio_put(bio);
>> +	}
>> +
>> +	if (is_vmalloc && op == REQ_OP_READ)
>> +		invalidate_kernel_vmap_range(buf, count);
>> +
>> +	return error;
>> +}
>> +EXPORT_SYMBOL_GPL(blkdev_issue_rw);
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 32868fbedc9e..cb315b301ad9 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -1248,6 +1248,13 @@ static inline int sb_issue_zeroout(struct super_block *sb, sector_t block,
>>   				    gfp_mask, 0);
>>   }
>>
>> +extern int __blkdev_issue_rw(struct block_device *bdev, char *buf,
>> +			     sector_t sector, sector_t nr_sects, unsigned op,
>> +			     unsigned opf, gfp_t gfp_mask, struct bio **biop);
>> +extern int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector,
>> +			   unsigned count, unsigned op, unsigned opf,
>> +			   gfp_t mask);
>
> No need for the extern I think.
>
Again, following the exiting pattern in the file for the header:-

  # grep blkdev_issue include/linux/blkdev.h
extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);

extern int blkdev_issue_write_same(struct block_device *bdev, sector_t
sector,

extern int blkdev_issue_discard(struct block_device *bdev, sector_t
sector,

extern int __blkdev_issue_discard(struct block_device *bdev, sector_t
sector,

extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t
sector,

extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t
sector,

extern int __blkdev_issue_rw(struct block_device *bdev, char *buf,

extern int blkdev_issue_rw(struct block_device *b, char *buf, sector_t
sector,
>> +
>>   extern int blk_verify_command(unsigned char *cmd, fmode_t mode);
>>
>>   enum blk_default_limits {
>>
>
>

Patch
diff mbox series

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 5f2c429d4378..451c367fc0d6 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -405,3 +405,108 @@  int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	return ret;
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * __blkdev_ssue_rw - issue read/write bios from buffer asynchronously
+ * @bdev:	blockdev to read/write
+ * @buf:	data buffer
+ * @sector:	start sector
+ * @nr_sects:	number of sectors
+ * @op:	REQ_OP_READ or REQ_OP_WRITE
+ * @opf:	flags
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ * @biop:	pointer to anchor bio
+ *
+ * Description:
+ *  Generic helper function to map data buffer into bios for read and write ops.
+ *  Returns pointer to the anchored last bio for caller to submit asynchronously
+ *  or synchronously.
+ */
+int __blkdev_issue_rw(struct block_device *bdev, char *buf, sector_t sector,
+		      sector_t nr_sects, unsigned op, unsigned opf,
+		      gfp_t gfp_mask, struct bio **biop)
+{
+	bool vm = is_vmalloc_addr(buf) ? true : false;
+	struct bio *bio = *biop;
+	unsigned int nr_pages;
+	struct page *page;
+	unsigned int off;
+	unsigned int len;
+	int bi_size;
+
+	if (!bdev_get_queue(bdev))
+		return -ENXIO;
+
+	if (bdev_read_only(bdev))
+		return -EPERM;
+
+	if (!(op == REQ_OP_READ || op == REQ_OP_WRITE))
+		return -EINVAL;
+
+	while (nr_sects != 0) {
+		nr_pages = __blkdev_sectors_to_bio_pages(nr_sects);
+
+		bio = blk_next_bio(bio, nr_pages, gfp_mask);
+		bio->bi_iter.bi_sector = sector;
+		bio_set_dev(bio, bdev);
+		bio_set_op_attrs(bio, op, 0);
+
+		while (nr_sects != 0) {
+			off = offset_in_page(buf);
+			page = vm ? vmalloc_to_page(buf) : virt_to_page(buf);
+			len = min((sector_t) PAGE_SIZE, nr_sects << 9);
+
+			bi_size = bio_add_page(bio, page, len, off);
+
+			nr_sects -= bi_size >> 9;
+			sector += bi_size >> 9;
+			buf += bi_size;
+
+			if (bi_size < len)
+				break;
+		}
+		cond_resched();
+	}
+	*biop = bio;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(__blkdev_issue_rw);
+
+/**
+ * blkdev_execute_rw_sync - issue read/write bios from buffer synchronously
+ * @bdev:	blockdev to read/write
+ * @buf:	data buffer
+ * @sector:	start sector
+ * @count:	number of bytes
+ * @op:	REQ_OP_READ or REQ_OP_WRITE
+ * @opf:	flags
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *  Generic helper function to map data buffer buffer into bios for read and
+ *  write requests.
+ */
+int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector,
+		     unsigned count, unsigned op, unsigned opf, gfp_t mask)
+{
+	unsigned int is_vmalloc = is_vmalloc_addr(buf);
+	sector_t nr_sects = count >> 9;
+	struct bio *bio = NULL;
+	int error;
+
+	if (is_vmalloc && op == REQ_OP_WRITE)
+		flush_kernel_vmap_range(buf, count);
+
+	opf |= REQ_SYNC;
+	error = __blkdev_issue_rw(b, buf, sector, nr_sects, op, opf, mask, &bio);
+	if (!error && bio) {
+		error = submit_bio_wait(bio);
+		bio_put(bio);
+	}
+
+	if (is_vmalloc && op == REQ_OP_READ)
+		invalidate_kernel_vmap_range(buf, count);
+
+	return error;
+}
+EXPORT_SYMBOL_GPL(blkdev_issue_rw);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 32868fbedc9e..cb315b301ad9 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1248,6 +1248,13 @@  static inline int sb_issue_zeroout(struct super_block *sb, sector_t block,
 				    gfp_mask, 0);
 }
 
+extern int __blkdev_issue_rw(struct block_device *bdev, char *buf,
+			     sector_t sector, sector_t nr_sects, unsigned op,
+			     unsigned opf, gfp_t gfp_mask, struct bio **biop);
+extern int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector,
+			   unsigned count, unsigned op, unsigned opf,
+			   gfp_t mask);
+
 extern int blk_verify_command(unsigned char *cmd, fmode_t mode);
 
 enum blk_default_limits {