[V2,2/2] block: try to make aligned bio in case of big chunk IO

Message ID 20231109082827.2276696-3-ming.lei@redhat.com (mailing list archive)
State New, archived
Series: block: try to make aligned bio in case of big chunk IO

Commit Message

Ming Lei Nov. 9, 2023, 8:28 a.m. UTC
In case of big chunk sequential IO, the bio size is often not aligned
with the queue's max IO size because of multipage bvecs, and bio
splitting then produces unaligned & small bios, so sequential IO perf
drops. Try to align the bio with the max IO size to avoid this issue.

Provide a 'max_size' hint to iov_iter_extract_pages() when the bio is
close to full, and try to keep the bio aligned with the max IO size, so
that bio & iov_iter reverts are minimized. In my 1GB IO test over a VM
with 2GB of RAM, when memory becomes highly fragmented, the revert
ratio (reverted bytes / buffer size) stays as small as 0.5% with this
algorithm.

Ed Tsai reported that this change improves 64MB read/write by 15%~25%
in the Antutu V10 Storage Test.

Reported-by: Ed Tsai <ed.tsai@mediatek.com>
Closes: https://lore.kernel.org/linux-block/20231025092255.27930-1-ed.tsai@mediatek.com/
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/bio.c            | 116 +++++++++++++++++++++++++++++++++++++++--
 include/linux/blkdev.h |   5 ++
 2 files changed, 118 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig Nov. 9, 2023, 2:39 p.m. UTC | #1
> +/*
> + * Figure out max_size hint of iov_iter_extract_pages() for minimizing
> + * bio & iov iter revert so that bio can be aligned with max io size.
> + */
> +static unsigned int bio_get_buffer_size_hint(const struct bio *bio,
> +		unsigned int left)
> +{
> +	unsigned int nr_bvecs = bio->bi_max_vecs - bio->bi_vcnt;
> +	unsigned int size, predicted_space, max_bytes;
> +	unsigned int space = nr_bvecs << PAGE_SHIFT;
> +	unsigned int align_deviation;
> +
> +	/* If we have enough space really, just try to get all pages */
> +	if (!bio->bi_bdev || nr_bvecs >= (bio->bi_max_vecs / 4) ||
> +			!bio->bi_vcnt || left <= space)

We need to decide whether the bdev is mandatory and we can rely
on it, or stop adding conditionals based on it.  NAK for anything
adding more if (bio->bi_bdev) checks in this path.

But more importantly, I'm not sure why we need all this extra code to
start with.  If you just did an extract for more space than we want
to submit, just release the pages that we grabbed but don't want to
add, instead of adding them to the bio_vec.

Remember this is an absolute fast path for high-IOPS I/O; we should
avoid adding cruft to it.
kernel test robot Nov. 9, 2023, 7:30 p.m. UTC | #2
Hi Ming,

kernel test robot noticed the following build warnings:

[auto build test WARNING on axboe-block/for-next]
[also build test WARNING on linus/master v6.6 next-20231109]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ming-Lei/block-don-t-call-into-iov_iter_revert-if-nothing-is-left/20231109-190100
base:   https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next
patch link:    https://lore.kernel.org/r/20231109082827.2276696-3-ming.lei%40redhat.com
patch subject: [PATCH V2 2/2] block: try to make aligned bio in case of big chunk IO
config: arm-davinci_all_defconfig (https://download.01.org/0day-ci/archive/20231110/202311100354.HYfqOQ7o-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231110/202311100354.HYfqOQ7o-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202311100354.HYfqOQ7o-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> block/bio.c:1228:21: warning: variable 'size' is uninitialized when used here [-Wuninitialized]
    1228 |                 return UINT_MAX - size;
         |                                   ^~~~
   block/bio.c:1221:19: note: initialize the variable 'size' to silence this warning
    1221 |         unsigned int size, predicted_space, max_bytes;
         |                          ^
         |                           = 0
   1 warning generated.

Kconfig warnings: (for reference only)
   WARNING: unmet direct dependencies detected for PINCTRL_SINGLE
   Depends on [n]: PINCTRL [=n] && OF [=y] && HAS_IOMEM [=y]
   Selected by [y]:
   - ARCH_DAVINCI [=y] && ARCH_MULTI_V5 [=y] && CPU_LITTLE_ENDIAN [=y]


vim +/size +1228 block/bio.c

  1212	
  1213	/*
  1214	 * Figure out max_size hint of iov_iter_extract_pages() for minimizing
  1215	 * bio & iov iter revert so that bio can be aligned with max io size.
  1216	 */
  1217	static unsigned int bio_get_buffer_size_hint(const struct bio *bio,
  1218			unsigned int left)
  1219	{
  1220		unsigned int nr_bvecs = bio->bi_max_vecs - bio->bi_vcnt;
  1221		unsigned int size, predicted_space, max_bytes;
  1222		unsigned int space = nr_bvecs << PAGE_SHIFT;
  1223		unsigned int align_deviation;
  1224	
  1225		/* If we have enough space really, just try to get all pages */
  1226		if (!bio->bi_bdev || nr_bvecs >= (bio->bi_max_vecs / 4) ||
  1227				!bio->bi_vcnt || left <= space)
> 1228			return UINT_MAX - size;
  1229	
  1230		max_bytes = bdev_max_io_bytes(bio->bi_bdev);
  1231		size = bio->bi_iter.bi_size;
  1232	
  1233		/*
  1234		 * One bvec can hold physically continuous page frames with
  1235		 * multipage bvec and bytes in these pages can be pretty big, so
  1236		 * predict the available space by averaging bytes on all bvecs
  1237		 */
  1238		predicted_space = size * nr_bvecs / bio->bi_vcnt;
  1239		/*
  1240		 * If predicted space is bigger than max io bytes and at least two
  1241		 * vectors left, ask for all pages
  1242		 */
  1243		if (predicted_space > max_bytes && nr_bvecs > 2)
  1244			return UINT_MAX - size;
  1245	
  1246		/*
  1247		 * This bio is close to be full, and stop to add pages if it is
  1248		 * basically aligned, otherwise just get & add pages if the bio
  1249		 * can be kept as aligned, so that we can minimize bio & iov iter
  1250		 * revert
  1251		 */
  1252		align_deviation = max_t(unsigned int, 16U * 1024, max_bytes / 16);
  1253		if ((size & (max_bytes - 1)) > align_deviation) {
  1254			unsigned aligned_bytes = max_bytes - (size & (max_bytes - 1));
  1255	
  1256			/* try to keep bio aligned if we have enough data and space */
  1257			if (aligned_bytes <= left && aligned_bytes <= predicted_space)
  1258				return aligned_bytes;
  1259		}
  1260	
  1261		return 0;
  1262	}
  1263

Patch

diff --git a/block/bio.c b/block/bio.c
index 09a5e71a0372..e360ac052764 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1210,6 +1210,57 @@  static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
 	return 0;
 }
 
+/*
+ * Figure out max_size hint of iov_iter_extract_pages() for minimizing
+ * bio & iov iter revert so that bio can be aligned with max io size.
+ */
+static unsigned int bio_get_buffer_size_hint(const struct bio *bio,
+		unsigned int left)
+{
+	unsigned int nr_bvecs = bio->bi_max_vecs - bio->bi_vcnt;
+	unsigned int size, predicted_space, max_bytes;
+	unsigned int space = nr_bvecs << PAGE_SHIFT;
+	unsigned int align_deviation;
+
+	/* If we have enough space really, just try to get all pages */
+	if (!bio->bi_bdev || nr_bvecs >= (bio->bi_max_vecs / 4) ||
+			!bio->bi_vcnt || left <= space)
+		return UINT_MAX - size;
+
+	max_bytes = bdev_max_io_bytes(bio->bi_bdev);
+	size = bio->bi_iter.bi_size;
+
+	/*
+	 * One bvec can hold physically continuous page frames with
+	 * multipage bvec and bytes in these pages can be pretty big, so
+	 * predict the available space by averaging bytes on all bvecs
+	 */
+	predicted_space = size * nr_bvecs / bio->bi_vcnt;
+	/*
+	 * If predicted space is bigger than max io bytes and at least two
+	 * vectors left, ask for all pages
+	 */
+	if (predicted_space > max_bytes && nr_bvecs > 2)
+		return UINT_MAX - size;
+
+	/*
+	 * This bio is close to be full, and stop to add pages if it is
+	 * basically aligned, otherwise just get & add pages if the bio
+	 * can be kept as aligned, so that we can minimize bio & iov iter
+	 * revert
+	 */
+	align_deviation = max_t(unsigned int, 16U * 1024, max_bytes / 16);
+	if ((size & (max_bytes - 1)) > align_deviation) {
+		unsigned aligned_bytes = max_bytes - (size & (max_bytes - 1));
+
+		/* try to keep bio aligned if we have enough data and space */
+		if (aligned_bytes <= left && aligned_bytes <= predicted_space)
+			return aligned_bytes;
+	}
+
+	return 0;
+}
+
 #define PAGE_PTRS_PER_BVEC     (sizeof(struct bio_vec) / sizeof(struct page *))
 
 /**
@@ -1229,7 +1280,7 @@  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
-	ssize_t size, left;
+	ssize_t size, left, max_size;
 	unsigned len, i = 0;
 	size_t offset;
 	int ret = 0;
@@ -1245,6 +1296,10 @@  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	if (bio->bi_bdev && blk_queue_pci_p2pdma(bio->bi_bdev->bd_disk->queue))
 		extraction_flags |= ITER_ALLOW_P2PDMA;
 
+	max_size = bio_get_buffer_size_hint(bio, iov_iter_count(iter));
+	if (!max_size)
+		return -E2BIG;
+
 	/*
 	 * Each segment in the iov is required to be a block size multiple.
 	 * However, we may not be able to get the entire segment if it spans
@@ -1252,8 +1307,7 @@  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	 * result to ensure the bio's total size is correct. The remainder of
 	 * the iov data will be picked up in the next bio iteration.
 	 */
-	size = iov_iter_extract_pages(iter, &pages,
-				      UINT_MAX - bio->bi_iter.bi_size,
+	size = iov_iter_extract_pages(iter, &pages, max_size,
 				      nr_pages, extraction_flags, &offset);
 	if (unlikely(size <= 0))
 		return size ? size : -EFAULT;
@@ -1298,6 +1352,46 @@  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	return ret;
 }
 
+/* should only be called before submission */
+static void bio_shrink(struct bio *bio, unsigned bytes)
+{
+	unsigned int size = bio->bi_iter.bi_size;
+	int idx;
+
+	if (bytes >= size)
+		return;
+
+	WARN_ON_ONCE(bio_flagged(bio, BIO_CLONED));
+
+	idx = bio->bi_vcnt - 1;
+	bio->bi_iter.bi_size -= bytes;
+	while (bytes > 0) {
+		struct bio_vec *bv = &bio->bi_io_vec[idx];
+		unsigned int len = min_t(unsigned, bv->bv_len, bytes);
+
+		bytes -= len;
+		bv->bv_len -= len;
+		if (!bv->bv_len) {
+			bio_release_page(bio, bv->bv_page);
+			idx--;
+		}
+	}
+	WARN_ON_ONCE(idx < 0);
+	bio->bi_vcnt = idx + 1;
+}
+
+static unsigned bio_align_with_io_size(struct bio *bio)
+{
+	unsigned int size = bio->bi_iter.bi_size;
+	unsigned int trim = size & (bdev_max_io_bytes(bio->bi_bdev) - 1);
+
+	if (trim && trim != size) {
+		bio_shrink(bio, trim);
+		return trim;
+	}
+	return 0;
+}
+
 /**
  * bio_iov_iter_get_pages - add user or kernel pages to a bio
  * @bio: bio to add pages to
@@ -1337,6 +1431,22 @@  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		ret = __bio_iov_iter_get_pages(bio, iter);
 	} while (!ret && iov_iter_count(iter) && !bio_full(bio, 0));
 
+
+	/*
+	 * If we still have data and bio is full, this bio size may not be
+	 * aligned with max io size, small bio can be caused by split, try
+	 * to avoid this situation by aligning bio with max io size.
+	 *
+	 * Big chunk of sequential IO workload can benefit from this way.
+	 */
+	if (!ret && iov_iter_count(iter) && bio->bi_bdev &&
+			bio_op(bio) != REQ_OP_ZONE_APPEND) {
+		unsigned trim = bio_align_with_io_size(bio);
+
+		if (trim)
+			iov_iter_revert(iter, trim);
+	}
+
 	return bio->bi_vcnt ? 0 : ret;
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index eef450f25982..2d275cdc39d8 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1151,6 +1151,11 @@  static inline unsigned queue_logical_block_size(const struct request_queue *q)
 	return retval;
 }
 
+static inline unsigned int bdev_max_io_bytes(struct block_device *bdev)
+{
+	return queue_max_bytes(bdev_get_queue(bdev));
+}
+
 static inline unsigned int bdev_logical_block_size(struct block_device *bdev)
 {
 	return queue_logical_block_size(bdev_get_queue(bdev));