[v9,2/4] block: introduce folio awareness and add a bigger size from folio

Message ID 20240830075257.186834-3-kundan.kumar@samsung.com (mailing list archive)
State New, archived
Series block: add larger order folio instead of pages

Commit Message

Kundan Kumar Aug. 30, 2024, 7:52 a.m. UTC
Add a larger length from a folio to the bio in one step and skip merge
processing for the individual pages.

Fetch the offset of the page within its folio. Depending on the folio size
and folio_offset, fetch a larger length. This length may span multiple
contiguous pages if the folio is multi-order.

Using this length, calculate the number of pages that will be added to the
bio and advance the loop counter to skip those pages.

This avoids the overhead of merging pages that belong to the same
large-order folio.

Also folio-ize the functions bio_iov_add_page() and
bio_iov_add_zone_append_page().

Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
Tested-by: Luis Chamberlain <mcgrof@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 block/bio.c | 79 +++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 61 insertions(+), 18 deletions(-)
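
The length calculation described in the commit message can be sketched in
user space as follows. This is only a model under stated assumptions: pages
are taken to be 4 KiB, and struct folio_span, min_size() and
model_folio_span() are hypothetical names that do not exist in the kernel.
The arithmetic mirrors the folio_offset, len and num_pages expressions in
the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. The kernel's PAGE_SHIFT depends on the arch. */
#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  ((size_t)1 << MODEL_PAGE_SHIFT)

/* Hypothetical result type for this sketch; not a kernel structure. */
struct folio_span {
	size_t folio_offset;	/* byte offset of the data within the folio */
	size_t len;		/* bytes taken from this folio */
	size_t num_pages;	/* pages[] entries covered by len */
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * page_idx:   index of the page inside its folio (folio_page_idx() in the patch)
 * offset:     byte offset inside that page, must be below the page size
 * folio_size: folio size in bytes (folio_size() in the patch)
 * left:       bytes still to be added to the bio
 */
static struct folio_span model_folio_span(size_t page_idx, size_t offset,
					  size_t folio_size, size_t left)
{
	struct folio_span s;

	assert(offset < MODEL_PAGE_SIZE);
	assert((page_idx << MODEL_PAGE_SHIFT) + offset < folio_size);

	s.folio_offset = (page_idx << MODEL_PAGE_SHIFT) + offset;
	s.len = min_size(folio_size - s.folio_offset, left);
	/* Same rounding as DIV_ROUND_UP(offset + len, PAGE_SIZE). */
	s.num_pages = (offset + s.len + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	return s;
}
```

For a 16 KiB folio, starting 512 bytes into its second page with 20000 bytes
left, the model yields folio_offset 4608, len 11776 and num_pages 3, so the
loop counter would advance by three pages instead of one.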

Comments

Matthew Wilcox Sept. 9, 2024, 9:17 p.m. UTC | #1
On Fri, Aug 30, 2024 at 01:22:55PM +0530, Kundan Kumar wrote:
> +++ b/block/bio.c
> @@ -931,7 +931,8 @@ static bool bvec_try_merge_page(struct bio_vec *bv, struct page *page,
>  	if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
>  		return false;
>  
> -	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
> +	*same_page = ((vec_end_addr & PAGE_MASK) == ((page_addr + off) &
> +		     PAGE_MASK));
>  	if (!*same_page) {
>  		if (IS_ENABLED(CONFIG_KMSAN))
>  			return false;

This seems like a completely independent change, which has presumably
only now been noticed as a problem, but really should be in a separate
commit and marked for backporting?

> @@ -1280,9 +1312,9 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
>  	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
>  	struct page **pages = (struct page **)bv;
> -	ssize_t size, left;
> -	unsigned len, i = 0;
> -	size_t offset;
> +	ssize_t size;
> +	unsigned int i = 0, num_pages;

I prefer

	unsigned int num_pages, i = 0;

but that's a mild preference.

> @@ -1322,17 +1354,28 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  		goto out;
>  	}
>  
> -	for (left = size, i = 0; left > 0; left -= len, i++) {
> +	for (left = size, i = 0; left > 0; left -= len, i += num_pages) {
>  		struct page *page = pages[i];
> +		struct folio *folio = page_folio(page);
> +
> +		folio_offset = ((size_t)folio_page_idx(folio, page) <<
> +			       PAGE_SHIFT) + offset;
> +
> +		len = min_t(size_t, (folio_size(folio) - folio_offset), left);

Does this need to be min_t?  afaict these are all already size_t.

Other than that last one, looks good.

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Matthew Wilcox Sept. 9, 2024, 9:23 p.m. UTC | #2
On Fri, Aug 30, 2024 at 01:22:55PM +0530, Kundan Kumar wrote:
> @@ -1237,30 +1238,61 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
>  
>  	if (bio->bi_vcnt > 0 &&
>  	    bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
> -				page, len, offset, &same_page)) {
> +				folio_page(folio, 0), len, offset,
> +				&same_page)) {
>  		bio->bi_iter.bi_size += len;
>  		if (same_page)
> -			bio_release_page(bio, page);
> +			bio_release_page(bio, folio_page(folio, 0));

Shouldn't there be a subsequent patch that converts this to
		if (same_page && bio_flagged(bio, BIO_PAGE_PINNED))
			unpin_user_folio(folio, 1)

... also does this mean that 'same_page' is misnamed and it should
really be 'same_folio', in which case, is the bugfix in patch 1 correct?
Kundan Kumar Sept. 10, 2024, 6:11 a.m. UTC | #3
On 09/09/24 10:23PM, Matthew Wilcox wrote:
>On Fri, Aug 30, 2024 at 01:22:55PM +0530, Kundan Kumar wrote:
>> @@ -1237,30 +1238,61 @@ static int bio_iov_add_page(struct bio *bio, struct page *page,
>>
>>  	if (bio->bi_vcnt > 0 &&
>>  	    bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
>> -				page, len, offset, &same_page)) {
>> +				folio_page(folio, 0), len, offset,
>> +				&same_page)) {
>>  		bio->bi_iter.bi_size += len;
>>  		if (same_page)
>> -			bio_release_page(bio, page);
>> +			bio_release_page(bio, folio_page(folio, 0));
>
>Shouldn't there be a subsequent patch that converts this to

Will do it in next version

>		if (same_page && bio_flagged(bio, BIO_PAGE_PINNED))
>			unpin_user_folio(folio, 1)
>
>... also does this mean that 'same_page' is misnamed and it should
>really be 'same_folio', in which case, is the bugfix in patch 1 correct?
>

No, we want to find same page rather than same folio. The logic still
determines whether two addresses lie in the same page. It's just
modified to take care of larger offset values that we may receive in
case of folios.
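
A small user-space sketch of the point made above, assuming 4 KiB pages and
plain 64-bit integers in place of phys_addr_t. The names same_page_old() and
same_page_new() are hypothetical; they only reproduce the two expressions
from the hunk in bvec_try_merge_page().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE ((uint64_t)4096)
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

/* Expression before the patch: only meaningful while off < page size. */
static bool same_page_old(uint64_t vec_end_addr, uint64_t page_addr)
{
	return (vec_end_addr & MODEL_PAGE_MASK) == page_addr;
}

/* Expression after the patch: off may reach past the folio's first page. */
static bool same_page_new(uint64_t vec_end_addr, uint64_t page_addr,
			  uint64_t off)
{
	return (vec_end_addr & MODEL_PAGE_MASK) ==
	       ((page_addr + off) & MODEL_PAGE_MASK);
}
```

With the folio's first page at 0x100000, a previous bvec ending at 0x1011ff
and new data at offset 0x1200 in the folio, both addresses lie in the page
at 0x101000. same_page_new() reports that, while same_page_old() compares
against the first page of the folio and returns false.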
Kundan Kumar Sept. 10, 2024, 6:15 a.m. UTC | #4
On 09/09/24 10:17PM, Matthew Wilcox wrote:
>On Fri, Aug 30, 2024 at 01:22:55PM +0530, Kundan Kumar wrote:
>> +++ b/block/bio.c
>> @@ -931,7 +931,8 @@ static bool bvec_try_merge_page(struct bio_vec *bv, struct page *page,
>>  	if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
>>  		return false;
>>
>> -	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
>> +	*same_page = ((vec_end_addr & PAGE_MASK) == ((page_addr + off) &
>> +		     PAGE_MASK));
>>  	if (!*same_page) {
>>  		if (IS_ENABLED(CONFIG_KMSAN))
>>  			return false;
>
>This seems like a completely independent change, which has presumably
>only now been noticed as a problem, but really should be in a separate
>commit and marked for backporting?

Currently, the offset lies between 0 and PAGE_SIZE. Only after the
changes introduced in this series is folio_offset used, which can be
greater than PAGE_SIZE. So the change need not be backported.
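
The claim can be checked with a short user-space sketch. It assumes 4 KiB
pages and a page-aligned page_addr; patched_rhs() and
rhs_unchanged_for_small_offsets() are hypothetical helper names used only
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE ((uint64_t)4096)
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

/* Right-hand side of the comparison after the patch. */
static uint64_t patched_rhs(uint64_t page_addr, uint64_t off)
{
	return (page_addr + off) & MODEL_PAGE_MASK;
}

/*
 * Returns true if, for every offset below the page size, the patched
 * right-hand side is still page_addr, i.e. the old and new comparisons
 * cannot differ for such offsets.
 */
static bool rhs_unchanged_for_small_offsets(uint64_t page_addr)
{
	uint64_t off;

	/* The argument relies on page_addr being page aligned. */
	assert((page_addr & ~MODEL_PAGE_MASK) == 0);

	for (off = 0; off < MODEL_PAGE_SIZE; off++)
		if (patched_rhs(page_addr, off) != page_addr)
			return false;
	return true;
}
```

The two expressions only diverge once the offset reaches the page size, for
example patched_rhs(0x100000, 4096) is 0x101000 rather than 0x100000.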

Patch

diff --git a/block/bio.c b/block/bio.c
index f9d759315f4d..c8fc97b42410 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -931,7 +931,8 @@  static bool bvec_try_merge_page(struct bio_vec *bv, struct page *page,
 	if (!zone_device_pages_have_same_pgmap(bv->bv_page, page))
 		return false;
 
-	*same_page = ((vec_end_addr & PAGE_MASK) == page_addr);
+	*same_page = ((vec_end_addr & PAGE_MASK) == ((page_addr + off) &
+		     PAGE_MASK));
 	if (!*same_page) {
 		if (IS_ENABLED(CONFIG_KMSAN))
 			return false;
@@ -1227,8 +1228,8 @@  void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 	bio_set_flag(bio, BIO_CLONED);
 }
 
-static int bio_iov_add_page(struct bio *bio, struct page *page,
-		unsigned int len, unsigned int offset)
+static int bio_iov_add_folio(struct bio *bio, struct folio *folio, size_t len,
+			     size_t offset)
 {
 	bool same_page = false;
 
@@ -1237,30 +1238,61 @@  static int bio_iov_add_page(struct bio *bio, struct page *page,
 
 	if (bio->bi_vcnt > 0 &&
 	    bvec_try_merge_page(&bio->bi_io_vec[bio->bi_vcnt - 1],
-				page, len, offset, &same_page)) {
+				folio_page(folio, 0), len, offset,
+				&same_page)) {
 		bio->bi_iter.bi_size += len;
 		if (same_page)
-			bio_release_page(bio, page);
+			bio_release_page(bio, folio_page(folio, 0));
 		return 0;
 	}
-	__bio_add_page(bio, page, len, offset);
+	bio_add_folio_nofail(bio, folio, len, offset);
 	return 0;
 }
 
-static int bio_iov_add_zone_append_page(struct bio *bio, struct page *page,
-		unsigned int len, unsigned int offset)
+static int bio_iov_add_zone_append_folio(struct bio *bio, struct folio *folio,
+					 size_t len, size_t offset)
 {
 	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 	bool same_page = false;
 
-	if (bio_add_hw_page(q, bio, page, len, offset,
+	if (bio_add_hw_folio(q, bio, folio, len, offset,
 			queue_max_zone_append_sectors(q), &same_page) != len)
 		return -EINVAL;
 	if (same_page)
-		bio_release_page(bio, page);
+		bio_release_page(bio, folio_page(folio, 0));
 	return 0;
 }
 
+static unsigned int get_contig_folio_len(unsigned int *num_pages,
+					 struct page **pages, unsigned int i,
+					 struct folio *folio, size_t left,
+					 size_t offset)
+{
+	size_t bytes = left;
+	size_t contig_sz = min_t(size_t, PAGE_SIZE - offset, bytes);
+	unsigned int j;
+
+	/*
+	 * We might COW a single page in the middle of
+	 * a large folio, so we have to check that all
+	 * pages belong to the same folio.
+	 */
+	bytes -= contig_sz;
+	for (j = i + 1; j < i + *num_pages; j++) {
+		size_t next = min_t(size_t, PAGE_SIZE, bytes);
+
+		if (page_folio(pages[j]) != folio ||
+		    pages[j] != pages[j - 1] + 1) {
+			break;
+		}
+		contig_sz += next;
+		bytes -= next;
+	}
+	*num_pages = j - i;
+
+	return contig_sz;
+}
+
 #define PAGE_PTRS_PER_BVEC     (sizeof(struct bio_vec) / sizeof(struct page *))
 
 /**
@@ -1280,9 +1312,9 @@  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 	unsigned short entries_left = bio->bi_max_vecs - bio->bi_vcnt;
 	struct bio_vec *bv = bio->bi_io_vec + bio->bi_vcnt;
 	struct page **pages = (struct page **)bv;
-	ssize_t size, left;
-	unsigned len, i = 0;
-	size_t offset;
+	ssize_t size;
+	unsigned int i = 0, num_pages;
+	size_t offset, folio_offset, left, len;
 	int ret = 0;
 
 	/*
@@ -1322,17 +1354,28 @@  static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		goto out;
 	}
 
-	for (left = size, i = 0; left > 0; left -= len, i++) {
+	for (left = size, i = 0; left > 0; left -= len, i += num_pages) {
 		struct page *page = pages[i];
+		struct folio *folio = page_folio(page);
+
+		folio_offset = ((size_t)folio_page_idx(folio, page) <<
+			       PAGE_SHIFT) + offset;
+
+		len = min_t(size_t, (folio_size(folio) - folio_offset), left);
+
+		num_pages = DIV_ROUND_UP(offset + len, PAGE_SIZE);
+
+		if (num_pages > 1)
+			len = get_contig_folio_len(&num_pages, pages, i,
+						   folio, left, offset);
 
-		len = min_t(size_t, PAGE_SIZE - offset, left);
 		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			ret = bio_iov_add_zone_append_page(bio, page, len,
-					offset);
+			ret = bio_iov_add_zone_append_folio(bio, folio, len,
+					folio_offset);
 			if (ret)
 				break;
 		} else
-			bio_iov_add_page(bio, page, len, offset);
+			bio_iov_add_folio(bio, folio, len, folio_offset);
 
 		offset = 0;
 	}
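
To illustrate the contiguity walk in get_contig_folio_len(), here is a
user-space model. It is a sketch, not kernel code: struct model_page is a
hypothetical stand-in for struct page that carries a frame number and a
folio id, pages are assumed to be 4 KiB, and the comparison of folio ids and
frame numbers stands in for the page_folio() and pointer checks in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE ((size_t)4096)

/* Hypothetical stand-in for struct page. */
struct model_page {
	unsigned long pfn;	/* page frame number */
	int folio_id;		/* which folio the page belongs to */
};

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * On entry *num_pages is the number of pages the caller hopes to cover,
 * starting at pages[i]. On return it holds the number of pages that are
 * really contiguous within the same folio, and the return value is the
 * number of bytes they cover.
 */
static size_t model_contig_folio_len(unsigned int *num_pages,
				     const struct model_page *pages,
				     unsigned int i, size_t left,
				     size_t offset)
{
	size_t bytes = left;
	size_t contig_sz;
	unsigned int j;

	assert(offset < MODEL_PAGE_SIZE);
	assert(*num_pages >= 1);

	/* The first page contributes whatever follows the offset. */
	contig_sz = min_size(MODEL_PAGE_SIZE - offset, bytes);
	bytes -= contig_sz;

	for (j = i + 1; j < i + *num_pages; j++) {
		size_t next = min_size(MODEL_PAGE_SIZE, bytes);

		/* Stop at a page from another folio or a gap in frames. */
		if (pages[j].folio_id != pages[i].folio_id ||
		    pages[j].pfn != pages[j - 1].pfn + 1)
			break;
		contig_sz += next;
		bytes -= next;
	}
	*num_pages = j - i;
	return contig_sz;
}
```

If the third of four pages has been replaced by a page from a different
folio (the COW case named in the patch comment), the walk stops after two
pages and returns only the bytes covered by those two.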