
[v8,01/41] block: add bio_add_zone_append_page

Message ID dece91bca322ce44bed19f2b0f460fa5ded2e512.1601574234.git.naohiro.aota@wdc.com (mailing list archive)
State: New, archived
Series: btrfs: zoned block device support

Commit Message

Naohiro Aota Oct. 1, 2020, 6:36 p.m. UTC
From: Johannes Thumshirn <johannes.thumshirn@wdc.com>

Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
is intended to be used by file systems that directly add pages to a bio
instead of using bio_iov_iter_get_pages().

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 block/bio.c         | 36 ++++++++++++++++++++++++++++++++++++
 include/linux/bio.h |  2 ++
 2 files changed, 38 insertions(+)

Comments

Martin K. Petersen Oct. 2, 2020, 1:39 p.m. UTC | #1
Naohiro/Johannes,

> Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
> is intended to be used by file systems that directly add pages to a bio
> instead of using bio_iov_iter_get_pages().

Why use the hardware limit? For filesystem I/O we generally use the
queue soft limit to prevent I/Os from getting too big, which can lead to
very unpredictable latency in mixed workloads.

max_zone_append_sectors also appears to be gated exclusively by hardware
constraints. Is there a user-controllable limit in place for append
operations?
Damien Le Moal Oct. 5, 2020, 1:46 a.m. UTC | #2
On 2020/10/02 22:39, Martin K. Petersen wrote:
> 
> Naohiro/Johannes,
> 
>> Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
>> is intended to be used by file systems that directly add pages to a bio
>> instead of using bio_iov_iter_get_pages().
> 
> Why use the hardware limit? For filesystem I/O we generally use the
> queue soft limit to prevent I/Os from getting too big, which can lead to
> very unpredictable latency in mixed workloads.

max_zone_append_sectors is already gated by max_hw_sectors, but it is not gated
by max_sectors/BLK_DEF_MAX_SECTORS. If we add such gating to
blk_queue_max_zone_append_sectors(), max_zone_append_sectors would become a soft
limit too. So should we have both max_zone_append_sectors and
max_hw_zone_append_sectors?

Which also means that we should tweak queue_max_sectors_store() to gate
max_zone_append_sectors to that limit upon a user change.

> 
> max_zone_append_sectors also appears to be gated exclusively by hardware
> constraints. Is there a user-controllable limit in place for append
> operations?

No, none that I know of. At the HW level, max_zone_append_sectors is basically
max_hw_sectors. That is gated by the zone size in
blk_queue_max_zone_append_sectors(). If, as mentioned above, we tweak
queue_max_sectors_store() to also change max_zone_append_sectors when needed, we
would then have an indirect user-controllable limit. Or we could implement a
queue_zone_append_max_store() too, I guess. Still, having everything controlled
indirectly through queue_max_sectors_store() is probably simpler.
Christoph Hellwig Oct. 5, 2020, 1:43 p.m. UTC | #3
On Fri, Oct 02, 2020 at 09:39:04AM -0400, Martin K. Petersen wrote:
> 
> Naohiro/Johannes,
> 
> > Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which
> > is intended to be used by file systems that directly add pages to a bio
> > instead of using bio_iov_iter_get_pages().
> 
> Why use the hardware limit? For filesystem I/O we generally use the
> queue soft limit to prevent I/Os from getting too big, which can lead to
> very unpredictable latency in mixed workloads.
> 
> max_zone_append_sectors also appears to be gated exclusively by hardware
> constraints. Is there a user-controllable limit in place for append
> operations?

Zone Append operations can't be split, as they return the first written
LBA to the caller.  If you'd split it you'd now need to return multiple
start LBA values.
Martin K. Petersen Oct. 6, 2020, 1:26 a.m. UTC | #4
Christoph,

>> max_zone_append_sectors also appears to be gated exclusively by hardware
>> constraints. Is there a user-controllable limit in place for append
>> operations?
>
> Zone Append operations can't be split, as they return the first written
> LBA to the caller.  If you'd split it you'd now need to return multiple
> start LBA values.

Yep, this limit would have to be enforced at the top.

Anyway, not sure how important this is given modern drives. I just know
that for our workloads the soft limit is something we almost always end
up tweaking.
Damien Le Moal Oct. 6, 2020, 5:12 a.m. UTC | #5
On 2020/10/06 10:27, Martin K. Petersen wrote:
> 
> Christoph,
> 
>>> max_zone_append_sectors also appears to be gated exclusively by hardware
>>> constraints. Is there a user-controllable limit in place for append
>>> operations?
>>
>> Zone Append operations can't be split, as they return the first written
>> LBA to the caller.  If you'd split it you'd now need to return multiple
>> start LBA values.
> 
> Yep, this limit would have to be enforced at the top.
> 
> Anyway, not sure how important this is given modern drives. I just know
> that for our workloads the soft limit is something we almost always end
> up tweaking.

We can fix the max_zone_append_sectors limit independently of this btrfs series.
Right now, this attribute represents the hardware limit, but we can easily rename
it to max_hw_zone_append_sectors and define a soft max_zone_append_sectors, which
would be min(max_sectors, max_hw_zone_append_sectors). That would give us the
same soft-limit control as regular writes have with max_sectors/max_hw_sectors.

Patch

diff --git a/block/bio.c b/block/bio.c
index e865ea55b9f9..e0d41ccc4e90 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -853,6 +853,42 @@  int bio_add_pc_page(struct request_queue *q, struct bio *bio,
 }
 EXPORT_SYMBOL(bio_add_pc_page);
 
+/**
+ * bio_add_zone_append_page - attempt to add page to zone-append bio
+ * @bio: destination bio
+ * @page: page to add
+ * @len: vec entry length
+ * @offset: vec entry offset
+ *
+ * Attempt to add a page to the bio_vec maplist of a bio that will be submitted
+ * for a zone-append request. This can fail for a number of reasons, such as
+ * the bio being full, the target block device not being a zoned block device,
+ * or other limitations of the target block device. The target block device
+ * must allow bios of up to PAGE_SIZE, so it is always possible to add a single
+ * page to an empty bio.
+ */
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset)
+{
+	struct request_queue *q;
+	bool same_page = false;
+
+	if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND))
+		return 0;
+
+	if (WARN_ON_ONCE(!bio->bi_disk))
+		return 0;
+
+	q = bio->bi_disk->queue;
+
+	if (WARN_ON_ONCE(!blk_queue_is_zoned(q)))
+		return 0;
+
+	return bio_add_hw_page(q, bio, page, len, offset,
+			       queue_max_zone_append_sectors(q), &same_page);
+}
+EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
+
 /**
  * __bio_try_merge_page - try appending data to an existing bvec.
  * @bio: destination bio
diff --git a/include/linux/bio.h b/include/linux/bio.h
index c6d765382926..7ef300cb4e9a 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -442,6 +442,8 @@  void bio_chain(struct bio *, struct bio *);
 extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int);
 extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *,
 			   unsigned int, unsigned int);
+int bio_add_zone_append_page(struct bio *bio, struct page *page,
+			     unsigned int len, unsigned int offset);
 bool __bio_try_merge_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off, bool *same_page);
 void __bio_add_page(struct bio *bio, struct page *page,