Message ID | dece91bca322ce44bed19f2b0f460fa5ded2e512.1601574234.git.naohiro.aota@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: zoned block device support | expand |
Naohiro/Johannes, > Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which > is intended to be used by file systems that directly add pages to a bio > instead of using bio_iov_iter_get_pages(). Why use the hardware limit? For filesystem I/O we generally use the queue soft limit to prevent I/Os getting too big which can lead to very unpredictable latency in mixed workloads. max_zone_append_sectors also appears to be gated exclusively by hardware constraints. Is there user-controllable limit in place for append operations?
On 2020/10/02 22:39, Martin K. Petersen wrote: > > Naohiro/Johannes, > >> Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which >> is intended to be used by file systems that directly add pages to a bio >> instead of using bio_iov_iter_get_pages(). > > Why use the hardware limit? For filesystem I/O we generally use the > queue soft limit to prevent I/Os getting too big which can lead to very > unpredictable latency in mixed workloads. max_zone_append_sectors is already gated by max_hw_sectors, but it is not gated by max_sectors/BLK_DEF_MAX_SECTORS. If we add such gating to blk_queue_max_zone_append_sectors(), max_zone_append_sectors would become a soft limit too. So should we have max_zone_append_sectors and max_hw_zone_append_sectors ? Which also means that we should tweak queue_max_sectors_store() to gate max_zone_append_sectors to that limit upon a user change. > > max_zone_append_sectors also appears to be gated exclusively by hardware > constraints. Is there user-controllable limit in place for append > operations? No, none that I know of. At the HW level, max_zone_append_sectors is basically max_hw_sectors. That is gated by the zone size in blk_queue_max_zone_append_sectors(). If as mentioned above we tweak queue_max_sectors_store() to change max_zone_append_sectors too if needed, we would then have an indirect user-controllable limit. Or we could implement a queue_zone_append_max_store() too I guess. Yet, having everything indirectly controlled through queue_max_sectors_store() is probably simpler I think.
On Fri, Oct 02, 2020 at 09:39:04AM -0400, Martin K. Petersen wrote: > > Naohiro/Johannes, > > > Add bio_add_zone_append_page(), a wrapper around bio_add_hw_page() which > > is intended to be used by file systems that directly add pages to a bio > > instead of using bio_iov_iter_get_pages(). > > Why use the hardware limit? For filesystem I/O we generally use the > queue soft limit to prevent I/Os getting too big which can lead to very > unpredictable latency in mixed workloads. > > max_zone_append_sectors also appears to be gated exclusively by hardware > constraints. Is there user-controllable limit in place for append > operations? Zone Append operations can't be split, as they return the first written LBA to the caller. If you'd split it you'd now need to return multiple start LBA values.
Christoph, >> max_zone_append_sectors also appears to be gated exclusively by hardware >> constraints. Is there user-controllable limit in place for append >> operations? > > Zone Append operations can't be split, as they return the first written > LBA to the caller. If you'd split it you'd now need to return multiple > start LBA values. Yep, this limit would have to be enforced at the top. Anyway, not sure how important this is given modern drives. I just know that for our workloads the soft limit is something we almost always end up tweaking.
On 2020/10/06 10:27, Martin K. Petersen wrote: > > Christoph, > >>> max_zone_append_sectors also appears to be gated exclusively by hardware >>> constraints. Is there user-controllable limit in place for append >>> operations? >> >> Zone Append operations can't be split, as they return the first written >> LBA to the caller. If you'd split it you'd now need to return multiple >> start LBA values. > > Yep, this limit would have to be enforced at the top. > > Anyway, not sure how important this is given modern drives. I just know > that for our workloads the soft limit is something we almost always end > up tweaking. We can fix the max_zone_append_sectors limit independently of this btrfs series. Right now, this attribute represents the hardware limit but we can easily rename it to max_hw_zone_append_sectors and define a soft max_zone_append_sectors which would be min(max_sectors, max_hw_zone_append_sectors). That will result in similar soft limit control as for regular writes & max_sectors/max_hw_sectors.
diff --git a/block/bio.c b/block/bio.c index e865ea55b9f9..e0d41ccc4e90 100644 --- a/block/bio.c +++ b/block/bio.c @@ -853,6 +853,42 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio, } EXPORT_SYMBOL(bio_add_pc_page); +/** + * bio_add_zone_append_page - attempt to add page to zone-append bio + * @bio: destination bio + * @page: page to add + * @len: vec entry length + * @offset: vec entry offset + * + * Attempt to add a page to the bio_vec maplist of a bio that will be submitted + * for a zone-append request. This can fail for a number of reasons, such as the + * bio being full or the target block device is not a zoned block device or + * other limitations of the target block device. The target block device must + * allow bio's up to PAGE_SIZE, so it is always possible to add a single page + * to an empty bio. + */ +int bio_add_zone_append_page(struct bio *bio, struct page *page, + unsigned int len, unsigned int offset) +{ + struct request_queue *q; + bool same_page = false; + + if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND)) + return 0; + + if (WARN_ON_ONCE(!bio->bi_disk)) + return 0; + + q = bio->bi_disk->queue; + + if (WARN_ON_ONCE(!blk_queue_is_zoned(q))) + return 0; + + return bio_add_hw_page(q, bio, page, len, offset, + queue_max_zone_append_sectors(q), &same_page); +} +EXPORT_SYMBOL_GPL(bio_add_zone_append_page); + /** * __bio_try_merge_page - try appending data to an existing bvec. * @bio: destination bio diff --git a/include/linux/bio.h b/include/linux/bio.h index c6d765382926..7ef300cb4e9a 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -442,6 +442,8 @@ void bio_chain(struct bio *, struct bio *); extern int bio_add_page(struct bio *, struct page *, unsigned int,unsigned int); extern int bio_add_pc_page(struct request_queue *, struct bio *, struct page *, unsigned int, unsigned int); +int bio_add_zone_append_page(struct bio *bio, struct page *page, + unsigned int len, unsigned int offset); bool __bio_try_merge_page(struct bio *bio, struct page *page, unsigned int len, unsigned int off, bool *same_page); void __bio_add_page(struct bio *bio, struct page *page,