Message ID | 20210809212401.19807-2-axboe@kernel.dk (mailing list archive) |
---|---|
State | New, archived |
Series | Enable bio recycling for polled IO |
Hi Jens,

On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
> Add a set of helpers that can encapsulate bio allocations, reusing them
> as needed. Caller must provide the necessary locking, if any is needed.
> The primary intended use case is polled IO from io_uring, which will not
> need any external locking.
>
> Very simple - keeps a count of bio's in the cache, and maintains a max
> of 512 with a slack of 64. If we get above max + slack, we drop slack
> number of bio's.
>
> The cache is intended to be per-task, and the user will need to supply
> the storage for it. As io_uring will be the only user right now, provide
> a hook that returns the cache there. Stub it out as NULL initially.

Is it possible for user space to submit & poll IO from different io_uring
tasks?

Then one bio may be allocated from bio cache of the submission task, and
freed to cache of the poll task?

Thanks,
Ming
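For context, the scenario Ming is describing (one thread submitting polled IO, a different thread reaping completions on the same ring) might look roughly like this from user space. This is an illustrative sketch using liburing; the device path and the missing error handling are placeholders, not taken from the thread.

```c
#define _GNU_SOURCE
#include <liburing.h>
#include <pthread.h>
#include <fcntl.h>
#include <stdlib.h>

static struct io_uring ring;

/* Submitting task: the bio for this read would be allocated here. */
static void *submitter(void *arg)
{
	struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
	void *buf;

	posix_memalign(&buf, 4096, 4096);
	int fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);	/* placeholder device */
	io_uring_prep_read(sqe, fd, buf, 4096, 0);
	io_uring_submit(&ring);
	return NULL;
}

/* Reaping task: with IORING_SETUP_IOPOLL this drives polling, so the bio
 * would be freed from this task's context, not the submitter's. */
static void *reaper(void *arg)
{
	struct io_uring_cqe *cqe;

	io_uring_wait_cqe(&ring, &cqe);
	io_uring_cqe_seen(&ring, cqe);
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	io_uring_queue_init(8, &ring, IORING_SETUP_IOPOLL);
	pthread_create(&t1, NULL, submitter, NULL);
	pthread_join(t1, NULL);
	pthread_create(&t2, NULL, reaper, NULL);
	pthread_join(t2, NULL);
	io_uring_queue_exit(&ring);
	return 0;
}
```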
On 8/10/21 7:15 AM, Ming Lei wrote:
> Hi Jens,
>
> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
>> Add a set of helpers that can encapsulate bio allocations, reusing them
>> as needed. Caller must provide the necessary locking, if any is needed.
>> The primary intended use case is polled IO from io_uring, which will not
>> need any external locking.
>>
>> Very simple - keeps a count of bio's in the cache, and maintains a max
>> of 512 with a slack of 64. If we get above max + slack, we drop slack
>> number of bio's.
>>
>> The cache is intended to be per-task, and the user will need to supply
>> the storage for it. As io_uring will be the only user right now, provide
>> a hook that returns the cache there. Stub it out as NULL initially.
>
> Is it possible for user space to submit & poll IO from different io_uring
> tasks?
>
> Then one bio may be allocated from bio cache of the submission task, and
> freed to cache of the poll task?

Yes that is possible, and yes that would not benefit from this cache
at all. The previous version would work just fine with that, as the
cache is just under the ring lock and hence you can share it between
tasks.

I wonder if the niftier solution here is to retain the cache in the
ring still, yet have the pointer be per-task. So basically the setup
that this version does, except we store the cache itself in the ring.
I'll give that a whirl, should be a minor change, and it'll work per
ring instead then like before.
On 8/10/21 7:53 AM, Jens Axboe wrote:
> On 8/10/21 7:15 AM, Ming Lei wrote:
>> Hi Jens,
>>
>> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
>>> Add a set of helpers that can encapsulate bio allocations, reusing them
>>> as needed. Caller must provide the necessary locking, if any is needed.
>>> The primary intended use case is polled IO from io_uring, which will not
>>> need any external locking.
>>>
>>> Very simple - keeps a count of bio's in the cache, and maintains a max
>>> of 512 with a slack of 64. If we get above max + slack, we drop slack
>>> number of bio's.
>>>
>>> The cache is intended to be per-task, and the user will need to supply
>>> the storage for it. As io_uring will be the only user right now, provide
>>> a hook that returns the cache there. Stub it out as NULL initially.
>>
>> Is it possible for user space to submit & poll IO from different io_uring
>> tasks?
>>
>> Then one bio may be allocated from bio cache of the submission task, and
>> freed to cache of the poll task?
>
> Yes that is possible, and yes that would not benefit from this cache
> at all. The previous version would work just fine with that, as the
> cache is just under the ring lock and hence you can share it between
> tasks.
>
> I wonder if the niftier solution here is to retain the cache in the
> ring still, yet have the pointer be per-task. So basically the setup
> that this version does, except we store the cache itself in the ring.
> I'll give that a whirl, should be a minor change, and it'll work per
> ring instead then like before.

That won't work, as we'd have to do a ctx lookup (which would defeat the
purpose), and we don't even have anything to key off of at that point...

The current approach seems like the only viable one, or adding a member
to kiocb so we can pass in the cache in question. The latter did work
just fine, but I really dislike the fact that it's growing the kiocb to
more than a cacheline.
On 8/10/21 8:24 AM, Jens Axboe wrote:
> On 8/10/21 7:53 AM, Jens Axboe wrote:
>> On 8/10/21 7:15 AM, Ming Lei wrote:
>>> Hi Jens,
>>>
>>> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
>>>> Add a set of helpers that can encapsulate bio allocations, reusing them
>>>> as needed. Caller must provide the necessary locking, if any is needed.
>>>> The primary intended use case is polled IO from io_uring, which will not
>>>> need any external locking.
>>>>
>>>> Very simple - keeps a count of bio's in the cache, and maintains a max
>>>> of 512 with a slack of 64. If we get above max + slack, we drop slack
>>>> number of bio's.
>>>>
>>>> The cache is intended to be per-task, and the user will need to supply
>>>> the storage for it. As io_uring will be the only user right now, provide
>>>> a hook that returns the cache there. Stub it out as NULL initially.
>>>
>>> Is it possible for user space to submit & poll IO from different io_uring
>>> tasks?
>>>
>>> Then one bio may be allocated from bio cache of the submission task, and
>>> freed to cache of the poll task?
>>
>> Yes that is possible, and yes that would not benefit from this cache
>> at all. The previous version would work just fine with that, as the
>> cache is just under the ring lock and hence you can share it between
>> tasks.
>>
>> I wonder if the niftier solution here is to retain the cache in the
>> ring still, yet have the pointer be per-task. So basically the setup
>> that this version does, except we store the cache itself in the ring.
>> I'll give that a whirl, should be a minor change, and it'll work per
>> ring instead then like before.
>
> That won't work, as we'd have to do a ctx lookup (which would defeat the
> purpose), and we don't even have anything to key off of at that point...
>
> The current approach seems like the only viable one, or adding a member
> to kiocb so we can pass in the cache in question. The latter did work
> just fine, but I really dislike the fact that it's growing the kiocb to
> more than a cacheline.

One potential way around this is to store the bio cache pointer in the
iov_iter. Each consumer will setup a new struct to hold the bio etc, so
we can continue storing it in there and have it for completion as well.

Upside of that is that we retain the per-ring cache, instead of
per-task, and iov_iter has room to hold the pointer without getting near
the cacheline size yet.

The downside is that it's kind of odd to store it in the iov_iter, and
I'm sure that Al would hate it. Does seem like the best option though,
in terms of getting the storage for the cache "for free".
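To make the iov_iter idea concrete, a rough sketch of what it could look like follows. The field name, the helper, and the plumbing are guesses for illustration only, not the contents of the branch Jens posts in the follow-up.

```c
/*
 * Hypothetical sketch only: the bio_cache field and helper below are
 * illustrative guesses at "store the cache pointer in the iov_iter",
 * not actual kernel code from this series.
 */
struct bio_alloc_cache;

struct iov_iter {
	/* ... existing iov_iter fields ... */
	struct bio_alloc_cache	*bio_cache;	/* NULL unless the submitter opts in */
};

/*
 * io_uring side: point the iterator at the per-ring cache before calling
 * ->read_iter()/->write_iter(); the block layer would then prefer
 * iter->bio_cache over a regular bio allocation when it is non-NULL.
 */
static inline void iov_iter_set_bio_cache(struct iov_iter *iter,
					  struct bio_alloc_cache *cache)
{
	iter->bio_cache = cache;
}
```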
On 8/10/21 8:48 AM, Jens Axboe wrote:
> On 8/10/21 8:24 AM, Jens Axboe wrote:
>> On 8/10/21 7:53 AM, Jens Axboe wrote:
>>> On 8/10/21 7:15 AM, Ming Lei wrote:
>>>> Hi Jens,
>>>>
>>>> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
>>>>> Add a set of helpers that can encapsulate bio allocations, reusing them
>>>>> as needed. Caller must provide the necessary locking, if any is needed.
>>>>> The primary intended use case is polled IO from io_uring, which will not
>>>>> need any external locking.
>>>>>
>>>>> Very simple - keeps a count of bio's in the cache, and maintains a max
>>>>> of 512 with a slack of 64. If we get above max + slack, we drop slack
>>>>> number of bio's.
>>>>>
>>>>> The cache is intended to be per-task, and the user will need to supply
>>>>> the storage for it. As io_uring will be the only user right now, provide
>>>>> a hook that returns the cache there. Stub it out as NULL initially.
>>>>
>>>> Is it possible for user space to submit & poll IO from different io_uring
>>>> tasks?
>>>>
>>>> Then one bio may be allocated from bio cache of the submission task, and
>>>> freed to cache of the poll task?
>>>
>>> Yes that is possible, and yes that would not benefit from this cache
>>> at all. The previous version would work just fine with that, as the
>>> cache is just under the ring lock and hence you can share it between
>>> tasks.
>>>
>>> I wonder if the niftier solution here is to retain the cache in the
>>> ring still, yet have the pointer be per-task. So basically the setup
>>> that this version does, except we store the cache itself in the ring.
>>> I'll give that a whirl, should be a minor change, and it'll work per
>>> ring instead then like before.
>>
>> That won't work, as we'd have to do a ctx lookup (which would defeat the
>> purpose), and we don't even have anything to key off of at that point...
>>
>> The current approach seems like the only viable one, or adding a member
>> to kiocb so we can pass in the cache in question. The latter did work
>> just fine, but I really dislike the fact that it's growing the kiocb to
>> more than a cacheline.
>
> One potential way around this is to store the bio cache pointer in the
> iov_iter. Each consumer will setup a new struct to hold the bio etc, so
> we can continue storing it in there and have it for completion as well.
>
> Upside of that is that we retain the per-ring cache, instead of
> per-task, and iov_iter has room to hold the pointer without getting near
> the cacheline size yet.
>
> The downside is that it's kind of odd to store it in the iov_iter, and
> I'm sure that Al would hate it. Does seem like the best option though,
> in terms of getting the storage for the cache "for free".

Here's that approach:

https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-bio-cache.3

totally untested so far.
On Tue, Aug 10, 2021 at 8:18 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 8/10/21 7:53 AM, Jens Axboe wrote:
> > On 8/10/21 7:15 AM, Ming Lei wrote:
> >> Hi Jens,
> >>
> >> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
> >>> Add a set of helpers that can encapsulate bio allocations, reusing them
> >>> as needed. Caller must provide the necessary locking, if any is needed.
> >>> The primary intended use case is polled IO from io_uring, which will not
> >>> need any external locking.
> >>>
> >>> Very simple - keeps a count of bio's in the cache, and maintains a max
> >>> of 512 with a slack of 64. If we get above max + slack, we drop slack
> >>> number of bio's.
> >>>
> >>> The cache is intended to be per-task, and the user will need to supply
> >>> the storage for it. As io_uring will be the only user right now, provide
> >>> a hook that returns the cache there. Stub it out as NULL initially.
> >>
> >> Is it possible for user space to submit & poll IO from different io_uring
> >> tasks?
> >>
> >> Then one bio may be allocated from bio cache of the submission task, and
> >> freed to cache of the poll task?
> >
> > Yes that is possible, and yes that would not benefit from this cache
> > at all. The previous version would work just fine with that, as the
> > cache is just under the ring lock and hence you can share it between
> > tasks.
> >
> > I wonder if the niftier solution here is to retain the cache in the
> > ring still, yet have the pointer be per-task. So basically the setup
> > that this version does, except we store the cache itself in the ring.
> > I'll give that a whirl, should be a minor change, and it'll work per
> > ring instead then like before.
>
> That won't work, as we'd have to do a ctx lookup (which would defeat the
> purpose), and we don't even have anything to key off of at that point...
>
> The current approach seems like the only viable one, or adding a member
> to kiocb so we can pass in the cache in question. The latter did work
> just fine, but I really dislike the fact that it's growing the kiocb to
> more than a cacheline.
>
Still seems to be under a cacheline: kiocb took 48 bytes, and adding a
bio-cache pointer made it 56.
On 8/10/21 9:54 AM, Kanchan Joshi wrote:
> On Tue, Aug 10, 2021 at 8:18 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 8/10/21 7:53 AM, Jens Axboe wrote:
>>> On 8/10/21 7:15 AM, Ming Lei wrote:
>>>> Hi Jens,
>>>>
>>>> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
>>>>> Add a set of helpers that can encapsulate bio allocations, reusing them
>>>>> as needed. Caller must provide the necessary locking, if any is needed.
>>>>> The primary intended use case is polled IO from io_uring, which will not
>>>>> need any external locking.
>>>>>
>>>>> Very simple - keeps a count of bio's in the cache, and maintains a max
>>>>> of 512 with a slack of 64. If we get above max + slack, we drop slack
>>>>> number of bio's.
>>>>>
>>>>> The cache is intended to be per-task, and the user will need to supply
>>>>> the storage for it. As io_uring will be the only user right now, provide
>>>>> a hook that returns the cache there. Stub it out as NULL initially.
>>>>
>>>> Is it possible for user space to submit & poll IO from different io_uring
>>>> tasks?
>>>>
>>>> Then one bio may be allocated from bio cache of the submission task, and
>>>> freed to cache of the poll task?
>>>
>>> Yes that is possible, and yes that would not benefit from this cache
>>> at all. The previous version would work just fine with that, as the
>>> cache is just under the ring lock and hence you can share it between
>>> tasks.
>>>
>>> I wonder if the niftier solution here is to retain the cache in the
>>> ring still, yet have the pointer be per-task. So basically the setup
>>> that this version does, except we store the cache itself in the ring.
>>> I'll give that a whirl, should be a minor change, and it'll work per
>>> ring instead then like before.
>>
>> That won't work, as we'd have to do a ctx lookup (which would defeat the
>> purpose), and we don't even have anything to key off of at that point...
>>
>> The current approach seems like the only viable one, or adding a member
>> to kiocb so we can pass in the cache in question. The latter did work
>> just fine, but I really dislike the fact that it's growing the kiocb to
>> more than a cacheline.
>>
> Still under a cacheline seems. kiocb took 48 bytes, and adding a
> bio-cache pointer made it 56.

Huh yes, I think I'm mixing up the fact that we embed kiocb and it takes
req->rw over a cacheline, but I did put a fix on top for that one. I
guess we can ignore that then and just shove it in the kiocb, at the
end.
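For reference, the kiocb alternative being discussed would amount to something like the sketch below. The field name is invented for illustration and was not what the posted patch went with; the 48/56-byte figures are the ones quoted in the thread.

```c
/*
 * Hypothetical sketch of the kiocb variant discussed above; ki_bio_cache is
 * a made-up name, not part of the posted patch. Per the numbers quoted in
 * the thread, kiocb is 48 bytes, so one extra pointer lands at 56 bytes,
 * still within a single 64-byte cacheline.
 */
struct kiocb {
	/* ... existing fields (ki_filp, ki_pos, ki_complete, private,
	 * ki_flags, ki_hint, ki_ioprio, ki_cookie/ki_waitq): 48 bytes ... */
	struct bio_alloc_cache	*ki_bio_cache;	/* +8 bytes -> 56 total */
};
```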
```diff
diff --git a/block/bio.c b/block/bio.c
index 1fab762e079b..3bbda1be27be 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -20,6 +20,7 @@
 #include <linux/sched/sysctl.h>
 #include <linux/blk-crypto.h>
 #include <linux/xarray.h>
+#include <linux/io_uring.h>
 
 #include <trace/events/block.h>
 #include "blk.h"
@@ -238,6 +239,35 @@ static void bio_free(struct bio *bio)
 	}
 }
 
+static inline void __bio_init(struct bio *bio)
+{
+	bio->bi_next = NULL;
+	bio->bi_bdev = NULL;
+	bio->bi_opf = 0;
+	bio->bi_flags = bio->bi_ioprio = bio->bi_write_hint = 0;
+	bio->bi_status = 0;
+	bio->bi_iter.bi_sector = 0;
+	bio->bi_iter.bi_size = 0;
+	bio->bi_iter.bi_idx = 0;
+	bio->bi_iter.bi_bvec_done = 0;
+	bio->bi_end_io = NULL;
+	bio->bi_private = NULL;
+#ifdef CONFIG_BLK_CGROUP
+	bio->bi_blkg = NULL;
+	bio->bi_issue.value = 0;
+#ifdef CONFIG_BLK_CGROUP_IOCOST
+	bio->bi_iocost_cost = 0;
+#endif
+#endif
+#ifdef CONFIG_BLK_INLINE_ENCRYPTION
+	bio->bi_crypt_context = NULL;
+#endif
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	bio->bi_integrity = NULL;
+#endif
+	bio->bi_vcnt = 0;
+}
+
 /*
  * Users of this function have their own bio allocation. Subsequently,
  * they must remember to pair any call to bio_init() with bio_uninit()
@@ -246,7 +276,7 @@ static void bio_free(struct bio *bio)
 void bio_init(struct bio *bio, struct bio_vec *table,
 	      unsigned short max_vecs)
 {
-	memset(bio, 0, sizeof(*bio));
+	__bio_init(bio);
 	atomic_set(&bio->__bi_remaining, 1);
 	atomic_set(&bio->__bi_cnt, 1);
 
@@ -591,6 +621,19 @@ void guard_bio_eod(struct bio *bio)
 	bio_truncate(bio, maxsector << 9);
 }
 
+static bool __bio_put(struct bio *bio)
+{
+	if (!bio_flagged(bio, BIO_REFFED))
+		return true;
+
+	BIO_BUG_ON(!atomic_read(&bio->__bi_cnt));
+
+	/*
+	 * last put frees it
+	 */
+	return atomic_dec_and_test(&bio->__bi_cnt);
+}
+
 /**
  * bio_put - release a reference to a bio
  * @bio: bio to release reference to
@@ -601,17 +644,8 @@ void guard_bio_eod(struct bio *bio)
  **/
 void bio_put(struct bio *bio)
 {
-	if (!bio_flagged(bio, BIO_REFFED))
+	if (__bio_put(bio))
 		bio_free(bio);
-	else {
-		BIO_BUG_ON(!atomic_read(&bio->__bi_cnt));
-
-		/*
-		 * last put frees it
-		 */
-		if (atomic_dec_and_test(&bio->__bi_cnt))
-			bio_free(bio);
-	}
 }
 EXPORT_SYMBOL(bio_put);
 
@@ -1595,6 +1629,76 @@ int bioset_init_from_src(struct bio_set *bs, struct bio_set *src)
 }
 EXPORT_SYMBOL(bioset_init_from_src);
 
+void bio_alloc_cache_init(struct bio_alloc_cache *cache)
+{
+	bio_list_init(&cache->free_list);
+	cache->nr = 0;
+}
+
+static void bio_alloc_cache_prune(struct bio_alloc_cache *cache,
+				  unsigned int nr)
+{
+	struct bio *bio;
+	unsigned int i;
+
+	i = 0;
+	while ((bio = bio_list_pop(&cache->free_list)) != NULL) {
+		cache->nr--;
+		bio_free(bio);
+		if (++i == nr)
+			break;
+	}
+}
+
+void bio_alloc_cache_destroy(struct bio_alloc_cache *cache)
+{
+	bio_alloc_cache_prune(cache, -1U);
+}
+
+struct bio *bio_cache_get(gfp_t gfp, unsigned short nr_vecs, struct bio_set *bs)
+{
+	struct bio_alloc_cache *cache = io_uring_bio_cache();
+	struct bio *bio;
+
+	if (!cache || nr_vecs > BIO_INLINE_VECS)
+		return NULL;
+	if (bio_list_empty(&cache->free_list)) {
+alloc:
+		if (bs)
+			return bio_alloc_bioset(gfp, nr_vecs, bs);
+		else
+			return bio_alloc(gfp, nr_vecs);
+	}
+
+	bio = bio_list_peek(&cache->free_list);
+	if (bs && bio->bi_pool != bs)
+		goto alloc;
+	bio_list_del_head(&cache->free_list, bio);
+	cache->nr--;
+	bio_init(bio, nr_vecs ? bio->bi_inline_vecs : NULL, nr_vecs);
+	return bio;
+}
+
+#define ALLOC_CACHE_MAX		512
+#define ALLOC_CACHE_SLACK	64
+
+void bio_cache_put(struct bio *bio)
+{
+	struct bio_alloc_cache *cache = io_uring_bio_cache();
+
+	if (unlikely(!__bio_put(bio)))
+		return;
+	if (cache) {
+		bio_uninit(bio);
+		bio_list_add_head(&cache->free_list, bio);
+		cache->nr++;
+		if (cache->nr > ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK)
+			bio_alloc_cache_prune(cache, ALLOC_CACHE_SLACK);
+	} else {
+		bio_free(bio);
+	}
+}
+
 static int __init init_bio(void)
 {
 	int i;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 2203b686e1f0..b70c72365fa2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -652,18 +652,22 @@ static inline struct bio *bio_list_peek(struct bio_list *bl)
 	return bl->head;
 }
 
-static inline struct bio *bio_list_pop(struct bio_list *bl)
+static inline void bio_list_del_head(struct bio_list *bl, struct bio *head)
 {
-	struct bio *bio = bl->head;
-
-	if (bio) {
+	if (head) {
 		bl->head = bl->head->bi_next;
 		if (!bl->head)
 			bl->tail = NULL;
 
-		bio->bi_next = NULL;
+		head->bi_next = NULL;
 	}
+}
 
+static inline struct bio *bio_list_pop(struct bio_list *bl)
+{
+	struct bio *bio = bl->head;
+
+	bio_list_del_head(bl, bio);
 	return bio;
 }
 
@@ -676,6 +680,16 @@ static inline struct bio *bio_list_get(struct bio_list *bl)
 	return bio;
 }
 
+struct bio_alloc_cache {
+	struct bio_list		free_list;
+	unsigned int		nr;
+};
+
+void bio_alloc_cache_init(struct bio_alloc_cache *);
+void bio_alloc_cache_destroy(struct bio_alloc_cache *);
+struct bio *bio_cache_get(gfp_t, unsigned short, struct bio_set *bs);
+void bio_cache_put(struct bio *);
+
 /*
  * Increment chain count for the bio. Make sure the CHAIN flag update
  * is visible before the raised count.
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 04b650bcbbe5..2fb53047638e 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -5,6 +5,8 @@
 #include <linux/sched.h>
 #include <linux/xarray.h>
 
+struct bio_alloc_cache;
+
 #if defined(CONFIG_IO_URING)
 struct sock *io_uring_get_socket(struct file *file);
 void __io_uring_cancel(struct files_struct *files);
@@ -40,4 +42,9 @@ static inline void io_uring_free(struct task_struct *tsk)
 }
 #endif
 
+static inline struct bio_alloc_cache *io_uring_bio_cache(void)
+{
+	return NULL;
+}
+
 #endif
```
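To show the calling convention the helpers above imply, here is a hedged sketch of a consumer in a polled-IO submission path. The function is purely illustrative and not part of the patch; it assumes it runs in the io_uring task whose cache io_uring_bio_cache() will eventually return, so no extra locking is needed.

```c
/* Illustrative consumer only; not part of the patch. */
static void polled_read_end_io(struct bio *bio)
{
	/* completion must go through bio_cache_put() so the bio can be recycled */
	bio_cache_put(bio);
}

static void submit_polled_read(struct block_device *bdev, struct page *page,
			       sector_t sector)
{
	struct bio *bio;

	/* try the per-task cache; bio_cache_get() returns NULL if there is none */
	bio = bio_cache_get(GFP_KERNEL, 1, NULL);
	if (!bio)
		bio = bio_alloc(GFP_KERNEL, 1);

	bio_set_dev(bio, bdev);
	bio->bi_iter.bi_sector = sector;
	bio->bi_opf = REQ_OP_READ | REQ_HIPRI;	/* polled IO */
	bio_add_page(bio, page, PAGE_SIZE, 0);
	bio->bi_end_io = polled_read_end_io;
	submit_bio(bio);
}
```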
Add a set of helpers that can encapsulate bio allocations, reusing them
as needed. Caller must provide the necessary locking, if any is needed.
The primary intended use case is polled IO from io_uring, which will not
need any external locking.

Very simple - keeps a count of bio's in the cache, and maintains a max
of 512 with a slack of 64. If we get above max + slack, we drop slack
number of bio's.

The cache is intended to be per-task, and the user will need to supply
the storage for it. As io_uring will be the only user right now, provide
a hook that returns the cache there. Stub it out as NULL initially.

Signed-off-by: Jens Axboe <axboe@kernel.dk>

```
---
 block/bio.c              | 126 +++++++++++++++++++++++++++++++++++----
 include/linux/bio.h      |  24 ++++++--
 include/linux/io_uring.h |   7 +++
 3 files changed, 141 insertions(+), 16 deletions(-)
```
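As a guess at what "a hook that returns the cache there" could eventually look like once io_uring supplies the storage, here is a hedged sketch of a non-stub version; the bio_cache member of struct io_uring_task is an assumption, not something introduced by this patch.

```c
/*
 * Hypothetical follow-up sketch: one way the NULL stub could later be wired
 * up on the io_uring side. The bio_cache member of struct io_uring_task is
 * an assumption for illustration only.
 */
#if defined(CONFIG_IO_URING)
static inline struct bio_alloc_cache *io_uring_bio_cache(void)
{
	struct io_uring_task *tctx = current->io_uring;

	return tctx ? &tctx->bio_cache : NULL;	/* assumed per-task storage */
}
#endif
```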