[1/4] bio: add allocation cache abstraction

Message ID: 20210809212401.19807-2-axboe@kernel.dk (mailing list archive)
State: New, archived
Series: Enable bio recycling for polled IO

Commit Message

Jens Axboe Aug. 9, 2021, 9:23 p.m. UTC
Add a set of helpers that can encapsulate bio allocations, reusing them
as needed. Caller must provide the necessary locking, if any is needed.
The primary intended use case is polled IO from io_uring, which will not
need any external locking.

Very simple - keeps a count of bio's in the cache, and maintains a max
of 512 with a slack of 64. If we get above max + slack, we drop slack
number of bio's.

The cache is intended to be per-task, and the user will need to supply
the storage for it. As io_uring will be the only user right now, provide
a hook that returns the cache there. Stub it out as NULL initially.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 block/bio.c              | 126 +++++++++++++++++++++++++++++++++++----
 include/linux/bio.h      |  24 ++++++--
 include/linux/io_uring.h |   7 +++
 3 files changed, 141 insertions(+), 16 deletions(-)
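
Not part of the posted patch, but a minimal usage sketch of the helpers it adds may help when following the discussion below. The function and variable names here are hypothetical; the intended pairing is bio_cache_get() on submission and bio_cache_put() on completion, with the caller responsible for any serialization (none is needed for io_uring polled IO, per the commit message). With io_uring_bio_cache() still stubbed to NULL in this patch, bio_cache_get() returns NULL (so the caller falls back to bio_alloc()) and bio_cache_put() frees the bio just as bio_put() would.

#include <linux/bio.h>
#include <linux/blkdev.h>

static void example_polled_endio(struct bio *bio)
{
	/* Recycle into the per-task cache if one exists, else free as usual. */
	bio_cache_put(bio);
}

static int example_polled_read(struct block_device *bdev, struct page *page,
			       sector_t sector)
{
	struct bio *bio;

	/* Try the cache first; fall back to a plain allocation without one. */
	bio = bio_cache_get(GFP_KERNEL, 1, NULL);
	if (!bio)
		bio = bio_alloc(GFP_KERNEL, 1);
	if (!bio)
		return -ENOMEM;

	bio_set_dev(bio, bdev);
	bio->bi_iter.bi_sector = sector;
	bio->bi_opf = REQ_OP_READ | REQ_HIPRI;
	bio_add_page(bio, page, PAGE_SIZE, 0);
	bio->bi_end_io = example_polled_endio;

	/* A real polled path would keep the returned cookie for blk_poll(). */
	submit_bio(bio);
	return 0;
}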

Comments

Ming Lei Aug. 10, 2021, 1:15 p.m. UTC | #1
Hi Jens,

On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
> Add a set of helpers that can encapsulate bio allocations, reusing them
> as needed. Caller must provide the necessary locking, if any is needed.
> The primary intended use case is polled IO from io_uring, which will not
> need any external locking.
> 
> Very simple - keeps a count of bio's in the cache, and maintains a max
> of 512 with a slack of 64. If we get above max + slack, we drop slack
> number of bio's.
> 
> The cache is intended to be per-task, and the user will need to supply
> the storage for it. As io_uring will be the only user right now, provide
> a hook that returns the cache there. Stub it out as NULL initially.

Is it possible for user space to submit & poll IO from different io_uring
tasks?

Then one bio may be allocated from bio cache of the submission task, and
freed to cache of the poll task?


Thanks, 
Ming
Jens Axboe Aug. 10, 2021, 1:53 p.m. UTC | #2
On 8/10/21 7:15 AM, Ming Lei wrote:
> Hi Jens,
> 
> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
>> Add a set of helpers that can encapsulate bio allocations, reusing them
>> as needed. Caller must provide the necessary locking, if any is needed.
>> The primary intended use case is polled IO from io_uring, which will not
>> need any external locking.
>>
>> Very simple - keeps a count of bio's in the cache, and maintains a max
>> of 512 with a slack of 64. If we get above max + slack, we drop slack
>> number of bio's.
>>
>> The cache is intended to be per-task, and the user will need to supply
>> the storage for it. As io_uring will be the only user right now, provide
>> a hook that returns the cache there. Stub it out as NULL initially.
> 
> Is it possible for user space to submit & poll IO from different io_uring
> tasks?
> 
> Then one bio may be allocated from bio cache of the submission task, and
> freed to cache of the poll task?

Yes that is possible, and yes that would not benefit from this cache
at all. The previous version would work just fine with that, as the
cache is just under the ring lock and hence you can share it between
tasks.

I wonder if the niftier solution here is to retain the cache in the
ring still, yet have the pointer be per-task. So basically the setup
that this version does, except we store the cache itself in the ring.
I'll give that a whirl, should be a minor change, and it'll work per
ring instead then like before.
Jens Axboe Aug. 10, 2021, 2:24 p.m. UTC | #3
On 8/10/21 7:53 AM, Jens Axboe wrote:
> On 8/10/21 7:15 AM, Ming Lei wrote:
>> Hi Jens,
>>
>> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
>>> Add a set of helpers that can encapsulate bio allocations, reusing them
>>> as needed. Caller must provide the necessary locking, if any is needed.
>>> The primary intended use case is polled IO from io_uring, which will not
>>> need any external locking.
>>>
>>> Very simple - keeps a count of bio's in the cache, and maintains a max
>>> of 512 with a slack of 64. If we get above max + slack, we drop slack
>>> number of bio's.
>>>
>>> The cache is intended to be per-task, and the user will need to supply
>>> the storage for it. As io_uring will be the only user right now, provide
>>> a hook that returns the cache there. Stub it out as NULL initially.
>>
>> Is it possible for user space to submit & poll IO from different io_uring
>> tasks?
>>
>> Then one bio may be allocated from bio cache of the submission task, and
>> freed to cache of the poll task?
> 
> Yes that is possible, and yes that would not benefit from this cache
> at all. The previous version would work just fine with that, as the
> cache is just under the ring lock and hence you can share it between
> tasks.
> 
> I wonder if the niftier solution here is to retain the cache in the
> ring still, yet have the pointer be per-task. So basically the setup
> that this version does, except we store the cache itself in the ring.
> I'll give that a whirl, should be a minor change, and it'll work per
> ring instead then like before.

That won't work, as we'd have to do a ctx lookup (which would defeat the
purpose), and we don't even have anything to key off of at that point...

The current approach seems like the only viable one, or adding a member
to kiocb so we can pass in the cache in question. The latter did work
just fine, but I really dislike the fact that it's growing the kiocb to
more than a cacheline.
Jens Axboe Aug. 10, 2021, 2:48 p.m. UTC | #4
On 8/10/21 8:24 AM, Jens Axboe wrote:
> On 8/10/21 7:53 AM, Jens Axboe wrote:
>> On 8/10/21 7:15 AM, Ming Lei wrote:
>>> Hi Jens,
>>>
>>> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
>>>> Add a set of helpers that can encapsulate bio allocations, reusing them
>>>> as needed. Caller must provide the necessary locking, if any is needed.
>>>> The primary intended use case is polled IO from io_uring, which will not
>>>> need any external locking.
>>>>
>>>> Very simple - keeps a count of bio's in the cache, and maintains a max
>>>> of 512 with a slack of 64. If we get above max + slack, we drop slack
>>>> number of bio's.
>>>>
>>>> The cache is intended to be per-task, and the user will need to supply
>>>> the storage for it. As io_uring will be the only user right now, provide
>>>> a hook that returns the cache there. Stub it out as NULL initially.
>>>
>>> Is it possible for user space to submit & poll IO from different io_uring
>>> tasks?
>>>
>>> Then one bio may be allocated from bio cache of the submission task, and
>>> freed to cache of the poll task?
>>
>> Yes that is possible, and yes that would not benefit from this cache
>> at all. The previous version would work just fine with that, as the
>> cache is just under the ring lock and hence you can share it between
>> tasks.
>>
>> I wonder if the niftier solution here is to retain the cache in the
>> ring still, yet have the pointer be per-task. So basically the setup
>> that this version does, except we store the cache itself in the ring.
>> I'll give that a whirl, should be a minor change, and it'll work per
>> ring instead then like before.
> 
> That won't work, as we'd have to do a ctx lookup (which would defeat the
> purpose), and we don't even have anything to key off of at that point...
> 
> The current approach seems like the only viable one, or adding a member
> to kiocb so we can pass in the cache in question. The latter did work
> just fine, but I really dislike the fact that it's growing the kiocb to
> more than a cacheline.

One potential way around this is to store the bio cache pointer in the
iov_iter. Each consumer will setup a new struct to hold the bio etc, so
we can continue storing it in there and have it for completion as well.

Upside of that is that we retain the per-ring cache, instead of
per-task, and iov_iter has room to hold the pointer without getting near
the cacheline size yet.

The downside is that it's kind of odd to store it in the iov_iter, and
I'm sure that Al would hate it. Does seem like the best option though,
in terms of getting the storage for the cache "for free".
Jens Axboe Aug. 10, 2021, 3:35 p.m. UTC | #5
On 8/10/21 8:48 AM, Jens Axboe wrote:
> On 8/10/21 8:24 AM, Jens Axboe wrote:
>> On 8/10/21 7:53 AM, Jens Axboe wrote:
>>> On 8/10/21 7:15 AM, Ming Lei wrote:
>>>> Hi Jens,
>>>>
>>>> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
>>>>> Add a set of helpers that can encapsulate bio allocations, reusing them
>>>>> as needed. Caller must provide the necessary locking, if any is needed.
>>>>> The primary intended use case is polled IO from io_uring, which will not
>>>>> need any external locking.
>>>>>
>>>>> Very simple - keeps a count of bio's in the cache, and maintains a max
>>>>> of 512 with a slack of 64. If we get above max + slack, we drop slack
>>>>> number of bio's.
>>>>>
>>>>> The cache is intended to be per-task, and the user will need to supply
>>>>> the storage for it. As io_uring will be the only user right now, provide
>>>>> a hook that returns the cache there. Stub it out as NULL initially.
>>>>
>>>> Is it possible for user space to submit & poll IO from different io_uring
>>>> tasks?
>>>>
>>>> Then one bio may be allocated from bio cache of the submission task, and
>>>> freed to cache of the poll task?
>>>
>>> Yes that is possible, and yes that would not benefit from this cache
>>> at all. The previous version would work just fine with that, as the
>>> cache is just under the ring lock and hence you can share it between
>>> tasks.
>>>
>>> I wonder if the niftier solution here is to retain the cache in the
>>> ring still, yet have the pointer be per-task. So basically the setup
>>> that this version does, except we store the cache itself in the ring.
>>> I'll give that a whirl, should be a minor change, and it'll work per
>>> ring instead then like before.
>>
>> That won't work, as we'd have to do a ctx lookup (which would defeat the
>> purpose), and we don't even have anything to key off of at that point...
>>
>> The current approach seems like the only viable one, or adding a member
>> to kiocb so we can pass in the cache in question. The latter did work
>> just fine, but I really dislike the fact that it's growing the kiocb to
>> more than a cacheline.
> 
> One potential way around this is to store the bio cache pointer in the
> iov_iter. Each consumer will setup a new struct to hold the bio etc, so
> we can continue storing it in there and have it for completion as well.
> 
> Upside of that is that we retain the per-ring cache, instead of
> per-task, and iov_iter has room to hold the pointer without getting near
> the cacheline size yet.
> 
> The downside is that it's kind of odd to store it in the iov_iter, and
> I'm sure that Al would hate it. Does seem like the best option though,
> in terms of getting the storage for the cache "for free".

Here's that approach:

https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-bio-cache.3

totally untested so far.
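
To make the iov_iter idea a little more concrete, here is a rough sketch; the struct and field names are assumptions for illustration only, and the branch linked above is the actual attempt. The point is simply that the ring-owned cache pointer rides down with the iter, and the consumer's existing per-request container keeps a copy so it is still at hand when the bio completes.

/* Hypothetical: in the real branch the pointer would live in struct iov_iter
 * itself; a wrapper is shown here to avoid redefining the kernel struct. */
struct iter_with_cache {
	struct iov_iter		iter;
	struct bio_alloc_cache	*bio_cache;	/* set by io_uring, owned by the ring */
};

/* Consumer-side per-request container, mirroring how blkdev_dio already wraps
 * the bio; the cache pointer is copied here at submit time so the completion
 * handler can return the bio to the right cache. */
struct example_dio {
	struct bio_alloc_cache	*cache;
	struct bio		bio;		/* must stay last: inline bvecs follow */
};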
Kanchan Joshi Aug. 10, 2021, 3:54 p.m. UTC | #6
On Tue, Aug 10, 2021 at 8:18 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 8/10/21 7:53 AM, Jens Axboe wrote:
> > On 8/10/21 7:15 AM, Ming Lei wrote:
> >> Hi Jens,
> >>
> >> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
> >>> Add a set of helpers that can encapsulate bio allocations, reusing them
> >>> as needed. Caller must provide the necessary locking, if any is needed.
> >>> The primary intended use case is polled IO from io_uring, which will not
> >>> need any external locking.
> >>>
> >>> Very simple - keeps a count of bio's in the cache, and maintains a max
> >>> of 512 with a slack of 64. If we get above max + slack, we drop slack
> >>> number of bio's.
> >>>
> >>> The cache is intended to be per-task, and the user will need to supply
> >>> the storage for it. As io_uring will be the only user right now, provide
> >>> a hook that returns the cache there. Stub it out as NULL initially.
> >>
> >> Is it possible for user space to submit & poll IO from different io_uring
> >> tasks?
> >>
> >> Then one bio may be allocated from bio cache of the submission task, and
> >> freed to cache of the poll task?
> >
> > Yes that is possible, and yes that would not benefit from this cache
> > at all. The previous version would work just fine with that, as the
> > cache is just under the ring lock and hence you can share it between
> > tasks.
> >
> > I wonder if the niftier solution here is to retain the cache in the
> > ring still, yet have the pointer be per-task. So basically the setup
> > that this version does, except we store the cache itself in the ring.
> > I'll give that a whirl, should be a minor change, and it'll work per
> > ring instead then like before.
>
> That won't work, as we'd have to do a ctx lookup (which would defeat the
> purpose), and we don't even have anything to key off of at that point...
>
> The current approach seems like the only viable one, or adding a member
> to kiocb so we can pass in the cache in question. The latter did work
> just fine, but I really dislike the fact that it's growing the kiocb to
> more than a cacheline.
>
It still seems to be under a cacheline: kiocb took 48 bytes, and adding a
bio-cache pointer made it 56.
Jens Axboe Aug. 10, 2021, 3:58 p.m. UTC | #7
On 8/10/21 9:54 AM, Kanchan Joshi wrote:
> On Tue, Aug 10, 2021 at 8:18 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 8/10/21 7:53 AM, Jens Axboe wrote:
>>> On 8/10/21 7:15 AM, Ming Lei wrote:
>>>> Hi Jens,
>>>>
>>>> On Mon, Aug 09, 2021 at 03:23:58PM -0600, Jens Axboe wrote:
>>>>> Add a set of helpers that can encapsulate bio allocations, reusing them
>>>>> as needed. Caller must provide the necessary locking, if any is needed.
>>>>> The primary intended use case is polled IO from io_uring, which will not
>>>>> need any external locking.
>>>>>
>>>>> Very simple - keeps a count of bio's in the cache, and maintains a max
>>>>> of 512 with a slack of 64. If we get above max + slack, we drop slack
>>>>> number of bio's.
>>>>>
>>>>> The cache is intended to be per-task, and the user will need to supply
>>>>> the storage for it. As io_uring will be the only user right now, provide
>>>>> a hook that returns the cache there. Stub it out as NULL initially.
>>>>
>>>> Is it possible for user space to submit & poll IO from different io_uring
>>>> tasks?
>>>>
>>>> Then one bio may be allocated from bio cache of the submission task, and
>>>> freed to cache of the poll task?
>>>
>>> Yes that is possible, and yes that would not benefit from this cache
>>> at all. The previous version would work just fine with that, as the
>>> cache is just under the ring lock and hence you can share it between
>>> tasks.
>>>
>>> I wonder if the niftier solution here is to retain the cache in the
>>> ring still, yet have the pointer be per-task. So basically the setup
>>> that this version does, except we store the cache itself in the ring.
>>> I'll give that a whirl, should be a minor change, and it'll work per
>>> ring instead then like before.
>>
>> That won't work, as we'd have to do a ctx lookup (which would defeat the
>> purpose), and we don't even have anything to key off of at that point...
>>
>> The current approach seems like the only viable one, or adding a member
>> to kiocb so we can pass in the cache in question. The latter did work
>> just fine, but I really dislike the fact that it's growing the kiocb to
>> more than a cacheline.
>>
> It still seems to be under a cacheline: kiocb took 48 bytes, and adding a
> bio-cache pointer made it 56.

Huh yes, I think I'm mixing up the fact that we embed kiocb and it takes
req->rw over a cacheline, but I did put a fix on top for that one.

I guess we can ignore that then and just shove it in the kiocb, at the
end.
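
For reference, a sketch of the direction settled on here, i.e. appending the pointer to struct kiocb. This is not a patch from the thread: the new field name is an assumption, the existing fields are reproduced from the v5.14 definition in include/linux/fs.h (minus the struct-randomization markers), and the assert just spells out the "still under a cacheline" observation, assuming 64-byte cachelines on a 64-bit build.

struct kiocb {
	struct file		*ki_filp;
	loff_t			ki_pos;
	void (*ki_complete)(struct kiocb *iocb, long ret, long ret2);
	void			*private;
	int			ki_flags;
	u16			ki_hint;
	u16			ki_ioprio;
	union {
		unsigned int		ki_cookie;	/* for ->iopoll */
		struct wait_page_queue	*ki_waitq;	/* for async buffered IO */
	};
	/* Proposed addition, appended so the existing layout is untouched. */
	struct bio_alloc_cache	*ki_bio_cache;		/* 48 -> 56 bytes on 64-bit */
};

/* Keep the size argument from the discussion honest at build time. */
static_assert(sizeof(struct kiocb) <= 64, "kiocb no longer fits in a cacheline");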

Patch

diff --git a/block/bio.c b/block/bio.c
index 1fab762e079b..3bbda1be27be 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -20,6 +20,7 @@ 
 #include <linux/sched/sysctl.h>
 #include <linux/blk-crypto.h>
 #include <linux/xarray.h>
+#include <linux/io_uring.h>
 
 #include <trace/events/block.h>
 #include "blk.h"
@@ -238,6 +239,35 @@  static void bio_free(struct bio *bio)
 	}
 }
 
+static inline void __bio_init(struct bio *bio)
+{
+	bio->bi_next = NULL;
+	bio->bi_bdev = NULL;
+	bio->bi_opf = 0;
+	bio->bi_flags = bio->bi_ioprio = bio->bi_write_hint = 0;
+	bio->bi_status = 0;
+	bio->bi_iter.bi_sector = 0;
+	bio->bi_iter.bi_size = 0;
+	bio->bi_iter.bi_idx = 0;
+	bio->bi_iter.bi_bvec_done = 0;
+	bio->bi_end_io = NULL;
+	bio->bi_private = NULL;
+#ifdef CONFIG_BLK_CGROUP
+	bio->bi_blkg = NULL;
+	bio->bi_issue.value = 0;
+#ifdef CONFIG_BLK_CGROUP_IOCOST
+	bio->bi_iocost_cost = 0;
+#endif
+#endif
+#ifdef CONFIG_BLK_INLINE_ENCRYPTION
+	bio->bi_crypt_context = NULL;
+#endif
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	bio->bi_integrity = NULL;
+#endif
+	bio->bi_vcnt = 0;
+}
+
 /*
  * Users of this function have their own bio allocation. Subsequently,
  * they must remember to pair any call to bio_init() with bio_uninit()
@@ -246,7 +276,7 @@  static void bio_free(struct bio *bio)
 void bio_init(struct bio *bio, struct bio_vec *table,
 	      unsigned short max_vecs)
 {
-	memset(bio, 0, sizeof(*bio));
+	__bio_init(bio);
 	atomic_set(&bio->__bi_remaining, 1);
 	atomic_set(&bio->__bi_cnt, 1);
 
@@ -591,6 +621,19 @@  void guard_bio_eod(struct bio *bio)
 	bio_truncate(bio, maxsector << 9);
 }
 
+static bool __bio_put(struct bio *bio)
+{
+	if (!bio_flagged(bio, BIO_REFFED))
+		return true;
+
+	BIO_BUG_ON(!atomic_read(&bio->__bi_cnt));
+
+	/*
+	 * last put frees it
+	 */
+	return atomic_dec_and_test(&bio->__bi_cnt);
+}
+
 /**
  * bio_put - release a reference to a bio
  * @bio:   bio to release reference to
@@ -601,17 +644,8 @@  void guard_bio_eod(struct bio *bio)
  **/
 void bio_put(struct bio *bio)
 {
-	if (!bio_flagged(bio, BIO_REFFED))
+	if (__bio_put(bio))
 		bio_free(bio);
-	else {
-		BIO_BUG_ON(!atomic_read(&bio->__bi_cnt));
-
-		/*
-		 * last put frees it
-		 */
-		if (atomic_dec_and_test(&bio->__bi_cnt))
-			bio_free(bio);
-	}
 }
 EXPORT_SYMBOL(bio_put);
 
@@ -1595,6 +1629,76 @@  int bioset_init_from_src(struct bio_set *bs, struct bio_set *src)
 }
 EXPORT_SYMBOL(bioset_init_from_src);
 
+void bio_alloc_cache_init(struct bio_alloc_cache *cache)
+{
+	bio_list_init(&cache->free_list);
+	cache->nr = 0;
+}
+
+static void bio_alloc_cache_prune(struct bio_alloc_cache *cache,
+				  unsigned int nr)
+{
+	struct bio *bio;
+	unsigned int i;
+
+	i = 0;
+	while ((bio = bio_list_pop(&cache->free_list)) != NULL) {
+		cache->nr--;
+		bio_free(bio);
+		if (++i == nr)
+			break;
+	}
+}
+
+void bio_alloc_cache_destroy(struct bio_alloc_cache *cache)
+{
+	bio_alloc_cache_prune(cache, -1U);
+}
+
+struct bio *bio_cache_get(gfp_t gfp, unsigned short nr_vecs, struct bio_set *bs)
+{
+	struct bio_alloc_cache *cache = io_uring_bio_cache();
+	struct bio *bio;
+
+	if (!cache || nr_vecs > BIO_INLINE_VECS)
+		return NULL;
+	if (bio_list_empty(&cache->free_list)) {
+alloc:
+		if (bs)
+			return bio_alloc_bioset(gfp, nr_vecs, bs);
+		else
+			return bio_alloc(gfp, nr_vecs);
+	}
+
+	bio = bio_list_peek(&cache->free_list);
+	if (bs && bio->bi_pool != bs)
+		goto alloc;
+	bio_list_del_head(&cache->free_list, bio);
+	cache->nr--;
+	bio_init(bio, nr_vecs ? bio->bi_inline_vecs : NULL, nr_vecs);
+	return bio;
+}
+
+#define ALLOC_CACHE_MAX		512
+#define ALLOC_CACHE_SLACK	 64
+
+void bio_cache_put(struct bio *bio)
+{
+	struct bio_alloc_cache *cache = io_uring_bio_cache();
+
+	if (unlikely(!__bio_put(bio)))
+		return;
+	if (cache) {
+		bio_uninit(bio);
+		bio_list_add_head(&cache->free_list, bio);
+		cache->nr++;
+		if (cache->nr > ALLOC_CACHE_MAX + ALLOC_CACHE_SLACK)
+			bio_alloc_cache_prune(cache, ALLOC_CACHE_SLACK);
+	} else {
+		bio_free(bio);
+	}
+}
+
 static int __init init_bio(void)
 {
 	int i;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 2203b686e1f0..b70c72365fa2 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -652,18 +652,22 @@  static inline struct bio *bio_list_peek(struct bio_list *bl)
 	return bl->head;
 }
 
-static inline struct bio *bio_list_pop(struct bio_list *bl)
+static inline void bio_list_del_head(struct bio_list *bl, struct bio *head)
 {
-	struct bio *bio = bl->head;
-
-	if (bio) {
+	if (head) {
 		bl->head = bl->head->bi_next;
 		if (!bl->head)
 			bl->tail = NULL;
 
-		bio->bi_next = NULL;
+		head->bi_next = NULL;
 	}
+}
 
+static inline struct bio *bio_list_pop(struct bio_list *bl)
+{
+	struct bio *bio = bl->head;
+
+	bio_list_del_head(bl, bio);
 	return bio;
 }
 
@@ -676,6 +680,16 @@  static inline struct bio *bio_list_get(struct bio_list *bl)
 	return bio;
 }
 
+struct bio_alloc_cache {
+	struct bio_list		free_list;
+	unsigned int		nr;
+};
+
+void bio_alloc_cache_init(struct bio_alloc_cache *);
+void bio_alloc_cache_destroy(struct bio_alloc_cache *);
+struct bio *bio_cache_get(gfp_t, unsigned short, struct bio_set *bs);
+void bio_cache_put(struct bio *);
+
 /*
  * Increment chain count for the bio. Make sure the CHAIN flag update
  * is visible before the raised count.
diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h
index 04b650bcbbe5..2fb53047638e 100644
--- a/include/linux/io_uring.h
+++ b/include/linux/io_uring.h
@@ -5,6 +5,8 @@ 
 #include <linux/sched.h>
 #include <linux/xarray.h>
 
+struct bio_alloc_cache;
+
 #if defined(CONFIG_IO_URING)
 struct sock *io_uring_get_socket(struct file *file);
 void __io_uring_cancel(struct files_struct *files);
@@ -40,4 +42,9 @@  static inline void io_uring_free(struct task_struct *tsk)
 }
 #endif
 
+static inline struct bio_alloc_cache *io_uring_bio_cache(void)
+{
+	return NULL;
+}
+
 #endif