Message ID | 20241119012224.1698238-2-krisman@suse.de (mailing list archive) |
---|---|
State | New |
Series | Clean up alloc_cache allocations |
On 11/18/24 6:22 PM, Gabriel Krisman Bertazi wrote:
> diff --git a/io_uring/alloc_cache.h b/io_uring/alloc_cache.h
> index b7a38a2069cf..6b34e491a30a 100644
> --- a/io_uring/alloc_cache.h
> +++ b/io_uring/alloc_cache.h
> @@ -30,6 +30,13 @@ static inline void *io_alloc_cache_get(struct io_alloc_cache *cache)
>  	return NULL;
>  }
>  
> +static inline void *io_alloc_cache_alloc(struct io_alloc_cache *cache, gfp_t gfp)
> +{
> +	if (!cache->nr_cached)
> +		return kzalloc(cache->elem_size, gfp);
> +	return io_alloc_cache_get(cache);
> +}

I don't think you want to use kzalloc here. The caller will need to clear
what it needs for the cached path anyway, so it has no option but to
clear/set things twice in that case.
Jens Axboe <axboe@kernel.dk> writes:

> On 11/18/24 6:22 PM, Gabriel Krisman Bertazi wrote:
>> diff --git a/io_uring/alloc_cache.h b/io_uring/alloc_cache.h
>> index b7a38a2069cf..6b34e491a30a 100644
>> --- a/io_uring/alloc_cache.h
>> +++ b/io_uring/alloc_cache.h
>> @@ -30,6 +30,13 @@ static inline void *io_alloc_cache_get(struct io_alloc_cache *cache)
>>  	return NULL;
>>  }
>>  
>> +static inline void *io_alloc_cache_alloc(struct io_alloc_cache *cache, gfp_t gfp)
>> +{
>> +	if (!cache->nr_cached)
>> +		return kzalloc(cache->elem_size, gfp);
>> +	return io_alloc_cache_get(cache);
>> +}
>
> I don't think you want to use kzalloc here. The caller will need to clear
> what it needs for the cached path anyway, so it has no option but to
> clear/set things twice in that case.

Hi Jens,

The reason I use kzalloc here is to be able to trust the value of
rw->free_iov (io_rw_alloc_async) and hdr->free_iov (io_msg_alloc_async)
regardless of whether the allocated memory came from the cache or from
slab. In the callers (patches 6 and 7), we do:

+	hdr = io_uring_alloc_async_data(&ctx->netmsg_cache, req);
+	if (!hdr)
+		return NULL;
+
+	/* If the async data was cached, we might have an iov cached inside. */
+	if (hdr->free_iov) {

An alternative would be to return a flag indicating whether the allocated
memory came from the cache or not, but that didn't seem elegant. Do you
see a better way?

I also considered that zeroing memory here shouldn't harm performance,
because it'll hit the cache most of the time.
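For illustration, here is a minimal sketch of the full caller pattern
Gabriel is describing; the function body is an assumption pieced together
from the snippet above (in particular, the REQ_F_NEED_CLEANUP handling is
a guess), not verbatim from patches 6 and 7:

static struct io_async_msghdr *io_msg_alloc_async(struct io_kiocb *req)
{
	struct io_ring_ctx *ctx = req->ctx;
	struct io_async_msghdr *hdr;

	hdr = io_uring_alloc_async_data(&ctx->netmsg_cache, req);
	if (!hdr)
		return NULL;

	/*
	 * A fresh allocation comes back zeroed, so a non-NULL free_iov
	 * can only mean the object came from the cache with an iov still
	 * attached; mark the request so that iov is reused or freed later.
	 */
	if (hdr->free_iov)
		req->flags |= REQ_F_NEED_CLEANUP;
	return hdr;
}

The point of the kzalloc is visible in the single free_iov test: without
zeroing, a fresh allocation could carry garbage there and the caller
would need a separate signal to tell the two origins apart.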
On 11/19/24 8:30 AM, Gabriel Krisman Bertazi wrote:
> Jens Axboe <axboe@kernel.dk> writes:
>
>> On 11/18/24 6:22 PM, Gabriel Krisman Bertazi wrote:
>>> diff --git a/io_uring/alloc_cache.h b/io_uring/alloc_cache.h
>>> index b7a38a2069cf..6b34e491a30a 100644
>>> --- a/io_uring/alloc_cache.h
>>> +++ b/io_uring/alloc_cache.h
>>> @@ -30,6 +30,13 @@ static inline void *io_alloc_cache_get(struct io_alloc_cache *cache)
>>>  	return NULL;
>>>  }
>>>  
>>> +static inline void *io_alloc_cache_alloc(struct io_alloc_cache *cache, gfp_t gfp)
>>> +{
>>> +	if (!cache->nr_cached)
>>> +		return kzalloc(cache->elem_size, gfp);
>>> +	return io_alloc_cache_get(cache);
>>> +}
>>
>> I don't think you want to use kzalloc here. The caller will need to clear
>> what it needs for the cached path anyway, so it has no option but to
>> clear/set things twice in that case.
>
> Hi Jens,
>
> The reason I use kzalloc here is to be able to trust the value of
> rw->free_iov (io_rw_alloc_async) and hdr->free_iov (io_msg_alloc_async)
> regardless of whether the allocated memory came from the cache or from
> slab. In the callers (patches 6 and 7), we do:

I see, I guess that makes sense, as some things are persistent in the
cache and need clearing upfront only if freshly allocated.

> +	hdr = io_uring_alloc_async_data(&ctx->netmsg_cache, req);
> +	if (!hdr)
> +		return NULL;
> +
> +	/* If the async data was cached, we might have an iov cached inside. */
> +	if (hdr->free_iov) {
>
> An alternative would be to return a flag indicating whether the allocated
> memory came from the cache or not, but that didn't seem elegant. Do you
> see a better way?
>
> I also considered that zeroing memory here shouldn't harm performance,
> because it'll hit the cache most of the time.

It should hit the cache most of the time, but if we exceed the cache
size, then you will see allocations happen and churn. I don't like the
idea of the flag; then we still need to complicate the caller. Could we
do something like slab, where you have a hook that runs only for freshly
allocated data? That could either be a property of the cache, or be
passed in via io_alloc_cache_alloc().

BTW, I'd probably change the name of that to io_cache_get() or
io_cache_alloc() or something like that; I don't think we need two
allocs in there.
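To make Jens's suggestion concrete, here is a hedged sketch of the
init-hook variant, loosely modeled on slab constructors. The
io_cache_alloc() signature and the hook type are hypothetical, not part
of the posted series:

typedef void (*io_cache_init_t)(void *obj);

static inline void *io_cache_alloc(struct io_alloc_cache *cache, gfp_t gfp,
				   io_cache_init_t init)
{
	void *obj = io_alloc_cache_get(cache);

	/* Cached objects keep their persistent state; skip the hook. */
	if (obj)
		return obj;

	obj = kmalloc(cache->elem_size, gfp);
	/* Run the one-time initializer only on freshly allocated memory. */
	if (obj && init)
		init(obj);
	return obj;
}

A caller that only needs free_iov cleared on fresh allocations could
then pass a hook that NULLs that one field, instead of paying for a full
kzalloc on every cache miss.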
diff --git a/io_uring/alloc_cache.h b/io_uring/alloc_cache.h
index b7a38a2069cf..6b34e491a30a 100644
--- a/io_uring/alloc_cache.h
+++ b/io_uring/alloc_cache.h
@@ -30,6 +30,13 @@ static inline void *io_alloc_cache_get(struct io_alloc_cache *cache)
 	return NULL;
 }
 
+static inline void *io_alloc_cache_alloc(struct io_alloc_cache *cache, gfp_t gfp)
+{
+	if (!cache->nr_cached)
+		return kzalloc(cache->elem_size, gfp);
+	return io_alloc_cache_get(cache);
+}
+
 /* returns false if the cache was initialized properly */
 static inline bool io_alloc_cache_init(struct io_alloc_cache *cache,
 				       unsigned max_nr, size_t size)
The allocation paths that use alloc_cache duplicate the same code
pattern, sometimes in a quite convoluted way. Fold the allocation into
the cache code itself, making it just an allocator function, and keep
the cache policy invisible to callers.

Another justification for doing this, beyond code simplicity, is that it
makes it trivial to test the impact of disabling the cache and using
slab directly, which I've used for slab improvement experiments.

One relevant detail is that this allocates zeroed memory. The rationale
is that it simplifies the handling of the embedded free_iov in some of
the cached objects, and the performance impact shouldn't be meaningful,
since we are supposed to hit the cache most of the time and the
allocation is already the slow path.

Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
---
 io_uring/alloc_cache.h | 7 +++++++
 1 file changed, 7 insertions(+)
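As a usage illustration, a hedged before/after of the caller-side fold
this patch enables; the cache and struct names here are illustrative,
not taken verbatim from the series:

	/* Before: every caller open-codes the cache-then-slab fallback. */
	rw = io_alloc_cache_get(&ctx->rw_cache);
	if (!rw)
		rw = kzalloc(sizeof(struct io_async_rw), GFP_KERNEL);

	/* After: one call; the fallback policy lives in alloc_cache.h. */
	rw = io_alloc_cache_alloc(&ctx->rw_cache, GFP_KERNEL);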