diff mbox series

[08/26] aio: don't zero entire aio_kiocb aio_get_req()

Message ID 20181204233729.26776-9-axboe@kernel.dk (mailing list archive)
State New, archived
Headers show
Series [01/26] fs: add an iopoll method to struct file_operations | expand

Commit Message

Jens Axboe Dec. 4, 2018, 11:37 p.m. UTC
It's 192 bytes, fairly substantial. Most items don't need to be cleared,
especially not upfront. Clear the ones we do need to clear, and leave
the other ones for setup when the iocb is prepared and submitted.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 fs/aio.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Jeff Moyer Dec. 6, 2018, 7:27 p.m. UTC | #1
Jens Axboe <axboe@kernel.dk> writes:

> It's 192 bytes, fairly substantial. Most items don't need to be cleared,
> especially not upfront. Clear the ones we do need to clear, and leave
> the other ones for setup when the iocb is prepared and submitted.

What performance gains do you see from this?

-Jeff

> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  fs/aio.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index eaceb40e6cf5..522c04864d82 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1009,14 +1009,15 @@ static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
>  {
>  	struct aio_kiocb *req;
>  
> -	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
> +	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL);
>  	if (unlikely(!req))
>  		return NULL;
>  
>  	percpu_ref_get(&ctx->reqs);
> +	req->ki_ctx = ctx;
>  	INIT_LIST_HEAD(&req->ki_list);
>  	refcount_set(&req->ki_refcnt, 0);
> -	req->ki_ctx = ctx;
> +	req->ki_eventfd = NULL;
>  	return req;
>  }
>  
> @@ -1730,6 +1731,10 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
>  	if (unlikely(!req->file))
>  		return -EBADF;
>  
> +	req->head = NULL;
> +	req->woken = false;
> +	req->cancelled = false;
> +
>  	apt.pt._qproc = aio_poll_queue_proc;
>  	apt.pt._key = req->events;
>  	apt.iocb = aiocb;
Jens Axboe Dec. 6, 2018, 7:28 p.m. UTC | #2
On 12/6/18 12:27 PM, Jeff Moyer wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> It's 192 bytes, fairly substantial. Most items don't need to be cleared,
>> especially not upfront. Clear the ones we do need to clear, and leave
>> the other ones for setup when the iocb is prepared and submitted.
> 
> What performance gains do you see from this?

Before this, I had 1% in memset doing high IOPS. With it, that's gone.
1% is a lot, when you have just one thread doing everything from submission
to completion.
Jeff Moyer Dec. 6, 2018, 7:38 p.m. UTC | #3
Jens Axboe <axboe@kernel.dk> writes:

> On 12/6/18 12:27 PM, Jeff Moyer wrote:
>> Jens Axboe <axboe@kernel.dk> writes:
>> 
>>> It's 192 bytes, fairly substantial. Most items don't need to be cleared,
>>> especially not upfront. Clear the ones we do need to clear, and leave
>>> the other ones for setup when the iocb is prepared and submitted.
>> 
>> What performance gains do you see from this?
>
> Before this, I had 1% in memset doing high IOPS. With it, that's gone.
> 1% is a lot, when you have just one thread doing everything from submission
> to completion.

I'm used to customers complaining about fractions of a percent, so I get
it.  :-)  I just wanted to know we had some measurable impact, as I've
seen bugs crop up from code like this in the past.

Thanks!
Jeff
Jens Axboe Dec. 6, 2018, 7:42 p.m. UTC | #4
On 12/6/18 12:38 PM, Jeff Moyer wrote:
> Jens Axboe <axboe@kernel.dk> writes:
> 
>> On 12/6/18 12:27 PM, Jeff Moyer wrote:
>>> Jens Axboe <axboe@kernel.dk> writes:
>>>
>>>> It's 192 bytes, fairly substantial. Most items don't need to be cleared,
>>>> especially not upfront. Clear the ones we do need to clear, and leave
>>>> the other ones for setup when the iocb is prepared and submitted.
>>>
>>> What performance gains do you see from this?
>>
>> Before this, I had 1% in memset doing high IOPS. With it, that's gone.
>> 1% is a lot, when you have just one thread doing everything from submission
>> to completion.
> 
> I'm used to customers complaining about fractions of a percent, so I get
> it.  :-)  I just wanted to know we had some measurable impact, as I've
> seen bugs crop up from code like this in the past.

Oh for sure, I wouldn't do it if there wasn't a noticeable gain from this!
diff mbox series

Patch

diff --git a/fs/aio.c b/fs/aio.c
index eaceb40e6cf5..522c04864d82 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1009,14 +1009,15 @@  static inline struct aio_kiocb *aio_get_req(struct kioctx *ctx)
 {
 	struct aio_kiocb *req;
 
-	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
+	req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL);
 	if (unlikely(!req))
 		return NULL;
 
 	percpu_ref_get(&ctx->reqs);
+	req->ki_ctx = ctx;
 	INIT_LIST_HEAD(&req->ki_list);
 	refcount_set(&req->ki_refcnt, 0);
-	req->ki_ctx = ctx;
+	req->ki_eventfd = NULL;
 	return req;
 }
 
@@ -1730,6 +1731,10 @@  static ssize_t aio_poll(struct aio_kiocb *aiocb, struct iocb *iocb)
 	if (unlikely(!req->file))
 		return -EBADF;
 
+	req->head = NULL;
+	req->woken = false;
+	req->cancelled = false;
+
 	apt.pt._qproc = aio_poll_queue_proc;
 	apt.pt._key = req->events;
 	apt.iocb = aiocb;