diff mbox series

[6/8] fs: add IOCB flags related to passing back dio completions

Message ID 20230720181310.71589-7-axboe@kernel.dk (mailing list archive)
State Superseded
Headers show
Series Improve async iomap DIO performance | expand

Commit Message

Jens Axboe July 20, 2023, 6:13 p.m. UTC
Async dio completions generally happen from hard/soft IRQ context, which
means that users like iomap may need to defer some of the completion
handling to a workqueue. This is less efficient than having the original
issuer handle it, like we do for sync IO, and it adds latency to the
completions.

Add IOCB_DIO_DEFER, which the issuer can set if it is able to safely
punt these completions to a safe context. If the dio handler is aware
of this flag, assign a callback handler in kiocb->dio_complete and
associated data io kiocb->private. The issuer will then call this handler
with that data from task context.

No functional changes in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 include/linux/fs.h | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig July 21, 2023, 6:18 a.m. UTC | #1
Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
Darrick J. Wong July 21, 2023, 3:48 p.m. UTC | #2
On Thu, Jul 20, 2023 at 12:13:08PM -0600, Jens Axboe wrote:
> Async dio completions generally happen from hard/soft IRQ context, which
> means that users like iomap may need to defer some of the completion
> handling to a workqueue. This is less efficient than having the original
> issuer handle it, like we do for sync IO, and it adds latency to the
> completions.
> 
> Add IOCB_DIO_DEFER, which the issuer can set if it is able to safely
> punt these completions to a safe context. If the dio handler is aware
> of this flag, assign a callback handler in kiocb->dio_complete and
> associated data io kiocb->private. The issuer will then call this handler
> with that data from task context.
> 
> No functional changes in this patch.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> ---
>  include/linux/fs.h | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 6867512907d6..2c589418a078 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -338,6 +338,20 @@ enum rw_hint {
>  #define IOCB_NOIO		(1 << 20)
>  /* can use bio alloc cache */
>  #define IOCB_ALLOC_CACHE	(1 << 21)
> +/*
> + * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the
> + * iocb completion can be passed back to the owner for execution from a safe
> + * context rather than needing to be punted through a workqueue. If this
> + * flag is set, the completion handling may set iocb->dio_complete to a
> + * handler, which the issuer will then call from task context to complete
> + * the processing of the iocb. iocb->private should then also be set to
> + * the argument being passed to this handler. Note that while this provides

Who should be setting iocb->private?  Can I suggest rewording this to:

"If this flag is set, the bio completion handling may set
iocb->dio_complete to a handler function and iocb->private to context
information for that handler.  The issuer should call the handler with
that context information from task context to complete the processing of
the iocb."

Assuming I've understood what this does from the next patch? :)

> + * a task context for the dio_complete() callback, it should only be used
> + * on the completion side for non-IO generating completions. It's fine to
> + * call blocking functions from this callback, but they should not wait for
> + * unrelated IO (like cache flushing, new IO generation, etc).
> + */
> +#define IOCB_DIO_DEFER		(1 << 22)

Sorry to nitpick names here, but "defer" feels a little vague to me.
Defer what?  And to whom?

This flag means "defer iocb completion to the caller", right?  If so,
wouldn't this be better named IOCB_DIO_CALLER_COMP?

>  /* for use in trace events */
>  #define TRACE_IOCB_STRINGS \
> @@ -351,7 +365,8 @@ enum rw_hint {
>  	{ IOCB_WRITE,		"WRITE" }, \
>  	{ IOCB_WAITQ,		"WAITQ" }, \
>  	{ IOCB_NOIO,		"NOIO" }, \
> -	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }
> +	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }, \
> +	{ IOCB_DIO_DEFER,	"DIO_DEFER" }
>  
>  struct kiocb {
>  	struct file		*ki_filp;
> @@ -360,7 +375,22 @@ struct kiocb {
>  	void			*private;
>  	int			ki_flags;
>  	u16			ki_ioprio; /* See linux/ioprio.h */
> -	struct wait_page_queue	*ki_waitq; /* for async buffered IO */
> +	union {
> +		/*
> +		 * Only used for async buffered reads, where it denotes the
> +		 * page waitqueue associated with completing the read. Valid
> +		 * IFF IOCB_WAITQ is set.
> +		 */
> +		struct wait_page_queue	*ki_waitq;
> +		/*
> +		 * Can be used for O_DIRECT IO, where the completion handling
> +		 * is punted back to the issuer of the IO. May only be set
> +		 * if IOCB_DIO_DEFER is set by the issuer, and the issuer must
> +		 * then check for presence of this handler when ki_complete is
> +		 * invoked.

Might want to reiterate in the comment that kiocb.private should be
passed as @data.

--D

> +		 */
> +		ssize_t (*dio_complete)(void *data);
> +	};
>  };
>  
>  static inline bool is_sync_kiocb(struct kiocb *kiocb)
> -- 
> 2.40.1
>
Jens Axboe July 21, 2023, 3:53 p.m. UTC | #3
On 7/21/23 9:48?AM, Darrick J. Wong wrote:
> On Thu, Jul 20, 2023 at 12:13:08PM -0600, Jens Axboe wrote:
>> Async dio completions generally happen from hard/soft IRQ context, which
>> means that users like iomap may need to defer some of the completion
>> handling to a workqueue. This is less efficient than having the original
>> issuer handle it, like we do for sync IO, and it adds latency to the
>> completions.
>>
>> Add IOCB_DIO_DEFER, which the issuer can set if it is able to safely
>> punt these completions to a safe context. If the dio handler is aware
>> of this flag, assign a callback handler in kiocb->dio_complete and
>> associated data io kiocb->private. The issuer will then call this handler
>> with that data from task context.
>>
>> No functional changes in this patch.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> ---
>>  include/linux/fs.h | 34 ++++++++++++++++++++++++++++++++--
>>  1 file changed, 32 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 6867512907d6..2c589418a078 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -338,6 +338,20 @@ enum rw_hint {
>>  #define IOCB_NOIO		(1 << 20)
>>  /* can use bio alloc cache */
>>  #define IOCB_ALLOC_CACHE	(1 << 21)
>> +/*
>> + * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the
>> + * iocb completion can be passed back to the owner for execution from a safe
>> + * context rather than needing to be punted through a workqueue. If this
>> + * flag is set, the completion handling may set iocb->dio_complete to a
>> + * handler, which the issuer will then call from task context to complete
>> + * the processing of the iocb. iocb->private should then also be set to
>> + * the argument being passed to this handler. Note that while this provides
> 
> Who should be setting iocb->private?  Can I suggest rewording this to:
> 
> "If this flag is set, the bio completion handling may set
> iocb->dio_complete to a handler function and iocb->private to context
> information for that handler.  The issuer should call the handler with
> that context information from task context to complete the processing of
> the iocb."
> 
> Assuming I've understood what this does from the next patch? :)

Yep this is definitely better - thanks, I'll update it!

>> + * a task context for the dio_complete() callback, it should only be used
>> + * on the completion side for non-IO generating completions. It's fine to
>> + * call blocking functions from this callback, but they should not wait for
>> + * unrelated IO (like cache flushing, new IO generation, etc).
>> + */
>> +#define IOCB_DIO_DEFER		(1 << 22)
> 
> Sorry to nitpick names here, but "defer" feels a little vague to me.
> Defer what?  And to whom?
> 
> This flag means "defer iocb completion to the caller", right?  If so,
> wouldn't this be better named IOCB_DIO_CALLER_COMP?

That is probably better indeed. Naming is hard! CALLER_COMP or
ISSUER_COMP would be better and more descriptive. I'll go with your
suggestion.

>>  /* for use in trace events */
>>  #define TRACE_IOCB_STRINGS \
>> @@ -351,7 +365,8 @@ enum rw_hint {
>>  	{ IOCB_WRITE,		"WRITE" }, \
>>  	{ IOCB_WAITQ,		"WAITQ" }, \
>>  	{ IOCB_NOIO,		"NOIO" }, \
>> -	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }
>> +	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }, \
>> +	{ IOCB_DIO_DEFER,	"DIO_DEFER" }
>>  
>>  struct kiocb {
>>  	struct file		*ki_filp;
>> @@ -360,7 +375,22 @@ struct kiocb {
>>  	void			*private;
>>  	int			ki_flags;
>>  	u16			ki_ioprio; /* See linux/ioprio.h */
>> -	struct wait_page_queue	*ki_waitq; /* for async buffered IO */
>> +	union {
>> +		/*
>> +		 * Only used for async buffered reads, where it denotes the
>> +		 * page waitqueue associated with completing the read. Valid
>> +		 * IFF IOCB_WAITQ is set.
>> +		 */
>> +		struct wait_page_queue	*ki_waitq;
>> +		/*
>> +		 * Can be used for O_DIRECT IO, where the completion handling
>> +		 * is punted back to the issuer of the IO. May only be set
>> +		 * if IOCB_DIO_DEFER is set by the issuer, and the issuer must
>> +		 * then check for presence of this handler when ki_complete is
>> +		 * invoked.
> 
> Might want to reiterate in the comment that kiocb.private should be
> passed as @data.

OK, will do.
diff mbox series

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6867512907d6..2c589418a078 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -338,6 +338,20 @@  enum rw_hint {
 #define IOCB_NOIO		(1 << 20)
 /* can use bio alloc cache */
 #define IOCB_ALLOC_CACHE	(1 << 21)
+/*
+ * IOCB_DIO_DEFER can be set by the iocb owner, to indicate that the
+ * iocb completion can be passed back to the owner for execution from a safe
+ * context rather than needing to be punted through a workqueue. If this
+ * flag is set, the completion handling may set iocb->dio_complete to a
+ * handler, which the issuer will then call from task context to complete
+ * the processing of the iocb. iocb->private should then also be set to
+ * the argument being passed to this handler. Note that while this provides
+ * a task context for the dio_complete() callback, it should only be used
+ * on the completion side for non-IO generating completions. It's fine to
+ * call blocking functions from this callback, but they should not wait for
+ * unrelated IO (like cache flushing, new IO generation, etc).
+ */
+#define IOCB_DIO_DEFER		(1 << 22)
 
 /* for use in trace events */
 #define TRACE_IOCB_STRINGS \
@@ -351,7 +365,8 @@  enum rw_hint {
 	{ IOCB_WRITE,		"WRITE" }, \
 	{ IOCB_WAITQ,		"WAITQ" }, \
 	{ IOCB_NOIO,		"NOIO" }, \
-	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }
+	{ IOCB_ALLOC_CACHE,	"ALLOC_CACHE" }, \
+	{ IOCB_DIO_DEFER,	"DIO_DEFER" }
 
 struct kiocb {
 	struct file		*ki_filp;
@@ -360,7 +375,22 @@  struct kiocb {
 	void			*private;
 	int			ki_flags;
 	u16			ki_ioprio; /* See linux/ioprio.h */
-	struct wait_page_queue	*ki_waitq; /* for async buffered IO */
+	union {
+		/*
+		 * Only used for async buffered reads, where it denotes the
+		 * page waitqueue associated with completing the read. Valid
+		 * IFF IOCB_WAITQ is set.
+		 */
+		struct wait_page_queue	*ki_waitq;
+		/*
+		 * Can be used for O_DIRECT IO, where the completion handling
+		 * is punted back to the issuer of the IO. May only be set
+		 * if IOCB_DIO_DEFER is set by the issuer, and the issuer must
+		 * then check for presence of this handler when ki_complete is
+		 * invoked.
+		 */
+		ssize_t (*dio_complete)(void *data);
+	};
 };
 
 static inline bool is_sync_kiocb(struct kiocb *kiocb)