[v4,1/6] fs: introduce FMODE_ZONE_APPEND and IOCB_ZONE_APPEND

Message ID 1595605762-17010-2-git-send-email-joshi.k@samsung.com
State New, archived
Series zone-append support in io-uring and aio

Commit Message

Kanchan Joshi July 24, 2020, 3:49 p.m. UTC
Enable zone-append using the existing O_APPEND and RWF_APPEND.
Zone-append is similar to an appending write, but to be useful it requires
the written location to be returned.
Returning the completion result requires a bit of additional processing in
the common path. Also, we guarantee that zone-append does not cause a short
write, which is not the case with a regular appending write.
Therefore make the feature opt-in by introducing a new FMODE_ZONE_APPEND
mode (kernel-only) and an IOCB_ZONE_APPEND flag.
When a file is opened, it can opt in to zone-append by setting
FMODE_ZONE_APPEND.
If the file has opted in and receives a write that meets the file-append
criteria (RWF_APPEND write or O_APPEND open), set IOCB_ZONE_APPEND in
kiocb->ki_flags, in addition to the existing IOCB_APPEND. IOCB_ZONE_APPEND is
meant to isolate the code that returns the written location for an
appending write.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Selvakumar S <selvakuma.s1@samsung.com>
Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Javier Gonzalez <javier.gonz@samsung.com>
---
 include/linux/fs.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Jens Axboe July 24, 2020, 4:34 p.m. UTC | #1
On 7/24/20 9:49 AM, Kanchan Joshi wrote:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 6c4ab4d..ef13df4 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -175,6 +175,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
>  /* File does not contribute to nr_files count */
>  #define FMODE_NOACCOUNT		((__force fmode_t)0x20000000)
>  
> +/* File can support zone-append */
> +#define FMODE_ZONE_APPEND	((__force fmode_t)0x40000000)

This conflicts with the async buffered read support in linux-next that
has been queued up for a long time.

> @@ -315,6 +318,7 @@ enum rw_hint {
>  #define IOCB_SYNC		(1 << 5)
>  #define IOCB_WRITE		(1 << 6)
>  #define IOCB_NOWAIT		(1 << 7)
> +#define IOCB_ZONE_APPEND	(1 << 8)

Ditto this one, and that also clashes with mainline. The next available
bit would be 10; IOCB_WAITQ and IOCB_NOIO are 8 and 9.
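To make the clash concrete, here is a minimal standalone sketch (not kernel code) of a compile-time guard against reusing an occupied flag bit. It assumes bits 0 through 9 are all taken, as the remark about bit 10 implies; the SKETCH_ names and sketch_bit_is_free() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the flag layout described in the review. */
#define SKETCH_IOCB_NOWAIT       (1u << 7)
#define SKETCH_IOCB_WAITQ        (1u << 8)   /* taken in mainline */
#define SKETCH_IOCB_NOIO         (1u << 9)   /* taken in mainline */
#define SKETCH_IOCB_ZONE_APPEND  (1u << 10)  /* next available bit */

/* Assumption: every bit below 10 is already assigned. */
#define SKETCH_IOCB_TAKEN        ((1u << 10) - 1)

/* Return true if the given bit position does not collide with a taken bit. */
static bool sketch_bit_is_free(unsigned int bit)
{
	return bit < 32 && ((1u << bit) & SKETCH_IOCB_TAKEN) == 0;
}

/* Fails the build if the new flag is ever moved onto an occupied bit. */
static_assert((SKETCH_IOCB_ZONE_APPEND & SKETCH_IOCB_TAKEN) == 0,
	      "zone-append flag collides with an existing IOCB flag");
```

With the patch's original value of (1 << 8) the assertion would fail at build time, since that bit belongs to IOCB_WAITQ.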
Christoph Hellwig July 26, 2020, 3:18 p.m. UTC | #2
Zone append is a protocol context that has no business showing up
in a file system interface.  The right interface is a generic way
to report the written offset for an append write for any kind of file.
So we should pick a better name like FMODE_REPORT_APPEND_OFFSET
(not that I particularly like that name, but it is the best I could
quickly come up with).
Matthew Wilcox July 28, 2020, 1:49 a.m. UTC | #3
On Sun, Jul 26, 2020 at 04:18:10PM +0100, Christoph Hellwig wrote:
> Zone append is a protocol context that has no business showing up
> in a file system interface.  The right interface is a generic way
> to report the written offset for an append write for any kind of file.
> So we should pick a better name like FMODE_REPORT_APPEND_OFFSET
> (not that I particularly like that name, but it is the best I could
> quickly come up with).

Is it necessarily *append*?  There was a spate of papers about ten
years ago for APIs that were "write anywhere and I'll tell you where it
ended up".  So FMODE_ANONYMOUS_WRITE perhaps?
Christoph Hellwig July 28, 2020, 7:26 a.m. UTC | #4
On Tue, Jul 28, 2020 at 02:49:59AM +0100, Matthew Wilcox wrote:
> On Sun, Jul 26, 2020 at 04:18:10PM +0100, Christoph Hellwig wrote:
> > Zone append is a protocol context that has no business showing up
> > in a file system interface.  The right interface is a generic way
> > to report the written offset for an append write for any kind of file.
> > So we should pick a better name like FMODE_REPORT_APPEND_OFFSET
> > (not that I particularly like that name, but it is the best I could
> > quickly come up with).
> 
> Is it necessarily *append*?  There was a spate of papers about ten
> years ago for APIs that were "write anywhere and I'll tell you where it
> ended up".  So FMODE_ANONYMOUS_WRITE perhaps?

But that really is not the semantics I had in mind - both the semantics
for the proposed Linux file API and the NVMe Zone Append command say
write exactly at the write pointer (NVMe) or end of the file (file API).
Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6c4ab4d..ef13df4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -175,6 +175,9 @@  typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File does not contribute to nr_files count */
 #define FMODE_NOACCOUNT		((__force fmode_t)0x20000000)
 
+/* File can support zone-append */
+#define FMODE_ZONE_APPEND	((__force fmode_t)0x40000000)
+
 /*
  * Flag for rw_copy_check_uvector and compat_rw_copy_check_uvector
  * that indicates that they should check the contents of the iovec are
@@ -315,6 +318,7 @@  enum rw_hint {
 #define IOCB_SYNC		(1 << 5)
 #define IOCB_WRITE		(1 << 6)
 #define IOCB_NOWAIT		(1 << 7)
+#define IOCB_ZONE_APPEND	(1 << 8)
 
 struct kiocb {
 	struct file		*ki_filp;
@@ -3427,8 +3431,11 @@  static inline bool vma_is_fsdax(struct vm_area_struct *vma)
 static inline int iocb_flags(struct file *file)
 {
 	int res = 0;
-	if (file->f_flags & O_APPEND)
+	if (file->f_flags & O_APPEND) {
 		res |= IOCB_APPEND;
+		if (file->f_mode & FMODE_ZONE_APPEND)
+			res |= IOCB_ZONE_APPEND;
+	}
 	if (file->f_flags & O_DIRECT)
 		res |= IOCB_DIRECT;
 	if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))
@@ -3454,8 +3461,11 @@  static inline int kiocb_set_rw_flags(struct kiocb *ki, rwf_t flags)
 		ki->ki_flags |= IOCB_DSYNC;
 	if (flags & RWF_SYNC)
 		ki->ki_flags |= (IOCB_DSYNC | IOCB_SYNC);
-	if (flags & RWF_APPEND)
+	if (flags & RWF_APPEND) {
 		ki->ki_flags |= IOCB_APPEND;
+		if (ki->ki_filp->f_mode & FMODE_ZONE_APPEND)
+			ki->ki_flags |= IOCB_ZONE_APPEND;
+	}
 	return 0;
 }
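
For readers who want to exercise the opt-in logic outside the kernel, the two hunks above can be modelled in a small standalone program. This is only a sketch: the MODEL_ constants use made-up values, struct model_file stands in for struct file, and the function names are hypothetical. Only the branching mirrors the patch.

```c
#include <assert.h>

/* Illustrative values only; the real definitions live in kernel headers. */
#define MODEL_O_APPEND          0x1u
#define MODEL_RWF_APPEND        0x1u
#define MODEL_FMODE_ZONE_APPEND 0x1u
#define MODEL_IOCB_APPEND       (1u << 1)
#define MODEL_IOCB_ZONE_APPEND  (1u << 10)

/* Stand-in for the two struct file fields the patch looks at. */
struct model_file {
	unsigned int f_flags;	/* open flags, e.g. MODEL_O_APPEND */
	unsigned int f_mode;	/* kernel-only mode bits */
};

/* Mirrors the iocb_flags() hunk: flags derived from the open flags. */
static unsigned int model_iocb_flags(const struct model_file *file)
{
	unsigned int res = 0;

	if (file->f_flags & MODEL_O_APPEND) {
		res |= MODEL_IOCB_APPEND;
		if (file->f_mode & MODEL_FMODE_ZONE_APPEND)
			res |= MODEL_IOCB_ZONE_APPEND;
	}
	return res;
}

/* Mirrors the kiocb_set_rw_flags() hunk: flags derived per write. */
static unsigned int model_set_rw_flags(const struct model_file *file,
				       unsigned int ki_flags,
				       unsigned int rwf)
{
	if (rwf & MODEL_RWF_APPEND) {
		ki_flags |= MODEL_IOCB_APPEND;
		if (file->f_mode & MODEL_FMODE_ZONE_APPEND)
			ki_flags |= MODEL_IOCB_ZONE_APPEND;
	}
	return ki_flags;
}

/* Quick self-check: a file that did not opt in never gets the new flag. */
static void model_self_check(void)
{
	struct model_file plain = { MODEL_O_APPEND, 0 };

	assert(model_iocb_flags(&plain) == MODEL_IOCB_APPEND);
	assert(model_set_rw_flags(&plain, 0, MODEL_RWF_APPEND) ==
	       MODEL_IOCB_APPEND);
}
```

In this model the zone-append bit is set only when the append condition and the opt-in mode are both present, so existing O_APPEND and RWF_APPEND users see no change unless the file opts in.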