diff mbox

[v5,7/7] fs: add a flag for per-operation O_DSYNC semantics

Message ID c188b04ede700ce5f986b19de12fa617d158540f.1415220890.git.milosz@adfin.com (mailing list archive)
State New, archived
Headers show

Commit Message

Milosz Tanski Nov. 5, 2014, 9:14 p.m. UTC
From: Christoph Hellwig <hch@lst.de>

With the new read/write with flags syscalls we can support a flag
to enable O_DSYNC semantics on a per-operation basis.  This ?s
useful to implement protocols like SMB, NFS or SCSI that have such
per-operation flags.

Example program below:

cat > pwritev2.c << EOF

        (off_t) val,                              \
        (off_t) ((((uint64_t) (val)) >> (sizeof (long) * 4)) >> (sizeof (long) * 4))

static ssize_t
pwritev2(int fd, const struct iovec *iov, int iovcnt, off_t offset, int flags)
{
        return syscall(__NR_pwritev2, fd, iov, iovcnt, LO_HI_LONG(offset),
			 flags);
}

int main(int argc, char **argv)
{
	int fd = open(argv[1], O_WRONLY|O_CREAT|O_TRUNC, 0666);
	char buf[1024];
	struct iovec iov = { .iov_base = buf, .iov_len = 1024 };
	int ret;

        if (fd < 0) {
                perror("open");
                return 0;
        }

	memset(buf, 0xfe, sizeof(buf));

	ret = pwritev2(fd, &iov, 1, 0, RWF_DSYNC);
	if (ret < 0)
		perror("pwritev2");
	else
		printf("ret = %d\n", ret);

	return 0;
}
EOF

Signed-off-by: Christoph Hellwig <hch@lst.de>
[milosz@adfin.com: added flag check to compat_do_readv_writev()]
Signed-off-by: Milosz Tanski <milosz@adfin.com>
---
 fs/ceph/file.c     |  4 +++-
 fs/fuse/file.c     |  2 ++
 fs/nfs/file.c      | 10 ++++++----
 fs/ocfs2/file.c    |  6 ++++--
 fs/read_write.c    | 20 +++++++++++++++-----
 include/linux/fs.h |  3 ++-
 mm/filemap.c       |  4 +++-
 7 files changed, 35 insertions(+), 14 deletions(-)

Comments

Jeff Moyer Nov. 6, 2014, 11:46 p.m. UTC | #1
Milosz Tanski <milosz@adfin.com> writes:

> -		if (type == READ && (flags & RWF_NONBLOCK))
> -			return -EAGAIN;
> +		if (type == READ) {
> +			if (flags & RWF_NONBLOCK)
> +				return -EAGAIN;
> +		} else {
> +			if (flags & RWF_DSYNC)
> +				return -EINVAL;
> +		}

Minor nit, but I'd rather read something that looks like this:

	if (type == READ && (flags & RWF_NONBLOCK))
		return -EAGAIN;
	else if (type == WRITE && (flags & RWF_DSYNC))
		return -EINVAL;

I won't lose sleep over it, though.

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Anton Altaparmakov Nov. 7, 2014, 4:22 a.m. UTC | #2
Hi Jeff,

> On 7 Nov 2014, at 01:46, Jeff Moyer <jmoyer@redhat.com> wrote:
> 
> Milosz Tanski <milosz@adfin.com> writes:
> 
>> -		if (type == READ && (flags & RWF_NONBLOCK))
>> -			return -EAGAIN;
>> +		if (type == READ) {
>> +			if (flags & RWF_NONBLOCK)
>> +				return -EAGAIN;
>> +		} else {
>> +			if (flags & RWF_DSYNC)
>> +				return -EINVAL;
>> +		}
> 
> Minor nit, but I'd rather read something that looks like this:
> 
> 	if (type == READ && (flags & RWF_NONBLOCK))
> 		return -EAGAIN;
> 	else if (type == WRITE && (flags & RWF_DSYNC))
> 		return -EINVAL;

But your version is less logically efficient for the case where "type == READ" is true and "flags & RWF_NONBLOCK" is false because your version then has to do the "if (type == WRITE" check before discovering it does not need to take that branch either, whilst the original version does not have to do such a test at all.

Best regards,

	Anton

> I won't lose sleep over it, though.
> 
> Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Anton Altaparmakov Nov. 7, 2014, 6:43 a.m. UTC | #3
Hi,

> On 7 Nov 2014, at 07:52, Anand Avati <avati@gluster.org> wrote:
> On Thu, Nov 6, 2014 at 8:22 PM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > On 7 Nov 2014, at 01:46, Jeff Moyer <jmoyer@redhat.com> wrote:
> > Minor nit, but I'd rather read something that looks like this:
> >
> >       if (type == READ && (flags & RWF_NONBLOCK))
> >               return -EAGAIN;
> >       else if (type == WRITE && (flags & RWF_DSYNC))
> >               return -EINVAL;
> 
> But your version is less logically efficient for the case where "type == READ" is true and "flags & RWF_NONBLOCK" is false because your version then has to do the "if (type == WRITE" check before discovering it does not need to take that branch either, whilst the original version does not have to do such a test at all.
> 
> Seriously?

Of course seriously.

> Just focus on the code readability/maintainability which makes the code most easily understood/obvious to a new pair of eyes, and leave such micro-optimizations to the compiler..

The original version is more readable (IMO) and this is not a micro-optimization.  It is people like you who are responsible for the fact that we need faster and faster computers to cope with the inefficient/poor code being written more and more...

And I really wouldn't hedge my bets on gcc optimizing something like that.  The amount of crap assembly produced from gcc that I have seen over the years suggests that it is quite likely it will make a hash of it instead...

Best regards,

	Anton

> Thanks
Roger Willcocks Nov. 7, 2014, 2:21 p.m. UTC | #4
On Fri, 2014-11-07 at 08:43 +0200, Anton Altaparmakov wrote:
> Hi,
> 
> > On 7 Nov 2014, at 07:52, Anand Avati <avati@gluster.org> wrote:
> > On Thu, Nov 6, 2014 at 8:22 PM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
> > > On 7 Nov 2014, at 01:46, Jeff Moyer <jmoyer@redhat.com> wrote:
> > > Minor nit, but I'd rather read something that looks like this:
> > >
> > >       if (type == READ && (flags & RWF_NONBLOCK))
> > >               return -EAGAIN;
> > >       else if (type == WRITE && (flags & RWF_DSYNC))
> > >               return -EINVAL;
> > 
> > But your version is less logically efficient for the case where "type == READ" is true and "flags & RWF_NONBLOCK" is false because your version then has to do the "if (type == WRITE" check before discovering it does not need to take that branch either, whilst the original version does not have to do such a test at all.
> > 
> > Seriously?
> 
> Of course seriously.
> 
> > Just focus on the code readability/maintainability which makes the code most easily understood/obvious to a new pair of eyes, and leave such micro-optimizations to the compiler..
> 
> The original version is more readable (IMO) and this is not a micro-optimization.  It is people like you who are responsible for the fact that we need faster and faster computers to cope with the inefficient/poor code being written more and more...
> 

Your original version needs me to know that type can only be either READ
or WRITE (and not, for instance, READONLY or READWRITE or some other
random special case) and it rings alarm bells when I first see it. If
you want to keep the micro optimization, you need an assertion to
acknowledge the potential bug and a comment to make the code obvious:

 +            assert(type == READ || type == WRITE);
 +            if (type == READ) {
 +                    if (flags & RWF_NONBLOCK)
 +                            return -EAGAIN;
 +            } else { /* WRITE */
 +                    if (flags & RWF_DSYNC)
 +                            return -EINVAL;
 +            }

but since what's really happening here is two separate and independent
error checks, Jeff's version is still better, even if it does take an
extra couple of nanoseconds.

Actually I'd probably write:

       if (type == READ && (flags & RWF_NONBLOCK))
              return -EAGAIN;

       if (type == WRITE && (flags & RWF_DSYNC))
              return -EINVAL;

(no 'else' since the code will never be reached if the first test is
true).
Milosz Tanski Nov. 7, 2014, 7:58 p.m. UTC | #5
On Fri, Nov 7, 2014 at 9:21 AM, Roger Willcocks <roger@filmlight.ltd.uk> wrote:
>
> On Fri, 2014-11-07 at 08:43 +0200, Anton Altaparmakov wrote:
>> Hi,
>>
>> > On 7 Nov 2014, at 07:52, Anand Avati <avati@gluster.org> wrote:
>> > On Thu, Nov 6, 2014 at 8:22 PM, Anton Altaparmakov <aia21@cam.ac.uk> wrote:
>> > > On 7 Nov 2014, at 01:46, Jeff Moyer <jmoyer@redhat.com> wrote:
>> > > Minor nit, but I'd rather read something that looks like this:
>> > >
>> > >       if (type == READ && (flags & RWF_NONBLOCK))
>> > >               return -EAGAIN;
>> > >       else if (type == WRITE && (flags & RWF_DSYNC))
>> > >               return -EINVAL;
>> >
>> > But your version is less logically efficient for the case where "type == READ" is true and "flags & RWF_NONBLOCK" is false because your version then has to do the "if (type == WRITE" check before discovering it does not need to take that branch either, whilst the original version does not have to do such a test at all.
>> >
>> > Seriously?
>>
>> Of course seriously.
>>
>> > Just focus on the code readability/maintainability which makes the code most easily understood/obvious to a new pair of eyes, and leave such micro-optimizations to the compiler..
>>
>> The original version is more readable (IMO) and this is not a micro-optimization.  It is people like you who are responsible for the fact that we need faster and faster computers to cope with the inefficient/poor code being written more and more...
>>
>
> Your original version needs me to know that type can only be either READ
> or WRITE (and not, for instance, READONLY or READWRITE or some other
> random special case) and it rings alarm bells when I first see it. If
> you want to keep the micro optimization, you need an assertion to
> acknowledge the potential bug and a comment to make the code obvious:
>
>  +            assert(type == READ || type == WRITE);
>  +            if (type == READ) {
>  +                    if (flags & RWF_NONBLOCK)
>  +                            return -EAGAIN;
>  +            } else { /* WRITE */
>  +                    if (flags & RWF_DSYNC)
>  +                            return -EINVAL;
>  +            }
>
> but since what's really happening here is two separate and independent
> error checks, Jeff's version is still better, even if it does take an
> extra couple of nanoseconds.
>
> Actually I'd probably write:
>
>        if (type == READ && (flags & RWF_NONBLOCK))
>               return -EAGAIN;
>
>        if (type == WRITE && (flags & RWF_DSYNC))
>               return -EINVAL;
>
> (no 'else' since the code will never be reached if the first test is
> true).
>
>
> --
> Roger Willcocks <roger@filmlight.ltd.uk>
>

This is what I changed it to (and will be sending that out for the
next version).
Sage Weil Nov. 10, 2014, 4:07 p.m. UTC | #6
On Wed, 5 Nov 2014, Milosz Tanski wrote:
> From: Christoph Hellwig <hch@lst.de>
> 
> With the new read/write with flags syscalls we can support a flag
> to enable O_DSYNC semantics on a per-operation basis.  This ?s
> useful to implement protocols like SMB, NFS or SCSI that have such
> per-operation flags.
> 
> Example program below:
> 
> cat > pwritev2.c << EOF
> 
>         (off_t) val,                              \
>         (off_t) ((((uint64_t) (val)) >> (sizeof (long) * 4)) >> (sizeof (long) * 4))
> 
> static ssize_t
> pwritev2(int fd, const struct iovec *iov, int iovcnt, off_t offset, int flags)
> {
>         return syscall(__NR_pwritev2, fd, iov, iovcnt, LO_HI_LONG(offset),
> 			 flags);
> }
> 
> int main(int argc, char **argv)
> {
> 	int fd = open(argv[1], O_WRONLY|O_CREAT|O_TRUNC, 0666);
> 	char buf[1024];
> 	struct iovec iov = { .iov_base = buf, .iov_len = 1024 };
> 	int ret;
> 
>         if (fd < 0) {
>                 perror("open");
>                 return 0;
>         }
> 
> 	memset(buf, 0xfe, sizeof(buf));
> 
> 	ret = pwritev2(fd, &iov, 1, 0, RWF_DSYNC);
> 	if (ret < 0)
> 		perror("pwritev2");
> 	else
> 		printf("ret = %d\n", ret);
> 
> 	return 0;
> }
> EOF
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> [milosz@adfin.com: added flag check to compat_do_readv_writev()]
> Signed-off-by: Milosz Tanski <milosz@adfin.com>

Ceph bits

Acked-by: Sage Weil <sage@redhat.com>

> ---
>  fs/ceph/file.c     |  4 +++-
>  fs/fuse/file.c     |  2 ++
>  fs/nfs/file.c      | 10 ++++++----
>  fs/ocfs2/file.c    |  6 ++++--
>  fs/read_write.c    | 20 +++++++++++++++-----
>  include/linux/fs.h |  3 ++-
>  mm/filemap.c       |  4 +++-
>  7 files changed, 35 insertions(+), 14 deletions(-)
> 
> diff --git a/fs/ceph/file.c b/fs/ceph/file.c
> index b798b5c..2d4e15a 100644
> --- a/fs/ceph/file.c
> +++ b/fs/ceph/file.c
> @@ -983,7 +983,9 @@ retry_snap:
>  	ceph_put_cap_refs(ci, got);
>  
>  	if (written >= 0 &&
> -	    ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host) ||
> +	    ((file->f_flags & O_SYNC) ||
> +	     IS_SYNC(file->f_mapping->host) ||
> +	     (iocb->ki_rwflags & RWF_DSYNC) ||
>  	     ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
>  		err = vfs_fsync_range(file, pos, pos + written - 1, 1);
>  		if (err < 0)
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index caa8d95..bb4fb23 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -1248,6 +1248,8 @@ static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
>  		written += written_buffered;
>  		iocb->ki_pos = pos + written_buffered;
>  	} else {
> +		if (iocb->ki_rwflags & RWF_DSYNC)
> +			return -EINVAL;
>  		written = fuse_perform_write(file, mapping, from, pos);
>  		if (written >= 0)
>  			iocb->ki_pos = pos + written;
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index aa9046f..c59b0b7 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -652,13 +652,15 @@ static const struct vm_operations_struct nfs_file_vm_ops = {
>  	.remap_pages = generic_file_remap_pages,
>  };
>  
> -static int nfs_need_sync_write(struct file *filp, struct inode *inode)
> +static int nfs_need_sync_write(struct kiocb *iocb, struct inode *inode)
>  {
>  	struct nfs_open_context *ctx;
>  
> -	if (IS_SYNC(inode) || (filp->f_flags & O_DSYNC))
> +	if (IS_SYNC(inode) ||
> +	    (iocb->ki_filp->f_flags & O_DSYNC) ||
> +	    (iocb->ki_rwflags & RWF_DSYNC))
>  		return 1;
> -	ctx = nfs_file_open_context(filp);
> +	ctx = nfs_file_open_context(iocb->ki_filp);
>  	if (test_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags) ||
>  	    nfs_ctx_key_to_expire(ctx))
>  		return 1;
> @@ -705,7 +707,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
>  		written = result;
>  
>  	/* Return error values for O_DSYNC and IS_SYNC() */
> -	if (result >= 0 && nfs_need_sync_write(file, inode)) {
> +	if (result >= 0 && nfs_need_sync_write(iocb, inode)) {
>  		int err = vfs_fsync(file, 0);
>  		if (err < 0)
>  			result = err;
> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
> index bb66ca4..8f9a86b 100644
> --- a/fs/ocfs2/file.c
> +++ b/fs/ocfs2/file.c
> @@ -2374,8 +2374,10 @@ out_dio:
>  	/* buffered aio wouldn't have proper lock coverage today */
>  	BUG_ON(ret == -EIOCBQUEUED && !(file->f_flags & O_DIRECT));
>  
> -	if (((file->f_flags & O_DSYNC) && !direct_io) || IS_SYNC(inode) ||
> -	    ((file->f_flags & O_DIRECT) && !direct_io)) {
> +	if (((file->f_flags & O_DSYNC) && !direct_io) ||
> +	    IS_SYNC(inode) ||
> +	    ((file->f_flags & O_DIRECT) && !direct_io) ||
> +	    (iocb->ki_rwflags & RWF_DSYNC)) {
>  		ret = filemap_fdatawrite_range(file->f_mapping, *ppos,
>  					       *ppos + count - 1);
>  		if (ret < 0)
> diff --git a/fs/read_write.c b/fs/read_write.c
> index cba7d4c..3443265 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -839,8 +839,13 @@ static ssize_t do_readv_writev(int type, struct file *file,
>  		ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
>  						pos, iter_fn, flags);
>  	} else {
> -		if (type == READ && (flags & RWF_NONBLOCK))
> -			return -EAGAIN;
> +		if (type == READ) {
> +			if (flags & RWF_NONBLOCK)
> +				return -EAGAIN;
> +		} else {
> +			if (flags & RWF_DSYNC)
> +				return -EINVAL;
> +		}
>  
>  		if (fnv)
>  			ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
> @@ -888,7 +893,7 @@ ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
>  		return -EBADF;
>  	if (!(file->f_mode & FMODE_CAN_WRITE))
>  		return -EINVAL;
> -	if (flags & ~0)
> +	if (flags & ~RWF_DSYNC)
>  		return -EINVAL;
>  
>  	return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
> @@ -1080,8 +1085,13 @@ static ssize_t compat_do_readv_writev(int type, struct file *file,
>  		ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
>  						pos, iter_fn, flags);
>  	} else {
> -		if (type == READ && (flags & RWF_NONBLOCK))
> -			return -EAGAIN;
> +		if (type == READ) {
> +			if (flags & RWF_NONBLOCK)
> +				return -EAGAIN;
> +		} else {
> +			if (flags & RWF_DSYNC)
> +				return -EINVAL;
> +		}
>  
>  		if (fnv)
>  			ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 7d0e116..7786b88 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1460,7 +1460,8 @@ struct block_device_operations;
>  #define HAVE_UNLOCKED_IOCTL 1
>  
>  /* These flags are used for the readv/writev syscalls with flags. */
> -#define RWF_NONBLOCK 0x00000001
> +#define RWF_NONBLOCK	0x00000001
> +#define RWF_DSYNC	0x00000002
>  
>  struct iov_iter;
>  
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 6107058..4fbef99 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2669,7 +2669,9 @@ int generic_write_sync(struct kiocb *iocb, loff_t count)
>  	struct file *file = iocb->ki_filp;
>  
>  	if (count > 0 &&
> -	    ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))) {
> +	    ((file->f_flags & O_DSYNC) ||
> +	     (iocb->ki_rwflags & RWF_DSYNC) ||
> +	     IS_SYNC(file->f_mapping->host))) {
>  		bool fdatasync = !(file->f_flags & __O_SYNC);
>  		ssize_t ret = 0;
>  
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/ceph/file.c b/fs/ceph/file.c
index b798b5c..2d4e15a 100644
--- a/fs/ceph/file.c
+++ b/fs/ceph/file.c
@@ -983,7 +983,9 @@  retry_snap:
 	ceph_put_cap_refs(ci, got);
 
 	if (written >= 0 &&
-	    ((file->f_flags & O_SYNC) || IS_SYNC(file->f_mapping->host) ||
+	    ((file->f_flags & O_SYNC) ||
+	     IS_SYNC(file->f_mapping->host) ||
+	     (iocb->ki_rwflags & RWF_DSYNC) ||
 	     ceph_osdmap_flag(osdc->osdmap, CEPH_OSDMAP_NEARFULL))) {
 		err = vfs_fsync_range(file, pos, pos + written - 1, 1);
 		if (err < 0)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index caa8d95..bb4fb23 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1248,6 +1248,8 @@  static ssize_t fuse_file_write_iter(struct kiocb *iocb, struct iov_iter *from)
 		written += written_buffered;
 		iocb->ki_pos = pos + written_buffered;
 	} else {
+		if (iocb->ki_rwflags & RWF_DSYNC)
+			return -EINVAL;
 		written = fuse_perform_write(file, mapping, from, pos);
 		if (written >= 0)
 			iocb->ki_pos = pos + written;
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index aa9046f..c59b0b7 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -652,13 +652,15 @@  static const struct vm_operations_struct nfs_file_vm_ops = {
 	.remap_pages = generic_file_remap_pages,
 };
 
-static int nfs_need_sync_write(struct file *filp, struct inode *inode)
+static int nfs_need_sync_write(struct kiocb *iocb, struct inode *inode)
 {
 	struct nfs_open_context *ctx;
 
-	if (IS_SYNC(inode) || (filp->f_flags & O_DSYNC))
+	if (IS_SYNC(inode) ||
+	    (iocb->ki_filp->f_flags & O_DSYNC) ||
+	    (iocb->ki_rwflags & RWF_DSYNC))
 		return 1;
-	ctx = nfs_file_open_context(filp);
+	ctx = nfs_file_open_context(iocb->ki_filp);
 	if (test_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags) ||
 	    nfs_ctx_key_to_expire(ctx))
 		return 1;
@@ -705,7 +707,7 @@  ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 		written = result;
 
 	/* Return error values for O_DSYNC and IS_SYNC() */
-	if (result >= 0 && nfs_need_sync_write(file, inode)) {
+	if (result >= 0 && nfs_need_sync_write(iocb, inode)) {
 		int err = vfs_fsync(file, 0);
 		if (err < 0)
 			result = err;
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index bb66ca4..8f9a86b 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -2374,8 +2374,10 @@  out_dio:
 	/* buffered aio wouldn't have proper lock coverage today */
 	BUG_ON(ret == -EIOCBQUEUED && !(file->f_flags & O_DIRECT));
 
-	if (((file->f_flags & O_DSYNC) && !direct_io) || IS_SYNC(inode) ||
-	    ((file->f_flags & O_DIRECT) && !direct_io)) {
+	if (((file->f_flags & O_DSYNC) && !direct_io) ||
+	    IS_SYNC(inode) ||
+	    ((file->f_flags & O_DIRECT) && !direct_io) ||
+	    (iocb->ki_rwflags & RWF_DSYNC)) {
 		ret = filemap_fdatawrite_range(file->f_mapping, *ppos,
 					       *ppos + count - 1);
 		if (ret < 0)
diff --git a/fs/read_write.c b/fs/read_write.c
index cba7d4c..3443265 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -839,8 +839,13 @@  static ssize_t do_readv_writev(int type, struct file *file,
 		ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
 						pos, iter_fn, flags);
 	} else {
-		if (type == READ && (flags & RWF_NONBLOCK))
-			return -EAGAIN;
+		if (type == READ) {
+			if (flags & RWF_NONBLOCK)
+				return -EAGAIN;
+		} else {
+			if (flags & RWF_DSYNC)
+				return -EINVAL;
+		}
 
 		if (fnv)
 			ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
@@ -888,7 +893,7 @@  ssize_t vfs_writev(struct file *file, const struct iovec __user *vec,
 		return -EBADF;
 	if (!(file->f_mode & FMODE_CAN_WRITE))
 		return -EINVAL;
-	if (flags & ~0)
+	if (flags & ~RWF_DSYNC)
 		return -EINVAL;
 
 	return do_readv_writev(WRITE, file, vec, vlen, pos, flags);
@@ -1080,8 +1085,13 @@  static ssize_t compat_do_readv_writev(int type, struct file *file,
 		ret = do_iter_readv_writev(file, type, iov, nr_segs, tot_len,
 						pos, iter_fn, flags);
 	} else {
-		if (type == READ && (flags & RWF_NONBLOCK))
-			return -EAGAIN;
+		if (type == READ) {
+			if (flags & RWF_NONBLOCK)
+				return -EAGAIN;
+		} else {
+			if (flags & RWF_DSYNC)
+				return -EINVAL;
+		}
 
 		if (fnv)
 			ret = do_sync_readv_writev(file, iov, nr_segs, tot_len,
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 7d0e116..7786b88 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1460,7 +1460,8 @@  struct block_device_operations;
 #define HAVE_UNLOCKED_IOCTL 1
 
 /* These flags are used for the readv/writev syscalls with flags. */
-#define RWF_NONBLOCK 0x00000001
+#define RWF_NONBLOCK	0x00000001
+#define RWF_DSYNC	0x00000002
 
 struct iov_iter;
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 6107058..4fbef99 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2669,7 +2669,9 @@  int generic_write_sync(struct kiocb *iocb, loff_t count)
 	struct file *file = iocb->ki_filp;
 
 	if (count > 0 &&
-	    ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host))) {
+	    ((file->f_flags & O_DSYNC) ||
+	     (iocb->ki_rwflags & RWF_DSYNC) ||
+	     IS_SYNC(file->f_mapping->host))) {
 		bool fdatasync = !(file->f_flags & __O_SYNC);
 		ssize_t ret = 0;