diff mbox series

[-next,1/2] nfs: nfs{,4}_file_flush should consume writeback error

Message ID 20220305124636.2002383-2-chenxiaosong2@huawei.com (mailing list archive)
State New
Headers show
Series nfs: check writeback errors correctly | expand

Commit Message

ChenXiaoSong March 5, 2022, 12:46 p.m. UTC
filemap_sample_wb_err() will return 0 if nobody has seen the error yet,
then filemap_check_wb_err() will return the unchanged writeback error.

Reproducer:
        nfs server               |       nfs client
 --------------------------------|-----------------------------------------------
 # No space left on server       |
 fallocate -l 100G /server/file1 |
                                 | mount -t nfs $nfs_server_ip:/ /mnt
                                 | # Expected error: No space left on device
                                 | dd if=/dev/zero of=/mnt/file2 count=1 ibs=100K
 # Release space on server       |
 rm /server/file1                |
                                 | # Unexpected error: No space left on device
                                 | dd if=/dev/zero of=/mnt/file2 count=1 ibs=100K

Fix this by using file_check_and_advance_wb_err(). If there is an error during
the writeback process, it should be returned when user space calls close() or fsync(),
flush() is called when user space calls close().

Note that fsync() will not be called after close().

Fixes: 67dd23f9e6fb ("nfs: ensure correct writeback errors are returned on close()")
Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
---
 fs/nfs/file.c     | 4 +---
 fs/nfs/nfs4file.c | 4 +---
 2 files changed, 2 insertions(+), 6 deletions(-)

Comments

Trond Myklebust March 5, 2022, 4:53 p.m. UTC | #1
On Sat, 2022-03-05 at 20:46 +0800, ChenXiaoSong wrote:
> filemap_sample_wb_err() will return 0 if nobody has seen the error
> yet,
> then filemap_check_wb_err() will return the unchanged writeback
> error.
> 
> Reproducer:
>         nfs server               |       nfs client
>  --------------------------------|-----------------------------------
> ------------
>  # No space left on server       |
>  fallocate -l 100G /server/file1 |
>                                  | mount -t nfs $nfs_server_ip:/ /mnt
>                                  | # Expected error: No space left on
> device
>                                  | dd if=/dev/zero of=/mnt/file2
> count=1 ibs=100K
>  # Release space on server       |
>  rm /server/file1                |
>                                  | # Unexpected error: No space left
> on device
>                                  | dd if=/dev/zero of=/mnt/file2
> count=1 ibs=100K
> 

'rm' doesn't open any files or do any I/O, so it shouldn't be returning
any errors from the page cache.

IOW: The problem here is not that we're failing to clear an error from
the page cache. It is that something in 'rm' is checking the page cache
and returning any errors that it finds there.

Is 'rm' perhaps doing a stat() on the file it is deleting? If so, does
this patch fix the bug?

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/comm
it/?id=d19e0183a883
ChenXiaoSong March 6, 2022, 3:54 a.m. UTC | #2
It would be more clear if I update the reproducer like this:

         nfs server                 |       nfs client
  --------------------------------- |---------------------------------
  # No space left on server         |
  fallocate -l 100G /server/nospace |
                                    | mount -t nfs $nfs_server_ip:/ /mnt
                                    |
                                    | # Expected error
                                    | dd if=/dev/zero of=/mnt/file
                                    |
                                    | # Release space on mountpoint
                                    | rm /mnt/nospace
                                    |
                                    | # Unexpected error
                                    | dd if=/dev/zero of=/mnt/file

The Unexpected error (No space left on device) when doing second `dd`, 
is from unconsumed writeback error after close() the file when doing 
first `dd`. There is enough space when doing second `dd`, we should not 
report the nospace error.

We should report and consume the writeback error when userspace call 
close()->flush(), the writeback error should not be left for next open().

Currently, fsync() will consume the writeback error while calling 
file_check_and_advance_wb_err(), close()->flush() should also consume 
the writeback error.


在 2022/3/6 0:53, Trond Myklebust 写道:
> 'rm' doesn't open any files or do any I/O, so it shouldn't be returning
> any errors from the page cache.
> 
> IOW: The problem here is not that we're failing to clear an error from
> the page cache. It is that something in 'rm' is checking the page cache
> and returning any errors that it finds there.
> 
> Is 'rm' perhaps doing a stat() on the file it is deleting? If so, does
> this patch fix the bug?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/comm
> it/?id=d19e0183a883
>
Trond Myklebust March 6, 2022, 2:04 p.m. UTC | #3
On Sun, 2022-03-06 at 11:54 +0800, chenxiaosong (A) wrote:
> It would be more clear if I update the reproducer like this:
> 
>          nfs server                 |       nfs client
>   --------------------------------- |--------------------------------
> -
>   # No space left on server         |
>   fallocate -l 100G /server/nospace |
>                                     | mount -t nfs $nfs_server_ip:/
> /mnt
>                                     |
>                                     | # Expected error
>                                     | dd if=/dev/zero of=/mnt/file
>                                     |
>                                     | # Release space on mountpoint
>                                     | rm /mnt/nospace
>                                     |
>                                     | # Unexpected error
>                                     | dd if=/dev/zero of=/mnt/file
> 
> The Unexpected error (No space left on device) when doing second
> `dd`, 
> is from unconsumed writeback error after close() the file when doing 
> first `dd`. There is enough space when doing second `dd`, we should
> not 
> report the nospace error.
> 
> We should report and consume the writeback error when userspace call 
> close()->flush(), the writeback error should not be left for next
> open().
> 
> Currently, fsync() will consume the writeback error while calling 
> file_check_and_advance_wb_err(), close()->flush() should also consume
> the writeback error.

No. That's not part of the API of any other filesystem. Why should we
make an exception for NFS?

The problem here seems to be the way that filemap_sample_wb_err() is
defined, and how it initialises f->f_wb_err in the function
do_dentry_open(). It is designed to do exactly what you see above.
Trond Myklebust March 6, 2022, 3:08 p.m. UTC | #4
On Sun, 2022-03-06 at 09:04 -0500, Trond Myklebust wrote:
> On Sun, 2022-03-06 at 11:54 +0800, chenxiaosong (A) wrote:
> > It would be more clear if I update the reproducer like this:
> > 
> >          nfs server                 |       nfs client
> >   --------------------------------- |------------------------------
> > --
> > -
> >   # No space left on server         |
> >   fallocate -l 100G /server/nospace |
> >                                     | mount -t nfs $nfs_server_ip:/
> > /mnt
> >                                     |
> >                                     | # Expected error
> >                                     | dd if=/dev/zero of=/mnt/file
> >                                     |
> >                                     | # Release space on mountpoint
> >                                     | rm /mnt/nospace
> >                                     |
> >                                     | # Unexpected error
> >                                     | dd if=/dev/zero of=/mnt/file
> > 
> > The Unexpected error (No space left on device) when doing second
> > `dd`, 
> > is from unconsumed writeback error after close() the file when
> > doing 
> > first `dd`. There is enough space when doing second `dd`, we should
> > not 
> > report the nospace error.
> > 
> > We should report and consume the writeback error when userspace
> > call 
> > close()->flush(), the writeback error should not be left for next
> > open().
> > 
> > Currently, fsync() will consume the writeback error while calling 
> > file_check_and_advance_wb_err(), close()->flush() should also
> > consume
> > the writeback error.
> 
> No. That's not part of the API of any other filesystem. Why should we
> make an exception for NFS?
> 
> The problem here seems to be the way that filemap_sample_wb_err() is
> defined, and how it initialises f->f_wb_err in the function
> do_dentry_open(). It is designed to do exactly what you see above.
> 

Just to clarify a little.

I don't see a need to consume the writeback errors on close(), unless
other filesystems do the same. If the intention is that fsync() should
see _all_ errors that haven't already been seen, then NFS should follow
the same semantics as all the other filesystems.

However, if that is the case (i.e. if the semantics for
filemap_sample_wb_err() are deliberate, and the intention is that
open() should behave as it does today) then we should fix the various
instances of filemap_sample_wb_err() / filemap_check_wb_err() in the
NFS and nfsd code to ignore the old errors. Their intention is
definitely to only report errors that occur in the timeframe between
the two calls.
ChenXiaoSong April 12, 2022, 1:46 p.m. UTC | #5
在 2022/3/6 23:08, Trond Myklebust 写道:
> 
> Just to clarify a little.
> 
> I don't see a need to consume the writeback errors on close(), unless
> other filesystems do the same. If the intention is that fsync() should
> see _all_ errors that haven't already been seen, then NFS should follow
> the same semantics as all the other filesystems.
> 

Other filesystem will _not_ clear writeback error on close().
And other filesystem will _not_ clear writeback error on async write() too.

Other filesystem _only_ clear writeback error on fsync() or sync write().

Should NFS follow the same semantics as all the other filesystems?
Trond Myklebust April 12, 2022, 1:56 p.m. UTC | #6
On Tue, 2022-04-12 at 21:46 +0800, chenxiaosong (A) wrote:
> 在 2022/3/6 23:08, Trond Myklebust 写道:
> > 
> > Just to clarify a little.
> > 
> > I don't see a need to consume the writeback errors on close(),
> > unless
> > other filesystems do the same. If the intention is that fsync()
> > should
> > see _all_ errors that haven't already been seen, then NFS should
> > follow
> > the same semantics as all the other filesystems.
> > 
> 
> Other filesystem will _not_ clear writeback error on close().
> And other filesystem will _not_ clear writeback error on async
> write() too.
> 
> Other filesystem _only_ clear writeback error on fsync() or sync
> write().
> 

Yes. We might even consider not reporting writeback errors at all in
close(), since most developers don't check it. We certainly don't want
to clear those errors there because the manpages don't document that as
being the case.

> Should NFS follow the same semantics as all the other filesystems?

It needs to follow the semantics described in the manpage for write(2)
and fsync(2) as closely as possible, yes. That documentation is
supposed to be normative for application developers.

We won't guarantee to immediately report ENOSPC like other filesystems
do (because that would require us to only support synchronous writes),
however that behaviour is already documented in the manpage.

We may also report some errors that are not documented in the manpage
(e.g. EACCES or EROFS) simply because those errors cannot always be
reported at open() time, as would be the case for a local filesystem.
That's just how the NFS protocol works (particularly for the case of
the stateless NFSv3 protocol).
ChenXiaoSong April 12, 2022, 2:12 p.m. UTC | #7
在 2022/4/12 21:56, Trond Myklebust 写道:
> On Tue, 2022-04-12 at 21:46 +0800, chenxiaosong (A) wrote:
>>
>> Other filesystem will _not_ clear writeback error on close().
>> And other filesystem will _not_ clear writeback error on async
>> write() too.
>>
>> Other filesystem _only_ clear writeback error on fsync() or sync
>> write().
>>
> 
> Yes. We might even consider not reporting writeback errors at all in
> close(), since most developers don't check it. We certainly don't want
> to clear those errors there because the manpages don't document that as
> being the case.
> 
>> Should NFS follow the same semantics as all the other filesystems?
> 
> It needs to follow the semantics described in the manpage for write(2)
> and fsync(2) as closely as possible, yes. That documentation is
> supposed to be normative for application developers.
> 
> We won't guarantee to immediately report ENOSPC like other filesystems
> do (because that would require us to only support synchronous writes),
> however that behaviour is already documented in the manpage.
> 
> We may also report some errors that are not documented in the manpage
> (e.g. EACCES or EROFS) simply because those errors cannot always be
> reported at open() time, as would be the case for a local filesystem.
> That's just how the NFS protocol works (particularly for the case of
> the stateless NFSv3 protocol).
> 

After merging your patchset, NFS will clear wb error on async write(), 
is this reasonable?

And more importantly, we can not detect new error by using 
filemap_sample_wb_err()/filemap_sample_wb_err() while nfs_wb_all(),just 
as I described:

```c
   since = filemap_sample_wb_err() = 0
     errseq_sample
       if (!(old & ERRSEQ_SEEN)) // nobody see the error
         return 0;
   nfs_wb_all // no new error
   error = filemap_check_wb_err(..., since) != 0 // unexpected error
```
Trond Myklebust April 12, 2022, 2:27 p.m. UTC | #8
On Tue, 2022-04-12 at 22:12 +0800, chenxiaosong (A) wrote:
> 在 2022/4/12 21:56, Trond Myklebust 写道:
> > On Tue, 2022-04-12 at 21:46 +0800, chenxiaosong (A) wrote:
> > > 
> > > Other filesystem will _not_ clear writeback error on close().
> > > And other filesystem will _not_ clear writeback error on async
> > > write() too.
> > > 
> > > Other filesystem _only_ clear writeback error on fsync() or sync
> > > write().
> > > 
> > 
> > Yes. We might even consider not reporting writeback errors at all
> > in
> > close(), since most developers don't check it. We certainly don't
> > want
> > to clear those errors there because the manpages don't document
> > that as
> > being the case.
> > 
> > > Should NFS follow the same semantics as all the other
> > > filesystems?
> > 
> > It needs to follow the semantics described in the manpage for
> > write(2)
> > and fsync(2) as closely as possible, yes. That documentation is
> > supposed to be normative for application developers.
> > 
> > We won't guarantee to immediately report ENOSPC like other
> > filesystems
> > do (because that would require us to only support synchronous
> > writes),
> > however that behaviour is already documented in the manpage.
> > 
> > We may also report some errors that are not documented in the
> > manpage
> > (e.g. EACCES or EROFS) simply because those errors cannot always be
> > reported at open() time, as would be the case for a local
> > filesystem.
> > That's just how the NFS protocol works (particularly for the case
> > of
> > the stateless NFSv3 protocol).
> > 
> 
> After merging your patchset, NFS will clear wb error on async
> write(), 
> is this reasonable?
> 

It will clear ENOSPC, EDQUOT and EFBIG. It should not clear other
errors that are not supposed to be reported by write().

> And more importantly, we can not detect new error by using 
> filemap_sample_wb_err()/filemap_sample_wb_err() while
> nfs_wb_all(),just 
> as I described:
> 
> ```c
>    since = filemap_sample_wb_err() = 0
>      errseq_sample
>        if (!(old & ERRSEQ_SEEN)) // nobody see the error
>          return 0;
>    nfs_wb_all // no new error
>    error = filemap_check_wb_err(..., since) != 0 // unexpected error
> ```


As I keep repeating, that is _documented behaviour_!
ChenXiaoSong April 20, 2022, 8:50 a.m. UTC | #9
在 2022/4/12 22:27, Trond Myklebust 写道:

>
> It will clear ENOSPC, EDQUOT and EFBIG. It should not clear other
> errors that are not supposed to be reported by write().
> 
> As I keep repeating, that is _documented behaviour_!
> 

Hi Trond:

You may mean that write(2) manpage described:

> Since Linux 4.13, errors from write-back come with a promise that
> they may be reported by subsequent.  write(2) requests, and will be
> reported by a subsequent fsync(2) (whether or not they were also
> reported by write(2)).

The manpage mentioned that "reported by a subsequent fsync(2)", your 
patch[1] clear the wb err on _async_ write(), and wb err will _not_ be 
reported by subsequent fsync(2), is it documented behaviour?

All other filesystems will _not_ clear any wb err on _async_ write().

[1] 
https://patchwork.kernel.org/project/linux-nfs/patch/20220411213346.762302-4-trondmy@kernel.org/
ChenXiaoSong May 9, 2022, 7:43 a.m. UTC | #10
在 2022/4/20 16:50, chenxiaosong (A) 写道:
> 在 2022/4/12 22:27, Trond Myklebust 写道:
> 
>>
>> It will clear ENOSPC, EDQUOT and EFBIG. It should not clear other
>> errors that are not supposed to be reported by write().
>>
>> As I keep repeating, that is _documented behaviour_!
>>
> 
> Hi Trond:
> 
> You may mean that write(2) manpage described:
> 
>> Since Linux 4.13, errors from write-back come with a promise that
>> they may be reported by subsequent.  write(2) requests, and will be
>> reported by a subsequent fsync(2) (whether or not they were also
>> reported by write(2)).
> 
> The manpage mentioned that "reported by a subsequent fsync(2)", your 
> patch[1] clear the wb err on _async_ write(), and wb err will _not_ be 
> reported by subsequent fsync(2), is it documented behaviour?
> 
> All other filesystems will _not_ clear any wb err on _async_ write().
> 
> [1] 
> https://patchwork.kernel.org/project/linux-nfs/patch/20220411213346.762302-4-trondmy@kernel.org/ 
> 


Hi Trond:

write(2) manpage described:

> On some filesystems, including NFS, it does not even guarantee that
> space has successfully been reserved for the data.  In this case, some
> errors might be delayed until a future write(2), fsync(2), or even
> close(2).  The only way to be sure is to call fsync(2) after you are
> done writing all your data.

Maybe it mean that: writeback errors of NFS is delayed until future sync 
write() and fsync(), because sync write() will call fsync(). We all 
agreed that close()->flush() should not clear writeback errors, should 
async write() do the same thing like other filesystems?
diff mbox series

Patch

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 76d76acbc594..83d63bce9596 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -141,7 +141,6 @@  static int
 nfs_file_flush(struct file *file, fl_owner_t id)
 {
 	struct inode	*inode = file_inode(file);
-	errseq_t since;
 
 	dprintk("NFS: flush(%pD2)\n", file);
 
@@ -150,9 +149,8 @@  nfs_file_flush(struct file *file, fl_owner_t id)
 		return 0;
 
 	/* Flush writes to the server and return any errors */
-	since = filemap_sample_wb_err(file->f_mapping);
 	nfs_wb_all(inode);
-	return filemap_check_wb_err(file->f_mapping, since);
+	return file_check_and_advance_wb_err(file);
 }
 
 ssize_t
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index e79ae4cbc395..63a57e5b6db7 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -111,7 +111,6 @@  static int
 nfs4_file_flush(struct file *file, fl_owner_t id)
 {
 	struct inode	*inode = file_inode(file);
-	errseq_t since;
 
 	dprintk("NFS: flush(%pD2)\n", file);
 
@@ -127,9 +126,8 @@  nfs4_file_flush(struct file *file, fl_owner_t id)
 		return filemap_fdatawrite(file->f_mapping);
 
 	/* Flush writes to the server and return any errors */
-	since = filemap_sample_wb_err(file->f_mapping);
 	nfs_wb_all(inode);
-	return filemap_check_wb_err(file->f_mapping, since);
+	return file_check_and_advance_wb_err(file);
 }
 
 #ifdef CONFIG_NFS_V4_2