NFS regression - EIO is returned instead of ENOSPC

Hi Trond et al,
 we seem to have a regression introduced by 

commit 7b281ee026552f10862b617a2a51acf49c829554
    NFS: fsync() must exit with an error if page writeback failed

which has found it's way (in different form into -stable releases).

The problem is that an NFSERR_NOSPC comes through as EIO.

e.g. if /mnt2/ if an nfs mounted filesystem that has no space then

strace dd if=/dev/zero conv=fsync >> /mnt2/afile count=1

reported Input/output error and the relevant parts of the strace output are:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1)                                = -1 EIO (Input/output error)
close(1)                                = -1 ENOSPC (No space left on device)

i.e. we get an EIO from fsync, then the ENOSPC comes with the close.

If don't do the fsync, then:

strace dd if=/dev/zero >> /mnt2/afile count=1

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
close(1)                                = -1 EIO (Input/output error)

we don't see the ENOSPC at all.

The problem is that filemap_fdatawait_range, when it sees a page with
PageError set, will return -EIO unless AS_ENOSPC is set.  NFS never sets
AS_ENOSPC so we get -EIO.

Then nfs_file_fsync() will return as soon as it sees an error from
filemap_write_and_wait_range(), which will be that EIO.

nfs_file_fsync_commit knows to prefer the error from ctx->error over any
other error, but nfs_file_fsync() doesn't.

I see two ways to "fix" this.

We could get nfs{,4}_file_fsync() to always call nfs_file_fsync_commit() and
use the error from the later in preference to the error from
filemap_write_and_wait_range().

That results in this, more correct, behaviour:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1)                                = -1 ENOSPC (No space left on device)
close(1)                                = 0

Or we could get nfs_context_set_write_error() to call mapping_set_error(),
which would result in AS_ENOSPC being set and so
filemap_write_and_wait_range() will return the "correct" error.
This results in this behaviour:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1)                                = -1 ENOSPC (No space left on device)
close(1)                                = -1 ENOSPC (No space left on device)

which is a bit odd.  The first ENOSPC is from AS_ENOSPC.  The second is from
ctx->error.

I feel that calling mapping_set_error() is the "right" thing to do, but it
would need a bit more work to avoid the double errors.

What approach would you prefer?

The two patches I used are:

Thanks,
NeilBrown

NFS regression - EIO is returned instead of ENOSPC

Commit Message

Comments

Patch