diff mbox

NFS regression - EIO is returned instead of ENOSPC

Message ID 20121212092813.23afbc5f@notabene.brown (mailing list archive)
State New, archived
Headers show

Commit Message

NeilBrown Dec. 11, 2012, 10:28 p.m. UTC
Hi Trond et al,
 we seem to have a regression introduced by 

commit 7b281ee026552f10862b617a2a51acf49c829554
    NFS: fsync() must exit with an error if page writeback failed

which has found it's way (in different form into -stable releases).

The problem is that an NFSERR_NOSPC comes through as EIO.

e.g. if /mnt2/ if an nfs mounted filesystem that has no space then

strace dd if=/dev/zero conv=fsync >> /mnt2/afile count=1

reported Input/output error and the relevant parts of the strace output are:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1)                                = -1 EIO (Input/output error)
close(1)                                = -1 ENOSPC (No space left on device)

i.e. we get an EIO from fsync, then the ENOSPC comes with the close.

If don't do the fsync, then:

strace dd if=/dev/zero >> /mnt2/afile count=1


write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
close(1)                                = -1 EIO (Input/output error)


we don't see the ENOSPC at all.

The problem is that filemap_fdatawait_range, when it sees a page with
PageError set, will return -EIO unless AS_ENOSPC is set.  NFS never sets
AS_ENOSPC so we get -EIO.

Then nfs_file_fsync() will return as soon as it sees an error from
filemap_write_and_wait_range(), which will be that EIO.

nfs_file_fsync_commit knows to prefer the error from ctx->error over any
other error, but nfs_file_fsync() doesn't.

I see two ways to "fix" this.

We could get nfs{,4}_file_fsync() to always call nfs_file_fsync_commit() and
use the error from the later in preference to the error from
filemap_write_and_wait_range().

That results in this, more correct, behaviour:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1)                                = -1 ENOSPC (No space left on device)
close(1)                                = 0

Or we could get nfs_context_set_write_error() to call mapping_set_error(),
which would result in AS_ENOSPC being set and so
filemap_write_and_wait_range() will return the "correct" error.
This results in this behaviour:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1)                                = -1 ENOSPC (No space left on device)
close(1)                                = -1 ENOSPC (No space left on device)

which is a bit odd.  The first ENOSPC is from AS_ENOSPC.  The second is from
ctx->error.

I feel that calling mapping_set_error() is the "right" thing to do, but it
would need a bit more work to avoid the double errors.

What approach would you prefer?

The two patches I used are:




Thanks,
NeilBrown

Comments

NeilBrown Dec. 11, 2012, 10:53 p.m. UTC | #1
On Wed, 12 Dec 2012 09:28:13 +1100 NeilBrown <neilb@suse.de> wrote:

> 
> Hi Trond et al,
>  we seem to have a regression introduced by 
> 
> commit 7b281ee026552f10862b617a2a51acf49c829554
>     NFS: fsync() must exit with an error if page writeback failed
> 
> which has found it's way (in different form into -stable releases).

Bit of a clarification here.  It didn't get into -stable, but we have the bug
in our 3.0 based SLES11-SP2 through a different route (I assumed it came
through stable but was being too hasty).

The bug first arrived in v3.1-rc1

commit 02c24a82187d5a628c68edfe71ae60dc135cd178
    fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers


was fixed by me in v3.3-rc1

commit 2edb6bc3852c681c0d948245bd55108dc6407604
    NFS - fix recent breakage to NFS error handling.

the code was then messed up a bit by 

commit a5c58892b427a2752e3ec44b0aad4ce9221dc63b
    NFS: Create a v4-specific fsync function
in v3.6-rc1

and that mess was fixed by 

commit 7b281ee026552f10862b617a2a51acf49c829554
    NFS: fsync() must exit with an error if page writeback failed

which re-introduced the original problem in v3.6-rc6.

That first patch has been backported to SLES11 so now I'm fixing the bug
again and finding it in mainline again :-)

NeilBrown
Trond Myklebust Dec. 11, 2012, 11:16 p.m. UTC | #2
On Wed, 2012-12-12 at 09:53 +1100, NeilBrown wrote:
> On Wed, 12 Dec 2012 09:28:13 +1100 NeilBrown <neilb@suse.de> wrote:

> 

> > 

> > Hi Trond et al,

> >  we seem to have a regression introduced by 

> > 

> > commit 7b281ee026552f10862b617a2a51acf49c829554

> >     NFS: fsync() must exit with an error if page writeback failed

> > 

> > which has found it's way (in different form into -stable releases).

> 

> Bit of a clarification here.  It didn't get into -stable, but we have the bug

> in our 3.0 based SLES11-SP2 through a different route (I assumed it came

> through stable but was being too hasty).

> 

> The bug first arrived in v3.1-rc1

> 

> commit 02c24a82187d5a628c68edfe71ae60dc135cd178

>     fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers

> 

> 

> was fixed by me in v3.3-rc1

> 

> commit 2edb6bc3852c681c0d948245bd55108dc6407604

>     NFS - fix recent breakage to NFS error handling.

> 

> the code was then messed up a bit by 

> 

> commit a5c58892b427a2752e3ec44b0aad4ce9221dc63b

>     NFS: Create a v4-specific fsync function

> in v3.6-rc1

> 

> and that mess was fixed by 

> 

> commit 7b281ee026552f10862b617a2a51acf49c829554

>     NFS: fsync() must exit with an error if page writeback failed

> 

> which re-introduced the original problem in v3.6-rc6.

> 

> That first patch has been backported to SLES11 so now I'm fixing the bug

> again and finding it in mainline again :-)


Hmm... I can see 2 places where we're setting the PageError flag.

     1. nfs_updatepage(): in this case, the error occurred when we tried
        to change the page contents. Since we're holding the page lock,
        and so rather than mark the page as bad, we could probably just
        write back existing dirty areas (using nfs_wb_page()) and then
        remove it from the mapping.
     2.  nfs_write_completion(): here the writeback error applies to the
        entire dirty area on the page, and there is no point in try to
        write back again. Better just evict the page from the page cache
        (which is what nfs_zap_mapping() is supposed to do). While
        setting the PageError flag does cause some of the writeback
        functions to return EIO, that's not really what we're after; we
        already report errors more completely via the open context.

So for now, can't we just change nfs_set_pageerror() to not bother
setting the PG_error flag? Then in the future we might want to make
nfs_updatepage a bit more sophisticated in how it deals with those
errors...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
Trond Myklebust Dec. 11, 2012, 11:20 p.m. UTC | #3
T24gVHVlLCAyMDEyLTEyLTExIGF0IDE4OjE2IC0wNTAwLCBUcm9uZCBNeWtsZWJ1c3Qgd3JvdGU6
DQo+IEhtbS4uLiBJIGNhbiBzZWUgMiBwbGFjZXMgd2hlcmUgd2UncmUgc2V0dGluZyB0aGUgUGFn
ZUVycm9yIGZsYWcuDQo+IA0KPiAgICAgIDEuIG5mc191cGRhdGVwYWdlKCk6IGluIHRoaXMgY2Fz
ZSwgdGhlIGVycm9yIG9jY3VycmVkIHdoZW4gd2UgdHJpZWQNCj4gICAgICAgICB0byBjaGFuZ2Ug
dGhlIHBhZ2UgY29udGVudHMuIFNpbmNlIHdlJ3JlIGhvbGRpbmcgdGhlIHBhZ2UgbG9jaywNCj4g
ICAgICAgICBhbmQgc28gcmF0aGVyIHRoYW4gbWFyayB0aGUgcGFnZSBhcyBiYWQsIHdlIGNvdWxk
IHByb2JhYmx5IGp1c3QNCj4gICAgICAgICB3cml0ZSBiYWNrIGV4aXN0aW5nIGRpcnR5IGFyZWFz
ICh1c2luZyBuZnNfd2JfcGFnZSgpKSBhbmQgdGhlbg0KPiAgICAgICAgIHJlbW92ZSBpdCBmcm9t
IHRoZSBtYXBwaW5nLg0KPiAgICAgIDIuICBuZnNfd3JpdGVfY29tcGxldGlvbigpOiBoZXJlIHRo
ZSB3cml0ZWJhY2sgZXJyb3IgYXBwbGllcyB0byB0aGUNCj4gICAgICAgICBlbnRpcmUgZGlydHkg
YXJlYSBvbiB0aGUgcGFnZSwgYW5kIHRoZXJlIGlzIG5vIHBvaW50IGluIHRyeSB0bw0KPiAgICAg
ICAgIHdyaXRlIGJhY2sgYWdhaW4uIEJldHRlciBqdXN0IGV2aWN0IHRoZSBwYWdlIGZyb20gdGhl
IHBhZ2UgY2FjaGUNCj4gICAgICAgICAod2hpY2ggaXMgd2hhdCBuZnNfemFwX21hcHBpbmcoKSBp
cyBzdXBwb3NlZCB0byBkbykuIFdoaWxlDQo+ICAgICAgICAgc2V0dGluZyB0aGUgUGFnZUVycm9y
IGZsYWcgZG9lcyBjYXVzZSBzb21lIG9mIHRoZSB3cml0ZWJhY2sNCj4gICAgICAgICBmdW5jdGlv
bnMgdG8gcmV0dXJuIEVJTywgdGhhdCdzIG5vdCByZWFsbHkgd2hhdCB3ZSdyZSBhZnRlcjsgd2UN
Cj4gICAgICAgICBhbHJlYWR5IHJlcG9ydCBlcnJvcnMgbW9yZSBjb21wbGV0ZWx5IHZpYSB0aGUg
b3BlbiBjb250ZXh0Lg0KPiANCj4gU28gZm9yIG5vdywgY2FuJ3Qgd2UganVzdCBjaGFuZ2UgbmZz
X3NldF9wYWdlZXJyb3IoKSB0byBub3QgYm90aGVyDQo+IHNldHRpbmcgdGhlIFBHX2Vycm9yIGZs
YWc/IFRoZW4gaW4gdGhlIGZ1dHVyZSB3ZSBtaWdodCB3YW50IHRvIG1ha2UNCj4gbmZzX3VwZGF0
ZXBhZ2UgYSBiaXQgbW9yZSBzb3BoaXN0aWNhdGVkIGluIGhvdyBpdCBkZWFscyB3aXRoIHRob3Nl
DQo+IGVycm9ycy4uLg0KDQpVbHRpbWF0ZWx5LCB3aGF0IEknbSBzYXlpbmcgaXMgdGhhdCBQYWdl
RXJyb3IgaXMgYSBoYWNrIGZvciBwYXNzaW5nDQplcnJvcnMgYXJvdW5kLiBTaW5jZSB3ZSBoYXZl
IG91ciBvd24gaGFjayBmb3IgZG9pbmcgdGhlIHNhbWUsIHRoZW4gd2h5DQp1c2UgUGFnZUVycm9y
IGF0IGFsbD8NCg0KLS0gDQpUcm9uZCBNeWtsZWJ1c3QNCkxpbnV4IE5GUyBjbGllbnQgbWFpbnRh
aW5lcg0KDQpOZXRBcHANClRyb25kLk15a2xlYnVzdEBuZXRhcHAuY29tDQp3d3cubmV0YXBwLmNv
bQ0K
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 582bb88..19c06b9 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -295,12 +295,12 @@  nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	struct inode *inode = file->f_path.dentry->d_inode;
 
 	do {
-		ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
-		if (ret != 0)
-			break;
+		int ret1 = filemap_write_and_wait_range(inode->i_mapping, start, end);
 		mutex_lock(&inode->i_mutex);
 		ret = nfs_file_fsync_commit(file, start, end, datasync);
 		mutex_unlock(&inode->i_mutex);
+		if (ret == 0)
+			ret = ret1;
 		/*
 		 * If nfs_file_fsync_commit detected a server reboot, then
 		 * resend all dirty pages that might have been covered by
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index afddd66..f0d9a88 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -96,15 +96,15 @@  nfs4_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	struct inode *inode = file->f_path.dentry->d_inode;
 
 	do {
-		ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
-		if (ret != 0)
-			break;
+		int ret1 = filemap_write_and_wait_range(inode->i_mapping, start, end);
 		mutex_lock(&inode->i_mutex);
 		ret = nfs_file_fsync_commit(file, start, end, datasync);
 		if (!ret && !datasync)
 			/* application has asked for meta-data sync */
 			ret = pnfs_layoutcommit_inode(inode, true);
 		mutex_unlock(&inode->i_mutex);
+		if (ret == 0)
+			ret = ret1;
 		/*
 		 * If nfs_file_fsync_commit detected a server reboot, then
 		 * resend all dirty pages that might have been covered by


and 



diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 9347ab7..b0f5bb7 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -140,6 +140,7 @@  static void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
 	ctx->error = error;
 	smp_wmb();
 	set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
+	mapping_set_error(ctx->dentry->d_inode->i_mapping, error);
 }
 
 static struct nfs_page *