diff mbox

warning in ext4_journal_start_sb on filesystem freeze

Message ID 20140224154532.GB11992@fieldses.org (mailing list archive)
State New, archived
Headers show

Commit Message

J. Bruce Fields Feb. 24, 2014, 3:45 p.m. UTC
On Mon, Feb 24, 2014 at 10:55:25AM +0100, Jan Kara wrote:
> On Sat 22-02-14 09:50:06, Matthew Rahtz wrote:
> > Thanks for your help Jan,
> > 
> > A few months later, we've noticed the issue is actually still there.
> > Using 3.11.0-17-generic on Ubuntu 12.04, we’re seeing this in the kernel
> > logs:
> > 
> > [29243.606215] WARNING: CPU: 0 PID: 1785 at
> > /build/buildd/linux-lts-saucy-3.11.0/fs/ext4/ext4_jbd2.c:48
> > ext4_journal_check_start+0x83/0x90()
> > 
> > Having a look at the Ubuntu source package for that version, it
> > definitely does include commit 03d95eb2f2578083a3f6286262e1cb5d88a00c02,
> > and the line generating the warning is still:
> > 
> > WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
> > 
> > Are there any other obvious possibilities for what may be causing this?
> > There seem to be some users of Oracle Linux experiencing similar problems
> > at https://community.oracle.com/thread/2617418, which was apparently
> > fixed in Oracle's kernel version '3.8.13-26.el6uek'. Any word on when
> > this might be integrated into the official kernel?
> > 
> > Full call trace included below.
>   Looking at the trace below, now the problem seems to be in the NFS server
> code. NFS should get protection against the filesystem being frozen (or
> remounted read-only for that matter) via mnt_want_write() before calling
> into notify_change() (actually before calling fh_lock() because of lock
> ordering).  Similarly to what we do e.g. in fchownat(). Bruce?

Like this?

But I wonder why this is just popping up now--as far as I can tell we've
had the bug since those write counts were introduced.

--b.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Jan Kara Feb. 25, 2014, 10:21 a.m. UTC | #1
On Mon 24-02-14 10:45:32, J. Bruce Fields wrote:
> On Mon, Feb 24, 2014 at 10:55:25AM +0100, Jan Kara wrote:
> > On Sat 22-02-14 09:50:06, Matthew Rahtz wrote:
> > > Thanks for your help Jan,
> > > 
> > > A few months later, we've noticed the issue is actually still there.
> > > Using 3.11.0-17-generic on Ubuntu 12.04, we’re seeing this in the kernel
> > > logs:
> > > 
> > > [29243.606215] WARNING: CPU: 0 PID: 1785 at
> > > /build/buildd/linux-lts-saucy-3.11.0/fs/ext4/ext4_jbd2.c:48
> > > ext4_journal_check_start+0x83/0x90()
> > > 
> > > Having a look at the Ubuntu source package for that version, it
> > > definitely does include commit 03d95eb2f2578083a3f6286262e1cb5d88a00c02,
> > > and the line generating the warning is still:
> > > 
> > > WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
> > > 
> > > Are there any other obvious possibilities for what may be causing this?
> > > There seem to be some users of Oracle Linux experiencing similar problems
> > > at https://community.oracle.com/thread/2617418, which was apparently
> > > fixed in Oracle's kernel version '3.8.13-26.el6uek'. Any word on when
> > > this might be integrated into the official kernel?
> > > 
> > > Full call trace included below.
> >   Looking at the trace below, now the problem seems to be in the NFS server
> > code. NFS should get protection against the filesystem being frozen (or
> > remounted read-only for that matter) via mnt_want_write() before calling
> > into notify_change() (actually before calling fh_lock() because of lock
> > ordering).  Similarly to what we do e.g. in fchownat(). Bruce?
> 
> Like this?
  Yup, that looks right.

> But I wonder why this is just popping up now--as far as I can tell we've
> had the bug since those write counts were introduced.
  Yeah, I'm wondering as well. NFS server on ext4 should have been
complaining for a long time.

								Honza

> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 6d7be3f..d573b61 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -445,12 +445,16 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
>  		err = nfserr_notsync;
>  		goto out_put_write_access;
>  	}
> +	host_err = fh_want_write(fhp);
> +	if (host_err)
> +		goto out_nfserr;
>  
>  	fh_lock(fhp);
>  	host_err = notify_change(dentry, iap, NULL);
>  	fh_unlock(fhp);
> +	fh_drop_write(fhp);
> +out_nfserr:
>  	err = nfserrno(host_err);
> -
>  out_put_write_access:
>  	if (size_change)
>  		put_write_access(inode);
J. Bruce Fields March 4, 2014, 4:43 p.m. UTC | #2
On Tue, Feb 25, 2014 at 11:21:26AM +0100, Jan Kara wrote:
> On Mon 24-02-14 10:45:32, J. Bruce Fields wrote:
> > On Mon, Feb 24, 2014 at 10:55:25AM +0100, Jan Kara wrote:
> > > On Sat 22-02-14 09:50:06, Matthew Rahtz wrote:
> > > > Thanks for your help Jan,
> > > > 
> > > > A few months later, we've noticed the issue is actually still there.
> > > > Using 3.11.0-17-generic on Ubuntu 12.04, we’re seeing this in the kernel
> > > > logs:
> > > > 
> > > > [29243.606215] WARNING: CPU: 0 PID: 1785 at
> > > > /build/buildd/linux-lts-saucy-3.11.0/fs/ext4/ext4_jbd2.c:48
> > > > ext4_journal_check_start+0x83/0x90()
> > > > 
> > > > Having a look at the Ubuntu source package for that version, it
> > > > definitely does include commit 03d95eb2f2578083a3f6286262e1cb5d88a00c02,
> > > > and the line generating the warning is still:
> > > > 
> > > > WARN_ON(sb->s_writers.frozen == SB_FREEZE_COMPLETE);
> > > > 
> > > > Are there any other obvious possibilities for what may be causing this?
> > > > There seem to be some users of Oracle Linux experiencing similar problems
> > > > at https://community.oracle.com/thread/2617418, which was apparently
> > > > fixed in Oracle's kernel version '3.8.13-26.el6uek'. Any word on when
> > > > this might be integrated into the official kernel?
> > > > 
> > > > Full call trace included below.
> > >   Looking at the trace below, now the problem seems to be in the NFS server
> > > code. NFS should get protection against the filesystem being frozen (or
> > > remounted read-only for that matter) via mnt_want_write() before calling
> > > into notify_change() (actually before calling fh_lock() because of lock
> > > ordering).  Similarly to what we do e.g. in fchownat(). Bruce?
> > 
> > Like this?
>   Yup, that looks right.

Ugh, actually, I didn't realize we can't do mnt_want_write recursively,
and there's a confusing mixture of callers that do and don't already
take it, so I'll have to do something a little more complicated.

Oh well.--b.

> 
> > But I wonder why this is just popping up now--as far as I can tell we've
> > had the bug since those write counts were introduced.
>   Yeah, I'm wondering as well. NFS server on ext4 should have been
> complaining for a long time.
> 
> 								Honza
> 
> > diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> > index 6d7be3f..d573b61 100644
> > --- a/fs/nfsd/vfs.c
> > +++ b/fs/nfsd/vfs.c
> > @@ -445,12 +445,16 @@ nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
> >  		err = nfserr_notsync;
> >  		goto out_put_write_access;
> >  	}
> > +	host_err = fh_want_write(fhp);
> > +	if (host_err)
> > +		goto out_nfserr;
> >  
> >  	fh_lock(fhp);
> >  	host_err = notify_change(dentry, iap, NULL);
> >  	fh_unlock(fhp);
> > +	fh_drop_write(fhp);
> > +out_nfserr:
> >  	err = nfserrno(host_err);
> > -
> >  out_put_write_access:
> >  	if (size_change)
> >  		put_write_access(inode);
> -- 
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6d7be3f..d573b61 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -445,12 +445,16 @@  nfsd_setattr(struct svc_rqst *rqstp, struct svc_fh *fhp, struct iattr *iap,
 		err = nfserr_notsync;
 		goto out_put_write_access;
 	}
+	host_err = fh_want_write(fhp);
+	if (host_err)
+		goto out_nfserr;
 
 	fh_lock(fhp);
 	host_err = notify_change(dentry, iap, NULL);
 	fh_unlock(fhp);
+	fh_drop_write(fhp);
+out_nfserr:
 	err = nfserrno(host_err);
-
 out_put_write_access:
 	if (size_change)
 		put_write_access(inode);