[v3] NFS: Fix writeback performance issue on cache invalidation

Message ID 1375910064-23731-1-git-send-email-Trond.Myklebust@netapp.com (mailing list archive)
State New, archived

Commit Message

Trond Myklebust Aug. 7, 2013, 9:14 p.m. UTC
If a cache invalidation is triggered, and we happen to have a lot of
writebacks cached at the time, then the call to invalidate_inode_pages2()
will end up calling ->launder_page() on each and every dirty page in order
to sync its contents to disk, thus defeating write coalescing.
The following patch ensures that we try to sync the inode to disk before
calling invalidate_inode_pages2() so that we do the writeback as efficiently
as possible.

Reported-by: William Dauchy <william@gandi.net>
Reported-by: Pascal Bouchareine <pascal@gandi.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: William Dauchy <william@gandi.net>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
---
v2: Add check for regular file as per Jeff Layton's suggestion.
v3: Minor cleanup and add Jeff as a reviewer

 fs/nfs/inode.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Jeff Layton Aug. 8, 2013, 6:11 p.m. UTC | #1
On Wed, 7 Aug 2013 17:14:24 -0400
Trond Myklebust <Trond.Myklebust@netapp.com> wrote:

> If a cache invalidation is triggered, and we happen to have a lot of
> writebacks cached at the time, then the call to invalidate_inode_pages2()
> will end up calling ->launder_page() on each and every dirty page in order
> to sync its contents to disk, thus defeating write coalescing.
> The following patch ensures that we try to sync the inode to disk before
> calling invalidate_inode_pages2() so that we do the writeback as efficiently
> as possible.
> 
> Reported-by: William Dauchy <william@gandi.net>
> Reported-by: Pascal Bouchareine <pascal@gandi.net>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> Tested-by: William Dauchy <william@gandi.net>
> Reviewed-by: Jeff Layton <jlayton@redhat.com>
> ---
> v2: Add check for regular file as per Jeff Layton's suggestion.
> v3: Minor cleanup and add Jeff as a reviewer
> 
>  fs/nfs/inode.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index af6e806..3ea4f64 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -963,9 +963,15 @@ EXPORT_SYMBOL_GPL(nfs_revalidate_inode);
>  static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
>  {
>  	struct nfs_inode *nfsi = NFS_I(inode);
> -	
> +	int ret;
> +
>  	if (mapping->nrpages != 0) {
> -		int ret = invalidate_inode_pages2(mapping);
> +		if (S_ISREG(inode->i_mode)) {
> +			ret = nfs_sync_mapping(mapping);
> +			if (ret < 0)
> +				return ret;
> +		}
> +		ret = invalidate_inode_pages2(mapping);
>  		if (ret < 0)
>  			return ret;
>  	}

It occurs to me that we have several places that call nfs_sync_mapping
without checking S_ISREG. Are they potentially problematic?

Might it make more sense to move the S_ISREG test inside of
nfs_sync_mapping and just have it "return 0" when it's not a regular
file?
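
As a rough illustration of the suggestion above, the sketch below moves the regular-file test into the sync helper itself, so that every caller gets the guard for free. It is a user-space toy, not the kernel code: `toy_inode`, `toy_mapping`, `toy_do_sync()` and `toy_sync_mapping()` are hypothetical stand-ins invented for this example, and only `S_ISREG()` is the real macro.

```c
#define _XOPEN_SOURCE 700	/* for the S_IF* mode constants */
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical stand-ins for the kernel's inode and address_space. */
struct toy_inode {
	mode_t i_mode;
};

struct toy_mapping {
	struct toy_inode *host;
	unsigned long nrpages;
	int sync_calls;		/* how many times writeback was started */
};

/* Stand-in for the real writeback work; it only counts invocations. */
static int toy_do_sync(struct toy_mapping *mapping)
{
	mapping->sync_calls++;
	return 0;
}

/*
 * Sync helper with the S_ISREG test moved inside: anything that is
 * not a regular file is treated as "nothing to write back".
 */
static int toy_sync_mapping(struct toy_mapping *mapping)
{
	assert(mapping != NULL && mapping->host != NULL);

	if (!S_ISREG(mapping->host->i_mode))
		return 0;
	if (mapping->nrpages == 0)
		return 0;
	return toy_do_sync(mapping);
}
```

With this shape, a caller such as the invalidation path could call the helper unconditionally; the trade-off is that the guard becomes invisible at the call site.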

Trond Myklebust Aug. 8, 2013, 6:21 p.m. UTC | #2
On Thu, 2013-08-08 at 14:11 -0400, Jeff Layton wrote:
> On Wed, 7 Aug 2013 17:14:24 -0400
> Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> 
> > If a cache invalidation is triggered, and we happen to have a lot of
> > writebacks cached at the time, then the call to invalidate_inode_pages2()
> > will end up calling ->launder_page() on each and every dirty page in order
> > to sync its contents to disk, thus defeating write coalescing.
> > The following patch ensures that we try to sync the inode to disk before
> > calling invalidate_inode_pages2() so that we do the writeback as efficiently
> > as possible.
> > 
> > Reported-by: William Dauchy <william@gandi.net>
> > Reported-by: Pascal Bouchareine <pascal@gandi.net>
> > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> > Tested-by: William Dauchy <william@gandi.net>
> > Reviewed-by: Jeff Layton <jlayton@redhat.com>
> > ---
> > v2: Add check for regular file as per Jeff Layton's suggestion.
> > v3: Minor cleanup and add Jeff as a reviewer
> > 
> >  fs/nfs/inode.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > index af6e806..3ea4f64 100644
> > --- a/fs/nfs/inode.c
> > +++ b/fs/nfs/inode.c
> > @@ -963,9 +963,15 @@ EXPORT_SYMBOL_GPL(nfs_revalidate_inode);
> >  static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
> >  {
> >  	struct nfs_inode *nfsi = NFS_I(inode);
> > -	
> > +	int ret;
> > +
> >  	if (mapping->nrpages != 0) {
> > -		int ret = invalidate_inode_pages2(mapping);
> > +		if (S_ISREG(inode->i_mode)) {
> > +			ret = nfs_sync_mapping(mapping);
> > +			if (ret < 0)
> > +				return ret;
> > +		}
> > +		ret = invalidate_inode_pages2(mapping);
> >  		if (ret < 0)
> >  			return ret;
> >  	}
> 
> It occurs to me that we have several places that call nfs_sync_mapping
> without checking S_ISREG. Are they potentially problematic?
> 
> Might it make more sense to move the S_ISREG test inside of
> nfs_sync_mapping and just have it "return 0" when it's not a regular
> file?

I see 5 callers of nfs_sync_mapping() aside from the above: 2 are in the
O_DIRECT code, the other 3 are all in the file locking code. AFAICS,
none of those can ever be fed to non-regular files.

Am I missing anything?

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

Jeff Layton Aug. 8, 2013, 6:42 p.m. UTC | #3
On Thu, 8 Aug 2013 18:21:35 +0000
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:

> On Thu, 2013-08-08 at 14:11 -0400, Jeff Layton wrote:
> > On Wed, 7 Aug 2013 17:14:24 -0400
> > Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > 
> > > If a cache invalidation is triggered, and we happen to have a lot of
> > > writebacks cached at the time, then the call to invalidate_inode_pages2()
> > > will end up calling ->launder_page() on each and every dirty page in order
> > > to sync its contents to disk, thus defeating write coalescing.
> > > The following patch ensures that we try to sync the inode to disk before
> > > calling invalidate_inode_pages2() so that we do the writeback as efficiently
> > > as possible.
> > > 
> > > Reported-by: William Dauchy <william@gandi.net>
> > > Reported-by: Pascal Bouchareine <pascal@gandi.net>
> > > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> > > Tested-by: William Dauchy <william@gandi.net>
> > > Reviewed-by: Jeff Layton <jlayton@redhat.com>
> > > ---
> > > v2: Add check for regular file as per Jeff Layton's suggestion.
> > > v3: Minor cleanup and add Jeff as a reviewer
> > > 
> > >  fs/nfs/inode.c | 10 ++++++++--
> > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > > index af6e806..3ea4f64 100644
> > > --- a/fs/nfs/inode.c
> > > +++ b/fs/nfs/inode.c
> > > @@ -963,9 +963,15 @@ EXPORT_SYMBOL_GPL(nfs_revalidate_inode);
> > >  static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
> > >  {
> > >  	struct nfs_inode *nfsi = NFS_I(inode);
> > > -	
> > > +	int ret;
> > > +
> > >  	if (mapping->nrpages != 0) {
> > > -		int ret = invalidate_inode_pages2(mapping);
> > > +		if (S_ISREG(inode->i_mode)) {
> > > +			ret = nfs_sync_mapping(mapping);
> > > +			if (ret < 0)
> > > +				return ret;
> > > +		}
> > > +		ret = invalidate_inode_pages2(mapping);
> > >  		if (ret < 0)
> > >  			return ret;
> > >  	}
> > 
> > It occurs to me that we have several places that call nfs_sync_mapping
> > without checking S_ISREG. Are they potentially problematic?
> > 
> > Might it make more sense to move the S_ISREG test inside of
> > nfs_sync_mapping and just have it "return 0" when it's not a regular
> > file?
> 
> I see 5 callers of nfs_sync_mapping() aside from the above: 2 are in the
> O_DIRECT code, the other 3 are all in the file locking code. AFAICS,
> none of those can ever be fed to non-regular files.
> 
> Am I missing anything?
> 

You can lock a directory or device special file though, right?

In practice I don't think there's any way to end up with dirty pages on
a !S_ISREG inode, but in that case, the S_ISREG check here would be
superfluous (though checking it might be a reasonable optimization).

Trond Myklebust Aug. 8, 2013, 8:16 p.m. UTC | #4
On Thu, 2013-08-08 at 14:42 -0400, Jeff Layton wrote:
> On Thu, 8 Aug 2013 18:21:35 +0000
> "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> 
> > On Thu, 2013-08-08 at 14:11 -0400, Jeff Layton wrote:
> > > On Wed, 7 Aug 2013 17:14:24 -0400
> > > Trond Myklebust <Trond.Myklebust@netapp.com> wrote:
> > > 
> > > > If a cache invalidation is triggered, and we happen to have a lot of
> > > > writebacks cached at the time, then the call to invalidate_inode_pages2()
> > > > will end up calling ->launder_page() on each and every dirty page in order
> > > > to sync its contents to disk, thus defeating write coalescing.
> > > > The following patch ensures that we try to sync the inode to disk before
> > > > calling invalidate_inode_pages2() so that we do the writeback as efficiently
> > > > as possible.
> > > > 
> > > > Reported-by: William Dauchy <william@gandi.net>
> > > > Reported-by: Pascal Bouchareine <pascal@gandi.net>
> > > > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> > > > Tested-by: William Dauchy <william@gandi.net>
> > > > Reviewed-by: Jeff Layton <jlayton@redhat.com>
> > > > ---
> > > > v2: Add check for regular file as per Jeff Layton's suggestion.
> > > > v3: Minor cleanup and add Jeff as a reviewer
> > > > 
> > > >  fs/nfs/inode.c | 10 ++++++++--
> > > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> > > > index af6e806..3ea4f64 100644
> > > > --- a/fs/nfs/inode.c
> > > > +++ b/fs/nfs/inode.c
> > > > @@ -963,9 +963,15 @@ EXPORT_SYMBOL_GPL(nfs_revalidate_inode);
> > > >  static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
> > > >  {
> > > >  	struct nfs_inode *nfsi = NFS_I(inode);
> > > > -	
> > > > +	int ret;
> > > > +
> > > >  	if (mapping->nrpages != 0) {
> > > > -		int ret = invalidate_inode_pages2(mapping);
> > > > +		if (S_ISREG(inode->i_mode)) {
> > > > +			ret = nfs_sync_mapping(mapping);
> > > > +			if (ret < 0)
> > > > +				return ret;
> > > > +		}
> > > > +		ret = invalidate_inode_pages2(mapping);
> > > >  		if (ret < 0)
> > > >  			return ret;
> > > >  	}
> > > 
> > > It occurs to me that we have several places that call nfs_sync_mapping
> > > without checking S_ISREG. Are they potentially problematic?
> > > 
> > > Might it make more sense to move the S_ISREG test inside of
> > > nfs_sync_mapping and just have it "return 0" when it's not a regular
> > > file?
> > 
> > I see 5 callers of nfs_sync_mapping() aside from the above: 2 are in the
> > O_DIRECT code, the other 3 are all in the file locking code. AFAICS,
> > none of those can ever be fed to non-regular files.
> > 
> > Am I missing anything?
> > 
> 
> You can lock a directory or device special file though, right?

No. inode->i_fop is specific to the regular files. NFSv4 won't support
locking on anything which you can't OPEN.

POSIX itself does appear to allow locking on directories, but since you
can only in practice do read locks (AFAIK you can't open a directory for
writing) then I'm not losing any sleep over it.

> In practice I don't think there's any way to end up with dirty pages on
> a !S_ISREG inode, but in that case, the S_ISREG check here would be
> superfluous (though checking it might be a reasonable optimization).

AFAICR, nfs_sync_mapping() does actually blow up if you call it on a
non-regular file, but as I said, the only place where that appears to be
possible is nfs_invalidate_mapping().
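
The remark above about directories only taking read locks can be checked from user space. The following is a minimal sketch, assuming a local Linux filesystem; `try_dir_lock()` is a hypothetical helper written for this example, while `open()`, `fcntl()`, `F_SETLK` and `struct flock` are the standard POSIX interfaces. A write lock needs a descriptor open for writing, and a directory cannot be opened that way, so the `F_WRLCK` request is expected to fail with `EBADF`. Whether the `F_RDLCK` request succeeds depends on the filesystem, so no result is claimed for it here.

```c
#define _XOPEN_SOURCE 700	/* for O_DIRECTORY */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Try to take a whole-file POSIX lock of the given type (F_RDLCK or
 * F_WRLCK) on a directory opened read-only. Returns 0 on success, or
 * the errno value describing the failure.
 */
static int try_dir_lock(const char *path, short type)
{
	struct flock fl = {
		.l_type = type,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,	/* 0 means "through end of file" */
	};
	int fd, err = 0;

	assert(path != NULL);

	fd = open(path, O_RDONLY | O_DIRECTORY);
	if (fd < 0)
		return errno;

	if (fcntl(fd, F_SETLK, &fl) < 0)
		err = errno;

	close(fd);	/* closing the descriptor also releases the lock */
	return err;
}
```

Calling `try_dir_lock(".", F_WRLCK)` and `try_dir_lock(".", F_RDLCK)` side by side shows the asymmetry between the two lock types on a directory.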


Patch

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index af6e806..3ea4f64 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -963,9 +963,15 @@  EXPORT_SYMBOL_GPL(nfs_revalidate_inode);
 static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
-	
+	int ret;
+
 	if (mapping->nrpages != 0) {
-		int ret = invalidate_inode_pages2(mapping);
+		if (S_ISREG(inode->i_mode)) {
+			ret = nfs_sync_mapping(mapping);
+			if (ret < 0)
+				return ret;
+		}
+		ret = invalidate_inode_pages2(mapping);
 		if (ret < 0)
 			return ret;
 	}
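
The cost difference described in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: each array element stands for one page-cache page, per-page laundering is modelled as one write request per dirty page, and a bulk sync is modelled as merging each run of contiguous dirty pages into requests of at most `max_run` pages. The function names and the `max_run` parameter are hypothetical; the real NFS client's request sizes and scheduling are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-page laundering: one write request for every dirty page. */
static size_t writes_if_laundered(const bool *dirty, size_t npages)
{
	size_t writes = 0;

	assert(dirty != NULL || npages == 0);

	for (size_t i = 0; i < npages; i++)
		if (dirty[i])
			writes++;
	return writes;
}

/*
 * Bulk sync: contiguous dirty pages are coalesced, with each write
 * request covering at most max_run pages.
 */
static size_t writes_if_synced(const bool *dirty, size_t npages,
			       size_t max_run)
{
	size_t writes = 0;
	size_t run = 0;		/* pages in the request being built */

	assert(dirty != NULL || npages == 0);
	assert(max_run > 0);

	for (size_t i = 0; i < npages; i++) {
		if (!dirty[i]) {
			run = 0;	/* a clean page ends the run */
			continue;
		}
		if (run == 0)
			writes++;	/* start a new request */
		if (++run == max_run)
			run = 0;	/* request is full */
	}
	return writes;
}
```

For eight pages of which the first four and the sixth and seventh are dirty, the model counts six requests when laundering page by page and two when syncing first with `max_run` set to 4.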