
[3/5] nfsd: Only set PF_LESS_THROTTLE when really needed.

Message ID 20140423024058.4725.38098.stgit@notabene.brown (mailing list archive)
State New, archived

Commit Message

NeilBrown April 23, 2014, 2:40 a.m. UTC
PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
and live-locks while writing to the page cache in a loop-back
NFS mount situation.

It therefore makes sense to *only* set PF_LESS_THROTTLE in this
situation.
We now know when a request came from the local-host so it could be a
loop-back mount.  We already know when we are handling write requests,
and when we are doing anything else.

So combine those two to allow nfsd to still be throttled (like any
other process) in every situation except when it is known to be
problematic.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfssvc.c |    6 ------
 fs/nfsd/vfs.c    |   12 ++++++++++++
 2 files changed, 12 insertions(+), 6 deletions(-)



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

J. Bruce Fields May 6, 2014, 8:54 p.m. UTC | #1
On Wed, Apr 23, 2014 at 12:40:58PM +1000, NeilBrown wrote:
> PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
> and live-locks while writing to the page cache in a loop-back
> NFS mount situation.
> 
> It therefore makes sense to *only* set PF_LESS_THROTTLE in this
> situation.
> We now know when a request came from the local-host so it could be a
> loop-back mount.  We already know when we are handling write requests,
> and when we are doing anything else.
> 
> So combine those two to allow nfsd to still be throttled (like any
> other process) in every situation except when it is known to be
> problematic.

Looks simple enough, ACK.

--b.

> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfssvc.c |    6 ------
>  fs/nfsd/vfs.c    |   12 ++++++++++++
>  2 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 9a4a5f9e7468..1879e43f2868 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -591,12 +591,6 @@ nfsd(void *vrqstp)
>  	nfsdstats.th_cnt++;
>  	mutex_unlock(&nfsd_mutex);
>  
> -	/*
> -	 * We want less throttling in balance_dirty_pages() so that nfs to
> -	 * localhost doesn't cause nfsd to lock up due to all the client's
> -	 * dirty pages.
> -	 */
> -	current->flags |= PF_LESS_THROTTLE;
>  	set_freezable();
>  
>  	/*
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 6d7be3f80356..2acd00445ad0 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -913,6 +913,16 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
>  	int			stable = *stablep;
>  	int			use_wgather;
>  	loff_t			pos = offset;
> +	unsigned int		pflags = current->flags;
> +
> +	if (rqstp->rq_local)
> +		/*
> +		 * We want less throttling in balance_dirty_pages()
> +		 * and shrink_inactive_list() so that nfs to
> +		 * localhost doesn't cause nfsd to lock up due to all
> +		 * the client's dirty pages or its congested queue.
> +		 */
> +		current->flags |= PF_LESS_THROTTLE;
>  
>  	dentry = file->f_path.dentry;
>  	inode = dentry->d_inode;
> @@ -950,6 +960,8 @@ out_nfserr:
>  		err = 0;
>  	else
>  		err = nfserrno(host_err);
> +	if (rqstp->rq_local)
> +		tsk_restore_flags(current, pflags, PF_LESS_THROTTLE);
>  	return err;
>  }
>  
> 
> 
Rik van Riel May 6, 2014, 9:05 p.m. UTC | #2
On 04/22/2014 10:40 PM, NeilBrown wrote:
> PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
> and live-locks while writing to the page cache in a loop-back
> NFS mount situation.
> 
> It therefore makes sense to *only* set PF_LESS_THROTTLE in this
> situation.
> We now know when a request came from the local-host so it could be a
> loop-back mount.  We already know when we are handling write requests,
> and when we are doing anything else.
> 
> So combine those two to allow nfsd to still be throttled (like any
> other process) in every situation except when it is known to be
> problematic.

The FUSE code has something similar, but on the "client"
side.

See BDI_CAP_STRICTLIMIT in mm/page-writeback.c

Would it make sense to use that flag on loopback-mounted
NFS filesystems?
NeilBrown May 12, 2014, 1:04 a.m. UTC | #3
On Tue, 06 May 2014 17:05:01 -0400 Rik van Riel <riel@redhat.com> wrote:

> On 04/22/2014 10:40 PM, NeilBrown wrote:
> > PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
> > and live-locks while writing to the page cache in a loop-back
> > NFS mount situation.
> > 
> > It therefore makes sense to *only* set PF_LESS_THROTTLE in this
> > situation.
> > We now know when a request came from the local-host so it could be a
> > loop-back mount.  We already know when we are handling write requests,
> > and when we are doing anything else.
> > 
> > So combine those two to allow nfsd to still be throttled (like any
> > other process) in every situation except when it is known to be
> > problematic.
> 
> The FUSE code has something similar, but on the "client"
> side.
> 
> See BDI_CAP_STRICTLIMIT in mm/page-writeback.c
> 
> Would it make sense to use that flag on loopback-mounted
> NFS filesystems?
> 

I don't think so.

I don't fully understand BDI_CAP_STRICTLIMIT, but it seems to be very
fuse-specific and relates to NR_WRITEBACK_TEMP, which only fuse uses.  NFS
doesn't need any 'strict' limits.
i.e. it looks like fuse-specific code inside core-vm code, which I would
rather steer clear of.

Setting a bdi flag for a loopback-mounted NFS filesystem isn't really
possible because the "is it loopback mounted" state is fluid.  IP addresses can
be migrated (for HA cluster failover) and what was originally a remote-NFS
mount can become a loopback NFS mount (and that is exactly the case I need to
deal with).

So we can only really assess "is it loop-back" on a per-request basis.

This patch does that assessment in nfsd to limit the use of PF_LESS_THROTTLE.
Another patch does it in nfs to limit the waiting in nfs_release_page.
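
To illustrate why a flag cached at mount time goes stale, here is a rough
user-space sketch.  It is only a model: every name in it (model_host,
model_mount, model_request_is_loopback and so on) is hypothetical, the
addresses are documentation examples, and comparing address strings is an
assumption made for the illustration, not the mechanism the kernel uses to
decide that a request is local.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_ADDRS 4

/* Hypothetical model of a host and the IP addresses currently assigned to it. */
struct model_host {
	const char *addrs[MODEL_MAX_ADDRS];
	size_t naddrs;
};

/* Hypothetical mount record that caches the loopback answer once, at mount time. */
struct model_mount {
	const char *server_addr;
	bool loopback_at_mount_time;
};

/* True if 'addr' is currently one of the host's own addresses. */
static bool model_host_has_addr(const struct model_host *h, const char *addr)
{
	for (size_t i = 0; i < h->naddrs; i++)
		if (strcmp(h->addrs[i], addr) == 0)
			return true;
	return false;
}

/* Models an HA failover moving an address onto this host. */
static bool model_host_add_addr(struct model_host *h, const char *addr)
{
	if (h->naddrs >= MODEL_MAX_ADDRS)
		return false;
	h->addrs[h->naddrs++] = addr;
	return true;
}

/* Mount-time assessment: computed once and never revisited. */
static struct model_mount model_mount_create(const struct model_host *client,
					     const char *server_addr)
{
	struct model_mount m = {
		server_addr,
		model_host_has_addr(client, server_addr),
	};
	return m;
}

/* Per-request assessment: re-evaluated against the host's current addresses. */
static bool model_request_is_loopback(const struct model_host *client,
				      const struct model_mount *m)
{
	return model_host_has_addr(client, m->server_addr);
}
```

In this model, once the server address migrates onto the client host, the
cached loopback_at_mount_time still says "remote" while the per-request
check reports loop-back.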

Thanks,
NeilBrown
NeilBrown May 12, 2014, 1:05 a.m. UTC | #4
On Tue, 6 May 2014 16:54:18 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Wed, Apr 23, 2014 at 12:40:58PM +1000, NeilBrown wrote:
> > PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
> > and live-locks while writing to the page cache in a loop-back
> > NFS mount situation.
> > 
> > It therefore makes sense to *only* set PF_LESS_THROTTLE in this
> > situation.
> > We now know when a request came from the local-host so it could be a
> > loop-back mount.  We already know when we are handling write requests,
> > and when we are doing anything else.
> > 
> > So combine those two to allow nfsd to still be throttled (like any
> > other process) in every situation except when it is known to be
> > problematic.
> 
> Looks simple enough, ACK.
> 
> --b.
> 

Thanks.
I'll resend the bits needed for just this.

The NFS side needs to wait for wait_on_bit improvements, which seem to be on a
slow path at the moment.

Thanks,
NeilBrown
Jan Kara May 12, 2014, 3:32 p.m. UTC | #5
On Mon 12-05-14 11:04:37, NeilBrown wrote:
> On Tue, 06 May 2014 17:05:01 -0400 Rik van Riel <riel@redhat.com> wrote:
> 
> > On 04/22/2014 10:40 PM, NeilBrown wrote:
> > > PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
> > > and live-locks while writing to the page cache in a loop-back
> > > NFS mount situation.
> > > 
> > > It therefore makes sense to *only* set PF_LESS_THROTTLE in this
> > > situation.
> > > We now know when a request came from the local-host so it could be a
> > > loop-back mount.  We already know when we are handling write requests,
> > > and when we are doing anything else.
> > > 
> > > So combine those two to allow nfsd to still be throttled (like any
> > > other process) in every situation except when it is known to be
> > > problematic.
> > 
> > The FUSE code has something similar, but on the "client"
> > side.
> > 
> > See BDI_CAP_STRICTLIMIT in mm/page-writeback.c
> > 
> > Would it make sense to use that flag on loopback-mounted
> > NFS filesystems?
> > 
> 
> I don't think so.
> 
> I don't fully understand BDI_CAP_STRICTLIMIT, but it seems to be very
> fuse-specific and relates to NR_WRITEBACK_TEMP, which only fuse uses.  NFS
> doesn't need any 'strict' limits.
> i.e. it looks like fuse-specific code inside core-vm code, which I would
> rather steer clear of.
  It doesn't really relate to NR_WRITEBACK_TEMP. We have two dirty limits
in the VM - the global one and a per-bdi one (which is a fraction of the
global one, computed based on how much the device has been writing back in
the past). Normally, until we have more than (dirty_limit +
dirty_background_limit) / 2 dirty pages globally, the per-bdi limit is
ignored. BDI_CAP_STRICTLIMIT means that the per-bdi dirty limit is
always observed. Together with max_ratio and min_ratio this is useful for
limiting the amount of dirty pages for specific bdis. FUSE uses it so that
userspace filesystems cannot easily lock up the system by creating lots of
dirty pages which cannot be written back.

So I actually don't think BDI_CAP_STRICTLIMIT is a particularly good fit
for your problem although I agree with Rik that FUSE faces a similar
problem.
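
As a rough illustration of the rule described above, the decision "is the
per-bdi limit observed right now?" can be sketched in user-space C.  This is
a simplified model, not the kernel code: struct model_dirty_state and
model_bdi_limit_observed() are hypothetical names, and the real logic in
balance_dirty_pages() does considerably more.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the global dirty-page accounting, in pages. */
struct model_dirty_state {
	unsigned long nr_dirty;		/* dirty pages system-wide */
	unsigned long dirty_limit;	/* global dirty limit */
	unsigned long background_limit;	/* global background dirty limit */
};

/*
 * Returns true when the per-bdi dirty limit is taken into account.
 *
 * Without the strict flag, the per-bdi limit is ignored until the global
 * dirty count exceeds (dirty_limit + background_limit) / 2.  With the
 * strict flag (modelling BDI_CAP_STRICTLIMIT) it is always observed.
 */
static bool model_bdi_limit_observed(const struct model_dirty_state *s,
				     bool strictlimit)
{
	unsigned long threshold = (s->dirty_limit + s->background_limit) / 2;

	if (strictlimit)
		return true;
	return s->nr_dirty > threshold;
}
```

With dirty_limit = 1000 and background_limit = 500 the threshold is 750
pages: at 600 dirty pages the per-bdi limit applies only to a bdi with the
strict flag, while at 800 it applies to every bdi.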

								Honza

Patch

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 9a4a5f9e7468..1879e43f2868 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -591,12 +591,6 @@  nfsd(void *vrqstp)
 	nfsdstats.th_cnt++;
 	mutex_unlock(&nfsd_mutex);
 
-	/*
-	 * We want less throttling in balance_dirty_pages() so that nfs to
-	 * localhost doesn't cause nfsd to lock up due to all the client's
-	 * dirty pages.
-	 */
-	current->flags |= PF_LESS_THROTTLE;
 	set_freezable();
 
 	/*
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6d7be3f80356..2acd00445ad0 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -913,6 +913,16 @@  nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 	int			stable = *stablep;
 	int			use_wgather;
 	loff_t			pos = offset;
+	unsigned int		pflags = current->flags;
+
+	if (rqstp->rq_local)
+		/*
+		 * We want less throttling in balance_dirty_pages()
+		 * and shrink_inactive_list() so that nfs to
+		 * localhost doesn't cause nfsd to lock up due to all
+		 * the client's dirty pages or its congested queue.
+		 */
+		current->flags |= PF_LESS_THROTTLE;
 
 	dentry = file->f_path.dentry;
 	inode = dentry->d_inode;
@@ -950,6 +960,8 @@  out_nfserr:
 		err = 0;
 	else
 		err = nfserrno(host_err);
+	if (rqstp->rq_local)
+		tsk_restore_flags(current, pflags, PF_LESS_THROTTLE);
 	return err;
 }
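
The save/set/restore pattern used in the nfsd_vfs_write() hunk above can be
modelled in user space roughly as follows.  This is a sketch under stated
assumptions, not kernel code: struct model_task, MODEL_PF_LESS_THROTTLE (an
illustrative bit value) and model_vfs_write() are hypothetical stand-ins,
and model_restore_flags() only mimics what tsk_restore_flags() is understood
to do (clear the masked bits, then copy back their saved values).

```c
#include <stdbool.h>

/* Illustrative bit value only; the real flag is defined in <linux/sched.h>. */
#define MODEL_PF_LESS_THROTTLE 0x00100000u

/* Stand-in for struct task_struct; only the flags word is modelled. */
struct model_task {
	unsigned int flags;
};

/*
 * Restore only the bits selected by 'mask' to the values they had in
 * 'orig_flags', leaving every other bit of task->flags untouched.
 */
static void model_restore_flags(struct model_task *task,
				unsigned int orig_flags, unsigned int mask)
{
	task->flags &= ~mask;
	task->flags |= orig_flags & mask;
}

/*
 * Models the write path: the flag is raised only for requests marked
 * local and is put back to its previous state before returning.
 * Returns whether the flag was set while the "write" was in progress.
 */
static bool model_vfs_write(struct model_task *task, bool rq_local)
{
	unsigned int pflags = task->flags;	/* saved on entry */
	bool less_throttled;

	if (rq_local)
		task->flags |= MODEL_PF_LESS_THROTTLE;

	/* The actual write to the page cache would happen here. */
	less_throttled = (task->flags & MODEL_PF_LESS_THROTTLE) != 0;

	if (rq_local)
		model_restore_flags(task, pflags, MODEL_PF_LESS_THROTTLE);
	return less_throttled;
}
```

Restoring from the saved value, rather than unconditionally clearing the
bit, means a task that already had the flag set before the call keeps it
afterwards.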