Message ID | 20140423024058.4725.38098.stgit@notabene.brown (mailing list archive) |
---|---|
State | New, archived |
On Wed, Apr 23, 2014 at 12:40:58PM +1000, NeilBrown wrote:
> PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
> and live-locks while writing to the page cache in a loop-back
> NFS mount situation.
>
> It therefore makes sense to *only* set PF_LESS_THROTTLE in this
> situation.
> We now know when a request came from the local-host so it could be a
> loop-back mount. We already know when we are handling write requests,
> and when we are doing anything else.
>
> So combine those two to allow nfsd to still be throttled (like any
> other process) in every situation except when it is known to be
> problematic.

Looks simple enough, ACK.

--b.

>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  fs/nfsd/nfssvc.c |  6 ------
>  fs/nfsd/vfs.c    | 12 ++++++++++++
>  2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 9a4a5f9e7468..1879e43f2868 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -591,12 +591,6 @@ nfsd(void *vrqstp)
>  	nfsdstats.th_cnt++;
>  	mutex_unlock(&nfsd_mutex);
>  
> -	/*
> -	 * We want less throttling in balance_dirty_pages() so that nfs to
> -	 * localhost doesn't cause nfsd to lock up due to all the client's
> -	 * dirty pages.
> -	 */
> -	current->flags |= PF_LESS_THROTTLE;
>  	set_freezable();
>  
>  	/*
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 6d7be3f80356..2acd00445ad0 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -913,6 +913,16 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
>  	int stable = *stablep;
>  	int use_wgather;
>  	loff_t pos = offset;
> +	unsigned int pflags = current->flags;
> +
> +	if (rqstp->rq_local)
> +		/*
> +		 * We want less throttling in balance_dirty_pages()
> +		 * and shrink_inactive_list() so that nfs to
> +		 * localhost doesn't cause nfsd to lock up due to all
> +		 * the client's dirty pages or its congested queue.
> +		 */
> +		current->flags |= PF_LESS_THROTTLE;
>  
>  	dentry = file->f_path.dentry;
>  	inode = dentry->d_inode;
> @@ -950,6 +960,8 @@ out_nfserr:
>  		err = 0;
>  	else
>  		err = nfserrno(host_err);
> +	if (rqstp->rq_local)
> +		tsk_restore_flags(current, pflags, PF_LESS_THROTTLE);
>  	return err;
>  }
>  
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
On 04/22/2014 10:40 PM, NeilBrown wrote:
> PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
> and live-locks while writing to the page cache in a loop-back
> NFS mount situation.
>
> It therefore makes sense to *only* set PF_LESS_THROTTLE in this
> situation.
> We now know when a request came from the local-host so it could be a
> loop-back mount. We already know when we are handling write requests,
> and when we are doing anything else.
>
> So combine those two to allow nfsd to still be throttled (like any
> other process) in every situation except when it is known to be
> problematic.

The FUSE code has something similar, but on the "client"
side.

See BDI_CAP_STRICTLIMIT in mm/writeback.c

Would it make sense to use that flag on loopback-mounted
NFS filesystems?
On Tue, 06 May 2014 17:05:01 -0400 Rik van Riel <riel@redhat.com> wrote:

> On 04/22/2014 10:40 PM, NeilBrown wrote:
> > PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
> > and live-locks while writing to the page cache in a loop-back
> > NFS mount situation.
> >
> > It therefore makes sense to *only* set PF_LESS_THROTTLE in this
> > situation.
> > We now know when a request came from the local-host so it could be a
> > loop-back mount. We already know when we are handling write requests,
> > and when we are doing anything else.
> >
> > So combine those two to allow nfsd to still be throttled (like any
> > other process) in every situation except when it is known to be
> > problematic.
>
> The FUSE code has something similar, but on the "client"
> side.
>
> See BDI_CAP_STRICTLIMIT in mm/writeback.c
>
> Would it make sense to use that flag on loopback-mounted
> NFS filesystems?
>

I don't think so.

I don't fully understand BDI_CAP_STRICTLIMIT, but it seems to be very
fuse-specific and relates to NR_WRITEBACK_TEMP, which only fuse uses. NFS
doesn't need any 'strict' limits.
i.e. it looks like fuse-specific code inside core-vm code, which I would
rather steer clear of.

Setting a bdi flag for a loopback-mounted NFS filesystem isn't really
possible because its "is it loopback mounted" state is fluid. IP addresses
can be migrated (for HA cluster failover), and what was originally a
remote-NFS mount can become a loopback NFS mount (and that is exactly the
case I need to deal with).
So we can only really assess "is it loop-back" on a per-request basis.

This patch does that assessment in nfsd to limit the use of
PF_LESS_THROTTLE. Another patch does it in nfs to limit the waiting in
nfs_release_page.

Thanks,
NeilBrown
On Tue, 6 May 2014 16:54:18 -0400 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Wed, Apr 23, 2014 at 12:40:58PM +1000, NeilBrown wrote:
> > PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
> > and live-locks while writing to the page cache in a loop-back
> > NFS mount situation.
> >
> > It therefore makes sense to *only* set PF_LESS_THROTTLE in this
> > situation.
> > We now know when a request came from the local-host so it could be a
> > loop-back mount. We already know when we are handling write requests,
> > and when we are doing anything else.
> >
> > So combine those two to allow nfsd to still be throttled (like any
> > other process) in every situation except when it is known to be
> > problematic.
>
> Looks simple enough, ACK.
>
> --b.
>

Thanks.
I'll resend the bits needed for just this. The NFS side needs to wait for
wait_on_bit improvements, which seem to be on a slow path at the moment.

Thanks,
NeilBrown
On Mon 12-05-14 11:04:37, NeilBrown wrote:
> On Tue, 06 May 2014 17:05:01 -0400 Rik van Riel <riel@redhat.com> wrote:
>
> > On 04/22/2014 10:40 PM, NeilBrown wrote:
> > > PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
> > > and live-locks while writing to the page cache in a loop-back
> > > NFS mount situation.
> > >
> > > It therefore makes sense to *only* set PF_LESS_THROTTLE in this
> > > situation.
> > > We now know when a request came from the local-host so it could be a
> > > loop-back mount. We already know when we are handling write requests,
> > > and when we are doing anything else.
> > >
> > > So combine those two to allow nfsd to still be throttled (like any
> > > other process) in every situation except when it is known to be
> > > problematic.
> >
> > The FUSE code has something similar, but on the "client"
> > side.
> >
> > See BDI_CAP_STRICTLIMIT in mm/writeback.c
> >
> > Would it make sense to use that flag on loopback-mounted
> > NFS filesystems?
> >
>
> I don't think so.
>
> I don't fully understand BDI_CAP_STRICTLIMIT, but it seems to be very
> fuse-specific and relates to NR_WRITEBACK_TEMP, which only fuse uses. NFS
> doesn't need any 'strict' limits.
> i.e. it looks like fuse-specific code inside core-vm code, which I would
> rather steer clear of.

It doesn't really relate to NR_WRITEBACK_TEMP. We have two dirty limits
in the VM - the global one and a per-bdi one (which is a fraction of the
global one, computed based on how much the device has been writing back in
the past). Normally, until we have more than
(dirty_limit + dirty_background_limit) / 2 dirty pages globally, the
per-bdi limit is ignored. BDI_CAP_STRICTLIMIT means that the per-bdi
dirty limit is always observed. Together with max_ratio and min_ratio
this is useful for limiting the amount of dirty pages for specific bdis.
FUSE uses it so that userspace filesystems cannot easily lock up the
system by creating lots of dirty pages which cannot be written back.

So I actually don't think BDI_CAP_STRICTLIMIT is a particularly good fit
for your problem, although I agree with Rik that FUSE faces a similar
problem.

								Honza
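Jan's description of the two dirty limits can be captured in a small, simplified model. Every name below is hypothetical, and the arithmetic is deliberately reduced: the per-bdi limit is the bdi's recent writeback share (in percent) of the global limit, clamped to [min_ratio, max_ratio]. The kernel's real logic lives in mm/page-writeback.c and differs in detail, notably in how min_ratio is applied. The sketch shows only the rule Jan states. Below the freerun ceiling of (dirty_limit + background_limit) / 2 the per-bdi limit is ignored, unless strict limiting is on.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the VM's two-level dirty limit. */

/* Below this many globally dirty pages, the per-bdi limit is ignored. */
static unsigned long freerun_ceiling(unsigned long dirty_limit,
				     unsigned long background_limit)
{
	return (dirty_limit + background_limit) / 2;
}

/* Does the per-bdi limit constrain a writer right now? */
static bool bdi_limit_applies(unsigned long global_dirty,
			      unsigned long dirty_limit,
			      unsigned long background_limit,
			      bool strictlimit)
{
	if (strictlimit)	/* models BDI_CAP_STRICTLIMIT: always observed */
		return true;
	return global_dirty > freerun_ceiling(dirty_limit, background_limit);
}

/* Per-bdi limit: the bdi's share of recent writeback (percent) applied to
 * the global limit, clamped to [min_ratio, max_ratio]. */
static unsigned long bdi_dirty_limit(unsigned long dirty_limit,
				     unsigned int writeout_pct,
				     unsigned int min_ratio,
				     unsigned int max_ratio)
{
	unsigned int pct = writeout_pct;

	assert(min_ratio <= max_ratio && max_ratio <= 100);
	if (pct < min_ratio)
		pct = min_ratio;
	if (pct > max_ratio)
		pct = max_ratio;
	return dirty_limit * pct / 100;
}
```

Take a global limit of 1000 pages and a background limit of 500. A non-strict bdi is left alone until 750 pages are dirty system-wide. A strict bdi is held to its own share from the start. That difference is why FUSE benefits from the flag, and also why it does not address nfsd's per-request loopback problem.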
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 9a4a5f9e7468..1879e43f2868 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -591,12 +591,6 @@ nfsd(void *vrqstp)
 	nfsdstats.th_cnt++;
 	mutex_unlock(&nfsd_mutex);
 
-	/*
-	 * We want less throttling in balance_dirty_pages() so that nfs to
-	 * localhost doesn't cause nfsd to lock up due to all the client's
-	 * dirty pages.
-	 */
-	current->flags |= PF_LESS_THROTTLE;
 	set_freezable();
 
 	/*
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 6d7be3f80356..2acd00445ad0 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -913,6 +913,16 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct file *file,
 	int stable = *stablep;
 	int use_wgather;
 	loff_t pos = offset;
+	unsigned int pflags = current->flags;
+
+	if (rqstp->rq_local)
+		/*
+		 * We want less throttling in balance_dirty_pages()
+		 * and shrink_inactive_list() so that nfs to
+		 * localhost doesn't cause nfsd to lock up due to all
+		 * the client's dirty pages or its congested queue.
+		 */
+		current->flags |= PF_LESS_THROTTLE;
 
 	dentry = file->f_path.dentry;
 	inode = dentry->d_inode;
@@ -950,6 +960,8 @@ out_nfserr:
 		err = 0;
 	else
 		err = nfserrno(host_err);
+	if (rqstp->rq_local)
+		tsk_restore_flags(current, pflags, PF_LESS_THROTTLE);
 	return err;
 }
 
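The vfs.c hunk follows a save/set/restore pattern. It saves `current->flags`, sets `PF_LESS_THROTTLE` only when `rqstp->rq_local` is true, and then calls `tsk_restore_flags()`, which puts back only the masked bit. The userspace sketch below mimics that pattern. `struct task_sketch`, `write_sketch()`, `restore_flags_sketch()` and both flag values are hypothetical stand-ins; the real `PF_LESS_THROTTLE` is defined in include/linux/sched.h. A masked restore has two useful properties. It does not clobber unrelated flags that changed during the write. It also leaves the bit set if the caller had already set it.

```c
#include <assert.h>

/* Assumed bit values, for illustration only. */
#define PF_LESS_THROTTLE_SKETCH 0x00100000u
#define PF_OTHER_SKETCH         0x00000004u	/* hypothetical unrelated flag */

struct task_sketch {
	unsigned int flags;
};

/* Same semantics as tsk_restore_flags(): only the bits in 'mask' return
 * to their saved values; every other bit is left as it is now. */
static void restore_flags_sketch(struct task_sketch *t, unsigned int saved,
				 unsigned int mask)
{
	t->flags &= ~mask;		/* clear the bits we are responsible for */
	t->flags |= saved & mask;	/* restore their values from entry */
}

/* Hypothetical stand-in for nfsd_vfs_write(). */
static void write_sketch(struct task_sketch *t, int rq_local)
{
	unsigned int pflags = t->flags;

	if (rq_local)
		t->flags |= PF_LESS_THROTTLE_SKETCH;

	/* The write itself would run here; pretend it sets an unrelated flag. */
	t->flags |= PF_OTHER_SKETCH;

	if (rq_local)
		restore_flags_sketch(t, pflags, PF_LESS_THROTTLE_SKETCH);

	/* Postcondition: the less-throttle bit is back to its value on entry. */
	assert((t->flags & PF_LESS_THROTTLE_SKETCH) ==
	       (pflags & PF_LESS_THROTTLE_SKETCH));
}
```

Consider a task whose flags start at 0. A local `write_sketch()` call clears `PF_LESS_THROTTLE_SKETCH` again on the way out but keeps `PF_OTHER_SKETCH`. A task that already had the less-throttle bit set keeps it.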
PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks
and live-locks while writing to the page cache in a loop-back
NFS mount situation.

It therefore makes sense to *only* set PF_LESS_THROTTLE in this
situation.
We now know when a request came from the local-host so it could be a
loop-back mount. We already know when we are handling write requests,
and when we are doing anything else.

So combine those two to allow nfsd to still be throttled (like any
other process) in every situation except when it is known to be
problematic.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfsd/nfssvc.c |  6 ------
 fs/nfsd/vfs.c    | 12 ++++++++++++
 2 files changed, 12 insertions(+), 6 deletions(-)