
[RFC,1/3] nfsd: don't try to reuse an expired DRC entry off the list

Message ID 1386241253-5781-2-git-send-email-jlayton@redhat.com (mailing list archive)
State New, archived

Commit Message

Jeff Layton Dec. 5, 2013, 11 a.m. UTC
Currently when we are processing a request, we try to scrape an expired
or over-limit entry off the list in preference to allocating a new one
from the slab.

This is unnecessarily complicated. Just use the slab layer.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 fs/nfsd/nfscache.c | 36 ++++--------------------------------
 1 file changed, 4 insertions(+), 32 deletions(-)

Comments

Christoph Hellwig Dec. 5, 2013, 1:29 p.m. UTC | #1
On Thu, Dec 05, 2013 at 06:00:51AM -0500, Jeff Layton wrote:
> Currently when we are processing a request, we try to scrape an expired
> or over-limit entry off the list in preference to allocating a new one
> from the slab.
> 
> This is unnecessarily complicated. Just use the slab layer.

I really don't think pruning caches from lookup functions is a good
idea. Too many chances for locking inversions, unbounded runtime and
other issues.  So I'd prefer my earlier version of the patch to
remove it entirely if we want to change anything beyond your minimal
fix.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jeff Layton Dec. 5, 2013, 1:41 p.m. UTC | #2
On Thu, 5 Dec 2013 05:29:19 -0800
Christoph Hellwig <hch@infradead.org> wrote:

> On Thu, Dec 05, 2013 at 06:00:51AM -0500, Jeff Layton wrote:
> > Currently when we are processing a request, we try to scrape an expired
> > or over-limit entry off the list in preference to allocating a new one
> > from the slab.
> > 
> > This is unnecessarily complicated. Just use the slab layer.
> 
> I really don't think pruning caches from lookup functions is a good
> idea. Too many chances for locking inversions, unbounded runtime and
> other issues.  So I'd prefer my earlier version of the patch to
> remove it entirely if we want to change anything beyond your minimal
> fix.
> 

It's not likely to hit a locking inversion here since we don't have a
lot of different locks, but point taken...

If we take that out, then that does mean that the cache may grow larger
than the max...possibly much larger since the pruner workqueue job
only runs every 120s and you can put a lot more entries into the cache
in that period.

That said, I don't feel too strongly about it, so if you think leaving
it to the workqueue job and the shrinker is the right thing to do, then
so be it.
J. Bruce Fields Dec. 5, 2013, 3:50 p.m. UTC | #3
On Thu, Dec 05, 2013 at 08:41:56AM -0500, Jeff Layton wrote:
> On Thu, 5 Dec 2013 05:29:19 -0800
> Christoph Hellwig <hch@infradead.org> wrote:
> 
> > On Thu, Dec 05, 2013 at 06:00:51AM -0500, Jeff Layton wrote:
> > > Currently when we are processing a request, we try to scrape an expired
> > > or over-limit entry off the list in preference to allocating a new one
> > > from the slab.
> > > 
> > > This is unnecessarily complicated. Just use the slab layer.
> > 
> > I really don't think pruning caches from lookup functions is a good
> > idea. Too many chances for locking inversions, unbounded runtime and
> > other issues.  So I'd prefer my earlier version of the patch to
> > remove it entirely if we want to change anything beyond your minimal
> > fix.
> > 
> 
> It's not likely to hit a locking inversion here since we don't have a
> lot of different locks, but point taken...
> 
> If we take that out, then that does mean that the cache may grow larger
> than the max...possibly much larger since the pruner workqueue job
> only runs every 120s and you can put a lot more entries into the cache
> in that period.

Yeah, this scares me.

I looked through my old mail but couldn't find your previous results:
how long did you find the hash buckets needed to be before the
difference was noticeable on your setup?

We typically do a failed-lookup-and-insert on every cached operation so
I wouldn't be surprised if a workload of small random writes, for
example, could produce an unreasonably large cache in that time.

--b.

> 
> That said, I don't feel too strongly about it, so if you think leaving
> it to the workqueue job and the shrinker is the right thing to do, then
> so be it.
> 
> -- 
> Jeff Layton <jlayton@redhat.com>
Jeff Layton Dec. 5, 2013, 4:22 p.m. UTC | #4
On Thu, 5 Dec 2013 10:50:24 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> On Thu, Dec 05, 2013 at 08:41:56AM -0500, Jeff Layton wrote:
> > On Thu, 5 Dec 2013 05:29:19 -0800
> > Christoph Hellwig <hch@infradead.org> wrote:
> > 
> > > On Thu, Dec 05, 2013 at 06:00:51AM -0500, Jeff Layton wrote:
> > > > Currently when we are processing a request, we try to scrape an expired
> > > > or over-limit entry off the list in preference to allocating a new one
> > > > from the slab.
> > > > 
> > > > This is unnecessarily complicated. Just use the slab layer.
> > > 
> > > I really don't think pruning caches from lookup functions is a good
> > > idea. Too many chances for locking inversions, unbounded runtime and
> > > other issues.  So I'd prefer my earlier version of the patch to
> > > remove it entirely if we want to change anything beyond your minimal
> > > fix.
> > > 
> > 
> > It's not likely to hit a locking inversion here since we don't have a
> > lot of different locks, but point taken...
> > 
> > If we take that out, then that does mean that the cache may grow larger
> > than the max...possibly much larger since the pruner workqueue job
> > only runs every 120s and you can put a lot more entries into the cache
> > in that period.
> 
> Yeah, this scares me.
> 
> I looked through my old mail but couldn't find your previous results:
> how long did you find the hash buckets needed to be before the
> difference was noticeable on your setup?
> 
> We typically do a failed-lookup-and-insert on every cached operation so
> I wouldn't be surprised if a workload of small random writes, for
> example, could produce an unreasonably large cache in that time.
> 

I don't think I ever measured that. I did some measurement of how
long it took to generate checksums, but that's a different phase of
this stuff. I just sort of went with "too long a chain == bad, so keep
them as short as possible".


> 
> > 
> > That said, I don't feel too strongly about it, so if you think leaving
> > it to the workqueue job and the shrinker is the right thing to do, then
> > so be it.
> > 
> > -- 
> > Jeff Layton <jlayton@redhat.com>

Patch

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index b6af150..f8f060f 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -132,13 +132,6 @@  nfsd_reply_cache_alloc(void)
 }
 
 static void
-nfsd_reply_cache_unhash(struct svc_cacherep *rp)
-{
-	hlist_del_init(&rp->c_hash);
-	list_del_init(&rp->c_lru);
-}
-
-static void
 nfsd_reply_cache_free_locked(struct svc_cacherep *rp)
 {
 	if (rp->c_type == RC_REPLBUFF && rp->c_replvec.iov_base) {
@@ -416,22 +409,8 @@  nfsd_cache_lookup(struct svc_rqst *rqstp)
 
 	/*
 	 * Since the common case is a cache miss followed by an insert,
-	 * preallocate an entry. First, try to reuse the first entry on the LRU
-	 * if it works, then go ahead and prune the LRU list.
+	 * preallocate an entry.
 	 */
-	spin_lock(&cache_lock);
-	if (!list_empty(&lru_head)) {
-		rp = list_first_entry(&lru_head, struct svc_cacherep, c_lru);
-		if (nfsd_cache_entry_expired(rp) ||
-		    num_drc_entries >= max_drc_entries) {
-			nfsd_reply_cache_unhash(rp);
-			prune_cache_entries();
-			goto search_cache;
-		}
-	}
-
-	/* No expired ones available, allocate a new one. */
-	spin_unlock(&cache_lock);
 	rp = nfsd_reply_cache_alloc();
 	spin_lock(&cache_lock);
 	if (likely(rp)) {
@@ -439,7 +418,9 @@  nfsd_cache_lookup(struct svc_rqst *rqstp)
 		drc_mem_usage += sizeof(*rp);
 	}
 
-search_cache:
+	/* go ahead and prune the cache */
+	prune_cache_entries();
+
 	found = nfsd_cache_search(rqstp, csum);
 	if (found) {
 		if (likely(rp))
@@ -453,15 +434,6 @@  search_cache:
 		goto out;
 	}
 
-	/*
-	 * We're keeping the one we just allocated. Are we now over the
-	 * limit? Prune one off the tip of the LRU in trade for the one we
-	 * just allocated if so.
-	 */
-	if (num_drc_entries >= max_drc_entries)
-		nfsd_reply_cache_free_locked(list_first_entry(&lru_head,
-						struct svc_cacherep, c_lru));
-
 	nfsdstats.rcmisses++;
 	rqstp->rq_cacherep = rp;
 	rp->c_state = RC_INPROG;