From patchwork Wed Mar 13 16:47:34 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bodo Stroesser X-Patchwork-Id: 2263941 Return-Path: X-Original-To: patchwork-linux-nfs@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork1.kernel.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by patchwork1.kernel.org (Postfix) with ESMTP id 5AFBA4020C for ; Wed, 13 Mar 2013 16:47:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933656Ab3CMQri (ORCPT ); Wed, 13 Mar 2013 12:47:38 -0400 Received: from dgate10.ts.fujitsu.com ([80.70.172.49]:53532 "EHLO dgate10.ts.fujitsu.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932182Ab3CMQrg (ORCPT ); Wed, 13 Mar 2013 12:47:36 -0400 DomainKey-Signature: s=s1536a; d=ts.fujitsu.com; c=nofws; q=dns; h=Message-Id:Date:X-SBRSScore:X-IronPort-AV:Received: X-IronPort-AV:Received:From:To:Cc:Subject:Content-Type: Content-Transfer-Encoding; b=dxac7OWELEr2+rg2lWLqyEotDq3RxY6qX6GEBevIBnIx6l8SlemUY2Hr 1PT+siW3PxXxOkzYyfBWQypPX4makuJJEiRnoXFEuixG45M/wPYyTn7f2 2sEVSo1IXFlBlp9Hsk0xX7UuQbAnij/dGC9JuPzEwgvFaQaVJ+sgvrTSw ImzUrqH9QOzasPZXznfNEwnWW3ZB25UUuIdNQSNliSu0Y5S+P8hQ1edlc 1F27fIU4jZWJ2t06zfeqgPvPd8HCq; DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; d=ts.fujitsu.com; i=@ts.fujitsu.com; q=dns/txt; s=s1536b; t=1363193256; x=1394729256; h=from:to:cc:subject:content-transfer-encoding:message-id: date; bh=OBw+l/Q2z+Gnh5JjqQJLzzyshHTP/oUetNDphwJIcVc=; b=ta7FQCZN+46LxV7ixX2rKmd4Pfwt1Zy/Em/ul6udFP5Ds25/HU/5uikA YLjO3DWGomd2Ilc9ljKEEUvBWdL7yNtwj/p0F4wdVkrAg1rm2WfTjI0Rr 89VRMtfo/UTYcpPyS3GOFJXl4tpM8YMToZhsmFHvP6ei2IN7ZjxtEQ8wb 7hxBnQxdXAdh3TbRvtRAscNzoax5L2Tlt8Co3NPXhW2bMdOiZzOpJAqKo mrz9d4jhpASifdhGRahva61vpMjuw; Message-Id: Date: 13 Mar 2013 17:47:34 +0100 X-SBRSScore: None X-IronPort-AV: E=Sophos;i="4.84,838,1355094000"; d="scan'208";a="139789994" Received: from unknown (HELO abgdate50u.abg.fsc.net) ([172.25.138.66]) by dgate10u.abg.fsc.net with ESMTP; 13 Mar 2013 17:47:34 +0100 X-IronPort-AV: E=Sophos;i="4.84,838,1355094000"; d="scan'208";a="6289831" Received: from unknown (HELO BridgeHost.test.fsc.net) ([172.17.68.112]) by abgdate50u.abg.fsc.net with SMTP; 13 Mar 2013 17:47:34 +0100 From: Bodo Stroesser To: neilb@suse.de Cc: bfields@fieldses.org, linux-nfs@vger.kernel.org, bstroesser@ts.fujitsu.com Subject: Re: sunrpc/cache.c: races while updating cache entries Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On 13 Mar 2013 07:55:00 +0100 NeilBrown wrote: > On 11 Mar 2013 17:13:41 +0100 Bodo Stroesser > wrote: > > > Hi, > > > > AFAICS, there is one more race in RPC cache. > > > > The problem showed up for the "auth.unix.ip" cache, that > > has a reader. > > > > If a server thread tries to find a cache entry, it first > > does a lookup (or calls ip_map_cached_get() in this specific > > case). Then, it calls cache_check() for this entry. > > > > If the reader updates the cache entry just between the > > thread's lookup and cache_check() execution, cache_check() > > will do an upcall for this cache entry. This is, because > > sunrpc_cache_update() calls cache_fresh_locked(old, 0), > > which sets expiry_time to 0. > > > > Unfortunately, the reply to the upcall will not dequeue > > the queued cache_request, as the reply will be assigned to > > the cache entry newly created by the above cache update. > > > > The result is a growing queue of orphaned cache_request > > structures --> memory leak. > > > > I found this on a SLES11 SP1 with a backport of the latest > > patches that fix the other RPC races. On this old kernel, > > the problem also leads to svc_drop() being called for the > > affected RPC request (after svc_defer()). > > > > Best Regards > > Bodo > > I don't think this is a real problem. > The periodic call to "cache_clean" should find these orphaned requests and > purge them. So you could get a short term memory leak, but it should > correct itself. > Do you agree? > > Thanks, > NeilBrown I agree that it _should_ correct itself. I read the source again and I now see that cache_clean should do the work fine. OTOH, on my SLES11 SP1, there *is* a memory leak and also RPC requests are dropped (which can't happen on mainline, but that made me look into it). Please see the output from crash on the life system below. That output is from a time far after all NFS activity was stopped and the next cycle of cache_clean() had run (ip_map_cache.nextcheck has changed several times from 3173 to the current value of 10373). At the end of the crash output you'll find two examples of what cache_request.item points to. I've looked into some more of them. They all have next == 0 and refcount.counter >= 1 (The max counter I saw was 3). So, the cache_head items all are already unhashed, but the cache_requests still are queued, right? I have no idea, how this could happen. Maybe my patches for SLES11 SP1 are not ok. So, behind the crash output, I also added my patches. Maybe you could have a look onto it? There is one more small point, that isn't clear to me: when the "old" races in auth.unix.gid caused a NFS request to be dropped, the TCP connection after a few minutes was marked by svc_age_temp_xprts() to be closed. When a request is dropped because of the race in auth.unix.ip, svc_age_temp_xprts() does not mark it to be closed. Why does it behave so differently? Thank you and best regards Bodo PS: I also have one further patch, that fixes the problem by retesting the cache entry's state after having set CACHE_PENDING in cache_check() and by examining expiry_time only if it is not 0. crash> print &ip_map_cache.queue $43 = (struct list_head *) 0xffffffffa04a1478 crash> list 0 -H 0xffffffffa04a1478 -s cache_request ffff8800bbf80180 struct cache_request { q = { list = { next = 0xffff8800ba28e940, prev = 0xffffffffa04a1478 }, reader = 0 }, item = 0xffff8800bb822c40, buf = 0xffff8800baadf000 "nfsd 192.168.10.2\n", len = 18, readers = 0 } ffff8800ba28e940 struct cache_request { q = { list = { next = 0xffff8800bb5d9c80, prev = 0xffff8800bbf80180 }, reader = 0 }, item = 0xffff8800bbd88540, buf = 0xffff8800bedcb000 "nfsd 192.168.11.2\n", len = 18, readers = 0 } ffff8800bb5d9c80 struct cache_request { q = { list = { next = 0xffff8800bbd1b800, prev = 0xffff8800ba28e940 }, reader = 0 }, item = 0xffff880204b1b340, buf = 0xffff8800bb21c000 "nfsd 192.168.10.2\n", len = 18, readers = 0 } ffff8800bbd1b800 struct cache_request { q = { list = { next = 0xffff8800b5444d00, prev = 0xffff8800bb5d9c80 }, reader = 0 }, item = 0xffff8800bb991940, buf = 0xffff8800b28f9000 "nfsd 192.168.11.2\n", len = 18, readers = 0 } ffff8800b5444d00 struct cache_request { q = { list = { next = 0xffff8800bf21cc80, prev = 0xffff8800bbd1b800 }, reader = 0 }, item = 0xffff88020a3885c0, buf = 0xffff8800b2880000 "nfsd 192.168.10.2\n", len = 18, readers = 0 } ffff8800bf21cc80 struct cache_request { q = { list = { next = 0xffff8800bed9cb80, prev = 0xffff8800b5444d00 }, reader = 0 }, item = 0xffff880204371ec0, buf = 0xffff8800b1093000 "nfsd 192.168.11.2\n", len = 18, readers = 0 } ffff8800bed9cb80 struct cache_request { q = { list = { next = 0xffff8800bac27c40, prev = 0xffff8800bf21cc80 }, reader = 0 }, item = 0xffff8802359741c0, buf = 0xffff8800bb54c000 "nfsd 192.168.11.2\n", len = 18, readers = 0 } ffff8800bac27c40 struct cache_request { q = { list = { next = 0xffff8800bef1d480, prev = 0xffff8800bed9cb80 }, reader = 0 }, item = 0xffff8800bbd3c140, buf = 0xffff8800bb8ae000 "nfsd 192.168.10.2\n", len = 18, readers = 0 } ffff8800bef1d480 struct cache_request { q = { list = { next = 0xffff8800baae5f80, prev = 0xffff8800bac27c40 }, reader = 0 }, item = 0xffff8800bbfd62c0, buf = 0xffff8800b9ecd000 "nfsd 192.168.10.2\n", len = 18, readers = 0 } ffff8800baae5f80 struct cache_request { q = { list = { next = 0xffff8800bf0cd680, prev = 0xffff8800bef1d480 }, reader = 0 }, item = 0xffff8800bb8224c0, buf = 0xffff8800bf249000 "nfsd 192.168.11.2\n", len = 18, readers = 0 } ffff8800bf0cd680 struct cache_request { q = { list = { next = 0xffff8800bb91a500, prev = 0xffff8800baae5f80 }, reader = 0 }, item = 0xffff8800bf1865c0, buf = 0xffff8800bb1c8000 "nfsd 192.168.10.2\n", len = 18, readers = 0 } ffff8800bb91a500 struct cache_request { q = { list = { next = 0xffff8800b12f3200, prev = 0xffff8800bf0cd680 }, reader = 0 }, item = 0xffff8800bec7b940, buf = 0xffff8800b26b8000 "nfsd 192.168.10.2\n", len = 18, readers = 0 } ffff8800b12f3200 struct cache_request { q = { list = { next = 0xffff880234c97840, prev = 0xffff8800bb91a500 }, reader = 0 }, item = 0xffff8800b6516440, buf = 0xffff8800bb933000 "nfsd 192.168.10.2\n", len = 18, readers = 0 } ffff880234c97840 struct cache_request { q = { list = { next = 0xffffffffa04a1478, prev = 0xffff8800b12f3200 }, reader = 1 }, item = 0x0, buf = 0x313a333a303a38
, len = 0, readers = 0 } crash> print *(struct cache_head *)0xffff8800b6516440 $44 = { next = 0x0, expiry_time = 0, last_refresh = 1140, ref = { refcount = { counter = 1 } }, flags = 5 } crash> print *(struct cache_head *)0xffff8800bec7b940 $45 = { next = 0x0, expiry_time = 0, last_refresh = 1138, ref = { refcount = { counter = 2 } }, flags = 5 } crash> ############################ --- a/net/sunrpc/cache.c 2013-03-05 12:55:28.000000000 +0100 +++ b/net/sunrpc/cache.c 2013-03-05 12:54:42.000000000 +0100 @@ -962,23 +962,33 @@ static int cache_release(struct inode *i static void cache_dequeue(struct cache_detail *detail, struct cache_head *ch) { - struct cache_queue *cq; + struct cache_queue *cq, *tmp; + struct cache_request *cr; + struct list_head dequeued; + + INIT_LIST_HEAD(&dequeued); spin_lock(&queue_lock); - list_for_each_entry(cq, &detail->queue, list) + list_for_each_entry_safe(cq, tmp, &detail->queue, list) if (!cq->reader) { - struct cache_request *cr = container_of(cq, struct cache_request, q); + cr = container_of(cq, struct cache_request, q); if (cr->item != ch) continue; + if (test_bit(CACHE_PENDING, &ch->flags)) + /* Lost a race and it is pending again */ + break; if (cr->readers != 0) continue; - list_del(&cr->q.list); - spin_unlock(&queue_lock); - cache_put(cr->item, detail); - kfree(cr->buf); - kfree(cr); - return; + list_move(&cr->q.list, &dequeued); } spin_unlock(&queue_lock); + + while (!list_empty(&dequeued)) { + cr = list_entry(dequeued.next, struct cache_request, q.list); + list_del(&cr->q.list); + cache_put(cr->item, detail); + kfree(cr->buf); + kfree(cr); + } } /* @@ -1081,6 +1091,7 @@ int sunrpc_cache_pipe_upcall(struct cach struct cache_request *crq; char *bp; int len; + int ret = 0; if (atomic_read(&detail->readers) == 0 && detail->last_close < get_seconds() - 30) { @@ -1113,10 +1124,18 @@ int sunrpc_cache_pipe_upcall(struct cach crq->len = PAGE_SIZE - len; crq->readers = 0; spin_lock(&queue_lock); - list_add_tail(&crq->q.list, &detail->queue); + if (test_bit(CACHE_PENDING, &h->flags)) + list_add_tail(&crq->q.list, &detail->queue); + else + /* Lost a race, no longer PENDING, so don't enqueue */ + ret = -EAGAIN; spin_unlock(&queue_lock); wake_up(&queue_wait); - return 0; + if (ret == -EAGAIN) { + kfree(buf); + kfree(crq); + } + return ret; } EXPORT_SYMBOL_GPL(sunrpc_cache_pipe_upcall); ############################ --- a/net/sunrpc/cache.c 2013-03-05 13:10:58.000000000 +0100 +++ b/net/sunrpc/cache.c 2013-03-05 12:54:42.000000000 +0100 @@ -233,19 +233,16 @@ int cache_check(struct cache_detail *det if (!test_and_set_bit(CACHE_PENDING, &h->flags)) { switch (cache_make_upcall(detail, h)) { case -EINVAL: - clear_bit(CACHE_PENDING, &h->flags); - cache_revisit_request(h); - if (rv == -EAGAIN) { + write_lock(&detail->hash_lock); + if (rv) { set_bit(CACHE_NEGATIVE, &h->flags); cache_fresh_locked(h, monotonic_seconds()+CACHE_NEW_EXPIRY); - cache_fresh_unlocked(h, detail); rv = -ENOENT; } - break; - + write_unlock(&detail->hash_lock); + /* FALLTHROUGH */ case -EAGAIN: - clear_bit(CACHE_PENDING, &h->flags); - cache_revisit_request(h); + cache_fresh_unlocked(h, detail); break; } } @@ -405,17 +402,12 @@ static int cache_clean(void) && ch->last_refresh >= current_detail->flush_time ) continue; - if (test_and_clear_bit(CACHE_PENDING, &ch->flags)) - cache_dequeue(current_detail, ch); - if (atomic_read(&ch->ref.refcount) == 1) - break; - } - if (ch) { *cp = ch->next; ch->next = NULL; current_detail->entries--; rv = 1; + break; } write_unlock(¤t_detail->hash_lock); d = current_detail; @@ -423,7 +415,7 @@ static int cache_clean(void) current_index ++; spin_unlock(&cache_list_lock); if (ch) { - cache_revisit_request(ch); + cache_fresh_unlocked(ch, d); cache_put(ch, d); } } else