From patchwork Fri Jan 18 22:34:07 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ben Greear X-Patchwork-Id: 2004821 Return-Path: X-Original-To: patchwork-linux-nfs@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork1.kernel.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by patchwork1.kernel.org (Postfix) with ESMTP id 4E44F3FED4 for ; Fri, 18 Jan 2013 22:34:11 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755255Ab3ARWeK (ORCPT ); Fri, 18 Jan 2013 17:34:10 -0500 Received: from mail.candelatech.com ([208.74.158.172]:40075 "EHLO ns3.lanforge.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755006Ab3ARWeJ (ORCPT ); Fri, 18 Jan 2013 17:34:09 -0500 Received: from [192.168.100.226] (firewall.candelatech.com [70.89.124.249]) (authenticated bits=0) by ns3.lanforge.com (8.14.2/8.14.2) with ESMTP id r0IMY71m011181 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 18 Jan 2013 14:34:07 -0800 Message-ID: <50F9CDDF.4070609@candelatech.com> Date: Fri, 18 Jan 2013 14:34:07 -0800 From: Ben Greear Organization: Candela Technologies User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/17.0 Thunderbird/17.0 MIME-Version: 1.0 To: Chuck Lever CC: "linux-nfs@vger.kernel.org" Subject: Re: Question on nfs40_discover_server_trunking. References: <50F9BE66.6080608@candelatech.com> <0F001F0E-229D-4314-A42E-84402E4F1FC7@oracle.com> <50F9C5B4.5020000@candelatech.com> <1A56CC87-38FF-47B4-9CA9-7BAE394AF0D2@oracle.com> In-Reply-To: <1A56CC87-38FF-47B4-9CA9-7BAE394AF0D2@oracle.com> Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On 01/18/2013 02:29 PM, Chuck Lever wrote: > > On Jan 18, 2013, at 4:59 PM, Ben Greear wrote: > >> On 01/18/2013 01:33 PM, Chuck Lever wrote: >>> >>> On Jan 18, 2013, at 4:28 PM, Ben Greear wrote: >>> >>>> Any chance the STALE_CLIENTID case needs a 'break'? >>> >>> I don't think so. LEASE_CONFIRM is set, and we want to wake the state renewal thread. >>> >>>> >>>> Twice I've seen kernel crashes after the nfs40_walk_client_list >>>> failed (though code comments say it should never fail). >>> >>> nfs40_walk_client_list() is looking for an nfs_client that is supposed to already be in the nfs_client list. If the search fails, that's a bug. >>> >>> Eyeball the contents of your nfs_client list. You should find an appropriate nfs_client in there, and then figure out why the search doesn't find it. >> >> Ok, I think I found another issue. >> >> nfs4_client_init does not initialize 'old', but passes it to nfs4_discover_server_trunking. >> >> That gets passed to the detect_trunking operation, which is nfs40_discover_server_trunking >> or nfs41_discover_server_trunking. > > Yes, *result is an output parameter, not an input parameter. > >> This will call walk_client_list, which also may not ever assign a value to >> 'result'. > > It should always assign *result in the SUCCESS case. > >> The code in walk_client_list always dereferences result, however. >> >> So, that is probably why my system blows up shortly after the 'impossible' >> error message... > > In particular, nfs4_walk_client_list() does not set *result when it returns NFS4ERR_STALE_CLIENTID. In nfs4_discover_server_trunking(), should we therefore have this: > > status = nfs40_walk_client_list(clp, result, cred); > switch (status) { > case -NFS4ERR_STALE_CLIENTID: > set_bit(NFS4CLNT_LEASE_CONFIRM, &clp->cl_state); >>> nfs4_schedule_state_renewal(clp); >>> break; > case 0: > > I'm not sure the nfs4_schedule_state_renewal() call is necessary here. I'm not sure I follow you entirely..but I'm going to start testing this patch in a minute. Looks like it should fix my crash, and maybe give clues about why we get to the error case in the first place. diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c index d6b39a9..3188283 100644 --- a/fs/nfs/nfs4client.c +++ b/fs/nfs/nfs4client.c @@ -307,6 +307,8 @@ int nfs40_walk_client_list(struct nfs_client *new, }; int status; + *result = NULL; + spin_lock(&nn->nfs_client_lock); list_for_each_entry_safe(pos, n, &nn->nfs_client_list, cl_share_link) { /* If "pos" isn't marked ready, we can't trust the @@ -364,6 +366,7 @@ int nfs40_walk_client_list(struct nfs_client *new, nfs_put_client(prev); spin_unlock(&nn->nfs_client_lock); pr_err("NFS: %s Error: no matching nfs_client found\n", __func__); + WARN_ON(1); return -NFS4ERR_STALE_CLIENTID; } @@ -433,6 +436,8 @@ int nfs41_walk_client_list(struct nfs_client *new, struct nfs_client *pos, *n, *prev = NULL; int error; + *result = NULL; + spin_lock(&nn->nfs_client_lock); list_for_each_entry_safe(pos, n, &nn->nfs_client_list, cl_share_link) { /* If "pos" isn't marked ready, we can't trust the @@ -489,6 +494,7 @@ int nfs41_walk_client_list(struct nfs_client *new, */ spin_unlock(&nn->nfs_client_lock); pr_err("NFS: %s Error: no matching nfs_client found\n", __func__); + WARN_ON(1); return -NFS4ERR_STALE_CLIENTID; } #endif /* CONFIG_NFS_V4_1 */ diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c index c351e6b..54d29e2 100644 --- a/fs/nfs/nfs4state.c +++ b/fs/nfs/nfs4state.c @@ -142,7 +142,8 @@ int nfs40_discover_server_trunking(struct nfs_client *clp, case 0: /* Sustain the lease, even if it's empty. If the clientid4 * goes stale it's of no use for trunking discovery. */ - nfs4_schedule_state_renewal(*result); + if (*result) + nfs4_schedule_state_renewal(*result); break; }