From patchwork Thu Nov 10 20:18:21 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Benjamin Coddington
X-Patchwork-Id: 9422035
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
 by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id A1EC760512
 for ; Thu, 10 Nov 2016 20:18:27 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9618329842
 for ; Thu, 10 Nov 2016 20:18:27 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
 id 8949C29844; Thu, 10 Nov 2016 20:18:27 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.9 required=2.0 tests=BAYES_00,RCVD_IN_DNSWL_HI
 autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
 by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 29EE929842
 for ; Thu, 10 Nov 2016 20:18:27 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S965345AbcKJUS0 (ORCPT ); Thu, 10 Nov 2016 15:18:26 -0500
Received: from mx1.redhat.com ([209.132.183.28]:58046 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S964923AbcKJUSZ (ORCPT ); Thu, 10 Nov 2016 15:18:25 -0500
Received: from int-mx14.intmail.prod.int.phx2.redhat.com
 (int-mx14.intmail.prod.int.phx2.redhat.com [10.5.11.27])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id A8F8F31B32C;
 Thu, 10 Nov 2016 20:18:24 +0000 (UTC)
Received: from [10.10.55.107] (vpn-55-107.rdu2.redhat.com [10.10.55.107])
 by int-mx14.intmail.prod.int.phx2.redhat.com (8.14.4/8.14.4) with ESMTP
 id uAAKIMg3002286
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA256 bits=256 verify=NO);
 Thu, 10 Nov 2016 15:18:23 -0500
From: "Benjamin Coddington"
To: "Anna Schumaker"
Cc: "Trond Myklebust" , "List Linux NFS Mailing" , "Oleg Drokin"
Subject: Re: [PATCH v7 13/31] NFSv4.1: Ensure we always run TEST/FREE_STATEID on locks
Date: Thu, 10 Nov 2016 15:18:21 -0500
Message-ID:
In-Reply-To:
References: <1474565961-21303-1-git-send-email-trond.myklebust@primarydata.com>
 <1474565961-21303-7-git-send-email-trond.myklebust@primarydata.com>
 <1474565961-21303-8-git-send-email-trond.myklebust@primarydata.com>
 <1474565961-21303-9-git-send-email-trond.myklebust@primarydata.com>
 <1474565961-21303-10-git-send-email-trond.myklebust@primarydata.com>
 <1474565961-21303-11-git-send-email-trond.myklebust@primarydata.com>
 <1474565961-21303-12-git-send-email-trond.myklebust@primarydata.com>
 <1474565961-21303-13-git-send-email-trond.myklebust@primarydata.com>
 <1474565961-21303-14-git-send-email-trond.myklebust@primarydata.com>
 <599EE56B-46DD-411B-805D-11C2FB5E30A4@redhat.com>
 <34B1D68A-1A1C-4B59-A19E-467D48F7A9D0@redhat.com>
 <6ABCDB9B-997E-49C1-9363-D59AF9BEC0E9@primarydata.com>
 <806bf204-eb35-5a3a-30fa-612bf22fb09a@Netapp.com>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 2.68 on 10.5.11.27
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
 (mx1.redhat.com [10.5.110.29]); Thu, 10 Nov 2016 20:18:24 +0000 (UTC)
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: linux-nfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

On 10 Nov 2016, at 10:58, Benjamin Coddington wrote:

> Hi Anna,
>
> On 10 Nov 2016, at 10:01, Anna Schumaker wrote:
>> Do you have an estimate for when this patch will be ready? I want to
>> include it in my next bugfix pull request for 4.9.
>
> I haven't posted because I am still trying to get to the bottom of another
> problem where the client gets stuck in a loop sending the same stateid over
> and over on NFS4ERR_OLD_STATEID.
> I want to make sure this problem isn't caused by this fix -- which I
> don't think it is, but I'd rather make sure. If I don't make any progress
> on this problem by the end of today, I'll post what I have.
>
> Read on if interested in this new problem:
>
> It looks like racing opens with the same openowner can be returned out of
> order by the server, so the client sees stateid seqid of 2 before 1. Then a
> LOCK sent with seqid 1 is endlessly retried if sent while doing recovery.
>
> It's hard to tell if I was able to capture all the moving parts to describe
> this problem, though. As it takes a very long time for me to reproduce, and
> the packet captures were dropping frames. I'm working on manually
> reproducing it now.

Anna, I haven't gotten to the bottom of it, and so I'm not confident it
isn't a problem created by the fix I've been testing, which is:

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e809498..2aa9d86 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2564,12 +2564,15 @@ static void nfs41_check_delegation_stateid(struct nfs4_state *state)
 static int nfs41_check_expired_locks(struct nfs4_state *state)
 {
 	int status, ret = NFS_OK;
-	struct nfs4_lock_state *lsp;
+	struct nfs4_lock_state *lsp, *tmp;
 	struct nfs_server *server = NFS_SERVER(state->inode);
 
 	if (!test_bit(LK_STATE_IN_USE, &state->flags))
 		goto out;
-	list_for_each_entry(lsp, &state->lock_states, ls_locks) {
+	spin_lock(&state->state_lock);
+	list_for_each_entry_safe(lsp, tmp, &state->lock_states, ls_locks) {
+		atomic_inc(&lsp->ls_count);
+		spin_unlock(&state->state_lock);
 		if (test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags)) {
 			struct rpc_cred *cred = lsp->ls_state->owner->so_cred;
 
@@ -2588,7 +2591,10 @@ static int nfs41_check_expired_locks(struct nfs4_state *state)
 				break;
 			}
 		}
-	};
+		nfs4_put_lock_state(lsp);
+		spin_lock(&state->state_lock);
+	}
+	spin_unlock(&state->state_lock);
 out:
 	return ret;
 }

http://people.redhat.com/bcodding/old_stateid_loop is tshark output of my
only good wirecapture of the problem. Without this patch, generic/089
crashes long before this problem is reproduced, so I am stuck figuring it
out, I'm afraid.

Don't wait on my account. I plan on trying a bit more to reproduce tomorrow,
and if I cannot, I'll write about it under separate cover.

Ben
---
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
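
For readers following the locking change in the patch: the idea is to pin
each lock state with a reference before dropping state_lock, so that the
stateid check (which may sleep) runs without the lock held while the entry
cannot be freed underneath it. Below is a minimal userspace sketch of that
pattern in C. It is not the kernel code: struct open_state, struct
lock_state, slow_check(), check_all() and demo() are hypothetical stand-ins,
a pthread mutex plays the role of the spinlock, and a plain int guarded by
that mutex stands in for ls_count. Unlike the patch, which caches the next
entry via list_for_each_entry_safe, this sketch reads the next pointer only
after retaking the lock, while it still holds its reference.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nfs4_lock_state. */
struct lock_state {
	struct lock_state *next;
	int refcount;		/* guarded by open_state.lock in this sketch */
	int checked;
};

/* Hypothetical stand-in for struct nfs4_state. */
struct open_state {
	pthread_mutex_t lock;	/* plays the role of state->state_lock */
	struct lock_state *head;
};

static int add_lock_state(struct open_state *st)
{
	struct lock_state *lsp = calloc(1, sizeof(*lsp));

	if (!lsp)
		return -1;
	lsp->refcount = 1;	/* the list's own reference */
	pthread_mutex_lock(&st->lock);
	lsp->next = st->head;
	st->head = lsp;
	pthread_mutex_unlock(&st->lock);
	return 0;
}

/* Drop one reference; caller holds st->lock. Unlink and free on last put. */
static void put_lock_state_locked(struct open_state *st, struct lock_state *lsp)
{
	struct lock_state **pp;

	if (--lsp->refcount > 0)
		return;
	for (pp = &st->head; *pp; pp = &(*pp)->next) {
		if (*pp == lsp) {
			*pp = lsp->next;
			break;
		}
	}
	free(lsp);
}

/* Stand-in for the blocking stateid check; called with no lock held. */
static void slow_check(struct lock_state *lsp)
{
	lsp->checked = 1;
}

/* Walk the list, pinning each entry across the unlocked check. */
static int check_all(struct open_state *st)
{
	struct lock_state *lsp, *next;
	int visited = 0;

	pthread_mutex_lock(&st->lock);
	for (lsp = st->head; lsp; lsp = next) {
		lsp->refcount++;		/* pin before unlocking */
		pthread_mutex_unlock(&st->lock);

		slow_check(lsp);
		visited++;

		pthread_mutex_lock(&st->lock);
		next = lsp->next;		/* still linked: we hold a ref */
		put_lock_state_locked(st, lsp);
	}
	pthread_mutex_unlock(&st->lock);
	return visited;
}

/* Build three entries, check them, then tear the list down. */
int demo(void)
{
	struct open_state st = { .head = NULL };
	struct lock_state *lsp;
	int i, visited;

	pthread_mutex_init(&st.lock, NULL);
	for (i = 0; i < 3; i++)
		if (add_lock_state(&st))
			return -1;

	visited = check_all(&st);

	pthread_mutex_lock(&st.lock);
	while ((lsp = st.head) != NULL) {
		assert(lsp->checked && lsp->refcount == 1);
		put_lock_state_locked(&st, lsp);	/* drop the list's ref */
	}
	pthread_mutex_unlock(&st.lock);
	pthread_mutex_destroy(&st.lock);
	return visited;
}
```

In this sketch an entry is only unlinked when its last reference is dropped,
which is what makes reading lsp->next after relocking safe; whether a cached
next pointer stays valid across the unlocked window is a separate question
that the sketch sidesteps rather than answers.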