From: "Myklebust, Trond"
To: "Adamson, Andy"
CC: Linux NFS Mailing List
Subject: Re: [PATCH 1/1] NFSv4.1 fix a kswap nfs4_state_manger race
Date: Mon, 25 Nov 2013 21:06:44 +0000
Message-ID: <1385413604.9247.3.camel@leira.trondhjem.org>
X-Patchwork-Id: 3234401
References: <1385402270-14284-1-git-send-email-andros@netapp.com>
 <1385402270-14284-2-git-send-email-andros@netapp.com>
 <5B8C7A9D-CD9E-487A-AC62-B1292649835D@netapp.com>
 <496E7DBC-183B-43A2-91D4-837FC092E88A@netapp.com>
 <676D25D8-B845-42BC-BB1E-6441B6B8E5E3@netapp.com>
 <5BE68579-3F17-4786-89B9-21CEC1A94E8E@netapp.com>
 <48468258-591E-49A8-9EAA-2DD8E3993100@netapp.com>
 <3A65AD2F-797E-4292-BA9C-4CF20BD075CB@netapp.com>
In-Reply-To: <3A65AD2F-797E-4292-BA9C-4CF20BD075CB@netapp.com>

On Mon, 2013-11-25 at 20:29 +0000, Adamson, Andy wrote:
> On Nov 25, 2013, at 3:20 PM, "Myklebust, Trond" wrote:
>
> >
> > On Nov 25, 2013, at 15:10, Adamson, Andy wrote:
> >
> >>
> >> On Nov 25, 2013, at 2:53 PM, "Myklebust, Trond" wrote:
> >>
> >>>
> >>> On Nov 25, 2013, at 14:27, Adamson, Andy wrote:
> >>>
> >>>>
> >>>> On Nov 25, 2013, at 1:33 PM, "Myklebust, Trond" wrote:
> >>>>
> >>>>>
> >>>>> On Nov 25, 2013, at 13:13, Myklebust, Trond wrote:
> >>>>>
> >>>>>>
> >>>>>> On Nov 25, 2013, at 12:57, wrote:
> >>>>>>
> >>>>>>> From: Andy Adamson
> >>>>>>>
> >>>>>>> The state manager is recovering expired state and recovery OPENs are being
> >>>>>>> processed. If kswapd is pruning inodes at the same time, a deadlock can occur
> >>>>>>> when kswapd calls evict_inode on an NFSv4.1 inode with a layout, and the
> >>>>>>> resultant layoutreturn gets an error that the state manager is to handle,
> >>>>>>> causing the layoutreturn to wait on the (NFS client) cl_rpcwaitq.
> >>>>>>>
> >>>>>>> At the same time an open is waiting for the inode deletion to complete in
> >>>>>>> __wait_on_freeing_inode.
> >>>>>>>
> >>>>>>> If the open is either the open called by the state manager, or an open from
> >>>>>>> the same open owner that is holding the NFSv4.0 sequence id which causes the
> >>>>>>> OPEN from the state manager to wait for the sequence id on the Seqid_waitqueue,
> >>>>>>> then the state is deadlocked with kswapd.
> >>>>>>>
> >>>>>>> Do not handle LAYOUTRETURN errors when called from nfs4_evict_inode.
> >>>>>>
> >>>>>> Why are we waiting for recovery in LAYOUTRETURN at all? Layouts are automatically lost when the server reboots or when the lease is otherwise lost.
> >>>>>>
> >>>>>> IOW: Is there any reason why we need to special-case nfs4_evict_inode? Shouldn’t we just bail out on error in _all_ cases?
> >>>>>
> >>>>> BTW: Is it possible that we might have a similar problem with delegreturn? That too can be called from nfs4_evict_inode…
> >>>>
> >>>> Yes, good point. kswapd could be waiting for a delegation to return which has an error along with the same scenario with sys_open and the state manager running.
> >>>>
> >>>> With delegreturn, we most definitely want to limit 'no error handling' to the evict inode case.
> >>>
> >>> Ah… I forgot that the delegreturn in nfs4_evict_inode is asynchronous and doesn’t wait for completion, so it shouldn’t be a problem here.
> >>
> >> Except we just changed that to fix a different state manager hang:
> >>
> >> commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0
> >> Author: Andy Adamson
> >> Date:   Fri Nov 15 16:36:16 2013 -0500
> >>
> >>     NFSv4 wait on recovery for async session errors
> >
> > Right, but that won’t prevent nfs4_evict_inode from completing,
>
> Ah - I was thinking of the synchronous handler's call to nfs4_wait_clnt_recover - so yes, no problem
>
> -->Andy
>
> > and hence the OPEN that is waiting in nfs_fhget() can also complete, and so there is no deadlock with the state manager thread.

How about something like the attached...

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index f01e2aa53210..e040359983ce 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7599,7 +7599,14 @@ static void nfs4_layoutreturn_done(struct rpc_task *task, void *calldata)
 		return;
 
 	server = NFS_SERVER(lrp->args.inode);
-	if (nfs4_async_handle_error(task, server, NULL) == -EAGAIN) {
+	switch (task->tk_status) {
+	default:
+		task->tk_status = 0;
+	case 0:
+		break;
+	case -NFS4ERR_DELAY:
+		if (nfs4_async_handle_error(task, server, NULL) != -EAGAIN)
+			break;
 		rpc_restart_call_prepare(task);
 		return;
 	}
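To make the control flow of the proposed nfs4_layoutreturn_done() change easier to follow, here is a minimal userspace sketch of its error policy. It is not kernel code: `layoutreturn_done_model()`, `enum lr_action`, and the `retry_fn` callback are hypothetical names, with the callback standing in for nfs4_async_handle_error() returning -EAGAIN, and NFS4ERR_DELAY set to its protocol value of 10008 (the RPC task status carries it negated). Only -NFS4ERR_DELAY is handed to the retry logic; every other error is cleared so the LAYOUTRETURN completes instead of waiting on state recovery, consistent with the point in the thread that layouts are lost anyway when the lease is lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Protocol value of NFS4ERR_DELAY; the kernel stores it negated in
 * task->tk_status. */
#define NFS4ERR_DELAY 10008

/* Hypothetical outcome of the done-callback in this model. */
enum lr_action {
	LR_COMPLETE,	/* finish the LAYOUTRETURN and release the layout */
	LR_RESTART,	/* re-issue the call (rpc_restart_call_prepare) */
};

/* Hypothetical stand-in for nfs4_async_handle_error(): returns true
 * when it would have returned -EAGAIN, i.e. the call should retry. */
typedef bool (*retry_fn)(int status);

/* Mirrors the switch added by the patch. */
static enum lr_action layoutreturn_done_model(int *tk_status, retry_fn wants_retry)
{
	assert(tk_status != NULL);

	switch (*tk_status) {
	default:
		/* Any other error is ignored rather than sent to recovery. */
		*tk_status = 0;
		/* fall through */
	case 0:
		break;
	case -NFS4ERR_DELAY:
		/* Only a DELAY reply is allowed to trigger a retry. */
		if (!wants_retry(*tk_status))
			break;
		return LR_RESTART;
	}
	return LR_COMPLETE;
}

/* Trivial callbacks for exercising both branches of the DELAY case. */
static bool always_retry(int status) { (void)status; return true; }
static bool never_retry(int status) { (void)status; return false; }
```

The fall-through from `default:` into `case 0:` is deliberate, as in the patch; the `/* fall through */` comment only documents it for readers and for GCC's -Wimplicit-fallthrough. Note that the model leaves the status untouched when the DELAY handler declines to retry, whereas the real nfs4_async_handle_error() may also adjust the task status.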