From patchwork Sat Jul 1 11:15:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Benjamin Coddington X-Patchwork-Id: 13299032 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8DC34EB64DC for ; Sat, 1 Jul 2023 11:16:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229507AbjGALQT (ORCPT ); Sat, 1 Jul 2023 07:16:19 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34156 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229501AbjGALQS (ORCPT ); Sat, 1 Jul 2023 07:16:18 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A57C33C05 for ; Sat, 1 Jul 2023 04:15:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1688210130; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=kEHGmY1pqMlJdyUe2iWb0kBBqP+sYKT2BDnD+9vMHYg=; b=KgWg+K8xytjxk+POtcy52zNJ+psZaYnjusqH1qUHLDo5wtyuZ0VK77EHFShR1dZp4G0fPL 7fFGK7d8E6q0yM1YXtzFw5iCd+xfNY+OTd3SDBGh4xi4V4qMOU/56N+H/AIUqNhqZcJu6a uG/m63m7kPX5bkWuv+0H1B1Q6B4hxtM= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-67-H4PUysMvNPCjeHBeM68hNQ-1; Sat, 01 Jul 2023 07:15:27 -0400 X-MC-Unique: H4PUysMvNPCjeHBeM68hNQ-1 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 2C99A380350D; Sat, 1 Jul 2023 11:15:27 +0000 (UTC) Received: from bcodding.csb.redhat.com (unknown [10.22.50.7]) by smtp.corp.redhat.com (Postfix) with ESMTP id 9717D1121314; Sat, 1 Jul 2023 11:15:26 +0000 (UTC) From: Benjamin Coddington To: trond.myklebust@hammerspace.com, anna@kernel.org Cc: Olga.Kornievskaia@netapp.com, linux-nfs@vger.kernel.org Subject: [PATCH v3] NFSv4: Fix dropped lock for racing OPEN and delegation return Date: Sat, 1 Jul 2023 07:15:25 -0400 Message-Id: <93929ecf62e79670f1e3a1878757fc9fa443aa7c.1688210094.git.bcodding@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.3 Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation return") attempted to solve this problem by using nfs4's generic async error handling, but introduced a regression where v4.0 lock recovery would hang. The additional complexity introduced by overloading that error handling is not necessary for this case. This patch expects that commit to be reverted. The problem as originally explained in the above commit is: There's a small window where a LOCK sent during a delegation return can race with another OPEN on client, but the open stateid has not yet been updated. In this case, the client doesn't handle the OLD_STATEID error from the server and will lose this lock, emitting: "NFS: nfs4_handle_delegation_recall_error: unhandled error -10024". Fix this by using the old_stateid refresh helpers if the server replies with OLD_STATEID. Suggested-by: Trond Myklebust Signed-off-by: Benjamin Coddington --- fs/nfs/nfs4proc.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index 6bb14f6cfbc0..f350f41e1967 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -7180,8 +7180,15 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata) } else if (!nfs4_update_lock_stateid(lsp, &data->res.stateid)) goto out_restart; break; - case -NFS4ERR_BAD_STATEID: case -NFS4ERR_OLD_STATEID: + if (data->arg.new_lock_owner != 0 && + nfs4_refresh_open_old_stateid(&data->arg.open_stateid, + lsp->ls_state)) + goto out_restart; + else if (nfs4_refresh_lock_old_stateid(&data->arg.lock_stateid, lsp)) + goto out_restart; + fallthrough; + case -NFS4ERR_BAD_STATEID: case -NFS4ERR_STALE_STATEID: case -NFS4ERR_EXPIRED: if (data->arg.new_lock_owner != 0) {