From patchwork Tue Jun  6 16:39:15 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Olga Kornievskaia <aglo@umich.edu>
X-Patchwork-Id: 9769273
Return-Path: <linux-nfs-owner@kernel.org>
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org
	[172.30.200.125])
	by pdx-korg-patchwork.web.codeaurora.org (Postfix) with ESMTP id
	EA4DF6034B for <patchwork-linux-nfs@patchwork.kernel.org>;
	Tue,  6 Jun 2017 16:39:19 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E6D63284B1
	for <patchwork-linux-nfs@patchwork.kernel.org>;
	Tue,  6 Jun 2017 16:39:19 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
	id D81A9284B5; Tue,  6 Jun 2017 16:39:19 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
	pdx-wl-mail.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-6.8 required=2.0 tests=BAYES_00,DKIM_SIGNED,
	RCVD_IN_DNSWL_HI,T_DKIM_INVALID autolearn=ham version=3.3.1
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 17F3B284B1
	for <patchwork-linux-nfs@patchwork.kernel.org>;
	Tue,  6 Jun 2017 16:39:19 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751431AbdFFQjS (ORCPT
	<rfc822;patchwork-linux-nfs@patchwork.kernel.org>);
	Tue, 6 Jun 2017 12:39:18 -0400
Received: from mail-qt0-f173.google.com ([209.85.216.173]:35306 "EHLO
	mail-qt0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751200AbdFFQjR (ORCPT
	<rfc822;linux-nfs@vger.kernel.org>); Tue, 6 Jun 2017 12:39:17 -0400
Received: by mail-qt0-f173.google.com with SMTP id w1so136365192qtg.2
	for <linux-nfs@vger.kernel.org>; Tue, 06 Jun 2017 09:39:17 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=gmail.com; s=20161025;
	h=mime-version:sender:from:date:message-id:subject:to:cc
	:content-transfer-encoding;
	bh=mTXuYFPxwvFhcbOp93qHDu8TM8w1oE1Q3vMYQR92FTc=;
	b=pgWZ3YWuGm8A1LGwl/6LylZOjFodQPG0rlR9Dvw8bZ04zMDHscirBe3qKcqJfaXmhp
	KYw3M+KkQ6Htmx1J21GxZ1c4O4STneRxc+Qrjgz8CZ//zQLKisow6TgKnUvWyw1cDG9n
	OdxcrCr4wWgmp1+swAOhdDt7sdcS3cx0YzlAGB+BBMKTMJxpwNiyjfgtEERpJ2Dp86bI
	R/h6VElDycFs9tYmnSVo6l+vYVqdZCDoClBOPt89vFVNczDOHYhg8uoiLaOpJvZS/cz7
	P4jgGhPVm7aBeTz17mCcRqBir/IsMO0r+tPp2UEJnhO71QCa9burAxVnl1gdzdh8R2bL
	kTjg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
	d=1e100.net; s=20161025;
	h=x-gm-message-state:mime-version:sender:from:date:message-id:subject
	:to:cc:content-transfer-encoding;
	bh=mTXuYFPxwvFhcbOp93qHDu8TM8w1oE1Q3vMYQR92FTc=;
	b=DJ0XgnxAkB4CRLxCVAIrzvP7PayazVQMxLkqNfz1scyh2gR3h42BVFRyDOP9rbk2yZ
	SOQajPKFB7xQvJOrDaTc+Ys62A0Y0YIunG8Qx+xp8I5uCHhnLU+Bi6/8oWf9DvagmZ7l
	8bE/BQ3mHbqN5P6Qm9aZIzc1guYYbz1OHU2MF67r8olQQYsUGaB2Uha1EK3oOOcsZpUf
	bSPmtILbQ1sBd71rPUhToWwrOJHZVQTkh4SN3tgeaul3oUYypfPduybyuY2JqpbIum+e
	f6dNUQz+nK3OhFYAyoOYcoU8mcWGu+913JzWkIt+tsgYXhIi80xUKRcJuDrkm7bycK2b
	TlVQ==
X-Gm-Message-State: AKS2vOxIh+kvVcsEbYjN40wVyxGqPhggrElcV3MEqiRjZOvDMM9bdqO4
	0II7HzzHmSspVA+5fhl8VCcSGX8nVJbd
X-Received: by 10.200.12.140 with SMTP id n12mr447298qti.230.1496767156323;
	Tue, 06 Jun 2017 09:39:16 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.12.172.79 with HTTP; Tue, 6 Jun 2017 09:39:15 -0700 (PDT)
From: Olga Kornievskaia <aglo@umich.edu>
Date: Tue, 6 Jun 2017 12:39:15 -0400
X-Google-Sender-Auth: izcMTY9VzUIbkXSxTo5W9y8dKgU
Message-ID: 
 <CAN-5tyHTmx7KM76DRa5F5PwgJpNsZELf8BBHeoBq-yacHfzpmg@mail.gmail.com>
Subject: race in state manager leads to failed recovery
To: Trond Myklebust <trondmy@primarydata.com>
Cc: linux-nfs <linux-nfs@vger.kernel.org>
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-nfs.vger.kernel.org>
X-Mailing-List: linux-nfs@vger.kernel.org
X-Virus-Scanned: ClamAV using ClamSMTP

An application can fail with EIO when the server reboots and the
client while doing the recovery is getting into the following
situation where it incorrectly chooses "nograce" recovery instead of
"reboot" recovery. The necessary trigger is a pnfs mount and 2 DS
reads on the same file fail with BAD_STATEID each read uses its own
lock stateid (and MDS server reboots losing state). Since  server
rebooted it will fail recovery ops with BAD_SESSION and then also
STALE_ID triggering session and clientid recovery.

1. Thread1 gets BAD_STATEID initiates stateid recovery calls
nfs4_state_mark_reclaim_no_grace() which sets the cl_state->flags and
state->flags respective _NOGRACE flags.

2. State manager gets to run and it clears the _NOGRACE from the
cl_state. Calls nfs4_do_reclaim() which sends TEST_STATEID which gets
a BAD_SESSION.

3. Thread2 now gets control and it also processing its BAD_STATEID and
calls nfs4_state_mark_reclaim_no_grace() and it again sets the
state/cl_state->flags to have _NOGRACE.

4. State manager runs and continues with state recovery by sending
DESTROY_SESSION, and then CREATE_SESSION which fails with
STALE_CLIENTID. Now we are recovering the lease.

5. State manager in nfs4_recovery_handle_error for STALE_CLIENTID,
calls nfs4_state_start_reclaim_reboot() calls
nfs4_state_mark_reclaim_reboot() which has a check

/* Don't recover state that expired before the reboot */
if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags)) {
  clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
  return 0;
}

Because the _NOGRACE was set again in Step3, we end up clearing
_RECLAIM_REBOOT and not setting _RECLAIM_REBOOT in the cl_state->flags
and choose to do "nograce" operation instead of recovery.

Removing this check removes the problem.

  return 1;

I'm confused by the comment why would we not what to recover state?

When both threads execute before the state manager starts this problem
doesn't exist.

If helpful here are two annotated (**) snippet from var log message
from the non-working and then one without the check:

Jun  6 12:06:42 ipa18 kernel: nfs4_state_mark_reclaim_nograce ffff880072f50c00
   ** first thread sets the nograce recovery
Jun  6 12:06:42 ipa18 kernel: AGLO: nfs4_state_manager reclaim_nograce
cleared NOGRACE from client state ffff88006fc23d28
  ** this is from the state manager’s loop from after
test_and_clear(NFS4CLNT_RECLAIM_NO_GRACE, cl_state)
Jun  6 12:06:42 ipa18 kernel: AGLO: nfs4_do_reclaim start
Jun  6 12:06:42 ipa18 kernel: NFS call  test_stateid ffff880077ac0af0
Jun  6 12:06:42 ipa18 kernel: nfs4_state_mark_reclaim_nograce ffff880072f50c00
   ** 2nd thread sets the nograce recovery
Jun  6 12:06:42 ipa18 kernel: AGLO: nfs4_do_reclaim status end=-11
Jun  6 12:06:42 ipa18 kernel: AGLO: nfs4_begin_drain_session
Jun  6 12:06:42 ipa18 kernel: AGLO: nfs4_state_mark_reclaim_reboot
clears RECLAIM_REBOOT from state=ffff880072f50c00
   ** this is from nfs4_state_mark_reclaim_reboot() and ends up not
setting the _RECLAIM_REBOOT flag in cl_state and clears it from the
state->flags
Jun  6 12:06:42 ipa18 kernel: AGLO: nfs4_begin_drain_session
Jun  6 12:06:43 ipa18 kernel: AGLO: nfs4_begin_drain_session
Jun  6 12:06:43 ipa18 kernel: AGLO: nfs4_reclaim_lease client
state=ffff88006fc23d28 flag has NOGRACE
Jun  6 12:06:43 ipa18 kernel: AGLO: nfs4_state_manager reclaim_nograce
cleared NOGRACE from client state ffff88006fc23d28
   ** this is in the state manager loop after the "reclaim_nograce" is
chosen and clearing the _NOGRACE flag
Jun  6 12:06:43 ipa18 kernel: AGLO: nfs4_do_reclaim start
Jun  6 12:06:43 ipa18 kernel: NFS call  test_stateid ffff880077ac0af0
Jun  6 12:06:43 ipa18 kernel: NFS call  test_stateid ffff880077ac02f0
Jun  6 12:06:43 ipa18 kernel: NFS call  test_stateid ffff880072f50c68
Jun  6 12:06:43 ipa18 kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jun  6 12:06:43 ipa18 kernel: NFS: nfs4_reclaim_open_state: Lock reclaim failed!
Jun  6 12:06:43 ipa18 kernel: AGLO: nfs4_reclaim_open_state clearing
NOGRACE from state=ffff880072f50c00
Jun  6 12:06:43 ipa18 kernel: AGLO: nfs4_do_reclaim end=0
Jun  6 12:06:43 ipa18 kernel: AGLO: nfs4_end_drain_session

This is a snippet from when we ignore that _NOGRACE is set and proceed
to set the _RECLAIM_REBOOT
Jun  6 12:21:58 ipa18 kernel: nfs4_state_mark_reclaim_nograce ffff88007363bbc0
Jun  6 12:21:58 ipa18 kernel: AGLO: nfs4_state_manager reclaim_nograce
cleared NOGRACE from client state ffff88002febe528
Jun  6 12:21:58 ipa18 kernel: AGLO: nfs4_do_reclaim start
Jun  6 12:21:58 ipa18 kernel: NFS call  test_stateid ffff880077ac4cf0
Jun  6 12:21:58 ipa18 kernel: nfs4_state_mark_reclaim_nograce ffff88007363bbc0
Jun  6 12:21:58 ipa18 kernel: AGLO: nfs4_do_reclaim status end=-11
Jun  6 12:21:58 ipa18 kernel: AGLO: nfs4_begin_drain_session
Jun  6 12:21:58 ipa18 kernel: AGLO: nfs4_state_mark_reclaim_reboot NOT
clearing _RECLAIM_REBOOT from state=ffff88007363bbc0
    *** Removing the check and instead proceeding to set the
_RECLAIM_REBOOT in the cl_state->flags so that
Jun  6 12:21:58 ipa18 kernel: AGLO: nfs4_begin_drain_session
Jun  6 12:21:59 ipa18 kernel: AGLO: nfs4_begin_drain_session
Jun  6 12:21:59 ipa18 kernel: AGLO: nfs4_reclaim_lease client
state=ffff88002febe528 flag has NOGRACE
Jun  6 12:21:59 ipa18 kernel: AGLO: nfs4_state_manager reclaim reboot
op cl_state=ffff88002febe528
   *** Unlike the previous case the state manager now goes into the
reclaim reboot state.
Jun  6 12:21:59 ipa18 kernel: AGLO: nfs4_do_reclaim start
Jun  6 12:21:59 ipa18 kernel: AGLO: nfs4_reclaim_open_state clearing
NOGRACE from state=ffff88007363bbc0
Jun  6 12:21:59 ipa18 kernel: AGLO: nfs4_do_reclaim end=0
Jun  6 12:21:59 ipa18 kernel: AGLO: nfs4_state_manager reclaim_nograce
cleared NOGRACE from client state ffff88002febe528
Jun  6 12:21:59 ipa18 kernel: AGLO: nfs4_do_reclaim start
Jun  6 12:21:59 ipa18 kernel: AGLO: nfs4_do_reclaim end=0
Jun  6 12:21:59 ipa18 kernel: AGLO: nfs4_end_drain_session
---
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 06274e3..2c668a0 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1321,11 +1321,6 @@ static int
nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_st
  if (!nfs4_valid_open_stateid(state))
  return 0;
  set_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
- /* Don't recover state that expired before the reboot */
- if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags)) {
- clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
- return 0;
- }
  set_bit(NFS_OWNER_RECLAIM_REBOOT, &state->owner->so_flags);
  set_bit(NFS4CLNT_RECLAIM_REBOOT, &clp->cl_state);