diff mbox

Looping on ERR_OLD_STATEID on LOCK

Message ID CAN-5tyHmCj=OeC--a9d51oPZL1GXku-Oj=oPAM=_0040Qz6AHA@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Olga Kornievskaia Feb. 27, 2018, 5:09 p.m. UTC
Hi folks,

I'd like to understand why nfs4_do_handle_exception() for
ERR_OLD_STATEID just retries the operation. What happens is that the
client resends that same operation with the same stateid over to the
server and gets the same ERR_OLD_STATEID back which puts the client in
an infinite loop (and a hung state).

Shouldn't the client instead treat ERR_OLD_STATEID same as ERR_BAD_STATEID?

    test_bit(NFS_LOCK_INITIALIZED, &lsp->ls_flags) != 0)
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Trond Myklebust Feb. 28, 2018, 2:03 p.m. UTC | #1
On Tue, 2018-02-27 at 12:09 -0500, Olga Kornievskaia wrote:
> Hi folks,

> 

> I'd like to understand why nfs4_do_handle_exception() for

> ERR_OLD_STATEID just retries the operation. What happens is that the

> client resends that same operation with the same stateid over to the

> server and gets the same ERR_OLD_STATEID back which puts the client

> in

> an infinite loop (and a hung state).

> 

> Shouldn't the client instead treat ERR_OLD_STATEID same as

> ERR_BAD_STATEID?


No, NFS4ERR_OLD_STATEID is almost always due to a race between an
incoming RPC reply and outgoing RPC call. It's when the server bumps
the state, and the client happens not to know about it yet.

IOW: We definitely do not want to treat it as a permanent error.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
diff mbox

Patch

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c8b554a..2e006bc 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -411,6 +411,7 @@  static int nfs4_do_handle_exception(struct
nfs_server *server,
  case -NFS4ERR_ADMIN_REVOKED:
  case -NFS4ERR_EXPIRED:
  case -NFS4ERR_BAD_STATEID:
+ case -NFS4ERR_OLD_STATEID:
  if (inode != NULL && stateid != NULL) {
  nfs_inode_find_state_and_recover(inode,
  stateid);
@@ -477,7 +478,6 @@  static int nfs4_do_handle_exception(struct
nfs_server *server,
  return 0;

  case -NFS4ERR_RETRY_UNCACHED_REP:
- case -NFS4ERR_OLD_STATEID:
  exception->retry = 1;
  break;
  case -NFS4ERR_BADOWNER:

Or alternative, for the LOCK it should be:
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c8b554a..ce846faf 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6441,6 +6441,7 @@  static void nfs4_handle_setlk_error(struct
nfs_server *server, struct nfs4_lock_
  case -NFS4ERR_ADMIN_REVOKED:
  case -NFS4ERR_EXPIRED:
  case -NFS4ERR_BAD_STATEID:
+ case -NFS4ERR_OLD_STATEID:
  lsp->ls_seqid.flags &= ~NFS_SEQID_CONFIRMED;
  if (new_lock_owner != 0 ||