[v3,0/4] Containerised NFS clients and teardown

Message ID: cover.1742570192.git.trond.myklebust@hammerspace.com
Trond Myklebust March 21, 2025, 3:21 p.m. UTC
From: Trond Myklebust <trond.myklebust@hammerspace.com>

When an NFS client is started inside a container, it is often not
possible to ensure a safe shutdown and flush of the data before the
container orchestrator steps in to tear down the network. Typically,
the orchestrator triggers a lazy umount of the mounted filesystems,
then proceeds to delete virtual network device links, bridges, NAT
configurations, etc.

Once that happens, it may be impossible to reach into the container to
perform any further shutdown actions on the NFS client.

This patchset proposes to allow the client to deal with these situations
by treating the two errors ENETDOWN and ENETUNREACH as fatal.
The intention is to allow the I/O queue to drain, and any remaining
RPC calls to error out, so that the lazy umounts can complete the
shutdown process.

To support this, a new mount option "fatal_neterrors" is introduced,
which can take the values "default", "none" and "ENETDOWN:ENETUNREACH".
The value "none" preserves the existing behaviour, whereby hard mounts
are unaffected by ENETDOWN and ENETUNREACH errors.
The value "ENETDOWN:ENETUNREACH" forces ENETDOWN and ENETUNREACH errors
to always be fatal.
If the user does not specify the "fatal_neterrors" option, or uses the
value "default", then ENETDOWN and ENETUNREACH are fatal if the mount
was started from inside a network namespace other than "init_net", and
are otherwise not fatal.

The expectation is that users will normally not need to set this option,
unless they are running inside a container and want to prevent ENETDOWN
and ENETUNREACH from being fatal, by setting "-ofatal_neterrors=none".

---
v2:
- Fix NFSv4 client cl_flag initialisation
- Add RPC task flag trace decoding
v3:
- Fix a copy/paste error in nfs4_set_client() (Thanks, Jeff Layton!)
- Fix the mount option name to be "fatal_neterrors".
- Capitalise ENETDOWN and ENETUNREACH in the fatal_neterrors parameter
  list to make it more obvious this refers to the POSIX networking
  errors.
- Always display the "fatal_neterrors" setting in /proc/mounts

Trond Myklebust (4):
  NFS: Add a mount option to make ENETUNREACH errors fatal
  NFS: Treat ENETUNREACH errors as fatal in containers
  pNFS/flexfiles: Treat ENETUNREACH errors as fatal in containers
  pNFS/flexfiles: Report ENETDOWN as a connection error

 fs/nfs/client.c                        |  5 ++++
 fs/nfs/flexfilelayout/flexfilelayout.c | 24 ++++++++++++++--
 fs/nfs/fs_context.c                    | 39 ++++++++++++++++++++++++++
 fs/nfs/nfs3client.c                    |  2 ++
 fs/nfs/nfs4client.c                    |  7 +++++
 fs/nfs/nfs4proc.c                      |  3 ++
 fs/nfs/super.c                         |  3 ++
 include/linux/nfs4.h                   |  1 +
 include/linux/nfs_fs_sb.h              |  2 ++
 include/linux/sunrpc/clnt.h            |  5 +++-
 include/linux/sunrpc/sched.h           |  1 +
 include/trace/events/sunrpc.h          |  1 +
 net/sunrpc/clnt.c                      | 30 ++++++++++++++------
 13 files changed, 112 insertions(+), 11 deletions(-)

Comments

Chuck Lever March 21, 2025, 3:33 p.m. UTC | #1
On 3/21/25 11:21 AM, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> When a NFS client is started from inside a container, it is often not
> possible to ensure a safe shutdown and flush of the data before the
> container orchestrator steps in to tear down the network. Typically,
> what can happen is that the orchestrator triggers a lazy umount of the
> mounted filesystems, then proceeds to delete virtual network device
> links, bridges, NAT configurations, etc.
> [...]

As my UK colleagues say, I'm extremely chuffed to see this feature!

Acked-by: Chuck Lever <chuck.lever@oracle.com>