[v2,00/28] Fix up soft mounts for NFSv4.x

Message ID 20190329215948.107328-1-trond.myklebust@hammerspace.com (mailing list archive)

Message

Trond Myklebust March 29, 2019, 9:59 p.m. UTC
This patchset aims to make soft mounts a viable option for NFSv4 clients
by minimising the risk of false positive timeouts, while allowing for
faster failover of reads and writes once a timeout is actually observed.

The patches rely on the NFS server correctly implementing the contract
specified in RFC7530 section 3.1.1 with respect to not dropping requests
while the transport connection is up. When this is the case, the client
can safely assume that if a reply has not been received after
transmitting an RPC request, it is not because the request was dropped,
but rather is due to congestion, or slow processing on the server.
IOW: as long as the connection remains up, there is no need for requests
to time out.

The patches break down roughly as follows:
- A set of patches to clean up the RPC engine timeouts, and ensure they
  are accurate.
- A set of patches to change the 'soft' mount semantics for NFSv4.x.
- A set of patches to add a new 'softerr' mount option that works like
  soft, but explicitly signals timeouts using the ETIMEDOUT error code
  rather than using EIO. This allows applications to tune their
  behaviour (e.g. by failing over to a different server) if a timeout
  occurs.
- A set of patches to change the NFS error reporting so that it matches
  that of local filesystems w.r.t. guarantees that filesystem errors are
  seen once and once only.
- A patch to ensure the safe interruption of NFS4ERR_DELAYed operations
- A patch to ensure that pNFS operations can be forced to break out
  of layout error cycles after a certain number of retries.
- A few cleanups...
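
To illustrate why the "seen once and once only" error reporting guarantee matters to applications, here is a minimal userspace sketch. It assumes that a deferred writeback error is reported by fsync() and then not reported again, so the return value must not be ignored. The helper name save_file() is hypothetical and is not part of this patchset.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes to path and flush them to stable
 * storage. Returns 0 on success, or -1 with errno set to the first error.
 */
int save_file(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    const char *p = buf;
    int saved = 0;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before writing: try again */
            saved = errno;
            break;
        }
        p += n;
        len -= (size_t)n;
    }

    /*
     * A write() that only reached the page cache can still fail later
     * during writeback. Assumption: fsync() reports such an error once,
     * so this is the place to catch it.
     */
    if (saved == 0 && fsync(fd) < 0)
        saved = errno;

    /* Always close, but keep the first error we saw. */
    if (close(fd) < 0 && saved == 0)
        saved = errno;

    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return 0;
}
```

A caller would treat a -1 return as "the data may not be on the server" and decide whether to rewrite it, rather than assuming a later fsync() will repeat the error.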

-------
Changes since v1:
- Change NFSv4 soft timeout condition to prevent all requests from
  timing out when the connection is still up, instead of just the
  ones that have been sent.
- RPC queue timer cleanups
- Ratelimit the "server not responding" messages


Trond Myklebust (28):
  SUNRPC: Fix up task signalling
  SUNRPC: Refactor rpc_restart_call/rpc_restart_call_prepare
  SUNRPC: Refactor xprt_request_wait_receive()
  SUNRPC: Refactor rpc_sleep_on()
  SUNRPC: Remove unused argument 'action' from rpc_sleep_on_priority()
  SUNRPC: Add function rpc_sleep_on_timeout()
  SUNRPC: Fix up tracking of timeouts
  SUNRPC: Simplify queue timeouts using timer_reduce()
  SUNRPC: Declare RPC timers as TIMER_DEFERRABLE
  SUNRPC: Ensure that the transport layer respect major timeouts
  SUNRPC: Add tracking of RPC level errors
  SUNRPC: Make "no retrans timeout" soft tasks behave like softconn for
    timeouts
  SUNRPC: Start the first major timeout calculation at task creation
  SUNRPC: Ensure to ratelimit the "server not responding" syslog
    messages
  SUNRPC: Add the 'softerr' rpc_client flag
  NFS: Consider ETIMEDOUT to be a fatal error
  NFS: Move internal constants out of uapi/linux/nfs_mount.h
  NFS: Add a mount option "softerr" to allow clients to see ETIMEDOUT
    errors
  NFS: Don't interrupt file writeout due to fatal errors
  NFS: Don't call generic_error_remove_page() while holding locks
  NFS: Don't inadvertently clear writeback errors
  NFS: Replace custom error reporting mechanism with generic one
  NFS: Fix up NFS I/O subrequest creation
  NFS: Remove unused argument from nfs_create_request()
  pNFS: Add tracking to limit the number of pNFS retries
  NFS: Allow signal interruption of NFS4ERR_DELAYed operations
  NFS: Add a helper to return a pointer to the open context of a struct
    nfs_page
  NFS: Remove redundant open context from nfs_page

 fs/lockd/clntproc.c                        |   4 +-
 fs/nfs/client.c                            |   2 +
 fs/nfs/direct.c                            |  11 +-
 fs/nfs/file.c                              |  31 +---
 fs/nfs/filelayout/filelayout.c             |   4 +-
 fs/nfs/flexfilelayout/flexfilelayout.c     |  14 +-
 fs/nfs/internal.h                          |   7 +-
 fs/nfs/nfs4_fs.h                           |   1 +
 fs/nfs/nfs4file.c                          |   2 +-
 fs/nfs/nfs4proc.c                          | 159 +++++++++++++++------
 fs/nfs/pagelist.c                          | 122 +++++++++-------
 fs/nfs/pnfs.c                              |   4 +-
 fs/nfs/pnfs.h                              |   4 +-
 fs/nfs/read.c                              |   6 +-
 fs/nfs/super.c                             |  15 +-
 fs/nfs/write.c                             |  67 +++++----
 fs/nfsd/nfs4callback.c                     |   4 +-
 include/linux/nfs_fs.h                     |   1 -
 include/linux/nfs_fs_sb.h                  |  10 ++
 include/linux/nfs_page.h                   |  12 +-
 include/linux/sunrpc/clnt.h                |   2 +
 include/linux/sunrpc/sched.h               |  20 ++-
 include/linux/sunrpc/xprt.h                |   6 +-
 include/trace/events/sunrpc.h              |   8 +-
 include/uapi/linux/nfs_mount.h             |   9 --
 net/sunrpc/auth_gss/auth_gss.c             |   5 +-
 net/sunrpc/clnt.c                          | 116 +++++++++------
 net/sunrpc/debugfs.c                       |   2 +-
 net/sunrpc/rpcb_clnt.c                     |   3 +-
 net/sunrpc/sched.c                         | 158 +++++++++++++++-----
 net/sunrpc/xprt.c                          | 150 ++++++++++++-------
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |   2 +-
 net/sunrpc/xprtrdma/transport.c            |   2 +-
 net/sunrpc/xprtsock.c                      |   9 +-
 34 files changed, 631 insertions(+), 341 deletions(-)

Comments

Olga Kornievskaia April 1, 2019, 4:54 p.m. UTC | #1
On Fri, Mar 29, 2019 at 6:02 PM Trond Myklebust <trondmy@gmail.com> wrote:
>
> This patchset aims to make soft mounts a viable option for NFSv4 clients
> by minimising the risk of false positive timeouts, while allowing for
> faster failover of reads and writes once a timeout is actually observed.
>
> The patches rely on the NFS server correctly implementing the contract
> specified in RFC7530 section 3.1.1 with respect to not dropping requests
> while the transport connection is up. When this is the case, the client
> can safely assume that if the request has not received a reply after
> transmitting a RPC request, it is not because the request was dropped,
> but rather is due to congestion, or slow processing on the server.
> IOW: as long as the connection remains up, there is no need for requests
> to time out.
>
> The patches break down roughly as follows:
> - A set of patches to clean up the RPC engine timeouts, and ensure they
>   are accurate.
> - A set of patches to change the 'soft' mount semantics for NFSv4.x.
> - A set of patches to add a new 'softerr' mount option that works like
>   soft, but explicitly signals timeouts using the ETIMEDOUT error code
>   rather than using EIO. This allows applications to tune their
>   behaviour (e.g. by failing over to a different server) if a timeout
>   occurs.

I'm just curious: why would an application be aware of a different
server to connect to when the NFS layer is not? I'm also curious whether
this would break applications that typically expect to get an EIO
error. Do all system calls allow an ETIMEDOUT error to be returned?

> - A set of patches to change the NFS error reporting so that it matches
>   that of local filesystems w.r.t. guarantees that filesystem errors are
>   seen once and once only.
> - A patch to ensure the safe interruption of NFS4ERR_DELAYed operations
> - A patch to ensure that pNFS operations can be forced to break out
>   of layout error cycles after a certain number of retries.
> - A few cleanups...
Trond Myklebust April 2, 2019, 6:28 p.m. UTC | #2
On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
> I'm just curious why would an application be aware of a different
> server to connect to and an NFS layer would not be? I'm also curious
> wouldn't it break application that typically expect to get an EIO
> errors? Do all system calls allow to return ETIMEDOUT error?

This is why it is a separate mount option. ...and actually most
applications blow up when they get EIO as well. However you can imagine
an application that might decide to retry if it hits an ETIMEDOUT,
while failing if it hits an EIO.
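
As a rough illustration of that kind of application, the sketch below retries a read only when it sees ETIMEDOUT and gives up immediately on EIO or any other error. It assumes the file lives on a mount using the proposed 'softerr' option; the function name pread_retry_on_timeout(), the retry limit and the backoff are hypothetical choices made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical policy: number of retries allowed after the first attempt. */
#define MAX_TIMEOUT_RETRIES 3

/*
 * Read up to len bytes at offset off. Only ETIMEDOUT triggers a retry;
 * any other error, including EIO, is returned to the caller at once.
 * Returns the number of bytes read, or -1 with errno set.
 */
ssize_t pread_retry_on_timeout(int fd, void *buf, size_t len, off_t off)
{
    assert(buf != NULL);

    for (unsigned int attempt = 0; ; attempt++) {
        ssize_t n = pread(fd, buf, len, off);
        if (n >= 0)
            return n;

        /* Not a timeout, or retries exhausted: errno is still pread()'s. */
        if (errno != ETIMEDOUT || attempt >= MAX_TIMEOUT_RETRIES)
            return -1;

        fprintf(stderr, "read timed out, retry %u of %d\n",
                attempt + 1, MAX_TIMEOUT_RETRIES);
        sleep(1u << attempt);   /* simple exponential backoff: 1s, 2s, 4s */
    }
}
```

On a plain 'soft' mount the same helper would never retry, because the timeout would surface as EIO.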

Cheers
  Trond
Mkrtchyan, Tigran April 3, 2019, 8:51 p.m. UTC | #3
Hi Trond,

----- Original Message -----
> From: "Trond Myklebust" <trondmy@gmail.com>
> To: "Olga Kornievskaia" <aglo@umich.edu>
> Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> Sent: Tuesday, April 2, 2019 8:28:38 PM
> Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x

> On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
>> I'm just curious why would an application be aware of a different
>> server to connect to and an NFS layer would not be? I'm also curious
>> wouldn't it break application that typically expect to get an EIO
>> errors? Do all system calls allow to return ETIMEDOUT error?
> 
> This is why it is a separate mount option. ...and actually most
> applications blow up when they get EIO as well. However you can imagine
> an application that might decide to retry if it hits an ETIMEDOUT,
> while failing if it hits an EIO.

What is the reason for introducing a new error code for I/O operations that
is not in the list of POSIX-specified values for read(2) and write(2)? Is
there an expected change in application behavior compared to EAGAIN?

I would like to use the opportunity to bring up the topic of the O_NONBLOCK
open(2) flag for offline files.

Tigran.

> 
> Cheers
>   Trond
Trond Myklebust April 3, 2019, 9:13 p.m. UTC | #4
On Wed, 2019-04-03 at 22:51 +0200, Mkrtchyan, Tigran wrote:
> Hi Trond,
> 
> ----- Original Message -----
> > From: "Trond Myklebust" <trondmy@gmail.com>
> > To: "Olga Kornievskaia" <aglo@umich.edu>
> > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> > Sent: Tuesday, April 2, 2019 8:28:38 PM
> > Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
> > On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
> > > I'm just curious why would an application be aware of a different
> > > server to connect to and an NFS layer would not be? I'm also
> > > curious
> > > wouldn't it break application that typically expect to get an EIO
> > > errors? Do all system calls allow to return ETIMEDOUT error?
> > 
> > This is why it is a separate mount option. ...and actually most
> > applications blow up when they get EIO as well. However you can
> > imagine
> > an application that might decide to retry if it hits an ETIMEDOUT,
> > while failing if it hits an EIO.
> 
> What is the reason of introducing new error code for IO operations,
> which
> is not in the list of POSIX specified values for read(2) and
> write(2). Is
> there expected application behavior change compared to EAGAIN?

The point is to allow aware applications to better handle a situation
which is not covered by POSIX because POSIX has no concept of storage
that is temporarily unavailable.

...and it is being proposed as an opt-in feature, precisely so that
existing applications don't need to change.
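
A sketch of what an "aware" application might do with that information: try a list of replica paths in order and move to the next one only when the current one times out. This assumes each path is a 'softerr' mount of a different server holding the same data; read_with_failover() and the replica layout are hypothetical and not part of the patchset.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes from the start of the first replica that answers.
 * A replica is skipped only if open() or pread() fails with ETIMEDOUT;
 * any other error stops the search, since it is not a sign of temporarily
 * unavailable storage. Returns bytes read, or -1 with errno set.
 */
ssize_t read_with_failover(const char *const paths[], size_t npaths,
                           void *buf, size_t len)
{
    assert(paths != NULL && buf != NULL);

    if (npaths == 0) {
        errno = EINVAL;
        return -1;
    }

    for (size_t i = 0; i < npaths; i++) {
        int fd = open(paths[i], O_RDONLY);
        if (fd >= 0) {
            ssize_t n = pread(fd, buf, len, 0);
            int saved = errno;  /* close() may overwrite errno */
            close(fd);
            if (n >= 0)
                return n;
            errno = saved;
        }
        if (errno != ETIMEDOUT)
            return -1;          /* hard failure: do not try other replicas */
    }
    return -1;                  /* every replica timed out; errno is ETIMEDOUT */
}
```

An application that does not know about replicas simply never calls such a helper, which matches the opt-in nature of the mount option.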

> I would like to use the opportunity to bring the topic of O_NONBLOCK
> open(2)
> flag for offline files.
Mkrtchyan, Tigran April 3, 2019, 9:59 p.m. UTC | #5
----- Original Message -----
> From: "trondmy" <trondmy@hammerspace.com>
> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> Cc: "linux-nfs" <linux-nfs@vger.kernel.org>, "Olga Kornievskaia" <aglo@umich.edu>
> Sent: Wednesday, April 3, 2019 11:13:37 PM
> Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x

> On Wed, 2019-04-03 at 22:51 +0200, Mkrtchyan, Tigran wrote:
>> Hi Trond,
>> 
>> ----- Original Message -----
>> > From: "Trond Myklebust" <trondmy@gmail.com>
>> > To: "Olga Kornievskaia" <aglo@umich.edu>
>> > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
>> > Sent: Tuesday, April 2, 2019 8:28:38 PM
>> > Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
>> > On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
>> > > I'm just curious why would an application be aware of a different
>> > > server to connect to and an NFS layer would not be? I'm also
>> > > curious
>> > > wouldn't it break application that typically expect to get an EIO
>> > > errors? Do all system calls allow to return ETIMEDOUT error?
>> > 
>> > This is why it is a separate mount option. ...and actually most
>> > applications blow up when they get EIO as well. However you can
>> > imagine
>> > an application that might decide to retry if it hits an ETIMEDOUT,
>> > while failing if it hits an EIO.
>> 
>> What is the reason of introducing new error code for IO operations,
>> which
>> is not in the list of POSIX specified values for read(2) and
>> write(2). Is
>> there expected application behavior change compared to EAGAIN?
> 
> The point is to allow aware applications to better handle a situation
> which is not covered by POSIX because POSIX has no concept of storage
> that is temporarily unavailable.
> 
> ...and it is being proposed as an opt-in feature, precisely so that
> existing applications don't need to change.

Yes and no. As a mount option, it exposes this behavior to all applications
on the client. Thus, either the stupid apps die and the smart ones survive,
or all of them block and the smart ones suffer.

As you probably know, we have to handle a similar issue. Currently it's a
server-side configuration which, depending on the uid/gid of the user, returns
either NFSERR_IO or NFSERR_LAYOUTTRYLATER. This is still wrong, as not all
applications from the same user require the same handling.

Regards,
   Tigran.

> 
>> I would like to use the opportunity to bring the topic of O_NONBLOCK
>> open(2)
>> flag for offline files.
Trond Myklebust April 3, 2019, 10:10 p.m. UTC | #6
On Wed, 2019-04-03 at 23:59 +0200, Mkrtchyan, Tigran wrote:
> 
> ----- Original Message -----
> > From: "trondmy" <trondmy@hammerspace.com>
> > To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>, "Olga Kornievskaia" <
> > aglo@umich.edu>
> > Sent: Wednesday, April 3, 2019 11:13:37 PM
> > Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
> > On Wed, 2019-04-03 at 22:51 +0200, Mkrtchyan, Tigran wrote:
> > > Hi Trond,
> > > 
> > > ----- Original Message -----
> > > > From: "Trond Myklebust" <trondmy@gmail.com>
> > > > To: "Olga Kornievskaia" <aglo@umich.edu>
> > > > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> > > > Sent: Tuesday, April 2, 2019 8:28:38 PM
> > > > Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
> > > > On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
> > > > > I'm just curious why would an application be aware of a
> > > > > different
> > > > > server to connect to and an NFS layer would not be? I'm also
> > > > > curious
> > > > > wouldn't it break application that typically expect to get an
> > > > > EIO
> > > > > errors? Do all system calls allow to return ETIMEDOUT error?
> > > > 
> > > > This is why it is a separate mount option. ...and actually most
> > > > applications blow up when they get EIO as well. However you can
> > > > imagine
> > > > an application that might decide to retry if it hits an
> > > > ETIMEDOUT,
> > > > while failing if it hits an EIO.
> > > 
> > > What is the reason of introducing new error code for IO
> > > operations,
> > > which
> > > is not in the list of POSIX specified values for read(2) and
> > > write(2). Is
> > > there expected application behavior change compared to EAGAIN?
> > 
> > The point is to allow aware applications to better handle a
> > situation
> > which is not covered by POSIX because POSIX has no concept of
> > storage
> > that is temporarily unavailable.
> > 
> > ...and it is being proposed as an opt-in feature, precisely so that
> > existing applications don't need to change.
> 
> Yes and no. As a mount option, you expose this behavior to all
> applications
> on the client. Thus, either stupid app die and smart survive, or all
> block, but smart suffer.

I don't understand your point. This is doing the exact same thing as
'soft', but behaves differently with respect to timeouts, by returning
ETIMEDOUT instead of EIO.

IOW: if you want the same behaviour, but returning a POSIX error of
EIO, then that behaviour is already there with "soft".

> As you probably know, we have to handle similar issue. Currently it's
> a
> server side configuration, which depending on uid/gid of the user
> returns
> either NFSERR_IO or NFSERR_LAYOUTTRYLATER. This is still wrong, as
> not all
> applications from the same users required the same handling.

You have options here too.

Containers or VMs are one option for completely isolating applications
that need special behaviours, and giving them their own special mounts.

You can also isolate by path: mounting with one set of options in one
part of your namespace and with another set of options in another part
of the namespace, and then pointing the applications at the "correct"
path for the behaviour they need.


> Regards,
>    Tigran.
> 
> > > I would like to use the opportunity to bring the topic of
> > > O_NONBLOCK
> > > open(2)
> > > flag for offline files.