[0/6,nfs-utils,v2] fixes for error handling in nfsd_fh

Message ID 20231023021052.5258-1-neilb@suse.de (mailing list archive)

Message

NeilBrown Oct. 23, 2023, 1:58 a.m. UTC
Hi,
 this is a revised version of my previous series with the same name.
 The first two patches are unchanged.
 The third patch, which was an RFC, has been replaced with the last
 patch which actually addresses the issue rather than skirting 
 around it.

 Patch 3 here is a revert of a change I noticed while exploring the
 code.  cache_open() must be called BEFORE forking workers, as explained
 in that patch.
 Patches 4 and 5 factor out common code, which makes the final patch
 simpler.

 The core issue is that sometimes mountd (or exportd) cannot give a
 definite "yes" or "no" to a request to map an fsid to a path name.
 In these cases the only safe option is to delay and try again.

 This only becomes relevant if a filesystem is mounted by a client, then
 the server restarts (or the export cache is flushed) and the client
 tries to use a filehandle that it already has, but the server cannot
 find it and cannot be sure it doesn't exist.  This can happen when an
 export is marked "mountpoint" or when a re-exported NFS filesystem
 cannot contact the server and reports an ETIMEDOUT error.  In these
 cases we want the client to continue waiting (which it does) and also
 want mountd/exportd to periodically check if the target filesystem has
 come back (which it currently does not).  
 With the current code, once this situation happens and the client is
 waiting, the client will continue to wait indefinitely even if the
 target filesystem becomes available.  The client can only continue if
 the NFS server is restarted or the export cache is flushed.  After the
 patch, within 2 minutes of the target filesystem becoming
 available again, mountd will tell the kernel, and when the client asks
 again it will be allowed to proceed.
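
 To illustrate the retry idea only (this is NOT the code from patch 6),
 here is a minimal sketch.  Every name in it (deferred_queue,
 handle_request, retry_due, demo_lookup, target_available) is
 hypothetical, and the 120 second interval is an assumption taken from
 the "within 2 minutes" figure above.  A request that gets neither a
 "yes" nor a "no" is remembered with a retry time and looked up again
 once that time has passed.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define RETRY_INTERVAL 120	/* seconds; assumed, see lead-in */
#define MAX_DEFERRED 16		/* arbitrary bound for this sketch */

enum lookup_result { LOOKUP_YES, LOOKUP_NO, LOOKUP_UNKNOWN };

/* Hypothetical callback: try to map an fsid to a path. */
typedef enum lookup_result (*lookup_fn)(int fsid);

struct deferred {
	int fsid;
	time_t retry_at;	/* earliest time to try again */
};

struct deferred_queue {
	struct deferred items[MAX_DEFERRED];
	size_t count;
};

/* Stand-in for a filesystem that may or may not be reachable. */
int target_available;

enum lookup_result demo_lookup(int fsid)
{
	(void)fsid;
	return target_available ? LOOKUP_YES : LOOKUP_UNKNOWN;
}

/*
 * Returns 1 if a definite answer was found (a real daemon would pass
 * it to the kernel here), 0 if the request was deferred, -1 if the
 * queue is full.
 */
int handle_request(struct deferred_queue *q, int fsid,
		   lookup_fn lookup, time_t now)
{
	assert(q != NULL && lookup != NULL);
	if (lookup(fsid) != LOOKUP_UNKNOWN)
		return 1;
	if (q->count == MAX_DEFERRED)
		return -1;
	q->items[q->count].fsid = fsid;
	q->items[q->count].retry_at = now + RETRY_INTERVAL;
	q->count++;
	return 0;
}

/*
 * Retry every entry whose time has come.  Entries that now get a
 * definite answer are dropped; the rest are rescheduled or kept.
 * Returns the number of entries resolved.
 */
size_t retry_due(struct deferred_queue *q, lookup_fn lookup, time_t now)
{
	size_t kept = 0, resolved = 0;

	assert(q != NULL && lookup != NULL);
	for (size_t i = 0; i < q->count; i++) {
		struct deferred d = q->items[i];

		if (d.retry_at <= now) {
			if (lookup(d.fsid) != LOOKUP_UNKNOWN) {
				resolved++;
				continue;
			}
			d.retry_at = now + RETRY_INTERVAL;
		}
		q->items[kept++] = d;
	}
	q->count = kept;
	return resolved;
}
```

 In a sketch like this, a daemon's main loop would call retry_due()
 whenever its poll/select timeout expires, so a filesystem that comes
 back is noticed without restarting the server or flushing the cache.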

NeilBrown


 [PATCH 1/6] export: fix handling of error from match_fsid()
 [PATCH 2/6] export: add EACCES to the list of known
 [PATCH 3/6] export: move cache_open() before workers are forked.
 [PATCH 4/6] Move fork_workers() and wait_for_workers() in cache.c
 [PATCH 5/6] Share process_loop code between mountd and exportd.
 [PATCH 6/6] cache: periodically retry requests that couldn't be

Comments

Steve Dickson Oct. 25, 2023, 5:37 p.m. UTC | #1
Committed... (tag: nfs-utils-2-6-4-rc5)

steved.