[v12,0/2] fuse: add kernel-enforced request timeout option

Message ID 20250122215528.1270478-1-joannelkoong@gmail.com

Joanne Koong Jan. 22, 2025, 9:55 p.m. UTC
There are situations where fuse servers can become unresponsive or
stuck, for example if the server is in a deadlock. Currently, there's
no good way to detect that a server is stuck; it has to be identified
and killed manually.

This patchset adds a timeout option: if the server does not reply to a
request before the timeout elapses, the connection is aborted.
It also adds two dynamically configurable fuse sysctls,
"default_request_timeout" and "max_request_timeout", for controlling and
enforcing timeout behavior system-wide.

Existing systems running fuse servers will not be affected unless they
explicitly opt into the timeout.
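
As a rough illustration of how a per-connection timeout and the two
sysctls could interact, here is a minimal userspace sketch. It is not
taken from the patches: effective_timeout() and its parameters are
hypothetical names, and it assumes that a value of 0 means "not set"
for the requested timeout and the default, and "no limit" for the max.

```c
/*
 * Hypothetical helper, not the kernel implementation.
 * All values are in seconds; 0 means "unset" (requested, def)
 * or "no upper bound" (max). These semantics are assumptions.
 */
static unsigned int effective_timeout(unsigned int requested,
                                      unsigned int def,
                                      unsigned int max)
{
    /* Fall back to the system-wide default if the server set nothing. */
    unsigned int t = requested ? requested : def;

    /* If a max is enforced, clamp to it, and apply it when nothing is set. */
    if (max && (t == 0 || t > max))
        t = max;

    /* 0 here means the connection runs without a request timeout. */
    return t;
}
```

Under these assumptions, a system with both sysctls left at 0 and a
server that requests no timeout gets 0 back, which matches the opt-in
behavior described above.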

v11:
https://lore.kernel.org/linux-fsdevel/20241218222630.99920-1-joannelkoong@gmail.com/
Changes from v11 -> v12:
* Pass request timeout through init instead of mount option (Miklos)
* Change sysctl upper bound to max u16 val
* Rebase on top of for-next, which required incorporating io-uring timeouts

v10:
https://lore.kernel.org/linux-fsdevel/20241214022827.1773071-1-joannelkoong@gmail.com/
Changes from v10 -> v11:
* Refactor check for request expiration (Sergey)
* Move workqueue cancellation to earlier in function (Jeff)
* Check fc->num_waiting as a shortcut in workqueue job (Etienne)

v9:
https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@gmail.com/
Changes from v9 -> v10:
* Use delayed workqueues instead of timers (Sergey and Jeff)
* Change granularity to seconds instead of minutes (Sergey and Jeff)
* Use time_after() api for checking jiffies expiration (Sergey)
* Change timer check to run every 15 secs instead of every min
* Update documentation wording to be clearer
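
For readers unfamiliar with the time_after() item above, the point of
that API is that it compares tick counters through a signed difference,
so the check stays correct when the counter wraps around. A userspace
sketch of the same idea follows; tick_after() mirrors the kernel macro's
logic, while request_expired() and its parameters are hypothetical names
used only for illustration.

```c
#include <stdbool.h>

/*
 * True when tick value a is later than b. The subtraction is done on
 * unsigned values and then interpreted as signed, which is the same
 * trick the kernel's time_after() uses to survive counter wraparound.
 */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Hypothetical helper: a request created at create_time has expired
 * once the current tick count has passed create_time + timeout_ticks.
 */
static bool request_expired(unsigned long now,
                            unsigned long create_time,
                            unsigned long timeout_ticks)
{
    return tick_after(now, create_time + timeout_ticks);
}
```

A plain "now > create_time + timeout_ticks" comparison would give the
wrong answer when the deadline wraps past the maximum counter value,
which is the case the signed difference handles.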

v8:
https://lore.kernel.org/linux-fsdevel/20241011191320.91592-1-joannelkoong@gmail.com/
Changes from v8 -> v9:
* Fix comment for u16 fs_parse_result, use ULONG_MAX instead of U32_MAX,
  fix spacing (Bernd)

v7:
https://lore.kernel.org/linux-fsdevel/20241007184258.2837492-1-joannelkoong@gmail.com/
Changes from v7 -> v8:
* Use existing lists for checking expirations (Miklos)
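
One way to read the item above: if requests are appended to a list in
creation order, the oldest request sits at the head, so a periodic
check only has to look at the head of each list instead of tracking
every request separately. The sketch below illustrates that idea in
userspace C. The struct, its fields and list_has_expired() are
hypothetical and are not the data structures used in fs/fuse.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request record; field names are illustrative only. */
struct req {
    unsigned long create_time;  /* tick count when the request was queued */
    struct req *next;           /* next (newer) request in the FIFO list */
};

/* Wraparound-safe "a is later than b", same idea as time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Requests are assumed to be appended in creation order, so the head
 * is the oldest entry. If the head has not expired, no entry behind
 * it has expired either, and the rest of the list can be skipped.
 */
static bool list_has_expired(const struct req *head, unsigned long now,
                             unsigned long timeout_ticks)
{
    if (head == NULL)
        return false;
    return tick_after(now, head->create_time + timeout_ticks);
}
```

Since a timeout aborts the whole connection rather than a single
request, finding one expired head is enough to act on.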

v6:
https://lore.kernel.org/linux-fsdevel/20240830162649.3849586-1-joannelkoong@gmail.com/
Changes from v6 -> v7:
- Make timer per-connection instead of per-request (Miklos)
- Make the timeout granularity minutes instead of seconds
- Remove the Reviewed-by tags since the interface has changed (now
  minutes instead of seconds)

v5:
https://lore.kernel.org/linux-fsdevel/20240826203234.4079338-1-joannelkoong@gmail.com/
Changes from v5 -> v6:
- Gate sysctl.o behind CONFIG_SYSCTL in makefile (kernel test robot)
- Reword/clarify last sentence in cover letter (Miklos)

v4:
https://lore.kernel.org/linux-fsdevel/20240813232241.2369855-1-joannelkoong@gmail.com/
Changes from v4 -> v5:
- Change timeout behavior from aborting request to aborting connection (Miklos)
- Clarify wording for sysctl documentation (Jingbo)

v3:
https://lore.kernel.org/linux-fsdevel/20240808190110.3188039-1-joannelkoong@gmail.com/
Changes from v3 -> v4:
- Fix wording of some comments to make them clearer
- Use simpler logic for timer (e.g. remove extra if checks, use mod_timer() API) (Josef)
- Sanity-check should be on FR_FINISHING not FR_FINISHED (Jingbo)
- Fix comment for "processing queue", add req->fpq = NULL safeguard (Bernd)

v2:
https://lore.kernel.org/linux-fsdevel/20240730002348.3431931-1-joannelkoong@gmail.com/
Changes from v2 -> v3:
- Disarm / rearm timer in dev_do_read to handle race conditions (Bernd)
- Disarm timer in error handling for fatal interrupt (Yafang)
- Clean up do_fuse_request_end (Jingbo)
- Add timer for notify retrieve requests 
- Fix kernel test robot errors for #define no-op functions

v1:
https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
Changes from v1 -> v2:
- Add timeout for background requests
- Handle resend race condition
- Add sysctls


Joanne Koong (2):
  fuse: add kernel-enforced timeout option for requests
  fuse: add default_request_timeout and max_request_timeout sysctls

 Documentation/admin-guide/sysctl/fs.rst |  25 ++++++
 fs/fuse/dev.c                           | 101 ++++++++++++++++++++++++
 fs/fuse/dev_uring.c                     |  27 +++++++
 fs/fuse/dev_uring_i.h                   |   6 ++
 fs/fuse/fuse_dev_i.h                    |   3 +
 fs/fuse/fuse_i.h                        |  27 +++++++
 fs/fuse/inode.c                         |  41 +++++++++-
 fs/fuse/sysctl.c                        |  24 ++++++
 include/uapi/linux/fuse.h               |  10 ++-
 9 files changed, 261 insertions(+), 3 deletions(-)

Comments

Jeff Layton Jan. 22, 2025, 10:53 p.m. UTC | #1
On Wed, 2025-01-22 at 13:55 -0800, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is in a deadlock. Currently, there's
> no good way to detect if a server is stuck and needs to be killed
> manually.
> 
> This patchset adds a timeout option where if the server does not reply to a
> request by the time the timeout elapses, the connection will be aborted.
> This patchset also adds two dynamically configurable fuse sysctls
> "default_request_timeout" and "max_request_timeout" for controlling/enforcing
> timeout behavior system-wide.
> 
> Existing systems running fuse servers will not be affected unless they
> explicitly opt into the timeout.
> 
> v11:
> https://lore.kernel.org/linux-fsdevel/20241218222630.99920-1-joannelkoong@gmail.com/
> Changes from v11 -> v12:
> * Pass request timeout through init instead of mount option (Miklos)
> * Change sysctl upper bound to max u16 val
> * Rebase on top of for-next, need to incorporate io-uring timeouts
> 
> v10:
> https://lore.kernel.org/linux-fsdevel/20241214022827.1773071-1-joannelkoong@gmail.com/
> Changes from v10 -> v11:
> * Refactor check for request expiration (Sergey)
> * Move workqueue cancellation to earlier in function (Jeff)
> * Check fc->num_waiting as a shortcut in workqueue job (Etienne)
> 
> v9:
> https://lore.kernel.org/linux-fsdevel/20241114191332.669127-1-joannelkoong@gmail.com/
> Changes from v9 -> v10:
> * Use delayed workqueues instead of timers (Sergey and Jeff)
> * Change granularity to seconds instead of minutes (Sergey and Jeff)
> * Use time_after() api for checking jiffies expiration (Sergey)
> * Change timer check to run every 15 secs instead of every min
> * Update documentation wording to be more clear
> 
> v8:
> https://lore.kernel.org/linux-fsdevel/20241011191320.91592-1-joannelkoong@gmail.com/
> Changes from v8 -> v9:
> * Fix comment for u16 fs_parse_result, ULONG_MAX instead of U32_MAX, fix
>   spacing (Bernd)
> 
> v7:
> https://lore.kernel.org/linux-fsdevel/20241007184258.2837492-1-joannelkoong@gmail.com/
> Changes from v7 -> v8:
> * Use existing lists for checking expirations (Miklos)
> 
> v6:
> https://lore.kernel.org/linux-fsdevel/20240830162649.3849586-1-joannelkoong@gmail.com/
> Changes from v6 -> v7:
> - Make timer per-connection instead of per-request (Miklos)
> - Make default granularity of time minutes instead of seconds
> - Removed the reviewed-bys since the interface of this has changed (now
>   minutes, instead of seconds)
> 
> v5:
> https://lore.kernel.org/linux-fsdevel/20240826203234.4079338-1-joannelkoong@gmail.com/
> Changes from v5 -> v6:
> - Gate sysctl.o behind CONFIG_SYSCTL in makefile (kernel test robot)
> - Reword/clarify last sentence in cover letter (Miklos)
> 
> v4:
> https://lore.kernel.org/linux-fsdevel/20240813232241.2369855-1-joannelkoong@gmail.com/
> Changes from v4 -> v5:
> - Change timeout behavior from aborting request to aborting connection (Miklos)
> - Clarify wording for sysctl documentation (Jingbo)
> 
> v3:
> https://lore.kernel.org/linux-fsdevel/20240808190110.3188039-1-joannelkoong@gmail.com/
> Changes from v3 -> v4:
> - Fix wording on some comments to make it more clear
> - Use simpler logic for timer (eg remove extra if checks, use mod timer API) (Josef)
> - Sanity-check should be on FR_FINISHING not FR_FINISHED (Jingbo)
> - Fix comment for "processing queue", add req->fpq = NULL safeguard  (Bernd)
> 
> v2:
> https://lore.kernel.org/linux-fsdevel/20240730002348.3431931-1-joannelkoong@gmail.com/
> Changes from v2 -> v3:
> - Disarm / rearm timer in dev_do_read to handle race conditions (Bernd)
> - Disarm timer in error handling for fatal interrupt (Yafang)
> - Clean up do_fuse_request_end (Jingbo)
> - Add timer for notify retrieve requests 
> - Fix kernel test robot errors for #define no-op functions
> 
> v1:
> https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
> Changes from v1 -> v2:
> - Add timeout for background requests
> - Handle resend race condition
> - Add sysctls
> 
> 
> Joanne Koong (2):
>   fuse: add kernel-enforced timeout option for requests
>   fuse: add default_request_timeout and max_request_timeout sysctls
> 
>  Documentation/admin-guide/sysctl/fs.rst |  25 ++++++
>  fs/fuse/dev.c                           | 101 ++++++++++++++++++++++++
>  fs/fuse/dev_uring.c                     |  27 +++++++
>  fs/fuse/dev_uring_i.h                   |   6 ++
>  fs/fuse/fuse_dev_i.h                    |   3 +
>  fs/fuse/fuse_i.h                        |  27 +++++++
>  fs/fuse/inode.c                         |  41 +++++++++-
>  fs/fuse/sysctl.c                        |  24 ++++++
>  include/uapi/linux/fuse.h               |  10 ++-
>  9 files changed, 261 insertions(+), 3 deletions(-)
> 

Nice work, Joanne. You can add:

Reviewed-by: Jeff Layton <jlayton@kernel.org>