mbox series

[v5,0/2] fuse: add timeout option for requests

Message ID 20240826203234.4079338-1-joannelkoong@gmail.com (mailing list archive)
Headers show
Series fuse: add timeout option for requests | expand

Message

Joanne Koong Aug. 26, 2024, 8:32 p.m. UTC
There are situations where fuse servers can become unresponsive or
stuck, for example if the server is in a deadlock. Currently, there's
no good way to detect if a server is stuck and needs to be killed
manually.

This patchset adds a timeout option for requests where if the server does not
reply to the request by the time the timeout elapses, the connection will be
aborted. This patchset also adds two dynamically configurable fuse sysctls
"default_request_timeout" and "max_request_timeout" for controlling/enforcing
timeout behavior system-wide.

Existing fuse servers will not be affected unless they explicitly opt into the
timeout.

v4:
https://lore.kernel.org/linux-fsdevel/20240813232241.2369855-1-joannelkoong@gmail.com/
Changes from v4 -> v5:
- Change timeout behavior from aborting request to aborting connection (Miklos)
- Clarify wording for sysctl documentation (Jingbo)

v3:
https://lore.kernel.org/linux-fsdevel/20240808190110.3188039-1-joannelkoong@gmail.com/
Changes from v3 -> v4:
- Fix wording on some comments to make it more clear
- Use simpler logic for timer (eg remove extra if checks, use mod timer API)
  (Josef)
- Sanity-check should be on FR_FINISHING not FR_FINISHED (Jingbo)
- Fix comment for "processing queue", add req->fpq = NULL safeguard  (Bernd)

v2:
https://lore.kernel.org/linux-fsdevel/20240730002348.3431931-1-joannelkoong@gmail.com/
Changes from v2 -> v3:
- Disarm / rearm timer in dev_do_read to handle race conditions (Bernrd)
- Disarm timer in error handling for fatal interrupt (Yafang)
- Clean up do_fuse_request_end (Jingbo)
- Add timer for notify retrieve requests 
- Fix kernel test robot errors for #define no-op functions

v1:
https://lore.kernel.org/linux-fsdevel/20240717213458.1613347-1-joannelkoong@gmail.com/
Changes from v1 -> v2:
- Add timeout for background requests
- Handle resend race condition
- Add sysctls


Joanne Koong (2):
  fuse: add optional kernel-enforced timeout for requests
  fuse: add default_request_timeout and max_request_timeout sysctls

 Documentation/admin-guide/sysctl/fs.rst | 31 ++++++++++++++++++
 fs/fuse/Makefile                        |  2 +-
 fs/fuse/dev.c                           | 26 ++++++++++++++-
 fs/fuse/fuse_i.h                        | 24 ++++++++++++++
 fs/fuse/inode.c                         | 24 ++++++++++++++
 fs/fuse/sysctl.c                        | 42 +++++++++++++++++++++++++
 6 files changed, 147 insertions(+), 2 deletions(-)
 create mode 100644 fs/fuse/sysctl.c

Comments

Miklos Szeredi Aug. 27, 2024, 6:49 a.m. UTC | #1
On Mon, 26 Aug 2024 at 22:33, Joanne Koong <joannelkoong@gmail.com> wrote:
>
> There are situations where fuse servers can become unresponsive or
> stuck, for example if the server is in a deadlock. Currently, there's
> no good way to detect if a server is stuck and needs to be killed
> manually.
>
> This patchset adds a timeout option for requests where if the server does not
> reply to the request by the time the timeout elapses, the connection will be
> aborted. This patchset also adds two dynamically configurable fuse sysctls
> "default_request_timeout" and "max_request_timeout" for controlling/enforcing
> timeout behavior system-wide.
>
> Existing fuse servers will not be affected unless they explicitly opt into the
> timeout.

This last paragraph seems to contradict the purpose of the system-wide
maximum timeout.  If the server can opt out of timeouts, then
enforcing a maximum timeout is pointless.

Am I missing something?

Thanks,
Miklos
Joanne Koong Aug. 27, 2024, 5:24 p.m. UTC | #2
On Mon, Aug 26, 2024 at 11:49 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Mon, 26 Aug 2024 at 22:33, Joanne Koong <joannelkoong@gmail.com> wrote:
> >
> > There are situations where fuse servers can become unresponsive or
> > stuck, for example if the server is in a deadlock. Currently, there's
> > no good way to detect if a server is stuck and needs to be killed
> > manually.
> >
> > This patchset adds a timeout option for requests where if the server does not
> > reply to the request by the time the timeout elapses, the connection will be
> > aborted. This patchset also adds two dynamically configurable fuse sysctls
> > "default_request_timeout" and "max_request_timeout" for controlling/enforcing
> > timeout behavior system-wide.
> >
> > Existing fuse servers will not be affected unless they explicitly opt into the
> > timeout.
>
> This last paragraph seems to contradict the purpose of the system-wide
> maximum timeout.  If the server can opt out of timeouts, then
> enforcing a maximum timeout is pointless.
>
> Am I missing something?

Ah by that last paragraph, my intention was to convey that for
existing systems that run fuse servers, pre-existing behavior will
stay as is and they don't have to worry about any functional changes
from this patchset if they don't explicitly opt into it. I'll change
this wording to "Existing systems running fuse servers will not be
affected unless they explicitly opt into the timeout".


Thanks,
Joanne
>
> Thanks,
> Miklos