diff mbox series

fuse: add optional kernel-enforced timeout for fuse requests

Message ID 20240717213458.1613347-1-joannelkoong@gmail.com (mailing list archive)
State New
Headers show
Series fuse: add optional kernel-enforced timeout for fuse requests | expand

Commit Message

Joanne Koong July 17, 2024, 9:34 p.m. UTC
There are situations where fuse servers can become unresponsive or take
too long to reply to a request. Currently there is no upper bound on
how long a request may take, which may be frustrating to users who get
stuck waiting for a request to complete.

This commit adds a daemon timeout option (in seconds) for fuse requests.
If the timeout elapses before the request is replied to, the request will
fail with -ETIME.

There are 3 possibilities for a request that times out:
a) The request times out before the request has been sent to userspace
b) The request times out after the request has been sent to userspace
and before it receives a reply from the server
c) The request times out after the request has been sent to userspace
and the server replies while the kernel is timing out the request

Proper synchronization must be added to ensure that the request is
handled correctly in all of these cases. To this effect, there is a new
FR_PROCESSING bit added to the request flags, which is set atomically by
either the timeout handler (see fuse_request_timeout()) which is invoked
after the request timeout elapses or set by the request reply handler
(see dev_do_write()), whichever gets there first.

If the reply handler and the timeout handler are executing simultaneously
and the reply handler sets FR_PROCESSING before the timeout handler, then
the request is re-queued onto the waitqueue and the kernel will process the
reply as though the timeout did not elapse. If the timeout handler sets
FR_PROCESSING before the reply handler, then the request will fail with
-ETIME and the request will be cleaned up.

Proper acquires on the request reference must be added to ensure that the
timeout handler does not drop the last refcount on the request while the
reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
still accessing the request. (By "forwarder handler", this is the handler
that forwards the request to userspace).

Currently, this is the lifecycle of the request refcount:

Request is created:
fuse_simple_request -> allocates request, sets refcount to 1
  __fuse_request_send -> acquires refcount
    queues request and waits for reply...
fuse_simple_request -> drops refcount

Request is freed:
fuse_dev_do_write
  fuse_request_end -> drops refcount on request

The timeout handler drops the refcount on the request so that the
request is properly cleaned up if a reply is never received. Because of
this, both the forwarder handler and the reply handler must acquire a refcount
on the request while it accesses the request, and the refcount must be
acquired while the lock of the list the request is on is held.

There is a potential race if the request is being forwarded to
userspace while the timeout handler is executing (eg FR_PENDING has
already been cleared but dev_do_read() hasn't finished executing). This
is a problem because this would free the request but the request has not
been removed from the fpq list it's on. To prevent this, dev_do_read()
must check FR_PROCESSING at the end of its logic and remove the request
from the fpq list if the timeout occurred.

There is also the case where the connection may be aborted or the
device may be released while the timeout handler is running. To protect
against an extra refcount drop on the request, the timeout handler
checks the connected state of the list and lets the abort handler drop the
last reference if the abort is running simultaneously. Similarly, the
timeout handler also needs to check if the req->out.h.error is set to
-ESTALE, which indicates that the device release is cleaning up the
request. In both these cases, the timeout handler will return without
dropping the refcount.

Please also note that background requests are not applicable for timeouts
since they are asynchronous.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 fs/fuse/dev.c    | 177 ++++++++++++++++++++++++++++++++++++++++++++---
 fs/fuse/fuse_i.h |  12 ++++
 fs/fuse/inode.c  |   7 ++
 3 files changed, 188 insertions(+), 8 deletions(-)

Comments

Joanne Koong July 17, 2024, 10:02 p.m. UTC | #1
On Wed, Jul 17, 2024 at 2:35 PM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> There are situations where fuse servers can become unresponsive or take
> too long to reply to a request. Currently there is no upper bound on
> how long a request may take, which may be frustrating to users who get
> stuck waiting for a request to complete.
>
> This commit adds a daemon timeout option (in seconds) for fuse requests.
> If the timeout elapses before the request is replied to, the request will
> fail with -ETIME.

For reference, this is the userspace program I've been using to test
the timeout paths:
https://github.com/libfuse/libfuse/pull/994/commits/bb523c2db3417adfd939baef2345a485ad3f8c52
To locally simulate some of the racier conditions (eg request handler
handles the request and timeout handler re-adds the request to the
waitqueue), I manually added in sleeps in the kernel handlers to hit
those paths

> --
> 2.43.0
>
Bernd Schubert July 17, 2024, 10:23 p.m. UTC | #2
Hi Joanne,

On 7/17/24 23:34, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or take
> too long to reply to a request. Currently there is no upper bound on
> how long a request may take, which may be frustrating to users who get
> stuck waiting for a request to complete.
> 
> This commit adds a daemon timeout option (in seconds) for fuse requests.
> If the timeout elapses before the request is replied to, the request will
> fail with -ETIME.
> 
> There are 3 possibilities for a request that times out:
> a) The request times out before the request has been sent to userspace
> b) The request times out after the request has been sent to userspace
> and before it receives a reply from the server
> c) The request times out after the request has been sent to userspace
> and the server replies while the kernel is timing out the request
> 
> Proper synchronization must be added to ensure that the request is
> handled correctly in all of these cases. To this effect, there is a new
> FR_PROCESSING bit added to the request flags, which is set atomically by
> either the timeout handler (see fuse_request_timeout()) which is invoked
> after the request timeout elapses or set by the request reply handler
> (see dev_do_write()), whichever gets there first.
> 
> If the reply handler and the timeout handler are executing simultaneously
> and the reply handler sets FR_PROCESSING before the timeout handler, then
> the request is re-queued onto the waitqueue and the kernel will process the
> reply as though the timeout did not elapse. If the timeout handler sets
> FR_PROCESSING before the reply handler, then the request will fail with
> -ETIME and the request will be cleaned up.
> 
> Proper acquires on the request reference must be added to ensure that the
> timeout handler does not drop the last refcount on the request while the
> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
> still accessing the request. (By "forwarder handler", this is the handler
> that forwards the request to userspace).
> 
> Currently, this is the lifecycle of the request refcount:
> 
> Request is created:
> fuse_simple_request -> allocates request, sets refcount to 1
>   __fuse_request_send -> acquires refcount
>     queues request and waits for reply...
> fuse_simple_request -> drops refcount
> 
> Request is freed:
> fuse_dev_do_write
>   fuse_request_end -> drops refcount on request
> 
> The timeout handler drops the refcount on the request so that the
> request is properly cleaned up if a reply is never received. Because of
> this, both the forwarder handler and the reply handler must acquire a refcount
> on the request while it accesses the request, and the refcount must be
> acquired while the lock of the list the request is on is held.
> 
> There is a potential race if the request is being forwarded to
> userspace while the timeout handler is executing (eg FR_PENDING has
> already been cleared but dev_do_read() hasn't finished executing). This
> is a problem because this would free the request but the request has not
> been removed from the fpq list it's on. To prevent this, dev_do_read()
> must check FR_PROCESSING at the end of its logic and remove the request
> from the fpq list if the timeout occurred.
> 
> There is also the case where the connection may be aborted or the
> device may be released while the timeout handler is running. To protect
> against an extra refcount drop on the request, the timeout handler
> checks the connected state of the list and lets the abort handler drop the
> last reference if the abort is running simultaneously. Similarly, the
> timeout handler also needs to check if the req->out.h.error is set to
> -ESTALE, which indicates that the device release is cleaning up the
> request. In both these cases, the timeout handler will return without
> dropping the refcount.
> 
> Please also note that background requests are not applicable for timeouts
> since they are asynchronous.


This and that thread here actually make me wonder if this is the right
approach

https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/


In  th3 thread above a request got interrupted, but fuse-server still
does not manage stop it. From my point of view, interrupting a request
suggests to add a rather short kernel lifetime for it. With that one
either needs to wake up in intervals and check if request timeout got
exceeded or it needs to be an async kernel thread. I think that async
thread would also allow to give a timeout to background requests.

Or we add an async timeout to bg and interupted requests additionally?


(I only basically reviewed, can't do carefully right now - on vacation
and as I just noticed, on a laptop that gives me electric shocks when I
connect it to power.)


Thanks,
Bernd



> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  fs/fuse/dev.c    | 177 ++++++++++++++++++++++++++++++++++++++++++++---
>  fs/fuse/fuse_i.h |  12 ++++
>  fs/fuse/inode.c  |   7 ++
>  3 files changed, 188 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 9eb191b5c4de..7dd7b244951b 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req)
>  }
>  EXPORT_SYMBOL_GPL(fuse_request_end);
>  
> +/* fuse_request_end for requests that timeout */
> +static void fuse_request_timeout(struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	struct fuse_iqueue *fiq = &fc->iq;
> +	struct fuse_pqueue *fpq;
> +
> +	spin_lock(&fiq->lock);
> +	if (!fiq->connected) {
> +		spin_unlock(&fiq->lock);
> +		/*
> +		 * Connection is being aborted. The abort will release
> +		 * the refcount on the request
> +		 */
> +		req->out.h.error = -ECONNABORTED;
> +		return;
> +	}
> +	if (test_bit(FR_PENDING, &req->flags)) {
> +		/* Request is not yet in userspace, bail out */
> +		list_del(&req->list);
> +		spin_unlock(&fiq->lock);
> +		req->out.h.error = -ETIME;
> +		__fuse_put_request(req);
> +		return;
> +	}
> +	if (test_bit(FR_INTERRUPTED, &req->flags))
> +		list_del_init(&req->intr_entry);
> +
> +	fpq = req->fpq;
> +	spin_unlock(&fiq->lock);
> +
> +	if (fpq) {
> +		spin_lock(&fpq->lock);
> +		if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) {
> +			spin_unlock(&fpq->lock);
> +			/*
> +			 * Connection is being aborted. The abort will release
> +			 * the refcount on the request
> +			 */
> +			req->out.h.error = -ECONNABORTED;
> +			return;
> +		}
> +		if (req->out.h.error == -ESTALE) {
> +			/*
> +			 * Device is being released. The fuse_dev_release call
> +			 * will drop the refcount on the request
> +			 */
> +			spin_unlock(&fpq->lock);
> +			return;
> +		}
> +		if (!test_bit(FR_PRIVATE, &req->flags))
> +			list_del_init(&req->list);
> +		spin_unlock(&fpq->lock);
> +	}
> +
> +	req->out.h.error = -ETIME;
> +
> +	if (test_bit(FR_ASYNC, &req->flags))
> +		req->args->end(req->fm, req->args, req->out.h.error);
> +
> +	fuse_put_request(req);
> +}
> +
>  static int queue_interrupt(struct fuse_req *req)
>  {
>  	struct fuse_iqueue *fiq = &req->fm->fc->iq;
> @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req)
>  	return 0;
>  }
>  
> +enum wait_type {
> +	WAIT_TYPE_INTERRUPTIBLE,
> +	WAIT_TYPE_KILLABLE,
> +	WAIT_TYPE_NONINTERRUPTIBLE,
> +};
> +
> +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +
> +	return wait_event_interruptible_timeout(req->waitq,
> +						test_bit(FR_FINISHED,
> +							 &req->flags),
> +						fc->daemon_timeout);
> +}
> +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO);
> +
> +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	int err;
> +
> +wait_answer_start:
> +	if (type == WAIT_TYPE_INTERRUPTIBLE)
> +		err = fuse_wait_event_interruptible_timeout(req);
> +	else if (type == WAIT_TYPE_KILLABLE)
> +		err = wait_event_killable_timeout(req->waitq,
> +						  test_bit(FR_FINISHED, &req->flags),
> +						  fc->daemon_timeout);
> +
> +	else if (type == WAIT_TYPE_NONINTERRUPTIBLE)
> +		err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags),
> +					 fc->daemon_timeout);
> +	else
> +		WARN_ON(1);
> +
> +	/* request was answered */
> +	if (err > 0)
> +		return 0;
> +
> +	/* request was not answered in time */
> +	if (err == 0) {
> +		if (test_and_set_bit(FR_PROCESSING, &req->flags))
> +			/* request reply is being processed by kernel right now.
> +			 * we should wait for the answer.
> +			 */
> +			goto wait_answer_start;
> +
> +		fuse_request_timeout(req);
> +		return 0;
> +	}
> +
> +	/* else request was interrupted */
> +	return err;
> +}
> +
>  static void request_wait_answer(struct fuse_req *req)
>  {
>  	struct fuse_conn *fc = req->fm->fc;
> @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req)
>  
>  	if (!fc->no_interrupt) {
>  		/* Any signal may interrupt this */
> -		err = wait_event_interruptible(req->waitq,
> -					test_bit(FR_FINISHED, &req->flags));
> +		if (fc->daemon_timeout)
> +			err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE);
> +		else
> +			err = wait_event_interruptible(req->waitq,
> +						       test_bit(FR_FINISHED, &req->flags));
>  		if (!err)
>  			return;
>  
> @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req)
>  
>  	if (!test_bit(FR_FORCE, &req->flags)) {
>  		/* Only fatal signals may interrupt this */
> -		err = wait_event_killable(req->waitq,
> -					test_bit(FR_FINISHED, &req->flags));
> +		if (fc->daemon_timeout)
> +			err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE);
> +		else
> +			err = wait_event_killable(req->waitq,
> +						  test_bit(FR_FINISHED, &req->flags));
>  		if (!err)
>  			return;
>  
> @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req)
>  	 * Either request is already in userspace, or it was forced.
>  	 * Wait it out.
>  	 */
> -	wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> +	if (fc->daemon_timeout)
> +		wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE);
> +	else
> +		wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
>  }
>  
>  static void __fuse_request_send(struct fuse_req *req)
> @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  	req = list_entry(fiq->pending.next, struct fuse_req, list);
>  	clear_bit(FR_PENDING, &req->flags);
>  	list_del_init(&req->list);
> +	/* Acquire a reference since fuse_request_timeout may also be executing  */
> +	__fuse_get_request(req);
> +	req->fpq = fpq;
>  	spin_unlock(&fiq->lock);
>  
>  	args = req->args;
> @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  		if (args->opcode == FUSE_SETXATTR)
>  			req->out.h.error = -E2BIG;
>  		fuse_request_end(req);
> +		fuse_put_request(req);
>  		goto restart;
>  	}
>  	spin_lock(&fpq->lock);
> @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  	}
>  	hash = fuse_req_hash(req->in.h.unique);
>  	list_move_tail(&req->list, &fpq->processing[hash]);
> -	__fuse_get_request(req);
>  	set_bit(FR_SENT, &req->flags);
>  	spin_unlock(&fpq->lock);
>  	/* matches barrier in request_wait_answer() */
>  	smp_mb__after_atomic();
>  	if (test_bit(FR_INTERRUPTED, &req->flags))
>  		queue_interrupt(req);
> +
> +	/* Check if request timed out */
> +	if (test_bit(FR_PROCESSING, &req->flags)) {
> +		spin_lock(&fpq->lock);
> +		if (!test_bit(FR_PRIVATE, &req->flags))
> +			list_del_init(&req->list);
> +		spin_unlock(&fpq->lock);
> +		fuse_put_request(req);
> +		return -ETIME;
> +	}
> +
>  	fuse_put_request(req);
>  
>  	return reqsize;
> @@ -1332,6 +1474,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  		list_del_init(&req->list);
>  	spin_unlock(&fpq->lock);
>  	fuse_request_end(req);
> +	fuse_put_request(req);
>  	return err;
>  
>   err_unlock:
> @@ -1951,9 +2094,10 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
>  		goto copy_finish;
>  	}
>  
> +	__fuse_get_request(req);
> +
>  	/* Is it an interrupt reply ID? */
>  	if (oh.unique & FUSE_INT_REQ_BIT) {
> -		__fuse_get_request(req);
>  		spin_unlock(&fpq->lock);
>  
>  		err = 0;
> @@ -1969,6 +2113,13 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
>  		goto copy_finish;
>  	}
>  
> +	if (test_and_set_bit(FR_PROCESSING, &req->flags)) {
> +		/* request has timed out already */
> +		spin_unlock(&fpq->lock);
> +		fuse_put_request(req);
> +		goto copy_finish;
> +	}
> +
>  	clear_bit(FR_SENT, &req->flags);
>  	list_move(&req->list, &fpq->io);
>  	req->out.h = oh;
> @@ -1995,6 +2146,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
>  	spin_unlock(&fpq->lock);
>  
>  	fuse_request_end(req);
> +	fuse_put_request(req);
>  out:
>  	return err ? err : nbytes;
>  
> @@ -2260,13 +2412,22 @@ int fuse_dev_release(struct inode *inode, struct file *file)
>  	if (fud) {
>  		struct fuse_conn *fc = fud->fc;
>  		struct fuse_pqueue *fpq = &fud->pq;
> +		struct fuse_req *req;
>  		LIST_HEAD(to_end);
>  		unsigned int i;
>  
>  		spin_lock(&fpq->lock);
>  		WARN_ON(!list_empty(&fpq->io));
> -		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++)
> +		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
> +			/*
> +			 * Set the req error to -ESTALE so that if the timeout
> +			 * handler tries handling it, it knows it's being
> +			 * released
> +			 */
> +			list_for_each_entry(req, &fpq->processing[i], list)
> +				req->out.h.error = -ESTALE;
>  			list_splice_init(&fpq->processing[i], &to_end);
> +		}
>  		spin_unlock(&fpq->lock);
>  
>  		end_requests(&to_end);
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index f23919610313..cbabebbcd5bd 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -375,6 +375,9 @@ struct fuse_io_priv {
>   * FR_FINISHED:		request is finished
>   * FR_PRIVATE:		request is on private list
>   * FR_ASYNC:		request is asynchronous
> + * FR_PROCESSING:	request is being processed. this gets set when either
> + *			the reply is getting processed or the kernel is processing
> + *			a request timeout
>   */
>  enum fuse_req_flag {
>  	FR_ISREPLY,
> @@ -389,6 +392,7 @@ enum fuse_req_flag {
>  	FR_FINISHED,
>  	FR_PRIVATE,
>  	FR_ASYNC,
> +	FR_PROCESSING,
>  };
>  
>  /**
> @@ -435,6 +439,9 @@ struct fuse_req {
>  
>  	/** fuse_mount this request belongs to */
>  	struct fuse_mount *fm;
> +
> +	/** page queue this request has been added to */
> +	struct fuse_pqueue *fpq;
>  };
>  
>  struct fuse_iqueue;
> @@ -574,6 +581,8 @@ struct fuse_fs_context {
>  	enum fuse_dax_mode dax_mode;
>  	unsigned int max_read;
>  	unsigned int blksize;
> +	/*  Daemon timeout (in seconds). 0 = no timeout (infinite wait) */
> +	unsigned int daemon_timeout;
>  	const char *subtype;
>  
>  	/* DAX device, may be NULL */
> @@ -633,6 +642,9 @@ struct fuse_conn {
>  	/** Constrain ->max_pages to this value during feature negotiation */
>  	unsigned int max_pages_limit;
>  
> +	/* Daemon timeout (in jiffies). 0 = no timeout (infinite wait) */
> +	unsigned long daemon_timeout;
> +
>  	/** Input queue */
>  	struct fuse_iqueue iq;
>  
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 99e44ea7d875..a2d53a8b8e34 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -733,6 +733,7 @@ enum {
>  	OPT_ALLOW_OTHER,
>  	OPT_MAX_READ,
>  	OPT_BLKSIZE,
> +	OPT_DAEMON_TIMEOUT,
>  	OPT_ERR
>  };
>  
> @@ -747,6 +748,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = {
>  	fsparam_u32	("max_read",		OPT_MAX_READ),
>  	fsparam_u32	("blksize",		OPT_BLKSIZE),
>  	fsparam_string	("subtype",		OPT_SUBTYPE),
> +	fsparam_u32	("daemon_timeout",	OPT_DAEMON_TIMEOUT),
>  	{}
>  };
>  
> @@ -830,6 +832,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
>  		ctx->blksize = result.uint_32;
>  		break;
>  
> +	case OPT_DAEMON_TIMEOUT:
> +		ctx->daemon_timeout = result.uint_32;
> +		break;
> +
>  	default:
>  		return -EINVAL;
>  	}
> @@ -1724,6 +1730,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
>  	fc->group_id = ctx->group_id;
>  	fc->legacy_opts_show = ctx->legacy_opts_show;
>  	fc->max_read = max_t(unsigned int, 4096, ctx->max_read);
> +	fc->daemon_timeout = ctx->daemon_timeout * HZ;
>  	fc->destroy = ctx->destroy;
>  	fc->no_control = ctx->no_control;
>  	fc->no_force_umount = ctx->no_force_umount;
Joanne Koong July 18, 2024, 5:24 a.m. UTC | #3
On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert
<bernd.schubert@fastmail.fm> wrote:
>
> Hi Joanne,
>
> On 7/17/24 23:34, Joanne Koong wrote:
> > There are situations where fuse servers can become unresponsive or take
> > too long to reply to a request. Currently there is no upper bound on
> > how long a request may take, which may be frustrating to users who get
> > stuck waiting for a request to complete.
> >
> > This commit adds a daemon timeout option (in seconds) for fuse requests.
> > If the timeout elapses before the request is replied to, the request will
> > fail with -ETIME.
> >
> > There are 3 possibilities for a request that times out:
> > a) The request times out before the request has been sent to userspace
> > b) The request times out after the request has been sent to userspace
> > and before it receives a reply from the server
> > c) The request times out after the request has been sent to userspace
> > and the server replies while the kernel is timing out the request
> >
> > Proper synchronization must be added to ensure that the request is
> > handled correctly in all of these cases. To this effect, there is a new
> > FR_PROCESSING bit added to the request flags, which is set atomically by
> > either the timeout handler (see fuse_request_timeout()) which is invoked
> > after the request timeout elapses or set by the request reply handler
> > (see dev_do_write()), whichever gets there first.
> >
> > If the reply handler and the timeout handler are executing simultaneously
> > and the reply handler sets FR_PROCESSING before the timeout handler, then
> > the request is re-queued onto the waitqueue and the kernel will process the
> > reply as though the timeout did not elapse. If the timeout handler sets
> > FR_PROCESSING before the reply handler, then the request will fail with
> > -ETIME and the request will be cleaned up.
> >
> > Proper acquires on the request reference must be added to ensure that the
> > timeout handler does not drop the last refcount on the request while the
> > reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
> > still accessing the request. (By "forwarder handler", this is the handler
> > that forwards the request to userspace).
> >
> > Currently, this is the lifecycle of the request refcount:
> >
> > Request is created:
> > fuse_simple_request -> allocates request, sets refcount to 1
> >   __fuse_request_send -> acquires refcount
> >     queues request and waits for reply...
> > fuse_simple_request -> drops refcount
> >
> > Request is freed:
> > fuse_dev_do_write
> >   fuse_request_end -> drops refcount on request
> >
> > The timeout handler drops the refcount on the request so that the
> > request is properly cleaned up if a reply is never received. Because of
> > this, both the forwarder handler and the reply handler must acquire a refcount
> > on the request while it accesses the request, and the refcount must be
> > acquired while the lock of the list the request is on is held.
> >
> > There is a potential race if the request is being forwarded to
> > userspace while the timeout handler is executing (eg FR_PENDING has
> > already been cleared but dev_do_read() hasn't finished executing). This
> > is a problem because this would free the request but the request has not
> > been removed from the fpq list it's on. To prevent this, dev_do_read()
> > must check FR_PROCESSING at the end of its logic and remove the request
> > from the fpq list if the timeout occurred.
> >
> > There is also the case where the connection may be aborted or the
> > device may be released while the timeout handler is running. To protect
> > against an extra refcount drop on the request, the timeout handler
> > checks the connected state of the list and lets the abort handler drop the
> > last reference if the abort is running simultaneously. Similarly, the
> > timeout handler also needs to check if the req->out.h.error is set to
> > -ESTALE, which indicates that the device release is cleaning up the
> > request. In both these cases, the timeout handler will return without
> > dropping the refcount.
> >
> > Please also note that background requests are not applicable for timeouts
> > since they are asynchronous.
>
>
> This and that thread here actually make me wonder if this is the right
> approach
>
> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/
>
>
> In  th3 thread above a request got interrupted, but fuse-server still
> does not manage stop it. From my point of view, interrupting a request
> suggests to add a rather short kernel lifetime for it. With that one

Hi Bernd,

I believe this solution fixes the problem outlined in that thread
(namely, that the process gets stuck waiting for a reply). If the
request is interrupted before it times out, the kernel will wait with
a timeout again on the request (timeout would start over, but the
request will still eventually sooner or later time out). I'm not sure
I agree that we want to cancel the request altogether if it's
interrupted. For example, if the user uses the user-defined signal
SIGUSR1, it can be unexpected and arbitrary behavior for the request
to be aborted by the kernel. I also don't think this can be consistent
for what the fuse server will see since some requests may have already
been forwarded to userspace when the request is aborted and some
requests may not have.

I think if we were to enforce that the request should be aborted when
it's interrupted regardless of whether a timeout is specified or not,
then we should do it similarly to how the timeout handler logic
handles it in this patch,rather than the implementation in the thread
linked above (namely, that the request should be explicitly cleaned up
immediately instead of when the interrupt request sends a reply); I
don't believe the implementation in the link handles the case where
for example the fuse server is in a deadlock and does not reply to the
interrupt request. Also, as I understand it, it is optional for
servers to reply or not to the interrupt request.

> either needs to wake up in intervals and check if request timeout got
> exceeded or it needs to be an async kernel thread. I think that async
> thread would also allow to give a timeout to background requests.

in my opinion, background requests do not need timeouts. As I
understand it, background requests are used only for direct i/o async
read/writes, writing back dirty pages,and readahead requests generated
by the kernel. I don't think fuse servers would have a need for timing
out background requests.

>
> Or we add an async timeout to bg and interupted requests additionally?

The interrupted request will already have a timeout on it since it
waits with a timeout again for the reply after it's interrupted.

>
>
> (I only basically reviewed, can't do carefully right now - on vacation
> and as I just noticed, on a laptop that gives me electric shocks when I
> connect it to power.)

No worries, thanks for your comments and hope you have a great
vacation (without getting shocked :))!


Thanks,
Joanne
>
>
> Thanks,
> Bernd
>
>

>
Josef Bacik July 18, 2024, 2:44 p.m. UTC | #4
On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote:
> There are situations where fuse servers can become unresponsive or take
> too long to reply to a request. Currently there is no upper bound on
> how long a request may take, which may be frustrating to users who get
> stuck waiting for a request to complete.
> 
> This commit adds a daemon timeout option (in seconds) for fuse requests.
> If the timeout elapses before the request is replied to, the request will
> fail with -ETIME.
> 
> There are 3 possibilities for a request that times out:
> a) The request times out before the request has been sent to userspace
> b) The request times out after the request has been sent to userspace
> and before it receives a reply from the server
> c) The request times out after the request has been sent to userspace
> and the server replies while the kernel is timing out the request
> 
> Proper synchronization must be added to ensure that the request is
> handled correctly in all of these cases. To this effect, there is a new
> FR_PROCESSING bit added to the request flags, which is set atomically by
> either the timeout handler (see fuse_request_timeout()) which is invoked
> after the request timeout elapses or set by the request reply handler
> (see dev_do_write()), whichever gets there first.
> 
> If the reply handler and the timeout handler are executing simultaneously
> and the reply handler sets FR_PROCESSING before the timeout handler, then
> the request is re-queued onto the waitqueue and the kernel will process the
> reply as though the timeout did not elapse. If the timeout handler sets
> FR_PROCESSING before the reply handler, then the request will fail with
> -ETIME and the request will be cleaned up.
> 
> Proper acquires on the request reference must be added to ensure that the
> timeout handler does not drop the last refcount on the request while the
> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
> still accessing the request. (By "forwarder handler", this is the handler
> that forwards the request to userspace).
> 
> Currently, this is the lifecycle of the request refcount:
> 
> Request is created:
> fuse_simple_request -> allocates request, sets refcount to 1
>   __fuse_request_send -> acquires refcount
>     queues request and waits for reply...
> fuse_simple_request -> drops refcount
> 
> Request is freed:
> fuse_dev_do_write
>   fuse_request_end -> drops refcount on request
> 
> The timeout handler drops the refcount on the request so that the
> request is properly cleaned up if a reply is never received. Because of
> this, both the forwarder handler and the reply handler must acquire a refcount
> on the request while it accesses the request, and the refcount must be
> acquired while the lock of the list the request is on is held.
> 
> There is a potential race if the request is being forwarded to
> userspace while the timeout handler is executing (eg FR_PENDING has
> already been cleared but dev_do_read() hasn't finished executing). This
> is a problem because this would free the request but the request has not
> been removed from the fpq list it's on. To prevent this, dev_do_read()
> must check FR_PROCESSING at the end of its logic and remove the request
> from the fpq list if the timeout occurred.
> 
> There is also the case where the connection may be aborted or the
> device may be released while the timeout handler is running. To protect
> against an extra refcount drop on the request, the timeout handler
> checks the connected state of the list and lets the abort handler drop the
> last reference if the abort is running simultaneously. Similarly, the
> timeout handler also needs to check if the req->out.h.error is set to
> -ESTALE, which indicates that the device release is cleaning up the
> request. In both these cases, the timeout handler will return without
> dropping the refcount.
> 
> Please also note that background requests are not applicable for timeouts
> since they are asynchronous.
> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  fs/fuse/dev.c    | 177 ++++++++++++++++++++++++++++++++++++++++++++---
>  fs/fuse/fuse_i.h |  12 ++++
>  fs/fuse/inode.c  |   7 ++
>  3 files changed, 188 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> index 9eb191b5c4de..7dd7b244951b 100644
> --- a/fs/fuse/dev.c
> +++ b/fs/fuse/dev.c
> @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req)
>  }
>  EXPORT_SYMBOL_GPL(fuse_request_end);
>  
> +/* fuse_request_end for requests that timeout */
> +static void fuse_request_timeout(struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	struct fuse_iqueue *fiq = &fc->iq;
> +	struct fuse_pqueue *fpq;
> +
> +	spin_lock(&fiq->lock);
> +	if (!fiq->connected) {
> +		spin_unlock(&fiq->lock);
> +		/*
> +		 * Connection is being aborted. The abort will release
> +		 * the refcount on the request
> +		 */
> +		req->out.h.error = -ECONNABORTED;
> +		return;
> +	}
> +	if (test_bit(FR_PENDING, &req->flags)) {
> +		/* Request is not yet in userspace, bail out */
> +		list_del(&req->list);
> +		spin_unlock(&fiq->lock);
> +		req->out.h.error = -ETIME;
> +		__fuse_put_request(req);

Why is this safe?  We could be the last holder of the reference on this request
correct?  The only places using __fuse_put_request() would be where we are in a
path where the caller already holds a reference on the request.  Since this is
async it may not be the case right?

If it is safe then it's just confusing and warrants a comment.

> +		return;
> +	}
> +	if (test_bit(FR_INTERRUPTED, &req->flags))
> +		list_del_init(&req->intr_entry);
> +
> +	fpq = req->fpq;
> +	spin_unlock(&fiq->lock);
> +
> +	if (fpq) {
> +		spin_lock(&fpq->lock);
> +		if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) {
                                       ^^

You don't need the extra () there.

> +			spin_unlock(&fpq->lock);
> +			/*
> +			 * Connection is being aborted. The abort will release
> +			 * the refcount on the request
> +			 */
> +			req->out.h.error = -ECONNABORTED;
> +			return;
> +		}
> +		if (req->out.h.error == -ESTALE) {
> +			/*
> +			 * Device is being released. The fuse_dev_release call
> +			 * will drop the refcount on the request
> +			 */
> +			spin_unlock(&fpq->lock);
> +			return;
> +		}
> +		if (!test_bit(FR_PRIVATE, &req->flags))
> +			list_del_init(&req->list);
> +		spin_unlock(&fpq->lock);
> +	}
> +
> +	req->out.h.error = -ETIME;
> +
> +	if (test_bit(FR_ASYNC, &req->flags))
> +		req->args->end(req->fm, req->args, req->out.h.error);
> +
> +	fuse_put_request(req);
> +}

Just a general styling thing, we have two different states for requests here,
PENDING and !PENDING correct?  I think it may be better to do something like

if (test_bit(FR_PENDING, &req->flags))
	timeout_pending_req();
else
	timeout_inflight_req();

and then in timeout_pending_req() you do

spin_lock(&fiq->lock);
if (!test_bit(FR_PENDING, &req->flags)) {
	spin_unlock(&fiq_lock);
	timeout_inflight_req();
	return;
}

This will keep the two different state cleanup functions separate and a little
cleaner to grok.

> +
>  static int queue_interrupt(struct fuse_req *req)
>  {
>  	struct fuse_iqueue *fiq = &req->fm->fc->iq;
> @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req)
>  	return 0;
>  }
>  
> +enum wait_type {
> +	WAIT_TYPE_INTERRUPTIBLE,
> +	WAIT_TYPE_KILLABLE,
> +	WAIT_TYPE_NONINTERRUPTIBLE,
> +};
> +
> +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +
> +	return wait_event_interruptible_timeout(req->waitq,
> +						test_bit(FR_FINISHED,
> +							 &req->flags),
> +						fc->daemon_timeout);
> +}
> +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO);
> +
> +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type)
> +{
> +	struct fuse_conn *fc = req->fm->fc;
> +	int err;
> +
> +wait_answer_start:
> +	if (type == WAIT_TYPE_INTERRUPTIBLE)
> +		err = fuse_wait_event_interruptible_timeout(req);
> +	else if (type == WAIT_TYPE_KILLABLE)
> +		err = wait_event_killable_timeout(req->waitq,
> +						  test_bit(FR_FINISHED, &req->flags),
> +						  fc->daemon_timeout);
> +
> +	else if (type == WAIT_TYPE_NONINTERRUPTIBLE)
> +		err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags),
> +					 fc->daemon_timeout);
> +	else
> +		WARN_ON(1);

This will leak some random value for err, so initialize err to something that
will be dealt with, like -EINVAL;

> +
> +	/* request was answered */
> +	if (err > 0)
> +		return 0;
> +
> +	/* request was not answered in time */
> +	if (err == 0) {
> +		if (test_and_set_bit(FR_PROCESSING, &req->flags))
> +			/* request reply is being processed by kernel right now.
> +			 * we should wait for the answer.
> +			 */

Format for multiline comments is

/*
 * blah
 * blah
 */

and since this is a 1 line if statement put it above the if statement.

> +			goto wait_answer_start;
> +
> +		fuse_request_timeout(req);
> +		return 0;
> +	}
> +
> +	/* else request was interrupted */
> +	return err;
> +}
> +
>  static void request_wait_answer(struct fuse_req *req)
>  {
>  	struct fuse_conn *fc = req->fm->fc;
> @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req)
>  
>  	if (!fc->no_interrupt) {
>  		/* Any signal may interrupt this */
> -		err = wait_event_interruptible(req->waitq,
> -					test_bit(FR_FINISHED, &req->flags));
> +		if (fc->daemon_timeout)
> +			err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE);
> +		else
> +			err = wait_event_interruptible(req->waitq,
> +						       test_bit(FR_FINISHED, &req->flags));
>  		if (!err)
>  			return;
>  
> @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req)
>  
>  	if (!test_bit(FR_FORCE, &req->flags)) {
>  		/* Only fatal signals may interrupt this */
> -		err = wait_event_killable(req->waitq,
> -					test_bit(FR_FINISHED, &req->flags));
> +		if (fc->daemon_timeout)
> +			err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE);
> +		else
> +			err = wait_event_killable(req->waitq,
> +						  test_bit(FR_FINISHED, &req->flags));
>  		if (!err)
>  			return;
>  
> @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req)
>  	 * Either request is already in userspace, or it was forced.
>  	 * Wait it out.
>  	 */
> -	wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> +	if (fc->daemon_timeout)
> +		wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE);
> +	else
> +		wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
>  }
>  
>  static void __fuse_request_send(struct fuse_req *req)
> @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  	req = list_entry(fiq->pending.next, struct fuse_req, list);
>  	clear_bit(FR_PENDING, &req->flags);
>  	list_del_init(&req->list);
> +	/* Acquire a reference since fuse_request_timeout may also be executing  */
> +	__fuse_get_request(req);
> +	req->fpq = fpq;
>  	spin_unlock(&fiq->lock);
>  

There's a race here with completion.  If we timeout a request right here, we can
end up sending that same request below.

You are going to need to check

test_bit(FR_PROCESSING)

after you take the fpq->lock just below here to make sure you didn't race with
the timeout handler and time the request out already.

>  	args = req->args;
> @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  		if (args->opcode == FUSE_SETXATTR)
>  			req->out.h.error = -E2BIG;
>  		fuse_request_end(req);
> +		fuse_put_request(req);
>  		goto restart;
>  	}
>  	spin_lock(&fpq->lock);
> @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
>  	}
>  	hash = fuse_req_hash(req->in.h.unique);
>  	list_move_tail(&req->list, &fpq->processing[hash]);
> -	__fuse_get_request(req);
>  	set_bit(FR_SENT, &req->flags);
>  	spin_unlock(&fpq->lock);
>  	/* matches barrier in request_wait_answer() */
>  	smp_mb__after_atomic();
>  	if (test_bit(FR_INTERRUPTED, &req->flags))
>  		queue_interrupt(req);
> +
> +	/* Check if request timed out */
> +	if (test_bit(FR_PROCESSING, &req->flags)) {
> +		spin_lock(&fpq->lock);
> +		if (!test_bit(FR_PRIVATE, &req->flags))
> +			list_del_init(&req->list);
> +		spin_unlock(&fpq->lock);
> +		fuse_put_request(req);
> +		return -ETIME;
> +	}

This isn't quite right, we could have FR_PROCESSING set because we completed the
request before we got here.  If you put a schedule_timeout(HZ); right above this
you could easily see where a request gets completed by userspace, but now you've
fimed it out.

Additionally if we have FR_PROCESSING set from the timeout, shouldn't this
cleanup have been done already?  I don't understand why we need to handle this
here, we should just return and whoever is waiting on the request will get the
error.

Thanks,

Josef
Joanne Koong July 18, 2024, 4:58 p.m. UTC | #5
On Thu, Jul 18, 2024 at 7:44 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote:
...
> > ---
> >  fs/fuse/dev.c    | 177 ++++++++++++++++++++++++++++++++++++++++++++---
> >  fs/fuse/fuse_i.h |  12 ++++
> >  fs/fuse/inode.c  |   7 ++
> >  3 files changed, 188 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > index 9eb191b5c4de..7dd7b244951b 100644
> > --- a/fs/fuse/dev.c
> > +++ b/fs/fuse/dev.c
> > @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req)
> >  }
> >  EXPORT_SYMBOL_GPL(fuse_request_end);
> >
> > +/* fuse_request_end for requests that timeout */
> > +static void fuse_request_timeout(struct fuse_req *req)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     struct fuse_iqueue *fiq = &fc->iq;
> > +     struct fuse_pqueue *fpq;
> > +
> > +     spin_lock(&fiq->lock);
> > +     if (!fiq->connected) {
> > +             spin_unlock(&fiq->lock);
> > +             /*
> > +              * Connection is being aborted. The abort will release
> > +              * the refcount on the request
> > +              */
> > +             req->out.h.error = -ECONNABORTED;
> > +             return;
> > +     }
> > +     if (test_bit(FR_PENDING, &req->flags)) {
> > +             /* Request is not yet in userspace, bail out */
> > +             list_del(&req->list);
> > +             spin_unlock(&fiq->lock);
> > +             req->out.h.error = -ETIME;
> > +             __fuse_put_request(req);
>
> Why is this safe?  We could be the last holder of the reference on this request
> correct?  The only places using __fuse_put_request() would be where we are in a
> path where the caller already holds a reference on the request.  Since this is
> async it may not be the case right?
>
> If it is safe then it's just confusing and warrants a comment.
>

There is always a refcount still held on the request by
fuse_simple_request() when this is called. I'll add a comment about
this.
I also just noticed that I use fuse_put_request()  at the end of this
function, I'll change that to __fuse_put_request() as well.

> > +             return;
> > +     }
> > +     if (test_bit(FR_INTERRUPTED, &req->flags))
> > +             list_del_init(&req->intr_entry);
> > +
> > +     fpq = req->fpq;
> > +     spin_unlock(&fiq->lock);
> > +
> > +     if (fpq) {
> > +             spin_lock(&fpq->lock);
> > +             if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) {
>                                        ^^
>
> You don't need the extra () there.
>
> > +                     spin_unlock(&fpq->lock);
> > +                     /*
> > +                      * Connection is being aborted. The abort will release
> > +                      * the refcount on the request
> > +                      */
> > +                     req->out.h.error = -ECONNABORTED;
> > +                     return;
> > +             }
> > +             if (req->out.h.error == -ESTALE) {
> > +                     /*
> > +                      * Device is being released. The fuse_dev_release call
> > +                      * will drop the refcount on the request
> > +                      */
> > +                     spin_unlock(&fpq->lock);
> > +                     return;
> > +             }
> > +             if (!test_bit(FR_PRIVATE, &req->flags))
> > +                     list_del_init(&req->list);
> > +             spin_unlock(&fpq->lock);
> > +     }
> > +
> > +     req->out.h.error = -ETIME;
> > +
> > +     if (test_bit(FR_ASYNC, &req->flags))
> > +             req->args->end(req->fm, req->args, req->out.h.error);
> > +
> > +     fuse_put_request(req);
> > +}
>
> Just a general styling thing, we have two different states for requests here,
> PENDING and !PENDING correct?  I think it may be better to do something like
>
> if (test_bit(FR_PENDING, &req->flags))
>         timeout_pending_req();
> else
>         timeout_inflight_req();
>
> and then in timeout_pending_req() you do
>
> spin_lock(&fiq->lock);
> if (!test_bit(FR_PENDING, &req->flags)) {
>         spin_unlock(&fiq_lock);
>         timeout_inflight_req();
>         return;
> }
>
> This will keep the two different state cleanup functions separate and a little
> cleaner to grok.
>
Thanks for the suggestion, I will make this change for v2.
> > +
> >  static int queue_interrupt(struct fuse_req *req)
> >  {
> >       struct fuse_iqueue *fiq = &req->fm->fc->iq;
> > @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req)
> >       return 0;
> >  }
> >
> > +enum wait_type {
> > +     WAIT_TYPE_INTERRUPTIBLE,
> > +     WAIT_TYPE_KILLABLE,
> > +     WAIT_TYPE_NONINTERRUPTIBLE,
> > +};
> > +
> > +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +
> > +     return wait_event_interruptible_timeout(req->waitq,
> > +                                             test_bit(FR_FINISHED,
> > +                                                      &req->flags),
> > +                                             fc->daemon_timeout);
> > +}
> > +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO);
> > +
> > +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type)
> > +{
> > +     struct fuse_conn *fc = req->fm->fc;
> > +     int err;
> > +
> > +wait_answer_start:
> > +     if (type == WAIT_TYPE_INTERRUPTIBLE)
> > +             err = fuse_wait_event_interruptible_timeout(req);
> > +     else if (type == WAIT_TYPE_KILLABLE)
> > +             err = wait_event_killable_timeout(req->waitq,
> > +                                               test_bit(FR_FINISHED, &req->flags),
> > +                                               fc->daemon_timeout);
> > +
> > +     else if (type == WAIT_TYPE_NONINTERRUPTIBLE)
> > +             err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags),
> > +                                      fc->daemon_timeout);
> > +     else
> > +             WARN_ON(1);
>
> This will leak some random value for err, so initialize err to something that
> will be dealt with, like -EINVAL;
>
> > +
> > +     /* request was answered */
> > +     if (err > 0)
> > +             return 0;
> > +
> > +     /* request was not answered in time */
> > +     if (err == 0) {
> > +             if (test_and_set_bit(FR_PROCESSING, &req->flags))
> > +                     /* request reply is being processed by kernel right now.
> > +                      * we should wait for the answer.
> > +                      */
>
> Format for multiline comments is
>
> /*
>  * blah
>  * blah
>  */
>
> and since this is a 1 line if statement put it above the if statement.
>
> > +                     goto wait_answer_start;
> > +
> > +             fuse_request_timeout(req);
> > +             return 0;
> > +     }
> > +
> > +     /* else request was interrupted */
> > +     return err;
> > +}
> > +
> >  static void request_wait_answer(struct fuse_req *req)
> >  {
> >       struct fuse_conn *fc = req->fm->fc;
> > @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req)
> >
> >       if (!fc->no_interrupt) {
> >               /* Any signal may interrupt this */
> > -             err = wait_event_interruptible(req->waitq,
> > -                                     test_bit(FR_FINISHED, &req->flags));
> > +             if (fc->daemon_timeout)
> > +                     err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE);
> > +             else
> > +                     err = wait_event_interruptible(req->waitq,
> > +                                                    test_bit(FR_FINISHED, &req->flags));
> >               if (!err)
> >                       return;
> >
> > @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req)
> >
> >       if (!test_bit(FR_FORCE, &req->flags)) {
> >               /* Only fatal signals may interrupt this */
> > -             err = wait_event_killable(req->waitq,
> > -                                     test_bit(FR_FINISHED, &req->flags));
> > +             if (fc->daemon_timeout)
> > +                     err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE);
> > +             else
> > +                     err = wait_event_killable(req->waitq,
> > +                                               test_bit(FR_FINISHED, &req->flags));
> >               if (!err)
> >                       return;
> >
> > @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req)
> >        * Either request is already in userspace, or it was forced.
> >        * Wait it out.
> >        */
> > -     wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> > +     if (fc->daemon_timeout)
> > +             wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE);
> > +     else
> > +             wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> >  }
> >
> >  static void __fuse_request_send(struct fuse_req *req)
> > @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >       req = list_entry(fiq->pending.next, struct fuse_req, list);
> >       clear_bit(FR_PENDING, &req->flags);
> >       list_del_init(&req->list);
> > +     /* Acquire a reference since fuse_request_timeout may also be executing  */
> > +     __fuse_get_request(req);
> > +     req->fpq = fpq;
> >       spin_unlock(&fiq->lock);
> >
>
> There's a race here with completion.  If we timeout a request right here, we can
> end up sending that same request below.

I don't think there's any way around this unless we take the fpq lock
while we do the fuse_copy stuff, because even if we check the
FR_PROCESSING bit, the timeout handler could start running after the
fpq lock is released when we do the fuse_copy calls.

In my point of view, I don't think this race matters. We could have
this situation happen on a regular timed-out request. For example, we
send out a request to userspace and if the server takes too long to
reply, the request is cancelled/invalidated in the kernel but the
server will still see the request anyways.

WDYT?

>
> You are going to need to check
>
> test_bit(FR_PROCESSING)
>
> after you take the fpq->lock just below here to make sure you didn't race with
> the timeout handler and time the request out already.
>
> >       args = req->args;
> > @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >               if (args->opcode == FUSE_SETXATTR)
> >                       req->out.h.error = -E2BIG;
> >               fuse_request_end(req);
> > +             fuse_put_request(req);
> >               goto restart;
> >       }
> >       spin_lock(&fpq->lock);
> > @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> >       }
> >       hash = fuse_req_hash(req->in.h.unique);
> >       list_move_tail(&req->list, &fpq->processing[hash]);
> > -     __fuse_get_request(req);
> >       set_bit(FR_SENT, &req->flags);
> >       spin_unlock(&fpq->lock);
> >       /* matches barrier in request_wait_answer() */
> >       smp_mb__after_atomic();
> >       if (test_bit(FR_INTERRUPTED, &req->flags))
> >               queue_interrupt(req);
> > +
> > +     /* Check if request timed out */
> > +     if (test_bit(FR_PROCESSING, &req->flags)) {
> > +             spin_lock(&fpq->lock);
> > +             if (!test_bit(FR_PRIVATE, &req->flags))
> > +                     list_del_init(&req->list);
> > +             spin_unlock(&fpq->lock);
> > +             fuse_put_request(req);
> > +             return -ETIME;
> > +     }
>
> This isn't quite right, we could have FR_PROCESSING set because we completed the
> request before we got here.  If you put a schedule_timeout(HZ); right above this
> you could easily see where a request gets completed by userspace, but now you've
> fimed it out.

Oh I see, you're talking about the race where a request is replied to
immediately after the fuse_copy calls and before this gets called.
Then when we get here, we can't differentiate between whether
FR_PROCESSING was set by the timeout handler or the reply handler.

I think the simplest way around this is to check if the FR_SENT flag
was cleared (the reply handler clears it while holding the fpq lock
where FR_PROCESSING gets set and the timeout handler doesn't clear
it), then return -ETIME if it wasn't and 0 if it was.

I'll add this into v2.

>
> Additionally if we have FR_PROCESSING set from the timeout, shouldn't this
> cleanup have been done already?  I don't understand why we need to handle this
> here, we should just return and whoever is waiting on the request will get the
> error.

In most cases yes, but there is a race where the timeout handler may
finish executing before the logic in dev_do_read that adds the request
to the fpq lists. If this happens, the freed request will remain on
the list.

i think this race currently exists  prior to these changes as well -
in the case you mentioned above where the request may have been
completed right after the fuse_copy calls in dev_do_read  and before
dev_do_read moves the request to the fpq lists. We would get into the
same situation with a freed request still on the list.


Thanks,
Joanne
>
> Thanks,
>
> Josef
Bernd Schubert July 18, 2024, 10:11 p.m. UTC | #6
On 7/18/24 07:24, Joanne Koong wrote:
> On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert
> <bernd.schubert@fastmail.fm> wrote:
>>
>> Hi Joanne,
>>
>> On 7/17/24 23:34, Joanne Koong wrote:
>>> There are situations where fuse servers can become unresponsive or take
>>> too long to reply to a request. Currently there is no upper bound on
>>> how long a request may take, which may be frustrating to users who get
>>> stuck waiting for a request to complete.
>>>
>>> This commit adds a daemon timeout option (in seconds) for fuse requests.
>>> If the timeout elapses before the request is replied to, the request will
>>> fail with -ETIME.
>>>
>>> There are 3 possibilities for a request that times out:
>>> a) The request times out before the request has been sent to userspace
>>> b) The request times out after the request has been sent to userspace
>>> and before it receives a reply from the server
>>> c) The request times out after the request has been sent to userspace
>>> and the server replies while the kernel is timing out the request
>>>
>>> Proper synchronization must be added to ensure that the request is
>>> handled correctly in all of these cases. To this effect, there is a new
>>> FR_PROCESSING bit added to the request flags, which is set atomically by
>>> either the timeout handler (see fuse_request_timeout()) which is invoked
>>> after the request timeout elapses or set by the request reply handler
>>> (see dev_do_write()), whichever gets there first.
>>>
>>> If the reply handler and the timeout handler are executing simultaneously
>>> and the reply handler sets FR_PROCESSING before the timeout handler, then
>>> the request is re-queued onto the waitqueue and the kernel will process the
>>> reply as though the timeout did not elapse. If the timeout handler sets
>>> FR_PROCESSING before the reply handler, then the request will fail with
>>> -ETIME and the request will be cleaned up.
>>>
>>> Proper acquires on the request reference must be added to ensure that the
>>> timeout handler does not drop the last refcount on the request while the
>>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
>>> still accessing the request. (By "forwarder handler", this is the handler
>>> that forwards the request to userspace).
>>>
>>> Currently, this is the lifecycle of the request refcount:
>>>
>>> Request is created:
>>> fuse_simple_request -> allocates request, sets refcount to 1
>>>   __fuse_request_send -> acquires refcount
>>>     queues request and waits for reply...
>>> fuse_simple_request -> drops refcount
>>>
>>> Request is freed:
>>> fuse_dev_do_write
>>>   fuse_request_end -> drops refcount on request
>>>
>>> The timeout handler drops the refcount on the request so that the
>>> request is properly cleaned up if a reply is never received. Because of
>>> this, both the forwarder handler and the reply handler must acquire a refcount
>>> on the request while it accesses the request, and the refcount must be
>>> acquired while the lock of the list the request is on is held.
>>>
>>> There is a potential race if the request is being forwarded to
>>> userspace while the timeout handler is executing (eg FR_PENDING has
>>> already been cleared but dev_do_read() hasn't finished executing). This
>>> is a problem because this would free the request but the request has not
>>> been removed from the fpq list it's on. To prevent this, dev_do_read()
>>> must check FR_PROCESSING at the end of its logic and remove the request
>>> from the fpq list if the timeout occurred.
>>>
>>> There is also the case where the connection may be aborted or the
>>> device may be released while the timeout handler is running. To protect
>>> against an extra refcount drop on the request, the timeout handler
>>> checks the connected state of the list and lets the abort handler drop the
>>> last reference if the abort is running simultaneously. Similarly, the
>>> timeout handler also needs to check if the req->out.h.error is set to
>>> -ESTALE, which indicates that the device release is cleaning up the
>>> request. In both these cases, the timeout handler will return without
>>> dropping the refcount.
>>>
>>> Please also note that background requests are not applicable for timeouts
>>> since they are asynchronous.
>>
>>
>> This and that thread here actually make me wonder if this is the right
>> approach
>>
>> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/
>>
>>
>> In  th3 thread above a request got interrupted, but fuse-server still
>> does not manage stop it. From my point of view, interrupting a request
>> suggests to add a rather short kernel lifetime for it. With that one
> 
> Hi Bernd,
> 
> I believe this solution fixes the problem outlined in that thread
> (namely, that the process gets stuck waiting for a reply). If the
> request is interrupted before it times out, the kernel will wait with
> a timeout again on the request (timeout would start over, but the
> request will still eventually sooner or later time out). I'm not sure
> I agree that we want to cancel the request altogether if it's
> interrupted. For example, if the user uses the user-defined signal
> SIGUSR1, it can be unexpected and arbitrary behavior for the request
> to be aborted by the kernel. I also don't think this can be consistent
> for what the fuse server will see since some requests may have already
> been forwarded to userspace when the request is aborted and some
> requests may not have.
> 
> I think if we were to enforce that the request should be aborted when
> it's interrupted regardless of whether a timeout is specified or not,
> then we should do it similarly to how the timeout handler logic
> handles it in this patch,rather than the implementation in the thread
> linked above (namely, that the request should be explicitly cleaned up
> immediately instead of when the interrupt request sends a reply); I
> don't believe the implementation in the link handles the case where
> for example the fuse server is in a deadlock and does not reply to the
> interrupt request. Also, as I understand it, it is optional for
> servers to reply or not to the interrupt request.

Hi Joanne,

yeah, the solution in the link above is definitely not ideal and I think
a timout based solution would be better. But I think your patch wouldn't
work either right now, unless server side sets a request timeout.
Btw, I would rename the variable 'daemon_timeout' to somethink like
req_timeout.

> 
>> either needs to wake up in intervals and check if request timeout got
>> exceeded or it needs to be an async kernel thread. I think that async
>> thread would also allow to give a timeout to background requests.
> 
> in my opinion, background requests do not need timeouts. As I
> understand it, background requests are used only for direct i/o async
> read/writes, writing back dirty pages,and readahead requests generated
> by the kernel. I don't think fuse servers would have a need for timing
> out background requests.

There is another discussion here, where timeouts are a possible although
ugly solution to avoid page copies

https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/


That is the bg writeback code path.

> 
>>
>> Or we add an async timeout to bg and interupted requests additionally?
> 
> The interrupted request will already have a timeout on it since it
> waits with a timeout again for the reply after it's interrupted.

If daemon side configures timeouts. And interrupted requests might want
to have a different timeout. I will check when I'm back if we can update
your patch a bit for that.

Your patch hooks in quite nicely and basically without overhead into fg
(sync) requests. Timing out bg requests will have a bit overhead (unless
I miss something), so maybe we need two solutions here. And if we want
to go that route at all, to avoid these extra fuse page copies.


> 
>>
>>
>> (I only basically reviewed, can't do carefully right now - on vacation
>> and as I just noticed, on a laptop that gives me electric shocks when I
>> connect it to power.)
> 
> No worries, thanks for your comments and hope you have a great
> vacation (without getting shocked :))!

Thank you! For now I'm not connecting power, 3h of battery left :)


Thanks,
Bernd
Joanne Koong July 19, 2024, 12:37 a.m. UTC | #7
On Thu, Jul 18, 2024 at 3:11 PM Bernd Schubert
<bs_lists@aakef.fastmail.fm> wrote:
>
>
>
> On 7/18/24 07:24, Joanne Koong wrote:
> > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert
> > <bernd.schubert@fastmail.fm> wrote:
> >>
> >> Hi Joanne,
> >>
> >> On 7/17/24 23:34, Joanne Koong wrote:
> >>> There are situations where fuse servers can become unresponsive or take
> >>> too long to reply to a request. Currently there is no upper bound on
> >>> how long a request may take, which may be frustrating to users who get
> >>> stuck waiting for a request to complete.
> >>>
> >>> This commit adds a daemon timeout option (in seconds) for fuse requests.
> >>> If the timeout elapses before the request is replied to, the request will
> >>> fail with -ETIME.
> >>>
> >>> There are 3 possibilities for a request that times out:
> >>> a) The request times out before the request has been sent to userspace
> >>> b) The request times out after the request has been sent to userspace
> >>> and before it receives a reply from the server
> >>> c) The request times out after the request has been sent to userspace
> >>> and the server replies while the kernel is timing out the request
> >>>
> >>> Proper synchronization must be added to ensure that the request is
> >>> handled correctly in all of these cases. To this effect, there is a new
> >>> FR_PROCESSING bit added to the request flags, which is set atomically by
> >>> either the timeout handler (see fuse_request_timeout()) which is invoked
> >>> after the request timeout elapses or set by the request reply handler
> >>> (see dev_do_write()), whichever gets there first.
> >>>
> >>> If the reply handler and the timeout handler are executing simultaneously
> >>> and the reply handler sets FR_PROCESSING before the timeout handler, then
> >>> the request is re-queued onto the waitqueue and the kernel will process the
> >>> reply as though the timeout did not elapse. If the timeout handler sets
> >>> FR_PROCESSING before the reply handler, then the request will fail with
> >>> -ETIME and the request will be cleaned up.
> >>>
> >>> Proper acquires on the request reference must be added to ensure that the
> >>> timeout handler does not drop the last refcount on the request while the
> >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
> >>> still accessing the request. (By "forwarder handler", this is the handler
> >>> that forwards the request to userspace).
> >>>
> >>> Currently, this is the lifecycle of the request refcount:
> >>>
> >>> Request is created:
> >>> fuse_simple_request -> allocates request, sets refcount to 1
> >>>   __fuse_request_send -> acquires refcount
> >>>     queues request and waits for reply...
> >>> fuse_simple_request -> drops refcount
> >>>
> >>> Request is freed:
> >>> fuse_dev_do_write
> >>>   fuse_request_end -> drops refcount on request
> >>>
> >>> The timeout handler drops the refcount on the request so that the
> >>> request is properly cleaned up if a reply is never received. Because of
> >>> this, both the forwarder handler and the reply handler must acquire a refcount
> >>> on the request while it accesses the request, and the refcount must be
> >>> acquired while the lock of the list the request is on is held.
> >>>
> >>> There is a potential race if the request is being forwarded to
> >>> userspace while the timeout handler is executing (eg FR_PENDING has
> >>> already been cleared but dev_do_read() hasn't finished executing). This
> >>> is a problem because this would free the request but the request has not
> >>> been removed from the fpq list it's on. To prevent this, dev_do_read()
> >>> must check FR_PROCESSING at the end of its logic and remove the request
> >>> from the fpq list if the timeout occurred.
> >>>
> >>> There is also the case where the connection may be aborted or the
> >>> device may be released while the timeout handler is running. To protect
> >>> against an extra refcount drop on the request, the timeout handler
> >>> checks the connected state of the list and lets the abort handler drop the
> >>> last reference if the abort is running simultaneously. Similarly, the
> >>> timeout handler also needs to check if the req->out.h.error is set to
> >>> -ESTALE, which indicates that the device release is cleaning up the
> >>> request. In both these cases, the timeout handler will return without
> >>> dropping the refcount.
> >>>
> >>> Please also note that background requests are not applicable for timeouts
> >>> since they are asynchronous.
> >>
> >>
> >> This and that thread here actually make me wonder if this is the right
> >> approach
> >>
> >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/
> >>
> >>
> >> In  th3 thread above a request got interrupted, but fuse-server still
> >> does not manage stop it. From my point of view, interrupting a request
> >> suggests to add a rather short kernel lifetime for it. With that one
> >
> > Hi Bernd,
> >
> > I believe this solution fixes the problem outlined in that thread
> > (namely, that the process gets stuck waiting for a reply). If the
> > request is interrupted before it times out, the kernel will wait with
> > a timeout again on the request (timeout would start over, but the
> > request will still eventually sooner or later time out). I'm not sure
> > I agree that we want to cancel the request altogether if it's
> > interrupted. For example, if the user uses the user-defined signal
> > SIGUSR1, it can be unexpected and arbitrary behavior for the request
> > to be aborted by the kernel. I also don't think this can be consistent
> > for what the fuse server will see since some requests may have already
> > been forwarded to userspace when the request is aborted and some
> > requests may not have.
> >
> > I think if we were to enforce that the request should be aborted when
> > it's interrupted regardless of whether a timeout is specified or not,
> > then we should do it similarly to how the timeout handler logic
> > handles it in this patch,rather than the implementation in the thread
> > linked above (namely, that the request should be explicitly cleaned up
> > immediately instead of when the interrupt request sends a reply); I
> > don't believe the implementation in the link handles the case where
> > for example the fuse server is in a deadlock and does not reply to the
> > interrupt request. Also, as I understand it, it is optional for
> > servers to reply or not to the interrupt request.
>
> Hi Joanne,
>
> yeah, the solution in the link above is definitely not ideal and I think
> a timout based solution would be better. But I think your patch wouldn't
> work either right now, unless server side sets a request timeout.
> Btw, I would rename the variable 'daemon_timeout' to somethink like
> req_timeout.
>
Hi Bernd,

I think we need to figure out if we indeed want the kernel to abort
interrupted requests if no request timeout was explicitly set by the
server. I'm leaning towards no, for the reasons in my previous reply;
in addition to that I'm also not sure if we would be potentially
breaking existing filesystems if we introduced this new behavior.
Curious to hear your and others' thoughts on this.

(Btw, if we did want to add this in, i think the change would be
actually pretty simple. We could pretty much just reuse all the logic
that's implemented in the timeout handling code. It's very much the
same scenario (request getting aborted and needing to protect against
races with different handlers))

I will rename daemon_timeout to req_timeout in v2. Thanks for the suggestion.

> >
> >> either needs to wake up in intervals and check if request timeout got
> >> exceeded or it needs to be an async kernel thread. I think that async
> >> thread would also allow to give a timeout to background requests.
> >
> > in my opinion, background requests do not need timeouts. As I
> > understand it, background requests are used only for direct i/o async
> > read/writes, writing back dirty pages,and readahead requests generated
> > by the kernel. I don't think fuse servers would have a need for timing
> > out background requests.
>
> There is another discussion here, where timeouts are a possible although
> ugly solution to avoid page copies
>
> https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/
>
Thanks for the link, it's an interesting read.

>
> That is the bg writeback code path.
>
> >
> >>
> >> Or we add an async timeout to bg and interupted requests additionally?
> >
> > The interrupted request will already have a timeout on it since it
> > waits with a timeout again for the reply after it's interrupted.
>
> If daemon side configures timeouts. And interrupted requests might want
> to have a different timeout. I will check when I'm back if we can update
> your patch a bit for that.
>
> Your patch hooks in quite nicely and basically without overhead into fg
> (sync) requests. Timing out bg requests will have a bit overhead (unless
> I miss something), so maybe we need two solutions here. And if we want
> to go that route at all, to avoid these extra fuse page copies.
>
Agreed, I think if we do decide to go down this route, it seems
cleaner to me to have the background request timeouts handled
separately. Maybe something like having a timer per batch (where
"batch" is the group of requests that get flushed at the same time)?
That seems to me like the approach with the least overhead.


Thanks,
Joanne
>
> >
> >>
> >>
> >> (I only basically reviewed, can't do carefully right now - on vacation
> >> and as I just noticed, on a laptop that gives me electric shocks when I
> >> connect it to power.)
> >
> > No worries, thanks for your comments and hope you have a great
> > vacation (without getting shocked :))!
>
> Thank you! For now I'm not connecting power, 3h of battery left :)
>
>
> Thanks,
> Bernd
Josef Bacik July 19, 2024, 1:06 p.m. UTC | #8
On Thu, Jul 18, 2024 at 05:37:06PM -0700, Joanne Koong wrote:
> On Thu, Jul 18, 2024 at 3:11 PM Bernd Schubert
> <bs_lists@aakef.fastmail.fm> wrote:
> >
> >
> >
> > On 7/18/24 07:24, Joanne Koong wrote:
> > > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert
> > > <bernd.schubert@fastmail.fm> wrote:
> > >>
> > >> Hi Joanne,
> > >>
> > >> On 7/17/24 23:34, Joanne Koong wrote:
> > >>> There are situations where fuse servers can become unresponsive or take
> > >>> too long to reply to a request. Currently there is no upper bound on
> > >>> how long a request may take, which may be frustrating to users who get
> > >>> stuck waiting for a request to complete.
> > >>>
> > >>> This commit adds a daemon timeout option (in seconds) for fuse requests.
> > >>> If the timeout elapses before the request is replied to, the request will
> > >>> fail with -ETIME.
> > >>>
> > >>> There are 3 possibilities for a request that times out:
> > >>> a) The request times out before the request has been sent to userspace
> > >>> b) The request times out after the request has been sent to userspace
> > >>> and before it receives a reply from the server
> > >>> c) The request times out after the request has been sent to userspace
> > >>> and the server replies while the kernel is timing out the request
> > >>>
> > >>> Proper synchronization must be added to ensure that the request is
> > >>> handled correctly in all of these cases. To this effect, there is a new
> > >>> FR_PROCESSING bit added to the request flags, which is set atomically by
> > >>> either the timeout handler (see fuse_request_timeout()) which is invoked
> > >>> after the request timeout elapses or set by the request reply handler
> > >>> (see dev_do_write()), whichever gets there first.
> > >>>
> > >>> If the reply handler and the timeout handler are executing simultaneously
> > >>> and the reply handler sets FR_PROCESSING before the timeout handler, then
> > >>> the request is re-queued onto the waitqueue and the kernel will process the
> > >>> reply as though the timeout did not elapse. If the timeout handler sets
> > >>> FR_PROCESSING before the reply handler, then the request will fail with
> > >>> -ETIME and the request will be cleaned up.
> > >>>
> > >>> Proper acquires on the request reference must be added to ensure that the
> > >>> timeout handler does not drop the last refcount on the request while the
> > >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
> > >>> still accessing the request. (By "forwarder handler", this is the handler
> > >>> that forwards the request to userspace).
> > >>>
> > >>> Currently, this is the lifecycle of the request refcount:
> > >>>
> > >>> Request is created:
> > >>> fuse_simple_request -> allocates request, sets refcount to 1
> > >>>   __fuse_request_send -> acquires refcount
> > >>>     queues request and waits for reply...
> > >>> fuse_simple_request -> drops refcount
> > >>>
> > >>> Request is freed:
> > >>> fuse_dev_do_write
> > >>>   fuse_request_end -> drops refcount on request
> > >>>
> > >>> The timeout handler drops the refcount on the request so that the
> > >>> request is properly cleaned up if a reply is never received. Because of
> > >>> this, both the forwarder handler and the reply handler must acquire a refcount
> > >>> on the request while it accesses the request, and the refcount must be
> > >>> acquired while the lock of the list the request is on is held.
> > >>>
> > >>> There is a potential race if the request is being forwarded to
> > >>> userspace while the timeout handler is executing (eg FR_PENDING has
> > >>> already been cleared but dev_do_read() hasn't finished executing). This
> > >>> is a problem because this would free the request but the request has not
> > >>> been removed from the fpq list it's on. To prevent this, dev_do_read()
> > >>> must check FR_PROCESSING at the end of its logic and remove the request
> > >>> from the fpq list if the timeout occurred.
> > >>>
> > >>> There is also the case where the connection may be aborted or the
> > >>> device may be released while the timeout handler is running. To protect
> > >>> against an extra refcount drop on the request, the timeout handler
> > >>> checks the connected state of the list and lets the abort handler drop the
> > >>> last reference if the abort is running simultaneously. Similarly, the
> > >>> timeout handler also needs to check if the req->out.h.error is set to
> > >>> -ESTALE, which indicates that the device release is cleaning up the
> > >>> request. In both these cases, the timeout handler will return without
> > >>> dropping the refcount.
> > >>>
> > >>> Please also note that background requests are not applicable for timeouts
> > >>> since they are asynchronous.
> > >>
> > >>
> > >> This and that thread here actually make me wonder if this is the right
> > >> approach
> > >>
> > >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/
> > >>
> > >>
> > >> In  th3 thread above a request got interrupted, but fuse-server still
> > >> does not manage stop it. From my point of view, interrupting a request
> > >> suggests to add a rather short kernel lifetime for it. With that one
> > >
> > > Hi Bernd,
> > >
> > > I believe this solution fixes the problem outlined in that thread
> > > (namely, that the process gets stuck waiting for a reply). If the
> > > request is interrupted before it times out, the kernel will wait with
> > > a timeout again on the request (timeout would start over, but the
> > > request will still eventually sooner or later time out). I'm not sure
> > > I agree that we want to cancel the request altogether if it's
> > > interrupted. For example, if the user uses the user-defined signal
> > > SIGUSR1, it can be unexpected and arbitrary behavior for the request
> > > to be aborted by the kernel. I also don't think this can be consistent
> > > for what the fuse server will see since some requests may have already
> > > been forwarded to userspace when the request is aborted and some
> > > requests may not have.
> > >
> > > I think if we were to enforce that the request should be aborted when
> > > it's interrupted regardless of whether a timeout is specified or not,
> > > then we should do it similarly to how the timeout handler logic
> > > handles it in this patch,rather than the implementation in the thread
> > > linked above (namely, that the request should be explicitly cleaned up
> > > immediately instead of when the interrupt request sends a reply); I
> > > don't believe the implementation in the link handles the case where
> > > for example the fuse server is in a deadlock and does not reply to the
> > > interrupt request. Also, as I understand it, it is optional for
> > > servers to reply or not to the interrupt request.
> >
> > Hi Joanne,
> >
> > yeah, the solution in the link above is definitely not ideal and I think
> > a timout based solution would be better. But I think your patch wouldn't
> > work either right now, unless server side sets a request timeout.
> > Btw, I would rename the variable 'daemon_timeout' to somethink like
> > req_timeout.
> >
> Hi Bernd,
> 
> I think we need to figure out if we indeed want the kernel to abort
> interrupted requests if no request timeout was explicitly set by the
> server. I'm leaning towards no, for the reasons in my previous reply;
> in addition to that I'm also not sure if we would be potentially
> breaking existing filesystems if we introduced this new behavior.
> Curious to hear your and others' thoughts on this.
> 
> (Btw, if we did want to add this in, i think the change would be
> actually pretty simple. We could pretty much just reuse all the logic
> that's implemented in the timeout handling code. It's very much the
> same scenario (request getting aborted and needing to protect against
> races with different handlers))
> 
> I will rename daemon_timeout to req_timeout in v2. Thanks for the suggestion.
> 
> > >
> > >> either needs to wake up in intervals and check if request timeout got
> > >> exceeded or it needs to be an async kernel thread. I think that async
> > >> thread would also allow to give a timeout to background requests.
> > >
> > > in my opinion, background requests do not need timeouts. As I
> > > understand it, background requests are used only for direct i/o async
> > > read/writes, writing back dirty pages,and readahead requests generated
> > > by the kernel. I don't think fuse servers would have a need for timing
> > > out background requests.
> >
> > There is another discussion here, where timeouts are a possible although
> > ugly solution to avoid page copies
> >
> > https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/
> >
> Thanks for the link, it's an interesting read.
> 
> >
> > That is the bg writeback code path.
> >
> > >
> > >>
> > >> Or we add an async timeout to bg and interupted requests additionally?
> > >
> > > The interrupted request will already have a timeout on it since it
> > > waits with a timeout again for the reply after it's interrupted.
> >
> > If daemon side configures timeouts. And interrupted requests might want
> > to have a different timeout. I will check when I'm back if we can update
> > your patch a bit for that.
> >
> > Your patch hooks in quite nicely and basically without overhead into fg
> > (sync) requests. Timing out bg requests will have a bit overhead (unless
> > I miss something), so maybe we need two solutions here. And if we want
> > to go that route at all, to avoid these extra fuse page copies.
> >
> Agreed, I think if we do decide to go down this route, it seems
> cleaner to me to have the background request timeouts handled
> separately. Maybe something like having a timer per batch (where
> "batch" is the group of requests that get flushed at the same time)?
> That seems to me like the approach with the least overhead.
> 

I don't want to have a bunch of different timeouts, we should just have one and
have consistent behavior across all classes of requests.

I think the only thing we should have that is "separate" is a way to set request
timeouts that aren't set by the daemon itself.  Administrators should be able to
set a per-mount timeout via sysfs/algo in order to have some sort of control
over possibly malicious FUSE file systems.

But that should just tie into whatever mechanism you come up with, and
everything should all behave consistently with that timeout.  Thanks,

Josef
Joanne Koong July 22, 2024, 6:58 p.m. UTC | #9
On Fri, Jul 19, 2024 at 6:06 AM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Thu, Jul 18, 2024 at 05:37:06PM -0700, Joanne Koong wrote:
> > On Thu, Jul 18, 2024 at 3:11 PM Bernd Schubert
> > <bs_lists@aakef.fastmail.fm> wrote:
> > >
> > >
> > >
> > > On 7/18/24 07:24, Joanne Koong wrote:
> > > > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert
> > > > <bernd.schubert@fastmail.fm> wrote:
> > > >>
> > > >> Hi Joanne,
> > > >>
> > > >> On 7/17/24 23:34, Joanne Koong wrote:
> > > >>> There are situations where fuse servers can become unresponsive or take
> > > >>> too long to reply to a request. Currently there is no upper bound on
> > > >>> how long a request may take, which may be frustrating to users who get
> > > >>> stuck waiting for a request to complete.
> > > >>>
> > > >>> This commit adds a daemon timeout option (in seconds) for fuse requests.
> > > >>> If the timeout elapses before the request is replied to, the request will
> > > >>> fail with -ETIME.
> > > >>>
> > > >>> There are 3 possibilities for a request that times out:
> > > >>> a) The request times out before the request has been sent to userspace
> > > >>> b) The request times out after the request has been sent to userspace
> > > >>> and before it receives a reply from the server
> > > >>> c) The request times out after the request has been sent to userspace
> > > >>> and the server replies while the kernel is timing out the request
> > > >>>
> > > >>> Proper synchronization must be added to ensure that the request is
> > > >>> handled correctly in all of these cases. To this effect, there is a new
> > > >>> FR_PROCESSING bit added to the request flags, which is set atomically by
> > > >>> either the timeout handler (see fuse_request_timeout()) which is invoked
> > > >>> after the request timeout elapses or set by the request reply handler
> > > >>> (see dev_do_write()), whichever gets there first.
> > > >>>
> > > >>> If the reply handler and the timeout handler are executing simultaneously
> > > >>> and the reply handler sets FR_PROCESSING before the timeout handler, then
> > > >>> the request is re-queued onto the waitqueue and the kernel will process the
> > > >>> reply as though the timeout did not elapse. If the timeout handler sets
> > > >>> FR_PROCESSING before the reply handler, then the request will fail with
> > > >>> -ETIME and the request will be cleaned up.
> > > >>>
> > > >>> Proper acquires on the request reference must be added to ensure that the
> > > >>> timeout handler does not drop the last refcount on the request while the
> > > >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
> > > >>> still accessing the request. (By "forwarder handler", this is the handler
> > > >>> that forwards the request to userspace).
> > > >>>
> > > >>> Currently, this is the lifecycle of the request refcount:
> > > >>>
> > > >>> Request is created:
> > > >>> fuse_simple_request -> allocates request, sets refcount to 1
> > > >>>   __fuse_request_send -> acquires refcount
> > > >>>     queues request and waits for reply...
> > > >>> fuse_simple_request -> drops refcount
> > > >>>
> > > >>> Request is freed:
> > > >>> fuse_dev_do_write
> > > >>>   fuse_request_end -> drops refcount on request
> > > >>>
> > > >>> The timeout handler drops the refcount on the request so that the
> > > >>> request is properly cleaned up if a reply is never received. Because of
> > > >>> this, both the forwarder handler and the reply handler must acquire a refcount
> > > >>> on the request while it accesses the request, and the refcount must be
> > > >>> acquired while the lock of the list the request is on is held.
> > > >>>
> > > >>> There is a potential race if the request is being forwarded to
> > > >>> userspace while the timeout handler is executing (eg FR_PENDING has
> > > >>> already been cleared but dev_do_read() hasn't finished executing). This
> > > >>> is a problem because this would free the request but the request has not
> > > >>> been removed from the fpq list it's on. To prevent this, dev_do_read()
> > > >>> must check FR_PROCESSING at the end of its logic and remove the request
> > > >>> from the fpq list if the timeout occurred.
> > > >>>
> > > >>> There is also the case where the connection may be aborted or the
> > > >>> device may be released while the timeout handler is running. To protect
> > > >>> against an extra refcount drop on the request, the timeout handler
> > > >>> checks the connected state of the list and lets the abort handler drop the
> > > >>> last reference if the abort is running simultaneously. Similarly, the
> > > >>> timeout handler also needs to check if the req->out.h.error is set to
> > > >>> -ESTALE, which indicates that the device release is cleaning up the
> > > >>> request. In both these cases, the timeout handler will return without
> > > >>> dropping the refcount.
> > > >>>
> > > >>> Please also note that background requests are not applicable for timeouts
> > > >>> since they are asynchronous.
> > > >>
> > > >>
> > > >> This and that thread here actually make me wonder if this is the right
> > > >> approach
> > > >>
> > > >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/
> > > >>
> > > >>
> > > >> In  th3 thread above a request got interrupted, but fuse-server still
> > > >> does not manage stop it. From my point of view, interrupting a request
> > > >> suggests to add a rather short kernel lifetime for it. With that one
> > > >
> > > > Hi Bernd,
> > > >
> > > > I believe this solution fixes the problem outlined in that thread
> > > > (namely, that the process gets stuck waiting for a reply). If the
> > > > request is interrupted before it times out, the kernel will wait with
> > > > a timeout again on the request (timeout would start over, but the
> > > > request will still eventually sooner or later time out). I'm not sure
> > > > I agree that we want to cancel the request altogether if it's
> > > > interrupted. For example, if the user uses the user-defined signal
> > > > SIGUSR1, it can be unexpected and arbitrary behavior for the request
> > > > to be aborted by the kernel. I also don't think this can be consistent
> > > > for what the fuse server will see since some requests may have already
> > > > been forwarded to userspace when the request is aborted and some
> > > > requests may not have.
> > > >
> > > > I think if we were to enforce that the request should be aborted when
> > > > it's interrupted regardless of whether a timeout is specified or not,
> > > > then we should do it similarly to how the timeout handler logic
> > > > handles it in this patch,rather than the implementation in the thread
> > > > linked above (namely, that the request should be explicitly cleaned up
> > > > immediately instead of when the interrupt request sends a reply); I
> > > > don't believe the implementation in the link handles the case where
> > > > for example the fuse server is in a deadlock and does not reply to the
> > > > interrupt request. Also, as I understand it, it is optional for
> > > > servers to reply or not to the interrupt request.
> > >
> > > Hi Joanne,
> > >
> > > yeah, the solution in the link above is definitely not ideal and I think
> > > a timout based solution would be better. But I think your patch wouldn't
> > > work either right now, unless server side sets a request timeout.
> > > Btw, I would rename the variable 'daemon_timeout' to somethink like
> > > req_timeout.
> > >
> > Hi Bernd,
> >
> > I think we need to figure out if we indeed want the kernel to abort
> > interrupted requests if no request timeout was explicitly set by the
> > server. I'm leaning towards no, for the reasons in my previous reply;
> > in addition to that I'm also not sure if we would be potentially
> > breaking existing filesystems if we introduced this new behavior.
> > Curious to hear your and others' thoughts on this.
> >
> > (Btw, if we did want to add this in, i think the change would be
> > actually pretty simple. We could pretty much just reuse all the logic
> > that's implemented in the timeout handling code. It's very much the
> > same scenario (request getting aborted and needing to protect against
> > races with different handlers))
> >
> > I will rename daemon_timeout to req_timeout in v2. Thanks for the suggestion.
> >
> > > >
> > > >> either needs to wake up in intervals and check if request timeout got
> > > >> exceeded or it needs to be an async kernel thread. I think that async
> > > >> thread would also allow to give a timeout to background requests.
> > > >
> > > > in my opinion, background requests do not need timeouts. As I
> > > > understand it, background requests are used only for direct i/o async
> > > > read/writes, writing back dirty pages,and readahead requests generated
> > > > by the kernel. I don't think fuse servers would have a need for timing
> > > > out background requests.
> > >
> > > There is another discussion here, where timeouts are a possible although
> > > ugly solution to avoid page copies
> > >
> > > https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/
> > >
> > Thanks for the link, it's an interesting read.
> >
> > >
> > > That is the bg writeback code path.
> > >
> > > >
> > > >>
> > > >> Or we add an async timeout to bg and interupted requests additionally?
> > > >
> > > > The interrupted request will already have a timeout on it since it
> > > > waits with a timeout again for the reply after it's interrupted.
> > >
> > > If daemon side configures timeouts. And interrupted requests might want
> > > to have a different timeout. I will check when I'm back if we can update
> > > your patch a bit for that.
> > >
> > > Your patch hooks in quite nicely and basically without overhead into fg
> > > (sync) requests. Timing out bg requests will have a bit overhead (unless
> > > I miss something), so maybe we need two solutions here. And if we want
> > > to go that route at all, to avoid these extra fuse page copies.
> > >
> > Agreed, I think if we do decide to go down this route, it seems
> > cleaner to me to have the background request timeouts handled
> > separately. Maybe something like having a timer per batch (where
> > "batch" is the group of requests that get flushed at the same time)?
> > That seems to me like the approach with the least overhead.
> >
>
> I don't want to have a bunch of different timeouts, we should just have one and
> have consistent behavior across all classes of requests.
>
> I think the only thing we should have that is "separate" is a way to set request
> timeouts that aren't set by the daemon itself.  Administrators should be able to
> set a per-mount timeout via sysfs/algo in order to have some sort of control
> over possibly malicious FUSE file systems.
>
> But that should just tie into whatever mechanism you come up with, and
> everything should all behave consistently with that timeout.  Thanks,
>
> Josef

To summarize this thread so far, there are 2 open questions:
1) should interrupted requests be automatically aborted/cancelled by
default (even if no timeout is set)?
2) should background requests also have some timeout enforced on them?

I think the decision on 2) is the blocker for this patch ( 1) could be
added in the future as a separate patch).
Is this the route we want to go down for avoiding the extra page
copies? Who makes the call on this? I'm assuming it's the fuse
maintainer (Miklos)?
Josef Bacik July 22, 2024, 8:52 p.m. UTC | #10
On Mon, Jul 22, 2024 at 11:58:03AM -0700, Joanne Koong wrote:
> On Fri, Jul 19, 2024 at 6:06 AM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 05:37:06PM -0700, Joanne Koong wrote:
> > > On Thu, Jul 18, 2024 at 3:11 PM Bernd Schubert
> > > <bs_lists@aakef.fastmail.fm> wrote:
> > > >
> > > >
> > > >
> > > > On 7/18/24 07:24, Joanne Koong wrote:
> > > > > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert
> > > > > <bernd.schubert@fastmail.fm> wrote:
> > > > >>
> > > > >> Hi Joanne,
> > > > >>
> > > > >> On 7/17/24 23:34, Joanne Koong wrote:
> > > > >>> There are situations where fuse servers can become unresponsive or take
> > > > >>> too long to reply to a request. Currently there is no upper bound on
> > > > >>> how long a request may take, which may be frustrating to users who get
> > > > >>> stuck waiting for a request to complete.
> > > > >>>
> > > > >>> This commit adds a daemon timeout option (in seconds) for fuse requests.
> > > > >>> If the timeout elapses before the request is replied to, the request will
> > > > >>> fail with -ETIME.
> > > > >>>
> > > > >>> There are 3 possibilities for a request that times out:
> > > > >>> a) The request times out before the request has been sent to userspace
> > > > >>> b) The request times out after the request has been sent to userspace
> > > > >>> and before it receives a reply from the server
> > > > >>> c) The request times out after the request has been sent to userspace
> > > > >>> and the server replies while the kernel is timing out the request
> > > > >>>
> > > > >>> Proper synchronization must be added to ensure that the request is
> > > > >>> handled correctly in all of these cases. To this effect, there is a new
> > > > >>> FR_PROCESSING bit added to the request flags, which is set atomically by
> > > > >>> either the timeout handler (see fuse_request_timeout()) which is invoked
> > > > >>> after the request timeout elapses or set by the request reply handler
> > > > >>> (see dev_do_write()), whichever gets there first.
> > > > >>>
> > > > >>> If the reply handler and the timeout handler are executing simultaneously
> > > > >>> and the reply handler sets FR_PROCESSING before the timeout handler, then
> > > > >>> the request is re-queued onto the waitqueue and the kernel will process the
> > > > >>> reply as though the timeout did not elapse. If the timeout handler sets
> > > > >>> FR_PROCESSING before the reply handler, then the request will fail with
> > > > >>> -ETIME and the request will be cleaned up.
> > > > >>>
> > > > >>> Proper acquires on the request reference must be added to ensure that the
> > > > >>> timeout handler does not drop the last refcount on the request while the
> > > > >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
> > > > >>> still accessing the request. (By "forwarder handler", this is the handler
> > > > >>> that forwards the request to userspace).
> > > > >>>
> > > > >>> Currently, this is the lifecycle of the request refcount:
> > > > >>>
> > > > >>> Request is created:
> > > > >>> fuse_simple_request -> allocates request, sets refcount to 1
> > > > >>>   __fuse_request_send -> acquires refcount
> > > > >>>     queues request and waits for reply...
> > > > >>> fuse_simple_request -> drops refcount
> > > > >>>
> > > > >>> Request is freed:
> > > > >>> fuse_dev_do_write
> > > > >>>   fuse_request_end -> drops refcount on request
> > > > >>>
> > > > >>> The timeout handler drops the refcount on the request so that the
> > > > >>> request is properly cleaned up if a reply is never received. Because of
> > > > >>> this, both the forwarder handler and the reply handler must acquire a refcount
> > > > >>> on the request while it accesses the request, and the refcount must be
> > > > >>> acquired while the lock of the list the request is on is held.
> > > > >>>
> > > > >>> There is a potential race if the request is being forwarded to
> > > > >>> userspace while the timeout handler is executing (eg FR_PENDING has
> > > > >>> already been cleared but dev_do_read() hasn't finished executing). This
> > > > >>> is a problem because this would free the request but the request has not
> > > > >>> been removed from the fpq list it's on. To prevent this, dev_do_read()
> > > > >>> must check FR_PROCESSING at the end of its logic and remove the request
> > > > >>> from the fpq list if the timeout occurred.
> > > > >>>
> > > > >>> There is also the case where the connection may be aborted or the
> > > > >>> device may be released while the timeout handler is running. To protect
> > > > >>> against an extra refcount drop on the request, the timeout handler
> > > > >>> checks the connected state of the list and lets the abort handler drop the
> > > > >>> last reference if the abort is running simultaneously. Similarly, the
> > > > >>> timeout handler also needs to check if the req->out.h.error is set to
> > > > >>> -ESTALE, which indicates that the device release is cleaning up the
> > > > >>> request. In both these cases, the timeout handler will return without
> > > > >>> dropping the refcount.
> > > > >>>
> > > > >>> Please also note that background requests are not applicable for timeouts
> > > > >>> since they are asynchronous.
> > > > >>
> > > > >>
> > > > >> This and that thread here actually make me wonder if this is the right
> > > > >> approach
> > > > >>
> > > > >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/
> > > > >>
> > > > >>
> > > > >> In  th3 thread above a request got interrupted, but fuse-server still
> > > > >> does not manage stop it. From my point of view, interrupting a request
> > > > >> suggests to add a rather short kernel lifetime for it. With that one
> > > > >
> > > > > Hi Bernd,
> > > > >
> > > > > I believe this solution fixes the problem outlined in that thread
> > > > > (namely, that the process gets stuck waiting for a reply). If the
> > > > > request is interrupted before it times out, the kernel will wait with
> > > > > a timeout again on the request (timeout would start over, but the
> > > > > request will still eventually sooner or later time out). I'm not sure
> > > > > I agree that we want to cancel the request altogether if it's
> > > > > interrupted. For example, if the user uses the user-defined signal
> > > > > SIGUSR1, it can be unexpected and arbitrary behavior for the request
> > > > > to be aborted by the kernel. I also don't think this can be consistent
> > > > > for what the fuse server will see since some requests may have already
> > > > > been forwarded to userspace when the request is aborted and some
> > > > > requests may not have.
> > > > >
> > > > > I think if we were to enforce that the request should be aborted when
> > > > > it's interrupted regardless of whether a timeout is specified or not,
> > > > > then we should do it similarly to how the timeout handler logic
> > > > > handles it in this patch,rather than the implementation in the thread
> > > > > linked above (namely, that the request should be explicitly cleaned up
> > > > > immediately instead of when the interrupt request sends a reply); I
> > > > > don't believe the implementation in the link handles the case where
> > > > > for example the fuse server is in a deadlock and does not reply to the
> > > > > interrupt request. Also, as I understand it, it is optional for
> > > > > servers to reply or not to the interrupt request.
> > > >
> > > > Hi Joanne,
> > > >
> > > > yeah, the solution in the link above is definitely not ideal and I think
> > > > a timout based solution would be better. But I think your patch wouldn't
> > > > work either right now, unless server side sets a request timeout.
> > > > Btw, I would rename the variable 'daemon_timeout' to somethink like
> > > > req_timeout.
> > > >
> > > Hi Bernd,
> > >
> > > I think we need to figure out if we indeed want the kernel to abort
> > > interrupted requests if no request timeout was explicitly set by the
> > > server. I'm leaning towards no, for the reasons in my previous reply;
> > > in addition to that I'm also not sure if we would be potentially
> > > breaking existing filesystems if we introduced this new behavior.
> > > Curious to hear your and others' thoughts on this.
> > >
> > > (Btw, if we did want to add this in, i think the change would be
> > > actually pretty simple. We could pretty much just reuse all the logic
> > > that's implemented in the timeout handling code. It's very much the
> > > same scenario (request getting aborted and needing to protect against
> > > races with different handlers))
> > >
> > > I will rename daemon_timeout to req_timeout in v2. Thanks for the suggestion.
> > >
> > > > >
> > > > >> either needs to wake up in intervals and check if request timeout got
> > > > >> exceeded or it needs to be an async kernel thread. I think that async
> > > > >> thread would also allow to give a timeout to background requests.
> > > > >
> > > > > in my opinion, background requests do not need timeouts. As I
> > > > > understand it, background requests are used only for direct i/o async
> > > > > read/writes, writing back dirty pages,and readahead requests generated
> > > > > by the kernel. I don't think fuse servers would have a need for timing
> > > > > out background requests.
> > > >
> > > > There is another discussion here, where timeouts are a possible although
> > > > ugly solution to avoid page copies
> > > >
> > > > https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/
> > > >
> > > Thanks for the link, it's an interesting read.
> > >
> > > >
> > > > That is the bg writeback code path.
> > > >
> > > > >
> > > > >>
> > > > >> Or we add an async timeout to bg and interupted requests additionally?
> > > > >
> > > > > The interrupted request will already have a timeout on it since it
> > > > > waits with a timeout again for the reply after it's interrupted.
> > > >
> > > > If daemon side configures timeouts. And interrupted requests might want
> > > > to have a different timeout. I will check when I'm back if we can update
> > > > your patch a bit for that.
> > > >
> > > > Your patch hooks in quite nicely and basically without overhead into fg
> > > > (sync) requests. Timing out bg requests will have a bit overhead (unless
> > > > I miss something), so maybe we need two solutions here. And if we want
> > > > to go that route at all, to avoid these extra fuse page copies.
> > > >
> > > Agreed, I think if we do decide to go down this route, it seems
> > > cleaner to me to have the background request timeouts handled
> > > separately. Maybe something like having a timer per batch (where
> > > "batch" is the group of requests that get flushed at the same time)?
> > > That seems to me like the approach with the least overhead.
> > >
> >
> > I don't want to have a bunch of different timeouts, we should just have one and
> > have consistent behavior across all classes of requests.
> >
> > I think the only thing we should have that is "separate" is a way to set request
> > timeouts that aren't set by the daemon itself.  Administrators should be able to
> > set a per-mount timeout via sysfs/algo in order to have some sort of control
> > over possibly malicious FUSE file systems.
> >
> > But that should just tie into whatever mechanism you come up with, and
> > everything should all behave consistently with that timeout.  Thanks,
> >
> > Josef
> 
> To summarize this thread so far, there are 2 open questions:
> 1) should interrupted requests be automatically aborted/cancelled by
> default (even if no timeout is set)?
> 2) should background requests also have some timeout enforced on them?

Yes I think background requests should have a timeout enforced on them if it's
set.  Page writeout is actually one of the bigger problems because stuff will
just hang forever, like if you hit sync or something (which a lot of
applications do).

For #1 I have to think some more and look at what the mechanics/expectations of
those requests are, but if it's a thing you can leave for a follup them that
sounds good.

Additionally I think leaving the extra page copy thing as future work once this
work is done is the best bet.

Miklos are you around?  We've had a few different patches/discussions going on.
I assume you are/have been on summer vacation.  Thanks,

Josef
Joanne Koong July 22, 2024, 9:15 p.m. UTC | #11
On Mon, Jul 22, 2024 at 1:52 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> On Mon, Jul 22, 2024 at 11:58:03AM -0700, Joanne Koong wrote:
> > On Fri, Jul 19, 2024 at 6:06 AM Josef Bacik <josef@toxicpanda.com> wrote:
> > >
> > > On Thu, Jul 18, 2024 at 05:37:06PM -0700, Joanne Koong wrote:
> > > > On Thu, Jul 18, 2024 at 3:11 PM Bernd Schubert
> > > > <bs_lists@aakef.fastmail.fm> wrote:
> > > > >
> > > > >
> > > > >
> > > > > On 7/18/24 07:24, Joanne Koong wrote:
> > > > > > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert
> > > > > > <bernd.schubert@fastmail.fm> wrote:
> > > > > >>
> > > > > >> Hi Joanne,
> > > > > >>
> > > > > >> On 7/17/24 23:34, Joanne Koong wrote:
> > > > > >>> There are situations where fuse servers can become unresponsive or take
> > > > > >>> too long to reply to a request. Currently there is no upper bound on
> > > > > >>> how long a request may take, which may be frustrating to users who get
> > > > > >>> stuck waiting for a request to complete.
> > > > > >>>
> > > > > >>> This commit adds a daemon timeout option (in seconds) for fuse requests.
> > > > > >>> If the timeout elapses before the request is replied to, the request will
> > > > > >>> fail with -ETIME.
> > > > > >>>
> > > > > >>> There are 3 possibilities for a request that times out:
> > > > > >>> a) The request times out before the request has been sent to userspace
> > > > > >>> b) The request times out after the request has been sent to userspace
> > > > > >>> and before it receives a reply from the server
> > > > > >>> c) The request times out after the request has been sent to userspace
> > > > > >>> and the server replies while the kernel is timing out the request
> > > > > >>>
> > > > > >>> Proper synchronization must be added to ensure that the request is
> > > > > >>> handled correctly in all of these cases. To this effect, there is a new
> > > > > >>> FR_PROCESSING bit added to the request flags, which is set atomically by
> > > > > >>> either the timeout handler (see fuse_request_timeout()) which is invoked
> > > > > >>> after the request timeout elapses or set by the request reply handler
> > > > > >>> (see dev_do_write()), whichever gets there first.
> > > > > >>>
> > > > > >>> If the reply handler and the timeout handler are executing simultaneously
> > > > > >>> and the reply handler sets FR_PROCESSING before the timeout handler, then
> > > > > >>> the request is re-queued onto the waitqueue and the kernel will process the
> > > > > >>> reply as though the timeout did not elapse. If the timeout handler sets
> > > > > >>> FR_PROCESSING before the reply handler, then the request will fail with
> > > > > >>> -ETIME and the request will be cleaned up.
> > > > > >>>
> > > > > >>> Proper acquires on the request reference must be added to ensure that the
> > > > > >>> timeout handler does not drop the last refcount on the request while the
> > > > > >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is
> > > > > >>> still accessing the request. (By "forwarder handler", this is the handler
> > > > > >>> that forwards the request to userspace).
> > > > > >>>
> > > > > >>> Currently, this is the lifecycle of the request refcount:
> > > > > >>>
> > > > > >>> Request is created:
> > > > > >>> fuse_simple_request -> allocates request, sets refcount to 1
> > > > > >>>   __fuse_request_send -> acquires refcount
> > > > > >>>     queues request and waits for reply...
> > > > > >>> fuse_simple_request -> drops refcount
> > > > > >>>
> > > > > >>> Request is freed:
> > > > > >>> fuse_dev_do_write
> > > > > >>>   fuse_request_end -> drops refcount on request
> > > > > >>>
> > > > > >>> The timeout handler drops the refcount on the request so that the
> > > > > >>> request is properly cleaned up if a reply is never received. Because of
> > > > > >>> this, both the forwarder handler and the reply handler must acquire a refcount
> > > > > >>> on the request while it accesses the request, and the refcount must be
> > > > > >>> acquired while the lock of the list the request is on is held.
> > > > > >>>
> > > > > >>> There is a potential race if the request is being forwarded to
> > > > > >>> userspace while the timeout handler is executing (eg FR_PENDING has
> > > > > >>> already been cleared but dev_do_read() hasn't finished executing). This
> > > > > >>> is a problem because this would free the request but the request has not
> > > > > >>> been removed from the fpq list it's on. To prevent this, dev_do_read()
> > > > > >>> must check FR_PROCESSING at the end of its logic and remove the request
> > > > > >>> from the fpq list if the timeout occurred.
> > > > > >>>
> > > > > >>> There is also the case where the connection may be aborted or the
> > > > > >>> device may be released while the timeout handler is running. To protect
> > > > > >>> against an extra refcount drop on the request, the timeout handler
> > > > > >>> checks the connected state of the list and lets the abort handler drop the
> > > > > >>> last reference if the abort is running simultaneously. Similarly, the
> > > > > >>> timeout handler also needs to check if the req->out.h.error is set to
> > > > > >>> -ESTALE, which indicates that the device release is cleaning up the
> > > > > >>> request. In both these cases, the timeout handler will return without
> > > > > >>> dropping the refcount.
> > > > > >>>
> > > > > >>> Please also note that background requests are not applicable for timeouts
> > > > > >>> since they are asynchronous.
> > > > > >>
> > > > > >>
> > > > > >> This and that thread here actually make me wonder if this is the right
> > > > > >> approach
> > > > > >>
> > > > > >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/
> > > > > >>
> > > > > >>
> > > > > >> In  th3 thread above a request got interrupted, but fuse-server still
> > > > > >> does not manage stop it. From my point of view, interrupting a request
> > > > > >> suggests to add a rather short kernel lifetime for it. With that one
> > > > > >
> > > > > > Hi Bernd,
> > > > > >
> > > > > > I believe this solution fixes the problem outlined in that thread
> > > > > > (namely, that the process gets stuck waiting for a reply). If the
> > > > > > request is interrupted before it times out, the kernel will wait with
> > > > > > a timeout again on the request (timeout would start over, but the
> > > > > > request will still eventually sooner or later time out). I'm not sure
> > > > > > I agree that we want to cancel the request altogether if it's
> > > > > > interrupted. For example, if the user uses the user-defined signal
> > > > > > SIGUSR1, it can be unexpected and arbitrary behavior for the request
> > > > > > to be aborted by the kernel. I also don't think this can be consistent
> > > > > > for what the fuse server will see since some requests may have already
> > > > > > been forwarded to userspace when the request is aborted and some
> > > > > > requests may not have.
> > > > > >
> > > > > > I think if we were to enforce that the request should be aborted when
> > > > > > it's interrupted regardless of whether a timeout is specified or not,
> > > > > > then we should do it similarly to how the timeout handler logic
> > > > > > handles it in this patch,rather than the implementation in the thread
> > > > > > linked above (namely, that the request should be explicitly cleaned up
> > > > > > immediately instead of when the interrupt request sends a reply); I
> > > > > > don't believe the implementation in the link handles the case where
> > > > > > for example the fuse server is in a deadlock and does not reply to the
> > > > > > interrupt request. Also, as I understand it, it is optional for
> > > > > > servers to reply or not to the interrupt request.
> > > > >
> > > > > Hi Joanne,
> > > > >
> > > > > yeah, the solution in the link above is definitely not ideal and I think
> > > > > a timout based solution would be better. But I think your patch wouldn't
> > > > > work either right now, unless server side sets a request timeout.
> > > > > Btw, I would rename the variable 'daemon_timeout' to somethink like
> > > > > req_timeout.
> > > > >
> > > > Hi Bernd,
> > > >
> > > > I think we need to figure out if we indeed want the kernel to abort
> > > > interrupted requests if no request timeout was explicitly set by the
> > > > server. I'm leaning towards no, for the reasons in my previous reply;
> > > > in addition to that I'm also not sure if we would be potentially
> > > > breaking existing filesystems if we introduced this new behavior.
> > > > Curious to hear your and others' thoughts on this.
> > > >
> > > > (Btw, if we did want to add this in, i think the change would be
> > > > actually pretty simple. We could pretty much just reuse all the logic
> > > > that's implemented in the timeout handling code. It's very much the
> > > > same scenario (request getting aborted and needing to protect against
> > > > races with different handlers))
> > > >
> > > > I will rename daemon_timeout to req_timeout in v2. Thanks for the suggestion.
> > > >
> > > > > >
> > > > > >> either needs to wake up in intervals and check if request timeout got
> > > > > >> exceeded or it needs to be an async kernel thread. I think that async
> > > > > >> thread would also allow to give a timeout to background requests.
> > > > > >
> > > > > > in my opinion, background requests do not need timeouts. As I
> > > > > > understand it, background requests are used only for direct i/o async
> > > > > > read/writes, writing back dirty pages,and readahead requests generated
> > > > > > by the kernel. I don't think fuse servers would have a need for timing
> > > > > > out background requests.
> > > > >
> > > > > There is another discussion here, where timeouts are a possible although
> > > > > ugly solution to avoid page copies
> > > > >
> > > > > https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/
> > > > >
> > > > Thanks for the link, it's an interesting read.
> > > >
> > > > >
> > > > > That is the bg writeback code path.
> > > > >
> > > > > >
> > > > > >>
> > > > > >> Or we add an async timeout to bg and interupted requests additionally?
> > > > > >
> > > > > > The interrupted request will already have a timeout on it since it
> > > > > > waits with a timeout again for the reply after it's interrupted.
> > > > >
> > > > > If daemon side configures timeouts. And interrupted requests might want
> > > > > to have a different timeout. I will check when I'm back if we can update
> > > > > your patch a bit for that.
> > > > >
> > > > > Your patch hooks in quite nicely and basically without overhead into fg
> > > > > (sync) requests. Timing out bg requests will have a bit overhead (unless
> > > > > I miss something), so maybe we need two solutions here. And if we want
> > > > > to go that route at all, to avoid these extra fuse page copies.
> > > > >
> > > > Agreed, I think if we do decide to go down this route, it seems
> > > > cleaner to me to have the background request timeouts handled
> > > > separately. Maybe something like having a timer per batch (where
> > > > "batch" is the group of requests that get flushed at the same time)?
> > > > That seems to me like the approach with the least overhead.
> > > >
> > >
> > > I don't want to have a bunch of different timeouts, we should just have one and
> > > have consistent behavior across all classes of requests.
> > >
> > > I think the only thing we should have that is "separate" is a way to set request
> > > timeouts that aren't set by the daemon itself.  Administrators should be able to
> > > set a per-mount timeout via sysfs/algo in order to have some sort of control
> > > over possibly malicious FUSE file systems.
> > >
> > > But that should just tie into whatever mechanism you come up with, and
> > > everything should all behave consistently with that timeout.  Thanks,
> > >
> > > Josef
> >
> > To summarize this thread so far, there are 2 open questions:
> > 1) should interrupted requests be automatically aborted/cancelled by
> > default (even if no timeout is set)?
> > 2) should background requests also have some timeout enforced on them?
>
> Yes I think background requests should have a timeout enforced on them if it's
> set.  Page writeout is actually one of the bigger problems because stuff will
> just hang forever, like if you hit sync or something (which a lot of
> applications do).
>

Sounds good, I will add in timeouts for background requests for v2.

> For #1 I have to think some more and look at what the mechanics/expectations of
> those requests are, but if it's a thing you can leave for a follup them that
> sounds good.
>
> Additionally I think leaving the extra page copy thing as future work once this
> work is done is the best bet.
>
> Miklos are you around?  We've had a few different patches/discussions going on.
> I assume you are/have been on summer vacation.  Thanks,
>
> Josef
Joanne Koong July 24, 2024, 8:44 p.m. UTC | #12
On Thu, Jul 18, 2024 at 9:58 AM Joanne Koong <joannelkoong@gmail.com> wrote:
>
> On Thu, Jul 18, 2024 at 7:44 AM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote:
> ...
> > > ---
> > >  fs/fuse/dev.c    | 177 ++++++++++++++++++++++++++++++++++++++++++++---
> > >  fs/fuse/fuse_i.h |  12 ++++
> > >  fs/fuse/inode.c  |   7 ++
> > >  3 files changed, 188 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
> > > index 9eb191b5c4de..7dd7b244951b 100644
> > > --- a/fs/fuse/dev.c
> > > +++ b/fs/fuse/dev.c
> > > @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req)
> > >  }
> > >  EXPORT_SYMBOL_GPL(fuse_request_end);
> > >
> > > +/* fuse_request_end for requests that timeout */
> > > +static void fuse_request_timeout(struct fuse_req *req)
> > > +{
> > > +     struct fuse_conn *fc = req->fm->fc;
> > > +     struct fuse_iqueue *fiq = &fc->iq;
> > > +     struct fuse_pqueue *fpq;
> > > +
> > > +     spin_lock(&fiq->lock);
> > > +     if (!fiq->connected) {
> > > +             spin_unlock(&fiq->lock);
> > > +             /*
> > > +              * Connection is being aborted. The abort will release
> > > +              * the refcount on the request
> > > +              */
> > > +             req->out.h.error = -ECONNABORTED;
> > > +             return;
> > > +     }
> > > +     if (test_bit(FR_PENDING, &req->flags)) {
> > > +             /* Request is not yet in userspace, bail out */
> > > +             list_del(&req->list);
> > > +             spin_unlock(&fiq->lock);
> > > +             req->out.h.error = -ETIME;
> > > +             __fuse_put_request(req);
> >
> > Why is this safe?  We could be the last holder of the reference on this request
> > correct?  The only places using __fuse_put_request() would be where we are in a
> > path where the caller already holds a reference on the request.  Since this is
> > async it may not be the case right?
> >
> > If it is safe then it's just confusing and warrants a comment.
> >
>
> There is always a refcount still held on the request by
> fuse_simple_request() when this is called. I'll add a comment about
> this.
> I also just noticed that I use fuse_put_request()  at the end of this
> function, I'll change that to __fuse_put_request() as well.
>
> > > +             return;
> > > +     }
> > > +     if (test_bit(FR_INTERRUPTED, &req->flags))
> > > +             list_del_init(&req->intr_entry);
> > > +
> > > +     fpq = req->fpq;
> > > +     spin_unlock(&fiq->lock);
> > > +
> > > +     if (fpq) {
> > > +             spin_lock(&fpq->lock);
> > > +             if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) {
> >                                        ^^
> >
> > You don't need the extra () there.
> >
> > > +                     spin_unlock(&fpq->lock);
> > > +                     /*
> > > +                      * Connection is being aborted. The abort will release
> > > +                      * the refcount on the request
> > > +                      */
> > > +                     req->out.h.error = -ECONNABORTED;
> > > +                     return;
> > > +             }
> > > +             if (req->out.h.error == -ESTALE) {
> > > +                     /*
> > > +                      * Device is being released. The fuse_dev_release call
> > > +                      * will drop the refcount on the request
> > > +                      */
> > > +                     spin_unlock(&fpq->lock);
> > > +                     return;
> > > +             }
> > > +             if (!test_bit(FR_PRIVATE, &req->flags))
> > > +                     list_del_init(&req->list);
> > > +             spin_unlock(&fpq->lock);
> > > +     }
> > > +
> > > +     req->out.h.error = -ETIME;
> > > +
> > > +     if (test_bit(FR_ASYNC, &req->flags))
> > > +             req->args->end(req->fm, req->args, req->out.h.error);
> > > +
> > > +     fuse_put_request(req);
> > > +}
> >
> > Just a general styling thing, we have two different states for requests here,
> > PENDING and !PENDING correct?  I think it may be better to do something like
> >
> > if (test_bit(FR_PENDING, &req->flags))
> >         timeout_pending_req();
> > else
> >         timeout_inflight_req();
> >
> > and then in timeout_pending_req() you do
> >
> > spin_lock(&fiq->lock);
> > if (!test_bit(FR_PENDING, &req->flags)) {
> >         spin_unlock(&fiq_lock);
> >         timeout_inflight_req();
> >         return;
> > }
> >
> > This will keep the two different state cleanup functions separate and a little
> > cleaner to grok.
> >
> Thanks for the suggestion, I will make this change for v2.
> > > +
> > >  static int queue_interrupt(struct fuse_req *req)
> > >  {
> > >       struct fuse_iqueue *fiq = &req->fm->fc->iq;
> > > @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req)
> > >       return 0;
> > >  }
> > >
> > > +enum wait_type {
> > > +     WAIT_TYPE_INTERRUPTIBLE,
> > > +     WAIT_TYPE_KILLABLE,
> > > +     WAIT_TYPE_NONINTERRUPTIBLE,
> > > +};
> > > +
> > > +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req)
> > > +{
> > > +     struct fuse_conn *fc = req->fm->fc;
> > > +
> > > +     return wait_event_interruptible_timeout(req->waitq,
> > > +                                             test_bit(FR_FINISHED,
> > > +                                                      &req->flags),
> > > +                                             fc->daemon_timeout);
> > > +}
> > > +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO);
> > > +
> > > +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type)
> > > +{
> > > +     struct fuse_conn *fc = req->fm->fc;
> > > +     int err;
> > > +
> > > +wait_answer_start:
> > > +     if (type == WAIT_TYPE_INTERRUPTIBLE)
> > > +             err = fuse_wait_event_interruptible_timeout(req);
> > > +     else if (type == WAIT_TYPE_KILLABLE)
> > > +             err = wait_event_killable_timeout(req->waitq,
> > > +                                               test_bit(FR_FINISHED, &req->flags),
> > > +                                               fc->daemon_timeout);
> > > +
> > > +     else if (type == WAIT_TYPE_NONINTERRUPTIBLE)
> > > +             err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags),
> > > +                                      fc->daemon_timeout);
> > > +     else
> > > +             WARN_ON(1);
> >
> > This will leak some random value for err, so initialize err to something that
> > will be dealt with, like -EINVAL;
> >
> > > +
> > > +     /* request was answered */
> > > +     if (err > 0)
> > > +             return 0;
> > > +
> > > +     /* request was not answered in time */
> > > +     if (err == 0) {
> > > +             if (test_and_set_bit(FR_PROCESSING, &req->flags))
> > > +                     /* request reply is being processed by kernel right now.
> > > +                      * we should wait for the answer.
> > > +                      */
> >
> > Format for multiline comments is
> >
> > /*
> >  * blah
> >  * blah
> >  */
> >
> > and since this is a 1 line if statement put it above the if statement.
> >
> > > +                     goto wait_answer_start;
> > > +
> > > +             fuse_request_timeout(req);
> > > +             return 0;
> > > +     }
> > > +
> > > +     /* else request was interrupted */
> > > +     return err;
> > > +}
> > > +
> > >  static void request_wait_answer(struct fuse_req *req)
> > >  {
> > >       struct fuse_conn *fc = req->fm->fc;
> > > @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req)
> > >
> > >       if (!fc->no_interrupt) {
> > >               /* Any signal may interrupt this */
> > > -             err = wait_event_interruptible(req->waitq,
> > > -                                     test_bit(FR_FINISHED, &req->flags));
> > > +             if (fc->daemon_timeout)
> > > +                     err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE);
> > > +             else
> > > +                     err = wait_event_interruptible(req->waitq,
> > > +                                                    test_bit(FR_FINISHED, &req->flags));
> > >               if (!err)
> > >                       return;
> > >
> > > @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req)
> > >
> > >       if (!test_bit(FR_FORCE, &req->flags)) {
> > >               /* Only fatal signals may interrupt this */
> > > -             err = wait_event_killable(req->waitq,
> > > -                                     test_bit(FR_FINISHED, &req->flags));
> > > +             if (fc->daemon_timeout)
> > > +                     err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE);
> > > +             else
> > > +                     err = wait_event_killable(req->waitq,
> > > +                                               test_bit(FR_FINISHED, &req->flags));
> > >               if (!err)
> > >                       return;
> > >
> > > @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req)
> > >        * Either request is already in userspace, or it was forced.
> > >        * Wait it out.
> > >        */
> > > -     wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> > > +     if (fc->daemon_timeout)
> > > +             wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE);
> > > +     else
> > > +             wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
> > >  }
> > >
> > >  static void __fuse_request_send(struct fuse_req *req)
> > > @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> > >       req = list_entry(fiq->pending.next, struct fuse_req, list);
> > >       clear_bit(FR_PENDING, &req->flags);
> > >       list_del_init(&req->list);
> > > +     /* Acquire a reference since fuse_request_timeout may also be executing  */
> > > +     __fuse_get_request(req);
> > > +     req->fpq = fpq;
> > >       spin_unlock(&fiq->lock);
> > >
> >
> > There's a race here with completion.  If we timeout a request right here, we can
> > end up sending that same request below.
>
> I don't think there's any way around this unless we take the fpq lock
> while we do the fuse_copy stuff, because even if we check the
> FR_PROCESSING bit, the timeout handler could start running after the
> fpq lock is released when we do the fuse_copy calls.
>
> In my point of view, I don't think this race matters. We could have
> this situation happen on a regular timed-out request. For example, we
> send out a request to userspace and if the server takes too long to
> reply, the request is cancelled/invalidated in the kernel but the
> server will still see the request anyways.
>
> WDYT?
>
> >
> > You are going to need to check
> >
> > test_bit(FR_PROCESSING)
> >
> > after you take the fpq->lock just below here to make sure you didn't race with
> > the timeout handler and time the request out already.
> >
> > >       args = req->args;
> > > @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> > >               if (args->opcode == FUSE_SETXATTR)
> > >                       req->out.h.error = -E2BIG;
> > >               fuse_request_end(req);
> > > +             fuse_put_request(req);
> > >               goto restart;
> > >       }
> > >       spin_lock(&fpq->lock);
> > > @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
> > >       }
> > >       hash = fuse_req_hash(req->in.h.unique);
> > >       list_move_tail(&req->list, &fpq->processing[hash]);
> > > -     __fuse_get_request(req);
> > >       set_bit(FR_SENT, &req->flags);
> > >       spin_unlock(&fpq->lock);
> > >       /* matches barrier in request_wait_answer() */
> > >       smp_mb__after_atomic();
> > >       if (test_bit(FR_INTERRUPTED, &req->flags))
> > >               queue_interrupt(req);
> > > +
> > > +     /* Check if request timed out */
> > > +     if (test_bit(FR_PROCESSING, &req->flags)) {
> > > +             spin_lock(&fpq->lock);
> > > +             if (!test_bit(FR_PRIVATE, &req->flags))
> > > +                     list_del_init(&req->list);
> > > +             spin_unlock(&fpq->lock);
> > > +             fuse_put_request(req);
> > > +             return -ETIME;
> > > +     }
> >
> > This isn't quite right, we could have FR_PROCESSING set because we completed the
> > request before we got here.  If you put a schedule_timeout(HZ); right above this
> > you could easily see where a request gets completed by userspace, but now you've
> > fimed it out.
>
> Oh I see, you're talking about the race where a request is replied to
> immediately after the fuse_copy calls and before this gets called.
> Then when we get here, we can't differentiate between whether
> FR_PROCESSING was set by the timeout handler or the reply handler.
>
> I think the simplest way around this is to check if the FR_SENT flag
> was cleared (the reply handler clears it while holding the fpq lock
> where FR_PROCESSING gets set and the timeout handler doesn't clear
> it), then return -ETIME if it wasn't and 0 if it was.
>
> I'll add this into v2.
>
> >
> > Additionally if we have FR_PROCESSING set from the timeout, shouldn't this
> > cleanup have been done already?  I don't understand why we need to handle this
> > here, we should just return and whoever is waiting on the request will get the
> > error.
>
> In most cases yes, but there is a race where the timeout handler may
> finish executing before the logic in dev_do_read that adds the request
> to the fpq lists. If this happens, the freed request will remain on
> the list.
>
> i think this race currently exists  prior to these changes as well -

amendment: this statement is not accurate. In the existing code, there
is no race between the reply handler and dev_do_read, because the
reply handler can only handle the request once the request is on the
fpq->processing list.
(We do need to account for this race with the timeout handler though
since the timeout handler can get triggered at any time).

Also, while working on v2 I noticed we also need to handle races
between the timeout handler and requests being re-sent
(fuse_resend()). This will get addressed in v2.

> in the case you mentioned above where the request may have been
> completed right after the fuse_copy calls in dev_do_read  and before
> dev_do_read moves the request to the fpq lists. We would get into the
> same situation with a freed request still on the list.
>
>
> Thanks,
> Joanne
> >
> > Thanks,
> >
> > Josef
diff mbox series

Patch

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 9eb191b5c4de..7dd7b244951b 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -331,6 +331,69 @@  void fuse_request_end(struct fuse_req *req)
 }
 EXPORT_SYMBOL_GPL(fuse_request_end);
 
+/* fuse_request_end for requests that timeout */
+static void fuse_request_timeout(struct fuse_req *req)
+{
+	struct fuse_conn *fc = req->fm->fc;
+	struct fuse_iqueue *fiq = &fc->iq;
+	struct fuse_pqueue *fpq;
+
+	spin_lock(&fiq->lock);
+	if (!fiq->connected) {
+		spin_unlock(&fiq->lock);
+		/*
+		 * Connection is being aborted. The abort will release
+		 * the refcount on the request
+		 */
+		req->out.h.error = -ECONNABORTED;
+		return;
+	}
+	if (test_bit(FR_PENDING, &req->flags)) {
+		/* Request is not yet in userspace, bail out */
+		list_del(&req->list);
+		spin_unlock(&fiq->lock);
+		req->out.h.error = -ETIME;
+		__fuse_put_request(req);
+		return;
+	}
+	if (test_bit(FR_INTERRUPTED, &req->flags))
+		list_del_init(&req->intr_entry);
+
+	fpq = req->fpq;
+	spin_unlock(&fiq->lock);
+
+	if (fpq) {
+		spin_lock(&fpq->lock);
+		if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) {
+			spin_unlock(&fpq->lock);
+			/*
+			 * Connection is being aborted. The abort will release
+			 * the refcount on the request
+			 */
+			req->out.h.error = -ECONNABORTED;
+			return;
+		}
+		if (req->out.h.error == -ESTALE) {
+			/*
+			 * Device is being released. The fuse_dev_release call
+			 * will drop the refcount on the request
+			 */
+			spin_unlock(&fpq->lock);
+			return;
+		}
+		if (!test_bit(FR_PRIVATE, &req->flags))
+			list_del_init(&req->list);
+		spin_unlock(&fpq->lock);
+	}
+
+	req->out.h.error = -ETIME;
+
+	if (test_bit(FR_ASYNC, &req->flags))
+		req->args->end(req->fm, req->args, req->out.h.error);
+
+	fuse_put_request(req);
+}
+
 static int queue_interrupt(struct fuse_req *req)
 {
 	struct fuse_iqueue *fiq = &req->fm->fc->iq;
@@ -361,6 +424,62 @@  static int queue_interrupt(struct fuse_req *req)
 	return 0;
 }
 
+enum wait_type {
+	WAIT_TYPE_INTERRUPTIBLE,
+	WAIT_TYPE_KILLABLE,
+	WAIT_TYPE_NONINTERRUPTIBLE,
+};
+
+static int fuse_wait_event_interruptible_timeout(struct fuse_req *req)
+{
+	struct fuse_conn *fc = req->fm->fc;
+
+	return wait_event_interruptible_timeout(req->waitq,
+						test_bit(FR_FINISHED,
+							 &req->flags),
+						fc->daemon_timeout);
+}
+ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO);
+
+static int wait_answer_timeout(struct fuse_req *req, enum wait_type type)
+{
+	struct fuse_conn *fc = req->fm->fc;
+	int err;
+
+wait_answer_start:
+	if (type == WAIT_TYPE_INTERRUPTIBLE)
+		err = fuse_wait_event_interruptible_timeout(req);
+	else if (type == WAIT_TYPE_KILLABLE)
+		err = wait_event_killable_timeout(req->waitq,
+						  test_bit(FR_FINISHED, &req->flags),
+						  fc->daemon_timeout);
+
+	else if (type == WAIT_TYPE_NONINTERRUPTIBLE)
+		err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags),
+					 fc->daemon_timeout);
+	else
+		WARN_ON(1);
+
+	/* request was answered */
+	if (err > 0)
+		return 0;
+
+	/* request was not answered in time */
+	if (err == 0) {
+		if (test_and_set_bit(FR_PROCESSING, &req->flags))
+			/* request reply is being processed by kernel right now.
+			 * we should wait for the answer.
+			 */
+			goto wait_answer_start;
+
+		fuse_request_timeout(req);
+		return 0;
+	}
+
+	/* else request was interrupted */
+	return err;
+}
+
 static void request_wait_answer(struct fuse_req *req)
 {
 	struct fuse_conn *fc = req->fm->fc;
@@ -369,8 +488,11 @@  static void request_wait_answer(struct fuse_req *req)
 
 	if (!fc->no_interrupt) {
 		/* Any signal may interrupt this */
-		err = wait_event_interruptible(req->waitq,
-					test_bit(FR_FINISHED, &req->flags));
+		if (fc->daemon_timeout)
+			err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE);
+		else
+			err = wait_event_interruptible(req->waitq,
+						       test_bit(FR_FINISHED, &req->flags));
 		if (!err)
 			return;
 
@@ -383,8 +505,11 @@  static void request_wait_answer(struct fuse_req *req)
 
 	if (!test_bit(FR_FORCE, &req->flags)) {
 		/* Only fatal signals may interrupt this */
-		err = wait_event_killable(req->waitq,
-					test_bit(FR_FINISHED, &req->flags));
+		if (fc->daemon_timeout)
+			err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE);
+		else
+			err = wait_event_killable(req->waitq,
+						  test_bit(FR_FINISHED, &req->flags));
 		if (!err)
 			return;
 
@@ -404,7 +529,10 @@  static void request_wait_answer(struct fuse_req *req)
 	 * Either request is already in userspace, or it was forced.
 	 * Wait it out.
 	 */
-	wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
+	if (fc->daemon_timeout)
+		wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE);
+	else
+		wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
 }
 
 static void __fuse_request_send(struct fuse_req *req)
@@ -1268,6 +1396,9 @@  static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	req = list_entry(fiq->pending.next, struct fuse_req, list);
 	clear_bit(FR_PENDING, &req->flags);
 	list_del_init(&req->list);
+	/* Acquire a reference since fuse_request_timeout may also be executing  */
+	__fuse_get_request(req);
+	req->fpq = fpq;
 	spin_unlock(&fiq->lock);
 
 	args = req->args;
@@ -1280,6 +1411,7 @@  static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 		if (args->opcode == FUSE_SETXATTR)
 			req->out.h.error = -E2BIG;
 		fuse_request_end(req);
+		fuse_put_request(req);
 		goto restart;
 	}
 	spin_lock(&fpq->lock);
@@ -1316,13 +1448,23 @@  static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 	}
 	hash = fuse_req_hash(req->in.h.unique);
 	list_move_tail(&req->list, &fpq->processing[hash]);
-	__fuse_get_request(req);
 	set_bit(FR_SENT, &req->flags);
 	spin_unlock(&fpq->lock);
 	/* matches barrier in request_wait_answer() */
 	smp_mb__after_atomic();
 	if (test_bit(FR_INTERRUPTED, &req->flags))
 		queue_interrupt(req);
+
+	/* Check if request timed out */
+	if (test_bit(FR_PROCESSING, &req->flags)) {
+		spin_lock(&fpq->lock);
+		if (!test_bit(FR_PRIVATE, &req->flags))
+			list_del_init(&req->list);
+		spin_unlock(&fpq->lock);
+		fuse_put_request(req);
+		return -ETIME;
+	}
+
 	fuse_put_request(req);
 
 	return reqsize;
@@ -1332,6 +1474,7 @@  static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file,
 		list_del_init(&req->list);
 	spin_unlock(&fpq->lock);
 	fuse_request_end(req);
+	fuse_put_request(req);
 	return err;
 
  err_unlock:
@@ -1951,9 +2094,10 @@  static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 		goto copy_finish;
 	}
 
+	__fuse_get_request(req);
+
 	/* Is it an interrupt reply ID? */
 	if (oh.unique & FUSE_INT_REQ_BIT) {
-		__fuse_get_request(req);
 		spin_unlock(&fpq->lock);
 
 		err = 0;
@@ -1969,6 +2113,13 @@  static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 		goto copy_finish;
 	}
 
+	if (test_and_set_bit(FR_PROCESSING, &req->flags)) {
+		/* request has timed out already */
+		spin_unlock(&fpq->lock);
+		fuse_put_request(req);
+		goto copy_finish;
+	}
+
 	clear_bit(FR_SENT, &req->flags);
 	list_move(&req->list, &fpq->io);
 	req->out.h = oh;
@@ -1995,6 +2146,7 @@  static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
 	spin_unlock(&fpq->lock);
 
 	fuse_request_end(req);
+	fuse_put_request(req);
 out:
 	return err ? err : nbytes;
 
@@ -2260,13 +2412,22 @@  int fuse_dev_release(struct inode *inode, struct file *file)
 	if (fud) {
 		struct fuse_conn *fc = fud->fc;
 		struct fuse_pqueue *fpq = &fud->pq;
+		struct fuse_req *req;
 		LIST_HEAD(to_end);
 		unsigned int i;
 
 		spin_lock(&fpq->lock);
 		WARN_ON(!list_empty(&fpq->io));
-		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++)
+		for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) {
+			/*
+			 * Set the req error to -ESTALE so that if the timeout
+			 * handler tries handling it, it knows it's being
+			 * released
+			 */
+			list_for_each_entry(req, &fpq->processing[i], list)
+				req->out.h.error = -ESTALE;
 			list_splice_init(&fpq->processing[i], &to_end);
+		}
 		spin_unlock(&fpq->lock);
 
 		end_requests(&to_end);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f23919610313..cbabebbcd5bd 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -375,6 +375,9 @@  struct fuse_io_priv {
  * FR_FINISHED:		request is finished
  * FR_PRIVATE:		request is on private list
  * FR_ASYNC:		request is asynchronous
+ * FR_PROCESSING:	request is being processed. this gets set when either
+ *			the reply is getting processed or the kernel is processing
+ *			a request timeout
  */
 enum fuse_req_flag {
 	FR_ISREPLY,
@@ -389,6 +392,7 @@  enum fuse_req_flag {
 	FR_FINISHED,
 	FR_PRIVATE,
 	FR_ASYNC,
+	FR_PROCESSING,
 };
 
 /**
@@ -435,6 +439,9 @@  struct fuse_req {
 
 	/** fuse_mount this request belongs to */
 	struct fuse_mount *fm;
+
+	/** page queue this request has been added to */
+	struct fuse_pqueue *fpq;
 };
 
 struct fuse_iqueue;
@@ -574,6 +581,8 @@  struct fuse_fs_context {
 	enum fuse_dax_mode dax_mode;
 	unsigned int max_read;
 	unsigned int blksize;
+	/*  Daemon timeout (in seconds). 0 = no timeout (infinite wait) */
+	unsigned int daemon_timeout;
 	const char *subtype;
 
 	/* DAX device, may be NULL */
@@ -633,6 +642,9 @@  struct fuse_conn {
 	/** Constrain ->max_pages to this value during feature negotiation */
 	unsigned int max_pages_limit;
 
+	/* Daemon timeout (in jiffies). 0 = no timeout (infinite wait) */
+	unsigned long daemon_timeout;
+
 	/** Input queue */
 	struct fuse_iqueue iq;
 
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 99e44ea7d875..a2d53a8b8e34 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -733,6 +733,7 @@  enum {
 	OPT_ALLOW_OTHER,
 	OPT_MAX_READ,
 	OPT_BLKSIZE,
+	OPT_DAEMON_TIMEOUT,
 	OPT_ERR
 };
 
@@ -747,6 +748,7 @@  static const struct fs_parameter_spec fuse_fs_parameters[] = {
 	fsparam_u32	("max_read",		OPT_MAX_READ),
 	fsparam_u32	("blksize",		OPT_BLKSIZE),
 	fsparam_string	("subtype",		OPT_SUBTYPE),
+	fsparam_u32	("daemon_timeout",	OPT_DAEMON_TIMEOUT),
 	{}
 };
 
@@ -830,6 +832,10 @@  static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param)
 		ctx->blksize = result.uint_32;
 		break;
 
+	case OPT_DAEMON_TIMEOUT:
+		ctx->daemon_timeout = result.uint_32;
+		break;
+
 	default:
 		return -EINVAL;
 	}
@@ -1724,6 +1730,7 @@  int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx)
 	fc->group_id = ctx->group_id;
 	fc->legacy_opts_show = ctx->legacy_opts_show;
 	fc->max_read = max_t(unsigned int, 4096, ctx->max_read);
+	fc->daemon_timeout = ctx->daemon_timeout * HZ;
 	fc->destroy = ctx->destroy;
 	fc->no_control = ctx->no_control;
 	fc->no_force_umount = ctx->no_force_umount;