Message ID | 20240717213458.1613347-1-joannelkoong@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | fuse: add optional kernel-enforced timeout for fuse requests | expand |
On Wed, Jul 17, 2024 at 2:35 PM Joanne Koong <joannelkoong@gmail.com> wrote: > > There are situations where fuse servers can become unresponsive or take > too long to reply to a request. Currently there is no upper bound on > how long a request may take, which may be frustrating to users who get > stuck waiting for a request to complete. > > This commit adds a daemon timeout option (in seconds) for fuse requests. > If the timeout elapses before the request is replied to, the request will > fail with -ETIME. For reference, this is the userspace program I've been using to test the timeout paths: https://github.com/libfuse/libfuse/pull/994/commits/bb523c2db3417adfd939baef2345a485ad3f8c52 To locally simulate some of the racier conditions (eg request handler handles the request and timeout handler re-adds the request to the waitqueue), I manually added in sleeps in the kernel handlers to hit those paths > -- > 2.43.0 >
Hi Joanne, On 7/17/24 23:34, Joanne Koong wrote: > There are situations where fuse servers can become unresponsive or take > too long to reply to a request. Currently there is no upper bound on > how long a request may take, which may be frustrating to users who get > stuck waiting for a request to complete. > > This commit adds a daemon timeout option (in seconds) for fuse requests. > If the timeout elapses before the request is replied to, the request will > fail with -ETIME. > > There are 3 possibilities for a request that times out: > a) The request times out before the request has been sent to userspace > b) The request times out after the request has been sent to userspace > and before it receives a reply from the server > c) The request times out after the request has been sent to userspace > and the server replies while the kernel is timing out the request > > Proper synchronization must be added to ensure that the request is > handled correctly in all of these cases. To this effect, there is a new > FR_PROCESSING bit added to the request flags, which is set atomically by > either the timeout handler (see fuse_request_timeout()) which is invoked > after the request timeout elapses or set by the request reply handler > (see dev_do_write()), whichever gets there first. > > If the reply handler and the timeout handler are executing simultaneously > and the reply handler sets FR_PROCESSING before the timeout handler, then > the request is re-queued onto the waitqueue and the kernel will process the > reply as though the timeout did not elapse. If the timeout handler sets > FR_PROCESSING before the reply handler, then the request will fail with > -ETIME and the request will be cleaned up. > > Proper acquires on the request reference must be added to ensure that the > timeout handler does not drop the last refcount on the request while the > reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is > still accessing the request. (By "forwarder handler", this is the handler > that forwards the request to userspace). > > Currently, this is the lifecycle of the request refcount: > > Request is created: > fuse_simple_request -> allocates request, sets refcount to 1 > __fuse_request_send -> acquires refcount > queues request and waits for reply... > fuse_simple_request -> drops refcount > > Request is freed: > fuse_dev_do_write > fuse_request_end -> drops refcount on request > > The timeout handler drops the refcount on the request so that the > request is properly cleaned up if a reply is never received. Because of > this, both the forwarder handler and the reply handler must acquire a refcount > on the request while it accesses the request, and the refcount must be > acquired while the lock of the list the request is on is held. > > There is a potential race if the request is being forwarded to > userspace while the timeout handler is executing (eg FR_PENDING has > already been cleared but dev_do_read() hasn't finished executing). This > is a problem because this would free the request but the request has not > been removed from the fpq list it's on. To prevent this, dev_do_read() > must check FR_PROCESSING at the end of its logic and remove the request > from the fpq list if the timeout occurred. > > There is also the case where the connection may be aborted or the > device may be released while the timeout handler is running. To protect > against an extra refcount drop on the request, the timeout handler > checks the connected state of the list and lets the abort handler drop the > last reference if the abort is running simultaneously. Similarly, the > timeout handler also needs to check if the req->out.h.error is set to > -ESTALE, which indicates that the device release is cleaning up the > request. In both these cases, the timeout handler will return without > dropping the refcount. > > Please also note that background requests are not applicable for timeouts > since they are asynchronous. This and that thread here actually make me wonder if this is the right approach https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/ In th3 thread above a request got interrupted, but fuse-server still does not manage stop it. From my point of view, interrupting a request suggests to add a rather short kernel lifetime for it. With that one either needs to wake up in intervals and check if request timeout got exceeded or it needs to be an async kernel thread. I think that async thread would also allow to give a timeout to background requests. Or we add an async timeout to bg and interupted requests additionally? (I only basically reviewed, can't do carefully right now - on vacation and as I just noticed, on a laptop that gives me electric shocks when I connect it to power.) Thanks, Bernd > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com> > --- > fs/fuse/dev.c | 177 ++++++++++++++++++++++++++++++++++++++++++++--- > fs/fuse/fuse_i.h | 12 ++++ > fs/fuse/inode.c | 7 ++ > 3 files changed, 188 insertions(+), 8 deletions(-) > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > index 9eb191b5c4de..7dd7b244951b 100644 > --- a/fs/fuse/dev.c > +++ b/fs/fuse/dev.c > @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req) > } > EXPORT_SYMBOL_GPL(fuse_request_end); > > +/* fuse_request_end for requests that timeout */ > +static void fuse_request_timeout(struct fuse_req *req) > +{ > + struct fuse_conn *fc = req->fm->fc; > + struct fuse_iqueue *fiq = &fc->iq; > + struct fuse_pqueue *fpq; > + > + spin_lock(&fiq->lock); > + if (!fiq->connected) { > + spin_unlock(&fiq->lock); > + /* > + * Connection is being aborted. The abort will release > + * the refcount on the request > + */ > + req->out.h.error = -ECONNABORTED; > + return; > + } > + if (test_bit(FR_PENDING, &req->flags)) { > + /* Request is not yet in userspace, bail out */ > + list_del(&req->list); > + spin_unlock(&fiq->lock); > + req->out.h.error = -ETIME; > + __fuse_put_request(req); > + return; > + } > + if (test_bit(FR_INTERRUPTED, &req->flags)) > + list_del_init(&req->intr_entry); > + > + fpq = req->fpq; > + spin_unlock(&fiq->lock); > + > + if (fpq) { > + spin_lock(&fpq->lock); > + if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) { > + spin_unlock(&fpq->lock); > + /* > + * Connection is being aborted. The abort will release > + * the refcount on the request > + */ > + req->out.h.error = -ECONNABORTED; > + return; > + } > + if (req->out.h.error == -ESTALE) { > + /* > + * Device is being released. The fuse_dev_release call > + * will drop the refcount on the request > + */ > + spin_unlock(&fpq->lock); > + return; > + } > + if (!test_bit(FR_PRIVATE, &req->flags)) > + list_del_init(&req->list); > + spin_unlock(&fpq->lock); > + } > + > + req->out.h.error = -ETIME; > + > + if (test_bit(FR_ASYNC, &req->flags)) > + req->args->end(req->fm, req->args, req->out.h.error); > + > + fuse_put_request(req); > +} > + > static int queue_interrupt(struct fuse_req *req) > { > struct fuse_iqueue *fiq = &req->fm->fc->iq; > @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req) > return 0; > } > > +enum wait_type { > + WAIT_TYPE_INTERRUPTIBLE, > + WAIT_TYPE_KILLABLE, > + WAIT_TYPE_NONINTERRUPTIBLE, > +}; > + > +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req) > +{ > + struct fuse_conn *fc = req->fm->fc; > + > + return wait_event_interruptible_timeout(req->waitq, > + test_bit(FR_FINISHED, > + &req->flags), > + fc->daemon_timeout); > +} > +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO); > + > +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type) > +{ > + struct fuse_conn *fc = req->fm->fc; > + int err; > + > +wait_answer_start: > + if (type == WAIT_TYPE_INTERRUPTIBLE) > + err = fuse_wait_event_interruptible_timeout(req); > + else if (type == WAIT_TYPE_KILLABLE) > + err = wait_event_killable_timeout(req->waitq, > + test_bit(FR_FINISHED, &req->flags), > + fc->daemon_timeout); > + > + else if (type == WAIT_TYPE_NONINTERRUPTIBLE) > + err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags), > + fc->daemon_timeout); > + else > + WARN_ON(1); > + > + /* request was answered */ > + if (err > 0) > + return 0; > + > + /* request was not answered in time */ > + if (err == 0) { > + if (test_and_set_bit(FR_PROCESSING, &req->flags)) > + /* request reply is being processed by kernel right now. > + * we should wait for the answer. > + */ > + goto wait_answer_start; > + > + fuse_request_timeout(req); > + return 0; > + } > + > + /* else request was interrupted */ > + return err; > +} > + > static void request_wait_answer(struct fuse_req *req) > { > struct fuse_conn *fc = req->fm->fc; > @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req) > > if (!fc->no_interrupt) { > /* Any signal may interrupt this */ > - err = wait_event_interruptible(req->waitq, > - test_bit(FR_FINISHED, &req->flags)); > + if (fc->daemon_timeout) > + err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE); > + else > + err = wait_event_interruptible(req->waitq, > + test_bit(FR_FINISHED, &req->flags)); > if (!err) > return; > > @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req) > > if (!test_bit(FR_FORCE, &req->flags)) { > /* Only fatal signals may interrupt this */ > - err = wait_event_killable(req->waitq, > - test_bit(FR_FINISHED, &req->flags)); > + if (fc->daemon_timeout) > + err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE); > + else > + err = wait_event_killable(req->waitq, > + test_bit(FR_FINISHED, &req->flags)); > if (!err) > return; > > @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req) > * Either request is already in userspace, or it was forced. > * Wait it out. > */ > - wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > + if (fc->daemon_timeout) > + wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE); > + else > + wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > } > > static void __fuse_request_send(struct fuse_req *req) > @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > req = list_entry(fiq->pending.next, struct fuse_req, list); > clear_bit(FR_PENDING, &req->flags); > list_del_init(&req->list); > + /* Acquire a reference since fuse_request_timeout may also be executing */ > + __fuse_get_request(req); > + req->fpq = fpq; > spin_unlock(&fiq->lock); > > args = req->args; > @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > if (args->opcode == FUSE_SETXATTR) > req->out.h.error = -E2BIG; > fuse_request_end(req); > + fuse_put_request(req); > goto restart; > } > spin_lock(&fpq->lock); > @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > } > hash = fuse_req_hash(req->in.h.unique); > list_move_tail(&req->list, &fpq->processing[hash]); > - __fuse_get_request(req); > set_bit(FR_SENT, &req->flags); > spin_unlock(&fpq->lock); > /* matches barrier in request_wait_answer() */ > smp_mb__after_atomic(); > if (test_bit(FR_INTERRUPTED, &req->flags)) > queue_interrupt(req); > + > + /* Check if request timed out */ > + if (test_bit(FR_PROCESSING, &req->flags)) { > + spin_lock(&fpq->lock); > + if (!test_bit(FR_PRIVATE, &req->flags)) > + list_del_init(&req->list); > + spin_unlock(&fpq->lock); > + fuse_put_request(req); > + return -ETIME; > + } > + > fuse_put_request(req); > > return reqsize; > @@ -1332,6 +1474,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > list_del_init(&req->list); > spin_unlock(&fpq->lock); > fuse_request_end(req); > + fuse_put_request(req); > return err; > > err_unlock: > @@ -1951,9 +2094,10 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, > goto copy_finish; > } > > + __fuse_get_request(req); > + > /* Is it an interrupt reply ID? */ > if (oh.unique & FUSE_INT_REQ_BIT) { > - __fuse_get_request(req); > spin_unlock(&fpq->lock); > > err = 0; > @@ -1969,6 +2113,13 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, > goto copy_finish; > } > > + if (test_and_set_bit(FR_PROCESSING, &req->flags)) { > + /* request has timed out already */ > + spin_unlock(&fpq->lock); > + fuse_put_request(req); > + goto copy_finish; > + } > + > clear_bit(FR_SENT, &req->flags); > list_move(&req->list, &fpq->io); > req->out.h = oh; > @@ -1995,6 +2146,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, > spin_unlock(&fpq->lock); > > fuse_request_end(req); > + fuse_put_request(req); > out: > return err ? err : nbytes; > > @@ -2260,13 +2412,22 @@ int fuse_dev_release(struct inode *inode, struct file *file) > if (fud) { > struct fuse_conn *fc = fud->fc; > struct fuse_pqueue *fpq = &fud->pq; > + struct fuse_req *req; > LIST_HEAD(to_end); > unsigned int i; > > spin_lock(&fpq->lock); > WARN_ON(!list_empty(&fpq->io)); > - for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) > + for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) { > + /* > + * Set the req error to -ESTALE so that if the timeout > + * handler tries handling it, it knows it's being > + * released > + */ > + list_for_each_entry(req, &fpq->processing[i], list) > + req->out.h.error = -ESTALE; > list_splice_init(&fpq->processing[i], &to_end); > + } > spin_unlock(&fpq->lock); > > end_requests(&to_end); > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h > index f23919610313..cbabebbcd5bd 100644 > --- a/fs/fuse/fuse_i.h > +++ b/fs/fuse/fuse_i.h > @@ -375,6 +375,9 @@ struct fuse_io_priv { > * FR_FINISHED: request is finished > * FR_PRIVATE: request is on private list > * FR_ASYNC: request is asynchronous > + * FR_PROCESSING: request is being processed. this gets set when either > + * the reply is getting processed or the kernel is processing > + * a request timeout > */ > enum fuse_req_flag { > FR_ISREPLY, > @@ -389,6 +392,7 @@ enum fuse_req_flag { > FR_FINISHED, > FR_PRIVATE, > FR_ASYNC, > + FR_PROCESSING, > }; > > /** > @@ -435,6 +439,9 @@ struct fuse_req { > > /** fuse_mount this request belongs to */ > struct fuse_mount *fm; > + > + /** page queue this request has been added to */ > + struct fuse_pqueue *fpq; > }; > > struct fuse_iqueue; > @@ -574,6 +581,8 @@ struct fuse_fs_context { > enum fuse_dax_mode dax_mode; > unsigned int max_read; > unsigned int blksize; > + /* Daemon timeout (in seconds). 0 = no timeout (infinite wait) */ > + unsigned int daemon_timeout; > const char *subtype; > > /* DAX device, may be NULL */ > @@ -633,6 +642,9 @@ struct fuse_conn { > /** Constrain ->max_pages to this value during feature negotiation */ > unsigned int max_pages_limit; > > + /* Daemon timeout (in jiffies). 0 = no timeout (infinite wait) */ > + unsigned long daemon_timeout; > + > /** Input queue */ > struct fuse_iqueue iq; > > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c > index 99e44ea7d875..a2d53a8b8e34 100644 > --- a/fs/fuse/inode.c > +++ b/fs/fuse/inode.c > @@ -733,6 +733,7 @@ enum { > OPT_ALLOW_OTHER, > OPT_MAX_READ, > OPT_BLKSIZE, > + OPT_DAEMON_TIMEOUT, > OPT_ERR > }; > > @@ -747,6 +748,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = { > fsparam_u32 ("max_read", OPT_MAX_READ), > fsparam_u32 ("blksize", OPT_BLKSIZE), > fsparam_string ("subtype", OPT_SUBTYPE), > + fsparam_u32 ("daemon_timeout", OPT_DAEMON_TIMEOUT), > {} > }; > > @@ -830,6 +832,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param) > ctx->blksize = result.uint_32; > break; > > + case OPT_DAEMON_TIMEOUT: > + ctx->daemon_timeout = result.uint_32; > + break; > + > default: > return -EINVAL; > } > @@ -1724,6 +1730,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx) > fc->group_id = ctx->group_id; > fc->legacy_opts_show = ctx->legacy_opts_show; > fc->max_read = max_t(unsigned int, 4096, ctx->max_read); > + fc->daemon_timeout = ctx->daemon_timeout * HZ; > fc->destroy = ctx->destroy; > fc->no_control = ctx->no_control; > fc->no_force_umount = ctx->no_force_umount;
On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert <bernd.schubert@fastmail.fm> wrote: > > Hi Joanne, > > On 7/17/24 23:34, Joanne Koong wrote: > > There are situations where fuse servers can become unresponsive or take > > too long to reply to a request. Currently there is no upper bound on > > how long a request may take, which may be frustrating to users who get > > stuck waiting for a request to complete. > > > > This commit adds a daemon timeout option (in seconds) for fuse requests. > > If the timeout elapses before the request is replied to, the request will > > fail with -ETIME. > > > > There are 3 possibilities for a request that times out: > > a) The request times out before the request has been sent to userspace > > b) The request times out after the request has been sent to userspace > > and before it receives a reply from the server > > c) The request times out after the request has been sent to userspace > > and the server replies while the kernel is timing out the request > > > > Proper synchronization must be added to ensure that the request is > > handled correctly in all of these cases. To this effect, there is a new > > FR_PROCESSING bit added to the request flags, which is set atomically by > > either the timeout handler (see fuse_request_timeout()) which is invoked > > after the request timeout elapses or set by the request reply handler > > (see dev_do_write()), whichever gets there first. > > > > If the reply handler and the timeout handler are executing simultaneously > > and the reply handler sets FR_PROCESSING before the timeout handler, then > > the request is re-queued onto the waitqueue and the kernel will process the > > reply as though the timeout did not elapse. If the timeout handler sets > > FR_PROCESSING before the reply handler, then the request will fail with > > -ETIME and the request will be cleaned up. > > > > Proper acquires on the request reference must be added to ensure that the > > timeout handler does not drop the last refcount on the request while the > > reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is > > still accessing the request. (By "forwarder handler", this is the handler > > that forwards the request to userspace). > > > > Currently, this is the lifecycle of the request refcount: > > > > Request is created: > > fuse_simple_request -> allocates request, sets refcount to 1 > > __fuse_request_send -> acquires refcount > > queues request and waits for reply... > > fuse_simple_request -> drops refcount > > > > Request is freed: > > fuse_dev_do_write > > fuse_request_end -> drops refcount on request > > > > The timeout handler drops the refcount on the request so that the > > request is properly cleaned up if a reply is never received. Because of > > this, both the forwarder handler and the reply handler must acquire a refcount > > on the request while it accesses the request, and the refcount must be > > acquired while the lock of the list the request is on is held. > > > > There is a potential race if the request is being forwarded to > > userspace while the timeout handler is executing (eg FR_PENDING has > > already been cleared but dev_do_read() hasn't finished executing). This > > is a problem because this would free the request but the request has not > > been removed from the fpq list it's on. To prevent this, dev_do_read() > > must check FR_PROCESSING at the end of its logic and remove the request > > from the fpq list if the timeout occurred. > > > > There is also the case where the connection may be aborted or the > > device may be released while the timeout handler is running. To protect > > against an extra refcount drop on the request, the timeout handler > > checks the connected state of the list and lets the abort handler drop the > > last reference if the abort is running simultaneously. Similarly, the > > timeout handler also needs to check if the req->out.h.error is set to > > -ESTALE, which indicates that the device release is cleaning up the > > request. In both these cases, the timeout handler will return without > > dropping the refcount. > > > > Please also note that background requests are not applicable for timeouts > > since they are asynchronous. > > > This and that thread here actually make me wonder if this is the right > approach > > https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/ > > > In th3 thread above a request got interrupted, but fuse-server still > does not manage stop it. From my point of view, interrupting a request > suggests to add a rather short kernel lifetime for it. With that one Hi Bernd, I believe this solution fixes the problem outlined in that thread (namely, that the process gets stuck waiting for a reply). If the request is interrupted before it times out, the kernel will wait with a timeout again on the request (timeout would start over, but the request will still eventually sooner or later time out). I'm not sure I agree that we want to cancel the request altogether if it's interrupted. For example, if the user uses the user-defined signal SIGUSR1, it can be unexpected and arbitrary behavior for the request to be aborted by the kernel. I also don't think this can be consistent for what the fuse server will see since some requests may have already been forwarded to userspace when the request is aborted and some requests may not have. I think if we were to enforce that the request should be aborted when it's interrupted regardless of whether a timeout is specified or not, then we should do it similarly to how the timeout handler logic handles it in this patch,rather than the implementation in the thread linked above (namely, that the request should be explicitly cleaned up immediately instead of when the interrupt request sends a reply); I don't believe the implementation in the link handles the case where for example the fuse server is in a deadlock and does not reply to the interrupt request. Also, as I understand it, it is optional for servers to reply or not to the interrupt request. > either needs to wake up in intervals and check if request timeout got > exceeded or it needs to be an async kernel thread. I think that async > thread would also allow to give a timeout to background requests. in my opinion, background requests do not need timeouts. As I understand it, background requests are used only for direct i/o async read/writes, writing back dirty pages,and readahead requests generated by the kernel. I don't think fuse servers would have a need for timing out background requests. > > Or we add an async timeout to bg and interupted requests additionally? The interrupted request will already have a timeout on it since it waits with a timeout again for the reply after it's interrupted. > > > (I only basically reviewed, can't do carefully right now - on vacation > and as I just noticed, on a laptop that gives me electric shocks when I > connect it to power.) No worries, thanks for your comments and hope you have a great vacation (without getting shocked :))! Thanks, Joanne > > > Thanks, > Bernd > > >
On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote: > There are situations where fuse servers can become unresponsive or take > too long to reply to a request. Currently there is no upper bound on > how long a request may take, which may be frustrating to users who get > stuck waiting for a request to complete. > > This commit adds a daemon timeout option (in seconds) for fuse requests. > If the timeout elapses before the request is replied to, the request will > fail with -ETIME. > > There are 3 possibilities for a request that times out: > a) The request times out before the request has been sent to userspace > b) The request times out after the request has been sent to userspace > and before it receives a reply from the server > c) The request times out after the request has been sent to userspace > and the server replies while the kernel is timing out the request > > Proper synchronization must be added to ensure that the request is > handled correctly in all of these cases. To this effect, there is a new > FR_PROCESSING bit added to the request flags, which is set atomically by > either the timeout handler (see fuse_request_timeout()) which is invoked > after the request timeout elapses or set by the request reply handler > (see dev_do_write()), whichever gets there first. > > If the reply handler and the timeout handler are executing simultaneously > and the reply handler sets FR_PROCESSING before the timeout handler, then > the request is re-queued onto the waitqueue and the kernel will process the > reply as though the timeout did not elapse. If the timeout handler sets > FR_PROCESSING before the reply handler, then the request will fail with > -ETIME and the request will be cleaned up. > > Proper acquires on the request reference must be added to ensure that the > timeout handler does not drop the last refcount on the request while the > reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is > still accessing the request. (By "forwarder handler", this is the handler > that forwards the request to userspace). > > Currently, this is the lifecycle of the request refcount: > > Request is created: > fuse_simple_request -> allocates request, sets refcount to 1 > __fuse_request_send -> acquires refcount > queues request and waits for reply... > fuse_simple_request -> drops refcount > > Request is freed: > fuse_dev_do_write > fuse_request_end -> drops refcount on request > > The timeout handler drops the refcount on the request so that the > request is properly cleaned up if a reply is never received. Because of > this, both the forwarder handler and the reply handler must acquire a refcount > on the request while it accesses the request, and the refcount must be > acquired while the lock of the list the request is on is held. > > There is a potential race if the request is being forwarded to > userspace while the timeout handler is executing (eg FR_PENDING has > already been cleared but dev_do_read() hasn't finished executing). This > is a problem because this would free the request but the request has not > been removed from the fpq list it's on. To prevent this, dev_do_read() > must check FR_PROCESSING at the end of its logic and remove the request > from the fpq list if the timeout occurred. > > There is also the case where the connection may be aborted or the > device may be released while the timeout handler is running. To protect > against an extra refcount drop on the request, the timeout handler > checks the connected state of the list and lets the abort handler drop the > last reference if the abort is running simultaneously. Similarly, the > timeout handler also needs to check if the req->out.h.error is set to > -ESTALE, which indicates that the device release is cleaning up the > request. In both these cases, the timeout handler will return without > dropping the refcount. > > Please also note that background requests are not applicable for timeouts > since they are asynchronous. > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com> > --- > fs/fuse/dev.c | 177 ++++++++++++++++++++++++++++++++++++++++++++--- > fs/fuse/fuse_i.h | 12 ++++ > fs/fuse/inode.c | 7 ++ > 3 files changed, 188 insertions(+), 8 deletions(-) > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > index 9eb191b5c4de..7dd7b244951b 100644 > --- a/fs/fuse/dev.c > +++ b/fs/fuse/dev.c > @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req) > } > EXPORT_SYMBOL_GPL(fuse_request_end); > > +/* fuse_request_end for requests that timeout */ > +static void fuse_request_timeout(struct fuse_req *req) > +{ > + struct fuse_conn *fc = req->fm->fc; > + struct fuse_iqueue *fiq = &fc->iq; > + struct fuse_pqueue *fpq; > + > + spin_lock(&fiq->lock); > + if (!fiq->connected) { > + spin_unlock(&fiq->lock); > + /* > + * Connection is being aborted. The abort will release > + * the refcount on the request > + */ > + req->out.h.error = -ECONNABORTED; > + return; > + } > + if (test_bit(FR_PENDING, &req->flags)) { > + /* Request is not yet in userspace, bail out */ > + list_del(&req->list); > + spin_unlock(&fiq->lock); > + req->out.h.error = -ETIME; > + __fuse_put_request(req); Why is this safe? We could be the last holder of the reference on this request correct? The only places using __fuse_put_request() would be where we are in a path where the caller already holds a reference on the request. Since this is async it may not be the case right? If it is safe then it's just confusing and warrants a comment. > + return; > + } > + if (test_bit(FR_INTERRUPTED, &req->flags)) > + list_del_init(&req->intr_entry); > + > + fpq = req->fpq; > + spin_unlock(&fiq->lock); > + > + if (fpq) { > + spin_lock(&fpq->lock); > + if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) { ^^ You don't need the extra () there. > + spin_unlock(&fpq->lock); > + /* > + * Connection is being aborted. The abort will release > + * the refcount on the request > + */ > + req->out.h.error = -ECONNABORTED; > + return; > + } > + if (req->out.h.error == -ESTALE) { > + /* > + * Device is being released. The fuse_dev_release call > + * will drop the refcount on the request > + */ > + spin_unlock(&fpq->lock); > + return; > + } > + if (!test_bit(FR_PRIVATE, &req->flags)) > + list_del_init(&req->list); > + spin_unlock(&fpq->lock); > + } > + > + req->out.h.error = -ETIME; > + > + if (test_bit(FR_ASYNC, &req->flags)) > + req->args->end(req->fm, req->args, req->out.h.error); > + > + fuse_put_request(req); > +} Just a general styling thing, we have two different states for requests here, PENDING and !PENDING correct? I think it may be better to do something like if (test_bit(FR_PENDING, &req->flags)) timeout_pending_req(); else timeout_inflight_req(); and then in timeout_pending_req() you do spin_lock(&fiq->lock); if (!test_bit(FR_PENDING, &req->flags)) { spin_unlock(&fiq_lock); timeout_inflight_req(); return; } This will keep the two different state cleanup functions separate and a little cleaner to grok. > + > static int queue_interrupt(struct fuse_req *req) > { > struct fuse_iqueue *fiq = &req->fm->fc->iq; > @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req) > return 0; > } > > +enum wait_type { > + WAIT_TYPE_INTERRUPTIBLE, > + WAIT_TYPE_KILLABLE, > + WAIT_TYPE_NONINTERRUPTIBLE, > +}; > + > +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req) > +{ > + struct fuse_conn *fc = req->fm->fc; > + > + return wait_event_interruptible_timeout(req->waitq, > + test_bit(FR_FINISHED, > + &req->flags), > + fc->daemon_timeout); > +} > +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO); > + > +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type) > +{ > + struct fuse_conn *fc = req->fm->fc; > + int err; > + > +wait_answer_start: > + if (type == WAIT_TYPE_INTERRUPTIBLE) > + err = fuse_wait_event_interruptible_timeout(req); > + else if (type == WAIT_TYPE_KILLABLE) > + err = wait_event_killable_timeout(req->waitq, > + test_bit(FR_FINISHED, &req->flags), > + fc->daemon_timeout); > + > + else if (type == WAIT_TYPE_NONINTERRUPTIBLE) > + err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags), > + fc->daemon_timeout); > + else > + WARN_ON(1); This will leak some random value for err, so initialize err to something that will be dealt with, like -EINVAL; > + > + /* request was answered */ > + if (err > 0) > + return 0; > + > + /* request was not answered in time */ > + if (err == 0) { > + if (test_and_set_bit(FR_PROCESSING, &req->flags)) > + /* request reply is being processed by kernel right now. > + * we should wait for the answer. > + */ Format for multiline comments is /* * blah * blah */ and since this is a 1 line if statement put it above the if statement. > + goto wait_answer_start; > + > + fuse_request_timeout(req); > + return 0; > + } > + > + /* else request was interrupted */ > + return err; > +} > + > static void request_wait_answer(struct fuse_req *req) > { > struct fuse_conn *fc = req->fm->fc; > @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req) > > if (!fc->no_interrupt) { > /* Any signal may interrupt this */ > - err = wait_event_interruptible(req->waitq, > - test_bit(FR_FINISHED, &req->flags)); > + if (fc->daemon_timeout) > + err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE); > + else > + err = wait_event_interruptible(req->waitq, > + test_bit(FR_FINISHED, &req->flags)); > if (!err) > return; > > @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req) > > if (!test_bit(FR_FORCE, &req->flags)) { > /* Only fatal signals may interrupt this */ > - err = wait_event_killable(req->waitq, > - test_bit(FR_FINISHED, &req->flags)); > + if (fc->daemon_timeout) > + err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE); > + else > + err = wait_event_killable(req->waitq, > + test_bit(FR_FINISHED, &req->flags)); > if (!err) > return; > > @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req) > * Either request is already in userspace, or it was forced. > * Wait it out. > */ > - wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > + if (fc->daemon_timeout) > + wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE); > + else > + wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > } > > static void __fuse_request_send(struct fuse_req *req) > @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > req = list_entry(fiq->pending.next, struct fuse_req, list); > clear_bit(FR_PENDING, &req->flags); > list_del_init(&req->list); > + /* Acquire a reference since fuse_request_timeout may also be executing */ > + __fuse_get_request(req); > + req->fpq = fpq; > spin_unlock(&fiq->lock); > There's a race here with completion. If we timeout a request right here, we can end up sending that same request below. You are going to need to check test_bit(FR_PROCESSING) after you take the fpq->lock just below here to make sure you didn't race with the timeout handler and time the request out already. > args = req->args; > @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > if (args->opcode == FUSE_SETXATTR) > req->out.h.error = -E2BIG; > fuse_request_end(req); > + fuse_put_request(req); > goto restart; > } > spin_lock(&fpq->lock); > @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > } > hash = fuse_req_hash(req->in.h.unique); > list_move_tail(&req->list, &fpq->processing[hash]); > - __fuse_get_request(req); > set_bit(FR_SENT, &req->flags); > spin_unlock(&fpq->lock); > /* matches barrier in request_wait_answer() */ > smp_mb__after_atomic(); > if (test_bit(FR_INTERRUPTED, &req->flags)) > queue_interrupt(req); > + > + /* Check if request timed out */ > + if (test_bit(FR_PROCESSING, &req->flags)) { > + spin_lock(&fpq->lock); > + if (!test_bit(FR_PRIVATE, &req->flags)) > + list_del_init(&req->list); > + spin_unlock(&fpq->lock); > + fuse_put_request(req); > + return -ETIME; > + } This isn't quite right, we could have FR_PROCESSING set because we completed the request before we got here. If you put a schedule_timeout(HZ); right above this you could easily see where a request gets completed by userspace, but now you've fimed it out. Additionally if we have FR_PROCESSING set from the timeout, shouldn't this cleanup have been done already? I don't understand why we need to handle this here, we should just return and whoever is waiting on the request will get the error. Thanks, Josef
On Thu, Jul 18, 2024 at 7:44 AM Josef Bacik <josef@toxicpanda.com> wrote: > > On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote: ... > > --- > > fs/fuse/dev.c | 177 ++++++++++++++++++++++++++++++++++++++++++++--- > > fs/fuse/fuse_i.h | 12 ++++ > > fs/fuse/inode.c | 7 ++ > > 3 files changed, 188 insertions(+), 8 deletions(-) > > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > > index 9eb191b5c4de..7dd7b244951b 100644 > > --- a/fs/fuse/dev.c > > +++ b/fs/fuse/dev.c > > @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req) > > } > > EXPORT_SYMBOL_GPL(fuse_request_end); > > > > +/* fuse_request_end for requests that timeout */ > > +static void fuse_request_timeout(struct fuse_req *req) > > +{ > > + struct fuse_conn *fc = req->fm->fc; > > + struct fuse_iqueue *fiq = &fc->iq; > > + struct fuse_pqueue *fpq; > > + > > + spin_lock(&fiq->lock); > > + if (!fiq->connected) { > > + spin_unlock(&fiq->lock); > > + /* > > + * Connection is being aborted. The abort will release > > + * the refcount on the request > > + */ > > + req->out.h.error = -ECONNABORTED; > > + return; > > + } > > + if (test_bit(FR_PENDING, &req->flags)) { > > + /* Request is not yet in userspace, bail out */ > > + list_del(&req->list); > > + spin_unlock(&fiq->lock); > > + req->out.h.error = -ETIME; > > + __fuse_put_request(req); > > Why is this safe? We could be the last holder of the reference on this request > correct? The only places using __fuse_put_request() would be where we are in a > path where the caller already holds a reference on the request. Since this is > async it may not be the case right? > > If it is safe then it's just confusing and warrants a comment. > There is always a refcount still held on the request by fuse_simple_request() when this is called. I'll add a comment about this. I also just noticed that I use fuse_put_request() at the end of this function, I'll change that to __fuse_put_request() as well. > > + return; > > + } > > + if (test_bit(FR_INTERRUPTED, &req->flags)) > > + list_del_init(&req->intr_entry); > > + > > + fpq = req->fpq; > > + spin_unlock(&fiq->lock); > > + > > + if (fpq) { > > + spin_lock(&fpq->lock); > > + if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) { > ^^ > > You don't need the extra () there. > > > + spin_unlock(&fpq->lock); > > + /* > > + * Connection is being aborted. The abort will release > > + * the refcount on the request > > + */ > > + req->out.h.error = -ECONNABORTED; > > + return; > > + } > > + if (req->out.h.error == -ESTALE) { > > + /* > > + * Device is being released. The fuse_dev_release call > > + * will drop the refcount on the request > > + */ > > + spin_unlock(&fpq->lock); > > + return; > > + } > > + if (!test_bit(FR_PRIVATE, &req->flags)) > > + list_del_init(&req->list); > > + spin_unlock(&fpq->lock); > > + } > > + > > + req->out.h.error = -ETIME; > > + > > + if (test_bit(FR_ASYNC, &req->flags)) > > + req->args->end(req->fm, req->args, req->out.h.error); > > + > > + fuse_put_request(req); > > +} > > Just a general styling thing, we have two different states for requests here, > PENDING and !PENDING correct? I think it may be better to do something like > > if (test_bit(FR_PENDING, &req->flags)) > timeout_pending_req(); > else > timeout_inflight_req(); > > and then in timeout_pending_req() you do > > spin_lock(&fiq->lock); > if (!test_bit(FR_PENDING, &req->flags)) { > spin_unlock(&fiq_lock); > timeout_inflight_req(); > return; > } > > This will keep the two different state cleanup functions separate and a little > cleaner to grok. > Thanks for the suggestion, I will make this change for v2. > > + > > static int queue_interrupt(struct fuse_req *req) > > { > > struct fuse_iqueue *fiq = &req->fm->fc->iq; > > @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req) > > return 0; > > } > > > > +enum wait_type { > > + WAIT_TYPE_INTERRUPTIBLE, > > + WAIT_TYPE_KILLABLE, > > + WAIT_TYPE_NONINTERRUPTIBLE, > > +}; > > + > > +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req) > > +{ > > + struct fuse_conn *fc = req->fm->fc; > > + > > + return wait_event_interruptible_timeout(req->waitq, > > + test_bit(FR_FINISHED, > > + &req->flags), > > + fc->daemon_timeout); > > +} > > +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO); > > + > > +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type) > > +{ > > + struct fuse_conn *fc = req->fm->fc; > > + int err; > > + > > +wait_answer_start: > > + if (type == WAIT_TYPE_INTERRUPTIBLE) > > + err = fuse_wait_event_interruptible_timeout(req); > > + else if (type == WAIT_TYPE_KILLABLE) > > + err = wait_event_killable_timeout(req->waitq, > > + test_bit(FR_FINISHED, &req->flags), > > + fc->daemon_timeout); > > + > > + else if (type == WAIT_TYPE_NONINTERRUPTIBLE) > > + err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags), > > + fc->daemon_timeout); > > + else > > + WARN_ON(1); > > This will leak some random value for err, so initialize err to something that > will be dealt with, like -EINVAL; > > > + > > + /* request was answered */ > > + if (err > 0) > > + return 0; > > + > > + /* request was not answered in time */ > > + if (err == 0) { > > + if (test_and_set_bit(FR_PROCESSING, &req->flags)) > > + /* request reply is being processed by kernel right now. > > + * we should wait for the answer. > > + */ > > Format for multiline comments is > > /* > * blah > * blah > */ > > and since this is a 1 line if statement put it above the if statement. > > > + goto wait_answer_start; > > + > > + fuse_request_timeout(req); > > + return 0; > > + } > > + > > + /* else request was interrupted */ > > + return err; > > +} > > + > > static void request_wait_answer(struct fuse_req *req) > > { > > struct fuse_conn *fc = req->fm->fc; > > @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req) > > > > if (!fc->no_interrupt) { > > /* Any signal may interrupt this */ > > - err = wait_event_interruptible(req->waitq, > > - test_bit(FR_FINISHED, &req->flags)); > > + if (fc->daemon_timeout) > > + err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE); > > + else > > + err = wait_event_interruptible(req->waitq, > > + test_bit(FR_FINISHED, &req->flags)); > > if (!err) > > return; > > > > @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req) > > > > if (!test_bit(FR_FORCE, &req->flags)) { > > /* Only fatal signals may interrupt this */ > > - err = wait_event_killable(req->waitq, > > - test_bit(FR_FINISHED, &req->flags)); > > + if (fc->daemon_timeout) > > + err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE); > > + else > > + err = wait_event_killable(req->waitq, > > + test_bit(FR_FINISHED, &req->flags)); > > if (!err) > > return; > > > > @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req) > > * Either request is already in userspace, or it was forced. > > * Wait it out. > > */ > > - wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > > + if (fc->daemon_timeout) > > + wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE); > > + else > > + wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > > } > > > > static void __fuse_request_send(struct fuse_req *req) > > @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > > req = list_entry(fiq->pending.next, struct fuse_req, list); > > clear_bit(FR_PENDING, &req->flags); > > list_del_init(&req->list); > > + /* Acquire a reference since fuse_request_timeout may also be executing */ > > + __fuse_get_request(req); > > + req->fpq = fpq; > > spin_unlock(&fiq->lock); > > > > There's a race here with completion. If we timeout a request right here, we can > end up sending that same request below. I don't think there's any way around this unless we take the fpq lock while we do the fuse_copy stuff, because even if we check the FR_PROCESSING bit, the timeout handler could start running after the fpq lock is released when we do the fuse_copy calls. In my point of view, I don't think this race matters. We could have this situation happen on a regular timed-out request. For example, we send out a request to userspace and if the server takes too long to reply, the request is cancelled/invalidated in the kernel but the server will still see the request anyways. WDYT? > > You are going to need to check > > test_bit(FR_PROCESSING) > > after you take the fpq->lock just below here to make sure you didn't race with > the timeout handler and time the request out already. > > > args = req->args; > > @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > > if (args->opcode == FUSE_SETXATTR) > > req->out.h.error = -E2BIG; > > fuse_request_end(req); > > + fuse_put_request(req); > > goto restart; > > } > > spin_lock(&fpq->lock); > > @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > > } > > hash = fuse_req_hash(req->in.h.unique); > > list_move_tail(&req->list, &fpq->processing[hash]); > > - __fuse_get_request(req); > > set_bit(FR_SENT, &req->flags); > > spin_unlock(&fpq->lock); > > /* matches barrier in request_wait_answer() */ > > smp_mb__after_atomic(); > > if (test_bit(FR_INTERRUPTED, &req->flags)) > > queue_interrupt(req); > > + > > + /* Check if request timed out */ > > + if (test_bit(FR_PROCESSING, &req->flags)) { > > + spin_lock(&fpq->lock); > > + if (!test_bit(FR_PRIVATE, &req->flags)) > > + list_del_init(&req->list); > > + spin_unlock(&fpq->lock); > > + fuse_put_request(req); > > + return -ETIME; > > + } > > This isn't quite right, we could have FR_PROCESSING set because we completed the > request before we got here. If you put a schedule_timeout(HZ); right above this > you could easily see where a request gets completed by userspace, but now you've > fimed it out. Oh I see, you're talking about the race where a request is replied to immediately after the fuse_copy calls and before this gets called. Then when we get here, we can't differentiate between whether FR_PROCESSING was set by the timeout handler or the reply handler. I think the simplest way around this is to check if the FR_SENT flag was cleared (the reply handler clears it while holding the fpq lock where FR_PROCESSING gets set and the timeout handler doesn't clear it), then return -ETIME if it wasn't and 0 if it was. I'll add this into v2. > > Additionally if we have FR_PROCESSING set from the timeout, shouldn't this > cleanup have been done already? I don't understand why we need to handle this > here, we should just return and whoever is waiting on the request will get the > error. In most cases yes, but there is a race where the timeout handler may finish executing before the logic in dev_do_read that adds the request to the fpq lists. If this happens, the freed request will remain on the list. i think this race currently exists prior to these changes as well - in the case you mentioned above where the request may have been completed right after the fuse_copy calls in dev_do_read and before dev_do_read moves the request to the fpq lists. We would get into the same situation with a freed request still on the list. Thanks, Joanne > > Thanks, > > Josef
On 7/18/24 07:24, Joanne Koong wrote: > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert > <bernd.schubert@fastmail.fm> wrote: >> >> Hi Joanne, >> >> On 7/17/24 23:34, Joanne Koong wrote: >>> There are situations where fuse servers can become unresponsive or take >>> too long to reply to a request. Currently there is no upper bound on >>> how long a request may take, which may be frustrating to users who get >>> stuck waiting for a request to complete. >>> >>> This commit adds a daemon timeout option (in seconds) for fuse requests. >>> If the timeout elapses before the request is replied to, the request will >>> fail with -ETIME. >>> >>> There are 3 possibilities for a request that times out: >>> a) The request times out before the request has been sent to userspace >>> b) The request times out after the request has been sent to userspace >>> and before it receives a reply from the server >>> c) The request times out after the request has been sent to userspace >>> and the server replies while the kernel is timing out the request >>> >>> Proper synchronization must be added to ensure that the request is >>> handled correctly in all of these cases. To this effect, there is a new >>> FR_PROCESSING bit added to the request flags, which is set atomically by >>> either the timeout handler (see fuse_request_timeout()) which is invoked >>> after the request timeout elapses or set by the request reply handler >>> (see dev_do_write()), whichever gets there first. >>> >>> If the reply handler and the timeout handler are executing simultaneously >>> and the reply handler sets FR_PROCESSING before the timeout handler, then >>> the request is re-queued onto the waitqueue and the kernel will process the >>> reply as though the timeout did not elapse. If the timeout handler sets >>> FR_PROCESSING before the reply handler, then the request will fail with >>> -ETIME and the request will be cleaned up. >>> >>> Proper acquires on the request reference must be added to ensure that the >>> timeout handler does not drop the last refcount on the request while the >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is >>> still accessing the request. (By "forwarder handler", this is the handler >>> that forwards the request to userspace). >>> >>> Currently, this is the lifecycle of the request refcount: >>> >>> Request is created: >>> fuse_simple_request -> allocates request, sets refcount to 1 >>> __fuse_request_send -> acquires refcount >>> queues request and waits for reply... >>> fuse_simple_request -> drops refcount >>> >>> Request is freed: >>> fuse_dev_do_write >>> fuse_request_end -> drops refcount on request >>> >>> The timeout handler drops the refcount on the request so that the >>> request is properly cleaned up if a reply is never received. Because of >>> this, both the forwarder handler and the reply handler must acquire a refcount >>> on the request while it accesses the request, and the refcount must be >>> acquired while the lock of the list the request is on is held. >>> >>> There is a potential race if the request is being forwarded to >>> userspace while the timeout handler is executing (eg FR_PENDING has >>> already been cleared but dev_do_read() hasn't finished executing). This >>> is a problem because this would free the request but the request has not >>> been removed from the fpq list it's on. To prevent this, dev_do_read() >>> must check FR_PROCESSING at the end of its logic and remove the request >>> from the fpq list if the timeout occurred. >>> >>> There is also the case where the connection may be aborted or the >>> device may be released while the timeout handler is running. To protect >>> against an extra refcount drop on the request, the timeout handler >>> checks the connected state of the list and lets the abort handler drop the >>> last reference if the abort is running simultaneously. Similarly, the >>> timeout handler also needs to check if the req->out.h.error is set to >>> -ESTALE, which indicates that the device release is cleaning up the >>> request. In both these cases, the timeout handler will return without >>> dropping the refcount. >>> >>> Please also note that background requests are not applicable for timeouts >>> since they are asynchronous. >> >> >> This and that thread here actually make me wonder if this is the right >> approach >> >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/ >> >> >> In th3 thread above a request got interrupted, but fuse-server still >> does not manage stop it. From my point of view, interrupting a request >> suggests to add a rather short kernel lifetime for it. With that one > > Hi Bernd, > > I believe this solution fixes the problem outlined in that thread > (namely, that the process gets stuck waiting for a reply). If the > request is interrupted before it times out, the kernel will wait with > a timeout again on the request (timeout would start over, but the > request will still eventually sooner or later time out). I'm not sure > I agree that we want to cancel the request altogether if it's > interrupted. For example, if the user uses the user-defined signal > SIGUSR1, it can be unexpected and arbitrary behavior for the request > to be aborted by the kernel. I also don't think this can be consistent > for what the fuse server will see since some requests may have already > been forwarded to userspace when the request is aborted and some > requests may not have. > > I think if we were to enforce that the request should be aborted when > it's interrupted regardless of whether a timeout is specified or not, > then we should do it similarly to how the timeout handler logic > handles it in this patch,rather than the implementation in the thread > linked above (namely, that the request should be explicitly cleaned up > immediately instead of when the interrupt request sends a reply); I > don't believe the implementation in the link handles the case where > for example the fuse server is in a deadlock and does not reply to the > interrupt request. Also, as I understand it, it is optional for > servers to reply or not to the interrupt request. Hi Joanne, yeah, the solution in the link above is definitely not ideal and I think a timout based solution would be better. But I think your patch wouldn't work either right now, unless server side sets a request timeout. Btw, I would rename the variable 'daemon_timeout' to somethink like req_timeout. > >> either needs to wake up in intervals and check if request timeout got >> exceeded or it needs to be an async kernel thread. I think that async >> thread would also allow to give a timeout to background requests. > > in my opinion, background requests do not need timeouts. As I > understand it, background requests are used only for direct i/o async > read/writes, writing back dirty pages,and readahead requests generated > by the kernel. I don't think fuse servers would have a need for timing > out background requests. There is another discussion here, where timeouts are a possible although ugly solution to avoid page copies https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/ That is the bg writeback code path. > >> >> Or we add an async timeout to bg and interupted requests additionally? > > The interrupted request will already have a timeout on it since it > waits with a timeout again for the reply after it's interrupted. If daemon side configures timeouts. And interrupted requests might want to have a different timeout. I will check when I'm back if we can update your patch a bit for that. Your patch hooks in quite nicely and basically without overhead into fg (sync) requests. Timing out bg requests will have a bit overhead (unless I miss something), so maybe we need two solutions here. And if we want to go that route at all, to avoid these extra fuse page copies. > >> >> >> (I only basically reviewed, can't do carefully right now - on vacation >> and as I just noticed, on a laptop that gives me electric shocks when I >> connect it to power.) > > No worries, thanks for your comments and hope you have a great > vacation (without getting shocked :))! Thank you! For now I'm not connecting power, 3h of battery left :) Thanks, Bernd
On Thu, Jul 18, 2024 at 3:11 PM Bernd Schubert <bs_lists@aakef.fastmail.fm> wrote: > > > > On 7/18/24 07:24, Joanne Koong wrote: > > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert > > <bernd.schubert@fastmail.fm> wrote: > >> > >> Hi Joanne, > >> > >> On 7/17/24 23:34, Joanne Koong wrote: > >>> There are situations where fuse servers can become unresponsive or take > >>> too long to reply to a request. Currently there is no upper bound on > >>> how long a request may take, which may be frustrating to users who get > >>> stuck waiting for a request to complete. > >>> > >>> This commit adds a daemon timeout option (in seconds) for fuse requests. > >>> If the timeout elapses before the request is replied to, the request will > >>> fail with -ETIME. > >>> > >>> There are 3 possibilities for a request that times out: > >>> a) The request times out before the request has been sent to userspace > >>> b) The request times out after the request has been sent to userspace > >>> and before it receives a reply from the server > >>> c) The request times out after the request has been sent to userspace > >>> and the server replies while the kernel is timing out the request > >>> > >>> Proper synchronization must be added to ensure that the request is > >>> handled correctly in all of these cases. To this effect, there is a new > >>> FR_PROCESSING bit added to the request flags, which is set atomically by > >>> either the timeout handler (see fuse_request_timeout()) which is invoked > >>> after the request timeout elapses or set by the request reply handler > >>> (see dev_do_write()), whichever gets there first. > >>> > >>> If the reply handler and the timeout handler are executing simultaneously > >>> and the reply handler sets FR_PROCESSING before the timeout handler, then > >>> the request is re-queued onto the waitqueue and the kernel will process the > >>> reply as though the timeout did not elapse. If the timeout handler sets > >>> FR_PROCESSING before the reply handler, then the request will fail with > >>> -ETIME and the request will be cleaned up. > >>> > >>> Proper acquires on the request reference must be added to ensure that the > >>> timeout handler does not drop the last refcount on the request while the > >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is > >>> still accessing the request. (By "forwarder handler", this is the handler > >>> that forwards the request to userspace). > >>> > >>> Currently, this is the lifecycle of the request refcount: > >>> > >>> Request is created: > >>> fuse_simple_request -> allocates request, sets refcount to 1 > >>> __fuse_request_send -> acquires refcount > >>> queues request and waits for reply... > >>> fuse_simple_request -> drops refcount > >>> > >>> Request is freed: > >>> fuse_dev_do_write > >>> fuse_request_end -> drops refcount on request > >>> > >>> The timeout handler drops the refcount on the request so that the > >>> request is properly cleaned up if a reply is never received. Because of > >>> this, both the forwarder handler and the reply handler must acquire a refcount > >>> on the request while it accesses the request, and the refcount must be > >>> acquired while the lock of the list the request is on is held. > >>> > >>> There is a potential race if the request is being forwarded to > >>> userspace while the timeout handler is executing (eg FR_PENDING has > >>> already been cleared but dev_do_read() hasn't finished executing). This > >>> is a problem because this would free the request but the request has not > >>> been removed from the fpq list it's on. To prevent this, dev_do_read() > >>> must check FR_PROCESSING at the end of its logic and remove the request > >>> from the fpq list if the timeout occurred. > >>> > >>> There is also the case where the connection may be aborted or the > >>> device may be released while the timeout handler is running. To protect > >>> against an extra refcount drop on the request, the timeout handler > >>> checks the connected state of the list and lets the abort handler drop the > >>> last reference if the abort is running simultaneously. Similarly, the > >>> timeout handler also needs to check if the req->out.h.error is set to > >>> -ESTALE, which indicates that the device release is cleaning up the > >>> request. In both these cases, the timeout handler will return without > >>> dropping the refcount. > >>> > >>> Please also note that background requests are not applicable for timeouts > >>> since they are asynchronous. > >> > >> > >> This and that thread here actually make me wonder if this is the right > >> approach > >> > >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/ > >> > >> > >> In th3 thread above a request got interrupted, but fuse-server still > >> does not manage stop it. From my point of view, interrupting a request > >> suggests to add a rather short kernel lifetime for it. With that one > > > > Hi Bernd, > > > > I believe this solution fixes the problem outlined in that thread > > (namely, that the process gets stuck waiting for a reply). If the > > request is interrupted before it times out, the kernel will wait with > > a timeout again on the request (timeout would start over, but the > > request will still eventually sooner or later time out). I'm not sure > > I agree that we want to cancel the request altogether if it's > > interrupted. For example, if the user uses the user-defined signal > > SIGUSR1, it can be unexpected and arbitrary behavior for the request > > to be aborted by the kernel. I also don't think this can be consistent > > for what the fuse server will see since some requests may have already > > been forwarded to userspace when the request is aborted and some > > requests may not have. > > > > I think if we were to enforce that the request should be aborted when > > it's interrupted regardless of whether a timeout is specified or not, > > then we should do it similarly to how the timeout handler logic > > handles it in this patch,rather than the implementation in the thread > > linked above (namely, that the request should be explicitly cleaned up > > immediately instead of when the interrupt request sends a reply); I > > don't believe the implementation in the link handles the case where > > for example the fuse server is in a deadlock and does not reply to the > > interrupt request. Also, as I understand it, it is optional for > > servers to reply or not to the interrupt request. > > Hi Joanne, > > yeah, the solution in the link above is definitely not ideal and I think > a timout based solution would be better. But I think your patch wouldn't > work either right now, unless server side sets a request timeout. > Btw, I would rename the variable 'daemon_timeout' to somethink like > req_timeout. > Hi Bernd, I think we need to figure out if we indeed want the kernel to abort interrupted requests if no request timeout was explicitly set by the server. I'm leaning towards no, for the reasons in my previous reply; in addition to that I'm also not sure if we would be potentially breaking existing filesystems if we introduced this new behavior. Curious to hear your and others' thoughts on this. (Btw, if we did want to add this in, i think the change would be actually pretty simple. We could pretty much just reuse all the logic that's implemented in the timeout handling code. It's very much the same scenario (request getting aborted and needing to protect against races with different handlers)) I will rename daemon_timeout to req_timeout in v2. Thanks for the suggestion. > > > >> either needs to wake up in intervals and check if request timeout got > >> exceeded or it needs to be an async kernel thread. I think that async > >> thread would also allow to give a timeout to background requests. > > > > in my opinion, background requests do not need timeouts. As I > > understand it, background requests are used only for direct i/o async > > read/writes, writing back dirty pages,and readahead requests generated > > by the kernel. I don't think fuse servers would have a need for timing > > out background requests. > > There is another discussion here, where timeouts are a possible although > ugly solution to avoid page copies > > https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/ > Thanks for the link, it's an interesting read. > > That is the bg writeback code path. > > > > >> > >> Or we add an async timeout to bg and interupted requests additionally? > > > > The interrupted request will already have a timeout on it since it > > waits with a timeout again for the reply after it's interrupted. > > If daemon side configures timeouts. And interrupted requests might want > to have a different timeout. I will check when I'm back if we can update > your patch a bit for that. > > Your patch hooks in quite nicely and basically without overhead into fg > (sync) requests. Timing out bg requests will have a bit overhead (unless > I miss something), so maybe we need two solutions here. And if we want > to go that route at all, to avoid these extra fuse page copies. > Agreed, I think if we do decide to go down this route, it seems cleaner to me to have the background request timeouts handled separately. Maybe something like having a timer per batch (where "batch" is the group of requests that get flushed at the same time)? That seems to me like the approach with the least overhead. Thanks, Joanne > > > > >> > >> > >> (I only basically reviewed, can't do carefully right now - on vacation > >> and as I just noticed, on a laptop that gives me electric shocks when I > >> connect it to power.) > > > > No worries, thanks for your comments and hope you have a great > > vacation (without getting shocked :))! > > Thank you! For now I'm not connecting power, 3h of battery left :) > > > Thanks, > Bernd
On Thu, Jul 18, 2024 at 05:37:06PM -0700, Joanne Koong wrote: > On Thu, Jul 18, 2024 at 3:11 PM Bernd Schubert > <bs_lists@aakef.fastmail.fm> wrote: > > > > > > > > On 7/18/24 07:24, Joanne Koong wrote: > > > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert > > > <bernd.schubert@fastmail.fm> wrote: > > >> > > >> Hi Joanne, > > >> > > >> On 7/17/24 23:34, Joanne Koong wrote: > > >>> There are situations where fuse servers can become unresponsive or take > > >>> too long to reply to a request. Currently there is no upper bound on > > >>> how long a request may take, which may be frustrating to users who get > > >>> stuck waiting for a request to complete. > > >>> > > >>> This commit adds a daemon timeout option (in seconds) for fuse requests. > > >>> If the timeout elapses before the request is replied to, the request will > > >>> fail with -ETIME. > > >>> > > >>> There are 3 possibilities for a request that times out: > > >>> a) The request times out before the request has been sent to userspace > > >>> b) The request times out after the request has been sent to userspace > > >>> and before it receives a reply from the server > > >>> c) The request times out after the request has been sent to userspace > > >>> and the server replies while the kernel is timing out the request > > >>> > > >>> Proper synchronization must be added to ensure that the request is > > >>> handled correctly in all of these cases. To this effect, there is a new > > >>> FR_PROCESSING bit added to the request flags, which is set atomically by > > >>> either the timeout handler (see fuse_request_timeout()) which is invoked > > >>> after the request timeout elapses or set by the request reply handler > > >>> (see dev_do_write()), whichever gets there first. > > >>> > > >>> If the reply handler and the timeout handler are executing simultaneously > > >>> and the reply handler sets FR_PROCESSING before the timeout handler, then > > >>> the request is re-queued onto the waitqueue and the kernel will process the > > >>> reply as though the timeout did not elapse. If the timeout handler sets > > >>> FR_PROCESSING before the reply handler, then the request will fail with > > >>> -ETIME and the request will be cleaned up. > > >>> > > >>> Proper acquires on the request reference must be added to ensure that the > > >>> timeout handler does not drop the last refcount on the request while the > > >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is > > >>> still accessing the request. (By "forwarder handler", this is the handler > > >>> that forwards the request to userspace). > > >>> > > >>> Currently, this is the lifecycle of the request refcount: > > >>> > > >>> Request is created: > > >>> fuse_simple_request -> allocates request, sets refcount to 1 > > >>> __fuse_request_send -> acquires refcount > > >>> queues request and waits for reply... > > >>> fuse_simple_request -> drops refcount > > >>> > > >>> Request is freed: > > >>> fuse_dev_do_write > > >>> fuse_request_end -> drops refcount on request > > >>> > > >>> The timeout handler drops the refcount on the request so that the > > >>> request is properly cleaned up if a reply is never received. Because of > > >>> this, both the forwarder handler and the reply handler must acquire a refcount > > >>> on the request while it accesses the request, and the refcount must be > > >>> acquired while the lock of the list the request is on is held. > > >>> > > >>> There is a potential race if the request is being forwarded to > > >>> userspace while the timeout handler is executing (eg FR_PENDING has > > >>> already been cleared but dev_do_read() hasn't finished executing). This > > >>> is a problem because this would free the request but the request has not > > >>> been removed from the fpq list it's on. To prevent this, dev_do_read() > > >>> must check FR_PROCESSING at the end of its logic and remove the request > > >>> from the fpq list if the timeout occurred. > > >>> > > >>> There is also the case where the connection may be aborted or the > > >>> device may be released while the timeout handler is running. To protect > > >>> against an extra refcount drop on the request, the timeout handler > > >>> checks the connected state of the list and lets the abort handler drop the > > >>> last reference if the abort is running simultaneously. Similarly, the > > >>> timeout handler also needs to check if the req->out.h.error is set to > > >>> -ESTALE, which indicates that the device release is cleaning up the > > >>> request. In both these cases, the timeout handler will return without > > >>> dropping the refcount. > > >>> > > >>> Please also note that background requests are not applicable for timeouts > > >>> since they are asynchronous. > > >> > > >> > > >> This and that thread here actually make me wonder if this is the right > > >> approach > > >> > > >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/ > > >> > > >> > > >> In th3 thread above a request got interrupted, but fuse-server still > > >> does not manage stop it. From my point of view, interrupting a request > > >> suggests to add a rather short kernel lifetime for it. With that one > > > > > > Hi Bernd, > > > > > > I believe this solution fixes the problem outlined in that thread > > > (namely, that the process gets stuck waiting for a reply). If the > > > request is interrupted before it times out, the kernel will wait with > > > a timeout again on the request (timeout would start over, but the > > > request will still eventually sooner or later time out). I'm not sure > > > I agree that we want to cancel the request altogether if it's > > > interrupted. For example, if the user uses the user-defined signal > > > SIGUSR1, it can be unexpected and arbitrary behavior for the request > > > to be aborted by the kernel. I also don't think this can be consistent > > > for what the fuse server will see since some requests may have already > > > been forwarded to userspace when the request is aborted and some > > > requests may not have. > > > > > > I think if we were to enforce that the request should be aborted when > > > it's interrupted regardless of whether a timeout is specified or not, > > > then we should do it similarly to how the timeout handler logic > > > handles it in this patch,rather than the implementation in the thread > > > linked above (namely, that the request should be explicitly cleaned up > > > immediately instead of when the interrupt request sends a reply); I > > > don't believe the implementation in the link handles the case where > > > for example the fuse server is in a deadlock and does not reply to the > > > interrupt request. Also, as I understand it, it is optional for > > > servers to reply or not to the interrupt request. > > > > Hi Joanne, > > > > yeah, the solution in the link above is definitely not ideal and I think > > a timout based solution would be better. But I think your patch wouldn't > > work either right now, unless server side sets a request timeout. > > Btw, I would rename the variable 'daemon_timeout' to somethink like > > req_timeout. > > > Hi Bernd, > > I think we need to figure out if we indeed want the kernel to abort > interrupted requests if no request timeout was explicitly set by the > server. I'm leaning towards no, for the reasons in my previous reply; > in addition to that I'm also not sure if we would be potentially > breaking existing filesystems if we introduced this new behavior. > Curious to hear your and others' thoughts on this. > > (Btw, if we did want to add this in, i think the change would be > actually pretty simple. We could pretty much just reuse all the logic > that's implemented in the timeout handling code. It's very much the > same scenario (request getting aborted and needing to protect against > races with different handlers)) > > I will rename daemon_timeout to req_timeout in v2. Thanks for the suggestion. > > > > > > >> either needs to wake up in intervals and check if request timeout got > > >> exceeded or it needs to be an async kernel thread. I think that async > > >> thread would also allow to give a timeout to background requests. > > > > > > in my opinion, background requests do not need timeouts. As I > > > understand it, background requests are used only for direct i/o async > > > read/writes, writing back dirty pages,and readahead requests generated > > > by the kernel. I don't think fuse servers would have a need for timing > > > out background requests. > > > > There is another discussion here, where timeouts are a possible although > > ugly solution to avoid page copies > > > > https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/ > > > Thanks for the link, it's an interesting read. > > > > > That is the bg writeback code path. > > > > > > > >> > > >> Or we add an async timeout to bg and interupted requests additionally? > > > > > > The interrupted request will already have a timeout on it since it > > > waits with a timeout again for the reply after it's interrupted. > > > > If daemon side configures timeouts. And interrupted requests might want > > to have a different timeout. I will check when I'm back if we can update > > your patch a bit for that. > > > > Your patch hooks in quite nicely and basically without overhead into fg > > (sync) requests. Timing out bg requests will have a bit overhead (unless > > I miss something), so maybe we need two solutions here. And if we want > > to go that route at all, to avoid these extra fuse page copies. > > > Agreed, I think if we do decide to go down this route, it seems > cleaner to me to have the background request timeouts handled > separately. Maybe something like having a timer per batch (where > "batch" is the group of requests that get flushed at the same time)? > That seems to me like the approach with the least overhead. > I don't want to have a bunch of different timeouts, we should just have one and have consistent behavior across all classes of requests. I think the only thing we should have that is "separate" is a way to set request timeouts that aren't set by the daemon itself. Administrators should be able to set a per-mount timeout via sysfs/algo in order to have some sort of control over possibly malicious FUSE file systems. But that should just tie into whatever mechanism you come up with, and everything should all behave consistently with that timeout. Thanks, Josef
On Fri, Jul 19, 2024 at 6:06 AM Josef Bacik <josef@toxicpanda.com> wrote: > > On Thu, Jul 18, 2024 at 05:37:06PM -0700, Joanne Koong wrote: > > On Thu, Jul 18, 2024 at 3:11 PM Bernd Schubert > > <bs_lists@aakef.fastmail.fm> wrote: > > > > > > > > > > > > On 7/18/24 07:24, Joanne Koong wrote: > > > > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert > > > > <bernd.schubert@fastmail.fm> wrote: > > > >> > > > >> Hi Joanne, > > > >> > > > >> On 7/17/24 23:34, Joanne Koong wrote: > > > >>> There are situations where fuse servers can become unresponsive or take > > > >>> too long to reply to a request. Currently there is no upper bound on > > > >>> how long a request may take, which may be frustrating to users who get > > > >>> stuck waiting for a request to complete. > > > >>> > > > >>> This commit adds a daemon timeout option (in seconds) for fuse requests. > > > >>> If the timeout elapses before the request is replied to, the request will > > > >>> fail with -ETIME. > > > >>> > > > >>> There are 3 possibilities for a request that times out: > > > >>> a) The request times out before the request has been sent to userspace > > > >>> b) The request times out after the request has been sent to userspace > > > >>> and before it receives a reply from the server > > > >>> c) The request times out after the request has been sent to userspace > > > >>> and the server replies while the kernel is timing out the request > > > >>> > > > >>> Proper synchronization must be added to ensure that the request is > > > >>> handled correctly in all of these cases. To this effect, there is a new > > > >>> FR_PROCESSING bit added to the request flags, which is set atomically by > > > >>> either the timeout handler (see fuse_request_timeout()) which is invoked > > > >>> after the request timeout elapses or set by the request reply handler > > > >>> (see dev_do_write()), whichever gets there first. > > > >>> > > > >>> If the reply handler and the timeout handler are executing simultaneously > > > >>> and the reply handler sets FR_PROCESSING before the timeout handler, then > > > >>> the request is re-queued onto the waitqueue and the kernel will process the > > > >>> reply as though the timeout did not elapse. If the timeout handler sets > > > >>> FR_PROCESSING before the reply handler, then the request will fail with > > > >>> -ETIME and the request will be cleaned up. > > > >>> > > > >>> Proper acquires on the request reference must be added to ensure that the > > > >>> timeout handler does not drop the last refcount on the request while the > > > >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is > > > >>> still accessing the request. (By "forwarder handler", this is the handler > > > >>> that forwards the request to userspace). > > > >>> > > > >>> Currently, this is the lifecycle of the request refcount: > > > >>> > > > >>> Request is created: > > > >>> fuse_simple_request -> allocates request, sets refcount to 1 > > > >>> __fuse_request_send -> acquires refcount > > > >>> queues request and waits for reply... > > > >>> fuse_simple_request -> drops refcount > > > >>> > > > >>> Request is freed: > > > >>> fuse_dev_do_write > > > >>> fuse_request_end -> drops refcount on request > > > >>> > > > >>> The timeout handler drops the refcount on the request so that the > > > >>> request is properly cleaned up if a reply is never received. Because of > > > >>> this, both the forwarder handler and the reply handler must acquire a refcount > > > >>> on the request while it accesses the request, and the refcount must be > > > >>> acquired while the lock of the list the request is on is held. > > > >>> > > > >>> There is a potential race if the request is being forwarded to > > > >>> userspace while the timeout handler is executing (eg FR_PENDING has > > > >>> already been cleared but dev_do_read() hasn't finished executing). This > > > >>> is a problem because this would free the request but the request has not > > > >>> been removed from the fpq list it's on. To prevent this, dev_do_read() > > > >>> must check FR_PROCESSING at the end of its logic and remove the request > > > >>> from the fpq list if the timeout occurred. > > > >>> > > > >>> There is also the case where the connection may be aborted or the > > > >>> device may be released while the timeout handler is running. To protect > > > >>> against an extra refcount drop on the request, the timeout handler > > > >>> checks the connected state of the list and lets the abort handler drop the > > > >>> last reference if the abort is running simultaneously. Similarly, the > > > >>> timeout handler also needs to check if the req->out.h.error is set to > > > >>> -ESTALE, which indicates that the device release is cleaning up the > > > >>> request. In both these cases, the timeout handler will return without > > > >>> dropping the refcount. > > > >>> > > > >>> Please also note that background requests are not applicable for timeouts > > > >>> since they are asynchronous. > > > >> > > > >> > > > >> This and that thread here actually make me wonder if this is the right > > > >> approach > > > >> > > > >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/ > > > >> > > > >> > > > >> In th3 thread above a request got interrupted, but fuse-server still > > > >> does not manage stop it. From my point of view, interrupting a request > > > >> suggests to add a rather short kernel lifetime for it. With that one > > > > > > > > Hi Bernd, > > > > > > > > I believe this solution fixes the problem outlined in that thread > > > > (namely, that the process gets stuck waiting for a reply). If the > > > > request is interrupted before it times out, the kernel will wait with > > > > a timeout again on the request (timeout would start over, but the > > > > request will still eventually sooner or later time out). I'm not sure > > > > I agree that we want to cancel the request altogether if it's > > > > interrupted. For example, if the user uses the user-defined signal > > > > SIGUSR1, it can be unexpected and arbitrary behavior for the request > > > > to be aborted by the kernel. I also don't think this can be consistent > > > > for what the fuse server will see since some requests may have already > > > > been forwarded to userspace when the request is aborted and some > > > > requests may not have. > > > > > > > > I think if we were to enforce that the request should be aborted when > > > > it's interrupted regardless of whether a timeout is specified or not, > > > > then we should do it similarly to how the timeout handler logic > > > > handles it in this patch,rather than the implementation in the thread > > > > linked above (namely, that the request should be explicitly cleaned up > > > > immediately instead of when the interrupt request sends a reply); I > > > > don't believe the implementation in the link handles the case where > > > > for example the fuse server is in a deadlock and does not reply to the > > > > interrupt request. Also, as I understand it, it is optional for > > > > servers to reply or not to the interrupt request. > > > > > > Hi Joanne, > > > > > > yeah, the solution in the link above is definitely not ideal and I think > > > a timout based solution would be better. But I think your patch wouldn't > > > work either right now, unless server side sets a request timeout. > > > Btw, I would rename the variable 'daemon_timeout' to somethink like > > > req_timeout. > > > > > Hi Bernd, > > > > I think we need to figure out if we indeed want the kernel to abort > > interrupted requests if no request timeout was explicitly set by the > > server. I'm leaning towards no, for the reasons in my previous reply; > > in addition to that I'm also not sure if we would be potentially > > breaking existing filesystems if we introduced this new behavior. > > Curious to hear your and others' thoughts on this. > > > > (Btw, if we did want to add this in, i think the change would be > > actually pretty simple. We could pretty much just reuse all the logic > > that's implemented in the timeout handling code. It's very much the > > same scenario (request getting aborted and needing to protect against > > races with different handlers)) > > > > I will rename daemon_timeout to req_timeout in v2. Thanks for the suggestion. > > > > > > > > > >> either needs to wake up in intervals and check if request timeout got > > > >> exceeded or it needs to be an async kernel thread. I think that async > > > >> thread would also allow to give a timeout to background requests. > > > > > > > > in my opinion, background requests do not need timeouts. As I > > > > understand it, background requests are used only for direct i/o async > > > > read/writes, writing back dirty pages,and readahead requests generated > > > > by the kernel. I don't think fuse servers would have a need for timing > > > > out background requests. > > > > > > There is another discussion here, where timeouts are a possible although > > > ugly solution to avoid page copies > > > > > > https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/ > > > > > Thanks for the link, it's an interesting read. > > > > > > > > That is the bg writeback code path. > > > > > > > > > > >> > > > >> Or we add an async timeout to bg and interupted requests additionally? > > > > > > > > The interrupted request will already have a timeout on it since it > > > > waits with a timeout again for the reply after it's interrupted. > > > > > > If daemon side configures timeouts. And interrupted requests might want > > > to have a different timeout. I will check when I'm back if we can update > > > your patch a bit for that. > > > > > > Your patch hooks in quite nicely and basically without overhead into fg > > > (sync) requests. Timing out bg requests will have a bit overhead (unless > > > I miss something), so maybe we need two solutions here. And if we want > > > to go that route at all, to avoid these extra fuse page copies. > > > > > Agreed, I think if we do decide to go down this route, it seems > > cleaner to me to have the background request timeouts handled > > separately. Maybe something like having a timer per batch (where > > "batch" is the group of requests that get flushed at the same time)? > > That seems to me like the approach with the least overhead. > > > > I don't want to have a bunch of different timeouts, we should just have one and > have consistent behavior across all classes of requests. > > I think the only thing we should have that is "separate" is a way to set request > timeouts that aren't set by the daemon itself. Administrators should be able to > set a per-mount timeout via sysfs/algo in order to have some sort of control > over possibly malicious FUSE file systems. > > But that should just tie into whatever mechanism you come up with, and > everything should all behave consistently with that timeout. Thanks, > > Josef To summarize this thread so far, there are 2 open questions: 1) should interrupted requests be automatically aborted/cancelled by default (even if no timeout is set)? 2) should background requests also have some timeout enforced on them? I think the decision on 2) is the blocker for this patch ( 1) could be added in the future as a separate patch). Is this the route we want to go down for avoiding the extra page copies? Who makes the call on this? I'm assuming it's the fuse maintainer (Miklos)?
On Mon, Jul 22, 2024 at 11:58:03AM -0700, Joanne Koong wrote: > On Fri, Jul 19, 2024 at 6:06 AM Josef Bacik <josef@toxicpanda.com> wrote: > > > > On Thu, Jul 18, 2024 at 05:37:06PM -0700, Joanne Koong wrote: > > > On Thu, Jul 18, 2024 at 3:11 PM Bernd Schubert > > > <bs_lists@aakef.fastmail.fm> wrote: > > > > > > > > > > > > > > > > On 7/18/24 07:24, Joanne Koong wrote: > > > > > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert > > > > > <bernd.schubert@fastmail.fm> wrote: > > > > >> > > > > >> Hi Joanne, > > > > >> > > > > >> On 7/17/24 23:34, Joanne Koong wrote: > > > > >>> There are situations where fuse servers can become unresponsive or take > > > > >>> too long to reply to a request. Currently there is no upper bound on > > > > >>> how long a request may take, which may be frustrating to users who get > > > > >>> stuck waiting for a request to complete. > > > > >>> > > > > >>> This commit adds a daemon timeout option (in seconds) for fuse requests. > > > > >>> If the timeout elapses before the request is replied to, the request will > > > > >>> fail with -ETIME. > > > > >>> > > > > >>> There are 3 possibilities for a request that times out: > > > > >>> a) The request times out before the request has been sent to userspace > > > > >>> b) The request times out after the request has been sent to userspace > > > > >>> and before it receives a reply from the server > > > > >>> c) The request times out after the request has been sent to userspace > > > > >>> and the server replies while the kernel is timing out the request > > > > >>> > > > > >>> Proper synchronization must be added to ensure that the request is > > > > >>> handled correctly in all of these cases. To this effect, there is a new > > > > >>> FR_PROCESSING bit added to the request flags, which is set atomically by > > > > >>> either the timeout handler (see fuse_request_timeout()) which is invoked > > > > >>> after the request timeout elapses or set by the request reply handler > > > > >>> (see dev_do_write()), whichever gets there first. > > > > >>> > > > > >>> If the reply handler and the timeout handler are executing simultaneously > > > > >>> and the reply handler sets FR_PROCESSING before the timeout handler, then > > > > >>> the request is re-queued onto the waitqueue and the kernel will process the > > > > >>> reply as though the timeout did not elapse. If the timeout handler sets > > > > >>> FR_PROCESSING before the reply handler, then the request will fail with > > > > >>> -ETIME and the request will be cleaned up. > > > > >>> > > > > >>> Proper acquires on the request reference must be added to ensure that the > > > > >>> timeout handler does not drop the last refcount on the request while the > > > > >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is > > > > >>> still accessing the request. (By "forwarder handler", this is the handler > > > > >>> that forwards the request to userspace). > > > > >>> > > > > >>> Currently, this is the lifecycle of the request refcount: > > > > >>> > > > > >>> Request is created: > > > > >>> fuse_simple_request -> allocates request, sets refcount to 1 > > > > >>> __fuse_request_send -> acquires refcount > > > > >>> queues request and waits for reply... > > > > >>> fuse_simple_request -> drops refcount > > > > >>> > > > > >>> Request is freed: > > > > >>> fuse_dev_do_write > > > > >>> fuse_request_end -> drops refcount on request > > > > >>> > > > > >>> The timeout handler drops the refcount on the request so that the > > > > >>> request is properly cleaned up if a reply is never received. Because of > > > > >>> this, both the forwarder handler and the reply handler must acquire a refcount > > > > >>> on the request while it accesses the request, and the refcount must be > > > > >>> acquired while the lock of the list the request is on is held. > > > > >>> > > > > >>> There is a potential race if the request is being forwarded to > > > > >>> userspace while the timeout handler is executing (eg FR_PENDING has > > > > >>> already been cleared but dev_do_read() hasn't finished executing). This > > > > >>> is a problem because this would free the request but the request has not > > > > >>> been removed from the fpq list it's on. To prevent this, dev_do_read() > > > > >>> must check FR_PROCESSING at the end of its logic and remove the request > > > > >>> from the fpq list if the timeout occurred. > > > > >>> > > > > >>> There is also the case where the connection may be aborted or the > > > > >>> device may be released while the timeout handler is running. To protect > > > > >>> against an extra refcount drop on the request, the timeout handler > > > > >>> checks the connected state of the list and lets the abort handler drop the > > > > >>> last reference if the abort is running simultaneously. Similarly, the > > > > >>> timeout handler also needs to check if the req->out.h.error is set to > > > > >>> -ESTALE, which indicates that the device release is cleaning up the > > > > >>> request. In both these cases, the timeout handler will return without > > > > >>> dropping the refcount. > > > > >>> > > > > >>> Please also note that background requests are not applicable for timeouts > > > > >>> since they are asynchronous. > > > > >> > > > > >> > > > > >> This and that thread here actually make me wonder if this is the right > > > > >> approach > > > > >> > > > > >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/ > > > > >> > > > > >> > > > > >> In th3 thread above a request got interrupted, but fuse-server still > > > > >> does not manage stop it. From my point of view, interrupting a request > > > > >> suggests to add a rather short kernel lifetime for it. With that one > > > > > > > > > > Hi Bernd, > > > > > > > > > > I believe this solution fixes the problem outlined in that thread > > > > > (namely, that the process gets stuck waiting for a reply). If the > > > > > request is interrupted before it times out, the kernel will wait with > > > > > a timeout again on the request (timeout would start over, but the > > > > > request will still eventually sooner or later time out). I'm not sure > > > > > I agree that we want to cancel the request altogether if it's > > > > > interrupted. For example, if the user uses the user-defined signal > > > > > SIGUSR1, it can be unexpected and arbitrary behavior for the request > > > > > to be aborted by the kernel. I also don't think this can be consistent > > > > > for what the fuse server will see since some requests may have already > > > > > been forwarded to userspace when the request is aborted and some > > > > > requests may not have. > > > > > > > > > > I think if we were to enforce that the request should be aborted when > > > > > it's interrupted regardless of whether a timeout is specified or not, > > > > > then we should do it similarly to how the timeout handler logic > > > > > handles it in this patch,rather than the implementation in the thread > > > > > linked above (namely, that the request should be explicitly cleaned up > > > > > immediately instead of when the interrupt request sends a reply); I > > > > > don't believe the implementation in the link handles the case where > > > > > for example the fuse server is in a deadlock and does not reply to the > > > > > interrupt request. Also, as I understand it, it is optional for > > > > > servers to reply or not to the interrupt request. > > > > > > > > Hi Joanne, > > > > > > > > yeah, the solution in the link above is definitely not ideal and I think > > > > a timout based solution would be better. But I think your patch wouldn't > > > > work either right now, unless server side sets a request timeout. > > > > Btw, I would rename the variable 'daemon_timeout' to somethink like > > > > req_timeout. > > > > > > > Hi Bernd, > > > > > > I think we need to figure out if we indeed want the kernel to abort > > > interrupted requests if no request timeout was explicitly set by the > > > server. I'm leaning towards no, for the reasons in my previous reply; > > > in addition to that I'm also not sure if we would be potentially > > > breaking existing filesystems if we introduced this new behavior. > > > Curious to hear your and others' thoughts on this. > > > > > > (Btw, if we did want to add this in, i think the change would be > > > actually pretty simple. We could pretty much just reuse all the logic > > > that's implemented in the timeout handling code. It's very much the > > > same scenario (request getting aborted and needing to protect against > > > races with different handlers)) > > > > > > I will rename daemon_timeout to req_timeout in v2. Thanks for the suggestion. > > > > > > > > > > > > >> either needs to wake up in intervals and check if request timeout got > > > > >> exceeded or it needs to be an async kernel thread. I think that async > > > > >> thread would also allow to give a timeout to background requests. > > > > > > > > > > in my opinion, background requests do not need timeouts. As I > > > > > understand it, background requests are used only for direct i/o async > > > > > read/writes, writing back dirty pages,and readahead requests generated > > > > > by the kernel. I don't think fuse servers would have a need for timing > > > > > out background requests. > > > > > > > > There is another discussion here, where timeouts are a possible although > > > > ugly solution to avoid page copies > > > > > > > > https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/ > > > > > > > Thanks for the link, it's an interesting read. > > > > > > > > > > > That is the bg writeback code path. > > > > > > > > > > > > > >> > > > > >> Or we add an async timeout to bg and interupted requests additionally? > > > > > > > > > > The interrupted request will already have a timeout on it since it > > > > > waits with a timeout again for the reply after it's interrupted. > > > > > > > > If daemon side configures timeouts. And interrupted requests might want > > > > to have a different timeout. I will check when I'm back if we can update > > > > your patch a bit for that. > > > > > > > > Your patch hooks in quite nicely and basically without overhead into fg > > > > (sync) requests. Timing out bg requests will have a bit overhead (unless > > > > I miss something), so maybe we need two solutions here. And if we want > > > > to go that route at all, to avoid these extra fuse page copies. > > > > > > > Agreed, I think if we do decide to go down this route, it seems > > > cleaner to me to have the background request timeouts handled > > > separately. Maybe something like having a timer per batch (where > > > "batch" is the group of requests that get flushed at the same time)? > > > That seems to me like the approach with the least overhead. > > > > > > > I don't want to have a bunch of different timeouts, we should just have one and > > have consistent behavior across all classes of requests. > > > > I think the only thing we should have that is "separate" is a way to set request > > timeouts that aren't set by the daemon itself. Administrators should be able to > > set a per-mount timeout via sysfs/algo in order to have some sort of control > > over possibly malicious FUSE file systems. > > > > But that should just tie into whatever mechanism you come up with, and > > everything should all behave consistently with that timeout. Thanks, > > > > Josef > > To summarize this thread so far, there are 2 open questions: > 1) should interrupted requests be automatically aborted/cancelled by > default (even if no timeout is set)? > 2) should background requests also have some timeout enforced on them? Yes I think background requests should have a timeout enforced on them if it's set. Page writeout is actually one of the bigger problems because stuff will just hang forever, like if you hit sync or something (which a lot of applications do). For #1 I have to think some more and look at what the mechanics/expectations of those requests are, but if it's a thing you can leave for a follup them that sounds good. Additionally I think leaving the extra page copy thing as future work once this work is done is the best bet. Miklos are you around? We've had a few different patches/discussions going on. I assume you are/have been on summer vacation. Thanks, Josef
On Mon, Jul 22, 2024 at 1:52 PM Josef Bacik <josef@toxicpanda.com> wrote: > > On Mon, Jul 22, 2024 at 11:58:03AM -0700, Joanne Koong wrote: > > On Fri, Jul 19, 2024 at 6:06 AM Josef Bacik <josef@toxicpanda.com> wrote: > > > > > > On Thu, Jul 18, 2024 at 05:37:06PM -0700, Joanne Koong wrote: > > > > On Thu, Jul 18, 2024 at 3:11 PM Bernd Schubert > > > > <bs_lists@aakef.fastmail.fm> wrote: > > > > > > > > > > > > > > > > > > > > On 7/18/24 07:24, Joanne Koong wrote: > > > > > > On Wed, Jul 17, 2024 at 3:23 PM Bernd Schubert > > > > > > <bernd.schubert@fastmail.fm> wrote: > > > > > >> > > > > > >> Hi Joanne, > > > > > >> > > > > > >> On 7/17/24 23:34, Joanne Koong wrote: > > > > > >>> There are situations where fuse servers can become unresponsive or take > > > > > >>> too long to reply to a request. Currently there is no upper bound on > > > > > >>> how long a request may take, which may be frustrating to users who get > > > > > >>> stuck waiting for a request to complete. > > > > > >>> > > > > > >>> This commit adds a daemon timeout option (in seconds) for fuse requests. > > > > > >>> If the timeout elapses before the request is replied to, the request will > > > > > >>> fail with -ETIME. > > > > > >>> > > > > > >>> There are 3 possibilities for a request that times out: > > > > > >>> a) The request times out before the request has been sent to userspace > > > > > >>> b) The request times out after the request has been sent to userspace > > > > > >>> and before it receives a reply from the server > > > > > >>> c) The request times out after the request has been sent to userspace > > > > > >>> and the server replies while the kernel is timing out the request > > > > > >>> > > > > > >>> Proper synchronization must be added to ensure that the request is > > > > > >>> handled correctly in all of these cases. To this effect, there is a new > > > > > >>> FR_PROCESSING bit added to the request flags, which is set atomically by > > > > > >>> either the timeout handler (see fuse_request_timeout()) which is invoked > > > > > >>> after the request timeout elapses or set by the request reply handler > > > > > >>> (see dev_do_write()), whichever gets there first. > > > > > >>> > > > > > >>> If the reply handler and the timeout handler are executing simultaneously > > > > > >>> and the reply handler sets FR_PROCESSING before the timeout handler, then > > > > > >>> the request is re-queued onto the waitqueue and the kernel will process the > > > > > >>> reply as though the timeout did not elapse. If the timeout handler sets > > > > > >>> FR_PROCESSING before the reply handler, then the request will fail with > > > > > >>> -ETIME and the request will be cleaned up. > > > > > >>> > > > > > >>> Proper acquires on the request reference must be added to ensure that the > > > > > >>> timeout handler does not drop the last refcount on the request while the > > > > > >>> reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is > > > > > >>> still accessing the request. (By "forwarder handler", this is the handler > > > > > >>> that forwards the request to userspace). > > > > > >>> > > > > > >>> Currently, this is the lifecycle of the request refcount: > > > > > >>> > > > > > >>> Request is created: > > > > > >>> fuse_simple_request -> allocates request, sets refcount to 1 > > > > > >>> __fuse_request_send -> acquires refcount > > > > > >>> queues request and waits for reply... > > > > > >>> fuse_simple_request -> drops refcount > > > > > >>> > > > > > >>> Request is freed: > > > > > >>> fuse_dev_do_write > > > > > >>> fuse_request_end -> drops refcount on request > > > > > >>> > > > > > >>> The timeout handler drops the refcount on the request so that the > > > > > >>> request is properly cleaned up if a reply is never received. Because of > > > > > >>> this, both the forwarder handler and the reply handler must acquire a refcount > > > > > >>> on the request while it accesses the request, and the refcount must be > > > > > >>> acquired while the lock of the list the request is on is held. > > > > > >>> > > > > > >>> There is a potential race if the request is being forwarded to > > > > > >>> userspace while the timeout handler is executing (eg FR_PENDING has > > > > > >>> already been cleared but dev_do_read() hasn't finished executing). This > > > > > >>> is a problem because this would free the request but the request has not > > > > > >>> been removed from the fpq list it's on. To prevent this, dev_do_read() > > > > > >>> must check FR_PROCESSING at the end of its logic and remove the request > > > > > >>> from the fpq list if the timeout occurred. > > > > > >>> > > > > > >>> There is also the case where the connection may be aborted or the > > > > > >>> device may be released while the timeout handler is running. To protect > > > > > >>> against an extra refcount drop on the request, the timeout handler > > > > > >>> checks the connected state of the list and lets the abort handler drop the > > > > > >>> last reference if the abort is running simultaneously. Similarly, the > > > > > >>> timeout handler also needs to check if the req->out.h.error is set to > > > > > >>> -ESTALE, which indicates that the device release is cleaning up the > > > > > >>> request. In both these cases, the timeout handler will return without > > > > > >>> dropping the refcount. > > > > > >>> > > > > > >>> Please also note that background requests are not applicable for timeouts > > > > > >>> since they are asynchronous. > > > > > >> > > > > > >> > > > > > >> This and that thread here actually make me wonder if this is the right > > > > > >> approach > > > > > >> > > > > > >> https://lore.kernel.org/lkml/20240613040147.329220-1-haifeng.xu@shopee.com/T/ > > > > > >> > > > > > >> > > > > > >> In th3 thread above a request got interrupted, but fuse-server still > > > > > >> does not manage stop it. From my point of view, interrupting a request > > > > > >> suggests to add a rather short kernel lifetime for it. With that one > > > > > > > > > > > > Hi Bernd, > > > > > > > > > > > > I believe this solution fixes the problem outlined in that thread > > > > > > (namely, that the process gets stuck waiting for a reply). If the > > > > > > request is interrupted before it times out, the kernel will wait with > > > > > > a timeout again on the request (timeout would start over, but the > > > > > > request will still eventually sooner or later time out). I'm not sure > > > > > > I agree that we want to cancel the request altogether if it's > > > > > > interrupted. For example, if the user uses the user-defined signal > > > > > > SIGUSR1, it can be unexpected and arbitrary behavior for the request > > > > > > to be aborted by the kernel. I also don't think this can be consistent > > > > > > for what the fuse server will see since some requests may have already > > > > > > been forwarded to userspace when the request is aborted and some > > > > > > requests may not have. > > > > > > > > > > > > I think if we were to enforce that the request should be aborted when > > > > > > it's interrupted regardless of whether a timeout is specified or not, > > > > > > then we should do it similarly to how the timeout handler logic > > > > > > handles it in this patch,rather than the implementation in the thread > > > > > > linked above (namely, that the request should be explicitly cleaned up > > > > > > immediately instead of when the interrupt request sends a reply); I > > > > > > don't believe the implementation in the link handles the case where > > > > > > for example the fuse server is in a deadlock and does not reply to the > > > > > > interrupt request. Also, as I understand it, it is optional for > > > > > > servers to reply or not to the interrupt request. > > > > > > > > > > Hi Joanne, > > > > > > > > > > yeah, the solution in the link above is definitely not ideal and I think > > > > > a timout based solution would be better. But I think your patch wouldn't > > > > > work either right now, unless server side sets a request timeout. > > > > > Btw, I would rename the variable 'daemon_timeout' to somethink like > > > > > req_timeout. > > > > > > > > > Hi Bernd, > > > > > > > > I think we need to figure out if we indeed want the kernel to abort > > > > interrupted requests if no request timeout was explicitly set by the > > > > server. I'm leaning towards no, for the reasons in my previous reply; > > > > in addition to that I'm also not sure if we would be potentially > > > > breaking existing filesystems if we introduced this new behavior. > > > > Curious to hear your and others' thoughts on this. > > > > > > > > (Btw, if we did want to add this in, i think the change would be > > > > actually pretty simple. We could pretty much just reuse all the logic > > > > that's implemented in the timeout handling code. It's very much the > > > > same scenario (request getting aborted and needing to protect against > > > > races with different handlers)) > > > > > > > > I will rename daemon_timeout to req_timeout in v2. Thanks for the suggestion. > > > > > > > > > > > > > > > >> either needs to wake up in intervals and check if request timeout got > > > > > >> exceeded or it needs to be an async kernel thread. I think that async > > > > > >> thread would also allow to give a timeout to background requests. > > > > > > > > > > > > in my opinion, background requests do not need timeouts. As I > > > > > > understand it, background requests are used only for direct i/o async > > > > > > read/writes, writing back dirty pages,and readahead requests generated > > > > > > by the kernel. I don't think fuse servers would have a need for timing > > > > > > out background requests. > > > > > > > > > > There is another discussion here, where timeouts are a possible although > > > > > ugly solution to avoid page copies > > > > > > > > > > https://lore.kernel.org/linux-kernel/233a9fdf-13ea-488b-a593-5566fc9f5d92@fastmail.fm/T/ > > > > > > > > > Thanks for the link, it's an interesting read. > > > > > > > > > > > > > > That is the bg writeback code path. > > > > > > > > > > > > > > > > >> > > > > > >> Or we add an async timeout to bg and interupted requests additionally? > > > > > > > > > > > > The interrupted request will already have a timeout on it since it > > > > > > waits with a timeout again for the reply after it's interrupted. > > > > > > > > > > If daemon side configures timeouts. And interrupted requests might want > > > > > to have a different timeout. I will check when I'm back if we can update > > > > > your patch a bit for that. > > > > > > > > > > Your patch hooks in quite nicely and basically without overhead into fg > > > > > (sync) requests. Timing out bg requests will have a bit overhead (unless > > > > > I miss something), so maybe we need two solutions here. And if we want > > > > > to go that route at all, to avoid these extra fuse page copies. > > > > > > > > > Agreed, I think if we do decide to go down this route, it seems > > > > cleaner to me to have the background request timeouts handled > > > > separately. Maybe something like having a timer per batch (where > > > > "batch" is the group of requests that get flushed at the same time)? > > > > That seems to me like the approach with the least overhead. > > > > > > > > > > I don't want to have a bunch of different timeouts, we should just have one and > > > have consistent behavior across all classes of requests. > > > > > > I think the only thing we should have that is "separate" is a way to set request > > > timeouts that aren't set by the daemon itself. Administrators should be able to > > > set a per-mount timeout via sysfs/algo in order to have some sort of control > > > over possibly malicious FUSE file systems. > > > > > > But that should just tie into whatever mechanism you come up with, and > > > everything should all behave consistently with that timeout. Thanks, > > > > > > Josef > > > > To summarize this thread so far, there are 2 open questions: > > 1) should interrupted requests be automatically aborted/cancelled by > > default (even if no timeout is set)? > > 2) should background requests also have some timeout enforced on them? > > Yes I think background requests should have a timeout enforced on them if it's > set. Page writeout is actually one of the bigger problems because stuff will > just hang forever, like if you hit sync or something (which a lot of > applications do). > Sounds good, I will add in timeouts for background requests for v2. > For #1 I have to think some more and look at what the mechanics/expectations of > those requests are, but if it's a thing you can leave for a follup them that > sounds good. > > Additionally I think leaving the extra page copy thing as future work once this > work is done is the best bet. > > Miklos are you around? We've had a few different patches/discussions going on. > I assume you are/have been on summer vacation. Thanks, > > Josef
On Thu, Jul 18, 2024 at 9:58 AM Joanne Koong <joannelkoong@gmail.com> wrote: > > On Thu, Jul 18, 2024 at 7:44 AM Josef Bacik <josef@toxicpanda.com> wrote: > > > > On Wed, Jul 17, 2024 at 02:34:58PM -0700, Joanne Koong wrote: > ... > > > --- > > > fs/fuse/dev.c | 177 ++++++++++++++++++++++++++++++++++++++++++++--- > > > fs/fuse/fuse_i.h | 12 ++++ > > > fs/fuse/inode.c | 7 ++ > > > 3 files changed, 188 insertions(+), 8 deletions(-) > > > > > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > > > index 9eb191b5c4de..7dd7b244951b 100644 > > > --- a/fs/fuse/dev.c > > > +++ b/fs/fuse/dev.c > > > @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req) > > > } > > > EXPORT_SYMBOL_GPL(fuse_request_end); > > > > > > +/* fuse_request_end for requests that timeout */ > > > +static void fuse_request_timeout(struct fuse_req *req) > > > +{ > > > + struct fuse_conn *fc = req->fm->fc; > > > + struct fuse_iqueue *fiq = &fc->iq; > > > + struct fuse_pqueue *fpq; > > > + > > > + spin_lock(&fiq->lock); > > > + if (!fiq->connected) { > > > + spin_unlock(&fiq->lock); > > > + /* > > > + * Connection is being aborted. The abort will release > > > + * the refcount on the request > > > + */ > > > + req->out.h.error = -ECONNABORTED; > > > + return; > > > + } > > > + if (test_bit(FR_PENDING, &req->flags)) { > > > + /* Request is not yet in userspace, bail out */ > > > + list_del(&req->list); > > > + spin_unlock(&fiq->lock); > > > + req->out.h.error = -ETIME; > > > + __fuse_put_request(req); > > > > Why is this safe? We could be the last holder of the reference on this request > > correct? The only places using __fuse_put_request() would be where we are in a > > path where the caller already holds a reference on the request. Since this is > > async it may not be the case right? > > > > If it is safe then it's just confusing and warrants a comment. > > > > There is always a refcount still held on the request by > fuse_simple_request() when this is called. I'll add a comment about > this. > I also just noticed that I use fuse_put_request() at the end of this > function, I'll change that to __fuse_put_request() as well. > > > > + return; > > > + } > > > + if (test_bit(FR_INTERRUPTED, &req->flags)) > > > + list_del_init(&req->intr_entry); > > > + > > > + fpq = req->fpq; > > > + spin_unlock(&fiq->lock); > > > + > > > + if (fpq) { > > > + spin_lock(&fpq->lock); > > > + if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) { > > ^^ > > > > You don't need the extra () there. > > > > > + spin_unlock(&fpq->lock); > > > + /* > > > + * Connection is being aborted. The abort will release > > > + * the refcount on the request > > > + */ > > > + req->out.h.error = -ECONNABORTED; > > > + return; > > > + } > > > + if (req->out.h.error == -ESTALE) { > > > + /* > > > + * Device is being released. The fuse_dev_release call > > > + * will drop the refcount on the request > > > + */ > > > + spin_unlock(&fpq->lock); > > > + return; > > > + } > > > + if (!test_bit(FR_PRIVATE, &req->flags)) > > > + list_del_init(&req->list); > > > + spin_unlock(&fpq->lock); > > > + } > > > + > > > + req->out.h.error = -ETIME; > > > + > > > + if (test_bit(FR_ASYNC, &req->flags)) > > > + req->args->end(req->fm, req->args, req->out.h.error); > > > + > > > + fuse_put_request(req); > > > +} > > > > Just a general styling thing, we have two different states for requests here, > > PENDING and !PENDING correct? I think it may be better to do something like > > > > if (test_bit(FR_PENDING, &req->flags)) > > timeout_pending_req(); > > else > > timeout_inflight_req(); > > > > and then in timeout_pending_req() you do > > > > spin_lock(&fiq->lock); > > if (!test_bit(FR_PENDING, &req->flags)) { > > spin_unlock(&fiq_lock); > > timeout_inflight_req(); > > return; > > } > > > > This will keep the two different state cleanup functions separate and a little > > cleaner to grok. > > > Thanks for the suggestion, I will make this change for v2. > > > + > > > static int queue_interrupt(struct fuse_req *req) > > > { > > > struct fuse_iqueue *fiq = &req->fm->fc->iq; > > > @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req) > > > return 0; > > > } > > > > > > +enum wait_type { > > > + WAIT_TYPE_INTERRUPTIBLE, > > > + WAIT_TYPE_KILLABLE, > > > + WAIT_TYPE_NONINTERRUPTIBLE, > > > +}; > > > + > > > +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req) > > > +{ > > > + struct fuse_conn *fc = req->fm->fc; > > > + > > > + return wait_event_interruptible_timeout(req->waitq, > > > + test_bit(FR_FINISHED, > > > + &req->flags), > > > + fc->daemon_timeout); > > > +} > > > +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO); > > > + > > > +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type) > > > +{ > > > + struct fuse_conn *fc = req->fm->fc; > > > + int err; > > > + > > > +wait_answer_start: > > > + if (type == WAIT_TYPE_INTERRUPTIBLE) > > > + err = fuse_wait_event_interruptible_timeout(req); > > > + else if (type == WAIT_TYPE_KILLABLE) > > > + err = wait_event_killable_timeout(req->waitq, > > > + test_bit(FR_FINISHED, &req->flags), > > > + fc->daemon_timeout); > > > + > > > + else if (type == WAIT_TYPE_NONINTERRUPTIBLE) > > > + err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags), > > > + fc->daemon_timeout); > > > + else > > > + WARN_ON(1); > > > > This will leak some random value for err, so initialize err to something that > > will be dealt with, like -EINVAL; > > > > > + > > > + /* request was answered */ > > > + if (err > 0) > > > + return 0; > > > + > > > + /* request was not answered in time */ > > > + if (err == 0) { > > > + if (test_and_set_bit(FR_PROCESSING, &req->flags)) > > > + /* request reply is being processed by kernel right now. > > > + * we should wait for the answer. > > > + */ > > > > Format for multiline comments is > > > > /* > > * blah > > * blah > > */ > > > > and since this is a 1 line if statement put it above the if statement. > > > > > + goto wait_answer_start; > > > + > > > + fuse_request_timeout(req); > > > + return 0; > > > + } > > > + > > > + /* else request was interrupted */ > > > + return err; > > > +} > > > + > > > static void request_wait_answer(struct fuse_req *req) > > > { > > > struct fuse_conn *fc = req->fm->fc; > > > @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req) > > > > > > if (!fc->no_interrupt) { > > > /* Any signal may interrupt this */ > > > - err = wait_event_interruptible(req->waitq, > > > - test_bit(FR_FINISHED, &req->flags)); > > > + if (fc->daemon_timeout) > > > + err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE); > > > + else > > > + err = wait_event_interruptible(req->waitq, > > > + test_bit(FR_FINISHED, &req->flags)); > > > if (!err) > > > return; > > > > > > @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req) > > > > > > if (!test_bit(FR_FORCE, &req->flags)) { > > > /* Only fatal signals may interrupt this */ > > > - err = wait_event_killable(req->waitq, > > > - test_bit(FR_FINISHED, &req->flags)); > > > + if (fc->daemon_timeout) > > > + err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE); > > > + else > > > + err = wait_event_killable(req->waitq, > > > + test_bit(FR_FINISHED, &req->flags)); > > > if (!err) > > > return; > > > > > > @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req) > > > * Either request is already in userspace, or it was forced. > > > * Wait it out. > > > */ > > > - wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > > > + if (fc->daemon_timeout) > > > + wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE); > > > + else > > > + wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); > > > } > > > > > > static void __fuse_request_send(struct fuse_req *req) > > > @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > > > req = list_entry(fiq->pending.next, struct fuse_req, list); > > > clear_bit(FR_PENDING, &req->flags); > > > list_del_init(&req->list); > > > + /* Acquire a reference since fuse_request_timeout may also be executing */ > > > + __fuse_get_request(req); > > > + req->fpq = fpq; > > > spin_unlock(&fiq->lock); > > > > > > > There's a race here with completion. If we timeout a request right here, we can > > end up sending that same request below. > > I don't think there's any way around this unless we take the fpq lock > while we do the fuse_copy stuff, because even if we check the > FR_PROCESSING bit, the timeout handler could start running after the > fpq lock is released when we do the fuse_copy calls. > > In my point of view, I don't think this race matters. We could have > this situation happen on a regular timed-out request. For example, we > send out a request to userspace and if the server takes too long to > reply, the request is cancelled/invalidated in the kernel but the > server will still see the request anyways. > > WDYT? > > > > > You are going to need to check > > > > test_bit(FR_PROCESSING) > > > > after you take the fpq->lock just below here to make sure you didn't race with > > the timeout handler and time the request out already. > > > > > args = req->args; > > > @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > > > if (args->opcode == FUSE_SETXATTR) > > > req->out.h.error = -E2BIG; > > > fuse_request_end(req); > > > + fuse_put_request(req); > > > goto restart; > > > } > > > spin_lock(&fpq->lock); > > > @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, > > > } > > > hash = fuse_req_hash(req->in.h.unique); > > > list_move_tail(&req->list, &fpq->processing[hash]); > > > - __fuse_get_request(req); > > > set_bit(FR_SENT, &req->flags); > > > spin_unlock(&fpq->lock); > > > /* matches barrier in request_wait_answer() */ > > > smp_mb__after_atomic(); > > > if (test_bit(FR_INTERRUPTED, &req->flags)) > > > queue_interrupt(req); > > > + > > > + /* Check if request timed out */ > > > + if (test_bit(FR_PROCESSING, &req->flags)) { > > > + spin_lock(&fpq->lock); > > > + if (!test_bit(FR_PRIVATE, &req->flags)) > > > + list_del_init(&req->list); > > > + spin_unlock(&fpq->lock); > > > + fuse_put_request(req); > > > + return -ETIME; > > > + } > > > > This isn't quite right, we could have FR_PROCESSING set because we completed the > > request before we got here. If you put a schedule_timeout(HZ); right above this > > you could easily see where a request gets completed by userspace, but now you've > > fimed it out. > > Oh I see, you're talking about the race where a request is replied to > immediately after the fuse_copy calls and before this gets called. > Then when we get here, we can't differentiate between whether > FR_PROCESSING was set by the timeout handler or the reply handler. > > I think the simplest way around this is to check if the FR_SENT flag > was cleared (the reply handler clears it while holding the fpq lock > where FR_PROCESSING gets set and the timeout handler doesn't clear > it), then return -ETIME if it wasn't and 0 if it was. > > I'll add this into v2. > > > > > Additionally if we have FR_PROCESSING set from the timeout, shouldn't this > > cleanup have been done already? I don't understand why we need to handle this > > here, we should just return and whoever is waiting on the request will get the > > error. > > In most cases yes, but there is a race where the timeout handler may > finish executing before the logic in dev_do_read that adds the request > to the fpq lists. If this happens, the freed request will remain on > the list. > > i think this race currently exists prior to these changes as well - amendment: this statement is not accurate. In the existing code, there is no race between the reply handler and dev_do_read, because the reply handler can only handle the request once the request is on the fpq->processing list. (We do need to account for this race with the timeout handler though since the timeout handler can get triggered at any time). Also, while working on v2 I noticed we also need to handle races between the timeout handler and requests being re-sent (fuse_resend()). This will get addressed in v2. > in the case you mentioned above where the request may have been > completed right after the fuse_copy calls in dev_do_read and before > dev_do_read moves the request to the fpq lists. We would get into the > same situation with a freed request still on the list. > > > Thanks, > Joanne > > > > Thanks, > > > > Josef
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 9eb191b5c4de..7dd7b244951b 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -331,6 +331,69 @@ void fuse_request_end(struct fuse_req *req) } EXPORT_SYMBOL_GPL(fuse_request_end); +/* fuse_request_end for requests that timeout */ +static void fuse_request_timeout(struct fuse_req *req) +{ + struct fuse_conn *fc = req->fm->fc; + struct fuse_iqueue *fiq = &fc->iq; + struct fuse_pqueue *fpq; + + spin_lock(&fiq->lock); + if (!fiq->connected) { + spin_unlock(&fiq->lock); + /* + * Connection is being aborted. The abort will release + * the refcount on the request + */ + req->out.h.error = -ECONNABORTED; + return; + } + if (test_bit(FR_PENDING, &req->flags)) { + /* Request is not yet in userspace, bail out */ + list_del(&req->list); + spin_unlock(&fiq->lock); + req->out.h.error = -ETIME; + __fuse_put_request(req); + return; + } + if (test_bit(FR_INTERRUPTED, &req->flags)) + list_del_init(&req->intr_entry); + + fpq = req->fpq; + spin_unlock(&fiq->lock); + + if (fpq) { + spin_lock(&fpq->lock); + if (!fpq->connected && (!test_bit(FR_PRIVATE, &req->flags))) { + spin_unlock(&fpq->lock); + /* + * Connection is being aborted. The abort will release + * the refcount on the request + */ + req->out.h.error = -ECONNABORTED; + return; + } + if (req->out.h.error == -ESTALE) { + /* + * Device is being released. The fuse_dev_release call + * will drop the refcount on the request + */ + spin_unlock(&fpq->lock); + return; + } + if (!test_bit(FR_PRIVATE, &req->flags)) + list_del_init(&req->list); + spin_unlock(&fpq->lock); + } + + req->out.h.error = -ETIME; + + if (test_bit(FR_ASYNC, &req->flags)) + req->args->end(req->fm, req->args, req->out.h.error); + + fuse_put_request(req); +} + static int queue_interrupt(struct fuse_req *req) { struct fuse_iqueue *fiq = &req->fm->fc->iq; @@ -361,6 +424,62 @@ static int queue_interrupt(struct fuse_req *req) return 0; } +enum wait_type { + WAIT_TYPE_INTERRUPTIBLE, + WAIT_TYPE_KILLABLE, + WAIT_TYPE_NONINTERRUPTIBLE, +}; + +static int fuse_wait_event_interruptible_timeout(struct fuse_req *req) +{ + struct fuse_conn *fc = req->fm->fc; + + return wait_event_interruptible_timeout(req->waitq, + test_bit(FR_FINISHED, + &req->flags), + fc->daemon_timeout); +} +ALLOW_ERROR_INJECTION(fuse_wait_event_interruptible_timeout, ERRNO); + +static int wait_answer_timeout(struct fuse_req *req, enum wait_type type) +{ + struct fuse_conn *fc = req->fm->fc; + int err; + +wait_answer_start: + if (type == WAIT_TYPE_INTERRUPTIBLE) + err = fuse_wait_event_interruptible_timeout(req); + else if (type == WAIT_TYPE_KILLABLE) + err = wait_event_killable_timeout(req->waitq, + test_bit(FR_FINISHED, &req->flags), + fc->daemon_timeout); + + else if (type == WAIT_TYPE_NONINTERRUPTIBLE) + err = wait_event_timeout(req->waitq, test_bit(FR_FINISHED, &req->flags), + fc->daemon_timeout); + else + WARN_ON(1); + + /* request was answered */ + if (err > 0) + return 0; + + /* request was not answered in time */ + if (err == 0) { + if (test_and_set_bit(FR_PROCESSING, &req->flags)) + /* request reply is being processed by kernel right now. + * we should wait for the answer. + */ + goto wait_answer_start; + + fuse_request_timeout(req); + return 0; + } + + /* else request was interrupted */ + return err; +} + static void request_wait_answer(struct fuse_req *req) { struct fuse_conn *fc = req->fm->fc; @@ -369,8 +488,11 @@ static void request_wait_answer(struct fuse_req *req) if (!fc->no_interrupt) { /* Any signal may interrupt this */ - err = wait_event_interruptible(req->waitq, - test_bit(FR_FINISHED, &req->flags)); + if (fc->daemon_timeout) + err = wait_answer_timeout(req, WAIT_TYPE_INTERRUPTIBLE); + else + err = wait_event_interruptible(req->waitq, + test_bit(FR_FINISHED, &req->flags)); if (!err) return; @@ -383,8 +505,11 @@ static void request_wait_answer(struct fuse_req *req) if (!test_bit(FR_FORCE, &req->flags)) { /* Only fatal signals may interrupt this */ - err = wait_event_killable(req->waitq, - test_bit(FR_FINISHED, &req->flags)); + if (fc->daemon_timeout) + err = wait_answer_timeout(req, WAIT_TYPE_KILLABLE); + else + err = wait_event_killable(req->waitq, + test_bit(FR_FINISHED, &req->flags)); if (!err) return; @@ -404,7 +529,10 @@ static void request_wait_answer(struct fuse_req *req) * Either request is already in userspace, or it was forced. * Wait it out. */ - wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); + if (fc->daemon_timeout) + wait_answer_timeout(req, WAIT_TYPE_NONINTERRUPTIBLE); + else + wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags)); } static void __fuse_request_send(struct fuse_req *req) @@ -1268,6 +1396,9 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, req = list_entry(fiq->pending.next, struct fuse_req, list); clear_bit(FR_PENDING, &req->flags); list_del_init(&req->list); + /* Acquire a reference since fuse_request_timeout may also be executing */ + __fuse_get_request(req); + req->fpq = fpq; spin_unlock(&fiq->lock); args = req->args; @@ -1280,6 +1411,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, if (args->opcode == FUSE_SETXATTR) req->out.h.error = -E2BIG; fuse_request_end(req); + fuse_put_request(req); goto restart; } spin_lock(&fpq->lock); @@ -1316,13 +1448,23 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, } hash = fuse_req_hash(req->in.h.unique); list_move_tail(&req->list, &fpq->processing[hash]); - __fuse_get_request(req); set_bit(FR_SENT, &req->flags); spin_unlock(&fpq->lock); /* matches barrier in request_wait_answer() */ smp_mb__after_atomic(); if (test_bit(FR_INTERRUPTED, &req->flags)) queue_interrupt(req); + + /* Check if request timed out */ + if (test_bit(FR_PROCESSING, &req->flags)) { + spin_lock(&fpq->lock); + if (!test_bit(FR_PRIVATE, &req->flags)) + list_del_init(&req->list); + spin_unlock(&fpq->lock); + fuse_put_request(req); + return -ETIME; + } + fuse_put_request(req); return reqsize; @@ -1332,6 +1474,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, list_del_init(&req->list); spin_unlock(&fpq->lock); fuse_request_end(req); + fuse_put_request(req); return err; err_unlock: @@ -1951,9 +2094,10 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, goto copy_finish; } + __fuse_get_request(req); + /* Is it an interrupt reply ID? */ if (oh.unique & FUSE_INT_REQ_BIT) { - __fuse_get_request(req); spin_unlock(&fpq->lock); err = 0; @@ -1969,6 +2113,13 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, goto copy_finish; } + if (test_and_set_bit(FR_PROCESSING, &req->flags)) { + /* request has timed out already */ + spin_unlock(&fpq->lock); + fuse_put_request(req); + goto copy_finish; + } + clear_bit(FR_SENT, &req->flags); list_move(&req->list, &fpq->io); req->out.h = oh; @@ -1995,6 +2146,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, spin_unlock(&fpq->lock); fuse_request_end(req); + fuse_put_request(req); out: return err ? err : nbytes; @@ -2260,13 +2412,22 @@ int fuse_dev_release(struct inode *inode, struct file *file) if (fud) { struct fuse_conn *fc = fud->fc; struct fuse_pqueue *fpq = &fud->pq; + struct fuse_req *req; LIST_HEAD(to_end); unsigned int i; spin_lock(&fpq->lock); WARN_ON(!list_empty(&fpq->io)); - for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) + for (i = 0; i < FUSE_PQ_HASH_SIZE; i++) { + /* + * Set the req error to -ESTALE so that if the timeout + * handler tries handling it, it knows it's being + * released + */ + list_for_each_entry(req, &fpq->processing[i], list) + req->out.h.error = -ESTALE; list_splice_init(&fpq->processing[i], &to_end); + } spin_unlock(&fpq->lock); end_requests(&to_end); diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index f23919610313..cbabebbcd5bd 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -375,6 +375,9 @@ struct fuse_io_priv { * FR_FINISHED: request is finished * FR_PRIVATE: request is on private list * FR_ASYNC: request is asynchronous + * FR_PROCESSING: request is being processed. this gets set when either + * the reply is getting processed or the kernel is processing + * a request timeout */ enum fuse_req_flag { FR_ISREPLY, @@ -389,6 +392,7 @@ enum fuse_req_flag { FR_FINISHED, FR_PRIVATE, FR_ASYNC, + FR_PROCESSING, }; /** @@ -435,6 +439,9 @@ struct fuse_req { /** fuse_mount this request belongs to */ struct fuse_mount *fm; + + /** page queue this request has been added to */ + struct fuse_pqueue *fpq; }; struct fuse_iqueue; @@ -574,6 +581,8 @@ struct fuse_fs_context { enum fuse_dax_mode dax_mode; unsigned int max_read; unsigned int blksize; + /* Daemon timeout (in seconds). 0 = no timeout (infinite wait) */ + unsigned int daemon_timeout; const char *subtype; /* DAX device, may be NULL */ @@ -633,6 +642,9 @@ struct fuse_conn { /** Constrain ->max_pages to this value during feature negotiation */ unsigned int max_pages_limit; + /* Daemon timeout (in jiffies). 0 = no timeout (infinite wait) */ + unsigned long daemon_timeout; + /** Input queue */ struct fuse_iqueue iq; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index 99e44ea7d875..a2d53a8b8e34 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -733,6 +733,7 @@ enum { OPT_ALLOW_OTHER, OPT_MAX_READ, OPT_BLKSIZE, + OPT_DAEMON_TIMEOUT, OPT_ERR }; @@ -747,6 +748,7 @@ static const struct fs_parameter_spec fuse_fs_parameters[] = { fsparam_u32 ("max_read", OPT_MAX_READ), fsparam_u32 ("blksize", OPT_BLKSIZE), fsparam_string ("subtype", OPT_SUBTYPE), + fsparam_u32 ("daemon_timeout", OPT_DAEMON_TIMEOUT), {} }; @@ -830,6 +832,10 @@ static int fuse_parse_param(struct fs_context *fsc, struct fs_parameter *param) ctx->blksize = result.uint_32; break; + case OPT_DAEMON_TIMEOUT: + ctx->daemon_timeout = result.uint_32; + break; + default: return -EINVAL; } @@ -1724,6 +1730,7 @@ int fuse_fill_super_common(struct super_block *sb, struct fuse_fs_context *ctx) fc->group_id = ctx->group_id; fc->legacy_opts_show = ctx->legacy_opts_show; fc->max_read = max_t(unsigned int, 4096, ctx->max_read); + fc->daemon_timeout = ctx->daemon_timeout * HZ; fc->destroy = ctx->destroy; fc->no_control = ctx->no_control; fc->no_force_umount = ctx->no_force_umount;
There are situations where fuse servers can become unresponsive or take too long to reply to a request. Currently there is no upper bound on how long a request may take, which may be frustrating to users who get stuck waiting for a request to complete. This commit adds a daemon timeout option (in seconds) for fuse requests. If the timeout elapses before the request is replied to, the request will fail with -ETIME. There are 3 possibilities for a request that times out: a) The request times out before the request has been sent to userspace b) The request times out after the request has been sent to userspace and before it receives a reply from the server c) The request times out after the request has been sent to userspace and the server replies while the kernel is timing out the request Proper synchronization must be added to ensure that the request is handled correctly in all of these cases. To this effect, there is a new FR_PROCESSING bit added to the request flags, which is set atomically by either the timeout handler (see fuse_request_timeout()) which is invoked after the request timeout elapses or set by the request reply handler (see dev_do_write()), whichever gets there first. If the reply handler and the timeout handler are executing simultaneously and the reply handler sets FR_PROCESSING before the timeout handler, then the request is re-queued onto the waitqueue and the kernel will process the reply as though the timeout did not elapse. If the timeout handler sets FR_PROCESSING before the reply handler, then the request will fail with -ETIME and the request will be cleaned up. Proper acquires on the request reference must be added to ensure that the timeout handler does not drop the last refcount on the request while the reply handler (dev_do_write()) or forwarder handler (dev_do_read()) is still accessing the request. (By "forwarder handler", this is the handler that forwards the request to userspace). Currently, this is the lifecycle of the request refcount: Request is created: fuse_simple_request -> allocates request, sets refcount to 1 __fuse_request_send -> acquires refcount queues request and waits for reply... fuse_simple_request -> drops refcount Request is freed: fuse_dev_do_write fuse_request_end -> drops refcount on request The timeout handler drops the refcount on the request so that the request is properly cleaned up if a reply is never received. Because of this, both the forwarder handler and the reply handler must acquire a refcount on the request while it accesses the request, and the refcount must be acquired while the lock of the list the request is on is held. There is a potential race if the request is being forwarded to userspace while the timeout handler is executing (eg FR_PENDING has already been cleared but dev_do_read() hasn't finished executing). This is a problem because this would free the request but the request has not been removed from the fpq list it's on. To prevent this, dev_do_read() must check FR_PROCESSING at the end of its logic and remove the request from the fpq list if the timeout occurred. There is also the case where the connection may be aborted or the device may be released while the timeout handler is running. To protect against an extra refcount drop on the request, the timeout handler checks the connected state of the list and lets the abort handler drop the last reference if the abort is running simultaneously. Similarly, the timeout handler also needs to check if the req->out.h.error is set to -ESTALE, which indicates that the device release is cleaning up the request. In both these cases, the timeout handler will return without dropping the refcount. Please also note that background requests are not applicable for timeouts since they are asynchronous. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> --- fs/fuse/dev.c | 177 ++++++++++++++++++++++++++++++++++++++++++++--- fs/fuse/fuse_i.h | 12 ++++ fs/fuse/inode.c | 7 ++ 3 files changed, 188 insertions(+), 8 deletions(-)