Message ID | 1599553026-11745-1-git-send-email-pragalla@qti.qualcomm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [V4] fuse: Fix VM_BUG_ON_PAGE issue while accessing zero ref count page | expand |
On Tue, Sep 8, 2020 at 10:17 AM Pradeep P V K <pragalla@qti.qualcomm.com> wrote: > > From: Pradeep P V K <ppvk@codeaurora.org> > > There is a potential race between fuse_abort_conn() and > fuse_copy_page() as shown below, due to which VM_BUG_ON_PAGE > crash is observed for accessing a free page. > > context#1: context#2: > fuse_dev_do_read() fuse_abort_conn() > ->fuse_copy_args() ->end_requests() This shouldn't happen due to FR_LOCKED logic. Are you seeing this on an upstream kernel? Which version? Thanks, Miklos
On Tue, Sep 08, 2020 at 01:47:06PM +0530, Pradeep P V K wrote: > Changes since V3: > - Fix smatch warnings. > > Changes since V2: > - Moved the spin lock from fuse_copy_pages() to fuse_ref_page(). > > Changes since V1: > - Modified the logic as per kernel v5.9-rc1. > - Added Reported by tag. > > Reported-by: kernel test robot <lkp@intel.com> > Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Umm, the way this is written, it looks like Dan reported the original bug rather than a bug in v3. The usual way is to credit Dan in the 'Changes since' rather than putting in a 'Reported-by'. > static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, > - unsigned offset, unsigned count) > + unsigned offset, unsigned count, struct fuse_conn *fc) I'm no expert on fuse, but it looks to me like you should put a pointer to the fuse_conn in struct fuse_copy_state rather than passing it down through all these callers.
On 2020-09-08 16:55, Miklos Szeredi wrote: > On Tue, Sep 8, 2020 at 10:17 AM Pradeep P V K > <pragalla@qti.qualcomm.com> wrote: >> >> From: Pradeep P V K <ppvk@codeaurora.org> >> >> There is a potential race between fuse_abort_conn() and >> fuse_copy_page() as shown below, due to which VM_BUG_ON_PAGE >> crash is observed for accessing a free page. >> >> context#1: context#2: >> fuse_dev_do_read() fuse_abort_conn() >> ->fuse_copy_args() ->end_requests() > > This shouldn't happen due to FR_LOCKED logic. Are you seeing this on > an upstream kernel? Which version? > > Thanks, > Miklos This is happen just after unlock_request() in fuse_ref_page(). In unlock_request(), it will clear the FR_LOCKED bit. As there is no protection between context#1 & context#2 during unlock_request(), there are chances that it could happen. The value of request flags under "fuse_req" DS is "1561" and this tells FR_PRIVATE bit is set and there by, it adds the request to end_io list and free. This was seen on upstream kernel - v4.19 stable. Thanks and Regards, Pradeep
On 2020-09-08 16:57, Matthew Wilcox wrote: > On Tue, Sep 08, 2020 at 01:47:06PM +0530, Pradeep P V K wrote: >> Changes since V3: >> - Fix smatch warnings. >> >> Changes since V2: >> - Moved the spin lock from fuse_copy_pages() to fuse_ref_page(). >> >> Changes since V1: >> - Modified the logic as per kernel v5.9-rc1. >> - Added Reported by tag. >> >> Reported-by: kernel test robot <lkp@intel.com> >> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> > > Umm, the way this is written, it looks like Dan reported the original > bug rather than a bug in v3. The usual way is to credit Dan in the > 'Changes since' rather than putting in a 'Reported-by'. > sure thanks. i will note this for my next patches to upstream. >> static int fuse_ref_page(struct fuse_copy_state *cs, struct page >> *page, >> - unsigned offset, unsigned count) >> + unsigned offset, unsigned count, struct fuse_conn *fc) > > I'm no expert on fuse, but it looks to me like you should put a pointer > to the fuse_conn in struct fuse_copy_state rather than passing it down > through all these callers. True but will wait for other expert suggestions and comments too. Thanks and Regards, Pradeep
On Thu, Sep 10, 2020 at 5:42 PM <ppvk@codeaurora.org> wrote: > > On 2020-09-08 16:55, Miklos Szeredi wrote: > > On Tue, Sep 8, 2020 at 10:17 AM Pradeep P V K > > <pragalla@qti.qualcomm.com> wrote: > >> > >> From: Pradeep P V K <ppvk@codeaurora.org> > >> > >> There is a potential race between fuse_abort_conn() and > >> fuse_copy_page() as shown below, due to which VM_BUG_ON_PAGE > >> crash is observed for accessing a free page. > >> > >> context#1: context#2: > >> fuse_dev_do_read() fuse_abort_conn() > >> ->fuse_copy_args() ->end_requests() > > > > This shouldn't happen due to FR_LOCKED logic. Are you seeing this on > > an upstream kernel? Which version? > > > > Thanks, > > Miklos > > This is happen just after unlock_request() in fuse_ref_page(). In > unlock_request(), it will clear the FR_LOCKED bit. > As there is no protection between context#1 & context#2 during > unlock_request(), there are chances that it could happen. Ah, indeed, I missed that one. Similar issue in fuse_try_move_page(), which dereferences oldpage after unlock_request(). Fix for both is to grab a reference to the page from ap->pages[] array *before* calling unlock_request(). Attached untested patch. Could you please verify that it fixes the bug? Thanks, Miklos
On 2020-09-14 13:41, Miklos Szeredi wrote: > On Thu, Sep 10, 2020 at 5:42 PM <ppvk@codeaurora.org> wrote: >> >> On 2020-09-08 16:55, Miklos Szeredi wrote: >> > On Tue, Sep 8, 2020 at 10:17 AM Pradeep P V K >> > <pragalla@qti.qualcomm.com> wrote: >> >> >> >> From: Pradeep P V K <ppvk@codeaurora.org> >> >> >> >> There is a potential race between fuse_abort_conn() and >> >> fuse_copy_page() as shown below, due to which VM_BUG_ON_PAGE >> >> crash is observed for accessing a free page. >> >> >> >> context#1: context#2: >> >> fuse_dev_do_read() fuse_abort_conn() >> >> ->fuse_copy_args() ->end_requests() >> > >> > This shouldn't happen due to FR_LOCKED logic. Are you seeing this on >> > an upstream kernel? Which version? >> > >> > Thanks, >> > Miklos >> >> This is happen just after unlock_request() in fuse_ref_page(). In >> unlock_request(), it will clear the FR_LOCKED bit. >> As there is no protection between context#1 & context#2 during >> unlock_request(), there are chances that it could happen. > > Ah, indeed, I missed that one. > > Similar issue in fuse_try_move_page(), which dereferences oldpage > after unlock_request(). > > Fix for both is to grab a reference to the page from ap->pages[] array > *before* calling unlock_request(). > > Attached untested patch. Could you please verify that it fixes the > bug? > Thanks for the patch. It is an one time issue and bit hard to reproduce but still we will verify the above proposed patch and update the test results here. Minor comments on the commit text of the proposed patch : This issue was originally reported by me and kernel test robot identified compilation errors on the patch that i submitted. This confusion might be due to un proper commit text note on "changes since v1" > Thanks, > Miklos Thanks and Regards, Pradeep
On 2020-09-14 19:02, ppvk@codeaurora.org wrote: > On 2020-09-14 13:41, Miklos Szeredi wrote: >> On Thu, Sep 10, 2020 at 5:42 PM <ppvk@codeaurora.org> wrote: >>> >>> On 2020-09-08 16:55, Miklos Szeredi wrote: >>> > On Tue, Sep 8, 2020 at 10:17 AM Pradeep P V K >>> > <pragalla@qti.qualcomm.com> wrote: >>> >> >>> >> From: Pradeep P V K <ppvk@codeaurora.org> >>> >> >>> >> There is a potential race between fuse_abort_conn() and >>> >> fuse_copy_page() as shown below, due to which VM_BUG_ON_PAGE >>> >> crash is observed for accessing a free page. >>> >> >>> >> context#1: context#2: >>> >> fuse_dev_do_read() fuse_abort_conn() >>> >> ->fuse_copy_args() ->end_requests() >>> > >>> > This shouldn't happen due to FR_LOCKED logic. Are you seeing this on >>> > an upstream kernel? Which version? >>> > >>> > Thanks, >>> > Miklos >>> >>> This is happen just after unlock_request() in fuse_ref_page(). In >>> unlock_request(), it will clear the FR_LOCKED bit. >>> As there is no protection between context#1 & context#2 during >>> unlock_request(), there are chances that it could happen. >> >> Ah, indeed, I missed that one. >> >> Similar issue in fuse_try_move_page(), which dereferences oldpage >> after unlock_request(). >> >> Fix for both is to grab a reference to the page from ap->pages[] array >> *before* calling unlock_request(). >> >> Attached untested patch. Could you please verify that it fixes the >> bug? >> > Thanks for the patch. It is an one time issue and bit hard to > reproduce but still we > will verify the above proposed patch and update the test results here. > Not seen any issue during 24 hours(+) of stability run with your proposed patch. This covers reads/writes on fuse paths + reboots + other concurrency's. > Minor comments on the commit text of the proposed patch : This issue > was originally reported by me and kernel test robot > identified compilation errors on the patch that i submitted. > This confusion might be due to un proper commit text note on "changes > since v1" > >> Thanks, >> Miklos > > Thanks and Regards, > Pradeep
On Wed, Sep 16, 2020 at 5:31 PM <ppvk@codeaurora.org> wrote: > > On 2020-09-14 19:02, ppvk@codeaurora.org wrote: > > Thanks for the patch. It is an one time issue and bit hard to > > reproduce but still we > > will verify the above proposed patch and update the test results here. > > > Not seen any issue during 24 hours(+) of stability run with your > proposed patch. > This covers reads/writes on fuse paths + reboots + other concurrency's. Great, thanks for verifying. Pushed to fuse.git#for-next. Thanks, Miklos
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 02b3c36..25f2086 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -875,7 +875,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep) } static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, - unsigned offset, unsigned count) + unsigned offset, unsigned count, struct fuse_conn *fc) { struct pipe_buffer *buf; int err; @@ -883,15 +883,18 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, if (cs->nr_segs >= cs->pipe->max_usage) return -EIO; + spin_lock(&fc->lock); err = unlock_request(cs->req); - if (err) + if (err) { + spin_unlock(&fc->lock); return err; - + } fuse_copy_finish(cs); buf = cs->pipebufs; get_page(page); buf->page = page; + spin_unlock(&fc->lock); buf->offset = offset; buf->len = count; @@ -907,7 +910,7 @@ static int fuse_ref_page(struct fuse_copy_state *cs, struct page *page, * done atomically */ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep, - unsigned offset, unsigned count, int zeroing) + unsigned offset, unsigned count, int zeroing, struct fuse_conn *fc) { int err; struct page *page = *pagep; @@ -917,7 +920,7 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep, while (count) { if (cs->write && cs->pipebufs && page) { - return fuse_ref_page(cs, page, offset, count); + return fuse_ref_page(cs, page, offset, count, fc); } else if (!cs->len) { if (cs->move_pages && page && offset == 0 && count == PAGE_SIZE) { @@ -945,7 +948,7 @@ static int fuse_copy_page(struct fuse_copy_state *cs, struct page **pagep, /* Copy pages in the request to/from userspace buffer */ static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes, - int zeroing) + int zeroing, struct fuse_conn *fc) { unsigned i; struct fuse_req *req = cs->req; @@ -957,7 +960,7 @@ static int fuse_copy_pages(struct fuse_copy_state *cs, unsigned nbytes, unsigned int offset = ap->descs[i].offset; unsigned int count = min(nbytes, ap->descs[i].length); - err = fuse_copy_page(cs, &ap->pages[i], offset, count, zeroing); + err = fuse_copy_page(cs, &ap->pages[i], offset, count, zeroing, fc); if (err) return err; @@ -983,7 +986,7 @@ static int fuse_copy_one(struct fuse_copy_state *cs, void *val, unsigned size) /* Copy request arguments to/from userspace buffer */ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs, unsigned argpages, struct fuse_arg *args, - int zeroing) + int zeroing, struct fuse_conn *fc) { int err = 0; unsigned i; @@ -991,7 +994,7 @@ static int fuse_copy_args(struct fuse_copy_state *cs, unsigned numargs, for (i = 0; !err && i < numargs; i++) { struct fuse_arg *arg = &args[i]; if (i == numargs - 1 && argpages) - err = fuse_copy_pages(cs, arg->size, zeroing); + err = fuse_copy_pages(cs, arg->size, zeroing, fc); else err = fuse_copy_one(cs, arg->value, arg->size); } @@ -1260,7 +1263,7 @@ static ssize_t fuse_dev_do_read(struct fuse_dev *fud, struct file *file, err = fuse_copy_one(cs, &req->in.h, sizeof(req->in.h)); if (!err) err = fuse_copy_args(cs, args->in_numargs, args->in_pages, - (struct fuse_arg *) args->in_args, 0); + (struct fuse_arg *) args->in_args, 0, fc); fuse_copy_finish(cs); spin_lock(&fpq->lock); clear_bit(FR_LOCKED, &req->flags); @@ -1590,7 +1593,7 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size, goto out_iput; this_num = min_t(unsigned, num, PAGE_SIZE - offset); - err = fuse_copy_page(cs, &page, offset, this_num, 0); + err = fuse_copy_page(cs, &page, offset, this_num, 0, fc); if (!err && offset == 0 && (this_num == PAGE_SIZE || file_size == end)) SetPageUptodate(page); @@ -1792,7 +1795,7 @@ static struct fuse_req *request_find(struct fuse_pqueue *fpq, u64 unique) } static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args, - unsigned nbytes) + unsigned nbytes, struct fuse_conn *fc) { unsigned reqsize = sizeof(struct fuse_out_header); @@ -1809,7 +1812,7 @@ static int copy_out_args(struct fuse_copy_state *cs, struct fuse_args *args, lastarg->size -= diffsize; } return fuse_copy_args(cs, args->out_numargs, args->out_pages, - args->out_args, args->page_zeroing); + args->out_args, args->page_zeroing, fc); } /* @@ -1894,7 +1897,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud, if (oh.error) err = nbytes != sizeof(oh) ? -EINVAL : 0; else - err = copy_out_args(cs, req->args, nbytes); + err = copy_out_args(cs, req->args, nbytes, fc); fuse_copy_finish(cs); spin_lock(&fpq->lock);