Message ID | 20250328161823.2240125-1-fisaksen@baylibre.com (mailing list archive) |
---|---|
State | New |
Series | usb: gadget: f_fs: Invalidate io_data when USB request is dequeued or completed |
On Fri, Mar 28, 2025 at 05:17:15PM +0100, Frode Isaksen wrote:
> From: Frode Isaksen <frode@meta.com>
>
> Invalidate io_data by setting context to NULL when USB request is
> dequeued or completed, and check for NULL io_data in epfile_io_complete().
> The invalidation of io_data in req->context is done when exiting
> epfile_io(), since then io_data will become invalid as it is allocated
> on the stack.
> The epfile_io_complete() may be called after ffs_epfile_io() returns
> in case the wait_for_completion_interruptible() is interrupted.
> This fixes a use-after-free error with the following call stack:
>
> Unable to handle kernel paging request at virtual address ffffffc02f7bbcc0
> pc : ffs_epfile_io_complete+0x30/0x48
> lr : usb_gadget_giveback_request+0x30/0xf8
> Call trace:
> ffs_epfile_io_complete+0x30/0x48
> usb_gadget_giveback_request+0x30/0xf8
> dwc3_remove_requests+0x264/0x2e8
> dwc3_gadget_pullup+0x1d0/0x250
> kretprobe_trampoline+0x0/0xc4
> usb_gadget_remove_driver+0x40/0xf4
> usb_gadget_unregister_driver+0xdc/0x178
> unregister_gadget_item+0x40/0x6c
> ffs_closed+0xd4/0x10c
> ffs_data_clear+0x2c/0xf0
> ffs_data_closed+0x178/0x1ec
> ffs_ep0_release+0x24/0x38
> __fput+0xe8/0x27c
>
> Signed-off-by: Frode Isaksen <frode@meta.com>
> ---
> This bug was discovered, tested and fixed (no more crashes seen) on Meta Quest 3 device.
> Also tested on T.I. AM62x board.

What commit id does this fix?  Should it go to stable?

>
>  drivers/usb/gadget/function/f_fs.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
> index 2dea9e42a0f8..f1be0a5c0bd0 100644
> --- a/drivers/usb/gadget/function/f_fs.c
> +++ b/drivers/usb/gadget/function/f_fs.c
> @@ -738,6 +738,9 @@ static void ffs_epfile_io_complete(struct usb_ep *_ep, struct usb_request *req)
>  {
>  	struct ffs_io_data *io_data = req->context;
>
> +	if (WARN_ON(io_data == NULL))
> +		return;

If this happens you just crashed the box (remember about panic-on-warn,
which is still set in a few billion Linux systems these days...)

Just handle the issue properly, no need to dump the stack and crash a
device.

But, what keeps io_data from changing after you have checked it?  Where
is the lock here?

thanks,

greg k-h
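The objection to WARN_ON() can be illustrated outside the kernel. The following is a minimal userspace sketch, not kernel code: `mock_request`, `mock_io_data` and `mock_io_complete()` are hypothetical stand-ins for `struct usb_request`, `struct ffs_io_data` and `ffs_epfile_io_complete()`, and the `actual` field is an assumption about what the truncated `else` branch of the patch stores. It only shows the shape of a completion handler that treats a NULL context as an ordinary outcome and returns quietly, which is roughly what "handle the issue properly" seems to ask for. It does nothing about the locking question raised in the same review.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct ffs_io_data (illustration only). */
struct mock_io_data {
	int status;
};

/* Hypothetical stand-in for struct usb_request (illustration only). */
struct mock_request {
	void *context;	/* NULL once the submitter has given up on the request */
	int status;	/* non-zero on error */
	int actual;	/* assumed: bytes transferred on success */
};

/*
 * Returns 1 if the result was delivered to the submitter, 0 if the
 * request no longer has an owner. The ownerless case is handled as a
 * normal outcome: no warning, no stack dump, no write through a stale
 * pointer.
 */
static int mock_io_complete(struct mock_request *req)
{
	struct mock_io_data *io_data = req->context;

	if (!io_data)
		return 0;

	io_data->status = req->status ? req->status : req->actual;
	return 1;
}
```

A caller that has already cleared `context` simply sees a return value of 0, and the earlier result stays untouched.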
On 3/28/25 10:02 PM, Greg KH wrote:
> On Fri, Mar 28, 2025 at 05:17:15PM +0100, Frode Isaksen wrote:
>> This bug was discovered, tested and fixed (no more crashes seen) on Meta Quest 3 device.
>> Also tested on T.I. AM62x board.
> What commit id does this fix?  Should it go to stable?

This has always been there, so there is no specific commit when this was
added.

Will add the Cc tag to stable in v2.

>> +	if (WARN_ON(io_data == NULL))
>> +		return;
> If this happens you just crashed the box (remember about panic-on-warn,
> which is still set in a few billion Linux systems these days...)
>
> Just handle the issue properly, no need to dump the stack and crash a
> device.

OK, removing the WARN_ON for v2.

> But, what keeps io_data from changing after you have checked it?  Where
> is the lock here?

There is no lock here, as I didn't want to introduce extra complexity (and
bugs...). But this code has been running without a crash on millions of
devices for more than a year.

Thanks,
Frode
On Mon, Mar 31, 2025 at 10:18:29AM +0200, Frode Isaksen wrote:
> On 3/28/25 10:02 PM, Greg KH wrote:
> > But, what keeps io_data from changing after you have checked it?  Where
> > is the lock here?
>
> There is no lock here, as I didn't want to introduce extra complexity (and
> bugs...). But this code has been running without a crash on millions of
> devices for more than a year.

The fix has?  Great, but again, you need to at least say why this value
will not change right after testing for it, otherwise you have just
reduced the race window, not removed it.

thanks,

greg k-h
On 3/31/25 10:57 AM, Greg KH wrote:
> The fix has?  Great, but again, you need to at least say why this value
> will not change right after testing for it, otherwise you have just
> reduced the race window, not removed it.

I agree that this is only reducing the race window and not eliminating it
completely, but I have no idea how to fix this easily.

Thanks,
Frode
On Mon, Mar 31, 2025 at 03:17:08PM +0200, Frode Isaksen wrote:
> I agree that this is only reducing the race window and not eliminating it
> completely, but I have no idea how to fix this easily.

The comment in the code explains where the race can not happen, which
implies where it can happen, so perhaps that is a good start?

good luck!

greg k-h
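The difference between reducing and removing the race window can be sketched in userspace C. This is an illustration under stated assumptions, not a proposed fix for f_fs.c: `locked_request`, `locked_complete()` and `locked_detach()` are hypothetical names, and a pthread mutex stands in for whatever kernel lock would be appropriate, which the thread does not settle. The point is only that the NULL check and the write through the pointer happen under the same lock that the submitter takes before clearing `context`, so the value cannot change between the test and the use.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the submitter's stack-allocated state. */
struct locked_io_data {
	int status;
};

/* Hypothetical request whose context pointer is protected by a lock. */
struct locked_request {
	pthread_mutex_t lock;	/* protects context */
	void *context;
	int actual;
};

/* Completion side: the check and the use sit inside one critical section. */
static int locked_complete(struct locked_request *req)
{
	int delivered = 0;

	pthread_mutex_lock(&req->lock);
	struct locked_io_data *io_data = req->context;
	if (io_data) {
		io_data->status = req->actual;
		delivered = 1;
	}
	pthread_mutex_unlock(&req->lock);

	return delivered;
}

/*
 * Submitter side: detach under the same lock before the stack frame that
 * holds io_data goes away. Once this returns, no completion can still be
 * writing through the old pointer.
 */
static void locked_detach(struct locked_request *req)
{
	pthread_mutex_lock(&req->lock);
	req->context = NULL;
	pthread_mutex_unlock(&req->lock);
}
```

Without the lock, `locked_complete()` could read a non-NULL pointer just before `locked_detach()` clears it and then write to memory that is no longer valid, which is the window the review points at.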
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 2dea9e42a0f8..f1be0a5c0bd0 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -738,6 +738,9 @@ static void ffs_epfile_io_complete(struct usb_ep *_ep, struct usb_request *req)
 {
 	struct ffs_io_data *io_data = req->context;
 
+	if (WARN_ON(io_data == NULL))
+		return;
+
 	if (req->status)
 		io_data->status = req->status;
 	else
@@ -1126,6 +1129,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
 			spin_lock_irq(&epfile->ffs->eps_lock);
 			if (epfile->ep != ep) {
 				ret = -ESHUTDOWN;
+				req->context = NULL;
 				goto error_lock;
 			}
 			/*
@@ -1140,6 +1144,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
 			interrupted = io_data->status < 0;
 		}
 
+		req->context = NULL;
 		if (interrupted)
 			ret = -EINTR;
 		else if (io_data->read && io_data->status > 0)