Message ID | 20230930184821.310143-1-arakesh@google.com (mailing list archive) |
---|---|
Headers | show |
Series | usb: gadget: uvc: stability fixes on STREAMOFF. | expand |
Hi On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: >We have been seeing two main stability issues that uvc gadget driver >runs into when stopping streams: > 1. Attempting to queue usb_requests to a disabled usb_ep > 2. use-after-free issue for inflight usb_requests > >The three patches below fix the two issues above. Patch 1/3 fixes the >first issue, and Patch 2/3 and 3/3 fix the second issue. > >Avichal Rakesh (3): > usb: gadget: uvc: prevent use of disabled endpoint > usb: gadget: uvc: Allocate uvc_requests one at a time > usb: gadget: uvc: Fix use-after-free for inflight usb_requests > > drivers/usb/gadget/function/f_uvc.c | 11 +- > drivers/usb/gadget/function/f_uvc.h | 2 +- > drivers/usb/gadget/function/uvc.h | 6 +- > drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- > drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- > 5 files changed, 164 insertions(+), 65 deletions(-) These patches are not applying on gregkh/usb-testing since Greg did take my patches first. I have already rebased them. In the updated version I the stack runs into the following error, when enabling lockdep. Could you try your version with lockdep enabled? [ 41.278520] configfs-gadget.vz gadget.0: uvc: reset UVC [ 47.156261] configfs-gadget.vz gadget.0: uvc: uvc_function_set_alt(2, 0) [ 47.169177] [ 47.170903] ============================================ [ 47.176857] WARNING: possible recursive locking detected [ 47.182798] 6.5.0-20230919-1+ #19 Tainted: G C [ 47.189323] -------------------------------------------- [ 47.195256] vzuvcd/412 is trying to acquire lock: [ 47.200511] ffffff8009560928 (&video->req_lock){....}-{3:3}, at: uvc_video_complete+0x44/0x2e0 [ 47.210172] [ 47.210172] but task is already holding lock: [ 47.216687] ffffff8009560928 (&video->req_lock){....}-{3:3}, at: uvcg_video_enable+0x2d0/0x5c0 [ 47.226333] [ 47.226333] other info that might help us debug this: [ 47.233625] Possible unsafe locking scenario: [ 47.233625] [ 47.240242] CPU0 [ 47.242974] ---- [ 47.245709] lock(&video->req_lock); [ 47.249802] lock(&video->req_lock); [ 47.253897] [ 47.253897] *** DEADLOCK *** [ 47.253897] [ 47.260511] May be due to missing lock nesting notation [ 47.260511] Regards, Michael
Thank you for testing the patch, Michael! On 10/3/23 04:09, Michael Grzeschik wrote: > Hi > > On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: >> We have been seeing two main stability issues that uvc gadget driver >> runs into when stopping streams: >> 1. Attempting to queue usb_requests to a disabled usb_ep >> 2. use-after-free issue for inflight usb_requests >> >> The three patches below fix the two issues above. Patch 1/3 fixes the >> first issue, and Patch 2/3 and 3/3 fix the second issue. >> >> Avichal Rakesh (3): >> usb: gadget: uvc: prevent use of disabled endpoint >> usb: gadget: uvc: Allocate uvc_requests one at a time >> usb: gadget: uvc: Fix use-after-free for inflight usb_requests >> >> drivers/usb/gadget/function/f_uvc.c | 11 +- >> drivers/usb/gadget/function/f_uvc.h | 2 +- >> drivers/usb/gadget/function/uvc.h | 6 +- >> drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- >> drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- >> 5 files changed, 164 insertions(+), 65 deletions(-) > > These patches are not applying on gregkh/usb-testing since > Greg did take my patches first. I have already rebased them. Ah, I didn't realize Greg had picked up your changes in his tree. Rebased the patches in V2. Also want to point out that https://lore.kernel.org/20230911002451.2860049-2-m.grzeschik@pengutronix.de/ may have introduced an issue when setting the uvc->state to UVC_STATE_CONNECTED in uvcg_video_enable. Effectively, uvc_video_enable can be called 1. because the host asks to halt the stream, or 2. if the gadget was disabled. Setting uvc->state to CONNECTED is fine for (1). In (2), uvc_v4l2 sets uvc->state to DISCONNECTED before calling uvcg_video_enable. In this case, your change would overwrite the value to CONNECTED without any further checks. > > In the updated version I the stack runs into the > following error, when enabling lockdep. Could you > try your version with lockdep enabled? > > [ 41.278520] configfs-gadget.vz gadget.0: uvc: reset UVC > [ 47.156261] configfs-gadget.vz gadget.0: uvc: uvc_function_set_alt(2, 0) > [ 47.169177] > [ 47.170903] ============================================ > [ 47.176857] WARNING: possible recursive locking detected > [ 47.182798] 6.5.0-20230919-1+ #19 Tainted: G C > [ 47.189323] -------------------------------------------- > [ 47.195256] vzuvcd/412 is trying to acquire lock: > [ 47.200511] ffffff8009560928 (&video->req_lock){....}-{3:3}, at: uvc_video_complete+0x44/0x2e0 > [ 47.210172] > [ 47.210172] but task is already holding lock: > [ 47.216687] ffffff8009560928 (&video->req_lock){....}-{3:3}, at: uvcg_video_enable+0x2d0/0x5c0 > [ 47.226333] > [ 47.226333] other info that might help us debug this: > [ 47.233625] Possible unsafe locking scenario: > [ 47.233625] > [ 47.240242] CPU0 > [ 47.242974] ---- > [ 47.245709] lock(&video->req_lock); > [ 47.249802] lock(&video->req_lock); > [ 47.253897] > [ 47.253897] *** DEADLOCK *** > [ 47.253897] > [ 47.260511] May be due to missing lock nesting notation > [ 47.260511] > Thank you for catching this. I couldn't repro the deadlock, but it looks like DWC3 controller can call the complete callback directly from usb_ep_dequeue for requests in its pending list. I am not sure if that is up to spec, considering calling usb_ep_queue explicitly forbids calling the complete callback from within. Regardless, it is easy enough to fix by moving the dequeue calls to before we acquire req_lock. The semantics of the rest of the patch is unchanged. Uploading the fixed patches shortly. Regards, Avi.
On Tue, Oct 03, 2023 at 04:16:00PM -0700, Avichal Rakesh wrote: > Thank you for testing the patch, Michael! > > On 10/3/23 04:09, Michael Grzeschik wrote: > > Hi > > > > On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: > >> We have been seeing two main stability issues that uvc gadget driver > >> runs into when stopping streams: > >> 1. Attempting to queue usb_requests to a disabled usb_ep > >> 2. use-after-free issue for inflight usb_requests > >> > >> The three patches below fix the two issues above. Patch 1/3 fixes the > >> first issue, and Patch 2/3 and 3/3 fix the second issue. > >> > >> Avichal Rakesh (3): > >> usb: gadget: uvc: prevent use of disabled endpoint > >> usb: gadget: uvc: Allocate uvc_requests one at a time > >> usb: gadget: uvc: Fix use-after-free for inflight usb_requests > >> > >> drivers/usb/gadget/function/f_uvc.c | 11 +- > >> drivers/usb/gadget/function/f_uvc.h | 2 +- > >> drivers/usb/gadget/function/uvc.h | 6 +- > >> drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- > >> drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- > >> 5 files changed, 164 insertions(+), 65 deletions(-) > > > > These patches are not applying on gregkh/usb-testing since > > Greg did take my patches first. I have already rebased them. > > Ah, I didn't realize Greg had picked up your changes in his tree. > Rebased the patches in V2. The "v2" series here is almost impossible to follow, sorry. Please send it as a new thread, not as responses to the individual commits, how am I supposed to pick them up that way? And make it v3 please. thanks, greg k-h
On Tue, Oct 03, 2023 at 01:09:06PM +0200, Michael Grzeschik wrote: > On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: > > We have been seeing two main stability issues that uvc gadget driver > > runs into when stopping streams: > > 1. Attempting to queue usb_requests to a disabled usb_ep > > 2. use-after-free issue for inflight usb_requests > > > > The three patches below fix the two issues above. Patch 1/3 fixes the > > first issue, and Patch 2/3 and 3/3 fix the second issue. > > > > Avichal Rakesh (3): > > usb: gadget: uvc: prevent use of disabled endpoint > > usb: gadget: uvc: Allocate uvc_requests one at a time > > usb: gadget: uvc: Fix use-after-free for inflight usb_requests > > > > drivers/usb/gadget/function/f_uvc.c | 11 +- > > drivers/usb/gadget/function/f_uvc.h | 2 +- > > drivers/usb/gadget/function/uvc.h | 6 +- > > drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- > > drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- > > 5 files changed, 164 insertions(+), 65 deletions(-) > > These patches are not applying on gregkh/usb-testing since > Greg did take my patches first. I have already rebased them. I think they got merged too soon :-( We could fix things on top, but there's very little time to do so for v6.7. > In the updated version I the stack runs into the > following error, when enabling lockdep. Could you > try your version with lockdep enabled? > > [ 41.278520] configfs-gadget.vz gadget.0: uvc: reset UVC > [ 47.156261] configfs-gadget.vz gadget.0: uvc: uvc_function_set_alt(2, 0) > [ 47.169177] > [ 47.170903] ============================================ > [ 47.176857] WARNING: possible recursive locking detected > [ 47.182798] 6.5.0-20230919-1+ #19 Tainted: G C > [ 47.189323] -------------------------------------------- > [ 47.195256] vzuvcd/412 is trying to acquire lock: > [ 47.200511] ffffff8009560928 (&video->req_lock){....}-{3:3}, at: uvc_video_complete+0x44/0x2e0 > [ 47.210172] > [ 47.210172] but task is already holding lock: > [ 47.216687] ffffff8009560928 (&video->req_lock){....}-{3:3}, at: uvcg_video_enable+0x2d0/0x5c0 > [ 47.226333] > [ 47.226333] other info that might help us debug this: > [ 47.233625] Possible unsafe locking scenario: > [ 47.233625] > [ 47.240242] CPU0 > [ 47.242974] ---- > [ 47.245709] lock(&video->req_lock); > [ 47.249802] lock(&video->req_lock); > [ 47.253897] > [ 47.253897] *** DEADLOCK *** > [ 47.253897] > [ 47.260511] May be due to missing lock nesting notation > [ 47.260511]
Hi Laurent On Thu, Oct 05, 2023 at 11:23:27AM +0300, Laurent Pinchart wrote: >On Tue, Oct 03, 2023 at 01:09:06PM +0200, Michael Grzeschik wrote: >> On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: >> > We have been seeing two main stability issues that uvc gadget driver >> > runs into when stopping streams: >> > 1. Attempting to queue usb_requests to a disabled usb_ep >> > 2. use-after-free issue for inflight usb_requests >> > >> > The three patches below fix the two issues above. Patch 1/3 fixes the >> > first issue, and Patch 2/3 and 3/3 fix the second issue. >> > >> > Avichal Rakesh (3): >> > usb: gadget: uvc: prevent use of disabled endpoint >> > usb: gadget: uvc: Allocate uvc_requests one at a time >> > usb: gadget: uvc: Fix use-after-free for inflight usb_requests >> > >> > drivers/usb/gadget/function/f_uvc.c | 11 +- >> > drivers/usb/gadget/function/f_uvc.h | 2 +- >> > drivers/usb/gadget/function/uvc.h | 6 +- >> > drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- >> > drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- >> > 5 files changed, 164 insertions(+), 65 deletions(-) >> >> These patches are not applying on gregkh/usb-testing since >> Greg did take my patches first. I have already rebased them. > >I think they got merged too soon :-( We could fix things on top, but >there's very little time to do so for v6.7. Agreed. I was jumping from one workaround to another one, since this is not easy to fix in a proper way. And still after this long discussion with Avichal I don't think we are there yet. So far the first two patches from Avichal look legit. But the overall Use-After-Free fix is yet to be done properly. The "abondoned" method he suggested is really bad to follow and will add too much complexity and will be hard to debug. IMHO it should be possible to introduce two cleanup pathes. One path would be in the uvc_cleanup_requests that will cleanup the requests that are actually not used in the controller and are registered in the req_free list. The second path would be the complete functions that are being run from the controller and will ensure that the cleanup will really free the requests from the controller after they were consumed. What do you think? Regards, Michael >> In the updated version I the stack runs into the >> following error, when enabling lockdep. Could you >> try your version with lockdep enabled? >> >> [ 41.278520] configfs-gadget.vz gadget.0: uvc: reset UVC >> [ 47.156261] configfs-gadget.vz gadget.0: uvc: uvc_function_set_alt(2, 0) >> [ 47.169177] >> [ 47.170903] ============================================ >> [ 47.176857] WARNING: possible recursive locking detected >> [ 47.182798] 6.5.0-20230919-1+ #19 Tainted: G C >> [ 47.189323] -------------------------------------------- >> [ 47.195256] vzuvcd/412 is trying to acquire lock: >> [ 47.200511] ffffff8009560928 (&video->req_lock){....}-{3:3}, at: uvc_video_complete+0x44/0x2e0 >> [ 47.210172] >> [ 47.210172] but task is already holding lock: >> [ 47.216687] ffffff8009560928 (&video->req_lock){....}-{3:3}, at: uvcg_video_enable+0x2d0/0x5c0 >> [ 47.226333] >> [ 47.226333] other info that might help us debug this: >> [ 47.233625] Possible unsafe locking scenario: >> [ 47.233625] >> [ 47.240242] CPU0 >> [ 47.242974] ---- >> [ 47.245709] lock(&video->req_lock); >> [ 47.249802] lock(&video->req_lock); >> [ 47.253897] >> [ 47.253897] *** DEADLOCK *** >> [ 47.253897] >> [ 47.260511] May be due to missing lock nesting notation >> [ 47.260511] > >-- >Regards, > >Laurent Pinchart >
On 10/5/23 00:11, Greg Kroah-Hartman wrote: > On Tue, Oct 03, 2023 at 04:16:00PM -0700, Avichal Rakesh wrote: >> Thank you for testing the patch, Michael! >> >> On 10/3/23 04:09, Michael Grzeschik wrote: >>> Hi >>> >>> On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: >>>> We have been seeing two main stability issues that uvc gadget driver >>>> runs into when stopping streams: >>>> 1. Attempting to queue usb_requests to a disabled usb_ep >>>> 2. use-after-free issue for inflight usb_requests >>>> >>>> The three patches below fix the two issues above. Patch 1/3 fixes the >>>> first issue, and Patch 2/3 and 3/3 fix the second issue. >>>> >>>> Avichal Rakesh (3): >>>> usb: gadget: uvc: prevent use of disabled endpoint >>>> usb: gadget: uvc: Allocate uvc_requests one at a time >>>> usb: gadget: uvc: Fix use-after-free for inflight usb_requests >>>> >>>> drivers/usb/gadget/function/f_uvc.c | 11 +- >>>> drivers/usb/gadget/function/f_uvc.h | 2 +- >>>> drivers/usb/gadget/function/uvc.h | 6 +- >>>> drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- >>>> drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- >>>> 5 files changed, 164 insertions(+), 65 deletions(-) >>> >>> These patches are not applying on gregkh/usb-testing since >>> Greg did take my patches first. I have already rebased them. >> >> Ah, I didn't realize Greg had picked up your changes in his tree. >> Rebased the patches in V2. > > The "v2" series here is almost impossible to follow, sorry. > > Please send it as a new thread, not as responses to the individual > commits, how am I supposed to pick them up that way? > > And make it v3 please. Sent out v3 as a new thread. Sorry about that! - Avi.
On 10/5/23 03:14, Michael Grzeschik wrote: > Hi Laurent > > On Thu, Oct 05, 2023 at 11:23:27AM +0300, Laurent Pinchart wrote: >> On Tue, Oct 03, 2023 at 01:09:06PM +0200, Michael Grzeschik wrote: >>> On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: >>> > We have been seeing two main stability issues that uvc gadget driver >>> > runs into when stopping streams: >>> > 1. Attempting to queue usb_requests to a disabled usb_ep >>> > 2. use-after-free issue for inflight usb_requests >>> > >>> > The three patches below fix the two issues above. Patch 1/3 fixes the >>> > first issue, and Patch 2/3 and 3/3 fix the second issue. >>> > >>> > Avichal Rakesh (3): >>> > usb: gadget: uvc: prevent use of disabled endpoint >>> > usb: gadget: uvc: Allocate uvc_requests one at a time >>> > usb: gadget: uvc: Fix use-after-free for inflight usb_requests >>> > >>> > drivers/usb/gadget/function/f_uvc.c | 11 +- >>> > drivers/usb/gadget/function/f_uvc.h | 2 +- >>> > drivers/usb/gadget/function/uvc.h | 6 +- >>> > drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- >>> > drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- >>> > 5 files changed, 164 insertions(+), 65 deletions(-) >>> >>> These patches are not applying on gregkh/usb-testing since >>> Greg did take my patches first. I have already rebased them. >> >> I think they got merged too soon :-( We could fix things on top, but >> there's very little time to do so for v6.7. > > Agreed. I was jumping from one workaround to another one, since this > is not easy to fix in a proper way. And still after this long discussion > with Avichal I don't think we are there yet. > > > So far the first two patches from Avichal look legit. But the overall > Use-After-Free fix is yet to be done properly. > > The "abondoned" method he suggested is really bad to follow and will > add too much complexity and will be hard to debug. > > IMHO it should be possible to introduce two cleanup pathes. > > One path would be in the uvc_cleanup_requests that will cleanup the > requests that are actually not used in the controller and are registered > in the req_free list. > > The second path would be the complete functions that are being run > from the controller and will ensure that the cleanup will really free > the requests from the controller after they were consumed. > > What do you think? I am not sure I follow. Patch 3/3 does exactly what you say here. There are two cleanup paths: 1. uvcg_video_disable cleans up only the requests in req_free, and 2. complete handler cleans up the in-flight requests. The "abandoned" flag is simply to let the completion handler know which requests to clean up and which ones to re-queue back to the gadget driver. The other "complications" are around making sure we can trust the values in an inherently racey situation. The reasoning can admittedly be difficult to follow at a glance, which incidentally is why I went with a simple to prove timed wait in the past (https://lore.kernel.org/20230912041910.726442-3-arakesh@google.com). I am not suggesting we go back to a timed wait, but please do look at the patch and let me know which parts don't make sense, or are difficult to understand. We can add more documentation about our assumptions there, or if you have a way to do this that you think is simpler to reason about, then please let me know and I'll be more than happy to use that! Regards, Avi.
Hi Avichal, On Thu, Oct 05, 2023 at 11:30:32AM -0700, Avichal Rakesh wrote: >On 10/5/23 03:14, Michael Grzeschik wrote: >> On Thu, Oct 05, 2023 at 11:23:27AM +0300, Laurent Pinchart wrote: >>> On Tue, Oct 03, 2023 at 01:09:06PM +0200, Michael Grzeschik wrote: >>>> On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: >>>> > We have been seeing two main stability issues that uvc gadget driver >>>> > runs into when stopping streams: >>>> > 1. Attempting to queue usb_requests to a disabled usb_ep >>>> > 2. use-after-free issue for inflight usb_requests >>>> > >>>> > The three patches below fix the two issues above. Patch 1/3 fixes the >>>> > first issue, and Patch 2/3 and 3/3 fix the second issue. >>>> > >>>> > Avichal Rakesh (3): >>>> > usb: gadget: uvc: prevent use of disabled endpoint >>>> > usb: gadget: uvc: Allocate uvc_requests one at a time >>>> > usb: gadget: uvc: Fix use-after-free for inflight usb_requests >>>> > >>>> > drivers/usb/gadget/function/f_uvc.c | 11 +- >>>> > drivers/usb/gadget/function/f_uvc.h | 2 +- >>>> > drivers/usb/gadget/function/uvc.h | 6 +- >>>> > drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- >>>> > drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- >>>> > 5 files changed, 164 insertions(+), 65 deletions(-) >>>> >>>> These patches are not applying on gregkh/usb-testing since >>>> Greg did take my patches first. I have already rebased them. >>> >>> I think they got merged too soon :-( We could fix things on top, but >>> there's very little time to do so for v6.7. >> >> Agreed. I was jumping from one workaround to another one, since this >> is not easy to fix in a proper way. And still after this long discussion >> with Avichal I don't think we are there yet. >> >> >> So far the first two patches from Avichal look legit. But the overall >> Use-After-Free fix is yet to be done properly. >> >> The "abondoned" method he suggested is really bad to follow and will >> add too much complexity and will be hard to debug. >> >> IMHO it should be possible to introduce two cleanup pathes. >> >> One path would be in the uvc_cleanup_requests that will cleanup the >> requests that are actually not used in the controller and are registered >> in the req_free list. >> >> The second path would be the complete functions that are being run >> from the controller and will ensure that the cleanup will really free >> the requests from the controller after they were consumed. >> >> What do you think? > >I am not sure I follow. Patch 3/3 does exactly what you say here. Yes, it was just to summ up what the latest state of the idea was, so Laurent does not read the whole thread in detail. Sorry for not being clear enough about that. >There are two cleanup paths: > 1. uvcg_video_disable cleans up only the requests in req_free, and > 2. complete handler cleans up the in-flight requests. > >The "abandoned" flag is simply to let the completion handler know >which requests to clean up and which ones to re-queue back to >the gadget driver. What I don't get is, why in the case of shutdown there needs to be something re-queued back to the gadget driver. There should not need to be any sort of barrier flag for the requests. Just the complete handler running past a barrier where it knows that the whole device is stopped. So every call on complete should then clean that exact request it is touching currently. I don't know where the extra complexity comes from. >The other "complications" are around making sure we can trust >the values in an inherently racey situation. The reasoning >can admittedly be difficult to follow at a glance, which incidentally >is why I went with a simple to prove timed wait in the past >(https://lore.kernel.org/20230912041910.726442-3-arakesh@google.com). > >I am not suggesting we go back to a timed wait, but please do look >at the patch and let me know which parts don't make sense, or are >difficult to understand. We can add more documentation about our >assumptions there, or if you have a way to do this that you >think is simpler to reason about, then please let me know and I'll >be more than happy to use that! I really try to spin my head around the idea of the is_abondoned flag you are using. Unfortunatly for now I am out to debug the issues I see with your series. So I did try these patches you send. Yes the deadlock error is gone with v3. But the linked list is still running into cases where dwc3_gadget_giveback(complete) is touching requests that are already freed. [ 61.408715] ------------[ cut here ]------------ [ 61.413897] kernel BUG at lib/list_debug.c:56! ... [ 61.590762] Call trace: [ 61.596890] __list_del_entry_valid+0xb8/0xe8 [ 61.603408] dwc3_gadget_giveback+0x3c/0x1b0 [ 61.607594] dwc3_remove_requests.part.0+0xcc/0x100 [ 61.612948] __dwc3_gadget_ep_disable+0xbc/0x1b8 [ 61.621019] dwc3_gadget_ep_disable+0x48/0x100 [ 61.627925] usb_ep_disable+0x3c/0x138 [ 61.638230] uvc_function_setup_continue+0x3c/0x60 [ 61.645040] uvc_v4l2_streamoff+0x5c/0x80 [ 61.659812] v4l_streamoff+0x40/0x60 [ 61.668950] __video_do_ioctl+0x344/0x420 [ 61.679548] video_usercopy+0x1d0/0x788 [ 61.685677] video_ioctl2+0x40/0x70 [ 61.697439] v4l2_ioctl+0x68/0xa0 [ 61.709200] __arm64_sys_ioctl+0x304/0xda0 [ 61.720768] invoke_syscall.constprop.0+0x70/0x130 Regards, Michael
On 10/5/23 15:05, Michael Grzeschik wrote: > Hi Avichal, > > On Thu, Oct 05, 2023 at 11:30:32AM -0700, Avichal Rakesh wrote: >> On 10/5/23 03:14, Michael Grzeschik wrote: >>> On Thu, Oct 05, 2023 at 11:23:27AM +0300, Laurent Pinchart wrote: >>>> On Tue, Oct 03, 2023 at 01:09:06PM +0200, Michael Grzeschik wrote: >>>>> On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: >>>>> > We have been seeing two main stability issues that uvc gadget driver >>>>> > runs into when stopping streams: >>>>> > 1. Attempting to queue usb_requests to a disabled usb_ep >>>>> > 2. use-after-free issue for inflight usb_requests >>>>> > >>>>> > The three patches below fix the two issues above. Patch 1/3 fixes the >>>>> > first issue, and Patch 2/3 and 3/3 fix the second issue. >>>>> > >>>>> > Avichal Rakesh (3): >>>>> > usb: gadget: uvc: prevent use of disabled endpoint >>>>> > usb: gadget: uvc: Allocate uvc_requests one at a time >>>>> > usb: gadget: uvc: Fix use-after-free for inflight usb_requests >>>>> > >>>>> > drivers/usb/gadget/function/f_uvc.c | 11 +- >>>>> > drivers/usb/gadget/function/f_uvc.h | 2 +- >>>>> > drivers/usb/gadget/function/uvc.h | 6 +- >>>>> > drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- >>>>> > drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- >>>>> > 5 files changed, 164 insertions(+), 65 deletions(-) >>>>> >>>>> These patches are not applying on gregkh/usb-testing since >>>>> Greg did take my patches first. I have already rebased them. >>>> >>>> I think they got merged too soon :-( We could fix things on top, but >>>> there's very little time to do so for v6.7. >>> >>> Agreed. I was jumping from one workaround to another one, since this >>> is not easy to fix in a proper way. And still after this long discussion >>> with Avichal I don't think we are there yet. >>> >>> >>> So far the first two patches from Avichal look legit. But the overall >>> Use-After-Free fix is yet to be done properly. >>> >>> The "abondoned" method he suggested is really bad to follow and will >>> add too much complexity and will be hard to debug. >>> >>> IMHO it should be possible to introduce two cleanup pathes. >>> >>> One path would be in the uvc_cleanup_requests that will cleanup the >>> requests that are actually not used in the controller and are registered >>> in the req_free list. >>> >>> The second path would be the complete functions that are being run >>> from the controller and will ensure that the cleanup will really free >>> the requests from the controller after they were consumed. >>> >>> What do you think? >> >> I am not sure I follow. Patch 3/3 does exactly what you say here. > > Yes, it was just to summ up what the latest state of the idea was, > so Laurent does not read the whole thread in detail. Sorry for not > being clear enough about that. Whoops! Sorry about the misunderstanding! > >> There are two cleanup paths: >> 1. uvcg_video_disable cleans up only the requests in req_free, and >> 2. complete handler cleans up the in-flight requests. >> >> The "abandoned" flag is simply to let the completion handler know >> which requests to clean up and which ones to re-queue back to >> the gadget driver. > > What I don't get is, why in the case of shutdown there needs to > be something re-queued back to the gadget driver. There should not > need to be any sort of barrier flag for the requests. Just the > complete handler running past a barrier where it knows that the > whole device is stopped. So every call on complete should then clean > that exact request it is touching currently. > > I don't know where the extra complexity comes from. A lot of this complexity comes from assuming a back to back STREAMOFF -> STREAMON sequence is possible where the gadget driver doesn't have the time to clean up all in-flight usb_requests. However, looking through the usb gadget APIs again, and it looks like usb_ep_disable enforces that all requests will be sent back to the gadget driver before it returns. So you're right: With Patch 1/3 in place, I think we can just guard on uvc->state alone, because control requests are blocked until usb_ep_disable is finished anyway. I'll upload v4 with the "is_abandoned" flag removed and the checks simplified once I've verified the fix locally. That should also remove any bookkeeping issues that may have triggered the stack below. Regards, Avi. > >> The other "complications" are around making sure we can trust >> the values in an inherently racey situation. The reasoning >> can admittedly be difficult to follow at a glance, which incidentally >> is why I went with a simple to prove timed wait in the past >> (https://lore.kernel.org/20230912041910.726442-3-arakesh@google.com). >> >> I am not suggesting we go back to a timed wait, but please do look >> at the patch and let me know which parts don't make sense, or are >> difficult to understand. We can add more documentation about our >> assumptions there, or if you have a way to do this that you >> think is simpler to reason about, then please let me know and I'll >> be more than happy to use that! > > I really try to spin my head around the idea of the is_abondoned flag > you are using. Unfortunatly for now I am out to debug the issues I see > with your series. > > So I did try these patches you send. Yes the deadlock error is gone with > v3. But the linked list is still running into cases where > dwc3_gadget_giveback(complete) is touching requests that are already > freed. > > [ 61.408715] ------------[ cut here ]------------ > [ 61.413897] kernel BUG at lib/list_debug.c:56! > ... > [ 61.590762] Call trace: > [ 61.596890] __list_del_entry_valid+0xb8/0xe8 > [ 61.603408] dwc3_gadget_giveback+0x3c/0x1b0 > [ 61.607594] dwc3_remove_requests.part.0+0xcc/0x100 > [ 61.612948] __dwc3_gadget_ep_disable+0xbc/0x1b8 > [ 61.621019] dwc3_gadget_ep_disable+0x48/0x100 > [ 61.627925] usb_ep_disable+0x3c/0x138 > [ 61.638230] uvc_function_setup_continue+0x3c/0x60 > [ 61.645040] uvc_v4l2_streamoff+0x5c/0x80 > [ 61.659812] v4l_streamoff+0x40/0x60 > [ 61.668950] __video_do_ioctl+0x344/0x420 > [ 61.679548] video_usercopy+0x1d0/0x788 > [ 61.685677] video_ioctl2+0x40/0x70 > [ 61.697439] v4l2_ioctl+0x68/0xa0 > [ 61.709200] __arm64_sys_ioctl+0x304/0xda0 > [ 61.720768] invoke_syscall.constprop.0+0x70/0x130 >
On Fri, Oct 06, 2023 at 10:00:11AM -0700, Avichal Rakesh wrote: > > >On 10/5/23 15:05, Michael Grzeschik wrote: >> Hi Avichal, >> >> On Thu, Oct 05, 2023 at 11:30:32AM -0700, Avichal Rakesh wrote: >>> On 10/5/23 03:14, Michael Grzeschik wrote: >>>> On Thu, Oct 05, 2023 at 11:23:27AM +0300, Laurent Pinchart wrote: >>>>> On Tue, Oct 03, 2023 at 01:09:06PM +0200, Michael Grzeschik wrote: >>>>>> On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: >>>>>> > We have been seeing two main stability issues that uvc gadget driver >>>>>> > runs into when stopping streams: >>>>>> > 1. Attempting to queue usb_requests to a disabled usb_ep >>>>>> > 2. use-after-free issue for inflight usb_requests >>>>>> > >>>>>> > The three patches below fix the two issues above. Patch 1/3 fixes the >>>>>> > first issue, and Patch 2/3 and 3/3 fix the second issue. >>>>>> > >>>>>> > Avichal Rakesh (3): >>>>>> > usb: gadget: uvc: prevent use of disabled endpoint >>>>>> > usb: gadget: uvc: Allocate uvc_requests one at a time >>>>>> > usb: gadget: uvc: Fix use-after-free for inflight usb_requests >>>>>> > >>>>>> > drivers/usb/gadget/function/f_uvc.c | 11 +- >>>>>> > drivers/usb/gadget/function/f_uvc.h | 2 +- >>>>>> > drivers/usb/gadget/function/uvc.h | 6 +- >>>>>> > drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- >>>>>> > drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- >>>>>> > 5 files changed, 164 insertions(+), 65 deletions(-) >>>>>> >>>>>> These patches are not applying on gregkh/usb-testing since >>>>>> Greg did take my patches first. I have already rebased them. >>>>> >>>>> I think they got merged too soon :-( We could fix things on top, but >>>>> there's very little time to do so for v6.7. >>>> >>>> Agreed. I was jumping from one workaround to another one, since this >>>> is not easy to fix in a proper way. And still after this long discussion >>>> with Avichal I don't think we are there yet. >>>> >>>> >>>> So far the first two patches from Avichal look legit. But the overall >>>> Use-After-Free fix is yet to be done properly. >>>> >>>> The "abondoned" method he suggested is really bad to follow and will >>>> add too much complexity and will be hard to debug. >>>> >>>> IMHO it should be possible to introduce two cleanup pathes. >>>> >>>> One path would be in the uvc_cleanup_requests that will cleanup the >>>> requests that are actually not used in the controller and are registered >>>> in the req_free list. >>>> >>>> The second path would be the complete functions that are being run >>>> from the controller and will ensure that the cleanup will really free >>>> the requests from the controller after they were consumed. >>>> >>>> What do you think? >>> >>> I am not sure I follow. Patch 3/3 does exactly what you say here. >> >> Yes, it was just to summ up what the latest state of the idea was, >> so Laurent does not read the whole thread in detail. Sorry for not >> being clear enough about that. > >Whoops! Sorry about the misunderstanding! > >> >>> There are two cleanup paths: >>> 1. uvcg_video_disable cleans up only the requests in req_free, and >>> 2. complete handler cleans up the in-flight requests. >>> >>> The "abandoned" flag is simply to let the completion handler know >>> which requests to clean up and which ones to re-queue back to >>> the gadget driver. >> >> What I don't get is, why in the case of shutdown there needs to >> be something re-queued back to the gadget driver. There should not >> need to be any sort of barrier flag for the requests. Just the >> complete handler running past a barrier where it knows that the >> whole device is stopped. So every call on complete should then clean >> that exact request it is touching currently. >> >> I don't know where the extra complexity comes from. > >A lot of this complexity comes from assuming a back to back >STREAMOFF -> STREAMON sequence is possible where the gadget driver >doesn't have the time to clean up all in-flight usb_requests. >However, looking through the usb gadget APIs again, and it >looks like usb_ep_disable enforces that all requests will >be sent back to the gadget driver before it returns. Great! >So you're right: >With Patch 1/3 in place, I think we can just guard on uvc->state >alone, because control requests are blocked until usb_ep_disable >is finished anyway. I'll upload v4 with the "is_abandoned" >flag removed and the checks simplified once I've verified the >fix locally. > >That should also remove any bookkeeping issues that may have >triggered the stack below. I am currious if we should handle -ECONNRESET and -ESHUTDOWN in more detail in the completion handler and make sure that the request will not be added into the req_free list from there. Will review your v4 then. >>> The other "complications" are around making sure we can trust >>> the values in an inherently racey situation. The reasoning >>> can admittedly be difficult to follow at a glance, which incidentally >>> is why I went with a simple to prove timed wait in the past >>> (https://lore.kernel.org/20230912041910.726442-3-arakesh@google.com). >>> >>> I am not suggesting we go back to a timed wait, but please do look >>> at the patch and let me know which parts don't make sense, or are >>> difficult to understand. We can add more documentation about our >>> assumptions there, or if you have a way to do this that you >>> think is simpler to reason about, then please let me know and I'll >>> be more than happy to use that! >> >> I really try to spin my head around the idea of the is_abondoned flag >> you are using. Unfortunatly for now I am out to debug the issues I see >> with your series. >> >> So I did try these patches you send. Yes the deadlock error is gone with >> v3. But the linked list is still running into cases where >> dwc3_gadget_giveback(complete) is touching requests that are already >> freed. >> >> [ 61.408715] ------------[ cut here ]------------ >> [ 61.413897] kernel BUG at lib/list_debug.c:56! >> ... >> [ 61.590762] Call trace: >> [ 61.596890] __list_del_entry_valid+0xb8/0xe8 >> [ 61.603408] dwc3_gadget_giveback+0x3c/0x1b0 >> [ 61.607594] dwc3_remove_requests.part.0+0xcc/0x100 >> [ 61.612948] __dwc3_gadget_ep_disable+0xbc/0x1b8 >> [ 61.621019] dwc3_gadget_ep_disable+0x48/0x100 >> [ 61.627925] usb_ep_disable+0x3c/0x138 >> [ 61.638230] uvc_function_setup_continue+0x3c/0x60 >> [ 61.645040] uvc_v4l2_streamoff+0x5c/0x80 >> [ 61.659812] v4l_streamoff+0x40/0x60 >> [ 61.668950] __video_do_ioctl+0x344/0x420 >> [ 61.679548] video_usercopy+0x1d0/0x788 >> [ 61.685677] video_ioctl2+0x40/0x70 >> [ 61.697439] v4l2_ioctl+0x68/0xa0 >> [ 61.709200] __arm64_sys_ioctl+0x304/0xda0 >> [ 61.720768] invoke_syscall.constprop.0+0x70/0x130 >> > >
On 10/6/23 15:53, Michael Grzeschik wrote: > On Fri, Oct 06, 2023 at 10:00:11AM -0700, Avichal Rakesh wrote: >> >> >> On 10/5/23 15:05, Michael Grzeschik wrote: >>> Hi Avichal, >>> >>> On Thu, Oct 05, 2023 at 11:30:32AM -0700, Avichal Rakesh wrote: >>>> On 10/5/23 03:14, Michael Grzeschik wrote: >>>>> On Thu, Oct 05, 2023 at 11:23:27AM +0300, Laurent Pinchart wrote: >>>>>> On Tue, Oct 03, 2023 at 01:09:06PM +0200, Michael Grzeschik wrote: >>>>>>> On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: >>>>>>> > We have been seeing two main stability issues that uvc gadget driver >>>>>>> > runs into when stopping streams: >>>>>>> > 1. Attempting to queue usb_requests to a disabled usb_ep >>>>>>> > 2. use-after-free issue for inflight usb_requests >>>>>>> > >>>>>>> > The three patches below fix the two issues above. Patch 1/3 fixes the >>>>>>> > first issue, and Patch 2/3 and 3/3 fix the second issue. >>>>>>> > >>>>>>> > Avichal Rakesh (3): >>>>>>> > usb: gadget: uvc: prevent use of disabled endpoint >>>>>>> > usb: gadget: uvc: Allocate uvc_requests one at a time >>>>>>> > usb: gadget: uvc: Fix use-after-free for inflight usb_requests >>>>>>> > >>>>>>> > drivers/usb/gadget/function/f_uvc.c | 11 +- >>>>>>> > drivers/usb/gadget/function/f_uvc.h | 2 +- >>>>>>> > drivers/usb/gadget/function/uvc.h | 6 +- >>>>>>> > drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- >>>>>>> > drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- >>>>>>> > 5 files changed, 164 insertions(+), 65 deletions(-) >>>>>>> >>>>>>> These patches are not applying on gregkh/usb-testing since >>>>>>> Greg did take my patches first. I have already rebased them. >>>>>> >>>>>> I think they got merged too soon :-( We could fix things on top, but >>>>>> there's very little time to do so for v6.7. >>>>> >>>>> Agreed. I was jumping from one workaround to another one, since this >>>>> is not easy to fix in a proper way. And still after this long discussion >>>>> with Avichal I don't think we are there yet. >>>>> >>>>> >>>>> So far the first two patches from Avichal look legit. But the overall >>>>> Use-After-Free fix is yet to be done properly. >>>>> >>>>> The "abondoned" method he suggested is really bad to follow and will >>>>> add too much complexity and will be hard to debug. >>>>> >>>>> IMHO it should be possible to introduce two cleanup pathes. >>>>> >>>>> One path would be in the uvc_cleanup_requests that will cleanup the >>>>> requests that are actually not used in the controller and are registered >>>>> in the req_free list. >>>>> >>>>> The second path would be the complete functions that are being run >>>>> from the controller and will ensure that the cleanup will really free >>>>> the requests from the controller after they were consumed. >>>>> >>>>> What do you think? >>>> >>>> I am not sure I follow. Patch 3/3 does exactly what you say here. >>> >>> Yes, it was just to summ up what the latest state of the idea was, >>> so Laurent does not read the whole thread in detail. Sorry for not >>> being clear enough about that. >> >> Whoops! Sorry about the misunderstanding! >> >>> >>>> There are two cleanup paths: >>>> 1. uvcg_video_disable cleans up only the requests in req_free, and >>>> 2. complete handler cleans up the in-flight requests. >>>> >>>> The "abandoned" flag is simply to let the completion handler know >>>> which requests to clean up and which ones to re-queue back to >>>> the gadget driver. >>> >>> What I don't get is, why in the case of shutdown there needs to >>> be something re-queued back to the gadget driver. There should not >>> need to be any sort of barrier flag for the requests. Just the >>> complete handler running past a barrier where it knows that the >>> whole device is stopped. So every call on complete should then clean >>> that exact request it is touching currently. >>> >>> I don't know where the extra complexity comes from. >> >> A lot of this complexity comes from assuming a back to back >> STREAMOFF -> STREAMON sequence is possible where the gadget driver >> doesn't have the time to clean up all in-flight usb_requests. >> However, looking through the usb gadget APIs again, and it >> looks like usb_ep_disable enforces that all requests will >> be sent back to the gadget driver before it returns. > > Great! Uhh...apologies, I will have to take this back. I've been trying to use uvc->state as the condition for when completion handler should clean up usb_requests, and I cannot figure out a way to do so cleanly. The fundamental problem with using uvc->state is that it is not protected by any locks. So there is no real way to assert that its value has not changed between reading uvc->state and acting on it. Naively we can write something like the following in the completion handler: void uvc_video_complete(...) { if (uvc->state != UVC_EVENT_STREAMING) { usb_ep_free_request(....); } else { // handle usb_request normally } } But without any locks, there are no guarantees that uvc->state didn't mutate immediately after the if condition was checked, and the complete handler is handling a request that it should've freed instead or vice-versa. This argument would hold for any logic we guard with uvc->state, making uvc->state effectively useless as a check for freeing memory. We can work around it by either 1. Locking uvc->state with some driver level lock to ensure that we can trust the value of uvc->state at least for a little while, or 2. Using some other barrier condition that is protected by another lock If we go with (1), we'd have to add a lock around every and every write to uvc->state, which isn't terrible, but would require more testing to ensure that it doesn't create any new deadlocks. For (2), with the realization that usb_ep_disable flushes all requests, we can add a barrier in uvc_video, protected by req_lock. That should simplify the logic a little bit and will hopefully be easier to reason about. I could of course be missing a simpler solution here, and am happy to be wrong. So please let me know if you have any other ideas on how to guarantee such a check. > >> So you're right: >> With Patch 1/3 in place, I think we can just guard on uvc->state >> alone, because control requests are blocked until usb_ep_disable >> is finished anyway. I'll upload v4 with the "is_abandoned" >> flag removed and the checks simplified once I've verified the >> fix locally. >> >> That should also remove any bookkeeping issues that may have >> triggered the stack below. > > I am currious if we should handle -ECONNRESET and -ESHUTDOWN in more > detail in the completion handler and make sure that the request will not > be added into the req_free list from there. > > Will review your v4 then. Appreciate the reviews, thank you! > >>>> The other "complications" are around making sure we can trust >>>> the values in an inherently racey situation. The reasoning >>>> can admittedly be difficult to follow at a glance, which incidentally >>>> is why I went with a simple to prove timed wait in the past >>>> (https://lore.kernel.org/20230912041910.726442-3-arakesh@google.com). >>>> >>>> I am not suggesting we go back to a timed wait, but please do look >>>> at the patch and let me know which parts don't make sense, or are >>>> difficult to understand. We can add more documentation about our >>>> assumptions there, or if you have a way to do this that you >>>> think is simpler to reason about, then please let me know and I'll >>>> be more than happy to use that! >>> >>> I really try to spin my head around the idea of the is_abondoned flag >>> you are using. Unfortunatly for now I am out to debug the issues I see >>> with your series. >>> >>> So I did try these patches you send. Yes the deadlock error is gone with >>> v3. But the linked list is still running into cases where >>> dwc3_gadget_giveback(complete) is touching requests that are already >>> freed. >>> >>> [ 61.408715] ------------[ cut here ]------------ >>> [ 61.413897] kernel BUG at lib/list_debug.c:56! >>> ... >>> [ 61.590762] Call trace: >>> [ 61.596890] __list_del_entry_valid+0xb8/0xe8 >>> [ 61.603408] dwc3_gadget_giveback+0x3c/0x1b0 >>> [ 61.607594] dwc3_remove_requests.part.0+0xcc/0x100 >>> [ 61.612948] __dwc3_gadget_ep_disable+0xbc/0x1b8 >>> [ 61.621019] dwc3_gadget_ep_disable+0x48/0x100 >>> [ 61.627925] usb_ep_disable+0x3c/0x138 >>> [ 61.638230] uvc_function_setup_continue+0x3c/0x60 >>> [ 61.645040] uvc_v4l2_streamoff+0x5c/0x80 >>> [ 61.659812] v4l_streamoff+0x40/0x60 >>> [ 61.668950] __video_do_ioctl+0x344/0x420 >>> [ 61.679548] video_usercopy+0x1d0/0x788 >>> [ 61.685677] video_ioctl2+0x40/0x70 >>> [ 61.697439] v4l2_ioctl+0x68/0xa0 >>> [ 61.709200] __arm64_sys_ioctl+0x304/0xda0 >>> [ 61.720768] invoke_syscall.constprop.0+0x70/0x130 Just to confirm: this stack was with all 3 patches applied? Want to make sure this won't happen with v4. Regards, Avi.
On Fri, Oct 06, 2023 at 04:48:19PM -0700, Avichal Rakesh wrote: >On 10/6/23 15:53, Michael Grzeschik wrote: >> On Fri, Oct 06, 2023 at 10:00:11AM -0700, Avichal Rakesh wrote: >>> >>> >>> On 10/5/23 15:05, Michael Grzeschik wrote: >>>> Hi Avichal, >>>> >>>> On Thu, Oct 05, 2023 at 11:30:32AM -0700, Avichal Rakesh wrote: >>>>> On 10/5/23 03:14, Michael Grzeschik wrote: >>>>>> On Thu, Oct 05, 2023 at 11:23:27AM +0300, Laurent Pinchart wrote: >>>>>>> On Tue, Oct 03, 2023 at 01:09:06PM +0200, Michael Grzeschik wrote: >>>>>>>> On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: >>>>>>>> > We have been seeing two main stability issues that uvc gadget driver >>>>>>>> > runs into when stopping streams: >>>>>>>> > 1. Attempting to queue usb_requests to a disabled usb_ep >>>>>>>> > 2. use-after-free issue for inflight usb_requests >>>>>>>> > >>>>>>>> > The three patches below fix the two issues above. Patch 1/3 fixes the >>>>>>>> > first issue, and Patch 2/3 and 3/3 fix the second issue. >>>>>>>> > >>>>>>>> > Avichal Rakesh (3): >>>>>>>> > usb: gadget: uvc: prevent use of disabled endpoint >>>>>>>> > usb: gadget: uvc: Allocate uvc_requests one at a time >>>>>>>> > usb: gadget: uvc: Fix use-after-free for inflight usb_requests >>>>>>>> > >>>>>>>> > drivers/usb/gadget/function/f_uvc.c | 11 +- >>>>>>>> > drivers/usb/gadget/function/f_uvc.h | 2 +- >>>>>>>> > drivers/usb/gadget/function/uvc.h | 6 +- >>>>>>>> > drivers/usb/gadget/function/uvc_v4l2.c | 21 ++- >>>>>>>> > drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- >>>>>>>> > 5 files changed, 164 insertions(+), 65 deletions(-) >>>>>>>> >>>>>>>> These patches are not applying on gregkh/usb-testing since >>>>>>>> Greg did take my patches first. I have already rebased them. >>>>>>> >>>>>>> I think they got merged too soon :-( We could fix things on top, but >>>>>>> there's very little time to do so for v6.7. >>>>>> >>>>>> Agreed. I was jumping from one workaround to another one, since this >>>>>> is not easy to fix in a proper way. And still after this long discussion >>>>>> with Avichal I don't think we are there yet. >>>>>> >>>>>> >>>>>> So far the first two patches from Avichal look legit. But the overall >>>>>> Use-After-Free fix is yet to be done properly. >>>>>> >>>>>> The "abondoned" method he suggested is really bad to follow and will >>>>>> add too much complexity and will be hard to debug. >>>>>> >>>>>> IMHO it should be possible to introduce two cleanup pathes. >>>>>> >>>>>> One path would be in the uvc_cleanup_requests that will cleanup the >>>>>> requests that are actually not used in the controller and are registered >>>>>> in the req_free list. >>>>>> >>>>>> The second path would be the complete functions that are being run >>>>>> from the controller and will ensure that the cleanup will really free >>>>>> the requests from the controller after they were consumed. >>>>>> >>>>>> What do you think? >>>>> >>>>> I am not sure I follow. Patch 3/3 does exactly what you say here. >>>> >>>> Yes, it was just to summ up what the latest state of the idea was, >>>> so Laurent does not read the whole thread in detail. Sorry for not >>>> being clear enough about that. >>> >>> Whoops! Sorry about the misunderstanding! >>> >>>> >>>>> There are two cleanup paths: >>>>> 1. uvcg_video_disable cleans up only the requests in req_free, and >>>>> 2. complete handler cleans up the in-flight requests. >>>>> >>>>> The "abandoned" flag is simply to let the completion handler know >>>>> which requests to clean up and which ones to re-queue back to >>>>> the gadget driver. >>>> >>>> What I don't get is, why in the case of shutdown there needs to >>>> be something re-queued back to the gadget driver. There should not >>>> need to be any sort of barrier flag for the requests. Just the >>>> complete handler running past a barrier where it knows that the >>>> whole device is stopped. So every call on complete should then clean >>>> that exact request it is touching currently. >>>> >>>> I don't know where the extra complexity comes from. >>> >>> A lot of this complexity comes from assuming a back to back >>> STREAMOFF -> STREAMON sequence is possible where the gadget driver >>> doesn't have the time to clean up all in-flight usb_requests. >>> However, looking through the usb gadget APIs again, and it >>> looks like usb_ep_disable enforces that all requests will >>> be sent back to the gadget driver before it returns. >> >> Great! > >Uhh...apologies, I will have to take this back. I've been >trying to use uvc->state as the condition for when completion >handler should clean up usb_requests, and I cannot figure >out a way to do so cleanly. > >The fundamental problem with using uvc->state is that it is >not protected by any locks. So there is no real way to >assert that its value has not changed between reading >uvc->state and acting on it. > >Naively we can write something like the following in the >completion handler: > >void uvc_video_complete(...) { > if (uvc->state != UVC_EVENT_STREAMING) { > usb_ep_free_request(....); > } else { > // handle usb_request normally > } >} > >But without any locks, there are no guarantees that >uvc->state didn't mutate immediately after the if >condition was checked, and the complete handler is >handling a request that it should've freed instead >or vice-versa. This argument would hold for any logic >we guard with uvc->state, making uvc->state effectively >useless as a check for freeing memory. Yes, this makes total sense. Since the above condition was also part of the wait_event patch you created in the first place, I bet this issue was there aswell and was probably causing the issues I saw while testing it. >We can work around it by either >1. Locking uvc->state with some driver level lock > to ensure that we can trust the value of uvc->state > at least for a little while, or >2. Using some other barrier condition that is protected by > another lock > >If we go with (1), we'd have to add a lock around every >and every write to uvc->state, which isn't terrible, but >would require more testing to ensure that it doesn't >create any new deadlocks. > >For (2), with the realization that usb_ep_disable flushes >all requests, we can add a barrier in uvc_video, protected by >req_lock. That should simplify the logic a little bit and >will hopefully be easier to reason about. > >I could of course be missing a simpler solution here, >and am happy to be wrong. So please let me know if you >have any other ideas on how to guarantee such a check. For now, I have no better Idea. Idea (2) sounds like a good compromise. But I will have to review that code to really judge. Thanks for the work! Michael
On 10/8/23 12:48, Michael Grzeschik wrote: > On Fri, Oct 06, 2023 at 04:48:19PM -0700, Avichal Rakesh wrote: >> On 10/6/23 15:53, Michael Grzeschik wrote: >>> On Fri, Oct 06, 2023 at 10:00:11AM -0700, Avichal Rakesh wrote: >>>> >>>> >>>> On 10/5/23 15:05, Michael Grzeschik wrote: >>>>> Hi Avichal, >>>>> >>>>> On Thu, Oct 05, 2023 at 11:30:32AM -0700, Avichal Rakesh wrote: >>>>>> On 10/5/23 03:14, Michael Grzeschik wrote: >>>>> <snip> >>>>> I don't know where the extra complexity comes from. >>>> >>>> A lot of this complexity comes from assuming a back to back >>>> STREAMOFF -> STREAMON sequence is possible where the gadget driver >>>> doesn't have the time to clean up all in-flight usb_requests. >>>> However, looking through the usb gadget APIs again, and it >>>> looks like usb_ep_disable enforces that all requests will >>>> be sent back to the gadget driver before it returns. >>> >>> Great! >> >> Uhh...apologies, I will have to take this back. I've been >> trying to use uvc->state as the condition for when completion >> handler should clean up usb_requests, and I cannot figure >> out a way to do so cleanly. >> >> The fundamental problem with using uvc->state is that it is >> not protected by any locks. So there is no real way to >> assert that its value has not changed between reading >> uvc->state and acting on it. >> >> Naively we can write something like the following in the >> completion handler: >> >> void uvc_video_complete(...) { >> if (uvc->state != UVC_EVENT_STREAMING) { >> usb_ep_free_request(....); >> } else { >> // handle usb_request normally >> } >> } >> >> But without any locks, there are no guarantees that >> uvc->state didn't mutate immediately after the if >> condition was checked, and the complete handler is >> handling a request that it should've freed instead >> or vice-versa. This argument would hold for any logic >> we guard with uvc->state, making uvc->state effectively >> useless as a check for freeing memory. > > Yes, this makes total sense. Since the above condition was also part of > the wait_event patch you created in the first place, I bet this issue > was there aswell and was probably causing the issues I saw while testing > it> > >> We can work around it by either >> 1. Locking uvc->state with some driver level lock >> to ensure that we can trust the value of uvc->state >> at least for a little while, or >> 2. Using some other barrier condition that is protected by >> another lock >> >> If we go with (1), we'd have to add a lock around every >> and every write to uvc->state, which isn't terrible, but >> would require more testing to ensure that it doesn't >> create any new deadlocks. >> >> For (2), with the realization that usb_ep_disable flushes >> all requests, we can add a barrier in uvc_video, protected by >> req_lock. That should simplify the logic a little bit and >> will hopefully be easier to reason about. >> >> I could of course be missing a simpler solution here, >> and am happy to be wrong. So please let me know if you >> have any other ideas on how to guarantee such a check. > > For now, I have no better Idea. Idea (2) sounds like > a good compromise. But I will have to review that code > to really judge. > Sent out v4 patches with option (2). It simplifies the logic decently because we no longer have to reason about per-request consistency. uvc_video now tracks its own state of whether requests should be flowing or not based on calls to uvcg_video_enable. This state is protected, and is the source of truth for queueing usb_requests. The last bit of complexity left is around returning in-flight video buffers. AFAICT it should now be protected, and in my local testing I didn't notice any un-returned buffers, but please to take a look and let me know if your testing uncovers anything. Thanks, Avi.