diff mbox series

[v3] usb: gadget: uvc: Fix crash when encoding data for usb request

Message ID 20220331184024.23918-1-w36195@motorola.com (mailing list archive)
State Accepted
Commit 71d471e3faf90c9674cadc7605ac719e82cb7fac
Headers show
Series [v3] usb: gadget: uvc: Fix crash when encoding data for usb request | expand

Commit Message

Dan Vacura March 31, 2022, 6:40 p.m. UTC
During the uvcg_video_pump() process, if an error occurs and
uvcg_queue_cancel() is called, the buffer queue will be cleared out, but
the current marker (queue->buf_used) of the active buffer (no longer
active) is not reset. On the next iteration of uvcg_video_pump() the
stale buf_used count will be used and the logic of min((unsigned
int)len, buf->bytesused - queue->buf_used) may incorrectly calculate a
nbytes size, causing an invalid memory access.

[80802.185460][  T315] configfs-gadget gadget: uvc: VS request completed
with status -18.
[80802.185519][  T315] configfs-gadget gadget: uvc: VS request completed
with status -18.
...
uvcg_queue_cancel() is called and the queue is cleared out, but the
marker queue->buf_used is not reset.
...
[80802.262328][ T8682] Unable to handle kernel paging request at virtual
address ffffffc03af9f000
...
...
[80802.263138][ T8682] Call trace:
[80802.263146][ T8682]  __memcpy+0x12c/0x180
[80802.263155][ T8682]  uvcg_video_pump+0xcc/0x1e0
[80802.263165][ T8682]  process_one_work+0x2cc/0x568
[80802.263173][ T8682]  worker_thread+0x28c/0x518
[80802.263181][ T8682]  kthread+0x160/0x170
[80802.263188][ T8682]  ret_from_fork+0x10/0x18
[80802.263198][ T8682] Code: a8c12829 a88130cb a8c130

Fixes: d692522577c0 ("usb: gadget/uvc: Port UVC webcam gadget to use videobuf2 framework")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Vacura <w36195@motorola.com>

---
Changes in v3:
- Cc stable
Changes in v2:
- Add Fixes tag

 drivers/usb/gadget/function/uvc_queue.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Laurent Pinchart April 19, 2022, 8:46 p.m. UTC | #1
Hi Dan,

Thank you for the patch.

On Thu, Mar 31, 2022 at 01:40:23PM -0500, Dan Vacura wrote:
> During the uvcg_video_pump() process, if an error occurs and
> uvcg_queue_cancel() is called, the buffer queue will be cleared out, but
> the current marker (queue->buf_used) of the active buffer (no longer
> active) is not reset. On the next iteration of uvcg_video_pump() the
> stale buf_used count will be used and the logic of min((unsigned
> int)len, buf->bytesused - queue->buf_used) may incorrectly calculate a
> nbytes size, causing an invalid memory access.
> 
> [80802.185460][  T315] configfs-gadget gadget: uvc: VS request completed
> with status -18.
> [80802.185519][  T315] configfs-gadget gadget: uvc: VS request completed
> with status -18.
> ...
> uvcg_queue_cancel() is called and the queue is cleared out, but the
> marker queue->buf_used is not reset.
> ...
> [80802.262328][ T8682] Unable to handle kernel paging request at virtual
> address ffffffc03af9f000
> ...
> ...
> [80802.263138][ T8682] Call trace:
> [80802.263146][ T8682]  __memcpy+0x12c/0x180
> [80802.263155][ T8682]  uvcg_video_pump+0xcc/0x1e0
> [80802.263165][ T8682]  process_one_work+0x2cc/0x568
> [80802.263173][ T8682]  worker_thread+0x28c/0x518
> [80802.263181][ T8682]  kthread+0x160/0x170
> [80802.263188][ T8682]  ret_from_fork+0x10/0x18
> [80802.263198][ T8682] Code: a8c12829 a88130cb a8c130
> 
> Fixes: d692522577c0 ("usb: gadget/uvc: Port UVC webcam gadget to use videobuf2 framework")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Dan Vacura <w36195@motorola.com>

This indeed fixes an issue, so I think we can merge the patch, but I
also believe we need further improvements on top (of course if you would
like to improve the implementation in a v4, I won't complain :-))

As replied in v2 (sorry for the late reply), it seems that this error
can occur under normal conditions. This means we shouldn't cancel the
queue, at least when the error is intermitent (if all URBs fail that's
another story).

> ---
> Changes in v3:
> - Cc stable
> Changes in v2:
> - Add Fixes tag
> 
>  drivers/usb/gadget/function/uvc_queue.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c
> index d852ac9e47e7..2cda982f3765 100644
> --- a/drivers/usb/gadget/function/uvc_queue.c
> +++ b/drivers/usb/gadget/function/uvc_queue.c
> @@ -264,6 +264,8 @@ void uvcg_queue_cancel(struct uvc_video_queue *queue, int disconnect)
>  		buf->state = UVC_BUF_STATE_ERROR;
>  		vb2_buffer_done(&buf->buf.vb2_buf, VB2_BUF_STATE_ERROR);
>  	}

A blank line would be nice here, and I would also like a comment to
state that further improvements are required:

	/*
	 * When the queue is cancelled due to an error (either when queuing a
	 * USB request, or in the request completion handler), the we empty the
	 * irqqueue but userspace may queue futher buffers. We need to reset
	 * buf_used to 0 or uvcg_video_pump() will use an incorrect stale value.
	 *
	 * TODO: It seems that a -EXDEV error can occur in the request
	 * completion handler under normal circumstances. Don't cancel the queue
	 * in that case but recover gracefully (likely with rate-limiting, to
	 * still cancel the queue if errors occur too often).
	 */

We likely need to differentiate between -EXDEV and other errors in
uvc_video_complete(), as I'd like to be conservative and cancel the
queue for unknown errors. We also need to improve the queue cancellation
implementation so that userspace gets an error when queuing further
buffers.

> +	queue->buf_used = 0;
> +
>  	/* This must be protected by the irqlock spinlock to avoid race
>  	 * conditions between uvc_queue_buffer and the disconnection event that
>  	 * could result in an interruptible wait in uvc_dequeue_buffer. Do not
Dan Vacura April 20, 2022, 9:27 p.m. UTC | #2
Hi Laurent,

Thanks for the input.

On Tue, Apr 19, 2022 at 11:46:37PM +0300, Laurent Pinchart wrote:
> 
> This indeed fixes an issue, so I think we can merge the patch, but I
> also believe we need further improvements on top (of course if you would
> like to improve the implementation in a v4, I won't complain :-))

It looks like Greg has already accepted the change and it's in
linux-next. We can discuss here how to better handle these -EXDEV errors
for future improvements, as it seems like it's been an issue in the past
as well:
https://www.mail-archive.com/linux-usb@vger.kernel.org/msg105615.html

> 
> As replied in v2 (sorry for the late reply), it seems that this error
> can occur under normal conditions. This means we shouldn't cancel the
> queue, at least when the error is intermitent (if all URBs fail that's
> another story).

My impression was that canceling the queue was still necessary as we may
be in progress for the current frame. Perhaps we don't need to flush all
the frames from the queue, but at a minimum we need to reset the
buf_used value.

> 
> 
> We likely need to differentiate between -EXDEV and other errors in
> uvc_video_complete(), as I'd like to be conservative and cancel the
> queue for unknown errors. We also need to improve the queue cancellation
> implementation so that userspace gets an error when queuing further
> buffers.

We already feedback to userspace the error, via the state of
vb2_buffer_done(). When userspace dequeues the buffer it can check if
v4l2_buffer.flags has V4L2_BUF_FLAG_ERROR to see if things failed, then
decide what to do like re-queue that frame. However, this appears to not
always occur since I believe the pump thread is independent of the
uvc_video_complete() callback. As a result, the complete callback of the
failed URB may be associated with a buffer that was already released
back to the userspace client. In this case, I don't know if there's
anything to be done, since a new buffer and subsequent URBs might
already be queued up. You suggested an error on a subsequent buffer
queue, but I don't know how helpful that'd be at this point, perhaps in
the scenario that all URBs are failing?

> 
> -- 
> Regards,
> 
> Laurent Pinchart

Appreciate the feedback,

Dan
Laurent Pinchart April 24, 2022, 10:58 p.m. UTC | #3
Hi Dan,

On Wed, Apr 20, 2022 at 04:27:27PM -0500, Dan Vacura wrote:
> On Tue, Apr 19, 2022 at 11:46:37PM +0300, Laurent Pinchart wrote:
> > 
> > This indeed fixes an issue, so I think we can merge the patch, but I
> > also believe we need further improvements on top (of course if you would
> > like to improve the implementation in a v4, I won't complain :-))
> 
> It looks like Greg has already accepted the change and it's in
> linux-next. We can discuss here how to better handle these -EXDEV errors
> for future improvements, as it seems like it's been an issue in the past
> as well:
> https://www.mail-archive.com/linux-usb@vger.kernel.org/msg105615.html
> 
> > As replied in v2 (sorry for the late reply), it seems that this error
> > can occur under normal conditions. This means we shouldn't cancel the
> > queue, at least when the error is intermitent (if all URBs fail that's
> > another story).
> 
> My impression was that canceling the queue was still necessary as we may
> be in progress for the current frame. Perhaps we don't need to flush all
> the frames from the queue, but at a minimum we need to reset the
> buf_used value.

I think we have three classes of errors:

- "Packet-level" errors, resulting in either data loss or erroneous data
  being transferred to the host for one (or more) packets in a frame.
  When such errors occur, we should probably notify the application (on
  the gadget side), but we can continue sending the rest of the frame.

- "Frame-level" errors, resulting in errors in the rest of the frame.
  When such an error occurs, we should notify the application, and stop
  sending data for the current frame, moving to the next frame.

- "Stream-level" errors, resulting in errors in all subsequent frames.
  When such an error occurs, we should notify the application and stop
  sending data until the application takes corrective measures.

I'm not sure if packet-level errors make sense, if data is lost, maybe
we would be better off just cancelling the current frame and moving to
the next one.

For both packet-level errors and frame-level errors, the buffer should
be marked as erroneous to notify the application, but there should be no
need to cancel the queue and drop all queued buffers. We can just move
to the next buffer.

For stream-level errors, I would cancel the queue, and additionally
prevent new buffers from being queued until the application stops and
restarts the stream.

Finally, which class an error belongs to may not be an intrinsic
property of the error itself, packet-level or frame-level errors that
occur too often may be worth cancelling the queue (I'm not sure how to
quantify "too often" though).

Does this make sense ?

> > We likely need to differentiate between -EXDEV and other errors in
> > uvc_video_complete(), as I'd like to be conservative and cancel the
> > queue for unknown errors. We also need to improve the queue cancellation
> > implementation so that userspace gets an error when queuing further
> > buffers.
> 
> We already feedback to userspace the error, via the state of
> vb2_buffer_done(). When userspace dequeues the buffer it can check if
> v4l2_buffer.flags has V4L2_BUF_FLAG_ERROR to see if things failed, then
> decide what to do like re-queue that frame. However, this appears to not
> always occur since I believe the pump thread is independent of the
> uvc_video_complete() callback. As a result, the complete callback of the
> failed URB may be associated with a buffer that was already released
> back to the userspace client.

Good point. That would only be the case for errors in the last
request(s) for a frame, right ?

> In this case, I don't know if there's
> anything to be done, since a new buffer and subsequent URBs might
> already be queued up. You suggested an error on a subsequent buffer
> queue, but I don't know how helpful that'd be at this point, perhaps in
> the scenario that all URBs are failing?

Should we delay sending the buffer back to userspace until all the
requests for the buffer have completed ?
Dan Vacura April 29, 2022, 6:39 p.m. UTC | #4
Hi Laurent,

ccing Michael for comments about returning v4l2 bufs too early.

On Mon, Apr 25, 2022 at 01:58:52AM +0300, Laurent Pinchart wrote:
> Hi Dan,
> 
> On Wed, Apr 20, 2022 at 04:27:27PM -0500, Dan Vacura wrote:
> > On Tue, Apr 19, 2022 at 11:46:37PM +0300, Laurent Pinchart wrote:
> > > 
> > > This indeed fixes an issue, so I think we can merge the patch, but I
> > > also believe we need further improvements on top (of course if you would
> > > like to improve the implementation in a v4, I won't complain :-))
> > 
> > It looks like Greg has already accepted the change and it's in
> > linux-next. We can discuss here how to better handle these -EXDEV errors
> > for future improvements, as it seems like it's been an issue in the past
> > as well:
> > https://www.mail-archive.com/linux-usb@vger.kernel.org/msg105615.html
> > 
> > > As replied in v2 (sorry for the late reply), it seems that this error
> > > can occur under normal conditions. This means we shouldn't cancel the
> > > queue, at least when the error is intermitent (if all URBs fail that's
> > > another story).
> > 
> > My impression was that canceling the queue was still necessary as we may
> > be in progress for the current frame. Perhaps we don't need to flush all
> > the frames from the queue, but at a minimum we need to reset the
> > buf_used value.
> 
> I think we have three classes of errors:
> 
> - "Packet-level" errors, resulting in either data loss or erroneous data
>   being transferred to the host for one (or more) packets in a frame.
>   When such errors occur, we should probably notify the application (on
>   the gadget side), but we can continue sending the rest of the frame.
> 
> - "Frame-level" errors, resulting in errors in the rest of the frame.
>   When such an error occurs, we should notify the application, and stop
>   sending data for the current frame, moving to the next frame.
> 
> - "Stream-level" errors, resulting in errors in all subsequent frames.
>   When such an error occurs, we should notify the application and stop
>   sending data until the application takes corrective measures.
> 
> I'm not sure if packet-level errors make sense, if data is lost, maybe
> we would be better off just cancelling the current frame and moving to
> the next one.
> 
> For both packet-level errors and frame-level errors, the buffer should
> be marked as erroneous to notify the application, but there should be no
> need to cancel the queue and drop all queued buffers. We can just move
> to the next buffer.
> 
> For stream-level errors, I would cancel the queue, and additionally
> prevent new buffers from being queued until the application stops and
> restarts the stream.
> 
> Finally, which class an error belongs to may not be an intrinsic
> property of the error itself, packet-level or frame-level errors that
> occur too often may be worth cancelling the queue (I'm not sure how to
> quantify "too often" though).
> 
> Does this make sense ?

Yes, this makes sense, but I'm not sure if the USB controllers send back
that info and/or if it's all standardized. For example in the dwc3
controller I see the following status values returned: -EPIPE,
-ECONNRESET, -ESHUTDOWN, and -EXDEV; whereas in the musb controller
doesn't appear to return -EXDEV. 

> 
> > > We likely need to differentiate between -EXDEV and other errors in
> > > uvc_video_complete(), as I'd like to be conservative and cancel the
> > > queue for unknown errors. We also need to improve the queue cancellation
> > > implementation so that userspace gets an error when queuing further
> > > buffers.
> > 
> > We already feedback to userspace the error, via the state of
> > vb2_buffer_done(). When userspace dequeues the buffer it can check if
> > v4l2_buffer.flags has V4L2_BUF_FLAG_ERROR to see if things failed, then
> > decide what to do like re-queue that frame. However, this appears to not
> > always occur since I believe the pump thread is independent of the
> > uvc_video_complete() callback. As a result, the complete callback of the
> > failed URB may be associated with a buffer that was already released
> > back to the userspace client.
> 
> Good point. That would only be the case for errors in the last
> request(s) for a frame, right ?

From logging it seems it can be up to the last few requests that come
back, where the queue is already empty. I guess this is timing dependent
on the hw, the system scheduling, and how deep the request queue is.

> 
> > In this case, I don't know if there's
> > anything to be done, since a new buffer and subsequent URBs might
> > already be queued up. You suggested an error on a subsequent buffer
> > queue, but I don't know how helpful that'd be at this point, perhaps in
> > the scenario that all URBs are failing?
> 
> Should we delay sending the buffer back to userspace until all the
> requests for the buffer have completed ?

Yes, I had that same thought later on. We'll either need a new
pending_complete_queue to move the buffer from the current queue, or
create logic to leverage the existing uvc_buffer_state flags; like
uvcg_queue_head() looks for the first buffer with UVC_BUF_STATE_QUEUED.
When the complete call comes in we'll have to identify if the request is
the last completed one for that buffer. It looks like the UVC_STREAM_EOF
flag will be present, hopefully that's sufficient to rely on. Also,
since this is a FIFO queue, we can assume that the first buffer with
UVC_BUF_STATE_DONE can be vb2_buffer_done()'d. If we implement this type
of logic then we can probably remove this change:
https://lore.kernel.org/r/20220402232744.3622565-3-m.grzeschik@pengutronix.de
since its purpose is similar. How does that sound and do you have an
opinion on a new queue or creating logic around uvc_buffer_state flags?

> 
> -- 
> Regards,
> 
> Laurent Pinchart
diff mbox series

Patch

diff --git a/drivers/usb/gadget/function/uvc_queue.c b/drivers/usb/gadget/function/uvc_queue.c
index d852ac9e47e7..2cda982f3765 100644
--- a/drivers/usb/gadget/function/uvc_queue.c
+++ b/drivers/usb/gadget/function/uvc_queue.c
@@ -264,6 +264,8 @@  void uvcg_queue_cancel(struct uvc_video_queue *queue, int disconnect)
 		buf->state = UVC_BUF_STATE_ERROR;
 		vb2_buffer_done(&buf->buf.vb2_buf, VB2_BUF_STATE_ERROR);
 	}
+	queue->buf_used = 0;
+
 	/* This must be protected by the irqlock spinlock to avoid race
 	 * conditions between uvc_queue_buffer and the disconnection event that
 	 * could result in an interruptible wait in uvc_dequeue_buffer. Do not