diff mbox series

[v2] usb:gadget:uvc Do not use worker thread to pump usb requests

Message ID 20231026215635.2478767-1-jchowdhary@google.com (mailing list archive)
State New, archived
Headers show
Series [v2] usb:gadget:uvc Do not use worker thread to pump usb requests | expand

Commit Message

Jayant Chowdhary Oct. 26, 2023, 9:56 p.m. UTC
This patch is based on top of
https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:

When we use an async work queue to perform the function of pumping
usb requests to the usb controller, it is possible that thread scheduling
affects at what cadence we're able to pump requests. This could mean usb
requests miss their uframes - resulting in video stream flickers on the host
device.

In this patch, we move the pumping of usb requests to
1) uvcg_video_complete() complete handler for both isoc + bulk
   endpoints. We still send 0 length requests when there is no uvc buffer
   available to encode.
2) uvc_v4l2_qbuf - only for bulk endpoints since it is not legal to send
   0 length requests.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Signed-off-by: Jayant Chowdhary <jchowdhary@google.com>
Suggested-by: Jayant Chowdhary <jchowdhary@google.com>
Suggested-by: Avichal Rakesh <arakesh@google.com>
Tested-by: Jayant Chowdhary <jchowdhary@google.com>
---
 v1->v2: Fix code style and add self Signed-off-by

 drivers/usb/gadget/function/f_uvc.c     |  4 --
 drivers/usb/gadget/function/uvc.h       |  4 +-
 drivers/usb/gadget/function/uvc_v4l2.c  |  5 +-
 drivers/usb/gadget/function/uvc_video.c | 71 ++++++++++++++++---------
 drivers/usb/gadget/function/uvc_video.h |  2 +
 5 files changed, 51 insertions(+), 35 deletions(-)

Comments

Greg Kroah-Hartman Oct. 27, 2023, 7:19 a.m. UTC | #1
On Thu, Oct 26, 2023 at 09:56:35PM +0000, Jayant Chowdhary wrote:
> This patch is based on top of
> https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:

That doesn't work in the changelog of a patch at all, it goes below the
--- line p lease.

> 
> When we use an async work queue to perform the function of pumping
> usb requests to the usb controller, it is possible that thread scheduling
> affects at what cadence we're able to pump requests. This could mean usb
> requests miss their uframes - resulting in video stream flickers on the host
> device.
> 
> In this patch, we move the pumping of usb requests to
> 1) uvcg_video_complete() complete handler for both isoc + bulk
>    endpoints. We still send 0 length requests when there is no uvc buffer
>    available to encode.
> 2) uvc_v4l2_qbuf - only for bulk endpoints since it is not legal to send
>    0 length requests.
> 
> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
> Signed-off-by: Jayant Chowdhary <jchowdhary@google.com>
> Suggested-by: Jayant Chowdhary <jchowdhary@google.com>
> Suggested-by: Avichal Rakesh <arakesh@google.com>
> Tested-by: Jayant Chowdhary <jchowdhary@google.com>
> ---
>  v1->v2: Fix code style and add self Signed-off-by

Great, but as signed-off-by kind of implies you tested it, no need for
the tested-by now, right?  Not a big deal, and normally I'd ignore it
but I know you at least have to do one more version of this based on the
above problem...

thanks,

greg k-h
Laurent Pinchart Oct. 27, 2023, 7:51 a.m. UTC | #2
Hi Jayant,

Thank you for the patch.

On Thu, Oct 26, 2023 at 09:56:35PM +0000, Jayant Chowdhary wrote:
> This patch is based on top of
> https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:
> 
> When we use an async work queue to perform the function of pumping
> usb requests to the usb controller, it is possible that thread scheduling
> affects at what cadence we're able to pump requests. This could mean usb
> requests miss their uframes - resulting in video stream flickers on the host
> device.
> 
> In this patch, we move the pumping of usb requests to
> 1) uvcg_video_complete() complete handler for both isoc + bulk
>    endpoints. We still send 0 length requests when there is no uvc buffer
>    available to encode.

This means you will end up copying large amounts of data in interrupt
context. The work queue was there to avoid exactly that, as it will
introduce delays that can affect other parts of the system. I think this
is a problem.

> 2) uvc_v4l2_qbuf - only for bulk endpoints since it is not legal to send
>    0 length requests.
> 
> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
> Signed-off-by: Jayant Chowdhary <jchowdhary@google.com>
> Suggested-by: Jayant Chowdhary <jchowdhary@google.com>
> Suggested-by: Avichal Rakesh <arakesh@google.com>
> Tested-by: Jayant Chowdhary <jchowdhary@google.com>
> ---
>  v1->v2: Fix code style and add self Signed-off-by
> 
>  drivers/usb/gadget/function/f_uvc.c     |  4 --
>  drivers/usb/gadget/function/uvc.h       |  4 +-
>  drivers/usb/gadget/function/uvc_v4l2.c  |  5 +-
>  drivers/usb/gadget/function/uvc_video.c | 71 ++++++++++++++++---------
>  drivers/usb/gadget/function/uvc_video.h |  2 +
>  5 files changed, 51 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
> index ae08341961eb..53cb2539486d 100644
> --- a/drivers/usb/gadget/function/f_uvc.c
> +++ b/drivers/usb/gadget/function/f_uvc.c
> @@ -959,14 +959,10 @@ static void uvc_function_unbind(struct usb_configuration *c,
>  {
>  	struct usb_composite_dev *cdev = c->cdev;
>  	struct uvc_device *uvc = to_uvc(f);
> -	struct uvc_video *video = &uvc->video;
>  	long wait_ret = 1;
>  
>  	uvcg_info(f, "%s()\n", __func__);
>  
> -	if (video->async_wq)
> -		destroy_workqueue(video->async_wq);
> -
>  	/*
>  	 * If we know we're connected via v4l2, then there should be a cleanup
>  	 * of the device from userspace either via UVC_EVENT_DISCONNECT or
> diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
> index be0d012aa244..498f344fda4b 100644
> --- a/drivers/usb/gadget/function/uvc.h
> +++ b/drivers/usb/gadget/function/uvc.h
> @@ -88,9 +88,6 @@ struct uvc_video {
>  	struct uvc_device *uvc;
>  	struct usb_ep *ep;
>  
> -	struct work_struct pump;
> -	struct workqueue_struct *async_wq;
> -
>  	/* Frame parameters */
>  	u8 bpp;
>  	u32 fcc;
> @@ -116,6 +113,7 @@ struct uvc_video {
>  	/* Context data used by the completion handler */
>  	__u32 payload_size;
>  	__u32 max_payload_size;
> +	bool is_bulk;

This should be introduced in a separate patch.

>  
>  	struct uvc_video_queue queue;
>  	unsigned int fid;
> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
> index f4d2e24835d4..678ea6df7b5c 100644
> --- a/drivers/usb/gadget/function/uvc_v4l2.c
> +++ b/drivers/usb/gadget/function/uvc_v4l2.c
> @@ -414,10 +414,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
>  	ret = uvcg_queue_buffer(&video->queue, b);
>  	if (ret < 0)
>  		return ret;
> -
> -	if (uvc->state == UVC_STATE_STREAMING)
> -		queue_work(video->async_wq, &video->pump);
> -
> +	uvcg_video_pump_qbuf(video);
>  	return ret;
>  }
>  
> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
> index ab3f02054e85..0fcd8e5edbac 100644
> --- a/drivers/usb/gadget/function/uvc_video.c
> +++ b/drivers/usb/gadget/function/uvc_video.c
> @@ -24,6 +24,8 @@
>   * Video codecs
>   */
>  
> +static void uvcg_video_pump(struct uvc_video *video);
> +
>  static int
>  uvc_video_encode_header(struct uvc_video *video, struct uvc_buffer *buf,
>  		u8 *data, int len)
> @@ -329,7 +331,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
>  	 */
>  	if (video->is_enabled) {
>  		list_add_tail(&req->list, &video->req_free);
> -		queue_work(video->async_wq, &video->pump);
> +		spin_unlock_irqrestore(&video->req_lock, flags);
> +		uvcg_video_pump(video);
> +		return;
>  	} else {
>  		uvc_video_free_request(ureq, ep);
>  	}
> @@ -409,20 +413,31 @@ uvc_video_alloc_requests(struct uvc_video *video)
>   * Video streaming
>   */
>  
> +void uvcg_video_pump_qbuf(struct uvc_video *video)
> +{
> +	/*
> +	 * Only call uvcg_video_pump() from qbuf, for bulk eps since
> +	 * for isoc, the complete handler will call uvcg_video_pump()
> +	 * consistently. Calling it for isoc eps, while correct
> +	 * will increase contention for video->req_lock since the
> +	 * complete handler will be called more often.
> +	*/
> +	if (video->is_bulk)
> +		uvcg_video_pump(video);

Am I the only one to see the *major* race condition that this patch
introduces ?

> +}
> +
>  /*
>   * uvcg_video_pump - Pump video data into the USB requests
>   *
>   * This function fills the available USB requests (listed in req_free) with
>   * video data from the queued buffers.
>   */
> -static void uvcg_video_pump(struct work_struct *work)
> +static void uvcg_video_pump(struct uvc_video *video)
>  {
> -	struct uvc_video *video = container_of(work, struct uvc_video, pump);
>  	struct uvc_video_queue *queue = &video->queue;
> -	/* video->max_payload_size is only set when using bulk transfer */
> -	bool is_bulk = video->max_payload_size;
>  	struct usb_request *req = NULL;
> -	struct uvc_buffer *buf;
> +	struct uvc_request *ureq = NULL;
> +	struct uvc_buffer *buf = NULL, *last_buf = NULL;
>  	unsigned long flags;
>  	bool buf_done;
>  	int ret;
> @@ -455,7 +470,8 @@ static void uvcg_video_pump(struct work_struct *work)
>  		if (buf != NULL) {
>  			video->encode(req, video, buf);
>  			buf_done = buf->state == UVC_BUF_STATE_DONE;
> -		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) && !is_bulk) {
> +		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) &&
> +				!video->is_bulk) {
>  			/*
>  			 * No video buffer available; the queue is still connected and
>  			 * we're transferring over ISOC. Queue a 0 length request to
> @@ -500,18 +516,30 @@ static void uvcg_video_pump(struct work_struct *work)
>  			req->no_interrupt = 1;
>  		}
>  
> -		/* Queue the USB request */
> -		ret = uvcg_video_ep_queue(video, req);
>  		spin_unlock_irqrestore(&queue->irqlock, flags);
> -
> +		spin_lock_irqsave(&video->req_lock, flags);
> +		if (video->is_enabled) {
> +			/* Queue the USB request */
> +			ret = uvcg_video_ep_queue(video, req);
> +			/* Endpoint now owns the request */
> +			req = NULL;
> +			video->req_int_count++;
> +		} else {
> +			ret =  -ENODEV;
> +			ureq = req->context;
> +			last_buf = ureq->last_buf;
> +			ureq->last_buf = NULL;
> +		}
> +		spin_unlock_irqrestore(&video->req_lock, flags);
>  		if (ret < 0) {
> +			if (last_buf != NULL) {
> +				// Return the buffer to the queue in the case the
> +				// request was not queued to the ep.

Wrong comment style.

> +				uvcg_complete_buffer(&video->queue, last_buf);
> +			}
>  			uvcg_queue_cancel(queue, 0);
>  			break;
>  		}
> -
> -		/* Endpoint now owns the request */
> -		req = NULL;
> -		video->req_int_count++;
>  	}
>  
>  	if (!req)
> @@ -556,7 +584,6 @@ uvcg_video_disable(struct uvc_video *video)
>  	}
>  	spin_unlock_irqrestore(&video->req_lock, flags);
>  
> -	cancel_work_sync(&video->pump);
>  	uvcg_queue_cancel(&video->queue, 0);
>  
>  	spin_lock_irqsave(&video->req_lock, flags);
> @@ -626,14 +653,16 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
>  	if (video->max_payload_size) {
>  		video->encode = uvc_video_encode_bulk;
>  		video->payload_size = 0;
> -	} else
> +		video->is_bulk = true;
> +	} else {
>  		video->encode = video->queue.use_sg ?
>  			uvc_video_encode_isoc_sg : uvc_video_encode_isoc;
> +		video->is_bulk = false;
> +	}
>  
>  	video->req_int_count = 0;
>  
> -	queue_work(video->async_wq, &video->pump);
> -
> +	uvcg_video_pump(video);
>  	return ret;
>  }
>  
> @@ -646,12 +675,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
>  	INIT_LIST_HEAD(&video->ureqs);
>  	INIT_LIST_HEAD(&video->req_free);
>  	spin_lock_init(&video->req_lock);
> -	INIT_WORK(&video->pump, uvcg_video_pump);
> -
> -	/* Allocate a work queue for asynchronous video pump handler. */
> -	video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI, 0);
> -	if (!video->async_wq)
> -		return -EINVAL;
>  
>  	video->uvc = uvc;
>  	video->fcc = V4L2_PIX_FMT_YUYV;
> diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h
> index 03adeefa343b..29c6b9a2e9c3 100644
> --- a/drivers/usb/gadget/function/uvc_video.h
> +++ b/drivers/usb/gadget/function/uvc_video.h
> @@ -18,4 +18,6 @@ int uvcg_video_enable(struct uvc_video *video, int enable);
>  
>  int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);
>  
> +void uvcg_video_pump_qbuf(struct uvc_video *video);
> +
>  #endif /* __UVC_VIDEO_H__ */
Greg Kroah-Hartman Oct. 27, 2023, 10:44 a.m. UTC | #3
On Thu, Oct 26, 2023 at 09:56:35PM +0000, Jayant Chowdhary wrote:
> This patch is based on top of
> https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:
> 
> When we use an async work queue to perform the function of pumping
> usb requests to the usb controller, it is possible that thread scheduling
> affects at what cadence we're able to pump requests. This could mean usb
> requests miss their uframes - resulting in video stream flickers on the host
> device.
> 
> In this patch, we move the pumping of usb requests to
> 1) uvcg_video_complete() complete handler for both isoc + bulk
>    endpoints. We still send 0 length requests when there is no uvc buffer
>    available to encode.
> 2) uvc_v4l2_qbuf - only for bulk endpoints since it is not legal to send
>    0 length requests.

Usually when you have to enumerate things in a patch, that implies it
needs to be broken up into multiple changes.  Why isn't that necessary
here?

> 
> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
> Signed-off-by: Jayant Chowdhary <jchowdhary@google.com>
> Suggested-by: Jayant Chowdhary <jchowdhary@google.com>
> Suggested-by: Avichal Rakesh <arakesh@google.com>
> Tested-by: Jayant Chowdhary <jchowdhary@google.com>
> ---
>  v1->v2: Fix code style and add self Signed-off-by
> 
>  drivers/usb/gadget/function/f_uvc.c     |  4 --
>  drivers/usb/gadget/function/uvc.h       |  4 +-
>  drivers/usb/gadget/function/uvc_v4l2.c  |  5 +-
>  drivers/usb/gadget/function/uvc_video.c | 71 ++++++++++++++++---------
>  drivers/usb/gadget/function/uvc_video.h |  2 +
>  5 files changed, 51 insertions(+), 35 deletions(-)
> 
> diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
> index ae08341961eb..53cb2539486d 100644
> --- a/drivers/usb/gadget/function/f_uvc.c
> +++ b/drivers/usb/gadget/function/f_uvc.c
> @@ -959,14 +959,10 @@ static void uvc_function_unbind(struct usb_configuration *c,
>  {
>  	struct usb_composite_dev *cdev = c->cdev;
>  	struct uvc_device *uvc = to_uvc(f);
> -	struct uvc_video *video = &uvc->video;
>  	long wait_ret = 1;
>  
>  	uvcg_info(f, "%s()\n", __func__);

meta-comment, lines like this need to be deleted in the future.  Not
relevent here, but for future work if you want to add such a cleanup to
a patch series, I'd be grateful.

>  
> -	if (video->async_wq)
> -		destroy_workqueue(video->async_wq);
> -
>  	/*
>  	 * If we know we're connected via v4l2, then there should be a cleanup
>  	 * of the device from userspace either via UVC_EVENT_DISCONNECT or
> diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
> index be0d012aa244..498f344fda4b 100644
> --- a/drivers/usb/gadget/function/uvc.h
> +++ b/drivers/usb/gadget/function/uvc.h
> @@ -88,9 +88,6 @@ struct uvc_video {
>  	struct uvc_device *uvc;
>  	struct usb_ep *ep;
>  
> -	struct work_struct pump;
> -	struct workqueue_struct *async_wq;
> -
>  	/* Frame parameters */
>  	u8 bpp;
>  	u32 fcc;
> @@ -116,6 +113,7 @@ struct uvc_video {
>  	/* Context data used by the completion handler */
>  	__u32 payload_size;
>  	__u32 max_payload_size;
> +	bool is_bulk;

As Laurent said, this should be a separate patch.

>  
>  	struct uvc_video_queue queue;
>  	unsigned int fid;
> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
> index f4d2e24835d4..678ea6df7b5c 100644
> --- a/drivers/usb/gadget/function/uvc_v4l2.c
> +++ b/drivers/usb/gadget/function/uvc_v4l2.c
> @@ -414,10 +414,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
>  	ret = uvcg_queue_buffer(&video->queue, b);
>  	if (ret < 0)
>  		return ret;
> -
> -	if (uvc->state == UVC_STATE_STREAMING)
> -		queue_work(video->async_wq, &video->pump);
> -
> +	uvcg_video_pump_qbuf(video);
>  	return ret;
>  }
>  
> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
> index ab3f02054e85..0fcd8e5edbac 100644
> --- a/drivers/usb/gadget/function/uvc_video.c
> +++ b/drivers/usb/gadget/function/uvc_video.c
> @@ -24,6 +24,8 @@
>   * Video codecs
>   */
>  
> +static void uvcg_video_pump(struct uvc_video *video);
> +
>  static int
>  uvc_video_encode_header(struct uvc_video *video, struct uvc_buffer *buf,
>  		u8 *data, int len)
> @@ -329,7 +331,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
>  	 */
>  	if (video->is_enabled) {
>  		list_add_tail(&req->list, &video->req_free);
> -		queue_work(video->async_wq, &video->pump);
> +		spin_unlock_irqrestore(&video->req_lock, flags);
> +		uvcg_video_pump(video);
> +		return;
>  	} else {
>  		uvc_video_free_request(ureq, ep);
>  	}
> @@ -409,20 +413,31 @@ uvc_video_alloc_requests(struct uvc_video *video)
>   * Video streaming
>   */
>  
> +void uvcg_video_pump_qbuf(struct uvc_video *video)
> +{
> +	/*
> +	 * Only call uvcg_video_pump() from qbuf, for bulk eps since
> +	 * for isoc, the complete handler will call uvcg_video_pump()
> +	 * consistently. Calling it for isoc eps, while correct
> +	 * will increase contention for video->req_lock since the
> +	 * complete handler will be called more often.
> +	*/
> +	if (video->is_bulk)
> +		uvcg_video_pump(video);
> +}
> +
>  /*
>   * uvcg_video_pump - Pump video data into the USB requests
>   *
>   * This function fills the available USB requests (listed in req_free) with
>   * video data from the queued buffers.
>   */
> -static void uvcg_video_pump(struct work_struct *work)
> +static void uvcg_video_pump(struct uvc_video *video)
>  {
> -	struct uvc_video *video = container_of(work, struct uvc_video, pump);
>  	struct uvc_video_queue *queue = &video->queue;
> -	/* video->max_payload_size is only set when using bulk transfer */
> -	bool is_bulk = video->max_payload_size;
>  	struct usb_request *req = NULL;
> -	struct uvc_buffer *buf;
> +	struct uvc_request *ureq = NULL;
> +	struct uvc_buffer *buf = NULL, *last_buf = NULL;
>  	unsigned long flags;
>  	bool buf_done;
>  	int ret;
> @@ -455,7 +470,8 @@ static void uvcg_video_pump(struct work_struct *work)
>  		if (buf != NULL) {
>  			video->encode(req, video, buf);
>  			buf_done = buf->state == UVC_BUF_STATE_DONE;
> -		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) && !is_bulk) {
> +		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) &&
> +				!video->is_bulk) {

No need to wrap the line.

>  			/*
>  			 * No video buffer available; the queue is still connected and
>  			 * we're transferring over ISOC. Queue a 0 length request to
> @@ -500,18 +516,30 @@ static void uvcg_video_pump(struct work_struct *work)
>  			req->no_interrupt = 1;
>  		}
>  
> -		/* Queue the USB request */
> -		ret = uvcg_video_ep_queue(video, req);
>  		spin_unlock_irqrestore(&queue->irqlock, flags);
> -
> +		spin_lock_irqsave(&video->req_lock, flags);
> +		if (video->is_enabled) {
> +			/* Queue the USB request */
> +			ret = uvcg_video_ep_queue(video, req);
> +			/* Endpoint now owns the request */
> +			req = NULL;
> +			video->req_int_count++;
> +		} else {
> +			ret =  -ENODEV;
> +			ureq = req->context;
> +			last_buf = ureq->last_buf;
> +			ureq->last_buf = NULL;
> +		}
> +		spin_unlock_irqrestore(&video->req_lock, flags);
>  		if (ret < 0) {
> +			if (last_buf != NULL) {
> +				// Return the buffer to the queue in the case the
> +				// request was not queued to the ep.
> +				uvcg_complete_buffer(&video->queue, last_buf);
> +			}
>  			uvcg_queue_cancel(queue, 0);
>  			break;
>  		}
> -
> -		/* Endpoint now owns the request */
> -		req = NULL;
> -		video->req_int_count++;
>  	}
>  
>  	if (!req)
> @@ -556,7 +584,6 @@ uvcg_video_disable(struct uvc_video *video)
>  	}
>  	spin_unlock_irqrestore(&video->req_lock, flags);
>  
> -	cancel_work_sync(&video->pump);
>  	uvcg_queue_cancel(&video->queue, 0);
>  
>  	spin_lock_irqsave(&video->req_lock, flags);
> @@ -626,14 +653,16 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
>  	if (video->max_payload_size) {
>  		video->encode = uvc_video_encode_bulk;
>  		video->payload_size = 0;
> -	} else
> +		video->is_bulk = true;
> +	} else {
>  		video->encode = video->queue.use_sg ?
>  			uvc_video_encode_isoc_sg : uvc_video_encode_isoc;
> +		video->is_bulk = false;

Isn't this already set to 0?

And wait, this is still the bulk endpoint, right?  Or am I missing
something?

thanks,

greg k-h
Michael Grzeschik Oct. 27, 2023, 11:10 a.m. UTC | #4
On Fri, Oct 27, 2023 at 10:51:17AM +0300, Laurent Pinchart wrote:
>Thank you for the patch.
>
>On Thu, Oct 26, 2023 at 09:56:35PM +0000, Jayant Chowdhary wrote:
>> This patch is based on top of
>> https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:
>>
>> When we use an async work queue to perform the function of pumping
>> usb requests to the usb controller, it is possible that thread scheduling
>> affects at what cadence we're able to pump requests. This could mean usb
>> requests miss their uframes - resulting in video stream flickers on the host
>> device.
>>
>> In this patch, we move the pumping of usb requests to
>> 1) uvcg_video_complete() complete handler for both isoc + bulk
>>    endpoints. We still send 0 length requests when there is no uvc buffer
>>    available to encode.
>
>This means you will end up copying large amounts of data in interrupt
>context. The work queue was there to avoid exactly that, as it will
>introduce delays that can affect other parts of the system. I think this
>is a problem.

Regarding Thin's argument about possible scheduling latency that is already
introducing real errors, this seemed like a good solution.

But sure, this potential latency introduced in the interrupt context can
trigger other side effects.

However I think we need some compromise since both arguments are very valid.

Any ideas, how to solve this?

>> 2) uvc_v4l2_qbuf - only for bulk endpoints since it is not legal to send
>>    0 length requests.
>>
>> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
>> Signed-off-by: Jayant Chowdhary <jchowdhary@google.com>
>> Suggested-by: Jayant Chowdhary <jchowdhary@google.com>
>> Suggested-by: Avichal Rakesh <arakesh@google.com>
>> Tested-by: Jayant Chowdhary <jchowdhary@google.com>
>> ---
>>  v1->v2: Fix code style and add self Signed-off-by
>>
>>  drivers/usb/gadget/function/f_uvc.c     |  4 --
>>  drivers/usb/gadget/function/uvc.h       |  4 +-
>>  drivers/usb/gadget/function/uvc_v4l2.c  |  5 +-
>>  drivers/usb/gadget/function/uvc_video.c | 71 ++++++++++++++++---------
>>  drivers/usb/gadget/function/uvc_video.h |  2 +
>>  5 files changed, 51 insertions(+), 35 deletions(-)
>>
>> diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
>> index ae08341961eb..53cb2539486d 100644
>> --- a/drivers/usb/gadget/function/f_uvc.c
>> +++ b/drivers/usb/gadget/function/f_uvc.c
>> @@ -959,14 +959,10 @@ static void uvc_function_unbind(struct usb_configuration *c,
>>  {
>>  	struct usb_composite_dev *cdev = c->cdev;
>>  	struct uvc_device *uvc = to_uvc(f);
>> -	struct uvc_video *video = &uvc->video;
>>  	long wait_ret = 1;
>>
>>  	uvcg_info(f, "%s()\n", __func__);
>>
>> -	if (video->async_wq)
>> -		destroy_workqueue(video->async_wq);
>> -
>>  	/*
>>  	 * If we know we're connected via v4l2, then there should be a cleanup
>>  	 * of the device from userspace either via UVC_EVENT_DISCONNECT or
>> diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
>> index be0d012aa244..498f344fda4b 100644
>> --- a/drivers/usb/gadget/function/uvc.h
>> +++ b/drivers/usb/gadget/function/uvc.h
>> @@ -88,9 +88,6 @@ struct uvc_video {
>>  	struct uvc_device *uvc;
>>  	struct usb_ep *ep;
>>
>> -	struct work_struct pump;
>> -	struct workqueue_struct *async_wq;
>> -
>>  	/* Frame parameters */
>>  	u8 bpp;
>>  	u32 fcc;
>> @@ -116,6 +113,7 @@ struct uvc_video {
>>  	/* Context data used by the completion handler */
>>  	__u32 payload_size;
>>  	__u32 max_payload_size;
>> +	bool is_bulk;
>
>This should be introduced in a separate patch.
>
>>
>>  	struct uvc_video_queue queue;
>>  	unsigned int fid;
>> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
>> index f4d2e24835d4..678ea6df7b5c 100644
>> --- a/drivers/usb/gadget/function/uvc_v4l2.c
>> +++ b/drivers/usb/gadget/function/uvc_v4l2.c
>> @@ -414,10 +414,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
>>  	ret = uvcg_queue_buffer(&video->queue, b);
>>  	if (ret < 0)
>>  		return ret;
>> -
>> -	if (uvc->state == UVC_STATE_STREAMING)
>> -		queue_work(video->async_wq, &video->pump);
>> -
>> +	uvcg_video_pump_qbuf(video);
>>  	return ret;
>>  }
>>
>> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
>> index ab3f02054e85..0fcd8e5edbac 100644
>> --- a/drivers/usb/gadget/function/uvc_video.c
>> +++ b/drivers/usb/gadget/function/uvc_video.c
>> @@ -24,6 +24,8 @@
>>   * Video codecs
>>   */
>>
>> +static void uvcg_video_pump(struct uvc_video *video);
>> +
>>  static int
>>  uvc_video_encode_header(struct uvc_video *video, struct uvc_buffer *buf,
>>  		u8 *data, int len)
>> @@ -329,7 +331,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
>>  	 */
>>  	if (video->is_enabled) {
>>  		list_add_tail(&req->list, &video->req_free);
>> -		queue_work(video->async_wq, &video->pump);
>> +		spin_unlock_irqrestore(&video->req_lock, flags);
>> +		uvcg_video_pump(video);
>> +		return;
>>  	} else {
>>  		uvc_video_free_request(ureq, ep);
>>  	}
>> @@ -409,20 +413,31 @@ uvc_video_alloc_requests(struct uvc_video *video)
>>   * Video streaming
>>   */
>>
>> +void uvcg_video_pump_qbuf(struct uvc_video *video)
>> +{
>> +	/*
>> +	 * Only call uvcg_video_pump() from qbuf, for bulk eps since
>> +	 * for isoc, the complete handler will call uvcg_video_pump()
>> +	 * consistently. Calling it for isoc eps, while correct
>> +	 * will increase contention for video->req_lock since the
>> +	 * complete handler will be called more often.
>> +	*/
>> +	if (video->is_bulk)
>> +		uvcg_video_pump(video);
>
>Am I the only one to see the *major* race condition that this patch
>introduces ?

Possible that you are. Please elaborate.

>> +}
>> +
>>  /*
>>   * uvcg_video_pump - Pump video data into the USB requests
>>   *
>>   * This function fills the available USB requests (listed in req_free) with
>>   * video data from the queued buffers.
>>   */
>> -static void uvcg_video_pump(struct work_struct *work)
>> +static void uvcg_video_pump(struct uvc_video *video)
>>  {
>> -	struct uvc_video *video = container_of(work, struct uvc_video, pump);
>>  	struct uvc_video_queue *queue = &video->queue;
>> -	/* video->max_payload_size is only set when using bulk transfer */
>> -	bool is_bulk = video->max_payload_size;
>>  	struct usb_request *req = NULL;
>> -	struct uvc_buffer *buf;
>> +	struct uvc_request *ureq = NULL;
>> +	struct uvc_buffer *buf = NULL, *last_buf = NULL;
>>  	unsigned long flags;
>>  	bool buf_done;
>>  	int ret;
>> @@ -455,7 +470,8 @@ static void uvcg_video_pump(struct work_struct *work)
>>  		if (buf != NULL) {
>>  			video->encode(req, video, buf);
>>  			buf_done = buf->state == UVC_BUF_STATE_DONE;
>> -		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) && !is_bulk) {
>> +		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) &&
>> +				!video->is_bulk) {
>>  			/*
>>  			 * No video buffer available; the queue is still connected and
>>  			 * we're transferring over ISOC. Queue a 0 length request to
>> @@ -500,18 +516,30 @@ static void uvcg_video_pump(struct work_struct *work)
>>  			req->no_interrupt = 1;
>>  		}
>>
>> -		/* Queue the USB request */
>> -		ret = uvcg_video_ep_queue(video, req);
>>  		spin_unlock_irqrestore(&queue->irqlock, flags);
>> -
>> +		spin_lock_irqsave(&video->req_lock, flags);
>> +		if (video->is_enabled) {
>> +			/* Queue the USB request */
>> +			ret = uvcg_video_ep_queue(video, req);
>> +			/* Endpoint now owns the request */
>> +			req = NULL;
>> +			video->req_int_count++;
>> +		} else {
>> +			ret =  -ENODEV;
>> +			ureq = req->context;
>> +			last_buf = ureq->last_buf;
>> +			ureq->last_buf = NULL;
>> +		}
>> +		spin_unlock_irqrestore(&video->req_lock, flags);
>>  		if (ret < 0) {
>> +			if (last_buf != NULL) {
>> +				// Return the buffer to the queue in the case the
>> +				// request was not queued to the ep.
>
>Wrong comment style.
>
>> +				uvcg_complete_buffer(&video->queue, last_buf);
>> +			}
>>  			uvcg_queue_cancel(queue, 0);
>>  			break;
>>  		}
>> -
>> -		/* Endpoint now owns the request */
>> -		req = NULL;
>> -		video->req_int_count++;
>>  	}
>>
>>  	if (!req)
>> @@ -556,7 +584,6 @@ uvcg_video_disable(struct uvc_video *video)
>>  	}
>>  	spin_unlock_irqrestore(&video->req_lock, flags);
>>
>> -	cancel_work_sync(&video->pump);
>>  	uvcg_queue_cancel(&video->queue, 0);
>>
>>  	spin_lock_irqsave(&video->req_lock, flags);
>> @@ -626,14 +653,16 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
>>  	if (video->max_payload_size) {
>>  		video->encode = uvc_video_encode_bulk;
>>  		video->payload_size = 0;
>> -	} else
>> +		video->is_bulk = true;
>> +	} else {
>>  		video->encode = video->queue.use_sg ?
>>  			uvc_video_encode_isoc_sg : uvc_video_encode_isoc;
>> +		video->is_bulk = false;
>> +	}
>>
>>  	video->req_int_count = 0;
>>
>> -	queue_work(video->async_wq, &video->pump);
>> -
>> +	uvcg_video_pump(video);
>>  	return ret;
>>  }
>>
>> @@ -646,12 +675,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
>>  	INIT_LIST_HEAD(&video->ureqs);
>>  	INIT_LIST_HEAD(&video->req_free);
>>  	spin_lock_init(&video->req_lock);
>> -	INIT_WORK(&video->pump, uvcg_video_pump);
>> -
>> -	/* Allocate a work queue for asynchronous video pump handler. */
>> -	video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI, 0);
>> -	if (!video->async_wq)
>> -		return -EINVAL;
>>
>>  	video->uvc = uvc;
>>  	video->fcc = V4L2_PIX_FMT_YUYV;
>> diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h
>> index 03adeefa343b..29c6b9a2e9c3 100644
>> --- a/drivers/usb/gadget/function/uvc_video.h
>> +++ b/drivers/usb/gadget/function/uvc_video.h
>> @@ -18,4 +18,6 @@ int uvcg_video_enable(struct uvc_video *video, int enable);
>>
>>  int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);
>>
>> +void uvcg_video_pump_qbuf(struct uvc_video *video);
>> +
>>  #endif /* __UVC_VIDEO_H__ */
>
>-- 
>Regards,
>
>Laurent Pinchart
>
Laurent Pinchart Oct. 27, 2023, 11:47 a.m. UTC | #5
On Fri, Oct 27, 2023 at 01:10:21PM +0200, Michael Grzeschik wrote:
> On Fri, Oct 27, 2023 at 10:51:17AM +0300, Laurent Pinchart wrote:
> > On Thu, Oct 26, 2023 at 09:56:35PM +0000, Jayant Chowdhary wrote:
> >> This patch is based on top of
> >> https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:
> >>
> >> When we use an async work queue to perform the function of pumping
> >> usb requests to the usb controller, it is possible that thread scheduling
> >> affects at what cadence we're able to pump requests. This could mean usb
> >> requests miss their uframes - resulting in video stream flickers on the host
> >> device.
> >>
> >> In this patch, we move the pumping of usb requests to
> >> 1) uvcg_video_complete() complete handler for both isoc + bulk
> >>    endpoints. We still send 0 length requests when there is no uvc buffer
> >>    available to encode.
> >
> > This means you will end up copying large amounts of data in interrupt
> > context. The work queue was there to avoid exactly that, as it will
> > introduce delays that can affect other parts of the system. I think this
> > is a problem.
> 
> Regarding Thin's argument about possible scheduling latency that is already
> introducing real errors, this seemed like a good solution.
> 
> But sure, this potential latency introduced in the interrupt context can
> trigger other side effects.
> 
> However I think we need some compromise since both arguments are very valid.

Agreed.

> Any ideas, how to solve this?

I'm afraid not.

> >> 2) uvc_v4l2_qbuf - only for bulk endpoints since it is not legal to send
> >>    0 length requests.
> >>
> >> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
> >> Signed-off-by: Jayant Chowdhary <jchowdhary@google.com>
> >> Suggested-by: Jayant Chowdhary <jchowdhary@google.com>
> >> Suggested-by: Avichal Rakesh <arakesh@google.com>
> >> Tested-by: Jayant Chowdhary <jchowdhary@google.com>
> >> ---
> >>  v1->v2: Fix code style and add self Signed-off-by
> >>
> >>  drivers/usb/gadget/function/f_uvc.c     |  4 --
> >>  drivers/usb/gadget/function/uvc.h       |  4 +-
> >>  drivers/usb/gadget/function/uvc_v4l2.c  |  5 +-
> >>  drivers/usb/gadget/function/uvc_video.c | 71 ++++++++++++++++---------
> >>  drivers/usb/gadget/function/uvc_video.h |  2 +
> >>  5 files changed, 51 insertions(+), 35 deletions(-)
> >>
> >> diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
> >> index ae08341961eb..53cb2539486d 100644
> >> --- a/drivers/usb/gadget/function/f_uvc.c
> >> +++ b/drivers/usb/gadget/function/f_uvc.c
> >> @@ -959,14 +959,10 @@ static void uvc_function_unbind(struct usb_configuration *c,
> >>  {
> >>  	struct usb_composite_dev *cdev = c->cdev;
> >>  	struct uvc_device *uvc = to_uvc(f);
> >> -	struct uvc_video *video = &uvc->video;
> >>  	long wait_ret = 1;
> >>
> >>  	uvcg_info(f, "%s()\n", __func__);
> >>
> >> -	if (video->async_wq)
> >> -		destroy_workqueue(video->async_wq);
> >> -
> >>  	/*
> >>  	 * If we know we're connected via v4l2, then there should be a cleanup
> >>  	 * of the device from userspace either via UVC_EVENT_DISCONNECT or
> >> diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
> >> index be0d012aa244..498f344fda4b 100644
> >> --- a/drivers/usb/gadget/function/uvc.h
> >> +++ b/drivers/usb/gadget/function/uvc.h
> >> @@ -88,9 +88,6 @@ struct uvc_video {
> >>  	struct uvc_device *uvc;
> >>  	struct usb_ep *ep;
> >>
> >> -	struct work_struct pump;
> >> -	struct workqueue_struct *async_wq;
> >> -
> >>  	/* Frame parameters */
> >>  	u8 bpp;
> >>  	u32 fcc;
> >> @@ -116,6 +113,7 @@ struct uvc_video {
> >>  	/* Context data used by the completion handler */
> >>  	__u32 payload_size;
> >>  	__u32 max_payload_size;
> >> +	bool is_bulk;
> >
> >This should be introduced in a separate patch.
> >
> >>
> >>  	struct uvc_video_queue queue;
> >>  	unsigned int fid;
> >> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
> >> index f4d2e24835d4..678ea6df7b5c 100644
> >> --- a/drivers/usb/gadget/function/uvc_v4l2.c
> >> +++ b/drivers/usb/gadget/function/uvc_v4l2.c
> >> @@ -414,10 +414,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
> >>  	ret = uvcg_queue_buffer(&video->queue, b);
> >>  	if (ret < 0)
> >>  		return ret;
> >> -
> >> -	if (uvc->state == UVC_STATE_STREAMING)
> >> -		queue_work(video->async_wq, &video->pump);
> >> -
> >> +	uvcg_video_pump_qbuf(video);
> >>  	return ret;
> >>  }
> >>
> >> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
> >> index ab3f02054e85..0fcd8e5edbac 100644
> >> --- a/drivers/usb/gadget/function/uvc_video.c
> >> +++ b/drivers/usb/gadget/function/uvc_video.c
> >> @@ -24,6 +24,8 @@
> >>   * Video codecs
> >>   */
> >>
> >> +static void uvcg_video_pump(struct uvc_video *video);
> >> +
> >>  static int
> >>  uvc_video_encode_header(struct uvc_video *video, struct uvc_buffer *buf,
> >>  		u8 *data, int len)
> >> @@ -329,7 +331,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
> >>  	 */
> >>  	if (video->is_enabled) {
> >>  		list_add_tail(&req->list, &video->req_free);
> >> -		queue_work(video->async_wq, &video->pump);
> >> +		spin_unlock_irqrestore(&video->req_lock, flags);
> >> +		uvcg_video_pump(video);
> >> +		return;
> >>  	} else {
> >>  		uvc_video_free_request(ureq, ep);
> >>  	}
> >> @@ -409,20 +413,31 @@ uvc_video_alloc_requests(struct uvc_video *video)
> >>   * Video streaming
> >>   */
> >>
> >> +void uvcg_video_pump_qbuf(struct uvc_video *video)
> >> +{
> >> +	/*
> >> +	 * Only call uvcg_video_pump() from qbuf, for bulk eps since
> >> +	 * for isoc, the complete handler will call uvcg_video_pump()
> >> +	 * consistently. Calling it for isoc eps, while correct
> >> +	 * will increase contention for video->req_lock since the
> >> +	 * complete handler will be called more often.
> >> +	*/
> >> +	if (video->is_bulk)
> >> +		uvcg_video_pump(video);
> >
> > Am I the only one to see the *major* race condition that this patch
> > introduces ?
> 
> Possible that you are. Please elaborate.

uvcg_video_pump() can now run multiple times in parallel on multiple
CPUs. Look at the while() loop in the function, and consider what will
happen when run on two CPUs concurrently. See below for an additional
comment on this.

> >> +}
> >> +
> >>  /*
> >>   * uvcg_video_pump - Pump video data into the USB requests
> >>   *
> >>   * This function fills the available USB requests (listed in req_free) with
> >>   * video data from the queued buffers.
> >>   */
> >> -static void uvcg_video_pump(struct work_struct *work)
> >> +static void uvcg_video_pump(struct uvc_video *video)
> >>  {
> >> -	struct uvc_video *video = container_of(work, struct uvc_video, pump);
> >>  	struct uvc_video_queue *queue = &video->queue;
> >> -	/* video->max_payload_size is only set when using bulk transfer */
> >> -	bool is_bulk = video->max_payload_size;
> >>  	struct usb_request *req = NULL;
> >> -	struct uvc_buffer *buf;
> >> +	struct uvc_request *ureq = NULL;
> >> +	struct uvc_buffer *buf = NULL, *last_buf = NULL;
> >>  	unsigned long flags;
> >>  	bool buf_done;
> >>  	int ret;
> >> @@ -455,7 +470,8 @@ static void uvcg_video_pump(struct work_struct *work)
> >>  		if (buf != NULL) {
> >>  			video->encode(req, video, buf);
> >>  			buf_done = buf->state == UVC_BUF_STATE_DONE;
> >> -		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) && !is_bulk) {
> >> +		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) &&
> >> +				!video->is_bulk) {
> >>  			/*
> >>  			 * No video buffer available; the queue is still connected and
> >>  			 * we're transferring over ISOC. Queue a 0 length request to
> >> @@ -500,18 +516,30 @@ static void uvcg_video_pump(struct work_struct *work)
> >>  			req->no_interrupt = 1;
> >>  		}
> >>
> >> -		/* Queue the USB request */
> >> -		ret = uvcg_video_ep_queue(video, req);
> >>  		spin_unlock_irqrestore(&queue->irqlock, flags);
> >> -

Here's one problematic point. The code above may have run on CPU A,
which releases IRQ lock. CPU B may then run the same code to encode the
next chunk of data in a request, and proceed to the code below before
CPU A. The requests will then be queued in the wrong order.

In the next iteration of this patch, I would like to see a clear
explanation in the commit message of why there is no race condition
(after fixing the existing ones, of course). Writing it down forces
going through the mental exercise of thinking about the race conditions,
which should help catching them.

> >> +		spin_lock_irqsave(&video->req_lock, flags);
> >> +		if (video->is_enabled) {
> >> +			/* Queue the USB request */
> >> +			ret = uvcg_video_ep_queue(video, req);
> >> +			/* Endpoint now owns the request */
> >> +			req = NULL;
> >> +			video->req_int_count++;
> >> +		} else {
> >> +			ret =  -ENODEV;
> >> +			ureq = req->context;
> >> +			last_buf = ureq->last_buf;
> >> +			ureq->last_buf = NULL;
> >> +		}
> >> +		spin_unlock_irqrestore(&video->req_lock, flags);
> >>  		if (ret < 0) {
> >> +			if (last_buf != NULL) {
> >> +				// Return the buffer to the queue in the case the
> >> +				// request was not queued to the ep.
> >
> > Wrong comment style.
> >
> >> +				uvcg_complete_buffer(&video->queue, last_buf);
> >> +			}
> >>  			uvcg_queue_cancel(queue, 0);
> >>  			break;
> >>  		}
> >> -
> >> -		/* Endpoint now owns the request */
> >> -		req = NULL;
> >> -		video->req_int_count++;
> >>  	}
> >>
> >>  	if (!req)
> >> @@ -556,7 +584,6 @@ uvcg_video_disable(struct uvc_video *video)
> >>  	}
> >>  	spin_unlock_irqrestore(&video->req_lock, flags);
> >>
> >> -	cancel_work_sync(&video->pump);
> >>  	uvcg_queue_cancel(&video->queue, 0);
> >>
> >>  	spin_lock_irqsave(&video->req_lock, flags);
> >> @@ -626,14 +653,16 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
> >>  	if (video->max_payload_size) {
> >>  		video->encode = uvc_video_encode_bulk;
> >>  		video->payload_size = 0;
> >> -	} else
> >> +		video->is_bulk = true;
> >> +	} else {
> >>  		video->encode = video->queue.use_sg ?
> >>  			uvc_video_encode_isoc_sg : uvc_video_encode_isoc;
> >> +		video->is_bulk = false;
> >> +	}
> >>
> >>  	video->req_int_count = 0;
> >>
> >> -	queue_work(video->async_wq, &video->pump);
> >> -
> >> +	uvcg_video_pump(video);
> >>  	return ret;
> >>  }
> >>
> >> @@ -646,12 +675,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
> >>  	INIT_LIST_HEAD(&video->ureqs);
> >>  	INIT_LIST_HEAD(&video->req_free);
> >>  	spin_lock_init(&video->req_lock);
> >> -	INIT_WORK(&video->pump, uvcg_video_pump);
> >> -
> >> -	/* Allocate a work queue for asynchronous video pump handler. */
> >> -	video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI, 0);
> >> -	if (!video->async_wq)
> >> -		return -EINVAL;
> >>
> >>  	video->uvc = uvc;
> >>  	video->fcc = V4L2_PIX_FMT_YUYV;
> >> diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h
> >> index 03adeefa343b..29c6b9a2e9c3 100644
> >> --- a/drivers/usb/gadget/function/uvc_video.h
> >> +++ b/drivers/usb/gadget/function/uvc_video.h
> >> @@ -18,4 +18,6 @@ int uvcg_video_enable(struct uvc_video *video, int enable);
> >>
> >>  int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);
> >>
> >> +void uvcg_video_pump_qbuf(struct uvc_video *video);
> >> +
> >>  #endif /* __UVC_VIDEO_H__ */
Michael Grzeschik Oct. 27, 2023, 1:39 p.m. UTC | #6
On Fri, Oct 27, 2023 at 02:47:52PM +0300, Laurent Pinchart wrote:
>On Fri, Oct 27, 2023 at 01:10:21PM +0200, Michael Grzeschik wrote:
>> On Fri, Oct 27, 2023 at 10:51:17AM +0300, Laurent Pinchart wrote:
>> > On Thu, Oct 26, 2023 at 09:56:35PM +0000, Jayant Chowdhary wrote:
>> >> This patch is based on top of
>> >> https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:
>> >>
>> >> When we use an async work queue to perform the function of pumping
>> >> usb requests to the usb controller, it is possible that thread scheduling
>> >> affects at what cadence we're able to pump requests. This could mean usb
>> >> requests miss their uframes - resulting in video stream flickers on the host
>> >> device.
>> >>
>> >> In this patch, we move the pumping of usb requests to
>> >> 1) uvcg_video_complete() complete handler for both isoc + bulk
>> >>    endpoints. We still send 0 length requests when there is no uvc buffer
>> >>    available to encode.
>> >
>> > This means you will end up copying large amounts of data in interrupt
>> > context. The work queue was there to avoid exactly that, as it will
>> > introduce delays that can affect other parts of the system. I think this
>> > is a problem.
>>
>> Regarding Thin's argument about possible scheduling latency that is already
>> introducing real errors, this seemed like a good solution.
>>
>> But sure, this potential latency introduced in the interrupt context can
>> trigger other side effects.
>>
>> However I think we need some compromise since both arguments are very valid.
>
>Agreed.
>
>> Any ideas, how to solve this?
>
>I'm afraid not.

We discussed this and came to the conclusion that we could make use of
kthread_create and sched_setattr with an attr->sched_policy = SCHED_DEADLINE
here instead of the workqueue. This way we would ensure that the worker
would be triggered with hard definitions.

Since the SG case is not that heavy on the completion handler, we could
also make this kthread conditionaly to the memcpy case.

>> >> 2) uvc_v4l2_qbuf - only for bulk endpoints since it is not legal to send
>> >>    0 length requests.
>> >>
>> >> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
>> >> Signed-off-by: Jayant Chowdhary <jchowdhary@google.com>
>> >> Suggested-by: Jayant Chowdhary <jchowdhary@google.com>
>> >> Suggested-by: Avichal Rakesh <arakesh@google.com>
>> >> Tested-by: Jayant Chowdhary <jchowdhary@google.com>
>> >> ---
>> >>  v1->v2: Fix code style and add self Signed-off-by
>> >>
>> >>  drivers/usb/gadget/function/f_uvc.c     |  4 --
>> >>  drivers/usb/gadget/function/uvc.h       |  4 +-
>> >>  drivers/usb/gadget/function/uvc_v4l2.c  |  5 +-
>> >>  drivers/usb/gadget/function/uvc_video.c | 71 ++++++++++++++++---------
>> >>  drivers/usb/gadget/function/uvc_video.h |  2 +
>> >>  5 files changed, 51 insertions(+), 35 deletions(-)
>> >>
>> >> diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
>> >> index ae08341961eb..53cb2539486d 100644
>> >> --- a/drivers/usb/gadget/function/f_uvc.c
>> >> +++ b/drivers/usb/gadget/function/f_uvc.c
>> >> @@ -959,14 +959,10 @@ static void uvc_function_unbind(struct usb_configuration *c,
>> >>  {
>> >>  	struct usb_composite_dev *cdev = c->cdev;
>> >>  	struct uvc_device *uvc = to_uvc(f);
>> >> -	struct uvc_video *video = &uvc->video;
>> >>  	long wait_ret = 1;
>> >>
>> >>  	uvcg_info(f, "%s()\n", __func__);
>> >>
>> >> -	if (video->async_wq)
>> >> -		destroy_workqueue(video->async_wq);
>> >> -
>> >>  	/*
>> >>  	 * If we know we're connected via v4l2, then there should be a cleanup
>> >>  	 * of the device from userspace either via UVC_EVENT_DISCONNECT or
>> >> diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
>> >> index be0d012aa244..498f344fda4b 100644
>> >> --- a/drivers/usb/gadget/function/uvc.h
>> >> +++ b/drivers/usb/gadget/function/uvc.h
>> >> @@ -88,9 +88,6 @@ struct uvc_video {
>> >>  	struct uvc_device *uvc;
>> >>  	struct usb_ep *ep;
>> >>
>> >> -	struct work_struct pump;
>> >> -	struct workqueue_struct *async_wq;
>> >> -
>> >>  	/* Frame parameters */
>> >>  	u8 bpp;
>> >>  	u32 fcc;
>> >> @@ -116,6 +113,7 @@ struct uvc_video {
>> >>  	/* Context data used by the completion handler */
>> >>  	__u32 payload_size;
>> >>  	__u32 max_payload_size;
>> >> +	bool is_bulk;
>> >
>> >This should be introduced in a separate patch.
>> >
>> >>
>> >>  	struct uvc_video_queue queue;
>> >>  	unsigned int fid;
>> >> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
>> >> index f4d2e24835d4..678ea6df7b5c 100644
>> >> --- a/drivers/usb/gadget/function/uvc_v4l2.c
>> >> +++ b/drivers/usb/gadget/function/uvc_v4l2.c
>> >> @@ -414,10 +414,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
>> >>  	ret = uvcg_queue_buffer(&video->queue, b);
>> >>  	if (ret < 0)
>> >>  		return ret;
>> >> -
>> >> -	if (uvc->state == UVC_STATE_STREAMING)
>> >> -		queue_work(video->async_wq, &video->pump);
>> >> -
>> >> +	uvcg_video_pump_qbuf(video);
>> >>  	return ret;
>> >>  }
>> >>
>> >> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
>> >> index ab3f02054e85..0fcd8e5edbac 100644
>> >> --- a/drivers/usb/gadget/function/uvc_video.c
>> >> +++ b/drivers/usb/gadget/function/uvc_video.c
>> >> @@ -24,6 +24,8 @@
>> >>   * Video codecs
>> >>   */
>> >>
>> >> +static void uvcg_video_pump(struct uvc_video *video);
>> >> +
>> >>  static int
>> >>  uvc_video_encode_header(struct uvc_video *video, struct uvc_buffer *buf,
>> >>  		u8 *data, int len)
>> >> @@ -329,7 +331,9 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
>> >>  	 */
>> >>  	if (video->is_enabled) {
>> >>  		list_add_tail(&req->list, &video->req_free);
>> >> -		queue_work(video->async_wq, &video->pump);
>> >> +		spin_unlock_irqrestore(&video->req_lock, flags);
>> >> +		uvcg_video_pump(video);
>> >> +		return;
>> >>  	} else {
>> >>  		uvc_video_free_request(ureq, ep);
>> >>  	}
>> >> @@ -409,20 +413,31 @@ uvc_video_alloc_requests(struct uvc_video *video)
>> >>   * Video streaming
>> >>   */
>> >>
>> >> +void uvcg_video_pump_qbuf(struct uvc_video *video)
>> >> +{
>> >> +	/*
>> >> +	 * Only call uvcg_video_pump() from qbuf, for bulk eps since
>> >> +	 * for isoc, the complete handler will call uvcg_video_pump()
>> >> +	 * consistently. Calling it for isoc eps, while correct
>> >> +	 * will increase contention for video->req_lock since the
>> >> +	 * complete handler will be called more often.
>> >> +	*/
>> >> +	if (video->is_bulk)
>> >> +		uvcg_video_pump(video);
>> >
>> > Am I the only one to see the *major* race condition that this patch
>> > introduces ?
>>
>> Possible that you are. Please elaborate.
>
>uvcg_video_pump() can now run multiple times in parallel on multiple
>CPUs. Look at the while() loop in the function, and consider what will
>happen when run on two CPUs concurrently. See below for an additional
>comment on this.
>
>> >> +}
>> >> +
>> >>  /*
>> >>   * uvcg_video_pump - Pump video data into the USB requests
>> >>   *
>> >>   * This function fills the available USB requests (listed in req_free) with
>> >>   * video data from the queued buffers.
>> >>   */
>> >> -static void uvcg_video_pump(struct work_struct *work)
>> >> +static void uvcg_video_pump(struct uvc_video *video)
>> >>  {
>> >> -	struct uvc_video *video = container_of(work, struct uvc_video, pump);
>> >>  	struct uvc_video_queue *queue = &video->queue;
>> >> -	/* video->max_payload_size is only set when using bulk transfer */
>> >> -	bool is_bulk = video->max_payload_size;
>> >>  	struct usb_request *req = NULL;
>> >> -	struct uvc_buffer *buf;
>> >> +	struct uvc_request *ureq = NULL;
>> >> +	struct uvc_buffer *buf = NULL, *last_buf = NULL;
>> >>  	unsigned long flags;
>> >>  	bool buf_done;
>> >>  	int ret;
>> >> @@ -455,7 +470,8 @@ static void uvcg_video_pump(struct work_struct *work)
>> >>  		if (buf != NULL) {
>> >>  			video->encode(req, video, buf);
>> >>  			buf_done = buf->state == UVC_BUF_STATE_DONE;
>> >> -		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) && !is_bulk) {
>> >> +		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) &&
>> >> +				!video->is_bulk) {
>> >>  			/*
>> >>  			 * No video buffer available; the queue is still connected and
>> >>  			 * we're transferring over ISOC. Queue a 0 length request to
>> >> @@ -500,18 +516,30 @@ static void uvcg_video_pump(struct work_struct *work)
>> >>  			req->no_interrupt = 1;
>> >>  		}
>> >>
>> >> -		/* Queue the USB request */
>> >> -		ret = uvcg_video_ep_queue(video, req);
>> >>  		spin_unlock_irqrestore(&queue->irqlock, flags);
>> >> -
>
>Here's one problematic point. The code above may have run on CPU A,
>which releases IRQ lock. CPU B may then run the same code to encode the
>next chunk of data in a request, and proceed to the code below before
>CPU A. The requests will then be queued in the wrong order.

Right

>In the next iteration of this patch, I would like to see a clear
>explanation in the commit message of why there is no race condition
>(after fixing the existing ones, of course). Writing it down forces
>going through the mental exercise of thinking about the race conditions,
>which should help catching them.

Agreed.

>> >> +		spin_lock_irqsave(&video->req_lock, flags);
>> >> +		if (video->is_enabled) {
>> >> +			/* Queue the USB request */
>> >> +			ret = uvcg_video_ep_queue(video, req);
>> >> +			/* Endpoint now owns the request */
>> >> +			req = NULL;
>> >> +			video->req_int_count++;
>> >> +		} else {
>> >> +			ret =  -ENODEV;
>> >> +			ureq = req->context;
>> >> +			last_buf = ureq->last_buf;
>> >> +			ureq->last_buf = NULL;
>> >> +		}
>> >> +		spin_unlock_irqrestore(&video->req_lock, flags);
>> >>  		if (ret < 0) {
>> >> +			if (last_buf != NULL) {
>> >> +				// Return the buffer to the queue in the case the
>> >> +				// request was not queued to the ep.
>> >
>> > Wrong comment style.
>> >
>> >> +				uvcg_complete_buffer(&video->queue, last_buf);
>> >> +			}
>> >>  			uvcg_queue_cancel(queue, 0);
>> >>  			break;
>> >>  		}
>> >> -
>> >> -		/* Endpoint now owns the request */
>> >> -		req = NULL;
>> >> -		video->req_int_count++;
>> >>  	}
>> >>
>> >>  	if (!req)
>> >> @@ -556,7 +584,6 @@ uvcg_video_disable(struct uvc_video *video)
>> >>  	}
>> >>  	spin_unlock_irqrestore(&video->req_lock, flags);
>> >>
>> >> -	cancel_work_sync(&video->pump);
>> >>  	uvcg_queue_cancel(&video->queue, 0);
>> >>
>> >>  	spin_lock_irqsave(&video->req_lock, flags);
>> >> @@ -626,14 +653,16 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
>> >>  	if (video->max_payload_size) {
>> >>  		video->encode = uvc_video_encode_bulk;
>> >>  		video->payload_size = 0;
>> >> -	} else
>> >> +		video->is_bulk = true;
>> >> +	} else {
>> >>  		video->encode = video->queue.use_sg ?
>> >>  			uvc_video_encode_isoc_sg : uvc_video_encode_isoc;
>> >> +		video->is_bulk = false;
>> >> +	}
>> >>
>> >>  	video->req_int_count = 0;
>> >>
>> >> -	queue_work(video->async_wq, &video->pump);
>> >> -
>> >> +	uvcg_video_pump(video);
>> >>  	return ret;
>> >>  }
>> >>
>> >> @@ -646,12 +675,6 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
>> >>  	INIT_LIST_HEAD(&video->ureqs);
>> >>  	INIT_LIST_HEAD(&video->req_free);
>> >>  	spin_lock_init(&video->req_lock);
>> >> -	INIT_WORK(&video->pump, uvcg_video_pump);
>> >> -
>> >> -	/* Allocate a work queue for asynchronous video pump handler. */
>> >> -	video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI, 0);
>> >> -	if (!video->async_wq)
>> >> -		return -EINVAL;
>> >>
>> >>  	video->uvc = uvc;
>> >>  	video->fcc = V4L2_PIX_FMT_YUYV;
>> >> diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h
>> >> index 03adeefa343b..29c6b9a2e9c3 100644
>> >> --- a/drivers/usb/gadget/function/uvc_video.h
>> >> +++ b/drivers/usb/gadget/function/uvc_video.h
>> >> @@ -18,4 +18,6 @@ int uvcg_video_enable(struct uvc_video *video, int enable);
>> >>
>> >>  int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);
>> >>
>> >> +void uvcg_video_pump_qbuf(struct uvc_video *video);
>> >> +
>> >>  #endif /* __UVC_VIDEO_H__ */
>
>-- 
>Regards,
>
>Laurent Pinchart
>
Alan Stern Oct. 27, 2023, 2:58 p.m. UTC | #7
On Fri, Oct 27, 2023 at 03:39:44PM +0200, Michael Grzeschik wrote:
> On Fri, Oct 27, 2023 at 02:47:52PM +0300, Laurent Pinchart wrote:
> > On Fri, Oct 27, 2023 at 01:10:21PM +0200, Michael Grzeschik wrote:
> > > On Fri, Oct 27, 2023 at 10:51:17AM +0300, Laurent Pinchart wrote:
> > > > On Thu, Oct 26, 2023 at 09:56:35PM +0000, Jayant Chowdhary wrote:
> > > >> This patch is based on top of
> > > >> https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:
> > > >>
> > > >> When we use an async work queue to perform the function of pumping
> > > >> usb requests to the usb controller, it is possible that thread scheduling
> > > >> affects at what cadence we're able to pump requests. This could mean usb
> > > >> requests miss their uframes - resulting in video stream flickers on the host
> > > >> device.
> > > >>
> > > >> In this patch, we move the pumping of usb requests to
> > > >> 1) uvcg_video_complete() complete handler for both isoc + bulk
> > > >>    endpoints. We still send 0 length requests when there is no uvc buffer
> > > >>    available to encode.
> > > >
> > > > This means you will end up copying large amounts of data in interrupt
> > > > context. The work queue was there to avoid exactly that, as it will
> > > > introduce delays that can affect other parts of the system. I think this
> > > > is a problem.
> > > 
> > > Regarding Thin's argument about possible scheduling latency that is already
> > > introducing real errors, this seemed like a good solution.
> > > 
> > > But sure, this potential latency introduced in the interrupt context can
> > > trigger other side effects.
> > > 
> > > However I think we need some compromise since both arguments are very valid.
> > 
> > Agreed.
> > 
> > > Any ideas, how to solve this?
> > 
> > I'm afraid not.
> 
> We discussed this and came to the conclusion that we could make use of
> kthread_create and sched_setattr with an attr->sched_policy = SCHED_DEADLINE
> here instead of the workqueue. This way we would ensure that the worker
> would be triggered with hard definitions.
> 
> Since the SG case is not that heavy on the completion handler, we could
> also make this kthread conditionaly to the memcpy case.

If you don't mind a naive suggestion from someone who knows nothing 
about the driver...

An attractive possibility is to have the work queue (or kthread) do the 
time-consuming copying, but leave the submission up to the completion 
handler.  If the data isn't ready (or there's no data to send) when the 
handler runs, then queue a 0-length request.

That will give you the best of both worlds: low latency while in 
interrupt context and a steady, constant flow of USB transfers at all 
times.  The question of how to schedule the work queue or kthread is a 
separate matter, not directly relevant to this design decision.

Alan Stern
Michael Grzeschik Oct. 28, 2023, 11:10 a.m. UTC | #8
On Fri, Oct 27, 2023 at 10:58:11AM -0400, Alan Stern wrote:
>On Fri, Oct 27, 2023 at 03:39:44PM +0200, Michael Grzeschik wrote:
>> On Fri, Oct 27, 2023 at 02:47:52PM +0300, Laurent Pinchart wrote:
>> > On Fri, Oct 27, 2023 at 01:10:21PM +0200, Michael Grzeschik wrote:
>> > > On Fri, Oct 27, 2023 at 10:51:17AM +0300, Laurent Pinchart wrote:
>> > > > On Thu, Oct 26, 2023 at 09:56:35PM +0000, Jayant Chowdhary wrote:
>> > > >> This patch is based on top of
>> > > >> https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:
>> > > >>
>> > > >> When we use an async work queue to perform the function of pumping
>> > > >> usb requests to the usb controller, it is possible that thread scheduling
>> > > >> affects at what cadence we're able to pump requests. This could mean usb
>> > > >> requests miss their uframes - resulting in video stream flickers on the host
>> > > >> device.
>> > > >>
>> > > >> In this patch, we move the pumping of usb requests to
>> > > >> 1) uvcg_video_complete() complete handler for both isoc + bulk
>> > > >>    endpoints. We still send 0 length requests when there is no uvc buffer
>> > > >>    available to encode.
>> > > >
>> > > > This means you will end up copying large amounts of data in interrupt
>> > > > context. The work queue was there to avoid exactly that, as it will
>> > > > introduce delays that can affect other parts of the system. I think this
>> > > > is a problem.
>> > >
>> > > Regarding Thin's argument about possible scheduling latency that is already
>> > > introducing real errors, this seemed like a good solution.
>> > >
>> > > But sure, this potential latency introduced in the interrupt context can
>> > > trigger other side effects.
>> > >
>> > > However I think we need some compromise since both arguments are very valid.
>> >
>> > Agreed.
>> >
>> > > Any ideas, how to solve this?
>> >
>> > I'm afraid not.
>>
>> We discussed this and came to the conclusion that we could make use of
>> kthread_create and sched_setattr with an attr->sched_policy = SCHED_DEADLINE
>> here instead of the workqueue. This way we would ensure that the worker
>> would be triggered with hard definitions.
>>
>> Since the SG case is not that heavy on the completion handler, we could
>> also make this kthread conditionaly to the memcpy case.
>
>If you don't mind a naive suggestion from someone who knows nothing
>about the driver...
>
>An attractive possibility is to have the work queue (or kthread) do the
>time-consuming copying, but leave the submission up to the completion
>handler.  If the data isn't ready (or there's no data to send) when the
>handler runs, then queue a 0-length request.
>
>That will give you the best of both worlds: low latency while in
>interrupt context and a steady, constant flow of USB transfers at all
>times.  The question of how to schedule the work queue or kthread is a
>separate matter, not directly relevant to this design decision.

That's it. This is probably the best way to tackle the overall problem.

So we leave the call of the encode callback to the worker, that will
probably still can be a workqueue. The complete callback is calling
the explicit uvcg_video_ep_queue when prepared requests are available
and if there is nothing pending it will just enqueue zero requests.

Thank you Alan, this makes so much sense!

Jayant, Laurent: Do you agree?
If yes, Jayant will you change the patch accordingly?

Michael
Jayant Chowdhary Oct. 28, 2023, 2:09 p.m. UTC | #9
Hi,

On 10/28/23 04:10, Michael Grzeschik wrote:
> On Fri, Oct 27, 2023 at 10:58:11AM -0400, Alan Stern wrote:
>> On Fri, Oct 27, 2023 at 03:39:44PM +0200, Michael Grzeschik wrote:
>>> On Fri, Oct 27, 2023 at 02:47:52PM +0300, Laurent Pinchart wrote:
>>> > On Fri, Oct 27, 2023 at 01:10:21PM +0200, Michael Grzeschik wrote:
>>> > > On Fri, Oct 27, 2023 at 10:51:17AM +0300, Laurent Pinchart wrote:
>>> > > > On Thu, Oct 26, 2023 at 09:56:35PM +0000, Jayant Chowdhary wrote:
>>> > > >> This patch is based on top of
>>> > > >> https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:
>>> > > >>
>>> > > >> When we use an async work queue to perform the function of pumping
>>> > > >> usb requests to the usb controller, it is possible that thread scheduling
>>> > > >> affects at what cadence we're able to pump requests. This could mean usb
>>> > > >> requests miss their uframes - resulting in video stream flickers on the host
>>> > > >> device.
>>> > > >>
>>> > > >> In this patch, we move the pumping of usb requests to
>>> > > >> 1) uvcg_video_complete() complete handler for both isoc + bulk
>>> > > >>    endpoints. We still send 0 length requests when there is no uvc buffer
>>> > > >>    available to encode.
>>> > > >
>>> > > > This means you will end up copying large amounts of data in interrupt
>>> > > > context. The work queue was there to avoid exactly that, as it will
>>> > > > introduce delays that can affect other parts of the system. I think this
>>> > > > is a problem.
>>> > >
>>> > > Regarding Thin's argument about possible scheduling latency that is already
>>> > > introducing real errors, this seemed like a good solution.
>>> > >
>>> > > But sure, this potential latency introduced in the interrupt context can
>>> > > trigger other side effects.
>>> > >
>>> > > However I think we need some compromise since both arguments are very valid.
>>> >
>>> > Agreed.
>>> >
>>> > > Any ideas, how to solve this?
>>> >
>>> > I'm afraid not.
>>>
>>> We discussed this and came to the conclusion that we could make use of
>>> kthread_create and sched_setattr with an attr->sched_policy = SCHED_DEADLINE
>>> here instead of the workqueue. This way we would ensure that the worker
>>> would be triggered with hard definitions.
>>>
>>> Since the SG case is not that heavy on the completion handler, we could
>>> also make this kthread conditionaly to the memcpy case.
>>
>> If you don't mind a naive suggestion from someone who knows nothing
>> about the driver...
>>
>> An attractive possibility is to have the work queue (or kthread) do the
>> time-consuming copying, but leave the submission up to the completion
>> handler.  If the data isn't ready (or there's no data to send) when the
>> handler runs, then queue a 0-length request.
>>
>> That will give you the best of both worlds: low latency while in
>> interrupt context and a steady, constant flow of USB transfers at all
>> times.  The question of how to schedule the work queue or kthread is a
>> separate matter, not directly relevant to this design decision.
>
> That's it. This is probably the best way to tackle the overall problem.
>
> So we leave the call of the encode callback to the worker, that will
> probably still can be a workqueue. The complete callback is calling
> the explicit uvcg_video_ep_queue when prepared requests are available
> and if there is nothing pending it will just enqueue zero requests.
>
> Thank you Alan, this makes so much sense!
>
> Jayant, Laurent: Do you agree?
> If yes, Jayant will you change the patch accordingly?
>
>
Thanks for all the discussion Greg, Michael, Laurent and Alan.
Apologies for not responding earlier since I am OOO.

While I  haven't tried this out this does seem like a very good idea.
Thank you Alan! I will aim to make changes and post a patch on Monday night PST.

Jayant
Jayant Chowdhary Oct. 31, 2023, 6:11 a.m. UTC | #10
Hi,

On 10/28/23 07:09, Jayant Chowdhary wrote:
> Hi,
>
> On 10/28/23 04:10, Michael Grzeschik wrote:
>> On Fri, Oct 27, 2023 at 10:58:11AM -0400, Alan Stern wrote:
>>> On Fri, Oct 27, 2023 at 03:39:44PM +0200, Michael Grzeschik wrote:
>>>> On Fri, Oct 27, 2023 at 02:47:52PM +0300, Laurent Pinchart wrote:
>>>>> On Fri, Oct 27, 2023 at 01:10:21PM +0200, Michael Grzeschik wrote:
>>>>>> On Fri, Oct 27, 2023 at 10:51:17AM +0300, Laurent Pinchart wrote:
>>>>>>> On Thu, Oct 26, 2023 at 09:56:35PM +0000, Jayant Chowdhary wrote:
>>>>>>>> This patch is based on top of
>>>>>>>> https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:
>>>>>>>>
>>>>>>>> When we use an async work queue to perform the function of pumping
>>>>>>>> usb requests to the usb controller, it is possible that thread scheduling
>>>>>>>> affects at what cadence we're able to pump requests. This could mean usb
>>>>>>>> requests miss their uframes - resulting in video stream flickers on the host
>>>>>>>> device.
>>>>>>>>
>>>>>>>> In this patch, we move the pumping of usb requests to
>>>>>>>> 1) uvcg_video_complete() complete handler for both isoc + bulk
>>>>>>>>     endpoints. We still send 0 length requests when there is no uvc buffer
>>>>>>>>     available to encode.
>>>>>>> This means you will end up copying large amounts of data in interrupt
>>>>>>> context. The work queue was there to avoid exactly that, as it will
>>>>>>> introduce delays that can affect other parts of the system. I think this
>>>>>>> is a problem.
>>>>>> Regarding Thin's argument about possible scheduling latency that is already
>>>>>> introducing real errors, this seemed like a good solution.
>>>>>>
>>>>>> But sure, this potential latency introduced in the interrupt context can
>>>>>> trigger other side effects.
>>>>>>
>>>>>> However I think we need some compromise since both arguments are very valid.
>>>>> Agreed.
>>>>>
>>>>>> Any ideas, how to solve this?
>>>>> I'm afraid not.
>>>> We discussed this and came to the conclusion that we could make use of
>>>> kthread_create and sched_setattr with an attr->sched_policy = SCHED_DEADLINE
>>>> here instead of the workqueue. This way we would ensure that the worker
>>>> would be triggered with hard definitions.
>>>>
>>>> Since the SG case is not that heavy on the completion handler, we could
>>>> also make this kthread conditionaly to the memcpy case.
>>> If you don't mind a naive suggestion from someone who knows nothing
>>> about the driver...
>>>
>>> An attractive possibility is to have the work queue (or kthread) do the
>>> time-consuming copying, but leave the submission up to the completion
>>> handler.  If the data isn't ready (or there's no data to send) when the
>>> handler runs, then queue a 0-length request.
>>>
>>> That will give you the best of both worlds: low latency while in
>>> interrupt context and a steady, constant flow of USB transfers at all
>>> times.  The question of how to schedule the work queue or kthread is a
>>> separate matter, not directly relevant to this design decision.
>> That's it. This is probably the best way to tackle the overall problem.
>>
>> So we leave the call of the encode callback to the worker, that will
>> probably still can be a workqueue. The complete callback is calling
>> the explicit uvcg_video_ep_queue when prepared requests are available
>> and if there is nothing pending it will just enqueue zero requests.
>>
>> Thank you Alan, this makes so much sense!
>>
>> Jayant, Laurent: Do you agree?
>> If yes, Jayant will you change the patch accordingly?
>>
>>
> Thanks for all the discussion Greg, Michael, Laurent and Alan.
> Apologies for not responding earlier since I am OOO.
>
> While I  haven't tried this out this does seem like a very good idea.
> Thank you Alan! I will aim to make changes and post a patch on Monday night PST.

I got caught up with some work which is taking longer than expected. Apologies for the
delay :) I'm testing some things out right now. I hope to be able to post a patch in the
next couple of days. Thanks for your patience.

Jayant
Jayant Chowdhary Nov. 2, 2023, 6:06 a.m. UTC | #11
Hi,

On 10/30/23 23:11, Jayant Chowdhary wrote:
> Hi,
>
> On 10/28/23 07:09, Jayant Chowdhary wrote:
>> Hi,
>>
>> On 10/28/23 04:10, Michael Grzeschik wrote:
>>> On Fri, Oct 27, 2023 at 10:58:11AM -0400, Alan Stern wrote:
>>>> On Fri, Oct 27, 2023 at 03:39:44PM +0200, Michael Grzeschik wrote:
>>>>> On Fri, Oct 27, 2023 at 02:47:52PM +0300, Laurent Pinchart wrote:
>>>>>> On Fri, Oct 27, 2023 at 01:10:21PM +0200, Michael Grzeschik wrote:
>>>>>>> On Fri, Oct 27, 2023 at 10:51:17AM +0300, Laurent Pinchart wrote:
>>>>>>>> On Thu, Oct 26, 2023 at 09:56:35PM +0000, Jayant Chowdhary wrote:
>>>>>>>>> This patch is based on top of
>>>>>>>>> https://lore.kernel.org/linux-usb/20230930184821.310143-1-arakesh@google.com/T/#t:
>>>>>>>>>
>>>>>>>>> When we use an async work queue to perform the function of pumping
>>>>>>>>> usb requests to the usb controller, it is possible that thread scheduling
>>>>>>>>> affects at what cadence we're able to pump requests. This could mean usb
>>>>>>>>> requests miss their uframes - resulting in video stream flickers on the host
>>>>>>>>> device.
>>>>>>>>>
>>>>>>>>> In this patch, we move the pumping of usb requests to
>>>>>>>>> 1) uvcg_video_complete() complete handler for both isoc + bulk
>>>>>>>>>     endpoints. We still send 0 length requests when there is no uvc buffer
>>>>>>>>>     available to encode.
>>>>>>>> This means you will end up copying large amounts of data in interrupt
>>>>>>>> context. The work queue was there to avoid exactly that, as it will
>>>>>>>> introduce delays that can affect other parts of the system. I think this
>>>>>>>> is a problem.
>>>>>>> Regarding Thin's argument about possible scheduling latency that is already
>>>>>>> introducing real errors, this seemed like a good solution.
>>>>>>>
>>>>>>> But sure, this potential latency introduced in the interrupt context can
>>>>>>> trigger other side effects.
>>>>>>>
>>>>>>> However I think we need some compromise since both arguments are very valid.
>>>>>> Agreed.
>>>>>>
>>>>>>> Any ideas, how to solve this?
>>>>>> I'm afraid not.
>>>>> We discussed this and came to the conclusion that we could make use of
>>>>> kthread_create and sched_setattr with an attr->sched_policy = SCHED_DEADLINE
>>>>> here instead of the workqueue. This way we would ensure that the worker
>>>>> would be triggered with hard definitions.
>>>>>
>>>>> Since the SG case is not that heavy on the completion handler, we could
>>>>> also make this kthread conditionaly to the memcpy case.
>>>> If you don't mind a naive suggestion from someone who knows nothing
>>>> about the driver...
>>>>
>>>> An attractive possibility is to have the work queue (or kthread) do the
>>>> time-consuming copying, but leave the submission up to the completion
>>>> handler.  If the data isn't ready (or there's no data to send) when the
>>>> handler runs, then queue a 0-length request.
>>>>
>>>> That will give you the best of both worlds: low latency while in
>>>> interrupt context and a steady, constant flow of USB transfers at all
>>>> times.  The question of how to schedule the work queue or kthread is a
>>>> separate matter, not directly relevant to this design decision.
>>> That's it. This is probably the best way to tackle the overall problem.
>>>
>>> So we leave the call of the encode callback to the worker, that will
>>> probably still can be a workqueue. The complete callback is calling
>>> the explicit uvcg_video_ep_queue when prepared requests are available
>>> and if there is nothing pending it will just enqueue zero requests.
>>>
>>> Thank you Alan, this makes so much sense!
>>>
>>> Jayant, Laurent: Do you agree?
>>> If yes, Jayant will you change the patch accordingly?
>>>
>>>
>> Thanks for all the discussion Greg, Michael, Laurent and Alan.
>> Apologies for not responding earlier since I am OOO.
>>
>> While I  haven't tried this out this does seem like a very good idea.
>> Thank you Alan! I will aim to make changes and post a patch on Monday night PST.
> I got caught up with some work which is taking longer than expected. Apologies for the
> delay :) I'm testing some things out right now. I hope to be able to post a patch in the
> next couple of days. Thanks for your patience.

I posted another patch at https://lore.kernel.org/linux-usb/20231102060120.1159112-1-jchowdhary@google.com/T/#u.
I've not split this into 2 patches since here, we have a common function that handles both the bulk and isoc
cases and I feel they're logically related. 

Thank you

Jayant
diff mbox series

Patch

diff --git a/drivers/usb/gadget/function/f_uvc.c b/drivers/usb/gadget/function/f_uvc.c
index ae08341961eb..53cb2539486d 100644
--- a/drivers/usb/gadget/function/f_uvc.c
+++ b/drivers/usb/gadget/function/f_uvc.c
@@ -959,14 +959,10 @@  static void uvc_function_unbind(struct usb_configuration *c,
 {
 	struct usb_composite_dev *cdev = c->cdev;
 	struct uvc_device *uvc = to_uvc(f);
-	struct uvc_video *video = &uvc->video;
 	long wait_ret = 1;
 
 	uvcg_info(f, "%s()\n", __func__);
 
-	if (video->async_wq)
-		destroy_workqueue(video->async_wq);
-
 	/*
 	 * If we know we're connected via v4l2, then there should be a cleanup
 	 * of the device from userspace either via UVC_EVENT_DISCONNECT or
diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
index be0d012aa244..498f344fda4b 100644
--- a/drivers/usb/gadget/function/uvc.h
+++ b/drivers/usb/gadget/function/uvc.h
@@ -88,9 +88,6 @@  struct uvc_video {
 	struct uvc_device *uvc;
 	struct usb_ep *ep;
 
-	struct work_struct pump;
-	struct workqueue_struct *async_wq;
-
 	/* Frame parameters */
 	u8 bpp;
 	u32 fcc;
@@ -116,6 +113,7 @@  struct uvc_video {
 	/* Context data used by the completion handler */
 	__u32 payload_size;
 	__u32 max_payload_size;
+	bool is_bulk;
 
 	struct uvc_video_queue queue;
 	unsigned int fid;
diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
index f4d2e24835d4..678ea6df7b5c 100644
--- a/drivers/usb/gadget/function/uvc_v4l2.c
+++ b/drivers/usb/gadget/function/uvc_v4l2.c
@@ -414,10 +414,7 @@  uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
 	ret = uvcg_queue_buffer(&video->queue, b);
 	if (ret < 0)
 		return ret;
-
-	if (uvc->state == UVC_STATE_STREAMING)
-		queue_work(video->async_wq, &video->pump);
-
+	uvcg_video_pump_qbuf(video);
 	return ret;
 }
 
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index ab3f02054e85..0fcd8e5edbac 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -24,6 +24,8 @@ 
  * Video codecs
  */
 
+static void uvcg_video_pump(struct uvc_video *video);
+
 static int
 uvc_video_encode_header(struct uvc_video *video, struct uvc_buffer *buf,
 		u8 *data, int len)
@@ -329,7 +331,9 @@  uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
 	 */
 	if (video->is_enabled) {
 		list_add_tail(&req->list, &video->req_free);
-		queue_work(video->async_wq, &video->pump);
+		spin_unlock_irqrestore(&video->req_lock, flags);
+		uvcg_video_pump(video);
+		return;
 	} else {
 		uvc_video_free_request(ureq, ep);
 	}
@@ -409,20 +413,31 @@  uvc_video_alloc_requests(struct uvc_video *video)
  * Video streaming
  */
 
+void uvcg_video_pump_qbuf(struct uvc_video *video)
+{
+	/*
+	 * Only call uvcg_video_pump() from qbuf, for bulk eps since
+	 * for isoc, the complete handler will call uvcg_video_pump()
+	 * consistently. Calling it for isoc eps, while correct
+	 * will increase contention for video->req_lock since the
+	 * complete handler will be called more often.
+	*/
+	if (video->is_bulk)
+		uvcg_video_pump(video);
+}
+
 /*
  * uvcg_video_pump - Pump video data into the USB requests
  *
  * This function fills the available USB requests (listed in req_free) with
  * video data from the queued buffers.
  */
-static void uvcg_video_pump(struct work_struct *work)
+static void uvcg_video_pump(struct uvc_video *video)
 {
-	struct uvc_video *video = container_of(work, struct uvc_video, pump);
 	struct uvc_video_queue *queue = &video->queue;
-	/* video->max_payload_size is only set when using bulk transfer */
-	bool is_bulk = video->max_payload_size;
 	struct usb_request *req = NULL;
-	struct uvc_buffer *buf;
+	struct uvc_request *ureq = NULL;
+	struct uvc_buffer *buf = NULL, *last_buf = NULL;
 	unsigned long flags;
 	bool buf_done;
 	int ret;
@@ -455,7 +470,8 @@  static void uvcg_video_pump(struct work_struct *work)
 		if (buf != NULL) {
 			video->encode(req, video, buf);
 			buf_done = buf->state == UVC_BUF_STATE_DONE;
-		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) && !is_bulk) {
+		} else if (!(queue->flags & UVC_QUEUE_DISCONNECTED) &&
+				!video->is_bulk) {
 			/*
 			 * No video buffer available; the queue is still connected and
 			 * we're transferring over ISOC. Queue a 0 length request to
@@ -500,18 +516,30 @@  static void uvcg_video_pump(struct work_struct *work)
 			req->no_interrupt = 1;
 		}
 
-		/* Queue the USB request */
-		ret = uvcg_video_ep_queue(video, req);
 		spin_unlock_irqrestore(&queue->irqlock, flags);
-
+		spin_lock_irqsave(&video->req_lock, flags);
+		if (video->is_enabled) {
+			/* Queue the USB request */
+			ret = uvcg_video_ep_queue(video, req);
+			/* Endpoint now owns the request */
+			req = NULL;
+			video->req_int_count++;
+		} else {
+			ret =  -ENODEV;
+			ureq = req->context;
+			last_buf = ureq->last_buf;
+			ureq->last_buf = NULL;
+		}
+		spin_unlock_irqrestore(&video->req_lock, flags);
 		if (ret < 0) {
+			if (last_buf != NULL) {
+				// Return the buffer to the queue in the case the
+				// request was not queued to the ep.
+				uvcg_complete_buffer(&video->queue, last_buf);
+			}
 			uvcg_queue_cancel(queue, 0);
 			break;
 		}
-
-		/* Endpoint now owns the request */
-		req = NULL;
-		video->req_int_count++;
 	}
 
 	if (!req)
@@ -556,7 +584,6 @@  uvcg_video_disable(struct uvc_video *video)
 	}
 	spin_unlock_irqrestore(&video->req_lock, flags);
 
-	cancel_work_sync(&video->pump);
 	uvcg_queue_cancel(&video->queue, 0);
 
 	spin_lock_irqsave(&video->req_lock, flags);
@@ -626,14 +653,16 @@  int uvcg_video_enable(struct uvc_video *video, int enable)
 	if (video->max_payload_size) {
 		video->encode = uvc_video_encode_bulk;
 		video->payload_size = 0;
-	} else
+		video->is_bulk = true;
+	} else {
 		video->encode = video->queue.use_sg ?
 			uvc_video_encode_isoc_sg : uvc_video_encode_isoc;
+		video->is_bulk = false;
+	}
 
 	video->req_int_count = 0;
 
-	queue_work(video->async_wq, &video->pump);
-
+	uvcg_video_pump(video);
 	return ret;
 }
 
@@ -646,12 +675,6 @@  int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
 	INIT_LIST_HEAD(&video->ureqs);
 	INIT_LIST_HEAD(&video->req_free);
 	spin_lock_init(&video->req_lock);
-	INIT_WORK(&video->pump, uvcg_video_pump);
-
-	/* Allocate a work queue for asynchronous video pump handler. */
-	video->async_wq = alloc_workqueue("uvcgadget", WQ_UNBOUND | WQ_HIGHPRI, 0);
-	if (!video->async_wq)
-		return -EINVAL;
 
 	video->uvc = uvc;
 	video->fcc = V4L2_PIX_FMT_YUYV;
diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h
index 03adeefa343b..29c6b9a2e9c3 100644
--- a/drivers/usb/gadget/function/uvc_video.h
+++ b/drivers/usb/gadget/function/uvc_video.h
@@ -18,4 +18,6 @@  int uvcg_video_enable(struct uvc_video *video, int enable);
 
 int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);
 
+void uvcg_video_pump_qbuf(struct uvc_video *video);
+
 #endif /* __UVC_VIDEO_H__ */