
[3/5] usb: gadget: uvc: increase worker prio to WQ_HIGHPRI

Message ID 20220402233914.3625405-4-m.grzeschik@pengutronix.de
State Superseded
Series usb: gadget: uvc: fixes and improvements

Commit Message

Michael Grzeschik April 2, 2022, 11:39 p.m. UTC
Like the uvcvideo host-side driver, this patch replaces the use of the
system workqueue (schedule_work()) with a dedicated async_wq with higher
priority. This ensures that
the worker will not be scheduled away while the video stream is handled.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
---
 drivers/usb/gadget/function/uvc.h       | 1 +
 drivers/usb/gadget/function/uvc_v4l2.c  | 2 +-
 drivers/usb/gadget/function/uvc_video.c | 9 +++++++--
 3 files changed, 9 insertions(+), 3 deletions(-)

Comments

Laurent Pinchart April 19, 2022, 8:46 p.m. UTC | #1
Hi Michael,

Thank you for the patch.

On Sun, Apr 03, 2022 at 01:39:12AM +0200, Michael Grzeschik wrote:
> Likewise to the uvcvideo hostside driver, this patch is changing the
> simple workqueue to an async_wq with higher priority. This ensures that
> the worker will not be scheduled away while the video stream is handled.
> 
> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
> ---
>  drivers/usb/gadget/function/uvc.h       | 1 +
>  drivers/usb/gadget/function/uvc_v4l2.c  | 2 +-
>  drivers/usb/gadget/function/uvc_video.c | 9 +++++++--
>  3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
> index c3607a32b98624..ab537acdae3184 100644
> --- a/drivers/usb/gadget/function/uvc.h
> +++ b/drivers/usb/gadget/function/uvc.h
> @@ -86,6 +86,7 @@ struct uvc_video {
>  	struct usb_ep *ep;
>  
>  	struct work_struct pump;
> +	struct workqueue_struct *async_wq;
>  
>  	/* Frame parameters */
>  	u8 bpp;
> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
> index a2c78690c5c288..9b1488f7abd736 100644
> --- a/drivers/usb/gadget/function/uvc_v4l2.c
> +++ b/drivers/usb/gadget/function/uvc_v4l2.c
> @@ -170,7 +170,7 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
>  		return ret;
>  
>  	if (uvc->state == UVC_STATE_STREAMING)
> -		schedule_work(&video->pump);
> +		queue_work(video->async_wq, &video->pump);
>  
>  	return ret;
>  }
> diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
> index 7f59a0c4740209..b1075e23a61010 100644
> --- a/drivers/usb/gadget/function/uvc_video.c
> +++ b/drivers/usb/gadget/function/uvc_video.c
> @@ -269,7 +269,7 @@ uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
>  	spin_unlock_irqrestore(&video->req_lock, flags);
>  
>  	if (uvc->state == UVC_STATE_STREAMING)
> -		schedule_work(&video->pump);
> +		queue_work(video->async_wq, &video->pump);
>  }
>  
>  static int
> @@ -469,7 +469,7 @@ int uvcg_video_enable(struct uvc_video *video, int enable)
>  
>  	video->req_int_count = 0;
>  
> -	schedule_work(&video->pump);
> +	queue_work(video->async_wq, &video->pump);
>  
>  	return ret;
>  }
> @@ -483,6 +483,11 @@ int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
>  	spin_lock_init(&video->req_lock);
>  	INIT_WORK(&video->pump, uvcg_video_pump);
>  
> +	/* Allocate a stream specific work queue for asynchronous tasks. */

You can drop the "stream" here. The gadget driver handles a single
stream.

> +	video->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI, 0);

Unless I'm mistaken, an unbound work queue means that multiple CPUs will
handle tasks in parallel. Is that safe ?

> +	if (!video->async_wq)
> +		return -EINVAL;

No need to destroy the work queue somewhere ?
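
The pairing asked about here can be sketched as a userspace model: the
queue is allocated in init with an error return, and a matching teardown
frees it exactly once. The model_* functions and struct video_model are
hypothetical stand-ins for alloc_workqueue(), destroy_workqueue() and
struct uvc_video. In the kernel, the counterpart would presumably be a
destroy_workqueue() call in whichever teardown path undoes
uvcg_video_init(), which this patch does not show. The sketch returns
-ENOMEM, the usual kernel error code for a failed allocation, rather than
-EINVAL.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct workqueue_struct. */
struct wq_model {
	const char *name;
};

/* Hypothetical stand-in for the relevant part of struct uvc_video. */
struct video_model {
	struct wq_model *async_wq;
};

/* Set to true to simulate an allocation failure. */
static bool model_fail_alloc;

/* Models alloc_workqueue(): returns NULL on failure. */
static struct wq_model *model_alloc_workqueue(const char *name)
{
	struct wq_model *wq;

	if (model_fail_alloc)
		return NULL;
	wq = malloc(sizeof(*wq));
	if (wq)
		wq->name = name;
	return wq;
}

/* Models destroy_workqueue(); the real one also drains pending work. */
static void model_destroy_workqueue(struct wq_model *wq)
{
	free(wq);
}

static int video_init(struct video_model *video)
{
	video->async_wq = model_alloc_workqueue("uvcvideo");
	if (!video->async_wq)
		return -ENOMEM; /* allocation failure, not a bad argument */
	return 0;
}

/* Teardown counterpart of video_init(). */
static void video_cleanup(struct video_model *video)
{
	if (video->async_wq) {
		model_destroy_workqueue(video->async_wq);
		video->async_wq = NULL; /* a second call is then harmless */
	}
}
```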

> +
>  	video->uvc = uvc;
>  	video->fcc = V4L2_PIX_FMT_YUYV;
>  	video->bpp = 16;
Dan Vacura April 29, 2022, 6:51 p.m. UTC | #2
Hi Michael,

Thanks for this change; it improves the performance with the DWC3
controller on QCOM chips in an Android 5.10 kernel. I haven't tested the
scatter/gather path, so memcpy was used here via
uvc_video_encode_isoc(). I was able to get around a 30% improvement (fps
measured on the host side). I did modify the alloc to only set the
WQ_HIGHPRI flag.

On Tue, Apr 19, 2022 at 11:46:57PM +0300, Laurent Pinchart wrote:
> Hi Michael,
> 
> Thank you for the patch.
> 
> On Sun, Apr 03, 2022 at 01:39:12AM +0200, Michael Grzeschik wrote:
> > Likewise to the uvcvideo hostside driver, this patch is changing the
> > simple workqueue to an async_wq with higher priority. This ensures that
> > the worker will not be scheduled away while the video stream is handled.
> > 
> > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
> > +	video->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI, 0);
> 
> Unless I'm mistaken, an unbound work queue means that multiple CPUs will
> handle tasks in parallel. Is that safe ?

I found that with the WQ_UNBOUND flag I didn't see any performance
improvement over the baseline, perhaps related to CPU caching or
scheduling delays. I didn't notice any stability problems or concurrent
execution. Do you see any benefit to keeping the WQ_UNBOUND flag?

> 
> > +	if (!video->async_wq)
> > +		return -EINVAL;
> 
> -- 
> Regards,
> 
> Laurent Pinchart

Thanks,

Dan
Michael Grzeschik April 29, 2022, 8:01 p.m. UTC | #3
Hi Dan,
Hi Laurent,

On Fri, Apr 29, 2022 at 01:51:48PM -0500, Dan Vacura wrote:
>Thanks for this change it improves the performance with the DWC3
>controller on QCOM chips in an Android 5.10 kernel. I haven't tested the
>scatter/gather path, so memcpy was used here via
>uvc_video_encode_isoc(). I was able to get around 30% improvement (fps
>on host side). I did modify the alloc to only set the WQ_HIGHPRI flag.
>
>On Tue, Apr 19, 2022 at 11:46:57PM +0300, Laurent Pinchart wrote:
>> Thank you for the patch.
>>
>> On Sun, Apr 03, 2022 at 01:39:12AM +0200, Michael Grzeschik wrote:
>> > Likewise to the uvcvideo hostside driver, this patch is changing the
>> > simple workqueue to an async_wq with higher priority. This ensures that
>> > the worker will not be scheduled away while the video stream is handled.
>> >
>> > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
>> > +	video->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI, 0);
>>
>> Unless I'm mistaken, an unbound work queue means that multiple CPUs will
>> handle tasks in parallel. Is that safe ?
>
>I found that with the WQ_UNBOUND flag I didn't see any performance
>improvement to the baseline, perhaps related to cpu caching or
>scheduling delays. I didn't notice any stability problems or concurrent
>execution. Do you see any benefit to keeping the WQ_UNBOUND flag?

I actually copied this from drivers/media/usb/uvc/uvc_driver.c,
which also allocates the workqueue with WQ_UNBOUND.

See drivers/media/usb/uvc/uvc_driver.c, line 486:

	stream->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI,

In my tests, continuous streaming did not trigger any errors. In fact, if
this were unsafe, the issue would probably show up early, frequently
and obviously on multicore CPUs.

However, some users have recently reported issues when unplugging the
cable while streaming. I have to check whether this could be related.

>> > +	if (!video->async_wq)
>> > +		return -EINVAL;
>>
>> --
>> Regards,
>>
>> Laurent Pinchart
>
>Thanks,
>
>Dan
>
Michael Grzeschik May 2, 2022, 9 a.m. UTC | #4
Hi Dan,

On Fri, Apr 29, 2022 at 10:01:37PM +0200, Michael Grzeschik wrote:
>On Fri, Apr 29, 2022 at 01:51:48PM -0500, Dan Vacura wrote:
>>Thanks for this change it improves the performance with the DWC3
>>controller on QCOM chips in an Android 5.10 kernel. I haven't tested the
>>scatter/gather path, so memcpy was used here via
>>uvc_video_encode_isoc(). I was able to get around 30% improvement (fps
>>on host side). I did modify the alloc to only set the WQ_HIGHPRI flag.

I forgot to ask you to try the WQ_CPU_INTENSIVE flag. It would be
interesting to know whether you see any further improvement with it.

Regards,
Michael
Dan Vacura May 6, 2022, 9:49 p.m. UTC | #5
Hi Michael,

On Mon, May 02, 2022 at 11:00:03AM +0200, Michael Grzeschik wrote:
> Hi Dan,
> 
> On Fri, Apr 29, 2022 at 10:01:37PM +0200, Michael Grzeschik wrote:
> > On Fri, Apr 29, 2022 at 01:51:48PM -0500, Dan Vacura wrote:
> > > Thanks for this change it improves the performance with the DWC3
> > > controller on QCOM chips in an Android 5.10 kernel. I haven't tested the
> > > scatter/gather path, so memcpy was used here via
> > > uvc_video_encode_isoc(). I was able to get around 30% improvement (fps
> > > on host side). I did modify the alloc to only set the WQ_HIGHPRI flag.
> 
> I missed to ask you to try the WQ_CPU_INTENSIVE flag. It would be
> interesting if you can see further improvement.

I had some time to test this flag, and I couldn't find any discernible
difference whether it was set or not.

Regards,

Dan
Laurent Pinchart Sept. 28, 2022, 8:12 p.m. UTC | #6
Hi Michael,

On Fri, Apr 29, 2022 at 10:01:37PM +0200, Michael Grzeschik wrote:
> Hi Dan,
> Hi Laurent,
> 
> On Fri, Apr 29, 2022 at 01:51:48PM -0500, Dan Vacura wrote:
> > Thanks for this change it improves the performance with the DWC3
> > controller on QCOM chips in an Android 5.10 kernel. I haven't tested the
> > scatter/gather path, so memcpy was used here via
> > uvc_video_encode_isoc(). I was able to get around 30% improvement (fps
> > on host side). I did modify the alloc to only set the WQ_HIGHPRI flag.
> >
> > On Tue, Apr 19, 2022 at 11:46:57PM +0300, Laurent Pinchart wrote:
> >> Thank you for the patch.
> >>
> >> On Sun, Apr 03, 2022 at 01:39:12AM +0200, Michael Grzeschik wrote:
> >> > Likewise to the uvcvideo hostside driver, this patch is changing the
> >> > simple workqueue to an async_wq with higher priority. This ensures that
> >> > the worker will not be scheduled away while the video stream is handled.
> >> >
> >> > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
> >> > +	video->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI, 0);
> >>
> >> Unless I'm mistaken, an unbound work queue means that multiple CPUs will
> >> handle tasks in parallel. Is that safe ?
> >
> > I found that with the WQ_UNBOUND flag I didn't see any performance
> > improvement to the baseline, perhaps related to cpu caching or
> > scheduling delays. I didn't notice any stability problems or concurrent
> > execution. Do you see any benefit to keeping the WQ_UNBOUND flag?
> 
> I actually copied this from drivers/media/usb/uvc/uvc_driver.c ,
> which is also allocating the workqueue with WQ_UNBOUND.
> 
> Look into drivers/media/usb/uvc/uvc_driver.c + 486
> 
> 	stream->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI,

Just for the record, as a newer version of this patch has been merged,
the host-side uvcvideo driver is specifically made to handle multiple
work items in parallel. Each work item will essentially perform one or
multiple memcpy operations, with the size and offset calculated by the
code that dispatches the work items.
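
The dispatch pattern described above (a dispatcher computes a size and
offset for each copy, and each work item performs the memcpy() it was
given) can be modelled in userspace C. This is a hypothetical sketch:
struct copy_op, split_copy() and run_copy_op() are illustrative names,
not the uvcvideo driver's code, and each work item does exactly one copy
for simplicity. The operations are run sequentially in the usage below,
whereas the host driver queues work items that may run in parallel; that
is safe in this sketch because every operation covers a disjoint range.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one copy, computed by the dispatcher. */
struct copy_op {
	unsigned char *dst;
	const unsigned char *src;
	size_t len;
};

/*
 * Dispatcher: split a copy of 'total' bytes into chunks of at most
 * 'chunk' bytes. Returns the number of operations written to 'ops'
 * (at most 'max_ops'). Each operation covers a disjoint range.
 */
static size_t split_copy(struct copy_op *ops, size_t max_ops,
			 unsigned char *dst, const unsigned char *src,
			 size_t total, size_t chunk)
{
	size_t n = 0;
	size_t offset = 0;

	if (chunk == 0)
		return 0;

	while (offset < total && n < max_ops) {
		size_t len = total - offset < chunk ? total - offset : chunk;

		ops[n].dst = dst + offset;
		ops[n].src = src + offset;
		ops[n].len = len;
		offset += len;
		n++;
	}
	return n;
}

/* Work item body: one memcpy() on the range it was handed. */
static void run_copy_op(const struct copy_op *op)
{
	memcpy(op->dst, op->src, op->len);
}
```

For a 10-byte copy split into 4-byte chunks, the dispatcher produces
three operations of 4, 4 and 2 bytes.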

As Lucas separately commented, the UVC gadget driver has a single
work_struct, so there can't be any concurrency. We seem to be safe for
now.
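
The reason a single work_struct cannot race with itself is that queueing
is guarded by a per-item pending flag: queue_work() returns false and
does nothing if the item is already pending, and the workqueue
documentation states that any work item is executed by at most one
worker system-wide at a time, which also applies to WQ_UNBOUND queues.
The sketch below models only the pending flag with C11 atomics;
struct work_model, model_queue_work() and model_run_work() are
hypothetical names, and the real kernel implementation tracks
considerably more state.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a work item: only the PENDING state is kept. */
struct work_model {
	atomic_bool pending;
	int runs; /* how many times the handler body has executed */
};

/*
 * Models queue_work(): atomically set the pending flag. Returns true if
 * this call queued the item, false if it was already pending.
 */
static bool model_queue_work(struct work_model *w)
{
	return !atomic_exchange(&w->pending, true);
}

/*
 * Models a worker picking up the item: the pending flag is cleared
 * before the handler runs, so a requeue during execution schedules one
 * more run instead of being lost.
 */
static void model_run_work(struct work_model *w)
{
	atomic_store(&w->pending, false);
	w->runs++;
}
```

Several queue_work() calls made while the item is still pending are thus
coalesced into a single execution of the handler.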

> In my tests, continous streaming did not trigger any errors. In fact if
> this would be unsafe, the issue would probably trigger early, numerous
> and obvious on multicore cpus.
> 
> However, some users seem to have seen recent issues on unplugging the
> cable while streaming. I have to check if this could be related.
> 
> >> > +	if (!video->async_wq)
> >> > +		return -EINVAL;

Patch

diff --git a/drivers/usb/gadget/function/uvc.h b/drivers/usb/gadget/function/uvc.h
index c3607a32b98624..ab537acdae3184 100644
--- a/drivers/usb/gadget/function/uvc.h
+++ b/drivers/usb/gadget/function/uvc.h
@@ -86,6 +86,7 @@  struct uvc_video {
 	struct usb_ep *ep;
 
 	struct work_struct pump;
+	struct workqueue_struct *async_wq;
 
 	/* Frame parameters */
 	u8 bpp;
diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
index a2c78690c5c288..9b1488f7abd736 100644
--- a/drivers/usb/gadget/function/uvc_v4l2.c
+++ b/drivers/usb/gadget/function/uvc_v4l2.c
@@ -170,7 +170,7 @@  uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
 		return ret;
 
 	if (uvc->state == UVC_STATE_STREAMING)
-		schedule_work(&video->pump);
+		queue_work(video->async_wq, &video->pump);
 
 	return ret;
 }
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 7f59a0c4740209..b1075e23a61010 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -269,7 +269,7 @@  uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
 	spin_unlock_irqrestore(&video->req_lock, flags);
 
 	if (uvc->state == UVC_STATE_STREAMING)
-		schedule_work(&video->pump);
+		queue_work(video->async_wq, &video->pump);
 }
 
 static int
@@ -469,7 +469,7 @@  int uvcg_video_enable(struct uvc_video *video, int enable)
 
 	video->req_int_count = 0;
 
-	schedule_work(&video->pump);
+	queue_work(video->async_wq, &video->pump);
 
 	return ret;
 }
@@ -483,6 +483,11 @@  int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
 	spin_lock_init(&video->req_lock);
 	INIT_WORK(&video->pump, uvcg_video_pump);
 
+	/* Allocate a stream specific work queue for asynchronous tasks. */
+	video->async_wq = alloc_workqueue("uvcvideo", WQ_UNBOUND | WQ_HIGHPRI, 0);
+	if (!video->async_wq)
+		return -EINVAL;
+
 	video->uvc = uvc;
 	video->fcc = V4L2_PIX_FMT_YUYV;
 	video->bpp = 16;