diff mbox series

usb: gadget: uvc: use pump call conditionally

Message ID 20211202005852.3538102-1-m.grzeschik@pengutronix.de (mailing list archive)
State New, archived
Headers show
Series usb: gadget: uvc: use pump call conditionally | expand

Commit Message

Michael Grzeschik Dec. 2, 2021, 12:58 a.m. UTC
Preparing the usb request is not very expensive, when using
scatter gather. In that case we can even remove the overhead
of the extra pump worker and call the pump function directly.

Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
---
 drivers/usb/gadget/function/uvc_v4l2.c  |  8 +++++--
 drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------
 drivers/usb/gadget/function/uvc_video.h |  2 ++
 3 files changed, 30 insertions(+), 8 deletions(-)

Comments

Greg KH Dec. 3, 2021, 12:55 p.m. UTC | #1
On Thu, Dec 02, 2021 at 01:58:52AM +0100, Michael Grzeschik wrote:
> Preparing the usb request is not very expensive, when using
> scatter gather. In that case we can even remove the overhead
> of the extra pump worker and call the pump function directly.
> 
> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
> ---
>  drivers/usb/gadget/function/uvc_v4l2.c  |  8 +++++--
>  drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------
>  drivers/usb/gadget/function/uvc_video.h |  2 ++
>  3 files changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
> index a2c78690c5c288..020b4adc7840e0 100644
> --- a/drivers/usb/gadget/function/uvc_v4l2.c
> +++ b/drivers/usb/gadget/function/uvc_v4l2.c
> @@ -169,8 +169,12 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
>  	if (ret < 0)
>  		return ret;
>  
> -	if (uvc->state == UVC_STATE_STREAMING)
> -		schedule_work(&video->pump);
> +	if (uvc->state == UVC_STATE_STREAMING) {
> +		if (!video->queue.use_sg)
> +			schedule_work(&video->pump);
> +		else
> +			uvcg_video_pump(video);

Wouldn't it be easier to understand this if you flip the if test around:
		if (video->queue.use_sg)
			uvcg_video_pump(video);
		else
			schedule_work(&video->pump);
?

Negagive logic is never fun to read...

Also, are you sure that sg really is not expensive on all systems?  What
did you test this on, and what was the results?

thanks,

greg k-h
Michael Grzeschik Dec. 5, 2021, 9:49 p.m. UTC | #2
On Fri, Dec 03, 2021 at 01:55:01PM +0100, Greg KH wrote:
>On Thu, Dec 02, 2021 at 01:58:52AM +0100, Michael Grzeschik wrote:
>> Preparing the usb request is not very expensive, when using
>> scatter gather. In that case we can even remove the overhead
>> of the extra pump worker and call the pump function directly.
>>
>> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
>> ---
>>  drivers/usb/gadget/function/uvc_v4l2.c  |  8 +++++--
>>  drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------
>>  drivers/usb/gadget/function/uvc_video.h |  2 ++
>>  3 files changed, 30 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
>> index a2c78690c5c288..020b4adc7840e0 100644
>> --- a/drivers/usb/gadget/function/uvc_v4l2.c
>> +++ b/drivers/usb/gadget/function/uvc_v4l2.c
>> @@ -169,8 +169,12 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
>>  	if (ret < 0)
>>  		return ret;
>>
>> -	if (uvc->state == UVC_STATE_STREAMING)
>> -		schedule_work(&video->pump);
>> +	if (uvc->state == UVC_STATE_STREAMING) {
>> +		if (!video->queue.use_sg)
>> +			schedule_work(&video->pump);
>> +		else
>> +			uvcg_video_pump(video);
>
>Wouldn't it be easier to understand this if you flip the if test around:
>		if (video->queue.use_sg)
>			uvcg_video_pump(video);
>		else
>			schedule_work(&video->pump);
>?
>
>Negagive logic is never fun to read...

Yes, you are not wrong.

>Also, are you sure that sg really is not expensive on all systems?  What
>did you test this on, and what was the results?

I tested it on an zynqmp arm64 machine. I tried to test the sg case on
an 32 bit IMX with chipidea, but the driver was complaining a lot about
"not page aligned sg buffers". So if needed, I would first need to find
a working machine to compare this with.

However I would think that assigning some pointers on a scatterlist
instead of doing memcpy of 1024 bytes should be less expensive.

Regards,
Michael
Greg KH Dec. 6, 2021, 7:34 a.m. UTC | #3
On Sun, Dec 05, 2021 at 10:49:58PM +0100, Michael Grzeschik wrote:
> On Fri, Dec 03, 2021 at 01:55:01PM +0100, Greg KH wrote:
> > On Thu, Dec 02, 2021 at 01:58:52AM +0100, Michael Grzeschik wrote:
> > > Preparing the usb request is not very expensive, when using
> > > scatter gather. In that case we can even remove the overhead
> > > of the extra pump worker and call the pump function directly.
> > > 
> > > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
> > > ---
> > >  drivers/usb/gadget/function/uvc_v4l2.c  |  8 +++++--
> > >  drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------
> > >  drivers/usb/gadget/function/uvc_video.h |  2 ++
> > >  3 files changed, 30 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
> > > index a2c78690c5c288..020b4adc7840e0 100644
> > > --- a/drivers/usb/gadget/function/uvc_v4l2.c
> > > +++ b/drivers/usb/gadget/function/uvc_v4l2.c
> > > @@ -169,8 +169,12 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
> > >  	if (ret < 0)
> > >  		return ret;
> > > 
> > > -	if (uvc->state == UVC_STATE_STREAMING)
> > > -		schedule_work(&video->pump);
> > > +	if (uvc->state == UVC_STATE_STREAMING) {
> > > +		if (!video->queue.use_sg)
> > > +			schedule_work(&video->pump);
> > > +		else
> > > +			uvcg_video_pump(video);
> > 
> > Wouldn't it be easier to understand this if you flip the if test around:
> > 		if (video->queue.use_sg)
> > 			uvcg_video_pump(video);
> > 		else
> > 			schedule_work(&video->pump);
> > ?
> > 
> > Negagive logic is never fun to read...
> 
> Yes, you are not wrong.
> 
> > Also, are you sure that sg really is not expensive on all systems?  What
> > did you test this on, and what was the results?
> 
> I tested it on an zynqmp arm64 machine. I tried to test the sg case on
> an 32 bit IMX with chipidea, but the driver was complaining a lot about
> "not page aligned sg buffers". So if needed, I would first need to find
> a working machine to compare this with.
> 
> However I would think that assigning some pointers on a scatterlist
> instead of doing memcpy of 1024 bytes should be less expensive.

Not true on many systems, memcpy can be _very_ fast, especially for
small amounts like 1024 bytes.  So please, measure this to be sure.

thanks,

greg k-h
Laurent Pinchart Dec. 6, 2021, 12:15 p.m. UTC | #4
Hello,

On Mon, Dec 06, 2021 at 08:34:17AM +0100, Greg KH wrote:
> On Sun, Dec 05, 2021 at 10:49:58PM +0100, Michael Grzeschik wrote:
> > On Fri, Dec 03, 2021 at 01:55:01PM +0100, Greg KH wrote:
> > > On Thu, Dec 02, 2021 at 01:58:52AM +0100, Michael Grzeschik wrote:
> > > > Preparing the usb request is not very expensive, when using
> > > > scatter gather. In that case we can even remove the overhead
> > > > of the extra pump worker and call the pump function directly.
> > > > 
> > > > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
> > > > ---
> > > >  drivers/usb/gadget/function/uvc_v4l2.c  |  8 +++++--
> > > >  drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------
> > > >  drivers/usb/gadget/function/uvc_video.h |  2 ++
> > > >  3 files changed, 30 insertions(+), 8 deletions(-)
> > > > 
> > > > diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
> > > > index a2c78690c5c288..020b4adc7840e0 100644
> > > > --- a/drivers/usb/gadget/function/uvc_v4l2.c
> > > > +++ b/drivers/usb/gadget/function/uvc_v4l2.c
> > > > @@ -169,8 +169,12 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
> > > >  	if (ret < 0)
> > > >  		return ret;
> > > > 
> > > > -	if (uvc->state == UVC_STATE_STREAMING)
> > > > -		schedule_work(&video->pump);
> > > > +	if (uvc->state == UVC_STATE_STREAMING) {
> > > > +		if (!video->queue.use_sg)
> > > > +			schedule_work(&video->pump);
> > > > +		else
> > > > +			uvcg_video_pump(video);
> > > 
> > > Wouldn't it be easier to understand this if you flip the if test around:
> > > 		if (video->queue.use_sg)
> > > 			uvcg_video_pump(video);
> > > 		else
> > > 			schedule_work(&video->pump);
> > > ?
> > > 
> > > Negagive logic is never fun to read...
> > 
> > Yes, you are not wrong.
> > 
> > > Also, are you sure that sg really is not expensive on all systems?  What
> > > did you test this on, and what was the results?
> > 
> > I tested it on an zynqmp arm64 machine. I tried to test the sg case on
> > an 32 bit IMX with chipidea, but the driver was complaining a lot about
> > "not page aligned sg buffers". So if needed, I would first need to find
> > a working machine to compare this with.
> > 
> > However I would think that assigning some pointers on a scatterlist
> > instead of doing memcpy of 1024 bytes should be less expensive.
> 
> Not true on many systems, memcpy can be _very_ fast, especially for
> small amounts like 1024 bytes.  So please, measure this to be sure.

We've moved memcpy()s to a work queue in the host UVC driver for
high-bandwidth devices, which brough massive performance improvements
(if I recall correctly, at least partly due to the parallelization of
memcpy operations). I'm not sure we have measured performances in the
gadget driver with the same level of accuracy and care though.

Michael, do you plan to make measurements ?
Michael Grzeschik Dec. 6, 2021, 1:30 p.m. UTC | #5
On Mon, Dec 06, 2021 at 02:15:02PM +0200, Laurent Pinchart wrote:
>Hello,
>
>On Mon, Dec 06, 2021 at 08:34:17AM +0100, Greg KH wrote:
>> On Sun, Dec 05, 2021 at 10:49:58PM +0100, Michael Grzeschik wrote:
>> > On Fri, Dec 03, 2021 at 01:55:01PM +0100, Greg KH wrote:
>> > > On Thu, Dec 02, 2021 at 01:58:52AM +0100, Michael Grzeschik wrote:
>> > > > Preparing the usb request is not very expensive, when using
>> > > > scatter gather. In that case we can even remove the overhead
>> > > > of the extra pump worker and call the pump function directly.
>> > > >
>> > > > Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
>> > > > ---
>> > > >  drivers/usb/gadget/function/uvc_v4l2.c  |  8 +++++--
>> > > >  drivers/usb/gadget/function/uvc_video.c | 28 +++++++++++++++++++------
>> > > >  drivers/usb/gadget/function/uvc_video.h |  2 ++
>> > > >  3 files changed, 30 insertions(+), 8 deletions(-)
>> > > >
>> > > > diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
>> > > > index a2c78690c5c288..020b4adc7840e0 100644
>> > > > --- a/drivers/usb/gadget/function/uvc_v4l2.c
>> > > > +++ b/drivers/usb/gadget/function/uvc_v4l2.c
>> > > > @@ -169,8 +169,12 @@ uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
>> > > >  	if (ret < 0)
>> > > >  		return ret;
>> > > >
>> > > > -	if (uvc->state == UVC_STATE_STREAMING)
>> > > > -		schedule_work(&video->pump);
>> > > > +	if (uvc->state == UVC_STATE_STREAMING) {
>> > > > +		if (!video->queue.use_sg)
>> > > > +			schedule_work(&video->pump);
>> > > > +		else
>> > > > +			uvcg_video_pump(video);
>> > >
>> > > Wouldn't it be easier to understand this if you flip the if test around:
>> > > 		if (video->queue.use_sg)
>> > > 			uvcg_video_pump(video);
>> > > 		else
>> > > 			schedule_work(&video->pump);
>> > > ?
>> > >
>> > > Negagive logic is never fun to read...
>> >
>> > Yes, you are not wrong.
>> >
>> > > Also, are you sure that sg really is not expensive on all systems?  What
>> > > did you test this on, and what was the results?
>> >
>> > I tested it on an zynqmp arm64 machine. I tried to test the sg case on
>> > an 32 bit IMX with chipidea, but the driver was complaining a lot about
>> > "not page aligned sg buffers". So if needed, I would first need to find
>> > a working machine to compare this with.
>> >
>> > However I would think that assigning some pointers on a scatterlist
>> > instead of doing memcpy of 1024 bytes should be less expensive.
>>
>> Not true on many systems, memcpy can be _very_ fast, especially for
>> small amounts like 1024 bytes.  So please, measure this to be sure.
>
>We've moved memcpy()s to a work queue in the host UVC driver for
>high-bandwidth devices, which brough massive performance improvements
>(if I recall correctly, at least partly due to the parallelization of
>memcpy operations). I'm not sure we have measured performances in the
>gadget driver with the same level of accuracy and care though.
>
>Michael, do you plan to make measurements ?

I would do it, but only if I can find time and hardware to test it on.
For now, my agenda says to get the other patches mainline first. So yes,
but only in some later undefined time.

Michael
diff mbox series

Patch

diff --git a/drivers/usb/gadget/function/uvc_v4l2.c b/drivers/usb/gadget/function/uvc_v4l2.c
index a2c78690c5c288..020b4adc7840e0 100644
--- a/drivers/usb/gadget/function/uvc_v4l2.c
+++ b/drivers/usb/gadget/function/uvc_v4l2.c
@@ -169,8 +169,12 @@  uvc_v4l2_qbuf(struct file *file, void *fh, struct v4l2_buffer *b)
 	if (ret < 0)
 		return ret;
 
-	if (uvc->state == UVC_STATE_STREAMING)
-		schedule_work(&video->pump);
+	if (uvc->state == UVC_STATE_STREAMING) {
+		if (!video->queue.use_sg)
+			schedule_work(&video->pump);
+		else
+			uvcg_video_pump(video);
+	}
 
 	return ret;
 }
diff --git a/drivers/usb/gadget/function/uvc_video.c b/drivers/usb/gadget/function/uvc_video.c
index 7f59a0c4740209..933c2b831fe747 100644
--- a/drivers/usb/gadget/function/uvc_video.c
+++ b/drivers/usb/gadget/function/uvc_video.c
@@ -268,8 +268,12 @@  uvc_video_complete(struct usb_ep *ep, struct usb_request *req)
 	list_add_tail(&req->list, &video->req_free);
 	spin_unlock_irqrestore(&video->req_lock, flags);
 
-	if (uvc->state == UVC_STATE_STREAMING)
-		schedule_work(&video->pump);
+	if (uvc->state == UVC_STATE_STREAMING) {
+		if (!queue->use_sg)
+			schedule_work(&video->pump);
+		else
+			uvcg_video_pump(video);
+	}
 }
 
 static int
@@ -359,9 +363,8 @@  uvc_video_alloc_requests(struct uvc_video *video)
  * This function fills the available USB requests (listed in req_free) with
  * video data from the queued buffers.
  */
-static void uvcg_video_pump(struct work_struct *work)
+void uvcg_video_pump(struct uvc_video *video)
 {
-	struct uvc_video *video = container_of(work, struct uvc_video, pump);
 	struct uvc_video_queue *queue = &video->queue;
 	struct usb_request *req = NULL;
 	struct uvc_buffer *buf;
@@ -427,6 +430,14 @@  static void uvcg_video_pump(struct work_struct *work)
 	return;
 }
 
+
+static void uvcg_video_pump_t(struct work_struct *work)
+{
+	struct uvc_video *video = container_of(work, struct uvc_video, pump);
+
+	uvcg_video_pump(video);
+}
+
 /*
  * Enable or disable the video stream.
  */
@@ -469,7 +480,10 @@  int uvcg_video_enable(struct uvc_video *video, int enable)
 
 	video->req_int_count = 0;
 
-	schedule_work(&video->pump);
+	if (!video->queue.use_sg)
+		schedule_work(&video->pump);
+	else
+		uvcg_video_pump(video);
 
 	return ret;
 }
@@ -481,7 +495,9 @@  int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc)
 {
 	INIT_LIST_HEAD(&video->req_free);
 	spin_lock_init(&video->req_lock);
-	INIT_WORK(&video->pump, uvcg_video_pump);
+
+	if (!video->queue.use_sg)
+		INIT_WORK(&video->pump, uvcg_video_pump_t);
 
 	video->uvc = uvc;
 	video->fcc = V4L2_PIX_FMT_YUYV;
diff --git a/drivers/usb/gadget/function/uvc_video.h b/drivers/usb/gadget/function/uvc_video.h
index 03adeefa343b71..d4ad61dd568974 100644
--- a/drivers/usb/gadget/function/uvc_video.h
+++ b/drivers/usb/gadget/function/uvc_video.h
@@ -14,6 +14,8 @@ 
 
 struct uvc_video;
 
+void uvcg_video_pump(struct uvc_video *video);
+
 int uvcg_video_enable(struct uvc_video *video, int enable);
 
 int uvcg_video_init(struct uvc_video *video, struct uvc_device *uvc);