diff mbox series

[3/3] media: stk1160: use dma_alloc_noncontiguous API

Message ID 20220111065505.6323-4-dafna.hirschfeld@collabora.com (mailing list archive)
State New, archived
Headers show
Series media: stk1160: allocate urb buffs with the DMA noncontiguous API | expand

Commit Message

Dafna Hirschfeld Jan. 11, 2022, 6:55 a.m. UTC
Replace the urb buffers allocation to
use the noncontiguous API. This improve performance
on Arm.
The noncontiguous API require handling
synchronization.
This commit is similar to the one sent to
uvc: [1]

[1] https://lkml.org/lkml/2021/3/12/1506

Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
---
 drivers/media/usb/stk1160/stk1160-v4l.c   |   3 +
 drivers/media/usb/stk1160/stk1160-video.c | 109 +++++++++++++---------
 drivers/media/usb/stk1160/stk1160.h       |  10 ++
 3 files changed, 79 insertions(+), 43 deletions(-)

Comments

Christoph Hellwig Jan. 11, 2022, 8:13 a.m. UTC | #1
On Tue, Jan 11, 2022 at 08:55:05AM +0200, Dafna Hirschfeld wrote:
> Replace the urb buffers allocation to
> use the noncontiguous API. This improve performance
> on Arm.
> The noncontiguous API require handling
> synchronization.
> This commit is similar to the one sent to
> uvc: [1]

Strange formatting.  I'd flow this as:

Replace the urb buffers allocation to use the noncontiguous API.  This
improve performance on ARM plattform (XXX: insert why here)
The noncontiguous API requires the driver to handle synchronization.
This commit is similar to this one for the uvc driver:

    https://lkml.org/lkml/2021/3/12/1506

> @@ -310,6 +311,9 @@ static void stk1160_isoc_irq(struct urb *urb)
>  		return;
>  	}
>  
> +	dma_sync_sgtable_for_cpu(stk1160_get_dmadev(dev), stk_urb->sgt, DMA_FROM_DEVICE);
> +	invalidate_kernel_vmap_range(stk_urb->transfer_buffer, urb->transfer_buffer_length);

Besisdes the unreadably long lines, I'd invalidate the vmap range before
the direct mapping range.
Dafna Hirschfeld Jan. 11, 2022, 8:50 a.m. UTC | #2
On 11.01.22 10:13, Christoph Hellwig wrote:
> On Tue, Jan 11, 2022 at 08:55:05AM +0200, Dafna Hirschfeld wrote:
>> Replace the urb buffers allocation to
>> use the noncontiguous API. This improve performance
>> on Arm.
>> The noncontiguous API require handling
>> synchronization.
>> This commit is similar to the one sent to
>> uvc: [1]
> 
> Strange formatting.  I'd flow this as:
> 
> Replace the urb buffers allocation to use the noncontiguous API.  This
> improve performance on ARM plattform (XXX: insert why here)
> The noncontiguous API requires the driver to handle synchronization.
> This commit is similar to this one for the uvc driver:
> 
>      https://lkml.org/lkml/2021/3/12/1506

I'll reformulate and explain.

> 
>> @@ -310,6 +311,9 @@ static void stk1160_isoc_irq(struct urb *urb)
>>   		return;
>>   	}
>>   
>> +	dma_sync_sgtable_for_cpu(stk1160_get_dmadev(dev), stk_urb->sgt, DMA_FROM_DEVICE);
>> +	invalidate_kernel_vmap_range(stk_urb->transfer_buffer, urb->transfer_buffer_length);
> 
> Besisdes the unreadably long lines, I'd invalidate the vmap range before
> the direct mapping range.

I'll send v2 with shorter lines. (the official limit is now 100 char which I still follow).
I can send v2 with limit of 80.

You mean you would call "invalidate_kernel_vmap_range" before "dma_sync_sgtable_for_cpu" ?

Thanks,
Dafna

>
Christoph Hellwig Jan. 11, 2022, 8:52 a.m. UTC | #3
On Tue, Jan 11, 2022 at 10:50:50AM +0200, Dafna Hirschfeld wrote:
> I'll send v2 with shorter lines. (the official limit is now 100 char which I still follow).

No.  It is 80 lines with an exception to go over it if it sigificantly
improves readability.

> You mean you would call "invalidate_kernel_vmap_range" before "dma_sync_sgtable_for_cpu" ?

Yes.
Dafna Hirschfeld Jan. 11, 2022, 8:55 a.m. UTC | #4
On 11.01.22 10:52, Christoph Hellwig wrote:
> On Tue, Jan 11, 2022 at 10:50:50AM +0200, Dafna Hirschfeld wrote:
>> I'll send v2 with shorter lines. (the official limit is now 100 char which I still follow).
> 
> No.  It is 80 lines with an exception to go over it if it sigificantly
> improves readability.

Ok, didn't know that.

> 
>> You mean you would call "invalidate_kernel_vmap_range" before "dma_sync_sgtable_for_cpu" ?
> 
> Yes.

Could you explain why?

>
Christoph Hellwig Jan. 11, 2022, 8:59 a.m. UTC | #5
On Tue, Jan 11, 2022 at 10:55:47AM +0200, Dafna Hirschfeld wrote:
>
>
> On 11.01.22 10:52, Christoph Hellwig wrote:
>> On Tue, Jan 11, 2022 at 10:50:50AM +0200, Dafna Hirschfeld wrote:
>>> I'll send v2 with shorter lines. (the official limit is now 100 char which I still follow).
>>
>> No.  It is 80 lines with an exception to go over it if it sigificantly
>> improves readability.
>
> Ok, didn't know that.
>
>>
>>> You mean you would call "invalidate_kernel_vmap_range" before "dma_sync_sgtable_for_cpu" ?
>>
>> Yes.
>
> Could you explain why?

the vmap range is the one actually use for cpu access and thus most
prone for speculation, so I'd invalidate it first.  It probably does
not matter to much, but that order looks more sensible.
Ezequiel Garcia Jan. 11, 2022, 12:38 p.m. UTC | #6
Hi Dafna,

Very nice work.

I specially like all the testing that you mentioned in the cover-letter.

Back in the day, there were a few users trying to use STK1160 (Easycap)
with Beaglebone boards, without success due to lack of USB throughput.

If anything else, I think it's good to see additional noncontiguous API users.

On Tue, Jan 11, 2022 at 08:55:05AM +0200, Dafna Hirschfeld wrote:
> Replace the urb buffers allocation to
> use the noncontiguous API. This improve performance
> on Arm.
> The noncontiguous API require handling
> synchronization.
> This commit is similar to the one sent to
> uvc: [1]
> 
> [1] https://lkml.org/lkml/2021/3/12/1506
> 

This commit description needs lots of attention. In particular,
please add the test results here.

> Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
> ---
>  drivers/media/usb/stk1160/stk1160-v4l.c   |   3 +
>  drivers/media/usb/stk1160/stk1160-video.c | 109 +++++++++++++---------
>  drivers/media/usb/stk1160/stk1160.h       |  10 ++
>  3 files changed, 79 insertions(+), 43 deletions(-)
> 
[..]
> @@ -501,7 +524,7 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
>  
>  free_i_bufs:
>  	/* Save the allocated buffers so far, so we can properly free them */
> -	dev->isoc_ctl.num_bufs = i+1;
> +	dev->isoc_ctl.num_bufs = i;

This looks like a separate fix, similar to 1/3 ?

>  	stk1160_free_isoc(dev);
>  	return -ENOMEM;
>  }
> diff --git a/drivers/media/usb/stk1160/stk1160.h b/drivers/media/usb/stk1160/stk1160.h
> index 1ffca1343d88..52bea7815ae5 100644
> --- a/drivers/media/usb/stk1160/stk1160.h
> +++ b/drivers/media/usb/stk1160/stk1160.h
> @@ -16,6 +16,8 @@
>  #include <media/videobuf2-v4l2.h>
>  #include <media/v4l2-device.h>
>  #include <media/v4l2-ctrls.h>
> +#include <linux/usb.h>
> +#include <linux/usb/hcd.h>
>  
>  #define STK1160_VERSION		"0.9.5"
>  #define STK1160_VERSION_NUM	0x000905
> @@ -87,6 +89,9 @@ struct stk1160_buffer {
>  struct stk1160_urb {
>  	struct urb *urb;
>  	char *transfer_buffer;
> +	struct sg_table *sgt;
> +	struct stk1160 *dev;
> +	dma_addr_t dma;
>  };
>  
>  struct stk1160_isoc_ctl {
> @@ -190,3 +195,8 @@ void stk1160_select_input(struct stk1160 *dev);
>  
>  /* Provided by stk1160-ac97.c */
>  void stk1160_ac97_setup(struct stk1160 *dev);
> +
> +static inline struct device *stk1160_get_dmadev(struct stk1160 *dev)
> +{
> +	return bus_to_hcd(dev->udev->bus)->self.sysdev;

This function looks truly horrible, is there no other way to get the device ?

I suppose this function return something different than (struct stk1160*)dev->dev ?

Thanks,
Ezequiel
Sergey Senozhatsky Jan. 12, 2022, 9:25 a.m. UTC | #7
On (22/01/11 09:59), Christoph Hellwig wrote:
[..]
> >>> You mean you would call "invalidate_kernel_vmap_range" before "dma_sync_sgtable_for_cpu" ?
> >>
> >> Yes.
> >
> > Could you explain why?
> 
> the vmap range is the one actually use for cpu access and thus most
> prone for speculation, so I'd invalidate it first.  It probably does
> not matter to much, but that order looks more sensible.

Shall we do the same in videobuf2?

----

diff --git a/drivers/media/common/videobuf2/videobuf2-dma-contig.c b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
index 7c4096e62173..0e3f264122af 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-contig.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-contig.c
@@ -132,12 +132,12 @@ static void vb2_dc_prepare(void *buf_priv)
 	if (!buf->non_coherent_mem)
 		return;
 
-	/* For both USERPTR and non-coherent MMAP */
-	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
-
 	/* Non-coherent MMAP only */
 	if (buf->vaddr)
 		flush_kernel_vmap_range(buf->vaddr, buf->size);
+
+	/* For both USERPTR and non-coherent MMAP */
+	dma_sync_sgtable_for_device(buf->dev, sgt, buf->dma_dir);
 }
 
 static void vb2_dc_finish(void *buf_priv)
@@ -152,12 +152,12 @@ static void vb2_dc_finish(void *buf_priv)
 	if (!buf->non_coherent_mem)
 		return;
 
-	/* For both USERPTR and non-coherent MMAP */
-	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
-
 	/* Non-coherent MMAP only */
 	if (buf->vaddr)
 		invalidate_kernel_vmap_range(buf->vaddr, buf->size);
+
+	/* For both USERPTR and non-coherent MMAP */
+	dma_sync_sgtable_for_cpu(buf->dev, sgt, buf->dma_dir);
 }
 
 /*********************************************/
Dafna Hirschfeld Jan. 24, 2022, 2:43 p.m. UTC | #8
On 11.01.22 14:38, Ezequiel Garcia wrote:
> Hi Dafna,
> 
> Very nice work.
> 
> I specially like all the testing that you mentioned in the cover-letter.
> 
> Back in the day, there were a few users trying to use STK1160 (Easycap)
> with Beaglebone boards, without success due to lack of USB throughput.

Hi, what do you mean by "USB throughput" ?
I see that the FPS does not change with this patchset and stands on about 25 frames/sec
So this patchset only improve cpu time.

> 
> If anything else, I think it's good to see additional noncontiguous API users.
> 
> On Tue, Jan 11, 2022 at 08:55:05AM +0200, Dafna Hirschfeld wrote:
>> Replace the urb buffers allocation to
>> use the noncontiguous API. This improve performance
>> on Arm.
>> The noncontiguous API require handling
>> synchronization.
>> This commit is similar to the one sent to
>> uvc: [1]
>>
>> [1] https://lkml.org/lkml/2021/3/12/1506
>>
> 
> This commit description needs lots of attention. In particular,
> please add the test results here.
> 
>> Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
>> ---
>>   drivers/media/usb/stk1160/stk1160-v4l.c   |   3 +
>>   drivers/media/usb/stk1160/stk1160-video.c | 109 +++++++++++++---------
>>   drivers/media/usb/stk1160/stk1160.h       |  10 ++
>>   3 files changed, 79 insertions(+), 43 deletions(-)
>>
> [..]
>> @@ -501,7 +524,7 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
>>   
>>   free_i_bufs:
>>   	/* Save the allocated buffers so far, so we can properly free them */
>> -	dev->isoc_ctl.num_bufs = i+1;
>> +	dev->isoc_ctl.num_bufs = i;
> 
> This looks like a separate fix, similar to 1/3 ?
> 
>>   	stk1160_free_isoc(dev);
>>   	return -ENOMEM;
>>   }
>> diff --git a/drivers/media/usb/stk1160/stk1160.h b/drivers/media/usb/stk1160/stk1160.h
>> index 1ffca1343d88..52bea7815ae5 100644
>> --- a/drivers/media/usb/stk1160/stk1160.h
>> +++ b/drivers/media/usb/stk1160/stk1160.h
>> @@ -16,6 +16,8 @@
>>   #include <media/videobuf2-v4l2.h>
>>   #include <media/v4l2-device.h>
>>   #include <media/v4l2-ctrls.h>
>> +#include <linux/usb.h>
>> +#include <linux/usb/hcd.h>
>>   
>>   #define STK1160_VERSION		"0.9.5"
>>   #define STK1160_VERSION_NUM	0x000905
>> @@ -87,6 +89,9 @@ struct stk1160_buffer {
>>   struct stk1160_urb {
>>   	struct urb *urb;
>>   	char *transfer_buffer;
>> +	struct sg_table *sgt;
>> +	struct stk1160 *dev;
>> +	dma_addr_t dma;
>>   };
>>   
>>   struct stk1160_isoc_ctl {
>> @@ -190,3 +195,8 @@ void stk1160_select_input(struct stk1160 *dev);
>>   
>>   /* Provided by stk1160-ac97.c */
>>   void stk1160_ac97_setup(struct stk1160 *dev);
>> +
>> +static inline struct device *stk1160_get_dmadev(struct stk1160 *dev)
>> +{
>> +	return bus_to_hcd(dev->udev->bus)->self.sysdev;
> 
> This function looks truly horrible, is there no other way to get the device ?
> 
> I suppose this function return something different than (struct stk1160*)dev->dev ?

I remember I tried using this device and got an error that it is not cable of doing dma
allocations or something like that. The function "stk1160_get_dmadev" is a copy-paste from the
uvc equivalent:

static inline struct device *uvc_stream_to_dmadev(struct uvc_streaming *stream)
{
         return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
}

Thanks,
Dafna

> 
> Thanks,
> Ezequiel
>
Dafna Hirschfeld Jan. 24, 2022, 4:02 p.m. UTC | #9
On 24.01.22 16:43, Dafna Hirschfeld wrote:
> 
> 
> On 11.01.22 14:38, Ezequiel Garcia wrote:
>> Hi Dafna,
>>
>> Very nice work.
>>
>> I specially like all the testing that you mentioned in the cover-letter.
>>
>> Back in the day, there were a few users trying to use STK1160 (Easycap)
>> with Beaglebone boards, without success due to lack of USB throughput.
> 
> Hi, what do you mean by "USB throughput" ?
> I see that the FPS does not change with this patchset and stands on about 25 frames/sec
> So this patchset only improve cpu time.
> 
>>
>> If anything else, I think it's good to see additional noncontiguous API users.
>>
>> On Tue, Jan 11, 2022 at 08:55:05AM +0200, Dafna Hirschfeld wrote:
>>> Replace the urb buffers allocation to
>>> use the noncontiguous API. This improve performance
>>> on Arm.
>>> The noncontiguous API require handling
>>> synchronization.
>>> This commit is similar to the one sent to
>>> uvc: [1]
>>>
>>> [1] https://lkml.org/lkml/2021/3/12/1506
>>>
>>
>> This commit description needs lots of attention. In particular,
>> please add the test results here.
>>
>>> Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com>
>>> ---
>>>   drivers/media/usb/stk1160/stk1160-v4l.c   |   3 +
>>>   drivers/media/usb/stk1160/stk1160-video.c | 109 +++++++++++++---------
>>>   drivers/media/usb/stk1160/stk1160.h       |  10 ++
>>>   3 files changed, 79 insertions(+), 43 deletions(-)
>>>
>> [..]
>>> @@ -501,7 +524,7 @@ int stk1160_alloc_isoc(struct stk1160 *dev)
>>>   free_i_bufs:
>>>       /* Save the allocated buffers so far, so we can properly free them */
>>> -    dev->isoc_ctl.num_bufs = i+1;
>>> +    dev->isoc_ctl.num_bufs = i;
>>
>> This looks like a separate fix, similar to 1/3 ?
>>
>>>       stk1160_free_isoc(dev);
>>>       return -ENOMEM;
>>>   }
>>> diff --git a/drivers/media/usb/stk1160/stk1160.h b/drivers/media/usb/stk1160/stk1160.h
>>> index 1ffca1343d88..52bea7815ae5 100644
>>> --- a/drivers/media/usb/stk1160/stk1160.h
>>> +++ b/drivers/media/usb/stk1160/stk1160.h
>>> @@ -16,6 +16,8 @@
>>>   #include <media/videobuf2-v4l2.h>
>>>   #include <media/v4l2-device.h>
>>>   #include <media/v4l2-ctrls.h>
>>> +#include <linux/usb.h>
>>> +#include <linux/usb/hcd.h>
>>>   #define STK1160_VERSION        "0.9.5"
>>>   #define STK1160_VERSION_NUM    0x000905
>>> @@ -87,6 +89,9 @@ struct stk1160_buffer {
>>>   struct stk1160_urb {
>>>       struct urb *urb;
>>>       char *transfer_buffer;
>>> +    struct sg_table *sgt;
>>> +    struct stk1160 *dev;
>>> +    dma_addr_t dma;
>>>   };
>>>   struct stk1160_isoc_ctl {
>>> @@ -190,3 +195,8 @@ void stk1160_select_input(struct stk1160 *dev);
>>>   /* Provided by stk1160-ac97.c */
>>>   void stk1160_ac97_setup(struct stk1160 *dev);
>>> +
>>> +static inline struct device *stk1160_get_dmadev(struct stk1160 *dev)
>>> +{
>>> +    return bus_to_hcd(dev->udev->bus)->self.sysdev;
>>
>> This function looks truly horrible, is there no other way to get the device ?
>>
>> I suppose this function return something different than (struct stk1160*)dev->dev ?
> 
> I remember I tried using this device and got an error that it is not cable of doing dma
> allocations or something like that. The function "stk1160_get_dmadev" is a copy-paste from the
> uvc equivalent:
> 
> static inline struct device *uvc_stream_to_dmadev(struct uvc_streaming *stream)
> {
>          return bus_to_hcd(stream->dev->udev->bus)->self.sysdev;
> }
> 
> Thanks,
> Dafna

There are 3 'struct device' related, when printing their names with:

         pr_err("dmadev device is %s\n", dev_name(stk1160_get_dmadev(dev)));
         pr_err("udev->dev device is %s\n", dev_name(&dev->udev->dev));
         pr_err("dev device is %s\n", dev_name(dev->dev));

I get:

[  107.770870] dmadev device is fe900000.usb
[  107.771253] udev->dev device is 7-1
[  107.771561] dev device is 7-1:1.0

If I try to use 'dev->dev' instead of the above stk1160_get_dmadev,
streaming fails and I get two warnings. The first one is originated from:

https://elixir.bootlin.com/linux/v5.16-rc4/source/kernel/dma/mapping.c#L546

Which is:
WARN_ON_ONCE(!dev->coherent_dma_mask)

I wonder if that warning is required in our case since we remove the alloc_coherent usage.

The second warning in videobuf2-core.c:1612 seems to be a bug in the driver itself.
I'll look at that.


[   65.554406] ------------[ cut here ]------------
[   65.554815] WARNING: CPU: 1 PID: 593 at /home/dafna/git/easycap_media_tree/kernel/dma/mapping.c:546 __dma_alloc_pages+0x78/0xc8
[   65.555836] Modules linked in: snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_soc_hdmi_codec dw_hdmi_i2s_audio saa7115 stk1160 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc crct10dif_ce panfrost snd_soc_simple_card snd_soc_audio_graph_card snd_soc_spdif_tx snd_soc_simple_card_utils gpu_sched phy_rockchip_pcie snd_soc_rockchip_i2s rockchipdrm analogix_dp dw_mipi_dsi dw_hdmi cec drm_kms_helper drm rtc_rk808 rockchip_saradc industrialio_triggered_buffer kfifo_buf rockchip_thermal pcie_rockchip_host ip_tables x_tables ipv6
[   65.560265] CPU: 1 PID: 593 Comm: v4l2src0:src Not tainted 5.16.0-rc4-62408-g32447129cb30-dirty #14
[   65.561269] Hardware name: Radxa ROCK Pi 4B (DT)
[   65.561676] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   65.562288] pc : __dma_alloc_pages+0x78/0xc8
[   65.562849] lr : dma_alloc_noncontiguous+0x128/0x1a8
[   65.563288] sp : ffff800012bc39b0
[   65.563579] x29: ffff800012bc39b0 x28: 0000000000000000 x27: 0000000000030000
[   65.564213] x26: ffff00000908a950 x25: ffff0000018af830 x24: 0000000000410080
[   65.564843] x23: 0000000000000002 x22: ffff000005c88090 x21: 0000000000000cc0
[   65.565473] x20: ffff0000018af830 x19: 0000000000030000 x18: ffffffffffffffff
[   65.566103] x17: 000000040044ffff x16: 00400032b5503510 x15: ffff800011323f78
[   65.566735] x14: 0000000000000000 x13: 302e313a312d3720 x12: 7369206563697665
[   65.567366] x11: 0000000005f5e0ff x10: 000000000000005d x9 : 00000000ffffffd0
[   65.567997] x8 : ffff000005c880a0 x7 : 0000000000000000 x6 : ffff800012bc3750
[   65.568628] x5 : ffff800011b671f8 x4 : 0000000000000cc0 x3 : 0000000000000002
[   65.569259] x2 : ffff000005c88090 x1 : 0000000000030000 x0 : 0000000000000000
[   65.569891] Call trace:
[   65.570110]  __dma_alloc_pages+0x78/0xc8
[   65.570459]  dma_alloc_noncontiguous+0x128/0x1a8
[   65.570868]  stk1160_alloc_isoc+0xf0/0x2b0 [stk1160]
[   65.571318]  start_streaming+0xfc/0x250 [stk1160]
[   65.571740]  vb2_start_streaming+0x6c/0x160 [videobuf2_common]
[   65.572273]  vb2_core_streamon+0x17c/0x1a8 [videobuf2_common]
[   65.572794]  vb2_streamon+0x54/0x88 [videobuf2_v4l2]
[   65.573244]  vb2_ioctl_streamon+0x54/0x60 [videobuf2_v4l2]
[   65.573736]  v4l_streamon+0x3c/0x50 [videodev]
[   65.574195]  __video_do_ioctl+0x1a4/0x428 [videodev]
[   65.574684]  video_usercopy+0x320/0x828 [videodev]
[   65.575158]  video_ioctl2+0x3c/0x58 [videodev]
[   65.575603]  v4l2_ioctl+0x60/0x90 [videodev]
[   65.576031]  __arm64_sys_ioctl+0xa8/0xe0
[   65.576381]  invoke_syscall+0x54/0x118
[   65.576718]  el0_svc_common.constprop.3+0x84/0x100
[   65.577144]  do_el0_svc+0x34/0xa0
[   65.577441]  el0_svc+0x1c/0x50
[   65.577716]  el0t_64_sync_handler+0x88/0xb0
[   65.578086]  el0t_64_sync+0x16c/0x170
[   65.578413] ---[ end trace 578e0ba07742170c ]---
[   65.583222] ------------[ cut here ]------------
[   65.583633] WARNING: CPU: 5 PID: 593 at /home/dafna/git/easycap_media_tree/drivers/media/common/videobuf2/videobuf2-core.c:1612 vb2_start_streaming+0xd4/0x160 [videobuf2_common]
[   65.585027] Modules linked in: snd_usb_audio snd_hwdep snd_usbmidi_lib snd_rawmidi snd_soc_hdmi_codec dw_hdmi_i2s_audio saa7115 stk1160 videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc crct10dif_ce panfrost snd_soc_simple_card snd_soc_audio_graph_card snd_soc_spdif_tx snd_soc_simple_card_utils gpu_sched phy_rockchip_pcie snd_soc_rockchip_i2s rockchipdrm analogix_dp dw_mipi_dsi dw_hdmi cec drm_kms_helper drm rtc_rk808 rockchip_saradc industrialio_triggered_buffer kfifo_buf rockchip_thermal pcie_rockchip_host ip_tables x_tables ipv6
[   65.589383] CPU: 5 PID: 593 Comm: v4l2src0:src Tainted: G        W         5.16.0-rc4-62408-g32447129cb30-dirty #14
[   65.590293] Hardware name: Radxa ROCK Pi 4B (DT)
[   65.590696] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   65.591304] pc : vb2_start_streaming+0xd4/0x160 [videobuf2_common]
[   65.591850] lr : vb2_start_streaming+0x6c/0x160 [videobuf2_common]
[   65.592395] sp : ffff800012bc3ad0
[   65.592685] x29: ffff800012bc3ad0 x28: 0000000000000000 x27: ffff800012bc3cd8
[   65.593312] x26: 0000000000000000 x25: ffff00000d8a7800 x24: 0000000040045612
[   65.593938] x23: ffff800011323000 x22: ffff800012bc3cd8 x21: ffff00000908a8b0
[   65.594562] x20: ffff00000908a8c8 x19: 00000000fffffff4 x18: ffffffffffffffff
[   65.595188] x17: 000000040044ffff x16: 00400034b5503510 x15: ffff800011323f78
[   65.595813] x14: ffff000013163886 x13: ffff000013163885 x12: 00000000000002ce
[   65.596439] x11: 0000000000000028 x10: 0000000000000001 x9 : 0000000000000228
[   65.597064] x8 : 0101010101010101 x7 : 7f7f7f7f7f7f7f7f x6 : fefefeff726c5e78
[   65.597690] x5 : ffff800012bc3990 x4 : 0000000000000000 x3 : ffff000009a34880
[   65.598315] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000007cd99f0
[   65.598940] Call trace:
[   65.599155]  vb2_start_streaming+0xd4/0x160 [videobuf2_common]
[   65.599672]  vb2_core_streamon+0x17c/0x1a8 [videobuf2_common]
[   65.600179]  vb2_streamon+0x54/0x88 [videobuf2_v4l2]
[   65.600619]  vb2_ioctl_streamon+0x54/0x60 [videobuf2_v4l2]
[   65.601103]  v4l_streamon+0x3c/0x50 [videodev]
[   65.601521]  __video_do_ioctl+0x1a4/0x428 [videodev]
[   65.601977]  video_usercopy+0x320/0x828 [videodev]
[   65.602419]  video_ioctl2+0x3c/0x58 [videodev]
[   65.602830]  v4l2_ioctl+0x60/0x90 [videodev]
[   65.603227]  __arm64_sys_ioctl+0xa8/0xe0
[   65.603576]  invoke_syscall+0x54/0x118
[   65.603911]  el0_svc_common.constprop.3+0x84/0x100
[   65.604332]  do_el0_svc+0x34/0xa0
[   65.604625]  el0_svc+0x1c/0x50
[   65.604897]  el0t_64_sync_handler+0x88/0xb0
[   65.605264]  el0t_64_sync+0x16c/0x170
[   65.605587] ---[ end trace 578e0ba07742170d ]---

Thanks,
Dafna

> 
>>
>> Thanks,
>> Ezequiel
>>
>
diff mbox series

Patch

diff --git a/drivers/media/usb/stk1160/stk1160-v4l.c b/drivers/media/usb/stk1160/stk1160-v4l.c
index a06030451db4..31c34734425b 100644
--- a/drivers/media/usb/stk1160/stk1160-v4l.c
+++ b/drivers/media/usb/stk1160/stk1160-v4l.c
@@ -232,6 +232,9 @@  static int stk1160_start_streaming(struct stk1160 *dev)
 
 	/* submit urbs and enables IRQ */
 	for (i = 0; i < dev->isoc_ctl.num_bufs; i++) {
+		struct stk1160_urb *stk_urb = &dev->isoc_ctl.stk1160_urb[i];
+
+		dma_sync_sgtable_for_device(stk1160_get_dmadev(dev), stk_urb->sgt, DMA_FROM_DEVICE);
 		rc = usb_submit_urb(dev->isoc_ctl.stk1160_urb[i].urb, GFP_KERNEL);
 		if (rc) {
 			stk1160_err("cannot submit urb[%d] (%d)\n", i, rc);
diff --git a/drivers/media/usb/stk1160/stk1160-video.c b/drivers/media/usb/stk1160/stk1160-video.c
index 4194d31b53bb..fdc0de2af4c0 100644
--- a/drivers/media/usb/stk1160/stk1160-video.c
+++ b/drivers/media/usb/stk1160/stk1160-video.c
@@ -295,7 +295,8 @@  static void stk1160_process_isoc(struct stk1160 *dev, struct urb *urb)
 static void stk1160_isoc_irq(struct urb *urb)
 {
 	int i, rc;
-	struct stk1160 *dev = urb->context;
+	struct stk1160_urb *stk_urb = urb->context;
+	struct stk1160 *dev = stk_urb->dev;
 
 	switch (urb->status) {
 	case 0:
@@ -310,6 +311,9 @@  static void stk1160_isoc_irq(struct urb *urb)
 		return;
 	}
 
+	dma_sync_sgtable_for_cpu(stk1160_get_dmadev(dev), stk_urb->sgt, DMA_FROM_DEVICE);
+	invalidate_kernel_vmap_range(stk_urb->transfer_buffer, urb->transfer_buffer_length);
+
 	stk1160_process_isoc(dev, urb);
 
 	/* Reset urb buffers */
@@ -318,6 +322,7 @@  static void stk1160_isoc_irq(struct urb *urb)
 		urb->iso_frame_desc[i].actual_length = 0;
 	}
 
+	dma_sync_sgtable_for_device(stk1160_get_dmadev(dev), stk_urb->sgt, DMA_FROM_DEVICE);
 	rc = usb_submit_urb(urb, GFP_ATOMIC);
 	if (rc)
 		stk1160_err("urb re-submit failed (%d)\n", rc);
@@ -353,37 +358,34 @@  void stk1160_cancel_isoc(struct stk1160 *dev)
 	stk1160_dbg("all urbs killed\n");
 }
 
+static void stk_free_urb_buffer(struct stk1160 *dev, struct stk1160_urb *stk_urb)
+{
+	struct device *dma_dev = stk1160_get_dmadev(dev);
+
+	dma_vunmap_noncontiguous(dma_dev, stk_urb->transfer_buffer);
+	dma_free_noncontiguous(dma_dev, stk_urb->urb->transfer_buffer_length, stk_urb->sgt,
+			       DMA_FROM_DEVICE);
+	usb_free_urb(stk_urb->urb);
+
+	stk_urb->transfer_buffer = NULL;
+	stk_urb->sgt = NULL;
+	stk_urb->urb = NULL;
+	stk_urb->dev = NULL;
+	stk_urb->dma = 0;
+}
+
 /*
  * Releases urb and transfer buffers
  * Obviusly, associated urb must be killed before releasing it.
  */
 void stk1160_free_isoc(struct stk1160 *dev)
 {
-	struct urb *urb;
 	int i, num_bufs = dev->isoc_ctl.num_bufs;
 
 	stk1160_dbg("freeing %d urb buffers...\n", num_bufs);
 
-	for (i = 0; i < num_bufs; i++) {
-
-		urb = dev->isoc_ctl.stk1160_urb[i].urb;
-		if (urb) {
-
-			if (dev->isoc_ctl.stk1160_urb[i].transfer_buffer) {
-#ifndef CONFIG_DMA_NONCOHERENT
-				usb_free_coherent(dev->udev,
-					urb->transfer_buffer_length,
-					dev->isoc_ctl.stk1160_urb[i].transfer_buffer,
-					urb->transfer_dma);
-#else
-				kfree(dev->isoc_ctl.stk1160_urb[i].transfer_buffer);
-#endif
-			}
-			usb_free_urb(urb);
-			dev->isoc_ctl.stk1160_urb[i].urb = NULL;
-		}
-		dev->isoc_ctl.stk1160_urb[i].transfer_buffer = NULL;
-	}
+	for (i = 0; i < num_bufs; i++)
+		stk_free_urb_buffer(dev, &dev->isoc_ctl.stk1160_urb[i]);
 
 	dev->isoc_ctl.num_bufs = 0;
 
@@ -400,6 +402,39 @@  void stk1160_uninit_isoc(struct stk1160 *dev)
 	stk1160_free_isoc(dev);
 }
 
+static int stk1160_fill_urb(struct stk1160 *dev, struct stk1160_urb *stk_urb, int sb_size,
+			    int max_packets)
+{
+	struct device *dma_dev = stk1160_get_dmadev(dev);
+
+	stk_urb->urb = usb_alloc_urb(max_packets, GFP_KERNEL);
+	if (!stk_urb->urb)
+		return -ENOMEM;
+	stk_urb->sgt = dma_alloc_noncontiguous(dma_dev, sb_size, DMA_FROM_DEVICE, GFP_KERNEL, 0);
+
+	/*
+	 * If the buffer allocation failed, we exit but return 0 since we allow driver
+	 * working with less buffers
+	 */
+	if (!stk_urb->sgt)
+		goto free_urb;
+
+	stk_urb->transfer_buffer = dma_vmap_noncontiguous(dma_dev, sb_size, stk_urb->sgt);
+	if (!stk_urb->transfer_buffer)
+		goto free_sgt;
+
+	stk_urb->dma = stk_urb->sgt->sgl->dma_address;
+	stk_urb->dev = dev;
+	return 0;
+free_sgt:
+	dma_free_noncontiguous(dma_dev, sb_size, stk_urb->sgt, DMA_FROM_DEVICE);
+	stk_urb->sgt = NULL;
+free_urb:
+	usb_free_urb(stk_urb->urb);
+	stk_urb->urb = NULL;
+
+	return 0;
+}
 /*
  * Allocate URBs
  */
@@ -407,6 +442,7 @@  int stk1160_alloc_isoc(struct stk1160 *dev)
 {
 	struct urb *urb;
 	int i, j, k, sb_size, max_packets, num_bufs;
+	int ret;
 
 	/*
 	 * It may be necessary to release isoc here,
@@ -428,21 +464,13 @@  int stk1160_alloc_isoc(struct stk1160 *dev)
 	/* allocate urbs and transfer buffers */
 	for (i = 0; i < num_bufs; i++) {
 
-		urb = usb_alloc_urb(max_packets, GFP_KERNEL);
-		if (!urb)
+		ret = stk1160_fill_urb(dev, &dev->isoc_ctl.stk1160_urb[i], sb_size, max_packets);
+		if (ret)
 			goto free_i_bufs;
-		dev->isoc_ctl.stk1160_urb[i].urb = urb;
-
-#ifndef CONFIG_DMA_NONCOHERENT
-		dev->isoc_ctl.stk1160_urb[i].transfer_buffer =
-			usb_alloc_coherent(dev->udev, sb_size, GFP_KERNEL, &urb->transfer_dma);
-#else
-		dev->isoc_ctl.stk1160_urb[i].transfer_buffer = kmalloc(sb_size, GFP_KERNEL);
-#endif
-		if (!dev->isoc_ctl.stk1160_urb[i].transfer_buffer) {
-			stk1160_err("cannot alloc %d bytes for tx[%d] buffer\n",
-				sb_size, i);
 
+		urb = dev->isoc_ctl.stk1160_urb[i].urb;
+
+		if (!urb) {
 			/* Not enough transfer buffers, so just give up */
 			if (i < STK1160_MIN_BUFS)
 				goto free_i_bufs;
@@ -458,15 +486,12 @@  int stk1160_alloc_isoc(struct stk1160 *dev)
 		urb->transfer_buffer = dev->isoc_ctl.stk1160_urb[i].transfer_buffer;
 		urb->transfer_buffer_length = sb_size;
 		urb->complete = stk1160_isoc_irq;
-		urb->context = dev;
+		urb->context = &dev->isoc_ctl.stk1160_urb[i];
 		urb->interval = 1;
 		urb->start_frame = 0;
 		urb->number_of_packets = max_packets;
-#ifndef CONFIG_DMA_NONCOHERENT
 		urb->transfer_flags = URB_ISO_ASAP | URB_NO_TRANSFER_DMA_MAP;
-#else
-		urb->transfer_flags = URB_ISO_ASAP;
-#endif
+		urb->transfer_dma = dev->isoc_ctl.stk1160_urb[i].dma;
 
 		k = 0;
 		for (j = 0; j < max_packets; j++) {
@@ -490,8 +515,6 @@  int stk1160_alloc_isoc(struct stk1160 *dev)
 	 * enough to work fine, so we just free the extra urb,
 	 * store the allocated count and keep going, fingers crossed!
 	 */
-	usb_free_urb(dev->isoc_ctl.stk1160_urb[i].urb);
-	dev->isoc_ctl.stk1160_urb[i].urb = NULL;
 
 	stk1160_warn("%d urbs allocated. Trying to continue...\n", i);
 
@@ -501,7 +524,7 @@  int stk1160_alloc_isoc(struct stk1160 *dev)
 
 free_i_bufs:
 	/* Save the allocated buffers so far, so we can properly free them */
-	dev->isoc_ctl.num_bufs = i+1;
+	dev->isoc_ctl.num_bufs = i;
 	stk1160_free_isoc(dev);
 	return -ENOMEM;
 }
diff --git a/drivers/media/usb/stk1160/stk1160.h b/drivers/media/usb/stk1160/stk1160.h
index 1ffca1343d88..52bea7815ae5 100644
--- a/drivers/media/usb/stk1160/stk1160.h
+++ b/drivers/media/usb/stk1160/stk1160.h
@@ -16,6 +16,8 @@ 
 #include <media/videobuf2-v4l2.h>
 #include <media/v4l2-device.h>
 #include <media/v4l2-ctrls.h>
+#include <linux/usb.h>
+#include <linux/usb/hcd.h>
 
 #define STK1160_VERSION		"0.9.5"
 #define STK1160_VERSION_NUM	0x000905
@@ -87,6 +89,9 @@  struct stk1160_buffer {
 struct stk1160_urb {
 	struct urb *urb;
 	char *transfer_buffer;
+	struct sg_table *sgt;
+	struct stk1160 *dev;
+	dma_addr_t dma;
 };
 
 struct stk1160_isoc_ctl {
@@ -190,3 +195,8 @@  void stk1160_select_input(struct stk1160 *dev);
 
 /* Provided by stk1160-ac97.c */
 void stk1160_ac97_setup(struct stk1160 *dev);
+
+static inline struct device *stk1160_get_dmadev(struct stk1160 *dev)
+{
+	return bus_to_hcd(dev->udev->bus)->self.sysdev;
+}