diff mbox series

[v2] media: rkisp1: Reduce min_queued_buffers to 0

Message ID 20241028143553.36704-1-jacopo.mondi@ideasonboard.com (mailing list archive)
State New
Headers show
Series [v2] media: rkisp1: Reduce min_queued_buffers to 0 | expand

Commit Message

Jacopo Mondi Oct. 28, 2024, 2:35 p.m. UTC
There apparently is no reason to require 3 queued buffers to call
streamon() for the RkISP1 as the driver operates with a scratch buffer
where frames can be directed to if there's no available buffer provided
by userspace.

Reduce the number of required buffers to 0 to allow applications to
operate by queueing capture buffers on-demand.

Tested with libcamera, by operating with a single capture request. The
same request (and associated capture buffer) gets recycled once
completed. This of course causes a frame rate drop but doesn't hinder
operations.

Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
---
The first version of this patch set min_queued_buffers to 1, but setting it
to 0 doesn't compromise operations and it's even better as it allows application
to queue buffers to the capture devices on-demand. If a buffer is not provided
to the DMA engines, image data gets directed to the driver's internal scratch
buffer.
---
 drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--
2.47.0

Comments

Hans Verkuil Oct. 28, 2024, 3:02 p.m. UTC | #1
On 28/10/2024 15:35, Jacopo Mondi wrote:
> There apparently is no reason to require 3 queued buffers to call
> streamon() for the RkISP1 as the driver operates with a scratch buffer
> where frames can be directed to if there's no available buffer provided
> by userspace.
> 
> Reduce the number of required buffers to 0 to allow applications to
> operate by queueing capture buffers on-demand.
> 
> Tested with libcamera, by operating with a single capture request. The
> same request (and associated capture buffer) gets recycled once
> completed. This of course causes a frame rate drop but doesn't hinder
> operations.
> 
> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> ---
> The first version of this patch set min_queued_buffers to 1, but setting it
> to 0 doesn't compromise operations and it's even better as it allows application
> to queue buffers to the capture devices on-demand. If a buffer is not provided
> to the DMA engines, image data gets directed to the driver's internal scratch
> buffer.
> ---
>  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> index 2bddb4fa8a5c..5fcf9731f41b 100644
> --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> @@ -35,8 +35,6 @@
>  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
>  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
> 
> -#define RKISP1_MIN_BUFFERS_NEEDED 3
> -
>  enum rkisp1_plane {
>  	RKISP1_PLANE_Y	= 0,
>  	RKISP1_PLANE_CB	= 1,
> @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
>  	q->ops = &rkisp1_vb2_ops;
>  	q->mem_ops = &vb2_dma_contig_memops;
>  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
> -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
> +	q->min_queued_buffers = 0;

You can probably just drop this since the vb2_queue struct is zeroed when it
is allocated. So no need to set it to 0.

And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?

Also, see my RFC I posted today:

https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u

My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
and expect the driver to change that to a workable value, will, in fact, now just get
one buffer. And streaming that will cause lots of frame drops.

It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.

My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
libcamera to get much better control over how many buffers should be allocated.

Regards,

	Hans

>  	q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
>  	q->lock = &node->vlock;
>  	q->dev = cap->rkisp1->dev;
> --
> 2.47.0
> 
>
Laurent Pinchart Oct. 28, 2024, 3:19 p.m. UTC | #2
Hi Hans, Jacopo,

On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
> On 28/10/2024 15:35, Jacopo Mondi wrote:
> > There apparently is no reason to require 3 queued buffers to call
> > streamon() for the RkISP1 as the driver operates with a scratch buffer
> > where frames can be directed to if there's no available buffer provided
> > by userspace.
> > 
> > Reduce the number of required buffers to 0 to allow applications to
> > operate by queueing capture buffers on-demand.
> > 
> > Tested with libcamera, by operating with a single capture request. The
> > same request (and associated capture buffer) gets recycled once
> > completed. This of course causes a frame rate drop but doesn't hinder
> > operations.
> > 
> > Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > ---
> > The first version of this patch set min_queued_buffers to 1, but setting it
> > to 0 doesn't compromise operations and it's even better as it allows application
> > to queue buffers to the capture devices on-demand. If a buffer is not provided
> > to the DMA engines, image data gets directed to the driver's internal scratch
> > buffer.
> > ---
> >  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > index 2bddb4fa8a5c..5fcf9731f41b 100644
> > --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > @@ -35,8 +35,6 @@
> >  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
> >  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
> > 
> > -#define RKISP1_MIN_BUFFERS_NEEDED 3
> > -
> >  enum rkisp1_plane {
> >  	RKISP1_PLANE_Y	= 0,
> >  	RKISP1_PLANE_CB	= 1,
> > @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
> >  	q->ops = &rkisp1_vb2_ops;
> >  	q->mem_ops = &vb2_dma_contig_memops;
> >  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
> > -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
> > +	q->min_queued_buffers = 0;
> 
> You can probably just drop this since the vb2_queue struct is zeroed when it
> is allocated. So no need to set it to 0.
> 
> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?

Probably not, as the patch drops the define :-)

> Also, see my RFC I posted today:
> 
> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u
> 
> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
> and expect the driver to change that to a workable value, will, in fact, now just get
> one buffer. And streaming that will cause lots of frame drops.
> 
> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
> but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.

The rkisp1 driver requires a device-specific userspace, so I don't think
we need to care about old and/or dumb applications calling REQBUFS(1).

> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
> libcamera to get much better control over how many buffers should be allocated.
> 
> >  	q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
> >  	q->lock = &node->vlock;
> >  	q->dev = cap->rkisp1->dev;

Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Jacopo Mondi Oct. 28, 2024, 3:30 p.m. UTC | #3
Hi Hans

On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
> On 28/10/2024 15:35, Jacopo Mondi wrote:
> > There apparently is no reason to require 3 queued buffers to call
> > streamon() for the RkISP1 as the driver operates with a scratch buffer
> > where frames can be directed to if there's no available buffer provided
> > by userspace.
> >
> > Reduce the number of required buffers to 0 to allow applications to
> > operate by queueing capture buffers on-demand.
> >
> > Tested with libcamera, by operating with a single capture request. The
> > same request (and associated capture buffer) gets recycled once
> > completed. This of course causes a frame rate drop but doesn't hinder
> > operations.
> >
> > Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > ---
> > The first version of this patch set min_queued_buffers to 1, but setting it
> > to 0 doesn't compromise operations and it's even better as it allows application
> > to queue buffers to the capture devices on-demand. If a buffer is not provided
> > to the DMA engines, image data gets directed to the driver's internal scratch
> > buffer.
> > ---
> >  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > index 2bddb4fa8a5c..5fcf9731f41b 100644
> > --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > @@ -35,8 +35,6 @@
> >  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
> >  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
> >
> > -#define RKISP1_MIN_BUFFERS_NEEDED 3
> > -
> >  enum rkisp1_plane {
> >  	RKISP1_PLANE_Y	= 0,
> >  	RKISP1_PLANE_CB	= 1,
> > @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
> >  	q->ops = &rkisp1_vb2_ops;
> >  	q->mem_ops = &vb2_dma_contig_memops;
> >  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
> > -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
> > +	q->min_queued_buffers = 0;
>
> You can probably just drop this since the vb2_queue struct is zeroed when it
> is allocated. So no need to set it to 0.

I suspected so :)

>
> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?

No, and this patch removes it in facts

 -#define RKISP1_MIN_BUFFERS_NEEDED 3
 -

>
> Also, see my RFC I posted today:
>
> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u
>
> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
> and expect the driver to change that to a workable value, will, in fact, now just get
> one buffer. And streaming that will cause lots of frame drops.
>
> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
> but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.

Yeah, I've read the discussion between you and Tomi and it seemed like
a good time to re-send this patch.

>
> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
> libcamera to get much better control over how many buffers should be allocated.
>

In my understanding min_queued_buffers identifies how many buffers
should be queued before calling start_streaming, and this comes
directly from an hw/driver requirement. This doesn't mean that at
least min_queue_buffers should be queued at all the times during
streaming, at least, I don't see how and where videobuf2 enforces
this. Or does it ?

If the above is correct, then the number of buffers to be queued
during streaming is, in my opinion, less an hw/driver requirement but
more an application decision. As you said an application should be good with
3 buffers (one queued, one currently being written to, one to be
consumed by the application), but in very specific cases where an
application retains the buffer for longer, for whatever reason, it
might need a larger number of queued buffers to provide the DMA
engines a space where to write data without them being discarded (to
scratch buffers or discarded by the DMA engine itself, if the HW
supports that). Or maybe an application is fine to drop frames and
only queue buffers sporadically (if the HW supports that ofc).

For libcamera, and for this specific platform in particular, we're
going to base new developments on the assumption that
min_queued_buffers == 0, and it would be more convenient for use to be
able to access its value from userspace to identify if we're running
on a kernel with or without this patch being applied.

Thanks
  j

> Regards,
>
> 	Hans
>
> >  	q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
> >  	q->lock = &node->vlock;
> >  	q->dev = cap->rkisp1->dev;
> > --
> > 2.47.0
> >
> >
>
Hans Verkuil Oct. 28, 2024, 3:48 p.m. UTC | #4
On 28/10/2024 16:30, Jacopo Mondi wrote:
> Hi Hans
> 
> On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
>> On 28/10/2024 15:35, Jacopo Mondi wrote:
>>> There apparently is no reason to require 3 queued buffers to call
>>> streamon() for the RkISP1 as the driver operates with a scratch buffer
>>> where frames can be directed to if there's no available buffer provided
>>> by userspace.
>>>
>>> Reduce the number of required buffers to 0 to allow applications to
>>> operate by queueing capture buffers on-demand.
>>>
>>> Tested with libcamera, by operating with a single capture request. The
>>> same request (and associated capture buffer) gets recycled once
>>> completed. This of course causes a frame rate drop but doesn't hinder
>>> operations.
>>>
>>> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
>>> ---
>>> The first version of this patch set min_queued_buffers to 1, but setting it
>>> to 0 doesn't compromise operations and it's even better as it allows application
>>> to queue buffers to the capture devices on-demand. If a buffer is not provided
>>> to the DMA engines, image data gets directed to the driver's internal scratch
>>> buffer.
>>> ---
>>>  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
>>> index 2bddb4fa8a5c..5fcf9731f41b 100644
>>> --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
>>> +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
>>> @@ -35,8 +35,6 @@
>>>  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
>>>  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
>>>
>>> -#define RKISP1_MIN_BUFFERS_NEEDED 3
>>> -
>>>  enum rkisp1_plane {
>>>  	RKISP1_PLANE_Y	= 0,
>>>  	RKISP1_PLANE_CB	= 1,
>>> @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
>>>  	q->ops = &rkisp1_vb2_ops;
>>>  	q->mem_ops = &vb2_dma_contig_memops;
>>>  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
>>> -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
>>> +	q->min_queued_buffers = 0;
>>
>> You can probably just drop this since the vb2_queue struct is zeroed when it
>> is allocated. So no need to set it to 0.
> 
> I suspected so :)
> 
>>
>> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?
> 
> No, and this patch removes it in facts
> 
>  -#define RKISP1_MIN_BUFFERS_NEEDED 3
>  -

I should have checked the patch :-) Sorry for the noise.

> 
>>
>> Also, see my RFC I posted today:
>>
>> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u
>>
>> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
>> and expect the driver to change that to a workable value, will, in fact, now just get
>> one buffer. And streaming that will cause lots of frame drops.
>>
>> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
>> but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.
> 
> Yeah, I've read the discussion between you and Tomi and it seemed like
> a good time to re-send this patch.
> 
>>
>> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
>> libcamera to get much better control over how many buffers should be allocated.
>>
> 
> In my understanding min_queued_buffers identifies how many buffers
> should be queued before calling start_streaming, and this comes
> directly from an hw/driver requirement. This doesn't mean that at
> least min_queue_buffers should be queued at all the times during
> streaming, at least, I don't see how and where videobuf2 enforces
> this. Or does it ?

It's an intrinsic property of the HW/driver: e.g. if it needs two buffers
queued up for the DMA engine to work, then it really is always holding on
to two buffers. The only thing the framework does is postpone calling
start_streaming until that number of buffers is queued to ensure the
DMA engine has what it needs to start. But after that vb2 doesn't check
it.

> 
> If the above is correct, then the number of buffers to be queued
> during streaming is, in my opinion, less an hw/driver requirement but
> more an application decision.

No, min_queued_buffers is a HW/drivers property: the DMA engine can't
start until that many buffers are queued up, and once it is started
it will always hold on to that many buffers.

So the application has to know somehow how many buffers are needed to
actually stream. One way is via VIDIOC_REQBUFS since that is supposed to
always return a workable number of buffers, the other is by actually
reporting the minimum number of buffers as per my RFC.

> As you said an application should be good with> 3 buffers (one queued, one currently being written to, one to be
> consumed by the application), but in very specific cases where an
> application retains the buffer for longer, for whatever reason, it
> might need a larger number of queued buffers to provide the DMA
> engines a space where to write data without them being discarded (to
> scratch buffers or discarded by the DMA engine itself, if the HW
> supports that). Or maybe an application is fine to drop frames and
> only queue buffers sporadically (if the HW supports that ofc).
> 
> For libcamera, and for this specific platform in particular, we're
> going to base new developments on the assumption that
> min_queued_buffers == 0, and it would be more convenient for use to be
> able to access its value from userspace to identify if we're running
> on a kernel with or without this patch being applied.

So my proposal in my RFC to expose min_num_buffers would work for libcamera?
It sounds like that's what you need.

Regards,

	Hans
Laurent Pinchart Oct. 28, 2024, 4:21 p.m. UTC | #5
Hi Hans,

On Mon, Oct 28, 2024 at 04:48:55PM +0100, Hans Verkuil wrote:
> On 28/10/2024 16:30, Jacopo Mondi wrote:
> > On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
> >> On 28/10/2024 15:35, Jacopo Mondi wrote:
> >>> There apparently is no reason to require 3 queued buffers to call
> >>> streamon() for the RkISP1 as the driver operates with a scratch buffer
> >>> where frames can be directed to if there's no available buffer provided
> >>> by userspace.
> >>>
> >>> Reduce the number of required buffers to 0 to allow applications to
> >>> operate by queueing capture buffers on-demand.
> >>>
> >>> Tested with libcamera, by operating with a single capture request. The
> >>> same request (and associated capture buffer) gets recycled once
> >>> completed. This of course causes a frame rate drop but doesn't hinder
> >>> operations.
> >>>
> >>> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> >>> ---
> >>> The first version of this patch set min_queued_buffers to 1, but setting it
> >>> to 0 doesn't compromise operations and it's even better as it allows application
> >>> to queue buffers to the capture devices on-demand. If a buffer is not provided
> >>> to the DMA engines, image data gets directed to the driver's internal scratch
> >>> buffer.
> >>> ---
> >>>  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
> >>>  1 file changed, 1 insertion(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>> index 2bddb4fa8a5c..5fcf9731f41b 100644
> >>> --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>> +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>> @@ -35,8 +35,6 @@
> >>>  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
> >>>  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
> >>>
> >>> -#define RKISP1_MIN_BUFFERS_NEEDED 3
> >>> -
> >>>  enum rkisp1_plane {
> >>>  	RKISP1_PLANE_Y	= 0,
> >>>  	RKISP1_PLANE_CB	= 1,
> >>> @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
> >>>  	q->ops = &rkisp1_vb2_ops;
> >>>  	q->mem_ops = &vb2_dma_contig_memops;
> >>>  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
> >>> -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
> >>> +	q->min_queued_buffers = 0;
> >>
> >> You can probably just drop this since the vb2_queue struct is zeroed when it
> >> is allocated. So no need to set it to 0.
> > 
> > I suspected so :)
> > 
> >>
> >> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?
> > 
> > No, and this patch removes it in facts
> > 
> >  -#define RKISP1_MIN_BUFFERS_NEEDED 3
> >  -
> 
> I should have checked the patch :-) Sorry for the noise.
> 
> >>
> >> Also, see my RFC I posted today:
> >>
> >> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u
> >>
> >> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
> >> and expect the driver to change that to a workable value, will, in fact, now just get
> >> one buffer. And streaming that will cause lots of frame drops.
> >>
> >> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
> >> but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.
> > 
> > Yeah, I've read the discussion between you and Tomi and it seemed like
> > a good time to re-send this patch.
> > 
> >> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
> >> libcamera to get much better control over how many buffers should be allocated.
> > 
> > In my understanding min_queued_buffers identifies how many buffers
> > should be queued before calling start_streaming, and this comes
> > directly from an hw/driver requirement. This doesn't mean that at
> > least min_queue_buffers should be queued at all the times during
> > streaming, at least, I don't see how and where videobuf2 enforces
> > this. Or does it ?
> 
> It's an intrinsic property of the HW/driver: e.g. if it needs two buffers
> queued up for the DMA engine to work, then it really is always holding on
> to two buffers. The only thing the framework does is postpone calling
> start_streaming until that number of buffers is queued to ensure the
> DMA engine has what it needs to start. But after that vb2 doesn't check
> it.

The "driver" part of "HW/driver" is important here, as drivers can
influence this in multiple ways. One of them is usage of scratch
buffers, but even without that, a DMA engine that requires two buffers
can easily be operated with a single buffer by programming the DMA
engine with the same buffer address twice. Drivers should really do so
unless they really can't.

> > If the above is correct, then the number of buffers to be queued
> > during streaming is, in my opinion, less an hw/driver requirement but
> > more an application decision.
> 
> No, min_queued_buffers is a HW/drivers property: the DMA engine can't
> start until that many buffers are queued up, and once it is started
> it will always hold on to that many buffers.

That's not always true. The imx7-media-csi driver, for instance, sets
min_queued_buffers to 2, but allocates scratch buffers and uses them at
runtime, so that it can return all queued buffers to userspace.
Grepping for min_queued_buffers I see drivers setting it to 4
(rcar-dma.c, rzg2l-video.c), 6 (cxusb-analog.c) or even 9
(zoran_driver.c) ! I doubt the zoran driver holds on to 9 buffers at
runtime. Your statement is not universally true today?.

This could be considered as driver issues, and the min_queued_buffers
values should be fixed to match the runtime behaviour. In some cases I
expect it will require more work than just changing the value, as
drivers may implement the logic to operate with less buffers at runtime
but not at start time. This would be fixable, but it may also call for
asking if the start at runtime behaviours need to be identical.

> So the application has to know somehow how many buffers are needed to
> actually stream. One way is via VIDIOC_REQBUFS since that is supposed to
> always return a workable number of buffers, the other is by actually
> reporting the minimum number of buffers as per my RFC.
> 
> > As you said an application should be good with> 3 buffers (one queued, one currently being written to, one to be
> > consumed by the application), but in very specific cases where an
> > application retains the buffer for longer, for whatever reason, it
> > might need a larger number of queued buffers to provide the DMA
> > engines a space where to write data without them being discarded (to
> > scratch buffers or discarded by the DMA engine itself, if the HW
> > supports that). Or maybe an application is fine to drop frames and
> > only queue buffers sporadically (if the HW supports that ofc).
> > 
> > For libcamera, and for this specific platform in particular, we're
> > going to base new developments on the assumption that
> > min_queued_buffers == 0, and it would be more convenient for use to be
> > able to access its value from userspace to identify if we're running
> > on a kernel with or without this patch being applied.
> 
> So my proposal in my RFC to expose min_num_buffers would work for libcamera?
> It sounds like that's what you need.

It may be useful, but I think we may also just require min_num_buffers
== 0 for a device to be supported in libcamera. We have to implement
APIs such as the Android camera HAL that has no concept of buffers being
kept by the device. This could possibly be handled within libcamera by
allocating scratch buffers in userspace, but that comes with other
challenges. I would like to at least try to get help from the kernel
until proven that it's a bad idea.
Jacopo Mondi Oct. 28, 2024, 5:08 p.m. UTC | #6
Hi Laurent, Hans,

On Mon, Oct 28, 2024 at 06:21:41PM +0200, Laurent Pinchart wrote:
> Hi Hans,
>
> On Mon, Oct 28, 2024 at 04:48:55PM +0100, Hans Verkuil wrote:
> > On 28/10/2024 16:30, Jacopo Mondi wrote:
> > > On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
> > >> On 28/10/2024 15:35, Jacopo Mondi wrote:
> > >>> There apparently is no reason to require 3 queued buffers to call
> > >>> streamon() for the RkISP1 as the driver operates with a scratch buffer
> > >>> where frames can be directed to if there's no available buffer provided
> > >>> by userspace.
> > >>>
> > >>> Reduce the number of required buffers to 0 to allow applications to
> > >>> operate by queueing capture buffers on-demand.
> > >>>
> > >>> Tested with libcamera, by operating with a single capture request. The
> > >>> same request (and associated capture buffer) gets recycled once
> > >>> completed. This of course causes a frame rate drop but doesn't hinder
> > >>> operations.
> > >>>
> > >>> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > >>> ---
> > >>> The first version of this patch set min_queued_buffers to 1, but setting it
> > >>> to 0 doesn't compromise operations and it's even better as it allows application
> > >>> to queue buffers to the capture devices on-demand. If a buffer is not provided
> > >>> to the DMA engines, image data gets directed to the driver's internal scratch
> > >>> buffer.
> > >>> ---
> > >>>  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
> > >>>  1 file changed, 1 insertion(+), 3 deletions(-)
> > >>>
> > >>> diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > >>> index 2bddb4fa8a5c..5fcf9731f41b 100644
> > >>> --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > >>> +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > >>> @@ -35,8 +35,6 @@
> > >>>  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
> > >>>  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
> > >>>
> > >>> -#define RKISP1_MIN_BUFFERS_NEEDED 3
> > >>> -
> > >>>  enum rkisp1_plane {
> > >>>  	RKISP1_PLANE_Y	= 0,
> > >>>  	RKISP1_PLANE_CB	= 1,
> > >>> @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
> > >>>  	q->ops = &rkisp1_vb2_ops;
> > >>>  	q->mem_ops = &vb2_dma_contig_memops;
> > >>>  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
> > >>> -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
> > >>> +	q->min_queued_buffers = 0;
> > >>
> > >> You can probably just drop this since the vb2_queue struct is zeroed when it
> > >> is allocated. So no need to set it to 0.
> > >
> > > I suspected so :)
> > >
> > >>
> > >> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?
> > >
> > > No, and this patch removes it in facts
> > >
> > >  -#define RKISP1_MIN_BUFFERS_NEEDED 3
> > >  -
> >
> > I should have checked the patch :-) Sorry for the noise.
> >
> > >>
> > >> Also, see my RFC I posted today:
> > >>
> > >> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u
> > >>
> > >> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
> > >> and expect the driver to change that to a workable value, will, in fact, now just get
> > >> one buffer. And streaming that will cause lots of frame drops.
> > >>
> > >> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
> > >> but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.
> > >
> > > Yeah, I've read the discussion between you and Tomi and it seemed like
> > > a good time to re-send this patch.
> > >
> > >> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
> > >> libcamera to get much better control over how many buffers should be allocated.
> > >
> > > In my understanding min_queued_buffers identifies how many buffers
> > > should be queued before calling start_streaming, and this comes
> > > directly from an hw/driver requirement. This doesn't mean that at
> > > least min_queue_buffers should be queued at all the times during
> > > streaming, at least, I don't see how and where videobuf2 enforces
> > > this. Or does it ?
> >
> > It's an intrinsic property of the HW/driver: e.g. if it needs two buffers
> > queued up for the DMA engine to work, then it really is always holding on
> > to two buffers. The only thing the framework does is postpone calling
> > start_streaming until that number of buffers is queued to ensure the
> > DMA engine has what it needs to start. But after that vb2 doesn't check
> > it.
>
> The "driver" part of "HW/driver" is important here, as drivers can
> influence this in multiple ways. One of them is usage of scratch
> buffers, but even without that, a DMA engine that requires two buffers
> can easily be operated with a single buffer by programming the DMA
> engine with the same buffer address twice. Drivers should really do so
> unless they really can't.
>
> > > If the above is correct, then the number of buffers to be queued
> > > during streaming is, in my opinion, less an hw/driver requirement but
> > > more an application decision.
> >
> > No, min_queued_buffers is a HW/drivers property: the DMA engine can't
> > start until that many buffers are queued up, and once it is started
> > it will always hold on to that many buffers.

I get it, my point was that once start_streaming has been called, even
if min_queued_buffers=2, there is nothing preventing userspace from
queing one buffer at the time once the first two have completed. Sure, the
hw/driver might not like it, but while delaying start_streaming
prevents bad things from happening, there is nothing in the core that
prevents applications from potentially stalling the capture
operations.

But I get your point, if the system needs 2 buffers to start
streaming, it will probably need two buffers to continue producing
frames.

>
> That's not always true. The imx7-media-csi driver, for instance, sets
> min_queued_buffers to 2, but allocates scratch buffers and uses them at
> runtime, so that it can return all queued buffers to userspace.

That's interesting. From your mention of "scratch buffers" I get there
actually is a need to have 2 buffers queued to the HW ? How does
that work, after all queuing a buffer to the DMA engine usually means
pointing its write engine to one (set of) addresses.

Or is it a driver-only requirement to ask for two buffers ?

> Grepping for min_queued_buffers I see drivers setting it to 4
> (rcar-dma.c, rzg2l-video.c), 6 (cxusb-analog.c) or even 9
> (zoran_driver.c) ! I doubt the zoran driver holds on to 9 buffers at
> runtime. Your statement is not universally true today?.
>
> This could be considered as driver issues, and the min_queued_buffers
> values should be fixed to match the runtime behaviour. In some cases I
> expect it will require more work than just changing the value, as
> drivers may implement the logic to operate with less buffers at runtime
> but not at start time. This would be fixable, but it may also call for
> asking if the start at runtime behaviours need to be identical.
>
> > So the application has to know somehow how many buffers are needed to
> > actually stream. One way is via VIDIOC_REQBUFS since that is supposed to
> > always return a workable number of buffers, the other is by actually
> > reporting the minimum number of buffers as per my RFC.
> >
> > > As you said an application should be good with> 3 buffers (one queued, one currently being written to, one to be
> > > consumed by the application), but in very specific cases where an
> > > application retains the buffer for longer, for whatever reason, it
> > > might need a larger number of queued buffers to provide the DMA
> > > engines a space where to write data without them being discarded (to
> > > scratch buffers or discarded by the DMA engine itself, if the HW
> > > supports that). Or maybe an application is fine to drop frames and
> > > only queue buffers sporadically (if the HW supports that ofc).
> > >
> > > For libcamera, and for this specific platform in particular, we're
> > > going to base new developments on the assumption that
> > > min_queued_buffers == 0, and it would be more convenient for use to be
> > > able to access its value from userspace to identify if we're running
> > > on a kernel with or without this patch being applied.
> >
> > So my proposal in my RFC to expose min_num_buffers would work for libcamera?
> > It sounds like that's what you need.
>

My immediate need is to know if I'm running on a "legacy" version of
this driver that still requires 3 buffers for no apparent reason, or
on a new version. Your proposal might work, but I still feel like we
should report the HW/driver requirement (min_queued_buffers) instead
of trying to suggest applications how many buffers they need to
allocate to get "smooth streaming" or similar, as the use cases
might be different.

> It may be useful, but I think we may also just require min_num_buffers
> == 0 for a device to be supported in libcamera. We have to implement

While I concur this would be ideal, how would it work for existing
rkisp1 implementation that do not include this patch ? libcamera
should be able to run on both, probably in two different "modes" /o\

> APIs such as the Android camera HAL that has no concept of buffers being
> kept by the device. This could possibly be handled within libcamera by
> allocating scratch buffers in userspace, but that comes with other
> challenges. I would like to at least try to get help from the kernel
> until proven that it's a bad idea.
>
> --
> Regards,
>
> Laurent Pinchart
Laurent Pinchart Oct. 28, 2024, 6:39 p.m. UTC | #7
On Mon, Oct 28, 2024 at 06:08:18PM +0100, Jacopo Mondi wrote:
> On Mon, Oct 28, 2024 at 06:21:41PM +0200, Laurent Pinchart wrote:
> > On Mon, Oct 28, 2024 at 04:48:55PM +0100, Hans Verkuil wrote:
> > > On 28/10/2024 16:30, Jacopo Mondi wrote:
> > > > On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
> > > >> On 28/10/2024 15:35, Jacopo Mondi wrote:
> > > >>> There apparently is no reason to require 3 queued buffers to call
> > > >>> streamon() for the RkISP1 as the driver operates with a scratch buffer
> > > >>> where frames can be directed to if there's no available buffer provided
> > > >>> by userspace.
> > > >>>
> > > >>> Reduce the number of required buffers to 0 to allow applications to
> > > >>> operate by queueing capture buffers on-demand.
> > > >>>
> > > >>> Tested with libcamera, by operating with a single capture request. The
> > > >>> same request (and associated capture buffer) gets recycled once
> > > >>> completed. This of course causes a frame rate drop but doesn't hinder
> > > >>> operations.
> > > >>>
> > > >>> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > > >>> ---
> > > >>> The first version of this patch set min_queued_buffers to 1, but setting it
> > > >>> to 0 doesn't compromise operations and it's even better as it allows application
> > > >>> to queue buffers to the capture devices on-demand. If a buffer is not provided
> > > >>> to the DMA engines, image data gets directed to the driver's internal scratch
> > > >>> buffer.
> > > >>> ---
> > > >>>  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
> > > >>>  1 file changed, 1 insertion(+), 3 deletions(-)
> > > >>>
> > > >>> diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > > >>> index 2bddb4fa8a5c..5fcf9731f41b 100644
> > > >>> --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > > >>> +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > > >>> @@ -35,8 +35,6 @@
> > > >>>  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
> > > >>>  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
> > > >>>
> > > >>> -#define RKISP1_MIN_BUFFERS_NEEDED 3
> > > >>> -
> > > >>>  enum rkisp1_plane {
> > > >>>  	RKISP1_PLANE_Y	= 0,
> > > >>>  	RKISP1_PLANE_CB	= 1,
> > > >>> @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
> > > >>>  	q->ops = &rkisp1_vb2_ops;
> > > >>>  	q->mem_ops = &vb2_dma_contig_memops;
> > > >>>  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
> > > >>> -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
> > > >>> +	q->min_queued_buffers = 0;
> > > >>
> > > >> You can probably just drop this since the vb2_queue struct is zeroed when it
> > > >> is allocated. So no need to set it to 0.
> > > >
> > > > I suspected so :)
> > > >
> > > >>
> > > >> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?
> > > >
> > > > No, and this patch removes it in facts
> > > >
> > > >  -#define RKISP1_MIN_BUFFERS_NEEDED 3
> > > >  -
> > >
> > > I should have checked the patch :-) Sorry for the noise.
> > >
> > > >>
> > > >> Also, see my RFC I posted today:
> > > >>
> > > >> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u
> > > >>
> > > >> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
> > > >> and expect the driver to change that to a workable value, will, in fact, now just get
> > > >> one buffer. And streaming that will cause lots of frame drops.
> > > >>
> > > >> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
> > > >> but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.
> > > >
> > > > Yeah, I've read the discussion between you and Tomi and it seemed like
> > > > a good time to re-send this patch.
> > > >
> > > >> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
> > > >> libcamera to get much better control over how many buffers should be allocated.
> > > >
> > > > In my understanding min_queued_buffers identifies how many buffers
> > > > should be queued before calling start_streaming, and this comes
> > > > directly from an hw/driver requirement. This doesn't mean that at
> > > > least min_queue_buffers should be queued at all the times during
> > > > streaming, at least, I don't see how and where videobuf2 enforces
> > > > this. Or does it ?
> > >
> > > It's an intrinsic property of the HW/driver: e.g. if it needs two buffers
> > > queued up for the DMA engine to work, then it really is always holding on
> > > to two buffers. The only thing the framework does is postpone calling
> > > start_streaming until that number of buffers is queued to ensure the
> > > DMA engine has what it needs to start. But after that vb2 doesn't check
> > > it.
> >
> > The "driver" part of "HW/driver" is important here, as drivers can
> > influence this in multiple ways. One of them is usage of scratch
> > buffers, but even without that, a DMA engine that requires two buffers
> > can easily be operated with a single buffer by programming the DMA
> > engine with the same buffer address twice. Drivers should really do so
> > unless they really can't.
> >
> > > > If the above is correct, then the number of buffers to be queued
> > > > during streaming is, in my opinion, less an hw/driver requirement but
> > > > more an application decision.
> > >
> > > No, min_queued_buffers is a HW/drivers property: the DMA engine can't
> > > start until that many buffers are queued up, and once it is started
> > > it will always hold on to that many buffers.
> 
> I get it, my point was that once start_streaming has been called, even
> if min_queued_buffers=2, there is nothing preventing userspace from
> queing one buffer at the time once the first two have completed. Sure, the
> hw/driver might not like it, but while delaying start_streaming
> prevents bad things from happening, there is nothing in the core that
> prevents applications from potentially stalling the capture
> operations.
> 
> But I get your point, if the system needs 2 buffers to start
> streaming, it will probably need two buffers to continue producing
> frames.
> 
> > That's not always true. The imx7-media-csi driver, for instance, sets
> > min_queued_buffers to 2, but allocates scratch buffers and uses them at
> > runtime, so that it can return all queued buffers to userspace.
> 
> That's interesting. From your mention of "scratch buffers" I get there
> actually is a need to have 2 buffers queued to the HW ? How does
> that work, after all queuing a buffer to the DMA engine usually means
> pointing its write engine to one (set of) addresses.
> 
> Or is it a driver-only requirement to ask for two buffers ?

The hardware has a ping-pong mechanism with two addresses and switches
between them automatically.

> > Grepping for min_queued_buffers I see drivers setting it to 4
> > (rcar-dma.c, rzg2l-video.c), 6 (cxusb-analog.c) or even 9
> > (zoran_driver.c) ! I doubt the zoran driver holds on to 9 buffers at
> > runtime. Your statement is not universally true today?.
> >
> > This could be considered as driver issues, and the min_queued_buffers
> > values should be fixed to match the runtime behaviour. In some cases I
> > expect it will require more work than just changing the value, as
> > drivers may implement the logic to operate with less buffers at runtime
> > but not at start time. This would be fixable, but it may also call for
> > asking if the start at runtime behaviours need to be identical.
> >
> > > So the application has to know somehow how many buffers are needed to
> > > actually stream. One way is via VIDIOC_REQBUFS since that is supposed to
> > > always return a workable number of buffers, the other is by actually
> > > reporting the minimum number of buffers as per my RFC.
> > >
> > > > As you said an application should be good with> 3 buffers (one queued, one currently being written to, one to be
> > > > consumed by the application), but in very specific cases where an
> > > > application retains the buffer for longer, for whatever reason, it
> > > > might need a larger number of queued buffers to provide the DMA
> > > > engines a space where to write data without them being discarded (to
> > > > scratch buffers or discarded by the DMA engine itself, if the HW
> > > > supports that). Or maybe an application is fine to drop frames and
> > > > only queue buffers sporadically (if the HW supports that ofc).
> > > >
> > > > For libcamera, and for this specific platform in particular, we're
> > > > going to base new developments on the assumption that
> > > > min_queued_buffers == 0, and it would be more convenient for use to be
> > > > able to access its value from userspace to identify if we're running
> > > > on a kernel with or without this patch being applied.
> > >
> > > So my proposal in my RFC to expose min_num_buffers would work for libcamera?
> > > It sounds like that's what you need.
> 
> My immediate need is to know if I'm running on a "legacy" version of
> this driver that still requires 3 buffers for no apparent reason, or
> on a new version. Your proposal might work, but I still feel like we
> should report the HW/driver requirement (min_queued_buffers) instead
> of trying to suggest applications how many buffers they need to
> allocate to get "smooth streaming" or similar, as the use cases
> might be different.
> 
> > It may be useful, but I think we may also just require min_num_buffers
> > == 0 for a device to be supported in libcamera. We have to implement
> 
> While I concur this would be ideal, how would it work for existing
> rkisp1 implementation that do not include this patch ? libcamera
> should be able to run on both, probably in two different "modes" /o\

A simple option is to check the kernel version, we do that in a few
places. Over time we'll increase the minimum kernel version and drop
support for legacy APIs.

> > APIs such as the Android camera HAL that has no concept of buffers being
> > kept by the device. This could possibly be handled within libcamera by
> > allocating scratch buffers in userspace, but that comes with other
> > challenges. I would like to at least try to get help from the kernel
> > until proven that it's a bad idea.
Jacopo Mondi Oct. 29, 2024, 7:01 a.m. UTC | #8
Hi Laurent

On Mon, Oct 28, 2024 at 08:39:36PM +0200, Laurent Pinchart wrote:
> On Mon, Oct 28, 2024 at 06:08:18PM +0100, Jacopo Mondi wrote:
> > On Mon, Oct 28, 2024 at 06:21:41PM +0200, Laurent Pinchart wrote:
> > > On Mon, Oct 28, 2024 at 04:48:55PM +0100, Hans Verkuil wrote:
> > > > On 28/10/2024 16:30, Jacopo Mondi wrote:
> > > > > On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
> > > > >> On 28/10/2024 15:35, Jacopo Mondi wrote:
> > > > >>> There apparently is no reason to require 3 queued buffers to call
> > > > >>> streamon() for the RkISP1 as the driver operates with a scratch buffer
> > > > >>> where frames can be directed to if there's no available buffer provided
> > > > >>> by userspace.
> > > > >>>
> > > > >>> Reduce the number of required buffers to 0 to allow applications to
> > > > >>> operate by queueing capture buffers on-demand.
> > > > >>>
> > > > >>> Tested with libcamera, by operating with a single capture request. The
> > > > >>> same request (and associated capture buffer) gets recycled once
> > > > >>> completed. This of course causes a frame rate drop but doesn't hinder
> > > > >>> operations.
> > > > >>>
> > > > >>> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> > > > >>> ---
> > > > >>> The first version of this patch set min_queued_buffers to 1, but setting it
> > > > >>> to 0 doesn't compromise operations and it's even better as it allows application
> > > > >>> to queue buffers to the capture devices on-demand. If a buffer is not provided
> > > > >>> to the DMA engines, image data gets directed to the driver's internal scratch
> > > > >>> buffer.
> > > > >>> ---
> > > > >>>  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
> > > > >>>  1 file changed, 1 insertion(+), 3 deletions(-)
> > > > >>>
> > > > >>> diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > > > >>> index 2bddb4fa8a5c..5fcf9731f41b 100644
> > > > >>> --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > > > >>> +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> > > > >>> @@ -35,8 +35,6 @@
> > > > >>>  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
> > > > >>>  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
> > > > >>>
> > > > >>> -#define RKISP1_MIN_BUFFERS_NEEDED 3
> > > > >>> -
> > > > >>>  enum rkisp1_plane {
> > > > >>>  	RKISP1_PLANE_Y	= 0,
> > > > >>>  	RKISP1_PLANE_CB	= 1,
> > > > >>> @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
> > > > >>>  	q->ops = &rkisp1_vb2_ops;
> > > > >>>  	q->mem_ops = &vb2_dma_contig_memops;
> > > > >>>  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
> > > > >>> -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
> > > > >>> +	q->min_queued_buffers = 0;
> > > > >>
> > > > >> You can probably just drop this since the vb2_queue struct is zeroed when it
> > > > >> is allocated. So no need to set it to 0.
> > > > >
> > > > > I suspected so :)
> > > > >
> > > > >>
> > > > >> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?
> > > > >
> > > > > No, and this patch removes it in facts
> > > > >
> > > > >  -#define RKISP1_MIN_BUFFERS_NEEDED 3
> > > > >  -
> > > >
> > > > I should have checked the patch :-) Sorry for the noise.
> > > >
> > > > >>
> > > > >> Also, see my RFC I posted today:
> > > > >>
> > > > >> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u
> > > > >>
> > > > >> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
> > > > >> and expect the driver to change that to a workable value, will, in fact, now just get
> > > > >> one buffer. And streaming that will cause lots of frame drops.
> > > > >>
> > > > >> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
> > > > >> but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.
> > > > >
> > > > > Yeah, I've read the discussion between you and Tomi and it seemed like
> > > > > a good time to re-send this patch.
> > > > >
> > > > >> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
> > > > >> libcamera to get much better control over how many buffers should be allocated.
> > > > >
> > > > > In my understanding min_queued_buffers identifies how many buffers
> > > > > should be queued before calling start_streaming, and this comes
> > > > > directly from an hw/driver requirement. This doesn't mean that at
> > > > > least min_queue_buffers should be queued at all the times during
> > > > > streaming, at least, I don't see how and where videobuf2 enforces
> > > > > this. Or does it ?
> > > >
> > > > It's an intrinsic property of the HW/driver: e.g. if it needs two buffers
> > > > queued up for the DMA engine to work, then it really is always holding on
> > > > to two buffers. The only thing the framework does is postpone calling
> > > > start_streaming until that number of buffers is queued to ensure the
> > > > DMA engine has what it needs to start. But after that vb2 doesn't check
> > > > it.
> > >
> > > The "driver" part of "HW/driver" is important here, as drivers can
> > > influence this in multiple ways. One of them is usage of scratch
> > > buffers, but even without that, a DMA engine that requires two buffers
> > > can easily be operated with a single buffer by programming the DMA
> > > engine with the same buffer address twice. Drivers should really do so
> > > unless they really can't.
> > >
> > > > > If the above is correct, then the number of buffers to be queued
> > > > > during streaming is, in my opinion, less an hw/driver requirement but
> > > > > more an application decision.
> > > >
> > > > No, min_queued_buffers is a HW/drivers property: the DMA engine can't
> > > > start until that many buffers are queued up, and once it is started
> > > > it will always hold on to that many buffers.
> >
> > I get it, my point was that once start_streaming has been called, even
> > if min_queued_buffers=2, there is nothing preventing userspace from
> > queing one buffer at the time once the first two have completed. Sure, the
> > hw/driver might not like it, but while delaying start_streaming
> > prevents bad things from happening, there is nothing in the core that
> > prevents applications from potentially stalling the capture
> > operations.
> >
> > But I get your point, if the system needs 2 buffers to start
> > streaming, it will probably need two buffers to continue producing
> > frames.
> >
> > > That's not always true. The imx7-media-csi driver, for instance, sets
> > > min_queued_buffers to 2, but allocates scratch buffers and uses them at
> > > runtime, so that it can return all queued buffers to userspace.
> >
> > That's interesting. From your mention of "scratch buffers" I get there
> > actually is a need to have 2 buffers queued to the HW ? How does
> > that work, after all queuing a buffer to the DMA engine usually means
> > pointing its write engine to one (set of) addresses.
> >
> > Or is it a driver-only requirement to ask for two buffers ?
>
> The hardware has a ping-pong mechanism with two addresses and switches
> between them automatically.
>
> > > Grepping for min_queued_buffers I see drivers setting it to 4
> > > (rcar-dma.c, rzg2l-video.c), 6 (cxusb-analog.c) or even 9
> > > (zoran_driver.c) ! I doubt the zoran driver holds on to 9 buffers at
> > > runtime. Your statement is not universally true today?.
> > >
> > > This could be considered as driver issues, and the min_queued_buffers
> > > values should be fixed to match the runtime behaviour. In some cases I
> > > expect it will require more work than just changing the value, as
> > > drivers may implement the logic to operate with less buffers at runtime
> > > but not at start time. This would be fixable, but it may also call for
> > > asking if the start at runtime behaviours need to be identical.
> > >
> > > > So the application has to know somehow how many buffers are needed to
> > > > actually stream. One way is via VIDIOC_REQBUFS since that is supposed to
> > > > always return a workable number of buffers, the other is by actually
> > > > reporting the minimum number of buffers as per my RFC.
> > > >
> > > > > As you said an application should be good with> 3 buffers (one queued, one currently being written to, one to be
> > > > > consumed by the application), but in very specific cases where an
> > > > > application retains the buffer for longer, for whatever reason, it
> > > > > might need a larger number of queued buffers to provide the DMA
> > > > > engines a space where to write data without them being discarded (to
> > > > > scratch buffers or discarded by the DMA engine itself, if the HW
> > > > > supports that). Or maybe an application is fine to drop frames and
> > > > > only queue buffers sporadically (if the HW supports that ofc).
> > > > >
> > > > > For libcamera, and for this specific platform in particular, we're
> > > > > going to base new developments on the assumption that
> > > > > min_queued_buffers == 0, and it would be more convenient for use to be
> > > > > able to access its value from userspace to identify if we're running
> > > > > on a kernel with or without this patch being applied.
> > > >
> > > > So my proposal in my RFC to expose min_num_buffers would work for libcamera?
> > > > It sounds like that's what you need.
> >
> > My immediate need is to know if I'm running on a "legacy" version of
> > this driver that still requires 3 buffers for no apparent reason, or
> > on a new version. Your proposal might work, but I still feel like we
> > should report the HW/driver requirement (min_queued_buffers) instead
> > of trying to suggest applications how many buffers they need to
> > allocate to get "smooth streaming" or similar, as the use cases
> > might be different.
> >
> > > It may be useful, but I think we may also just require min_num_buffers
> > > == 0 for a device to be supported in libcamera. We have to implement
> >
> > While I concur this would be ideal, how would it work for existing
> > rkisp1 implementation that do not include this patch ? libcamera
> > should be able to run on both, probably in two different "modes" /o\
>
> A simple option is to check the kernel version, we do that in a few
> places. Over time we'll increase the minimum kernel version and drop

I considered that, but this patch is pretty easy to backport, having
something that tells to userspace the value of
min_queued_buffers might be useful indeed.

> support for legacy APIs.
>
> > > APIs such as the Android camera HAL that has no concept of buffers being
> > > kept by the device. This could possibly be handled within libcamera by
> > > allocating scratch buffers in userspace, but that comes with other
> > > challenges. I would like to at least try to get help from the kernel
> > > until proven that it's a bad idea.
>
> --
> Regards,
>
> Laurent Pinchart
Hans Verkuil Oct. 29, 2024, 8:26 a.m. UTC | #9
On 29/10/2024 08:01, Jacopo Mondi wrote:
> Hi Laurent
> 
> On Mon, Oct 28, 2024 at 08:39:36PM +0200, Laurent Pinchart wrote:
>> On Mon, Oct 28, 2024 at 06:08:18PM +0100, Jacopo Mondi wrote:
>>> On Mon, Oct 28, 2024 at 06:21:41PM +0200, Laurent Pinchart wrote:
>>>> On Mon, Oct 28, 2024 at 04:48:55PM +0100, Hans Verkuil wrote:
>>>>> On 28/10/2024 16:30, Jacopo Mondi wrote:
>>>>>> On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
>>>>>>> On 28/10/2024 15:35, Jacopo Mondi wrote:
>>>>>>>> There apparently is no reason to require 3 queued buffers to call
>>>>>>>> streamon() for the RkISP1 as the driver operates with a scratch buffer
>>>>>>>> where frames can be directed to if there's no available buffer provided
>>>>>>>> by userspace.
>>>>>>>>
>>>>>>>> Reduce the number of required buffers to 0 to allow applications to
>>>>>>>> operate by queueing capture buffers on-demand.
>>>>>>>>
>>>>>>>> Tested with libcamera, by operating with a single capture request. The
>>>>>>>> same request (and associated capture buffer) gets recycled once
>>>>>>>> completed. This of course causes a frame rate drop but doesn't hinder
>>>>>>>> operations.
>>>>>>>>
>>>>>>>> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
>>>>>>>> ---
>>>>>>>> The first version of this patch set min_queued_buffers to 1, but setting it
>>>>>>>> to 0 doesn't compromise operations and it's even better as it allows application
>>>>>>>> to queue buffers to the capture devices on-demand. If a buffer is not provided
>>>>>>>> to the DMA engines, image data gets directed to the driver's internal scratch
>>>>>>>> buffer.
>>>>>>>> ---
>>>>>>>>  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
>>>>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
>>>>>>>> index 2bddb4fa8a5c..5fcf9731f41b 100644
>>>>>>>> --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
>>>>>>>> +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
>>>>>>>> @@ -35,8 +35,6 @@
>>>>>>>>  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
>>>>>>>>  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
>>>>>>>>
>>>>>>>> -#define RKISP1_MIN_BUFFERS_NEEDED 3
>>>>>>>> -
>>>>>>>>  enum rkisp1_plane {
>>>>>>>>  	RKISP1_PLANE_Y	= 0,
>>>>>>>>  	RKISP1_PLANE_CB	= 1,
>>>>>>>> @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
>>>>>>>>  	q->ops = &rkisp1_vb2_ops;
>>>>>>>>  	q->mem_ops = &vb2_dma_contig_memops;
>>>>>>>>  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
>>>>>>>> -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
>>>>>>>> +	q->min_queued_buffers = 0;
>>>>>>>
>>>>>>> You can probably just drop this since the vb2_queue struct is zeroed when it
>>>>>>> is allocated. So no need to set it to 0.
>>>>>>
>>>>>> I suspected so :)
>>>>>>
>>>>>>>
>>>>>>> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?
>>>>>>
>>>>>> No, and this patch removes it in facts
>>>>>>
>>>>>>  -#define RKISP1_MIN_BUFFERS_NEEDED 3
>>>>>>  -
>>>>>
>>>>> I should have checked the patch :-) Sorry for the noise.
>>>>>
>>>>>>>
>>>>>>> Also, see my RFC I posted today:
>>>>>>>
>>>>>>> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u
>>>>>>>
>>>>>>> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
>>>>>>> and expect the driver to change that to a workable value, will, in fact, now just get
>>>>>>> one buffer. And streaming that will cause lots of frame drops.
>>>>>>>
>>>>>>> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
>>>>>>> but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.
>>>>>>
>>>>>> Yeah, I've read the discussion between you and Tomi and it seemed like
>>>>>> a good time to re-send this patch.
>>>>>>
>>>>>>> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
>>>>>>> libcamera to get much better control over how many buffers should be allocated.
>>>>>>
>>>>>> In my understanding min_queued_buffers identifies how many buffers
>>>>>> should be queued before calling start_streaming, and this comes
>>>>>> directly from an hw/driver requirement. This doesn't mean that at
>>>>>> least min_queue_buffers should be queued at all the times during
>>>>>> streaming, at least, I don't see how and where videobuf2 enforces
>>>>>> this. Or does it ?
>>>>>
>>>>> It's an intrinsic property of the HW/driver: e.g. if it needs two buffers
>>>>> queued up for the DMA engine to work, then it really is always holding on
>>>>> to two buffers. The only thing the framework does is postpone calling
>>>>> start_streaming until that number of buffers is queued to ensure the
>>>>> DMA engine has what it needs to start. But after that vb2 doesn't check
>>>>> it.
>>>>
>>>> The "driver" part of "HW/driver" is important here, as drivers can
>>>> influence this in multiple ways. One of them is usage of scratch
>>>> buffers, but even without that, a DMA engine that requires two buffers
>>>> can easily be operated with a single buffer by programming the DMA
>>>> engine with the same buffer address twice. Drivers should really do so
>>>> unless they really can't.
>>>>
>>>>>> If the above is correct, then the number of buffers to be queued
>>>>>> during streaming is, in my opinion, less an hw/driver requirement but
>>>>>> more an application decision.
>>>>>
>>>>> No, min_queued_buffers is a HW/drivers property: the DMA engine can't
>>>>> start until that many buffers are queued up, and once it is started
>>>>> it will always hold on to that many buffers.
>>>
>>> I get it, my point was that once start_streaming has been called, even
>>> if min_queued_buffers=2, there is nothing preventing userspace from
>>> queing one buffer at the time once the first two have completed. Sure, the
>>> hw/driver might not like it, but while delaying start_streaming
>>> prevents bad things from happening, there is nothing in the core that
>>> prevents applications from potentially stalling the capture
>>> operations.
>>>
>>> But I get your point, if the system needs 2 buffers to start
>>> streaming, it will probably need two buffers to continue producing
>>> frames.

Right, it won't be able to complete anything and return it to userspace
until it gets a third buffer. Only then can it return one buffer to userspace.

>>>
>>>> That's not always true. The imx7-media-csi driver, for instance, sets
>>>> min_queued_buffers to 2, but allocates scratch buffers and uses them at
>>>> runtime, so that it can return all queued buffers to userspace.
>>>
>>> That's interesting. From your mention of "scratch buffers" I get there
>>> actually is a need to have 2 buffers queued to the HW ? How does
>>> that work, after all queuing a buffer to the DMA engine usually means
>>> pointing its write engine to one (set of) addresses.
>>>
>>> Or is it a driver-only requirement to ask for two buffers ?
>>
>> The hardware has a ping-pong mechanism with two addresses and switches
>> between them automatically.
>>
>>>> Grepping for min_queued_buffers I see drivers setting it to 4
>>>> (rcar-dma.c, rzg2l-video.c), 6 (cxusb-analog.c) or even 9
>>>> (zoran_driver.c) ! I doubt the zoran driver holds on to 9 buffers at
>>>> runtime. Your statement is not universally true today?.
>>>>
>>>> This could be considered as driver issues, and the min_queued_buffers
>>>> values should be fixed to match the runtime behaviour. In some cases I
>>>> expect it will require more work than just changing the value, as
>>>> drivers may implement the logic to operate with less buffers at runtime
>>>> but not at start time. This would be fixable, but it may also call for
>>>> asking if the start at runtime behaviours need to be identical.
>>>>
>>>>> So the application has to know somehow how many buffers are needed to
>>>>> actually stream. One way is via VIDIOC_REQBUFS since that is supposed to
>>>>> always return a workable number of buffers, the other is by actually
>>>>> reporting the minimum number of buffers as per my RFC.
>>>>>
>>>>>> As you said an application should be good with> 3 buffers (one queued, one currently being written to, one to be
>>>>>> consumed by the application), but in very specific cases where an
>>>>>> application retains the buffer for longer, for whatever reason, it
>>>>>> might need a larger number of queued buffers to provide the DMA
>>>>>> engines a space where to write data without them being discarded (to
>>>>>> scratch buffers or discarded by the DMA engine itself, if the HW
>>>>>> supports that). Or maybe an application is fine to drop frames and
>>>>>> only queue buffers sporadically (if the HW supports that ofc).
>>>>>>
>>>>>> For libcamera, and for this specific platform in particular, we're
>>>>>> going to base new developments on the assumption that
>>>>>> min_queued_buffers == 0, and it would be more convenient for use to be
>>>>>> able to access its value from userspace to identify if we're running
>>>>>> on a kernel with or without this patch being applied.
>>>>>
>>>>> So my proposal in my RFC to expose min_num_buffers would work for libcamera?
>>>>> It sounds like that's what you need.
>>>
>>> My immediate need is to know if I'm running on a "legacy" version of
>>> this driver that still requires 3 buffers for no apparent reason, or
>>> on a new version. Your proposal might work, but I still feel like we
>>> should report the HW/driver requirement (min_queued_buffers) instead
>>> of trying to suggest applications how many buffers they need to
>>> allocate to get "smooth streaming" or similar, as the use cases
>>> might be different.
>>>
>>>> It may be useful, but I think we may also just require min_num_buffers
>>>> == 0 for a device to be supported in libcamera. We have to implement
>>>
>>> While I concur this would be ideal, how would it work for existing
>>> rkisp1 implementation that do not include this patch ? libcamera
>>> should be able to run on both, probably in two different "modes" /o\
>>
>> A simple option is to check the kernel version, we do that in a few
>> places. Over time we'll increase the minimum kernel version and drop
> 
> I considered that, but this patch is pretty easy to backport, having
> something that tells to userspace the value of
> min_queued_buffers might be useful indeed.

The implementation is similar to that of the new max_num_buffers field
in struct v4l2_create_buffers: it will be signaled by a new buffer
capability flag. So you can check that in the code. Much better than
relying on kernel versions.

Regards,

	Hans

> 
>> support for legacy APIs.
>>
>>>> APIs such as the Android camera HAL that has no concept of buffers being
>>>> kept by the device. This could possibly be handled within libcamera by
>>>> allocating scratch buffers in userspace, but that comes with other
>>>> challenges. I would like to at least try to get help from the kernel
>>>> until proven that it's a bad idea.
>>
>> --
>> Regards,
>>
>> Laurent Pinchart
Laurent Pinchart Oct. 29, 2024, 8:56 a.m. UTC | #10
On Tue, Oct 29, 2024 at 09:26:58AM +0100, Hans Verkuil wrote:
> On 29/10/2024 08:01, Jacopo Mondi wrote:
> > On Mon, Oct 28, 2024 at 08:39:36PM +0200, Laurent Pinchart wrote:
> >> On Mon, Oct 28, 2024 at 06:08:18PM +0100, Jacopo Mondi wrote:
> >>> On Mon, Oct 28, 2024 at 06:21:41PM +0200, Laurent Pinchart wrote:
> >>>> On Mon, Oct 28, 2024 at 04:48:55PM +0100, Hans Verkuil wrote:
> >>>>> On 28/10/2024 16:30, Jacopo Mondi wrote:
> >>>>>> On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
> >>>>>>> On 28/10/2024 15:35, Jacopo Mondi wrote:
> >>>>>>>> There apparently is no reason to require 3 queued buffers to call
> >>>>>>>> streamon() for the RkISP1 as the driver operates with a scratch buffer
> >>>>>>>> where frames can be directed to if there's no available buffer provided
> >>>>>>>> by userspace.
> >>>>>>>>
> >>>>>>>> Reduce the number of required buffers to 0 to allow applications to
> >>>>>>>> operate by queueing capture buffers on-demand.
> >>>>>>>>
> >>>>>>>> Tested with libcamera, by operating with a single capture request. The
> >>>>>>>> same request (and associated capture buffer) gets recycled once
> >>>>>>>> completed. This of course causes a frame rate drop but doesn't hinder
> >>>>>>>> operations.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> >>>>>>>> ---
> >>>>>>>> The first version of this patch set min_queued_buffers to 1, but setting it
> >>>>>>>> to 0 doesn't compromise operations and it's even better as it allows application
> >>>>>>>> to queue buffers to the capture devices on-demand. If a buffer is not provided
> >>>>>>>> to the DMA engines, image data gets directed to the driver's internal scratch
> >>>>>>>> buffer.
> >>>>>>>> ---
> >>>>>>>>  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
> >>>>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>>>>>>> index 2bddb4fa8a5c..5fcf9731f41b 100644
> >>>>>>>> --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>>>>>>> +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>>>>>>> @@ -35,8 +35,6 @@
> >>>>>>>>  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
> >>>>>>>>  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
> >>>>>>>>
> >>>>>>>> -#define RKISP1_MIN_BUFFERS_NEEDED 3
> >>>>>>>> -
> >>>>>>>>  enum rkisp1_plane {
> >>>>>>>>  	RKISP1_PLANE_Y	= 0,
> >>>>>>>>  	RKISP1_PLANE_CB	= 1,
> >>>>>>>> @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
> >>>>>>>>  	q->ops = &rkisp1_vb2_ops;
> >>>>>>>>  	q->mem_ops = &vb2_dma_contig_memops;
> >>>>>>>>  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
> >>>>>>>> -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
> >>>>>>>> +	q->min_queued_buffers = 0;
> >>>>>>>
> >>>>>>> You can probably just drop this since the vb2_queue struct is zeroed when it
> >>>>>>> is allocated. So no need to set it to 0.
> >>>>>>
> >>>>>> I suspected so :)
> >>>>>>
> >>>>>>>
> >>>>>>> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?
> >>>>>>
> >>>>>> No, and this patch removes it in facts
> >>>>>>
> >>>>>>  -#define RKISP1_MIN_BUFFERS_NEEDED 3
> >>>>>>  -
> >>>>>
> >>>>> I should have checked the patch :-) Sorry for the noise.
> >>>>>
> >>>>>>>
> >>>>>>> Also, see my RFC I posted today:
> >>>>>>>
> >>>>>>> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u
> >>>>>>>
> >>>>>>> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
> >>>>>>> and expect the driver to change that to a workable value, will, in fact, now just get
> >>>>>>> one buffer. And streaming that will cause lots of frame drops.
> >>>>>>>
> >>>>>>> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
> >>>>>>> but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.
> >>>>>>
> >>>>>> Yeah, I've read the discussion between you and Tomi and it seemed like
> >>>>>> a good time to re-send this patch.
> >>>>>>
> >>>>>>> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
> >>>>>>> libcamera to get much better control over how many buffers should be allocated.
> >>>>>>
> >>>>>> In my understanding min_queued_buffers identifies how many buffers
> >>>>>> should be queued before calling start_streaming, and this comes
> >>>>>> directly from an hw/driver requirement. This doesn't mean that at
> >>>>>> least min_queue_buffers should be queued at all the times during
> >>>>>> streaming, at least, I don't see how and where videobuf2 enforces
> >>>>>> this. Or does it ?
> >>>>>
> >>>>> It's an intrinsic property of the HW/driver: e.g. if it needs two buffers
> >>>>> queued up for the DMA engine to work, then it really is always holding on
> >>>>> to two buffers. The only thing the framework does is postpone calling
> >>>>> start_streaming until that number of buffers is queued to ensure the
> >>>>> DMA engine has what it needs to start. But after that vb2 doesn't check
> >>>>> it.
> >>>>
> >>>> The "driver" part of "HW/driver" is important here, as drivers can
> >>>> influence this in multiple ways. One of them is usage of scratch
> >>>> buffers, but even without that, a DMA engine that requires two buffers
> >>>> can easily be operated with a single buffer by programming the DMA
> >>>> engine with the same buffer address twice. Drivers should really do so
> >>>> unless they really can't.
> >>>>
> >>>>>> If the above is correct, then the number of buffers to be queued
> >>>>>> during streaming is, in my opinion, less an hw/driver requirement but
> >>>>>> more an application decision.
> >>>>>
> >>>>> No, min_queued_buffers is a HW/drivers property: the DMA engine can't
> >>>>> start until that many buffers are queued up, and once it is started
> >>>>> it will always hold on to that many buffers.
> >>>
> >>> I get it, my point was that once start_streaming has been called, even
> >>> if min_queued_buffers=2, there is nothing preventing userspace from
> >>> queing one buffer at the time once the first two have completed. Sure, the
> >>> hw/driver might not like it, but while delaying start_streaming
> >>> prevents bad things from happening, there is nothing in the core that
> >>> prevents applications from potentially stalling the capture
> >>> operations.
> >>>
> >>> But I get your point, if the system needs 2 buffers to start
> >>> streaming, it will probably need two buffers to continue producing
> >>> frames.
> 
> Right, it won't be able to complete anything and return it to userspace
> until it gets a third buffer. Only then can it return one buffer to userspace.
> 
> >>>> That's not always true. The imx7-media-csi driver, for instance, sets
> >>>> min_queued_buffers to 2, but allocates scratch buffers and uses them at
> >>>> runtime, so that it can return all queued buffers to userspace.
> >>>
> >>> That's interesting. From your mention of "scratch buffers" I get there
> >>> actually is a need to have 2 buffers queued to the HW ? How does
> >>> that work, after all queuing a buffer to the DMA engine usually means
> >>> pointing its write engine to one (set of) addresses.
> >>>
> >>> Or is it a driver-only requirement to ask for two buffers ?
> >>
> >> The hardware has a ping-pong mechanism with two addresses and switches
> >> between them automatically.
> >>
> >>>> Grepping for min_queued_buffers I see drivers setting it to 4
> >>>> (rcar-dma.c, rzg2l-video.c), 6 (cxusb-analog.c) or even 9
> >>>> (zoran_driver.c) ! I doubt the zoran driver holds on to 9 buffers at
> >>>> runtime. Your statement is not universally true today?.
> >>>>
> >>>> This could be considered as driver issues, and the min_queued_buffers
> >>>> values should be fixed to match the runtime behaviour. In some cases I
> >>>> expect it will require more work than just changing the value, as
> >>>> drivers may implement the logic to operate with less buffers at runtime
> >>>> but not at start time. This would be fixable, but it may also call for
> >>>> asking if the start at runtime behaviours need to be identical.
> >>>>
> >>>>> So the application has to know somehow how many buffers are needed to
> >>>>> actually stream. One way is via VIDIOC_REQBUFS since that is supposed to
> >>>>> always return a workable number of buffers, the other is by actually
> >>>>> reporting the minimum number of buffers as per my RFC.
> >>>>>
> >>>>>> As you said an application should be good with> 3 buffers (one queued, one currently being written to, one to be
> >>>>>> consumed by the application), but in very specific cases where an
> >>>>>> application retains the buffer for longer, for whatever reason, it
> >>>>>> might need a larger number of queued buffers to provide the DMA
> >>>>>> engines a space where to write data without them being discarded (to
> >>>>>> scratch buffers or discarded by the DMA engine itself, if the HW
> >>>>>> supports that). Or maybe an application is fine to drop frames and
> >>>>>> only queue buffers sporadically (if the HW supports that ofc).
> >>>>>>
> >>>>>> For libcamera, and for this specific platform in particular, we're
> >>>>>> going to base new developments on the assumption that
> >>>>>> min_queued_buffers == 0, and it would be more convenient for use to be
> >>>>>> able to access its value from userspace to identify if we're running
> >>>>>> on a kernel with or without this patch being applied.
> >>>>>
> >>>>> So my proposal in my RFC to expose min_num_buffers would work for libcamera?
> >>>>> It sounds like that's what you need.
> >>>
> >>> My immediate need is to know if I'm running on a "legacy" version of
> >>> this driver that still requires 3 buffers for no apparent reason, or
> >>> on a new version. Your proposal might work, but I still feel like we
> >>> should report the HW/driver requirement (min_queued_buffers) instead
> >>> of trying to suggest applications how many buffers they need to
> >>> allocate to get "smooth streaming" or similar, as the use cases
> >>> might be different.
> >>>
> >>>> It may be useful, but I think we may also just require min_num_buffers
> >>>> == 0 for a device to be supported in libcamera. We have to implement
> >>>
> >>> While I concur this would be ideal, how would it work for existing
> >>> rkisp1 implementation that do not include this patch ? libcamera
> >>> should be able to run on both, probably in two different "modes" /o\
> >>
> >> A simple option is to check the kernel version, we do that in a few
> >> places. Over time we'll increase the minimum kernel version and drop
> > 
> > I considered that, but this patch is pretty easy to backport, having
> > something that tells to userspace the value of
> > min_queued_buffers might be useful indeed.
> 
> The implementation is similar to that of the new max_num_buffers field
> in struct v4l2_create_buffers: it will be signaled by a new buffer
> capability flag. So you can check that in the code. Much better than
> relying on kernel versions.

Sure, I agree that an explicit API is better. I'm not a big fan of
adding a field to v4l2_create_buffers though (I didn't realize it was
extended with a max_num_buffers field) as it would require calling
VIDIOC_CREATE_BUFS to get the information. That can get quite awkward to
use for userspace.

This being said, libcamera today won't behave properly with rkisp1, as
it won't be able to give the application the last buffers, as guaranteed
by the libcamera API. Relying on min_num_buffers == 0 will fix it and
won't cause any regression on kernels that still use 3, it will just
keep the current behaviour.

> >> support for legacy APIs.
> >>
> >>>> APIs such as the Android camera HAL that has no concept of buffers being
> >>>> kept by the device. This could possibly be handled within libcamera by
> >>>> allocating scratch buffers in userspace, but that comes with other
> >>>> challenges. I would like to at least try to get help from the kernel
> >>>> until proven that it's a bad idea.
Hans Verkuil Oct. 29, 2024, 9:54 a.m. UTC | #11
On 10/29/24 09:56, Laurent Pinchart wrote:
> On Tue, Oct 29, 2024 at 09:26:58AM +0100, Hans Verkuil wrote:
>> On 29/10/2024 08:01, Jacopo Mondi wrote:
>>> On Mon, Oct 28, 2024 at 08:39:36PM +0200, Laurent Pinchart wrote:
>>>> On Mon, Oct 28, 2024 at 06:08:18PM +0100, Jacopo Mondi wrote:
>>>>> On Mon, Oct 28, 2024 at 06:21:41PM +0200, Laurent Pinchart wrote:
>>>>>> On Mon, Oct 28, 2024 at 04:48:55PM +0100, Hans Verkuil wrote:
>>>>>>> On 28/10/2024 16:30, Jacopo Mondi wrote:
>>>>>>>> On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
>>>>>>>>> On 28/10/2024 15:35, Jacopo Mondi wrote:
>>>>>>>>>> There apparently is no reason to require 3 queued buffers to call
>>>>>>>>>> streamon() for the RkISP1 as the driver operates with a scratch buffer
>>>>>>>>>> where frames can be directed to if there's no available buffer provided
>>>>>>>>>> by userspace.
>>>>>>>>>>
>>>>>>>>>> Reduce the number of required buffers to 0 to allow applications to
>>>>>>>>>> operate by queueing capture buffers on-demand.
>>>>>>>>>>
>>>>>>>>>> Tested with libcamera, by operating with a single capture request. The
>>>>>>>>>> same request (and associated capture buffer) gets recycled once
>>>>>>>>>> completed. This of course causes a frame rate drop but doesn't hinder
>>>>>>>>>> operations.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
>>>>>>>>>> ---
>>>>>>>>>> The first version of this patch set min_queued_buffers to 1, but setting it
>>>>>>>>>> to 0 doesn't compromise operations and it's even better as it allows application
>>>>>>>>>> to queue buffers to the capture devices on-demand. If a buffer is not provided
>>>>>>>>>> to the DMA engines, image data gets directed to the driver's internal scratch
>>>>>>>>>> buffer.
>>>>>>>>>> ---
>>>>>>>>>>  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
>>>>>>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
>>>>>>>>>> index 2bddb4fa8a5c..5fcf9731f41b 100644
>>>>>>>>>> --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
>>>>>>>>>> +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
>>>>>>>>>> @@ -35,8 +35,6 @@
>>>>>>>>>>  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
>>>>>>>>>>  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
>>>>>>>>>>
>>>>>>>>>> -#define RKISP1_MIN_BUFFERS_NEEDED 3
>>>>>>>>>> -
>>>>>>>>>>  enum rkisp1_plane {
>>>>>>>>>>  	RKISP1_PLANE_Y	= 0,
>>>>>>>>>>  	RKISP1_PLANE_CB	= 1,
>>>>>>>>>> @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
>>>>>>>>>>  	q->ops = &rkisp1_vb2_ops;
>>>>>>>>>>  	q->mem_ops = &vb2_dma_contig_memops;
>>>>>>>>>>  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
>>>>>>>>>> -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
>>>>>>>>>> +	q->min_queued_buffers = 0;
>>>>>>>>>
>>>>>>>>> You can probably just drop this since the vb2_queue struct is zeroed when it
>>>>>>>>> is allocated. So no need to set it to 0.
>>>>>>>>
>>>>>>>> I suspected so :)
>>>>>>>>
>>>>>>>>>
>>>>>>>>> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?
>>>>>>>>
>>>>>>>> No, and this patch removes it in facts
>>>>>>>>
>>>>>>>>  -#define RKISP1_MIN_BUFFERS_NEEDED 3
>>>>>>>>  -
>>>>>>>
>>>>>>> I should have checked the patch :-) Sorry for the noise.
>>>>>>>
>>>>>>>>>
>>>>>>>>> Also, see my RFC I posted today:
>>>>>>>>>
>>>>>>>>> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u
>>>>>>>>>
>>>>>>>>> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
>>>>>>>>> and expect the driver to change that to a workable value, will, in fact, now just get
>>>>>>>>> one buffer. And streaming that will cause lots of frame drops.
>>>>>>>>>
>>>>>>>>> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
>>>>>>>>> but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.
>>>>>>>>
>>>>>>>> Yeah, I've read the discussion between you and Tomi and it seemed like
>>>>>>>> a good time to re-send this patch.
>>>>>>>>
>>>>>>>>> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
>>>>>>>>> libcamera to get much better control over how many buffers should be allocated.
>>>>>>>>
>>>>>>>> In my understanding min_queued_buffers identifies how many buffers
>>>>>>>> should be queued before calling start_streaming, and this comes
>>>>>>>> directly from an hw/driver requirement. This doesn't mean that at
>>>>>>>> least min_queue_buffers should be queued at all the times during
>>>>>>>> streaming, at least, I don't see how and where videobuf2 enforces
>>>>>>>> this. Or does it ?
>>>>>>>
>>>>>>> It's an intrinsic property of the HW/driver: e.g. if it needs two buffers
>>>>>>> queued up for the DMA engine to work, then it really is always holding on
>>>>>>> to two buffers. The only thing the framework does is postpone calling
>>>>>>> start_streaming until that number of buffers is queued to ensure the
>>>>>>> DMA engine has what it needs to start. But after that vb2 doesn't check
>>>>>>> it.
>>>>>>
>>>>>> The "driver" part of "HW/driver" is important here, as drivers can
>>>>>> influence this in multiple ways. One of them is usage of scratch
>>>>>> buffers, but even without that, a DMA engine that requires two buffers
>>>>>> can easily be operated with a single buffer by programming the DMA
>>>>>> engine with the same buffer address twice. Drivers should really do so
>>>>>> unless they really can't.
>>>>>>
>>>>>>>> If the above is correct, then the number of buffers to be queued
>>>>>>>> during streaming is, in my opinion, less an hw/driver requirement but
>>>>>>>> more an application decision.
>>>>>>>
>>>>>>> No, min_queued_buffers is a HW/drivers property: the DMA engine can't
>>>>>>> start until that many buffers are queued up, and once it is started
>>>>>>> it will always hold on to that many buffers.
>>>>>
>>>>> I get it, my point was that once start_streaming has been called, even
>>>>> if min_queued_buffers=2, there is nothing preventing userspace from
>>>>> queing one buffer at the time once the first two have completed. Sure, the
>>>>> hw/driver might not like it, but while delaying start_streaming
>>>>> prevents bad things from happening, there is nothing in the core that
>>>>> prevents applications from potentially stalling the capture
>>>>> operations.
>>>>>
>>>>> But I get your point, if the system needs 2 buffers to start
>>>>> streaming, it will probably need two buffers to continue producing
>>>>> frames.
>>
>> Right, it won't be able to complete anything and return it to userspace
>> until it gets a third buffer. Only then can it return one buffer to userspace.
>>
>>>>>> That's not always true. The imx7-media-csi driver, for instance, sets
>>>>>> min_queued_buffers to 2, but allocates scratch buffers and uses them at
>>>>>> runtime, so that it can return all queued buffers to userspace.
>>>>>
>>>>> That's interesting. From your mention of "scratch buffers" I get there
>>>>> actually is a need to have 2 buffers queued to the HW ? How does
>>>>> that work, after all queuing a buffer to the DMA engine usually means
>>>>> pointing its write engine to one (set of) addresses.
>>>>>
>>>>> Or is it a driver-only requirement to ask for two buffers ?
>>>>
>>>> The hardware has a ping-pong mechanism with two addresses and switches
>>>> between them automatically.
>>>>
>>>>>> Grepping for min_queued_buffers I see drivers setting it to 4
>>>>>> (rcar-dma.c, rzg2l-video.c), 6 (cxusb-analog.c) or even 9
>>>>>> (zoran_driver.c) ! I doubt the zoran driver holds on to 9 buffers at
>>>>>> runtime. Your statement is not universally true today?.
>>>>>>
>>>>>> This could be considered as driver issues, and the min_queued_buffers
>>>>>> values should be fixed to match the runtime behaviour. In some cases I
>>>>>> expect it will require more work than just changing the value, as
>>>>>> drivers may implement the logic to operate with less buffers at runtime
>>>>>> but not at start time. This would be fixable, but it may also call for
>>>>>> asking if the start at runtime behaviours need to be identical.
>>>>>>
>>>>>>> So the application has to know somehow how many buffers are needed to
>>>>>>> actually stream. One way is via VIDIOC_REQBUFS since that is supposed to
>>>>>>> always return a workable number of buffers, the other is by actually
>>>>>>> reporting the minimum number of buffers as per my RFC.
>>>>>>>
>>>>>>>> As you said an application should be good with> 3 buffers (one queued, one currently being written to, one to be
>>>>>>>> consumed by the application), but in very specific cases where an
>>>>>>>> application retains the buffer for longer, for whatever reason, it
>>>>>>>> might need a larger number of queued buffers to provide the DMA
>>>>>>>> engines a space where to write data without them being discarded (to
>>>>>>>> scratch buffers or discarded by the DMA engine itself, if the HW
>>>>>>>> supports that). Or maybe an application is fine to drop frames and
>>>>>>>> only queue buffers sporadically (if the HW supports that ofc).
>>>>>>>>
>>>>>>>> For libcamera, and for this specific platform in particular, we're
>>>>>>>> going to base new developments on the assumption that
>>>>>>>> min_queued_buffers == 0, and it would be more convenient for use to be
>>>>>>>> able to access its value from userspace to identify if we're running
>>>>>>>> on a kernel with or without this patch being applied.
>>>>>>>
>>>>>>> So my proposal in my RFC to expose min_num_buffers would work for libcamera?
>>>>>>> It sounds like that's what you need.
>>>>>
>>>>> My immediate need is to know if I'm running on a "legacy" version of
>>>>> this driver that still requires 3 buffers for no apparent reason, or
>>>>> on a new version. Your proposal might work, but I still feel like we
>>>>> should report the HW/driver requirement (min_queued_buffers) instead
>>>>> of trying to suggest applications how many buffers they need to
>>>>> allocate to get "smooth streaming" or similar, as the use cases
>>>>> might be different.
>>>>>
>>>>>> It may be useful, but I think we may also just require min_num_buffers
>>>>>> == 0 for a device to be supported in libcamera. We have to implement
>>>>>
>>>>> While I concur this would be ideal, how would it work for existing
>>>>> rkisp1 implementation that do not include this patch ? libcamera
>>>>> should be able to run on both, probably in two different "modes" /o\
>>>>
>>>> A simple option is to check the kernel version, we do that in a few
>>>> places. Over time we'll increase the minimum kernel version and drop
>>>
>>> I considered that, but this patch is pretty easy to backport, having
>>> something that tells to userspace the value of
>>> min_queued_buffers might be useful indeed.
>>
>> The implementation is similar to that of the new max_num_buffers field
>> in struct v4l2_create_buffers: it will be signaled by a new buffer
>> capability flag. So you can check that in the code. Much better than
>> relying on kernel versions.
> 
> Sure, I agree that an explicit API is better. I'm not a big fan of
> adding a field to v4l2_create_buffers though (I didn't realize it was
> extended with a max_num_buffers field) as it would require calling
> VIDIOC_CREATE_BUFS to get the information. That can get quite awkward to
> use for userspace.

Why would that be awkward? If called with count = 0, then it will just
report the capabilities and it will not create any buffers.

Regards,

	Hans

> 
> This being said, libcamera today won't behave properly with rkisp1, as
> it won't be able to give the application the last buffers, as guaranteed
> by the libcamera API. Relying on min_num_buffers == 0 will fix it and
> won't cause any regression on kernels that still use 3, it will just
> keep the current behaviour.
> 
>>>> support for legacy APIs.
>>>>
>>>>>> APIs such as the Android camera HAL that has no concept of buffers being
>>>>>> kept by the device. This could possibly be handled within libcamera by
>>>>>> allocating scratch buffers in userspace, but that comes with other
>>>>>> challenges. I would like to at least try to get help from the kernel
>>>>>> until proven that it's a bad idea.
>
Laurent Pinchart Oct. 29, 2024, 10:35 a.m. UTC | #12
On Tue, Oct 29, 2024 at 10:54:31AM +0100, Hans Verkuil wrote:
> On 10/29/24 09:56, Laurent Pinchart wrote:
> > On Tue, Oct 29, 2024 at 09:26:58AM +0100, Hans Verkuil wrote:
> >> On 29/10/2024 08:01, Jacopo Mondi wrote:
> >>> On Mon, Oct 28, 2024 at 08:39:36PM +0200, Laurent Pinchart wrote:
> >>>> On Mon, Oct 28, 2024 at 06:08:18PM +0100, Jacopo Mondi wrote:
> >>>>> On Mon, Oct 28, 2024 at 06:21:41PM +0200, Laurent Pinchart wrote:
> >>>>>> On Mon, Oct 28, 2024 at 04:48:55PM +0100, Hans Verkuil wrote:
> >>>>>>> On 28/10/2024 16:30, Jacopo Mondi wrote:
> >>>>>>>> On Mon, Oct 28, 2024 at 04:02:13PM +0100, Hans Verkuil wrote:
> >>>>>>>>> On 28/10/2024 15:35, Jacopo Mondi wrote:
> >>>>>>>>>> There apparently is no reason to require 3 queued buffers to call
> >>>>>>>>>> streamon() for the RkISP1 as the driver operates with a scratch buffer
> >>>>>>>>>> where frames can be directed to if there's no available buffer provided
> >>>>>>>>>> by userspace.
> >>>>>>>>>>
> >>>>>>>>>> Reduce the number of required buffers to 0 to allow applications to
> >>>>>>>>>> operate by queueing capture buffers on-demand.
> >>>>>>>>>>
> >>>>>>>>>> Tested with libcamera, by operating with a single capture request. The
> >>>>>>>>>> same request (and associated capture buffer) gets recycled once
> >>>>>>>>>> completed. This of course causes a frame rate drop but doesn't hinder
> >>>>>>>>>> operations.
> >>>>>>>>>>
> >>>>>>>>>> Signed-off-by: Jacopo Mondi <jacopo.mondi@ideasonboard.com>
> >>>>>>>>>> ---
> >>>>>>>>>> The first version of this patch set min_queued_buffers to 1, but setting it
> >>>>>>>>>> to 0 doesn't compromise operations and it's even better as it allows application
> >>>>>>>>>> to queue buffers to the capture devices on-demand. If a buffer is not provided
> >>>>>>>>>> to the DMA engines, image data gets directed to the driver's internal scratch
> >>>>>>>>>> buffer.
> >>>>>>>>>> ---
> >>>>>>>>>>  drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c | 4 +---
> >>>>>>>>>>  1 file changed, 1 insertion(+), 3 deletions(-)
> >>>>>>>>>>
> >>>>>>>>>> diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>>>>>>>>> index 2bddb4fa8a5c..5fcf9731f41b 100644
> >>>>>>>>>> --- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>>>>>>>>> +++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
> >>>>>>>>>> @@ -35,8 +35,6 @@
> >>>>>>>>>>  #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
> >>>>>>>>>>  #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"
> >>>>>>>>>>
> >>>>>>>>>> -#define RKISP1_MIN_BUFFERS_NEEDED 3
> >>>>>>>>>> -
> >>>>>>>>>>  enum rkisp1_plane {
> >>>>>>>>>>  	RKISP1_PLANE_Y	= 0,
> >>>>>>>>>>  	RKISP1_PLANE_CB	= 1,
> >>>>>>>>>> @@ -1563,7 +1561,7 @@ static int rkisp1_register_capture(struct rkisp1_capture *cap)
> >>>>>>>>>>  	q->ops = &rkisp1_vb2_ops;
> >>>>>>>>>>  	q->mem_ops = &vb2_dma_contig_memops;
> >>>>>>>>>>  	q->buf_struct_size = sizeof(struct rkisp1_buffer);
> >>>>>>>>>> -	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
> >>>>>>>>>> +	q->min_queued_buffers = 0;
> >>>>>>>>>
> >>>>>>>>> You can probably just drop this since the vb2_queue struct is zeroed when it
> >>>>>>>>> is allocated. So no need to set it to 0.
> >>>>>>>>
> >>>>>>>> I suspected so :)
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> And is the RKISP1_MIN_BUFFERS_NEEDED define still needed after this change?
> >>>>>>>>
> >>>>>>>> No, and this patch removes it in facts
> >>>>>>>>
> >>>>>>>>  -#define RKISP1_MIN_BUFFERS_NEEDED 3
> >>>>>>>>  -
> >>>>>>>
> >>>>>>> I should have checked the patch :-) Sorry for the noise.
> >>>>>>>
> >>>>>>>>>
> >>>>>>>>> Also, see my RFC I posted today:
> >>>>>>>>>
> >>>>>>>>> https://lore.kernel.org/linux-media/126cd76a-6224-483b-a18d-a3cc89e5ff2d@xs4all.nl/T/#u
> >>>>>>>>>
> >>>>>>>>> My main concern is that applications that just call VIDIOC_REQBUFS with count = 1
> >>>>>>>>> and expect the driver to change that to a workable value, will, in fact, now just get
> >>>>>>>>> one buffer. And streaming that will cause lots of frame drops.
> >>>>>>>>>
> >>>>>>>>> It makes sense to leave min_queued_buffers at 0 if a scratch buffer is available,
> >>>>>>>>> but I'm unhappy with the fact that you get a poor experience when REQBUFS(1) is called.
> >>>>>>>>
> >>>>>>>> Yeah, I've read the discussion between you and Tomi and it seemed like
> >>>>>>>> a good time to re-send this patch.
> >>>>>>>>
> >>>>>>>>> My RFC suggests improvements in the uAPI. With that in place you can use CREATE_BUFS in
> >>>>>>>>> libcamera to get much better control over how many buffers should be allocated.
> >>>>>>>>
> >>>>>>>> In my understanding min_queued_buffers identifies how many buffers
> >>>>>>>> should be queued before calling start_streaming, and this comes
> >>>>>>>> directly from an hw/driver requirement. This doesn't mean that at
> >>>>>>>> least min_queue_buffers should be queued at all the times during
> >>>>>>>> streaming, at least, I don't see how and where videobuf2 enforces
> >>>>>>>> this. Or does it ?
> >>>>>>>
> >>>>>>> It's an intrinsic property of the HW/driver: e.g. if it needs two buffers
> >>>>>>> queued up for the DMA engine to work, then it really is always holding on
> >>>>>>> to two buffers. The only thing the framework does is postpone calling
> >>>>>>> start_streaming until that number of buffers is queued to ensure the
> >>>>>>> DMA engine has what it needs to start. But after that vb2 doesn't check
> >>>>>>> it.
> >>>>>>
> >>>>>> The "driver" part of "HW/driver" is important here, as drivers can
> >>>>>> influence this in multiple ways. One of them is usage of scratch
> >>>>>> buffers, but even without that, a DMA engine that requires two buffers
> >>>>>> can easily be operated with a single buffer by programming the DMA
> >>>>>> engine with the same buffer address twice. Drivers should really do so
> >>>>>> unless they really can't.
> >>>>>>
> >>>>>>>> If the above is correct, then the number of buffers to be queued
> >>>>>>>> during streaming is, in my opinion, less an hw/driver requirement but
> >>>>>>>> more an application decision.
> >>>>>>>
> >>>>>>> No, min_queued_buffers is a HW/drivers property: the DMA engine can't
> >>>>>>> start until that many buffers are queued up, and once it is started
> >>>>>>> it will always hold on to that many buffers.
> >>>>>
> >>>>> I get it, my point was that once start_streaming has been called, even
> >>>>> if min_queued_buffers=2, there is nothing preventing userspace from
> >>>>> queing one buffer at the time once the first two have completed. Sure, the
> >>>>> hw/driver might not like it, but while delaying start_streaming
> >>>>> prevents bad things from happening, there is nothing in the core that
> >>>>> prevents applications from potentially stalling the capture
> >>>>> operations.
> >>>>>
> >>>>> But I get your point, if the system needs 2 buffers to start
> >>>>> streaming, it will probably need two buffers to continue producing
> >>>>> frames.
> >>
> >> Right, it won't be able to complete anything and return it to userspace
> >> until it gets a third buffer. Only then can it return one buffer to userspace.
> >>
> >>>>>> That's not always true. The imx7-media-csi driver, for instance, sets
> >>>>>> min_queued_buffers to 2, but allocates scratch buffers and uses them at
> >>>>>> runtime, so that it can return all queued buffers to userspace.
> >>>>>
> >>>>> That's interesting. From your mention of "scratch buffers" I get there
> >>>>> actually is a need to have 2 buffers queued to the HW ? How does
> >>>>> that work, after all queuing a buffer to the DMA engine usually means
> >>>>> pointing its write engine to one (set of) addresses.
> >>>>>
> >>>>> Or is it a driver-only requirement to ask for two buffers ?
> >>>>
> >>>> The hardware has a ping-pong mechanism with two addresses and switches
> >>>> between them automatically.
> >>>>
> >>>>>> Grepping for min_queued_buffers I see drivers setting it to 4
> >>>>>> (rcar-dma.c, rzg2l-video.c), 6 (cxusb-analog.c) or even 9
> >>>>>> (zoran_driver.c) ! I doubt the zoran driver holds on to 9 buffers at
> >>>>>> runtime. Your statement is not universally true today?.
> >>>>>>
> >>>>>> This could be considered as driver issues, and the min_queued_buffers
> >>>>>> values should be fixed to match the runtime behaviour. In some cases I
> >>>>>> expect it will require more work than just changing the value, as
> >>>>>> drivers may implement the logic to operate with less buffers at runtime
> >>>>>> but not at start time. This would be fixable, but it may also call for
> >>>>>> asking if the start at runtime behaviours need to be identical.
> >>>>>>
> >>>>>>> So the application has to know somehow how many buffers are needed to
> >>>>>>> actually stream. One way is via VIDIOC_REQBUFS since that is supposed to
> >>>>>>> always return a workable number of buffers, the other is by actually
> >>>>>>> reporting the minimum number of buffers as per my RFC.
> >>>>>>>
> >>>>>>>> As you said an application should be good with> 3 buffers (one queued, one currently being written to, one to be
> >>>>>>>> consumed by the application), but in very specific cases where an
> >>>>>>>> application retains the buffer for longer, for whatever reason, it
> >>>>>>>> might need a larger number of queued buffers to provide the DMA
> >>>>>>>> engines a space where to write data without them being discarded (to
> >>>>>>>> scratch buffers or discarded by the DMA engine itself, if the HW
> >>>>>>>> supports that). Or maybe an application is fine to drop frames and
> >>>>>>>> only queue buffers sporadically (if the HW supports that ofc).
> >>>>>>>>
> >>>>>>>> For libcamera, and for this specific platform in particular, we're
> >>>>>>>> going to base new developments on the assumption that
> >>>>>>>> min_queued_buffers == 0, and it would be more convenient for use to be
> >>>>>>>> able to access its value from userspace to identify if we're running
> >>>>>>>> on a kernel with or without this patch being applied.
> >>>>>>>
> >>>>>>> So my proposal in my RFC to expose min_num_buffers would work for libcamera?
> >>>>>>> It sounds like that's what you need.
> >>>>>
> >>>>> My immediate need is to know if I'm running on a "legacy" version of
> >>>>> this driver that still requires 3 buffers for no apparent reason, or
> >>>>> on a new version. Your proposal might work, but I still feel like we
> >>>>> should report the HW/driver requirement (min_queued_buffers) instead
> >>>>> of trying to suggest applications how many buffers they need to
> >>>>> allocate to get "smooth streaming" or similar, as the use cases
> >>>>> might be different.
> >>>>>
> >>>>>> It may be useful, but I think we may also just require min_num_buffers
> >>>>>> == 0 for a device to be supported in libcamera. We have to implement
> >>>>>
> >>>>> While I concur this would be ideal, how would it work for existing
> >>>>> rkisp1 implementation that do not include this patch ? libcamera
> >>>>> should be able to run on both, probably in two different "modes" /o\
> >>>>
> >>>> A simple option is to check the kernel version, we do that in a few
> >>>> places. Over time we'll increase the minimum kernel version and drop
> >>>
> >>> I considered that, but this patch is pretty easy to backport, having
> >>> something that tells to userspace the value of
> >>> min_queued_buffers might be useful indeed.
> >>
> >> The implementation is similar to that of the new max_num_buffers field
> >> in struct v4l2_create_buffers: it will be signaled by a new buffer
> >> capability flag. So you can check that in the code. Much better than
> >> relying on kernel versions.
> > 
> > Sure, I agree that an explicit API is better. I'm not a big fan of
> > adding a field to v4l2_create_buffers though (I didn't realize it was
> > extended with a max_num_buffers field) as it would require calling
> > VIDIOC_CREATE_BUFS to get the information. That can get quite awkward to
> > use for userspace.
> 
> Why would that be awkward? If called with count = 0, then it will just
> report the capabilities and it will not create any buffers.

I wasn't aware you could call it with count = 0 for that purpose. This
addresses my concerns.

> > This being said, libcamera today won't behave properly with rkisp1, as
> > it won't be able to give the application the last buffers, as guaranteed
> > by the libcamera API. Relying on min_num_buffers == 0 will fix it and
> > won't cause any regression on kernels that still use 3, it will just
> > keep the current behaviour.
> > 
> >>>> support for legacy APIs.
> >>>>
> >>>>>> APIs such as the Android camera HAL that has no concept of buffers being
> >>>>>> kept by the device. This could possibly be handled within libcamera by
> >>>>>> allocating scratch buffers in userspace, but that comes with other
> >>>>>> challenges. I would like to at least try to get help from the kernel
> >>>>>> until proven that it's a bad idea.
diff mbox series

Patch

diff --git a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
index 2bddb4fa8a5c..5fcf9731f41b 100644
--- a/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
+++ b/drivers/media/platform/rockchip/rkisp1/rkisp1-capture.c
@@ -35,8 +35,6 @@ 
 #define RKISP1_SP_DEV_NAME	RKISP1_DRIVER_NAME "_selfpath"
 #define RKISP1_MP_DEV_NAME	RKISP1_DRIVER_NAME "_mainpath"

-#define RKISP1_MIN_BUFFERS_NEEDED 3
-
 enum rkisp1_plane {
 	RKISP1_PLANE_Y	= 0,
 	RKISP1_PLANE_CB	= 1,
@@ -1563,7 +1561,7 @@  static int rkisp1_register_capture(struct rkisp1_capture *cap)
 	q->ops = &rkisp1_vb2_ops;
 	q->mem_ops = &vb2_dma_contig_memops;
 	q->buf_struct_size = sizeof(struct rkisp1_buffer);
-	q->min_queued_buffers = RKISP1_MIN_BUFFERS_NEEDED;
+	q->min_queued_buffers = 0;
 	q->timestamp_flags = V4L2_BUF_FLAG_TIMESTAMP_MONOTONIC;
 	q->lock = &node->vlock;
 	q->dev = cap->rkisp1->dev;