diff mbox

[RFC,1/2,media] vb2: defer sync buffers from vb2_buffer_done() with a workqueue

Message ID 1471458537-16859-2-git-send-email-javier@osg.samsung.com (mailing list archive)
State New, archived
Headers show

Commit Message

Javier Martinez Canillas Aug. 17, 2016, 6:28 p.m. UTC
The vb2_buffer_done() function can be called from interrupt context but it
currently calls the vb2 memory allocator .finish operation to sync buffers
and this can take a long time, so it's not suitable to be done there.

This patch defers part of the vb2_buffer_done() logic to a worker thread
to avoid doing the time consuming operation in interrupt context.

This will also allow to unmap the dmabuf as soon as possible once the buf
has been processed by the driver instead of waiting until user-space call
VIDIOC_DQBUF. Since the dmabuf unmap can sleep, it couldn't be done from
the vb2_buffer_done() function as it was before.

Suggested-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com>
---

 drivers/media/v4l2-core/videobuf2-core.c | 81 +++++++++++++++++++++-----------
 include/media/videobuf2-core.h           |  5 ++
 2 files changed, 58 insertions(+), 28 deletions(-)

Comments

Sakari Ailus Aug. 17, 2016, 7:50 p.m. UTC | #1
Hi Javier,

On Wed, Aug 17, 2016 at 02:28:56PM -0400, Javier Martinez Canillas wrote:
> The vb2_buffer_done() function can be called from interrupt context but it
> currently calls the vb2 memory allocator .finish operation to sync buffers
> and this can take a long time, so it's not suitable to be done there.
> 
> This patch defers part of the vb2_buffer_done() logic to a worker thread
> to avoid doing the time consuming operation in interrupt context.

I agree the interrupt handler is not the best place to perform the work in
vb2_buffer_done() (including cache flushing), but is a work queue an ideal
solution?

The work queue task is a regular kernel thread not subject to
sched_setscheduler(2) and alike, which user space programs can and do use to
change how the scheduler treats these processes. Requiring a work queue to
be run between the interrupt arriving from the hardware and the user space
process being able to dequeue the related buffer would hurt use cases where
strict time limits are crucial.

Neither I propose making the work queue to have real time priority either,
albeit I think might still be marginally better.

Additionally, the work queue brings another context switch per dequeued
buffer. This would also be undesirable on IoT and mobile systems that often
handle multiple buffer queues simultaneously.

Performing this task in the context of the process that actually dequeues
the buffer avoids both of these problem entirely as there are no other
processes involved.
Javier Martinez Canillas Aug. 17, 2016, 9:21 p.m. UTC | #2
Hello Sakari,

Thanks a lot for your feedback.

On 08/17/2016 03:50 PM, Sakari Ailus wrote:
> Hi Javier,
> 
> On Wed, Aug 17, 2016 at 02:28:56PM -0400, Javier Martinez Canillas wrote:
>> The vb2_buffer_done() function can be called from interrupt context but it
>> currently calls the vb2 memory allocator .finish operation to sync buffers
>> and this can take a long time, so it's not suitable to be done there.
>>
>> This patch defers part of the vb2_buffer_done() logic to a worker thread
>> to avoid doing the time consuming operation in interrupt context.
> 
> I agree the interrupt handler is not the best place to perform the work in
> vb2_buffer_done() (including cache flushing), but is a work queue an ideal
> solution?
>

I would also like to know Hans opinions since he suggested deferring the buffer
sync to be done in a worker thread (if I understood his suggestions correctly).
 
> The work queue task is a regular kernel thread not subject to
> sched_setscheduler(2) and alike, which user space programs can and do use to
> change how the scheduler treats these processes. Requiring a work queue to
> be run between the interrupt arriving from the hardware and the user space
> process being able to dequeue the related buffer would hurt use cases where
> strict time limits are crucial.
> 
> Neither I propose making the work queue to have real time priority either,
> albeit I think might still be marginally better.
>
> Additionally, the work queue brings another context switch per dequeued
> buffer. This would also be undesirable on IoT and mobile systems that often
> handle multiple buffer queues simultaneously.
>

Yes, I agree with you that this change might increase the latency.
 
> Performing this task in the context of the process that actually dequeues
> the buffer avoids both of these problem entirely as there are no other
> processes involved.
> 

You already mentioned in the other thread that you prefer to move the buffer
sync to DQBUF. But as I explained there, the reason why I want to move the
dma-buf unmapping out of DQBUF is to allow other drivers that share the same
DMA buffer to access the buffer even when a DQBUF has not been called yet.

This may be possible if vb2 supports implicit dma-buf fences and in this case
user-space doesn't even need to call DQBUF/QBUF, since the fences can be used
to serialize the access to the shared DMA buffer. Instead of using DQBUF/QBUF
as a serialization mechanism like is the case for the current non-fences case.

It would be possible to move the cache flushing to DQBUF and leave the dma-buf
unmap there, and do these operations when the driver calls vb2_buffer_done()
only when implicit fences are used. But it would simplify the vb2-core if this
is consistent between the fences and non-fences cases.

Best regards,
diff mbox

Patch

diff --git a/drivers/media/v4l2-core/videobuf2-core.c b/drivers/media/v4l2-core/videobuf2-core.c
index 1dbd7beb71f0..14bed8acf3cf 100644
--- a/drivers/media/v4l2-core/videobuf2-core.c
+++ b/drivers/media/v4l2-core/videobuf2-core.c
@@ -315,6 +315,45 @@  static void __setup_offsets(struct vb2_buffer *vb)
 	}
 }
 
+static void vb2_done_work(struct work_struct *work)
+{
+	struct vb2_buffer *vb = container_of(work, struct vb2_buffer,
+					     done_work);
+	struct vb2_queue *q = vb->vb2_queue;
+	unsigned long flags;
+	unsigned int plane;
+
+	/* sync buffers */
+	for (plane = 0; plane < vb->num_planes; ++plane)
+		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
+
+	spin_lock_irqsave(&q->done_lock, flags);
+	if (vb->state == VB2_BUF_STATE_QUEUED ||
+	    vb->state == VB2_BUF_STATE_REQUEUEING) {
+		vb->state = VB2_BUF_STATE_QUEUED;
+	} else {
+		/* Add the buffer to the done buffers list */
+		list_add_tail(&vb->done_entry, &q->done_list);
+	}
+	atomic_dec(&q->owned_by_drv_count);
+	spin_unlock_irqrestore(&q->done_lock, flags);
+
+	trace_vb2_buf_done(q, vb);
+
+	switch (vb->state) {
+	case VB2_BUF_STATE_QUEUED:
+		break;
+	case VB2_BUF_STATE_REQUEUEING:
+		if (q->start_streaming_called)
+			__enqueue_in_driver(vb);
+		break;
+	default:
+		/* Inform any processes that may be waiting for buffers */
+		wake_up(&q->done_wq);
+		break;
+	}
+}
+
 /**
  * __vb2_queue_alloc() - allocate videobuf buffer structures and (for MMAP type)
  * video buffer memory for all buffers/planes on the queue and initializes the
@@ -330,6 +369,10 @@  static int __vb2_queue_alloc(struct vb2_queue *q, enum vb2_memory memory,
 	struct vb2_buffer *vb;
 	int ret;
 
+	q->vb2_workqueue = alloc_ordered_workqueue("vb2_wq", 0);
+	if (!q->vb2_workqueue)
+		return buffer;
+
 	for (buffer = 0; buffer < num_buffers; ++buffer) {
 		/* Allocate videobuf buffer structures */
 		vb = kzalloc(q->buf_struct_size, GFP_KERNEL);
@@ -344,6 +387,8 @@  static int __vb2_queue_alloc(struct vb2_queue *q, enum vb2_memory memory,
 		vb->index = q->num_buffers + buffer;
 		vb->type = q->type;
 		vb->memory = memory;
+		INIT_WORK(&vb->done_work, vb2_done_work);
+
 		for (plane = 0; plane < num_planes; ++plane) {
 			vb->planes[plane].length = plane_sizes[plane];
 			vb->planes[plane].min_length = plane_sizes[plane];
@@ -982,7 +1027,6 @@  void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
 {
 	struct vb2_queue *q = vb->vb2_queue;
 	unsigned long flags;
-	unsigned int plane;
 
 	if (WARN_ON(vb->state != VB2_BUF_STATE_ACTIVE))
 		return;
@@ -1003,36 +1047,11 @@  void vb2_buffer_done(struct vb2_buffer *vb, enum vb2_buffer_state state)
 	dprintk(4, "done processing on buffer %d, state: %d\n",
 			vb->index, state);
 
-	/* sync buffers */
-	for (plane = 0; plane < vb->num_planes; ++plane)
-		call_void_memop(vb, finish, vb->planes[plane].mem_priv);
-
 	spin_lock_irqsave(&q->done_lock, flags);
-	if (state == VB2_BUF_STATE_QUEUED ||
-	    state == VB2_BUF_STATE_REQUEUEING) {
-		vb->state = VB2_BUF_STATE_QUEUED;
-	} else {
-		/* Add the buffer to the done buffers list */
-		list_add_tail(&vb->done_entry, &q->done_list);
-		vb->state = state;
-	}
-	atomic_dec(&q->owned_by_drv_count);
+	vb->state = state;
 	spin_unlock_irqrestore(&q->done_lock, flags);
 
-	trace_vb2_buf_done(q, vb);
-
-	switch (state) {
-	case VB2_BUF_STATE_QUEUED:
-		return;
-	case VB2_BUF_STATE_REQUEUEING:
-		if (q->start_streaming_called)
-			__enqueue_in_driver(vb);
-		return;
-	default:
-		/* Inform any processes that may be waiting for buffers */
-		wake_up(&q->done_wq);
-		break;
-	}
+	queue_work(q->vb2_workqueue, &vb->done_work);
 }
 EXPORT_SYMBOL_GPL(vb2_buffer_done);
 
@@ -1812,6 +1831,12 @@  static void __vb2_queue_cancel(struct vb2_queue *q)
 	if (q->start_streaming_called)
 		call_void_qop(q, stop_streaming, q);
 
+	if (q->vb2_workqueue) {
+		flush_workqueue(q->vb2_workqueue);
+		destroy_workqueue(q->vb2_workqueue);
+		q->vb2_workqueue = NULL;
+	}
+
 	/*
 	 * If you see this warning, then the driver isn't cleaning up properly
 	 * in stop_streaming(). See the stop_streaming() documentation in
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index a4a9a55a0c42..dd765ef06cce 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -242,6 +242,9 @@  struct vb2_buffer {
 
 	struct list_head	queued_entry;
 	struct list_head	done_entry;
+
+	struct work_struct	done_work;
+
 #ifdef CONFIG_VIDEO_ADV_DEBUG
 	/*
 	 * Counters for how often these buffer-related ops are
@@ -523,6 +526,8 @@  struct vb2_queue {
 	struct vb2_fileio_data		*fileio;
 	struct vb2_threadio_data	*threadio;
 
+	struct workqueue_struct		*vb2_workqueue;
+
 #ifdef CONFIG_VIDEO_ADV_DEBUG
 	/*
 	 * Counters for how often these queue-related ops are