
[v3] ALSA: compress_offload: introduce passthrough operation mode

Message ID 20240621164915.410946-1-perex@perex.cz (mailing list archive)
State Superseded

Commit Message

Jaroslav Kysela June 21, 2024, 4:49 p.m. UTC
There is a requirement to expose the audio hardware that accelerates various
tasks for user space such as sample rate converters, compressed
stream decoders, etc.

This is a description of an API extension for the ALSA compress API which
is able to handle "tasks" that are not bound to real-time operations
and allows for the serialization of operations.

For details, refer to "compress-passthrough.rst" document.

Note: This code is RFC (not tested, just to clarify the API requirements).
My goal is to add a test (loopback) driver and add support to the tinycompress
library in the next step.

Cc: Mark Brown <broonie@kernel.org>
Cc: Shengjiu Wang <shengjiu.wang@gmail.com>
Cc: Nicolas Dufresne <nicolas@ndufresne.ca>
Cc: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Cc: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Jaroslav Kysela <perex@perex.cz>

---
v2..v3:
  - fix missing runtime->tasks initialization (thanks Shengjiu Wang)
  - fix missing seqno initialization in task_new (thanks Shengjiu Wang)
  - fix reference counting for allocated dma buffers (thanks Shengjiu Wang)
  - use origin_seqno to reuse the already allocated buffers for new task

v1..v2:
  - fix some documentation typos (thanks Amadeusz Sławiński)
  - fix memdup_user() error handling (thanks Takashi)
  - use one state variable instead of multiple (thanks Takashi)
  - handle task limit (set to 64 - mentioned in documentation, NIY)
  - fix file release (free all tasks)
---
 .../sound/designs/compress-passthrough.rst    | 125 +++++++
 include/sound/compress_driver.h               |  32 ++
 include/uapi/sound/compress_offload.h         |  51 ++-
 sound/core/Kconfig                            |   4 +
 sound/core/compress_offload.c                 | 338 +++++++++++++++++-
 5 files changed, 541 insertions(+), 9 deletions(-)
 create mode 100644 Documentation/sound/designs/compress-passthrough.rst

Comments

Pierre-Louis Bossart June 24, 2024, 7:13 a.m. UTC | #1
Thanks Jaroslav, a couple of questions below:

> +For the buffering parameters, the fragments means a limit of allocated tasks
> +for given device. The fragment_size limits the input buffer size for the given
> +device. The output buffer size is determined by the driver (may be different
> +from the input buffer size).

if (stream->direction == SND_COMPRESS_PASSTHROUGH)
	max_fragments = 64;			/* safe value */

Is there anything preventing us from increasing this if there was a need
for it? Wondering if this would be an ABI restriction or just an
internal safeguard userspace doesn't need to know.

> +
> +State Machine
> +=============
> +
> +The passthrough audio stream state machine is described below :
> +
> +                                       +----------+
> +                                       |          |
> +                                       |   OPEN   |
> +                                       |          |
> +                                       +----------+
> +                                             |
> +                                             |
> +                                             | compr_set_params()
> +                                             |
> +                                             v
> +         all passthrough task ops      +----------+
> +  +------------------------------------|          |
> +  |                                    |   SETUP  |
> +  |                                    |
> +  |                                    +----------+
> +  |                                          |
> +  +------------------------------------------+
> +
> +
> +Passthrough operations (ioctls)
> +===============================
> +
> +CREATE
> +------
> +Creates a set of input/output buffers. The input buffer size is
> +fragment_size. Allocates unique seqno.
> +
> +The hardware drivers allocate internal 'struct dma_buf' for both input and
> +output buffers (using 'dma_buf_export()' function). The anonymous
> +file descriptors for those buffers are passed to user space.

The code adds the tasks in the order in which they are created:

	list_add_tail(&task->list, &stream->runtime->tasks);

This should probably be documented; there's no explicit mechanism to
chain the tasks other than the order of creation.

> +FREE
> +----
> +Free a set of input/output buffers. If a task is active, the stop
> +operation is executed before. If seqno is zero, operation is executed for all
> +tasks.

Can a task in the middle of the list be freed?

If yes, is any locking required?

> +START
> +-----
> +Starts (queues) a task. There are two cases of the task start - right after
> +the task is created. In this case, origin_seqno must be zero.
> +The second case is for reusing of already finished task. The origin_seqno
> +must identify the task to be reused. In both cases, a new seqno value
> +is allocated and returned to user space.
> +
> +The prerequisite is that application filled input dma buffer with
> +new source data and set input_size to pass the real data size to the driver.
> +
> +The order of data processing is preserved (first started job must be
> +finished at first).
> +
> +STOP
> +----
> +Stop (dequeues) a task. If seqno is zero, operation is executed for all
> +tasks.

What happens if a STOP is sent to a task in the middle of the list?


This is similar to the question on FREE above: creation has to follow
an order, but free/stop can target individual tasks, so there could be
interesting state machine transitions and programming errors.

I honestly find the state machine confusing. It looks like tasks can be
added/removed dynamically in the SETUP stage, but I am not sure that's
a real use case. Most pipeline management adds a bunch of processing
steps, then goes into 'run' mode. Adding/removing stuff on a running
pipeline is really painful and not super useful, is it?

>  /**
>   * struct snd_compr_runtime: runtime stream description
>   * @state: stream state
> @@ -54,6 +70,11 @@ struct snd_compr_runtime {
>  	dma_addr_t dma_addr;
>  	size_t dma_bytes;
>  	struct snd_dma_buffer *dma_buffer_p;
> +
> +	u32 active_tasks;
> +	u32 total_tasks;
> +	u64 task_seqno;
> +	struct list_head tasks;
>  };

Should there be some sort of identifier that says the stream is used in
passthrough mode, so that the 4 added members are only relevant then?


'struct snd_compr_runtime' doesn't have a notion of direction, so
there's no real way to know what set_params() requested.

> diff --git a/include/uapi/sound/compress_offload.h b/include/uapi/sound/compress_offload.h
> index d185957f3fe0..5fed1979522b 100644
> --- a/include/uapi/sound/compress_offload.h
> +++ b/include/uapi/sound/compress_offload.h
> @@ -1,4 +1,4 @@
> -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +	/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */

spurious change?

>  /*
>   *  compress_offload.h - compress offload header definations
>   *
> @@ -14,7 +14,7 @@

> +/**
> + * struct snd_compr_task - task primitive for non-realtime operation
> + * @seqno: sequence number (task identifier)
> + * @origin_seqno: previous sequence number (task identifier) - for reuse
> + * @input_fd: data input file descriptor (dma-buf)
> + * @output_fd: data output file descriptor (dma-buf)
> + * @input_size: filled data in bytes (from caller, must not exceed fragment size)
> + */
> +struct snd_compr_task {
> +	__u64 seqno;
> +	__u64 origin_seqno;
> +	int input_fd;
> +	int output_fd;
> +	__u64 input_size;

Any reason why output_size is not listed here....

> +	__u8 reserved[16];
> +} __attribute__((packed, aligned(4)));
> +
> +enum snd_compr_state {
> +	SND_COMPRESS_TASK_STATE_IDLE = 0,
> +	SND_COMPRESS_TASK_STATE_ACTIVE,
> +	SND_COMPRESS_TASK_STATE_FINISHED
> +};
> +
> +/**
> + * struct snd_compr_task_status - task status
> + * @seqno: sequence number (task identifier)
> + * @output_size: filled data in bytes (from driver)
> + * @state: actual task state (SND_COMPRESS_TASK_STATE_*)
> + */
> +struct snd_compr_task_status {
> +	__u64 seqno;
> +	__u64 output_size;

... but it's listed here.

It'd be worth explaining why the input and output are in different
structures. I can understand that for the configuration only the host
can provide data in the input, but in a status it'd be good to have a
snapshot of the two variables, no?


> +	__u8 state;
> +	__u8 reserved[15];
> +} __attribute__((packed, aligned(4)));

> +static int snd_compr_task_new(struct snd_compr_stream *stream, struct snd_compr_task *utask)
> +{
> +	struct snd_compr_task_runtime *task;
> +	int retval;
> +
> +	if (stream->runtime->total_tasks >= stream->runtime->fragments)
> +		return -EBUSY;
> +	if (utask->origin_seqno != 0 || utask->input_size != 0)
> +		return -EINVAL;
> +	task = kzalloc(sizeof(*task), GFP_KERNEL);
> +	if (task == NULL)
> +		return -ENOMEM;
> +	task->seqno = utask->seqno = snd_compr_seqno_next(stream);
> +	task->input_size = utask->input_size;
> +	retval = stream->ops->task_create(stream, task);
> +	if (retval < 0)
> +		goto cleanup;
> +	utask->input_fd = dma_buf_fd(task->input, O_WRONLY|O_CLOEXEC);
> +	if (utask->input_fd < 0) {
> +		retval = utask->input_fd;
> +		goto cleanup;
> +	}
> +	utask->output_fd = dma_buf_fd(task->output, O_RDONLY|O_CLOEXEC);
> +	if (utask->output_fd < 0) {
> +		retval = utask->output_fd;
> +		goto cleanup;
> +	}
> +	/* keep dmabuf reference until freed with task free ioctl */
> +	dma_buf_get(utask->input_fd);
> +	dma_buf_get(utask->output_fd);
> +	list_add_tail(&task->list, &stream->runtime->tasks);
> +	stream->runtime->total_tasks++;

no locking for these two lines? If there's already a lock handled by the
ALSA/ASoC/compressed frameworks, it'd be worth explaining which one is
assumed to be held.

> +	return 0;
> +cleanup:
> +	snd_compr_task_free(task);
> +	return retval;
> +}

Takashi Iwai June 24, 2024, 1:19 p.m. UTC | #2
On Fri, 21 Jun 2024 18:49:15 +0200,
Jaroslav Kysela wrote:
>  /**
>   * struct snd_compr_runtime: runtime stream description
>   * @state: stream state
> @@ -54,6 +70,11 @@ struct snd_compr_runtime {
>  	dma_addr_t dma_addr;
>  	size_t dma_bytes;
>  	struct snd_dma_buffer *dma_buffer_p;
> +
> +	u32 active_tasks;
> +	u32 total_tasks;
> +	u64 task_seqno;
> +	struct list_head tasks;

Those new fields deserve some more comments (at least mentioning
that they are for passthrough operations).

> --- a/sound/core/Kconfig
> +++ b/sound/core/Kconfig
> @@ -59,6 +59,10 @@ config SND_CORE_TEST
>  config SND_COMPRESS_OFFLOAD
>  	tristate
>  
> +config SND_COMPRESS_PASSTHROUGH
> +	select DMA_BUF
> +	tristate

I suppose this is a boolean?

> --- a/sound/core/compress_offload.c
> +++ b/sound/core/compress_offload.c
(snip)
> @@ -85,6 +92,8 @@ static int snd_compr_open(struct inode *inode, struct file *f)
>  		dirn = SND_COMPRESS_PLAYBACK;
>  	else if ((f->f_flags & O_ACCMODE) == O_RDONLY)
>  		dirn = SND_COMPRESS_CAPTURE;
> +	else if ((f->f_flags & O_ACCMODE) == O_RDWR)
> +		dirn = SND_COMPRESS_PASSTHROUGH;
>  	else
>  		return -EINVAL;

Wouldn't it be safer to return an error for RW access unless the
driver supports the passthrough mode?
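
The check suggested here could be sketched as a small user-space model. This is only an illustration, not the kernel code: the helper name compr_direction_from_flags() and the passthrough_ok parameter are hypothetical, with passthrough_ok standing in for whatever capability test the core would actually use.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>

/* Mirrors the three directions of enum snd_compr_direction in the patch. */
enum compr_direction {
	COMPR_PLAYBACK = 0,
	COMPR_CAPTURE,
	COMPR_PASSTHROUGH
};

/*
 * Hypothetical helper: map open(2) access flags to a direction.
 * 'passthrough_ok' is an assumption; it stands in for a real capability
 * test (for example, whether the driver provides the task callbacks).
 * Returns the direction, or -EINVAL when the access mode is not usable.
 */
static int compr_direction_from_flags(int f_flags, bool passthrough_ok)
{
	switch (f_flags & O_ACCMODE) {
	case O_WRONLY:
		return COMPR_PLAYBACK;
	case O_RDONLY:
		return COMPR_CAPTURE;
	case O_RDWR:
		/* Reject read/write opens on devices without passthrough. */
		return passthrough_ok ? COMPR_PASSTHROUGH : -EINVAL;
	default:
		return -EINVAL;
	}
}
```

With this shape, an O_RDWR open on a device without passthrough support fails early instead of creating a stream whose task ioctls can never work.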

> @@ -939,6 +988,262 @@ static int snd_compr_partial_drain(struct snd_compr_stream *stream)
>  	return snd_compress_wait_for_drain(stream);
>  }
>  
> +#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH)
> +
> +static struct snd_compr_task_runtime *
> +		snd_compr_find_task(struct snd_compr_stream *stream, __u64 seqno)

Please fix indentation.

> +void snd_compr_task_finished(struct snd_compr_stream *stream,
> +			    struct snd_compr_task_runtime *task)
> +{
> +	guard(mutex)(&stream->device->lock);
> +	if (!snd_BUG_ON(stream->runtime->active_tasks == 0))
> +		stream->runtime->active_tasks--;
> +	task->state = SND_COMPRESS_TASK_STATE_FINISHED;
> +	wake_up(&stream->runtime->sleep);
> +}
> +EXPORT_SYMBOL(snd_compr_task_finished);

Let's use EXPORT_SYMBOL_GPL() for a new function.


thanks,

Takashi

Jaroslav Kysela June 24, 2024, 1:59 p.m. UTC | #3
On 24. 06. 24 15:19, Takashi Iwai wrote:
> On Fri, 21 Jun 2024 18:49:15 +0200,
> Jaroslav Kysela wrote:
>>   /**
>>    * struct snd_compr_runtime: runtime stream description
>>    * @state: stream state
>> @@ -54,6 +70,11 @@ struct snd_compr_runtime {
>>   	dma_addr_t dma_addr;
>>   	size_t dma_bytes;
>>   	struct snd_dma_buffer *dma_buffer_p;
>> +
>> +	u32 active_tasks;
>> +	u32 total_tasks;
>> +	u64 task_seqno;
>> +	struct list_head tasks;
> 
> Those new fields deserve for some more comments (at least mentioning
> that those are for passthrough operations).

I've tried to resolve all comments in v4.

			Thank you for your review,
						Jaroslav

Jaroslav Kysela June 24, 2024, 3:58 p.m. UTC | #4
On 24. 06. 24 9:13, Pierre-Louis Bossart wrote:
> Thanks Jaroslav, couple of questions below:
> 
>> +For the buffering parameters, the fragments means a limit of allocated tasks
>> +for given device. The fragment_size limits the input buffer size for the given
>> +device. The output buffer size is determined by the driver (may be different
>> +from the input buffer size).
> 
> if (stream->direction == SND_COMPRESS_PASSTHROUGH)
> 	max_fragments = 64;			/* safe value */
> 
> Is there anything preventing us from increasing this if there was a need
> for it? Wondering if this would be an ABI restriction or just an
> internal safeguard userspace doesn't need to know.

It can be increased later when required. I think that 64 is a good value to
start with, but it's not a limit that is set in stone.

>> +CREATE
>> +------
>> +Creates a set of input/output buffers. The input buffer size is
>> +fragment_size. Allocates unique seqno.
>> +
>> +The hardware drivers allocate internal 'struct dma_buf' for both input and
>> +output buffers (using 'dma_buf_export()' function). The anonymous
>> +file descriptors for those buffers are passed to user space.
> 
> The code adds the tasks in the order in which they are created:
> 
> 	list_add_tail(&task->list, &stream->runtime->tasks);
> 
> This should probably be documented, there's no explicit mechanism to
> chain the tasks other than the order of creation.

I agree. I will document this in the next version.

>> +FREE
>> +----
>> +Free a set of input/output buffers. If a task is active, the stop
>> +operation is executed before. If seqno is zero, operation is executed for all
>> +tasks.
> 
> Can a task in the middle of the list be freed?

Yes.

> If yes, is any locking required?

The stream mutex (stream->device->lock) is always held when any callback
(operation) is called.

>> +START
>> +-----
>> +Starts (queues) a task. There are two cases of the task start - right after
>> +the task is created. In this case, origin_seqno must be zero.
>> +The second case is for reusing of already finished task. The origin_seqno
>> +must identify the task to be reused. In both cases, a new seqno value
>> +is allocated and returned to user space.
>> +
>> +The prerequisite is that application filled input dma buffer with
>> +new source data and set input_size to pass the real data size to the driver.
>> +
>> +The order of data processing is preserved (first started job must be
>> +finished at first).
>> +
>> +STOP
>> +----
>> +Stop (dequeues) a task. If seqno is zero, operation is executed for all
>> +tasks.
> 
> What happens if a STOP is sent to a task in the middle of the list?

The driver MUST remove this job from the queue.

> It's similar to the question on free above, the creation needs to follow
> an order but the free/stop can be individual tasks so there could be
> interesting state machine transition and programming errors.

Yes, I see your point. The current code should probably call stop in reverse
order for the "STOP ALL" or "FREE ALL" ioctls or in the file descriptor release
callback, so that the driver does not briefly "jump to the next active job".
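
The effect of the stop order can be illustrated with a small user-space model. It is a sketch under stated assumptions, not driver code: all model_* names are hypothetical, tasks are indexed in start order, and the "driver" is assumed to work on the oldest active task. The promotions counter records how often stopping the current head forces a switch to another job.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical model: index order equals the order in which tasks started. */
struct model_queue {
	bool active[MAX_TASKS];
	size_t count;
	unsigned int promotions; /* head switches caused by stopping the head */
};

static void model_queue_init(struct model_queue *q, size_t count)
{
	q->count = count > MAX_TASKS ? MAX_TASKS : count;
	q->promotions = 0;
	for (size_t i = 0; i < MAX_TASKS; i++)
		q->active[i] = i < q->count;
}

/* Index of the oldest active task, or q->count when nothing is active. */
static size_t model_queue_head(const struct model_queue *q)
{
	for (size_t i = 0; i < q->count; i++)
		if (q->active[i])
			return i;
	return q->count;
}

/* Stop one task; count a promotion if another task becomes the head. */
static void model_stop(struct model_queue *q, size_t idx)
{
	if (idx >= q->count || !q->active[idx])
		return;
	bool was_head = model_queue_head(q) == idx;
	q->active[idx] = false;
	if (was_head && model_queue_head(q) < q->count)
		q->promotions++;
}

static void model_stop_all_forward(struct model_queue *q)
{
	for (size_t i = 0; i < q->count; i++)
		model_stop(q, i);
}

static void model_stop_all_reverse(struct model_queue *q)
{
	for (size_t i = q->count; i-- > 0; )
		model_stop(q, i);
}
```

In this model, stopping four tasks front to back promotes a new head three times, while stopping them back to front never promotes one. Stopping a single task in the middle leaves the head untouched.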

> I honestly find the state machine confusing, it looks like in the SETUP
> stage tasks can be added/removed dynamically, but I am not sure if it's
> a real use case? Most pipeline management add a bunch of processing,
> then go in the 'run' mode. Adding/removing stuff on a running pipeline
> is really painful and not super useful, is it?

This I/O mechanism tries to be "universal". As opposed to the standard
streaming APIs, these tasks may be independent (without any state handling
among multiple tasks). In that case, a "stop" in the middle makes sense.
It may also make sense for real-time operation (remove altered/old data and
feed new data).

> 
>>   /**
>>    * struct snd_compr_runtime: runtime stream description
>>    * @state: stream state
>> @@ -54,6 +70,11 @@ struct snd_compr_runtime {
>>   	dma_addr_t dma_addr;
>>   	size_t dma_bytes;
>>   	struct snd_dma_buffer *dma_buffer_p;
>> +
>> +	u32 active_tasks;
>> +	u32 total_tasks;
>> +	u64 task_seqno;
>> +	struct list_head tasks;
>>   };
> 
> should there be some sort of identifier that says the stream in used in
> passthrough mode and only then are the 4 added members relevant?
> 
> 
> 'struct snd_compr_runtime' doesn't have a notion of direction, so
> there's no real way to know what set_params() requested.

'struct snd_compr_stream' is passed to all callbacks, so direction is known.

>> diff --git a/include/uapi/sound/compress_offload.h b/include/uapi/sound/compress_offload.h
>> index d185957f3fe0..5fed1979522b 100644
>> --- a/include/uapi/sound/compress_offload.h
>> +++ b/include/uapi/sound/compress_offload.h
>> @@ -1,4 +1,4 @@
>> -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>> +	/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> 
> spurious change?

Yes. I will fix that.

>>   /*
>>    *  compress_offload.h - compress offload header definations
>>    *
>> @@ -14,7 +14,7 @@
> 
>> +/**
>> + * struct snd_compr_task - task primitive for non-realtime operation
>> + * @seqno: sequence number (task identifier)
>> + * @origin_seqno: previous sequence number (task identifier) - for reuse
>> + * @input_fd: data input file descriptor (dma-buf)
>> + * @output_fd: data output file descriptor (dma-buf)
>> + * @input_size: filled data in bytes (from caller, must not exceed fragment size)
>> + */
>> +struct snd_compr_task {
>> +	__u64 seqno;
>> +	__u64 origin_seqno;
>> +	int input_fd;
>> +	int output_fd;
>> +	__u64 input_size;
> 
> Any reason why output_size is not listed here....

The real output size is known only after the operation has finished, not
before. The maximum output size is determined by the driver based on the
maximum input buffer size.
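
The split between the two structures can be mirrored in user space as follows. This is a sketch only: the model_* names are hypothetical, the fixed-width types from <stdint.h> stand in for __u64/__u8, and the attribute syntax assumes GCC or Clang. The size checks simply restate what the packed layout quoted above adds up to.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of struct snd_compr_task as quoted above. */
struct model_compr_task {
	uint64_t seqno;
	uint64_t origin_seqno;
	int input_fd;
	int output_fd;
	uint64_t input_size;	/* set by the caller before START */
	uint8_t reserved[16];
} __attribute__((packed, aligned(4)));

/* User-space mirror of struct snd_compr_task_status as quoted above. */
struct model_compr_task_status {
	uint64_t seqno;
	uint64_t output_size;	/* set by the driver, valid once finished */
	uint8_t state;
	uint8_t reserved[15];
} __attribute__((packed, aligned(4)));

enum { MODEL_TASK_IDLE = 0, MODEL_TASK_ACTIVE, MODEL_TASK_FINISHED };

/* 8 + 8 + 4 + 4 + 8 + 16 bytes, and 8 + 8 + 1 + 15 bytes. */
_Static_assert(sizeof(struct model_compr_task) == 48, "task layout");
_Static_assert(sizeof(struct model_compr_task_status) == 32, "status layout");

/* Hypothetical driver-side step: the output size appears only on finish. */
static void model_task_finish(struct model_compr_task_status *st,
			      uint64_t produced)
{
	st->output_size = produced;
	st->state = MODEL_TASK_FINISHED;
}
```

Until model_task_finish() runs, output_size carries no meaning, which is why it sits in the status structure rather than in the request.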

>> +	__u8 reserved[16];
>> +} __attribute__((packed, aligned(4)));
>> +
>> +enum snd_compr_state {
>> +	SND_COMPRESS_TASK_STATE_IDLE = 0,
>> +	SND_COMPRESS_TASK_STATE_ACTIVE,
>> +	SND_COMPRESS_TASK_STATE_FINISHED
>> +};
>> +
>> +/**
>> + * struct snd_compr_task_status - task status
>> + * @seqno: sequence number (task identifier)
>> + * @output_size: filled data in bytes (from driver)
>> + * @state: actual task state (SND_COMPRESS_TASK_STATE_*)
>> + */
>> +struct snd_compr_task_status {
>> +	__u64 seqno;
>> +	__u64 output_size;
> 
> ... but it's listed here.
> 
> It'd be worth explaining why the input and output are in different
> structures. I can understand that for the configuration only the host
> can provide data in the input, but in a status it'd be good to have a
> snapshot of the two variables, no?

Is this really useful? The application already knows what was passed to the kernel.

> no locking for these two lines? If there's already a lock handled by the
> ALSA/ASoC/compressed frameworks, it'd be worth explaining which one is
> assumed to be held.

It is called with the stream mutex held.

					Jaroslav

Shengjiu Wang June 25, 2024, 1:58 a.m. UTC | #5
On Sat, Jun 22, 2024 at 12:49 AM Jaroslav Kysela <perex@perex.cz> wrote:
>
> There is a requirement to expose the audio hardware that accelerates various
> tasks for user space such as sample rate converters, compressed
> stream decoders, etc.
>
> This is description for the API extension for the compress ALSA API which
> is able to handle "tasks" that are not bound to real-time operations
> and allows for the serialization of operations.
>
> For details, refer to "compress-passthrough.rst" document.
>
> Note: This code is RFC (not tested, just to clearify the API requirements).
> My goal is to add a test (loopback) driver and add a support to tinycompress
> library in the next step.
>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Shengjiu Wang <shengjiu.wang@gmail.com>
> Cc: Nicolas Dufresne <nicolas@ndufresne.ca>
> Cc: Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com>
> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
> Cc: Vinod Koul <vkoul@kernel.org>
> Signed-off-by: Jaroslav Kysela <perex@perex.cz>
>
> ---
> v2..v3:
>   - fix missing runtime->tasks initialization (thanks Shengjiu Wang)
>   - fix missing seqno initialization in task_new (thanks Shengjiu Wang)
>   - fix reference counting for allocated dma buffers (thanks Shengjiu Wang)
>   - use origin_seqno to reuse the already allocated buffers for new task
>
> v1..v2:
>   - fix some documentation typos (thanks Amadeusz Sławiński)
>   - fix memdup_user() error handling (thanks Takashi)
>   - use one state variable instead multiple (thanks Takashi)
>   - handle task limit (set to 64 - mentioned in documentation, NIY)
>   - fix file release (free all tasks)
> ---
>  .../sound/designs/compress-passthrough.rst    | 125 +++++++
>  include/sound/compress_driver.h               |  32 ++
>  include/uapi/sound/compress_offload.h         |  51 ++-
>  sound/core/Kconfig                            |   4 +
>  sound/core/compress_offload.c                 | 338 +++++++++++++++++-
>  5 files changed, 541 insertions(+), 9 deletions(-)
>  create mode 100644 Documentation/sound/designs/compress-passthrough.rst
>
> diff --git a/Documentation/sound/designs/compress-passthrough.rst b/Documentation/sound/designs/compress-passthrough.rst
> new file mode 100644
> index 000000000000..975462500c33
> --- /dev/null
> +++ b/Documentation/sound/designs/compress-passthrough.rst
> @@ -0,0 +1,125 @@
> +=================================
> +ALSA Co-processor Passthrough API
> +=================================
> +
> +Jaroslav Kysela <perex@perex.cz>
> +
> +
> +Overview
> +========
> +
> +There is a requirement to expose the audio hardware that accelerates various
> +tasks for user space such as sample rate converters, compressed
> +stream decoders, etc.
> +
> +This is description for the API extension for the compress ALSA API which
> +is able to handle "tasks" that are not bound to real-time operations
> +and allows for the serialization of operations.
> +
> +Requirements
> +============
> +
> +The main requirements are:
> +
> +- serialization of multiple tasks for user space to allow multiple
> +  operations without user space intervention
> +
> +- separate buffers (input + output) for each operation
> +
> +- expose buffers using mmap to user space
> +
> +- signal user space when the task is finished (standard poll mechanism)
> +
> +Design
> +======
> +
> +A new direction SND_COMPRESS_PASSTHROUGH is introduced to identify
> +the passthrough API.
> +
> +The API extension shares device enumeration and parameters handling from
> +the main compressed API. All other realtime streaming ioctls are deactivated
> +and a new set of task related ioctls are introduced. The standard
> +read/write/mmap I/O operations are not supported in the passthrough device.
> +
> +Device ("stream") state handling is reduced to OPEN/SETUP. All other
> +states are not available for the passthrough mode.
> +
> +Data I/O mechanism is using standard dma-buf interface with all advantages
> +like mmap, standard I/O, buffer sharing etc. One buffer is used for the
> +input data and second (separate) buffer is used for the output data. Each task
> +have separate I/O buffers.
> +
> +For the buffering parameters, the fragments means a limit of allocated tasks
> +for given device. The fragment_size limits the input buffer size for the given
> +device. The output buffer size is determined by the driver (may be different
> +from the input buffer size).
> +
> +State Machine
> +=============
> +
> +The passthrough audio stream state machine is described below :
> +
> +                                       +----------+
> +                                       |          |
> +                                       |   OPEN   |
> +                                       |          |
> +                                       +----------+
> +                                             |
> +                                             |
> +                                             | compr_set_params()
> +                                             |
> +                                             v
> +         all passthrough task ops      +----------+
> +  +------------------------------------|          |
> +  |                                    |   SETUP  |
> +  |                                    |
> +  |                                    +----------+
> +  |                                          |
> +  +------------------------------------------+
> +
> +
> +Passthrough operations (ioctls)
> +===============================
> +
> +CREATE
> +------
> +Creates a set of input/output buffers. The input buffer size is
> +fragment_size. Allocates unique seqno.
> +
> +The hardware drivers allocate internal 'struct dma_buf' for both input and
> +output buffers (using 'dma_buf_export()' function). The anonymous
> +file descriptors for those buffers are passed to user space.
> +
> +FREE
> +----
> +Free a set of input/output buffers. If a task is active, the stop
> +operation is executed before. If seqno is zero, operation is executed for all
> +tasks.
> +
> +START
> +-----
> +Starts (queues) a task. There are two cases of the task start - right after
> +the task is created. In this case, origin_seqno must be zero.
> +The second case is for reusing of already finished task. The origin_seqno
> +must identify the task to be reused. In both cases, a new seqno value
> +is allocated and returned to user space.
> +
> +The prerequisite is that application filled input dma buffer with
> +new source data and set input_size to pass the real data size to the driver.
> +
> +The order of data processing is preserved (first started job must be
> +finished at first).
> +
> +STOP
> +----
> +Stop (dequeues) a task. If seqno is zero, operation is executed for all
> +tasks.
> +
> +STATUS
> +------
> +Obtain the task status (active, finished). Also, the driver will set
> +the real output data size (valid area in the output buffer).
> +
> +Credits
> +=======
> +- ...
> diff --git a/include/sound/compress_driver.h b/include/sound/compress_driver.h
> index bcf872c17dd3..2884b1e7955d 100644
> --- a/include/sound/compress_driver.h
> +++ b/include/sound/compress_driver.h
> @@ -19,6 +19,22 @@
>
>  struct snd_compr_ops;
>
> +/**
> + * struct snd_compr_task_runtime: task runtime description
> + *
> + */
> +struct snd_compr_task_runtime {
> +       struct list_head list;
> +       struct dma_buf *input;
> +       struct dma_buf *output;
> +       u64 seqno;
> +       u64 input_size;
> +       u64 output_size;
> +       u8 state;
> +       void *private_value;
> +};
> +
> +
>  /**
>   * struct snd_compr_runtime: runtime stream description
>   * @state: stream state
> @@ -54,6 +70,11 @@ struct snd_compr_runtime {
>         dma_addr_t dma_addr;
>         size_t dma_bytes;
>         struct snd_dma_buffer *dma_buffer_p;
> +
> +       u32 active_tasks;
> +       u32 total_tasks;
> +       u64 task_seqno;
> +       struct list_head tasks;
>  };
>
>  /**
> @@ -132,6 +153,12 @@ struct snd_compr_ops {
>                         struct snd_compr_caps *caps);
>         int (*get_codec_caps) (struct snd_compr_stream *stream,
>                         struct snd_compr_codec_caps *codec);
> +#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH)
> +       int (*task_create) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task);
> +       int (*task_start) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task);
> +       int (*task_stop) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task);
> +       int (*task_free) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task);
> +#endif
>  };
>
>  /**
> @@ -242,4 +269,9 @@ int snd_compr_free_pages(struct snd_compr_stream *stream);
>  int snd_compr_stop_error(struct snd_compr_stream *stream,
>                          snd_pcm_state_t state);
>
> +#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH)
> +void snd_compr_task_finished(struct snd_compr_stream *stream,
> +                            struct snd_compr_task_runtime *task);
> +#endif
> +
>  #endif
> diff --git a/include/uapi/sound/compress_offload.h b/include/uapi/sound/compress_offload.h
> index d185957f3fe0..5fed1979522b 100644
> --- a/include/uapi/sound/compress_offload.h
> +++ b/include/uapi/sound/compress_offload.h
> @@ -1,4 +1,4 @@
> -/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +       /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>  /*
>   *  compress_offload.h - compress offload header definations
>   *
> @@ -14,7 +14,7 @@
>  #include <sound/compress_params.h>
>
>
> -#define SNDRV_COMPRESS_VERSION SNDRV_PROTOCOL_VERSION(0, 2, 0)
> +#define SNDRV_COMPRESS_VERSION SNDRV_PROTOCOL_VERSION(0, 3, 0)
>  /**
>   * struct snd_compressed_buffer - compressed buffer
>   * @fragment_size: size of buffer fragment in bytes
> @@ -68,7 +68,8 @@ struct snd_compr_avail {
>
>  enum snd_compr_direction {
>         SND_COMPRESS_PLAYBACK = 0,
> -       SND_COMPRESS_CAPTURE
> +       SND_COMPRESS_CAPTURE,
> +       SND_COMPRESS_PASSTHROUGH
>  };
>
>  /**
> @@ -127,6 +128,42 @@ struct snd_compr_metadata {
>          __u32 value[8];
>  } __attribute__((packed, aligned(4)));
>
> +/**
> + * struct snd_compr_task - task primitive for non-realtime operation
> + * @seqno: sequence number (task identifier)
> + * @origin_seqno: previous sequence number (task identifier) - for reuse
> + * @input_fd: data input file descriptor (dma-buf)
> + * @output_fd: data output file descriptor (dma-buf)
> + * @input_size: filled data in bytes (from caller, must not exceed fragment size)
> + */
> +struct snd_compr_task {
> +       __u64 seqno;
> +       __u64 origin_seqno;
> +       int input_fd;
> +       int output_fd;
> +       __u64 input_size;
> +       __u8 reserved[16];
> +} __attribute__((packed, aligned(4)));
> +
> +enum snd_compr_state {
> +       SND_COMPRESS_TASK_STATE_IDLE = 0,
> +       SND_COMPRESS_TASK_STATE_ACTIVE,
> +       SND_COMPRESS_TASK_STATE_FINISHED
> +};
> +
> +/**
> + * struct snd_compr_task_status - task status
> + * @seqno: sequence number (task identifier)
> + * @output_size: filled data in bytes (from driver)
> + * @state: actual task state (SND_COMPRESS_TASK_STATE_*)
> + */
> +struct snd_compr_task_status {
> +       __u64 seqno;
> +       __u64 output_size;
> +       __u8 state;
> +       __u8 reserved[15];
> +} __attribute__((packed, aligned(4)));
> +
>  /*
>   * compress path ioctl definitions
>   * SNDRV_COMPRESS_GET_CAPS: Query capability of DSP
> @@ -164,6 +201,14 @@ struct snd_compr_metadata {
>  #define SNDRV_COMPRESS_DRAIN           _IO('C', 0x34)
>  #define SNDRV_COMPRESS_NEXT_TRACK      _IO('C', 0x35)
>  #define SNDRV_COMPRESS_PARTIAL_DRAIN   _IO('C', 0x36)
> +
> +
> +#define SNDRV_COMPRESS_TASK_CREATE     _IOWR('C', 0x60, struct snd_compr_task)
> +#define SNDRV_COMPRESS_TASK_FREE       _IOW('C', 0x61, __u64)
> +#define SNDRV_COMPRESS_TASK_START      _IOWR('C', 0x62, struct snd_compr_task)
> +#define SNDRV_COMPRESS_TASK_STOP       _IOW('C', 0x63, __u64)
> +#define SNDRV_COMPRESS_TASK_STATUS     _IOWR('C', 0x68, struct snd_compr_task_status)
> +
>  /*
>   * TODO
>   * 1. add mmap support
> diff --git a/sound/core/Kconfig b/sound/core/Kconfig
> index 8077f481d84f..3541fe6d477f 100644
> --- a/sound/core/Kconfig
> +++ b/sound/core/Kconfig
> @@ -59,6 +59,10 @@ config SND_CORE_TEST
>  config SND_COMPRESS_OFFLOAD
>         tristate
>
> +config SND_COMPRESS_PASSTHROUGH
> +       select DMA_BUF
> +       tristate
> +
>  config SND_JACK
>         bool
>
> diff --git a/sound/core/compress_offload.c b/sound/core/compress_offload.c
> index f0008fa2d839..190d754ec994 100644
> --- a/sound/core/compress_offload.c
> +++ b/sound/core/compress_offload.c
> @@ -24,6 +24,7 @@
>  #include <linux/types.h>
>  #include <linux/uio.h>
>  #include <linux/uaccess.h>
> +#include <linux/dma-buf.h>
>  #include <linux/module.h>
>  #include <linux/compat.h>
>  #include <sound/core.h>
> @@ -54,6 +55,12 @@ struct snd_compr_file {
>
>  static void error_delayed_work(struct work_struct *work);
>
> +#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH)
> +static void snd_compr_task_free_all(struct snd_compr_stream *stream);
> +#else
> +static inline void snd_compr_task_free_all(struct snd_compr_stream *stream) { }
> +#endif
> +
>  /*
>   * a note on stream states used:
>   * we use following states in the compressed core
> @@ -85,6 +92,8 @@ static int snd_compr_open(struct inode *inode, struct file *f)
>                 dirn = SND_COMPRESS_PLAYBACK;
>         else if ((f->f_flags & O_ACCMODE) == O_RDONLY)
>                 dirn = SND_COMPRESS_CAPTURE;
> +       else if ((f->f_flags & O_ACCMODE) == O_RDWR)
> +               dirn = SND_COMPRESS_PASSTHROUGH;
>         else
>                 return -EINVAL;
>
> @@ -112,6 +121,7 @@ static int snd_compr_open(struct inode *inode, struct file *f)
>         }
>
>         INIT_DELAYED_WORK(&data->stream.error_work, error_delayed_work);
> +       INIT_LIST_HEAD(&runtime->tasks);

This needs to come after 'runtime' is allocated.

Best regards
Shengjiu Wang
>
>         data->stream.ops = compr->ops;
>         data->stream.direction = dirn;
> @@ -154,6 +164,8 @@ static int snd_compr_free(struct inode *inode, struct file *f)
>                 break;
>         }
>
> +       snd_compr_task_free_all(&data->stream);
> +
>         data->stream.ops->free(&data->stream);
>         if (!data->stream.runtime->dma_buffer_p)
>                 kfree(data->stream.runtime->buffer);
> @@ -226,6 +238,9 @@ snd_compr_ioctl_avail(struct snd_compr_stream *stream, unsigned long arg)
>         struct snd_compr_avail ioctl_avail;
>         size_t avail;
>
> +       if (stream->direction == SND_COMPRESS_PASSTHROUGH)
> +               return -EBADFD;
> +
>         avail = snd_compr_calc_avail(stream, &ioctl_avail);
>         ioctl_avail.avail = avail;
>
> @@ -287,6 +302,8 @@ static ssize_t snd_compr_write(struct file *f, const char __user *buf,
>                 return -EFAULT;
>
>         stream = &data->stream;
> +       if (stream->direction == SND_COMPRESS_PASSTHROUGH)
> +               return -EBADFD;
>         guard(mutex)(&stream->device->lock);
>         /* write is allowed when stream is running or has been steup */
>         switch (stream->runtime->state) {
> @@ -336,6 +353,8 @@ static ssize_t snd_compr_read(struct file *f, char __user *buf,
>                 return -EFAULT;
>
>         stream = &data->stream;
> +       if (stream->direction == SND_COMPRESS_PASSTHROUGH)
> +               return -EBADFD;
>         guard(mutex)(&stream->device->lock);
>
>         /* read is allowed when stream is running, paused, draining and setup
> @@ -385,6 +404,8 @@ static __poll_t snd_compr_poll(struct file *f, poll_table *wait)
>  {
>         struct snd_compr_file *data = f->private_data;
>         struct snd_compr_stream *stream;
> +       struct snd_compr_runtime *runtime;
> +       struct snd_compr_task_runtime *task;
>         size_t avail;
>         __poll_t retval = 0;
>
> @@ -392,6 +413,7 @@ static __poll_t snd_compr_poll(struct file *f, poll_table *wait)
>                 return EPOLLERR;
>
>         stream = &data->stream;
> +       runtime = stream->runtime;
>
>         guard(mutex)(&stream->device->lock);
>
> @@ -405,6 +427,18 @@ static __poll_t snd_compr_poll(struct file *f, poll_table *wait)
>
>         poll_wait(f, &stream->runtime->sleep, wait);
>
> +       if (stream->direction == SND_COMPRESS_PASSTHROUGH) {
> +               if (runtime->fragments > runtime->active_tasks)
> +                       retval |= EPOLLOUT | EPOLLWRNORM;
> +               task = list_first_entry_or_null(&runtime->tasks,
> +                                               struct snd_compr_task_runtime,
> +                                               list);
> +               if (task && task->state == SND_COMPRESS_TASK_STATE_FINISHED)
> +                       retval |= EPOLLIN | EPOLLRDNORM;
> +               return retval;
> +       }
> +
> +
>         avail = snd_compr_get_avail(stream);
>         pr_debug("avail is %ld\n", (unsigned long)avail);
>         /* check if we have at least one fragment to fill */
> @@ -521,6 +555,9 @@ static int snd_compr_allocate_buffer(struct snd_compr_stream *stream,
>         unsigned int buffer_size;
>         void *buffer = NULL;
>
> +       if (stream->direction == SND_COMPRESS_PASSTHROUGH)
> +               goto params;
> +
>         buffer_size = params->buffer.fragment_size * params->buffer.fragments;
>         if (stream->ops->copy) {
>                 buffer = NULL;
> @@ -543,18 +580,30 @@ static int snd_compr_allocate_buffer(struct snd_compr_stream *stream,
>                 if (!buffer)
>                         return -ENOMEM;
>         }
> -       stream->runtime->fragment_size = params->buffer.fragment_size;
> -       stream->runtime->fragments = params->buffer.fragments;
> +
>         stream->runtime->buffer = buffer;
>         stream->runtime->buffer_size = buffer_size;
> +params:
> +       stream->runtime->fragment_size = params->buffer.fragment_size;
> +       stream->runtime->fragments = params->buffer.fragments;
>         return 0;
>  }
>
> -static int snd_compress_check_input(struct snd_compr_params *params)
> +static int
> +snd_compress_check_input(struct snd_compr_stream *stream, struct snd_compr_params *params)
>  {
> +       u32 max_fragments;
> +
>         /* first let's check the buffer parameter's */
> -       if (params->buffer.fragment_size == 0 ||
> -           params->buffer.fragments > U32_MAX / params->buffer.fragment_size ||
> +       if (params->buffer.fragment_size == 0)
> +               return -EINVAL;
> +
> +       if (stream->direction == SND_COMPRESS_PASSTHROUGH)
> +               max_fragments = 64;                     /* safe value */
> +       else
> +               max_fragments = U32_MAX / params->buffer.fragment_size;
> +
> +       if (params->buffer.fragments > max_fragments ||
>             params->buffer.fragments == 0)
>                 return -EINVAL;
>
> @@ -583,7 +632,7 @@ snd_compr_set_params(struct snd_compr_stream *stream, unsigned long arg)
>                 if (IS_ERR(params))
>                         return PTR_ERR(no_free_ptr(params));
>
> -               retval = snd_compress_check_input(params);
> +               retval = snd_compress_check_input(stream, params);
>                 if (retval)
>                         return retval;
>
> @@ -939,6 +988,262 @@ static int snd_compr_partial_drain(struct snd_compr_stream *stream)
>         return snd_compress_wait_for_drain(stream);
>  }
>
> +#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH)
> +
> +static struct snd_compr_task_runtime *
> +               snd_compr_find_task(struct snd_compr_stream *stream, __u64 seqno)
> +{
> +       struct snd_compr_task_runtime *task;
> +
> +       list_for_each_entry(task, &stream->runtime->tasks, list) {
> +               if (task->seqno == seqno)
> +                       return task;
> +       }
> +       return NULL;
> +}
> +
> +static void snd_compr_task_free(struct snd_compr_task_runtime *task)
> +{
> +       if (task->output)
> +               dma_buf_put(task->output);
> +       if (task->input)
> +               dma_buf_put(task->input);
> +       kfree(task);
> +}
> +
> +static u64 snd_compr_seqno_next(struct snd_compr_stream *stream)
> +{
> +       u64 seqno = ++stream->runtime->task_seqno;
> +       if (seqno == 0)
> +               seqno = ++stream->runtime->task_seqno;
> +       return seqno;
> +}
> +
> +static int snd_compr_task_new(struct snd_compr_stream *stream, struct snd_compr_task *utask)
> +{
> +       struct snd_compr_task_runtime *task;
> +       int retval;
> +
> +       if (stream->runtime->total_tasks >= stream->runtime->fragments)
> +               return -EBUSY;
> +       if (utask->origin_seqno != 0 || utask->input_size != 0)
> +               return -EINVAL;
> +       task = kzalloc(sizeof(*task), GFP_KERNEL);
> +       if (task == NULL)
> +               return -ENOMEM;
> +       task->seqno = utask->seqno = snd_compr_seqno_next(stream);
> +       task->input_size = utask->input_size;
> +       retval = stream->ops->task_create(stream, task);
> +       if (retval < 0)
> +               goto cleanup;
> +       utask->input_fd = dma_buf_fd(task->input, O_WRONLY|O_CLOEXEC);
> +       if (utask->input_fd < 0) {
> +               retval = utask->input_fd;
> +               goto cleanup;
> +       }
> +       utask->output_fd = dma_buf_fd(task->output, O_RDONLY|O_CLOEXEC);
> +       if (utask->output_fd < 0) {
> +               retval = utask->output_fd;
> +               goto cleanup;
> +       }
> +       /* keep dmabuf reference until freed with task free ioctl */
> +       dma_buf_get(utask->input_fd);
> +       dma_buf_get(utask->output_fd);
> +       list_add_tail(&task->list, &stream->runtime->tasks);
> +       stream->runtime->total_tasks++;
> +       return 0;
> +cleanup:
> +       snd_compr_task_free(task);
> +       return retval;
> +}
> +
> +static int snd_compr_task_create(struct snd_compr_stream *stream, unsigned long arg)
> +{
> +       struct snd_compr_task *task __free(kfree) = NULL;
> +       int retval;
> +
> +       if (stream->runtime->state != SNDRV_PCM_STATE_SETUP)
> +               return -EPERM;
> +       task = memdup_user((void __user *)arg, sizeof(*task));
> +       if (IS_ERR(task))
> +               return PTR_ERR(no_free_ptr(task));
> +       retval = snd_compr_task_new(stream, task);
> +       if (retval >= 0)
> +               if (copy_to_user((void __user *)arg, task, sizeof(*task)))
> +                       retval = -EFAULT;
> +       return retval;
> +}
> +
> +static int snd_compr_task_start_prepare(struct snd_compr_task_runtime *task,
> +                                       struct snd_compr_task *utask)
> +{
> +       if (task == NULL)
> +               return -EINVAL;
> +       if (task->state >= SND_COMPRESS_TASK_STATE_FINISHED)
> +               return -EBUSY;
> +       if (utask->input_size > task->input->size)
> +               return -EINVAL;
> +       task->input_size = utask->input_size;
> +       task->state = SND_COMPRESS_TASK_STATE_IDLE;
> +       return 0;
> +}
> +
> +static int snd_compr_task_start(struct snd_compr_stream *stream, struct snd_compr_task *utask)
> +{
> +       struct snd_compr_task_runtime *task;
> +       int retval;
> +
> +       if (utask->origin_seqno > 0) {
> +               task = snd_compr_find_task(stream, utask->origin_seqno);
> +               retval = snd_compr_task_start_prepare(task, utask);
> +               if (retval < 0)
> +                       return retval;
> +               task->seqno = utask->seqno = snd_compr_seqno_next(stream);
> +               utask->origin_seqno = 0;
> +               list_move_tail(&task->list, &stream->runtime->tasks);
> +       } else {
> +               task = snd_compr_find_task(stream, utask->seqno);
> +               if (task && task->state != SND_COMPRESS_TASK_STATE_IDLE)
> +                       return -EBUSY;
> +               retval = snd_compr_task_start_prepare(task, utask);
> +               if (retval < 0)
> +                       return retval;
> +       }
> +       retval = stream->ops->task_start(stream, task);
> +       if (retval >= 0) {
> +               task->state = SND_COMPRESS_TASK_STATE_ACTIVE;
> +               stream->runtime->active_tasks++;
> +       }
> +       return retval;
> +}
> +
> +static int snd_compr_task_start_ioctl(struct snd_compr_stream *stream, unsigned long arg)
> +{
> +       struct snd_compr_task *task __free(kfree) = NULL;
> +       int retval;
> +
> +       if (stream->runtime->state != SNDRV_PCM_STATE_SETUP)
> +               return -EPERM;
> +       task = memdup_user((void __user *)arg, sizeof(*task));
> +       if (IS_ERR(task))
> +               return PTR_ERR(no_free_ptr(task));
> +       retval = snd_compr_task_start(stream, task);
> +       if (retval >= 0)
> +               if (copy_to_user((void __user *)arg, task, sizeof(*task)))
> +                       retval = -EFAULT;
> +       return retval;
> +}
> +
> +static void snd_compr_task_stop_one(struct snd_compr_stream *stream,
> +                                       struct snd_compr_task_runtime *task)
> +{
> +       if (task->state != SND_COMPRESS_TASK_STATE_ACTIVE)
> +               return;
> +       stream->ops->task_stop(stream, task);
> +       if (!snd_BUG_ON(stream->runtime->active_tasks == 0))
> +               stream->runtime->active_tasks--;
> +       list_move_tail(&task->list, &stream->runtime->tasks);
> +       task->state = SND_COMPRESS_TASK_STATE_IDLE;
> +}
> +
> +static void snd_compr_task_free_one(struct snd_compr_stream *stream,
> +                                       struct snd_compr_task_runtime *task)
> +{
> +       snd_compr_task_stop_one(stream, task);
> +       stream->ops->task_free(stream, task);
> +       list_del(&task->list);
> +       snd_compr_task_free(task);
> +       stream->runtime->total_tasks--;
> +}
> +
> +static void snd_compr_task_free_all(struct snd_compr_stream *stream)
> +{
> +       struct snd_compr_task_runtime *task, *temp;
> +
> +       list_for_each_entry_safe(task, temp, &stream->runtime->tasks, list)
> +               snd_compr_task_free_one(stream, task);
> +}
> +
> +typedef void (*snd_compr_seq_func_t)(struct snd_compr_stream *stream,
> +                                       struct snd_compr_task_runtime *task);
> +
> +static int snd_compr_task_seq(struct snd_compr_stream *stream, unsigned long arg,
> +                                       snd_compr_seq_func_t fcn)
> +{
> +       struct snd_compr_task_runtime *task, *temp;
> +       __u64 seqno;
> +       int retval;
> +
> +       if (stream->runtime->state != SNDRV_PCM_STATE_SETUP)
> +               return -EPERM;
> +       retval = get_user(seqno, (__u64 __user *)arg);
> +       if (retval < 0)
> +               return retval;
> +       retval = 0;
> +       if (seqno == 0) {
> +               list_for_each_entry_safe(task, temp, &stream->runtime->tasks, list)
> +                       fcn(stream, task);
> +       } else {
> +               task = snd_compr_find_task(stream, seqno);
> +               if (task == NULL) {
> +                       retval = -EINVAL;
> +               } else {
> +                       fcn(stream, task);
> +               }
> +       }
> +       return retval;
> +}
> +
> +static int snd_compr_task_status(struct snd_compr_stream *stream,
> +                                       struct snd_compr_task_status *status)
> +{
> +       struct snd_compr_task_runtime *task;
> +
> +       task = snd_compr_find_task(stream, status->seqno);
> +       if (task == NULL)
> +               return -EINVAL;
> +       status->output_size = task->output_size;
> +       status->state = task->state;
> +       return 0;
> +}
> +
> +static int snd_compr_task_status_ioctl(struct snd_compr_stream *stream, unsigned long arg)
> +{
> +       struct snd_compr_task_status *status __free(kfree) = NULL;
> +       int retval;
> +
> +       if (stream->runtime->state != SNDRV_PCM_STATE_SETUP)
> +               return -EPERM;
> +       status = memdup_user((void __user *)arg, sizeof(*status));
> +       if (IS_ERR(status))
> +               return PTR_ERR(no_free_ptr(status));
> +       retval = snd_compr_task_status(stream, status);
> +       if (retval >= 0)
> +               if (copy_to_user((void __user *)arg, status, sizeof(*status)))
> +                       retval = -EFAULT;
> +       return retval;
> +}
> +
> +/**
> + * snd_compr_task_finished: Notify that the task was finished
> + * @stream: pointer to stream
> + * @task: runtime task structure
> + *
> + * Set the finished task state and notify waiters.
> + */
> +void snd_compr_task_finished(struct snd_compr_stream *stream,
> +                           struct snd_compr_task_runtime *task)
> +{
> +       guard(mutex)(&stream->device->lock);
> +       if (!snd_BUG_ON(stream->runtime->active_tasks == 0))
> +               stream->runtime->active_tasks--;
> +       task->state = SND_COMPRESS_TASK_STATE_FINISHED;
> +       wake_up(&stream->runtime->sleep);
> +}
> +EXPORT_SYMBOL(snd_compr_task_finished);
> +
> +#endif /* CONFIG_SND_COMPRESS_PASSTHROUGH */
> +
>  static long snd_compr_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
>  {
>         struct snd_compr_file *data = f->private_data;
> @@ -968,6 +1273,27 @@ static long snd_compr_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
>                 return snd_compr_set_metadata(stream, arg);
>         case _IOC_NR(SNDRV_COMPRESS_GET_METADATA):
>                 return snd_compr_get_metadata(stream, arg);
> +       }
> +
> +       if (stream->direction == SND_COMPRESS_PASSTHROUGH) {
> +#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH)
> +               switch (_IOC_NR(cmd)) {
> +               case _IOC_NR(SNDRV_COMPRESS_TASK_CREATE):
> +                       return snd_compr_task_create(stream, arg);
> +               case _IOC_NR(SNDRV_COMPRESS_TASK_FREE):
> +                       return snd_compr_task_seq(stream, arg, snd_compr_task_free_one);
> +               case _IOC_NR(SNDRV_COMPRESS_TASK_START):
> +                       return snd_compr_task_start_ioctl(stream, arg);
> +               case _IOC_NR(SNDRV_COMPRESS_TASK_STOP):
> +                       return snd_compr_task_seq(stream, arg, snd_compr_task_stop_one);
> +               case _IOC_NR(SNDRV_COMPRESS_TASK_STATUS):
> +                       return snd_compr_task_status_ioctl(stream, arg);
> +               }
> +#endif
> +               return -ENOTTY;
> +       }
> +
> +       switch (_IOC_NR(cmd)) {
>         case _IOC_NR(SNDRV_COMPRESS_TSTAMP):
>                 return snd_compr_tstamp(stream, arg);
>         case _IOC_NR(SNDRV_COMPRESS_AVAIL):
> --
> 2.45.2
>
Pierre-Louis Bossart June 25, 2024, 6:06 a.m. UTC | #6
>> I honestly find the state machine confusing, it looks like in the SETUP
>> stage tasks can be added/removed dynamically, but I am not sure if it's
>> a real use case? Most pipeline management add a bunch of processing,
>> then go in the 'run' mode. Adding/removing stuff on a running pipeline
>> is really painful and not super useful, is it?
> 
> This I/O mechanism tries to be "universal". As opposite to the standard
> streaming APIs, those tasks may be individual (without any state
> handling among multiple tasks). In this case, the "stop" in the middle
> makes sense. Also, it may make sense for real-time operation (remove
> altered/old data and feed new).

I must be missing something on the data flow then. I was assuming that
the data generated in the output buffer of one task was used as the
input buffer of the next task. If that were true, stopping a task in the
middle will essentially starve the tasks downstream, no?

If the tasks are handled as completely independent entities, what usages
would this design allow for?

Also I don't fully get the initial/final stages of processing. It seems
that the host needs to feed data to the first task in the chain, then
start it. That's fine for playback, but how would this be used if we
wanted to e.g. enable an ASRC on captured data coming from an audio
interface?

It's similar for the final stages on the playback, the memory model is
fine, but at some point the audio data will have to be fed to a regular
audio interface, and that point seems to have been overlooked, or I
missed it entirely.

In the existing "compress" framework, that connection to audio
interfaces is typically left as an exercise for the DSP engineers, and
typically requires the presence of sample-rate conversion and mixer. But
for a memory-to-memory model what is the direction to tie the input or
output buffers to the rest of the audio subsystem?

I forgot to mention, btw, that some processing consumes audio data but
does not generate anything in the output buffer, for example when
analyzing captured data to signal specific patterns or triggers (VAD,
hotword detection, presence detection, etc.).
Jaroslav Kysela June 25, 2024, 11:48 a.m. UTC | #7
On 25. 06. 24 8:06, Pierre-Louis Bossart wrote:

>>> I honestly find the state machine confusing, it looks like in the SETUP
>>> stage tasks can be added/removed dynamically, but I am not sure if it's
>>> a real use case? Most pipeline management add a bunch of processing,
>>> then go in the 'run' mode. Adding/removing stuff on a running pipeline
>>> is really painful and not super useful, is it?
>>
>> This I/O mechanism tries to be "universal". As opposite to the standard
>> streaming APIs, those tasks may be individual (without any state
>> handling among multiple tasks). In this case, the "stop" in the middle
>> makes sense. Also, it may make sense for real-time operation (remove
>> altered/old data and feed new).
> 
> I must be missing something on the data flow then. I was assuming that
> the data generated in the output buffer of one task was used as the
> input buffer of the next task. If that were true, stopping a task in the
> middle will essentially starve the tasks downstream, no?
> 
> If the tasks are handled as completely independent entities, what usages
> would this design allow for?

The usage is for user space. It allows audio data processing to be
accelerated in hardware, but in this simple API the input comes from user
space and the output is exported back to user space. The only purpose of the
"chaining" in this API is to reduce user space context switches (latency).
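
To make the flow above concrete, here is a minimal user-space sketch of how
the request structure from this RFC might be filled for the three cases the
v3 patch distinguishes: creating a task, starting it by its own seqno, and
restarting it via origin_seqno to reuse already allocated buffers. The struct
is a local mirror of the proposed struct snd_compr_task (not an installed
header), the helper names are hypothetical, and the ioctl() calls themselves
are left out because they need a passthrough-capable device.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed struct snd_compr_task (v3 RFC layout). */
struct task_req {
	uint64_t seqno;
	uint64_t origin_seqno;
	int32_t input_fd;
	int32_t output_fd;
	uint64_t input_size;
	uint8_t reserved[16];
} __attribute__((packed, aligned(4)));

static_assert(sizeof(struct task_req) == 48, "unexpected task_req layout");

/*
 * Request for TASK_CREATE: the v3 code returns -EINVAL unless origin_seqno
 * and input_size are both 0. The kernel fills in seqno and both dma-buf fds.
 */
static void task_req_for_create(struct task_req *req)
{
	memset(req, 0, sizeof(*req));
	req->input_fd = -1;	/* placeholder, overwritten by the kernel */
	req->output_fd = -1;	/* placeholder, overwritten by the kernel */
}

/* Request for TASK_START on an idle task, looked up by its own seqno. */
static void task_req_for_start(struct task_req *req, uint64_t seqno,
			       uint64_t input_size)
{
	task_req_for_create(req);
	req->seqno = seqno;
	req->input_size = input_size;
}

/*
 * Request for TASK_START that reuses the buffers of an earlier task:
 * origin_seqno names the old task and the kernel returns a new seqno.
 */
static void task_req_for_reuse(struct task_req *req, uint64_t origin_seqno,
			       uint64_t input_size)
{
	task_req_for_create(req);
	req->origin_seqno = origin_seqno;
	req->input_size = input_size;
}
```

In a real caller each of these requests would be passed to the matching
SNDRV_COMPRESS_TASK_* ioctl on a file descriptor opened with O_RDWR.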

> Also I don't fully get the initial/final stages of processing. It seems
> that the host needs to feed data to the first task in the chain, then
> start it. That's fine for playback, but how would this be used if we
> wanted to e.g. enable an ASRC on captured data coming from an audio
> interface?

There are no stream endpoints in the kernel (no playback, no capture). The
idea is simply that we have some audio data, do something with it, and
return it.

For a universal media stream router, another API should be designed. I
believe that using dma-buf buffers for I/O is a nice approach that can be
reused in another API.

						Jaroslav
Pierre-Louis Bossart June 25, 2024, 12:36 p.m. UTC | #8
On 6/25/24 13:48, Jaroslav Kysela wrote:
> On 25. 06. 24 8:06, Pierre-Louis Bossart wrote:
> 
>>>> I honestly find the state machine confusing, it looks like in the SETUP
>>>> stage tasks can be added/removed dynamically, but I am not sure if it's
>>>> a real use case? Most pipeline management add a bunch of processing,
>>>> then go in the 'run' mode. Adding/removing stuff on a running pipeline
>>>> is really painful and not super useful, is it?
>>>
>>> This I/O mechanism tries to be "universal". As opposite to the standard
>>> streaming APIs, those tasks may be individual (without any state
>>> handling among multiple tasks). In this case, the "stop" in the middle
>>> makes sense. Also, it may make sense for real-time operation (remove
>>> altered/old data and feed new).
>>
>> I must be missing something on the data flow then. I was assuming that
>> the data generated in the output buffer of one task was used as the
>> input buffer of the next task. If that were true, stopping a task in the
>> middle will essentially starve the tasks downstream, no?
>>
>> If the tasks are handled as completely independent entities, what usages
>> would this design allow for?
> 
> The usage is for the user space. It allows to accelerate the audio data
> processing in hardware, but input is from user space and output is
> exported to user space in this simple API. The purpose of this API is
> just "chaining" to reduce the user space context switches (latency).

I am still very confused about the difference between "chaining" and
adding/removing tasks dynamically at run-time. The former is fine; the
latter is very hard to enable in a glitch-free manner, because usually all
filters have an internal history buffer. Inserting, stopping or removing
a filter is likely to add audible discontinuities.

>> Also I don't fully get the initial/final stages of processing. It seems
>> that the host needs to feed data to the first task in the chain, then
>> start it. That's fine for playback, but how would this be used if we
>> wanted to e.g. enable an ASRC on captured data coming from an audio
>> interface?
> 
> There are no stream endpoints in kernel (no playback, no capture). It's
> just about we have some audio data, do something with them and return
> them back.
> 
> For an universal media stream router, another API should be designed. I
> believe that using dma-buf buffers for I/O is nice and ready to be
> reused in another API.

Humm, how would this work with the initial ask to enable the ASRC from
FSL/NXP? If we leave the ends of the processing chain completely
undefined, who's going to use this processing chain? Shouldn't there be
at least one example of how existing userspace (alsa-lib, pipewire,
wireplumber, etc) might use the API? It's been a while now, but when we
introduced the compress API there was a companion 'tinycompress' utility
- largely inspired by 'tinyplay' - to showcase how the API was meant to
be used.

To be clear: I am not against this API at all, the direction to have
userspace orchestrate a buffer-based processing chain with minimal
latency is a good one, I am just concerned that we are leaving too many
points open in terms of integration with other audio components.
Jaroslav Kysela June 25, 2024, 1 p.m. UTC | #9
On 25. 06. 24 14:36, Pierre-Louis Bossart wrote:
> 
> 
> On 6/25/24 13:48, Jaroslav Kysela wrote:
>> On 25. 06. 24 8:06, Pierre-Louis Bossart wrote:
>>
>>>>> I honestly find the state machine confusing, it looks like in the SETUP
>>>>> stage tasks can be added/removed dynamically, but I am not sure if it's
>>>>> a real use case? Most pipeline management add a bunch of processing,
>>>>> then go in the 'run' mode. Adding/removing stuff on a running pipeline
>>>>> is really painful and not super useful, is it?
>>>>
>>>> This I/O mechanism tries to be "universal". As opposite to the standard
>>>> streaming APIs, those tasks may be individual (without any state
>>>> handling among multiple tasks). In this case, the "stop" in the middle
>>>> makes sense. Also, it may make sense for real-time operation (remove
>>>> altered/old data and feed new).
>>>
>>> I must be missing something on the data flow then. I was assuming that
>>> the data generated in the output buffer of one task was used as the
>>> input buffer of the next task. If that were true, stopping a task in the
>>> middle will essentially starve the tasks downstream, no?
>>>
>>> If the tasks are handled as completely independent entities, what usages
>>> would this design allow for?
>>
>> The usage is for the user space. It allows to accelerate the audio data
>> processing in hardware, but input is from user space and output is
>> exported to user space in this simple API. The purpose of this API is
>> just "chaining" to reduce the user space context switches (latency).
> 
> I am still very confused between the notion of  "chaining" and
> adding/removing tasks dynamically at run-time. The former is fine, the
> latter is very hard to enable in a glitch-free manner, usually all
> filters have an internal history buffer. Inserting, stopping or removing
> a filter is likely to add audible discontinuities.

The internal state requirement for multiple tasks is mostly determined by the
stream structure in use, so user space will handle this correctly (restarting
the stream on demand). You can imagine a situation where too much data is
queued and user space receives a signal to do something different, so it makes
sense to support dequeuing of tasks. The stream state should be reset when a
task is stopped (removed from the queue), even if there are other active tasks
after the stopped one.
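
The dequeuing described above can be modelled in a few lines. This is only a
user-space sketch under stated assumptions: the state names mirror enum
snd_compr_state from the patch, while struct task_model, struct queue_model,
task_stop_one() and queue_stop() are hypothetical names, and the list
reordering done by the kernel code is left out. As in the v3
snd_compr_task_seq(), seqno 0 is treated as "all tasks", and only ACTIVE
tasks are affected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum task_state { TASK_IDLE = 0, TASK_ACTIVE, TASK_FINISHED };

struct task_model {
	uint64_t seqno;
	enum task_state state;
};

struct queue_model {
	struct task_model *tasks;
	size_t count;
	size_t active;	/* number of tasks currently in TASK_ACTIVE */
};

/* Stop one task; returns 1 if it was stopped, 0 if it was not active. */
static int task_stop_one(struct queue_model *q, struct task_model *t)
{
	if (t->state != TASK_ACTIVE)
		return 0;
	assert(q->active > 0);	/* stands in for the snd_BUG_ON() check */
	q->active--;
	t->state = TASK_IDLE;
	return 1;
}

/*
 * seqno == 0 stops every active task, otherwise only the matching one.
 * Returns the number of stopped tasks, or -1 if the seqno is unknown.
 */
static int queue_stop(struct queue_model *q, uint64_t seqno)
{
	int stopped = 0;
	size_t i;

	for (i = 0; i < q->count; i++) {
		if (seqno == 0)
			stopped += task_stop_one(q, &q->tasks[i]);
		else if (q->tasks[i].seqno == seqno)
			return task_stop_one(q, &q->tasks[i]);
	}
	return seqno == 0 ? stopped : -1;
}
```

Finished tasks are left untouched by a stop, so their output stays available
to user space until the task is freed or restarted.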

I may also propose a kernel API extension to inform user space that all active
tasks must be canceled in one shot (ioctl).

>>> Also I don't fully get the initial/final stages of processing. It seems
>>> that the host needs to feed data to the first task in the chain, then
>>> start it. That's fine for playback, but how would this be used if we
>>> wanted to e.g. enable an ASRC on captured data coming from an audio
>>> interface?
>>
>> There are no stream endpoints in kernel (no playback, no capture). It's
>> just about we have some audio data, do something with them and return
>> them back.
>>
>> For an universal media stream router, another API should be designed. I
>> believe that using dma-buf buffers for I/O is nice and ready to be
>> reused in another API.
> 
> Humm, how would this work with the initial ask to enable the ASRC from
> FSL/NXP? If we leave the ends of the processing chain completely
> undefined, who's going to use this processing chain? Shouldn't there be
> at least one example of how existing userspace (alsa-lib, pipewire,
> wireplumber, etc) might use the API? It's been a while now, but when we
> introduced the compress API there was a companion 'tinycompress' utility
> - largely inspired by 'tinyplay' - to showcase how the API was meant to
> be used.

I replied to this in another answer. The expected users are media frameworks
like gstreamer or ffmpeg (using this directly as a plugin in the processing
chain). Maybe audio servers can use this hardware acceleration, too.

I would like to define the basic kernel API (ioctls) in the first stage and
then continue with a test kernel module, a user space library (maybe adding
support to tinycompress) and a user space test utility.

					Jaroslav
Pierre-Louis Bossart June 25, 2024, 1:48 p.m. UTC | #10
> The internal state requirement for multiple tasks is mostly given by the
> used stream structure, so user space will handle this correctly (restart
> stream on demand). You can imagine situation, where too many data are
> queued and user space will receive a signal to do something different,
> so it makes sense to support dequeuing of tasks. The stream state should
> be reset when the task is stopped (removed from the queue) even if there
> are other active tasks after this stopped one.
We are in agreement that the 'drop' (stop now) and 'drain' (keep going
until all data was consumed) capabilities are very much needed. I don't
think controlling the states of intermediate tasks is possible or even
desired though.

> I may also propose kernel API extension to inform user space that all
> active tasks must be canceled in one shot (ioctl).

Did you mean "All active tasks in the same context" - defined by the
open step?

>>>> Also I don't fully get the initial/final stages of processing. It seems
>>>> that the host needs to feed data to the first task in the chain, then
>>>> start it. That's fine for playback, but how would this be used if we
>>>> wanted to e.g. enable an ASRC on captured data coming from an audio
>>>> interface?
>>>
>>> There are no stream endpoints in kernel (no playback, no capture). It's
>>> just about we have some audio data, do something with them and return
>>> them back.
>>>
>>> For an universal media stream router, another API should be designed. I
>>> believe that using dma-buf buffers for I/O is nice and ready to be
>>> reused in another API.
>>
>> Humm, how would this work with the initial ask to enable the ASRC from
>> FSL/NXP? If we leave the ends of the processing chain completely
>> undefined, who's going to use this processing chain? Shouldn't there be
>> at least one example of how existing userspace (alsa-lib, pipewire,
>> wireplumber, etc) might use the API? It's been a while now, but when we
>> introduced the compress API there was a companion 'tinycompress' utility
>> - largely inspired by 'tinyplay' - to showcase how the API was meant to
>> be used.
> 
> I replied to this in another answer. The expected users are media
> frameworks like gstreamer or ffmpeg (use this directly as a plugin in
> the processing chain). Maybe audio servers can use this hardware
> acceleration, too.
> 
> I would like to define the basic kernel API (ioctls) in the first stage
> and then continue with a test kernel module, user space library (maybe
> include support in tinycompress) and user space test utility.

Incremental development sounds fine, but at some point we'll need some
sort of development hardware to check how well things work, and what's
missing. In the case of the compress API some 12+ years ago we
completely missed the gapless playback requirement which led to the ugly
partial drain solution. We also underestimated the inertia and effort
needed to change userspace, so much so that the main users of the
compress API are in the Android world. I am not aware of any users of
the compress API in the traditional Gnome/KDE environments.

Patch
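
Two small pieces of logic in the patch below are easy to model in user space: the sequence number allocator, which never hands out zero (FREE and STOP use seqno 0 to mean all tasks), and the poll readiness rule for the passthrough direction. The sketch assumes a simplified `struct sketch_runtime` standing in for the relevant fields of snd_compr_runtime, and it uses POLLIN/POLLOUT from poll.h in place of the kernel's EPOLL* flags; all `sketch_` names are hypothetical.

```c
#include <poll.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-stream counters in snd_compr_runtime. */
struct sketch_runtime {
	uint64_t task_seqno;	/* last sequence number handed out */
	uint32_t fragments;	/* task limit taken from the stream parameters */
	uint32_t active_tasks;	/* tasks currently queued for processing */
};

/* Next sequence number; zero is skipped when the counter wraps around. */
static inline uint64_t sketch_seqno_next(struct sketch_runtime *rt)
{
	uint64_t seqno = ++rt->task_seqno;

	if (seqno == 0)
		seqno = ++rt->task_seqno;
	return seqno;
}

/*
 * Poll mask: writable while fewer tasks are active than the limit allows,
 * readable once the task at the head of the list has finished.
 */
static inline short sketch_poll_mask(const struct sketch_runtime *rt,
				     int head_finished)
{
	short mask = 0;

	if (rt->fragments > rt->active_tasks)
		mask |= POLLOUT;
	if (head_finished)
		mask |= POLLIN;
	return mask;
}
```

Only the head of the task list is inspected for readability, so a finished task that sits behind an unfinished one does not wake a reader in this model.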

diff --git a/Documentation/sound/designs/compress-passthrough.rst b/Documentation/sound/designs/compress-passthrough.rst
new file mode 100644
index 000000000000..975462500c33
--- /dev/null
+++ b/Documentation/sound/designs/compress-passthrough.rst
@@ -0,0 +1,125 @@ 
+=================================
+ALSA Co-processor Passthrough API
+=================================
+
+Jaroslav Kysela <perex@perex.cz>
+
+
+Overview
+========
+
+There is a requirement to expose the audio hardware that accelerates various
+tasks for user space such as sample rate converters, compressed
+stream decoders, etc.
+
+This is a description of the API extension for the compress ALSA API which
+is able to handle "tasks" that are not bound to real-time operations
+and allows for the serialization of operations.
+
+Requirements
+============
+
+The main requirements are:
+
+- serialization of multiple tasks for user space to allow multiple
+  operations without user space intervention
+
+- separate buffers (input + output) for each operation
+
+- expose buffers using mmap to user space
+
+- signal user space when the task is finished (standard poll mechanism)
+
+Design
+======
+
+A new direction SND_COMPRESS_PASSTHROUGH is introduced to identify
+the passthrough API.
+
+The API extension shares device enumeration and parameters handling from
+the main compressed API. All other realtime streaming ioctls are deactivated
+and a new set of task related ioctls are introduced. The standard
+read/write/mmap I/O operations are not supported in the passthrough device.
+
+Device ("stream") state handling is reduced to OPEN/SETUP. All other
+states are not available for the passthrough mode.
+
+The data I/O mechanism uses the standard dma-buf interface with all its advantages
+like mmap, standard I/O, buffer sharing etc. One buffer is used for the
+input data and a second (separate) buffer is used for the output data. Each task
+has separate I/O buffers.
+
+For the buffering parameters, fragments sets the limit of allocated tasks
+for the given device. The fragment_size limits the input buffer size for the given
+device. The output buffer size is determined by the driver (and may be different
+from the input buffer size).
+
+State Machine
+=============
+
+The passthrough audio stream state machine is described below ::
+
+                                       +----------+
+                                       |          |
+                                       |   OPEN   |
+                                       |          |
+                                       +----------+
+                                             |
+                                             |
+                                             | compr_set_params()
+                                             |
+                                             v
+         all passthrough task ops      +----------+
+  +------------------------------------|          |
+  |                                    |   SETUP  |
+  |                                    |          |
+  |                                    +----------+
+  |                                          |
+  +------------------------------------------+
+
+
+Passthrough operations (ioctls)
+===============================
+
+CREATE
+------
+Creates a set of input/output buffers. The input buffer size is
+fragment_size. Allocates unique seqno.
+
+The hardware drivers allocate internal 'struct dma_buf' for both input and
+output buffers (using 'dma_buf_export()' function). The anonymous
+file descriptors for those buffers are passed to user space.
+
+FREE
+----
+Frees a set of input/output buffers. If a task is active, the stop
+operation is executed first. If seqno is zero, the operation is executed for all
+tasks.
+
+START
+-----
+Starts (queues) a task. There are two cases of the task start. The first is
+right after the task is created; in this case, origin_seqno must be zero.
+The second case is the reuse of an already finished task; origin_seqno
+must identify the task to be reused. In both cases, a new seqno value
+is allocated and returned to user space.
+
+The prerequisite is that the application has filled the input dma buffer with
+new source data and set input_size to pass the real data size to the driver.
+
+The order of data processing is preserved (the task started first is
+finished first).
+
+STOP
+----
+Stops (dequeues) a task. If seqno is zero, the operation is executed for all
+tasks.
+
+STATUS
+------
+Obtains the task status (active, finished). Also, the driver will set
+the real output data size (valid area in the output buffer).
+
+Credits
+=======
+- ...
diff --git a/include/sound/compress_driver.h b/include/sound/compress_driver.h
index bcf872c17dd3..2884b1e7955d 100644
--- a/include/sound/compress_driver.h
+++ b/include/sound/compress_driver.h
@@ -19,6 +19,22 @@ 
 
 struct snd_compr_ops;
 
+/**
+ * struct snd_compr_task_runtime: task runtime description
+ *
+ */
+struct snd_compr_task_runtime {
+	struct list_head list;
+	struct dma_buf *input;
+	struct dma_buf *output;
+	u64 seqno;
+	u64 input_size;
+	u64 output_size;
+	u8 state;
+	void *private_value;
+};
+
+
 /**
  * struct snd_compr_runtime: runtime stream description
  * @state: stream state
@@ -54,6 +70,11 @@  struct snd_compr_runtime {
 	dma_addr_t dma_addr;
 	size_t dma_bytes;
 	struct snd_dma_buffer *dma_buffer_p;
+
+	u32 active_tasks;
+	u32 total_tasks;
+	u64 task_seqno;
+	struct list_head tasks;
 };
 
 /**
@@ -132,6 +153,12 @@  struct snd_compr_ops {
 			struct snd_compr_caps *caps);
 	int (*get_codec_caps) (struct snd_compr_stream *stream,
 			struct snd_compr_codec_caps *codec);
+#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH)
+	int (*task_create) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task);
+	int (*task_start) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task);
+	int (*task_stop) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task);
+	int (*task_free) (struct snd_compr_stream *stream, struct snd_compr_task_runtime *task);
+#endif
 };
 
 /**
@@ -242,4 +269,9 @@  int snd_compr_free_pages(struct snd_compr_stream *stream);
 int snd_compr_stop_error(struct snd_compr_stream *stream,
 			 snd_pcm_state_t state);
 
+#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH)
+void snd_compr_task_finished(struct snd_compr_stream *stream,
+			     struct snd_compr_task_runtime *task);
+#endif
+
 #endif
diff --git a/include/uapi/sound/compress_offload.h b/include/uapi/sound/compress_offload.h
index d185957f3fe0..5fed1979522b 100644
--- a/include/uapi/sound/compress_offload.h
+++ b/include/uapi/sound/compress_offload.h
@@ -1,4 +1,4 @@ 
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
  *  compress_offload.h - compress offload header definations
  *
@@ -14,7 +14,7 @@ 
 #include <sound/compress_params.h>
 
 
-#define SNDRV_COMPRESS_VERSION SNDRV_PROTOCOL_VERSION(0, 2, 0)
+#define SNDRV_COMPRESS_VERSION SNDRV_PROTOCOL_VERSION(0, 3, 0)
 /**
  * struct snd_compressed_buffer - compressed buffer
  * @fragment_size: size of buffer fragment in bytes
@@ -68,7 +68,8 @@  struct snd_compr_avail {
 
 enum snd_compr_direction {
 	SND_COMPRESS_PLAYBACK = 0,
-	SND_COMPRESS_CAPTURE
+	SND_COMPRESS_CAPTURE,
+	SND_COMPRESS_PASSTHROUGH
 };
 
 /**
@@ -127,6 +128,42 @@  struct snd_compr_metadata {
 	 __u32 value[8];
 } __attribute__((packed, aligned(4)));
 
+/**
+ * struct snd_compr_task - task primitive for non-realtime operation
+ * @seqno: sequence number (task identifier)
+ * @origin_seqno: previous sequence number (task identifier) - for reuse
+ * @input_fd: data input file descriptor (dma-buf)
+ * @output_fd: data output file descriptor (dma-buf)
+ * @input_size: filled data in bytes (from caller, must not exceed fragment size)
+ */
+struct snd_compr_task {
+	__u64 seqno;
+	__u64 origin_seqno;
+	int input_fd;
+	int output_fd;
+	__u64 input_size;
+	__u8 reserved[16];
+} __attribute__((packed, aligned(4)));
+
+enum snd_compr_state {
+	SND_COMPRESS_TASK_STATE_IDLE = 0,
+	SND_COMPRESS_TASK_STATE_ACTIVE,
+	SND_COMPRESS_TASK_STATE_FINISHED
+};
+
+/**
+ * struct snd_compr_task_status - task status
+ * @seqno: sequence number (task identifier)
+ * @output_size: filled data in bytes (from driver)
+ * @state: actual task state (SND_COMPRESS_TASK_STATE_*)
+ */
+struct snd_compr_task_status {
+	__u64 seqno;
+	__u64 output_size;
+	__u8 state;
+	__u8 reserved[15];
+} __attribute__((packed, aligned(4)));
+
 /*
  * compress path ioctl definitions
  * SNDRV_COMPRESS_GET_CAPS: Query capability of DSP
@@ -164,6 +201,14 @@  struct snd_compr_metadata {
 #define SNDRV_COMPRESS_DRAIN		_IO('C', 0x34)
 #define SNDRV_COMPRESS_NEXT_TRACK	_IO('C', 0x35)
 #define SNDRV_COMPRESS_PARTIAL_DRAIN	_IO('C', 0x36)
+
+
+#define SNDRV_COMPRESS_TASK_CREATE	_IOWR('C', 0x60, struct snd_compr_task)
+#define SNDRV_COMPRESS_TASK_FREE	_IOW('C', 0x61, __u64)
+#define SNDRV_COMPRESS_TASK_START	_IOWR('C', 0x62, struct snd_compr_task)
+#define SNDRV_COMPRESS_TASK_STOP	_IOW('C', 0x63, __u64)
+#define SNDRV_COMPRESS_TASK_STATUS	_IOWR('C', 0x68, struct snd_compr_task_status)
+
 /*
  * TODO
  * 1. add mmap support
diff --git a/sound/core/Kconfig b/sound/core/Kconfig
index 8077f481d84f..3541fe6d477f 100644
--- a/sound/core/Kconfig
+++ b/sound/core/Kconfig
@@ -59,6 +59,10 @@  config SND_CORE_TEST
 config SND_COMPRESS_OFFLOAD
 	tristate
 
+config SND_COMPRESS_PASSTHROUGH
+	select DMA_BUF
+	tristate
+
 config SND_JACK
 	bool
 
diff --git a/sound/core/compress_offload.c b/sound/core/compress_offload.c
index f0008fa2d839..190d754ec994 100644
--- a/sound/core/compress_offload.c
+++ b/sound/core/compress_offload.c
@@ -24,6 +24,7 @@ 
 #include <linux/types.h>
 #include <linux/uio.h>
 #include <linux/uaccess.h>
+#include <linux/dma-buf.h>
 #include <linux/module.h>
 #include <linux/compat.h>
 #include <sound/core.h>
@@ -54,6 +55,12 @@  struct snd_compr_file {
 
 static void error_delayed_work(struct work_struct *work);
 
+#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH)
+static void snd_compr_task_free_all(struct snd_compr_stream *stream);
+#else
+static inline void snd_compr_task_free_all(struct snd_compr_stream *stream) { }
+#endif
+
 /*
  * a note on stream states used:
  * we use following states in the compressed core
@@ -85,6 +92,8 @@  static int snd_compr_open(struct inode *inode, struct file *f)
 		dirn = SND_COMPRESS_PLAYBACK;
 	else if ((f->f_flags & O_ACCMODE) == O_RDONLY)
 		dirn = SND_COMPRESS_CAPTURE;
+	else if ((f->f_flags & O_ACCMODE) == O_RDWR)
+		dirn = SND_COMPRESS_PASSTHROUGH;
 	else
 		return -EINVAL;
 
@@ -112,6 +121,7 @@  static int snd_compr_open(struct inode *inode, struct file *f)
 	}
 
 	INIT_DELAYED_WORK(&data->stream.error_work, error_delayed_work);
+	INIT_LIST_HEAD(&runtime->tasks);
 
 	data->stream.ops = compr->ops;
 	data->stream.direction = dirn;
@@ -154,6 +164,8 @@  static int snd_compr_free(struct inode *inode, struct file *f)
 		break;
 	}
 
+	snd_compr_task_free_all(&data->stream);
+
 	data->stream.ops->free(&data->stream);
 	if (!data->stream.runtime->dma_buffer_p)
 		kfree(data->stream.runtime->buffer);
@@ -226,6 +238,9 @@  snd_compr_ioctl_avail(struct snd_compr_stream *stream, unsigned long arg)
 	struct snd_compr_avail ioctl_avail;
 	size_t avail;
 
+	if (stream->direction == SND_COMPRESS_PASSTHROUGH)
+		return -EBADFD;
+
 	avail = snd_compr_calc_avail(stream, &ioctl_avail);
 	ioctl_avail.avail = avail;
 
@@ -287,6 +302,8 @@  static ssize_t snd_compr_write(struct file *f, const char __user *buf,
 		return -EFAULT;
 
 	stream = &data->stream;
+	if (stream->direction == SND_COMPRESS_PASSTHROUGH)
+		return -EBADFD;
 	guard(mutex)(&stream->device->lock);
 	/* write is allowed when stream is running or has been steup */
 	switch (stream->runtime->state) {
@@ -336,6 +353,8 @@  static ssize_t snd_compr_read(struct file *f, char __user *buf,
 		return -EFAULT;
 
 	stream = &data->stream;
+	if (stream->direction == SND_COMPRESS_PASSTHROUGH)
+		return -EBADFD;
 	guard(mutex)(&stream->device->lock);
 
 	/* read is allowed when stream is running, paused, draining and setup
@@ -385,6 +404,8 @@  static __poll_t snd_compr_poll(struct file *f, poll_table *wait)
 {
 	struct snd_compr_file *data = f->private_data;
 	struct snd_compr_stream *stream;
+	struct snd_compr_runtime *runtime;
+	struct snd_compr_task_runtime *task;
 	size_t avail;
 	__poll_t retval = 0;
 
@@ -392,6 +413,7 @@  static __poll_t snd_compr_poll(struct file *f, poll_table *wait)
 		return EPOLLERR;
 
 	stream = &data->stream;
+	runtime = stream->runtime;
 
 	guard(mutex)(&stream->device->lock);
 
@@ -405,6 +427,18 @@  static __poll_t snd_compr_poll(struct file *f, poll_table *wait)
 
 	poll_wait(f, &stream->runtime->sleep, wait);
 
+	if (stream->direction == SND_COMPRESS_PASSTHROUGH) {
+		if (runtime->fragments > runtime->active_tasks)
+			retval |= EPOLLOUT | EPOLLWRNORM;
+		task = list_first_entry_or_null(&runtime->tasks,
+						struct snd_compr_task_runtime,
+						list);
+		if (task && task->state == SND_COMPRESS_TASK_STATE_FINISHED)
+			retval |= EPOLLIN | EPOLLRDNORM;
+		return retval;
+	}
+
+
 	avail = snd_compr_get_avail(stream);
 	pr_debug("avail is %ld\n", (unsigned long)avail);
 	/* check if we have at least one fragment to fill */
@@ -521,6 +555,9 @@  static int snd_compr_allocate_buffer(struct snd_compr_stream *stream,
 	unsigned int buffer_size;
 	void *buffer = NULL;
 
+	if (stream->direction == SND_COMPRESS_PASSTHROUGH)
+		goto params;
+
 	buffer_size = params->buffer.fragment_size * params->buffer.fragments;
 	if (stream->ops->copy) {
 		buffer = NULL;
@@ -543,18 +580,30 @@  static int snd_compr_allocate_buffer(struct snd_compr_stream *stream,
 		if (!buffer)
 			return -ENOMEM;
 	}
-	stream->runtime->fragment_size = params->buffer.fragment_size;
-	stream->runtime->fragments = params->buffer.fragments;
+
 	stream->runtime->buffer = buffer;
 	stream->runtime->buffer_size = buffer_size;
+params:
+	stream->runtime->fragment_size = params->buffer.fragment_size;
+	stream->runtime->fragments = params->buffer.fragments;
 	return 0;
 }
 
-static int snd_compress_check_input(struct snd_compr_params *params)
+static int
+snd_compress_check_input(struct snd_compr_stream *stream, struct snd_compr_params *params)
 {
+	u32 max_fragments;
+
 	/* first let's check the buffer parameter's */
-	if (params->buffer.fragment_size == 0 ||
-	    params->buffer.fragments > U32_MAX / params->buffer.fragment_size ||
+	if (params->buffer.fragment_size == 0)
+		return -EINVAL;
+
+	if (stream->direction == SND_COMPRESS_PASSTHROUGH)
+		max_fragments = 64;			/* safe value */
+	else
+		max_fragments = U32_MAX / params->buffer.fragment_size;
+
+	if (params->buffer.fragments > max_fragments ||
 	    params->buffer.fragments == 0)
 		return -EINVAL;
 
@@ -583,7 +632,7 @@  snd_compr_set_params(struct snd_compr_stream *stream, unsigned long arg)
 		if (IS_ERR(params))
 			return PTR_ERR(no_free_ptr(params));
 
-		retval = snd_compress_check_input(params);
+		retval = snd_compress_check_input(stream, params);
 		if (retval)
 			return retval;
 
@@ -939,6 +988,262 @@  static int snd_compr_partial_drain(struct snd_compr_stream *stream)
 	return snd_compress_wait_for_drain(stream);
 }
 
+#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH)
+
+static struct snd_compr_task_runtime *
+		snd_compr_find_task(struct snd_compr_stream *stream, __u64 seqno)
+{
+	struct snd_compr_task_runtime *task;
+
+	list_for_each_entry(task, &stream->runtime->tasks, list) {
+		if (task->seqno == seqno)
+			return task;
+	}
+	return NULL;
+}
+
+static void snd_compr_task_free(struct snd_compr_task_runtime *task)
+{
+	if (task->output)
+		dma_buf_put(task->output);
+	if (task->input)
+		dma_buf_put(task->input);
+	kfree(task);
+}
+
+static u64 snd_compr_seqno_next(struct snd_compr_stream *stream)
+{
+	u64 seqno = ++stream->runtime->task_seqno;
+	if (seqno == 0)
+		seqno = ++stream->runtime->task_seqno;
+	return seqno;
+}
+
+static int snd_compr_task_new(struct snd_compr_stream *stream, struct snd_compr_task *utask)
+{
+	struct snd_compr_task_runtime *task;
+	int retval;
+
+	if (stream->runtime->total_tasks >= stream->runtime->fragments)
+		return -EBUSY;
+	if (utask->origin_seqno != 0 || utask->input_size != 0)
+		return -EINVAL;
+	task = kzalloc(sizeof(*task), GFP_KERNEL);
+	if (task == NULL)
+		return -ENOMEM;
+	task->seqno = utask->seqno = snd_compr_seqno_next(stream);
+	task->input_size = utask->input_size;
+	retval = stream->ops->task_create(stream, task);
+	if (retval < 0)
+		goto cleanup;
+	utask->input_fd = dma_buf_fd(task->input, O_WRONLY|O_CLOEXEC);
+	if (utask->input_fd < 0) {
+		retval = utask->input_fd;
+		goto cleanup;
+	}
+	utask->output_fd = dma_buf_fd(task->output, O_RDONLY|O_CLOEXEC);
+	if (utask->output_fd < 0) {
+		retval = utask->output_fd;
+		goto cleanup;
+	}
+	/* keep dmabuf reference until freed with task free ioctl */
+	dma_buf_get(utask->input_fd);
+	dma_buf_get(utask->output_fd);
+	list_add_tail(&task->list, &stream->runtime->tasks);
+	stream->runtime->total_tasks++;
+	return 0;
+cleanup:
+	snd_compr_task_free(task);
+	return retval;
+}
+
+static int snd_compr_task_create(struct snd_compr_stream *stream, unsigned long arg)
+{
+	struct snd_compr_task *task __free(kfree) = NULL;
+	int retval;
+
+	if (stream->runtime->state != SNDRV_PCM_STATE_SETUP)
+		return -EPERM;
+	task = memdup_user((void __user *)arg, sizeof(*task));
+	if (IS_ERR(task))
+		return PTR_ERR(no_free_ptr(task));
+	retval = snd_compr_task_new(stream, task);
+	if (retval >= 0)
+		if (copy_to_user((void __user *)arg, task, sizeof(*task)))
+			retval = -EFAULT;
+	return retval;
+}
+
+static int snd_compr_task_start_prepare(struct snd_compr_task_runtime *task,
+					struct snd_compr_task *utask)
+{
+	if (task == NULL)
+		return -EINVAL;
+	if (task->state >= SND_COMPRESS_TASK_STATE_FINISHED)
+		return -EBUSY;
+	if (utask->input_size > task->input->size)
+		return -EINVAL;
+	task->input_size = utask->input_size;
+	task->state = SND_COMPRESS_TASK_STATE_IDLE;
+	return 0;
+}
+
+static int snd_compr_task_start(struct snd_compr_stream *stream, struct snd_compr_task *utask)
+{
+	struct snd_compr_task_runtime *task;
+	int retval;
+
+	if (utask->origin_seqno > 0) {
+		task = snd_compr_find_task(stream, utask->origin_seqno);
+		retval = snd_compr_task_start_prepare(task, utask);
+		if (retval < 0)
+			return retval;
+		task->seqno = utask->seqno = snd_compr_seqno_next(stream);
+		utask->origin_seqno = 0;
+		list_move_tail(&task->list, &stream->runtime->tasks);
+	} else {
+		task = snd_compr_find_task(stream, utask->seqno);
+		if (task && task->state != SND_COMPRESS_TASK_STATE_IDLE)
+			return -EBUSY;
+		retval = snd_compr_task_start_prepare(task, utask);
+		if (retval < 0)
+			return retval;
+	}
+	retval = stream->ops->task_start(stream, task);
+	if (retval >= 0) {
+		task->state = SND_COMPRESS_TASK_STATE_ACTIVE;
+		stream->runtime->active_tasks++;
+	}
+	return retval;
+}
+
+static int snd_compr_task_start_ioctl(struct snd_compr_stream *stream, unsigned long arg)
+{
+	struct snd_compr_task *task __free(kfree) = NULL;
+	int retval;
+
+	if (stream->runtime->state != SNDRV_PCM_STATE_SETUP)
+		return -EPERM;
+	task = memdup_user((void __user *)arg, sizeof(*task));
+	if (IS_ERR(task))
+		return PTR_ERR(no_free_ptr(task));
+	retval = snd_compr_task_start(stream, task);
+	if (retval >= 0)
+		if (copy_to_user((void __user *)arg, task, sizeof(*task)))
+			retval = -EFAULT;
+	return retval;
+}
+
+static void snd_compr_task_stop_one(struct snd_compr_stream *stream,
+					struct snd_compr_task_runtime *task)
+{
+	if (task->state != SND_COMPRESS_TASK_STATE_ACTIVE)
+		return;
+	stream->ops->task_stop(stream, task);
+	if (!snd_BUG_ON(stream->runtime->active_tasks == 0))
+		stream->runtime->active_tasks--;
+	list_move_tail(&task->list, &stream->runtime->tasks);
+	task->state = SND_COMPRESS_TASK_STATE_IDLE;
+}
+
+static void snd_compr_task_free_one(struct snd_compr_stream *stream,
+					struct snd_compr_task_runtime *task)
+{
+	snd_compr_task_stop_one(stream, task);
+	stream->ops->task_free(stream, task);
+	list_del(&task->list);
+	snd_compr_task_free(task);
+	stream->runtime->total_tasks--;
+}
+
+static void snd_compr_task_free_all(struct snd_compr_stream *stream)
+{
+	struct snd_compr_task_runtime *task, *tmp;
+
+	list_for_each_entry_safe(task, tmp, &stream->runtime->tasks, list)
+		snd_compr_task_free_one(stream, task);
+}
+
+typedef void (*snd_compr_seq_func_t)(struct snd_compr_stream *stream,
+					struct snd_compr_task_runtime *task);
+
+static int snd_compr_task_seq(struct snd_compr_stream *stream, unsigned long arg,
+					snd_compr_seq_func_t fcn)
+{
+	struct snd_compr_task_runtime *task, *tmp;
+	__u64 seqno;
+	int retval;
+
+	if (stream->runtime->state != SNDRV_PCM_STATE_SETUP)
+		return -EPERM;
+	retval = get_user(seqno, (__u64 __user *)arg);
+	if (retval < 0)
+		return retval;
+	retval = 0;
+	if (seqno == 0) {
+		list_for_each_entry_safe(task, tmp, &stream->runtime->tasks, list)
+			fcn(stream, task);
+	} else {
+		task = snd_compr_find_task(stream, seqno);
+		if (task == NULL) {
+			retval = -EINVAL;
+		} else {
+			fcn(stream, task);
+		}
+	}
+	return retval;
+}
+
+static int snd_compr_task_status(struct snd_compr_stream *stream,
+					struct snd_compr_task_status *status)
+{
+	struct snd_compr_task_runtime *task;
+
+	task = snd_compr_find_task(stream, status->seqno);
+	if (task == NULL)
+		return -EINVAL;
+	status->output_size = task->output_size;
+	status->state = task->state;
+	return 0;
+}
+
+static int snd_compr_task_status_ioctl(struct snd_compr_stream *stream, unsigned long arg)
+{
+	struct snd_compr_task_status *status __free(kfree) = NULL;
+	int retval;
+
+	if (stream->runtime->state != SNDRV_PCM_STATE_SETUP)
+		return -EPERM;
+	status = memdup_user((void __user *)arg, sizeof(*status));
+	if (IS_ERR(status))
+		return PTR_ERR(no_free_ptr(status));
+	retval = snd_compr_task_status(stream, status);
+	if (retval >= 0)
+		if (copy_to_user((void __user *)arg, status, sizeof(*status)))
+			retval = -EFAULT;
+	return retval;
+}
+
+/**
+ * snd_compr_task_finished: Notify that the task was finished
+ * @stream: pointer to stream
+ * @task: runtime task structure
+ *
+ * Set the finished task state and notify waiters.
+ */
+void snd_compr_task_finished(struct snd_compr_stream *stream,
+			    struct snd_compr_task_runtime *task)
+{
+	guard(mutex)(&stream->device->lock);
+	if (!snd_BUG_ON(stream->runtime->active_tasks == 0))
+		stream->runtime->active_tasks--;
+	task->state = SND_COMPRESS_TASK_STATE_FINISHED;
+	wake_up(&stream->runtime->sleep);
+}
+EXPORT_SYMBOL(snd_compr_task_finished);
+
+#endif /* CONFIG_SND_COMPRESS_PASSTHROUGH */
+
 static long snd_compr_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 {
 	struct snd_compr_file *data = f->private_data;
@@ -968,6 +1273,27 @@  static long snd_compr_ioctl(struct file *f, unsigned int cmd, unsigned long arg)
 		return snd_compr_set_metadata(stream, arg);
 	case _IOC_NR(SNDRV_COMPRESS_GET_METADATA):
 		return snd_compr_get_metadata(stream, arg);
+	}
+
+	if (stream->direction == SND_COMPRESS_PASSTHROUGH) {
+#if IS_ENABLED(CONFIG_SND_COMPRESS_PASSTHROUGH)
+		switch (_IOC_NR(cmd)) {
+		case _IOC_NR(SNDRV_COMPRESS_TASK_CREATE):
+			return snd_compr_task_create(stream, arg);
+		case _IOC_NR(SNDRV_COMPRESS_TASK_FREE):
+			return snd_compr_task_seq(stream, arg, snd_compr_task_free_one);
+		case _IOC_NR(SNDRV_COMPRESS_TASK_START):
+			return snd_compr_task_start_ioctl(stream, arg);
+		case _IOC_NR(SNDRV_COMPRESS_TASK_STOP):
+			return snd_compr_task_seq(stream, arg, snd_compr_task_stop_one);
+		case _IOC_NR(SNDRV_COMPRESS_TASK_STATUS):
+			return snd_compr_task_status_ioctl(stream, arg);
+		}
+#endif
+		return -ENOTTY;
+	}
+
+	switch (_IOC_NR(cmd)) {
 	case _IOC_NR(SNDRV_COMPRESS_TSTAMP):
 		return snd_compr_tstamp(stream, arg);
 	case _IOC_NR(SNDRV_COMPRESS_AVAIL):