[v15,00/16] Add audio support in v4l2 framework

Message ID 1710834674-3285-1-git-send-email-shengjiu.wang@nxp.com (mailing list archive)

Shengjiu Wang March 19, 2024, 7:50 a.m. UTC
Audio signal processing, like video, also needs memory-to-memory
support.

This ASRC memory-to-memory (memory -> asrc -> memory) case is a
non-real-time use case.

The user fills the input buffer for the asrc module; after conversion, asrc
sends the output buffer back to the user. So it is not a traditional ALSA
playback and capture case.
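
To make the "buffer in, converted buffer out" flow concrete, here is a
minimal software sketch of a memory-to-memory rate conversion. It is not
code from this series: m2m_resample_s16() is a hypothetical helper, and
linear interpolation is only an assumed stand-in for whatever filtering
the ASRC hardware actually performs. It just shows that the conversion
consumes one buffer and produces another, with no playback or capture
timing involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the series): convert in_len mono S16
 * samples from in_rate to out_rate by linear interpolation.
 * Returns the number of samples written to out (at most out_cap).
 */
static size_t m2m_resample_s16(const int16_t *in, size_t in_len,
                               unsigned int in_rate,
                               int16_t *out, size_t out_cap,
                               unsigned int out_rate)
{
    size_t n;

    assert(in_rate > 0 && out_rate > 0);

    for (n = 0; n < out_cap; n++) {
        /* Position of output sample n on the input axis: idx + frac/out_rate. */
        uint64_t pos = (uint64_t)n * in_rate;
        size_t idx = (size_t)(pos / out_rate);
        int64_t frac = (int64_t)(pos % out_rate);
        int32_t a, b;

        if (idx >= in_len)
            break;

        a = in[idx];
        /* Past the last input sample, hold its value. */
        b = (idx + 1 < in_len) ? in[idx + 1] : a;
        out[n] = (int16_t)(a + (b - a) * frac / (int64_t)out_rate);
    }

    return n;
}
```

For example, feeding four samples at 8000 Hz and asking for 16000 Hz
yields eight output samples, each odd one being the midpoint of its
neighbours.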

It is a specific use case, and there is no reference for it in the current
kernel. v4l2 memory to memory is the closest implementation, and v4l2
currently supports video, image, radio, tuner and touch devices, so it is
not complicated to add support for this specific audio case.

We have already implemented the "memory -> asrc -> i2s device -> codec"
use case in ALSA. The "memory -> asrc -> memory" case needs
to reuse the code in the asrc driver, so the first 3 patches refine
that code so that it can be shared with the "memory -> asrc -> memory"
driver.

The main change is on the v4l2 side: a /dev/v4l-audioX device will be
created, and user applications only use the ioctls of the v4l2 framework.

The other change is to add memory-to-memory support for two kinds of
i.MX ASRC modules.

changes in v15:
- update MAINTAINERS for imx-asrc.c and vim2m-audio.c

changes in v14:
- document the reservation of 'AUXX' fourcc format.
- add v4l2_audfmt_to_fourcc() definition.
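
As a rough illustration of what reserving an 'AUXX' fourcc range can
look like, the sketch below packs characters the same way v4l2_fourcc()
in videodev2.h does. The mapping from a format index to the two trailing
characters is an assumption made for this example (two decimal digits);
the thread does not show the actual definition of
v4l2_audfmt_to_fourcc(), and both function names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same byte packing as v4l2_fourcc() in include/uapi/linux/videodev2.h. */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Assumed mapping: index 0..99 -> 'A' 'U' <tens digit> <ones digit>. */
static uint32_t audfmt_to_fourcc_sketch(unsigned int audfmt)
{
    assert(audfmt < 100);
    return FOURCC('A', 'U', '0' + audfmt / 10, '0' + audfmt % 10);
}

/* Nonzero if fourcc is 'A' 'U' followed by two decimal digits. */
static int fourcc_is_audio_sketch(uint32_t fourcc)
{
    unsigned char c2 = (fourcc >> 16) & 0xff;
    unsigned char c3 = (fourcc >> 24) & 0xff;

    return (fourcc & 0xffff) == FOURCC('A', 'U', 0, 0) &&
           c2 >= '0' && c2 <= '9' && c3 >= '0' && c3 <= '9';
}
```

Keeping the whole range behind a fixed two-character prefix lets generic
code tell audio formats from pixel formats without a lookup table.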

changes in v13
- change 'pixelformat' to 'audioformat' in dev-audio-mem2mem.rst
- add more description for clock drift in ext-ctrls-audio-m2m.rst
- Add "media: v4l2-ctrls: add support for fraction_bits" from Hans
  to avoid build issue for kernel test robot

changes in v12
- minor changes according to comments
- drop min_buffers_needed = 1 and V4L2_CTRL_FLAG_UPDATE flag
- drop bus_info

changes in v11
- add add-fixed-point-test-controls in vivid.
- add v4l2_ctrl_fp_compose() helper function for min and max
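
For readers unfamiliar with fraction_bits, the sketch below shows the
general fixed-point idea: a control value carries an integer part plus
a fraction scaled by 2^fraction_bits. The helpers are hypothetical and
written for this note; the real signature of v4l2_ctrl_fp_compose() is
not shown in this thread and may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: build a fixed-point value from an integer part
 * and a fraction num/den, using fraction_bits fractional bits.
 */
static int64_t fp_compose_sketch(int64_t integer, int64_t num, int64_t den,
                                 unsigned int fraction_bits)
{
    int64_t one;

    assert(den > 0 && fraction_bits < 32);
    one = (int64_t)1 << fraction_bits;  /* fixed-point encoding of 1.0 */

    return integer * one + num * one / den;
}

/* Hypothetical helper: rescale a fixed-point value, e.g. to thousandths. */
static int64_t fp_scale_sketch(int64_t value, unsigned int fraction_bits,
                               int64_t scale)
{
    assert(fraction_bits < 32);
    return value * scale / ((int64_t)1 << fraction_bits);
}
```

With 16 fractional bits, 1.5 is stored as 98304 (65536 + 32768), and
scaling that value by 1000 gives back 1500.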

changes in v10
- remove FIXED_POINT type
- base the code on "media: v4l2-ctrls: add support for fraction_bits"
- fix issue reported by kernel test robot
- remove module_alias

changes in v9:
- add MEDIA_ENT_F_PROC_AUDIO_RESAMPLER.
- add MEDIA_INTF_T_V4L_AUDIO
- add media controller support
- refine the vim2m-audio to support 8k<->16k conversion.

changes in v8:
- refine V4L2_CAP_AUDIO_M2M to be 0x00000008
- update doc for FIXED_POINT
- address comments for imx-asrc

changes in v7:
- add acked-by from Mark
- separate commit for fixed point, m2m audio class, audio rate controls
- use INTEGER_MENU for rate, FIXED_POINT for rate offset
- remove used fmts
- address other comments from Hans

changes in v6:
- use m2m_prepare/m2m_unprepare/m2m_start/m2m_stop to replace
  m2m_start_part_one/m2m_stop_part_one, m2m_start_part_two/m2m_stop_part_two.
- change V4L2_CTRL_TYPE_ASRC_RATE to V4L2_CTRL_TYPE_FIXED_POINT
- fix warning reported by kernel test robot
- remove some unused format V4L2_AUDIO_FMT_XX
- Get SNDRV_PCM_FORMAT from V4L2_AUDIO_FMT in driver.
- rename audm2m to viaudm2m.

changes in v5:
- remove V4L2_AUDIO_FMT_LPCM
- define audio pixel format like V4L2_AUDIO_FMT_S8...
- remove rate and format in struct v4l2_audio_format.
- Add V4L2_CID_ASRC_SOURCE_RATE and V4L2_CID_ASRC_DEST_RATE controls
- update document accordingly.

changes in v4:
- update document style
- separate V4L2_AUDIO_FMT_LPCM and V4L2_CAP_AUDIO_M2M in separate commit

changes in v3:
- Modify documents for adding audio m2m support
- Add audio virtual m2m driver
- Defined V4L2_AUDIO_FMT_LPCM format type for audio.
- Defined V4L2_CAP_AUDIO_M2M capability type for audio m2m case.
- with modification in v4l-utils, pass v4l2-compliance test.

changes in v2:
- decouple the implementation in v4l2 and ALSA
- implement the memory to memory driver as a platform driver
  and move it to drivers/media
- move fsl_asrc_common.h to include/sound folder

Hans Verkuil (1):
  media: v4l2-ctrls: add support for fraction_bits

Shengjiu Wang (15):
  ASoC: fsl_asrc: define functions for memory to memory usage
  ASoC: fsl_easrc: define functions for memory to memory usage
  ASoC: fsl_asrc: move fsl_asrc_common.h to include/sound
  ASoC: fsl_asrc: register m2m platform device
  ASoC: fsl_easrc: register m2m platform device
  media: uapi: Add V4L2_CAP_AUDIO_M2M capability flag
  media: v4l2: Add audio capture and output support
  media: uapi: Define audio sample format fourcc type
  media: uapi: Add V4L2_CTRL_CLASS_M2M_AUDIO
  media: uapi: Add audio rate controls support
  media: uapi: Declare interface types for Audio
  media: uapi: Add an entity type for audio resampler
  media: vivid: add fixed point test controls
  media: imx-asrc: Add memory to memory driver
  media: vim2m-audio: add virtual driver for audio memory to memory

 .../media/mediactl/media-types.rst            |   11 +
 .../userspace-api/media/v4l/buffer.rst        |    6 +
 .../userspace-api/media/v4l/common.rst        |    1 +
 .../media/v4l/dev-audio-mem2mem.rst           |   71 +
 .../userspace-api/media/v4l/devices.rst       |    1 +
 .../media/v4l/ext-ctrls-audio-m2m.rst         |   59 +
 .../userspace-api/media/v4l/pixfmt-audio.rst  |  100 ++
 .../userspace-api/media/v4l/pixfmt.rst        |    1 +
 .../media/v4l/vidioc-enum-fmt.rst             |    2 +
 .../media/v4l/vidioc-g-ext-ctrls.rst          |    4 +
 .../userspace-api/media/v4l/vidioc-g-fmt.rst  |    4 +
 .../media/v4l/vidioc-querycap.rst             |    3 +
 .../media/v4l/vidioc-queryctrl.rst            |   11 +-
 .../media/videodev2.h.rst.exceptions          |    3 +
 MAINTAINERS                                   |   17 +
 .../media/common/videobuf2/videobuf2-v4l2.c   |    4 +
 drivers/media/platform/nxp/Kconfig            |   13 +
 drivers/media/platform/nxp/Makefile           |    1 +
 drivers/media/platform/nxp/imx-asrc.c         | 1256 +++++++++++++++++
 drivers/media/test-drivers/Kconfig            |   10 +
 drivers/media/test-drivers/Makefile           |    1 +
 drivers/media/test-drivers/vim2m-audio.c      |  793 +++++++++++
 drivers/media/test-drivers/vivid/vivid-core.h |    2 +
 .../media/test-drivers/vivid/vivid-ctrls.c    |   26 +
 drivers/media/v4l2-core/v4l2-compat-ioctl32.c |    9 +
 drivers/media/v4l2-core/v4l2-ctrls-api.c      |    1 +
 drivers/media/v4l2-core/v4l2-ctrls-core.c     |   93 +-
 drivers/media/v4l2-core/v4l2-ctrls-defs.c     |   10 +
 drivers/media/v4l2-core/v4l2-dev.c            |   21 +
 drivers/media/v4l2-core/v4l2-ioctl.c          |   66 +
 drivers/media/v4l2-core/v4l2-mem2mem.c        |   13 +-
 include/media/v4l2-ctrls.h                    |   13 +-
 include/media/v4l2-dev.h                      |    2 +
 include/media/v4l2-ioctl.h                    |   34 +
 .../fsl => include/sound}/fsl_asrc_common.h   |   60 +
 include/uapi/linux/media.h                    |    2 +
 include/uapi/linux/v4l2-controls.h            |    9 +
 include/uapi/linux/videodev2.h                |   50 +-
 sound/soc/fsl/fsl_asrc.c                      |  144 ++
 sound/soc/fsl/fsl_asrc.h                      |    4 +-
 sound/soc/fsl/fsl_asrc_dma.c                  |    2 +-
 sound/soc/fsl/fsl_easrc.c                     |  233 +++
 sound/soc/fsl/fsl_easrc.h                     |    6 +-
 43 files changed, 3145 insertions(+), 27 deletions(-)
 create mode 100644 Documentation/userspace-api/media/v4l/dev-audio-mem2mem.rst
 create mode 100644 Documentation/userspace-api/media/v4l/ext-ctrls-audio-m2m.rst
 create mode 100644 Documentation/userspace-api/media/v4l/pixfmt-audio.rst
 create mode 100644 drivers/media/platform/nxp/imx-asrc.c
 create mode 100644 drivers/media/test-drivers/vim2m-audio.c
 rename {sound/soc/fsl => include/sound}/fsl_asrc_common.h (60%)

Comments

Sebastian Fricke April 30, 2024, 8:21 a.m. UTC | #1
Hey Shengjiu,

first of all thanks for all of this work and I am very sorry for only
emerging this late into the series, I sadly didn't notice it earlier.

I would like to voice a few concerns about the general idea of adding
Audio support to the Media subsystem.

1. The biggest objection is that the Linux kernel has a subsystem
specifically targeted at audio devices. Adding support for these
devices in another subsystem is counterproductive, as it works around
the shortcomings of the audio subsystem while forcing support for a
device into a subsystem that was never designed for such devices.
Instead, the audio subsystem has to be adjusted so that it can support
all of the required workflows. Otherwise, the next audio driver with
similar requirements will have to move to the media subsystem as well,
the audio subsystem will never see the required change, and
soon we will have two audio subsystems.

2. Closely connected to the previous objection, the media subsystem with
its current staff of maintainers is overworked and barely capable of
handling the workload, which includes an abundance of different devices:
DVB, codecs, cameras, PCI devices, radio tuners, HDMI CEC, IR
receivers, etc. Adding more device types to this matrix will make the
situation worse and should only be done with a plan for first
improving the current maintainer situation.

3. By using the same framework and APIs as the video codecs, the audio
codecs are going to cause extra work for the video codec developers and
maintainers simply by occupying the same space that was originally
designed for video only. Even if you try not to cause any
extra stress, the simple presence of the audio code in the codebase is
going to cause restrictions.

The main issue here is that the audio subsystem doesn't provide a
mem2mem framework and I would say you are in luck because the media
subsystem has gathered a lot of shortcomings with its current
implementation of the mem2mem framework over time, which is why a new
implementation will be necessary anyway.

So instead of hammering a driver into the wrong destination, I would
suggest bundling our forces and implementing a general memory-to-memory
framework that both the media and the audio subsystem can use, that
addresses the current shortcomings of the implementation and allows you
to upload the driver where it is supposed to be.
This is going to cause restrictions as well, as mentioned in
concern number 3, but with the difference that we can make a general
plan for such a framework that accommodates lots of use cases, and each
subsystem can add its routines on top of the general framework.

Another possible alternative is to try to make the DRM scheduler more
generally available. This scheduler is the most mature and is in fact
very similar to what you and the media devices need,
which again shows how common your use case actually is and why a
general solution is the best long-term solution.

Please note that Daniel Almeida is currently working on something
related to this:
https://lore.kernel.org/linux-media/3F80AC0D-DCAA-4EDE-BF58-BB1369C7EDCA@collabora.com/T/#u

If the toplevel maintainers decide to add the patchset so be it, but I
wanted to voice my concerns and also highlight that this is likely going
to cause extra stress for the video codecs maintainers and the
maintainers in general. We cannot spend a lot of time on audio codecs,
as video codecs already fill up our available time sufficiently,
so the use of the framework needs to be conservative and cause as little
extra work as possible for the original use case of the framework.

Regards,
Sebastian

Hans Verkuil April 30, 2024, 8:47 a.m. UTC | #2
I would really like to get the input of the audio maintainers on this.
Sebastian has a good point, especially with us being overworked :-)

Having a shared mem2mem framework would certainly be nice, on the other
hand, developing that will most likely take a substantial amount of time.

Perhaps it is possible to copy the current media v4l2-mem2mem.c and turn
it into an alsa-mem2mem.c? I really do not know enough about the alsa
subsystem to tell if that is possible.

While this driver is a rate converter, not an audio codec, the same
principles would apply to off-line audio codecs as well. And it is true
that we definitely do not want to support audio codecs in the media
subsystem.

Accepting this driver creates a precedent and would open the door for
audio codecs.

I may have been too hasty in saying yes to this, I did not consider
the wider implications for our workload and what it can lead to. I
sincerely apologize to Shengjiu Wang as it is no fun to end up in a
situation like this.

Regards,

	Hans
Mauro Carvalho Chehab April 30, 2024, 1:52 p.m. UTC | #3
I agree with both Sebastian and Hans here: media devices have always had
audio streams, even on old PCI analog TV devices like bttv. There
are even some devices, like the ones based on usb em28xx, that contain
an AC97 chip. The decision was always to have audio supported by
the ALSA APIs/subsystem, as otherwise we'll end up duplicating code and
reinventing the wheel with new incompatible APIs for audio in and outside
media, creating unneeded complexity, which will end up being reflected in
userspace as well.

So, IMO it makes a lot more sense to place audio codecs and processor
blocks inside ALSA, probably as part of ALSA SOF, if possible.

Hans's suggestion of forking v4l2-mem2mem.c in ALSA seems a good
starting point. Also, moving the DRM mem2mem functionality to a
core library that could be re-used by the three subsystems sounds
like a good idea, but I suspect that a change like that could be more
time-consuming.

Regards,
Mauro
Mark Brown April 30, 2024, 2:46 p.m. UTC | #4
On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote:

> first of all thanks for all of this work and I am very sorry for only
> emerging this late into the series, I sadly didn't notice it earlier.

It might be worth checking out the discussion on earlier versions...

> 1. The biggest objection is, that the Linux Kernel has a subsystem
> specifically targeted for audio devices, adding support for these
> devices in another subsystem are counterproductive as they work around
> the shortcomings of the audio subsystem while forcing support for a
> device into a subsystem that was never designed for such devices.
> Instead, the audio subsystem has to be adjusted to be able to support
> all of the required workflows, otherwise, the next audio driver with
> similar requirements will have to move to the media subsystem as well,
> the audio subsystem would then never experience the required change and
> soon we would have two audio subsystems.

The discussion around this originally was that all the audio APIs are
very much centered around real time operations rather than completely
async memory to memory operations and that it's not clear that it's
worth reinventing the wheel simply for the sake of having things in
ALSA when that's already pretty idiomatic for the media subsystem.  It
wasn't the memory to memory bit per se, it was the disconnection from
any timing.

> So instead of hammering a driver into the wrong destination, I would
> suggest bundling our forces and implementing a general memory-to-memory
> framework that both the media and the audio subsystem can use, that
> addresses the current shortcomings of the implementation and allows you
> to upload the driver where it is supposed to be.

That doesn't sound like an immediate solution to maintainer overload
issues...  if something like this is going to happen the DRM solution
does seem more general but I'm not sure the amount of stop energy is
proportionate.
Jaroslav Kysela April 30, 2024, 3:03 p.m. UTC | #5
On 30. 04. 24 16:46, Mark Brown wrote:

>> So instead of hammering a driver into the wrong destination, I would
>> suggest bundling our forces and implementing a general memory-to-memory
>> framework that both the media and the audio subsystem can use, that
>> addresses the current shortcomings of the implementation and allows you
>> to upload the driver where it is supposed to be.
> 
> That doesn't sound like an immediate solution to maintainer overload
> issues...  if something like this is going to happen the DRM solution
> does seem more general but I'm not sure the amount of stop energy is
> proportionate.

The "do what you want" ALSA's hwdep device / interface can be used to transfer 
data in/out from SRC using custom read/write/ioctl/mmap syscalls. The question 
is whether the changes could be simpler for the first implementation, keeping 
the hardware enumeration in the one subsystem where the driver code is placed. I 
also see the benefit of reusing the already existing framework (but is v4l2 the 
right one?).

					Jaroslav
Mauro Carvalho Chehab April 30, 2024, 4:27 p.m. UTC | #6
Em Tue, 30 Apr 2024 23:46:03 +0900
Mark Brown <broonie@kernel.org> escreveu:

> On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote:
> 
> > first of all thanks for all of this work and I am very sorry for only
> > emerging this late into the series, I sadly didn't notice it earlier.  
> 
> It might be worth checking out the discussion on earlier versions...
> 
> > 1. The biggest objection is, that the Linux Kernel has a subsystem
> > specifically targeted for audio devices, adding support for these
> > devices in another subsystem are counterproductive as they work around
> > the shortcomings of the audio subsystem while forcing support for a
> > device into a subsystem that was never designed for such devices.
> > Instead, the audio subsystem has to be adjusted to be able to support
> > all of the required workflows, otherwise, the next audio driver with
> > similar requirements will have to move to the media subsystem as well,
> > the audio subsystem would then never experience the required change and
> > soon we would have two audio subsystems.  
> 
> The discussion around this originally was that all the audio APIs are
> very much centered around real time operations rather than completely
> async memory to memory operations and that it's not clear that it's
> worth reinventing the wheel simply for the sake of having things in
> ALSA when that's already pretty idiomatic for the media subsystem.  It
> wasn't the memory to memory bit per se, it was the disconnection from
> any timing.

The media subsystem is also centered around real time. Without real
time, you can't have a decent video conference system. Having
mem2mem transfers actually helps reduce real-time delays, as it
avoids extra latency due to CPU congestion and/or data transfers
from/to userspace.

> 
> > So instead of hammering a driver into the wrong destination, I would
> > suggest bundling our forces and implementing a general memory-to-memory
> > framework that both the media and the audio subsystem can use, that
> > addresses the current shortcomings of the implementation and allows you
> > to upload the driver where it is supposed to be.  
> 
> That doesn't sound like an immediate solution to maintainer overload
> issues...  if something like this is going to happen the DRM solution
> does seem more general but I'm not sure the amount of stop energy is
> proportionate.

I don't think maintainer overload is the issue here. The main
point is to avoid a fork at the audio uAPI, plus the burden
of re-inventing the wheel with new codes for audio formats,
new documentation for them, etc.

Regards,
Mauro
Mark Brown May 1, 2024, 1:56 a.m. UTC | #7
On Tue, Apr 30, 2024 at 05:27:52PM +0100, Mauro Carvalho Chehab wrote:
> Mark Brown <broonie@kernel.org> escreveu:
> > On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote:

> > The discussion around this originally was that all the audio APIs are
> > very much centered around real time operations rather than completely

> The media subsystem is also centered around real time. Without real
> time, you can't have a decent video conference system. Having
> mem2mem transfers actually help reducing real time delays, as it 
> avoids extra latency due to CPU congestion and/or data transfers
> from/to userspace.

Real time means strongly tied to wall clock times rather than fast - the
issue was that all the ALSA APIs are based around pushing data through
the system based on a clock.

> > That doesn't sound like an immediate solution to maintainer overload
> > issues...  if something like this is going to happen the DRM solution
> > does seem more general but I'm not sure the amount of stop energy is
> > proportionate.

> I don't think maintainer overload is the issue here. The main
> point is to avoid a fork at the audio uAPI, plus the burden
> of re-inventing the wheel with new codes for audio formats,
> new documentation for them, etc.

I thought that discussion had been had already at one of the earlier
versions?  TBH I've not really been paying attention to this since the
very early versions where I raised some similar "why is this in media"
points and I thought everyone had decided that this did actually make
sense.
Takashi Iwai May 2, 2024, 7:46 a.m. UTC | #8
On Wed, 01 May 2024 03:56:15 +0200,
Mark Brown wrote:
> 
> On Tue, Apr 30, 2024 at 05:27:52PM +0100, Mauro Carvalho Chehab wrote:
> > Mark Brown <broonie@kernel.org> escreveu:
> > > On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote:
> 
> > > The discussion around this originally was that all the audio APIs are
> > > very much centered around real time operations rather than completely
> 
> > The media subsystem is also centered around real time. Without real
> > time, you can't have a decent video conference system. Having
> > mem2mem transfers actually help reducing real time delays, as it 
> > avoids extra latency due to CPU congestion and/or data transfers
> > from/to userspace.
> 
> Real time means strongly tied to wall clock times rather than fast - the
> issue was that all the ALSA APIs are based around pushing data through
> the system based on a clock.
> 
> > > That doesn't sound like an immediate solution to maintainer overload
> > > issues...  if something like this is going to happen the DRM solution
> > > does seem more general but I'm not sure the amount of stop energy is
> > > proportionate.
> 
> > I don't think maintainer overload is the issue here. The main
> > point is to avoid a fork at the audio uAPI, plus the burden
> > of re-inventing the wheel with new codes for audio formats,
> > new documentation for them, etc.
> 
> I thought that discussion had been had already at one of the earlier
> versions?  TBH I've not really been paying attention to this since the
> very early versions where I raised some similar "why is this in media"
> points and I thought everyone had decided that this did actually make
> sense.

Yeah, it was discussed in v1 and v2 threads, e.g.
  https://patchwork.kernel.org/project/linux-media/cover/1690265540-25999-1-git-send-email-shengjiu.wang@nxp.com/#25485573

My argument at that time was how the operation would be, and the point
was that it'd be a "batch-like" operation via M2M without any timing
control.  It'd be a very special usage for ALSA, and if any, it'd
be hwdep -- that is a very hardware-specific API implementation -- or
try compress-offload API, which looks dubious.

OTOH, the argument was that there is already a framework for M2M in
media API and that it also fits the batch-like operation.  That is how
the thread has evolved until now.


thanks,

Takashi
Mauro Carvalho Chehab May 2, 2024, 8:59 a.m. UTC | #9
Em Thu, 02 May 2024 09:46:14 +0200
Takashi Iwai <tiwai@suse.de> escreveu:

> On Wed, 01 May 2024 03:56:15 +0200,
> Mark Brown wrote:
> > 
> > On Tue, Apr 30, 2024 at 05:27:52PM +0100, Mauro Carvalho Chehab wrote:  
> > > Mark Brown <broonie@kernel.org> escreveu:  
> > > > On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote:  
> >   
> > > > The discussion around this originally was that all the audio APIs are
> > > > very much centered around real time operations rather than completely  
> >   
> > > The media subsystem is also centered around real time. Without real
> > > time, you can't have a decent video conference system. Having
> > > mem2mem transfers actually help reducing real time delays, as it 
> > > avoids extra latency due to CPU congestion and/or data transfers
> > > from/to userspace.  
> > 
> > Real time means strongly tied to wall clock times rather than fast - the
> > issue was that all the ALSA APIs are based around pushing data through
> > the system based on a clock.
> >   
> > > > That doesn't sound like an immediate solution to maintainer overload
> > > > issues...  if something like this is going to happen the DRM solution
> > > > does seem more general but I'm not sure the amount of stop energy is
> > > > proportionate.  
> >   
> > > I don't think maintainer overload is the issue here. The main
> > > point is to avoid a fork at the audio uAPI, plus the burden
> > > of re-inventing the wheel with new codes for audio formats,
> > > new documentation for them, etc.  
> > 
> > I thought that discussion had been had already at one of the earlier
> > versions?  TBH I've not really been paying attention to this since the
> > very early versions where I raised some similar "why is this in media"
> > points and I thought everyone had decided that this did actually make
> > sense.  
> 
> Yeah, it was discussed in v1 and v2 threads, e.g.
>   https://patchwork.kernel.org/project/linux-media/cover/1690265540-25999-1-git-send-email-shengjiu.wang@nxp.com/#25485573
> 
> My argument at that time was how the operation would be, and the point
> was that it'd be a "batch-like" operation via M2M without any timing
> control.  It'd be a very special usage for ALSA, and if any, it'd
> be hwdep -- that is a very hardware-specific API implementation -- or
> try compress-offload API, which looks dubious.
> 
> OTOH, the argument was that there is already a framework for M2M in
> media API and that also fits for the batch-like operation, too.  So
> was the thread evolved until now.

M2M transfers are not a hardware-specific API, and this kind of
transfer is not new either. Old media devices like bttv internally
have a way to do PCI2PCI transfers, allowing media streams
to be transferred directly without using the CPU. The media driver
supports it for video, as this made a huge difference in performance
back then.

In the embedded world, this is a pretty common scenario: different media
IP blocks can communicate with each other directly via memory. This
can happen for video capture, video display and audio.

With M2M, most of the control is offloaded to the hardware.

There is still time control associated with it, as audio and video
need to be in sync. This is done by controlling the buffer sizes
and could be fine-tuned by checking when the buffer transfer is done.

On media, M2M buffer transfers are started via VIDIOC_QBUF,
which is a request to do a frame transfer. A similar ioctl
(VIDIOC_DQBUF) is used to monitor when the hardware finishes
transferring the buffer. In other words, the CPU is responsible
for time control.
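
The queue/dequeue flow described above can be sketched with a toy model.
This is only an illustration in Python, not the V4L2 API: the class
ToyM2MDevice and its methods qbuf_src(), qbuf_dst() and dqbuf() are
hypothetical names, and the "hardware" is just a Python callable. It
assumes the usual mem2mem arrangement where one queue carries source
buffers and another carries empty destination buffers.

```python
from collections import deque


class ToyM2MDevice:
    """Toy stand-in for a mem2mem device (hypothetical, not the V4L2 API)."""

    def __init__(self, process):
        self.process = process      # callable playing the role of the hardware
        self.src_queue = deque()    # queued buffers holding input data
        self.dst_queue = deque()    # queued empty buffers awaiting results
        self.done = deque()         # finished destination buffers

    def qbuf_src(self, data):
        """Queue a source buffer (analogous to a QBUF on the input side)."""
        self.src_queue.append(data)
        self._run()

    def qbuf_dst(self, buf):
        """Queue an empty destination buffer."""
        self.dst_queue.append(buf)
        self._run()

    def _run(self):
        # A job runs only when one source and one destination buffer
        # are both available.
        while self.src_queue and self.dst_queue:
            src = self.src_queue.popleft()
            dst = self.dst_queue.popleft()
            dst[:] = self.process(src)
            self.done.append(dst)

    def dqbuf(self):
        """Return a finished buffer, or None if no job has completed."""
        return self.done.popleft() if self.done else None


dev = ToyM2MDevice(lambda samples: [2 * s for s in samples])
dev.qbuf_src([1, 2, 3])
pending = dev.dqbuf()       # nothing to dequeue: no destination buffer yet
out = []
dev.qbuf_dst(out)
finished = dev.dqbuf()      # the processed destination buffer
```

In this sketch the caller decides when buffers are queued and learns
about completion only by dequeuing, which is the sense in which the CPU
side drives the timing.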

So this is still real time. The main difference
from a "sync" transfer is that the CPU doesn't need to copy data
from/to different devices, as such operation is offloaded to the
hardware.

Regards,
Mauro
Mauro Carvalho Chehab May 2, 2024, 9:26 a.m. UTC | #10
Em Thu, 2 May 2024 09:59:56 +0100
Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:

> Em Thu, 02 May 2024 09:46:14 +0200
> Takashi Iwai <tiwai@suse.de> escreveu:
> 
> > On Wed, 01 May 2024 03:56:15 +0200,
> > Mark Brown wrote:
> > > 
> > > On Tue, Apr 30, 2024 at 05:27:52PM +0100, Mauro Carvalho Chehab wrote:  
> > > > Mark Brown <broonie@kernel.org> escreveu:  
> > > > > On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote:  
> > >   
> > > > > The discussion around this originally was that all the audio APIs are
> > > > > very much centered around real time operations rather than completely  
> > >   
> > > > The media subsystem is also centered around real time. Without real
> > > > time, you can't have a decent video conference system. Having
> > > > mem2mem transfers actually help reducing real time delays, as it 
> > > > avoids extra latency due to CPU congestion and/or data transfers
> > > > from/to userspace.  
> > > 
> > > Real time means strongly tied to wall clock times rather than fast - the
> > > issue was that all the ALSA APIs are based around pushing data through
> > > the system based on a clock.
> > >   
> > > > > That doesn't sound like an immediate solution to maintainer overload
> > > > > issues...  if something like this is going to happen the DRM solution
> > > > > does seem more general but I'm not sure the amount of stop energy is
> > > > > proportionate.  
> > >   
> > > > I don't think maintainer overload is the issue here. The main
> > > > point is to avoid a fork at the audio uAPI, plus the burden
> > > > of re-inventing the wheel with new codes for audio formats,
> > > > new documentation for them, etc.  
> > > 
> > > I thought that discussion had been had already at one of the earlier
> > > versions?  TBH I've not really been paying attention to this since the
> > > very early versions where I raised some similar "why is this in media"
> > > points and I thought everyone had decided that this did actually make
> > > sense.  
> > 
> > Yeah, it was discussed in v1 and v2 threads, e.g.
> >   https://patchwork.kernel.org/project/linux-media/cover/1690265540-25999-1-git-send-email-shengjiu.wang@nxp.com/#25485573
> > 
> > My argument at that time was how the operation would be, and the point
> > was that it'd be a "batch-like" operation via M2M without any timing
> > control.  It'd be a very special usage for ALSA, and if any, it'd
> > be hwdep -- that is a very hardware-specific API implementation -- or
> > try compress-offload API, which looks dubious.
> > 
> > OTOH, the argument was that there is already a framework for M2M in
> > media API and that also fits for the batch-like operation, too.  So
> > was the thread evolved until now.
> 
> M2M transfers are not a hardware-specific API, and such kind of
> transfers is not new either. Old media devices like bttv have
> internally a way to do PCI2PCI transfers, allowing media streams
> to be transferred directly without utilizing CPU. The media driver
> supports it for video, as this made a huge difference of performance
> back then.
> 
> On embedded world, this is a pretty common scenario: different media
> IP blocks can communicate with each other directly via memory. This
> can happen for video capture, video display and audio.
> 
> With M2M, most of the control is offloaded to the hardware.
> 
> There are still time control associated with it, as audio and video
> needs to be in sync. This is done by controlling the buffers size 
> and could be fine-tuned by checking when the buffer transfer is done.
> 
> On media, M2M buffer transfers are started via VIDIOC_QBUF,
> which is a request to do a frame transfer. A similar ioctl
> (VIDIOC_DQBUF) is used to monitor when the hardware finishes
> transferring the buffer. In other words, the CPU is responsible
> for time control.

Just to add: on media, we do this per video buffer (or
per half video buffer). A typical use case on cameras is to have
buffers transferred 30 times per second, if the video is streamed
at 30 frames per second.

I would assume that, on an audio/video stream, the audio data
transfer will be programmed to also happen on a regular interval.

So, if the video stream is programmed to a 30 frames per second
rate, I would assume that the associated audio stream will also be
programmed to be grouped into 30 data transfers per second. In such
a scenario, if the audio is sampled at 48 kHz, it means that:

1) each M2M transfer commanded by CPU will copy 1600 samples;
2) the time between each sample will remain 1/48000 s;
3) a notification event telling that 1600 samples were transferred
   will be generated when the last sample happens;
4) CPU will do time control by looking at the notification events.
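
The numbers in the list above follow from simple arithmetic, sketched
here in Python. The function name transfer_plan() is made up for this
illustration; exact fractions are used so that no rounding hides a
non-integer number of samples per transfer.

```python
from fractions import Fraction


def transfer_plan(sample_rate_hz, transfers_per_second):
    """Return (samples per transfer, sample period in s, transfer period in s)."""
    samples = Fraction(sample_rate_hz, transfers_per_second)
    sample_period = Fraction(1, sample_rate_hz)
    transfer_period = samples * sample_period
    return samples, sample_period, transfer_period


samples, sample_period, transfer_period = transfer_plan(48000, 30)
print(samples)          # 1600
print(sample_period)    # 1/48000
print(transfer_period)  # 1/30
```

With 48000 samples per second split into 30 transfers per second, each
transfer carries 1600 samples and completes once every 1/30 s, matching
one notification per video frame.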

> On other words, this is still real time. The main difference
> from a "sync" transfer is that the CPU doesn't need to copy data
> from/to different devices, as such operation is offloaded to the
> hardware.
> 
> Regards,
> Mauro
Mark Brown May 3, 2024, 1:47 a.m. UTC | #11
On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:

> > There are still time control associated with it, as audio and video
> > needs to be in sync. This is done by controlling the buffers size 
> > and could be fine-tuned by checking when the buffer transfer is done.

...

> Just complementing: on media, we do this per video buffer (or
> per half video buffer). A typical use case on cameras is to have
> buffers transferred 30 times per second, if the video was streamed 
> at 30 frames per second. 

IIRC some big use case for this hardware was transcoding so there was a
desire to just go at whatever rate the hardware could support as there
is no interactive user consuming the output as it is generated.

> I would assume that, on an audio/video stream, the audio data
> transfer will be programmed to also happen on a regular interval.

With audio the API is very much "wake userspace every Xms".
Mauro Carvalho Chehab May 3, 2024, 8:42 a.m. UTC | #12
Em Fri, 3 May 2024 10:47:19 +0900
Mark Brown <broonie@kernel.org> escreveu:

> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
> > Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:  
> 
> > > There are still time control associated with it, as audio and video
> > > needs to be in sync. This is done by controlling the buffers size 
> > > and could be fine-tuned by checking when the buffer transfer is done.  
> 
> ...
> 
> > Just complementing: on media, we do this per video buffer (or
> > per half video buffer). A typical use case on cameras is to have
> > buffers transferred 30 times per second, if the video was streamed 
> > at 30 frames per second.   
> 
> IIRC some big use case for this hardware was transcoding so there was a
> desire to just go at whatever rate the hardware could support as there
> is no interactive user consuming the output as it is generated.

Indeed, codecs could be used to just do transcoding, but I would
expect it to be a corner use case. See, as the chipsets implementing
codecs are typically the ones used on mobiles, I would expect
the major use cases to be playing audio and video and participating
in audio/video conferences.

Going further, the codec API may end up supporting not only transcoding
(which is something that CPU can usually handle without too much
processing) but also audio processing that may require more 
complex algorithms - even deep learning ones - like background noise
removal, echo detection/removal, volume auto-gain, audio enhancement
and such.

In other words, the typical use cases will have either the input
or the output being physical hardware (microphone or speaker).

> > I would assume that, on an audio/video stream, the audio data
> > transfer will be programmed to also happen on a regular interval.  
> 
> With audio the API is very much "wake userspace every Xms".
Shengjiu Wang May 6, 2024, 8:49 a.m. UTC | #13
On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote:
>
> Em Fri, 3 May 2024 10:47:19 +0900
> Mark Brown <broonie@kernel.org> escreveu:
>
> > On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
> > > Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:
> >
> > > > There are still time control associated with it, as audio and video
> > > > needs to be in sync. This is done by controlling the buffers size
> > > > and could be fine-tuned by checking when the buffer transfer is done.
> >
> > ...
> >
> > > Just complementing: on media, we do this per video buffer (or
> > > per half video buffer). A typical use case on cameras is to have
> > > buffers transferred 30 times per second, if the video was streamed
> > > at 30 frames per second.
> >
> > IIRC some big use case for this hardware was transcoding so there was a
> > desire to just go at whatever rate the hardware could support as there
> > is no interactive user consuming the output as it is generated.
>
> Indeed, codecs could be used to just do transcoding, but I would
> expect it to be a border use case. See, as the chipsets implementing
> codecs are typically the ones used on mobiles, I would expect that
> the major use cases to be to watch audio and video and to participate
> on audio/video conferences.
>
> Going further, the codec API may end supporting not only transcoding
> (which is something that CPU can usually handle without too much
> processing) but also audio processing that may require more
> complex algorithms - even deep learning ones - like background noise
> removal, echo detection/removal, volume auto-gain, audio enhancement
> and such.
>
> On other words, the typical use cases will either have input
> or output being a physical hardware (microphone or speaker).
>

All, thanks for spending time on this discussion. It seems we have gone
back to the starting point of this topic again.

Our main requirement is that there is a hardware sample rate converter
on the chip, and users should be able to use it from user space as a
component, like a software sample rate converter. It would mostly run
as a GStreamer plugin, so it is a memory-to-memory component.
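
To illustrate what "a component like a software sample rate converter"
means in the memory-to-memory sense, here is a minimal sketch in Python.
It uses plain linear interpolation, chosen only for brevity; it is not
the algorithm of the i.MX ASRC hardware or of any particular library,
and the function name resample_linear() is hypothetical.

```python
def resample_linear(samples, in_rate, out_rate):
    """Convert one input buffer to a new output buffer at out_rate.

    Linear interpolation only; real converters use proper filtering.
    """
    if not samples:
        return []
    n_out = len(samples) * out_rate // in_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * in_rate / out_rate    # position measured in input samples
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, last)]  # clamp at the end of the buffer
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out


# One input buffer in, one output buffer out, with no clock involved.
up = resample_linear([0.0, 1.0, 2.0, 3.0], 2, 4)
down = resample_linear([0.0, 1.0, 2.0, 3.0], 4, 2)
```

The call takes a buffer and returns a buffer at whatever speed the
converter can run, which is the non-real-time usage the cover letter
describes.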

I didn't find an API in ALSA for this purpose; the best option I found
in the kernel is the V4L2 memory-to-memory framework.
As Hans said, it is well designed for memory to memory.

And I think audio is one kind of 'media'.  As far as I can see, part of
the Radio function is in ALSA and part of it is in V4L2; part of the HDMI
function is in DRM and part of it is in ALSA...
So using V4L2 for audio is not new from this point of view.

Even now I still think V4L2 is the best option, but it looks like there
are a lot of objections.  Developing a new ALSA-mem2mem would also be
a duplication of code (a bigger duplication than just adding audio
support in V4L2, I think).

Best regards
Shengjiu Wang.

> > > I would assume that, on an audio/video stream, the audio data
> > > transfer will be programmed to also happen on a regular interval.
> >
> > With audio the API is very much "wake userspace every Xms".
Jaroslav Kysela May 6, 2024, 9:42 a.m. UTC | #14
On 06. 05. 24 10:49, Shengjiu Wang wrote:

> Even now I still think V4L2 is the best option, but it looks like there
> are a lot of rejects.  If develop a new ALSA-mem2mem, it is also
> a duplication of code (bigger duplication that just add audio support
> in V4L2 I think).

Maybe not. Could you try to evaluate a pure dma-buf (drivers/dma-buf) solution 
and add only an enumeration and operation trigger mechanism to the ALSA API? It 
seems that dma-buf has sufficient code to transfer data to and from the 
kernel space for further processing. I think that one buffer can be used as the 
source and a second one for the processed data.

We can eventually add new ioctls to the ALSA's control API (/dev/snd/control*) 
for this purpose (DSP processing).

					Jaroslav
Hans Verkuil May 8, 2024, 8 a.m. UTC | #15
On 06/05/2024 10:49, Shengjiu Wang wrote:
> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote:
>>
>> Em Fri, 3 May 2024 10:47:19 +0900
>> Mark Brown <broonie@kernel.org> escreveu:
>>
>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:
>>>
>>>>> There are still time control associated with it, as audio and video
>>>>> needs to be in sync. This is done by controlling the buffers size
>>>>> and could be fine-tuned by checking when the buffer transfer is done.
>>>
>>> ...
>>>
>>>> Just complementing: on media, we do this per video buffer (or
>>>> per half video buffer). A typical use case on cameras is to have
>>>> buffers transferred 30 times per second, if the video was streamed
>>>> at 30 frames per second.
>>>
>>> IIRC some big use case for this hardware was transcoding so there was a
>>> desire to just go at whatever rate the hardware could support as there
>>> is no interactive user consuming the output as it is generated.
>>
>> Indeed, codecs could be used to just do transcoding, but I would
>> expect it to be a border use case. See, as the chipsets implementing
>> codecs are typically the ones used on mobiles, I would expect that
>> the major use cases to be to watch audio and video and to participate
>> on audio/video conferences.
>>
>> Going further, the codec API may end supporting not only transcoding
>> (which is something that CPU can usually handle without too much
>> processing) but also audio processing that may require more
>> complex algorithms - even deep learning ones - like background noise
>> removal, echo detection/removal, volume auto-gain, audio enhancement
>> and such.
>>
>> On other words, the typical use cases will either have input
>> or output being a physical hardware (microphone or speaker).
>>
> 
> All, thanks for spending time to discuss, it seems we go back to
> the start point of this topic again.
> 
> Our main request is that there is a hardware sample rate converter
> on the chip, so users can use it in user space as a component like
> software sample rate converter. It mostly may run as a gstreamer plugin.
> so it is a memory to memory component.
> 
> I didn't find such API in ALSA for such purpose, the best option for this
> in the kernel is the V4L2 memory to memory framework I found.
> As Hans said it is well designed for memory to memory.
> 
> And I think audio is one of 'media'.  As I can see that part of Radio
> function is in ALSA, part of Radio function is in V4L2. part of HDMI
> function is in DRM, part of HDMI function is in ALSA...
> So using V4L2 for audio is not new from this point of view.
> 
> Even now I still think V4L2 is the best option, but it looks like there
> are a lot of rejects.  If develop a new ALSA-mem2mem, it is also
> a duplication of code (bigger duplication that just add audio support
> in V4L2 I think).

After reading this thread I still believe that the mem2mem framework is
a reasonable option, unless someone can come up with a method that is
easy to implement in the alsa subsystem. From what I can tell from this
discussion no such method exists.

From the media side there are arguments that it adds extra maintenance
load, which is true, but I believe that it is quite limited in practice.

That said, perhaps we should make a statement that while we support the
use of audio m2m drivers, this is only for simple m2m audio processing like
this driver, specifically where there is a 1-to-1 mapping between input and
output buffers. At this point we do not want to add audio codec support or
similar complex audio processing.

Part of the reason is that codecs are hard, and we already have our hands
full with all the video codecs. Part of the reason is that the v4l2-mem2mem
framework probably needs to be forked to make a more advanced version geared
towards codecs since the current framework is too limiting for some of the
things we want to do. It was really designed for scalers, deinterlacers, etc.
and the codec support was added later.

If we ever allow such complex audio processing devices, then we would have
to have another discussion, and I believe that will only be possible if
most of the maintenance load would be on the alsa subsystem where the audio
experts are.

So my proposal is to:

1) add a clear statement to dev-audio-mem2mem.rst (patch 08/16) that only
   simple audio devices with a 1-to-1 mapping of input to output buffer are
   supported. Perhaps also in videodev2.h before struct v4l2_audio_format.

2) I will experiment a bit trying to solve the main complaint about creating
   new audio fourcc values and thus duplicating existing SNDRV_PCM_FORMAT_
   values. I have some ideas for that.

But I do not want to spend time on 2 until we agree that this is the way
forward.

Regards,

	Hans
Amadeusz Sławiński May 8, 2024, 8:13 a.m. UTC | #16
On 5/8/2024 10:00 AM, Hans Verkuil wrote:
> On 06/05/2024 10:49, Shengjiu Wang wrote:
>> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote:
>>>
>>> Em Fri, 3 May 2024 10:47:19 +0900
>>> Mark Brown <broonie@kernel.org> escreveu:
>>>
>>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
>>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:
>>>>
>>>>>> There are still time control associated with it, as audio and video
>>>>>> needs to be in sync. This is done by controlling the buffers size
>>>>>> and could be fine-tuned by checking when the buffer transfer is done.
>>>>
>>>> ...
>>>>
>>>>> Just complementing: on media, we do this per video buffer (or
>>>>> per half video buffer). A typical use case on cameras is to have
>>>>> buffers transferred 30 times per second, if the video was streamed
>>>>> at 30 frames per second.
>>>>
>>>> IIRC some big use case for this hardware was transcoding so there was a
>>>> desire to just go at whatever rate the hardware could support as there
>>>> is no interactive user consuming the output as it is generated.
>>>
>>> Indeed, codecs could be used to just do transcoding, but I would
>>> expect it to be a border use case. See, as the chipsets implementing
>>> codecs are typically the ones used on mobiles, I would expect that
>>> the major use cases to be to watch audio and video and to participate
>>> on audio/video conferences.
>>>
>>> Going further, the codec API may end supporting not only transcoding
>>> (which is something that CPU can usually handle without too much
>>> processing) but also audio processing that may require more
>>> complex algorithms - even deep learning ones - like background noise
>>> removal, echo detection/removal, volume auto-gain, audio enhancement
>>> and such.
>>>
>>> On other words, the typical use cases will either have input
>>> or output being a physical hardware (microphone or speaker).
>>>
>>
>> All, thanks for spending time to discuss, it seems we go back to
>> the start point of this topic again.
>>
>> Our main request is that there is a hardware sample rate converter
>> on the chip, so users can use it in user space as a component like
>> software sample rate converter. It mostly may run as a gstreamer plugin.
>> so it is a memory to memory component.
>>
>> I didn't find such API in ALSA for such purpose, the best option for this
>> in the kernel is the V4L2 memory to memory framework I found.
>> As Hans said it is well designed for memory to memory.
>>
>> And I think audio is one of 'media'.  As I can see that part of Radio
>> function is in ALSA, part of Radio function is in V4L2. part of HDMI
>> function is in DRM, part of HDMI function is in ALSA...
>> So using V4L2 for audio is not new from this point of view.
>>
>> Even now I still think V4L2 is the best option, but it looks like there
>> are a lot of rejects.  If develop a new ALSA-mem2mem, it is also
>> a duplication of code (bigger duplication that just add audio support
>> in V4L2 I think).
> 
> After reading this thread I still believe that the mem2mem framework is
> a reasonable option, unless someone can come up with a method that is
> easy to implement in the alsa subsystem. From what I can tell from this
> discussion no such method exists.
> 

Hi,

My main question is: how is the mem2mem use case different from a
loopback that exposes playback and capture frontends to user space, with a
DSP (or another piece of HW) in the middle?

Amadeusz
Shengjiu Wang May 9, 2024, 9:36 a.m. UTC | #17
On Wed, May 8, 2024 at 4:14 PM Amadeusz Sławiński
<amadeuszx.slawinski@linux.intel.com> wrote:
>
> On 5/8/2024 10:00 AM, Hans Verkuil wrote:
> > On 06/05/2024 10:49, Shengjiu Wang wrote:
> >> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote:
> >>>
> >>> Em Fri, 3 May 2024 10:47:19 +0900
> >>> Mark Brown <broonie@kernel.org> escreveu:
> >>>
> >>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
> >>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:
> >>>>
> >>>>>> There are still time control associated with it, as audio and video
> >>>>>> needs to be in sync. This is done by controlling the buffers size
> >>>>>> and could be fine-tuned by checking when the buffer transfer is done.
> >>>>
> >>>> ...
> >>>>
> >>>>> Just complementing: on media, we do this per video buffer (or
> >>>>> per half video buffer). A typical use case on cameras is to have
> >>>>> buffers transferred 30 times per second, if the video was streamed
> >>>>> at 30 frames per second.
> >>>>
> >>>> IIRC some big use case for this hardware was transcoding so there was a
> >>>> desire to just go at whatever rate the hardware could support as there
> >>>> is no interactive user consuming the output as it is generated.
> >>>
> >>> Indeed, codecs could be used to just do transcoding, but I would
> >>> expect it to be a border use case. See, as the chipsets implementing
> >>> codecs are typically the ones used on mobiles, I would expect that
> >>> the major use cases to be to watch audio and video and to participate
> >>> on audio/video conferences.
> >>>
> >>> Going further, the codec API may end supporting not only transcoding
> >>> (which is something that CPU can usually handle without too much
> >>> processing) but also audio processing that may require more
> >>> complex algorithms - even deep learning ones - like background noise
> >>> removal, echo detection/removal, volume auto-gain, audio enhancement
> >>> and such.
> >>>
> >>> On other words, the typical use cases will either have input
> >>> or output being a physical hardware (microphone or speaker).
> >>>
> >>
> >> All, thanks for spending time to discuss, it seems we go back to
> >> the start point of this topic again.
> >>
> >> Our main request is that there is a hardware sample rate converter
> >> on the chip, so users can use it in user space as a component like
> >> software sample rate converter. It mostly may run as a gstreamer plugin.
> >> so it is a memory to memory component.
> >>
> >> I didn't find such API in ALSA for such purpose, the best option for this
> >> in the kernel is the V4L2 memory to memory framework I found.
> >> As Hans said it is well designed for memory to memory.
> >>
> >> And I think audio is one of 'media'.  As I can see that part of Radio
> >> function is in ALSA, part of Radio function is in V4L2. part of HDMI
> >> function is in DRM, part of HDMI function is in ALSA...
> >> So using V4L2 for audio is not new from this point of view.
> >>
> >> Even now I still think V4L2 is the best option, but it looks like there
> >> are a lot of rejects.  If develop a new ALSA-mem2mem, it is also
> >> a duplication of code (bigger duplication that just add audio support
> >> in V4L2 I think).
> >
> > After reading this thread I still believe that the mem2mem framework is
> > a reasonable option, unless someone can come up with a method that is
> > easy to implement in the alsa subsystem. From what I can tell from this
> > discussion no such method exists.
> >
>
> Hi,
>
> my main question would be how is mem2mem use case different from
> loopback exposing playback and capture frontends in user space with DSP
> (or other piece of HW) in the middle?
>
I think loopback has timing control: the user needs to feed data to playback at a
fixed time and get data from capture at a fixed time. Otherwise there is an xrun
in playback and capture.

In the mem2mem case there is no such timing control: the user feeds data to it and
it generates output; if the user doesn't feed data, there is no xrun.
But mem2mem is just one of the components in the playback or capture
pipeline; overall there is time control for the whole pipeline.
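
A minimal sketch of the distinction above, written as a plain C model rather
than real ALSA or V4L2 code. All names here (clocked_stream, m2m_device and
their helpers) are hypothetical and exist only for illustration. The clocked
stream consumes one period per tick whether or not data was supplied, so a
missed write counts as an xrun; the mem2mem device only does work when a
buffer is submitted, so there is nothing to underrun.

```c
#include <stddef.h>

/* Hypothetical model only: not ALSA or V4L2 code. */

/* Clock-driven stream: one period is consumed on every tick. */
struct clocked_stream {
	size_t queued_periods;	/* written by user space, not yet consumed */
	size_t xruns;		/* ticks on which no data was available */
};

/* User space supplies some periods of audio data. */
static void clocked_write(struct clocked_stream *s, size_t periods)
{
	s->queued_periods += periods;
}

/* The device clock fires; it does not wait for user space. */
static void clocked_tick(struct clocked_stream *s)
{
	if (s->queued_periods == 0)
		s->xruns++;		/* nothing to consume: underrun */
	else
		s->queued_periods--;
}

/* Job-driven mem2mem device: work happens only on submission. */
struct m2m_device {
	size_t jobs_done;
};

/*
 * Process one input buffer into one output buffer. The copy is a
 * stand-in for the real conversion; no clock is involved, so an idle
 * device cannot produce an xrun.
 */
static size_t m2m_process(struct m2m_device *d, const short *in,
			  size_t n, short *out)
{
	for (size_t i = 0; i < n; i++)
		out[i] = in[i];
	d->jobs_done++;
	return n;
}
```

In this model, writing one period and then ticking twice records one xrun,
while an m2m_device that is never called simply stays idle.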

Best regards
Shengjiu Wang

> Amadeusz
>
Amadeusz Sławiński May 9, 2024, 9:50 a.m. UTC | #18
On 5/9/2024 11:36 AM, Shengjiu Wang wrote:
> On Wed, May 8, 2024 at 4:14 PM Amadeusz Sławiński
> <amadeuszx.slawinski@linux.intel.com> wrote:
>>
>> On 5/8/2024 10:00 AM, Hans Verkuil wrote:
>>> On 06/05/2024 10:49, Shengjiu Wang wrote:
>>>> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote:
>>>>>
>>>>> Em Fri, 3 May 2024 10:47:19 +0900
>>>>> Mark Brown <broonie@kernel.org> escreveu:
>>>>>
>>>>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
>>>>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:
>>>>>>
>>>>>>>> There are still time control associated with it, as audio and video
>>>>>>>> needs to be in sync. This is done by controlling the buffers size
>>>>>>>> and could be fine-tuned by checking when the buffer transfer is done.
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>> Just complementing: on media, we do this per video buffer (or
>>>>>>> per half video buffer). A typical use case on cameras is to have
>>>>>>> buffers transferred 30 times per second, if the video was streamed
>>>>>>> at 30 frames per second.
>>>>>>
>>>>>> IIRC some big use case for this hardware was transcoding so there was a
>>>>>> desire to just go at whatever rate the hardware could support as there
>>>>>> is no interactive user consuming the output as it is generated.
>>>>>
>>>>> Indeed, codecs could be used to just do transcoding, but I would
>>>>> expect it to be a border use case. See, as the chipsets implementing
>>>>> codecs are typically the ones used on mobiles, I would expect that
>>>>> the major use cases to be to watch audio and video and to participate
>>>>> on audio/video conferences.
>>>>>
>>>>> Going further, the codec API may end supporting not only transcoding
>>>>> (which is something that CPU can usually handle without too much
>>>>> processing) but also audio processing that may require more
>>>>> complex algorithms - even deep learning ones - like background noise
>>>>> removal, echo detection/removal, volume auto-gain, audio enhancement
>>>>> and such.
>>>>>
>>>>> On other words, the typical use cases will either have input
>>>>> or output being a physical hardware (microphone or speaker).
>>>>>
>>>>
>>>> All, thanks for spending time to discuss, it seems we go back to
>>>> the start point of this topic again.
>>>>
>>>> Our main request is that there is a hardware sample rate converter
>>>> on the chip, so users can use it in user space as a component like
>>>> software sample rate converter. It mostly may run as a gstreamer plugin.
>>>> so it is a memory to memory component.
>>>>
>>>> I didn't find such API in ALSA for such purpose, the best option for this
>>>> in the kernel is the V4L2 memory to memory framework I found.
>>>> As Hans said it is well designed for memory to memory.
>>>>
>>>> And I think audio is one of 'media'.  As I can see that part of Radio
>>>> function is in ALSA, part of Radio function is in V4L2. part of HDMI
>>>> function is in DRM, part of HDMI function is in ALSA...
>>>> So using V4L2 for audio is not new from this point of view.
>>>>
>>>> Even now I still think V4L2 is the best option, but it looks like there
>>>> are a lot of rejects.  If develop a new ALSA-mem2mem, it is also
>>>> a duplication of code (bigger duplication that just add audio support
>>>> in V4L2 I think).
>>>
>>> After reading this thread I still believe that the mem2mem framework is
>>> a reasonable option, unless someone can come up with a method that is
>>> easy to implement in the alsa subsystem. From what I can tell from this
>>> discussion no such method exists.
>>>
>>
>> Hi,
>>
>> my main question would be how is mem2mem use case different from
>> loopback exposing playback and capture frontends in user space with DSP
>> (or other piece of HW) in the middle?
>>
> I think loopback has a timing control,  user need to feed data to playback at a
> fixed time and get data from capture at a fixed time.  Otherwise there
> is xrun in playback and capture.
> 
> mem2mem case: there is no such timing control,  user feeds data to it
> then it generates output,  if user doesn't feed data, there is no xrun.
> but mem2mem is just one of the components in the playback or capture
> pipeline, overall there is time control for whole pipeline,
> 

Have you looked at compress streams? If I remember correctly they are 
not tied to time due to the fact that they can pass data in arbitrary 
formats?

From:
https://docs.kernel.org/sound/designs/compress-offload.html

"No notion of underrun/overrun. Since the bytes written are compressed 
in nature and data written/read doesn’t translate directly to rendered 
output in time, this does not deal with underrun/overrun and maybe dealt 
in user-library"

Amadeusz
Shengjiu Wang May 9, 2024, 10:12 a.m. UTC | #19
On Thu, May 9, 2024 at 5:50 PM Amadeusz Sławiński
<amadeuszx.slawinski@linux.intel.com> wrote:
>
> On 5/9/2024 11:36 AM, Shengjiu Wang wrote:
> > On Wed, May 8, 2024 at 4:14 PM Amadeusz Sławiński
> > <amadeuszx.slawinski@linux.intel.com> wrote:
> >>
> >> On 5/8/2024 10:00 AM, Hans Verkuil wrote:
> >>> On 06/05/2024 10:49, Shengjiu Wang wrote:
> >>>> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote:
> >>>>>
> >>>>> Em Fri, 3 May 2024 10:47:19 +0900
> >>>>> Mark Brown <broonie@kernel.org> escreveu:
> >>>>>
> >>>>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
> >>>>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:
> >>>>>>
> >>>>>>>> There are still time control associated with it, as audio and video
> >>>>>>>> needs to be in sync. This is done by controlling the buffers size
> >>>>>>>> and could be fine-tuned by checking when the buffer transfer is done.
> >>>>>>
> >>>>>> ...
> >>>>>>
> >>>>>>> Just complementing: on media, we do this per video buffer (or
> >>>>>>> per half video buffer). A typical use case on cameras is to have
> >>>>>>> buffers transferred 30 times per second, if the video was streamed
> >>>>>>> at 30 frames per second.
> >>>>>>
> >>>>>> IIRC some big use case for this hardware was transcoding so there was a
> >>>>>> desire to just go at whatever rate the hardware could support as there
> >>>>>> is no interactive user consuming the output as it is generated.
> >>>>>
> >>>>> Indeed, codecs could be used to just do transcoding, but I would
> >>>>> expect it to be a border use case. See, as the chipsets implementing
> >>>>> codecs are typically the ones used on mobiles, I would expect that
> >>>>> the major use cases to be to watch audio and video and to participate
> >>>>> on audio/video conferences.
> >>>>>
> >>>>> Going further, the codec API may end supporting not only transcoding
> >>>>> (which is something that CPU can usually handle without too much
> >>>>> processing) but also audio processing that may require more
> >>>>> complex algorithms - even deep learning ones - like background noise
> >>>>> removal, echo detection/removal, volume auto-gain, audio enhancement
> >>>>> and such.
> >>>>>
> >>>>> On other words, the typical use cases will either have input
> >>>>> or output being a physical hardware (microphone or speaker).
> >>>>>
> >>>>
> >>>> All, thanks for spending time to discuss, it seems we go back to
> >>>> the start point of this topic again.
> >>>>
> >>>> Our main request is that there is a hardware sample rate converter
> >>>> on the chip, so users can use it in user space as a component like
> >>>> software sample rate converter. It mostly may run as a gstreamer plugin.
> >>>> so it is a memory to memory component.
> >>>>
> >>>> I didn't find such API in ALSA for such purpose, the best option for this
> >>>> in the kernel is the V4L2 memory to memory framework I found.
> >>>> As Hans said it is well designed for memory to memory.
> >>>>
> >>>> And I think audio is one of 'media'.  As I can see that part of Radio
> >>>> function is in ALSA, part of Radio function is in V4L2. part of HDMI
> >>>> function is in DRM, part of HDMI function is in ALSA...
> >>>> So using V4L2 for audio is not new from this point of view.
> >>>>
> >>>> Even now I still think V4L2 is the best option, but it looks like there
> >>>> are a lot of rejects.  If develop a new ALSA-mem2mem, it is also
> >>>> a duplication of code (bigger duplication that just add audio support
> >>>> in V4L2 I think).
> >>>
> >>> After reading this thread I still believe that the mem2mem framework is
> >>> a reasonable option, unless someone can come up with a method that is
> >>> easy to implement in the alsa subsystem. From what I can tell from this
> >>> discussion no such method exists.
> >>>
> >>
> >> Hi,
> >>
> >> my main question would be how is mem2mem use case different from
> >> loopback exposing playback and capture frontends in user space with DSP
> >> (or other piece of HW) in the middle?
> >>
> > I think loopback has a timing control,  user need to feed data to playback at a
> > fixed time and get data from capture at a fixed time.  Otherwise there
> > is xrun in playback and capture.
> >
> > mem2mem case: there is no such timing control,  user feeds data to it
> > then it generates output,  if user doesn't feed data, there is no xrun.
> > but mem2mem is just one of the components in the playback or capture
> > pipeline, overall there is time control for whole pipeline,
> >
>
> Have you looked at compress streams? If I remember correctly they are
> not tied to time due to the fact that they can pass data in arbitrary
> formats?
>
> From:
> https://docs.kernel.org/sound/designs/compress-offload.html
>
> "No notion of underrun/overrun. Since the bytes written are compressed
> in nature and data written/read doesn’t translate directly to rendered
> output in time, this does not deal with underrun/overrun and maybe dealt
> in user-library"

I checked the compress stream. The mem2mem case is different from the
compress-offload case.

The compress-offload case is a full pipeline: the user sends a compressed
stream to it, then the DSP decodes it and renders it to the speaker in real
time.

mem2mem is just like the decoder in the compress pipeline, which is
one of the components in the pipeline.

best regards
shengjiu wang
>
> Amadeusz
Amadeusz Sławiński May 9, 2024, 10:28 a.m. UTC | #20
On 5/9/2024 12:12 PM, Shengjiu Wang wrote:
> On Thu, May 9, 2024 at 5:50 PM Amadeusz Sławiński
> <amadeuszx.slawinski@linux.intel.com> wrote:
>>
>> On 5/9/2024 11:36 AM, Shengjiu Wang wrote:
>>> On Wed, May 8, 2024 at 4:14 PM Amadeusz Sławiński
>>> <amadeuszx.slawinski@linux.intel.com> wrote:
>>>>
>>>> On 5/8/2024 10:00 AM, Hans Verkuil wrote:
>>>>> On 06/05/2024 10:49, Shengjiu Wang wrote:
>>>>>> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote:
>>>>>>>
>>>>>>> Em Fri, 3 May 2024 10:47:19 +0900
>>>>>>> Mark Brown <broonie@kernel.org> escreveu:
>>>>>>>
>>>>>>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
>>>>>>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:
>>>>>>>>
>>>>>>>>>> There are still time control associated with it, as audio and video
>>>>>>>>>> needs to be in sync. This is done by controlling the buffers size
>>>>>>>>>> and could be fine-tuned by checking when the buffer transfer is done.
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>> Just complementing: on media, we do this per video buffer (or
>>>>>>>>> per half video buffer). A typical use case on cameras is to have
>>>>>>>>> buffers transferred 30 times per second, if the video was streamed
>>>>>>>>> at 30 frames per second.
>>>>>>>>
>>>>>>>> IIRC some big use case for this hardware was transcoding so there was a
>>>>>>>> desire to just go at whatever rate the hardware could support as there
>>>>>>>> is no interactive user consuming the output as it is generated.
>>>>>>>
>>>>>>> Indeed, codecs could be used to just do transcoding, but I would
>>>>>>> expect it to be a border use case. See, as the chipsets implementing
>>>>>>> codecs are typically the ones used on mobiles, I would expect that
>>>>>>> the major use cases to be to watch audio and video and to participate
>>>>>>> on audio/video conferences.
>>>>>>>
>>>>>>> Going further, the codec API may end supporting not only transcoding
>>>>>>> (which is something that CPU can usually handle without too much
>>>>>>> processing) but also audio processing that may require more
>>>>>>> complex algorithms - even deep learning ones - like background noise
>>>>>>> removal, echo detection/removal, volume auto-gain, audio enhancement
>>>>>>> and such.
>>>>>>>
>>>>>>> On other words, the typical use cases will either have input
>>>>>>> or output being a physical hardware (microphone or speaker).
>>>>>>>
>>>>>>
>>>>>> All, thanks for spending time to discuss, it seems we go back to
>>>>>> the start point of this topic again.
>>>>>>
>>>>>> Our main request is that there is a hardware sample rate converter
>>>>>> on the chip, so users can use it in user space as a component like
>>>>>> software sample rate converter. It mostly may run as a gstreamer plugin.
>>>>>> so it is a memory to memory component.
>>>>>>
>>>>>> I didn't find such API in ALSA for such purpose, the best option for this
>>>>>> in the kernel is the V4L2 memory to memory framework I found.
>>>>>> As Hans said it is well designed for memory to memory.
>>>>>>
>>>>>> And I think audio is one of 'media'.  As I can see that part of Radio
>>>>>> function is in ALSA, part of Radio function is in V4L2. part of HDMI
>>>>>> function is in DRM, part of HDMI function is in ALSA...
>>>>>> So using V4L2 for audio is not new from this point of view.
>>>>>>
>>>>>> Even now I still think V4L2 is the best option, but it looks like there
>>>>>> are a lot of rejects.  If develop a new ALSA-mem2mem, it is also
>>>>>> a duplication of code (bigger duplication that just add audio support
>>>>>> in V4L2 I think).
>>>>>
>>>>> After reading this thread I still believe that the mem2mem framework is
>>>>> a reasonable option, unless someone can come up with a method that is
>>>>> easy to implement in the alsa subsystem. From what I can tell from this
>>>>> discussion no such method exists.
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> my main question would be how is mem2mem use case different from
>>>> loopback exposing playback and capture frontends in user space with DSP
>>>> (or other piece of HW) in the middle?
>>>>
>>> I think loopback has a timing control,  user need to feed data to playback at a
>>> fixed time and get data from capture at a fixed time.  Otherwise there
>>> is xrun in playback and capture.
>>>
>>> mem2mem case: there is no such timing control,  user feeds data to it
>>> then it generates output,  if user doesn't feed data, there is no xrun.
>>> but mem2mem is just one of the components in the playback or capture
>>> pipeline, overall there is time control for whole pipeline,
>>>
>>
>> Have you looked at compress streams? If I remember correctly they are
>> not tied to time due to the fact that they can pass data in arbitrary
>> formats?
>>
>> From:
>> https://docs.kernel.org/sound/designs/compress-offload.html
>>
>> "No notion of underrun/overrun. Since the bytes written are compressed
>> in nature and data written/read doesn’t translate directly to rendered
>> output in time, this does not deal with underrun/overrun and maybe dealt
>> in user-library"
> 
> I checked the compress stream. mem2mem case is different with
> compress-offload case
> 
> compress-offload case is a full pipeline,  the user sends a compress
> stream to it, then DSP decodes it and renders it to the speaker in real
> time.
> 
> mem2mem is just like the decoder in the compress pipeline. which is
> one of the components in the pipeline.

I was thinking of loopback with endpoints using compress streams,
without a physical endpoint, something like:

compress playback (to feed data from userspace) -> DSP (processing) -> 
compress capture (send data back to userspace)

Unless I'm missing something, you should be able to process data as fast 
as you can feed it and consume it in such case.

Amadeusz
Shengjiu Wang May 9, 2024, 10:44 a.m. UTC | #21
On Thu, May 9, 2024 at 6:28 PM Amadeusz Sławiński
<amadeuszx.slawinski@linux.intel.com> wrote:
>
> On 5/9/2024 12:12 PM, Shengjiu Wang wrote:
> > On Thu, May 9, 2024 at 5:50 PM Amadeusz Sławiński
> > <amadeuszx.slawinski@linux.intel.com> wrote:
> >>
> >> On 5/9/2024 11:36 AM, Shengjiu Wang wrote:
> >>> On Wed, May 8, 2024 at 4:14 PM Amadeusz Sławiński
> >>> <amadeuszx.slawinski@linux.intel.com> wrote:
> >>>>
> >>>> On 5/8/2024 10:00 AM, Hans Verkuil wrote:
> >>>>> On 06/05/2024 10:49, Shengjiu Wang wrote:
> >>>>>> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote:
> >>>>>>>
> >>>>>>> Em Fri, 3 May 2024 10:47:19 +0900
> >>>>>>> Mark Brown <broonie@kernel.org> escreveu:
> >>>>>>>
> >>>>>>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote:
> >>>>>>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu:
> >>>>>>>>
> >>>>>>>>>> There are still time control associated with it, as audio and video
> >>>>>>>>>> needs to be in sync. This is done by controlling the buffers size
> >>>>>>>>>> and could be fine-tuned by checking when the buffer transfer is done.
> >>>>>>>>
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>>>> Just complementing: on media, we do this per video buffer (or
> >>>>>>>>> per half video buffer). A typical use case on cameras is to have
> >>>>>>>>> buffers transferred 30 times per second, if the video was streamed
> >>>>>>>>> at 30 frames per second.
> >>>>>>>>
> >>>>>>>> IIRC some big use case for this hardware was transcoding so there was a
> >>>>>>>> desire to just go at whatever rate the hardware could support as there
> >>>>>>>> is no interactive user consuming the output as it is generated.
> >>>>>>>
> >>>>>>> Indeed, codecs could be used to just do transcoding, but I would
> >>>>>>> expect it to be a border use case. See, as the chipsets implementing
> >>>>>>> codecs are typically the ones used on mobiles, I would expect that
> >>>>>>> the major use cases to be to watch audio and video and to participate
> >>>>>>> on audio/video conferences.
> >>>>>>>
> >>>>>>> Going further, the codec API may end supporting not only transcoding
> >>>>>>> (which is something that CPU can usually handle without too much
> >>>>>>> processing) but also audio processing that may require more
> >>>>>>> complex algorithms - even deep learning ones - like background noise
> >>>>>>> removal, echo detection/removal, volume auto-gain, audio enhancement
> >>>>>>> and such.
> >>>>>>>
> >>>>>>> On other words, the typical use cases will either have input
> >>>>>>> or output being a physical hardware (microphone or speaker).
> >>>>>>>
> >>>>>>
> >>>>>> All, thanks for spending time to discuss, it seems we go back to
> >>>>>> the start point of this topic again.
> >>>>>>
> >>>>>> Our main request is that there is a hardware sample rate converter
> >>>>>> on the chip, so users can use it in user space as a component like
> >>>>>> software sample rate converter. It mostly may run as a gstreamer plugin.
> >>>>>> so it is a memory to memory component.
> >>>>>>
> >>>>>> I didn't find such API in ALSA for such purpose, the best option for this
> >>>>>> in the kernel is the V4L2 memory to memory framework I found.
> >>>>>> As Hans said it is well designed for memory to memory.
> >>>>>>
> >>>>>> And I think audio is one of 'media'.  As I can see that part of Radio
> >>>>>> function is in ALSA, part of Radio function is in V4L2. part of HDMI
> >>>>>> function is in DRM, part of HDMI function is in ALSA...
> >>>>>> So using V4L2 for audio is not new from this point of view.
> >>>>>>
> >>>>>> Even now I still think V4L2 is the best option, but it looks like there
> >>>>>> are a lot of rejects.  If develop a new ALSA-mem2mem, it is also
> >>>>>> a duplication of code (bigger duplication that just add audio support
> >>>>>> in V4L2 I think).
> >>>>>
> >>>>> After reading this thread I still believe that the mem2mem framework is
> >>>>> a reasonable option, unless someone can come up with a method that is
> >>>>> easy to implement in the alsa subsystem. From what I can tell from this
> >>>>> discussion no such method exists.
> >>>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> my main question would be how is mem2mem use case different from
> >>>> loopback exposing playback and capture frontends in user space with DSP
> >>>> (or other piece of HW) in the middle?
> >>>>
> >>> I think loopback has a timing control,  user need to feed data to playback at a
> >>> fixed time and get data from capture at a fixed time.  Otherwise there
> >>> is xrun in playback and capture.
> >>>
> >>> mem2mem case: there is no such timing control,  user feeds data to it
> >>> then it generates output,  if user doesn't feed data, there is no xrun.
> >>> but mem2mem is just one of the components in the playback or capture
> >>> pipeline, overall there is time control for whole pipeline,
> >>>
> >>
> >> Have you looked at compress streams? If I remember correctly they are
> >> not tied to time due to the fact that they can pass data in arbitrary
> >> formats?
> >>
> >> From:
> >> https://docs.kernel.org/sound/designs/compress-offload.html
> >>
> >> "No notion of underrun/overrun. Since the bytes written are compressed
> >> in nature and data written/read doesn’t translate directly to rendered
> >> output in time, this does not deal with underrun/overrun and maybe dealt
> >> in user-library"
> >
> > I checked the compress stream. mem2mem case is different with
> > compress-offload case
> >
> > compress-offload case is a full pipeline,  the user sends a compress
> > stream to it, then DSP decodes it and renders it to the speaker in real
> > time.
> >
> > mem2mem is just like the decoder in the compress pipeline. which is
> > one of the components in the pipeline.
>
> I was thinking of loopback with endpoints using compress streams,
> without physical endpoint, something like:
>
> compress playback (to feed data from userspace) -> DSP (processing) ->
> compress capture (send data back to userspace)
>
> Unless I'm missing something, you should be able to process data as fast
> as you can feed it and consume it in such case.
>

Actually I tried this in the beginning, but it did not work well.
ALSA needs time control for playback and capture, and playback and capture
need to synchronize. Usually the playback and capture pipelines are
independent in the ALSA design, but in this case playback and capture
must synchronize; they are not independent.

Best regards
Shengjiu Wang

> Amadeusz
Jaroslav Kysela May 9, 2024, 11:13 a.m. UTC | #22
On 09. 05. 24 12:44, Shengjiu Wang wrote:
>>> mem2mem is just like the decoder in the compress pipeline. which is
>>> one of the components in the pipeline.
>>
>> I was thinking of loopback with endpoints using compress streams,
>> without physical endpoint, something like:
>>
>> compress playback (to feed data from userspace) -> DSP (processing) ->
>> compress capture (send data back to userspace)
>>
>> Unless I'm missing something, you should be able to process data as fast
>> as you can feed it and consume it in such case.
>>
> 
> Actually in the beginning I tried this,  but it did not work well.
> ALSA needs time control for playback and capture, playback and capture
> needs to synchronize.  Usually the playback and capture pipeline is
> independent in ALSA design,  but in this case, the playback and capture
> should synchronize, they are not independent.

The core compress API has no strict timing constraints. You can eventually
have two half-duplex compress devices, if you would like a really independent
mechanism. If something is missing in the API, you can extend it (for example, to
inform user space that it's producer/consumer processing without any
relation to real time). I like this idea.

					Jaroslav
Jaroslav Kysela May 13, 2024, 11:56 a.m. UTC | #23
On 09. 05. 24 13:13, Jaroslav Kysela wrote:
> On 09. 05. 24 12:44, Shengjiu Wang wrote:
>>>> mem2mem is just like the decoder in the compress pipeline. which is
>>>> one of the components in the pipeline.
>>>
>>> I was thinking of loopback with endpoints using compress streams,
>>> without physical endpoint, something like:
>>>
>>> compress playback (to feed data from userspace) -> DSP (processing) ->
>>> compress capture (send data back to userspace)
>>>
>>> Unless I'm missing something, you should be able to process data as fast
>>> as you can feed it and consume it in such case.
>>>
>>
>> Actually in the beginning I tried this,  but it did not work well.
>> ALSA needs time control for playback and capture, playback and capture
>> needs to synchronize.  Usually the playback and capture pipeline is
>> independent in ALSA design,  but in this case, the playback and capture
>> should synchronize, they are not independent.
> 
> The core compress API has no strict timing constraints. You can eventually
> have two half-duplex compress devices, if you like to have really independent
> mechanism. If something is missing in API, you can extend this API (like to
> inform the user space that it's a producer/consumer processing without any
> relation to the real time). I like this idea.

I was thinking more about this. If I am right, the mentioned use in gstreamer
is supposed to run the conversion (DSP) job in "one shot" (it can be handled
using one system call, such as a blocking ioctl). The goal is just to offload the
CPU work to the DSP (co-processor). If there are no requirements for
queuing, we can easily implement this ioctl in the ALSA compress API, using the
dma-buf API for data management. We can eventually define a new
direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow
handling this new data scheme. The API may be extended later on real demand, of
course.

Otherwise all the pieces are already in the current ALSA compress API
(capabilities, params, enumeration). The realtime controls may be created
using the ALSA control API.
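
A rough sketch of what such a "one shot" conversion call might look like,
under stated assumptions. Nothing below is existing kernel API: the enum is a
local mirror of enum snd_compr_direction with a hypothetical third value, the
job structure and function are invented for illustration, plain pointers stand
in for dma-buf handles, and a trivial nearest-sample resampler on mono 16-bit
frames stands in for the DSP.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Local mirror for illustration. The first two values follow the
 * existing enum snd_compr_direction; the third is the hypothetical
 * addition discussed above.
 */
enum sketch_compr_direction {
	SKETCH_COMPRESS_PLAYBACK = 0,
	SKETCH_COMPRESS_CAPTURE,
	SKETCH_COMPRESS_CONVERT,	/* hypothetical */
};

/* Hypothetical job description: one input buffer, one output buffer. */
struct sketch_convert_job {
	const int16_t *in;	/* mono input frames */
	size_t in_frames;
	unsigned int in_rate;	/* Hz */
	int16_t *out;		/* mono output frames */
	size_t out_capacity;	/* frames available in out */
	unsigned int out_rate;	/* Hz */
};

/*
 * Blocking, one-shot conversion: returns the number of output frames
 * written, or -1 on invalid arguments. The caller is not tied to any
 * real-time clock; it just waits for the result.
 */
static long sketch_convert_one_shot(enum sketch_compr_direction dir,
				    const struct sketch_convert_job *job)
{
	size_t out_frames;

	if (dir != SKETCH_COMPRESS_CONVERT)
		return -1;
	if (job->in_rate == 0 || job->out_rate == 0)
		return -1;

	out_frames = (size_t)((uint64_t)job->in_frames * job->out_rate /
			      job->in_rate);
	if (out_frames > job->out_capacity)
		return -1;

	/* Nearest earlier input sample; a stand-in for the real DSP. */
	for (size_t i = 0; i < out_frames; i++) {
		size_t src = (size_t)((uint64_t)i * job->in_rate /
				      job->out_rate);
		job->out[i] = job->in[src];
	}
	return (long)out_frames;
}
```

The point of the sketch is only the calling convention: one call takes an
input buffer and returns a filled output buffer, with no queuing and no
playback or capture direction involved.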

					Jaroslav
Hans Verkuil May 15, 2024, 9:17 a.m. UTC | #24
Hi Jaroslav,

On 5/13/24 13:56, Jaroslav Kysela wrote:
> On 09. 05. 24 13:13, Jaroslav Kysela wrote:
>> On 09. 05. 24 12:44, Shengjiu Wang wrote:
>>>>> mem2mem is just like the decoder in the compress pipeline. which is
>>>>> one of the components in the pipeline.
>>>>
>>>> I was thinking of loopback with endpoints using compress streams,
>>>> without physical endpoint, something like:
>>>>
>>>> compress playback (to feed data from userspace) -> DSP (processing) ->
>>>> compress capture (send data back to userspace)
>>>>
>>>> Unless I'm missing something, you should be able to process data as fast
>>>> as you can feed it and consume it in such case.
>>>>
>>>
>>> Actually in the beginning I tried this,  but it did not work well.
>>> ALSA needs time control for playback and capture, playback and capture
>>> needs to synchronize.  Usually the playback and capture pipeline is
>>> independent in ALSA design,  but in this case, the playback and capture
>>> should synchronize, they are not independent.
>>
>> The core compress API has no strict timing constraints. You can eventually
>> have two half-duplex compress devices, if you like to have really independent
>> mechanism. If something is missing in API, you can extend this API (like to
>> inform the user space that it's a producer/consumer processing without any
>> relation to the real time). I like this idea.
> 
> I was thinking more about this. If I am right, the mentioned use in gstreamer 
> is supposed to run the conversion (DSP) job in "one shot" (can be handled 
> using one system call like blocking ioctl).  The goal is just to offload the 
> CPU work to the DSP (co-processor). If there are no requirements for the 
> queuing, we can implement this ioctl in the compress ALSA API easily using the 
> data management through the dma-buf API. We can eventually define a new 
> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow 
> handle this new data scheme. The API may be extended later on real demand, of 
> course.
> 
> Otherwise all pieces are already in the current ALSA compress API 
> (capabilities, params, enumeration). The realtime controls may be created 
> using ALSA control API.

So does this mean that Shengjiu should attempt to use this ALSA approach first?

If there is a way to do this reasonably cleanly in the ALSA API, then that
obviously is much better from my perspective as a media maintainer.

My understanding was always that it can't be done (or at least not without
a major effort) in ALSA, and in that case V4L2 is a decent plan B, but based
on this I gather that it is possible in ALSA after all.

So can I shelve this patch series for now?

Regards,

	Hans
Jaroslav Kysela May 15, 2024, 9:50 a.m. UTC | #25
On 15. 05. 24 11:17, Hans Verkuil wrote:
> Hi Jaroslav,
> 
> On 5/13/24 13:56, Jaroslav Kysela wrote:
>> On 09. 05. 24 13:13, Jaroslav Kysela wrote:
>>> On 09. 05. 24 12:44, Shengjiu Wang wrote:
>>>>>> mem2mem is just like the decoder in the compress pipeline. which is
>>>>>> one of the components in the pipeline.
>>>>>
>>>>> I was thinking of loopback with endpoints using compress streams,
>>>>> without physical endpoint, something like:
>>>>>
>>>>> compress playback (to feed data from userspace) -> DSP (processing) ->
>>>>> compress capture (send data back to userspace)
>>>>>
>>>>> Unless I'm missing something, you should be able to process data as fast
>>>>> as you can feed it and consume it in such case.
>>>>>
>>>>
>>>> Actually in the beginning I tried this,  but it did not work well.
>>>> ALSA needs time control for playback and capture, playback and capture
>>>> needs to synchronize.  Usually the playback and capture pipeline is
>>>> independent in ALSA design,  but in this case, the playback and capture
>>>> should synchronize, they are not independent.
>>>
>>> The core compress API has no strict timing constraints. You can eventually
>>> have two half-duplex compress devices, if you like to have really independent
>>> mechanism. If something is missing in API, you can extend this API (like to
>>> inform the user space that it's a producer/consumer processing without any
>>> relation to the real time). I like this idea.
>>
>> I was thinking more about this. If I am right, the mentioned use in gstreamer
>> is supposed to run the conversion (DSP) job in "one shot" (can be handled
>> using one system call like blocking ioctl).  The goal is just to offload the
>> CPU work to the DSP (co-processor). If there are no requirements for the
>> queuing, we can implement this ioctl in the compress ALSA API easily using the
>> data management through the dma-buf API. We can eventually define a new
>> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow
>> handle this new data scheme. The API may be extended later on real demand, of
>> course.
>>
>> Otherwise all pieces are already in the current ALSA compress API
>> (capabilities, params, enumeration). The realtime controls may be created
>> using ALSA control API.
> 
> So does this mean that Shengjiu should attempt to use this ALSA approach first?

I've not seen any argument that the v4l2 mem2mem buffer scheme must be used 
for this data conversion. It looks like a simple job, and the ALSA APIs may be 
extended for this simple purpose.

Shengjiu, what are your requirements for gstreamer support? Would a new 
blocking ioctl be enough for the initial support in the compress ALSA API?

						Jaroslav
Takashi Iwai May 15, 2024, 10:19 a.m. UTC | #26
On Wed, 15 May 2024 11:50:52 +0200,
Jaroslav Kysela wrote:
> 
> On 15. 05. 24 11:17, Hans Verkuil wrote:
> > Hi Jaroslav,
> > 
> > On 5/13/24 13:56, Jaroslav Kysela wrote:
> >> On 09. 05. 24 13:13, Jaroslav Kysela wrote:
> >>> On 09. 05. 24 12:44, Shengjiu Wang wrote:
> >>>>>> mem2mem is just like the decoder in the compress pipeline. which is
> >>>>>> one of the components in the pipeline.
> >>>>> 
> >>>>> I was thinking of loopback with endpoints using compress streams,
> >>>>> without physical endpoint, something like:
> >>>>> 
> >>>>> compress playback (to feed data from userspace) -> DSP (processing) ->
> >>>>> compress capture (send data back to userspace)
> >>>>> 
> >>>>> Unless I'm missing something, you should be able to process data as fast
> >>>>> as you can feed it and consume it in such case.
> >>>>> 
> >>>> 
> >>>> Actually in the beginning I tried this,  but it did not work well.
> >>>> ALSA needs time control for playback and capture, playback and capture
> >>>> needs to synchronize.  Usually the playback and capture pipeline is
> >>>> independent in ALSA design,  but in this case, the playback and capture
> >>>> should synchronize, they are not independent.
> >>> 
> >>> The core compress API has no strict timing constraints. You can eventually
> >>> have two half-duplex compress devices, if you like to have really independent
> >>> mechanism. If something is missing in API, you can extend this API (like to
> >>> inform the user space that it's a producer/consumer processing without any
> >>> relation to the real time). I like this idea.
> >> 
> >> I was thinking more about this. If I am right, the mentioned use in gstreamer
> >> is supposed to run the conversion (DSP) job in "one shot" (can be handled
> >> using one system call like blocking ioctl).  The goal is just to offload the
> >> CPU work to the DSP (co-processor). If there are no requirements for the
> >> queuing, we can implement this ioctl in the compress ALSA API easily using the
> >> data management through the dma-buf API. We can eventually define a new
> >> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow
> >> handle this new data scheme. The API may be extended later on real demand, of
> >> course.
> >> 
> >> Otherwise all pieces are already in the current ALSA compress API
> >> (capabilities, params, enumeration). The realtime controls may be created
> >> using ALSA control API.
> > 
> > So does this mean that Shengjiu should attempt to use this ALSA approach first?
> 
> I've not seen any argument to use v4l2 mem2mem buffer scheme for this
> data conversion forcefully. It looks like a simple job and ALSA APIs
> may be extended for this simple purpose.
> 
> Shengjiu, what are your requirements for gstreamer support? Would be a
> new blocking ioctl enough for the initial support in the compress ALSA
> API?

If it works with compress API, it'd be great, yeah.
So, your idea is to open compress-offload devices for read and write,
and then let them convert a la batch jobs without timing control?

For full-duplex usages, we might need some more extensions, so that
both read and write parameters can be synchronized.  (So far the
compress stream is unidirectional, and the runtime buffer is for a
single stream.)

And the buffer management is based on fixed-size fragments.  I
hope this doesn't matter much for the intended operation?
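
One way fixed-size fragments could matter for rate conversion is sketched
below. This is only an illustration under stated assumptions: the helper name
asrc_out_frames() is hypothetical, the converter is treated as ideal (no
internal delay), and the rates are example values, not taken from the driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: number of whole output
 * frames an ideal rate converter produces for in_frames input frames.
 * The result is rounded down; a real ASRC also has filter delay.
 */
uint64_t asrc_out_frames(uint64_t in_frames, uint32_t in_rate,
                         uint32_t out_rate)
{
    /* Multiply before dividing to keep the fractional part as long as possible. */
    return in_frames * out_rate / in_rate;
}
```

With these example rates, a 1024-frame input fragment at 44100 Hz maps to
1114 whole output frames at 48000 Hz, so input and output fragments would
generally not have the same size.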


thanks,

Takashi
Jaroslav Kysela May 15, 2024, 10:46 a.m. UTC | #27
On 15. 05. 24 12:19, Takashi Iwai wrote:
> On Wed, 15 May 2024 11:50:52 +0200,
> Jaroslav Kysela wrote:
>>
>> On 15. 05. 24 11:17, Hans Verkuil wrote:
>>> Hi Jaroslav,
>>>
>>> On 5/13/24 13:56, Jaroslav Kysela wrote:
>>>> On 09. 05. 24 13:13, Jaroslav Kysela wrote:
>>>>> On 09. 05. 24 12:44, Shengjiu Wang wrote:
>>>>>>>> mem2mem is just like the decoder in the compress pipeline. which is
>>>>>>>> one of the components in the pipeline.
>>>>>>>
>>>>>>> I was thinking of loopback with endpoints using compress streams,
>>>>>>> without physical endpoint, something like:
>>>>>>>
>>>>>>> compress playback (to feed data from userspace) -> DSP (processing) ->
>>>>>>> compress capture (send data back to userspace)
>>>>>>>
>>>>>>> Unless I'm missing something, you should be able to process data as fast
>>>>>>> as you can feed it and consume it in such case.
>>>>>>>
>>>>>>
>>>>>> Actually in the beginning I tried this,  but it did not work well.
>>>>>> ALSA needs time control for playback and capture, playback and capture
>>>>>> needs to synchronize.  Usually the playback and capture pipeline is
>>>>>> independent in ALSA design,  but in this case, the playback and capture
>>>>>> should synchronize, they are not independent.
>>>>>
>>>>> The core compress API has no strict timing constraints. You can eventually
>>>>> have two half-duplex compress devices, if you like to have really independent
>>>>> mechanism. If something is missing in API, you can extend this API (like to
>>>>> inform the user space that it's a producer/consumer processing without any
>>>>> relation to the real time). I like this idea.
>>>>
>>>> I was thinking more about this. If I am right, the mentioned use in gstreamer
>>>> is supposed to run the conversion (DSP) job in "one shot" (can be handled
>>>> using one system call like blocking ioctl).  The goal is just to offload the
>>>> CPU work to the DSP (co-processor). If there are no requirements for the
>>>> queuing, we can implement this ioctl in the compress ALSA API easily using the
>>>> data management through the dma-buf API. We can eventually define a new
>>>> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow
>>>> handle this new data scheme. The API may be extended later on real demand, of
>>>> course.
>>>>
>>>> Otherwise all pieces are already in the current ALSA compress API
>>>> (capabilities, params, enumeration). The realtime controls may be created
>>>> using ALSA control API.
>>>
>>> So does this mean that Shengjiu should attempt to use this ALSA approach first?
>>
>> I've not seen any argument to use v4l2 mem2mem buffer scheme for this
>> data conversion forcefully. It looks like a simple job and ALSA APIs
>> may be extended for this simple purpose.
>>
>> Shengjiu, what are your requirements for gstreamer support? Would be a
>> new blocking ioctl enough for the initial support in the compress ALSA
>> API?
> 
> If it works with compress API, it'd be great, yeah.
> So, your idea is to open compress-offload devices for read and write,
> then and let them convert a la batch jobs without timing control?
> 
> For full-duplex usages, we might need some more extensions, so that
> both read and write parameters can be synchronized.  (So far the
> compress stream is a unidirectional, and the runtime buffer for a
> single stream.)
> 
> And the buffer management is based on the fixed size fragments.  I
> hope this doesn't matter much for the intended operation?

The question is whether the standard I/O is really required for this case. My 
quick idea was to just implement a new "direction" for this job, supporting 
only one ioctl for the data processing, which will execute the job in "one 
shot" at the moment. The I/O may be handled through the dma-buf API (which 
seems to be standard nowadays for this purpose and allows future chaining).

So something like:

struct dsp_job {
    int source_fd;     /* dma-buf FD with source data - for dma_buf_get() */
    int target_fd;     /* dma-buf FD for target data - for dma_buf_get() */
    /* ... maybe some extra data size members here ... */
    /* ... maybe some special parameters here ... */
};

#define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)

This ioctl will be blocking (thus synced). My question is whether it's 
feasible for gstreamer or not. For this particular case, if the rate 
conversion is implemented in software, it will block the gstreamer data 
processing, too.
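
A minimal user-space sketch of how such a call might look, assuming the
request number and the two FD members exactly as written above. Nothing here
exists in a released kernel: the struct is trimmed to the two members named in
the proposal (the open "size" and "parameter" members are left out), and the
wrapper name dsp_job_run() is hypothetical.

```c
#include <errno.h>
#include <sys/ioctl.h>

/*
 * Hypothetical layout: only the two dma-buf FDs come from the proposal.
 * The size and parameter members it leaves open are deliberately omitted.
 */
struct dsp_job {
    int source_fd;  /* dma-buf FD with source data */
    int target_fd;  /* dma-buf FD for target data */
};

/* Request number as written in the proposal; an assumption, not kernel UAPI. */
#define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)

/*
 * Submit one conversion job and block until the ioctl returns.
 * Returns 0 on success or a negative errno value on failure.
 */
int dsp_job_run(int compr_fd, int source_fd, int target_fd)
{
    struct dsp_job job = { .source_fd = source_fd, .target_fd = target_fd };

    if (ioctl(compr_fd, SNDRV_COMPRESS_DSPJOB, &job) < 0)
        return -errno;
    return 0;
}
```

The caller would pass an open compress device FD plus two dma-buf FDs. On a
kernel without this ioctl the call simply fails with an errno value, so the
sketch is safe to compile and try.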

						Jaroslav
Shengjiu Wang May 15, 2024, 1:34 p.m. UTC | #28
On Wed, May 15, 2024 at 6:46 PM Jaroslav Kysela <perex@perex.cz> wrote:
>
> On 15. 05. 24 12:19, Takashi Iwai wrote:
> > On Wed, 15 May 2024 11:50:52 +0200,
> > Jaroslav Kysela wrote:
> >>
> >> On 15. 05. 24 11:17, Hans Verkuil wrote:
> >>> Hi Jaroslav,
> >>>
> >>> On 5/13/24 13:56, Jaroslav Kysela wrote:
> >>>> On 09. 05. 24 13:13, Jaroslav Kysela wrote:
> >>>>> On 09. 05. 24 12:44, Shengjiu Wang wrote:
> >>>>>>>> mem2mem is just like the decoder in the compress pipeline. which is
> >>>>>>>> one of the components in the pipeline.
> >>>>>>>
> >>>>>>> I was thinking of loopback with endpoints using compress streams,
> >>>>>>> without physical endpoint, something like:
> >>>>>>>
> >>>>>>> compress playback (to feed data from userspace) -> DSP (processing) ->
> >>>>>>> compress capture (send data back to userspace)
> >>>>>>>
> >>>>>>> Unless I'm missing something, you should be able to process data as fast
> >>>>>>> as you can feed it and consume it in such case.
> >>>>>>>
> >>>>>>
> >>>>>> Actually in the beginning I tried this,  but it did not work well.
> >>>>>> ALSA needs time control for playback and capture, playback and capture
> >>>>>> needs to synchronize.  Usually the playback and capture pipeline is
> >>>>>> independent in ALSA design,  but in this case, the playback and capture
> >>>>>> should synchronize, they are not independent.
> >>>>>
> >>>>> The core compress API has no strict timing constraints. You can eventually
> >>>>> have two half-duplex compress devices, if you like to have really independent
> >>>>> mechanism. If something is missing in API, you can extend this API (like to
> >>>>> inform the user space that it's a producer/consumer processing without any
> >>>>> relation to the real time). I like this idea.
> >>>>
> >>>> I was thinking more about this. If I am right, the mentioned use in gstreamer
> >>>> is supposed to run the conversion (DSP) job in "one shot" (can be handled
> >>>> using one system call like blocking ioctl).  The goal is just to offload the
> >>>> CPU work to the DSP (co-processor). If there are no requirements for the
> >>>> queuing, we can implement this ioctl in the compress ALSA API easily using the
> >>>> data management through the dma-buf API. We can eventually define a new
> >>>> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow
> >>>> handle this new data scheme. The API may be extended later on real demand, of
> >>>> course.
> >>>>
> >>>> Otherwise all pieces are already in the current ALSA compress API
> >>>> (capabilities, params, enumeration). The realtime controls may be created
> >>>> using ALSA control API.
> >>>
> >>> So does this mean that Shengjiu should attempt to use this ALSA approach first?
> >>
> >> I've not seen any argument to use v4l2 mem2mem buffer scheme for this
> >> data conversion forcefully. It looks like a simple job and ALSA APIs
> >> may be extended for this simple purpose.
> >>
> >> Shengjiu, what are your requirements for gstreamer support? Would be a
> >> new blocking ioctl enough for the initial support in the compress ALSA
> >> API?
> >
> > If it works with compress API, it'd be great, yeah.
> > So, your idea is to open compress-offload devices for read and write,
> > then and let them convert a la batch jobs without timing control?
> >
> > For full-duplex usages, we might need some more extensions, so that
> > both read and write parameters can be synchronized.  (So far the
> > compress stream is a unidirectional, and the runtime buffer for a
> > single stream.)
> >
> > And the buffer management is based on the fixed size fragments.  I
> > hope this doesn't matter much for the intended operation?
>
> It's a question, if the standard I/O is really required for this case. My
> quick idea was to just implement a new "direction" for this job supporting
> only one ioctl for the data processing which will execute the job in "one
> shot" at the moment. The I/O may be handled through dma-buf API (which seems
> to be standard nowadays for this purpose and allows future chaining).
>
> So something like:
>
> struct dsp_job {
>     int source_fd;     /* dma-buf FD with source data - for dma_buf_get() */
>     int target_fd;     /* dma-buf FD for target data - for dma_buf_get() */
>     ... maybe some extra data size members here ...
>     ... maybe some special parameters here ...
> };
>
> #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)
>
> This ioctl will be blocking (thus synced). My question is, if it's feasible
> for gstreamer or not. For this particular case, if the rate conversion is
> implemented in software, it will block the gstreamer data processing, too.
>

Thanks.

I have several questions:
1.  The compress API always binds to a sound card.  Can we avoid that?
     The ASRC is just one component.

2.  The compress API doesn't seem to support mmap().  Is this a problem
     for sending and getting data to/from the driver?

3.  How does the user get output data from the ASRC after each conversion?
     It should happen every period.

best regards
Shengjiu Wang.
Pierre-Louis Bossart May 15, 2024, 2:04 p.m. UTC | #29
On 5/9/24 06:13, Jaroslav Kysela wrote:
> On 09. 05. 24 12:44, Shengjiu Wang wrote:
>>>> mem2mem is just like the decoder in the compress pipeline. which is
>>>> one of the components in the pipeline.
>>>
>>> I was thinking of loopback with endpoints using compress streams,
>>> without physical endpoint, something like:
>>>
>>> compress playback (to feed data from userspace) -> DSP (processing) ->
>>> compress capture (send data back to userspace)
>>>
>>> Unless I'm missing something, you should be able to process data as fast
>>> as you can feed it and consume it in such case.
>>>
>>
>> Actually in the beginning I tried this,  but it did not work well.
>> ALSA needs time control for playback and capture, playback and capture
>> needs to synchronize.  Usually the playback and capture pipeline is
>> independent in ALSA design,  but in this case, the playback and capture
>> should synchronize, they are not independent.
> 
> The core compress API has no strict timing constraints. You can
> eventually have two half-duplex compress devices, if you like to have
> really independent mechanism. If something is missing in API, you can
> extend this API (like to inform the user space that it's a
> producer/consumer processing without any relation to the real time). I
> like this idea.

The compress API was never intended to be used this way. It was meant to
send compressed data to a DSP for rendering, and keep the host processor
in a low-power state while the DSP local buffer was drained. There was
no intent to do a loop back to the host, because that keeps the host in
a high-power state and probably negates the power savings due to a DSP.

The other problem with the loopback is that the compress stuff is
usually a "Front-End" in ASoC/DPCM parlance, and we don't have a good
way to do a loopback between Front-Ends. The entire framework is based
on FEs being connected to BEs.

One problem that I can see for ASRC is that it's not clear when the data
will be completely processed on the "capture" stream when you stop the
"playback" stream. There's a non-zero risk of having a truncated output
or waiting for data that will never be generated.

In other words, it might be possible to reuse/extend the compress API
for a 'coprocessor' approach without any rendering to traditional
interfaces, but it's uncharted territory.
Nicolas Dufresne May 15, 2024, 8:33 p.m. UTC | #30
Hi,

GStreamer hat on ...

Le mercredi 15 mai 2024 à 12:46 +0200, Jaroslav Kysela a écrit :
> On 15. 05. 24 12:19, Takashi Iwai wrote:
> > On Wed, 15 May 2024 11:50:52 +0200,
> > Jaroslav Kysela wrote:
> > > 
> > > On 15. 05. 24 11:17, Hans Verkuil wrote:
> > > > Hi Jaroslav,
> > > > 
> > > > On 5/13/24 13:56, Jaroslav Kysela wrote:
> > > > > On 09. 05. 24 13:13, Jaroslav Kysela wrote:
> > > > > > On 09. 05. 24 12:44, Shengjiu Wang wrote:
> > > > > > > > > mem2mem is just like the decoder in the compress pipeline. which is
> > > > > > > > > one of the components in the pipeline.
> > > > > > > > 
> > > > > > > > I was thinking of loopback with endpoints using compress streams,
> > > > > > > > without physical endpoint, something like:
> > > > > > > > 
> > > > > > > > compress playback (to feed data from userspace) -> DSP (processing) ->
> > > > > > > > compress capture (send data back to userspace)
> > > > > > > > 
> > > > > > > > Unless I'm missing something, you should be able to process data as fast
> > > > > > > > as you can feed it and consume it in such case.
> > > > > > > > 
> > > > > > > 
> > > > > > > Actually in the beginning I tried this,  but it did not work well.
> > > > > > > ALSA needs time control for playback and capture, playback and capture
> > > > > > > needs to synchronize.  Usually the playback and capture pipeline is
> > > > > > > independent in ALSA design,  but in this case, the playback and capture
> > > > > > > should synchronize, they are not independent.
> > > > > > 
> > > > > > The core compress API has no strict timing constraints. You can eventually
> > > > > > have two half-duplex compress devices, if you like to have really independent
> > > > > > mechanism. If something is missing in API, you can extend this API (like to
> > > > > > inform the user space that it's a producer/consumer processing without any
> > > > > > relation to the real time). I like this idea.
> > > > > 
> > > > > I was thinking more about this. If I am right, the mentioned use in gstreamer
> > > > > is supposed to run the conversion (DSP) job in "one shot" (can be handled
> > > > > using one system call like blocking ioctl).  The goal is just to offload the
> > > > > CPU work to the DSP (co-processor). If there are no requirements for the
> > > > > queuing, we can implement this ioctl in the compress ALSA API easily using the
> > > > > data management through the dma-buf API. We can eventually define a new
> > > > > direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow
> > > > > handle this new data scheme. The API may be extended later on real demand, of
> > > > > course.
> > > > > 
> > > > > Otherwise all pieces are already in the current ALSA compress API
> > > > > (capabilities, params, enumeration). The realtime controls may be created
> > > > > using ALSA control API.
> > > > 
> > > > So does this mean that Shengjiu should attempt to use this ALSA approach first?
> > > 
> > > I've not seen any argument to use v4l2 mem2mem buffer scheme for this
> > > data conversion forcefully. It looks like a simple job and ALSA APIs
> > > may be extended for this simple purpose.
> > > 
> > > Shengjiu, what are your requirements for gstreamer support? Would be a
> > > new blocking ioctl enough for the initial support in the compress ALSA
> > > API?
> > 
> > If it works with compress API, it'd be great, yeah.
> > So, your idea is to open compress-offload devices for read and write,
> > then and let them convert a la batch jobs without timing control?
> > 
> > For full-duplex usages, we might need some more extensions, so that
> > both read and write parameters can be synchronized.  (So far the
> > compress stream is a unidirectional, and the runtime buffer for a
> > single stream.)
> > 
> > And the buffer management is based on the fixed size fragments.  I
> > hope this doesn't matter much for the intended operation?
> 
> It's a question, if the standard I/O is really required for this case. My 
> quick idea was to just implement a new "direction" for this job supporting 
> only one ioctl for the data processing which will execute the job in "one 
> shot" at the moment. The I/O may be handled through dma-buf API (which seems 
> to be standard nowadays for this purpose and allows future chaining).
> 
> So something like:
> 
> struct dsp_job {
>     int source_fd;     /* dma-buf FD with source data - for dma_buf_get() */
>     int target_fd;     /* dma-buf FD for target data - for dma_buf_get() */
>     ... maybe some extra data size members here ...
>     ... maybe some special parameters here ...
> };
> 
> #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)
> 
> This ioctl will be blocking (thus synced). My question is, if it's feasible 
> for gstreamer or not. For this particular case, if the rate conversion is 
> implemented in software, it will block the gstreamer data processing, too.

Yes, GStreamer threading is using a push-back model, so blocking for the time of
the processing is fine. Note that the extra simplicity will suffer from ioctl()
latency.

In GFX, they solve this issue with fences. That allows setting up the next
operation in the chain before the data has been produced.

In V4L2, we solve this with queues. They allow preparing the next job while the
current job is being processed. If you look at the v4l2convert code in
gstreamer (for simple m2m), it currently makes no use of the queues; it simply
processes the frames synchronously. There are two options: either it does not
matter that much, or no one is using it :-D Video decoders and encoders
(stateful) do run input / output from different threads to benefit from the
queues.

regards,
Nicolas

> 
> 						Jaroslav
>
Jaroslav Kysela May 16, 2024, 2:50 p.m. UTC | #31
On 15. 05. 24 22:33, Nicolas Dufresne wrote:
> Hi,
> 
> GStreamer hat on ...
> 
> Le mercredi 15 mai 2024 à 12:46 +0200, Jaroslav Kysela a écrit :
>> On 15. 05. 24 12:19, Takashi Iwai wrote:
>>> On Wed, 15 May 2024 11:50:52 +0200,
>>> Jaroslav Kysela wrote:
>>>>
>>>> On 15. 05. 24 11:17, Hans Verkuil wrote:
>>>>> Hi Jaroslav,
>>>>>
>>>>> On 5/13/24 13:56, Jaroslav Kysela wrote:
>>>>>> On 09. 05. 24 13:13, Jaroslav Kysela wrote:
>>>>>>> On 09. 05. 24 12:44, Shengjiu Wang wrote:
>>>>>>>>>> mem2mem is just like the decoder in the compress pipeline. which is
>>>>>>>>>> one of the components in the pipeline.
>>>>>>>>>
>>>>>>>>> I was thinking of loopback with endpoints using compress streams,
>>>>>>>>> without physical endpoint, something like:
>>>>>>>>>
>>>>>>>>> compress playback (to feed data from userspace) -> DSP (processing) ->
>>>>>>>>> compress capture (send data back to userspace)
>>>>>>>>>
>>>>>>>>> Unless I'm missing something, you should be able to process data as fast
>>>>>>>>> as you can feed it and consume it in such case.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Actually in the beginning I tried this,  but it did not work well.
>>>>>>>> ALSA needs time control for playback and capture, playback and capture
>>>>>>>> needs to synchronize.  Usually the playback and capture pipeline is
>>>>>>>> independent in ALSA design,  but in this case, the playback and capture
>>>>>>>> should synchronize, they are not independent.
>>>>>>>
>>>>>>> The core compress API has no strict timing constraints. You can eventually
>>>>>>> have two half-duplex compress devices, if you like to have really independent
>>>>>>> mechanism. If something is missing in API, you can extend this API (like to
>>>>>>> inform the user space that it's a producer/consumer processing without any
>>>>>>> relation to the real time). I like this idea.
>>>>>>
>>>>>> I was thinking more about this. If I am right, the mentioned use in gstreamer
>>>>>> is supposed to run the conversion (DSP) job in "one shot" (can be handled
>>>>>> using one system call like blocking ioctl).  The goal is just to offload the
>>>>>> CPU work to the DSP (co-processor). If there are no requirements for the
>>>>>> queuing, we can implement this ioctl in the compress ALSA API easily using the
>>>>>> data management through the dma-buf API. We can eventually define a new
>>>>>> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow
>>>>>> handle this new data scheme. The API may be extended later on real demand, of
>>>>>> course.
>>>>>>
>>>>>> Otherwise all pieces are already in the current ALSA compress API
>>>>>> (capabilities, params, enumeration). The realtime controls may be created
>>>>>> using ALSA control API.
>>>>>
>>>>> So does this mean that Shengjiu should attempt to use this ALSA approach first?
>>>>
>>>> I've not seen any argument to use v4l2 mem2mem buffer scheme for this
>>>> data conversion forcefully. It looks like a simple job and ALSA APIs
>>>> may be extended for this simple purpose.
>>>>
>>>> Shengjiu, what are your requirements for gstreamer support? Would be a
>>>> new blocking ioctl enough for the initial support in the compress ALSA
>>>> API?
>>>
>>> If it works with compress API, it'd be great, yeah.
>>> So, your idea is to open compress-offload devices for read and write,
>>> then and let them convert a la batch jobs without timing control?
>>>
>>> For full-duplex usages, we might need some more extensions, so that
>>> both read and write parameters can be synchronized.  (So far the
>>> compress stream is a unidirectional, and the runtime buffer for a
>>> single stream.)
>>>
>>> And the buffer management is based on the fixed size fragments.  I
>>> hope this doesn't matter much for the intended operation?
>>
>> It's a question, if the standard I/O is really required for this case. My
>> quick idea was to just implement a new "direction" for this job supporting
>> only one ioctl for the data processing which will execute the job in "one
>> shot" at the moment. The I/O may be handled through dma-buf API (which seems
>> to be standard nowadays for this purpose and allows future chaining).
>>
>> So something like:
>>
>> struct dsp_job {
>>      int source_fd;     /* dma-buf FD with source data - for dma_buf_get() */
>>      int target_fd;     /* dma-buf FD for target data - for dma_buf_get() */
>>      ... maybe some extra data size members here ...
>>      ... maybe some special parameters here ...
>> };
>>
>> #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)
>>
>> This ioctl will be blocking (thus synced). My question is, if it's feasible
>> for gstreamer or not. For this particular case, if the rate conversion is
>> implemented in software, it will block the gstreamer data processing, too.
> 
> Yes, GStreamer threading is using a push-back model, so blocking for the time of
> the processing is fine. Note that the extra simplicity will suffer from ioctl()
> latency.
> 
> In GFX, they solve this issue with fences. That allow setting up the next
> operation in the chain before the data has been produced.

The fences look really nice and seem more modern. It should be possible with 
the dma-buf/sync_file.c interface to handle multiple jobs simultaneously and 
share the state between user space and the kernel driver.

In this case, I think that two non-blocking ioctls should be enough: one to 
add a new job with source/target dma buffers guarded by one fence, and one to 
abort (flush) all active jobs.
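
As a rough sketch of the user-space side, assuming the "add job" ioctl hands
back a sync_file FD for the fence (an assumption, since the interface is not
defined yet): a sync_file FD can be waited on with poll(), and it reports
POLLIN once the fence has signaled. The helper name job_fence_wait() is
hypothetical.

```c
#include <errno.h>
#include <poll.h>

/*
 * Wait for a sync_file fence FD to signal.
 * Returns 1 when the fence has signaled, 0 on timeout, or a negative
 * errno value on failure. A retry after EINTR restarts the full timeout.
 */
int job_fence_wait(int fence_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -errno;
    if (ret == 0)
        return 0;       /* timed out, job still running */
    if (pfd.revents & POLLIN)
        return 1;       /* fence signaled, target buffer is ready */
    return -EIO;        /* POLLERR/POLLNVAL or similar */
}
```

With a timeout of 0 the helper only checks the state, so a caller could queue
the next job first and poll the previous fence afterwards.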

I'll try to propose an API extension for ALSA's compress API on the 
linux-sound mailing list soon.

					Jaroslav
Jaroslav Kysela May 16, 2024, 2:58 p.m. UTC | #32
On 15. 05. 24 15:34, Shengjiu Wang wrote:
> On Wed, May 15, 2024 at 6:46 PM Jaroslav Kysela <perex@perex.cz> wrote:
>>
>> On 15. 05. 24 12:19, Takashi Iwai wrote:
>>> On Wed, 15 May 2024 11:50:52 +0200,
>>> Jaroslav Kysela wrote:
>>>>
>>>> On 15. 05. 24 11:17, Hans Verkuil wrote:
>>>>> Hi Jaroslav,
>>>>>
>>>>> On 5/13/24 13:56, Jaroslav Kysela wrote:
>>>>>> On 09. 05. 24 13:13, Jaroslav Kysela wrote:
>>>>>>> On 09. 05. 24 12:44, Shengjiu Wang wrote:
>>>>>>>>>> mem2mem is just like the decoder in the compress pipeline. which is
>>>>>>>>>> one of the components in the pipeline.
>>>>>>>>>
>>>>>>>>> I was thinking of loopback with endpoints using compress streams,
>>>>>>>>> without physical endpoint, something like:
>>>>>>>>>
>>>>>>>>> compress playback (to feed data from userspace) -> DSP (processing) ->
>>>>>>>>> compress capture (send data back to userspace)
>>>>>>>>>
>>>>>>>>> Unless I'm missing something, you should be able to process data as fast
>>>>>>>>> as you can feed it and consume it in such case.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Actually in the beginning I tried this,  but it did not work well.
>>>>>>>> ALSA needs time control for playback and capture, playback and capture
>>>>>>>> needs to synchronize.  Usually the playback and capture pipeline is
>>>>>>>> independent in ALSA design,  but in this case, the playback and capture
>>>>>>>> should synchronize, they are not independent.
>>>>>>>
>>>>>>> The compress API core has no strict timing constraints. You can eventually
>>>>>>> have two half-duplex compress devices, if you like to have really independent
>>>>>>> mechanism. If something is missing in API, you can extend this API (like to
>>>>>>> inform the user space that it's a producer/consumer processing without any
>>>>>>> relation to the real time). I like this idea.
>>>>>>
>>>>>> I was thinking more about this. If I am right, the mentioned use in gstreamer
>>>>>> is supposed to run the conversion (DSP) job in "one shot" (can be handled
>>>>>> using one system call like blocking ioctl).  The goal is just to offload the
>>>>>> CPU work to the DSP (co-processor). If there are no requirements for the
>>>>>> queuing, we can implement this ioctl in the compress ALSA API easily using the
>>>>>> data management through the dma-buf API. We can eventually define a new
>>>>>> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow
>>>>>> handle this new data scheme. The API may be extended later on real demand, of
>>>>>> course.
>>>>>>
>>>>>> Otherwise all pieces are already in the current ALSA compress API
>>>>>> (capabilities, params, enumeration). The realtime controls may be created
>>>>>> using ALSA control API.
>>>>>
>>>>> So does this mean that Shengjiu should attempt to use this ALSA approach first?
>>>>
>>>> I've not seen any argument to use v4l2 mem2mem buffer scheme for this
>>>> data conversion forcefully. It looks like a simple job and ALSA APIs
>>>> may be extended for this simple purpose.
>>>>
>>>> Shengjiu, what are your requirements for gstreamer support? Would be a
>>>> new blocking ioctl enough for the initial support in the compress ALSA
>>>> API?
>>>
>>> If it works with compress API, it'd be great, yeah.
>>> So, your idea is to open compress-offload devices for read and write,
>>> and then let them convert a la batch jobs without timing control?
>>>
>>> For full-duplex usages, we might need some more extensions, so that
>>> both read and write parameters can be synchronized.  (So far the
>>> compress stream is unidirectional, and the runtime buffer is for a
>>> single stream.)
>>>
>>> And the buffer management is based on the fixed size fragments.  I
>>> hope this doesn't matter much for the intended operation?
>>
>> It's a question, if the standard I/O is really required for this case. My
>> quick idea was to just implement a new "direction" for this job supporting
>> only one ioctl for the data processing which will execute the job in "one
>> shot" at the moment. The I/O may be handled through dma-buf API (which seems
>> to be standard nowadays for this purpose and allows future chaining).
>>
>> So something like:
>>
>> struct dsp_job {
>>      int source_fd;     /* dma-buf FD with source data - for dma_buf_get() */
>>      int target_fd;     /* dma-buf FD for target data - for dma_buf_get() */
>>      ... maybe some extra data size members here ...
>>      ... maybe some special parameters here ...
>> };
>>
>> #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)
>>
>> This ioctl will be blocking (thus synced). My question is, if it's feasible
>> for gstreamer or not. For this particular case, if the rate conversion is
>> implemented in software, it will block the gstreamer data processing, too.
>>
> 
> Thanks.
> 
> I have several questions:
> 1.  Compress API always binds to a sound card.  Can we avoid that?
>       For ASRC, it is just one component,

Is this a real issue? I would usually expect sound hardware (a card) to be 
present when ASRC is available, or not? Alternatively, a separate sound card 
with one compress device may be created, too. For enumeration, the user space 
may just iterate through all sound cards / compress devices to find ASRC in 
the system.
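
The enumeration could be sketched roughly as below, assuming the usual
/dev/snd/comprC<card>D<device> node naming. The helper names
(parse_compr_node, list_compr_nodes) are made up for illustration, and the
sketch only lists the nodes; deciding which one is an ASRC would still need a
capability query on each device, which is left out here.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Return 1 if name looks like "comprC<card>D<device>", else 0. */
static int parse_compr_node(const char *name,
			    unsigned int *card, unsigned int *device)
{
	int end = 0;

	assert(name != NULL);
	if (sscanf(name, "comprC%uD%u%n", card, device, &end) != 2)
		return 0;
	/* Reject trailing characters after the device number. */
	return name[end] == '\0';
}

/* Print every compress node found in dir; returns the count or -1. */
static int list_compr_nodes(const char *dir)
{
	struct dirent *de;
	unsigned int card, device;
	int count = 0;
	DIR *d = opendir(dir);

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (parse_compr_node(de->d_name, &card, &device)) {
			printf("card %u, compress device %u\n", card, device);
			count++;
		}
	}
	closedir(d);
	return count;
}
```

Calling list_compr_nodes("/dev/snd") would then give the candidate devices to
open and query.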

The devices/interfaces in the sound card are independent. For example, USB 
MIDI converters also offer only one serial MIDI interface.

> 2.  Compress API doesn't seem to support mmap().  Is this a problem
>       for sending and getting data to/from the driver?

I proposed to use dma-buf for I/O (separate source and target buffer).

> 3. How does the user get output data from ASRC after each conversion?
>     it should happen every period.

target dma-buf

				Jaroslav