Message ID | 1710834674-3285-1-git-send-email-shengjiu.wang@nxp.com (mailing list archive)
---|---
Series | Add audio support in v4l2 framework
Hey Shengjiu,

first of all, thanks for all of this work, and I am very sorry for only emerging this late into the series; I sadly didn't notice it earlier.

I would like to voice a few concerns about the general idea of adding audio support to the Media subsystem.

1. The biggest objection is that the Linux kernel has a subsystem specifically targeted at audio devices; adding support for these devices in another subsystem is counterproductive, as it works around the shortcomings of the audio subsystem while forcing support for a device into a subsystem that was never designed for such devices. Instead, the audio subsystem has to be adjusted to be able to support all of the required workflows; otherwise, the next audio driver with similar requirements will have to move to the media subsystem as well, the audio subsystem would then never experience the required change, and soon we would have two audio subsystems.

2. Closely connected to the previous objection, the media subsystem with its current staff of maintainers is overworked and barely capable of handling the workload, which includes an abundance of different devices: DVB, codecs, cameras, PCI devices, radio tuners, HDMI CEC, IR receivers, etc. Adding more device types to this matrix will make the situation worse and should only be done with a plan for how to first improve the current maintainer situation.

3. By using the same framework and APIs as the video codecs, the audio codecs are going to cause extra work for the video codec developers and maintainers simply by occupying a space that was originally designed for video only. Even if you try not to cause any extra stress, the simple presence of the audio code in the codebase is going to cause restrictions.

The main issue here is that the audio subsystem doesn't provide a mem2mem framework. I would say you are in luck, because the media subsystem has gathered a lot of shortcomings with its current implementation of the mem2mem framework over time, which is why a new implementation will be necessary anyway.

So instead of hammering a driver into the wrong destination, I would suggest bundling our forces and implementing a general memory-to-memory framework that both the media and the audio subsystem can use, one that addresses the current shortcomings of the implementation and allows you to upload the driver where it is supposed to be. This is going to cause restrictions as well, as mentioned in concern number 3, but with the difference that we can make a general plan for such a framework that accommodates lots of use cases, and each subsystem can add its own routines on top of the general framework.

Another possible alternative is to try to make the DRM scheduler more generally available; this scheduler is the most mature and is in fact very similar to what you and the media devices need. Which again just shows how common your use case actually is and how a general solution is the best long-term solution.

Please notice that Daniel Almeida is currently working on something related to this:
https://lore.kernel.org/linux-media/3F80AC0D-DCAA-4EDE-BF58-BB1369C7EDCA@collabora.com/T/#u

If the top-level maintainers decide to add the patchset, so be it, but I wanted to voice my concerns and also highlight that this is likely going to cause extra stress for the video codec maintainers and the maintainers in general.
We cannot spend a lot of time on audio codecs, as video codecs already fill up our available time sufficiently, so the use of the framework needs to be conservative and cause as little extra work as possible for the original use case of the framework. Regards, Sebastian On 19.03.2024 15:50, Shengjiu Wang wrote: >Audio signal processing also has the requirement for memory to >memory similar as Video. > >This asrc memory to memory (memory ->asrc->memory) case is a non >real time use case. > >User fills the input buffer to the asrc module, after conversion, then asrc >sends back the output buffer to user. So it is not a traditional ALSA playback >and capture case. > >It is a specific use case, there is no reference in current kernel. >v4l2 memory to memory is the closed implementation, v4l2 current >support video, image, radio, tuner, touch devices, so it is not >complicated to add support for this specific audio case. > >Because we had implemented the "memory -> asrc ->i2s device-> codec" >use case in ALSA. Now the "memory->asrc->memory" needs >to reuse the code in asrc driver, so the first 3 patches is for refining >the code to make it can be shared by the "memory->asrc->memory" >driver. > >The main change is in the v4l2 side, A /dev/vl4-audioX will be created, >user applications only use the ioctl of v4l2 framework. > >Other change is to add memory to memory support for two kinds of i.MX ASRC >module. > >changes in v15: >- update MAINTAINERS for imx-asrc.c and vim2m-audio.c > >changes in v14: >- document the reservation of 'AUXX' fourcc format. >- add v4l2_audfmt_to_fourcc() definition. > >changes in v13 >- change 'pixelformat' to 'audioformat' in dev-audio-mem2mem.rst >- add more description for clock drift in ext-ctrls-audio-m2m.rst >- Add "media: v4l2-ctrls: add support for fraction_bits" from Hans > to avoid build issue for kernel test robot > >changes in v12 >- minor changes according to comments >- drop min_buffers_needed = 1 and V4L2_CTRL_FLAG_UPDATE flag >- drop bus_info > >changes in v11 >- add add-fixed-point-test-controls in vivid. >- add v4l2_ctrl_fp_compose() helper function for min and max > >changes in v10 >- remove FIXED_POINT type >- change code base on media: v4l2-ctrls: add support for fraction_bits >- fix issue reported by kernel test robot >- remove module_alias > >changes in v9: >- add MEDIA_ENT_F_PROC_AUDIO_RESAMPLER. >- add MEDIA_INTF_T_V4L_AUDIO >- add media controller support >- refine the vim2m-audio to support 8k<->16k conversion. > >changes in v8: >- refine V4L2_CAP_AUDIO_M2M to be 0x00000008 >- update doc for FIXED_POINT >- address comments for imx-asrc > >changes in v7: >- add acked-by from Mark >- separate commit for fixed point, m2m audio class, audio rate controls >- use INTEGER_MENU for rate, FIXED_POINT for rate offset >- remove used fmts >- address other comments for Hans > >changes in v6: >- use m2m_prepare/m2m_unprepare/m2m_start/m2m_stop to replace > m2m_start_part_one/m2m_stop_part_one, m2m_start_part_two/m2m_stop_part_two. >- change V4L2_CTRL_TYPE_ASRC_RATE to V4L2_CTRL_TYPE_FIXED_POINT >- fix warning by kernel test rebot >- remove some unused format V4L2_AUDIO_FMT_XX >- Get SNDRV_PCM_FORMAT from V4L2_AUDIO_FMT in driver. >- rename audm2m to viaudm2m. > >changes in v5: >- remove V4L2_AUDIO_FMT_LPCM >- define audio pixel format like V4L2_AUDIO_FMT_S8... >- remove rate and format in struct v4l2_audio_format. >- Add V4L2_CID_ASRC_SOURCE_RATE and V4L2_CID_ASRC_DEST_RATE controls >- updata document accordingly. 
> >changes in v4: >- update document style >- separate V4L2_AUDIO_FMT_LPCM and V4L2_CAP_AUDIO_M2M in separate commit > >changes in v3: >- Modify documents for adding audio m2m support >- Add audio virtual m2m driver >- Defined V4L2_AUDIO_FMT_LPCM format type for audio. >- Defined V4L2_CAP_AUDIO_M2M capability type for audio m2m case. >- with modification in v4l-utils, pass v4l2-compliance test. > >changes in v2: >- decouple the implementation in v4l2 and ALSA >- implement the memory to memory driver as a platfrom driver > and move it to driver/media >- move fsl_asrc_common.h to include/sound folder > >Hans Verkuil (1): > media: v4l2-ctrls: add support for fraction_bits > >Shengjiu Wang (15): > ASoC: fsl_asrc: define functions for memory to memory usage > ASoC: fsl_easrc: define functions for memory to memory usage > ASoC: fsl_asrc: move fsl_asrc_common.h to include/sound > ASoC: fsl_asrc: register m2m platform device > ASoC: fsl_easrc: register m2m platform device > media: uapi: Add V4L2_CAP_AUDIO_M2M capability flag > media: v4l2: Add audio capture and output support > media: uapi: Define audio sample format fourcc type > media: uapi: Add V4L2_CTRL_CLASS_M2M_AUDIO > media: uapi: Add audio rate controls support > media: uapi: Declare interface types for Audio > media: uapi: Add an entity type for audio resampler > media: vivid: add fixed point test controls > media: imx-asrc: Add memory to memory driver > media: vim2m-audio: add virtual driver for audio memory to memory > > .../media/mediactl/media-types.rst | 11 + > .../userspace-api/media/v4l/buffer.rst | 6 + > .../userspace-api/media/v4l/common.rst | 1 + > .../media/v4l/dev-audio-mem2mem.rst | 71 + > .../userspace-api/media/v4l/devices.rst | 1 + > .../media/v4l/ext-ctrls-audio-m2m.rst | 59 + > .../userspace-api/media/v4l/pixfmt-audio.rst | 100 ++ > .../userspace-api/media/v4l/pixfmt.rst | 1 + > .../media/v4l/vidioc-enum-fmt.rst | 2 + > .../media/v4l/vidioc-g-ext-ctrls.rst | 4 + > .../userspace-api/media/v4l/vidioc-g-fmt.rst | 4 + > .../media/v4l/vidioc-querycap.rst | 3 + > .../media/v4l/vidioc-queryctrl.rst | 11 +- > .../media/videodev2.h.rst.exceptions | 3 + > MAINTAINERS | 17 + > .../media/common/videobuf2/videobuf2-v4l2.c | 4 + > drivers/media/platform/nxp/Kconfig | 13 + > drivers/media/platform/nxp/Makefile | 1 + > drivers/media/platform/nxp/imx-asrc.c | 1256 +++++++++++++++++ > drivers/media/test-drivers/Kconfig | 10 + > drivers/media/test-drivers/Makefile | 1 + > drivers/media/test-drivers/vim2m-audio.c | 793 +++++++++++ > drivers/media/test-drivers/vivid/vivid-core.h | 2 + > .../media/test-drivers/vivid/vivid-ctrls.c | 26 + > drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 9 + > drivers/media/v4l2-core/v4l2-ctrls-api.c | 1 + > drivers/media/v4l2-core/v4l2-ctrls-core.c | 93 +- > drivers/media/v4l2-core/v4l2-ctrls-defs.c | 10 + > drivers/media/v4l2-core/v4l2-dev.c | 21 + > drivers/media/v4l2-core/v4l2-ioctl.c | 66 + > drivers/media/v4l2-core/v4l2-mem2mem.c | 13 +- > include/media/v4l2-ctrls.h | 13 +- > include/media/v4l2-dev.h | 2 + > include/media/v4l2-ioctl.h | 34 + > .../fsl => include/sound}/fsl_asrc_common.h | 60 + > include/uapi/linux/media.h | 2 + > include/uapi/linux/v4l2-controls.h | 9 + > include/uapi/linux/videodev2.h | 50 +- > sound/soc/fsl/fsl_asrc.c | 144 ++ > sound/soc/fsl/fsl_asrc.h | 4 +- > sound/soc/fsl/fsl_asrc_dma.c | 2 +- > sound/soc/fsl/fsl_easrc.c | 233 +++ > sound/soc/fsl/fsl_easrc.h | 6 +- > 43 files changed, 3145 insertions(+), 27 deletions(-) > create mode 100644 
Documentation/userspace-api/media/v4l/dev-audio-mem2mem.rst > create mode 100644 Documentation/userspace-api/media/v4l/ext-ctrls-audio-m2m.rst > create mode 100644 Documentation/userspace-api/media/v4l/pixfmt-audio.rst > create mode 100644 drivers/media/platform/nxp/imx-asrc.c > create mode 100644 drivers/media/test-drivers/vim2m-audio.c > rename {sound/soc/fsl => include/sound}/fsl_asrc_common.h (60%) > >-- >2.34.1 > >
On 30/04/2024 10:21, Sebastian Fricke wrote: > Hey Shengjiu, > > first of all thanks for all of this work and I am very sorry for only > emerging this late into the series, I sadly didn't notice it earlier. > > I would like to voice a few concerns about the general idea of adding > Audio support to the Media subsystem. > > 1. The biggest objection is, that the Linux Kernel has a subsystem > specifically targeted for audio devices, adding support for these > devices in another subsystem are counterproductive as they work around > the shortcomings of the audio subsystem while forcing support for a > device into a subsystem that was never designed for such devices. > Instead, the audio subsystem has to be adjusted to be able to support > all of the required workflows, otherwise, the next audio driver with > similar requirements will have to move to the media subsystem as well, > the audio subsystem would then never experience the required change and > soon we would have two audio subsystems. > > 2. Closely connected to the previous objection, the media subsystem with > its current staff of maintainers is overworked and barely capable of > handling the workload, which includes an abundance of different devices > from DVB, codecs, cameras, PCI devices, radio tuners, HDMI CEC, IR > receivers, etc. Adding more device types to this matrix will make the > situation worse and should only be done with a plan for how first to > improve the current maintainer situation. > > 3. By using the same framework and APIs as the video codecs, the audio > codecs are going to cause extra work for the video codec developers and > maintainers simply by occupying the same space that was orginally > designed for the purpose of video only. Even if you try to not cause any > extra stress the simple presence of the audio code in the codebase is > going to cause restrictions. > > The main issue here is that the audio subsystem doesn't provide a > mem2mem framework and I would say you are in luck because the media > subsystem has gathered a lot of shortcomings with its current > implementation of the mem2mem framework over time, which is why a new > implementation will be necessary anyway. > > So instead of hammering a driver into the wrong destination, I would > suggest bundling our forces and implementing a general memory-to-memory > framework that both the media and the audio subsystem can use, that > addresses the current shortcomings of the implementation and allows you > to upload the driver where it is supposed to be. > This is going to cause restrictions as well, like mentioned in the > concern number 3, but with the difference that we can make a general > plan for such a framework that accomodates lots of use cases and each > subsystem can add their routines on top of the general framework. > > Another possible alternative is to try and make the DRM scheduler more > generally available, this scheduler is the most mature and in fact is > very similar to what you and what the media devices need. > Which again just shows how common your usecase actually is and how a > general solution is the best long term solution. 
> > Please notice that Daniel Almeida is currently working on something > related to this: > https://lore.kernel.org/linux-media/3F80AC0D-DCAA-4EDE-BF58-BB1369C7EDCA@collabora.com/T/#u > > If the toplevel maintainers decide to add the patchset so be it, but I > wanted to voice my concerns and also highlight that this is likely going > to cause extra stress for the video codecs maintainers and the > maintainers in general. We cannot spend a lot of time on audio codecs, > as video codecs already fill up our available time sufficiently, > so the use of the framework needs to be conservative and cause as little > extra work as possible for the original use case of the framework. I would really like to get the input of the audio maintainers on this. Sebastian has a good point, especially with us being overworked :-) Having a shared mem2mem framework would certainly be nice, on the other hand, developing that will most likely take a substantial amount of time. Perhaps it is possible to copy the current media v4l2-mem2mem.c and turn it into an alsa-mem2mem.c? I really do not know enough about the alsa subsystem to tell if that is possible. While this driver is a rate converter, not an audio codec, the same principles would apply to off-line audio codecs as well. And it is true that we definitely do not want to support audio codecs in the media subsystem. Accepting this driver creates a precedent and would open the door for audio codecs. I may have been too hasty in saying yes to this, I did not consider the wider implications for our workload and what it can lead to. I sincerely apologize to Shengjiu Wang as it is no fun to end up in a situation like this. Regards, Hans
Em Tue, 30 Apr 2024 10:47:13 +0200 Hans Verkuil <hverkuil@xs4all.nl> escreveu: > On 30/04/2024 10:21, Sebastian Fricke wrote: > > Hey Shengjiu, > > > > first of all thanks for all of this work and I am very sorry for only > > emerging this late into the series, I sadly didn't notice it earlier. > > > > I would like to voice a few concerns about the general idea of adding > > Audio support to the Media subsystem. > > > > 1. The biggest objection is, that the Linux Kernel has a subsystem > > specifically targeted for audio devices, adding support for these > > devices in another subsystem are counterproductive as they work around > > the shortcomings of the audio subsystem while forcing support for a > > device into a subsystem that was never designed for such devices. > > Instead, the audio subsystem has to be adjusted to be able to support > > all of the required workflows, otherwise, the next audio driver with > > similar requirements will have to move to the media subsystem as well, > > the audio subsystem would then never experience the required change and > > soon we would have two audio subsystems. > > > > 2. Closely connected to the previous objection, the media subsystem with > > its current staff of maintainers is overworked and barely capable of > > handling the workload, which includes an abundance of different devices > > from DVB, codecs, cameras, PCI devices, radio tuners, HDMI CEC, IR > > receivers, etc. Adding more device types to this matrix will make the > > situation worse and should only be done with a plan for how first to > > improve the current maintainer situation. > > > > 3. By using the same framework and APIs as the video codecs, the audio > > codecs are going to cause extra work for the video codec developers and > > maintainers simply by occupying the same space that was orginally > > designed for the purpose of video only. Even if you try to not cause any > > extra stress the simple presence of the audio code in the codebase is > > going to cause restrictions. > > > > The main issue here is that the audio subsystem doesn't provide a > > mem2mem framework and I would say you are in luck because the media > > subsystem has gathered a lot of shortcomings with its current > > implementation of the mem2mem framework over time, which is why a new > > implementation will be necessary anyway. > > > > So instead of hammering a driver into the wrong destination, I would > > suggest bundling our forces and implementing a general memory-to-memory > > framework that both the media and the audio subsystem can use, that > > addresses the current shortcomings of the implementation and allows you > > to upload the driver where it is supposed to be. > > This is going to cause restrictions as well, like mentioned in the > > concern number 3, but with the difference that we can make a general > > plan for such a framework that accomodates lots of use cases and each > > subsystem can add their routines on top of the general framework. > > > > Another possible alternative is to try and make the DRM scheduler more > > generally available, this scheduler is the most mature and in fact is > > very similar to what you and what the media devices need. > > Which again just shows how common your usecase actually is and how a > > general solution is the best long term solution. 
> > > > Please notice that Daniel Almeida is currently working on something > > related to this: > > https://lore.kernel.org/linux-media/3F80AC0D-DCAA-4EDE-BF58-BB1369C7EDCA@collabora.com/T/#u > > > > If the toplevel maintainers decide to add the patchset so be it, but I > > wanted to voice my concerns and also highlight that this is likely going > > to cause extra stress for the video codecs maintainers and the > > maintainers in general. We cannot spend a lot of time on audio codecs, > > as video codecs already fill up our available time sufficiently, > > so the use of the framework needs to be conservative and cause as little > > extra work as possible for the original use case of the framework. > > I would really like to get the input of the audio maintainers on this. > Sebastian has a good point, especially with us being overworked :-) > > Having a shared mem2mem framework would certainly be nice, on the other > hand, developing that will most likely take a substantial amount of time. > > Perhaps it is possible to copy the current media v4l2-mem2mem.c and turn > it into an alsa-mem2mem.c? I really do not know enough about the alsa > subsystem to tell if that is possible. > > While this driver is a rate converter, not an audio codec, the same > principles would apply to off-line audio codecs as well. And it is true > that we definitely do not want to support audio codecs in the media > subsystem. > > Accepting this driver creates a precedent and would open the door for > audio codecs. > > I may have been too hasty in saying yes to this, I did not consider > the wider implications for our workload and what it can lead to. I > sincerely apologize to Shengjiu Wang as it is no fun to end up in a > situation like this. I agree with both Sebastian and Hans here: media devices always had audio streams, even on old PCI analog TV devices like bttv. There are even some devices like the ones based on usb em28xx that contains an AC97 chip on it. The decision was always to have audio supported by ALSA APIs/subsystem, as otherwise we'll end duplicating code and reinventing the wheel with new incompatible APIs for audio in and outside media, creating unneeded complexity, which will end being reflected on userspace as well. So, IMO it makes a lot more sense to place audio codecs and processor blocks inside ALSA, probably as part of ALSA SOF, if possible. Hans suggestion of forking v4l2-mem2mem.c on ALSA seems a good starting point. Also, moving the DRM mem2mem functionality to a core library that could be re-used by the three subsystems sounds a good idea, but I suspect that a change like that could be more time-consuming. Regards, Mauro
On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote: > first of all thanks for all of this work and I am very sorry for only > emerging this late into the series, I sadly didn't notice it earlier. It might be worth checking out the discussion on earlier versions... > 1. The biggest objection is, that the Linux Kernel has a subsystem > specifically targeted for audio devices, adding support for these > devices in another subsystem are counterproductive as they work around > the shortcomings of the audio subsystem while forcing support for a > device into a subsystem that was never designed for such devices. > Instead, the audio subsystem has to be adjusted to be able to support > all of the required workflows, otherwise, the next audio driver with > similar requirements will have to move to the media subsystem as well, > the audio subsystem would then never experience the required change and > soon we would have two audio subsystems. The discussion around this originally was that all the audio APIs are very much centered around real time operations rather than completely async memory to memory operations and that it's not clear that it's worth reinventing the wheel simply for the sake of having things in ALSA when that's already pretty idiomatic for the media subsystem. It wasn't the memory to memory bit per se, it was the disconnection from any timing. > So instead of hammering a driver into the wrong destination, I would > suggest bundling our forces and implementing a general memory-to-memory > framework that both the media and the audio subsystem can use, that > addresses the current shortcomings of the implementation and allows you > to upload the driver where it is supposed to be. That doesn't sound like an immediate solution to maintainer overload issues... if something like this is going to happen the DRM solution does seem more general but I'm not sure the amount of stop energy is proportionate.
On 30. 04. 24 16:46, Mark Brown wrote: >> So instead of hammering a driver into the wrong destination, I would >> suggest bundling our forces and implementing a general memory-to-memory >> framework that both the media and the audio subsystem can use, that >> addresses the current shortcomings of the implementation and allows you >> to upload the driver where it is supposed to be. > > That doesn't sound like an immediate solution to maintainer overload > issues... if something like this is going to happen the DRM solution > does seem more general but I'm not sure the amount of stop energy is > proportionate. ALSA's "do what you want" hwdep device/interface could be used to transfer data in and out of the SRC using custom read/write/ioctl/mmap syscalls. The question is whether the changes could not be simpler for a first implementation, keeping the hardware enumeration in the one subsystem where the driver code is placed. I also see the benefit of reusing an already existing framework (but is v4l2 the right one?). Jaroslav
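For illustration, a hwdep-based SRC interface of the kind mentioned here could be driven from userspace roughly as sketched below; the ioctl request number and the rate-configuration structure are hypothetical placeholders, since no such driver-private interface exists today. Only the generic alsa-lib hwdep calls are real.

```c
/* Hypothetical hwdep-based SRC usage sketch (alsa-lib).                 */
/* SNDRV_SRC_IOCTL_SET_RATES and struct src_rates are made-up examples   */
/* of what a driver-private interface might expose; they do not exist.   */
#include <alsa/asoundlib.h>

struct src_rates {                  /* hypothetical driver-private layout */
	unsigned int rate_in;
	unsigned int rate_out;
};
#define SNDRV_SRC_IOCTL_SET_RATES 0 /* hypothetical request number */

int convert(const void *in, size_t in_len, void *out, size_t out_len)
{
	snd_hwdep_t *hw;
	struct src_rates rates = { .rate_in = 48000, .rate_out = 16000 };

	if (snd_hwdep_open(&hw, "hw:0,0", SND_HWDEP_OPEN_DUPLEX) < 0)
		return -1;
	/* Configure the converter through a driver-private ioctl. */
	snd_hwdep_ioctl(hw, SNDRV_SRC_IOCTL_SET_RATES, &rates);

	/* Push source samples in, read converted samples back out;    */
	/* the read/write semantics are whatever the driver defines.   */
	snd_hwdep_write(hw, (void *)in, in_len);
	ssize_t got = snd_hwdep_read(hw, out, out_len);

	snd_hwdep_close(hw);
	return got < 0 ? -1 : 0;
}
```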
Em Tue, 30 Apr 2024 23:46:03 +0900 Mark Brown <broonie@kernel.org> escreveu: > On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote: > > > first of all thanks for all of this work and I am very sorry for only > > emerging this late into the series, I sadly didn't notice it earlier. > > It might be worth checking out the discussion on earlier versions... > > > 1. The biggest objection is, that the Linux Kernel has a subsystem > > specifically targeted for audio devices, adding support for these > > devices in another subsystem are counterproductive as they work around > > the shortcomings of the audio subsystem while forcing support for a > > device into a subsystem that was never designed for such devices. > > Instead, the audio subsystem has to be adjusted to be able to support > > all of the required workflows, otherwise, the next audio driver with > > similar requirements will have to move to the media subsystem as well, > > the audio subsystem would then never experience the required change and > > soon we would have two audio subsystems. > > The discussion around this originally was that all the audio APIs are > very much centered around real time operations rather than completely > async memory to memory operations and that it's not clear that it's > worth reinventing the wheel simply for the sake of having things in > ALSA when that's already pretty idiomatic for the media subsystem. It > wasn't the memory to memory bit per se, it was the disconnection from > any timing. The media subsystem is also centered around real time. Without real time, you can't have a decent video conference system. Having mem2mem transfers actually help reducing real time delays, as it avoids extra latency due to CPU congestion and/or data transfers from/to userspace. > > > So instead of hammering a driver into the wrong destination, I would > > suggest bundling our forces and implementing a general memory-to-memory > > framework that both the media and the audio subsystem can use, that > > addresses the current shortcomings of the implementation and allows you > > to upload the driver where it is supposed to be. > > That doesn't sound like an immediate solution to maintainer overload > issues... if something like this is going to happen the DRM solution > does seem more general but I'm not sure the amount of stop energy is > proportionate. I don't think maintainer overload is the issue here. The main point is to avoid a fork at the audio uAPI, plus the burden of re-inventing the wheel with new codes for audio formats, new documentation for them, etc. Regards, Mauro
On Tue, Apr 30, 2024 at 05:27:52PM +0100, Mauro Carvalho Chehab wrote: > Mark Brown <broonie@kernel.org> escreveu: > > On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote: > > The discussion around this originally was that all the audio APIs are > > very much centered around real time operations rather than completely > The media subsystem is also centered around real time. Without real > time, you can't have a decent video conference system. Having > mem2mem transfers actually help reducing real time delays, as it > avoids extra latency due to CPU congestion and/or data transfers > from/to userspace. Real time means strongly tied to wall clock times rather than fast - the issue was that all the ALSA APIs are based around pushing data through the system based on a clock. > > That doesn't sound like an immediate solution to maintainer overload > > issues... if something like this is going to happen the DRM solution > > does seem more general but I'm not sure the amount of stop energy is > > proportionate. > I don't think maintainer overload is the issue here. The main > point is to avoid a fork at the audio uAPI, plus the burden > of re-inventing the wheel with new codes for audio formats, > new documentation for them, etc. I thought that discussion had been had already at one of the earlier versions? TBH I've not really been paying attention to this since the very early versions where I raised some similar "why is this in media" points and I thought everyone had decided that this did actually make sense.
On Wed, 01 May 2024 03:56:15 +0200, Mark Brown wrote: > > On Tue, Apr 30, 2024 at 05:27:52PM +0100, Mauro Carvalho Chehab wrote: > > Mark Brown <broonie@kernel.org> escreveu: > > > On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote: > > > > The discussion around this originally was that all the audio APIs are > > > very much centered around real time operations rather than completely > > > The media subsystem is also centered around real time. Without real > > time, you can't have a decent video conference system. Having > > mem2mem transfers actually help reducing real time delays, as it > > avoids extra latency due to CPU congestion and/or data transfers > > from/to userspace. > > Real time means strongly tied to wall clock times rather than fast - the > issue was that all the ALSA APIs are based around pushing data through > the system based on a clock. > > > > That doesn't sound like an immediate solution to maintainer overload > > > issues... if something like this is going to happen the DRM solution > > > does seem more general but I'm not sure the amount of stop energy is > > > proportionate. > > > I don't think maintainer overload is the issue here. The main > > point is to avoid a fork at the audio uAPI, plus the burden > > of re-inventing the wheel with new codes for audio formats, > > new documentation for them, etc. > > I thought that discussion had been had already at one of the earlier > versions? TBH I've not really been paying attention to this since the > very early versions where I raised some similar "why is this in media" > points and I thought everyone had decided that this did actually make > sense. Yeah, it was discussed in v1 and v2 threads, e.g. https://patchwork.kernel.org/project/linux-media/cover/1690265540-25999-1-git-send-email-shengjiu.wang@nxp.com/#25485573 My argument at that time was how the operation would be, and the point was that it'd be a "batch-like" operation via M2M without any timing control. It'd be a very special usage for for ALSA, and if any, it'd be hwdep -- that is a very hardware-specific API implementation -- or try compress-offload API, which looks dubious. OTOH, the argument was that there is already a framework for M2M in media API and that also fits for the batch-like operation, too. So was the thread evolved until now. thanks, Takashi
Em Thu, 02 May 2024 09:46:14 +0200 Takashi Iwai <tiwai@suse.de> escreveu: > On Wed, 01 May 2024 03:56:15 +0200, > Mark Brown wrote: > > > > On Tue, Apr 30, 2024 at 05:27:52PM +0100, Mauro Carvalho Chehab wrote: > > > Mark Brown <broonie@kernel.org> escreveu: > > > > On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote: > > > > > > The discussion around this originally was that all the audio APIs are > > > > very much centered around real time operations rather than completely > > > > > The media subsystem is also centered around real time. Without real > > > time, you can't have a decent video conference system. Having > > > mem2mem transfers actually help reducing real time delays, as it > > > avoids extra latency due to CPU congestion and/or data transfers > > > from/to userspace. > > > > Real time means strongly tied to wall clock times rather than fast - the > > issue was that all the ALSA APIs are based around pushing data through > > the system based on a clock. > > > > > > That doesn't sound like an immediate solution to maintainer overload > > > > issues... if something like this is going to happen the DRM solution > > > > does seem more general but I'm not sure the amount of stop energy is > > > > proportionate. > > > > > I don't think maintainer overload is the issue here. The main > > > point is to avoid a fork at the audio uAPI, plus the burden > > > of re-inventing the wheel with new codes for audio formats, > > > new documentation for them, etc. > > > > I thought that discussion had been had already at one of the earlier > > versions? TBH I've not really been paying attention to this since the > > very early versions where I raised some similar "why is this in media" > > points and I thought everyone had decided that this did actually make > > sense. > > Yeah, it was discussed in v1 and v2 threads, e.g. > https://patchwork.kernel.org/project/linux-media/cover/1690265540-25999-1-git-send-email-shengjiu.wang@nxp.com/#25485573 > > My argument at that time was how the operation would be, and the point > was that it'd be a "batch-like" operation via M2M without any timing > control. It'd be a very special usage for for ALSA, and if any, it'd > be hwdep -- that is a very hardware-specific API implementation -- or > try compress-offload API, which looks dubious. > > OTOH, the argument was that there is already a framework for M2M in > media API and that also fits for the batch-like operation, too. So > was the thread evolved until now. M2M transfers are not a hardware-specific API, and such kind of transfers is not new either. Old media devices like bttv have internally a way to do PCI2PCI transfers, allowing media streams to be transferred directly without utilizing CPU. The media driver supports it for video, as this made a huge difference of performance back then. On embedded world, this is a pretty common scenario: different media IP blocks can communicate with each other directly via memory. This can happen for video capture, video display and audio. With M2M, most of the control is offloaded to the hardware. There are still time control associated with it, as audio and video needs to be in sync. This is done by controlling the buffers size and could be fine-tuned by checking when the buffer transfer is done. On media, M2M buffer transfers are started via VIDIOC_QBUF, which is a request to do a frame transfer. A similar ioctl (VIDIOC_DQBUF) is used to monitor when the hardware finishes transfering the buffer. On other words, the CPU is responsible for time control. 
In other words, this is still real time. The main difference from a "sync" transfer is that the CPU doesn't need to copy data from/to different devices, as that operation is offloaded to the hardware. Regards, Mauro
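To make the QBUF/DQBUF flow described above concrete, a minimal userspace sketch could look as follows. It uses the existing single-planar video buffer types purely for illustration (the audio series defines its own buffer formats), assumes a hypothetical /dev/video0 m2m node, and omits format negotiation and error handling.

```c
/* Minimal V4L2 mem2mem sketch: queue one OUTPUT (source) buffer,       */
/* stream on, then dequeue the processed CAPTURE (result) buffer.       */
/* Assumes /dev/video0 is an m2m device; error handling is omitted.     */
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/videodev2.h>

static void *map_one(int fd, enum v4l2_buf_type type, struct v4l2_buffer *buf)
{
	struct v4l2_requestbuffers req = { .count = 1, .type = type,
					   .memory = V4L2_MEMORY_MMAP };
	ioctl(fd, VIDIOC_REQBUFS, &req);

	memset(buf, 0, sizeof(*buf));
	buf->type = type;
	buf->memory = V4L2_MEMORY_MMAP;
	buf->index = 0;
	ioctl(fd, VIDIOC_QUERYBUF, buf);
	return mmap(NULL, buf->length, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, buf->m.offset);
}

int main(void)
{
	int fd = open("/dev/video0", O_RDWR);
	struct v4l2_buffer src, dst;
	enum v4l2_buf_type out = V4L2_BUF_TYPE_VIDEO_OUTPUT;
	enum v4l2_buf_type cap = V4L2_BUF_TYPE_VIDEO_CAPTURE;

	/* Format setup (VIDIOC_S_FMT on both queues) is omitted here. */
	void *src_mem = map_one(fd, out, &src);
	void *dst_mem = map_one(fd, cap, &dst);

	/* Fill the source buffer with input data, then queue both buffers. */
	/* memcpy(src_mem, input, input_len); src.bytesused = input_len;    */
	ioctl(fd, VIDIOC_QBUF, &src);
	ioctl(fd, VIDIOC_QBUF, &dst);
	ioctl(fd, VIDIOC_STREAMON, &out);
	ioctl(fd, VIDIOC_STREAMON, &cap);

	/* DQBUF blocks until the hardware has finished the transfer:   */
	/* this is the point where the CPU regains timing control.      */
	ioctl(fd, VIDIOC_DQBUF, &dst);
	/* dst.bytesused bytes of processed data are now in dst_mem.    */
	(void)src_mem; (void)dst_mem;
	return 0;
}
```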
Em Thu, 2 May 2024 09:59:56 +0100 Mauro Carvalho Chehab <mchehab@kernel.org> escreveu: > Em Thu, 02 May 2024 09:46:14 +0200 > Takashi Iwai <tiwai@suse.de> escreveu: > > > On Wed, 01 May 2024 03:56:15 +0200, > > Mark Brown wrote: > > > > > > On Tue, Apr 30, 2024 at 05:27:52PM +0100, Mauro Carvalho Chehab wrote: > > > > Mark Brown <broonie@kernel.org> escreveu: > > > > > On Tue, Apr 30, 2024 at 10:21:12AM +0200, Sebastian Fricke wrote: > > > > > > > > The discussion around this originally was that all the audio APIs are > > > > > very much centered around real time operations rather than completely > > > > > > > The media subsystem is also centered around real time. Without real > > > > time, you can't have a decent video conference system. Having > > > > mem2mem transfers actually help reducing real time delays, as it > > > > avoids extra latency due to CPU congestion and/or data transfers > > > > from/to userspace. > > > > > > Real time means strongly tied to wall clock times rather than fast - the > > > issue was that all the ALSA APIs are based around pushing data through > > > the system based on a clock. > > > > > > > > That doesn't sound like an immediate solution to maintainer overload > > > > > issues... if something like this is going to happen the DRM solution > > > > > does seem more general but I'm not sure the amount of stop energy is > > > > > proportionate. > > > > > > > I don't think maintainer overload is the issue here. The main > > > > point is to avoid a fork at the audio uAPI, plus the burden > > > > of re-inventing the wheel with new codes for audio formats, > > > > new documentation for them, etc. > > > > > > I thought that discussion had been had already at one of the earlier > > > versions? TBH I've not really been paying attention to this since the > > > very early versions where I raised some similar "why is this in media" > > > points and I thought everyone had decided that this did actually make > > > sense. > > > > Yeah, it was discussed in v1 and v2 threads, e.g. > > https://patchwork.kernel.org/project/linux-media/cover/1690265540-25999-1-git-send-email-shengjiu.wang@nxp.com/#25485573 > > > > My argument at that time was how the operation would be, and the point > > was that it'd be a "batch-like" operation via M2M without any timing > > control. It'd be a very special usage for for ALSA, and if any, it'd > > be hwdep -- that is a very hardware-specific API implementation -- or > > try compress-offload API, which looks dubious. > > > > OTOH, the argument was that there is already a framework for M2M in > > media API and that also fits for the batch-like operation, too. So > > was the thread evolved until now. > > M2M transfers are not a hardware-specific API, and such kind of > transfers is not new either. Old media devices like bttv have > internally a way to do PCI2PCI transfers, allowing media streams > to be transferred directly without utilizing CPU. The media driver > supports it for video, as this made a huge difference of performance > back then. > > On embedded world, this is a pretty common scenario: different media > IP blocks can communicate with each other directly via memory. This > can happen for video capture, video display and audio. > > With M2M, most of the control is offloaded to the hardware. > > There are still time control associated with it, as audio and video > needs to be in sync. This is done by controlling the buffers size > and could be fine-tuned by checking when the buffer transfer is done. 
> > On media, M2M buffer transfers are started via VIDIOC_QBUF, > which is a request to do a frame transfer. A similar ioctl > (VIDIOC_DQBUF) is used to monitor when the hardware finishes > transfering the buffer. On other words, the CPU is responsible > for time control.

Just complementing: on media, we do this per video buffer (or per half video buffer). A typical use case on cameras is to have buffers transferred 30 times per second, if the video was streamed at 30 frames per second.

I would assume that, on an audio/video stream, the audio data transfer will be programmed to also happen on a regular interval. So, if the video stream is programmed to a 30 frames per second rate, I would assume that the associated audio stream will also be programmed to be grouped into 30 data transfers per second. In such a scenario, if the audio is sampled at 48 kHz, it means that:

1) each M2M transfer commanded by the CPU will copy 1600 samples;
2) the time between each sample will remain 1/48000 s;
3) a notification event telling that 1600 samples were transferred will be generated when the last sample happens;
4) the CPU will do time control by looking at the notification events.

> On other words, this is still real time. The main difference > from a "sync" transfer is that the CPU doesn't need to copy data > from/to different devices, as such operation is offloaded to the > hardware. > > Regards, > Mauro
On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote: > Mauro Carvalho Chehab <mchehab@kernel.org> escreveu: > > There are still time control associated with it, as audio and video > > needs to be in sync. This is done by controlling the buffers size > > and could be fine-tuned by checking when the buffer transfer is done. ... > Just complementing: on media, we do this per video buffer (or > per half video buffer). A typical use case on cameras is to have > buffers transferred 30 times per second, if the video was streamed > at 30 frames per second. IIRC some big use case for this hardware was transcoding so there was a desire to just go at whatever rate the hardware could support as there is no interactive user consuming the output as it is generated. > I would assume that, on an audio/video stream, the audio data > transfer will be programmed to also happen on a regular interval. With audio the API is very much "wake userspace every Xms".
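To make the contrast concrete, here is a minimal sketch (not from the series) of the clock-driven ALSA PCM model Mark describes: userspace must hand the device another period of samples at the cadence set by the hardware clock, or it underruns. The device name and parameters are arbitrary examples; error checks are omitted.

```c
/* Clock-driven ALSA playback: the wall clock, not the application,  */
/* sets the pace at which writei() accepts more data.                */
/* Compile with -lasound.                                            */
#include <alsa/asoundlib.h>

int main(void)
{
	snd_pcm_t *pcm;
	short buf[48 * 10 * 2] = {0};   /* 10 ms of stereo S16 silence at 48 kHz */

	snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0);
	/* 48 kHz stereo, ~20 ms of total buffering. */
	snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
			   SND_PCM_ACCESS_RW_INTERLEAVED,
			   2, 48000, 1, 20000);

	for (int i = 0; i < 500; i++) {
		/* writei() blocks until the hardware clock has consumed    */
		/* enough of the ring buffer to accept another 10 ms chunk. */
		if (snd_pcm_writei(pcm, buf, 48 * 10) < 0)
			snd_pcm_prepare(pcm);   /* recover from an xrun */
	}
	snd_pcm_close(pcm);
	return 0;
}
```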
Em Fri, 3 May 2024 10:47:19 +0900 Mark Brown <broonie@kernel.org> escreveu: > On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote: > > Mauro Carvalho Chehab <mchehab@kernel.org> escreveu: > > > > There are still time control associated with it, as audio and video > > > needs to be in sync. This is done by controlling the buffers size > > > and could be fine-tuned by checking when the buffer transfer is done. > > ... > > > Just complementing: on media, we do this per video buffer (or > > per half video buffer). A typical use case on cameras is to have > > buffers transferred 30 times per second, if the video was streamed > > at 30 frames per second. > > IIRC some big use case for this hardware was transcoding so there was a > desire to just go at whatever rate the hardware could support as there > is no interactive user consuming the output as it is generated. Indeed, codecs could be used to just do transcoding, but I would expect it to be a border use case. See, as the chipsets implementing codecs are typically the ones used on mobiles, I would expect that the major use cases to be to watch audio and video and to participate on audio/video conferences. Going further, the codec API may end supporting not only transcoding (which is something that CPU can usually handle without too much processing) but also audio processing that may require more complex algorithms - even deep learning ones - like background noise removal, echo detection/removal, volume auto-gain, audio enhancement and such. On other words, the typical use cases will either have input or output being a physical hardware (microphone or speaker). > > I would assume that, on an audio/video stream, the audio data > > transfer will be programmed to also happen on a regular interval. > > With audio the API is very much "wake userspace every Xms".
On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote: > > Em Fri, 3 May 2024 10:47:19 +0900 > Mark Brown <broonie@kernel.org> escreveu: > > > On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote: > > > Mauro Carvalho Chehab <mchehab@kernel.org> escreveu: > > > > > > There are still time control associated with it, as audio and video > > > > needs to be in sync. This is done by controlling the buffers size > > > > and could be fine-tuned by checking when the buffer transfer is done. > > > > ... > > > > > Just complementing: on media, we do this per video buffer (or > > > per half video buffer). A typical use case on cameras is to have > > > buffers transferred 30 times per second, if the video was streamed > > > at 30 frames per second. > > > > IIRC some big use case for this hardware was transcoding so there was a > > desire to just go at whatever rate the hardware could support as there > > is no interactive user consuming the output as it is generated. > > Indeed, codecs could be used to just do transcoding, but I would > expect it to be a border use case. See, as the chipsets implementing > codecs are typically the ones used on mobiles, I would expect that > the major use cases to be to watch audio and video and to participate > on audio/video conferences. > > Going further, the codec API may end supporting not only transcoding > (which is something that CPU can usually handle without too much > processing) but also audio processing that may require more > complex algorithms - even deep learning ones - like background noise > removal, echo detection/removal, volume auto-gain, audio enhancement > and such. > > On other words, the typical use cases will either have input > or output being a physical hardware (microphone or speaker). > All, thanks for spending time to discuss, it seems we go back to the start point of this topic again. Our main request is that there is a hardware sample rate converter on the chip, so users can use it in user space as a component like software sample rate converter. It mostly may run as a gstreamer plugin. so it is a memory to memory component. I didn't find such API in ALSA for such purpose, the best option for this in the kernel is the V4L2 memory to memory framework I found. As Hans said it is well designed for memory to memory. And I think audio is one of 'media'. As I can see that part of Radio function is in ALSA, part of Radio function is in V4L2. part of HDMI function is in DRM, part of HDMI function is in ALSA... So using V4L2 for audio is not new from this point of view. Even now I still think V4L2 is the best option, but it looks like there are a lot of rejects. If develop a new ALSA-mem2mem, it is also a duplication of code (bigger duplication that just add audio support in V4L2 I think). Best regards Shengjiu Wang. > > > I would assume that, on an audio/video stream, the audio data > > > transfer will be programmed to also happen on a regular interval. > > > > With audio the API is very much "wake userspace every Xms".
On 06. 05. 24 10:49, Shengjiu Wang wrote: > Even now I still think V4L2 is the best option, but it looks like there > are a lot of rejects. If develop a new ALSA-mem2mem, it is also > a duplication of code (bigger duplication that just add audio support > in V4L2 I think). Maybe not. Could you try to evaluate a pure dma-buf (drivers/dma-buf) solution and add only an enumeration and operation-trigger mechanism to the ALSA API? It seems that dma-buf has sufficient code to transfer data to and from kernel space for further processing. I think that one buffer can serve as the source and a second one for the processed data. We could eventually add new ioctls to ALSA's control API (/dev/snd/control*) for this purpose (DSP processing). Jaroslav
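A rough userspace sketch of the dma-buf direction suggested here is shown below. Only the DMA_BUF_IOCTL_SYNC calls are existing uAPI; the SNDRV_CTL_IOCTL_DSP_RUN ioctl, struct dsp_job, and the way the buffer fds are obtained are invented placeholders for the enumeration/trigger additions proposed above.

```c
/* Sketch of a dma-buf based DSP/SRC flow. Only DMA_BUF_IOCTL_SYNC is    */
/* real uAPI; the SNDRV_CTL_IOCTL_DSP_RUN ioctl and struct dsp_job are    */
/* hypothetical additions to the ALSA control device, as proposed above.  */
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

struct dsp_job {                     /* hypothetical job descriptor */
	int src_fd;
	int dst_fd;
	unsigned int src_len;
};
#define SNDRV_CTL_IOCTL_DSP_RUN 0    /* hypothetical request number */

static void run_job(int ctl_fd, int src_fd, int dst_fd, void *src_map,
		    const void *samples, unsigned int len)
{
	struct dma_buf_sync sync = { .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_WRITE };

	/* Bracket CPU writes to the source dma-buf with SYNC_START/END. */
	ioctl(src_fd, DMA_BUF_IOCTL_SYNC, &sync);
	memcpy(src_map, samples, len);
	sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_WRITE;
	ioctl(src_fd, DMA_BUF_IOCTL_SYNC, &sync);

	/* Hand both buffers to the (hypothetical) control-API trigger. */
	struct dsp_job job = { .src_fd = src_fd, .dst_fd = dst_fd, .src_len = len };
	ioctl(ctl_fd, SNDRV_CTL_IOCTL_DSP_RUN, &job);
	/* Completion notification would come back via the control API, */
	/* e.g. as a control event or a poll()able fd.                   */
}
```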
On 06/05/2024 10:49, Shengjiu Wang wrote: > On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote: >> >> Em Fri, 3 May 2024 10:47:19 +0900 >> Mark Brown <broonie@kernel.org> escreveu: >> >>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote: >>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu: >>> >>>>> There are still time control associated with it, as audio and video >>>>> needs to be in sync. This is done by controlling the buffers size >>>>> and could be fine-tuned by checking when the buffer transfer is done. >>> >>> ... >>> >>>> Just complementing: on media, we do this per video buffer (or >>>> per half video buffer). A typical use case on cameras is to have >>>> buffers transferred 30 times per second, if the video was streamed >>>> at 30 frames per second. >>> >>> IIRC some big use case for this hardware was transcoding so there was a >>> desire to just go at whatever rate the hardware could support as there >>> is no interactive user consuming the output as it is generated. >> >> Indeed, codecs could be used to just do transcoding, but I would >> expect it to be a border use case. See, as the chipsets implementing >> codecs are typically the ones used on mobiles, I would expect that >> the major use cases to be to watch audio and video and to participate >> on audio/video conferences. >> >> Going further, the codec API may end supporting not only transcoding >> (which is something that CPU can usually handle without too much >> processing) but also audio processing that may require more >> complex algorithms - even deep learning ones - like background noise >> removal, echo detection/removal, volume auto-gain, audio enhancement >> and such. >> >> On other words, the typical use cases will either have input >> or output being a physical hardware (microphone or speaker). >> > > All, thanks for spending time to discuss, it seems we go back to > the start point of this topic again. > > Our main request is that there is a hardware sample rate converter > on the chip, so users can use it in user space as a component like > software sample rate converter. It mostly may run as a gstreamer plugin. > so it is a memory to memory component. > > I didn't find such API in ALSA for such purpose, the best option for this > in the kernel is the V4L2 memory to memory framework I found. > As Hans said it is well designed for memory to memory. > > And I think audio is one of 'media'. As I can see that part of Radio > function is in ALSA, part of Radio function is in V4L2. part of HDMI > function is in DRM, part of HDMI function is in ALSA... > So using V4L2 for audio is not new from this point of view. > > Even now I still think V4L2 is the best option, but it looks like there > are a lot of rejects. If develop a new ALSA-mem2mem, it is also > a duplication of code (bigger duplication that just add audio support > in V4L2 I think). After reading this thread I still believe that the mem2mem framework is a reasonable option, unless someone can come up with a method that is easy to implement in the alsa subsystem. From what I can tell from this discussion no such method exists. >From the media side there are arguments that it adds extra maintenance load, which is true, but I believe that it is quite limited in practice. 
That said, perhaps we should make a statement that while we support the use of audio m2m drivers, this is only for simple m2m audio processing like this driver, specifically where there is a 1-to-1 mapping between input and output buffers. At this point we do not want to add audio codec support or similar complex audio processing. Part of the reason is that codecs are hard, and we already have our hands full with all the video codecs. Part of the reason is that the v4l2-mem2mem framework probably needs to be forked to make a more advanced version geared towards codecs since the current framework is too limiting for some of the things we want to do. It was really designed for scalers, deinterlacers, etc. and the codec support was added later. If we ever allow such complex audio processing devices, then we would have to have another discussion, and I believe that will only be possible if most of the maintenance load would be on the alsa subsystem where the audio experts are. So my proposal is to: 1) add a clear statement to dev-audio-mem2mem.rst (patch 08/16) that only simple audio devices with a 1-to-1 mapping of input to output buffer are supported. Perhaps also in videodev2.h before struct v4l2_audio_format. 2) I will experiment a bit trying to solve the main complaint about creating new audio fourcc values and thus duplicating existing SNDRV_PCM_FORMAT_ values. I have some ideas for that. But I do not want to spend time on 2 until we agree that this is the way forward. Regards, Hans
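On point 2, one hypothetical direction (not necessarily what Hans has in mind, and not what the series currently does) would be to derive the V4L2 audio fourcc mechanically from the existing SNDRV_PCM_FORMAT_* code instead of maintaining a second, hand-written format list; the 'AU' prefix below only mirrors the 'AUXX' range the cover letter says is reserved.

```c
/* Hypothetical illustration only: encode the ALSA format number inside  */
/* an audio fourcc so no separate, parallel format list has to be kept.  */
/* Neither this macro nor its encoding is part of the posted series.     */
#include <linux/types.h>
#include <linux/videodev2.h>   /* v4l2_fourcc() */
#include <sound/asound.h>      /* SNDRV_PCM_FORMAT_* */

#define V4L2_AUDIO_FOURCC(alsa_fmt) \
	v4l2_fourcc('A', 'U', '0' + ((alsa_fmt) / 10), '0' + ((alsa_fmt) % 10))

static inline __u32 audfmt_to_fourcc_example(int alsa_fmt)
{
	return V4L2_AUDIO_FOURCC(alsa_fmt);
}

/* e.g. audfmt_to_fourcc_example(2) yields 'AU02'; format number 2 is    */
/* SNDRV_PCM_FORMAT_S16_LE in the ALSA uAPI.                             */
```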
On 5/8/2024 10:00 AM, Hans Verkuil wrote: > On 06/05/2024 10:49, Shengjiu Wang wrote: >> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote: >>> >>> Em Fri, 3 May 2024 10:47:19 +0900 >>> Mark Brown <broonie@kernel.org> escreveu: >>> >>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote: >>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu: >>>> >>>>>> There are still time control associated with it, as audio and video >>>>>> needs to be in sync. This is done by controlling the buffers size >>>>>> and could be fine-tuned by checking when the buffer transfer is done. >>>> >>>> ... >>>> >>>>> Just complementing: on media, we do this per video buffer (or >>>>> per half video buffer). A typical use case on cameras is to have >>>>> buffers transferred 30 times per second, if the video was streamed >>>>> at 30 frames per second. >>>> >>>> IIRC some big use case for this hardware was transcoding so there was a >>>> desire to just go at whatever rate the hardware could support as there >>>> is no interactive user consuming the output as it is generated. >>> >>> Indeed, codecs could be used to just do transcoding, but I would >>> expect it to be a border use case. See, as the chipsets implementing >>> codecs are typically the ones used on mobiles, I would expect that >>> the major use cases to be to watch audio and video and to participate >>> on audio/video conferences. >>> >>> Going further, the codec API may end supporting not only transcoding >>> (which is something that CPU can usually handle without too much >>> processing) but also audio processing that may require more >>> complex algorithms - even deep learning ones - like background noise >>> removal, echo detection/removal, volume auto-gain, audio enhancement >>> and such. >>> >>> On other words, the typical use cases will either have input >>> or output being a physical hardware (microphone or speaker). >>> >> >> All, thanks for spending time to discuss, it seems we go back to >> the start point of this topic again. >> >> Our main request is that there is a hardware sample rate converter >> on the chip, so users can use it in user space as a component like >> software sample rate converter. It mostly may run as a gstreamer plugin. >> so it is a memory to memory component. >> >> I didn't find such API in ALSA for such purpose, the best option for this >> in the kernel is the V4L2 memory to memory framework I found. >> As Hans said it is well designed for memory to memory. >> >> And I think audio is one of 'media'. As I can see that part of Radio >> function is in ALSA, part of Radio function is in V4L2. part of HDMI >> function is in DRM, part of HDMI function is in ALSA... >> So using V4L2 for audio is not new from this point of view. >> >> Even now I still think V4L2 is the best option, but it looks like there >> are a lot of rejects. If develop a new ALSA-mem2mem, it is also >> a duplication of code (bigger duplication that just add audio support >> in V4L2 I think). > > After reading this thread I still believe that the mem2mem framework is > a reasonable option, unless someone can come up with a method that is > easy to implement in the alsa subsystem. From what I can tell from this > discussion no such method exists. > Hi, my main question would be how is mem2mem use case different from loopback exposing playback and capture frontends in user space with DSP (or other piece of HW) in the middle? Amadeusz
On Wed, May 8, 2024 at 4:14 PM Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> wrote: > > On 5/8/2024 10:00 AM, Hans Verkuil wrote: > > On 06/05/2024 10:49, Shengjiu Wang wrote: > >> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote: > >>> > >>> Em Fri, 3 May 2024 10:47:19 +0900 > >>> Mark Brown <broonie@kernel.org> escreveu: > >>> > >>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote: > >>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu: > >>>> > >>>>>> There are still time control associated with it, as audio and video > >>>>>> needs to be in sync. This is done by controlling the buffers size > >>>>>> and could be fine-tuned by checking when the buffer transfer is done. > >>>> > >>>> ... > >>>> > >>>>> Just complementing: on media, we do this per video buffer (or > >>>>> per half video buffer). A typical use case on cameras is to have > >>>>> buffers transferred 30 times per second, if the video was streamed > >>>>> at 30 frames per second. > >>>> > >>>> IIRC some big use case for this hardware was transcoding so there was a > >>>> desire to just go at whatever rate the hardware could support as there > >>>> is no interactive user consuming the output as it is generated. > >>> > >>> Indeed, codecs could be used to just do transcoding, but I would > >>> expect it to be a border use case. See, as the chipsets implementing > >>> codecs are typically the ones used on mobiles, I would expect that > >>> the major use cases to be to watch audio and video and to participate > >>> on audio/video conferences. > >>> > >>> Going further, the codec API may end supporting not only transcoding > >>> (which is something that CPU can usually handle without too much > >>> processing) but also audio processing that may require more > >>> complex algorithms - even deep learning ones - like background noise > >>> removal, echo detection/removal, volume auto-gain, audio enhancement > >>> and such. > >>> > >>> On other words, the typical use cases will either have input > >>> or output being a physical hardware (microphone or speaker). > >>> > >> > >> All, thanks for spending time to discuss, it seems we go back to > >> the start point of this topic again. > >> > >> Our main request is that there is a hardware sample rate converter > >> on the chip, so users can use it in user space as a component like > >> software sample rate converter. It mostly may run as a gstreamer plugin. > >> so it is a memory to memory component. > >> > >> I didn't find such API in ALSA for such purpose, the best option for this > >> in the kernel is the V4L2 memory to memory framework I found. > >> As Hans said it is well designed for memory to memory. > >> > >> And I think audio is one of 'media'. As I can see that part of Radio > >> function is in ALSA, part of Radio function is in V4L2. part of HDMI > >> function is in DRM, part of HDMI function is in ALSA... > >> So using V4L2 for audio is not new from this point of view. > >> > >> Even now I still think V4L2 is the best option, but it looks like there > >> are a lot of rejects. If develop a new ALSA-mem2mem, it is also > >> a duplication of code (bigger duplication that just add audio support > >> in V4L2 I think). > > > > After reading this thread I still believe that the mem2mem framework is > > a reasonable option, unless someone can come up with a method that is > > easy to implement in the alsa subsystem. From what I can tell from this > > discussion no such method exists. 
> > > > Hi, > > my main question would be how is mem2mem use case different from > loopback exposing playback and capture frontends in user space with DSP > (or other piece of HW) in the middle? > I think loopback has timing control: the user needs to feed data to playback at a fixed time and get data from capture at a fixed time, otherwise there is an xrun in playback or capture. In the mem2mem case there is no such timing control: the user feeds data in and it generates output; if the user doesn't feed data, there is no xrun. mem2mem is just one of the components in a playback or capture pipeline, and the timing control applies to the pipeline as a whole. Best regards Shengjiu Wang > Amadeusz >
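To make the timing distinction above concrete, the sketch below shows the pacing a normal ALSA PCM loopback imposes: snd_pcm_writei() is drained at the hardware rate and returns -EPIPE on underrun when the application falls behind, whereas a mem2mem job only runs when a buffer is submitted. This is a minimal alsa-lib sketch; the "hw:Loopback,0" device name and the 48 kHz stereo parameters are illustrative assumptions, not part of the proposal.

#include <errno.h>
#include <stdint.h>
#include <alsa/asoundlib.h>

/* Feed one period to a PCM playback device. If the caller is late, the
 * device underruns and snd_pcm_writei() reports -EPIPE; this deadline is
 * the "timing control" a loopback inherits and a mem2mem job does not. */
static int feed_pcm(snd_pcm_t *pcm, const int16_t *buf, snd_pcm_uframes_t frames)
{
	snd_pcm_sframes_t n = snd_pcm_writei(pcm, buf, frames);

	if (n == -EPIPE) {		/* underrun: the real-time deadline was missed */
		snd_pcm_prepare(pcm);	/* recover and carry on */
		return -EPIPE;
	}
	return n < 0 ? (int)n : 0;
}

/* Open a playback endpoint; "hw:Loopback,0" is only an illustrative name. */
static int open_paced_playback(snd_pcm_t **pcm)
{
	int err = snd_pcm_open(pcm, "hw:Loopback,0", SND_PCM_STREAM_PLAYBACK, 0);

	if (err < 0)
		return err;
	/* 48 kHz stereo S16 with ~100 ms of buffering: the hardware drains
	 * this buffer in real time, which is exactly the constraint that the
	 * mem2mem conversion described above does not have. */
	return snd_pcm_set_params(*pcm, SND_PCM_FORMAT_S16_LE,
				  SND_PCM_ACCESS_RW_INTERLEAVED,
				  2, 48000, 1, 100000);
}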
On 5/9/2024 11:36 AM, Shengjiu Wang wrote: > On Wed, May 8, 2024 at 4:14 PM Amadeusz Sławiński > <amadeuszx.slawinski@linux.intel.com> wrote: >> >> On 5/8/2024 10:00 AM, Hans Verkuil wrote: >>> On 06/05/2024 10:49, Shengjiu Wang wrote: >>>> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote: >>>>> >>>>> Em Fri, 3 May 2024 10:47:19 +0900 >>>>> Mark Brown <broonie@kernel.org> escreveu: >>>>> >>>>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote: >>>>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu: >>>>>> >>>>>>>> There are still time control associated with it, as audio and video >>>>>>>> needs to be in sync. This is done by controlling the buffers size >>>>>>>> and could be fine-tuned by checking when the buffer transfer is done. >>>>>> >>>>>> ... >>>>>> >>>>>>> Just complementing: on media, we do this per video buffer (or >>>>>>> per half video buffer). A typical use case on cameras is to have >>>>>>> buffers transferred 30 times per second, if the video was streamed >>>>>>> at 30 frames per second. >>>>>> >>>>>> IIRC some big use case for this hardware was transcoding so there was a >>>>>> desire to just go at whatever rate the hardware could support as there >>>>>> is no interactive user consuming the output as it is generated. >>>>> >>>>> Indeed, codecs could be used to just do transcoding, but I would >>>>> expect it to be a border use case. See, as the chipsets implementing >>>>> codecs are typically the ones used on mobiles, I would expect that >>>>> the major use cases to be to watch audio and video and to participate >>>>> on audio/video conferences. >>>>> >>>>> Going further, the codec API may end supporting not only transcoding >>>>> (which is something that CPU can usually handle without too much >>>>> processing) but also audio processing that may require more >>>>> complex algorithms - even deep learning ones - like background noise >>>>> removal, echo detection/removal, volume auto-gain, audio enhancement >>>>> and such. >>>>> >>>>> On other words, the typical use cases will either have input >>>>> or output being a physical hardware (microphone or speaker). >>>>> >>>> >>>> All, thanks for spending time to discuss, it seems we go back to >>>> the start point of this topic again. >>>> >>>> Our main request is that there is a hardware sample rate converter >>>> on the chip, so users can use it in user space as a component like >>>> software sample rate converter. It mostly may run as a gstreamer plugin. >>>> so it is a memory to memory component. >>>> >>>> I didn't find such API in ALSA for such purpose, the best option for this >>>> in the kernel is the V4L2 memory to memory framework I found. >>>> As Hans said it is well designed for memory to memory. >>>> >>>> And I think audio is one of 'media'. As I can see that part of Radio >>>> function is in ALSA, part of Radio function is in V4L2. part of HDMI >>>> function is in DRM, part of HDMI function is in ALSA... >>>> So using V4L2 for audio is not new from this point of view. >>>> >>>> Even now I still think V4L2 is the best option, but it looks like there >>>> are a lot of rejects. If develop a new ALSA-mem2mem, it is also >>>> a duplication of code (bigger duplication that just add audio support >>>> in V4L2 I think). >>> >>> After reading this thread I still believe that the mem2mem framework is >>> a reasonable option, unless someone can come up with a method that is >>> easy to implement in the alsa subsystem. 
From what I can tell from this >>> discussion no such method exists. >>> >> >> Hi, >> >> my main question would be how is mem2mem use case different from >> loopback exposing playback and capture frontends in user space with DSP >> (or other piece of HW) in the middle? >> > I think loopback has a timing control, user need to feed data to playback at a > fixed time and get data from capture at a fixed time. Otherwise there > is xrun in > playback and capture. > > mem2mem case: there is no such timing control, user feeds data to it > then it generates output, if user doesn't feed data, there is no xrun. > but mem2mem is just one of the components in the playback or capture > pipeline, overall there is time control for whole pipeline, > Have you looked at compress streams? If I remember correctly they are not tied to time due to the fact that they can pass data in arbitrary formats? From: https://docs.kernel.org/sound/designs/compress-offload.html "No notion of underrun/overrun. Since the bytes written are compressed in nature and data written/read doesn’t translate directly to rendered output in time, this does not deal with underrun/overrun and maybe dealt in user-library" Amadeusz
On Thu, May 9, 2024 at 5:50 PM Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> wrote: > > On 5/9/2024 11:36 AM, Shengjiu Wang wrote: > > On Wed, May 8, 2024 at 4:14 PM Amadeusz Sławiński > > <amadeuszx.slawinski@linux.intel.com> wrote: > >> > >> On 5/8/2024 10:00 AM, Hans Verkuil wrote: > >>> On 06/05/2024 10:49, Shengjiu Wang wrote: > >>>> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote: > >>>>> > >>>>> Em Fri, 3 May 2024 10:47:19 +0900 > >>>>> Mark Brown <broonie@kernel.org> escreveu: > >>>>> > >>>>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote: > >>>>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu: > >>>>>> > >>>>>>>> There are still time control associated with it, as audio and video > >>>>>>>> needs to be in sync. This is done by controlling the buffers size > >>>>>>>> and could be fine-tuned by checking when the buffer transfer is done. > >>>>>> > >>>>>> ... > >>>>>> > >>>>>>> Just complementing: on media, we do this per video buffer (or > >>>>>>> per half video buffer). A typical use case on cameras is to have > >>>>>>> buffers transferred 30 times per second, if the video was streamed > >>>>>>> at 30 frames per second. > >>>>>> > >>>>>> IIRC some big use case for this hardware was transcoding so there was a > >>>>>> desire to just go at whatever rate the hardware could support as there > >>>>>> is no interactive user consuming the output as it is generated. > >>>>> > >>>>> Indeed, codecs could be used to just do transcoding, but I would > >>>>> expect it to be a border use case. See, as the chipsets implementing > >>>>> codecs are typically the ones used on mobiles, I would expect that > >>>>> the major use cases to be to watch audio and video and to participate > >>>>> on audio/video conferences. > >>>>> > >>>>> Going further, the codec API may end supporting not only transcoding > >>>>> (which is something that CPU can usually handle without too much > >>>>> processing) but also audio processing that may require more > >>>>> complex algorithms - even deep learning ones - like background noise > >>>>> removal, echo detection/removal, volume auto-gain, audio enhancement > >>>>> and such. > >>>>> > >>>>> On other words, the typical use cases will either have input > >>>>> or output being a physical hardware (microphone or speaker). > >>>>> > >>>> > >>>> All, thanks for spending time to discuss, it seems we go back to > >>>> the start point of this topic again. > >>>> > >>>> Our main request is that there is a hardware sample rate converter > >>>> on the chip, so users can use it in user space as a component like > >>>> software sample rate converter. It mostly may run as a gstreamer plugin. > >>>> so it is a memory to memory component. > >>>> > >>>> I didn't find such API in ALSA for such purpose, the best option for this > >>>> in the kernel is the V4L2 memory to memory framework I found. > >>>> As Hans said it is well designed for memory to memory. > >>>> > >>>> And I think audio is one of 'media'. As I can see that part of Radio > >>>> function is in ALSA, part of Radio function is in V4L2. part of HDMI > >>>> function is in DRM, part of HDMI function is in ALSA... > >>>> So using V4L2 for audio is not new from this point of view. > >>>> > >>>> Even now I still think V4L2 is the best option, but it looks like there > >>>> are a lot of rejects. If develop a new ALSA-mem2mem, it is also > >>>> a duplication of code (bigger duplication that just add audio support > >>>> in V4L2 I think). 
> >>> > >>> After reading this thread I still believe that the mem2mem framework is > >>> a reasonable option, unless someone can come up with a method that is > >>> easy to implement in the alsa subsystem. From what I can tell from this > >>> discussion no such method exists. > >>> > >> > >> Hi, > >> > >> my main question would be how is mem2mem use case different from > >> loopback exposing playback and capture frontends in user space with DSP > >> (or other piece of HW) in the middle? > >> > > I think loopback has a timing control, user need to feed data to playback at a > > fixed time and get data from capture at a fixed time. Otherwise there > > is xrun in > > playback and capture. > > > > mem2mem case: there is no such timing control, user feeds data to it > > then it generates output, if user doesn't feed data, there is no xrun. > > but mem2mem is just one of the components in the playback or capture > > pipeline, overall there is time control for whole pipeline, > > > > Have you looked at compress streams? If I remember correctly they are > not tied to time due to the fact that they can pass data in arbitrary > formats? > > From: > https://docs.kernel.org/sound/designs/compress-offload.html > > "No notion of underrun/overrun. Since the bytes written are compressed > in nature and data written/read doesn’t translate directly to rendered > output in time, this does not deal with underrun/overrun and maybe dealt > in user-library" I checked the compress stream. mem2mem case is different with compress-offload case compress-offload case is a full pipeline, the user sends a compress stream to it, then DSP decodes it and renders it to the speaker in real time. mem2mem is just like the decoder in the compress pipeline. which is one of the components in the pipeline. best regards shengjiu wang > > Amadeusz
On 5/9/2024 12:12 PM, Shengjiu Wang wrote: > On Thu, May 9, 2024 at 5:50 PM Amadeusz Sławiński > <amadeuszx.slawinski@linux.intel.com> wrote: >> >> On 5/9/2024 11:36 AM, Shengjiu Wang wrote: >>> On Wed, May 8, 2024 at 4:14 PM Amadeusz Sławiński >>> <amadeuszx.slawinski@linux.intel.com> wrote: >>>> >>>> On 5/8/2024 10:00 AM, Hans Verkuil wrote: >>>>> On 06/05/2024 10:49, Shengjiu Wang wrote: >>>>>> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote: >>>>>>> >>>>>>> Em Fri, 3 May 2024 10:47:19 +0900 >>>>>>> Mark Brown <broonie@kernel.org> escreveu: >>>>>>> >>>>>>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote: >>>>>>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu: >>>>>>>> >>>>>>>>>> There are still time control associated with it, as audio and video >>>>>>>>>> needs to be in sync. This is done by controlling the buffers size >>>>>>>>>> and could be fine-tuned by checking when the buffer transfer is done. >>>>>>>> >>>>>>>> ... >>>>>>>> >>>>>>>>> Just complementing: on media, we do this per video buffer (or >>>>>>>>> per half video buffer). A typical use case on cameras is to have >>>>>>>>> buffers transferred 30 times per second, if the video was streamed >>>>>>>>> at 30 frames per second. >>>>>>>> >>>>>>>> IIRC some big use case for this hardware was transcoding so there was a >>>>>>>> desire to just go at whatever rate the hardware could support as there >>>>>>>> is no interactive user consuming the output as it is generated. >>>>>>> >>>>>>> Indeed, codecs could be used to just do transcoding, but I would >>>>>>> expect it to be a border use case. See, as the chipsets implementing >>>>>>> codecs are typically the ones used on mobiles, I would expect that >>>>>>> the major use cases to be to watch audio and video and to participate >>>>>>> on audio/video conferences. >>>>>>> >>>>>>> Going further, the codec API may end supporting not only transcoding >>>>>>> (which is something that CPU can usually handle without too much >>>>>>> processing) but also audio processing that may require more >>>>>>> complex algorithms - even deep learning ones - like background noise >>>>>>> removal, echo detection/removal, volume auto-gain, audio enhancement >>>>>>> and such. >>>>>>> >>>>>>> On other words, the typical use cases will either have input >>>>>>> or output being a physical hardware (microphone or speaker). >>>>>>> >>>>>> >>>>>> All, thanks for spending time to discuss, it seems we go back to >>>>>> the start point of this topic again. >>>>>> >>>>>> Our main request is that there is a hardware sample rate converter >>>>>> on the chip, so users can use it in user space as a component like >>>>>> software sample rate converter. It mostly may run as a gstreamer plugin. >>>>>> so it is a memory to memory component. >>>>>> >>>>>> I didn't find such API in ALSA for such purpose, the best option for this >>>>>> in the kernel is the V4L2 memory to memory framework I found. >>>>>> As Hans said it is well designed for memory to memory. >>>>>> >>>>>> And I think audio is one of 'media'. As I can see that part of Radio >>>>>> function is in ALSA, part of Radio function is in V4L2. part of HDMI >>>>>> function is in DRM, part of HDMI function is in ALSA... >>>>>> So using V4L2 for audio is not new from this point of view. >>>>>> >>>>>> Even now I still think V4L2 is the best option, but it looks like there >>>>>> are a lot of rejects. 
If develop a new ALSA-mem2mem, it is also >>>>>> a duplication of code (bigger duplication that just add audio support >>>>>> in V4L2 I think). >>>>> >>>>> After reading this thread I still believe that the mem2mem framework is >>>>> a reasonable option, unless someone can come up with a method that is >>>>> easy to implement in the alsa subsystem. From what I can tell from this >>>>> discussion no such method exists. >>>>> >>>> >>>> Hi, >>>> >>>> my main question would be how is mem2mem use case different from >>>> loopback exposing playback and capture frontends in user space with DSP >>>> (or other piece of HW) in the middle? >>>> >>> I think loopback has a timing control, user need to feed data to playback at a >>> fixed time and get data from capture at a fixed time. Otherwise there >>> is xrun in >>> playback and capture. >>> >>> mem2mem case: there is no such timing control, user feeds data to it >>> then it generates output, if user doesn't feed data, there is no xrun. >>> but mem2mem is just one of the components in the playback or capture >>> pipeline, overall there is time control for whole pipeline, >>> >> >> Have you looked at compress streams? If I remember correctly they are >> not tied to time due to the fact that they can pass data in arbitrary >> formats? >> >> From: >> https://docs.kernel.org/sound/designs/compress-offload.html >> >> "No notion of underrun/overrun. Since the bytes written are compressed >> in nature and data written/read doesn’t translate directly to rendered >> output in time, this does not deal with underrun/overrun and maybe dealt >> in user-library" > > I checked the compress stream. mem2mem case is different with > compress-offload case > > compress-offload case is a full pipeline, the user sends a compress > stream to it, then DSP decodes it and renders it to the speaker in real > time. > > mem2mem is just like the decoder in the compress pipeline. which is > one of the components in the pipeline. I was thinking of loopback with endpoints using compress streams, without physical endpoint, something like: compress playback (to feed data from userspace) -> DSP (processing) -> compress capture (send data back to userspace) Unless I'm missing something, you should be able to process data as fast as you can feed it and consume it in such case. Amadeusz
On Thu, May 9, 2024 at 6:28 PM Amadeusz Sławiński <amadeuszx.slawinski@linux.intel.com> wrote: > > On 5/9/2024 12:12 PM, Shengjiu Wang wrote: > > On Thu, May 9, 2024 at 5:50 PM Amadeusz Sławiński > > <amadeuszx.slawinski@linux.intel.com> wrote: > >> > >> On 5/9/2024 11:36 AM, Shengjiu Wang wrote: > >>> On Wed, May 8, 2024 at 4:14 PM Amadeusz Sławiński > >>> <amadeuszx.slawinski@linux.intel.com> wrote: > >>>> > >>>> On 5/8/2024 10:00 AM, Hans Verkuil wrote: > >>>>> On 06/05/2024 10:49, Shengjiu Wang wrote: > >>>>>> On Fri, May 3, 2024 at 4:42 PM Mauro Carvalho Chehab <mchehab@kernel.org> wrote: > >>>>>>> > >>>>>>> Em Fri, 3 May 2024 10:47:19 +0900 > >>>>>>> Mark Brown <broonie@kernel.org> escreveu: > >>>>>>> > >>>>>>>> On Thu, May 02, 2024 at 10:26:43AM +0100, Mauro Carvalho Chehab wrote: > >>>>>>>>> Mauro Carvalho Chehab <mchehab@kernel.org> escreveu: > >>>>>>>> > >>>>>>>>>> There are still time control associated with it, as audio and video > >>>>>>>>>> needs to be in sync. This is done by controlling the buffers size > >>>>>>>>>> and could be fine-tuned by checking when the buffer transfer is done. > >>>>>>>> > >>>>>>>> ... > >>>>>>>> > >>>>>>>>> Just complementing: on media, we do this per video buffer (or > >>>>>>>>> per half video buffer). A typical use case on cameras is to have > >>>>>>>>> buffers transferred 30 times per second, if the video was streamed > >>>>>>>>> at 30 frames per second. > >>>>>>>> > >>>>>>>> IIRC some big use case for this hardware was transcoding so there was a > >>>>>>>> desire to just go at whatever rate the hardware could support as there > >>>>>>>> is no interactive user consuming the output as it is generated. > >>>>>>> > >>>>>>> Indeed, codecs could be used to just do transcoding, but I would > >>>>>>> expect it to be a border use case. See, as the chipsets implementing > >>>>>>> codecs are typically the ones used on mobiles, I would expect that > >>>>>>> the major use cases to be to watch audio and video and to participate > >>>>>>> on audio/video conferences. > >>>>>>> > >>>>>>> Going further, the codec API may end supporting not only transcoding > >>>>>>> (which is something that CPU can usually handle without too much > >>>>>>> processing) but also audio processing that may require more > >>>>>>> complex algorithms - even deep learning ones - like background noise > >>>>>>> removal, echo detection/removal, volume auto-gain, audio enhancement > >>>>>>> and such. > >>>>>>> > >>>>>>> On other words, the typical use cases will either have input > >>>>>>> or output being a physical hardware (microphone or speaker). > >>>>>>> > >>>>>> > >>>>>> All, thanks for spending time to discuss, it seems we go back to > >>>>>> the start point of this topic again. > >>>>>> > >>>>>> Our main request is that there is a hardware sample rate converter > >>>>>> on the chip, so users can use it in user space as a component like > >>>>>> software sample rate converter. It mostly may run as a gstreamer plugin. > >>>>>> so it is a memory to memory component. > >>>>>> > >>>>>> I didn't find such API in ALSA for such purpose, the best option for this > >>>>>> in the kernel is the V4L2 memory to memory framework I found. > >>>>>> As Hans said it is well designed for memory to memory. > >>>>>> > >>>>>> And I think audio is one of 'media'. As I can see that part of Radio > >>>>>> function is in ALSA, part of Radio function is in V4L2. part of HDMI > >>>>>> function is in DRM, part of HDMI function is in ALSA... > >>>>>> So using V4L2 for audio is not new from this point of view. 
> >>>>>> > >>>>>> Even now I still think V4L2 is the best option, but it looks like there > >>>>>> are a lot of rejects. If develop a new ALSA-mem2mem, it is also > >>>>>> a duplication of code (bigger duplication that just add audio support > >>>>>> in V4L2 I think). > >>>>> > >>>>> After reading this thread I still believe that the mem2mem framework is > >>>>> a reasonable option, unless someone can come up with a method that is > >>>>> easy to implement in the alsa subsystem. From what I can tell from this > >>>>> discussion no such method exists. > >>>>> > >>>> > >>>> Hi, > >>>> > >>>> my main question would be how is mem2mem use case different from > >>>> loopback exposing playback and capture frontends in user space with DSP > >>>> (or other piece of HW) in the middle? > >>>> > >>> I think loopback has a timing control, user need to feed data to playback at a > >>> fixed time and get data from capture at a fixed time. Otherwise there > >>> is xrun in > >>> playback and capture. > >>> > >>> mem2mem case: there is no such timing control, user feeds data to it > >>> then it generates output, if user doesn't feed data, there is no xrun. > >>> but mem2mem is just one of the components in the playback or capture > >>> pipeline, overall there is time control for whole pipeline, > >>> > >> > >> Have you looked at compress streams? If I remember correctly they are > >> not tied to time due to the fact that they can pass data in arbitrary > >> formats? > >> > >> From: > >> https://docs.kernel.org/sound/designs/compress-offload.html > >> > >> "No notion of underrun/overrun. Since the bytes written are compressed > >> in nature and data written/read doesn’t translate directly to rendered > >> output in time, this does not deal with underrun/overrun and maybe dealt > >> in user-library" > > > > I checked the compress stream. mem2mem case is different with > > compress-offload case > > > > compress-offload case is a full pipeline, the user sends a compress > > stream to it, then DSP decodes it and renders it to the speaker in real > > time. > > > > mem2mem is just like the decoder in the compress pipeline. which is > > one of the components in the pipeline. > > I was thinking of loopback with endpoints using compress streams, > without physical endpoint, something like: > > compress playback (to feed data from userspace) -> DSP (processing) -> > compress capture (send data back to userspace) > > Unless I'm missing something, you should be able to process data as fast > as you can feed it and consume it in such case. > Actually in the beginning I tried this, but it did not work well. ALSA needs time control for playback and capture, playback and capture needs to synchronize. Usually the playback and capture pipeline is independent in ALSA design, but in this case, the playback and capture should synchronize, they are not independent. Best regards Shengjiu Wang > Amadeusz
On 09. 05. 24 12:44, Shengjiu Wang wrote: >>> mem2mem is just like the decoder in the compress pipeline. which is >>> one of the components in the pipeline. >> >> I was thinking of loopback with endpoints using compress streams, >> without physical endpoint, something like: >> >> compress playback (to feed data from userspace) -> DSP (processing) -> >> compress capture (send data back to userspace) >> >> Unless I'm missing something, you should be able to process data as fast >> as you can feed it and consume it in such case. >> > > Actually in the beginning I tried this, but it did not work well. > ALSA needs time control for playback and capture, playback and capture > needs to synchronize. Usually the playback and capture pipeline is > independent in ALSA design, but in this case, the playback and capture > should synchronize, they are not independent. The core compress API has no strict timing constraints. You can eventually have two half-duplex compress devices, if you like to have a really independent mechanism. If something is missing in the API, you can extend it (for example, to inform user space that it is a producer/consumer job without any relation to real time). I like this idea. Jaroslav
On 09. 05. 24 13:13, Jaroslav Kysela wrote: > On 09. 05. 24 12:44, Shengjiu Wang wrote: >>>> mem2mem is just like the decoder in the compress pipeline. which is >>>> one of the components in the pipeline. >>> >>> I was thinking of loopback with endpoints using compress streams, >>> without physical endpoint, something like: >>> >>> compress playback (to feed data from userspace) -> DSP (processing) -> >>> compress capture (send data back to userspace) >>> >>> Unless I'm missing something, you should be able to process data as fast >>> as you can feed it and consume it in such case. >>> >> >> Actually in the beginning I tried this, but it did not work well. >> ALSA needs time control for playback and capture, playback and capture >> needs to synchronize. Usually the playback and capture pipeline is >> independent in ALSA design, but in this case, the playback and capture >> should synchronize, they are not independent. > > The core compress API core no strict timing constraints. You can eventually0 > have two half-duplex compress devices, if you like to have really independent > mechanism. If something is missing in API, you can extend this API (like to > inform the user space that it's a producer/consumer processing without any > relation to the real time). I like this idea. I was thinking more about this. If I am right, the mentioned use in gstreamer is supposed to run the conversion (DSP) job in "one shot" (can be handled using one system call like blocking ioctl). The goal is just to offload the CPU work to the DSP (co-processor). If there are no requirements for the queuing, we can implement this ioctl in the compress ALSA API easily using the data management through the dma-buf API. We can eventually define a new direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow handle this new data scheme. The API may be extended later on real demand, of course. Otherwise all pieces are already in the current ALSA compress API (capabilities, params, enumeration). The realtime controls may be created using ALSA control API. Jaroslav
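For reference, the direction enum mentioned above lives in include/uapi/sound/compress_offload.h and currently knows only playback and capture; the sketch below merely illustrates what the proposed extension could look like. SND_COMPRESS_CONVERT is the name floated in the mail, not an existing kernel symbol.

/* Today's uapi (include/uapi/sound/compress_offload.h) only defines two
 * directions; the third value below is the proposed, not yet existing,
 * mem2mem "convert" direction discussed above. */
enum snd_compr_direction {
	SND_COMPRESS_PLAYBACK = 0,
	SND_COMPRESS_CAPTURE,
	SND_COMPRESS_CONVERT,	/* proposed: untimed memory-to-memory job (e.g. ASRC) */
};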
Hi Jaroslav, On 5/13/24 13:56, Jaroslav Kysela wrote: > On 09. 05. 24 13:13, Jaroslav Kysela wrote: >> On 09. 05. 24 12:44, Shengjiu Wang wrote: >>>>> mem2mem is just like the decoder in the compress pipeline. which is >>>>> one of the components in the pipeline. >>>> >>>> I was thinking of loopback with endpoints using compress streams, >>>> without physical endpoint, something like: >>>> >>>> compress playback (to feed data from userspace) -> DSP (processing) -> >>>> compress capture (send data back to userspace) >>>> >>>> Unless I'm missing something, you should be able to process data as fast >>>> as you can feed it and consume it in such case. >>>> >>> >>> Actually in the beginning I tried this, but it did not work well. >>> ALSA needs time control for playback and capture, playback and capture >>> needs to synchronize. Usually the playback and capture pipeline is >>> independent in ALSA design, but in this case, the playback and capture >>> should synchronize, they are not independent. >> >> The core compress API core no strict timing constraints. You can eventually0 >> have two half-duplex compress devices, if you like to have really independent >> mechanism. If something is missing in API, you can extend this API (like to >> inform the user space that it's a producer/consumer processing without any >> relation to the real time). I like this idea. > > I was thinking more about this. If I am right, the mentioned use in gstreamer > is supposed to run the conversion (DSP) job in "one shot" (can be handled > using one system call like blocking ioctl). The goal is just to offload the > CPU work to the DSP (co-processor). If there are no requirements for the > queuing, we can implement this ioctl in the compress ALSA API easily using the > data management through the dma-buf API. We can eventually define a new > direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow > handle this new data scheme. The API may be extended later on real demand, of > course. > > Otherwise all pieces are already in the current ALSA compress API > (capabilities, params, enumeration). The realtime controls may be created > using ALSA control API. So does this mean that Shengjiu should attempt to use this ALSA approach first? If there is a way to do this reasonably cleanly in the ALSA API, then that obviously is much better from my perspective as a media maintainer. My understanding was always that it can't be done (or at least not without a major effort) in ALSA, and in that case V4L2 is a decent plan B, but based on this I gather that it is possible in ALSA after all. So can I shelf this patch series for now? Regards, Hans
On 15. 05. 24 11:17, Hans Verkuil wrote: > Hi Jaroslav, > > On 5/13/24 13:56, Jaroslav Kysela wrote: >> On 09. 05. 24 13:13, Jaroslav Kysela wrote: >>> On 09. 05. 24 12:44, Shengjiu Wang wrote: >>>>>> mem2mem is just like the decoder in the compress pipeline. which is >>>>>> one of the components in the pipeline. >>>>> >>>>> I was thinking of loopback with endpoints using compress streams, >>>>> without physical endpoint, something like: >>>>> >>>>> compress playback (to feed data from userspace) -> DSP (processing) -> >>>>> compress capture (send data back to userspace) >>>>> >>>>> Unless I'm missing something, you should be able to process data as fast >>>>> as you can feed it and consume it in such case. >>>>> >>>> >>>> Actually in the beginning I tried this, but it did not work well. >>>> ALSA needs time control for playback and capture, playback and capture >>>> needs to synchronize. Usually the playback and capture pipeline is >>>> independent in ALSA design, but in this case, the playback and capture >>>> should synchronize, they are not independent. >>> >>> The core compress API core no strict timing constraints. You can eventually0 >>> have two half-duplex compress devices, if you like to have really independent >>> mechanism. If something is missing in API, you can extend this API (like to >>> inform the user space that it's a producer/consumer processing without any >>> relation to the real time). I like this idea. >> >> I was thinking more about this. If I am right, the mentioned use in gstreamer >> is supposed to run the conversion (DSP) job in "one shot" (can be handled >> using one system call like blocking ioctl). The goal is just to offload the >> CPU work to the DSP (co-processor). If there are no requirements for the >> queuing, we can implement this ioctl in the compress ALSA API easily using the >> data management through the dma-buf API. We can eventually define a new >> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow >> handle this new data scheme. The API may be extended later on real demand, of >> course. >> >> Otherwise all pieces are already in the current ALSA compress API >> (capabilities, params, enumeration). The realtime controls may be created >> using ALSA control API. > > So does this mean that Shengjiu should attempt to use this ALSA approach first? I've not seen any argument to use v4l2 mem2mem buffer scheme for this data conversion forcefully. It looks like a simple job and ALSA APIs may be extended for this simple purpose. Shengjiu, what are your requirements for gstreamer support? Would be a new blocking ioctl enough for the initial support in the compress ALSA API? Jaroslav
On Wed, 15 May 2024 11:50:52 +0200, Jaroslav Kysela wrote: > > On 15. 05. 24 11:17, Hans Verkuil wrote: > > Hi Jaroslav, > > > > On 5/13/24 13:56, Jaroslav Kysela wrote: > >> On 09. 05. 24 13:13, Jaroslav Kysela wrote: > >>> On 09. 05. 24 12:44, Shengjiu Wang wrote: > >>>>>> mem2mem is just like the decoder in the compress pipeline. which is > >>>>>> one of the components in the pipeline. > >>>>> > >>>>> I was thinking of loopback with endpoints using compress streams, > >>>>> without physical endpoint, something like: > >>>>> > >>>>> compress playback (to feed data from userspace) -> DSP (processing) -> > >>>>> compress capture (send data back to userspace) > >>>>> > >>>>> Unless I'm missing something, you should be able to process data as fast > >>>>> as you can feed it and consume it in such case. > >>>>> > >>>> > >>>> Actually in the beginning I tried this, but it did not work well. > >>>> ALSA needs time control for playback and capture, playback and capture > >>>> needs to synchronize. Usually the playback and capture pipeline is > >>>> independent in ALSA design, but in this case, the playback and capture > >>>> should synchronize, they are not independent. > >>> > >>> The core compress API core no strict timing constraints. You can eventually0 > >>> have two half-duplex compress devices, if you like to have really independent > >>> mechanism. If something is missing in API, you can extend this API (like to > >>> inform the user space that it's a producer/consumer processing without any > >>> relation to the real time). I like this idea. > >> > >> I was thinking more about this. If I am right, the mentioned use in gstreamer > >> is supposed to run the conversion (DSP) job in "one shot" (can be handled > >> using one system call like blocking ioctl). The goal is just to offload the > >> CPU work to the DSP (co-processor). If there are no requirements for the > >> queuing, we can implement this ioctl in the compress ALSA API easily using the > >> data management through the dma-buf API. We can eventually define a new > >> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow > >> handle this new data scheme. The API may be extended later on real demand, of > >> course. > >> > >> Otherwise all pieces are already in the current ALSA compress API > >> (capabilities, params, enumeration). The realtime controls may be created > >> using ALSA control API. > > > > So does this mean that Shengjiu should attempt to use this ALSA approach first? > > I've not seen any argument to use v4l2 mem2mem buffer scheme for this > data conversion forcefully. It looks like a simple job and ALSA APIs > may be extended for this simple purpose. > > Shengjiu, what are your requirements for gstreamer support? Would be a > new blocking ioctl enough for the initial support in the compress ALSA > API? If it works with compress API, it'd be great, yeah. So, your idea is to open compress-offload devices for read and write, then and let them convert a la batch jobs without timing control? For full-duplex usages, we might need some more extensions, so that both read and write parameters can be synchronized. (So far the compress stream is a unidirectional, and the runtime buffer for a single stream.) And the buffer management is based on the fixed size fragments. I hope this doesn't matter much for the intended operation? thanks, Takashi
On 15. 05. 24 12:19, Takashi Iwai wrote: > On Wed, 15 May 2024 11:50:52 +0200, > Jaroslav Kysela wrote: >> >> On 15. 05. 24 11:17, Hans Verkuil wrote: >>> Hi Jaroslav, >>> >>> On 5/13/24 13:56, Jaroslav Kysela wrote: >>>> On 09. 05. 24 13:13, Jaroslav Kysela wrote: >>>>> On 09. 05. 24 12:44, Shengjiu Wang wrote: >>>>>>>> mem2mem is just like the decoder in the compress pipeline. which is >>>>>>>> one of the components in the pipeline. >>>>>>> >>>>>>> I was thinking of loopback with endpoints using compress streams, >>>>>>> without physical endpoint, something like: >>>>>>> >>>>>>> compress playback (to feed data from userspace) -> DSP (processing) -> >>>>>>> compress capture (send data back to userspace) >>>>>>> >>>>>>> Unless I'm missing something, you should be able to process data as fast >>>>>>> as you can feed it and consume it in such case. >>>>>>> >>>>>> >>>>>> Actually in the beginning I tried this, but it did not work well. >>>>>> ALSA needs time control for playback and capture, playback and capture >>>>>> needs to synchronize. Usually the playback and capture pipeline is >>>>>> independent in ALSA design, but in this case, the playback and capture >>>>>> should synchronize, they are not independent. >>>>> >>>>> The core compress API core no strict timing constraints. You can eventually0 >>>>> have two half-duplex compress devices, if you like to have really independent >>>>> mechanism. If something is missing in API, you can extend this API (like to >>>>> inform the user space that it's a producer/consumer processing without any >>>>> relation to the real time). I like this idea. >>>> >>>> I was thinking more about this. If I am right, the mentioned use in gstreamer >>>> is supposed to run the conversion (DSP) job in "one shot" (can be handled >>>> using one system call like blocking ioctl). The goal is just to offload the >>>> CPU work to the DSP (co-processor). If there are no requirements for the >>>> queuing, we can implement this ioctl in the compress ALSA API easily using the >>>> data management through the dma-buf API. We can eventually define a new >>>> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow >>>> handle this new data scheme. The API may be extended later on real demand, of >>>> course. >>>> >>>> Otherwise all pieces are already in the current ALSA compress API >>>> (capabilities, params, enumeration). The realtime controls may be created >>>> using ALSA control API. >>> >>> So does this mean that Shengjiu should attempt to use this ALSA approach first? >> >> I've not seen any argument to use v4l2 mem2mem buffer scheme for this >> data conversion forcefully. It looks like a simple job and ALSA APIs >> may be extended for this simple purpose. >> >> Shengjiu, what are your requirements for gstreamer support? Would be a >> new blocking ioctl enough for the initial support in the compress ALSA >> API? > > If it works with compress API, it'd be great, yeah. > So, your idea is to open compress-offload devices for read and write, > then and let them convert a la batch jobs without timing control? > > For full-duplex usages, we might need some more extensions, so that > both read and write parameters can be synchronized. (So far the > compress stream is a unidirectional, and the runtime buffer for a > single stream.) > > And the buffer management is based on the fixed size fragments. I > hope this doesn't matter much for the intended operation? It's a question, if the standard I/O is really required for this case. 
My quick idea was to just implement a new "direction" for this job, supporting, for the moment, only one ioctl for the data processing which executes the job in "one shot". The I/O may be handled through the dma-buf API (which seems to be the standard nowadays for this purpose and allows future chaining). So something like:

struct dsp_job {
	int source_fd;	/* dma-buf FD with source data - for dma_buf_get() */
	int target_fd;	/* dma-buf FD for target data - for dma_buf_get() */
	... maybe some extra data size members here ...
	... maybe some special parameters here ...
};

#define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)

This ioctl will be blocking (thus synced). My question is whether this is feasible for gstreamer or not. For this particular case, if the rate conversion is implemented in software, it will block the gstreamer data processing, too.

Jaroslav
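As a rough illustration of how user space might drive such a one-shot job, the sketch below allocates source and target buffers from the system dma-buf heap, stages the input samples, and submits the proposed blocking ioctl. Only DMA_HEAP_IOCTL_ALLOC is an existing interface; struct dsp_job, SNDRV_COMPRESS_DSPJOB, the extra size members and the /dev/snd/comprC0D0 node for the ASRC are assumptions taken from the proposal above.

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/dma-heap.h>

struct dsp_job {                        /* proposed uapi, not upstream */
	int source_fd;
	int target_fd;
	uint32_t source_size;
	uint32_t target_size;
};
#define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)  /* proposed */

/* Allocate a dma-buf from an open /dev/dma_heap/* device. */
static int heap_alloc(int heap_fd, size_t len)
{
	struct dma_heap_allocation_data data = {
		.len = len,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};

	if (ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data) < 0)
		return -1;
	return data.fd;
}

int asrc_convert_one_shot(const void *in, size_t in_size, void *out, size_t out_size)
{
	int heap = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
	int compr = open("/dev/snd/comprC0D0", O_RDWR);  /* assumed ASRC node */
	struct dsp_job job = {
		.source_fd = -1, .target_fd = -1,
		.source_size = in_size, .target_size = out_size,
	};
	void *src, *dst;
	int ret = -1;

	if (heap < 0 || compr < 0)
		goto out;
	job.source_fd = heap_alloc(heap, in_size);
	job.target_fd = heap_alloc(heap, out_size);
	if (job.source_fd < 0 || job.target_fd < 0)
		goto out;

	src = mmap(NULL, in_size, PROT_READ | PROT_WRITE, MAP_SHARED, job.source_fd, 0);
	if (src == MAP_FAILED)
		goto out;
	dst = mmap(NULL, out_size, PROT_READ, MAP_SHARED, job.target_fd, 0);
	if (dst == MAP_FAILED) {
		munmap(src, in_size);
		goto out;
	}

	memcpy(src, in, in_size);                        /* stage input samples */
	ret = ioctl(compr, SNDRV_COMPRESS_DSPJOB, &job); /* blocks until the job is done */
	if (ret == 0)
		memcpy(out, dst, out_size);              /* fetch converted samples */

	munmap(src, in_size);
	munmap(dst, out_size);
out:
	if (job.source_fd >= 0)
		close(job.source_fd);
	if (job.target_fd >= 0)
		close(job.target_fd);
	if (compr >= 0)
		close(compr);
	if (heap >= 0)
		close(heap);
	return ret;
}

A real implementation would presumably also negotiate rates and formats before submitting the job, for example through the existing snd_compr_params path; cache-maintenance ioctls (DMA_BUF_IOCTL_SYNC) around the CPU accesses are omitted here for brevity.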
On Wed, May 15, 2024 at 6:46 PM Jaroslav Kysela <perex@perex.cz> wrote: > > On 15. 05. 24 12:19, Takashi Iwai wrote: > > On Wed, 15 May 2024 11:50:52 +0200, > > Jaroslav Kysela wrote: > >> > >> On 15. 05. 24 11:17, Hans Verkuil wrote: > >>> Hi Jaroslav, > >>> > >>> On 5/13/24 13:56, Jaroslav Kysela wrote: > >>>> On 09. 05. 24 13:13, Jaroslav Kysela wrote: > >>>>> On 09. 05. 24 12:44, Shengjiu Wang wrote: > >>>>>>>> mem2mem is just like the decoder in the compress pipeline. which is > >>>>>>>> one of the components in the pipeline. > >>>>>>> > >>>>>>> I was thinking of loopback with endpoints using compress streams, > >>>>>>> without physical endpoint, something like: > >>>>>>> > >>>>>>> compress playback (to feed data from userspace) -> DSP (processing) -> > >>>>>>> compress capture (send data back to userspace) > >>>>>>> > >>>>>>> Unless I'm missing something, you should be able to process data as fast > >>>>>>> as you can feed it and consume it in such case. > >>>>>>> > >>>>>> > >>>>>> Actually in the beginning I tried this, but it did not work well. > >>>>>> ALSA needs time control for playback and capture, playback and capture > >>>>>> needs to synchronize. Usually the playback and capture pipeline is > >>>>>> independent in ALSA design, but in this case, the playback and capture > >>>>>> should synchronize, they are not independent. > >>>>> > >>>>> The core compress API core no strict timing constraints. You can eventually0 > >>>>> have two half-duplex compress devices, if you like to have really independent > >>>>> mechanism. If something is missing in API, you can extend this API (like to > >>>>> inform the user space that it's a producer/consumer processing without any > >>>>> relation to the real time). I like this idea. > >>>> > >>>> I was thinking more about this. If I am right, the mentioned use in gstreamer > >>>> is supposed to run the conversion (DSP) job in "one shot" (can be handled > >>>> using one system call like blocking ioctl). The goal is just to offload the > >>>> CPU work to the DSP (co-processor). If there are no requirements for the > >>>> queuing, we can implement this ioctl in the compress ALSA API easily using the > >>>> data management through the dma-buf API. We can eventually define a new > >>>> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow > >>>> handle this new data scheme. The API may be extended later on real demand, of > >>>> course. > >>>> > >>>> Otherwise all pieces are already in the current ALSA compress API > >>>> (capabilities, params, enumeration). The realtime controls may be created > >>>> using ALSA control API. > >>> > >>> So does this mean that Shengjiu should attempt to use this ALSA approach first? > >> > >> I've not seen any argument to use v4l2 mem2mem buffer scheme for this > >> data conversion forcefully. It looks like a simple job and ALSA APIs > >> may be extended for this simple purpose. > >> > >> Shengjiu, what are your requirements for gstreamer support? Would be a > >> new blocking ioctl enough for the initial support in the compress ALSA > >> API? > > > > If it works with compress API, it'd be great, yeah. > > So, your idea is to open compress-offload devices for read and write, > > then and let them convert a la batch jobs without timing control? > > > > For full-duplex usages, we might need some more extensions, so that > > both read and write parameters can be synchronized. (So far the > > compress stream is a unidirectional, and the runtime buffer for a > > single stream.) 
> > > > And the buffer management is based on the fixed size fragments. I > > hope this doesn't matter much for the intended operation? > > It's a question, if the standard I/O is really required for this case. My > quick idea was to just implement a new "direction" for this job supporting > only one ioctl for the data processing which will execute the job in "one > shot" at the moment. The I/O may be handled through dma-buf API (which seems > to be standard nowadays for this purpose and allows future chaining). > > So something like: > > struct dsp_job { > int source_fd; /* dma-buf FD with source data - for dma_buf_get() */ > int target_fd; /* dma-buf FD for target data - for dma_buf_get() */ > ... maybe some extra data size members here ... > ... maybe some special parameters here ... > }; > > #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job) > > This ioctl will be blocking (thus synced). My question is, if it's feasible > for gstreamer or not. For this particular case, if the rate conversion is > implemented in software, it will block the gstreamer data processing, too. > Thanks. I have several questions: 1. The compress API always binds to a sound card. Can we avoid that? For ASRC, it is just one component. 2. The compress API doesn't seem to support mmap(). Is this a problem for sending and getting data to/from the driver? 3. How does the user get output data from ASRC after each conversion? It should happen every period. Best regards Shengjiu Wang.
On 5/9/24 06:13, Jaroslav Kysela wrote: > On 09. 05. 24 12:44, Shengjiu Wang wrote: >>>> mem2mem is just like the decoder in the compress pipeline. which is >>>> one of the components in the pipeline. >>> >>> I was thinking of loopback with endpoints using compress streams, >>> without physical endpoint, something like: >>> >>> compress playback (to feed data from userspace) -> DSP (processing) -> >>> compress capture (send data back to userspace) >>> >>> Unless I'm missing something, you should be able to process data as fast >>> as you can feed it and consume it in such case. >>> >> >> Actually in the beginning I tried this, but it did not work well. >> ALSA needs time control for playback and capture, playback and capture >> needs to synchronize. Usually the playback and capture pipeline is >> independent in ALSA design, but in this case, the playback and capture >> should synchronize, they are not independent. > > The core compress API core no strict timing constraints. You can > eventually0 have two half-duplex compress devices, if you like to have > really independent mechanism. If something is missing in API, you can > extend this API (like to inform the user space that it's a > producer/consumer processing without any relation to the real time). I > like this idea. The compress API was never intended to be used this way. It was meant to send compressed data to a DSP for rendering, and keep the host processor in a low-power state while the DSP local buffer was drained. There was no intent to do a loop back to the host, because that keeps the host in a high-power state and probably negates the power savings due to a DSP. The other problem with the loopback is that the compress stuff is usually a "Front-End" in ASoC/DPCM parlance, and we don't have a good way to do a loopback between Front-Ends. The entire framework is based on FEs being connected to BEs. One problem that I can see for ASRC is that it's not clear when the data will be completely processed on the "capture" stream when you stop the "playback" stream. There's a non-zero risk of having a truncated output or waiting for data that will never be generated. In other words, it might be possible to reuse/extend the compress API for a 'coprocessor' approach without any rendering to traditional interfaces, but it's uncharted territory.
Hi, GStreamer hat on ... Le mercredi 15 mai 2024 à 12:46 +0200, Jaroslav Kysela a écrit : > On 15. 05. 24 12:19, Takashi Iwai wrote: > > On Wed, 15 May 2024 11:50:52 +0200, > > Jaroslav Kysela wrote: > > > > > > On 15. 05. 24 11:17, Hans Verkuil wrote: > > > > Hi Jaroslav, > > > > > > > > On 5/13/24 13:56, Jaroslav Kysela wrote: > > > > > On 09. 05. 24 13:13, Jaroslav Kysela wrote: > > > > > > On 09. 05. 24 12:44, Shengjiu Wang wrote: > > > > > > > > > mem2mem is just like the decoder in the compress pipeline. which is > > > > > > > > > one of the components in the pipeline. > > > > > > > > > > > > > > > > I was thinking of loopback with endpoints using compress streams, > > > > > > > > without physical endpoint, something like: > > > > > > > > > > > > > > > > compress playback (to feed data from userspace) -> DSP (processing) -> > > > > > > > > compress capture (send data back to userspace) > > > > > > > > > > > > > > > > Unless I'm missing something, you should be able to process data as fast > > > > > > > > as you can feed it and consume it in such case. > > > > > > > > > > > > > > > > > > > > > > Actually in the beginning I tried this, but it did not work well. > > > > > > > ALSA needs time control for playback and capture, playback and capture > > > > > > > needs to synchronize. Usually the playback and capture pipeline is > > > > > > > independent in ALSA design, but in this case, the playback and capture > > > > > > > should synchronize, they are not independent. > > > > > > > > > > > > The core compress API core no strict timing constraints. You can eventually0 > > > > > > have two half-duplex compress devices, if you like to have really independent > > > > > > mechanism. If something is missing in API, you can extend this API (like to > > > > > > inform the user space that it's a producer/consumer processing without any > > > > > > relation to the real time). I like this idea. > > > > > > > > > > I was thinking more about this. If I am right, the mentioned use in gstreamer > > > > > is supposed to run the conversion (DSP) job in "one shot" (can be handled > > > > > using one system call like blocking ioctl). The goal is just to offload the > > > > > CPU work to the DSP (co-processor). If there are no requirements for the > > > > > queuing, we can implement this ioctl in the compress ALSA API easily using the > > > > > data management through the dma-buf API. We can eventually define a new > > > > > direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow > > > > > handle this new data scheme. The API may be extended later on real demand, of > > > > > course. > > > > > > > > > > Otherwise all pieces are already in the current ALSA compress API > > > > > (capabilities, params, enumeration). The realtime controls may be created > > > > > using ALSA control API. > > > > > > > > So does this mean that Shengjiu should attempt to use this ALSA approach first? > > > > > > I've not seen any argument to use v4l2 mem2mem buffer scheme for this > > > data conversion forcefully. It looks like a simple job and ALSA APIs > > > may be extended for this simple purpose. > > > > > > Shengjiu, what are your requirements for gstreamer support? Would be a > > > new blocking ioctl enough for the initial support in the compress ALSA > > > API? > > > > If it works with compress API, it'd be great, yeah. > > So, your idea is to open compress-offload devices for read and write, > > then and let them convert a la batch jobs without timing control? 
> > > > For full-duplex usages, we might need some more extensions, so that > > both read and write parameters can be synchronized. (So far the > > compress stream is a unidirectional, and the runtime buffer for a > > single stream.) > > > > And the buffer management is based on the fixed size fragments. I > > hope this doesn't matter much for the intended operation? > > It's a question, if the standard I/O is really required for this case. My > quick idea was to just implement a new "direction" for this job supporting > only one ioctl for the data processing which will execute the job in "one > shot" at the moment. The I/O may be handled through dma-buf API (which seems > to be standard nowadays for this purpose and allows future chaining). > > So something like: > > struct dsp_job { > int source_fd; /* dma-buf FD with source data - for dma_buf_get() */ > int target_fd; /* dma-buf FD for target data - for dma_buf_get() */ > ... maybe some extra data size members here ... > ... maybe some special parameters here ... > }; > > #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job) > > This ioctl will be blocking (thus synced). My question is, if it's feasible > for gstreamer or not. For this particular case, if the rate conversion is > implemented in software, it will block the gstreamer data processing, too. Yes, GStreamer threading is using a push-back model, so blocking for the time of the processing is fine. Note that the extra simplicity will suffer from ioctl() latency. In GFX, they solve this issue with fences. That allow setting up the next operation in the chain before the data has been produced. In V4L2, we solve this with queues. It allows preparing the next job, while the processing of the current job is happening. If you look at v4l2convert code in gstreamer (for simple m2m), it currently makes no use of the queues, it simply synchronously process the frames. There is two option, where it does not matter that much, or no one is using it :-D Video decoders and encoders (stateful) do run input / output from different thread to benefit from the queued. regards, Nicolas > > Jaroslav >
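For comparison, this is roughly what the queued model described above looks like on a V4L2 mem2mem node: buffers are queued on the OUTPUT (to device) and CAPTURE (from device) queues independently, so the next job can be prepared while the current one is processed. A minimal sketch; format negotiation, buffer mapping and error handling are omitted.

#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Queue one source buffer on the OUTPUT queue and collect the processed
 * result from the CAPTURE queue of a V4L2 mem2mem device. Format setup
 * (VIDIOC_S_FMT), buffer allocation (VIDIOC_REQBUFS plus mmap) and
 * VIDIOC_STREAMON on both queues are assumed to have been done already. */
int m2m_process_one(int fd, unsigned int index)
{
	struct v4l2_buffer src = {
		.type = V4L2_BUF_TYPE_VIDEO_OUTPUT,
		.memory = V4L2_MEMORY_MMAP,
		.index = index,
	};
	struct v4l2_buffer dst = {
		.type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
		.memory = V4L2_MEMORY_MMAP,
		.index = index,
	};

	/* The two queues are independent, so the next source buffer can be
	 * queued while the current job is still being processed. */
	if (ioctl(fd, VIDIOC_QBUF, &src) < 0)
		return -1;
	if (ioctl(fd, VIDIOC_QBUF, &dst) < 0)
		return -1;

	/* Blocking dequeue of the converted buffer; with O_NONBLOCK this
	 * would be driven by poll() instead. The source buffer would also
	 * be dequeued (VIDIOC_DQBUF on the OUTPUT queue) before reuse. */
	return ioctl(fd, VIDIOC_DQBUF, &dst);
}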
On 15. 05. 24 22:33, Nicolas Dufresne wrote: > Hi, > > GStreamer hat on ... > > Le mercredi 15 mai 2024 à 12:46 +0200, Jaroslav Kysela a écrit : >> On 15. 05. 24 12:19, Takashi Iwai wrote: >>> On Wed, 15 May 2024 11:50:52 +0200, >>> Jaroslav Kysela wrote: >>>> >>>> On 15. 05. 24 11:17, Hans Verkuil wrote: >>>>> Hi Jaroslav, >>>>> >>>>> On 5/13/24 13:56, Jaroslav Kysela wrote: >>>>>> On 09. 05. 24 13:13, Jaroslav Kysela wrote: >>>>>>> On 09. 05. 24 12:44, Shengjiu Wang wrote: >>>>>>>>>> mem2mem is just like the decoder in the compress pipeline. which is >>>>>>>>>> one of the components in the pipeline. >>>>>>>>> >>>>>>>>> I was thinking of loopback with endpoints using compress streams, >>>>>>>>> without physical endpoint, something like: >>>>>>>>> >>>>>>>>> compress playback (to feed data from userspace) -> DSP (processing) -> >>>>>>>>> compress capture (send data back to userspace) >>>>>>>>> >>>>>>>>> Unless I'm missing something, you should be able to process data as fast >>>>>>>>> as you can feed it and consume it in such case. >>>>>>>>> >>>>>>>> >>>>>>>> Actually in the beginning I tried this, but it did not work well. >>>>>>>> ALSA needs time control for playback and capture, playback and capture >>>>>>>> needs to synchronize. Usually the playback and capture pipeline is >>>>>>>> independent in ALSA design, but in this case, the playback and capture >>>>>>>> should synchronize, they are not independent. >>>>>>> >>>>>>> The core compress API core no strict timing constraints. You can eventually0 >>>>>>> have two half-duplex compress devices, if you like to have really independent >>>>>>> mechanism. If something is missing in API, you can extend this API (like to >>>>>>> inform the user space that it's a producer/consumer processing without any >>>>>>> relation to the real time). I like this idea. >>>>>> >>>>>> I was thinking more about this. If I am right, the mentioned use in gstreamer >>>>>> is supposed to run the conversion (DSP) job in "one shot" (can be handled >>>>>> using one system call like blocking ioctl). The goal is just to offload the >>>>>> CPU work to the DSP (co-processor). If there are no requirements for the >>>>>> queuing, we can implement this ioctl in the compress ALSA API easily using the >>>>>> data management through the dma-buf API. We can eventually define a new >>>>>> direction (enum snd_compr_direction) like SND_COMPRESS_CONVERT or so to allow >>>>>> handle this new data scheme. The API may be extended later on real demand, of >>>>>> course. >>>>>> >>>>>> Otherwise all pieces are already in the current ALSA compress API >>>>>> (capabilities, params, enumeration). The realtime controls may be created >>>>>> using ALSA control API. >>>>> >>>>> So does this mean that Shengjiu should attempt to use this ALSA approach first? >>>> >>>> I've not seen any argument to use v4l2 mem2mem buffer scheme for this >>>> data conversion forcefully. It looks like a simple job and ALSA APIs >>>> may be extended for this simple purpose. >>>> >>>> Shengjiu, what are your requirements for gstreamer support? Would be a >>>> new blocking ioctl enough for the initial support in the compress ALSA >>>> API? >>> >>> If it works with compress API, it'd be great, yeah. >>> So, your idea is to open compress-offload devices for read and write, >>> then and let them convert a la batch jobs without timing control? >>> >>> For full-duplex usages, we might need some more extensions, so that >>> both read and write parameters can be synchronized. 
(So far the >>> compress stream is a unidirectional, and the runtime buffer for a >>> single stream.) >>> >>> And the buffer management is based on the fixed size fragments. I >>> hope this doesn't matter much for the intended operation? >> >> It's a question, if the standard I/O is really required for this case. My >> quick idea was to just implement a new "direction" for this job supporting >> only one ioctl for the data processing which will execute the job in "one >> shot" at the moment. The I/O may be handled through dma-buf API (which seems >> to be standard nowadays for this purpose and allows future chaining). >> >> So something like: >> >> struct dsp_job { >> int source_fd; /* dma-buf FD with source data - for dma_buf_get() */ >> int target_fd; /* dma-buf FD for target data - for dma_buf_get() */ >> ... maybe some extra data size members here ... >> ... maybe some special parameters here ... >> }; >> >> #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job) >> >> This ioctl will be blocking (thus synced). My question is, if it's feasible >> for gstreamer or not. For this particular case, if the rate conversion is >> implemented in software, it will block the gstreamer data processing, too. > > Yes, GStreamer threading is using a push-back model, so blocking for the time of > the processing is fine. Note that the extra simplicity will suffer from ioctl() > latency. > > In GFX, they solve this issue with fences. That allow setting up the next > operation in the chain before the data has been produced. The fences look really nicely and seem more modern. It should be possible with dma-buf/sync_file.c interface to handle multiple jobs simultaneously and share the state between user space and kernel driver. In this case, I think that two non-blocking ioctls should be enough - add a new job with source/target dma buffers guarded by one fence and abort (flush) all active jobs. I'll try to propose an API extension for the ALSA's compress API in the linux-sound mailing list soon. Jaroslav
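A rough sketch of how that fence-based variant might look from user space, assuming the submission ioctl hands back a sync_file fd for the job: everything here apart from the poll() wait on a sync_file is a hypothetical extension, not an existing ALSA interface.

#include <poll.h>
#include <sys/ioctl.h>

/* Hypothetical non-blocking job submission: the ioctl name, number and
 * struct layout are made up for illustration; only the poll() wait on a
 * sync_file fd is existing, standard behaviour. */
struct compr_dsp_job {
	int source_fd;		/* dma-buf with input samples */
	int target_fd;		/* dma-buf for converted output */
	int fence_fd;		/* out: sync_file fd, signalled on completion */
};
#define SNDRV_COMPRESS_SUBMIT_JOB _IOWR('C', 0x61, struct compr_dsp_job)

int submit_and_wait(int compr_fd, int src_dmabuf, int dst_dmabuf, int timeout_ms)
{
	struct compr_dsp_job job = {
		.source_fd = src_dmabuf,
		.target_fd = dst_dmabuf,
		.fence_fd = -1,
	};
	struct pollfd pfd;

	/* Returns immediately; further jobs could be queued right away and
	 * the proposed abort/flush ioctl would cancel anything still pending. */
	if (ioctl(compr_fd, SNDRV_COMPRESS_SUBMIT_JOB, &job) < 0)
		return -1;

	pfd.fd = job.fence_fd;
	pfd.events = POLLIN;	/* a sync_file reports POLLIN once its fence signals */
	return poll(&pfd, 1, timeout_ms);
}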
On 15. 05. 24 15:34, Shengjiu Wang wrote:
> On Wed, May 15, 2024 at 6:46 PM Jaroslav Kysela <perex@perex.cz> wrote:
>>
>> [...]
>> It's a question if the standard I/O is really required for this case. My
>> quick idea was to just implement a new "direction" for this job, supporting
>> only one ioctl for the data processing which will execute the job in "one
>> shot" at the moment. The I/O may be handled through the dma-buf API (which
>> seems to be standard nowadays for this purpose and allows future chaining).
>>
>> So something like:
>>
>> struct dsp_job {
>>         int source_fd;  /* dma-buf FD with source data - for dma_buf_get() */
>>         int target_fd;  /* dma-buf FD for target data - for dma_buf_get() */
>>         ... maybe some extra data size members here ...
>>         ... maybe some special parameters here ...
>> };
>>
>> #define SNDRV_COMPRESS_DSPJOB _IOWR('C', 0x60, struct dsp_job)
>>
>> This ioctl will be blocking (thus synced). My question is if it's feasible
>> for gstreamer or not. For this particular case, if the rate conversion is
>> implemented in software, it will block the gstreamer data processing, too.
>
> Thanks.
>
> I have several questions:
>
> 1. The compress API always binds to a sound card. Can we avoid that?
>    For ASRC, it is just one component.

Is this a real issue? Usually, I would expect sound hardware (a card) to be
present when ASRC is available, or not? Eventually, a separate sound card with
one compress device may be created, too. For enumeration, user space may just
iterate through all sound cards / compress devices to find the ASRC in the
system. The devices/interfaces in a sound card are independent. Also, USB MIDI
converters offer only one serial MIDI interface, for example.

> 2. The compress API doesn't seem to support mmap(). Is this a problem
>    for sending and getting data to/from the driver?

I proposed to use dma-buf for the I/O (separate source and target buffers).

> 3. How does the user get the output data from ASRC after each conversion?
>    It should happen every period.

The target dma-buf.

Jaroslav
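To make the enumeration answer a bit more concrete, a rough sketch of what
"iterate through all compress devices" could look like from user space. It
uses only today's UAPI (the compress device nodes and SNDRV_COMPRESS_GET_CAPS);
how a converter device would actually identify itself, for example via a new
direction value, is still undecided, so this sketch merely lists what is there.

#include <fcntl.h>
#include <glob.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <sound/compress_offload.h>

int main(void)
{
        glob_t g;
        size_t i;

        if (glob("/dev/snd/comprC*D*", 0, NULL, &g))
                return 0;       /* no compress devices present */

        for (i = 0; i < g.gl_pathc; i++) {
                struct snd_compr_caps caps;
                /* Current devices are half-duplex: capture opens O_RDONLY,
                 * playback opens O_WRONLY, so try both. */
                int fd = open(g.gl_pathv[i], O_RDONLY);

                if (fd < 0)
                        fd = open(g.gl_pathv[i], O_WRONLY);
                if (fd < 0)
                        continue;

                if (!ioctl(fd, SNDRV_COMPRESS_GET_CAPS, &caps))
                        printf("%s: direction %u, %u codec(s)\n",
                               g.gl_pathv[i], caps.direction, caps.num_codecs);
                close(fd);
        }
        globfree(&g);
        return 0;
}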
On 16. 05. 24 16:50, Jaroslav Kysela wrote:
> On 15. 05. 24 22:33, Nicolas Dufresne wrote:
>> In GFX, they solve this issue with fences. That allows setting up the next
>> operation in the chain before the data has been produced.
>
> The fences look really nice and seem more modern. It should be possible with
> the dma-buf/sync_file.c interface to handle multiple jobs simultaneously and
> share the state between user space and the kernel driver.
>
> In this case, I think that two non-blocking ioctls should be enough - add a
> new job with source/target dma buffers guarded by one fence, and abort
> (flush) all active jobs.
>
> I'll try to propose an API extension for ALSA's compress API on the
> linux-sound mailing list soon.

I found using sync_file during the implementation to be overkill for resource
management, so I proposed a simple queue with the standard poll mechanism:

https://lore.kernel.org/linux-sound/20240527071133.223066-1-perex@perex.cz/

Jaroslav
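Purely to illustrate the queue-plus-poll pattern this describes: the real ioctl
names and job layout are defined in the linked series, not here, so struct
compr_conv_job, SNDRV_COMPRESS_ENQUEUE_JOB and SNDRV_COMPRESS_DEQUEUE_DONE
below are invented stand-ins for this sketch only.

#include <poll.h>
#include <sys/ioctl.h>

struct compr_conv_job {                 /* hypothetical stand-in */
        int source_fd;                  /* dma-buf with input samples */
        int target_fd;                  /* dma-buf for converted output */
};
/* Both ioctls below are invented for this sketch. */
#define SNDRV_COMPRESS_ENQUEUE_JOB  _IOW('C', 0x61, struct compr_conv_job)
#define SNDRV_COMPRESS_DEQUEUE_DONE _IOR('C', 0x62, struct compr_conv_job)

/* Queue all jobs up front, then reap completions as poll() reports them. */
static int run_jobs(int conv_fd, struct compr_conv_job *jobs, int count)
{
        struct pollfd pfd = { .fd = conv_fd, .events = POLLIN };
        struct compr_conv_job done_job;
        int i;

        for (i = 0; i < count; i++)
                if (ioctl(conv_fd, SNDRV_COMPRESS_ENQUEUE_JOB, &jobs[i]) < 0)
                        return -1;      /* queue full or other error */

        for (i = 0; i < count; i++) {
                /* POLLIN: at least one queued job has finished. */
                if (poll(&pfd, 1, -1) < 0)
                        return -1;
                if (ioctl(conv_fd, SNDRV_COMPRESS_DEQUEUE_DONE, &done_job) < 0)
                        return -1;
                /* done_job.target_fd now holds the converted samples. */
        }
        return 0;
}

The point of the pattern is that no call blocks for the duration of a DSP job,
so a user-space pipeline can keep several conversions in flight and still use
its ordinary poll()/event loop to learn when each one completes.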