Message ID | 20200816172246.69146-1-ezequiel@collabora.com (mailing list archive)
---|---
State | New, archived
Series | [RFC] Experimental DMA-BUF Device Heaps
Hi Ezequiel, On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: > This heap is basically a wrapper around DMA-API dma_alloc_attrs, > which will allocate memory suitable for the given device. > > The implementation is mostly a port of the Contiguous Videobuf2 > memory allocator (see videobuf2/videobuf2-dma-contig.c) > over to the DMA-BUF Heap interface. > > The intention of this allocator is to provide applications > with a more system-agnostic API: the only thing the application > needs to know is which device to get the buffer for. > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool > is unknown to the application. > > I'm not really expecting this patch to be correct or even > a good idea, but just submitting it to start a discussion on DMA-BUF > heap discovery and negotiation. > My initial reaction is that I thought dmabuf heaps are meant for use to allocate buffers for sharing across devices, which doesn't fit very well with having per-device heaps. For single-device allocations, would using the buffer allocation functionality of that device's native API be better in most cases? (Some other possibly relevant discussion at [1]) I can see that this can save some boilerplate for devices that want to expose private chunks of memory, but might it also lead to 100 aliases for the system's generic coherent memory pool? I wonder if a set of helpers to allow devices to expose whatever they want with minimal effort would be better. Cheers, -Brian 1. https://lore.kernel.org/dri-devel/57062477-30e7-a3de-6723-a50d03a402c4@kapsi.fi/ > Given Plumbers is just a couple weeks from now, I've submitted > a BoF proposal to discuss this, as perhaps it would make > sense to discuss this live? > > Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
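For readers following along, this is roughly what an allocation from a dma-buf heap already looks like from userspace, via the existing DMA_HEAP_IOCTL_ALLOC uapi. A minimal sketch: it assumes the RFC's per-device heap would simply show up as another entry under /dev/dma_heap/, and the "vpu" heap name is hypothetical, standing in for whatever name the proposed dev_dma_heap_add() would register.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/dma-heap.h>

/* Allocate a dma-buf from a heap chardev and return the buffer fd. */
static int heap_alloc(const char *heap_path, size_t len)
{
	struct dma_heap_allocation_data data = {
		.len = len,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	int heap_fd = open(heap_path, O_RDONLY | O_CLOEXEC);

	if (heap_fd < 0)
		return -1;
	if (ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data) < 0) {
		close(heap_fd);
		return -1;
	}
	close(heap_fd);
	return data.fd;	/* dma-buf fd, shareable with other devices */
}

int main(void)
{
	/* "vpu" is a hypothetical per-device heap this RFC would create. */
	int buf_fd = heap_alloc("/dev/dma_heap/vpu", 4 * 1024 * 1024);

	if (buf_fd < 0)
		perror("heap_alloc");
	else
		close(buf_fd);
	return 0;
}
```

The returned fd is an ordinary dma-buf, so it can be imported by V4L2, DRM, or any other importer exactly like a buffer allocated from the system or CMA heaps.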
On 8/17/20 8:18 AM, Brian Starkey wrote: > Hi Ezequiel, > > On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: >> This heap is basically a wrapper around DMA-API dma_alloc_attrs, >> which will allocate memory suitable for the given device. >> >> The implementation is mostly a port of the Contiguous Videobuf2 >> memory allocator (see videobuf2/videobuf2-dma-contig.c) >> over to the DMA-BUF Heap interface. >> >> The intention of this allocator is to provide applications >> with a more system-agnostic API: the only thing the application >> needs to know is which device to get the buffer for. >> >> Whether the buffer is backed by CMA, IOMMU or a DMA Pool >> is unknown to the application. >> >> I'm not really expecting this patch to be correct or even >> a good idea, but just submitting it to start a discussion on DMA-BUF >> heap discovery and negotiation. >> > > My initial reaction is that I thought dmabuf heaps are meant for use > to allocate buffers for sharing across devices, which doesn't fit very > well with having per-device heaps. > > For single-device allocations, would using the buffer allocation > functionality of that device's native API be better in most > cases? (Some other possibly relevant discussion at [1]) > > I can see that this can save some boilerplate for devices that want > to expose private chunks of memory, but might it also lead to 100 > aliases for the system's generic coherent memory pool? > > I wonder if a set of helpers to allow devices to expose whatever they > want with minimal effort would be better. I'm rather interested on where this goes, as I was toying with using some sort of heap ID as a basis for a "device-local" constraint in the memory constraints proposals Simon and I will be discussing at XDC this year. It would be rather elegant if there was one type of heap ID used universally throughout the kernel that could provide a unique handle for the shared system memory heap(s), as well as accelerator-local heaps on fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could negotiate a location among themselves. This patch seems to be a step towards that in a way, but I agree it would be counterproductive if a bunch of devices that were using the same underlying system memory ended up each getting their own heap ID just because they used some SW framework that worked that way. Would appreciate it if you could send along a pointer to your BoF if it happens! Thanks, -James > Cheers, > -Brian > > 1. https://lore.kernel.org/dri-devel/57062477-30e7-a3de-6723-a50d03a402c4@kapsi.fi/ > >> Given Plumbers is just a couple weeks from now, I've submitted >> a BoF proposal to discuss this, as perhaps it would make >> sense to discuss this live? >> >> Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
On Sun, Aug 16, 2020 at 10:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote: > > This heap is basically a wrapper around DMA-API dma_alloc_attrs, > which will allocate memory suitable for the given device. > > The implementation is mostly a port of the Contiguous Videobuf2 > memory allocator (see videobuf2/videobuf2-dma-contig.c) > over to the DMA-BUF Heap interface. > > The intention of this allocator is to provide applications > with a more system-agnostic API: the only thing the application > needs to know is which device to get the buffer for. > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool > is unknown to the application. My hesitancy here is that the main reason we have DMA BUF Heaps, and ION before it, was to expose different types of memory allocations to userspace. The main premise that often these buffers are shared with multiple devices, which have differing constraints and it is userspace that best understands the path a buffer will take through a series of devices. So userspace is best positioned to determine what type of memory should be allocated to satisfy the various devices constraints (there were some design attempts to allow DMA BUFs to use multiple attach with deferred alloc at map time to handle this constraint solving in-kernel, but it was never adopted in practice). This however, requires some system specific policy - implemented in the Android userspace by gralloc which maps "usage" types (device pipeline flows) to heaps. I liken it to fstab, which helps map mount points to partitions - it's not going to be the same on every system. What you seem to be proposing seems a bit contrary to this premise - Userland doesn't know what type of memory it needs, but given a device can somehow find the heap it should allocate for? This seems to assume the buffer is only to be used with a single device? There was at some point a discussion where folks (maybe it was DanielV? I can't remember...) suggested having something like a sysfs device node link from a device to a dma-buf heap chardev. This seems like it would add a similar behavior as what you're proposing, however without adding possibly a ton of new device specific heaps to the /dev/dma_heap/ dir. However, we would potentially need any missing heap types added first. > I'm not really expecting this patch to be correct or even > a good idea, but just submitting it to start a discussion on DMA-BUF > heap discovery and negotiation. > > Given Plumbers is just a couple weeks from now, I've submitted > a BoF proposal to discuss this, as perhaps it would make > sense to discuss this live? I do think it's an interesting idea. I agree that having every driver implement a dmabuf exporter is a bit silly, but I also think Brian's point that maybe we need some drm helper functions that provide similar functionality along with a more standardized device ioctl for single device allocations might be better. thanks -john
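To make the fstab/gralloc analogy concrete, the per-system policy John describes could be as small as a table mapping usage flows to heap names. Everything below (the usage flags, the board names, the heap paths) is invented purely for illustration; it is not a kernel or Android API.

```c
/*
 * Hypothetical per-system policy table in the spirit of gralloc's
 * usage-to-heap mapping. Nothing here is a kernel interface.
 */
struct usage_map {
	unsigned int usage;   /* a pipeline flow, e.g. camera -> VPU -> display */
	const char *heap;     /* heap chardev that satisfies it on this board */
};

#define USAGE_CAMERA_TO_VPU	0x1
#define USAGE_GPU_RENDER	0x2

/* Board with an IOMMU in front of the VPU: system memory is fine. */
static const struct usage_map board_a_map[] = {
	{ USAGE_CAMERA_TO_VPU, "/dev/dma_heap/system" },
	{ USAGE_GPU_RENDER,    "/dev/dma_heap/system" },
};

/* Board without an IOMMU: the same pipeline needs contiguous (CMA) memory. */
static const struct usage_map board_b_map[] = {
	{ USAGE_CAMERA_TO_VPU, "/dev/dma_heap/linux,cma" },
	{ USAGE_GPU_RENDER,    "/dev/dma_heap/system" },
};
```

The table is different on every board, which is exactly the system-specific policy being discussed: the kernel exposes the heaps, and userspace decides which one a given buffer path should use.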
Hi Brian, Thanks a lot for reviewing this patch. I'm happy my humble proof-of-concept is able to spark some interest and raise some questions. On Mon, 2020-08-17 at 16:18 +0100, Brian Starkey wrote: > Hi Ezequiel, > > On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: > > This heap is basically a wrapper around DMA-API dma_alloc_attrs, > > which will allocate memory suitable for the given device. > > > > The implementation is mostly a port of the Contiguous Videobuf2 > > memory allocator (see videobuf2/videobuf2-dma-contig.c) > > over to the DMA-BUF Heap interface. > > > > The intention of this allocator is to provide applications > > with a more system-agnostic API: the only thing the application > > needs to know is which device to get the buffer for. > > > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool > > is unknown to the application. > > > > I'm not really expecting this patch to be correct or even > > a good idea, but just submitting it to start a discussion on DMA-BUF > > heap discovery and negotiation. > > > > My initial reaction is that I thought dmabuf heaps are meant for use > to allocate buffers for sharing across devices, which doesn't fit very > well with having per-device heaps. > That's true. > For single-device allocations, would using the buffer allocation > functionality of that device's native API be better in most > cases? (Some other possibly relevant discussion at [1]) > That may be true for existing subsystems. However, how about a new subsystem/API wanting to leverage the heap API and avoid exposing yet another allocator API? And then, also, if we have an allocator API, perhaps we could imagine a future where applications would only need to worry about that, and not about each per-framework allocator. > I can see that this can save some boilerplate for devices that want > to expose private chunks of memory, but might it also lead to 100 > aliases for the system's generic coherent memory pool? > The idea (even if PoC) was to let drivers decide if they are special enough to add themselves (via dev_dma_heap_add). Aliasing the heap would certainly be too much (although trying it for every device was a fun experiment to watch). Thanks, Ezequiel > I wonder if a set of helpers to allow devices to expose whatever they > want with minimal effort would be better. > > Cheers, > -Brian > > 1. https://lore.kernel.org/dri-devel/57062477-30e7-a3de-6723-a50d03a402c4@kapsi.fi/ > > > Given Plumbers is just a couple weeks from now, I've submitted > > a BoF proposal to discuss this, as perhaps it would make > > sense to discuss this live? > > > > Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
On Mon, 2020-08-17 at 20:49 -0700, James Jones wrote: > On 8/17/20 8:18 AM, Brian Starkey wrote: > > Hi Ezequiel, > > > > On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: > > > This heap is basically a wrapper around DMA-API dma_alloc_attrs, > > > which will allocate memory suitable for the given device. > > > > > > The implementation is mostly a port of the Contiguous Videobuf2 > > > memory allocator (see videobuf2/videobuf2-dma-contig.c) > > > over to the DMA-BUF Heap interface. > > > > > > The intention of this allocator is to provide applications > > > with a more system-agnostic API: the only thing the application > > > needs to know is which device to get the buffer for. > > > > > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool > > > is unknown to the application. > > > > > > I'm not really expecting this patch to be correct or even > > > a good idea, but just submitting it to start a discussion on DMA-BUF > > > heap discovery and negotiation. > > > > > > > My initial reaction is that I thought dmabuf heaps are meant for use > > to allocate buffers for sharing across devices, which doesn't fit very > > well with having per-device heaps. > > > > For single-device allocations, would using the buffer allocation > > functionality of that device's native API be better in most > > cases? (Some other possibly relevant discussion at [1]) > > > > I can see that this can save some boilerplate for devices that want > > to expose private chunks of memory, but might it also lead to 100 > > aliases for the system's generic coherent memory pool? > > > > I wonder if a set of helpers to allow devices to expose whatever they > > want with minimal effort would be better. > > I'm rather interested on where this goes, as I was toying with using > some sort of heap ID as a basis for a "device-local" constraint in the > memory constraints proposals Simon and I will be discussing at XDC this > year. It would be rather elegant if there was one type of heap ID used > universally throughout the kernel that could provide a unique handle for > the shared system memory heap(s), as well as accelerator-local heaps on > fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could > negotiate a location among themselves. This patch seems to be a step > towards that in a way, but I agree it would be counterproductive if a > bunch of devices that were using the same underlying system memory ended > up each getting their own heap ID just because they used some SW > framework that worked that way. > > Would appreciate it if you could send along a pointer to your BoF if it > happens! > Here is it: https://linuxplumbersconf.org/event/7/contributions/818/ It would be great to see you there and discuss this, given I was hoping we could talk about how to meet a userspace allocator library expectations as well. Thanks, Ezequiel > Thanks, > -James > > > Cheers, > > -Brian > > > > 1. https://lore.kernel.org/dri-devel/57062477-30e7-a3de-6723-a50d03a402c4@kapsi.fi/ > > > > > Given Plumbers is just a couple weeks from now, I've submitted > > > a BoF proposal to discuss this, as perhaps it would make > > > sense to discuss this live? > > > > > > Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
Hi John, Thanks a ton for taking the time to go thru this. On Mon, 2020-08-17 at 21:13 -0700, John Stultz wrote: > On Sun, Aug 16, 2020 at 10:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote: > > This heap is basically a wrapper around DMA-API dma_alloc_attrs, > > which will allocate memory suitable for the given device. > > > > The implementation is mostly a port of the Contiguous Videobuf2 > > memory allocator (see videobuf2/videobuf2-dma-contig.c) > > over to the DMA-BUF Heap interface. > > > > The intention of this allocator is to provide applications > > with a more system-agnostic API: the only thing the application > > needs to know is which device to get the buffer for. > > > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool > > is unknown to the application. > > My hesitancy here is that the main reason we have DMA BUF Heaps, and > ION before it, was to expose different types of memory allocations to > userspace. The main premise that often these buffers are shared with > multiple devices, which have differing constraints and it is userspace > that best understands the path a buffer will take through a series of > devices. So userspace is best positioned to determine what type of > memory should be allocated to satisfy the various devices constraints > (there were some design attempts to allow DMA BUFs to use multiple > attach with deferred alloc at map time to handle this constraint > solving in-kernel, but it was never adopted in practice). > > This however, requires some system specific policy - implemented in > the Android userspace by gralloc which maps "usage" types (device > pipeline flows) to heaps. I liken it to fstab, which helps map mount > points to partitions - it's not going to be the same on every system. > > What you seem to be proposing seems a bit contrary to this premise - > Userland doesn't know what type of memory it needs, but given a device > can somehow find the heap it should allocate for? This seems to assume > the buffer is only to be used with a single device? > Single-device usage wasn't the intention. I see now that this patch looks too naive and it's confusing. The idea is of course to get buffers that can be shared. I'm thinking you need to share your picture buffer with a decoder, a renderer, possibly something else. Each with its own set of constraints and limitations. Of course, a buffer that works for device A may be unsuitable for device B and so this per-device heap is surely way too naive. As you rightly mention, the main intention of this RFC is to question exactly the current premise: "userspace knows". I fail to see how will (generic and system-agnostic) applications know which heap to use. Just for completion, let me throw a simple example: i.MX 8M and some Rockchip platforms share the same VPU block, except the latter have an IOMMU. So applications would need to query an iommu presence to get buffer from CMA or not. > There was at some point a discussion where folks (maybe it was > DanielV? I can't remember...) suggested having something like a sysfs > device node link from a device to a dma-buf heap chardev. This seems > like it would add a similar behavior as what you're proposing, however > without adding possibly a ton of new device specific heaps to the > /dev/dma_heap/ dir. However, we would potentially need any missing > heap types added first. > > > I'm not really expecting this patch to be correct or even > > a good idea, but just submitting it to start a discussion on DMA-BUF > > heap discovery and negotiation. 
> > > > Given Plumbers is just a couple weeks from now, I've submitted > > a BoF proposal to discuss this, as perhaps it would make > > sense to discuss this live? > > I do think it's an interesting idea. I agree that having every driver > implement a dmabuf exporter is a bit silly, but I also think Brian's > point that maybe we need some drm helper functions that provide > similar functionality along with a more standardized device ioctl for > single device allocations might be better. > I'm unsure we should treat single device specially. Exposing device limitations and constraints properly, allowing some sort of negotiation would hopefully solve single device and shared requirements. Thanks, Ezequiel > thanks > -john
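The discovery problem raised here is visible from userspace today: the only generic thing an application can do is list the entries under /dev/dma_heap, which carries no information about whether, say, a particular VPU needs CMA memory or sits behind an IOMMU. A minimal sketch:

```c
#include <dirent.h>
#include <stdio.h>

/*
 * Enumerate the heaps the kernel exposes. Names are all a generic,
 * system-agnostic application can discover today; no constraint
 * information is attached to them.
 */
int main(void)
{
	DIR *dir = opendir("/dev/dma_heap");
	struct dirent *ent;

	if (!dir) {
		perror("opendir /dev/dma_heap");
		return 1;
	}
	while ((ent = readdir(dir)) != NULL) {
		if (ent->d_name[0] == '.')
			continue;
		printf("heap: %s\n", ent->d_name);
	}
	closedir(dir);
	return 0;
}
```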
Hi, On Thu, Aug 20, 2020 at 09:07:29AM +0100, Ezequiel Garcia wrote: > > For single-device allocations, would using the buffer allocation > > functionality of that device's native API be better in most > > cases? (Some other possibly relevant discussion at [1]) > > > > That may be true for existing subsystems. > > However, how about a new subsystem/API wanting to > leverage the heap API and avoid exposing yet another > allocator API? > > And then, also, if we have an allocator API, perhaps > we could imagine a future where applications would only > need to worry about that, and not about each per-framework > allocator. Yeah both are reasonable points. I was thinking in the context of the thread I linked where allocating lots of GEM handles for process-local use is preferable to importing loads of dma_buf fds, but in that case the userspace graphics driver is somewhat "special" rather than a generic application in the traditional sense. I do think the best usage model for applications though is to use libraries which can hide the details, instead of going at the kernel APIs directly. > > > I can see that this can save some boilerplate for devices that want > > to expose private chunks of memory, but might it also lead to 100 > > aliases for the system's generic coherent memory pool? > > > > The idea (even if PoC) was to let drivers decide if they are special > enough to add themselves (via dev_dma_heap_add). OK, that makes sense. I think it's tricky to know how many "special" chunks of memory will already be hooked up to the DMA infrastructure and how many would need some extra/special work. Cheers, -Brian
Hi Ezequiel, On Thu, Aug 20, 2020 at 05:36:40AM -0300, Ezequiel Garcia wrote: > Hi John, > > Thanks a ton for taking the time > to go thru this. > > On Mon, 2020-08-17 at 21:13 -0700, John Stultz wrote: > > On Sun, Aug 16, 2020 at 10:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > This heap is basically a wrapper around DMA-API dma_alloc_attrs, > > > which will allocate memory suitable for the given device. > > > > > > The implementation is mostly a port of the Contiguous Videobuf2 > > > memory allocator (see videobuf2/videobuf2-dma-contig.c) > > > over to the DMA-BUF Heap interface. > > > > > > The intention of this allocator is to provide applications > > > with a more system-agnostic API: the only thing the application > > > needs to know is which device to get the buffer for. > > > > > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool > > > is unknown to the application. > > > > My hesitancy here is that the main reason we have DMA BUF Heaps, and > > ION before it, was to expose different types of memory allocations to > > userspace. The main premise that often these buffers are shared with > > multiple devices, which have differing constraints and it is userspace > > that best understands the path a buffer will take through a series of > > devices. So userspace is best positioned to determine what type of > > memory should be allocated to satisfy the various devices constraints > > (there were some design attempts to allow DMA BUFs to use multiple > > attach with deferred alloc at map time to handle this constraint > > solving in-kernel, but it was never adopted in practice). > > > > This however, requires some system specific policy - implemented in > > the Android userspace by gralloc which maps "usage" types (device > > pipeline flows) to heaps. I liken it to fstab, which helps map mount > > points to partitions - it's not going to be the same on every system. > > > > What you seem to be proposing seems a bit contrary to this premise - > > Userland doesn't know what type of memory it needs, but given a device > > can somehow find the heap it should allocate for? This seems to assume > > the buffer is only to be used with a single device? > > Single-device usage wasn't the intention. I see now that this patch > looks too naive and it's confusing. The idea is of course to get buffers > that can be shared. > > I'm thinking you need to share your picture buffer with a decoder, > a renderer, possibly something else. Each with its own set > of constraints and limitations. > > Of course, a buffer that works for device A may be unsuitable for > device B and so this per-device heap is surely way too naive. > > As you rightly mention, the main intention of this RFC is to > question exactly the current premise: "userspace knows". > I fail to see how will (generic and system-agnostic) applications > know which heap to use. > > Just for completion, let me throw a simple example: i.MX 8M > and some Rockchip platforms share the same VPU block, except the > latter have an IOMMU. > > So applications would need to query an iommu presence > to get buffer from CMA or not. I don't think we can do this in a system-agnostic way. What I'd like to see is an API for the kernel to expose opaque constraints for each device, and a userspace library to reconcile constraints, selecting a suitable heap, and producing heap-specific parameters for the allocation. The constraints need to be transported in a generic way, but the contents can be opaque for applications. 
Only the library would need to interpret them. This can be done with platform-specific code inside the library. A separate API for the library to interact with the kernel to further query or negotiate configuration parameters could be part of that picture. This would offer standardized APIs to applications (to gather constraints, pass them to the constraint resolution library, and receive a heap ID and heap parameters), while still allowing platform-specific code in userspace. > > There was at some point a discussion where folks (maybe it was > > DanielV? I can't remember...) suggested having something like a sysfs > > device node link from a device to a dma-buf heap chardev. This seems > > like it would add a similar behavior as what you're proposing, however > > without adding possibly a ton of new device specific heaps to the > > /dev/dma_heap/ dir. However, we would potentially need any missing > > heap types added first. > > > > > I'm not really expecting this patch to be correct or even > > > a good idea, but just submitting it to start a discussion on DMA-BUF > > > heap discovery and negotiation. > > > > > > Given Plumbers is just a couple weeks from now, I've submitted > > > a BoF proposal to discuss this, as perhaps it would make > > > sense to discuss this live? > > > > I do think it's an interesting idea. I agree that having every driver > > implement a dmabuf exporter is a bit silly, but I also think Brian's > > point that maybe we need some drm helper functions that provide > > similar functionality along with a more standardized device ioctl for > > single device allocations might be better. > > I'm unsure we should treat single device specially. > > Exposing device limitations and constraints properly, > > allowing some sort of negotiation would hopefully solve > > single device and shared requirements.
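One possible shape for the userspace side of the split described above is sketched below. All of the type and function names are hypothetical and exist nowhere today; the point is only to show constraints staying opaque to the application while a platform-aware library resolves them into a heap plus allocation parameters.

```c
#include <stddef.h>

/*
 * Hypothetical userspace constraint-negotiation API along the lines
 * described above. None of these types or functions exist today; this
 * only illustrates the proposed split between opaque per-device
 * constraints and heap-specific allocation parameters.
 */
struct constraint_set;			/* opaque to applications */

/* Fetch the (opaque) constraints a device reports for a given usage. */
int cns_get_device_constraints(int device_fd, unsigned int usage,
			       struct constraint_set **out);

/* Reconcile the constraints of every device in the pipeline. */
int cns_merge(struct constraint_set *const *sets, unsigned int count,
	      struct constraint_set **merged);

/*
 * Platform-specific backend: turn the merged set into a heap to allocate
 * from plus heap-specific parameters the application passes through
 * unmodified.
 */
int cns_resolve(const struct constraint_set *merged,
		char *heap_path, size_t heap_path_size,
		void *heap_params, size_t *heap_params_size);
```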
On 8/20/20 1:15 AM, Ezequiel Garcia wrote: > On Mon, 2020-08-17 at 20:49 -0700, James Jones wrote: >> On 8/17/20 8:18 AM, Brian Starkey wrote: >>> Hi Ezequiel, >>> >>> On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: >>>> This heap is basically a wrapper around DMA-API dma_alloc_attrs, >>>> which will allocate memory suitable for the given device. >>>> >>>> The implementation is mostly a port of the Contiguous Videobuf2 >>>> memory allocator (see videobuf2/videobuf2-dma-contig.c) >>>> over to the DMA-BUF Heap interface. >>>> >>>> The intention of this allocator is to provide applications >>>> with a more system-agnostic API: the only thing the application >>>> needs to know is which device to get the buffer for. >>>> >>>> Whether the buffer is backed by CMA, IOMMU or a DMA Pool >>>> is unknown to the application. >>>> >>>> I'm not really expecting this patch to be correct or even >>>> a good idea, but just submitting it to start a discussion on DMA-BUF >>>> heap discovery and negotiation. >>>> >>> >>> My initial reaction is that I thought dmabuf heaps are meant for use >>> to allocate buffers for sharing across devices, which doesn't fit very >>> well with having per-device heaps. >>> >>> For single-device allocations, would using the buffer allocation >>> functionality of that device's native API be better in most >>> cases? (Some other possibly relevant discussion at [1]) >>> >>> I can see that this can save some boilerplate for devices that want >>> to expose private chunks of memory, but might it also lead to 100 >>> aliases for the system's generic coherent memory pool? >>> >>> I wonder if a set of helpers to allow devices to expose whatever they >>> want with minimal effort would be better. >> >> I'm rather interested on where this goes, as I was toying with using >> some sort of heap ID as a basis for a "device-local" constraint in the >> memory constraints proposals Simon and I will be discussing at XDC this >> year. It would be rather elegant if there was one type of heap ID used >> universally throughout the kernel that could provide a unique handle for >> the shared system memory heap(s), as well as accelerator-local heaps on >> fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could >> negotiate a location among themselves. This patch seems to be a step >> towards that in a way, but I agree it would be counterproductive if a >> bunch of devices that were using the same underlying system memory ended >> up each getting their own heap ID just because they used some SW >> framework that worked that way. >> >> Would appreciate it if you could send along a pointer to your BoF if it >> happens! >> > > Here is it: > > https://linuxplumbersconf.org/event/7/contributions/818/ > > It would be great to see you there and discuss this, > given I was hoping we could talk about how to meet a > userspace allocator library expectations as well. Thanks! I hadn't registered for LPC and it looks like it's sold out, but I'll try to watch the live stream. This is very interesting, in that it looks like we're both trying to solve roughly the same set of problems but approaching it from different angles. From what I gather, your approach is that a "heap" encompasses all the allocation constraints a device may have. The approach Simon Ser and I are tossing around so far is somewhat different, but may potentially leverage dma-buf heaps a bit as well. 
Our approach looks more like what I described at XDC a few years ago, where memory constraints for a given device's usage of an image are exposed up to applications, which can then somehow perform boolean intersection/union operations on them to arrive at a common set of constraints that describe something compatible with all the devices & usages desired (or fail to do so, and fall back to copying things around presumably). I believe this is more flexible than your initial proposal in that devices often support multiple usages (E.g., different formats, different proprietary layouts represented by format modifiers, etc.), and it avoids adding a combinatorial number of heaps to manage that. In my view, heaps are more like blobs of memory that can be allocated from in various different ways to satisfy constraints. I realize heaps mean something specific in the dma-buf heap design (specifically, something closer to an association between an "allocation mechanism" and "physical memory"), but I hope we don't have massive heap/allocator mechanism proliferation due to constraints alone. Perhaps some constraints, such as contiguous memory or device-local memory, are properly expressed as a specific heap, but consider the proliferation implied by even that simple pair of examples: How do you express contiguous device-local memory? Do you need to spawn two heaps on the underlying device-local memory, one for contiguous allocations and one for non-contiguous allocations? Seems excessive. Of course, our approach also has downsides and is still being worked on. For example, it works best in an ideal world where all the allocators available understand all the constraints that exist. Dealing with a reality where there are probably a handful of allocators, another handful of userspace libraries and APIs, and still more applications trying to make use of all this is one of the larger remaining challenges of the design. We'll present our work at XDC 2020. Hope you can check that out as well! Thanks, -James > Thanks, > Ezequiel > >> Thanks, >> -James >> >>> Cheers, >>> -Brian >>> >>> 1. https://lore.kernel.org/dri-devel/57062477-30e7-a3de-6723-a50d03a402c4@kapsi.fi/ >>> >>>> Given Plumbers is just a couple weeks from now, I've submitted >>>> a BoF proposal to discuss this, as perhaps it would make >>>> sense to discuss this live? >>>> >>>> Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> > >
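As a toy illustration of the intersection operation described above, merging two simple constraint descriptors might look like this. Real proposals would carry much richer and partly vendor-opaque data, and the operation can also fail, triggering the copy fallback; the struct and its fields are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy constraint descriptor, invented for illustration only. */
struct image_constraints {
	uint32_t pitch_align;	/* required pitch alignment, power of two */
	uint32_t offset_align;	/* required plane offset alignment, power of two */
	bool	 contiguous;	/* needs physically contiguous memory */
};

static uint32_t max_u32(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/*
 * "Intersection" in the sense used above: produce constraints that satisfy
 * both users of the buffer. For power-of-two alignments the stricter
 * (larger) value satisfies both; contiguity is required if either side
 * requires it.
 */
static void constraints_intersect(const struct image_constraints *a,
				  const struct image_constraints *b,
				  struct image_constraints *out)
{
	out->pitch_align  = max_u32(a->pitch_align, b->pitch_align);
	out->offset_align = max_u32(a->offset_align, b->offset_align);
	out->contiguous   = a->contiguous || b->contiguous;
}
```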
Hi James, On Sun, Aug 23, 2020 at 01:04:43PM -0700, James Jones wrote: > On 8/20/20 1:15 AM, Ezequiel Garcia wrote: > > On Mon, 2020-08-17 at 20:49 -0700, James Jones wrote: > >> On 8/17/20 8:18 AM, Brian Starkey wrote: > >>> On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: > >>>> This heap is basically a wrapper around DMA-API dma_alloc_attrs, > >>>> which will allocate memory suitable for the given device. > >>>> > >>>> The implementation is mostly a port of the Contiguous Videobuf2 > >>>> memory allocator (see videobuf2/videobuf2-dma-contig.c) > >>>> over to the DMA-BUF Heap interface. > >>>> > >>>> The intention of this allocator is to provide applications > >>>> with a more system-agnostic API: the only thing the application > >>>> needs to know is which device to get the buffer for. > >>>> > >>>> Whether the buffer is backed by CMA, IOMMU or a DMA Pool > >>>> is unknown to the application. > >>>> > >>>> I'm not really expecting this patch to be correct or even > >>>> a good idea, but just submitting it to start a discussion on DMA-BUF > >>>> heap discovery and negotiation. > >>>> > >>> > >>> My initial reaction is that I thought dmabuf heaps are meant for use > >>> to allocate buffers for sharing across devices, which doesn't fit very > >>> well with having per-device heaps. > >>> > >>> For single-device allocations, would using the buffer allocation > >>> functionality of that device's native API be better in most > >>> cases? (Some other possibly relevant discussion at [1]) > >>> > >>> I can see that this can save some boilerplate for devices that want > >>> to expose private chunks of memory, but might it also lead to 100 > >>> aliases for the system's generic coherent memory pool? > >>> > >>> I wonder if a set of helpers to allow devices to expose whatever they > >>> want with minimal effort would be better. > >> > >> I'm rather interested on where this goes, as I was toying with using > >> some sort of heap ID as a basis for a "device-local" constraint in the > >> memory constraints proposals Simon and I will be discussing at XDC this > >> year. It would be rather elegant if there was one type of heap ID used > >> universally throughout the kernel that could provide a unique handle for > >> the shared system memory heap(s), as well as accelerator-local heaps on > >> fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could > >> negotiate a location among themselves. This patch seems to be a step > >> towards that in a way, but I agree it would be counterproductive if a > >> bunch of devices that were using the same underlying system memory ended > >> up each getting their own heap ID just because they used some SW > >> framework that worked that way. > >> > >> Would appreciate it if you could send along a pointer to your BoF if it > >> happens! > > > > Here is it: > > > > https://linuxplumbersconf.org/event/7/contributions/818/ > > > > It would be great to see you there and discuss this, > > given I was hoping we could talk about how to meet a > > userspace allocator library expectations as well. > > Thanks! I hadn't registered for LPC and it looks like it's sold out, > but I'll try to watch the live stream. > > This is very interesting, in that it looks like we're both trying to > solve roughly the same set of problems but approaching it from different > angles. From what I gather, your approach is that a "heap" encompasses > all the allocation constraints a device may have. 
> > The approach Simon Ser and I are tossing around so far is somewhat > different, but may potentially leverage dma-buf heaps a bit as well. > > Our approach looks more like what I described at XDC a few years ago, > where memory constraints for a given device's usage of an image are > exposed up to applications, which can then somehow perform boolean > intersection/union operations on them to arrive at a common set of > constraints that describe something compatible with all the devices & > usages desired (or fail to do so, and fall back to copying things around > presumably). I believe this is more flexible than your initial proposal > in that devices often support multiple usages (E.g., different formats, > different proprietary layouts represented by format modifiers, etc.), > and it avoids adding a combinatorial number of heaps to manage that. > > In my view, heaps are more like blobs of memory that can be allocated > from in various different ways to satisfy constraints. I realize heaps > mean something specific in the dma-buf heap design (specifically, > something closer to an association between an "allocation mechanism" and > "physical memory"), but I hope we don't have massive heap/allocator > mechanism proliferation due to constraints alone. Perhaps some > constraints, such as contiguous memory or device-local memory, are > properly expressed as a specific heap, but consider the proliferation > implied by even that simple pair of examples: How do you express > contiguous device-local memory? Do you need to spawn two heaps on the > underlying device-local memory, one for contiguous allocations and one > for non-contiguous allocations? Seems excessive. > > Of course, our approach also has downsides and is still being worked on. > For example, it works best in an ideal world where all the allocators > available understand all the constraints that exist. Shouldn't allocators be decoupled of constraints ? In my imagination I see devices exposing constraints, and allocators exposing parameters, with a userspace library to reconcile the constraints and produce allocator parameters from them. > Dealing with a > reality where there are probably a handful of allocators, another > handful of userspace libraries and APIs, and still more applications > trying to make use of all this is one of the larger remaining challenges > of the design. > > We'll present our work at XDC 2020. Hope you can check that out as well! > > >>> 1. https://lore.kernel.org/dri-devel/57062477-30e7-a3de-6723-a50d03a402c4@kapsi.fi/ > >>> > >>>> Given Plumbers is just a couple weeks from now, I've submitted > >>>> a BoF proposal to discuss this, as perhaps it would make > >>>> sense to discuss this live? > >>>> > >>>> Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
On 8/23/20 1:46 PM, Laurent Pinchart wrote: > Hi James, > > On Sun, Aug 23, 2020 at 01:04:43PM -0700, James Jones wrote: >> On 8/20/20 1:15 AM, Ezequiel Garcia wrote: >>> On Mon, 2020-08-17 at 20:49 -0700, James Jones wrote: >>>> On 8/17/20 8:18 AM, Brian Starkey wrote: >>>>> On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: >>>>>> This heap is basically a wrapper around DMA-API dma_alloc_attrs, >>>>>> which will allocate memory suitable for the given device. >>>>>> >>>>>> The implementation is mostly a port of the Contiguous Videobuf2 >>>>>> memory allocator (see videobuf2/videobuf2-dma-contig.c) >>>>>> over to the DMA-BUF Heap interface. >>>>>> >>>>>> The intention of this allocator is to provide applications >>>>>> with a more system-agnostic API: the only thing the application >>>>>> needs to know is which device to get the buffer for. >>>>>> >>>>>> Whether the buffer is backed by CMA, IOMMU or a DMA Pool >>>>>> is unknown to the application. >>>>>> >>>>>> I'm not really expecting this patch to be correct or even >>>>>> a good idea, but just submitting it to start a discussion on DMA-BUF >>>>>> heap discovery and negotiation. >>>>>> >>>>> >>>>> My initial reaction is that I thought dmabuf heaps are meant for use >>>>> to allocate buffers for sharing across devices, which doesn't fit very >>>>> well with having per-device heaps. >>>>> >>>>> For single-device allocations, would using the buffer allocation >>>>> functionality of that device's native API be better in most >>>>> cases? (Some other possibly relevant discussion at [1]) >>>>> >>>>> I can see that this can save some boilerplate for devices that want >>>>> to expose private chunks of memory, but might it also lead to 100 >>>>> aliases for the system's generic coherent memory pool? >>>>> >>>>> I wonder if a set of helpers to allow devices to expose whatever they >>>>> want with minimal effort would be better. >>>> >>>> I'm rather interested on where this goes, as I was toying with using >>>> some sort of heap ID as a basis for a "device-local" constraint in the >>>> memory constraints proposals Simon and I will be discussing at XDC this >>>> year. It would be rather elegant if there was one type of heap ID used >>>> universally throughout the kernel that could provide a unique handle for >>>> the shared system memory heap(s), as well as accelerator-local heaps on >>>> fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could >>>> negotiate a location among themselves. This patch seems to be a step >>>> towards that in a way, but I agree it would be counterproductive if a >>>> bunch of devices that were using the same underlying system memory ended >>>> up each getting their own heap ID just because they used some SW >>>> framework that worked that way. >>>> >>>> Would appreciate it if you could send along a pointer to your BoF if it >>>> happens! >>> >>> Here is it: >>> >>> https://linuxplumbersconf.org/event/7/contributions/818/ >>> >>> It would be great to see you there and discuss this, >>> given I was hoping we could talk about how to meet a >>> userspace allocator library expectations as well. >> >> Thanks! I hadn't registered for LPC and it looks like it's sold out, >> but I'll try to watch the live stream. >> >> This is very interesting, in that it looks like we're both trying to >> solve roughly the same set of problems but approaching it from different >> angles. From what I gather, your approach is that a "heap" encompasses >> all the allocation constraints a device may have. 
>> >> The approach Simon Ser and I are tossing around so far is somewhat >> different, but may potentially leverage dma-buf heaps a bit as well. >> >> Our approach looks more like what I described at XDC a few years ago, >> where memory constraints for a given device's usage of an image are >> exposed up to applications, which can then somehow perform boolean >> intersection/union operations on them to arrive at a common set of >> constraints that describe something compatible with all the devices & >> usages desired (or fail to do so, and fall back to copying things around >> presumably). I believe this is more flexible than your initial proposal >> in that devices often support multiple usages (E.g., different formats, >> different proprietary layouts represented by format modifiers, etc.), >> and it avoids adding a combinatorial number of heaps to manage that. >> >> In my view, heaps are more like blobs of memory that can be allocated >> from in various different ways to satisfy constraints. I realize heaps >> mean something specific in the dma-buf heap design (specifically, >> something closer to an association between an "allocation mechanism" and >> "physical memory"), but I hope we don't have massive heap/allocator >> mechanism proliferation due to constraints alone. Perhaps some >> constraints, such as contiguous memory or device-local memory, are >> properly expressed as a specific heap, but consider the proliferation >> implied by even that simple pair of examples: How do you express >> contiguous device-local memory? Do you need to spawn two heaps on the >> underlying device-local memory, one for contiguous allocations and one >> for non-contiguous allocations? Seems excessive. >> >> Of course, our approach also has downsides and is still being worked on. >> For example, it works best in an ideal world where all the allocators >> available understand all the constraints that exist. > > Shouldn't allocators be decoupled of constraints ? In my imagination I > see devices exposing constraints, and allocators exposing parameters, > with a userspace library to reconcile the constraints and produce > allocator parameters from them. Perhaps another level of abstraction would help. I'll have to think about that. However, as far as I can tell, it wouldn't remove the need to communicate a lot of constraints from multiple engines/devices/etc. to the allocator (likely a single allocator. I'd be interested to know if anyone has a design that effectively uses multiple allocators to satisfy a single allocation request, but I haven't come up with a good one) somehow. Either the constraints are directly used as the parameters, or there's a translation/second level of abstraction, but either way much of the information needs to make it to the allocator, or represent the need to use a particular allocator. Simple things like pitch and offset alignment can be done without help from a kernel-level allocator, but others such as cache coherency, physical memory bank placement, or device-local memory will need to make it all the way down to the kernel some how I believe. Thanks, -James >> Dealing with a >> reality where there are probably a handful of allocators, another >> handful of userspace libraries and APIs, and still more applications >> trying to make use of all this is one of the larger remaining challenges >> of the design. >> >> We'll present our work at XDC 2020. Hope you can check that out as well! >> >>>>> 1. 
https://lore.kernel.org/dri-devel/57062477-30e7-a3de-6723-a50d03a402c4@kapsi.fi/ >>>>> >>>>>> Given Plumbers is just a couple weeks from now, I've submitted >>>>>> a BoF proposal to discuss this, as perhaps it would make >>>>>> sense to discuss this live? >>>>>> >>>>>> Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> >
Le jeudi 20 août 2020 à 18:54 +0300, Laurent Pinchart a écrit : > Hi Ezequiel, > > On Thu, Aug 20, 2020 at 05:36:40AM -0300, Ezequiel Garcia wrote: > > Hi John, > > > > Thanks a ton for taking the time > > to go thru this. > > > > On Mon, 2020-08-17 at 21:13 -0700, John Stultz wrote: > > > On Sun, Aug 16, 2020 at 10:23 AM Ezequiel Garcia <ezequiel@collabora.com> wrote: > > > > This heap is basically a wrapper around DMA-API dma_alloc_attrs, > > > > which will allocate memory suitable for the given device. > > > > > > > > The implementation is mostly a port of the Contiguous Videobuf2 > > > > memory allocator (see videobuf2/videobuf2-dma-contig.c) > > > > over to the DMA-BUF Heap interface. > > > > > > > > The intention of this allocator is to provide applications > > > > with a more system-agnostic API: the only thing the application > > > > needs to know is which device to get the buffer for. > > > > > > > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool > > > > is unknown to the application. > > > > > > My hesitancy here is that the main reason we have DMA BUF Heaps, and > > > ION before it, was to expose different types of memory allocations to > > > userspace. The main premise that often these buffers are shared with > > > multiple devices, which have differing constraints and it is userspace > > > that best understands the path a buffer will take through a series of > > > devices. So userspace is best positioned to determine what type of > > > memory should be allocated to satisfy the various devices constraints > > > (there were some design attempts to allow DMA BUFs to use multiple > > > attach with deferred alloc at map time to handle this constraint > > > solving in-kernel, but it was never adopted in practice). > > > > > > This however, requires some system specific policy - implemented in > > > the Android userspace by gralloc which maps "usage" types (device > > > pipeline flows) to heaps. I liken it to fstab, which helps map mount > > > points to partitions - it's not going to be the same on every system. > > > > > > What you seem to be proposing seems a bit contrary to this premise - > > > Userland doesn't know what type of memory it needs, but given a device > > > can somehow find the heap it should allocate for? This seems to assume > > > the buffer is only to be used with a single device? > > > > Single-device usage wasn't the intention. I see now that this patch > > looks too naive and it's confusing. The idea is of course to get buffers > > that can be shared. > > > > I'm thinking you need to share your picture buffer with a decoder, > > a renderer, possibly something else. Each with its own set > > of constraints and limitations. > > > > Of course, a buffer that works for device A may be unsuitable for > > device B and so this per-device heap is surely way too naive. > > > > As you rightly mention, the main intention of this RFC is to > > question exactly the current premise: "userspace knows". > > I fail to see how will (generic and system-agnostic) applications > > know which heap to use. > > > > Just for completion, let me throw a simple example: i.MX 8M > > and some Rockchip platforms share the same VPU block, except the > > latter have an IOMMU. > > > > So applications would need to query an iommu presence > > to get buffer from CMA or not. > > I don't think we can do this in a system-agnostic way. 
What I'd like to > see is an API for the kernel to expose opaque constraints for each Please, take into consideration that constraints can also come from userspace. These days, there exist things we may want to do using the CPU, but the SIMD instructions and the associated algorithm will introduce constraints too. If these constraints become too opaque, you will also potentially limit some valid CPU interaction with HW in terms of buffer access. CPU constraints today are fairly small and one should be able to express them I believe. Of course, these are not media agnostic, some constraints may depend on the media (like an image buffer, a matrix buffer or audio buffer) and the associated algorithm to be used. An example would be image buffers produced or modified on CPU but encoded with HW. > device, and a userspace library to reconcile constraints, selecting a > suitable heap, and producing heap-specific parameters for the > allocation. > > The constraints need to be transported in a generic way, but the > contents can be opaque for applications. Only the library would need to > interpret them. This can be done with platform-specific code inside the > library. A separate API for the library to interact with the kernel to > further query or negotiate configuration parameters could be part of > that picture. > > This would offer standardized APIs to applications (to gather > constraints, pass them to the constraint resolution library, and receive > a heap ID and heap parameters), while still allowing platform-specific > code in userspace. > > > > There was at some point a discussion where folks (maybe it was > > > DanielV? I can't remember...) suggested having something like a sysfs > > > device node link from a device to a dma-buf heap chardev. This seems > > > like it would add a similar behavior as what you're proposing, however > > > without adding possibly a ton of new device specific heaps to the > > > /dev/dma_heap/ dir. However, we would potentially need any missing > > > heap types added first. > > > > > > > I'm not really expecting this patch to be correct or even > > > > a good idea, but just submitting it to start a discussion on DMA-BUF > > > > heap discovery and negotiation. > > > > > > > > Given Plumbers is just a couple weeks from now, I've submitted > > > > a BoF proposal to discuss this, as perhaps it would make > > > > sense to discuss this live? > > > > > > I do think it's an interesting idea. I agree that having every driver > > > implement a dmabuf exporter is a bit silly, but I also think Brian's > > > point that maybe we need some drm helper functions that provide > > > similar functionality along with a more standardized device ioctl for > > > single device allocations might be better. > > > > I'm unsure we should treat single device specially. > > > > Exposing device limitations and constraints properly, > > allowing some sort of negotiation would hopefully solve > > single device and shared requirements.
On Tuesday, August 25, 2020 10:26 PM, Nicolas Dufresne <nicolas.dufresne@collabora.com> wrote: > > I don't think we can do this in a system-agnostic way. What I'd like to > > see is an API for the kernel to expose opaque constraints for each > > Please, take into consideration that constraints can also come from > userspace. These days, there exist things we may want to do using the > CPU, but the SIMD instructions and the associated algorithm will > introduce constraints too. If these constraints comes too opaque, but > you will also potentially limit some valid CPU interaction with HW in > term of buffer access. CPU constraints todays are fairly small and one > should be able to express them I believe. Of course, these are not > media agnostic, some constraints may depends on the media (like an > image buffer, a matrix buffer or audio buffer) and the associated > algorithm to be used. > > An example would be an image buffers produced or modified on CPU but > encoded with HW. Actually, in the proposal we're working on, constraints may come from user-space too. Rendering APIs (ie. mesa) will need to expose constraints for buffers they can render on or texture from. Constraints would be opaque for users like compositors (ie. programs that merely pass constraints around and merge them don't need to understand them), but not for user-space drivers.
On Thursday, August 20, 2020 10:15 AM, Ezequiel Garcia <ezequiel@collabora.com> wrote: > > I'm rather interested on where this goes, as I was toying with using > > some sort of heap ID as a basis for a "device-local" constraint in the > > memory constraints proposals Simon and I will be discussing at XDC this > > year. It would be rather elegant if there was one type of heap ID used > > universally throughout the kernel that could provide a unique handle for > > the shared system memory heap(s), as well as accelerator-local heaps on > > fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could > > negotiate a location among themselves. This patch seems to be a step > > towards that in a way, but I agree it would be counterproductive if a > > bunch of devices that were using the same underlying system memory ended > > up each getting their own heap ID just because they used some SW > > framework that worked that way. > > Would appreciate it if you could send along a pointer to your BoF if it > > happens! > > Here is it: > > https://linuxplumbersconf.org/event/7/contributions/818/ > > It would be great to see you there and discuss this, > given I was hoping we could talk about how to meet a > userspace allocator library expectations as well. Unfortunately there's no livestream for BoFs. Would be very interested in having a summary of the discussions once the BoF is over!
Dear all,

Here are the notes we took during the BoF. I believe the meeting was super interesting. Although it felt a bit short for the topic, we left with a few interesting ideas.

Thanks everyone!
Ezequiel

---

LPC 2020 BoF: Negotiating DMA-BUF Heaps

Attendees:
* Brian Starkey
* Daniel Stone
* Ezequiel Garcia
* James Jones
* John Reitan
* Laura Abbott
* Laurent Pinchart
* Sumit Semwal
* Robert Beckett

# Replacing subsystem memory allocators with dma-buf heaps

* Laurent: we should not have subsystems implement their own allocators. Using heaps could be a good idea.
* Laura: Wary of adding too much support to heaps, from the ION experience.
* Laurent: V4L2: most drivers use videobuf2, which is a fairly complex piece of code. Three constraints, sg, contig, and vmalloc: these are fairly generic and not video-specific, so why can't these just use heaps?
* Brian: The in-kernel API will most likely just care about dma-bufs, not necessarily FDs. This was discussed recently, see "Role of DMA Heaps vs GEM in allocation", https://www.spinics.net/lists/dri-devel/msg268103.html
* Do we expect to get a non-file-descriptor identifier for a dma-buf? No.
* Laurent proposes a two-step API (similar to the existing GEM API), where we have one interface to allocate a buffer, with an identifier local to a process, and then another interface to wrap the buffer in a dma-buf (fd). (The existing GEM-style flow is sketched after these notes.)
* If devices are not meant to be shared, then we might want to avoid the DMA-BUF design entirely. As Sumit mentioned, the fundamental idea behind DMA-BUF is that buffers are expected to be shared. OTOH, it was mentioned that sometimes we don't know if a buffer will be shared or not, so that's why the ability to wrap a buffer in a dma-buf is useful.
* New subsystems would possibly want to avoid implementing their own allocator interface. But unfortunately, we don't want to produce an fd per buffer, so a new subsystem will eventually require its own API (GEM-like). If a subsystem doesn't need many buffers, and the FD semantic is fine, then it would be acceptable to avoid a subsystem-specific API.
* It would be interesting to experiment with replacing videobuf2-dma-contig with plain dma-buf heap usage, and see how much code we'd save.
* John Stultz has ideas around providing in-kernel accessors for the heaps - the idea is for drivers to not have to implement full exporter functionality for an already existing dma-buf heap type.
* A drawback of reusing dma-buf heaps to allocate buffers is that it means marking every buffer that gets exported as shareable.
* The benefit of having a centralized implementation would be in unifying the semantics: reusable concepts that can be used to build future APIs around, rather than trying to create unified APIs around disparate kernel allocation APIs at only the userspace level.
* Robert: Is there an in-kernel user for the in-kernel dma-buf request? A possible answer would be scratch buffers. The idea would be to avoid getting details wrong. However, doing this would allow every buffer to be exportable. Also, it sounds like this means re-implementing the DMA-API?
* DMA-BUFs are designed to be shared; they are not necessarily an allocator.
* We want something to expose device-local memory to the rest of the kernel. Could that be a dma-buf heap?
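The "two-step API (similar to existing GEM API)" bullet refers to a flow DRM devices already expose. As a reference point, here is a minimal sketch of that existing flow using a dumb buffer and PRIME export; the device node path and buffer dimensions are example values, and error paths are kept deliberately terse.

```c
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <drm/drm.h>
#include <drm/drm_mode.h>

/*
 * The existing GEM-style two-step flow referenced in the notes: allocate
 * with a process-local handle first, and only wrap it in a dma-buf fd
 * when the buffer actually needs to be shared.
 */
int main(void)
{
	struct drm_mode_create_dumb create = {
		.width = 1920, .height = 1080, .bpp = 32,
	};
	struct drm_prime_handle prime = { .flags = DRM_CLOEXEC | DRM_RDWR };
	int drm_fd = open("/dev/dri/card0", O_RDWR | O_CLOEXEC);

	if (drm_fd < 0)
		return 1;

	/* Step 1: a GEM handle, local to this process, no fd consumed. */
	if (ioctl(drm_fd, DRM_IOCTL_MODE_CREATE_DUMB, &create) < 0)
		return 1;

	/* Step 2: wrap the handle in a dma-buf fd only when sharing it. */
	prime.handle = create.handle;
	if (ioctl(drm_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, &prime) < 0)
		return 1;

	/* prime.fd can now be handed to V4L2, another GPU, a display, ... */
	close(prime.fd);
	close(drm_fd);
	return 0;
}
```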
Hi James, On Sun, Aug 23, 2020 at 03:53:50PM -0700, James Jones wrote: > On 8/23/20 1:46 PM, Laurent Pinchart wrote: > > On Sun, Aug 23, 2020 at 01:04:43PM -0700, James Jones wrote: > >> On 8/20/20 1:15 AM, Ezequiel Garcia wrote: > >>> On Mon, 2020-08-17 at 20:49 -0700, James Jones wrote: > >>>> On 8/17/20 8:18 AM, Brian Starkey wrote: > >>>>> On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: > >>>>>> This heap is basically a wrapper around DMA-API dma_alloc_attrs, > >>>>>> which will allocate memory suitable for the given device. > >>>>>> > >>>>>> The implementation is mostly a port of the Contiguous Videobuf2 > >>>>>> memory allocator (see videobuf2/videobuf2-dma-contig.c) > >>>>>> over to the DMA-BUF Heap interface. > >>>>>> > >>>>>> The intention of this allocator is to provide applications > >>>>>> with a more system-agnostic API: the only thing the application > >>>>>> needs to know is which device to get the buffer for. > >>>>>> > >>>>>> Whether the buffer is backed by CMA, IOMMU or a DMA Pool > >>>>>> is unknown to the application. > >>>>>> > >>>>>> I'm not really expecting this patch to be correct or even > >>>>>> a good idea, but just submitting it to start a discussion on DMA-BUF > >>>>>> heap discovery and negotiation. > >>>>>> > >>>>> > >>>>> My initial reaction is that I thought dmabuf heaps are meant for use > >>>>> to allocate buffers for sharing across devices, which doesn't fit very > >>>>> well with having per-device heaps. > >>>>> > >>>>> For single-device allocations, would using the buffer allocation > >>>>> functionality of that device's native API be better in most > >>>>> cases? (Some other possibly relevant discussion at [1]) > >>>>> > >>>>> I can see that this can save some boilerplate for devices that want > >>>>> to expose private chunks of memory, but might it also lead to 100 > >>>>> aliases for the system's generic coherent memory pool? > >>>>> > >>>>> I wonder if a set of helpers to allow devices to expose whatever they > >>>>> want with minimal effort would be better. > >>>> > >>>> I'm rather interested on where this goes, as I was toying with using > >>>> some sort of heap ID as a basis for a "device-local" constraint in the > >>>> memory constraints proposals Simon and I will be discussing at XDC this > >>>> year. It would be rather elegant if there was one type of heap ID used > >>>> universally throughout the kernel that could provide a unique handle for > >>>> the shared system memory heap(s), as well as accelerator-local heaps on > >>>> fancy NICs, GPUs, NN accelerators, capture devices, etc. so apps could > >>>> negotiate a location among themselves. This patch seems to be a step > >>>> towards that in a way, but I agree it would be counterproductive if a > >>>> bunch of devices that were using the same underlying system memory ended > >>>> up each getting their own heap ID just because they used some SW > >>>> framework that worked that way. > >>>> > >>>> Would appreciate it if you could send along a pointer to your BoF if it > >>>> happens! > >>> > >>> Here is it: > >>> > >>> https://linuxplumbersconf.org/event/7/contributions/818/ > >>> > >>> It would be great to see you there and discuss this, > >>> given I was hoping we could talk about how to meet a > >>> userspace allocator library expectations as well. > >> > >> Thanks! I hadn't registered for LPC and it looks like it's sold out, > >> but I'll try to watch the live stream. 
> >> > >> This is very interesting, in that it looks like we're both trying to > >> solve roughly the same set of problems but approaching it from different > >> angles. From what I gather, your approach is that a "heap" encompasses > >> all the allocation constraints a device may have. > >> > >> The approach Simon Ser and I are tossing around so far is somewhat > >> different, but may potentially leverage dma-buf heaps a bit as well. > >> > >> Our approach looks more like what I described at XDC a few years ago, > >> where memory constraints for a given device's usage of an image are > >> exposed up to applications, which can then somehow perform boolean > >> intersection/union operations on them to arrive at a common set of > >> constraints that describe something compatible with all the devices & > >> usages desired (or fail to do so, and fall back to copying things around > >> presumably). I believe this is more flexible than your initial proposal > >> in that devices often support multiple usages (E.g., different formats, > >> different proprietary layouts represented by format modifiers, etc.), > >> and it avoids adding a combinatorial number of heaps to manage that. > >> > >> In my view, heaps are more like blobs of memory that can be allocated > >> from in various different ways to satisfy constraints. I realize heaps > >> mean something specific in the dma-buf heap design (specifically, > >> something closer to an association between an "allocation mechanism" and > >> "physical memory"), but I hope we don't have massive heap/allocator > >> mechanism proliferation due to constraints alone. Perhaps some > >> constraints, such as contiguous memory or device-local memory, are > >> properly expressed as a specific heap, but consider the proliferation > >> implied by even that simple pair of examples: How do you express > >> contiguous device-local memory? Do you need to spawn two heaps on the > >> underlying device-local memory, one for contiguous allocations and one > >> for non-contiguous allocations? Seems excessive. > >> > >> Of course, our approach also has downsides and is still being worked on. > >> For example, it works best in an ideal world where all the allocators > >> available understand all the constraints that exist. > > > > Shouldn't allocators be decoupled of constraints ? In my imagination I > > see devices exposing constraints, and allocators exposing parameters, > > with a userspace library to reconcile the constraints and produce > > allocator parameters from them. > > Perhaps another level of abstraction would help. I'll have to think > about that. > > However, as far as I can tell, it wouldn't remove the need to > communicate a lot of constraints from multiple engines/devices/etc. to > the allocator (likely a single allocator. I'd be interested to know if > anyone has a design that effectively uses multiple allocators to satisfy > a single allocation request, but I haven't come up with a good one) > somehow. Either the constraints are directly used as the parameters, or > there's a translation/second level of abstraction, but either way much > of the information needs to make it to the allocator, or represent the > need to use a particular allocator. Simple things like pitch and offset > alignment can be done without help from a kernel-level allocator, but > others such as cache coherency, physical memory bank placement, or > device-local memory will need to make it all the way down to the kernel > some how I believe. 
I fully agree that we'll need kernel support, but I don't think the constraints reporting API and the allocator API need to speak the same language. For instance, drivers will report alignment constraints, the userspace allocator library will translate that to a pitch value, and pass it to the allocator as an allocation parameter. The allocator won't know about alignment constraints. That's a simple example, let's see how it turns out with more complex constraints. With a centralized userspace library we have the ability to decouple the two sides, which I believe can be useful to keep the complexity of constraints and allocation parameters (as) low (as possible). > >> Dealing with a > >> reality where there are probably a handful of allocators, another > >> handful of userspace libraries and APIs, and still more applications > >> trying to make use of all this is one of the larger remaining challenges > >> of the design. > >> > >> We'll present our work at XDC 2020. Hope you can check that out as well! > >> > >>>>> 1. https://lore.kernel.org/dri-devel/57062477-30e7-a3de-6723-a50d03a402c4@kapsi.fi/ > >>>>> > >>>>>> Given Plumbers is just a couple weeks from now, I've submitted > >>>>>> a BoF proposal to discuss this, as perhaps it would make > >>>>>> sense to discuss this live? > >>>>>> > >>>>>> Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> > >
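As a purely illustrative userspace sketch of the split discussed above, the structures below keep the constraint language (what devices report) separate from the allocation parameters (what the allocator consumes): constraints from several devices are intersected first, and only then translated into parameters such as a row pitch. The types, fields and the power-of-two alignment assumption are all invented for this example.

#include <stdbool.h>
#include <stdint.h>

/* Invented per-device constraint description. */
struct constraints {
	uint32_t pitch_align;	/* required pitch alignment, power of two, >= 1 */
	uint64_t placement;	/* bitmask of acceptable memory placements */
	bool contiguous;	/* needs physically contiguous memory */
};

/* Invented allocator parameters, the only thing the allocator sees. */
struct alloc_params {
	uint32_t pitch;
	uint64_t placement;
	bool contiguous;
};

/* Intersection: the result must satisfy both devices. */
static bool constraints_intersect(const struct constraints *a,
				  const struct constraints *b,
				  struct constraints *out)
{
	/* For power-of-two alignments the stricter one wins. */
	out->pitch_align = a->pitch_align > b->pitch_align ?
			   a->pitch_align : b->pitch_align;
	out->placement = a->placement & b->placement;
	out->contiguous = a->contiguous || b->contiguous;

	/* No common placement means no compatible allocation. */
	return out->placement != 0;
}

/* Translation: the library turns constraints into allocation parameters. */
static void constraints_to_params(const struct constraints *c,
				  uint32_t width, uint32_t bytes_per_pixel,
				  struct alloc_params *p)
{
	uint32_t pitch = width * bytes_per_pixel;

	/* Round the pitch up; the allocator never sees "alignment". */
	p->pitch = (pitch + c->pitch_align - 1) & ~(c->pitch_align - 1);
	p->placement = c->placement;
	p->contiguous = c->contiguous;
}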
On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: > This heap is basically a wrapper around DMA-API dma_alloc_attrs, > which will allocate memory suitable for the given device. > > The implementation is mostly a port of the Contiguous Videobuf2 > memory allocator (see videobuf2/videobuf2-dma-contig.c) > over to the DMA-BUF Heap interface. > > The intention of this allocator is to provide applications > with a more system-agnostic API: the only thing the application > needs to know is which device to get the buffer for. > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool > is unknown to the application. > > I'm not really expecting this patch to be correct or even > a good idea, but just submitting it to start a discussion on DMA-BUF > heap discovery and negotiation. > > Given Plumbers is just a couple weeks from now, I've submitted > a BoF proposal to discuss this, as perhaps it would make > sense to discuss this live? > > Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> I think on the uapi/constraint solving side there's been already tons of discussions while I enjoyed vacations, so different concern from me entirely on the implementation side: In the past the only thing we had in upstream was subsystem/driver specific allocators, and dma-buf for sharing. With dma-buf heaps we kinda get a central allocator, which duplicates large chunks of all these allocators. And since it's a central allocator by design, the reason for having per-subsystem allocators is kinda gone. I think there's two approaches here: - we convert e.g. drm allocator helpers to internally use the right heap implementation. That would also give us some fairly direct way to expose these constraints in sysfs so a userspace allocator knows which dma-buf heap to pick for shared stuff. - we require that any heap is just a different uapi for an existing driver allocator, e.g. by having a dma-buf wrapper for all gem drivers. Otherwise I think what we end up with is a pile of dma-buf heaps for android's blob gpu driver world, and not used anywhere else. Not something even remotely interesting for upstream :-) tldr; I'd like to see how dma-buf heaps closely integrate with all the existing buffer management code we have. Both in the kernel and (through some allocator library effort) in userspace. Cheers, Daniel > --- > drivers/dma-buf/heaps/Kconfig | 9 + > drivers/dma-buf/heaps/Makefile | 1 + > drivers/dma-buf/heaps/device_heap.c | 268 ++++++++++++++++++++++++++++ > include/linux/device.h | 5 + > include/linux/dma-heap.h | 6 + > 5 files changed, 289 insertions(+) > create mode 100644 drivers/dma-buf/heaps/device_heap.c > > diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig > index a5eef06c4226..2bb3604184bd 100644 > --- a/drivers/dma-buf/heaps/Kconfig > +++ b/drivers/dma-buf/heaps/Kconfig > @@ -12,3 +12,12 @@ config DMABUF_HEAPS_CMA > Choose this option to enable dma-buf CMA heap. This heap is backed > by the Contiguous Memory Allocator (CMA). If your system has these > regions, you should say Y here. > + > +config DMABUF_HEAPS_DEVICES > + bool "DMA-BUF Device DMA Heap (Experimental)" > + depends on DMABUF_HEAPS > + help > + Choose this option to enable dma-buf per-device heap. This heap is backed > + by the DMA-API and it's an Experimental feature, meant mostly for testing > + and experimentation. > + Just say N here.
> diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile > index 6e54cdec3da0..c691d85b3044 100644 > --- a/drivers/dma-buf/heaps/Makefile > +++ b/drivers/dma-buf/heaps/Makefile > @@ -2,3 +2,4 @@ > obj-y += heap-helpers.o > obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o > obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o > +obj-$(CONFIG_DMABUF_HEAPS_DEVICES) += device_heap.o > diff --git a/drivers/dma-buf/heaps/device_heap.c b/drivers/dma-buf/heaps/device_heap.c > new file mode 100644 > index 000000000000..1803dc622dd8 > --- /dev/null > +++ b/drivers/dma-buf/heaps/device_heap.c > @@ -0,0 +1,268 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * DMABUF Device DMA heap exporter > + * > + * Copyright (C) 2020, Collabora Ltd. > + * > + * Based on: > + * videobuf2-dma-contig.c - DMA contig memory allocator for videobuf2 > + * Copyright (C) 2010 Samsung Electronics > + */ > + > +#include <linux/device.h> > +#include <linux/dma-buf.h> > +#include <linux/dma-heap.h> > +#include <linux/dma-mapping.h> > +#include <linux/scatterlist.h> > +#include <linux/slab.h> > +#include <linux/module.h> > + > +struct dev_dmabuf_attachment { > + struct sg_table sgt; > + enum dma_data_direction dma_dir; > +}; > + > +struct dev_dmabuf { > + struct dma_heap *heap; > + struct dma_buf *dmabuf; > + struct device *dev; > + size_t size; > + void *vaddr; > + dma_addr_t dma_addr; > + unsigned long attrs; > + > + struct sg_table sgt; > +}; > + > +static struct sg_table *dev_dmabuf_ops_map(struct dma_buf_attachment *db_attach, > + enum dma_data_direction dma_dir) > +{ > + struct dev_dmabuf_attachment *attach = db_attach->priv; > + /* stealing dmabuf mutex to serialize map/unmap operations */ > + struct mutex *lock = &db_attach->dmabuf->lock; > + struct sg_table *sgt; > + > + mutex_lock(lock); > + > + sgt = &attach->sgt; > + /* return previously mapped sg table */ > + if (attach->dma_dir == dma_dir) { > + mutex_unlock(lock); > + return sgt; > + } > + > + /* release any previous cache */ > + if (attach->dma_dir != DMA_NONE) { > + dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > + attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > + attach->dma_dir = DMA_NONE; > + } > + > + /* > + * mapping to the client with new direction, no cache sync > + * required see comment in .dmabuf_ops_detach() > + */ > + sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > + dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > + if (!sgt->nents) { > + dev_err(db_attach->dev, "failed to map scatterlist\n"); > + mutex_unlock(lock); > + return ERR_PTR(-EIO); > + } > + > + attach->dma_dir = dma_dir; > + > + mutex_unlock(lock); > + > + return sgt; > +} > + > +static void dev_dmabuf_ops_unmap(struct dma_buf_attachment *db_attach, > + struct sg_table *sgt, > + enum dma_data_direction dma_dir) > +{ > + /* nothing to be done here */ > +} > + > +static int dev_dmabuf_ops_attach(struct dma_buf *dmabuf, > + struct dma_buf_attachment *dbuf_attach) > +{ > + struct dev_dmabuf_attachment *attach; > + unsigned int i; > + struct scatterlist *rd, *wr; > + struct sg_table *sgt; > + struct dev_dmabuf *buf = dmabuf->priv; > + int ret; > + > + attach = kzalloc(sizeof(*attach), GFP_KERNEL); > + if (!attach) > + return -ENOMEM; > + sgt = &attach->sgt; > + > + /* > + * Copy the buf->sgt scatter list to the attachment, as we can't > + * map the same scatter list to multiple attachments at the same time. 
> + */ > + ret = sg_alloc_table(sgt, buf->sgt.orig_nents, GFP_KERNEL); > + if (ret) { > + kfree(attach); > + return -ENOMEM; > + } > + > + rd = buf->sgt.sgl; > + wr = sgt->sgl; > + for (i = 0; i < sgt->orig_nents; ++i) { > + sg_set_page(wr, sg_page(rd), rd->length, rd->offset); > + rd = sg_next(rd); > + wr = sg_next(wr); > + } > + > + attach->dma_dir = DMA_NONE; > + dbuf_attach->priv = attach; > + > + return 0; > +} > + > +static void dev_dmabuf_ops_detach(struct dma_buf *dmabuf, > + struct dma_buf_attachment *db_attach) > +{ > + struct dev_dmabuf_attachment *attach = db_attach->priv; > + struct sg_table *sgt; > + > + if (!attach) > + return; > + sgt = &attach->sgt; > + > + /* release the scatterlist cache */ > + if (attach->dma_dir != DMA_NONE) > + /* > + * Cache sync can be skipped here, as the memory is > + * allocated from device coherent memory, which means the > + * memory locations do not require any explicit cache > + * maintenance prior or after being used by the device. > + * > + * XXX: This needs a revisit. > + */ > + dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > + attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > + sg_free_table(sgt); > + kfree(attach); > + db_attach->priv = NULL; > +} > + > + > +static void *dev_dmabuf_ops_vmap(struct dma_buf *dmabuf) > +{ > + struct dev_dmabuf *buf = dmabuf->priv; > + > + return buf->vaddr; > +} > + > +static void dev_dmabuf_ops_release(struct dma_buf *dmabuf) > +{ > + struct dev_dmabuf *buf = dmabuf->priv; > + > + sg_free_table(&buf->sgt); > + dma_free_attrs(buf->dev, buf->size, buf->vaddr, > + buf->dma_addr, buf->attrs); > + put_device(buf->dev); > + kfree(buf); > +} > + > +static int dev_dmabuf_ops_mmap(struct dma_buf *dmabuf, > + struct vm_area_struct *vma) > +{ > + struct dev_dmabuf *buf = dmabuf->priv; > + int ret; > + > + ret = dma_mmap_attrs(buf->dev, vma, buf->vaddr, > + buf->dma_addr, buf->size, > + buf->attrs); > + if (ret) { > + dev_err(buf->dev, "remapping memory failed, error: %d\n", ret); > + return ret; > + } > + vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; > + > + return 0; > +} > + > +static const struct dma_buf_ops dev_dmabuf_ops = { > + .attach = dev_dmabuf_ops_attach, > + .detach = dev_dmabuf_ops_detach, > + .map_dma_buf = dev_dmabuf_ops_map, > + .unmap_dma_buf = dev_dmabuf_ops_unmap, > + .vmap = dev_dmabuf_ops_vmap, > + .mmap = dev_dmabuf_ops_mmap, > + .release = dev_dmabuf_ops_release, > +}; > + > +static int dev_heap_allocate(struct dma_heap *heap, > + unsigned long size, > + unsigned long fd_flags, > + unsigned long heap_flags) > +{ > + struct device *dev = dma_heap_get_drvdata(heap); > + struct dev_dmabuf *buf; > + struct dma_buf_export_info exp_info = {}; > + unsigned long attrs = 0; > + int ret = -ENOMEM; > + > + buf = kzalloc(sizeof(*buf), GFP_KERNEL); > + if (!buf) > + return -ENOMEM; > + > + buf->vaddr = dma_alloc_attrs(dev, size, &buf->dma_addr, > + GFP_KERNEL, attrs); > + /* Prevent the device from being released while the buffer is used */ > + buf->dev = get_device(dev); > + buf->heap = heap; > + buf->size = size; > + buf->attrs = attrs; > + > + /* XXX: This call is documented as unsafe. See dma_get_sgtable_attrs(). 
*/ > + ret = dma_get_sgtable_attrs(buf->dev, &buf->sgt, > + buf->vaddr, buf->dma_addr, > + buf->size, buf->attrs); > + if (ret < 0) { > + dev_err(buf->dev, "failed to get scatterlist from DMA API\n"); > + return ret; > + } > + > + exp_info.exp_name = dev_name(dev); > + exp_info.owner = THIS_MODULE; > + exp_info.ops = &dev_dmabuf_ops; > + exp_info.size = size; > + exp_info.flags = fd_flags; > + exp_info.priv = buf; > + > + buf->dmabuf = dma_buf_export(&exp_info); > + if (IS_ERR(buf->dmabuf)) { > + dev_err(buf->dev, "failed to export dmabuf\n"); > + return PTR_ERR(buf->dmabuf); > + } > + > + ret = dma_buf_fd(buf->dmabuf, fd_flags); > + if (ret < 0) { > + dev_err(buf->dev, "failed to get dmabuf fd: %d\n", ret); > + return ret; > + } > + > + return ret; > +} > + > +static const struct dma_heap_ops dev_heap_ops = { > + .allocate = dev_heap_allocate, > +}; > + > +void dev_dma_heap_add(struct device *dev) > +{ > + struct dma_heap_export_info exp_info; > + > + exp_info.name = dev_name(dev); > + exp_info.ops = &dev_heap_ops; > + exp_info.priv = dev; > + > + dev->heap = dma_heap_add(&exp_info); > +} > +EXPORT_SYMBOL(dev_dma_heap_add); > diff --git a/include/linux/device.h b/include/linux/device.h > index ca18da4768e3..1fae95d55ea1 100644 > --- a/include/linux/device.h > +++ b/include/linux/device.h > @@ -45,6 +45,7 @@ struct iommu_ops; > struct iommu_group; > struct dev_pin_info; > struct dev_iommu; > +struct dma_heap; > > /** > * struct subsys_interface - interfaces to device functions > @@ -597,6 +598,10 @@ struct device { > struct iommu_group *iommu_group; > struct dev_iommu *iommu; > > +#ifdef CONFIG_DMABUF_HEAPS_DEVICES > + struct dma_heap *heap; > +#endif > + > bool offline_disabled:1; > bool offline:1; > bool of_node_reused:1; > diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h > index 454e354d1ffb..dcf7cca2f487 100644 > --- a/include/linux/dma-heap.h > +++ b/include/linux/dma-heap.h > @@ -56,4 +56,10 @@ void *dma_heap_get_drvdata(struct dma_heap *heap); > */ > struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info); > > +#ifdef CONFIG_DMABUF_HEAPS_DEVICES > +void dev_dma_heap_add(struct device *dev); > +#else > +static inline void dev_dma_heap_add(struct device *dev) {} > +#endif > + > #endif /* _DMA_HEAPS_H */ > -- > 2.27.0 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
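For context on how a userspace allocator would actually consume one of the per-device heaps this RFC registers, the snippet below uses the existing dma-buf heap UAPI from <linux/dma-heap.h>. The heap device name used here is only a placeholder; with the patch, each heap would simply be named after dev_name() of the device it wraps.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-heap.h>

int main(void)
{
	/* Placeholder name: the RFC registers one heap per device. */
	const char *path = "/dev/dma_heap/example-device";
	struct dma_heap_allocation_data alloc = {
		.len = 4 * 1024 * 1024,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	int heap_fd;

	heap_fd = open(path, O_RDONLY | O_CLOEXEC);
	if (heap_fd < 0) {
		perror("open heap");
		return 1;
	}

	if (ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &alloc) < 0) {
		perror("DMA_HEAP_IOCTL_ALLOC");
		close(heap_fd);
		return 1;
	}

	/* alloc.fd is a dma-buf fd usable with any importer (DRM, V4L2, ...). */
	printf("allocated dma-buf fd %u\n", alloc.fd);

	close(alloc.fd);
	close(heap_fd);
	return 0;
}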
Hi Daniel, On Tue, Sep 01, 2020 at 09:32:22AM +0200, Daniel Vetter wrote: > On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: > > This heap is basically a wrapper around DMA-API dma_alloc_attrs, > > which will allocate memory suitable for the given device. > > > > The implementation is mostly a port of the Contiguous Videobuf2 > > memory allocator (see videobuf2/videobuf2-dma-contig.c) > > over to the DMA-BUF Heap interface. > > > > The intention of this allocator is to provide applications > > with a more system-agnostic API: the only thing the application > > needs to know is which device to get the buffer for. > > > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool > > is unknown to the application. > > > > I'm not really expecting this patch to be correct or even > > a good idea, but just submitting it to start a discussion on DMA-BUF > > heap discovery and negotiation. > > > > Given Plumbers is just a couple weeks from now, I've submitted > > a BoF proposal to discuss this, as perhaps it would make > > sense to discuss this live? > > > > Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> > > I think on the uapi/constraint solving side there's been already tons of > discussions while I enjoyed vacations, so different concern from me > entirely on the implementation side: > > In the past the only thing we had in upstream was subsystem/driver > specific allocators, and dma-buf for sharing. With dma-buf heaps we kinda > get a central allocator, which duplicates large chunks of of all these > allocators. And since it's a central allocator by design, the reason for > having per-subsystem allocators is kinda gone. > > I think there's two approaches here: > - we convert e.g. drm allocator helpers to internally use the right heap > implementation. We however don't always have a need for a dmabuf, as not all buffers are meant to be shared (and we often can't tell beforehand, at allocation time, if a given buffer will be shared or not). While the overhead of allocating a dmabuf should be fine, assigning a file descriptor to each buffer would be prohibitively expensive. We would need to decouple the dma heaps from file descriptors. I think that's doable, but it changes the philosophy of dma heaps and needs careful consideration. In particular, one may wonder what that would bring us, compared to the DMA mapping API for instance. > That would also give us some fairly direct way to expose > these constraints in sysfs so a userspace allocator knows which dma-buf > heap to pick for shared stuff. sysfs won't work I'm afraid, as constraints may depend on the format for instance. We need subsystem-specific APIs to expose constraints. > - we require that any heap is just a different uapi for an existing driver > allocator, e.g. by having a dma-buf wrapper for all gem drivers. > > Otherwise I think what we end up with is a pile of dma-buf heaps for > android's blob gpu driver world, and not used anywhere else. Not something > even remotely interesting for upstream :-) > > tldr; I'd like to see how dma-buf heaps closely integrate with all the > existing buffer management code we have. Both kernel (and throuhg some > allocator library effort) in userspace.
> > > --- > > drivers/dma-buf/heaps/Kconfig | 9 + > > drivers/dma-buf/heaps/Makefile | 1 + > > drivers/dma-buf/heaps/device_heap.c | 268 ++++++++++++++++++++++++++++ > > include/linux/device.h | 5 + > > include/linux/dma-heap.h | 6 + > > 5 files changed, 289 insertions(+) > > create mode 100644 drivers/dma-buf/heaps/device_heap.c > > > > diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig > > index a5eef06c4226..2bb3604184bd 100644 > > --- a/drivers/dma-buf/heaps/Kconfig > > +++ b/drivers/dma-buf/heaps/Kconfig > > @@ -12,3 +12,12 @@ config DMABUF_HEAPS_CMA > > Choose this option to enable dma-buf CMA heap. This heap is backed > > by the Contiguous Memory Allocator (CMA). If your system has these > > regions, you should say Y here. > > + > > +config DMABUF_HEAPS_DEVICES > > + bool "DMA-BUF Device DMA Heap (Experimental)" > > + depends on DMABUF_HEAPS > > + help > > + Choose this option to enable dma-buf per-device heap. This heap is backed > > + by the DMA-API and it's an Experimental feature, meant mostly for testing > > + and experimentation. > > + Just say N here. > > diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile > > index 6e54cdec3da0..c691d85b3044 100644 > > --- a/drivers/dma-buf/heaps/Makefile > > +++ b/drivers/dma-buf/heaps/Makefile > > @@ -2,3 +2,4 @@ > > obj-y += heap-helpers.o > > obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o > > obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o > > +obj-$(CONFIG_DMABUF_HEAPS_DEVICES) += device_heap.o > > diff --git a/drivers/dma-buf/heaps/device_heap.c b/drivers/dma-buf/heaps/device_heap.c > > new file mode 100644 > > index 000000000000..1803dc622dd8 > > --- /dev/null > > +++ b/drivers/dma-buf/heaps/device_heap.c > > @@ -0,0 +1,268 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > +/* > > + * DMABUF Device DMA heap exporter > > + * > > + * Copyright (C) 2020, Collabora Ltd. 
> > + * > > + * Based on: > > + * videobuf2-dma-contig.c - DMA contig memory allocator for videobuf2 > > + * Copyright (C) 2010 Samsung Electronics > > + */ > > + > > +#include <linux/device.h> > > +#include <linux/dma-buf.h> > > +#include <linux/dma-heap.h> > > +#include <linux/dma-mapping.h> > > +#include <linux/scatterlist.h> > > +#include <linux/slab.h> > > +#include <linux/module.h> > > + > > +struct dev_dmabuf_attachment { > > + struct sg_table sgt; > > + enum dma_data_direction dma_dir; > > +}; > > + > > +struct dev_dmabuf { > > + struct dma_heap *heap; > > + struct dma_buf *dmabuf; > > + struct device *dev; > > + size_t size; > > + void *vaddr; > > + dma_addr_t dma_addr; > > + unsigned long attrs; > > + > > + struct sg_table sgt; > > +}; > > + > > +static struct sg_table *dev_dmabuf_ops_map(struct dma_buf_attachment *db_attach, > > + enum dma_data_direction dma_dir) > > +{ > > + struct dev_dmabuf_attachment *attach = db_attach->priv; > > + /* stealing dmabuf mutex to serialize map/unmap operations */ > > + struct mutex *lock = &db_attach->dmabuf->lock; > > + struct sg_table *sgt; > > + > > + mutex_lock(lock); > > + > > + sgt = &attach->sgt; > > + /* return previously mapped sg table */ > > + if (attach->dma_dir == dma_dir) { > > + mutex_unlock(lock); > > + return sgt; > > + } > > + > > + /* release any previous cache */ > > + if (attach->dma_dir != DMA_NONE) { > > + dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > > + attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > > + attach->dma_dir = DMA_NONE; > > + } > > + > > + /* > > + * mapping to the client with new direction, no cache sync > > + * required see comment in .dmabuf_ops_detach() > > + */ > > + sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > > + dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > > + if (!sgt->nents) { > > + dev_err(db_attach->dev, "failed to map scatterlist\n"); > > + mutex_unlock(lock); > > + return ERR_PTR(-EIO); > > + } > > + > > + attach->dma_dir = dma_dir; > > + > > + mutex_unlock(lock); > > + > > + return sgt; > > +} > > + > > +static void dev_dmabuf_ops_unmap(struct dma_buf_attachment *db_attach, > > + struct sg_table *sgt, > > + enum dma_data_direction dma_dir) > > +{ > > + /* nothing to be done here */ > > +} > > + > > +static int dev_dmabuf_ops_attach(struct dma_buf *dmabuf, > > + struct dma_buf_attachment *dbuf_attach) > > +{ > > + struct dev_dmabuf_attachment *attach; > > + unsigned int i; > > + struct scatterlist *rd, *wr; > > + struct sg_table *sgt; > > + struct dev_dmabuf *buf = dmabuf->priv; > > + int ret; > > + > > + attach = kzalloc(sizeof(*attach), GFP_KERNEL); > > + if (!attach) > > + return -ENOMEM; > > + sgt = &attach->sgt; > > + > > + /* > > + * Copy the buf->sgt scatter list to the attachment, as we can't > > + * map the same scatter list to multiple attachments at the same time. 
> > + */ > > + ret = sg_alloc_table(sgt, buf->sgt.orig_nents, GFP_KERNEL); > > + if (ret) { > > + kfree(attach); > > + return -ENOMEM; > > + } > > + > > + rd = buf->sgt.sgl; > > + wr = sgt->sgl; > > + for (i = 0; i < sgt->orig_nents; ++i) { > > + sg_set_page(wr, sg_page(rd), rd->length, rd->offset); > > + rd = sg_next(rd); > > + wr = sg_next(wr); > > + } > > + > > + attach->dma_dir = DMA_NONE; > > + dbuf_attach->priv = attach; > > + > > + return 0; > > +} > > + > > +static void dev_dmabuf_ops_detach(struct dma_buf *dmabuf, > > + struct dma_buf_attachment *db_attach) > > +{ > > + struct dev_dmabuf_attachment *attach = db_attach->priv; > > + struct sg_table *sgt; > > + > > + if (!attach) > > + return; > > + sgt = &attach->sgt; > > + > > + /* release the scatterlist cache */ > > + if (attach->dma_dir != DMA_NONE) > > + /* > > + * Cache sync can be skipped here, as the memory is > > + * allocated from device coherent memory, which means the > > + * memory locations do not require any explicit cache > > + * maintenance prior or after being used by the device. > > + * > > + * XXX: This needs a revisit. > > + */ > > + dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > > + attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > > + sg_free_table(sgt); > > + kfree(attach); > > + db_attach->priv = NULL; > > +} > > + > > + > > +static void *dev_dmabuf_ops_vmap(struct dma_buf *dmabuf) > > +{ > > + struct dev_dmabuf *buf = dmabuf->priv; > > + > > + return buf->vaddr; > > +} > > + > > +static void dev_dmabuf_ops_release(struct dma_buf *dmabuf) > > +{ > > + struct dev_dmabuf *buf = dmabuf->priv; > > + > > + sg_free_table(&buf->sgt); > > + dma_free_attrs(buf->dev, buf->size, buf->vaddr, > > + buf->dma_addr, buf->attrs); > > + put_device(buf->dev); > > + kfree(buf); > > +} > > + > > +static int dev_dmabuf_ops_mmap(struct dma_buf *dmabuf, > > + struct vm_area_struct *vma) > > +{ > > + struct dev_dmabuf *buf = dmabuf->priv; > > + int ret; > > + > > + ret = dma_mmap_attrs(buf->dev, vma, buf->vaddr, > > + buf->dma_addr, buf->size, > > + buf->attrs); > > + if (ret) { > > + dev_err(buf->dev, "remapping memory failed, error: %d\n", ret); > > + return ret; > > + } > > + vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; > > + > > + return 0; > > +} > > + > > +static const struct dma_buf_ops dev_dmabuf_ops = { > > + .attach = dev_dmabuf_ops_attach, > > + .detach = dev_dmabuf_ops_detach, > > + .map_dma_buf = dev_dmabuf_ops_map, > > + .unmap_dma_buf = dev_dmabuf_ops_unmap, > > + .vmap = dev_dmabuf_ops_vmap, > > + .mmap = dev_dmabuf_ops_mmap, > > + .release = dev_dmabuf_ops_release, > > +}; > > + > > +static int dev_heap_allocate(struct dma_heap *heap, > > + unsigned long size, > > + unsigned long fd_flags, > > + unsigned long heap_flags) > > +{ > > + struct device *dev = dma_heap_get_drvdata(heap); > > + struct dev_dmabuf *buf; > > + struct dma_buf_export_info exp_info = {}; > > + unsigned long attrs = 0; > > + int ret = -ENOMEM; > > + > > + buf = kzalloc(sizeof(*buf), GFP_KERNEL); > > + if (!buf) > > + return -ENOMEM; > > + > > + buf->vaddr = dma_alloc_attrs(dev, size, &buf->dma_addr, > > + GFP_KERNEL, attrs); > > + /* Prevent the device from being released while the buffer is used */ > > + buf->dev = get_device(dev); > > + buf->heap = heap; > > + buf->size = size; > > + buf->attrs = attrs; > > + > > + /* XXX: This call is documented as unsafe. See dma_get_sgtable_attrs(). 
*/ > > + ret = dma_get_sgtable_attrs(buf->dev, &buf->sgt, > > + buf->vaddr, buf->dma_addr, > > + buf->size, buf->attrs); > > + if (ret < 0) { > > + dev_err(buf->dev, "failed to get scatterlist from DMA API\n"); > > + return ret; > > + } > > + > > + exp_info.exp_name = dev_name(dev); > > + exp_info.owner = THIS_MODULE; > > + exp_info.ops = &dev_dmabuf_ops; > > + exp_info.size = size; > > + exp_info.flags = fd_flags; > > + exp_info.priv = buf; > > + > > + buf->dmabuf = dma_buf_export(&exp_info); > > + if (IS_ERR(buf->dmabuf)) { > > + dev_err(buf->dev, "failed to export dmabuf\n"); > > + return PTR_ERR(buf->dmabuf); > > + } > > + > > + ret = dma_buf_fd(buf->dmabuf, fd_flags); > > + if (ret < 0) { > > + dev_err(buf->dev, "failed to get dmabuf fd: %d\n", ret); > > + return ret; > > + } > > + > > + return ret; > > +} > > + > > +static const struct dma_heap_ops dev_heap_ops = { > > + .allocate = dev_heap_allocate, > > +}; > > + > > +void dev_dma_heap_add(struct device *dev) > > +{ > > + struct dma_heap_export_info exp_info; > > + > > + exp_info.name = dev_name(dev); > > + exp_info.ops = &dev_heap_ops; > > + exp_info.priv = dev; > > + > > + dev->heap = dma_heap_add(&exp_info); > > +} > > +EXPORT_SYMBOL(dev_dma_heap_add); > > diff --git a/include/linux/device.h b/include/linux/device.h > > index ca18da4768e3..1fae95d55ea1 100644 > > --- a/include/linux/device.h > > +++ b/include/linux/device.h > > @@ -45,6 +45,7 @@ struct iommu_ops; > > struct iommu_group; > > struct dev_pin_info; > > struct dev_iommu; > > +struct dma_heap; > > > > /** > > * struct subsys_interface - interfaces to device functions > > @@ -597,6 +598,10 @@ struct device { > > struct iommu_group *iommu_group; > > struct dev_iommu *iommu; > > > > +#ifdef CONFIG_DMABUF_HEAPS_DEVICES > > + struct dma_heap *heap; > > +#endif > > + > > bool offline_disabled:1; > > bool offline:1; > > bool of_node_reused:1; > > diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h > > index 454e354d1ffb..dcf7cca2f487 100644 > > --- a/include/linux/dma-heap.h > > +++ b/include/linux/dma-heap.h > > @@ -56,4 +56,10 @@ void *dma_heap_get_drvdata(struct dma_heap *heap); > > */ > > struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info); > > > > +#ifdef CONFIG_DMABUF_HEAPS_DEVICES > > +void dev_dma_heap_add(struct device *dev); > > +#else > > +static inline void dev_dma_heap_add(struct device *dev) {} > > +#endif > > + > > #endif /* _DMA_HEAPS_H */
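One way to picture the decoupling Laurent asks for above is an in-kernel entry point that hands back a struct dma_buf without ever installing a file descriptor, so the fd cost is only paid when a buffer is actually shared. The allocation helper used below did not exist in the kernels this thread was written against and is sketched purely to show the shape of such an interface; dma_buf_fd() and the error helpers are the real kernel APIs.

#include <linux/dma-buf.h>
#include <linux/dma-heap.h>
#include <linux/err.h>
#include <linux/fcntl.h>

/*
 * Hypothetical in-kernel allocator entry point: returns a dma_buf with no
 * fd attached.  Sketched for illustration; not part of the dma-heap API
 * this RFC is written against.
 */
struct dma_buf *dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
				      unsigned int fd_flags,
				      unsigned int heap_flags);

static int example_get_buffer(struct dma_heap *heap, size_t len,
			      struct dma_buf **out)
{
	struct dma_buf *buf;

	buf = dma_heap_buffer_alloc(heap, len, O_RDWR | O_CLOEXEC, 0);
	if (IS_ERR(buf))
		return PTR_ERR(buf);

	/* The buffer stays kernel-internal; no fd has been consumed. */
	*out = buf;
	return 0;
}

/* Only when the buffer is actually shared does it become an fd. */
static int example_export_buffer(struct dma_buf *buf)
{
	return dma_buf_fd(buf, O_CLOEXEC);
}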
On Tue, Sep 08, 2020 at 08:43:59AM +0300, Laurent Pinchart wrote: > Hi Daniel, > > On Tue, Sep 01, 2020 at 09:32:22AM +0200, Daniel Vetter wrote: > > On Sun, Aug 16, 2020 at 02:22:46PM -0300, Ezequiel Garcia wrote: > > > This heap is basically a wrapper around DMA-API dma_alloc_attrs, > > > which will allocate memory suitable for the given device. > > > > > > The implementation is mostly a port of the Contiguous Videobuf2 > > > memory allocator (see videobuf2/videobuf2-dma-contig.c) > > > over to the DMA-BUF Heap interface. > > > > > > The intention of this allocator is to provide applications > > > with a more system-agnostic API: the only thing the application > > > needs to know is which device to get the buffer for. > > > > > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool > > > is unknown to the application. > > > > > > I'm not really expecting this patch to be correct or even > > > a good idea, but just submitting it to start a discussion on DMA-BUF > > > heap discovery and negotiation. > > > > > > Given Plumbers is just a couple weeks from now, I've submitted > > > a BoF proposal to discuss this, as perhaps it would make > > > sense to discuss this live? > > > > > > Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> > > > > I think on the uapi/constraint solving side there's been already tons of > > discussions while I enjoyed vacations, so different concern from me > > entirely on the implementation side: > > > > In the past the only thing we had in upstream was subsystem/driver > > specific allocators, and dma-buf for sharing. With dma-buf heaps we kinda > > get a central allocator, which duplicates large chunks of of all these > > allocators. And since it's a central allocator by design, the reason for > > having per-subsystem allocators is kinda gone. > > > > I think there's two approaches here: > > - we convert e.g. drm allocator helpers to internally use the right heap > > implementation. > > We however don't always have a need for a dmabuf, as not all buffers are > meant to be shared (and we often can't tell beforehand, at allocation > time, if a given buffer will be shared or not). While the overhead of > allocating a dmabuf should be file, assigning a file descriptor to each > buffer would be prohibitely expensive. > > We would need to decouple the dma heaps from file descriptors. I think > that's doable, but it changes the philosophy of dma heaps and needs > careful consideration. In particular, one may wonder what that would > bring us, compared to the DMA mapping API for instance. Oh I don't mean that we use the dma-buf heaps interfaces, but the backend. Underneath it's all a struct dma_buf, that's where I figured we should intercept the code sharing. Definitely not file descriptor handles. > > That would also give us some fairly direct way to expose > > these constraints in sysfs so a userspace allocator knows which dma-buf > > heap to pick for shared stuff. > > sysfs won't work I'm afraid, as constraints may depend on the format for > instace. We need subsystem-specific APIs to expose constraints. Yeah sysfs would be the default, and maybe a list of useable heaps that the subsystem api then tells you which to use when. -Daniel > > > - we require that any heap is just a different uapi for an existing driver > > allocator, e.g. by having a dma-buf wrapper for all gem drivers. > > > > Otherwise I think what we end up with is a pile of dma-buf heaps for > > android's blob gpu driver world, and not used anywhere else. 
Not something > > even remotely interesting for upstream :-) > > > > tldr; I'd like to see how dma-buf heaps closely integrate with all the > > existing buffer management code we have. Both kernel (and throuhg some > > allocator library effort) in userspace. > > > > > --- > > > drivers/dma-buf/heaps/Kconfig | 9 + > > > drivers/dma-buf/heaps/Makefile | 1 + > > > drivers/dma-buf/heaps/device_heap.c | 268 ++++++++++++++++++++++++++++ > > > include/linux/device.h | 5 + > > > include/linux/dma-heap.h | 6 + > > > 5 files changed, 289 insertions(+) > > > create mode 100644 drivers/dma-buf/heaps/device_heap.c > > > > > > diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig > > > index a5eef06c4226..2bb3604184bd 100644 > > > --- a/drivers/dma-buf/heaps/Kconfig > > > +++ b/drivers/dma-buf/heaps/Kconfig > > > @@ -12,3 +12,12 @@ config DMABUF_HEAPS_CMA > > > Choose this option to enable dma-buf CMA heap. This heap is backed > > > by the Contiguous Memory Allocator (CMA). If your system has these > > > regions, you should say Y here. > > > + > > > +config DMABUF_HEAPS_DEVICES > > > + bool "DMA-BUF Device DMA Heap (Experimental)" > > > + depends on DMABUF_HEAPS > > > + help > > > + Choose this option to enable dma-buf per-device heap. This heap is backed > > > + by the DMA-API and it's an Experimental feature, meant mostly for testing > > > + and experimentation. > > > + Just say N here. > > > diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile > > > index 6e54cdec3da0..c691d85b3044 100644 > > > --- a/drivers/dma-buf/heaps/Makefile > > > +++ b/drivers/dma-buf/heaps/Makefile > > > @@ -2,3 +2,4 @@ > > > obj-y += heap-helpers.o > > > obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o > > > obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o > > > +obj-$(CONFIG_DMABUF_HEAPS_DEVICES) += device_heap.o > > > diff --git a/drivers/dma-buf/heaps/device_heap.c b/drivers/dma-buf/heaps/device_heap.c > > > new file mode 100644 > > > index 000000000000..1803dc622dd8 > > > --- /dev/null > > > +++ b/drivers/dma-buf/heaps/device_heap.c > > > @@ -0,0 +1,268 @@ > > > +// SPDX-License-Identifier: GPL-2.0 > > > +/* > > > + * DMABUF Device DMA heap exporter > > > + * > > > + * Copyright (C) 2020, Collabora Ltd. 
> > > + * > > > + * Based on: > > > + * videobuf2-dma-contig.c - DMA contig memory allocator for videobuf2 > > > + * Copyright (C) 2010 Samsung Electronics > > > + */ > > > + > > > +#include <linux/device.h> > > > +#include <linux/dma-buf.h> > > > +#include <linux/dma-heap.h> > > > +#include <linux/dma-mapping.h> > > > +#include <linux/scatterlist.h> > > > +#include <linux/slab.h> > > > +#include <linux/module.h> > > > + > > > +struct dev_dmabuf_attachment { > > > + struct sg_table sgt; > > > + enum dma_data_direction dma_dir; > > > +}; > > > + > > > +struct dev_dmabuf { > > > + struct dma_heap *heap; > > > + struct dma_buf *dmabuf; > > > + struct device *dev; > > > + size_t size; > > > + void *vaddr; > > > + dma_addr_t dma_addr; > > > + unsigned long attrs; > > > + > > > + struct sg_table sgt; > > > +}; > > > + > > > +static struct sg_table *dev_dmabuf_ops_map(struct dma_buf_attachment *db_attach, > > > + enum dma_data_direction dma_dir) > > > +{ > > > + struct dev_dmabuf_attachment *attach = db_attach->priv; > > > + /* stealing dmabuf mutex to serialize map/unmap operations */ > > > + struct mutex *lock = &db_attach->dmabuf->lock; > > > + struct sg_table *sgt; > > > + > > > + mutex_lock(lock); > > > + > > > + sgt = &attach->sgt; > > > + /* return previously mapped sg table */ > > > + if (attach->dma_dir == dma_dir) { > > > + mutex_unlock(lock); > > > + return sgt; > > > + } > > > + > > > + /* release any previous cache */ > > > + if (attach->dma_dir != DMA_NONE) { > > > + dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > > > + attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > > > + attach->dma_dir = DMA_NONE; > > > + } > > > + > > > + /* > > > + * mapping to the client with new direction, no cache sync > > > + * required see comment in .dmabuf_ops_detach() > > > + */ > > > + sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > > > + dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > > > + if (!sgt->nents) { > > > + dev_err(db_attach->dev, "failed to map scatterlist\n"); > > > + mutex_unlock(lock); > > > + return ERR_PTR(-EIO); > > > + } > > > + > > > + attach->dma_dir = dma_dir; > > > + > > > + mutex_unlock(lock); > > > + > > > + return sgt; > > > +} > > > + > > > +static void dev_dmabuf_ops_unmap(struct dma_buf_attachment *db_attach, > > > + struct sg_table *sgt, > > > + enum dma_data_direction dma_dir) > > > +{ > > > + /* nothing to be done here */ > > > +} > > > + > > > +static int dev_dmabuf_ops_attach(struct dma_buf *dmabuf, > > > + struct dma_buf_attachment *dbuf_attach) > > > +{ > > > + struct dev_dmabuf_attachment *attach; > > > + unsigned int i; > > > + struct scatterlist *rd, *wr; > > > + struct sg_table *sgt; > > > + struct dev_dmabuf *buf = dmabuf->priv; > > > + int ret; > > > + > > > + attach = kzalloc(sizeof(*attach), GFP_KERNEL); > > > + if (!attach) > > > + return -ENOMEM; > > > + sgt = &attach->sgt; > > > + > > > + /* > > > + * Copy the buf->sgt scatter list to the attachment, as we can't > > > + * map the same scatter list to multiple attachments at the same time. 
> > > + */ > > > + ret = sg_alloc_table(sgt, buf->sgt.orig_nents, GFP_KERNEL); > > > + if (ret) { > > > + kfree(attach); > > > + return -ENOMEM; > > > + } > > > + > > > + rd = buf->sgt.sgl; > > > + wr = sgt->sgl; > > > + for (i = 0; i < sgt->orig_nents; ++i) { > > > + sg_set_page(wr, sg_page(rd), rd->length, rd->offset); > > > + rd = sg_next(rd); > > > + wr = sg_next(wr); > > > + } > > > + > > > + attach->dma_dir = DMA_NONE; > > > + dbuf_attach->priv = attach; > > > + > > > + return 0; > > > +} > > > + > > > +static void dev_dmabuf_ops_detach(struct dma_buf *dmabuf, > > > + struct dma_buf_attachment *db_attach) > > > +{ > > > + struct dev_dmabuf_attachment *attach = db_attach->priv; > > > + struct sg_table *sgt; > > > + > > > + if (!attach) > > > + return; > > > + sgt = &attach->sgt; > > > + > > > + /* release the scatterlist cache */ > > > + if (attach->dma_dir != DMA_NONE) > > > + /* > > > + * Cache sync can be skipped here, as the memory is > > > + * allocated from device coherent memory, which means the > > > + * memory locations do not require any explicit cache > > > + * maintenance prior or after being used by the device. > > > + * > > > + * XXX: This needs a revisit. > > > + */ > > > + dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > > > + attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > > > + sg_free_table(sgt); > > > + kfree(attach); > > > + db_attach->priv = NULL; > > > +} > > > + > > > + > > > +static void *dev_dmabuf_ops_vmap(struct dma_buf *dmabuf) > > > +{ > > > + struct dev_dmabuf *buf = dmabuf->priv; > > > + > > > + return buf->vaddr; > > > +} > > > + > > > +static void dev_dmabuf_ops_release(struct dma_buf *dmabuf) > > > +{ > > > + struct dev_dmabuf *buf = dmabuf->priv; > > > + > > > + sg_free_table(&buf->sgt); > > > + dma_free_attrs(buf->dev, buf->size, buf->vaddr, > > > + buf->dma_addr, buf->attrs); > > > + put_device(buf->dev); > > > + kfree(buf); > > > +} > > > + > > > +static int dev_dmabuf_ops_mmap(struct dma_buf *dmabuf, > > > + struct vm_area_struct *vma) > > > +{ > > > + struct dev_dmabuf *buf = dmabuf->priv; > > > + int ret; > > > + > > > + ret = dma_mmap_attrs(buf->dev, vma, buf->vaddr, > > > + buf->dma_addr, buf->size, > > > + buf->attrs); > > > + if (ret) { > > > + dev_err(buf->dev, "remapping memory failed, error: %d\n", ret); > > > + return ret; > > > + } > > > + vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; > > > + > > > + return 0; > > > +} > > > + > > > +static const struct dma_buf_ops dev_dmabuf_ops = { > > > + .attach = dev_dmabuf_ops_attach, > > > + .detach = dev_dmabuf_ops_detach, > > > + .map_dma_buf = dev_dmabuf_ops_map, > > > + .unmap_dma_buf = dev_dmabuf_ops_unmap, > > > + .vmap = dev_dmabuf_ops_vmap, > > > + .mmap = dev_dmabuf_ops_mmap, > > > + .release = dev_dmabuf_ops_release, > > > +}; > > > + > > > +static int dev_heap_allocate(struct dma_heap *heap, > > > + unsigned long size, > > > + unsigned long fd_flags, > > > + unsigned long heap_flags) > > > +{ > > > + struct device *dev = dma_heap_get_drvdata(heap); > > > + struct dev_dmabuf *buf; > > > + struct dma_buf_export_info exp_info = {}; > > > + unsigned long attrs = 0; > > > + int ret = -ENOMEM; > > > + > > > + buf = kzalloc(sizeof(*buf), GFP_KERNEL); > > > + if (!buf) > > > + return -ENOMEM; > > > + > > > + buf->vaddr = dma_alloc_attrs(dev, size, &buf->dma_addr, > > > + GFP_KERNEL, attrs); > > > + /* Prevent the device from being released while the buffer is used */ > > > + buf->dev = get_device(dev); > > > + buf->heap = heap; > > > + buf->size = size; > > > + buf->attrs = attrs; 
> > > + > > > + /* XXX: This call is documented as unsafe. See dma_get_sgtable_attrs(). */ > > > + ret = dma_get_sgtable_attrs(buf->dev, &buf->sgt, > > > + buf->vaddr, buf->dma_addr, > > > + buf->size, buf->attrs); > > > + if (ret < 0) { > > > + dev_err(buf->dev, "failed to get scatterlist from DMA API\n"); > > > + return ret; > > > + } > > > + > > > + exp_info.exp_name = dev_name(dev); > > > + exp_info.owner = THIS_MODULE; > > > + exp_info.ops = &dev_dmabuf_ops; > > > + exp_info.size = size; > > > + exp_info.flags = fd_flags; > > > + exp_info.priv = buf; > > > + > > > + buf->dmabuf = dma_buf_export(&exp_info); > > > + if (IS_ERR(buf->dmabuf)) { > > > + dev_err(buf->dev, "failed to export dmabuf\n"); > > > + return PTR_ERR(buf->dmabuf); > > > + } > > > + > > > + ret = dma_buf_fd(buf->dmabuf, fd_flags); > > > + if (ret < 0) { > > > + dev_err(buf->dev, "failed to get dmabuf fd: %d\n", ret); > > > + return ret; > > > + } > > > + > > > + return ret; > > > +} > > > + > > > +static const struct dma_heap_ops dev_heap_ops = { > > > + .allocate = dev_heap_allocate, > > > +}; > > > + > > > +void dev_dma_heap_add(struct device *dev) > > > +{ > > > + struct dma_heap_export_info exp_info; > > > + > > > + exp_info.name = dev_name(dev); > > > + exp_info.ops = &dev_heap_ops; > > > + exp_info.priv = dev; > > > + > > > + dev->heap = dma_heap_add(&exp_info); > > > +} > > > +EXPORT_SYMBOL(dev_dma_heap_add); > > > diff --git a/include/linux/device.h b/include/linux/device.h > > > index ca18da4768e3..1fae95d55ea1 100644 > > > --- a/include/linux/device.h > > > +++ b/include/linux/device.h > > > @@ -45,6 +45,7 @@ struct iommu_ops; > > > struct iommu_group; > > > struct dev_pin_info; > > > struct dev_iommu; > > > +struct dma_heap; > > > > > > /** > > > * struct subsys_interface - interfaces to device functions > > > @@ -597,6 +598,10 @@ struct device { > > > struct iommu_group *iommu_group; > > > struct dev_iommu *iommu; > > > > > > +#ifdef CONFIG_DMABUF_HEAPS_DEVICES > > > + struct dma_heap *heap; > > > +#endif > > > + > > > bool offline_disabled:1; > > > bool offline:1; > > > bool of_node_reused:1; > > > diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h > > > index 454e354d1ffb..dcf7cca2f487 100644 > > > --- a/include/linux/dma-heap.h > > > +++ b/include/linux/dma-heap.h > > > @@ -56,4 +56,10 @@ void *dma_heap_get_drvdata(struct dma_heap *heap); > > > */ > > > struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info); > > > > > > +#ifdef CONFIG_DMABUF_HEAPS_DEVICES > > > +void dev_dma_heap_add(struct device *dev); > > > +#else > > > +static inline void dev_dma_heap_add(struct device *dev) {} > > > +#endif > > > + > > > #endif /* _DMA_HEAPS_H */ > > -- > Regards, > > Laurent Pinchart
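On the discovery side, the closest thing available today to what Daniel describes is simply listing /dev/dma_heap/: a userspace allocator can see which heaps exist by name, although nothing there describes constraints yet, which is exactly the gap the constraint-negotiation discussion is about. A minimal example:

#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	DIR *dir = opendir("/dev/dma_heap");
	struct dirent *ent;

	if (!dir) {
		perror("/dev/dma_heap");
		return 1;
	}

	/* One entry per heap; with this RFC, also one per device. */
	while ((ent = readdir(dir)) != NULL) {
		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		printf("heap: %s\n", ent->d_name);
	}

	closedir(dir);
	return 0;
}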
On Mon, Aug 17, 2020 at 9:09 AM Ezequiel Garcia <ezequiel@collabora.com> wrote: > > This heap is basically a wrapper around DMA-API dma_alloc_attrs, > which will allocate memory suitable for the given device. > > The implementation is mostly a port of the Contiguous Videobuf2 > memory allocator (see videobuf2/videobuf2-dma-contig.c) > over to the DMA-BUF Heap interface. > > The intention of this allocator is to provide applications > with a more system-agnostic API: the only thing the application > needs to know is which device to get the buffer for. > > Whether the buffer is backed by CMA, IOMMU or a DMA Pool > is unknown to the application. > > I'm not really expecting this patch to be correct or even > a good idea, but just submitting it to start a discussion on DMA-BUF > heap discovery and negotiation. > > Given Plumbers is just a couple weeks from now, I've submitted > a BoF proposal to discuss this, as perhaps it would make > sense to discuss this live? > > Not-signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Adding Simon and James, who've just done a presentation at XDC about allocation constraint solving, where heap ids came up. There's going to be a discussion tomorrow in the workshop track; it might be worth trying to join if you can make it happen (XDC is virtual and free). Cheers, Daniel > --- > drivers/dma-buf/heaps/Kconfig | 9 + > drivers/dma-buf/heaps/Makefile | 1 + > drivers/dma-buf/heaps/device_heap.c | 268 ++++++++++++++++++++++++++++ > include/linux/device.h | 5 + > include/linux/dma-heap.h | 6 + > 5 files changed, 289 insertions(+) > create mode 100644 drivers/dma-buf/heaps/device_heap.c > > diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig > index a5eef06c4226..2bb3604184bd 100644 > --- a/drivers/dma-buf/heaps/Kconfig > +++ b/drivers/dma-buf/heaps/Kconfig > @@ -12,3 +12,12 @@ config DMABUF_HEAPS_CMA > Choose this option to enable dma-buf CMA heap. This heap is backed > by the Contiguous Memory Allocator (CMA). If your system has these > regions, you should say Y here. > + > +config DMABUF_HEAPS_DEVICES > + bool "DMA-BUF Device DMA Heap (Experimental)" > + depends on DMABUF_HEAPS > + help > + Choose this option to enable dma-buf per-device heap. This heap is backed > + by the DMA-API and it's an Experimental feature, meant mostly for testing > + and experimentation. > + Just say N here. > diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile > index 6e54cdec3da0..c691d85b3044 100644 > --- a/drivers/dma-buf/heaps/Makefile > +++ b/drivers/dma-buf/heaps/Makefile > @@ -2,3 +2,4 @@ > obj-y += heap-helpers.o > obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o > obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o > +obj-$(CONFIG_DMABUF_HEAPS_DEVICES) += device_heap.o > diff --git a/drivers/dma-buf/heaps/device_heap.c b/drivers/dma-buf/heaps/device_heap.c > new file mode 100644 > index 000000000000..1803dc622dd8 > --- /dev/null > +++ b/drivers/dma-buf/heaps/device_heap.c > @@ -0,0 +1,268 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * DMABUF Device DMA heap exporter > + * > + * Copyright (C) 2020, Collabora Ltd.
> + * > + * Based on: > + * videobuf2-dma-contig.c - DMA contig memory allocator for videobuf2 > + * Copyright (C) 2010 Samsung Electronics > + */ > + > +#include <linux/device.h> > +#include <linux/dma-buf.h> > +#include <linux/dma-heap.h> > +#include <linux/dma-mapping.h> > +#include <linux/scatterlist.h> > +#include <linux/slab.h> > +#include <linux/module.h> > + > +struct dev_dmabuf_attachment { > + struct sg_table sgt; > + enum dma_data_direction dma_dir; > +}; > + > +struct dev_dmabuf { > + struct dma_heap *heap; > + struct dma_buf *dmabuf; > + struct device *dev; > + size_t size; > + void *vaddr; > + dma_addr_t dma_addr; > + unsigned long attrs; > + > + struct sg_table sgt; > +}; > + > +static struct sg_table *dev_dmabuf_ops_map(struct dma_buf_attachment *db_attach, > + enum dma_data_direction dma_dir) > +{ > + struct dev_dmabuf_attachment *attach = db_attach->priv; > + /* stealing dmabuf mutex to serialize map/unmap operations */ > + struct mutex *lock = &db_attach->dmabuf->lock; > + struct sg_table *sgt; > + > + mutex_lock(lock); > + > + sgt = &attach->sgt; > + /* return previously mapped sg table */ > + if (attach->dma_dir == dma_dir) { > + mutex_unlock(lock); > + return sgt; > + } > + > + /* release any previous cache */ > + if (attach->dma_dir != DMA_NONE) { > + dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > + attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > + attach->dma_dir = DMA_NONE; > + } > + > + /* > + * mapping to the client with new direction, no cache sync > + * required see comment in .dmabuf_ops_detach() > + */ > + sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > + dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > + if (!sgt->nents) { > + dev_err(db_attach->dev, "failed to map scatterlist\n"); > + mutex_unlock(lock); > + return ERR_PTR(-EIO); > + } > + > + attach->dma_dir = dma_dir; > + > + mutex_unlock(lock); > + > + return sgt; > +} > + > +static void dev_dmabuf_ops_unmap(struct dma_buf_attachment *db_attach, > + struct sg_table *sgt, > + enum dma_data_direction dma_dir) > +{ > + /* nothing to be done here */ > +} > + > +static int dev_dmabuf_ops_attach(struct dma_buf *dmabuf, > + struct dma_buf_attachment *dbuf_attach) > +{ > + struct dev_dmabuf_attachment *attach; > + unsigned int i; > + struct scatterlist *rd, *wr; > + struct sg_table *sgt; > + struct dev_dmabuf *buf = dmabuf->priv; > + int ret; > + > + attach = kzalloc(sizeof(*attach), GFP_KERNEL); > + if (!attach) > + return -ENOMEM; > + sgt = &attach->sgt; > + > + /* > + * Copy the buf->sgt scatter list to the attachment, as we can't > + * map the same scatter list to multiple attachments at the same time. 
> + */ > + ret = sg_alloc_table(sgt, buf->sgt.orig_nents, GFP_KERNEL); > + if (ret) { > + kfree(attach); > + return -ENOMEM; > + } > + > + rd = buf->sgt.sgl; > + wr = sgt->sgl; > + for (i = 0; i < sgt->orig_nents; ++i) { > + sg_set_page(wr, sg_page(rd), rd->length, rd->offset); > + rd = sg_next(rd); > + wr = sg_next(wr); > + } > + > + attach->dma_dir = DMA_NONE; > + dbuf_attach->priv = attach; > + > + return 0; > +} > + > +static void dev_dmabuf_ops_detach(struct dma_buf *dmabuf, > + struct dma_buf_attachment *db_attach) > +{ > + struct dev_dmabuf_attachment *attach = db_attach->priv; > + struct sg_table *sgt; > + > + if (!attach) > + return; > + sgt = &attach->sgt; > + > + /* release the scatterlist cache */ > + if (attach->dma_dir != DMA_NONE) > + /* > + * Cache sync can be skipped here, as the memory is > + * allocated from device coherent memory, which means the > + * memory locations do not require any explicit cache > + * maintenance prior or after being used by the device. > + * > + * XXX: This needs a revisit. > + */ > + dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents, > + attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC); > + sg_free_table(sgt); > + kfree(attach); > + db_attach->priv = NULL; > +} > + > + > +static void *dev_dmabuf_ops_vmap(struct dma_buf *dmabuf) > +{ > + struct dev_dmabuf *buf = dmabuf->priv; > + > + return buf->vaddr; > +} > + > +static void dev_dmabuf_ops_release(struct dma_buf *dmabuf) > +{ > + struct dev_dmabuf *buf = dmabuf->priv; > + > + sg_free_table(&buf->sgt); > + dma_free_attrs(buf->dev, buf->size, buf->vaddr, > + buf->dma_addr, buf->attrs); > + put_device(buf->dev); > + kfree(buf); > +} > + > +static int dev_dmabuf_ops_mmap(struct dma_buf *dmabuf, > + struct vm_area_struct *vma) > +{ > + struct dev_dmabuf *buf = dmabuf->priv; > + int ret; > + > + ret = dma_mmap_attrs(buf->dev, vma, buf->vaddr, > + buf->dma_addr, buf->size, > + buf->attrs); > + if (ret) { > + dev_err(buf->dev, "remapping memory failed, error: %d\n", ret); > + return ret; > + } > + vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP; > + > + return 0; > +} > + > +static const struct dma_buf_ops dev_dmabuf_ops = { > + .attach = dev_dmabuf_ops_attach, > + .detach = dev_dmabuf_ops_detach, > + .map_dma_buf = dev_dmabuf_ops_map, > + .unmap_dma_buf = dev_dmabuf_ops_unmap, > + .vmap = dev_dmabuf_ops_vmap, > + .mmap = dev_dmabuf_ops_mmap, > + .release = dev_dmabuf_ops_release, > +}; > + > +static int dev_heap_allocate(struct dma_heap *heap, > + unsigned long size, > + unsigned long fd_flags, > + unsigned long heap_flags) > +{ > + struct device *dev = dma_heap_get_drvdata(heap); > + struct dev_dmabuf *buf; > + struct dma_buf_export_info exp_info = {}; > + unsigned long attrs = 0; > + int ret = -ENOMEM; > + > + buf = kzalloc(sizeof(*buf), GFP_KERNEL); > + if (!buf) > + return -ENOMEM; > + > + buf->vaddr = dma_alloc_attrs(dev, size, &buf->dma_addr, > + GFP_KERNEL, attrs); > + /* Prevent the device from being released while the buffer is used */ > + buf->dev = get_device(dev); > + buf->heap = heap; > + buf->size = size; > + buf->attrs = attrs; > + > + /* XXX: This call is documented as unsafe. See dma_get_sgtable_attrs(). 
*/ > + ret = dma_get_sgtable_attrs(buf->dev, &buf->sgt, > + buf->vaddr, buf->dma_addr, > + buf->size, buf->attrs); > + if (ret < 0) { > + dev_err(buf->dev, "failed to get scatterlist from DMA API\n"); > + return ret; > + } > + > + exp_info.exp_name = dev_name(dev); > + exp_info.owner = THIS_MODULE; > + exp_info.ops = &dev_dmabuf_ops; > + exp_info.size = size; > + exp_info.flags = fd_flags; > + exp_info.priv = buf; > + > + buf->dmabuf = dma_buf_export(&exp_info); > + if (IS_ERR(buf->dmabuf)) { > + dev_err(buf->dev, "failed to export dmabuf\n"); > + return PTR_ERR(buf->dmabuf); > + } > + > + ret = dma_buf_fd(buf->dmabuf, fd_flags); > + if (ret < 0) { > + dev_err(buf->dev, "failed to get dmabuf fd: %d\n", ret); > + return ret; > + } > + > + return ret; > +} > + > +static const struct dma_heap_ops dev_heap_ops = { > + .allocate = dev_heap_allocate, > +}; > + > +void dev_dma_heap_add(struct device *dev) > +{ > + struct dma_heap_export_info exp_info; > + > + exp_info.name = dev_name(dev); > + exp_info.ops = &dev_heap_ops; > + exp_info.priv = dev; > + > + dev->heap = dma_heap_add(&exp_info); > +} > +EXPORT_SYMBOL(dev_dma_heap_add); > diff --git a/include/linux/device.h b/include/linux/device.h > index ca18da4768e3..1fae95d55ea1 100644 > --- a/include/linux/device.h > +++ b/include/linux/device.h > @@ -45,6 +45,7 @@ struct iommu_ops; > struct iommu_group; > struct dev_pin_info; > struct dev_iommu; > +struct dma_heap; > > /** > * struct subsys_interface - interfaces to device functions > @@ -597,6 +598,10 @@ struct device { > struct iommu_group *iommu_group; > struct dev_iommu *iommu; > > +#ifdef CONFIG_DMABUF_HEAPS_DEVICES > + struct dma_heap *heap; > +#endif > + > bool offline_disabled:1; > bool offline:1; > bool of_node_reused:1; > diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h > index 454e354d1ffb..dcf7cca2f487 100644 > --- a/include/linux/dma-heap.h > +++ b/include/linux/dma-heap.h > @@ -56,4 +56,10 @@ void *dma_heap_get_drvdata(struct dma_heap *heap); > */ > struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info); > > +#ifdef CONFIG_DMABUF_HEAPS_DEVICES > +void dev_dma_heap_add(struct device *dev); > +#else > +static inline void dev_dma_heap_add(struct device *dev) {} > +#endif > + > #endif /* _DMA_HEAPS_H */ > -- > 2.27.0 > > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
diff --git a/drivers/dma-buf/heaps/Kconfig b/drivers/dma-buf/heaps/Kconfig
index a5eef06c4226..2bb3604184bd 100644
--- a/drivers/dma-buf/heaps/Kconfig
+++ b/drivers/dma-buf/heaps/Kconfig
@@ -12,3 +12,12 @@ config DMABUF_HEAPS_CMA
 	  Choose this option to enable dma-buf CMA heap. This heap is backed
 	  by the Contiguous Memory Allocator (CMA). If your system has these
 	  regions, you should say Y here.
+
+config DMABUF_HEAPS_DEVICES
+	bool "DMA-BUF Device DMA Heap (Experimental)"
+	depends on DMABUF_HEAPS
+	help
+	  Choose this option to enable dma-buf per-device heap. This heap is backed
+	  by the DMA-API and it's an Experimental feature, meant mostly for testing
+	  and experimentation.
+	  Just say N here.
diff --git a/drivers/dma-buf/heaps/Makefile b/drivers/dma-buf/heaps/Makefile
index 6e54cdec3da0..c691d85b3044 100644
--- a/drivers/dma-buf/heaps/Makefile
+++ b/drivers/dma-buf/heaps/Makefile
@@ -2,3 +2,4 @@
 obj-y += heap-helpers.o
 obj-$(CONFIG_DMABUF_HEAPS_SYSTEM) += system_heap.o
 obj-$(CONFIG_DMABUF_HEAPS_CMA) += cma_heap.o
+obj-$(CONFIG_DMABUF_HEAPS_DEVICES) += device_heap.o
diff --git a/drivers/dma-buf/heaps/device_heap.c b/drivers/dma-buf/heaps/device_heap.c
new file mode 100644
index 000000000000..1803dc622dd8
--- /dev/null
+++ b/drivers/dma-buf/heaps/device_heap.c
@@ -0,0 +1,268 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * DMABUF Device DMA heap exporter
+ *
+ * Copyright (C) 2020, Collabora Ltd.
+ *
+ * Based on:
+ * videobuf2-dma-contig.c - DMA contig memory allocator for videobuf2
+ * Copyright (C) 2010 Samsung Electronics
+ */
+
+#include <linux/device.h>
+#include <linux/dma-buf.h>
+#include <linux/dma-heap.h>
+#include <linux/dma-mapping.h>
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+
+struct dev_dmabuf_attachment {
+	struct sg_table sgt;
+	enum dma_data_direction dma_dir;
+};
+
+struct dev_dmabuf {
+	struct dma_heap *heap;
+	struct dma_buf *dmabuf;
+	struct device *dev;
+	size_t size;
+	void *vaddr;
+	dma_addr_t dma_addr;
+	unsigned long attrs;
+
+	struct sg_table sgt;
+};
+
+static struct sg_table *dev_dmabuf_ops_map(struct dma_buf_attachment *db_attach,
+					   enum dma_data_direction dma_dir)
+{
+	struct dev_dmabuf_attachment *attach = db_attach->priv;
+	/* stealing dmabuf mutex to serialize map/unmap operations */
+	struct mutex *lock = &db_attach->dmabuf->lock;
+	struct sg_table *sgt;
+
+	mutex_lock(lock);
+
+	sgt = &attach->sgt;
+	/* return previously mapped sg table */
+	if (attach->dma_dir == dma_dir) {
+		mutex_unlock(lock);
+		return sgt;
+	}
+
+	/* release any previous cache */
+	if (attach->dma_dir != DMA_NONE) {
+		dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
+				   attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+		attach->dma_dir = DMA_NONE;
+	}
+
+	/*
+	 * mapping to the client with new direction, no cache sync
+	 * required see comment in .dmabuf_ops_detach()
+	 */
+	sgt->nents = dma_map_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
+				      dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+	if (!sgt->nents) {
+		dev_err(db_attach->dev, "failed to map scatterlist\n");
+		mutex_unlock(lock);
+		return ERR_PTR(-EIO);
+	}
+
+	attach->dma_dir = dma_dir;
+
+	mutex_unlock(lock);
+
+	return sgt;
+}
+
+static void dev_dmabuf_ops_unmap(struct dma_buf_attachment *db_attach,
+				 struct sg_table *sgt,
+				 enum dma_data_direction dma_dir)
+{
+	/* nothing to be done here */
+}
+
+static int dev_dmabuf_ops_attach(struct dma_buf *dmabuf,
+				 struct dma_buf_attachment *dbuf_attach)
+{
+	struct dev_dmabuf_attachment *attach;
+	unsigned int i;
+	struct scatterlist *rd, *wr;
+	struct sg_table *sgt;
+	struct dev_dmabuf *buf = dmabuf->priv;
+	int ret;
+
+	attach = kzalloc(sizeof(*attach), GFP_KERNEL);
+	if (!attach)
+		return -ENOMEM;
+	sgt = &attach->sgt;
+
+	/*
+	 * Copy the buf->sgt scatter list to the attachment, as we can't
+	 * map the same scatter list to multiple attachments at the same time.
+	 */
+	ret = sg_alloc_table(sgt, buf->sgt.orig_nents, GFP_KERNEL);
+	if (ret) {
+		kfree(attach);
+		return -ENOMEM;
+	}
+
+	rd = buf->sgt.sgl;
+	wr = sgt->sgl;
+	for (i = 0; i < sgt->orig_nents; ++i) {
+		sg_set_page(wr, sg_page(rd), rd->length, rd->offset);
+		rd = sg_next(rd);
+		wr = sg_next(wr);
+	}
+
+	attach->dma_dir = DMA_NONE;
+	dbuf_attach->priv = attach;
+
+	return 0;
+}
+
+static void dev_dmabuf_ops_detach(struct dma_buf *dmabuf,
+				  struct dma_buf_attachment *db_attach)
+{
+	struct dev_dmabuf_attachment *attach = db_attach->priv;
+	struct sg_table *sgt;
+
+	if (!attach)
+		return;
+	sgt = &attach->sgt;
+
+	/* release the scatterlist cache */
+	if (attach->dma_dir != DMA_NONE)
+		/*
+		 * Cache sync can be skipped here, as the memory is
+		 * allocated from device coherent memory, which means the
+		 * memory locations do not require any explicit cache
+		 * maintenance prior or after being used by the device.
+		 *
+		 * XXX: This needs a revisit.
+		 */
+		dma_unmap_sg_attrs(db_attach->dev, sgt->sgl, sgt->orig_nents,
+				   attach->dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
+	sg_free_table(sgt);
+	kfree(attach);
+	db_attach->priv = NULL;
+}
+
+
+static void *dev_dmabuf_ops_vmap(struct dma_buf *dmabuf)
+{
+	struct dev_dmabuf *buf = dmabuf->priv;
+
+	return buf->vaddr;
+}
+
+static void dev_dmabuf_ops_release(struct dma_buf *dmabuf)
+{
+	struct dev_dmabuf *buf = dmabuf->priv;
+
+	sg_free_table(&buf->sgt);
+	dma_free_attrs(buf->dev, buf->size, buf->vaddr,
+		       buf->dma_addr, buf->attrs);
+	put_device(buf->dev);
+	kfree(buf);
+}
+
+static int dev_dmabuf_ops_mmap(struct dma_buf *dmabuf,
+			       struct vm_area_struct *vma)
+{
+	struct dev_dmabuf *buf = dmabuf->priv;
+	int ret;
+
+	ret = dma_mmap_attrs(buf->dev, vma, buf->vaddr,
+			     buf->dma_addr, buf->size,
+			     buf->attrs);
+	if (ret) {
+		dev_err(buf->dev, "remapping memory failed, error: %d\n", ret);
+		return ret;
+	}
+	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
+
+	return 0;
+}
+
+static const struct dma_buf_ops dev_dmabuf_ops = {
+	.attach = dev_dmabuf_ops_attach,
+	.detach = dev_dmabuf_ops_detach,
+	.map_dma_buf = dev_dmabuf_ops_map,
+	.unmap_dma_buf = dev_dmabuf_ops_unmap,
+	.vmap = dev_dmabuf_ops_vmap,
+	.mmap = dev_dmabuf_ops_mmap,
+	.release = dev_dmabuf_ops_release,
+};
+
+static int dev_heap_allocate(struct dma_heap *heap,
+			     unsigned long size,
+			     unsigned long fd_flags,
+			     unsigned long heap_flags)
+{
+	struct device *dev = dma_heap_get_drvdata(heap);
+	struct dev_dmabuf *buf;
+	struct dma_buf_export_info exp_info = {};
+	unsigned long attrs = 0;
+	int ret = -ENOMEM;
+
+	buf = kzalloc(sizeof(*buf), GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	buf->vaddr = dma_alloc_attrs(dev, size, &buf->dma_addr,
+				     GFP_KERNEL, attrs);
+	/* Prevent the device from being released while the buffer is used */
+	buf->dev = get_device(dev);
+	buf->heap = heap;
+	buf->size = size;
+	buf->attrs = attrs;
+
+	/* XXX: This call is documented as unsafe. See dma_get_sgtable_attrs().
+	 */
+	ret = dma_get_sgtable_attrs(buf->dev, &buf->sgt,
+				    buf->vaddr, buf->dma_addr,
+				    buf->size, buf->attrs);
+	if (ret < 0) {
+		dev_err(buf->dev, "failed to get scatterlist from DMA API\n");
+		return ret;
+	}
+
+	exp_info.exp_name = dev_name(dev);
+	exp_info.owner = THIS_MODULE;
+	exp_info.ops = &dev_dmabuf_ops;
+	exp_info.size = size;
+	exp_info.flags = fd_flags;
+	exp_info.priv = buf;
+
+	buf->dmabuf = dma_buf_export(&exp_info);
+	if (IS_ERR(buf->dmabuf)) {
+		dev_err(buf->dev, "failed to export dmabuf\n");
+		return PTR_ERR(buf->dmabuf);
+	}
+
+	ret = dma_buf_fd(buf->dmabuf, fd_flags);
+	if (ret < 0) {
+		dev_err(buf->dev, "failed to get dmabuf fd: %d\n", ret);
+		return ret;
+	}
+
+	return ret;
+}
+
+static const struct dma_heap_ops dev_heap_ops = {
+	.allocate = dev_heap_allocate,
+};
+
+void dev_dma_heap_add(struct device *dev)
+{
+	struct dma_heap_export_info exp_info;
+
+	exp_info.name = dev_name(dev);
+	exp_info.ops = &dev_heap_ops;
+	exp_info.priv = dev;
+
+	dev->heap = dma_heap_add(&exp_info);
+}
+EXPORT_SYMBOL(dev_dma_heap_add);
diff --git a/include/linux/device.h b/include/linux/device.h
index ca18da4768e3..1fae95d55ea1 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -45,6 +45,7 @@ struct iommu_ops;
 struct iommu_group;
 struct dev_pin_info;
 struct dev_iommu;
+struct dma_heap;
 
 /**
  * struct subsys_interface - interfaces to device functions
@@ -597,6 +598,10 @@ struct device {
 	struct iommu_group *iommu_group;
 	struct dev_iommu *iommu;
 
+#ifdef CONFIG_DMABUF_HEAPS_DEVICES
+	struct dma_heap *heap;
+#endif
+
 	bool offline_disabled:1;
 	bool offline:1;
 	bool of_node_reused:1;
diff --git a/include/linux/dma-heap.h b/include/linux/dma-heap.h
index 454e354d1ffb..dcf7cca2f487 100644
--- a/include/linux/dma-heap.h
+++ b/include/linux/dma-heap.h
@@ -56,4 +56,10 @@ void *dma_heap_get_drvdata(struct dma_heap *heap);
  */
 struct dma_heap *dma_heap_add(const struct dma_heap_export_info *exp_info);
 
+#ifdef CONFIG_DMABUF_HEAPS_DEVICES
+void dev_dma_heap_add(struct device *dev);
+#else
+static inline void dev_dma_heap_add(struct device *dev) {}
+#endif
+
 #endif /* _DMA_HEAPS_H */
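For illustration, the sketch below shows how a driver might opt in to the per-device heap from its probe path. Only dev_dma_heap_add() comes from the patch above; the driver, device and compatible names are hypothetical, and the patch itself includes no in-tree caller.

/*
 * Hypothetical driver-side usage sketch (not part of the patch): only
 * dev_dma_heap_add() is provided by the patch; everything else is made up.
 */
#include <linux/dma-heap.h>
#include <linux/module.h>
#include <linux/of.h>
#include <linux/platform_device.h>

static int foo_codec_probe(struct platform_device *pdev)
{
	/* ... map registers, set up DMA masks, etc. ... */

	/*
	 * Register a heap named after the device (dev_name(&pdev->dev)),
	 * which with default udev rules shows up as /dev/dma_heap/<dev_name>.
	 */
	dev_dma_heap_add(&pdev->dev);

	return 0;
}

static const struct of_device_id foo_codec_of_match[] = {
	{ .compatible = "vendor,foo-codec" },	/* hypothetical */
	{ }
};
MODULE_DEVICE_TABLE(of, foo_codec_of_match);

static struct platform_driver foo_codec_driver = {
	.probe = foo_codec_probe,
	.driver = {
		.name = "foo-codec",
		.of_match_table = foo_codec_of_match,
	},
};
module_platform_driver(foo_codec_driver);
MODULE_LICENSE("GPL v2");

Note that the patch adds no teardown counterpart to dev_dma_heap_add(), so what happens to the registered heap on driver unbind is an open question.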
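On the application side, allocation would go through the existing DMA-BUF heap uAPI (DMA_HEAP_IOCTL_ALLOC). A rough sketch follows, assuming the heap node is named after dev_name() as in dev_dma_heap_add() above; the device name passed in is made up, and the actual node path depends on udev.

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-heap.h>

/* Allocate a dma-buf of 'len' bytes from the heap of the given device. */
int alloc_from_device_heap(const char *dev_name, size_t len)
{
	struct dma_heap_allocation_data data = {
		.len = len,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	char path[128];
	int heap_fd, ret;

	/* Heap name is dev_name(dev), e.g. "1f000000.video-codec" (made up). */
	snprintf(path, sizeof(path), "/dev/dma_heap/%s", dev_name);
	heap_fd = open(path, O_RDONLY | O_CLOEXEC);
	if (heap_fd < 0)
		return -1;

	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
	close(heap_fd);
	if (ret < 0)
		return -1;

	return data.fd;	/* ordinary dma-buf fd */
}

The returned fd is an ordinary dma-buf, so the application can mmap() it or pass it to other devices for import, which is where the heap discovery and negotiation questions raised in this thread come in.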