mbox series

[0/2] Support direct I/O read and write for memory allocated by dmabuf

Message ID 20240710135757.25786-1-liulei.rjpt@vivo.com (mailing list archive)
Headers show
Series Support direct I/O read and write for memory allocated by dmabuf | expand

Message

Lei Liu July 10, 2024, 1:57 p.m. UTC
Use vm_insert_page to establish a mapping for the memory allocated
by dmabuf, thus supporting direct I/O read and write; and fix the
issue of incorrect memory statistics after mapping dmabuf memory.

Lei Liu (2):
  mm: dmabuf_direct_io: Support direct_io for memory allocated by dmabuf
  mm: dmabuf_direct_io: Fix memory statistics error for dmabuf allocated
    memory with direct_io support

 drivers/dma-buf/heaps/system_heap.c |  5 +++--
 fs/proc/task_mmu.c                  |  8 +++++++-
 include/linux/mm.h                  |  1 +
 mm/memory.c                         | 15 ++++++++++-----
 mm/rmap.c                           |  9 +++++----
 5 files changed, 26 insertions(+), 12 deletions(-)

Comments

Christian König July 11, 2024, 2:25 p.m. UTC | #1
Am 10.07.24 um 18:34 schrieb T.J. Mercier:
> On Wed, Jul 10, 2024 at 8:08 AM Lei Liu <liulei.rjpt@vivo.com> wrote:
>> on 2024/7/10 22:48, Christian König wrote:
>>> Am 10.07.24 um 16:35 schrieb Lei Liu:
>>>> on 2024/7/10 22:14, Christian König wrote:
>>>>> Am 10.07.24 um 15:57 schrieb Lei Liu:
>>>>>> Use vm_insert_page to establish a mapping for the memory allocated
>>>>>> by dmabuf, thus supporting direct I/O read and write; and fix the
>>>>>> issue of incorrect memory statistics after mapping dmabuf memory.
>>>>> Well big NAK to that! Direct I/O is intentionally disabled on DMA-bufs.
>>>> Hello! Could you explain why direct_io is disabled on DMABUF? Is
>>>> there any historical reason for this?
>>> It's basically one of the most fundamental design decision of DMA-Buf.
>>> The attachment/map/fence model DMA-buf uses is not really compatible
>>> with direct I/O on the underlying pages.
>> Thank you! Is there any related documentation on this? I would like to
>> understand and learn more about the fundamental reasons for the lack of
>> support.
> Hi Lei and Christian,
>
> This is now the third request I've seen from three different companies
> who are interested in this,

Yeah, completely agree. This is a re-occurring pattern :)

Maybe we should document the preferred solution for that.

> but the others are not for reasons of read
> performance that you mention in the commit message on your first
> patch. Someone else at Google ran a comparison between a normal read()
> and a direct I/O read() into a preallocated user buffer and found that
> with large readahead (16 MB) the throughput can actually be slightly
> higher than direct I/O. If you have concerns about read performance,
> have you tried increasing the readahead size?
>
> The other motivation is to load a gajillion byte file from disk into a
> dmabuf without evicting the entire contents of pagecache while doing
> so. Something like this (which does not currently work because read()
> tries to GUP on the dmabuf memory as you mention):
>
> static int dmabuf_heap_alloc(int heap_fd, size_t len)
> {
>      struct dma_heap_allocation_data data = {
>          .len = len,
>          .fd = 0,
>          .fd_flags = O_RDWR | O_CLOEXEC,
>          .heap_flags = 0,
>      };
>      int ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
>      if (ret < 0)
>          return ret;
>      return data.fd;
> }
>
> int main(int, char **argv)
> {
>          const char *file_path = argv[1];
>          printf("File: %s\n", file_path);
>          int file_fd = open(file_path, O_RDONLY | O_DIRECT);
>
>          struct stat st;
>          stat(file_path, &st);
>          ssize_t file_size = st.st_size;
>          ssize_t aligned_size = (file_size + 4095) & ~4095;
>
>          printf("File size: %zd Aligned size: %zd\n", file_size, aligned_size);
>          int heap_fd = open("/dev/dma_heap/system", O_RDONLY);
>          int dmabuf_fd = dmabuf_heap_alloc(heap_fd, aligned_size);
>
>          void *vm = mmap(nullptr, aligned_size, PROT_READ | PROT_WRITE,
> MAP_SHARED, dmabuf_fd, 0);
>          printf("VM at 0x%lx\n", (unsigned long)vm);
>
>          dma_buf_sync sync_flags { DMA_BUF_SYNC_START |
> DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE };
>          ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
>
>          ssize_t rc = read(file_fd, vm, file_size);
>          printf("Read: %zd %s\n", rc, rc < 0 ? strerror(errno) : "");
>
>          sync_flags.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ |
> DMA_BUF_SYNC_WRITE;
>          ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
> }
>
> Or replace the mmap() + read() with sendfile().

Or copy_file_range(). That's pretty much exactly what I suggested on the 
other mail thread around that topic as well.

> So I would also like to see the above code (or something else similar)
> be able to work and I understand some of the reasons why it currently
> does not, but I don't understand why we should actively prevent this
> type of behavior entirely.

+1

Regards,
Christian.

>
> Best,
> T.J.
>
>
>
>
>
>
>
>
>>>>> We already discussed enforcing that in the DMA-buf framework and
>>>>> this patch probably means that we should really do that.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>> Thank you for your response. With the application of AI large model
>>>> edgeification, we urgently need support for direct_io on DMABUF to
>>>> read some very large files. Do you have any new solutions or plans
>>>> for this?
>>> We have seen similar projects over the years and all of those turned
>>> out to be complete shipwrecks.
>>>
>>> There is currently a patch set under discussion to give the network
>>> subsystem DMA-buf support. If you are interest in network direct I/O
>>> that could help.
>> Is there a related introduction link for this patch?
>>
>>> Additional to that a lot of GPU drivers support userptr usages, e.g.
>>> to import malloced memory into the GPU driver. You can then also do
>>> direct I/O on that malloced memory and the kernel will enforce correct
>>> handling with the GPU driver through MMU notifiers.
>>>
>>> But as far as I know a general DMA-buf based solution isn't possible.
>> 1.The reason we need to use DMABUF memory here is that we need to share
>> memory between the CPU and APU. Currently, only DMABUF memory is
>> suitable for this purpose. Additionally, we need to read very large files.
>>
>> 2. Are there any other solutions for this? Also, do you have any plans
>> to support direct_io for DMABUF memory in the future?
>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>> Lei Liu.
>>>>
>>>>>> Lei Liu (2):
>>>>>>     mm: dmabuf_direct_io: Support direct_io for memory allocated by
>>>>>> dmabuf
>>>>>>     mm: dmabuf_direct_io: Fix memory statistics error for dmabuf
>>>>>> allocated
>>>>>>       memory with direct_io support
>>>>>>
>>>>>>    drivers/dma-buf/heaps/system_heap.c |  5 +++--
>>>>>>    fs/proc/task_mmu.c                  |  8 +++++++-
>>>>>>    include/linux/mm.h                  |  1 +
>>>>>>    mm/memory.c                         | 15 ++++++++++-----
>>>>>>    mm/rmap.c                           |  9 +++++----
>>>>>>    5 files changed, 26 insertions(+), 12 deletions(-)
>>>>>>
Daniel Vetter July 15, 2024, 8:52 a.m. UTC | #2
On Wed, Jul 10, 2024 at 04:14:18PM +0200, Christian König wrote:
> Am 10.07.24 um 15:57 schrieb Lei Liu:
> > Use vm_insert_page to establish a mapping for the memory allocated
> > by dmabuf, thus supporting direct I/O read and write; and fix the
> > issue of incorrect memory statistics after mapping dmabuf memory.
> 
> Well big NAK to that! Direct I/O is intentionally disabled on DMA-bufs.
> 
> We already discussed enforcing that in the DMA-buf framework and this patch
> probably means that we should really do that.

Last time I looked dma_mmap doesn't guarantee that the vma end sup with
VM_SPECIAL, and that's pretty much the only reason why we can't enforce
this. But we might be able to enforce this at least on some architectures,
I didn't check for that ... if at least x86-64 and arm64 could have the
check, that would be great. So might be worth it to re-audit this all.

I think all other dma-buf exporters/allocators do only create VM_SPECIAL
vmas.
-Sima
Lei Liu July 15, 2024, 9:07 a.m. UTC | #3
On 2024/7/11 22:25, Christian König wrote:
> Am 10.07.24 um 18:34 schrieb T.J. Mercier:
>> On Wed, Jul 10, 2024 at 8:08 AM Lei Liu <liulei.rjpt@vivo.com> wrote:
>>> on 2024/7/10 22:48, Christian König wrote:
>>>> Am 10.07.24 um 16:35 schrieb Lei Liu:
>>>>> on 2024/7/10 22:14, Christian König wrote:
>>>>>> Am 10.07.24 um 15:57 schrieb Lei Liu:
>>>>>>> Use vm_insert_page to establish a mapping for the memory allocated
>>>>>>> by dmabuf, thus supporting direct I/O read and write; and fix the
>>>>>>> issue of incorrect memory statistics after mapping dmabuf memory.
>>>>>> Well big NAK to that! Direct I/O is intentionally disabled on 
>>>>>> DMA-bufs.
>>>>> Hello! Could you explain why direct_io is disabled on DMABUF? Is
>>>>> there any historical reason for this?
>>>> It's basically one of the most fundamental design decision of DMA-Buf.
>>>> The attachment/map/fence model DMA-buf uses is not really compatible
>>>> with direct I/O on the underlying pages.
>>> Thank you! Is there any related documentation on this? I would like to
>>> understand and learn more about the fundamental reasons for the lack of
>>> support.
>> Hi Lei and Christian,
>>
>> This is now the third request I've seen from three different companies
>> who are interested in this,
>
> Yeah, completely agree. This is a re-occurring pattern :)
>
> Maybe we should document the preferred solution for that.
>
>> but the others are not for reasons of read
>> performance that you mention in the commit message on your first
>> patch. Someone else at Google ran a comparison between a normal read()
>> and a direct I/O read() into a preallocated user buffer and found that
>> with large readahead (16 MB) the throughput can actually be slightly
>> higher than direct I/O. If you have concerns about read performance,
>> have you tried increasing the readahead size?
>>
>> The other motivation is to load a gajillion byte file from disk into a
>> dmabuf without evicting the entire contents of pagecache while doing
>> so. Something like this (which does not currently work because read()
>> tries to GUP on the dmabuf memory as you mention):
>>
>> static int dmabuf_heap_alloc(int heap_fd, size_t len)
>> {
>>      struct dma_heap_allocation_data data = {
>>          .len = len,
>>          .fd = 0,
>>          .fd_flags = O_RDWR | O_CLOEXEC,
>>          .heap_flags = 0,
>>      };
>>      int ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
>>      if (ret < 0)
>>          return ret;
>>      return data.fd;
>> }
>>
>> int main(int, char **argv)
>> {
>>          const char *file_path = argv[1];
>>          printf("File: %s\n", file_path);
>>          int file_fd = open(file_path, O_RDONLY | O_DIRECT);
>>
>>          struct stat st;
>>          stat(file_path, &st);
>>          ssize_t file_size = st.st_size;
>>          ssize_t aligned_size = (file_size + 4095) & ~4095;
>>
>>          printf("File size: %zd Aligned size: %zd\n", file_size, 
>> aligned_size);
>>          int heap_fd = open("/dev/dma_heap/system", O_RDONLY);
>>          int dmabuf_fd = dmabuf_heap_alloc(heap_fd, aligned_size);
>>
>>          void *vm = mmap(nullptr, aligned_size, PROT_READ | PROT_WRITE,
>> MAP_SHARED, dmabuf_fd, 0);
>>          printf("VM at 0x%lx\n", (unsigned long)vm);
>>
>>          dma_buf_sync sync_flags { DMA_BUF_SYNC_START |
>> DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE };
>>          ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
>>
>>          ssize_t rc = read(file_fd, vm, file_size);
>>          printf("Read: %zd %s\n", rc, rc < 0 ? strerror(errno) : "");
>>
>>          sync_flags.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ |
>> DMA_BUF_SYNC_WRITE;
>>          ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
>> }
>>
>> Or replace the mmap() + read() with sendfile().
>
> Or copy_file_range(). That's pretty much exactly what I suggested on 
> the other mail thread around that topic as well.

Thank you for your suggestion. I will study the method you suggested 
with Yang. Using copy_file_range() might be a good solution approach.

Regards,
Lei Liu.

>
>> So I would also like to see the above code (or something else similar)
>> be able to work and I understand some of the reasons why it currently
>> does not, but I don't understand why we should actively prevent this
>> type of behavior entirely.
>
> +1
>
> Regards,
> Christian.
>
>>
>> Best,
>> T.J.
>>
>>
>>
>>
>>
>>
>>
>>
>>>>>> We already discussed enforcing that in the DMA-buf framework and
>>>>>> this patch probably means that we should really do that.
>>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>> Thank you for your response. With the application of AI large model
>>>>> edgeification, we urgently need support for direct_io on DMABUF to
>>>>> read some very large files. Do you have any new solutions or plans
>>>>> for this?
>>>> We have seen similar projects over the years and all of those turned
>>>> out to be complete shipwrecks.
>>>>
>>>> There is currently a patch set under discussion to give the network
>>>> subsystem DMA-buf support. If you are interest in network direct I/O
>>>> that could help.
>>> Is there a related introduction link for this patch?
>>>
>>>> Additional to that a lot of GPU drivers support userptr usages, e.g.
>>>> to import malloced memory into the GPU driver. You can then also do
>>>> direct I/O on that malloced memory and the kernel will enforce correct
>>>> handling with the GPU driver through MMU notifiers.
>>>>
>>>> But as far as I know a general DMA-buf based solution isn't possible.
>>> 1.The reason we need to use DMABUF memory here is that we need to share
>>> memory between the CPU and APU. Currently, only DMABUF memory is
>>> suitable for this purpose. Additionally, we need to read very large 
>>> files.
>>>
>>> 2. Are there any other solutions for this? Also, do you have any plans
>>> to support direct_io for DMABUF memory in the future?
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> Regards,
>>>>> Lei Liu.
>>>>>
>>>>>>> Lei Liu (2):
>>>>>>>     mm: dmabuf_direct_io: Support direct_io for memory allocated by
>>>>>>> dmabuf
>>>>>>>     mm: dmabuf_direct_io: Fix memory statistics error for dmabuf
>>>>>>> allocated
>>>>>>>       memory with direct_io support
>>>>>>>
>>>>>>>    drivers/dma-buf/heaps/system_heap.c |  5 +++--
>>>>>>>    fs/proc/task_mmu.c                  |  8 +++++++-
>>>>>>>    include/linux/mm.h                  |  1 +
>>>>>>>    mm/memory.c                         | 15 ++++++++++-----
>>>>>>>    mm/rmap.c                           |  9 +++++----
>>>>>>>    5 files changed, 26 insertions(+), 12 deletions(-)
>>>>>>>
>