| Message ID | 20240710135757.25786-1-liulei.rjpt@vivo.com (mailing list archive) |
|---|---|
| Series | Support direct I/O read and write for memory allocated by dmabuf |
On 10.07.24 at 18:34, T.J. Mercier wrote:
> On Wed, Jul 10, 2024 at 8:08 AM Lei Liu <liulei.rjpt@vivo.com> wrote:
>> On 2024/7/10 22:48, Christian König wrote:
>>> On 10.07.24 at 16:35, Lei Liu wrote:
>>>> On 2024/7/10 22:14, Christian König wrote:
>>>>> On 10.07.24 at 15:57, Lei Liu wrote:
>>>>>> Use vm_insert_page to establish a mapping for the memory allocated
>>>>>> by dmabuf, thus supporting direct I/O read and write; and fix the
>>>>>> issue of incorrect memory statistics after mapping dmabuf memory.
>>>>> Well, big NAK to that! Direct I/O is intentionally disabled on DMA-bufs.
>>>> Hello! Could you explain why direct_io is disabled on DMABUF? Is
>>>> there any historical reason for this?
>>> It's basically one of the most fundamental design decisions of DMA-buf.
>>> The attachment/map/fence model DMA-buf uses is not really compatible
>>> with direct I/O on the underlying pages.
>> Thank you! Is there any related documentation on this? I would like to
>> understand and learn more about the fundamental reasons for the lack of
>> support.
> Hi Lei and Christian,
>
> This is now the third request I've seen from three different companies
> who are interested in this,

Yeah, completely agree. This is a recurring pattern :)

Maybe we should document the preferred solution for that.

> but the others are not for reasons of read
> performance that you mention in the commit message on your first
> patch. Someone else at Google ran a comparison between a normal read()
> and a direct I/O read() into a preallocated user buffer and found that
> with large readahead (16 MB) the throughput can actually be slightly
> higher than direct I/O. If you have concerns about read performance,
> have you tried increasing the readahead size?
>
> The other motivation is to load a gajillion-byte file from disk into a
> dmabuf without evicting the entire contents of the pagecache while doing
> so. Something like this (which does not currently work because read()
> tries to GUP on the dmabuf memory, as you mention):
>
> static int dmabuf_heap_alloc(int heap_fd, size_t len)
> {
>     struct dma_heap_allocation_data data = {
>         .len = len,
>         .fd = 0,
>         .fd_flags = O_RDWR | O_CLOEXEC,
>         .heap_flags = 0,
>     };
>     int ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
>     if (ret < 0)
>         return ret;
>     return data.fd;
> }
>
> int main(int, char **argv)
> {
>     const char *file_path = argv[1];
>     printf("File: %s\n", file_path);
>     int file_fd = open(file_path, O_RDONLY | O_DIRECT);
>
>     struct stat st;
>     stat(file_path, &st);
>     ssize_t file_size = st.st_size;
>     ssize_t aligned_size = (file_size + 4095) & ~4095;
>
>     printf("File size: %zd Aligned size: %zd\n", file_size, aligned_size);
>     int heap_fd = open("/dev/dma_heap/system", O_RDONLY);
>     int dmabuf_fd = dmabuf_heap_alloc(heap_fd, aligned_size);
>
>     void *vm = mmap(nullptr, aligned_size, PROT_READ | PROT_WRITE,
>                     MAP_SHARED, dmabuf_fd, 0);
>     printf("VM at 0x%lx\n", (unsigned long)vm);
>
>     dma_buf_sync sync_flags { DMA_BUF_SYNC_START |
>                               DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE };
>     ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
>
>     ssize_t rc = read(file_fd, vm, file_size);
>     printf("Read: %zd %s\n", rc, rc < 0 ? strerror(errno) : "");
>
>     sync_flags.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ |
>                        DMA_BUF_SYNC_WRITE;
>     ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync_flags);
> }
>
> Or replace the mmap() + read() with sendfile().

Or copy_file_range(). That's pretty much exactly what I suggested on
the other mail thread around that topic as well.

> So I would also like to see the above code (or something else similar)
> be able to work, and I understand some of the reasons why it currently
> does not, but I don't understand why we should actively prevent this
> type of behavior entirely.

+1

Regards,
Christian.

> Best,
> T.J.
>
>>>>> We already discussed enforcing that in the DMA-buf framework and
>>>>> this patch probably means that we should really do that.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>> Thank you for your response. With the move toward running large AI
>>>> models on-device, we urgently need support for direct_io on DMABUF to
>>>> read some very large files. Do you have any new solutions or plans
>>>> for this?
>>> We have seen similar projects over the years and all of those turned
>>> out to be complete shipwrecks.
>>>
>>> There is currently a patch set under discussion to give the network
>>> subsystem DMA-buf support. If you are interested in network direct I/O,
>>> that could help.
>> Is there a related introduction link for this patch?
>>
>>> In addition to that, a lot of GPU drivers support userptr usage, e.g.
>>> to import malloced memory into the GPU driver. You can then also do
>>> direct I/O on that malloced memory and the kernel will enforce correct
>>> handling with the GPU driver through MMU notifiers.
>>>
>>> But as far as I know a general DMA-buf based solution isn't possible.
>> 1. The reason we need to use DMABUF memory here is that we need to share
>> memory between the CPU and APU. Currently, only DMABUF memory is
>> suitable for this purpose. Additionally, we need to read very large files.
>>
>> 2. Are there any other solutions for this? Also, do you have any plans
>> to support direct_io for DMABUF memory in the future?
>>
>>> Regards,
>>> Christian.
>>>
>>>> Regards,
>>>> Lei Liu.
>>>>
>>>>>> Lei Liu (2):
>>>>>>   mm: dmabuf_direct_io: Support direct_io for memory allocated by
>>>>>>     dmabuf
>>>>>>   mm: dmabuf_direct_io: Fix memory statistics error for dmabuf allocated
>>>>>>     memory with direct_io support
>>>>>>
>>>>>>  drivers/dma-buf/heaps/system_heap.c |  5 +++--
>>>>>>  fs/proc/task_mmu.c                  |  8 +++++++-
>>>>>>  include/linux/mm.h                  |  1 +
>>>>>>  mm/memory.c                         | 15 ++++++++++-----
>>>>>>  mm/rmap.c                           |  9 +++++----
>>>>>>  5 files changed, 26 insertions(+), 12 deletions(-)
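To make the sendfile()/copy_file_range() suggestion concrete, below is a minimal
userspace sketch of the fd-to-fd variant, with the caveat that mainline dma-buf
fds do not accept writes today, so this only illustrates the usage pattern under
discussion, not something that currently works. The helper name
load_file_into_dmabuf() is invented for this sketch; the dmabuf fd would come
from a heap allocation like dmabuf_heap_alloc() in T.J.'s example above.

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Hypothetical: copy a whole file into a dmabuf fd without mapping it. */
    static int load_file_into_dmabuf(const char *path, int dmabuf_fd)
    {
            int file_fd = open(path, O_RDONLY);
            if (file_fd < 0)
                    return -errno;

            struct stat st;
            if (fstat(file_fd, &st) < 0) {
                    int err = -errno;
                    close(file_fd);
                    return err;
            }

            /* Kernel-internal copy; no user mapping, no get_user_pages(). */
            off_t left = st.st_size;
            while (left > 0) {
                    ssize_t n = copy_file_range(file_fd, NULL, dmabuf_fd, NULL,
                                                (size_t)left, 0);
                    if (n <= 0) {
                            close(file_fd);
                            return n == 0 ? -EIO : -errno;
                    }
                    left -= n;
            }

            close(file_fd);
            return 0;
    }

sendfile(dmabuf_fd, file_fd, NULL, st.st_size) would be the single-call
equivalent; either way, an exporter-side write path is the missing piece.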
On Wed, Jul 10, 2024 at 04:14:18PM +0200, Christian König wrote:
> On 10.07.24 at 15:57, Lei Liu wrote:
> > Use vm_insert_page to establish a mapping for the memory allocated
> > by dmabuf, thus supporting direct I/O read and write; and fix the
> > issue of incorrect memory statistics after mapping dmabuf memory.
>
> Well, big NAK to that! Direct I/O is intentionally disabled on DMA-bufs.
>
> We already discussed enforcing that in the DMA-buf framework and this patch
> probably means that we should really do that.

Last time I looked, dma_mmap doesn't guarantee that the vma ends up with
VM_SPECIAL, and that's pretty much the only reason why we can't enforce
this. But we might be able to enforce it at least on some architectures; I
didn't check for that ... if at least x86-64 and arm64 could have the
check, that would be great. So it might be worth re-auditing this all.

I think all the other dma-buf exporters/allocators only create VM_SPECIAL
vmas.

-Sima
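For illustration, a rough sketch of the enforcement idea, assuming the check
sits next to dma_buf_mmap() in drivers/dma-buf/dma-buf.c. Only struct dma_buf,
the ops->mmap callback, and the VM_SPECIAL flag group are taken from the
kernel; the wrapper name and the WARN_ON policy are hypothetical.

    #include <linux/dma-buf.h>
    #include <linux/errno.h>
    #include <linux/mm.h>

    /*
     * Hypothetical check: after the exporter's mmap callback has set up the
     * vma, insist that it carries at least one of the VM_SPECIAL flags
     * (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP), since those are
     * what keep get_user_pages() -- and therefore direct I/O -- away from
     * the DMA-buf pages.
     */
    static int dma_buf_mmap_checked(struct dma_buf *dmabuf,
                                    struct vm_area_struct *vma,
                                    unsigned long pgoff)
    {
            int ret;

            if (WARN_ON(!dmabuf || !dmabuf->ops || !dmabuf->ops->mmap))
                    return -EINVAL;

            vma->vm_pgoff = pgoff;
            ret = dmabuf->ops->mmap(dmabuf, vma);
            if (ret)
                    return ret;

            /* Exporter created a GUP-able mapping: the case to audit for. */
            WARN_ON_ONCE(!(vma->vm_flags & VM_SPECIAL));
            return 0;
    }

Whether such a check could be enabled unconditionally depends on the dma_mmap
paths mentioned above, which do not guarantee VM_SPECIAL on every architecture.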
On 2024/7/11 22:25, Christian König wrote:
> On 10.07.24 at 18:34, T.J. Mercier wrote:
>> [...]
>> Or replace the mmap() + read() with sendfile().
>
> Or copy_file_range(). That's pretty much exactly what I suggested on
> the other mail thread around that topic as well.

Thank you for your suggestion. I will study the method you suggested
together with Yang. Using copy_file_range() might be a good approach.

Regards,
Lei Liu.

>> So I would also like to see the above code (or something else similar)
>> be able to work, and I understand some of the reasons why it currently
>> does not, but I don't understand why we should actively prevent this
>> type of behavior entirely.
>
> +1
>
> Regards,
> Christian.
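As a closing note on the readahead comparison T.J. mentions: that experiment
can be approximated with a plain (non-O_DIRECT) read() into the mapped dmabuf
plus a sequential-access hint, as in the sketch below. The mapping and the
DMA_BUF_IOCTL_SYNC bracketing are assumed to be set up as in T.J.'s program;
the 16 MB window would come from the block device's read_ahead_kb setting
rather than from the hint alone, and this path does fill the page cache, so it
addresses the throughput concern but not the cache-pollution one.

    #include <fcntl.h>
    #include <unistd.h>

    /* Sketch: buffered (non-O_DIRECT) read into an already-mmap()ed dmabuf. */
    static ssize_t buffered_read_into_dmabuf(const char *path, void *dmabuf_map,
                                             size_t len)
    {
            int fd = open(path, O_RDONLY);  /* note: no O_DIRECT */
            if (fd < 0)
                    return -1;

            /* Hint sequential access so the kernel grows the readahead window. */
            posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

            size_t done = 0;
            while (done < len) {
                    ssize_t n = read(fd, (char *)dmabuf_map + done, len - done);
                    if (n <= 0)
                            break;
                    done += n;
            }

            close(fd);
            return (ssize_t)done;
    }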