Message ID | 20240117181141.286383-1-tjmercier@google.com |
---|---|
State | New, archived |
Series | dma-buf: heaps: Don't track CMA dma-buf pages under RssFile |
On 17.01.24 at 19:11, T.J. Mercier wrote:
> DMA buffers allocated from the CMA dma-buf heap get counted under
> RssFile for processes that map them and trigger page faults. In
> addition to the incorrect accounting reported to userspace, reclaim
> behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
> this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
> VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
> dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
>
> The system dma-buf heap does not suffer from this issue since
> remap_pfn_range is used during the mmap of the buffer, which also sets
> VM_PFNMAP on the VMA.

Mhm, not an issue with this patch but Daniel wanted to add a check for
VM_PFNMAP to dma_buf_mmap() which would have noted this earlier.

I don't fully remember the discussion but for some reason that was never
committed. We should probably try that again.

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
>
> Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
> Signed-off-by: T.J. Mercier <tjmercier@google.com>

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/dma-buf/heaps/cma_heap.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
> index ee899f8e6721..4a63567e93ba 100644
> --- a/drivers/dma-buf/heaps/cma_heap.c
> +++ b/drivers/dma-buf/heaps/cma_heap.c
> @@ -168,10 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
>  	if (vmf->pgoff > buffer->pagecount)
>  		return VM_FAULT_SIGBUS;
>
> -	vmf->page = buffer->pages[vmf->pgoff];
> -	get_page(vmf->page);
> -
> -	return 0;
> +	return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
>  }
>
>  static const struct vm_operations_struct dma_heap_vm_ops = {
> @@ -185,6 +182,8 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
>  	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
>  		return -EINVAL;
>
> +	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
> +
>  	vma->vm_ops = &dma_heap_vm_ops;
>  	vma->vm_private_data = buffer;
On Thu, Jan 18, 2024 at 11:02:22AM +0100, Christian König wrote:
> On 17.01.24 at 19:11, T.J. Mercier wrote:
> > DMA buffers allocated from the CMA dma-buf heap get counted under
> > RssFile for processes that map them and trigger page faults. In
> > addition to the incorrect accounting reported to userspace, reclaim
> > behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
> > this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
> > VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
> > dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
> >
> > The system dma-buf heap does not suffer from this issue since
> > remap_pfn_range is used during the mmap of the buffer, which also sets
> > VM_PFNMAP on the VMA.
>
> Mhm, not an issue with this patch but Daniel wanted to add a check for
> VM_PFNMAP to dma_buf_mmap() which would have noted this earlier.
>
> I don't fully remember the discussion but for some reason that was never
> committed. We should probably try that again.

IIRC the issue is that dma_mmap is not guaranteed to give you a VM_SPECIAL
mapping, at least on absolutely all architectures. That's why I de facto
dropped that idea, but it would indeed be really great if we could
resurrect it.

Maybe for x86 only? Or x86+armv8, I'm honestly not sure anymore which
exact cases ended up with a VM_NORMAL mapping ... Would need a pile of
digging.

I think all the other patches I've landed.
-Sima

>
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
> >
> > Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
>
> Acked-by: Christian König <christian.koenig@amd.com>
>
> > ---
> >  drivers/dma-buf/heaps/cma_heap.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
> > index ee899f8e6721..4a63567e93ba 100644
> > --- a/drivers/dma-buf/heaps/cma_heap.c
> > +++ b/drivers/dma-buf/heaps/cma_heap.c
> > @@ -168,10 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
> >  	if (vmf->pgoff > buffer->pagecount)
> >  		return VM_FAULT_SIGBUS;
> >
> > -	vmf->page = buffer->pages[vmf->pgoff];
> > -	get_page(vmf->page);
> > -
> > -	return 0;
> > +	return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
> >  }
> >
> >  static const struct vm_operations_struct dma_heap_vm_ops = {
> > @@ -185,6 +182,8 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> >  	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
> >  		return -EINVAL;
> >
> > +	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
> > +
> >  	vma->vm_ops = &dma_heap_vm_ops;
> >  	vma->vm_private_data = buffer;
On Thu, Jan 18, 2024 at 6:49 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Thu, Jan 18, 2024 at 11:02:22AM +0100, Christian König wrote:
> > On 17.01.24 at 19:11, T.J. Mercier wrote:
> > > DMA buffers allocated from the CMA dma-buf heap get counted under
> > > RssFile for processes that map them and trigger page faults. In
> > > addition to the incorrect accounting reported to userspace, reclaim
> > > behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
> > > this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
> > > VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
> > > dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
> > >
> > > The system dma-buf heap does not suffer from this issue since
> > > remap_pfn_range is used during the mmap of the buffer, which also sets
> > > VM_PFNMAP on the VMA.
> >
> > Mhm, not an issue with this patch but Daniel wanted to add a check for
> > VM_PFNMAP to dma_buf_mmap() which would have noted this earlier.
> >
> > I don't fully remember the discussion but for some reason that was never
> > committed. We should probably try that again.
>
> IIRC the issue is that dma_mmap is not guaranteed to give you a VM_SPECIAL
> mapping, at least on absolutely all architectures. That's why I de facto
> dropped that idea, but it would indeed be really great if we could
> resurrect it.

I actually had it in my head that it was a BUG_ON check for VM_PFNMAP
in dma_buf_mmap and it was merged, so I was surprised to discover that
it wasn't set for these CMA buffers.

> Maybe for x86 only? Or x86+armv8, I'm honestly not sure anymore which
> exact cases ended up with a VM_NORMAL mapping ... Would need a pile of
> digging.

Looking back at the patch, the CI email at the end of the thread lists
a bunch of now-broken links to DMESG-WARN test failures I assume
pointed at a large chunk of them.

https://lore.kernel.org/all/166919750173.15575.2864736980735346730@emeril.freedesktop.org/

> >
> > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
> > >
> > > Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
> > > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> >
> > Acked-by: Christian König <christian.koenig@amd.com>

Thanks Christian.
On Thu, Jan 18, 2024 at 08:57:16AM -0800, T.J. Mercier wrote:
> On Thu, Jan 18, 2024 at 6:49 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Thu, Jan 18, 2024 at 11:02:22AM +0100, Christian König wrote:
> > > On 17.01.24 at 19:11, T.J. Mercier wrote:
> > > > DMA buffers allocated from the CMA dma-buf heap get counted under
> > > > RssFile for processes that map them and trigger page faults. In
> > > > addition to the incorrect accounting reported to userspace, reclaim
> > > > behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
> > > > this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
> > > > VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
> > > > dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
> > > >
> > > > The system dma-buf heap does not suffer from this issue since
> > > > remap_pfn_range is used during the mmap of the buffer, which also sets
> > > > VM_PFNMAP on the VMA.
> > >
> > > Mhm, not an issue with this patch but Daniel wanted to add a check for
> > > VM_PFNMAP to dma_buf_mmap() which would have noted this earlier.
> > >
> > > I don't fully remember the discussion but for some reason that was never
> > > committed. We should probably try that again.
> >
> > IIRC the issue is that dma_mmap is not guaranteed to give you a VM_SPECIAL
> > mapping, at least on absolutely all architectures. That's why I de facto
> > dropped that idea, but it would indeed be really great if we could
> > resurrect it.
>
> I actually had it in my head that it was a BUG_ON check for VM_PFNMAP
> in dma_buf_mmap and it was merged, so I was surprised to discover that
> it wasn't set for these CMA buffers.
>
> > Maybe for x86 only? Or x86+armv8, I'm honestly not sure anymore which
> > exact cases ended up with a VM_NORMAL mapping ... Would need a pile of
> > digging.
>
> Looking back at the patch, the CI email at the end of the thread lists
> a bunch of now-broken links to DMESG-WARN test failures I assume
> pointed at a large chunk of them.
>
> https://lore.kernel.org/all/166919750173.15575.2864736980735346730@emeril.freedesktop.org/

I thought there was a more recent submission, where I at least fixed the
various fallout in gem code. But maybe I only dreamed ...

Also I did the code grepping again, and at least iommu_dma_mmap() in
drivers/iommu/dma-iommu.c and arm_iommu_mmap_attrs() for arm use
vm_map_pages in certain cases, which is _not_ VM_PFNMAP. Means really no
cases where I think we can assume we'll always get VM_PFNMAP, and
unfortunately we need VM_PFNMAP or VM_IO to prevent get_user_pages and
similar bad things from happening to dma-buf mmaps.

So still no luck :-/
-Sima

> > >
> > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
> > > >
> > > > Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
> > > > Signed-off-by: T.J. Mercier <tjmercier@google.com>
> > >
> > > Acked-by: Christian König <christian.koenig@amd.com>
>
> Thanks Christian.
Hi TJ,

On Thu, 18 Jan 2024 at 15:32, Christian König <christian.koenig@amd.com> wrote:
> On 17.01.24 at 19:11, T.J. Mercier wrote:
> > DMA buffers allocated from the CMA dma-buf heap get counted under
> > RssFile for processes that map them and trigger page faults. In
> > addition to the incorrect accounting reported to userspace, reclaim
> > behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
> > this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
> > VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
> > dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.
> >
> > The system dma-buf heap does not suffer from this issue since
> > remap_pfn_range is used during the mmap of the buffer, which also sets
> > VM_PFNMAP on the VMA.
>
> Mhm, not an issue with this patch but Daniel wanted to add a check for
> VM_PFNMAP to dma_buf_mmap() which would have noted this earlier.
>
> I don't fully remember the discussion but for some reason that was never
> committed. We should probably try that again.
>
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede
> >
> > Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
> > Signed-off-by: T.J. Mercier <tjmercier@google.com>
>
> Acked-by: Christian König <christian.koenig@amd.com>

Thanks for the patch; pushed to drm-misc-fixes.

Best,
Sumit

> > ---
> >  drivers/dma-buf/heaps/cma_heap.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
> > index ee899f8e6721..4a63567e93ba 100644
> > --- a/drivers/dma-buf/heaps/cma_heap.c
> > +++ b/drivers/dma-buf/heaps/cma_heap.c
> > @@ -168,10 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
> >  	if (vmf->pgoff > buffer->pagecount)
> >  		return VM_FAULT_SIGBUS;
> >
> > -	vmf->page = buffer->pages[vmf->pgoff];
> > -	get_page(vmf->page);
> > -
> > -	return 0;
> > +	return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
> >  }
> >
> >  static const struct vm_operations_struct dma_heap_vm_ops = {
> > @@ -185,6 +182,8 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
> >  	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
> >  		return -EINVAL;
> >
> > +	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
> > +
> >  	vma->vm_ops = &dma_heap_vm_ops;
> >  	vma->vm_private_data = buffer;
diff --git a/drivers/dma-buf/heaps/cma_heap.c b/drivers/dma-buf/heaps/cma_heap.c
index ee899f8e6721..4a63567e93ba 100644
--- a/drivers/dma-buf/heaps/cma_heap.c
+++ b/drivers/dma-buf/heaps/cma_heap.c
@@ -168,10 +168,7 @@ static vm_fault_t cma_heap_vm_fault(struct vm_fault *vmf)
 	if (vmf->pgoff > buffer->pagecount)
 		return VM_FAULT_SIGBUS;

-	vmf->page = buffer->pages[vmf->pgoff];
-	get_page(vmf->page);
-
-	return 0;
+	return vmf_insert_pfn(vma, vmf->address, page_to_pfn(buffer->pages[vmf->pgoff]));
 }

 static const struct vm_operations_struct dma_heap_vm_ops = {
@@ -185,6 +182,8 @@ static int cma_heap_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma)
 	if ((vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) == 0)
 		return -EINVAL;

+	vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+
 	vma->vm_ops = &dma_heap_vm_ops;
 	vma->vm_private_data = buffer;
DMA buffers allocated from the CMA dma-buf heap get counted under
RssFile for processes that map them and trigger page faults. In
addition to the incorrect accounting reported to userspace, reclaim
behavior was influenced by the MM_FILEPAGES counter until linux 6.8, but
this memory is not reclaimable. [1] Change the CMA dma-buf heap to set
VM_PFNMAP on the VMA so MM does not poke at the memory managed by this
dma-buf heap, and use vmf_insert_pfn to correct the RSS accounting.

The system dma-buf heap does not suffer from this issue since
remap_pfn_range is used during the mmap of the buffer, which also sets
VM_PFNMAP on the VMA.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/mm/vmscan.c?id=fb46e22a9e3863e08aef8815df9f17d0f4b9aede

Fixes: b61614ec318a ("dma-buf: heaps: Add CMA heap to dmabuf heaps")
Signed-off-by: T.J. Mercier <tjmercier@google.com>
---
 drivers/dma-buf/heaps/cma_heap.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)
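
The RssFile accounting described in the commit message can be observed
from userspace through /proc/<pid>/status. The following is only a
minimal sketch for checking that counter, not part of the patch: the
helper names parse_status_kb() and read_status_kb() are hypothetical,
and the code assumes the usual "RssFile:    1234 kB" line format of the
status file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one line of /proc/<pid>/status, assumed to
 * look like "RssFile:\t    1234 kB". Returns 1 and stores the value
 * (in kB) in *kb when the line starts with "<key>:", otherwise 0.
 */
static int parse_status_kb(const char *line, const char *key, long *kb)
{
	size_t n;

	assert(line && key && kb);
	n = strlen(key);
	if (strncmp(line, key, n) != 0 || line[n] != ':')
		return 0;
	/* %ld skips the tabs/spaces that pad the value. */
	return sscanf(line + n + 1, "%ld", kb) == 1;
}

/*
 * Hypothetical helper: look up one counter (e.g. "RssFile") in a status
 * file. Returns 0 on success, -1 if the file cannot be opened or the
 * key is not present.
 */
static int read_status_kb(const char *path, const char *key, long *kb)
{
	char line[256];
	int found = 0;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	while (!found && fgets(line, sizeof(line), f))
		found = parse_status_kb(line, key, kb);
	fclose(f);
	return found ? 0 : -1;
}
```

A test program could call read_status_kb("/proc/self/status", "RssFile", &kb)
before and after mmap()ing and touching a CMA heap buffer. If the patch
behaves as the commit message describes, the second reading should no
longer grow by roughly the buffer size.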