Message ID | 20210203003134.2422308-1-surenb@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] mm: replace BUG_ON in vm_insert_page with a return of an error |
Hi:
On 2021/2/3 8:31, Suren Baghdasaryan wrote:
> Replace BUG_ON(vma->vm_flags & VM_PFNMAP) in vm_insert_page with
> WARN_ON_ONCE and returning an error. This is to ensure users of the
> vm_insert_page that set VM_PFNMAP are notified of the wrong flag usage
> and get an indication of an error without panicking the kernel.
> This will help identifying drivers that need to clear VM_PFNMAP before
> using dmabuf system heap which is moving to use vm_insert_page.
>
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Looks reasonable. Thanks.
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>

> ---
>  mm/memory.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index feff48e1465a..e503c9801cd9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1827,7 +1827,8 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
>  		return -EINVAL;
>  	if (!(vma->vm_flags & VM_MIXEDMAP)) {
>  		BUG_ON(mmap_read_trylock(vma->vm_mm));
> -		BUG_ON(vma->vm_flags & VM_PFNMAP);
> +		if (WARN_ON_ONCE(vma->vm_flags & VM_PFNMAP))
> +			return -EINVAL;
>  		vma->vm_flags |= VM_MIXEDMAP;
>  	}
>  	return insert_page(vma, addr, page, vma->vm_page_prot);
>
On Tue, Feb 02, 2021 at 04:31:33PM -0800, Suren Baghdasaryan wrote:
> Replace BUG_ON(vma->vm_flags & VM_PFNMAP) in vm_insert_page with
> WARN_ON_ONCE and returning an error. This is to ensure users of the
> vm_insert_page that set VM_PFNMAP are notified of the wrong flag usage
> and get an indication of an error without panicking the kernel.
> This will help identifying drivers that need to clear VM_PFNMAP before
> using dmabuf system heap which is moving to use vm_insert_page.
>
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> ---
>  mm/memory.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index feff48e1465a..e503c9801cd9 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1827,7 +1827,8 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
>  		return -EINVAL;
>  	if (!(vma->vm_flags & VM_MIXEDMAP)) {
>  		BUG_ON(mmap_read_trylock(vma->vm_mm));

Better to replace above BUG_ON with WARN_ON_ONCE, too?
On Tue, Feb 2, 2021 at 5:31 PM Minchan Kim <minchan@kernel.org> wrote:
>
> On Tue, Feb 02, 2021 at 04:31:33PM -0800, Suren Baghdasaryan wrote:
> > Replace BUG_ON(vma->vm_flags & VM_PFNMAP) in vm_insert_page with
> > WARN_ON_ONCE and returning an error. This is to ensure users of the
> > vm_insert_page that set VM_PFNMAP are notified of the wrong flag usage
> > and get an indication of an error without panicking the kernel.
> > This will help identifying drivers that need to clear VM_PFNMAP before
> > using dmabuf system heap which is moving to use vm_insert_page.
> >
> > Suggested-by: Christoph Hellwig <hch@infradead.org>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > ---
> >  mm/memory.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index feff48e1465a..e503c9801cd9 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -1827,7 +1827,8 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
> >  		return -EINVAL;
> >  	if (!(vma->vm_flags & VM_MIXEDMAP)) {
> >  		BUG_ON(mmap_read_trylock(vma->vm_mm));
>
> Better to replace above BUG_ON with WARN_ON_ONCE, too?

If nobody objects I'll do that in the next respin. Thanks!
On Tue, Feb 02, 2021 at 04:31:33PM -0800, Suren Baghdasaryan wrote:
> Replace BUG_ON(vma->vm_flags & VM_PFNMAP) in vm_insert_page with
> WARN_ON_ONCE and returning an error. This is to ensure users of the
> vm_insert_page that set VM_PFNMAP are notified of the wrong flag usage
> and get an indication of an error without panicking the kernel.
> This will help identifying drivers that need to clear VM_PFNMAP before
> using dmabuf system heap which is moving to use vm_insert_page.

NACK.

The system may not _panic_, but it is clearly now _broken_. The device
doesn't work, and so the system is useless. You haven't really improved
anything here. Just bloated the kernel with yet another _ONCE variable
that in a normal system will never ever ever be triggered.
On Tue, Feb 2, 2021 at 5:55 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Feb 02, 2021 at 04:31:33PM -0800, Suren Baghdasaryan wrote:
> > Replace BUG_ON(vma->vm_flags & VM_PFNMAP) in vm_insert_page with
> > WARN_ON_ONCE and returning an error. This is to ensure users of the
> > vm_insert_page that set VM_PFNMAP are notified of the wrong flag usage
> > and get an indication of an error without panicking the kernel.
> > This will help identifying drivers that need to clear VM_PFNMAP before
> > using dmabuf system heap which is moving to use vm_insert_page.
>
> NACK.
>
> The system may not _panic_, but it is clearly now _broken_. The device
> doesn't work, and so the system is useless. You haven't really improved
> anything here. Just bloated the kernel with yet another _ONCE variable
> that in a normal system will never ever ever be triggered.

We had a discussion in https://lore.kernel.org/patchwork/patch/1372409
about how some DRM drivers set up their VMAs with VM_PFNMAP before
mapping them. We want to use vm_insert_page instead of remap_pfn_range
in the dmabuf heaps so that this memory is visible in PSS. However if
a driver that sets VM_PFNMAP tries to use a dmabuf heap, it will step
into this BUG_ON. We wanted to catch and gradually fix such drivers
but without causing a panic in the process. I hope this clarifies the
reasons why I'm making this change and I'm open to other ideas if they
would address this issue in a better way.
On Wed, Feb 3, 2021 at 2:57 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, Feb 02, 2021 at 04:31:33PM -0800, Suren Baghdasaryan wrote:
> > Replace BUG_ON(vma->vm_flags & VM_PFNMAP) in vm_insert_page with
> > WARN_ON_ONCE and returning an error. This is to ensure users of the
> > vm_insert_page that set VM_PFNMAP are notified of the wrong flag usage
> > and get an indication of an error without panicking the kernel.
> > This will help identifying drivers that need to clear VM_PFNMAP before
> > using dmabuf system heap which is moving to use vm_insert_page.
>
> NACK.
>
> The system may not _panic_, but it is clearly now _broken_. The device
> doesn't work, and so the system is useless. You haven't really improved
> anything here. Just bloated the kernel with yet another _ONCE variable
> that in a normal system will never ever ever be triggered.

Also, what the heck are you doing with your drivers? dma-buf mmap must
call dma_buf_mmap(), even for forwarded/redirected mmaps from driver
char nodes. If that doesn't work we have some issues with the calling
contract for that function, not in vm_insert_page.

Finally why exactly do we need to make this switch for system heap?
I've recently looked at gup usage by random drivers, and found a lot
of worrying things there. gup on dma-buf is really bad idea in
general.
-Daniel
On Wed, Feb 3, 2021 at 12:52 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Wed, Feb 3, 2021 at 2:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, Feb 02, 2021 at 04:31:33PM -0800, Suren Baghdasaryan wrote:
> > > Replace BUG_ON(vma->vm_flags & VM_PFNMAP) in vm_insert_page with
> > > WARN_ON_ONCE and returning an error. This is to ensure users of the
> > > vm_insert_page that set VM_PFNMAP are notified of the wrong flag usage
> > > and get an indication of an error without panicking the kernel.
> > > This will help identifying drivers that need to clear VM_PFNMAP before
> > > using dmabuf system heap which is moving to use vm_insert_page.
> >
> > NACK.
> >
> > The system may not _panic_, but it is clearly now _broken_. The device
> > doesn't work, and so the system is useless. You haven't really improved
> > anything here. Just bloated the kernel with yet another _ONCE variable
> > that in a normal system will never ever ever be triggered.
>
> Also, what the heck are you doing with your drivers? dma-buf mmap must
> call dma_buf_mmap(), even for forwarded/redirected mmaps from driver
> char nodes. If that doesn't work we have some issues with the calling
> contract for that function, not in vm_insert_page.

The particular issue I observed (details were posted in
https://lore.kernel.org/patchwork/patch/1372409) is that DRM drivers
set VM_PFNMAP flag (via a call to drm_gem_mmap_obj) before calling
dma_buf_mmap. Some drivers clear that flag but some don't. I could not
find the answer to why VM_PFNMAP is required for dmabuf mappings and
maybe someone can explain that here?
If there is a reason to set this flag other than historical use of
carveout memory then we wanted to catch such cases and fix the drivers
that moved to using dmabuf heaps. However maybe there are other
reasons and if so I would be very grateful if someone could explain
them. That would help me to come up with a better solution.

> Finally why exactly do we need to make this switch for system heap?
> I've recently looked at gup usage by random drivers, and found a lot
> of worrying things there. gup on dma-buf is really bad idea in
> general.

The reason for the switch is to be able to account dmabufs allocated
using dmabuf heaps to the processes that map them. The next patch in
this series https://lore.kernel.org/patchwork/patch/1374851
implementing the switch contains more details and there is an active
discussion there. Would you mind joining that discussion to keep it in
one place?
Thanks!

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
On Wed, Feb 3, 2021 at 9:20 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Wed, Feb 3, 2021 at 12:52 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > On Wed, Feb 3, 2021 at 2:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Tue, Feb 02, 2021 at 04:31:33PM -0800, Suren Baghdasaryan wrote:
> > > > Replace BUG_ON(vma->vm_flags & VM_PFNMAP) in vm_insert_page with
> > > > WARN_ON_ONCE and returning an error. This is to ensure users of the
> > > > vm_insert_page that set VM_PFNMAP are notified of the wrong flag usage
> > > > and get an indication of an error without panicking the kernel.
> > > > This will help identifying drivers that need to clear VM_PFNMAP before
> > > > using dmabuf system heap which is moving to use vm_insert_page.
> > >
> > > NACK.
> > >
> > > The system may not _panic_, but it is clearly now _broken_. The device
> > > doesn't work, and so the system is useless. You haven't really improved
> > > anything here. Just bloated the kernel with yet another _ONCE variable
> > > that in a normal system will never ever ever be triggered.
> >
> > Also, what the heck are you doing with your drivers? dma-buf mmap must
> > call dma_buf_mmap(), even for forwarded/redirected mmaps from driver
> > char nodes. If that doesn't work we have some issues with the calling
> > contract for that function, not in vm_insert_page.
>
> The particular issue I observed (details were posted in
> https://lore.kernel.org/patchwork/patch/1372409) is that DRM drivers
> set VM_PFNMAP flag (via a call to drm_gem_mmap_obj) before calling
> dma_buf_mmap. Some drivers clear that flag but some don't. I could not
> find the answer to why VM_PFNMAP is required for dmabuf mappings and
> maybe someone can explain that here?
> If there is a reason to set this flag other than historical use of
> carveout memory then we wanted to catch such cases and fix the drivers
> that moved to using dmabuf heaps. However maybe there are other
> reasons and if so I would be very grateful if someone could explain
> them. That would help me to come up with a better solution.
>
> > Finally why exactly do we need to make this switch for system heap?
> > I've recently looked at gup usage by random drivers, and found a lot
> > of worrying things there. gup on dma-buf is really bad idea in
> > general.
>
> The reason for the switch is to be able to account dmabufs allocated
> using dmabuf heaps to the processes that map them. The next patch in
> this series https://lore.kernel.org/patchwork/patch/1374851
> implementing the switch contains more details and there is an active
> discussion there. Would you mind joining that discussion to keep it in
> one place?

How many semi-unrelated buffer accounting schemes does google come up with?

We're at three with this one.

And also we _cannot_ require that all dma-bufs are backed by struct
page, so requiring struct page to make this work is a no-go.

Second, we do not want to allow get_user_pages and friends to work on
dma-buf, it causes all kinds of pain. Yes on SoC where dma-buf are
exclusively in system memory you can maybe get away with this, but
dma-buf is supposed to work in more places than just Android SoCs.

If you want to account dma-bufs, and gpu memory in general, I'd say
the solid solution is cgroups. There's patches floating around. And
given that Google Android can't even agree internally on what exactly
you want I'd say we just need to cut over to that and make it happen.

Cheers, Daniel
On Wed, Feb 3, 2021 at 9:29 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Wed, Feb 3, 2021 at 9:20 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Wed, Feb 3, 2021 at 12:52 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >
> > > On Wed, Feb 3, 2021 at 2:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Tue, Feb 02, 2021 at 04:31:33PM -0800, Suren Baghdasaryan wrote:
> > > > > Replace BUG_ON(vma->vm_flags & VM_PFNMAP) in vm_insert_page with
> > > > > WARN_ON_ONCE and returning an error. This is to ensure users of the
> > > > > vm_insert_page that set VM_PFNMAP are notified of the wrong flag usage
> > > > > and get an indication of an error without panicking the kernel.
> > > > > This will help identifying drivers that need to clear VM_PFNMAP before
> > > > > using dmabuf system heap which is moving to use vm_insert_page.
> > > >
> > > > NACK.
> > > >
> > > > The system may not _panic_, but it is clearly now _broken_. The device
> > > > doesn't work, and so the system is useless. You haven't really improved
> > > > anything here. Just bloated the kernel with yet another _ONCE variable
> > > > that in a normal system will never ever ever be triggered.
> > >
> > > Also, what the heck are you doing with your drivers? dma-buf mmap must
> > > call dma_buf_mmap(), even for forwarded/redirected mmaps from driver
> > > char nodes. If that doesn't work we have some issues with the calling
> > > contract for that function, not in vm_insert_page.
> >
> > The particular issue I observed (details were posted in
> > https://lore.kernel.org/patchwork/patch/1372409) is that DRM drivers
> > set VM_PFNMAP flag (via a call to drm_gem_mmap_obj) before calling
> > dma_buf_mmap. Some drivers clear that flag but some don't. I could not
> > find the answer to why VM_PFNMAP is required for dmabuf mappings and
> > maybe someone can explain that here?
> > If there is a reason to set this flag other than historical use of
> > carveout memory then we wanted to catch such cases and fix the drivers
> > that moved to using dmabuf heaps. However maybe there are other
> > reasons and if so I would be very grateful if someone could explain
> > them. That would help me to come up with a better solution.
> >
> > > Finally why exactly do we need to make this switch for system heap?
> > > I've recently looked at gup usage by random drivers, and found a lot
> > > of worrying things there. gup on dma-buf is really bad idea in
> > > general.
> >
> > The reason for the switch is to be able to account dmabufs allocated
> > using dmabuf heaps to the processes that map them. The next patch in
> > this series https://lore.kernel.org/patchwork/patch/1374851
> > implementing the switch contains more details and there is an active
> > discussion there. Would you mind joining that discussion to keep it in
> > one place?
>
> How many semi-unrelated buffer accounting schemes does google come up with?
>
> We're at three with this one.
>
> And also we _cannot_ require that all dma-bufs are backed by struct
> page, so requiring struct page to make this work is a no-go.
>
> Second, we do not want to allow get_user_pages and friends to work on
> dma-buf, it causes all kinds of pain. Yes on SoC where dma-buf are
> exclusively in system memory you can maybe get away with this, but
> dma-buf is supposed to work in more places than just Android SoCs.

I just realized that vm_insert_page doesn't even work for CMA, it would
upset get_user_pages pretty badly - you're trying to pin a page in
ZONE_MOVABLE but you can't move it because it's rather special.
VM_SPECIAL is exactly meant to catch this stuff.
-Daniel

> If you want to account dma-bufs, and gpu memory in general, I'd say
> the solid solution is cgroups. There's patches floating around. And
> given that Google Android can't even agree internally on what exactly
> you want I'd say we just need to cut over to that and make it happen.
>
> Cheers, Daniel
On Wed, Feb 3, 2021 at 1:25 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Wed, Feb 3, 2021 at 9:29 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >
> > On Wed, Feb 3, 2021 at 9:20 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Wed, Feb 3, 2021 at 12:52 AM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > >
> > > > On Wed, Feb 3, 2021 at 2:57 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > > >
> > > > > On Tue, Feb 02, 2021 at 04:31:33PM -0800, Suren Baghdasaryan wrote:
> > > > > > Replace BUG_ON(vma->vm_flags & VM_PFNMAP) in vm_insert_page with
> > > > > > WARN_ON_ONCE and returning an error. This is to ensure users of the
> > > > > > vm_insert_page that set VM_PFNMAP are notified of the wrong flag usage
> > > > > > and get an indication of an error without panicking the kernel.
> > > > > > This will help identifying drivers that need to clear VM_PFNMAP before
> > > > > > using dmabuf system heap which is moving to use vm_insert_page.
> > > > >
> > > > > NACK.
> > > > >
> > > > > The system may not _panic_, but it is clearly now _broken_. The device
> > > > > doesn't work, and so the system is useless. You haven't really improved
> > > > > anything here. Just bloated the kernel with yet another _ONCE variable
> > > > > that in a normal system will never ever ever be triggered.
> > > >
> > > > Also, what the heck are you doing with your drivers? dma-buf mmap must
> > > > call dma_buf_mmap(), even for forwarded/redirected mmaps from driver
> > > > char nodes. If that doesn't work we have some issues with the calling
> > > > contract for that function, not in vm_insert_page.
> > >
> > > The particular issue I observed (details were posted in
> > > https://lore.kernel.org/patchwork/patch/1372409) is that DRM drivers
> > > set VM_PFNMAP flag (via a call to drm_gem_mmap_obj) before calling
> > > dma_buf_mmap. Some drivers clear that flag but some don't. I could not
> > > find the answer to why VM_PFNMAP is required for dmabuf mappings and
> > > maybe someone can explain that here?
> > > If there is a reason to set this flag other than historical use of
> > > carveout memory then we wanted to catch such cases and fix the drivers
> > > that moved to using dmabuf heaps. However maybe there are other
> > > reasons and if so I would be very grateful if someone could explain
> > > them. That would help me to come up with a better solution.
> > >
> > > > Finally why exactly do we need to make this switch for system heap?
> > > > I've recently looked at gup usage by random drivers, and found a lot
> > > > of worrying things there. gup on dma-buf is really bad idea in
> > > > general.
> > >
> > > The reason for the switch is to be able to account dmabufs allocated
> > > using dmabuf heaps to the processes that map them. The next patch in
> > > this series https://lore.kernel.org/patchwork/patch/1374851
> > > implementing the switch contains more details and there is an active
> > > discussion there. Would you mind joining that discussion to keep it in
> > > one place?
> >
> > How many semi-unrelated buffer accounting schemes does google come up with?
> >
> > We're at three with this one.
> >
> > And also we _cannot_ require that all dma-bufs are backed by struct
> > page, so requiring struct page to make this work is a no-go.
> >
> > Second, we do not want to allow get_user_pages and friends to work on
> > dma-buf, it causes all kinds of pain. Yes on SoC where dma-buf are
> > exclusively in system memory you can maybe get away with this, but
> > dma-buf is supposed to work in more places than just Android SoCs.
>
> I just realized that vm_insert_page doesn't even work for CMA, it would
> upset get_user_pages pretty badly - you're trying to pin a page in
> ZONE_MOVABLE but you can't move it because it's rather special.
> VM_SPECIAL is exactly meant to catch this stuff.

Thanks for the input, Daniel! Let me think about the cases you pointed out.

IMHO, the issue with PSS is the difficulty of calculating this metric
without struct page usage. I don't think that problem becomes easier
if we use cgroups or any other API. I wanted to enable existing PSS
calculation mechanisms for the dmabufs known to be backed by struct
pages (since we know how the heap allocated that memory), but sounds
like this would lead to problems that I did not consider.
Thanks,
Suren.

> -Daniel
>
> > If you want to account dma-bufs, and gpu memory in general, I'd say
> > the solid solution is cgroups. There's patches floating around. And
> > given that Google Android can't even agree internally on what exactly
> > you want I'd say we just need to cut over to that and make it happen.
> >
> > Cheers, Daniel
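For readers unfamiliar with the metric discussed above: PSS (proportional set size) charges each mapped page to a process in proportion to the number of processes mapping it, which is why the existing calculation depends on a per-page map count. The following is a minimal user-space sketch of that arithmetic only, not kernel code. The function name `pss_bytes`, the `mapcounts` array, and the 12-bit fixed-point scale are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-point fraction bits: an illustrative choice to limit rounding loss. */
#define PSS_FRAC_BITS 12

/*
 * mapcounts[i] is the number of processes mapping page i (hypothetical
 * input for this sketch). Returns the proportional set size in bytes.
 */
uint64_t pss_bytes(const unsigned int *mapcounts, size_t npages,
		   uint64_t page_size)
{
	uint64_t acc = 0;
	size_t i;

	assert(mapcounts != NULL || npages == 0);

	for (i = 0; i < npages; i++) {
		if (mapcounts[i] == 0)
			continue; /* unmapped pages contribute nothing */
		/* Each page contributes page_size / mapcount, kept in fixed point. */
		acc += (page_size << PSS_FRAC_BITS) / mapcounts[i];
	}
	return acc >> PSS_FRAC_BITS;
}
```

With 4096-byte pages mapped by 1, 2 and 4 processes respectively, the sketch yields 4096 + 2048 + 1024 = 7168 bytes. Memory mapped without a struct page offers no such per-page count, which is the difficulty raised in the message above.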
On 03.02.21 at 21:20, Suren Baghdasaryan wrote:
> [SNIP]
> If there is a reason to set this flag other than historical use of
> carveout memory then we wanted to catch such cases and fix the drivers
> that moved to using dmabuf heaps. However maybe there are other
> reasons and if so I would be very grateful if someone could explain
> them. That would help me to come up with a better solution.

Well one major reason for this is to prevent accounting of DMA-buf pages.

So you are going in circles here and trying to circumvent an
intentional behavior.

Daniel is right that this is the completely wrong approach and we need
to take a step back and think about it on a higher level.

Going to reply to his mail as well.

Regards,
Christian.
On 03.02.21 at 22:41, Suren Baghdasaryan wrote:
> [SNIP]
>>> How many semi-unrelated buffer accounting schemes does google come up with?
>>>
>>> We're at three with this one.
>>>
>>> And also we _cannot_ require that all dma-bufs are backed by struct
>>> page, so requiring struct page to make this work is a no-go.
>>>
>>> Second, we do not want to allow get_user_pages and friends to work on
>>> dma-buf, it causes all kinds of pain. Yes on SoC where dma-buf are
>>> exclusively in system memory you can maybe get away with this, but
>>> dma-buf is supposed to work in more places than just Android SoCs.
>> I just realized that vm_insert_page doesn't even work for CMA, it would
>> upset get_user_pages pretty badly - you're trying to pin a page in
>> ZONE_MOVABLE but you can't move it because it's rather special.
>> VM_SPECIAL is exactly meant to catch this stuff.
> Thanks for the input, Daniel! Let me think about the cases you pointed out.
>
> IMHO, the issue with PSS is the difficulty of calculating this metric
> without struct page usage. I don't think that problem becomes easier
> if we use cgroups or any other API. I wanted to enable existing PSS
> calculation mechanisms for the dmabufs known to be backed by struct
> pages (since we know how the heap allocated that memory), but sounds
> like this would lead to problems that I did not consider.

Yeah, using struct page indeed won't work. We discussed that multiple
times now and Daniel even has a patch to mangle the struct page pointers
inside the sg_table object to prevent abuse in that direction.

On the other hand I totally agree that we need to do something on this
side which goes beyond what cgroups provide.

A few years ago I came up with patches to improve the OOM killer to
include resources bound to the processes through file descriptors. I
unfortunately can't find them offhand any more and I'm currently too
busy to dig them up.

In general I think we need to make it possible that both the in kernel
OOM killer as well as userspace processes and handlers have access to
that kind of data.

The fdinfo approach as suggested in the other thread sounds like the
easiest solution to me.

Regards,
Christian.

> Thanks,
> Suren.
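As a rough illustration of what a userspace consumer of an fdinfo-style interface might look like: files under /proc/<pid>/fdinfo/ are plain text with one `key:<whitespace>value` pair per line, so a handler can extract a buffer size without touching struct page at all. The sketch below parses such text from a string. The helper name `fdinfo_size` and the presence of a `size:` line are assumptions for this example; the thread does not specify the exact fields.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Scan fdinfo-style text for a line starting with "size:" and store its
 * decimal value in *out. Returns 0 on success, -1 if no usable "size:"
 * line is found. The field name is an assumption for this sketch.
 */
int fdinfo_size(const char *text, unsigned long long *out)
{
	const char *line = text;

	assert(text != NULL && out != NULL);

	while (*line != '\0') {
		if (strncmp(line, "size:", 5) == 0) {
			char *end;
			unsigned long long v = strtoull(line + 5, &end, 10);

			if (end == line + 5)
				return -1; /* key present but no digits */
			*out = v;
			return 0;
		}
		/* Advance to the character after the next newline, if any. */
		line = strchr(line, '\n');
		if (line == NULL)
			break;
		line++;
	}
	return -1;
}
```

A caller would read the contents of the fdinfo file for a given descriptor into a buffer and pass it to this helper; summing the results per process is one conceivable way for a userspace handler to attribute buffer memory.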
On Thu, Feb 04, 2021 at 09:16:32AM +0100, Christian König wrote:
> On 03.02.21 at 22:41, Suren Baghdasaryan wrote:
> > [SNIP]
> > > > How many semi-unrelated buffer accounting schemes does google come up with?
> > > >
> > > > We're at three with this one.
> > > >
> > > > And also we _cannot_ require that all dma-bufs are backed by struct
> > > > page, so requiring struct page to make this work is a no-go.
> > > >
> > > > Second, we do not want to allow get_user_pages and friends to work on
> > > > dma-buf, it causes all kinds of pain. Yes on SoC where dma-buf are
> > > > exclusively in system memory you can maybe get away with this, but
> > > > dma-buf is supposed to work in more places than just Android SoCs.
> > > I just realized that vm_insert_page doesn't even work for CMA, it would
> > > upset get_user_pages pretty badly - you're trying to pin a page in
> > > ZONE_MOVABLE but you can't move it because it's rather special.
> > > VM_SPECIAL is exactly meant to catch this stuff.
> > Thanks for the input, Daniel! Let me think about the cases you pointed out.
> >
> > IMHO, the issue with PSS is the difficulty of calculating this metric
> > without struct page usage. I don't think that problem becomes easier
> > if we use cgroups or any other API. I wanted to enable existing PSS
> > calculation mechanisms for the dmabufs known to be backed by struct
> > pages (since we know how the heap allocated that memory), but sounds
> > like this would lead to problems that I did not consider.
>
> Yeah, using struct page indeed won't work. We discussed that multiple times
> now and Daniel even has a patch to mangle the struct page pointers inside
> the sg_table object to prevent abuse in that direction.
>
> On the other hand I totally agree that we need to do something on this side
> which goes beyond what cgroups provide.
>
> A few years ago I came up with patches to improve the OOM killer to include
> resources bound to the processes through file descriptors. I unfortunately
> can't find them offhand any more and I'm currently too busy to dig them up.
>
> In general I think we need to make it possible that both the in kernel OOM
> killer as well as userspace processes and handlers have access to that kind
> of data.
>
> The fdinfo approach as suggested in the other thread sounds like the easiest
> solution to me.

Yeah for OOM handling cgroups alone isn't enough as the interface - we
need to make sure that oom killer takes into account the system memory
usage (ideally zone aware, for CMA pools).

But to track that we still need that infrastructure first I think.
-Daniel
On Thu, Feb 4, 2021 at 3:16 AM Christian König <christian.koenig@amd.com> wrote:
>
> On 03.02.21 at 22:41, Suren Baghdasaryan wrote:
> > [SNIP]
> >>> How many semi-unrelated buffer accounting schemes does google come up with?
> >>>
> >>> We're at three with this one.
> >>>
> >>> And also we _cannot_ require that all dma-bufs are backed by struct
> >>> page, so requiring struct page to make this work is a no-go.
> >>>
> >>> Second, we do not want to allow get_user_pages and friends to work on
> >>> dma-buf, it causes all kinds of pain. Yes on SoC where dma-buf are
> >>> exclusively in system memory you can maybe get away with this, but
> >>> dma-buf is supposed to work in more places than just Android SoCs.
> >> I just realized that vm_insert_page doesn't even work for CMA, it would
> >> upset get_user_pages pretty badly - you're trying to pin a page in
> >> ZONE_MOVABLE but you can't move it because it's rather special.
> >> VM_SPECIAL is exactly meant to catch this stuff.
> > Thanks for the input, Daniel! Let me think about the cases you pointed out.
> >
> > IMHO, the issue with PSS is the difficulty of calculating this metric
> > without struct page usage. I don't think that problem becomes easier
> > if we use cgroups or any other API. I wanted to enable existing PSS
> > calculation mechanisms for the dmabufs known to be backed by struct
> > pages (since we know how the heap allocated that memory), but sounds
> > like this would lead to problems that I did not consider.
>
> Yeah, using struct page indeed won't work. We discussed that multiple
> times now and Daniel even has a patch to mangle the struct page pointers
> inside the sg_table object to prevent abuse in that direction.
>
> On the other hand I totally agree that we need to do something on this
> side which goes beyond what cgroups provide.
>
> A few years ago I came up with patches to improve the OOM killer to
> include resources bound to the processes through file descriptors. I
> unfortunately can't find them offhand any more and I'm currently too busy
> to dig them up.

https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
I think there was a more recent discussion, but I can't seem to find it.

Alex

>
> In general I think we need to make it possible that both the in kernel
> OOM killer as well as userspace processes and handlers have access to
> that kind of data.
>
> The fdinfo approach as suggested in the other thread sounds like the
> easiest solution to me.
>
> Regards,
> Christian.
>
> > Thanks,
> > Suren.
On Thu, Feb 4, 2021 at 7:55 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Thu, Feb 4, 2021 at 3:16 AM Christian König <christian.koenig@amd.com> wrote:
> >
> > On 03.02.21 at 22:41, Suren Baghdasaryan wrote:
> > > [SNIP]
> > >>> How many semi-unrelated buffer accounting schemes does google come up with?
> > >>>
> > >>> We're at three with this one.
> > >>>
> > >>> And also we _cannot_ require that all dma-bufs are backed by struct
> > >>> page, so requiring struct page to make this work is a no-go.
> > >>>
> > >>> Second, we do not want to allow get_user_pages and friends to work on
> > >>> dma-buf, it causes all kinds of pain. Yes on SoC where dma-buf are
> > >>> exclusively in system memory you can maybe get away with this, but
> > >>> dma-buf is supposed to work in more places than just Android SoCs.
> > >> I just realized that vm_insert_page doesn't even work for CMA, it would
> > >> upset get_user_pages pretty badly - you're trying to pin a page in
> > >> ZONE_MOVABLE but you can't move it because it's rather special.
> > >> VM_SPECIAL is exactly meant to catch this stuff.
> > > Thanks for the input, Daniel! Let me think about the cases you pointed out.
> > >
> > > IMHO, the issue with PSS is the difficulty of calculating this metric
> > > without struct page usage. I don't think that problem becomes easier
> > > if we use cgroups or any other API. I wanted to enable existing PSS
> > > calculation mechanisms for the dmabufs known to be backed by struct
> > > pages (since we know how the heap allocated that memory), but sounds
> > > like this would lead to problems that I did not consider.
> >
> > Yeah, using struct page indeed won't work. We discussed that multiple
> > times now and Daniel even has a patch to mangle the struct page pointers
> > inside the sg_table object to prevent abuse in that direction.
> >
> > On the other hand I totally agree that we need to do something on this
> > side which goes beyond what cgroups provide.
> >
> > A few years ago I came up with patches to improve the OOM killer to
> > include resources bound to the processes through file descriptors. I
> > unfortunately can't find them offhand any more and I'm currently too busy
> > to dig them up.
>
> https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html
> I think there was a more recent discussion, but I can't seem to find it.

Thanks for the pointer!
Appreciate the time everyone took to explain the issues.
Thanks,
Suren.

>
> Alex
>
> >
> > In general I think we need to make it possible that both the in kernel
> > OOM killer as well as userspace processes and handlers have access to
> > that kind of data.
> >
> > The fdinfo approach as suggested in the other thread sounds like the
> > easiest solution to me.
> >
> > Regards,
> > Christian.
> >
> > > Thanks,
> > > Suren.
diff --git a/mm/memory.c b/mm/memory.c
index feff48e1465a..e503c9801cd9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1827,7 +1827,8 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 		return -EINVAL;
 	if (!(vma->vm_flags & VM_MIXEDMAP)) {
 		BUG_ON(mmap_read_trylock(vma->vm_mm));
-		BUG_ON(vma->vm_flags & VM_PFNMAP);
+		if (WARN_ON_ONCE(vma->vm_flags & VM_PFNMAP))
+			return -EINVAL;
 		vma->vm_flags |= VM_MIXEDMAP;
 	}
 	return insert_page(vma, addr, page, vma->vm_page_prot);
Replace BUG_ON(vma->vm_flags & VM_PFNMAP) in vm_insert_page with
WARN_ON_ONCE and returning an error. This is to ensure users of the
vm_insert_page that set VM_PFNMAP are notified of the wrong flag usage
and get an indication of an error without panicking the kernel.
This will help identifying drivers that need to clear VM_PFNMAP before
using dmabuf system heap which is moving to use vm_insert_page.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 mm/memory.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
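
The behavioral difference the patch proposes can be modelled in user space. The sketch below is not kernel code: `model_vma`, `model_warn_once`, `model_vm_insert_page` and the flag values are hypothetical stand-ins invented for illustration, and the mmap lock check and the actual page insertion are omitted. It shows only the control flow: a caller that left VM_PFNMAP set gets a one-time warning and -EINVAL instead of a crash, and the VMA flags are left unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical flag values; the kernel's real VM_* constants differ. */
#define MODEL_VM_PFNMAP   0x1u
#define MODEL_VM_MIXEDMAP 0x2u

struct model_vma {
	unsigned int flags;
};

/* Mimics WARN_ON_ONCE: report the first hit only, always return cond. */
static int model_warn_once(int cond, int *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = 1;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Models only the flag check from the patched vm_insert_page(). */
int model_vm_insert_page(struct model_vma *vma)
{
	static int warned;

	assert(vma != NULL);

	if (!(vma->flags & MODEL_VM_MIXEDMAP)) {
		if (model_warn_once((vma->flags & MODEL_VM_PFNMAP) != 0,
				    &warned, "VM_PFNMAP set on insert"))
			return -EINVAL; /* caller sees an error, no crash */
		vma->flags |= MODEL_VM_MIXEDMAP;
	}
	return 0; /* the real function would go on to insert the page */
}
```

In this model a caller that clears the PFNMAP flag before retrying succeeds. As the NACK in the thread points out, the mapping still fails for a driver that does not clear it, so the change affects how the failure is reported rather than whether the device works.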