Message ID | 20220113123406.11520-1-guangming.cao@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] dma-buf: dma-heap: Add a size check for allocation | expand |
>-----Original Message----- >From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of >guangming.cao@mediatek.com >Sent: Thursday, January 13, 2022 7:34 AM >To: sumit.semwal@linaro.org >Cc: linux-arm-kernel@lists.infradead.org; mingyuan.ma@mediatek.com; >Guangming <Guangming.Cao@mediatek.com>; >wsd_upstream@mediatek.com; linux-kernel@vger.kernel.org; dri- >devel@lists.freedesktop.org; linaro-mm-sig@lists.linaro.org; >yf.wang@mediatek.com; libo.kang@mediatek.com; >benjamin.gaignard@linaro.org; bo.song@mediatek.com; >matthias.bgg@gmail.com; linux-mediatek@lists.infradead.org; >lmark@codeaurora.org; labbott@redhat.com; christian.koenig@amd.com; >jianjiao.zeng@mediatek.com; linux-media@vger.kernel.org >Subject: [PATCH v3] dma-buf: dma-heap: Add a size check for allocation > >From: Guangming <Guangming.Cao@mediatek.com> > >Add a size check for allocation since the allocation size is >always less than the total DRAM size. > >Without this check, once the invalid size allocation runs on a process that >can't be killed by OOM flow(such as "gralloc" on Android devices), it will >cause a kernel exception, and to make matters worse, we can't find who are >using >so many memory with "dma_buf_debug_show" since the relevant dma-buf >hasn't exported. > >To make OOM issue easier, maybe need dma-buf framework to dump the >buffer size >under allocating in "dma_buf_debug_show". > >Signed-off-by: Guangming <Guangming.Cao@mediatek.com> >Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com> >--- >v3: 1. update patch, use right shift to replace division. > 2. update patch, add reason in code and commit message. >v2: 1. update size limitation as total_dram page size. > 2. update commit message >--- > drivers/dma-buf/dma-heap.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > >diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c >index 56bf5ad01ad5..1fd382712584 100644 >--- a/drivers/dma-buf/dma-heap.c >+++ b/drivers/dma-buf/dma-heap.c >@@ -55,6 +55,16 @@ static int dma_heap_buffer_alloc(struct dma_heap >*heap, size_t len, > struct dma_buf *dmabuf; > int fd; > >+ /* >+ * Invalid size check. The "len" should be less than totalram. >+ * >+ * Without this check, once the invalid size allocation runs on a process >that >+ * can't be killed by OOM flow(such as "gralloc" on Android devices), it >will >+ * cause a kernel exception, and to make matters worse, we can't find >who are using >+ * so many memory with "dma_buf_debug_show" since the relevant >dma-buf hasn't exported. >+ */ >+ if (len >> PAGE_SHIFT > totalram_pages()) If your "heap" is from cma, is this still a valid check? M >+ return -EINVAL; > /* > * Allocations from all heaps have to begin > * and end on page boundaries. >-- >2.17.1
>-----Original Message----- >From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of >Ruhl, Michael J >Sent: Thursday, January 13, 2022 7:58 AM >To: guangming.cao@mediatek.com; sumit.semwal@linaro.org >Cc: jianjiao.zeng@mediatek.com; lmark@codeaurora.org; >wsd_upstream@mediatek.com; christian.koenig@amd.com; linux- >kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; >yf.wang@mediatek.com; linaro-mm-sig@lists.linaro.org; linux- >mediatek@lists.infradead.org; libo.kang@mediatek.com; >benjamin.gaignard@linaro.org; bo.song@mediatek.com; >matthias.bgg@gmail.com; labbott@redhat.com; >mingyuan.ma@mediatek.com; linux-arm-kernel@lists.infradead.org; linux- >media@vger.kernel.org >Subject: RE: [PATCH v3] dma-buf: dma-heap: Add a size check for allocation > > >>-----Original Message----- >>From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of >>guangming.cao@mediatek.com >>Sent: Thursday, January 13, 2022 7:34 AM >>To: sumit.semwal@linaro.org >>Cc: linux-arm-kernel@lists.infradead.org; mingyuan.ma@mediatek.com; >>Guangming <Guangming.Cao@mediatek.com>; >>wsd_upstream@mediatek.com; linux-kernel@vger.kernel.org; dri- >>devel@lists.freedesktop.org; linaro-mm-sig@lists.linaro.org; >>yf.wang@mediatek.com; libo.kang@mediatek.com; >>benjamin.gaignard@linaro.org; bo.song@mediatek.com; >>matthias.bgg@gmail.com; linux-mediatek@lists.infradead.org; >>lmark@codeaurora.org; labbott@redhat.com; christian.koenig@amd.com; >>jianjiao.zeng@mediatek.com; linux-media@vger.kernel.org >>Subject: [PATCH v3] dma-buf: dma-heap: Add a size check for allocation >> >>From: Guangming <Guangming.Cao@mediatek.com> >> >>Add a size check for allocation since the allocation size is >>always less than the total DRAM size. >> >>Without this check, once the invalid size allocation runs on a process that >>can't be killed by OOM flow(such as "gralloc" on Android devices), it will >>cause a kernel exception, and to make matters worse, we can't find who are >>using >>so many memory with "dma_buf_debug_show" since the relevant dma-buf >>hasn't exported. >> >>To make OOM issue easier, maybe need dma-buf framework to dump the >>buffer size >>under allocating in "dma_buf_debug_show". >> >>Signed-off-by: Guangming <Guangming.Cao@mediatek.com> >>Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com> >>--- >>v3: 1. update patch, use right shift to replace division. >> 2. update patch, add reason in code and commit message. >>v2: 1. update size limitation as total_dram page size. >> 2. update commit message >>--- >> drivers/dma-buf/dma-heap.c | 10 ++++++++++ >> 1 file changed, 10 insertions(+) >> >>diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c >>index 56bf5ad01ad5..1fd382712584 100644 >>--- a/drivers/dma-buf/dma-heap.c >>+++ b/drivers/dma-buf/dma-heap.c >>@@ -55,6 +55,16 @@ static int dma_heap_buffer_alloc(struct dma_heap >>*heap, size_t len, >> struct dma_buf *dmabuf; >> int fd; >> >>+ /* >>+ * Invalid size check. The "len" should be less than totalram. >>+ * >>+ * Without this check, once the invalid size allocation runs on a process >>that >>+ * can't be killed by OOM flow(such as "gralloc" on Android devices), it >>will >>+ * cause a kernel exception, and to make matters worse, we can't find >>who are using >>+ * so many memory with "dma_buf_debug_show" since the relevant >>dma-buf hasn't exported. >>+ */ >>+ if (len >> PAGE_SHIFT > totalram_pages()) > >If your "heap" is from cma, is this still a valid check? And thinking a bit further, if I create a heap from something else (say device memory), you will need to be able to figure out the maximum allowable check for the specific heap. Maybe the heap needs a callback for max size? m >M > >>+ return -EINVAL; >> /* >> * Allocations from all heaps have to begin >> * and end on page boundaries. >>-- >>2.17.1
Am 13.01.22 um 14:00 schrieb Ruhl, Michael J: >> -----Original Message----- >> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of >> Ruhl, Michael J >> Sent: Thursday, January 13, 2022 7:58 AM >> To: guangming.cao@mediatek.com; sumit.semwal@linaro.org >> Cc: jianjiao.zeng@mediatek.com; lmark@codeaurora.org; >> wsd_upstream@mediatek.com; christian.koenig@amd.com; linux- >> kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; >> yf.wang@mediatek.com; linaro-mm-sig@lists.linaro.org; linux- >> mediatek@lists.infradead.org; libo.kang@mediatek.com; >> benjamin.gaignard@linaro.org; bo.song@mediatek.com; >> matthias.bgg@gmail.com; labbott@redhat.com; >> mingyuan.ma@mediatek.com; linux-arm-kernel@lists.infradead.org; linux- >> media@vger.kernel.org >> Subject: RE: [PATCH v3] dma-buf: dma-heap: Add a size check for allocation >> >> >>> -----Original Message----- >>> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of >>> guangming.cao@mediatek.com >>> Sent: Thursday, January 13, 2022 7:34 AM >>> To: sumit.semwal@linaro.org >>> Cc: linux-arm-kernel@lists.infradead.org; mingyuan.ma@mediatek.com; >>> Guangming <Guangming.Cao@mediatek.com>; >>> wsd_upstream@mediatek.com; linux-kernel@vger.kernel.org; dri- >>> devel@lists.freedesktop.org; linaro-mm-sig@lists.linaro.org; >>> yf.wang@mediatek.com; libo.kang@mediatek.com; >>> benjamin.gaignard@linaro.org; bo.song@mediatek.com; >>> matthias.bgg@gmail.com; linux-mediatek@lists.infradead.org; >>> lmark@codeaurora.org; labbott@redhat.com; christian.koenig@amd.com; >>> jianjiao.zeng@mediatek.com; linux-media@vger.kernel.org >>> Subject: [PATCH v3] dma-buf: dma-heap: Add a size check for allocation >>> >>> From: Guangming <Guangming.Cao@mediatek.com> >>> >>> Add a size check for allocation since the allocation size is >>> always less than the total DRAM size. >>> >>> Without this check, once the invalid size allocation runs on a process that >>> can't be killed by OOM flow(such as "gralloc" on Android devices), it will >>> cause a kernel exception, and to make matters worse, we can't find who are >>> using >>> so many memory with "dma_buf_debug_show" since the relevant dma-buf >>> hasn't exported. >>> >>> To make OOM issue easier, maybe need dma-buf framework to dump the >>> buffer size >>> under allocating in "dma_buf_debug_show". >>> >>> Signed-off-by: Guangming <Guangming.Cao@mediatek.com> >>> Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com> >>> --- >>> v3: 1. update patch, use right shift to replace division. >>> 2. update patch, add reason in code and commit message. >>> v2: 1. update size limitation as total_dram page size. >>> 2. update commit message >>> --- >>> drivers/dma-buf/dma-heap.c | 10 ++++++++++ >>> 1 file changed, 10 insertions(+) >>> >>> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c >>> index 56bf5ad01ad5..1fd382712584 100644 >>> --- a/drivers/dma-buf/dma-heap.c >>> +++ b/drivers/dma-buf/dma-heap.c >>> @@ -55,6 +55,16 @@ static int dma_heap_buffer_alloc(struct dma_heap >>> *heap, size_t len, >>> struct dma_buf *dmabuf; >>> int fd; >>> >>> + /* >>> + * Invalid size check. The "len" should be less than totalram. >>> + * >>> + * Without this check, once the invalid size allocation runs on a process >>> that >>> + * can't be killed by OOM flow(such as "gralloc" on Android devices), it >>> will >>> + * cause a kernel exception, and to make matters worse, we can't find >>> who are using >>> + * so many memory with "dma_buf_debug_show" since the relevant >>> dma-buf hasn't exported. >>> + */ >>> + if (len >> PAGE_SHIFT > totalram_pages()) >> If your "heap" is from cma, is this still a valid check? > And thinking a bit further, if I create a heap from something else (say device memory), > you will need to be able to figure out the maximum allowable check for the specific > heap. > > Maybe the heap needs a callback for max size? Well we currently maintain a separate allocator and don't use dma-heap, but yes we have systems with 16GiB device and only 8GiB system memory so that check here is certainly not correct. In general I would rather let the system run into -ENOMEM or -EINVAL from the allocator instead. Regards, Christian. > > m >> M >> >>> + return -EINVAL; >>> /* >>> * Allocations from all heaps have to begin >>> * and end on page boundaries. >>> -- >>> 2.17.1
On Thu, Jan 13, 2022 at 5:05 AM Christian König <christian.koenig@amd.com> wrote: > Am 13.01.22 um 14:00 schrieb Ruhl, Michael J: > >> -----Original Message----- > >> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of > >> Ruhl, Michael J > >>> -----Original Message----- > >>> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of > >>> guangming.cao@mediatek.com > >>> + /* > >>> + * Invalid size check. The "len" should be less than totalram. > >>> + * > >>> + * Without this check, once the invalid size allocation runs on a process > >>> that > >>> + * can't be killed by OOM flow(such as "gralloc" on Android devices), it > >>> will > >>> + * cause a kernel exception, and to make matters worse, we can't find > >>> who are using > >>> + * so many memory with "dma_buf_debug_show" since the relevant > >>> dma-buf hasn't exported. > >>> + */ > >>> + if (len >> PAGE_SHIFT > totalram_pages()) > >> If your "heap" is from cma, is this still a valid check? > > And thinking a bit further, if I create a heap from something else (say device memory), > > you will need to be able to figure out the maximum allowable check for the specific > > heap. > > > > Maybe the heap needs a callback for max size? > > Well we currently maintain a separate allocator and don't use dma-heap, > but yes we have systems with 16GiB device and only 8GiB system memory so > that check here is certainly not correct. Good point. > In general I would rather let the system run into -ENOMEM or -EINVAL > from the allocator instead. Probably the simpler solution is to push the allocation check to the heap driver, rather than doing it at the top level here. For CMA or other contiguous heaps, letting the allocator fail is fast enough. For noncontiguous buffers, like the system heap, the allocation can burn a lot of time and consume a lot of memory (causing other trouble) before a large allocation might naturally fail. thanks -john
Am 14.01.22 um 00:26 schrieb John Stultz: > On Thu, Jan 13, 2022 at 5:05 AM Christian König > <christian.koenig@amd.com> wrote: >> Am 13.01.22 um 14:00 schrieb Ruhl, Michael J: >>>> -----Original Message----- >>>> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of >>>> Ruhl, Michael J >>>>> -----Original Message----- >>>>> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of >>>>> guangming.cao@mediatek.com >>>>> + /* >>>>> + * Invalid size check. The "len" should be less than totalram. >>>>> + * >>>>> + * Without this check, once the invalid size allocation runs on a process >>>>> that >>>>> + * can't be killed by OOM flow(such as "gralloc" on Android devices), it >>>>> will >>>>> + * cause a kernel exception, and to make matters worse, we can't find >>>>> who are using >>>>> + * so many memory with "dma_buf_debug_show" since the relevant >>>>> dma-buf hasn't exported. >>>>> + */ >>>>> + if (len >> PAGE_SHIFT > totalram_pages()) >>>> If your "heap" is from cma, is this still a valid check? >>> And thinking a bit further, if I create a heap from something else (say device memory), >>> you will need to be able to figure out the maximum allowable check for the specific >>> heap. >>> >>> Maybe the heap needs a callback for max size? >> Well we currently maintain a separate allocator and don't use dma-heap, >> but yes we have systems with 16GiB device and only 8GiB system memory so >> that check here is certainly not correct. > Good point. > >> In general I would rather let the system run into -ENOMEM or -EINVAL >> from the allocator instead. > Probably the simpler solution is to push the allocation check to the > heap driver, rather than doing it at the top level here. > > For CMA or other contiguous heaps, letting the allocator fail is fast > enough. For noncontiguous buffers, like the system heap, the > allocation can burn a lot of time and consume a lot of memory (causing > other trouble) before a large allocation might naturally fail. Yeah, letting a alloc_page() loop run for a while is usually not nice at all :) You can still do a sanity check here, e.g. the size should never have the most significant bit set for example. Regards, Christian. > > thanks > -john
On Fri, Jan 14, 2022 at 4:04 AM Guangming.Cao <guangming.cao@mediatek.com> wrote: > > On Fri, 2022-01-14 at 08:16 +0100, Christian König wrote: > > Am 14.01.22 um 00:26 schrieb John Stultz: > > > On Thu, Jan 13, 2022 at 5:05 AM Christian König > > > <christian.koenig@amd.com> wrote: > > > > Am 13.01.22 um 14:00 schrieb Ruhl, Michael J: > > > > > > -----Original Message----- > > > > > > From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On > > > > > > Behalf Of > > > > > > Ruhl, Michael J > > > > > > > -----Original Message----- > > > > > > > From: dri-devel <dri-devel-bounces@lists.freedesktop.org> > > > > > > > On Behalf Of > > > > > > > guangming.cao@mediatek.com > > > > > > > + /* > > > > > > > + * Invalid size check. The "len" should be less than > > > > > > > totalram. > > > > > > > + * > > > > > > > + * Without this check, once the invalid size allocation > > > > > > > runs on a process > > > > > > > that > > > > > > > + * can't be killed by OOM flow(such as "gralloc" on > > > > > > > Android devices), it > > > > > > > will > > > > > > > + * cause a kernel exception, and to make matters worse, > > > > > > > we can't find > > > > > > > who are using > > > > > > > + * so many memory with "dma_buf_debug_show" since the > > > > > > > relevant > > > > > > > dma-buf hasn't exported. > > > > > > > + */ > > > > > > > + if (len >> PAGE_SHIFT > totalram_pages()) > > > > > > > > > > > > If your "heap" is from cma, is this still a valid check? > > > > > > > > > > And thinking a bit further, if I create a heap from something > > > > > else (say device memory), > > > > > you will need to be able to figure out the maximum allowable > > > > > check for the specific > > > > > heap. > > > > > > > > > > Maybe the heap needs a callback for max size? > Yes, I agree with this solution. > If dma-heap framework support this via adding a callback to support it, > seems it's more clear than adding a limitation in dma-heap framework > since each heap maybe has different limitation. > If you prefer adding callback, I can update this patch and add totalram > limitation to system dma-heap. If the max value is per-heap, why not enforce that value in the per-heap allocation function? Moving the check to the heap alloc to me seems simpler to me than adding complexity to the infrastructure to add a heap max_size callback. Is there some other use for the callback that you envision? thanks -john
On Wed, Jan 19, 2022 at 1:58 AM Guangming.Cao <guangming.cao@mediatek.com> wrote: > On Fri, 2022-01-14 at 17:17 -0800, John Stultz wrote: > > If the max value is per-heap, why not enforce that value in the > > per-heap allocation function? > > > > Moving the check to the heap alloc to me seems simpler to me than > > adding complexity to the infrastructure to add a heap max_size > > callback. Is there some other use for the callback that you envision? > > > > If you think max the value is per-heap, why not add an optional > callback for dma-heap to solve this issue(prevent consuming too much > time for a doomed to fail allocation), if the dma-heap doesn't have a > special size check, just use the default value(totalram) in dma-heap > framework to do the size check. As the totalram default isn't correct for all heaps (or necessarily even most heaps), so those heaps would need to implement the callback. I'm just not sure adding complexity to the framework to address this is useful. Instead of an additional check in the allocation function, heap implementers will need to assess if the default logic in a framework is correct, and then possibly implement the callback. > Yes, for linux dma-heaps, only system-heap needs it, so adding it in > system heap is the simplest. However, there are many vendor dma-heaps > like system-heap which won't be uploaded to linux codebase, and maybe > have same limitation, all these heaps need to add the same limitation. My worry is that without seeing these vendor heaps, this is a bit of a theoretical concern. We don't have the data on how common this is. I very much hope that vendors can start submitting their heaps upstream (along with drivers that benefit from the heaps). Then we can really assess what makes the most sense for the community maintained code. > I just think it's boring. However, If you think discussing these absent > cases based on current linux code is meaningless, I also agree to it. So, as a rule, the upstream kernel doesn't create/maintain logic to accommodate out of tree code. Now, I agree there is the potential for some duplication in the checks in the allocation logic, but until it affects the upstream kernel, community maintainers can't really make an appropriate evaluation. As a contra-example, if the allocation is some extreme hotpath, adding an extra un-inlinable function pointer traversal for the size callback may actually have a negative impact. This isn't likely but again, if we cannot demonstrate it one way or the other against the upstream tree, we can't figure out what the best solution might be. > So, to summarize, if you still think adding it in system_heap.c is > better, I also agree and I will update the patch to add it in > system_heap.c I think this is the best solution for now. As this is not part of an userland ABI, we can always change it in the future once we see the need. thanks -john
diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c index 56bf5ad01ad5..1fd382712584 100644 --- a/drivers/dma-buf/dma-heap.c +++ b/drivers/dma-buf/dma-heap.c @@ -55,6 +55,16 @@ static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len, struct dma_buf *dmabuf; int fd; + /* + * Invalid size check. The "len" should be less than totalram. + * + * Without this check, once the invalid size allocation runs on a process that + * can't be killed by OOM flow(such as "gralloc" on Android devices), it will + * cause a kernel exception, and to make matters worse, we can't find who are using + * so many memory with "dma_buf_debug_show" since the relevant dma-buf hasn't exported. + */ + if (len >> PAGE_SHIFT > totalram_pages()) + return -EINVAL; /* * Allocations from all heaps have to begin * and end on page boundaries.