diff mbox series

[v3] dma-buf: dma-heap: Add a size check for allocation

Message ID 20220113123406.11520-1-guangming.cao@mediatek.com (mailing list archive)
State New, archived
Headers show
Series [v3] dma-buf: dma-heap: Add a size check for allocation | expand

Commit Message

Guangming.Cao Jan. 13, 2022, 12:34 p.m. UTC
From: Guangming <Guangming.Cao@mediatek.com>

Add a size check for allocation since the allocation size is
always less than the total DRAM size.

Without this check, once the invalid size allocation runs on a process that
can't be killed by OOM flow(such as "gralloc" on Android devices), it will
cause a kernel exception, and to make matters worse, we can't find who are using
so many memory with "dma_buf_debug_show" since the relevant dma-buf hasn't exported.

To make OOM issue easier, maybe need dma-buf framework to dump the buffer size
under allocating in "dma_buf_debug_show".

Signed-off-by: Guangming <Guangming.Cao@mediatek.com>
Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com>
---
v3: 1. update patch, use right shift to replace division.
    2. update patch, add reason in code and commit message.
v2: 1. update size limitation as total_dram page size.
    2. update commit message
---
 drivers/dma-buf/dma-heap.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Ruhl, Michael J Jan. 13, 2022, 12:57 p.m. UTC | #1
>-----Original Message-----
>From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of
>guangming.cao@mediatek.com
>Sent: Thursday, January 13, 2022 7:34 AM
>To: sumit.semwal@linaro.org
>Cc: linux-arm-kernel@lists.infradead.org; mingyuan.ma@mediatek.com;
>Guangming <Guangming.Cao@mediatek.com>;
>wsd_upstream@mediatek.com; linux-kernel@vger.kernel.org; dri-
>devel@lists.freedesktop.org; linaro-mm-sig@lists.linaro.org;
>yf.wang@mediatek.com; libo.kang@mediatek.com;
>benjamin.gaignard@linaro.org; bo.song@mediatek.com;
>matthias.bgg@gmail.com; linux-mediatek@lists.infradead.org;
>lmark@codeaurora.org; labbott@redhat.com; christian.koenig@amd.com;
>jianjiao.zeng@mediatek.com; linux-media@vger.kernel.org
>Subject: [PATCH v3] dma-buf: dma-heap: Add a size check for allocation
>
>From: Guangming <Guangming.Cao@mediatek.com>
>
>Add a size check for allocation since the allocation size is
>always less than the total DRAM size.
>
>Without this check, once the invalid size allocation runs on a process that
>can't be killed by OOM flow(such as "gralloc" on Android devices), it will
>cause a kernel exception, and to make matters worse, we can't find who are
>using
>so many memory with "dma_buf_debug_show" since the relevant dma-buf
>hasn't exported.
>
>To make OOM issue easier, maybe need dma-buf framework to dump the
>buffer size
>under allocating in "dma_buf_debug_show".
>
>Signed-off-by: Guangming <Guangming.Cao@mediatek.com>
>Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com>
>---
>v3: 1. update patch, use right shift to replace division.
>    2. update patch, add reason in code and commit message.
>v2: 1. update size limitation as total_dram page size.
>    2. update commit message
>---
> drivers/dma-buf/dma-heap.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
>diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
>index 56bf5ad01ad5..1fd382712584 100644
>--- a/drivers/dma-buf/dma-heap.c
>+++ b/drivers/dma-buf/dma-heap.c
>@@ -55,6 +55,16 @@ static int dma_heap_buffer_alloc(struct dma_heap
>*heap, size_t len,
> 	struct dma_buf *dmabuf;
> 	int fd;
>
>+	/*
>+	 * Invalid size check. The "len" should be less than totalram.
>+	 *
>+	 * Without this check, once the invalid size allocation runs on a process
>that
>+	 * can't be killed by OOM flow(such as "gralloc" on Android devices), it
>will
>+	 * cause a kernel exception, and to make matters worse, we can't find
>who are using
>+	 * so many memory with "dma_buf_debug_show" since the relevant
>dma-buf hasn't exported.
>+	 */
>+	if (len >> PAGE_SHIFT > totalram_pages())

If your "heap" is from cma, is this still a valid check?

M

>+		return -EINVAL;
> 	/*
> 	 * Allocations from all heaps have to begin
> 	 * and end on page boundaries.
>--
>2.17.1
Ruhl, Michael J Jan. 13, 2022, 1 p.m. UTC | #2
>-----Original Message-----
>From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of
>Ruhl, Michael J
>Sent: Thursday, January 13, 2022 7:58 AM
>To: guangming.cao@mediatek.com; sumit.semwal@linaro.org
>Cc: jianjiao.zeng@mediatek.com; lmark@codeaurora.org;
>wsd_upstream@mediatek.com; christian.koenig@amd.com; linux-
>kernel@vger.kernel.org; dri-devel@lists.freedesktop.org;
>yf.wang@mediatek.com; linaro-mm-sig@lists.linaro.org; linux-
>mediatek@lists.infradead.org; libo.kang@mediatek.com;
>benjamin.gaignard@linaro.org; bo.song@mediatek.com;
>matthias.bgg@gmail.com; labbott@redhat.com;
>mingyuan.ma@mediatek.com; linux-arm-kernel@lists.infradead.org; linux-
>media@vger.kernel.org
>Subject: RE: [PATCH v3] dma-buf: dma-heap: Add a size check for allocation
>
>
>>-----Original Message-----
>>From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of
>>guangming.cao@mediatek.com
>>Sent: Thursday, January 13, 2022 7:34 AM
>>To: sumit.semwal@linaro.org
>>Cc: linux-arm-kernel@lists.infradead.org; mingyuan.ma@mediatek.com;
>>Guangming <Guangming.Cao@mediatek.com>;
>>wsd_upstream@mediatek.com; linux-kernel@vger.kernel.org; dri-
>>devel@lists.freedesktop.org; linaro-mm-sig@lists.linaro.org;
>>yf.wang@mediatek.com; libo.kang@mediatek.com;
>>benjamin.gaignard@linaro.org; bo.song@mediatek.com;
>>matthias.bgg@gmail.com; linux-mediatek@lists.infradead.org;
>>lmark@codeaurora.org; labbott@redhat.com; christian.koenig@amd.com;
>>jianjiao.zeng@mediatek.com; linux-media@vger.kernel.org
>>Subject: [PATCH v3] dma-buf: dma-heap: Add a size check for allocation
>>
>>From: Guangming <Guangming.Cao@mediatek.com>
>>
>>Add a size check for allocation since the allocation size is
>>always less than the total DRAM size.
>>
>>Without this check, once the invalid size allocation runs on a process that
>>can't be killed by OOM flow(such as "gralloc" on Android devices), it will
>>cause a kernel exception, and to make matters worse, we can't find who are
>>using
>>so many memory with "dma_buf_debug_show" since the relevant dma-buf
>>hasn't exported.
>>
>>To make OOM issue easier, maybe need dma-buf framework to dump the
>>buffer size
>>under allocating in "dma_buf_debug_show".
>>
>>Signed-off-by: Guangming <Guangming.Cao@mediatek.com>
>>Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com>
>>---
>>v3: 1. update patch, use right shift to replace division.
>>    2. update patch, add reason in code and commit message.
>>v2: 1. update size limitation as total_dram page size.
>>    2. update commit message
>>---
>> drivers/dma-buf/dma-heap.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>>diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
>>index 56bf5ad01ad5..1fd382712584 100644
>>--- a/drivers/dma-buf/dma-heap.c
>>+++ b/drivers/dma-buf/dma-heap.c
>>@@ -55,6 +55,16 @@ static int dma_heap_buffer_alloc(struct dma_heap
>>*heap, size_t len,
>> 	struct dma_buf *dmabuf;
>> 	int fd;
>>
>>+	/*
>>+	 * Invalid size check. The "len" should be less than totalram.
>>+	 *
>>+	 * Without this check, once the invalid size allocation runs on a process
>>that
>>+	 * can't be killed by OOM flow(such as "gralloc" on Android devices), it
>>will
>>+	 * cause a kernel exception, and to make matters worse, we can't find
>>who are using
>>+	 * so many memory with "dma_buf_debug_show" since the relevant
>>dma-buf hasn't exported.
>>+	 */
>>+	if (len >> PAGE_SHIFT > totalram_pages())
>
>If your "heap" is from cma, is this still a valid check?

And thinking a bit further, if I create a heap from something else (say device memory),
you will need to be able to figure out the maximum allowable check for the specific
heap.

Maybe the heap needs a callback for max size?

m
>M
>
>>+		return -EINVAL;
>> 	/*
>> 	 * Allocations from all heaps have to begin
>> 	 * and end on page boundaries.
>>--
>>2.17.1
Christian König Jan. 13, 2022, 1:05 p.m. UTC | #3
Am 13.01.22 um 14:00 schrieb Ruhl, Michael J:
>> -----Original Message-----
>> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of
>> Ruhl, Michael J
>> Sent: Thursday, January 13, 2022 7:58 AM
>> To: guangming.cao@mediatek.com; sumit.semwal@linaro.org
>> Cc: jianjiao.zeng@mediatek.com; lmark@codeaurora.org;
>> wsd_upstream@mediatek.com; christian.koenig@amd.com; linux-
>> kernel@vger.kernel.org; dri-devel@lists.freedesktop.org;
>> yf.wang@mediatek.com; linaro-mm-sig@lists.linaro.org; linux-
>> mediatek@lists.infradead.org; libo.kang@mediatek.com;
>> benjamin.gaignard@linaro.org; bo.song@mediatek.com;
>> matthias.bgg@gmail.com; labbott@redhat.com;
>> mingyuan.ma@mediatek.com; linux-arm-kernel@lists.infradead.org; linux-
>> media@vger.kernel.org
>> Subject: RE: [PATCH v3] dma-buf: dma-heap: Add a size check for allocation
>>
>>
>>> -----Original Message-----
>>> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of
>>> guangming.cao@mediatek.com
>>> Sent: Thursday, January 13, 2022 7:34 AM
>>> To: sumit.semwal@linaro.org
>>> Cc: linux-arm-kernel@lists.infradead.org; mingyuan.ma@mediatek.com;
>>> Guangming <Guangming.Cao@mediatek.com>;
>>> wsd_upstream@mediatek.com; linux-kernel@vger.kernel.org; dri-
>>> devel@lists.freedesktop.org; linaro-mm-sig@lists.linaro.org;
>>> yf.wang@mediatek.com; libo.kang@mediatek.com;
>>> benjamin.gaignard@linaro.org; bo.song@mediatek.com;
>>> matthias.bgg@gmail.com; linux-mediatek@lists.infradead.org;
>>> lmark@codeaurora.org; labbott@redhat.com; christian.koenig@amd.com;
>>> jianjiao.zeng@mediatek.com; linux-media@vger.kernel.org
>>> Subject: [PATCH v3] dma-buf: dma-heap: Add a size check for allocation
>>>
>>> From: Guangming <Guangming.Cao@mediatek.com>
>>>
>>> Add a size check for allocation since the allocation size is
>>> always less than the total DRAM size.
>>>
>>> Without this check, once the invalid size allocation runs on a process that
>>> can't be killed by OOM flow(such as "gralloc" on Android devices), it will
>>> cause a kernel exception, and to make matters worse, we can't find who are
>>> using
>>> so many memory with "dma_buf_debug_show" since the relevant dma-buf
>>> hasn't exported.
>>>
>>> To make OOM issue easier, maybe need dma-buf framework to dump the
>>> buffer size
>>> under allocating in "dma_buf_debug_show".
>>>
>>> Signed-off-by: Guangming <Guangming.Cao@mediatek.com>
>>> Signed-off-by: jianjiao zeng <jianjiao.zeng@mediatek.com>
>>> ---
>>> v3: 1. update patch, use right shift to replace division.
>>>     2. update patch, add reason in code and commit message.
>>> v2: 1. update size limitation as total_dram page size.
>>>     2. update commit message
>>> ---
>>> drivers/dma-buf/dma-heap.c | 10 ++++++++++
>>> 1 file changed, 10 insertions(+)
>>>
>>> diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
>>> index 56bf5ad01ad5..1fd382712584 100644
>>> --- a/drivers/dma-buf/dma-heap.c
>>> +++ b/drivers/dma-buf/dma-heap.c
>>> @@ -55,6 +55,16 @@ static int dma_heap_buffer_alloc(struct dma_heap
>>> *heap, size_t len,
>>> 	struct dma_buf *dmabuf;
>>> 	int fd;
>>>
>>> +	/*
>>> +	 * Invalid size check. The "len" should be less than totalram.
>>> +	 *
>>> +	 * Without this check, once the invalid size allocation runs on a process
>>> that
>>> +	 * can't be killed by OOM flow(such as "gralloc" on Android devices), it
>>> will
>>> +	 * cause a kernel exception, and to make matters worse, we can't find
>>> who are using
>>> +	 * so many memory with "dma_buf_debug_show" since the relevant
>>> dma-buf hasn't exported.
>>> +	 */
>>> +	if (len >> PAGE_SHIFT > totalram_pages())
>> If your "heap" is from cma, is this still a valid check?
> And thinking a bit further, if I create a heap from something else (say device memory),
> you will need to be able to figure out the maximum allowable check for the specific
> heap.
>
> Maybe the heap needs a callback for max size?

Well we currently maintain a separate allocator and don't use dma-heap, 
but yes we have systems with 16GiB device and only 8GiB system memory so 
that check here is certainly not correct.

In general I would rather let the system run into -ENOMEM or -EINVAL 
from the allocator instead.

Regards,
Christian.

>
> m
>> M
>>
>>> +		return -EINVAL;
>>> 	/*
>>> 	 * Allocations from all heaps have to begin
>>> 	 * and end on page boundaries.
>>> --
>>> 2.17.1
John Stultz Jan. 13, 2022, 11:26 p.m. UTC | #4
On Thu, Jan 13, 2022 at 5:05 AM Christian König
<christian.koenig@amd.com> wrote:
> Am 13.01.22 um 14:00 schrieb Ruhl, Michael J:
> >> -----Original Message-----
> >> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of
> >> Ruhl, Michael J
> >>> -----Original Message-----
> >>> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of
> >>> guangming.cao@mediatek.com
> >>> +   /*
> >>> +    * Invalid size check. The "len" should be less than totalram.
> >>> +    *
> >>> +    * Without this check, once the invalid size allocation runs on a process
> >>> that
> >>> +    * can't be killed by OOM flow(such as "gralloc" on Android devices), it
> >>> will
> >>> +    * cause a kernel exception, and to make matters worse, we can't find
> >>> who are using
> >>> +    * so many memory with "dma_buf_debug_show" since the relevant
> >>> dma-buf hasn't exported.
> >>> +    */
> >>> +   if (len >> PAGE_SHIFT > totalram_pages())
> >> If your "heap" is from cma, is this still a valid check?
> > And thinking a bit further, if I create a heap from something else (say device memory),
> > you will need to be able to figure out the maximum allowable check for the specific
> > heap.
> >
> > Maybe the heap needs a callback for max size?
>
> Well we currently maintain a separate allocator and don't use dma-heap,
> but yes we have systems with 16GiB device and only 8GiB system memory so
> that check here is certainly not correct.

Good point.

> In general I would rather let the system run into -ENOMEM or -EINVAL
> from the allocator instead.

Probably the simpler solution is to push the allocation check to the
heap driver, rather than doing it at the top level here.

For CMA or other contiguous heaps, letting the allocator fail is fast
enough. For noncontiguous buffers, like the system heap, the
allocation can burn a lot of time and consume a lot of memory (causing
other trouble) before a large allocation might naturally fail.

thanks
-john
Christian König Jan. 14, 2022, 7:16 a.m. UTC | #5
Am 14.01.22 um 00:26 schrieb John Stultz:
> On Thu, Jan 13, 2022 at 5:05 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 13.01.22 um 14:00 schrieb Ruhl, Michael J:
>>>> -----Original Message-----
>>>> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of
>>>> Ruhl, Michael J
>>>>> -----Original Message-----
>>>>> From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On Behalf Of
>>>>> guangming.cao@mediatek.com
>>>>> +   /*
>>>>> +    * Invalid size check. The "len" should be less than totalram.
>>>>> +    *
>>>>> +    * Without this check, once the invalid size allocation runs on a process
>>>>> that
>>>>> +    * can't be killed by OOM flow(such as "gralloc" on Android devices), it
>>>>> will
>>>>> +    * cause a kernel exception, and to make matters worse, we can't find
>>>>> who are using
>>>>> +    * so many memory with "dma_buf_debug_show" since the relevant
>>>>> dma-buf hasn't exported.
>>>>> +    */
>>>>> +   if (len >> PAGE_SHIFT > totalram_pages())
>>>> If your "heap" is from cma, is this still a valid check?
>>> And thinking a bit further, if I create a heap from something else (say device memory),
>>> you will need to be able to figure out the maximum allowable check for the specific
>>> heap.
>>>
>>> Maybe the heap needs a callback for max size?
>> Well we currently maintain a separate allocator and don't use dma-heap,
>> but yes we have systems with 16GiB device and only 8GiB system memory so
>> that check here is certainly not correct.
> Good point.
>
>> In general I would rather let the system run into -ENOMEM or -EINVAL
>> from the allocator instead.
> Probably the simpler solution is to push the allocation check to the
> heap driver, rather than doing it at the top level here.
>
> For CMA or other contiguous heaps, letting the allocator fail is fast
> enough. For noncontiguous buffers, like the system heap, the
> allocation can burn a lot of time and consume a lot of memory (causing
> other trouble) before a large allocation might naturally fail.

Yeah, letting a alloc_page() loop run for a while is usually not nice at 
all :)

You can still do a sanity check here, e.g. the size should never have 
the most significant bit set for example.

Regards,
Christian.

>
> thanks
> -john
Guangming.Cao Jan. 14, 2022, 12:05 p.m. UTC | #6
On Fri, 2022-01-14 at 08:16 +0100, Christian König wrote:
> Am 14.01.22 um 00:26 schrieb John Stultz:
> > On Thu, Jan 13, 2022 at 5:05 AM Christian König
> > <christian.koenig@amd.com> wrote:
> > > Am 13.01.22 um 14:00 schrieb Ruhl, Michael J:
> > > > > -----Original Message-----
> > > > > From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On
> > > > > Behalf Of
> > > > > Ruhl, Michael J
> > > > > > -----Original Message-----
> > > > > > From: dri-devel <dri-devel-bounces@lists.freedesktop.org>
> > > > > > On Behalf Of
> > > > > > guangming.cao@mediatek.com
> > > > > > +   /*
> > > > > > +    * Invalid size check. The "len" should be less than
> > > > > > totalram.
> > > > > > +    *
> > > > > > +    * Without this check, once the invalid size allocation
> > > > > > runs on a process
> > > > > > that
> > > > > > +    * can't be killed by OOM flow(such as "gralloc" on
> > > > > > Android devices), it
> > > > > > will
> > > > > > +    * cause a kernel exception, and to make matters worse,
> > > > > > we can't find
> > > > > > who are using
> > > > > > +    * so many memory with "dma_buf_debug_show" since the
> > > > > > relevant
> > > > > > dma-buf hasn't exported.
> > > > > > +    */
> > > > > > +   if (len >> PAGE_SHIFT > totalram_pages())
> > > > > 
> > > > > If your "heap" is from cma, is this still a valid check?
> > > > 
> > > > And thinking a bit further, if I create a heap from something
> > > > else (say device memory),
> > > > you will need to be able to figure out the maximum allowable
> > > > check for the specific
> > > > heap.
> > > > 
> > > > Maybe the heap needs a callback for max size?
Yes, I agree with this solution.
If dma-heap framework support this via adding a callback to support it,
seems it's more clear than adding a limitation in dma-heap framework
since each heap maybe has different limitation.
If you prefer adding callback, I can update this patch and add totalram
limitation to system dma-heap.

Thanks!
Guangming
> > > 
> > > Well we currently maintain a separate allocator and don't use
> > > dma-heap,
> > > but yes we have systems with 16GiB device and only 8GiB system
> > > memory so
> > > that check here is certainly not correct.
> > 
> > Good point.
> > 
> > > In general I would rather let the system run into -ENOMEM or
> > > -EINVAL
> > > from the allocator instead.

For system dma-heap, it doesn't know how memory is avaliable when
allocating memory, so, use totalram_pages() just to prevent cases which
will cause oom definitely.

Just like PAGE align, this check is can be used for all heaps since
there is no dma-heap can alloc memory larger than totalram. Futhermore,
if vendors implement a variety of dma-heap like system heap for special
usages, seems need to add this check to each dma-heap, and I think this
is unnecessary.
If the dma-heap has it's own special limitations for size, and add it
into heap implementation is good.

Thanks!
Guangming
> > 
> > Probably the simpler solution is to push the allocation check to
> > the
> > heap driver, rather than doing it at the top level here.
> > 
> > For CMA or other contiguous heaps, letting the allocator fail is
> > fast
> > enough. For noncontiguous buffers, like the system heap, the
> > allocation can burn a lot of time and consume a lot of memory
> > (causing
> > other trouble) before a large allocation might naturally fail.
> 
> Yeah, letting a alloc_page() loop run for a while is usually not nice
> at 
> all :)
> 
> You can still do a sanity check here, e.g. the size should never
> have 
> the most significant bit set for example.
> 
Yes, this is a good solution. But if this a positive value, larger than
totalram, it can also pass this check, and cause OOM after some time.

From dicussion above, seems finding a proper solution that can judge
the size is valid or not for each dma-heap is more important.

Thanks!
Guangming

> Regards,
> Christian.
> 
> > 
> > thanks
> > -john
> 
>
John Stultz Jan. 15, 2022, 1:17 a.m. UTC | #7
On Fri, Jan 14, 2022 at 4:04 AM Guangming.Cao
<guangming.cao@mediatek.com> wrote:
>
> On Fri, 2022-01-14 at 08:16 +0100, Christian König wrote:
> > Am 14.01.22 um 00:26 schrieb John Stultz:
> > > On Thu, Jan 13, 2022 at 5:05 AM Christian König
> > > <christian.koenig@amd.com> wrote:
> > > > Am 13.01.22 um 14:00 schrieb Ruhl, Michael J:
> > > > > > -----Original Message-----
> > > > > > From: dri-devel <dri-devel-bounces@lists.freedesktop.org> On
> > > > > > Behalf Of
> > > > > > Ruhl, Michael J
> > > > > > > -----Original Message-----
> > > > > > > From: dri-devel <dri-devel-bounces@lists.freedesktop.org>
> > > > > > > On Behalf Of
> > > > > > > guangming.cao@mediatek.com
> > > > > > > +   /*
> > > > > > > +    * Invalid size check. The "len" should be less than
> > > > > > > totalram.
> > > > > > > +    *
> > > > > > > +    * Without this check, once the invalid size allocation
> > > > > > > runs on a process
> > > > > > > that
> > > > > > > +    * can't be killed by OOM flow(such as "gralloc" on
> > > > > > > Android devices), it
> > > > > > > will
> > > > > > > +    * cause a kernel exception, and to make matters worse,
> > > > > > > we can't find
> > > > > > > who are using
> > > > > > > +    * so many memory with "dma_buf_debug_show" since the
> > > > > > > relevant
> > > > > > > dma-buf hasn't exported.
> > > > > > > +    */
> > > > > > > +   if (len >> PAGE_SHIFT > totalram_pages())
> > > > > >
> > > > > > If your "heap" is from cma, is this still a valid check?
> > > > >
> > > > > And thinking a bit further, if I create a heap from something
> > > > > else (say device memory),
> > > > > you will need to be able to figure out the maximum allowable
> > > > > check for the specific
> > > > > heap.
> > > > >
> > > > > Maybe the heap needs a callback for max size?
> Yes, I agree with this solution.
> If dma-heap framework support this via adding a callback to support it,
> seems it's more clear than adding a limitation in dma-heap framework
> since each heap maybe has different limitation.
> If you prefer adding callback, I can update this patch and add totalram
> limitation to system dma-heap.

If the max value is per-heap, why not enforce that value in the
per-heap allocation function?

Moving the check to the heap alloc to me seems simpler to me than
adding complexity to the infrastructure to add a heap max_size
callback. Is there some other use for the callback that you envision?

thanks
-john
Guangming.Cao Jan. 19, 2022, 9:59 a.m. UTC | #8
On Fri, 2022-01-14 at 17:17 -0800, John Stultz wrote:
> On Fri, Jan 14, 2022 at 4:04 AM Guangming.Cao
> <guangming.cao@mediatek.com> wrote:
> > 
> > On Fri, 2022-01-14 at 08:16 +0100, Christian König wrote:
> > > Am 14.01.22 um 00:26 schrieb John Stultz:
> > > > On Thu, Jan 13, 2022 at 5:05 AM Christian König
> > > > <christian.koenig@amd.com> wrote:
> > > > > Am 13.01.22 um 14:00 schrieb Ruhl, Michael J:
> > > > > > > -----Original Message-----
> > > > > > > From: dri-devel <dri-devel-bounces@lists.freedesktop.org>
> > > > > > > On
> > > > > > > Behalf Of
> > > > > > > Ruhl, Michael J
> > > > > > > > -----Original Message-----
> > > > > > > > From: dri-devel <
> > > > > > > > dri-devel-bounces@lists.freedesktop.org>
> > > > > > > > On Behalf Of
> > > > > > > > guangming.cao@mediatek.com
> > > > > > > > +   /*
> > > > > > > > +    * Invalid size check. The "len" should be less
> > > > > > > > than
> > > > > > > > totalram.
> > > > > > > > +    *
> > > > > > > > +    * Without this check, once the invalid size
> > > > > > > > allocation
> > > > > > > > runs on a process
> > > > > > > > that
> > > > > > > > +    * can't be killed by OOM flow(such as "gralloc" on
> > > > > > > > Android devices), it
> > > > > > > > will
> > > > > > > > +    * cause a kernel exception, and to make matters
> > > > > > > > worse,
> > > > > > > > we can't find
> > > > > > > > who are using
> > > > > > > > +    * so many memory with "dma_buf_debug_show" since
> > > > > > > > the
> > > > > > > > relevant
> > > > > > > > dma-buf hasn't exported.
> > > > > > > > +    */
> > > > > > > > +   if (len >> PAGE_SHIFT > totalram_pages())
> > > > > > > 
> > > > > > > If your "heap" is from cma, is this still a valid check?
> > > > > > 
> > > > > > And thinking a bit further, if I create a heap from
> > > > > > something
> > > > > > else (say device memory),
> > > > > > you will need to be able to figure out the maximum
> > > > > > allowable
> > > > > > check for the specific
> > > > > > heap.
> > > > > > 
> > > > > > Maybe the heap needs a callback for max size?
> > 
> > Yes, I agree with this solution.
> > If dma-heap framework support this via adding a callback to support
> > it,
> > seems it's more clear than adding a limitation in dma-heap
> > framework
> > since each heap maybe has different limitation.
> > If you prefer adding callback, I can update this patch and add
> > totalram
> > limitation to system dma-heap.
> 
> If the max value is per-heap, why not enforce that value in the
> per-heap allocation function?
> 
> Moving the check to the heap alloc to me seems simpler to me than
> adding complexity to the infrastructure to add a heap max_size
> callback. Is there some other use for the callback that you envision?
> 
> thanks
> -john

Thanks for your comment.

If you think max the value is per-heap, why not add an optional
callback for dma-heap to solve this issue(prevent consuming too much
time for a doomed to fail allocation), if the dma-heap doesn't have a
special size check, just use the default value(totalram) in dma-heap 
framework to do the size check.

Yes, for linux dma-heaps, only system-heap needs it, so adding it in
system heap is the simplest. However, there are many vendor dma-heaps
like system-heap which won't be uploaded to linux codebase, and maybe
have same limitation, all these heaps need to add the same limitation.
I just think it's boring. However, If you think discussing these absent
cases based on current linux code is meaningless, I also agree to it.

So, to summarize, if you still think adding it in system_heap.c is
better, I also agree and I will update the patch to add it in
system_heap.c

Thanks~
Guangming

> 
> _______________________________________________
> Linux-mediatek mailing list
> Linux-mediatek@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-mediatek
John Stultz Jan. 19, 2022, 8:37 p.m. UTC | #9
On Wed, Jan 19, 2022 at 1:58 AM Guangming.Cao
<guangming.cao@mediatek.com> wrote:
> On Fri, 2022-01-14 at 17:17 -0800, John Stultz wrote:
> > If the max value is per-heap, why not enforce that value in the
> > per-heap allocation function?
> >
> > Moving the check to the heap alloc to me seems simpler to me than
> > adding complexity to the infrastructure to add a heap max_size
> > callback. Is there some other use for the callback that you envision?
> >
>
> If you think max the value is per-heap, why not add an optional
> callback for dma-heap to solve this issue(prevent consuming too much
> time for a doomed to fail allocation), if the dma-heap doesn't have a
> special size check, just use the default value(totalram) in dma-heap
> framework to do the size check.

As the totalram default isn't correct for all heaps (or necessarily
even most heaps), so those heaps would need to implement the callback.

I'm just not sure adding complexity to the framework to address this
is useful. Instead of an additional check in the allocation function,
heap implementers will need to assess if the default logic in a
framework is correct, and then possibly implement the callback.

> Yes, for linux dma-heaps, only system-heap needs it, so adding it in
> system heap is the simplest. However, there are many vendor dma-heaps
> like system-heap which won't be uploaded to linux codebase, and maybe
> have same limitation, all these heaps need to add the same limitation.

My worry is that without seeing these vendor heaps, this is a bit of a
theoretical concern. We don't have the data on how common this is.
I very much hope that vendors can start submitting their heaps
upstream (along with drivers that benefit from the heaps). Then we can
really assess what makes the most sense for the community maintained
code.


> I just think it's boring. However, If you think discussing these absent
> cases based on current linux code is meaningless, I also agree to it.

So, as a rule, the upstream kernel doesn't create/maintain logic to
accommodate out of tree code.

Now, I agree there is the potential for some duplication in the checks
in the allocation logic, but until it affects the upstream kernel,
community maintainers can't really make an appropriate evaluation.

As a contra-example, if the allocation is some extreme hotpath, adding
an extra un-inlinable function pointer traversal for the size callback
may actually have a negative impact. This isn't likely but again, if
we cannot demonstrate it one way or the other against the upstream
tree, we can't figure out what the best solution might be.


> So, to summarize, if you still think adding it in system_heap.c is
> better, I also agree and I will update the patch to add it in
> system_heap.c

I think this is the best solution for now. As this is not part of an
userland ABI, we can always change it in the future once we see the
need.

thanks
-john
diff mbox series

Patch

diff --git a/drivers/dma-buf/dma-heap.c b/drivers/dma-buf/dma-heap.c
index 56bf5ad01ad5..1fd382712584 100644
--- a/drivers/dma-buf/dma-heap.c
+++ b/drivers/dma-buf/dma-heap.c
@@ -55,6 +55,16 @@  static int dma_heap_buffer_alloc(struct dma_heap *heap, size_t len,
 	struct dma_buf *dmabuf;
 	int fd;
 
+	/*
+	 * Invalid size check. The "len" should be less than totalram.
+	 *
+	 * Without this check, once the invalid size allocation runs on a process that
+	 * can't be killed by OOM flow(such as "gralloc" on Android devices), it will
+	 * cause a kernel exception, and to make matters worse, we can't find who are using
+	 * so many memory with "dma_buf_debug_show" since the relevant dma-buf hasn't exported.
+	 */
+	if (len >> PAGE_SHIFT > totalram_pages())
+		return -EINVAL;
 	/*
 	 * Allocations from all heaps have to begin
 	 * and end on page boundaries.