
[v2] dma-buf/heaps: system_heap: Avoid DoS by limiting single allocations to half of all memory

Message ID 20230406000854.25764-1-jaewon31.kim@samsung.com (mailing list archive)
State New
Series [v2] dma-buf/heaps: system_heap: Avoid DoS by limiting single allocations to half of all memory

Commit Message

Jaewon Kim April 6, 2023, 12:08 a.m. UTC
Normal free:212600kB min:7664kB low:57100kB high:106536kB
  reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
  active_file:1200kB inactive_file:0kB unevictable:2932kB
  writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
  pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
  free_cma:200844kB
Out of memory and no killable processes...
Kernel panic - not syncing: System is deadlocked on memory

An OoM panic was reported; only native processes were running, all of
them non-killable because of OOM_SCORE_ADJ_MIN.

After looking into the dump, I found the dma-buf system heap was trying
to allocate a huge size, apparently a signed negative value.

dma_heap_ioctl_allocate(inline)
    |  heap_allocation = 0xFFFFFFC02247BD38 -> (
    |    len = 0xFFFFFFFFE7225100,

Actually, the old ion system heap had a policy that disallowed such huge
sizes, added by commit c9e8440eca61 ("staging: ion: Fix overflow and list
bugs in system heap"). We need this change again: a single allocation
should not be bigger than half of all memory.

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Acked-by: John Stultz <jstultz@google.com>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
---
 drivers/dma-buf/heaps/system_heap.c | 3 +++
 1 file changed, 3 insertions(+)
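
For illustration only (this sketch is mine, not part of the patch): a
minimal userspace program showing how a signed size computation that goes
negative becomes the huge unsigned len seen in the dump. The ioctl and
struct come from the real uapi header (linux/dma-heap.h); the buggy
arithmetic is invented.

  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/dma-heap.h>

  int main(void)
  {
          int heap_fd = open("/dev/dma_heap/system", O_RDONLY | O_CLOEXEC);
          if (heap_fd < 0)
                  return 1;

          /* hypothetical buggy computation that underflowed to a negative value */
          int64_t size = -417181440;

          struct dma_heap_allocation_data data = {
                  .len = (uint64_t)size,  /* sign-extends to 0xFFFFFFFFE7225100 */
                  .fd_flags = O_RDWR | O_CLOEXEC,
          };

          /* Without the check above, the heap grinds toward OOM trying to
           * satisfy ~16EB; with it, the ioctl fails immediately with ENOMEM. */
          if (ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data) < 0)
                  perror("DMA_HEAP_IOCTL_ALLOC");
          return 0;
  }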

Comments

Andrew Morton April 6, 2023, 12:25 a.m. UTC | #1
On Thu,  6 Apr 2023 09:08:54 +0900 Jaewon Kim <jaewon31.kim@samsung.com> wrote:

> Normal free:212600kB min:7664kB low:57100kB high:106536kB
>   reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
>   active_file:1200kB inactive_file:0kB unevictable:2932kB
>   writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
>   pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
>   free_cma:200844kB
> Out of memory and no killable processes...
> Kernel panic - not syncing: System is deadlocked on memory
> 
> An OoM panic was reported; only native processes were running, all of
> them non-killable because of OOM_SCORE_ADJ_MIN.
> 
> After looking into the dump, I found the dma-buf system heap was trying
> to allocate a huge size, apparently a signed negative value.
> 
> dma_heap_ioctl_allocate(inline)
>     |  heap_allocation = 0xFFFFFFC02247BD38 -> (
>     |    len = 0xFFFFFFFFE7225100,
> 
> Actually, the old ion system heap had a policy that disallowed such huge
> sizes, added by commit c9e8440eca61 ("staging: ion: Fix overflow and list
> bugs in system heap"). We need this change again: a single allocation
> should not be bigger than half of all memory.
> 
> ...
>
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>  	struct page *page, *tmp_page;
>  	int i, ret = -ENOMEM;
>  
> +	if (len / PAGE_SIZE > totalram_pages() / 2)
> +		return ERR_PTR(-ENOMEM);
> +

This seems so random.  Why ram/2 rather than ram/3 or 17*ram/35?

Better behavior would be to try to allocate what the caller asked
for and if that doesn't work out, fail gracefully after freeing the
partial allocations which have been performed thus far.  If dma_buf
is changed to do this then that change is useful in many scenarios other
than this crazy corner case.
Jaewon Kim April 6, 2023, 1:44 a.m. UTC | #2
>On Thu,  6 Apr 2023 09:08:54 +0900 Jaewon Kim <jaewon31.kim@samsung.com> wrote:
>
>> Normal free:212600kB min:7664kB low:57100kB high:106536kB
>>   reserved_highatomic:4096KB active_anon:276kB inactive_anon:180kB
>>   active_file:1200kB inactive_file:0kB unevictable:2932kB
>>   writepending:0kB present:4109312kB managed:3689488kB mlocked:2932kB
>>   pagetables:13600kB bounce:0kB free_pcp:0kB local_pcp:0kB
>>   free_cma:200844kB
>> Out of memory and no killable processes...
>> Kernel panic - not syncing: System is deadlocked on memory
>> 
>> An OoM panic was reported; only native processes were running, all of
>> them non-killable because of OOM_SCORE_ADJ_MIN.
>> 
>> After looking into the dump, I found the dma-buf system heap was trying
>> to allocate a huge size, apparently a signed negative value.
>> 
>> dma_heap_ioctl_allocate(inline)
>>     |  heap_allocation = 0xFFFFFFC02247BD38 -> (
>>     |    len = 0xFFFFFFFFE7225100,
>> 
>> Actually, the old ion system heap had a policy that disallowed such huge
>> sizes, added by commit c9e8440eca61 ("staging: ion: Fix overflow and list
>> bugs in system heap"). We need this change again: a single allocation
>> should not be bigger than half of all memory.
>> 
>> ...
>>
>> --- a/drivers/dma-buf/heaps/system_heap.c
>> +++ b/drivers/dma-buf/heaps/system_heap.c
>> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>>  	struct page *page, *tmp_page;
>>  	int i, ret = -ENOMEM;
>>  
>> +	if (len / PAGE_SIZE > totalram_pages() / 2)
>> +		return ERR_PTR(-ENOMEM);
>> +
>
>This seems so random.  Why ram/2 rather than ram/3 or 17*ram/35?

Hello

Thank you for your comment.

I just took the change from the old ion driver code, and actually I thought
half of all memory is already unrealistic. The size could be unwanted, like
a negative value, or so big that it incurs slowness or an OoM panic.

>
>Better behavior would be to try to allocate what the caller asked
>for and if that doesn't work out, fail gracefully after freeing the
>partial allocations which have been performed thus far.  If dma_buf
>is changed to do this then that change is useful in many scenarios other
>than this crazy corner case.

I think you would like __GFP_RETRY_MAYFAIL. Actually T.J. Mercier recommended
it earlier; here's what we discussed.
https://lore.kernel.org/linux-mm/20230331005140epcms1p1ac5241f02f645e9dbc29626309a53b24@epcms1p1/

I was worried about a case in which we need an OoM kill to get more memory,
but let me change my mind; that case seems rare. I think it's time to decide
not to allow OoM kills for dma-buf system heap allocations.

But I still want to block huge sizes beyond RAM. For an unavailable size, I
think we don't have to reclaim memory or kill processes, and we can avoid
freezing the screen from the user's perspective.

This is eventually what I want. Can we check totalram_pages() and apply
__GFP_RETRY_MAYFAIL?

--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -41,7 +41,7 @@ struct dma_heap_attachment {
        bool mapped;
 };
 
-#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
+#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP | __GFP_RETRY_MAYFAIL)
 #define MID_ORDER_GFP (LOW_ORDER_GFP | __GFP_NOWARN)
 #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
                                | __GFP_NORETRY) & ~__GFP_RECLAIM) \
@@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
        struct page *page, *tmp_page;
        int i, ret = -ENOMEM;
 
+       if (len / PAGE_SIZE > totalram_pages())
+               return ERR_PTR(-ENOMEM);
+

BR
Jaewon Kim
Andrew Morton April 6, 2023, 1:56 a.m. UTC | #3
On Thu, 06 Apr 2023 10:44:19 +0900 Jaewon Kim <jaewon31.kim@samsung.com> wrote:

> >> ...
> >>
> >> --- a/drivers/dma-buf/heaps/system_heap.c
> >> +++ b/drivers/dma-buf/heaps/system_heap.c
> >> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
> >>  	struct page *page, *tmp_page;
> >>  	int i, ret = -ENOMEM;
> >>  
> >> +	if (len / PAGE_SIZE > totalram_pages() / 2)
> >> +		return ERR_PTR(-ENOMEM);
> >> +
> >
> >This seems so random.  Why ram/2 rather than ram/3 or 17*ram/35?
> 
> Hello
> 
> Thank you for your comment.
> 
> I just took the change from the old ion driver code, and actually I thought
> half of all memory is already unrealistic. The size could be unwanted, like
> a negative value, or so big that it incurs slowness or an OoM panic.
> 
> >
> >Better behavior would be to try to allocate what the caller asked
> >for and if that doesn't work out, fail gracefully after freeing the
> >partial allocations which have been performed thus far.  If dma_buf
> >is changed to do this then that change is useful in many scenarios other
> >than this crazy corner case.
> 
> I think you would like __GFP_RETRY_MAYFAIL. Actually T.J. Mercier recommended
> it earlier; here's what we discussed.
> https://lore.kernel.org/linux-mm/20230331005140epcms1p1ac5241f02f645e9dbc29626309a53b24@epcms1p1/
> 
> I was worried about a case in which we need an OoM kill to get more memory,
> but let me change my mind; that case seems rare. I think it's time to decide
> not to allow OoM kills for dma-buf system heap allocations.
> 
> But I still want to block huge sizes beyond RAM. For an unavailable size, I
> think we don't have to reclaim memory or kill processes, and we can avoid
> freezing the screen from the user's perspective.
> 
> This is eventually what I want. Can we check totalram_pages() and apply
> __GFP_RETRY_MAYFAIL?
> 
> --- a/drivers/dma-buf/heaps/system_heap.c
> +++ b/drivers/dma-buf/heaps/system_heap.c
> @@ -41,7 +41,7 @@ struct dma_heap_attachment {
>         bool mapped;
>  };
>  
> -#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
> +#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP | __GFP_RETRY_MAYFAIL)
>  #define MID_ORDER_GFP (LOW_ORDER_GFP | __GFP_NOWARN)
>  #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
>                                 | __GFP_NORETRY) & ~__GFP_RECLAIM) \
> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>         struct page *page, *tmp_page;
>         int i, ret = -ENOMEM;
>  
> +       if (len / PAGE_SIZE > totalram_pages())
> +               return ERR_PTR(-ENOMEM);

We're catering for a buggy caller here, aren't we?  Are such large
requests ever reasonable?

How about we decide what's the largest reasonable size and do a
WARN_ON(larger-than-that), so the buggy caller gets fixed?
Jaewon Kim April 6, 2023, 2:17 a.m. UTC | #4
>On Thu, 06 Apr 2023 10:44:19 +0900 Jaewon Kim <jaewon31.kim@samsung.com> wrote:
>
>> >> ...
>> >>
>> >> --- a/drivers/dma-buf/heaps/system_heap.c
>> >> +++ b/drivers/dma-buf/heaps/system_heap.c
>> >> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>> >>  	struct page *page, *tmp_page;
>> >>  	int i, ret = -ENOMEM;
>> >>  
>> >> +	if (len / PAGE_SIZE > totalram_pages() / 2)
>> >> +		return ERR_PTR(-ENOMEM);
>> >> +
>> >
>> >This seems so random.  Why ram/2 rather than ram/3 or 17*ram/35?
>> 
>> Hello
>> 
>> Thank you for your comment.
>> 
>> I just took the change from the old ion driver code, and actually I thought
>> half of all memory is already unrealistic. The size could be unwanted, like
>> a negative value, or so big that it incurs slowness or an OoM panic.
>> 
>> >
>> >Better behavior would be to try to allocate what the caller asked
>> >for and if that doesn't work out, fail gracefully after freeing the
>> >partial allocations which have been performed thus far.  If dma_buf
>> >is changed to do this then that change is useful in many scenarios other
>> >than this crazy corner case.
>> 
>> I think you would like __GFP_RETRY_MAYFAIL. Actually T.J. Mercier recommended
>> it earlier; here's what we discussed.
>> https://lore.kernel.org/linux-mm/20230331005140epcms1p1ac5241f02f645e9dbc29626309a53b24@epcms1p1/
>> 
>> I was worried about a case in which we need an OoM kill to get more memory,
>> but let me change my mind; that case seems rare. I think it's time to decide
>> not to allow OoM kills for dma-buf system heap allocations.
>> 
>> But I still want to block huge sizes beyond RAM. For an unavailable size, I
>> think we don't have to reclaim memory or kill processes, and we can avoid
>> freezing the screen from the user's perspective.
>> 
>> This is eventually what I want. Can we check totalram_pages() and apply
>> __GFP_RETRY_MAYFAIL?
>> 
>> --- a/drivers/dma-buf/heaps/system_heap.c
>> +++ b/drivers/dma-buf/heaps/system_heap.c
>> @@ -41,7 +41,7 @@ struct dma_heap_attachment {
>>         bool mapped;
>>  };
>>  
>> -#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP)
>> +#define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP | __GFP_RETRY_MAYFAIL)
>>  #define MID_ORDER_GFP (LOW_ORDER_GFP | __GFP_NOWARN)
>>  #define HIGH_ORDER_GFP  (((GFP_HIGHUSER | __GFP_ZERO | __GFP_NOWARN \
>>                                 | __GFP_NORETRY) & ~__GFP_RECLAIM) \
>> @@ -351,6 +351,9 @@ static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
>>         struct page *page, *tmp_page;
>>         int i, ret = -ENOMEM;
>>  
>> +       if (len / PAGE_SIZE > totalram_pages())
>> +               return ERR_PTR(-ENOMEM);
>
>We're catering for a buggy caller here, aren't we?  Are such large
>requests ever reasonable?
>
>How about we decide what's the largest reasonable size and do a
>WARN_ON(larger-than-that), so the buggy caller gets fixed?

Yes, we're considering a buggy caller. I thought even totalram_pages() / 2
in the old ion system was unreasonable; to avoid the /2, I changed the check
to totalram_pages().

Because userspace can request that size repeatedly, WARN_ON() might be hit
too often and fill the kernel log buffer.

Even if we use WARN_ON_ONCE rather than WARN_ON, the buggy point is not the
kernel layer. Unlike the page fault mechanism, the dma-buf system heap takes
its size from userspace, and it currently allows an unlimited size. We can't
fix buggy userspace with a kernel warning log, so I think a warning is not
enough and we need a safeguard in the kernel layer.
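
If a warning is still wanted without the log-spam risk, the check could be
paired with a ratelimited print instead. A rough sketch of what I mean (not
what the patch currently does; pr_warn_ratelimited() is the stock printk
helper):

  	if (len / PAGE_SIZE > totalram_pages()) {
  		pr_warn_ratelimited("system_heap: rejecting %lu byte allocation\n",
  				    len);
  		return ERR_PTR(-ENOMEM);
  	}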
Andrew Morton April 6, 2023, 3:09 a.m. UTC | #5
On Thu, 06 Apr 2023 11:17:12 +0900 Jaewon Kim <jaewon31.kim@samsung.com> wrote:

> >> +       if (len / PAGE_SIZE > totalram_pages())
> >> +               return ERR_PTR(-ENOMEM);
> >
> >We're catering for a buggy caller here, aren't we?  Are such large
> >requests ever reasonable?
> >
> >How about we decide what's the largest reasonable size and do a
> >WARN_ON(larger-than-that), so the buggy caller gets fixed?
> 
> Yes, we're considering a buggy caller. I thought even totalram_pages() / 2
> in the old ion system was unreasonable; to avoid the /2, I changed the check
> to totalram_pages().
> 
> Because userspace can request that size repeatedly, WARN_ON() might be hit
> too often and fill the kernel log buffer.

Oh geeze.  I trust that userspace needs elevated privileges of some form?

If so, then spamming dmesg isn't an issue - root can do much worse than
that.

> Even if we use WARN_ON_ONCE rather than WARN_ON, the buggy point is not the
> kernel layer. Unlike the page fault mechanism, the dma-buf system heap takes
> its size from userspace, and it currently allows an unlimited size. We can't
> fix buggy userspace with a kernel warning log, so I think a warning is not
> enough and we need a safeguard in the kernel layer.

I really dislike that ram/2 thing - it's so arbitrary, hence is surely
wrong for all cases.  Is there something more thoughtful we can do?

I mean, top priority here is to inform userspace that it's buggy so
that it gets fixed (assuming this requires elevated privileges).  And
userspace which requests (totalram_pages()/2 - 1) bytes is still buggy,
but we did nothing to get the bug fixed.
Jaewon Kim April 6, 2023, 3:46 a.m. UTC | #6
>On Thu, 06 Apr 2023 11:17:12 +0900 Jaewon Kim <jaewon31.kim@samsung.com> wrote:
>
>> >> +       if (len / PAGE_SIZE > totalram_pages())
>> >> +               return ERR_PTR(-ENOMEM);
>> >
>> >We're catering for a buggy caller here, aren't we?  Are such large
>> >requests ever reasonable?
>> >
>> >How about we decide what's the largest reasonable size and do a
>> >WARN_ON(larger-than-that), so the buggy caller gets fixed?
>> 
>> Yes, we're considering a buggy caller. I thought even totalram_pages() / 2
>> in the old ion system was unreasonable; to avoid the /2, I changed the check
>> to totalram_pages().
>> 
>> Because userspace can request that size repeatedly, WARN_ON() might be hit
>> too often and fill the kernel log buffer.
>
>Oh geeze.  I trust that userspace needs elevated privileges of some form?
>

I'm sorry, I don't know the whole Android dma-buf allocation process. AFAIK
Android apps do not allocate dma-buf memory themselves; they request it from
the Android system process named Allocator, and that Allocator seems to have
the privileges to use the dma-buf heap. So normal Android apps do NOT need
the privileges.

>If so, then spamming dmesg isn't an issue - root can do much worse than
>that.
>
>> Even if we use WARN_ON_ONCE rather than WARN_ON, the buggy point is not the
>> kernel layer. Unlike the page fault mechanism, the dma-buf system heap takes
>> its size from userspace, and it currently allows an unlimited size. We can't
>> fix buggy userspace with a kernel warning log, so I think a warning is not
>> enough and we need a safeguard in the kernel layer.
>
>I really dislike that ram/2 thing - it's so arbitrary, hence is surely
>wrong for all cases.  Is there something more thoughtful we can do?
>
>I mean, top priority here is to inform userspace that it's buggy so
>that it gets fixed (assuming this requires elevated privileges).  And
>userspace which requests (totalram_pages()/2 - 1) bytes is still buggy,
>but we did nothing to get the bug fixed.

That ram/2 does seem arbitrary, but I thought just ram, not ram/2, is
reasonable as a kernel-side safeguard even after the userspace process or a
common library checks for an abnormal size.

I thought the userspace context would get -ENOMEM either in the ram-check
case or in the __GFP_RETRY_MAYFAIL case, and -ENOMEM is enough to inform it.
The userspace may not fully recognize that it is buggy, though.

I'm not sure that every userspace context uses the Android libdmabufheap
library, or that there is no other route using the kernel dma-buf heap ioctl
directly. Anyway, do you think the size-checking code should be added to
libdmabufheap rather than the kernel side?
John Stultz April 6, 2023, 4:24 a.m. UTC | #7
On Wed, Apr 5, 2023 at 8:09 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> On Thu, 06 Apr 2023 11:17:12 +0900 Jaewon Kim <jaewon31.kim@samsung.com> wrote:
>
> > >> +       if (len / PAGE_SIZE > totalram_pages())
> > >> +               return ERR_PTR(-ENOMEM);
> > >
> > >We're catering for a buggy caller here, aren't we?  Are such large
> > >requests ever reasonable?
> > >
> > >How about we decide what's the largest reasonable size and do a
> > >WARN_ON(larger-than-that), so the buggy caller gets fixed?
> >
> > Yes, we're considering a buggy caller. I thought even totalram_pages() / 2
> > in the old ion system was unreasonable; to avoid the /2, I changed the check
> > to totalram_pages().
> >
> > Because userspace can request that size repeatedly, WARN_ON() might be hit
> > too often and fill the kernel log buffer.
>
> Oh geeze.  I trust that userspace needs elevated privileges of some form?
>
> If so, then spamming dmesg isn't an issue - root can do much worse than
> that.
>
> > Even if we use WARN_ON_ONCE rather than WARN_ON, the buggy point is not the
> > kernel layer. Unlike the page fault mechanism, the dma-buf system heap takes
> > its size from userspace, and it currently allows an unlimited size. We can't
> > fix buggy userspace with a kernel warning log, so I think a warning is not
> > enough and we need a safeguard in the kernel layer.
>
> I really dislike that ram/2 thing - it's so arbitrary, hence is surely
> wrong for all cases.  Is there something more thoughtful we can do?

Just for context, here's the old commit that added this to ION:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c9e8440eca61298ecccbb27f53036124a7a3c6c8

I think the consideration was that allocations larger than half of
memory are likely due to erroneously "negative" size values.

My memory of any discussions from that time is foggy, but I imagine the
thinking was to treat the value as if it were signed and error out
immediately on negative values; rather than just capping at 2GB on 32-bit
systems, one could scale that to half of the system memory size, as that
seemed an appropriate border of "obviously wrong".
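
To make that concrete, here's the boundary logic as I read it (my
paraphrase of the idea, not the actual ION code):

  static bool obviously_wrong(unsigned long len)
  {
          /* a "negative" size reinterpreted as unsigned always lands in the
           * top half of the value range, i.e. >= 2GB on a 32-bit system */
          if ((long)len < 0)
                  return true;
          /* ION generalized that border by scaling it with the RAM size */
          return len / PAGE_SIZE > totalram_pages() / 2;
  }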

And the reason I think folks wanted to avoid just warning and continuing
with the allocation is that these large allocations would bog the system
down badly before failing, so failing quickly was a benefit: the system was
still responsive and could be used to collect logs and debug the issue.

When you say "decide what's the largest reasonable size", I think it
is difficult as with the variety of RAM sizes and buffer sizes I don't
think there's a fixed limit. Systems with more ram will use larger
buffers for image/video capture buffers.  And yes, you're right that
ram/2-1 in a single allocation is just as broken, but I'm not sure how
to establish a better guard rail.

thanks
-john
T.J. Mercier April 6, 2023, 11:27 p.m. UTC | #8
On Wed, Apr 5, 2023 at 9:24 PM John Stultz <jstultz@google.com> wrote:
>
> On Wed, Apr 5, 2023 at 8:09 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Thu, 06 Apr 2023 11:17:12 +0900 Jaewon Kim <jaewon31.kim@samsung.com> wrote:
> >
> > > >> +       if (len / PAGE_SIZE > totalram_pages())
> > > >> +               return ERR_PTR(-ENOMEM);
> > > >
> > > >We're catering for a buggy caller here, aren't we?  Are such large
> > > >requests ever reasonable?
> > > >
> > > >How about we decide what's the largest reasonable size and do a
> > > >WARN_ON(larger-than-that), so the buggy caller gets fixed?
> > >
> > > Yes, we're considering a buggy caller. I thought even totalram_pages() / 2
> > > in the old ion system was unreasonable; to avoid the /2, I changed the check
> > > to totalram_pages().
> > >
> > > Because userspace can request that size repeatedly, WARN_ON() might be hit
> > > too often and fill the kernel log buffer.
> >
> > Oh geeze.  I trust that userspace needs elevated privileges of some form?
> >
> > If so, then spamming dmesg isn't an issue - root can do much worse than
> > that.
> >
> > > Even if we use WARN_ON_ONCE rather than WARN_ON, the buggy point is not the
> > > kernel layer. Unlike the page fault mechanism, the dma-buf system heap takes
> > > its size from userspace, and it currently allows an unlimited size. We can't
> > > fix buggy userspace with a kernel warning log, so I think a warning is not
> > > enough and we need a safeguard in the kernel layer.
> >
> > I really dislike that ram/2 thing - it's so arbitrary, hence is surely
> > wrong for all cases.  Is there something more thoughtful we can do?
>
> Just for context, here's the old commit that added this to ION:
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c9e8440eca61298ecccbb27f53036124a7a3c6c8
>
> I think the consideration was that allocations larger than half of
> memory are likely due to erroneously "negative" size values.
>
> My memory of any discussions from that time is foggy, but I imagine the
> thinking was to treat the value as if it were signed and error out
> immediately on negative values; rather than just capping at 2GB on 32-bit
> systems, one could scale that to half of the system memory size, as that
> seemed an appropriate border of "obviously wrong".
>
> And the reason I think folks wanted to avoid just warning and continuing
> with the allocation is that these large allocations would bog the system
> down badly before failing, so failing quickly was a benefit: the system was
> still responsive and could be used to collect logs and debug the issue.
>
> When you say "decide what's the largest reasonable size", I think it
> is difficult as with the variety of RAM sizes and buffer sizes I don't
> think there's a fixed limit. Systems with more ram will use larger
> buffers for image/video capture buffers.  And yes, you're right that
> ram/2-1 in a single allocation is just as broken, but I'm not sure how
> to establish a better guard rail.
>
> thanks
> -john

I like ENOMEM with the len / PAGE_SIZE > totalram_pages() check and
WARN_ON. We know for sure that's an invalid request, and it's pretty
cheap to check as opposed to trying a bunch of reclaim before failing.
For buffers smaller than that I agree with John in that I'm not sure
there's a definitive threshold.
Andrew Morton April 6, 2023, 11:38 p.m. UTC | #9
On Thu, 6 Apr 2023 16:27:28 -0700 "T.J. Mercier" <tjmercier@google.com> wrote:

> > When you say "decide what's the largest reasonable size", I think it
> > is difficult as with the variety of RAM sizes and buffer sizes I don't
> > think there's a fixed limit. Systems with more ram will use larger
> > buffers for image/video capture buffers.  And yes, you're right that
> > ram/2-1 in a single allocation is just as broken, but I'm not sure how
> > to establish a better guard rail.
> >
> > thanks
> > -john
> 
> I like ENOMEM with the len / PAGE_SIZE > totalram_pages() check and
> WARN_ON. We know for sure that's an invalid request, and it's pretty
> cheap to check as opposed to trying a bunch of reclaim before failing.

Well, if some buggy caller has gone and requested eleventy bigabytes of
memory, doing a lot of reclaiming before failing isn't really a problem
- we don't want to optimize for this case!

> For buffers smaller than that I agree with John in that I'm not sure
> there's a definitive threshold.

Well...  why do we want to do _anything_ here?  Why cater for buggy
callers?  I think it's because "dma-buf behaves really badly with very
large allocation requests".  Again, can we fix that instead?
T.J. Mercier April 7, 2023, midnight UTC | #10
On Thu, Apr 6, 2023 at 4:38 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Thu, 6 Apr 2023 16:27:28 -0700 "T.J. Mercier" <tjmercier@google.com> wrote:
>
> > > When you say "decide what's the largest reasonable size", I think it
> > > is difficult as with the variety of RAM sizes and buffer sizes I don't
> > > think there's a fixed limit. Systems with more ram will use larger
> > > buffers for image/video capture buffers.  And yes, you're right that
> > > ram/2-1 in a single allocation is just as broken, but I'm not sure how
> > > to establish a better guard rail.
> > >
> > > thanks
> > > -john
> >
> > I like ENOMEM with the len / PAGE_SIZE > totalram_pages() check and
> > WARN_ON. We know for sure that's an invalid request, and it's pretty
> > cheap to check as opposed to trying a bunch of reclaim before failing.
>
> Well, if some buggy caller has gone and requested eleventy bigabytes of
:)
> memory, doing a lot of reclaiming before failing isn't really a problem
> - we don't want to optimize for this case!
>
The issue I see is that it could delay other non-buggy callers, or
cause reclaim that wouldn't have happened if we just outright rejected
a known-bad allocation request from the beginning.

> > For buffers smaller than that I agree with John in that I'm not sure
> > there's a definitive threshold.
>
> Well...  why do we want to do _anything_ here?  Why cater for buggy
> callers?  I think it's because "dma-buf behaves really badly with very
> large allocation requests".  Again, can we fix that instead?
>
There are a variety of different allocation strategies used by
different exporters so I don't think there's one dma-buf thing we
could fix for slow, large allocations in general. For the system_heap
in this patch it's really just alloc_pages. I'm saying I don't think
the kernel should ever ask alloc_pages for more memory than exists on
the system, which seems like a pretty reasonable sanity check to me.
Given that, I don't think we should do anything for buffers smaller
than totalram_pages() (except maybe to prevent OOM panics via
__GFP_RETRY_MAYFAIL when we attempt to exhaust system memory on any
request - valid or otherwise).
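
For reference, the system heap's inner helper is roughly the following
(lightly simplified from drivers/dma-buf/heaps/system_heap.c). It walks
orders[] = {8, 4, 0}, and only the order-0 attempt can end up invoking the
OOM killer (costly orders already fail without it), which is exactly where
__GFP_RETRY_MAYFAIL would turn "OOM-kill, maybe panic" into "return NULL":

  static struct page *alloc_largest_available(unsigned long size,
                                              unsigned int max_order)
  {
          struct page *page;
          int i;

          for (i = 0; i < NUM_ORDERS; i++) {
                  if (size < (PAGE_SIZE << orders[i]))
                          continue;
                  if (max_order < orders[i])
                          continue;

                  /* order_flags[] maps to HIGH/MID/LOW_ORDER_GFP */
                  page = alloc_pages(order_flags[i], orders[i]);
                  if (!page)
                          continue;
                  return page;
          }
          return NULL;
  }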
Jaewon Kim April 7, 2023, 2:24 a.m. UTC | #11
>On Thu, Apr 6, 2023 at 4:38?PM Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>> On Thu, 6 Apr 2023 16:27:28 -0700 "T.J. Mercier" <tjmercier@google.com> wrote:
>>
>> > > When you say "decide what's the largest reasonable size", I think it
>> > > is difficult as with the variety of RAM sizes and buffer sizes I don't
>> > > think there's a fixed limit. Systems with more ram will use larger
>> > > buffers for image/video capture buffers.  And yes, you're right that
>> > > ram/2-1 in a single allocation is just as broken, but I'm not sure how
>> > > to establish a better guard rail.
>> > >
>> > > thanks
>> > > -john
>> >
>> > I like ENOMEM with the len / PAGE_SIZE > totalram_pages() check and
>> > WARN_ON. We know for sure that's an invalid request, and it's pretty
>> > cheap to check as opposed to trying a bunch of reclaim before failing.
>>
>> Well, if some buggy caller has gone and requested eleventy bigabytes of
>:)
>> memory, doing a lot of reclaiming before failing isn't really a problem
>> - we don't want to optimize for this case!
>>
>The issue I see is that it could delay other non-buggy callers, or
>cause reclaim that wouldn't have happened if we just outright rejected
>a known-bad allocation request from the beginning.
>
>> > For buffers smaller than that I agree with John in that I'm not sure
>> > there's a definitive threshold.
>>
>> Well...  why do we want to do _anything_ here?  Why cater for buggy
>> callers?  I think it's because "dma-buf behaves really badly with very
>> large allocation requests".  Again, can we fix that instead?
>>
>There are a variety of different allocation strategies used by
>different exporters so I don't think there's one dma-buf thing we
>could fix for slow, large allocations in general. For the system_heap
>in this patch it's really just alloc_pages. I'm saying I don't think
>the kernel should ever ask alloc_pages for more memory than exists on
>the system, which seems like a pretty reasonable sanity check to me.
>Given that, I don't think we should do anything for buffers smaller
>than totalram_pages() (except maybe to prevent OOM panics via
>__GFP_RETRY_MAYFAIL when we attempt to exhaust system memory on any
>request - valid or otherwise).

I think T. J. also agrees with me on what I shared.
  if (len / PAGE_SIZE > totalram_pages()) return ERR_PTR(-ENOMEM);
  #define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP | __GFP_RETRY_MAYFAIL)

Regarding the dma-buf behavior, I would also like to say that the dma-buf
system heap seems to be designed to allocate such large memory. In mobile
devices we need that large memory for camera buffers or graphics rendering
buffers. So large allocations should be allowed, but an invalid huge size
beyond RAM should be avoided.

I agree that mm should reclaim even for the large size, but that reclaim
process may affect system performance or user convenience. From that
perspective I thought ram / 2 was reasonable, but yes, it is not a golden
value. I hope to use just the RAM size as the sanity check.

Additionally if all agree, we may be able to apply __GFP_RETRY_MAYFAIL too.

BR
T.J. Mercier April 7, 2023, 5:12 a.m. UTC | #12
On Thu, Apr 6, 2023 at 7:24 PM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
>
> >On Thu, Apr 6, 2023 at 4:38?PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >>
> >> On Thu, 6 Apr 2023 16:27:28 -0700 "T.J. Mercier" <tjmercier@google.com> wrote:
> >>
> >> > > When you say "decide what's the largest reasonable size", I think it
> >> > > is difficult as with the variety of RAM sizes and buffer sizes I don't
> >> > > think there's a fixed limit. Systems with more ram will use larger
> >> > > buffers for image/video capture buffers.  And yes, you're right that
> >> > > ram/2-1 in a single allocation is just as broken, but I'm not sure how
> >> > > to establish a better guard rail.
> >> > >
> >> > > thanks
> >> > > -john
> >> >
> >> > I like ENOMEM with the len / PAGE_SIZE > totalram_pages() check and
> >> > WARN_ON. We know for sure that's an invalid request, and it's pretty
> >> > cheap to check as opposed to trying a bunch of reclaim before failing.
> >>
> >> Well, if some buggy caller has gone and requested eleventy bigabytes of
> >:)
> >> memory, doing a lot of reclaiming before failing isn't really a problem
> >> - we don't want to optimize for this case!
> >>
> >The issue I see is that it could delay other non-buggy callers, or
> >cause reclaim that wouldn't have happened if we just outright rejected
> >a known-bad allocation request from the beginning.
> >
> >> > For buffers smaller than that I agree with John in that I'm not sure
> >> > there's a definitive threshold.
> >>
> >> Well...  why do we want to do _anything_ here?  Why cater for buggy
> >> callers?  I think it's because "dma-buf behaves really badly with very
> >> large allocation requests".  Again, can we fix that instead?
> >>
> >There are a variety of different allocation strategies used by
> >different exporters so I don't think there's one dma-buf thing we
> >could fix for slow, large allocations in general. For the system_heap
> >in this patch it's really just alloc_pages. I'm saying I don't think
> >the kernel should ever ask alloc_pages for more memory than exists on
> >the system, which seems like a pretty reasonable sanity check to me.
> >Given that, I don't think we should do anything for buffers smaller
> >than totalram_pages() (except maybe to prevent OOM panics via
> >__GFP_RETRY_MAYFAIL when we attempt to exhaust system memory on any
> >request - valid or otherwise).
>
> I think T. J. also agrees with me on what I shared.
>   if (len / PAGE_SIZE > totalram_pages()) return ERR_PTR(-ENOMEM);
>   #define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP | __GFP_RETRY_MAYFAIL)
>
Oh yeah, sorry if that wasn't clear. I was referring to your updated
check for just totalram_pages() above, not totalram_pages() / 2.

> Regarding the dma-buf behavior, I would also like to say that the dma-buf
> system heap seems to be designed to allocate such large memory. In mobile
> devices we need that large memory for camera buffers or graphics rendering
> buffers. So large allocations should be allowed, but an invalid huge size
> beyond RAM should be avoided.
>
> I agree that mm should reclaim even for the large size, but that reclaim
> process may affect system performance or user convenience. From that
> perspective I thought ram / 2 was reasonable, but yes, it is not a golden
> value. I hope to use just the RAM size as the sanity check.
>
> Additionally if all agree, we may be able to apply __GFP_RETRY_MAYFAIL too.
>
> BR
Jaewon Kim April 7, 2023, 1:03 p.m. UTC | #13
>On Thu, Apr 6, 2023 at 7:24?PM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
>>
>> >On Thu, Apr 6, 2023 at 4:38?PM Andrew Morton <akpm@linux-foundation.org> wrote:
>> >>
>> >> On Thu, 6 Apr 2023 16:27:28 -0700 "T.J. Mercier" <tjmercier@google.com> wrote:
>> >>
>> >> > > When you say "decide what's the largest reasonable size", I think it
>> >> > > is difficult as with the variety of RAM sizes and buffer sizes I don't
>> >> > > think there's a fixed limit. Systems with more ram will use larger
>> >> > > buffers for image/video capture buffers.  And yes, you're right that
>> >> > > ram/2-1 in a single allocation is just as broken, but I'm not sure how
>> >> > > to establish a better guard rail.
>> >> > >
>> >> > > thanks
>> >> > > -john
>> >> >
>> >> > I like ENOMEM with the len / PAGE_SIZE > totalram_pages() check and
>> >> > WARN_ON. We know for sure that's an invalid request, and it's pretty
>> >> > cheap to check as opposed to trying a bunch of reclaim before failing.
>> >>
>> >> Well, if some buggy caller has gone and requested eleventy bigabytes of
>> >:)
>> >> memory, doing a lot of reclaiming before failing isn't really a problem
>> >> - we don't want to optimize for this case!
>> >>
>> >The issue I see is that it could delay other non-buggy callers, or
>> >cause reclaim that wouldn't have happened if we just outright rejected
>> >a known-bad allocation request from the beginning.
>> >
>> >> > For buffers smaller than that I agree with John in that I'm not sure
>> >> > there's a definitive threshold.
>> >>
>> >> Well...  why do we want to do _anything_ here?  Why cater for buggy
>> >> callers?  I think it's because "dma-buf behaves really badly with very
>> >> large allocation requests".  Again, can we fix that instead?
>> >>
>> >There are a variety of different allocation strategies used by
>> >different exporters so I don't think there's one dma-buf thing we
>> >could fix for slow, large allocations in general. For the system_heap
>> >in this patch it's really just alloc_pages. I'm saying I don't think
>> >the kernel should ever ask alloc_pages for more memory than exists on
>> >the system, which seems like a pretty reasonable sanity check to me.
>> >Given that, I don't think we should do anything for buffers smaller
>> >than totalram_pages() (except maybe to prevent OOM panics via
>> >__GFP_RETRY_MAYFAIL when we attempt to exhaust system memory on any
>> >request - valid or otherwise).
>>
>> I think T. J. also agrees with me on what I shared.
>>   if (len / PAGE_SIZE > totalram_pages()) return ERR_PTR(-ENOMEM);
>>   #define LOW_ORDER_GFP (GFP_HIGHUSER | __GFP_ZERO | __GFP_COMP | __GFP_RETRY_MAYFAIL)
>>
>Oh yeah, sorry if that wasn't clear. I was referring to your updated
>check for just totalram_pages() above, not totalram_pages() / 2.
>

Yes, I thought you meant that. Thank you.
If there is no objection, I will resend with that: totalram_pages() (not /2)
and __GFP_RETRY_MAYFAIL.

>> Regarding the dma-buf behavior, I would also like to say that the dma-buf
>> system heap seems to be designed to allocate such large memory. In mobile
>> devices we need that large memory for camera buffers or graphics rendering
>> buffers. So large allocations should be allowed, but an invalid huge size
>> beyond RAM should be avoided.
>>
>> I agree that mm should reclaim even for the large size, but that reclaim
>> process may affect system performance or user convenience. From that
>> perspective I thought ram / 2 was reasonable, but yes, it is not a golden
>> value. I hope to use just the RAM size as the sanity check.
>>
>> Additionally if all agree, we may be able to apply __GFP_RETRY_MAYFAIL too.
>>
>> BR

Patch

diff --git a/drivers/dma-buf/heaps/system_heap.c b/drivers/dma-buf/heaps/system_heap.c
index e8bd10e60998..4c1ef2ecfb0f 100644
--- a/drivers/dma-buf/heaps/system_heap.c
+++ b/drivers/dma-buf/heaps/system_heap.c
@@ -351,6 +351,9 @@  static struct dma_buf *system_heap_allocate(struct dma_heap *heap,
 	struct page *page, *tmp_page;
 	int i, ret = -ENOMEM;
 
+	if (len / PAGE_SIZE > totalram_pages() / 2)
+		return ERR_PTR(-ENOMEM);
+
 	buffer = kzalloc(sizeof(*buffer), GFP_KERNEL);
 	if (!buffer)
 		return ERR_PTR(-ENOMEM);