mbox series

[0/3] Add global CMA reserve area

Message ID 20240613150902.2173582-1-devarsht@ti.com (mailing list archive)
Headers show
Series Add global CMA reserve area | expand

Message

Devarsh Thakkar June 13, 2024, 3:08 p.m. UTC
Add global CMA reserve area for AM62x, AM62A and AM62P SoCs.
These SoCs do not have MMU and hence require contiguous memory pool to
support various multimedia use-cases.

Brandon Brnich (1):
  arm64: dts: ti: k3-am62p5-sk: Reserve 576 MiB of global CMA

Devarsh Thakkar (2):
  arm64: dts: ti: k3-am62x-sk-common: Reserve 128MiB of global CMA
  arm64: dts: ti: k3-am62a7-sk: Reserve 576MiB of global CMA

 arch/arm64/boot/dts/ti/k3-am62a7-sk.dts        | 9 +++++++++
 arch/arm64/boot/dts/ti/k3-am62p5-sk.dts        | 7 +++++++
 arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi | 8 ++++++++
 3 files changed, 24 insertions(+)

Comments

Randolph Sapp June 14, 2024, 5:58 p.m. UTC | #1
On Thu Jun 13, 2024 at 10:08 AM CDT, Devarsh Thakkar wrote:
> Add global CMA reserve area for AM62x, AM62A and AM62P SoCs.
> These SoCs do not have MMU and hence require contiguous memory pool to
> support various multimedia use-cases.
>
> Brandon Brnich (1):
>   arm64: dts: ti: k3-am62p5-sk: Reserve 576 MiB of global CMA
>
> Devarsh Thakkar (2):
>   arm64: dts: ti: k3-am62x-sk-common: Reserve 128MiB of global CMA
>   arm64: dts: ti: k3-am62a7-sk: Reserve 576MiB of global CMA
>
>  arch/arm64/boot/dts/ti/k3-am62a7-sk.dts        | 9 +++++++++
>  arch/arm64/boot/dts/ti/k3-am62p5-sk.dts        | 7 +++++++
>  arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi | 8 ++++++++
>  3 files changed, 24 insertions(+)

I'm still a little torn about putting this allocation into the device tree
directly as the actual required allocation size depends on the task.

If it's allowed though, this series is fine for introducing those changes. This
uses the long-tested values we've been using on our tree for a bit now. The only
thing that's a little worrying is the missing range definitions for devices with
more than 32bits of addressable memory as Brandon has pointed out. Once that's
addressed:

Reviewed-by: Randolph Sapp <rs@ti.com>

Specifying these regions using the kernel cmdline parameter via u-boot was
brought up as a potential workaround. This is fine until you get into distro
boot methods which will almost certainly attempt to override those. I don't
know. Still a little odd. Curious how the community feels about it.

Technically the user or distro can still override it with the cmdline parameter
if necessary, so this may be the best way to have a useful default.
Andrew Davis June 24, 2024, 4:33 p.m. UTC | #2
On 6/14/24 12:58 PM, Randolph Sapp wrote:
> On Thu Jun 13, 2024 at 10:08 AM CDT, Devarsh Thakkar wrote:
>> Add global CMA reserve area for AM62x, AM62A and AM62P SoCs.
>> These SoCs do not have MMU and hence require contiguous memory pool to
>> support various multimedia use-cases.
>>
>> Brandon Brnich (1):
>>    arm64: dts: ti: k3-am62p5-sk: Reserve 576 MiB of global CMA
>>
>> Devarsh Thakkar (2):
>>    arm64: dts: ti: k3-am62x-sk-common: Reserve 128MiB of global CMA
>>    arm64: dts: ti: k3-am62a7-sk: Reserve 576MiB of global CMA
>>
>>   arch/arm64/boot/dts/ti/k3-am62a7-sk.dts        | 9 +++++++++
>>   arch/arm64/boot/dts/ti/k3-am62p5-sk.dts        | 7 +++++++
>>   arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi | 8 ++++++++
>>   3 files changed, 24 insertions(+)
> 
> I'm still a little torn about putting this allocation into the device tree
> directly as the actual required allocation size depends on the task.
> 

That is the exact reason this does not belong in DT. For everyone *not*
using the most extreme case (12x decodes at the same time), this is
all wasted space. If one is running out of CMA, they can add more on
the kernel cmdline.

> If it's allowed though, this series is fine for introducing those changes. This
> uses the long-tested values we've been using on our tree for a bit now. The only
> thing that's a little worrying is the missing range definitions for devices with
> more than 32bits of addressable memory as Brandon has pointed out. Once that's
> addressed:
> 
> Reviewed-by: Randolph Sapp <rs@ti.com>
> 
> Specifying these regions using the kernel cmdline parameter via u-boot was
> brought up as a potential workaround. This is fine until you get into distro
> boot methods which will almost certainly attempt to override those. I don't
> know. Still a little odd. Curious how the community feels about it.
> 
> Technically the user or distro can still override it with the cmdline parameter
> if necessary, so this may be the best way to have a useful default.
> 

The most useful default is one that doesn't eat 576 MiB of memory "just in case".
Needing that much CMA is the exception case and should be the one that requires
adding something to the kernel cmdline.

If the kernel cmdline option does not work in some cases then we should
fix that instead of hard-coding a workaround here in DT. We are robbing
ourselves of a better solution here.

Andrew
Devarsh Thakkar June 28, 2024, 3:57 p.m. UTC | #3
Hi Andrew, Vignesh,

On 24/06/24 22:03, Andrew Davis wrote:
> On 6/14/24 12:58 PM, Randolph Sapp wrote:
>> On Thu Jun 13, 2024 at 10:08 AM CDT, Devarsh Thakkar wrote:
>>> Add global CMA reserve area for AM62x, AM62A and AM62P SoCs.
>>> These SoCs do not have MMU and hence require contiguous memory pool to
>>> support various multimedia use-cases.
>>>
>>> Brandon Brnich (1):
>>>    arm64: dts: ti: k3-am62p5-sk: Reserve 576 MiB of global CMA
>>>
>>> Devarsh Thakkar (2):
>>>    arm64: dts: ti: k3-am62x-sk-common: Reserve 128MiB of global CMA
>>>    arm64: dts: ti: k3-am62a7-sk: Reserve 576MiB of global CMA
>>>
>>>   arch/arm64/boot/dts/ti/k3-am62a7-sk.dts        | 9 +++++++++
>>>   arch/arm64/boot/dts/ti/k3-am62p5-sk.dts        | 7 +++++++
>>>   arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi | 8 ++++++++
>>>   3 files changed, 24 insertions(+)
>>
>> I'm still a little torn about putting this allocation into the device tree
>> directly as the actual required allocation size depends on the task.
>>
> 
> That is the exact reason this does not belong in DT. For everyone *not*
> using the most extreme case (12x decodes at the same time), this is
> all wasted space. If one is running out of CMA, they can add more on
> the kernel cmdline.
> 

I disagree with this. The 12x decode for 480p is not an extreme use-case this
is something VPU is capable to run at optimum frame-rate (12x 1080p it can't)
and as the AM62A7 is meant to be AI + multimedia centric device, per the
device definition we were given the requirements to support a list of
multimedia use-cases which should work out of box and 12x decode for 480p was
one of them as device is very much capable of doing that with optimum
performance and I don't think it is right to change these requirements on the fly.

The AM62A7 board has 4 GiB of DDR and we have been using this CMA value since
more than a year, I have never heard anyone complain about out of memory or
CMA starvation and it suffices to requirements of *most use-cases*, but if for
some specific use-case it doesn't suffice, user can change it via kernel cmdline.

The kernelcmdline suggestion doesn't suffice out of box experience required,
we don't want to ask the user to reboot the board everytime they run out of CMA.


>> If it's allowed though, this series is fine for introducing those changes. This
>> uses the long-tested values we've been using on our tree for a bit now. The
>> only
>> thing that's a little worrying is the missing range definitions for devices
>> with
>> more than 32bits of addressable memory as Brandon has pointed out. Once that's
>> addressed:
>>
>> Reviewed-by: Randolph Sapp <rs@ti.com>
>>
>> Specifying these regions using the kernel cmdline parameter via u-boot was
>> brought up as a potential workaround. This is fine until you get into distro
>> boot methods which will almost certainly attempt to override those. I don't
>> know. Still a little odd. Curious how the community feels about it.
>>
>> Technically the user or distro can still override it with the cmdline parameter
>> if necessary, so this may be the best way to have a useful default.
>>
> 

Unlike above, this solution is independent of distro as it should be as we
want that all the supported multimedia use-cases should work out of box. This
solution is nothing illegal as CMA region carveouts are not a kernel
deprecated feature.

> The most useful default is one that doesn't eat 576 MiB of memory "just in case".
> Needing that much CMA is the exception case and should be the one that requires
> adding something to the kernel cmdline.
> 

I disagree with this, I would have agreed if this point was made in context of
generic device, but we are forgetting here that AM62A7 is a AI+multimedia
centric device and customers expect multimedia use-cases to work out of box.

We have 4 GiB RAM and if carving out 576 MiB is helping achieve all major
multimedia use-cases then what's the problem ? What exactly is failing for you
? If some specific scenarios are getting hurt then in that case overlays or
kernel cmdline option can be used to override the cma.

I feel we are over-complicating things here and going back-and-forth each
cycle even though there are no competing alternatives present today.
And this blocks out of box experience, as today even the basic HDMI and GPU
use-cases don't work out of box.

I had also discussed around this with community on last OSS summit as
discussed here [1] were it was suggested too to use this solution as adopted
by other vendors.

[1]: https://lore.kernel.org/all/0eee0424-f177-808f-3a86-499443155ddb@ti.com/

Regards
Devarsh
Randolph Sapp June 28, 2024, 4:35 p.m. UTC | #4
On Fri Jun 28, 2024 at 10:57 AM CDT, Devarsh Thakkar wrote:
> Hi Andrew, Vignesh,
>
> On 24/06/24 22:03, Andrew Davis wrote:
> > On 6/14/24 12:58 PM, Randolph Sapp wrote:
> >> On Thu Jun 13, 2024 at 10:08 AM CDT, Devarsh Thakkar wrote:
> >>> Add global CMA reserve area for AM62x, AM62A and AM62P SoCs.
> >>> These SoCs do not have MMU and hence require contiguous memory pool to
> >>> support various multimedia use-cases.
> >>>
> >>> Brandon Brnich (1):
> >>>    arm64: dts: ti: k3-am62p5-sk: Reserve 576 MiB of global CMA
> >>>
> >>> Devarsh Thakkar (2):
> >>>    arm64: dts: ti: k3-am62x-sk-common: Reserve 128MiB of global CMA
> >>>    arm64: dts: ti: k3-am62a7-sk: Reserve 576MiB of global CMA
> >>>
> >>>   arch/arm64/boot/dts/ti/k3-am62a7-sk.dts        | 9 +++++++++
> >>>   arch/arm64/boot/dts/ti/k3-am62p5-sk.dts        | 7 +++++++
> >>>   arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi | 8 ++++++++
> >>>   3 files changed, 24 insertions(+)
> >>
> >> I'm still a little torn about putting this allocation into the device tree
> >> directly as the actual required allocation size depends on the task.
> >>
> > 
> > That is the exact reason this does not belong in DT. For everyone *not*
> > using the most extreme case (12x decodes at the same time), this is
> > all wasted space. If one is running out of CMA, they can add more on
> > the kernel cmdline.
> > 
>
> I disagree with this. The 12x decode for 480p is not an extreme use-case this
> is something VPU is capable to run at optimum frame-rate (12x 1080p it can't)
> and as the AM62A7 is meant to be AI + multimedia centric device, per the
> device definition we were given the requirements to support a list of
> multimedia use-cases which should work out of box and 12x decode for 480p was
> one of them as device is very much capable of doing that with optimum
> performance and I don't think it is right to change these requirements on the fly.
>
> The AM62A7 board has 4 GiB of DDR and we have been using this CMA value since
> more than a year, I have never heard anyone complain about out of memory or
> CMA starvation and it suffices to requirements of *most use-cases*, but if for
> some specific use-case it doesn't suffice, user can change it via kernel cmdline.
>
> The kernelcmdline suggestion doesn't suffice out of box experience required,
> we don't want to ask the user to reboot the board everytime they run out of CMA.
>
>
> >> If it's allowed though, this series is fine for introducing those changes. This
> >> uses the long-tested values we've been using on our tree for a bit now. The
> >> only
> >> thing that's a little worrying is the missing range definitions for devices
> >> with
> >> more than 32bits of addressable memory as Brandon has pointed out. Once that's
> >> addressed:
> >>
> >> Reviewed-by: Randolph Sapp <rs@ti.com>
> >>
> >> Specifying these regions using the kernel cmdline parameter via u-boot was
> >> brought up as a potential workaround. This is fine until you get into distro
> >> boot methods which will almost certainly attempt to override those. I don't
> >> know. Still a little odd. Curious how the community feels about it.
> >>
> >> Technically the user or distro can still override it with the cmdline parameter
> >> if necessary, so this may be the best way to have a useful default.
> >>
> > 
>
> Unlike above, this solution is independent of distro as it should be as we
> want that all the supported multimedia use-cases should work out of box. This
> solution is nothing illegal as CMA region carveouts are not a kernel
> deprecated feature.

Right. I support this change for at least introducing a usable default. 32M of
CMA is barely enough to run glmark2 under Weston once everything's up and
running.

As I said before, the user or distro can still override the dt CMA block with
the cma kernel parameter if they aren't happy with the default block.
Unfortunately this is about the only way to have a usable default value to fall
back on.

> > The most useful default is one that doesn't eat 576 MiB of memory "just in case".
> > Needing that much CMA is the exception case and should be the one that requires
> > adding something to the kernel cmdline.
> > 
>
> I disagree with this, I would have agreed if this point was made in context of
> generic device, but we are forgetting here that AM62A7 is a AI+multimedia
> centric device and customers expect multimedia use-cases to work out of box.
>
> We have 4 GiB RAM and if carving out 576 MiB is helping achieve all major
> multimedia use-cases then what's the problem ? What exactly is failing for you
> ? If some specific scenarios are getting hurt then in that case overlays or
> kernel cmdline option can be used to override the cma.
>
> I feel we are over-complicating things here and going back-and-forth each
> cycle even though there are no competing alternatives present today.
> And this blocks out of box experience, as today even the basic HDMI and GPU
> use-cases don't work out of box.
>
> I had also discussed around this with community on last OSS summit as
> discussed here [1] were it was suggested too to use this solution as adopted
> by other vendors.
>
> [1]: https://lore.kernel.org/all/0eee0424-f177-808f-3a86-499443155ddb@ti.com/
>
> Regards
> Devarsh

If the community accepts it, it's fine by me.

- Randolph
Vignesh Raghavendra June 30, 2024, 7:09 a.m. UTC | #5
On 28/06/24 22:05, Randolph Sapp wrote:
> On Fri Jun 28, 2024 at 10:57 AM CDT, Devarsh Thakkar wrote:
>> Hi Andrew, Vignesh,
>>
>> On 24/06/24 22:03, Andrew Davis wrote:
>>> On 6/14/24 12:58 PM, Randolph Sapp wrote:
>>>> On Thu Jun 13, 2024 at 10:08 AM CDT, Devarsh Thakkar wrote:
>>>>> Add global CMA reserve area for AM62x, AM62A and AM62P SoCs.
>>>>> These SoCs do not have MMU and hence require contiguous memory pool to
>>>>> support various multimedia use-cases.
>>>>>
>>>>> Brandon Brnich (1):
>>>>>    arm64: dts: ti: k3-am62p5-sk: Reserve 576 MiB of global CMA
>>>>>
>>>>> Devarsh Thakkar (2):
>>>>>    arm64: dts: ti: k3-am62x-sk-common: Reserve 128MiB of global CMA
>>>>>    arm64: dts: ti: k3-am62a7-sk: Reserve 576MiB of global CMA
>>>>>
>>>>>   arch/arm64/boot/dts/ti/k3-am62a7-sk.dts        | 9 +++++++++
>>>>>   arch/arm64/boot/dts/ti/k3-am62p5-sk.dts        | 7 +++++++
>>>>>   arch/arm64/boot/dts/ti/k3-am62x-sk-common.dtsi | 8 ++++++++
>>>>>   3 files changed, 24 insertions(+)
>>>>
>>>> I'm still a little torn about putting this allocation into the device tree
>>>> directly as the actual required allocation size depends on the task.
>>>>
>>>
>>> That is the exact reason this does not belong in DT. For everyone *not*
>>> using the most extreme case (12x decodes at the same time), this is
>>> all wasted space. If one is running out of CMA, they can add more on
>>> the kernel cmdline.
>>>
>>
>> I disagree with this. The 12x decode for 480p is not an extreme use-case this
>> is something VPU is capable to run at optimum frame-rate (12x 1080p it can't)
>> and as the AM62A7 is meant to be AI + multimedia centric device, per the
>> device definition we were given the requirements to support a list of
>> multimedia use-cases which should work out of box and 12x decode for 480p was
>> one of them as device is very much capable of doing that with optimum
>> performance and I don't think it is right to change these requirements on the fly.
>>
>> The AM62A7 board has 4 GiB of DDR and we have been using this CMA value since
>> more than a year, I have never heard anyone complain about out of memory or
>> CMA starvation and it suffices to requirements of *most use-cases*, but if for
>> some specific use-case it doesn't suffice, user can change it via kernel cmdline.
>>
>> The kernelcmdline suggestion doesn't suffice out of box experience required,
>> we don't want to ask the user to reboot the board everytime they run out of CMA.
>>
>>
>>>> If it's allowed though, this series is fine for introducing those changes. This
>>>> uses the long-tested values we've been using on our tree for a bit now. The
>>>> only
>>>> thing that's a little worrying is the missing range definitions for devices
>>>> with
>>>> more than 32bits of addressable memory as Brandon has pointed out. Once that's
>>>> addressed:
>>>>
>>>> Reviewed-by: Randolph Sapp <rs@ti.com>
>>>>
>>>> Specifying these regions using the kernel cmdline parameter via u-boot was
>>>> brought up as a potential workaround. This is fine until you get into distro
>>>> boot methods which will almost certainly attempt to override those. I don't
>>>> know. Still a little odd. Curious how the community feels about it.
>>>>
>>>> Technically the user or distro can still override it with the cmdline parameter
>>>> if necessary, so this may be the best way to have a useful default.
>>>>
>>>
>>
>> Unlike above, this solution is independent of distro as it should be as we
>> want that all the supported multimedia use-cases should work out of box. This
>> solution is nothing illegal as CMA region carveouts are not a kernel
>> deprecated feature.
> 
> Right. I support this change for at least introducing a usable default. 32M of
> CMA is barely enough to run glmark2 under Weston once everything's up and
> running.
> 
> As I said before, the user or distro can still override the dt CMA block with
> the cma kernel parameter if they aren't happy with the default block.
> Unfortunately this is about the only way to have a usable default value to fall
> back on.
> 


Given the number of SoMs and non TI EVMs that are about to come out with
AM62A/P and AM67s, we need to provide a consistent way of being able to
support multimedia IPs out of the box. Modifying cmdline may not always
be feasible given distro defaults don't always provide a way to do so.

So I am inclined to queue first 2 patches unless there is another way t
achieve this.

[...]