diff mbox

[4/4] arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX

Message ID 1430686172-18222-5-git-send-email-rric@kernel.org (mailing list archive)
State New, archived
Headers show

Commit Message

Robert Richter May 3, 2015, 8:49 p.m. UTC
From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>

In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
which is bigger than the allowed max order. So we are forcing it only in
case of 4KB page size.

Signed-off-by: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
Signed-off-by: Robert Richter <rrichter@cavium.com>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

Comments

Will Deacon May 5, 2015, 10:53 a.m. UTC | #1
On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> 
> In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> which is bigger than the allowed max order. So we are forcing it only in
> case of 4KB page size.

Does this problem disappear if the ITS driver uses dma_alloc_coherent
instead? That would also allow us to remove the __flush_dcache_area abuse
from the driver.

Will
Robert Richter May 11, 2015, 9:14 a.m. UTC | #2
On 05.05.15 11:53:29, Will Deacon wrote:
> On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> > 
> > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > which is bigger than the allowed max order. So we are forcing it only in
> > case of 4KB page size.
> 
> Does this problem disappear if the ITS driver uses dma_alloc_coherent
> instead? That would also allow us to remove the __flush_dcache_area abuse
> from the driver.

__get_free_pages() is also used internally in dma_alloc_coherent().

There is another case if the device brings dma mem with it. I am not
sure if it would be possible to assign some phys memory via devicetree
to the interrupt controller and then assign that range for its table
allocation.

Another option would be to allocate a hugepage. This would require
setting up hugepages during boottime. I need to figure out whether
that could work.

What about on the remaining 3 patches?

Thanks,

-Robert
Will Deacon May 12, 2015, 12:30 p.m. UTC | #3
On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> On 05.05.15 11:53:29, Will Deacon wrote:
> > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> > > 
> > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > which is bigger than the allowed max order. So we are forcing it only in
> > > case of 4KB page size.
> > 
> > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > instead? That would also allow us to remove the __flush_dcache_area abuse
> > from the driver.
> 
> __get_free_pages() is also used internally in dma_alloc_coherent().
> 
> There is another case if the device brings dma mem with it. I am not
> sure if it would be possible to assign some phys memory via devicetree
> to the interrupt controller and then assign that range for its table
> allocation.
> 
> Another option would be to allocate a hugepage. This would require
> setting up hugepages during boottime. I need to figure out whether
> that could work.
> 
> What about on the remaining 3 patches?

Marc would be the best guy to review those, but he's on holiday for a couple
of weeks at the moment.

Will
Robert Richter May 12, 2015, 4:20 p.m. UTC | #4
Will,

On 12.05.15 13:30:57, Will Deacon wrote:
> On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> > On 05.05.15 11:53:29, Will Deacon wrote:
> > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> > > > 
> > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > > which is bigger than the allowed max order. So we are forcing it only in
> > > > case of 4KB page size.
> > > 
> > > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > > instead? That would also allow us to remove the __flush_dcache_area abuse
> > > from the driver.
> > 
> > __get_free_pages() is also used internally in dma_alloc_coherent().
> > 
> > There is another case if the device brings dma mem with it. I am not
> > sure if it would be possible to assign some phys memory via devicetree
> > to the interrupt controller and then assign that range for its table
> > allocation.
> > 
> > Another option would be to allocate a hugepage. This would require
> > setting up hugepages during boottime. I need to figure out whether
> > that could work.

For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
default pagesize) I see this different approaches:

 * set FORCE_MAX_ZONEORDER to 13 as default,

 * set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set,

 * set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch),

 * use hugepages if enabled (defconfig has the following options
   enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might
   work with current default kernel without changing defconfig
   options),

 * use devicetree to reserve mem for gicv3 (need to check ACPI).

Do you see any direction?

> > What about on the remaining 3 patches?
> 
> Marc would be the best guy to review those, but he's on holiday for a couple
> of weeks at the moment.

Thanks for the note and your comments.

-Robert
Will Deacon May 12, 2015, 5:24 p.m. UTC | #5
On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> On 12.05.15 13:30:57, Will Deacon wrote:
> > On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> > > On 05.05.15 11:53:29, Will Deacon wrote:
> > > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> > > > > 
> > > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > > > which is bigger than the allowed max order. So we are forcing it only in
> > > > > case of 4KB page size.
> > > > 
> > > > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > > > instead? That would also allow us to remove the __flush_dcache_area abuse
> > > > from the driver.
> > > 
> > > __get_free_pages() is also used internally in dma_alloc_coherent().
> > > 
> > > There is another case if the device brings dma mem with it. I am not
> > > sure if it would be possible to assign some phys memory via devicetree
> > > to the interrupt controller and then assign that range for its table
> > > allocation.
> > > 
> > > Another option would be to allocate a hugepage. This would require
> > > setting up hugepages during boottime. I need to figure out whether
> > > that could work.
> 
> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> default pagesize) I see this different approaches:

16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
a sparse DeviceID space or both?

>  * set FORCE_MAX_ZONEORDER to 13 as default,
> 
>  * set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set,
> 
>  * set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch),

I'm not hugely fond of these suggestions, as there's still no guarantee
that such a huge allocation is going to succeed and we end up bumping
MAX_ORDER for all platforms in defconfig if we enable THUNDER there.

>  * use hugepages if enabled (defconfig has the following options
>    enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might
>    work with current default kernel without changing defconfig
>    options),

I don't think hugepages help with DMA.

>  * use devicetree to reserve mem for gicv3 (need to check ACPI).

Using a carveout like this might be the best bet. I assume the memory used
by the ITS can never be reclaimed by the syste (and therefore there's no
issue with wastage)?

> Do you see any direction?

Dunno, does CMA also require the MAX_ORDER bump?

Will
Robert Richter May 12, 2015, 5:46 p.m. UTC | #6
On 12.05.15 18:24:16, Will Deacon wrote:
> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > default pagesize) I see this different approaches:
> 
> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> a sparse DeviceID space or both?
> 
> >  * set FORCE_MAX_ZONEORDER to 13 as default,
> > 
> >  * set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set,
> > 
> >  * set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch),
> 
> I'm not hugely fond of these suggestions, as there's still no guarantee
> that such a huge allocation is going to succeed and we end up bumping
> MAX_ORDER for all platforms in defconfig if we enable THUNDER there.

I actually was expecting this...

> >  * use hugepages if enabled (defconfig has the following options
> >    enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might
> >    work with current default kernel without changing defconfig
> >    options),
> 
> I don't think hugepages help with DMA.
> 
> >  * use devicetree to reserve mem for gicv3 (need to check ACPI).

I am quite a bit concerned letting firmware handle this. But if that
would solve it, fine.

> Using a carveout like this might be the best bet. I assume the memory used
> by the ITS can never be reclaimed by the syste (and therefore there's no
> issue with wastage)?
> 
> > Do you see any direction?
> 
> Dunno, does CMA also require the MAX_ORDER bump?

Looks promising at the first glance. Will look into it.

Thanks,

-Robert
Marc Zyngier May 20, 2015, 12:22 p.m. UTC | #7
On Tue, 12 May 2015 18:24:16 +0100
Will Deacon <will.deacon@arm.com> wrote:

> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > On 12.05.15 13:30:57, Will Deacon wrote:
> > > On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote:
> > > > On 05.05.15 11:53:29, Will Deacon wrote:
> > > > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote:
> > > > > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com>
> > > > > > 
> > > > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
> > > > > > which is bigger than the allowed max order. So we are forcing it only in
> > > > > > case of 4KB page size.
> > > > > 
> > > > > Does this problem disappear if the ITS driver uses dma_alloc_coherent
> > > > > instead? That would also allow us to remove the __flush_dcache_area abuse
> > > > > from the driver.
> > > > 
> > > > __get_free_pages() is also used internally in dma_alloc_coherent().
> > > > 
> > > > There is another case if the device brings dma mem with it. I am not
> > > > sure if it would be possible to assign some phys memory via devicetree
> > > > to the interrupt controller and then assign that range for its table
> > > > allocation.
> > > > 
> > > > Another option would be to allocate a hugepage. This would require
> > > > setting up hugepages during boottime. I need to figure out whether
> > > > that could work.
> > 
> > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > default pagesize) I see this different approaches:
> 
> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> a sparse DeviceID space or both?

That's probably due to the sparseness of the DeviceID space. With some
form of bridge number encoded on top of the BFD number, the device
table is enormous, and I don't see a nice way to avoid it...

	M.
Robert Richter May 20, 2015, 12:31 p.m. UTC | #8
On 20.05.15 13:22:13, Marc Zyngier wrote:
> On Tue, 12 May 2015 18:24:16 +0100
> Will Deacon <will.deacon@arm.com> wrote:
> > On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > > On 12.05.15 13:30:57, Will Deacon wrote:

> > > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > > default pagesize) I see this different approaches:
> > 
> > 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> > a sparse DeviceID space or both?
> 
> That's probably due to the sparseness of the DeviceID space. With some
> form of bridge number encoded on top of the BFD number, the device
> table is enormous, and I don't see a nice way to avoid it...

Right. At the momement out of 21 bits (16MB) we currently have 2 spare
bits, which reduces the actually size used to 4MB. Though, for the
current cpu model we can reduce it at least to 8MB total.

I will come up with an additional patch setting this to 8MB.

As said before, I also write on a patch to use CMA.

Thanks,

-Robert
Catalin Marinas May 20, 2015, 4:48 p.m. UTC | #9
On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote:
> On 20.05.15 13:22:13, Marc Zyngier wrote:
> > On Tue, 12 May 2015 18:24:16 +0100
> > Will Deacon <will.deacon@arm.com> wrote:
> > > On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> > > > On 12.05.15 13:30:57, Will Deacon wrote:
> 
> > > > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> > > > default pagesize) I see this different approaches:
> > > 
> > > 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> > > a sparse DeviceID space or both?
> > 
> > That's probably due to the sparseness of the DeviceID space. With some
> > form of bridge number encoded on top of the BFD number, the device
> > table is enormous, and I don't see a nice way to avoid it...
> 
> Right. At the momement out of 21 bits (16MB) we currently have 2 spare
> bits, which reduces the actually size used to 4MB. Though, for the
> current cpu model we can reduce it at least to 8MB total.
> 
> I will come up with an additional patch setting this to 8MB.
> 
> As said before, I also write on a patch to use CMA.

Can we not reserve a chunk of memory and pass the information to the
kernel via DT (/memreserve/ and a new GIC-specific binding)?
Marc Zyngier May 21, 2015, 8:35 a.m. UTC | #10
On 20/05/15 17:48, Catalin Marinas wrote:
> On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote:
>> On 20.05.15 13:22:13, Marc Zyngier wrote:
>>> On Tue, 12 May 2015 18:24:16 +0100
>>> Will Deacon <will.deacon@arm.com> wrote:
>>>> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
>>>>> On 12.05.15 13:30:57, Will Deacon wrote:
>>
>>>>> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
>>>>> default pagesize) I see this different approaches:
>>>>
>>>> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
>>>> a sparse DeviceID space or both?
>>>
>>> That's probably due to the sparseness of the DeviceID space. With some
>>> form of bridge number encoded on top of the BFD number, the device
>>> table is enormous, and I don't see a nice way to avoid it...
>>
>> Right. At the momement out of 21 bits (16MB) we currently have 2 spare
>> bits, which reduces the actually size used to 4MB. Though, for the
>> current cpu model we can reduce it at least to 8MB total.
>>
>> I will come up with an additional patch setting this to 8MB.
>>
>> As said before, I also write on a patch to use CMA.
> 
> Can we not reserve a chunk of memory and pass the information to the
> kernel via DT (/memreserve/ and a new GIC-specific binding)?

That would have to be done on a per-table basis then. And how would that
work with ACPI? I don't think the ACPI ITS table specifies anything in
that respect.

We're just facing the horrible reality that linear tables are not very
well suited to sparse addressing. Nobody copied the VAX MMU model for a
reason... until now.

	M.
Robert Richter May 21, 2015, 12:13 p.m. UTC | #11
On 21.05.15 09:35:47, Marc Zyngier wrote:
> On 20/05/15 17:48, Catalin Marinas wrote:
> > On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote:
> >> On 20.05.15 13:22:13, Marc Zyngier wrote:
> >>> On Tue, 12 May 2015 18:24:16 +0100
> >>> Will Deacon <will.deacon@arm.com> wrote:
> >>>> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote:
> >>>>> On 12.05.15 13:30:57, Will Deacon wrote:
> >>
> >>>>> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB
> >>>>> default pagesize) I see this different approaches:
> >>>>
> >>>> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or
> >>>> a sparse DeviceID space or both?
> >>>
> >>> That's probably due to the sparseness of the DeviceID space. With some
> >>> form of bridge number encoded on top of the BFD number, the device
> >>> table is enormous, and I don't see a nice way to avoid it...
> >>
> >> Right. At the momement out of 21 bits (16MB) we currently have 2 spare
> >> bits, which reduces the actually size used to 4MB. Though, for the
> >> current cpu model we can reduce it at least to 8MB total.
> >>
> >> I will come up with an additional patch setting this to 8MB.
> >>
> >> As said before, I also write on a patch to use CMA.
> > 
> > Can we not reserve a chunk of memory and pass the information to the
> > kernel via DT (/memreserve/ and a new GIC-specific binding)?
> 
> That would have to be done on a per-table basis then. And how would that
> work with ACPI? I don't think the ACPI ITS table specifies anything in
> that respect.
> 
> We're just facing the horrible reality that linear tables are not very
> well suited to sparse addressing. Nobody copied the VAX MMU model for a
> reason... until now.

We could still fall back to mem alloc if the DT or ACPI does not
provide a base address for the table.

For know I would prefer to just implement mem allocation with CMA.

-Robert
diff mbox

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1b8e97331ffb..c519f3777453 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -542,6 +542,7 @@  config XEN
 config FORCE_MAX_ZONEORDER
 	int
 	default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE)
+	default "13" if (ARCH_THUNDER && !ARM64_64K_PAGES)
 	default "11"
 
 menuconfig ARMV8_DEPRECATED