Message ID | 1430686172-18222-5-git-send-email-rric@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote: > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com> > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table > which is bigger than the allowed max order. So we are forcing it only in > case of 4KB page size. Does this problem disappear if the ITS driver uses dma_alloc_coherent instead? That would also allow us to remove the __flush_dcache_area abuse from the driver. Will
On 05.05.15 11:53:29, Will Deacon wrote: > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote: > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com> > > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table > > which is bigger than the allowed max order. So we are forcing it only in > > case of 4KB page size. > > Does this problem disappear if the ITS driver uses dma_alloc_coherent > instead? That would also allow us to remove the __flush_dcache_area abuse > from the driver. __get_free_pages() is also used internally in dma_alloc_coherent(). There is another case if the device brings dma mem with it. I am not sure if it would be possible to assign some phys memory via devicetree to the interrupt controller and then assign that range for its table allocation. Another option would be to allocate a hugepage. This would require setting up hugepages during boottime. I need to figure out whether that could work. What about on the remaining 3 patches? Thanks, -Robert
On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote: > On 05.05.15 11:53:29, Will Deacon wrote: > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote: > > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com> > > > > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table > > > which is bigger than the allowed max order. So we are forcing it only in > > > case of 4KB page size. > > > > Does this problem disappear if the ITS driver uses dma_alloc_coherent > > instead? That would also allow us to remove the __flush_dcache_area abuse > > from the driver. > > __get_free_pages() is also used internally in dma_alloc_coherent(). > > There is another case if the device brings dma mem with it. I am not > sure if it would be possible to assign some phys memory via devicetree > to the interrupt controller and then assign that range for its table > allocation. > > Another option would be to allocate a hugepage. This would require > setting up hugepages during boottime. I need to figure out whether > that could work. > > What about on the remaining 3 patches? Marc would be the best guy to review those, but he's on holiday for a couple of weeks at the moment. Will
Will, On 12.05.15 13:30:57, Will Deacon wrote: > On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote: > > On 05.05.15 11:53:29, Will Deacon wrote: > > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote: > > > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com> > > > > > > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table > > > > which is bigger than the allowed max order. So we are forcing it only in > > > > case of 4KB page size. > > > > > > Does this problem disappear if the ITS driver uses dma_alloc_coherent > > > instead? That would also allow us to remove the __flush_dcache_area abuse > > > from the driver. > > > > __get_free_pages() is also used internally in dma_alloc_coherent(). > > > > There is another case if the device brings dma mem with it. I am not > > sure if it would be possible to assign some phys memory via devicetree > > to the interrupt controller and then assign that range for its table > > allocation. > > > > Another option would be to allocate a hugepage. This would require > > setting up hugepages during boottime. I need to figure out whether > > that could work. For allocation of 16MB cont. phys mem of a defconfig kernel (4KB default pagesize) I see this different approaches: * set FORCE_MAX_ZONEORDER to 13 as default, * set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set, * set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch), * use hugepages if enabled (defconfig has the following options enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might work with current default kernel without changing defconfig options), * use devicetree to reserve mem for gicv3 (need to check ACPI). Do you see any direction? > > What about on the remaining 3 patches? > > Marc would be the best guy to review those, but he's on holiday for a couple > of weeks at the moment. Thanks for the note and your comments. -Robert
On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote: > On 12.05.15 13:30:57, Will Deacon wrote: > > On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote: > > > On 05.05.15 11:53:29, Will Deacon wrote: > > > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote: > > > > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com> > > > > > > > > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table > > > > > which is bigger than the allowed max order. So we are forcing it only in > > > > > case of 4KB page size. > > > > > > > > Does this problem disappear if the ITS driver uses dma_alloc_coherent > > > > instead? That would also allow us to remove the __flush_dcache_area abuse > > > > from the driver. > > > > > > __get_free_pages() is also used internally in dma_alloc_coherent(). > > > > > > There is another case if the device brings dma mem with it. I am not > > > sure if it would be possible to assign some phys memory via devicetree > > > to the interrupt controller and then assign that range for its table > > > allocation. > > > > > > Another option would be to allocate a hugepage. This would require > > > setting up hugepages during boottime. I need to figure out whether > > > that could work. > > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB > default pagesize) I see this different approaches: 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or a sparse DeviceID space or both? > * set FORCE_MAX_ZONEORDER to 13 as default, > > * set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set, > > * set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch), I'm not hugely fond of these suggestions, as there's still no guarantee that such a huge allocation is going to succeed and we end up bumping MAX_ORDER for all platforms in defconfig if we enable THUNDER there. > * use hugepages if enabled (defconfig has the following options > enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might > work with current default kernel without changing defconfig > options), I don't think hugepages help with DMA. > * use devicetree to reserve mem for gicv3 (need to check ACPI). Using a carveout like this might be the best bet. I assume the memory used by the ITS can never be reclaimed by the syste (and therefore there's no issue with wastage)? > Do you see any direction? Dunno, does CMA also require the MAX_ORDER bump? Will
On 12.05.15 18:24:16, Will Deacon wrote: > On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote: > > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB > > default pagesize) I see this different approaches: > > 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or > a sparse DeviceID space or both? > > > * set FORCE_MAX_ZONEORDER to 13 as default, > > > > * set FORCE_MAX_ZONEORDER to 13 if ARM_GIC_V3 is set, > > > > * set FORCE_MAX_ZONEORDER to 13 if ARCH_THUNDER is set (this patch), > > I'm not hugely fond of these suggestions, as there's still no guarantee > that such a huge allocation is going to succeed and we end up bumping > MAX_ORDER for all platforms in defconfig if we enable THUNDER there. I actually was expecting this... > > * use hugepages if enabled (defconfig has the following options > > enable: CGROUP_HUGETLB, TRANSPARENT_HUGEPAGE, HUGETLBFS, this might > > work with current default kernel without changing defconfig > > options), > > I don't think hugepages help with DMA. > > > * use devicetree to reserve mem for gicv3 (need to check ACPI). I am quite a bit concerned letting firmware handle this. But if that would solve it, fine. > Using a carveout like this might be the best bet. I assume the memory used > by the ITS can never be reclaimed by the syste (and therefore there's no > issue with wastage)? > > > Do you see any direction? > > Dunno, does CMA also require the MAX_ORDER bump? Looks promising at the first glance. Will look into it. Thanks, -Robert
On Tue, 12 May 2015 18:24:16 +0100 Will Deacon <will.deacon@arm.com> wrote: > On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote: > > On 12.05.15 13:30:57, Will Deacon wrote: > > > On Mon, May 11, 2015 at 10:14:38AM +0100, Robert Richter wrote: > > > > On 05.05.15 11:53:29, Will Deacon wrote: > > > > > On Sun, May 03, 2015 at 09:49:32PM +0100, Robert Richter wrote: > > > > > > From: Radha Mohan Chintakuntla <rchintakuntla@cavium.com> > > > > > > > > > > > > In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table > > > > > > which is bigger than the allowed max order. So we are forcing it only in > > > > > > case of 4KB page size. > > > > > > > > > > Does this problem disappear if the ITS driver uses dma_alloc_coherent > > > > > instead? That would also allow us to remove the __flush_dcache_area abuse > > > > > from the driver. > > > > > > > > __get_free_pages() is also used internally in dma_alloc_coherent(). > > > > > > > > There is another case if the device brings dma mem with it. I am not > > > > sure if it would be possible to assign some phys memory via devicetree > > > > to the interrupt controller and then assign that range for its table > > > > allocation. > > > > > > > > Another option would be to allocate a hugepage. This would require > > > > setting up hugepages during boottime. I need to figure out whether > > > > that could work. > > > > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB > > default pagesize) I see this different approaches: > > 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or > a sparse DeviceID space or both? That's probably due to the sparseness of the DeviceID space. With some form of bridge number encoded on top of the BFD number, the device table is enormous, and I don't see a nice way to avoid it... M.
On 20.05.15 13:22:13, Marc Zyngier wrote: > On Tue, 12 May 2015 18:24:16 +0100 > Will Deacon <will.deacon@arm.com> wrote: > > On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote: > > > On 12.05.15 13:30:57, Will Deacon wrote: > > > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB > > > default pagesize) I see this different approaches: > > > > 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or > > a sparse DeviceID space or both? > > That's probably due to the sparseness of the DeviceID space. With some > form of bridge number encoded on top of the BFD number, the device > table is enormous, and I don't see a nice way to avoid it... Right. At the momement out of 21 bits (16MB) we currently have 2 spare bits, which reduces the actually size used to 4MB. Though, for the current cpu model we can reduce it at least to 8MB total. I will come up with an additional patch setting this to 8MB. As said before, I also write on a patch to use CMA. Thanks, -Robert
On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote: > On 20.05.15 13:22:13, Marc Zyngier wrote: > > On Tue, 12 May 2015 18:24:16 +0100 > > Will Deacon <will.deacon@arm.com> wrote: > > > On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote: > > > > On 12.05.15 13:30:57, Will Deacon wrote: > > > > > For allocation of 16MB cont. phys mem of a defconfig kernel (4KB > > > > default pagesize) I see this different approaches: > > > > > > 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or > > > a sparse DeviceID space or both? > > > > That's probably due to the sparseness of the DeviceID space. With some > > form of bridge number encoded on top of the BFD number, the device > > table is enormous, and I don't see a nice way to avoid it... > > Right. At the momement out of 21 bits (16MB) we currently have 2 spare > bits, which reduces the actually size used to 4MB. Though, for the > current cpu model we can reduce it at least to 8MB total. > > I will come up with an additional patch setting this to 8MB. > > As said before, I also write on a patch to use CMA. Can we not reserve a chunk of memory and pass the information to the kernel via DT (/memreserve/ and a new GIC-specific binding)?
On 20/05/15 17:48, Catalin Marinas wrote: > On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote: >> On 20.05.15 13:22:13, Marc Zyngier wrote: >>> On Tue, 12 May 2015 18:24:16 +0100 >>> Will Deacon <will.deacon@arm.com> wrote: >>>> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote: >>>>> On 12.05.15 13:30:57, Will Deacon wrote: >> >>>>> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB >>>>> default pagesize) I see this different approaches: >>>> >>>> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or >>>> a sparse DeviceID space or both? >>> >>> That's probably due to the sparseness of the DeviceID space. With some >>> form of bridge number encoded on top of the BFD number, the device >>> table is enormous, and I don't see a nice way to avoid it... >> >> Right. At the momement out of 21 bits (16MB) we currently have 2 spare >> bits, which reduces the actually size used to 4MB. Though, for the >> current cpu model we can reduce it at least to 8MB total. >> >> I will come up with an additional patch setting this to 8MB. >> >> As said before, I also write on a patch to use CMA. > > Can we not reserve a chunk of memory and pass the information to the > kernel via DT (/memreserve/ and a new GIC-specific binding)? That would have to be done on a per-table basis then. And how would that work with ACPI? I don't think the ACPI ITS table specifies anything in that respect. We're just facing the horrible reality that linear tables are not very well suited to sparse addressing. Nobody copied the VAX MMU model for a reason... until now. M.
On 21.05.15 09:35:47, Marc Zyngier wrote: > On 20/05/15 17:48, Catalin Marinas wrote: > > On Wed, May 20, 2015 at 02:31:59PM +0200, Robert Richter wrote: > >> On 20.05.15 13:22:13, Marc Zyngier wrote: > >>> On Tue, 12 May 2015 18:24:16 +0100 > >>> Will Deacon <will.deacon@arm.com> wrote: > >>>> On Tue, May 12, 2015 at 05:20:49PM +0100, Robert Richter wrote: > >>>>> On 12.05.15 13:30:57, Will Deacon wrote: > >> > >>>>> For allocation of 16MB cont. phys mem of a defconfig kernel (4KB > >>>>> default pagesize) I see this different approaches: > >>>> > >>>> 16MB sounds like an awful lot. Is this because you have tonnes of MSIs or > >>>> a sparse DeviceID space or both? > >>> > >>> That's probably due to the sparseness of the DeviceID space. With some > >>> form of bridge number encoded on top of the BFD number, the device > >>> table is enormous, and I don't see a nice way to avoid it... > >> > >> Right. At the momement out of 21 bits (16MB) we currently have 2 spare > >> bits, which reduces the actually size used to 4MB. Though, for the > >> current cpu model we can reduce it at least to 8MB total. > >> > >> I will come up with an additional patch setting this to 8MB. > >> > >> As said before, I also write on a patch to use CMA. > > > > Can we not reserve a chunk of memory and pass the information to the > > kernel via DT (/memreserve/ and a new GIC-specific binding)? > > That would have to be done on a per-table basis then. And how would that > work with ACPI? I don't think the ACPI ITS table specifies anything in > that respect. > > We're just facing the horrible reality that linear tables are not very > well suited to sparse addressing. Nobody copied the VAX MMU model for a > reason... until now. We could still fall back to mem alloc if the DT or ACPI does not provide a base address for the table. For know I would prefer to just implement mem allocation with CMA. -Robert
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 1b8e97331ffb..c519f3777453 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -542,6 +542,7 @@ config XEN config FORCE_MAX_ZONEORDER int default "14" if (ARM64_64K_PAGES && TRANSPARENT_HUGEPAGE) + default "13" if (ARCH_THUNDER && !ARM64_64K_PAGES) default "11" menuconfig ARMV8_DEPRECATED