diff mbox series

[v3,13/13] dma: arm64: Add CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC and enable it for arm64

Message ID 20221106220143.2129263-14-catalin.marinas@arm.com (mailing list archive)
State New
Headers show
Series mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 | expand

Commit Message

Catalin Marinas Nov. 6, 2022, 10:01 p.m. UTC
With all the infrastructure in place for bouncing small kmalloc()
buffers, add the corresponding Kconfig entry and select it for arm64.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Robin Murphy <robin.murphy@arm.com>
---
 arch/arm64/Kconfig | 1 +
 kernel/dma/Kconfig | 8 ++++++++
 2 files changed, 9 insertions(+)

Comments

Robin Murphy Nov. 7, 2022, 1:03 p.m. UTC | #1
On 2022-11-06 22:01, Catalin Marinas wrote:
> With all the infrastructure in place for bouncing small kmalloc()
> buffers, add the corresponding Kconfig entry and select it for arm64.

AFAICS we're missing the crucial part to ensure that SWIOTLB is 
available even when max_pfn <= arm64_dma_phys_limit, which is very 
likely to be true on low-memory systems that care most about kmalloc 
wastage. The only way to override that currently is with 
"swiotlb=force", but bouncing *everything* is not desirable either.

Thanks,
Robin.

> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Robin Murphy <robin.murphy@arm.com>
> ---
>   arch/arm64/Kconfig | 1 +
>   kernel/dma/Kconfig | 8 ++++++++
>   2 files changed, 9 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 3991cb7b8a33..f889cf16e6ab 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -100,6 +100,7 @@ config ARM64
>   	select ARCH_WANT_FRAME_POINTERS
>   	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
>   	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
> +	select ARCH_WANT_KMALLOC_DMA_BOUNCE
>   	select ARCH_WANT_LD_ORPHAN_WARN
>   	select ARCH_WANTS_NO_INSTR
>   	select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
> diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
> index d6fab8e3cbae..b56e76371023 100644
> --- a/kernel/dma/Kconfig
> +++ b/kernel/dma/Kconfig
> @@ -86,6 +86,14 @@ config SWIOTLB
>   	bool
>   	select NEED_DMA_MAP_STATE
>   
> +config ARCH_WANT_KMALLOC_DMA_BOUNCE
> +	bool
> +
> +config DMA_BOUNCE_UNALIGNED_KMALLOC
> +	def_bool y
> +	depends on ARCH_WANT_KMALLOC_DMA_BOUNCE
> +	depends on SWIOTLB && !SLOB
> +
>   config DMA_RESTRICTED_POOL
>   	bool "DMA Restricted Pool"
>   	depends on OF && OF_RESERVED_MEM && SWIOTLB
Christoph Hellwig Nov. 7, 2022, 2:38 p.m. UTC | #2
On Mon, Nov 07, 2022 at 01:03:31PM +0000, Robin Murphy wrote:
> On 2022-11-06 22:01, Catalin Marinas wrote:
>> With all the infrastructure in place for bouncing small kmalloc()
>> buffers, add the corresponding Kconfig entry and select it for arm64.
>
> AFAICS we're missing the crucial part to ensure that SWIOTLB is available 
> even when max_pfn <= arm64_dma_phys_limit, which is very likely to be true 
> on low-memory systems that care most about kmalloc wastage. The only way to 
> override that currently is with "swiotlb=force", but bouncing *everything* 
> is not desirable either.

FYI, one of the reasons for the swiotlb_init refactor that passes
flags and a boolean a while ago is that we can trivially just either
pass another flag or check a condition in swiotlb_init to allocate the
buffer.  There's actually another case for which we need the
unconditional allocation, and that is the bouncing for untrusted
external devices with dma-iommu.
Robin Murphy Nov. 7, 2022, 3:24 p.m. UTC | #3
On 2022-11-07 14:38, Christoph Hellwig wrote:
> On Mon, Nov 07, 2022 at 01:03:31PM +0000, Robin Murphy wrote:
>> On 2022-11-06 22:01, Catalin Marinas wrote:
>>> With all the infrastructure in place for bouncing small kmalloc()
>>> buffers, add the corresponding Kconfig entry and select it for arm64.
>>
>> AFAICS we're missing the crucial part to ensure that SWIOTLB is available
>> even when max_pfn <= arm64_dma_phys_limit, which is very likely to be true
>> on low-memory systems that care most about kmalloc wastage. The only way to
>> override that currently is with "swiotlb=force", but bouncing *everything*
>> is not desirable either.
> 
> FYI, one of the reasons for the swiotlb_init refactor that passes
> flags and a boolean a while ago is that we can trivially just either
> pass another flag or check a condition in swiotlb_init to allocate the
> buffer.  There's actually another case for which we need the
> unconditional allocation, and that is the bouncing for untrusted
> external devices with dma-iommu.

Right, I guess machines with Thunderbolt and all the firmware 
annotations but less than 4GB of RAM are unlikely to exist in the wild, 
so the untrusted bouncing logic has been getting lucky so far. There are 
however plenty of arm64 systems with small amounts of RAM and 
non-coherent USB so in this case someone's likely to fall over it pretty 
much right away. I know it's easy to add a new condition, but it still 
has to actually *be* added.

Thanks,
Robin.
Catalin Marinas Nov. 8, 2022, 9:52 a.m. UTC | #4
On Mon, Nov 07, 2022 at 01:03:31PM +0000, Robin Murphy wrote:
> On 2022-11-06 22:01, Catalin Marinas wrote:
> > With all the infrastructure in place for bouncing small kmalloc()
> > buffers, add the corresponding Kconfig entry and select it for arm64.
> 
> AFAICS we're missing the crucial part to ensure that SWIOTLB is available
> even when max_pfn <= arm64_dma_phys_limit, which is very likely to be true
> on low-memory systems that care most about kmalloc wastage.

This was a deliberate decision for this version. Patch 4 mitigates it a
bit by raising the kmalloc() minimum alignment to the cache line size
(typically 64). It's still an improvement from the current 128-byte
alignment.

Since it's hard to guess the optimal swiotlb buffer for such platforms,
I think a follow-up step would be to use the DMA coherent pool for
bouncing if no swiotlb buffer is available. At least the pool can grow
dynamically. Yet another option would be to increase the swiotlb buffer
at run-time but it has an overhead for is_swiotlb_buffer().
Christoph Hellwig Nov. 8, 2022, 10:03 a.m. UTC | #5
On Tue, Nov 08, 2022 at 09:52:15AM +0000, Catalin Marinas wrote:
> Since it's hard to guess the optimal swiotlb buffer for such platforms,
> I think a follow-up step would be to use the DMA coherent pool for
> bouncing if no swiotlb buffer is available. At least the pool can grow
> dynamically. Yet another option would be to increase the swiotlb buffer
> at run-time but it has an overhead for is_swiotlb_buffer().

Alex said he wanted to look into growing the swiotlb buffer on demand
for other reason, so adding him to Cc to check if there has been any
progress on that.
Isaac Manjarres Nov. 30, 2022, 6:48 p.m. UTC | #6
On Tue, Nov 08, 2022 at 11:03:31AM +0100, Christoph Hellwig wrote:
> On Tue, Nov 08, 2022 at 09:52:15AM +0000, Catalin Marinas wrote:
> > Since it's hard to guess the optimal swiotlb buffer for such platforms,
> > I think a follow-up step would be to use the DMA coherent pool for
> > bouncing if no swiotlb buffer is available. At least the pool can grow
> > dynamically. Yet another option would be to increase the swiotlb buffer
> > at run-time but it has an overhead for is_swiotlb_buffer().
> 
> Alex said he wanted to look into growing the swiotlb buffer on demand
> for other reason, so adding him to Cc to check if there has been any
> progress on that.
Hi Alex,

Did you get a chance to look into this? If so, have you been able to
make progress on being able to grow the SWIOTLB buffer on demand?

Thanks,
Isaac
Alexander Graf Nov. 30, 2022, 11:32 p.m. UTC | #7
Hi Isaac,

On 30.11.22 19:48, Isaac Manjarres wrote:
> On Tue, Nov 08, 2022 at 11:03:31AM +0100, Christoph Hellwig wrote:
>> On Tue, Nov 08, 2022 at 09:52:15AM +0000, Catalin Marinas wrote:
>>> Since it's hard to guess the optimal swiotlb buffer for such platforms,
>>> I think a follow-up step would be to use the DMA coherent pool for
>>> bouncing if no swiotlb buffer is available. At least the pool can grow
>>> dynamically. Yet another option would be to increase the swiotlb buffer
>>> at run-time but it has an overhead for is_swiotlb_buffer().
>> Alex said he wanted to look into growing the swiotlb buffer on demand
>> for other reason, so adding him to Cc to check if there has been any
>> progress on that.
> Hi Alex,
>
> Did you get a chance to look into this? If so, have you been able to
> make progress on being able to grow the SWIOTLB buffer on demand?


I've been slightly under water and haven't been able to look at this yet 
:). It's on my list, but will probably be a while until I get to it. 
Would you be interested in having a first try?


Thanks,

Alex
Petr Tesařík April 20, 2023, 11:51 a.m. UTC | #8
Hi Alex!

Nice to meet you again...

On Thu, 1 Dec 2022 00:32:07 +0100
Alexander Graf <agraf@csgraf.de> wrote:

> Hi Isaac,
> 
> On 30.11.22 19:48, Isaac Manjarres wrote:
> > On Tue, Nov 08, 2022 at 11:03:31AM +0100, Christoph Hellwig wrote:  
> >> On Tue, Nov 08, 2022 at 09:52:15AM +0000, Catalin Marinas wrote:  
> >>> Since it's hard to guess the optimal swiotlb buffer for such platforms,
> >>> I think a follow-up step would be to use the DMA coherent pool for
> >>> bouncing if no swiotlb buffer is available. At least the pool can grow
> >>> dynamically. Yet another option would be to increase the swiotlb buffer
> >>> at run-time but it has an overhead for is_swiotlb_buffer().  
> >> Alex said he wanted to look into growing the swiotlb buffer on demand
> >> for other reason, so adding him to Cc to check if there has been any
> >> progress on that.  
> > Hi Alex,
> >
> > Did you get a chance to look into this? If so, have you been able to
> > make progress on being able to grow the SWIOTLB buffer on demand?  
> 
> 
> I've been slightly under water and haven't been able to look at this yet 
> :). It's on my list, but will probably be a while until I get to it. 
> Would you be interested in having a first try?

All right, I have just found this thread now after having sent my own
patch series to make SWIOTLB dynamic. I hope you don't mind. I didn't
want to "steal" the project from you.

Petr T
diff mbox series

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3991cb7b8a33..f889cf16e6ab 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -100,6 +100,7 @@  config ARM64
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
 	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+	select ARCH_WANT_KMALLOC_DMA_BOUNCE
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_NO_INSTR
 	select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index d6fab8e3cbae..b56e76371023 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -86,6 +86,14 @@  config SWIOTLB
 	bool
 	select NEED_DMA_MAP_STATE
 
+config ARCH_WANT_KMALLOC_DMA_BOUNCE
+	bool
+
+config DMA_BOUNCE_UNALIGNED_KMALLOC
+	def_bool y
+	depends on ARCH_WANT_KMALLOC_DMA_BOUNCE
+	depends on SWIOTLB && !SLOB
+
 config DMA_RESTRICTED_POOL
 	bool "DMA Restricted Pool"
 	depends on OF && OF_RESERVED_MEM && SWIOTLB