Message ID: 20221106220143.2129263-1-catalin.marinas@arm.com (mailing list archive)
Series: mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8
On Sun, Nov 06, 2022 at 10:01:30PM +0000, Catalin Marinas wrote:
> Patches 7-12 change some ARCH_KMALLOC_MINALIGN uses to
> ARCH_DMA_MINALIGN. The crypto changes have been rejected by Herbert
> previously but I still included them here until the crypto code is
> refactored.

Hi Catalin,

Herbert merged the changes to the crypto code that were required to
safely lower the minimum alignment for kmalloc in [1].

Given this, I don't think there's anything blocking this series from
being merged. The requirement for SWIOTLB to get the minimum kmalloc
alignment down to 8 bytes shouldn't prevent this series from being
merged, as the amount of memory allocated for SWIOTLB can be configured
through the command line to minimize the impact of having SWIOTLB
memory. Additionally, even if no SWIOTLB is present, this series still
offers memory savings on a lot of ARM64 platforms by using the cache
line size as the minimum alignment for kmalloc.

Can you please rebase this series so that it can be merged?

Thanks,
Isaac

[1]: https://lore.kernel.org/all/Y4nDL50nToBbi4DS@gondor.apana.org.au/
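To make the memory-savings argument concrete, here is a small
standalone C program (illustrative only, not kernel code; the helper is
invented, and the cache sizes mirror the kmalloc-* caches discussed in
this thread). It shows which kmalloc caches can exist for a given
minimum alignment, and therefore which cache a small allocation lands
in:

#include <stdio.h>

/* Candidate kmalloc-* cache sizes; a cache can only exist when its
 * size is a multiple of the minimum kmalloc alignment. */
static const int cache_sizes[] = { 8, 16, 32, 64, 96, 128, 192, 256 };

/* Hypothetical helper: smallest usable cache for 'size' bytes */
static int kmalloc_slab_size(int size, int minalign)
{
	for (unsigned long i = 0;
	     i < sizeof(cache_sizes) / sizeof(cache_sizes[0]); i++) {
		int c = cache_sizes[i];
		if (c % minalign == 0 && c >= size)
			return c;
	}
	return -1;	/* larger caches are irrelevant for this demo */
}

int main(void)
{
	const int sizes[]  = { 8, 24, 56, 96 };	/* typical small objects */
	const int aligns[] = { 128, 64, 8 };	/* minalign candidates */

	for (int a = 0; a < 3; a++) {
		printf("minalign=%3d:", aligns[a]);
		for (int s = 0; s < 4; s++)
			printf("  kmalloc(%2d)->%3d", sizes[s],
			       kmalloc_slab_size(sizes[s], aligns[a]));
		printf("\n");
	}
	return 0;
}

With a 128-byte minimum alignment every small allocation consumes 128
bytes; at 64 bytes the waste roughly halves, and at 8 bytes it mostly
disappears, which is where the slab footprint savings come from.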
On Thu, Mar 16, 2023 at 11:38:47AM -0700, Isaac Manjarres wrote:
> [...]
> Herbert merged the changes to the crypto code that were required to
> safely lower the minimum alignment for kmalloc in [1].

Yes, I saw this.

> Given this, I don't think there's anything blocking this series from
> being merged. The requirement for SWIOTLB to get the minimum kmalloc
> alignment down to 8 bytes shouldn't prevent this series from being
> merged, as the amount of memory allocated for SWIOTLB can be
> configured through the command line to minimize the impact of having
> SWIOTLB memory. Additionally, even if no SWIOTLB is present, this
> series still offers memory savings on a lot of ARM64 platforms by
> using the cache line size as the minimum alignment for kmalloc.

Actually, there's some progress on the swiotlb front to allow dynamic
allocation. I haven't reviewed the series yet (I wasn't aware of it
until v2) but at a quick look, it limits the dynamic allocation to
bounce buffers of at least a page size. Maybe this can later be
improved for buffers below ARCH_DMA_MINALIGN.

https://lore.kernel.org/r/cover.1681898595.git.petr.tesarik.ext@huawei.com

> Can you please rebase this series so that it can be merged?

I rebased it locally but the last stumbling block is sorting out the
iommu bouncing. I was hoping Robin Murphy could lend a hand but he's
been busy with other bits. I'll repost the series at 6.4-rc1.
On 4/19/2023 6:06 PM, Catalin Marinas wrote:
> On Thu, Mar 16, 2023 at 11:38:47AM -0700, Isaac Manjarres wrote:
> [...]
>
> Actually, there's some progress on the swiotlb front to allow dynamic
> allocation. I haven't reviewed the series yet (I wasn't aware of it
> until v2) but at a quick look, it limits the dynamic allocation to
> bounce buffers of at least a page size. Maybe this can later be
> improved for buffers below ARCH_DMA_MINALIGN.

Indeed. My patch allocates dynamic bounce buffers with
dma_direct_alloc_pages() to keep things simple for now, but there is no
real reason against allocating less than a page with another suitable
allocator.

However, I'd be interested in what the use case is, so I can assess the
performance impact, which depends on the workload; FYI, it may not even
be negative. ;-)

Petr Tesarik
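A minimal standalone sketch of the allocator split being discussed (the
names and the sub-page path are invented for illustration;
dma_direct_alloc_pages() is only mimicked here with plain
aligned_alloc(), and real code would also have to honour the device's
DMA masks):

#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE	4096UL
#define DMA_MINALIGN	128UL	/* stand-in for ARCH_DMA_MINALIGN */

/* Stand-in for the page-based path: page-aligned, page-granular */
static void *alloc_bounce_pages(size_t size)
{
	return aligned_alloc(PAGE_SIZE,
			     (size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1));
}

/* Hypothetical sub-page path: cache-line alignment is sufficient */
static void *alloc_bounce_small(size_t size)
{
	return aligned_alloc(DMA_MINALIGN,
			     (size + DMA_MINALIGN - 1) & ~(DMA_MINALIGN - 1));
}

static void *alloc_dynamic_bounce(size_t size)
{
	/* The posted patch only takes the page-based path; the small
	 * path is the possible later improvement mentioned above. */
	if (size >= PAGE_SIZE)
		return alloc_bounce_pages(size);
	return alloc_bounce_small(size);
}

int main(void)
{
	void *a = alloc_dynamic_bounce(32);	/* 128-byte aligned */
	void *b = alloc_dynamic_bounce(8192);	/* page aligned */

	printf("small=%p large=%p\n", a, b);
	free(a);
	free(b);
	return 0;
}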
On Thu, Apr 20, 2023 at 11:52:00AM +0200, Petr Tesarik wrote:
> Indeed. My patch allocates dynamic bounce buffers with
> dma_direct_alloc_pages() to keep things simple for now, but there is
> no real reason against allocating less than a page with another
> suitable allocator.

I guess it could fall back to a suitably aligned kmalloc() for smaller
sizes.

> However, I'd be interested in what the use case is, so I can assess
> the performance impact, which depends on the workload; FYI, it may
> not even be negative. ;-)

On arm64 we have an ARCH_DMA_MINALIGN of 128 bytes, as that's the
largest cache line size you can find on a non-coherent platform. The
implication is that ARCH_KMALLOC_MINALIGN is also 128, so the smaller
kmalloc-{8,16,32,64,96,192} caches cannot be created, leading to some
memory wastage. This series decouples the two static alignments so that
we can have an ARCH_KMALLOC_MINALIGN of 8 while keeping
ARCH_DMA_MINALIGN at 128.

The problem is that some drivers do a small kmalloc() (below the cache
line size; typically USB drivers) and expect DMA to such a buffer to
work. If the cache line is shared with some unrelated data, either the
cache maintenance in the DMA API corrupts that data or the cache
dirtying overrides inbound DMA data.

So, the solution is to bounce such small buffers if they end up in
functions like dma_map_single(). All we need is for the bounce buffer
to be aligned to the cache line size and to honour the coherent mask
(normally fine with one of the GFP_DMA/DMA32 flags if required). The
swiotlb buffer would solve this, but there are some (mobile) platforms
where the vendor disables the bounce buffer to save memory. Having a
way to dynamically allocate it in those rare cases would be helpful.
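The bounce criterion itself is simple to state. Below is a standalone
sketch of the decision (the function name is invented; the series
integrates this check into the DMA mapping path, and the real kernel
code derives the threshold from the runtime cache line size rather than
a constant): a buffer must be bounced whenever it does not occupy whole
cache lines, because cache maintenance on a partially owned line would
clobber whatever else lives in it.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE_SIZE	128UL	/* stand-in for cache_line_size() */

/* Hypothetical check: does DMA to [addr, addr + size) need bouncing? */
static bool dma_needs_bounce(uintptr_t addr, size_t size)
{
	/* Starting or ending in the middle of a cache line means the
	 * line may be shared with unrelated data. */
	return (addr % CACHE_LINE_SIZE) != 0 ||
	       (size % CACHE_LINE_SIZE) != 0;
}

int main(void)
{
	/* With ARCH_KMALLOC_MINALIGN == 8, kmalloc(24) may return an
	 * address that is only 8-byte aligned: */
	printf("kmalloc(24) @0x1008: bounce=%d\n",
	       dma_needs_bounce(0x1008, 24));
	/* A 256-byte, 128-byte-aligned buffer owns its cache lines: */
	printf("kmalloc(256) @0x2000: bounce=%d\n",
	       dma_needs_bounce(0x2000, 256));
	return 0;
}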
On Wed, Apr 19, 2023 at 05:06:04PM +0100, Catalin Marinas wrote:
> I rebased it locally but the last stumbling block is sorting out the
> iommu bouncing. I was hoping Robin Murphy could lend a hand but he's
> been busy with other bits. I'll repost the series at 6.4-rc1.

Hey Catalin, just following up on this. I think it might be worthwhile
to split this series in two:

Series 1: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN, and
use the cache line size to determine the minimum kmalloc alignment.

Series 2: Lower the minimum kmalloc alignment to 8 bytes by adding
support for using SWIOTLB to bounce unaligned kmalloc buffers for DMA
transactions.

Dividing the patches this way has the advantage of lowering the minimum
kmalloc alignment to 64 bytes on many ARM64 systems while the work on
lowering it to 8 bytes proceeds. This provides a noticeable decrease in
the slab memory footprint (e.g. I observed a 15 MB decrease in slab
usage on a device I was using).

What are your thoughts on this?

--Isaac
On Mon, May 15, 2023 at 12:09:12PM -0700, Isaac Manjarres wrote:
> Hey Catalin, just following up on this. I think it might be worthwhile
> to split this series in two:
>
> Series 1: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN, and
> use the cache line size to determine the minimum kmalloc alignment.
>
> Series 2: Lower the minimum kmalloc alignment to 8 bytes by adding
> support for using SWIOTLB to bounce unaligned kmalloc buffers for DMA
> transactions.
>
> [...]

I attempted "series 1" some time ago and the discussion led to the
combined approach (i.e. don't bother limiting the minimum kmalloc
alignment to cache_line_size(), but instead bounce those small
buffers). In my series, I still have this fallback in case there's no
swiotlb buffer.

I'll post a new series this week (including DMA bouncing), but I'll try
to move the bouncing towards the end of the series in case there are
more discussions around it, so that at least the first part can be
picked up.
On Tue, May 16, 2023 at 06:19:43PM +0100, Catalin Marinas wrote:
> I attempted "series 1" some time ago and the discussion led to the
> combined approach (i.e. don't bother limiting the minimum kmalloc
> alignment to cache_line_size(), but instead bounce those small
> buffers). In my series, I still have this fallback in case there's no
> swiotlb buffer.
>
> I'll post a new series this week (including DMA bouncing), but I'll
> try to move the bouncing towards the end of the series in case there
> are more discussions around it, so that at least the first part can
> be picked up.

Thanks, Catalin! I think restructuring the series as you're suggesting
makes sense. Being able to pick up at least the first part of the
series would be great, since it will have a positive impact on the
memory footprint of a lot of devices. It also helps alleviate some of
the memory overhead for devices that move from a 32-bit ARM kernel to a
64-bit ARM kernel. The second part can continue to be refined until the
SWIOTLB and IOMMU bounce buffering refactor is complete.

-Isaac