Message ID | 20230531154836.1366225-1-catalin.marinas@arm.com (mailing list archive) |
---|---|
Series | mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 |
On Wed, May 31, 2023 at 8:48 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> Here's version 6 of the series reducing the kmalloc() minimum alignment
> on arm64 to 8 (from 128). There are patches already to do the same for
> riscv (pretty straight-forward after this series).

Thanks, Catalin for getting these patches out. Please add my "Tested-by:" tag
for the series:

Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>

With the first 11 patches, I observed a reduction of 18.4 MB in the slab
memory footprint on my Pixel 6 device. After applying the rest of the
patches in the series, I observed a total reduction of 26.5 MB in the
slab memory footprint on my device. These are great results!

--Isaac
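For context on where those savings come from: with the old arm64 value of
ARCH_KMALLOC_MINALIGN == 128, the smallest kmalloc cache is kmalloc-128, so
every small allocation occupies at least 128 bytes; with the minimum
alignment reduced to 8, the same objects can be served from much smaller
caches. A minimal illustrative sketch follows (the 24-byte request and the
cache names it maps to are examples chosen here, not taken from the thread):

#include <linux/slab.h>

/*
 * Illustrative only. With ARCH_KMALLOC_MINALIGN == 128 (the old arm64
 * value), KMALLOC_MIN_SIZE is 128, so this 24-byte request is served from
 * the kmalloc-128 cache. With the minimum alignment reduced to 8, the same
 * request fits in kmalloc-32, which is where the slab footprint reductions
 * reported above come from.
 */
static void *alloc_small_object(void)
{
	return kmalloc(24, GFP_KERNEL);	/* old: kmalloc-128 slot, new: kmalloc-32 slot */
}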
On Thu, 8 Jun 2023 at 07:45, Isaac Manjarres <isaacmanjarres@google.com> wrote:
>
> On Wed, May 31, 2023 at 8:48 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > Here's version 6 of the series reducing the kmalloc() minimum alignment
> > on arm64 to 8 (from 128). There are patches already to do the same for
> > riscv (pretty straight-forward after this series).
> Thanks, Catalin for getting these patches out. Please add my "Tested-by:" tag
> for the series:
>
> Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
>
> With the first 11 patches, I observed a reduction of 18.4 MB in the slab
> memory footprint on my Pixel 6 device. After applying the rest of the
> patches in the series, I observed a total reduction of 26.5 MB in the
> slab memory footprint on my device. These are great results!
>

It would also be good to get an insight into how much bouncing is
going on in this case, given that (AFAIK) Pixel 6 uses non-cache
coherent DMA.
On Thu, Jun 08, 2023 at 10:05:58AM +0200, Ard Biesheuvel wrote:
> On Thu, 8 Jun 2023 at 07:45, Isaac Manjarres <isaacmanjarres@google.com> wrote:
> >
> > On Wed, May 31, 2023 at 8:48 AM Catalin Marinas <catalin.marinas@arm.com> wrote:
> > > Here's version 6 of the series reducing the kmalloc() minimum alignment
> > > on arm64 to 8 (from 128). There are patches already to do the same for
> > > riscv (pretty straight-forward after this series).
> > Thanks, Catalin for getting these patches out. Please add my "Tested-by:" tag
> > for the series:
> >
> > Tested-by: Isaac J. Manjarres <isaacmanjarres@google.com>
> >
> > With the first 11 patches, I observed a reduction of 18.4 MB in the slab
> > memory footprint on my Pixel 6 device. After applying the rest of the
> > patches in the series, I observed a total reduction of 26.5 MB in the
> > slab memory footprint on my device. These are great results!
> >
>
> It would also be good to get an insight into how much bouncing is
> going on in this case, given that (AFAIK) Pixel 6 uses non-cache
> coherent DMA.

I enabled the "swiotlb_bounced" trace event from the kernel command line
to see if anything was being bounced. It turns out that for Pixel 6 there
are non-coherent DMA transfers occurring, but none of the transfers that
are in either the DMA_FROM_DEVICE or DMA_BIDIRECTIONAL directions are
small enough to require bouncing.

--Isaac

P.S. I noticed that the trace_swiotlb_bounced() tracepoint may not be
invoked even though bouncing occurs. For example, in the dma-iommu path,
swiotlb_tbl_map_single() is called when bouncing, instead of
swiotlb_map(), which is what ends up calling trace_swiotlb_bounced().

Would it make sense to move the call to trace_swiotlb_bounced() to
swiotlb_tbl_map_single() since that function is always invoked?
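For readers less familiar with the swiotlb plumbing, the asymmetry described
in the P.S. looks roughly like the sketch below. The function and file names
are the real ones (kernel/dma/swiotlb.c and drivers/iommu/dma-iommu.c, as of
roughly v6.4), but the bodies are abridged paraphrases, not verbatim kernel
code. The swiotlb_bounced event itself can typically be enabled with
something like trace_event=swiotlb:swiotlb_bounced on the kernel command
line, or at runtime via /sys/kernel/tracing/events/swiotlb/swiotlb_bounced/enable.

/* kernel/dma/swiotlb.c (abridged sketch): the dma-direct bounce path.
 * The tracepoint fires here, just before the bounce mapping is set up. */
dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
		       enum dma_data_direction dir, unsigned long attrs)
{
	trace_swiotlb_bounced(dev, phys_to_dma(dev, paddr), size);
	/* ... */
	swiotlb_addr = swiotlb_tbl_map_single(dev, paddr, size, /* ... */);
	/* ... map and return the DMA address for swiotlb_addr ... */
}

/* drivers/iommu/dma-iommu.c (abridged sketch): the dma-iommu bounce path.
 * It calls swiotlb_tbl_map_single() directly, never swiotlb_map(), so no
 * swiotlb_bounced event is emitted even though the buffer is bounced. */
static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
				     unsigned long offset, size_t size,
				     enum dma_data_direction dir,
				     unsigned long attrs)
{
	/* ... */
	if (dev_use_swiotlb(dev) /* && the buffer needs bouncing */) {
		phys = swiotlb_tbl_map_single(dev, phys, size, /* ... */);
		/* no tracepoint on this path */
	}
	/* ... */
}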
On Thu, 8 Jun 2023 14:29:45 -0700
Isaac Manjarres <isaacmanjarres@google.com> wrote:

>[...]
> P.S. I noticed that the trace_swiotlb_bounced() tracepoint may not be
> invoked even though bouncing occurs. For example, in the dma-iommu path,
> swiotlb_tbl_map_single() is called when bouncing, instead of
> swiotlb_map(), which is what ends up calling trace_swiotlb_bounced().
>
> Would it make sense to move the call to trace_swiotlb_bounced() to
> swiotlb_tbl_map_single() since that function is always invoked?

Definitely, if you ask me. I believe the change was merely forgotten in
commit eb605a5754d0 ("swiotlb: add swiotlb_tbl_map_single library
function"). Let me take the author into Cc. Plus Konrad, who built
further on that commit, may also have an opinion.

Petr T
On Mon, Jun 12, 2023 at 09:47:55AM +0200, Christoph Hellwig wrote:
> On Mon, Jun 12, 2023 at 07:44:46AM +0000, Tomonori Fujita wrote:
> > I cannot recall the patch but from quick look, moving trace_swiotlb_bounced() to
> > swiotlb_tbl_map_single() makes sense.
>
> Agreed.

There are actually two call sites for trace_swiotlb_bounced():
swiotlb_map() and xen_swiotlb_map_page(). Both of those functions also
invoke swiotlb_tbl_map_single(), so moving the call to
trace_swiotlb_bounced() into swiotlb_tbl_map_single() means that there
will be 2 traces per bounce buffering event.

The difference between the two call sites of trace_swiotlb_bounced() is
that the call in swiotlb_map() uses phys_to_dma() for the device address,
while xen_swiotlb_map_page() uses xen_phys_to_dma().

Would it make sense to move the trace_swiotlb_bounced() call to
swiotlb_tbl_map_single() and then introduce a
swiotlb_tbl_map_single_notrace() function which doesn't do the tracing,
and xen_swiotlb_map_page() can call this?

--Isaac
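To make that last suggestion concrete, one possible shape is sketched below.
This is purely illustrative rather than an actual patch:
swiotlb_tbl_map_single_notrace() is the hypothetical helper proposed above,
and the argument list only approximates the v6.4 prototype of
swiotlb_tbl_map_single(). Under this scheme swiotlb_map() would drop its own
trace call, while xen_swiotlb_map_page() would switch to the _notrace
variant and keep its xen_phys_to_dma()-based trace.

/* Hypothetical sketch of the proposal, not an actual patch. */

/* Variant without the tracepoint, for callers such as the Xen swiotlb
 * code that want to emit their own event with a bus-specific address
 * (xen_phys_to_dma() instead of phys_to_dma()). */
phys_addr_t swiotlb_tbl_map_single_notrace(struct device *dev,
		phys_addr_t orig_addr, size_t mapping_size, size_t alloc_size,
		unsigned int alloc_align_mask, enum dma_data_direction dir,
		unsigned long attrs)
{
	/* ... existing swiotlb_tbl_map_single() body ... */
}

phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
		size_t mapping_size, size_t alloc_size,
		unsigned int alloc_align_mask, enum dma_data_direction dir,
		unsigned long attrs)
{
	/* Trace once here, so every bounce is visible regardless of whether
	 * the caller is dma-direct, dma-iommu or anything else. */
	trace_swiotlb_bounced(dev, phys_to_dma(dev, orig_addr), mapping_size);
	return swiotlb_tbl_map_single_notrace(dev, orig_addr, mapping_size,
			alloc_size, alloc_align_mask, dir, attrs);
}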