Message ID | 20230612153201.554742-1-catalin.marinas@arm.com (mailing list archive) |
---|---|
Headers | show |
Series | mm, dma, arm64: Reduce ARCH_KMALLOC_MINALIGN to 8 | expand |
Hi Catalin, On Wed, 5 Jul 2023 at 18:17, Catalin Marinas <catalin.marinas@arm.com> wrote: > > Hi, > > That's v7 of the series reducing the kmalloc() minimum alignment on > arm64 to 8 (from 128). There's no new/different functionality, mostly > cosmetic changes and acks/tested-bys. > > Andrew, if there are no further comments or objections to this version, > are you ok to take the series through the mm tree? The arm64 changes are > fairly small. Alternatively, I can push it into linux-next now to give > it some wider exposure and decide whether to upstream it when the > merging window opens. Thanks. This patch series broke Dragonboard 845c (SDM845) running AOSP. With this series I run into random oops at __kmem_cache_alloc_node(). Here is one such boot log https://bugs.linaro.org/attachment.cgi?id=1146 Reverting "arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64" fixes it for the time being. Regards, Amit Pundir > > The updated patches are also available on this branch: > > git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux devel/kmalloc-minalign > > Changes since v6 > (https://lore.kernel.org/r/20230531154836.1366225-1-catalin.marinas@arm.com): > > - Moved ARCH_DMA_MINALIGN to include/linux/cache.h from slab.h (it > matches the asm/cache.h where most architectures defining this macro > place it). > > - Single body for __kmalloc_minalign() with an #ifdef. > > - Renamed the sg_dma_use_swiotlb() to sg_dma_is_swiotlb() (Robin's > suggestion). > > - Added acks/tested-by. > > Catalin Marinas (15): > mm/slab: Decouple ARCH_KMALLOC_MINALIGN from ARCH_DMA_MINALIGN > dma: Allow dma_get_cache_alignment() to be overridden by the arch code > mm/slab: Simplify create_kmalloc_cache() args and make it static > mm/slab: Limit kmalloc() minimum alignment to > dma_get_cache_alignment() > drivers/base: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN > drivers/gpu: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN > drivers/usb: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN > drivers/spi: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN > dm-crypt: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN > iio: core: Use ARCH_DMA_MINALIGN instead of ARCH_KMALLOC_MINALIGN > arm64: Allow kmalloc() caches aligned to the smaller cache_line_size() > dma-mapping: Force bouncing if the kmalloc() size is not > cache-line-aligned > iommu/dma: Force bouncing if the size is not cacheline-aligned > mm: slab: Reduce the kmalloc() minimum alignment if DMA bouncing > possible > arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64 > > Robin Murphy (2): > scatterlist: Add dedicated config for DMA flags > dma-mapping: Name SG DMA flag helpers consistently > > arch/arm64/Kconfig | 1 + > arch/arm64/include/asm/cache.h | 3 ++ > arch/arm64/mm/init.c | 7 +++- > drivers/base/devres.c | 6 ++-- > drivers/gpu/drm/drm_managed.c | 6 ++-- > drivers/iommu/Kconfig | 1 + > drivers/iommu/dma-iommu.c | 58 ++++++++++++++++++++++++-------- > drivers/iommu/iommu.c | 2 +- > drivers/md/dm-crypt.c | 2 +- > drivers/pci/Kconfig | 1 + > drivers/spi/spidev.c | 2 +- > drivers/usb/core/buffer.c | 8 ++--- > include/linux/cache.h | 6 ++++ > include/linux/dma-map-ops.h | 61 ++++++++++++++++++++++++++++++++++ > include/linux/dma-mapping.h | 5 ++- > include/linux/iio/iio.h | 2 +- > include/linux/scatterlist.h | 60 ++++++++++++++++++++++++++------- > include/linux/slab.h | 14 +++++--- > kernel/dma/Kconfig | 7 ++++ > kernel/dma/direct.c | 2 +- > kernel/dma/direct.h | 3 +- > mm/slab.c | 6 +--- > mm/slab.h | 5 ++- > mm/slab_common.c | 41 +++++++++++++++++------ > 24 files changed, 244 insertions(+), 65 deletions(-) >
On Wed, Jul 05, 2023 at 07:10:02PM +0530, Amit Pundir wrote: > Hi Catalin, > > On Wed, 5 Jul 2023 at 18:17, Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > Hi, > > > > That's v7 of the series reducing the kmalloc() minimum alignment on > > arm64 to 8 (from 128). There's no new/different functionality, mostly > > cosmetic changes and acks/tested-bys. > > > > Andrew, if there are no further comments or objections to this version, > > are you ok to take the series through the mm tree? The arm64 changes are > > fairly small. Alternatively, I can push it into linux-next now to give > > it some wider exposure and decide whether to upstream it when the > > merging window opens. Thanks. > > This patch series broke Dragonboard 845c (SDM845) running AOSP. > With this series I run into random oops at __kmem_cache_alloc_node(). > Here is one such boot log https://bugs.linaro.org/attachment.cgi?id=1146 Hey Amit, From the log that you linked, this looks like there's corruption within the kmalloc slab caches. Can you please try enabling slub debug? That might give us more information about who is causing the corruption you're seeing here. To enable slub debug, you can add the following to your kernel config: CONFIG_SLUB_DEBUG=y and CONFIG_SLUB_DEBUG_ON=y or add slub_debug to the kernel command line. --Isaac
On Fri, 7 Jul 2023 at 06:12, Isaac Manjarres <isaacmanjarres@google.com> wrote: > > On Wed, Jul 05, 2023 at 07:10:02PM +0530, Amit Pundir wrote: > > Hi Catalin, > > > > On Wed, 5 Jul 2023 at 18:17, Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > > > Hi, > > > > > > That's v7 of the series reducing the kmalloc() minimum alignment on > > > arm64 to 8 (from 128). There's no new/different functionality, mostly > > > cosmetic changes and acks/tested-bys. > > > > > > Andrew, if there are no further comments or objections to this version, > > > are you ok to take the series through the mm tree? The arm64 changes are > > > fairly small. Alternatively, I can push it into linux-next now to give > > > it some wider exposure and decide whether to upstream it when the > > > merging window opens. Thanks. > > > > This patch series broke Dragonboard 845c (SDM845) running AOSP. > > With this series I run into random oops at __kmem_cache_alloc_node(). > > Here is one such boot log https://bugs.linaro.org/attachment.cgi?id=1146 > > Hey Amit, > > From the log that you linked, this looks like there's corruption within > the kmalloc slab caches. Can you please try enabling slub debug? That > might give us more information about who is causing the corruption > you're seeing here. > > To enable slub debug, you can add the following to your kernel config: > > CONFIG_SLUB_DEBUG=y > > and CONFIG_SLUB_DEBUG_ON=y or add slub_debug to the kernel command line. Hi Issac, I can't reproduce this crash if I have slub_debug enabled https://bugs.linaro.org/attachment.cgi?id=1149. But I can still reproduce the crash, with the same build, if I just remove the slub_debug bootarg https://bugs.linaro.org/attachment.cgi?id=1150 Is there any specific (slub_debug=) option(s) you want me to try? May be I can reproduce this bug just with those options. Regards, Amit Pundir > > --Isaac
Hi Amit, On Wed, Jul 05, 2023 at 07:10:02PM +0530, Amit Pundir wrote: > On Wed, 5 Jul 2023 at 18:17, Catalin Marinas <catalin.marinas@arm.com> wrote: > > That's v7 of the series reducing the kmalloc() minimum alignment on > > arm64 to 8 (from 128). There's no new/different functionality, mostly > > cosmetic changes and acks/tested-bys. > > > > Andrew, if there are no further comments or objections to this version, > > are you ok to take the series through the mm tree? The arm64 changes are > > fairly small. Alternatively, I can push it into linux-next now to give > > it some wider exposure and decide whether to upstream it when the > > merging window opens. Thanks. > > This patch series broke Dragonboard 845c (SDM845) running AOSP. > With this series I run into random oops at __kmem_cache_alloc_node(). > Here is one such boot log https://bugs.linaro.org/attachment.cgi?id=1146 > > Reverting "arm64: Enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64" > fixes it for the time being. I'm on holiday until the 24th, away from my computer and I don't have much time to dig into this. Are the DMA-capable devices on this platform non-coherent? We'd need to check whether the bouncing logic somehow overflows and corrupts adjacent memory. If the devices are coherent, maybe the DMA spills over the kmalloc() boundaries, now that we allocate buffers smaller than a cacheline even for DMA (no bouncing if the DMA is coherent). Also, are these devices behind an IOMMU? Yet another possibility is a not fully transparent system level cache with a line size larger than what's reported by the CTR_EL0.CWG. If you boot the platform with "swiotlb=noforce", do you still see issues? Or there's no swiotlb buffer allocated anyway. In addition to what Isaac suggests on slab debugging, it may be worth enabling DMA debugging as well (I don't remember the config name).
On Sat, Jul 8, 2023 at 6:03 AM Amit Pundir <amit.pundir@linaro.org> wrote: > Hi Issac, I can't reproduce this crash if I have slub_debug enabled > https://bugs.linaro.org/attachment.cgi?id=1149. > > But I can still reproduce the crash, with the same build, if I just > remove the slub_debug bootarg > https://bugs.linaro.org/attachment.cgi?id=1150 > Hey Amit, Thanks for trying this out. John Stultz and I tried this out as well, and we couldn't reproduce the issue with slub_debug enabled either. However, we were able to reproduce the issue you reported along with KASAN enabled, which also helped us catch an out-of-bounds access in the regmap-irq code. Once we fixed it, the memory corruption you reported was also fixed. I've sent the patch [1] for review. Can you please test it out and see if it works for you too? [1]: https://lore.kernel.org/all/20230711193059.2480971-1-isaacmanjarres@google.com/ Thanks, Isaac
On Wed, 12 Jul 2023 at 01:14, Isaac Manjarres <isaacmanjarres@google.com> wrote: > > On Sat, Jul 8, 2023 at 6:03 AM Amit Pundir <amit.pundir@linaro.org> wrote: > > Hi Issac, I can't reproduce this crash if I have slub_debug enabled > > https://bugs.linaro.org/attachment.cgi?id=1149. > > > > But I can still reproduce the crash, with the same build, if I just > > remove the slub_debug bootarg > > https://bugs.linaro.org/attachment.cgi?id=1150 > > > > Hey Amit, > > Thanks for trying this out. John Stultz and I tried this out as well, and we > couldn't reproduce the issue with slub_debug enabled either. > > However, we were able to reproduce the issue you reported along with > KASAN enabled, which also helped us catch an out-of-bounds access > in the regmap-irq code. Once we fixed it, the memory corruption you > reported was also fixed. I've sent the patch [1] for review. Can you please > test it out and see if it works for you too? It does fix the v6.5-rc1 boot regression I reported above on Dragonboard 845c running AOSP. Thank you. Regards, Amit Pundir > > [1]: https://lore.kernel.org/all/20230711193059.2480971-1-isaacmanjarres@google.com/ > > Thanks, > Isaac