Message ID | ad2b7034-ecc9-4d6e-844c-bfe32fb5b23c@gmx.net (mailing list archive) |
---|---|
State | New, archived |
Series | WARNING: drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c:364 vchiq_prepare_bulk_data |
On Mon, Jun 10, 2024, at 00:24, Stefan Wahren wrote:
>
> After that I'm getting the following warning:
>
>   create_pagelist: block 1, DMA address 0x00000000f5f62800 doesn't
>   align with PAGE_MASK 0xfffffffffffff000
>
> Then I bisected the issue to this commit:
>
>   commit 1c1a429efd4ee8ca244cc2401365c983cda4ed76
>   Author: Catalin Marinas <catalin.marinas@arm.com>
>   Date:   Mon Jun 12 16:32:01 2023 +0100
>
>       arm64: enable ARCH_WANT_KMALLOC_DMA_BOUNCE for arm64
>
>       With the DMA bouncing of unaligned kmalloc() buffers now in place,
>       enable it for arm64 to allow the kmalloc-{8,16,32,48,96} caches. In
>       addition, always create the swiotlb buffer even when the end of RAM
>       is within the 32-bit physical address range (the swiotlb buffer can
>       still be disabled on the kernel command line).
>
> So how can the root cause of these warnings be fixed?
> Or is the specific WARN_ON() now obsolete?

It appears that the buffer that gets passed to the device has a start
address and length that both come from user space, and in this case
crosses a page boundary, and the second half is not aligned to
ARCH_KMALLOC_MINALIGN.

This means it's not actually a kmalloc buffer that goes wrong, but
userspace passing something that is problematic; see also the comment
above create_pagelist().

If that is a correct interpretation of what is going on, one way to
solve it would be to add the same logic for vchiq user buffers that we
have for kmalloc buffers: if either the start or the size of the user
buffer in vchiq_irq_queue_bulk_tx_rx() is unaligned, just allocate a
new kernel buffer for the whole thing instead of pinning the user
buffer.

Arnd
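A minimal sketch of that fallback, assuming hypothetical helper and
variable names rather than the driver's actual API:

	/*
	 * Hypothetical sketch only: if the user buffer is unaligned at
	 * either end, copy it through a freshly allocated kernel buffer
	 * instead of pinning the user pages for DMA.
	 */
	static bool vchiq_ubuf_unaligned(const void __user *ubuf, size_t count)
	{
		return !IS_ALIGNED((unsigned long)ubuf, ARCH_KMALLOC_MINALIGN) ||
		       !IS_ALIGNED(count, ARCH_KMALLOC_MINALIGN);
	}

	...
	if (vchiq_ubuf_unaligned(ubuf, count)) {
		buf = kmalloc(count, GFP_KERNEL);
		if (!buf)
			return -ENOMEM;
		/* ARM -> VPU transfers must be filled before mapping */
		if (type == PAGELIST_WRITE &&
		    copy_from_user(buf, ubuf, count)) {
			kfree(buf);
			return -EFAULT;
		}
		/* for PAGELIST_READ, copy_to_user() back on completion */
	}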
On Mon, 10 Jun 2024 at 07:00, Arnd Bergmann <arnd@arndb.de> wrote:
> On Mon, Jun 10, 2024, at 00:24, Stefan Wahren wrote:
> > [...]
> > So how can the root cause of these warnings be fixed?
> > Or is the specific WARN_ON() now obsolete?
>
> It appears that the buffer that gets passed to the device
> has a start address and length that both come from user
> space, and in this case crosses a page boundary, and the
> second half is not aligned to ARCH_KMALLOC_MINALIGN.
>
> [...]
>
> Arnd

Why is swiotlb involved at all? The DMA controller on BCM2837 can
access all RAM that is visible to the ARM cores.

Phil
On Mon, Jun 10, 2024, at 10:26, Phil Elwell wrote:
> On Mon, 10 Jun 2024 at 07:00, Arnd Bergmann <arnd@arndb.de> wrote:
> [...]
>
> Why is swiotlb involved at all? The DMA controller on BCM2837 can
> access all RAM that is visible to the ARM cores.

When a device is not cache-coherent and the buffer is not cache
aligned, we now use swiotlb to avoid clobbering data in the same cache
line during DMA synchronization.

We used to rely on kmalloc() returning buffers that are cacheline
aligned, but that was very expensive.

Arnd
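For reference, the bounce decision described here boils down to roughly
the following condition. This is a loose paraphrase of the kernel's
dma_kmalloc_safe()/dma_kmalloc_needs_bounce() logic, not a verbatim
copy of the source:

	/* Paraphrase for illustration; see kernel/dma for the real code. */
	static bool bounce_needed(struct device *dev, size_t size,
				  enum dma_data_direction dir)
	{
		/* a coherent device cannot clobber CPU cache lines */
		if (dev_is_dma_coherent(dev))
			return false;
		/* DMA_TO_DEVICE only cleans the cache: non-destructive */
		if (dir == DMA_TO_DEVICE)
			return false;
		/* otherwise, odd sizes may share a cache line with other data */
		return !IS_ALIGNED(size, dma_get_cache_alignment());
	}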
On Mon, Jun 10, 2024 at 11:00:12AM +0200, Arnd Bergmann wrote:
> On Mon, Jun 10, 2024, at 10:26, Phil Elwell wrote:
> > Why is swiotlb involved at all? The DMA controller on BCM2837 can
> > access all RAM that is visible to the ARM cores.
>
> When a device is not cache-coherent and the buffer is not
> cache aligned, we now use swiotlb to avoid clobbering data
> in the same cache line during DMA synchronization.
>
> We used to rely on kmalloc() returning buffers that are
> cacheline aligned, but that was very expensive.

Could we reject buffers provided by userspace that are not
cache-aligned?
On Mon, Jun 10, 2024, at 11:15, Laurent Pinchart wrote:
> On Mon, Jun 10, 2024 at 11:00:12AM +0200, Arnd Bergmann wrote:
>> [...]
>> We used to rely on kmalloc() returning buffers that are
>> cacheline aligned, but that was very expensive.
>
> Could we reject buffers provided by userspace that are not
> cache-aligned?

My guess is that this will likely break existing applications, in which
case we cannot.

It's probably a good idea to take a look at what buffers are actually
passed by userspace today. That would also help decide how we allocate
bounce buffers if we have to. E.g. it's a big difference if the buffers
are always within a few bytes, kilobytes or megabytes.

Arnd
On Mon, 10 Jun 2024 at 10:20, Arnd Bergmann <arnd@arndb.de> wrote:
> [...]
> My guess is that this will likely break existing applications, in which
> case we cannot.
>
> It's probably a good idea to take a look at what buffers are actually
> passed by userspace today. [...]

vchiq sends partial cache lines at the start and end of reads (as seen
from the ARM host) out of band, so the only misaligned DMA transfers
should be from ARM to VPU. This should not require a bounce buffer.

Phil
On Mon, Jun 10, 2024, at 11:24, Phil Elwell wrote:
> On Mon, 10 Jun 2024 at 10:20, Arnd Bergmann <arnd@arndb.de> wrote:
> [...]
>
> vchiq sends partial cache lines at the start and end of reads (as seen
> from the ARM host) out of band, so the only misaligned DMA transfers
> should be from ARM to VPU. This should not require a bounce buffer.

Ok, so if the partial data is sent out of band already, it may be
possible to just leave it out of the scatterlist so it doesn't get
bounced by dma_map_sg(). I don't see where that case is handled, but
that shouldn't be too hard here.

Arnd
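A rough fragment of what that trimming could look like, assuming the
partial lines really are carried out of band; every name here is an
illustrative assumption, not existing driver code:

	/* Illustrative only: map whole cache lines, route the rest OOB. */
	unsigned long align = dma_get_cache_alignment();
	unsigned long start = (unsigned long)ubuf;
	size_t head = min_t(size_t, count, ALIGN(start, align) - start);
	size_t tail = (start + count) & (align - 1);

	if (count > head + tail) {
		unsigned long map_start = start + head;	/* cache-aligned */
		size_t map_len = count - head - tail;	/* whole lines only */
		/* build the scatterlist from map_start/map_len; the head
		 * and tail bytes travel via the out-of-band path instead */
	}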
On 2024-06-10 10:24 am, Phil Elwell wrote:
> On Mon, 10 Jun 2024 at 10:20, Arnd Bergmann <arnd@arndb.de> wrote:
>> [...]
>
> vchiq sends partial cache lines at the start and end of reads (as seen
> from the ARM host) out of band, so the only misaligned DMA transfers
> should be from ARM to VPU. This should not require a bounce buffer.

Hmm, indeed dma_kmalloc_safe() takes into account that unaligned
DMA_TO_DEVICE does not need bouncing, so it would suggest that
something's off in what vchiq is asking for.

Thanks,
Robin.
Hi,

On 10.06.24 12:25, Robin Murphy wrote:
> On 2024-06-10 10:24 am, Phil Elwell wrote:
>> [...]
>> vchiq sends partial cache lines at the start and end of reads (as seen
>> from the ARM host) out of band, so the only misaligned DMA transfers
>> should be from ARM to VPU. This should not require a bounce buffer.
>
> Hmm, indeed dma_kmalloc_safe() takes into account that unaligned
> DMA_TO_DEVICE does not need bouncing, so it would suggest that
> something's off in what vchiq is asking for.

I'm available to debug this further, but I need more guidance here.

At least I extended the output for the error case:

-		WARN_ON(len == 0);
-		WARN_ON(i && (i != (dma_buffers - 1)) && (len & ~PAGE_MASK));
-		WARN_ON(i && (addr & ~PAGE_MASK));
+		if (len == 0)
+			pr_warn_once("%s: sg_dma_len() == 0\n", __func__);
+		else if (i && (i != (dma_buffers - 1)) && (len & ~PAGE_MASK))
+			pr_warn_once("%s: following block not page aligned\n",
+				     __func__);
+		else if (i && (addr & ~PAGE_MASK)) {
+			pr_warn_once("%s: block %u, DMA address %pad doesn't align with PAGE_MASK 0x%lx\n",
+				     __func__, i, &addr, PAGE_MASK);
+			pr_warn_once("sg_dma_is_swiotlb: %d, dma_flags: %x\n",
+				     sg_dma_is_swiotlb(sg), sg->dma_flags);
+		}

Example result:

[   84.180527] create_pagelist: block 1, DMA address 0x00000000f5f74800 doesn't align with PAGE_MASK 0xfffffffffffff000
[   84.180553] sg_dma_is_swiotlb: 0, dma_flags: 0

Is this helpful?
On Tue, Jun 11, 2024, at 12:47, Stefan Wahren wrote:
> [...]
> Example result:
>
> [   84.180527] create_pagelist: block 1, DMA address 0x00000000f5f74800
> doesn't align with PAGE_MASK 0xfffffffffffff000
> [   84.180553] sg_dma_is_swiotlb: 0, dma_flags: 0
>
> Is this helpful?

It's interesting that this does not have the SG_DMA_SWIOTLB flag set,
as the theory so far was that an unaligned user address is what caused
this to bounce.

I think the most helpful bit of information that is currently missing
is the 'ubuf' and 'count' arguments that got passed down from userspace
into create_pagelist(), to see what alignment they have in the failure
case.

Arnd
On 11.06.24 13:08, Arnd Bergmann wrote:
> [...]
> I think the most helpful bit of information that is currently missing
> is the 'ubuf' and 'count' arguments that got passed down from userspace
> into create_pagelist(), to see what alignment they have in the failure
> case.

Here is my attempt:

	if (len == 0)
		pr_warn_once("%s: sg_dma_len() == 0\n", __func__);
	else if (i && (i != (dma_buffers - 1)) && (len & ~PAGE_MASK))
		pr_warn_once("%s: following block not page aligned\n",
			     __func__);
	else if (i && (addr & ~PAGE_MASK)) {
		pr_warn_once("%s: block %u, DMA address %pad doesn't align with PAGE_MASK 0x%lx\n",
			     __func__, i, &addr, PAGE_MASK);
		pr_warn_once("sg_dma_is_swiotlb: %d, dma_flags: %x\n",
			     sg_dma_is_swiotlb(sg), sg->dma_flags);
		pr_warn_once("type = %s\n", (type == PAGELIST_WRITE) ?
			     "PAGELIST_WRITE" : "PAGELIST_READ");
		if (buf)
			pr_warn_once("buf = %p, count = %zu\n", buf, count);
		else
			pr_warn_once("ubuf = %p, count = %zu\n", ubuf, count);
	}

Output:

[   66.184030] create_pagelist: block 1, DMA address 0x00000000f5fc7800 doesn't align with PAGE_MASK 0xfffffffffffff000
[   66.184056] sg_dma_is_swiotlb: 0, dma_flags: 0
[   66.184063] type = PAGELIST_READ
[   66.184066] ubuf = 00000000266a70a7, count = 0
On Tue, Jun 11, 2024, at 13:37, Stefan Wahren wrote:
> [...]
> Output:
>
> [   66.184030] create_pagelist: block 1, DMA address 0x00000000f5fc7800
> doesn't align with PAGE_MASK 0xfffffffffffff000
> [   66.184056] sg_dma_is_swiotlb: 0, dma_flags: 0
> [   66.184063] type = PAGELIST_READ
> [   66.184066] ubuf = 00000000266a70a7, count = 0

From my reading of the code, it's not really meant to handle count=0,
so maybe the answer for that is to just return an error (or possibly
success) when there is no data attached to the request from user space,
something like:

--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_dev.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_dev.c
@@ -328,6 +328,11 @@ static int vchiq_irq_queue_bulk_tx_rx(struct vchiq_instance *instance,
 		userdata = args->userdata;
 	}
 
+	if (!args->size) {
+		ret = 0;
+		goto out;
+	}
+
 	status = vchiq_bulk_transfer(instance, args->handle, NULL, args->data, args->size,
 				     userdata, args->mode, dir);
Hi Arnd,

On 11.06.24 14:14, Arnd Bergmann wrote:
> On Tue, Jun 11, 2024, at 13:37, Stefan Wahren wrote:
>> [...]
>> [   66.184066] ubuf = 00000000266a70a7, count = 0

Sorry, my debug attempt for count was pointless. The value is always
decremented to zero at this point, so your suggested change won't have
any effect.

> From my reading of the code, it's not really meant to handle count=0,
> so maybe the answer for that is to just return an error (or possibly
> success) when there is no data attached to the request from user
> space, something like:
>
> [...]
On 11/06/2024 12:37 pm, Stefan Wahren wrote:
> On 11.06.24 13:08, Arnd Bergmann wrote:
>> It's interesting that this does not have the SG_DMA_SWIOTLB
>> flag set, as the theory so far was that an unaligned
>> user address is what caused this to bounce.

dma-direct doesn't use that flag (at least for now; potentially it
might offer a tiny micro-optimisation, but not enough to really
matter). At this point, len and pagelistinfo->dma_dir (as passed to
dma_map_sg()) are what would have defined the bounce condition...

>> I think the most helpful bit of information that is currently missing
>> is the 'ubuf' and 'count' arguments that got passed down from userspace
>> into create_pagelist(), to see what alignment they have in the failure
>> case.
>
> Here is my attempt:
> [...]
>
> Output:
>
> [   66.184030] create_pagelist: block 1, DMA address 0x00000000f5fc7800
> doesn't align with PAGE_MASK 0xfffffffffffff000
> [   66.184056] sg_dma_is_swiotlb: 0, dma_flags: 0
> [   66.184063] type = PAGELIST_READ

...so from the dma_dir assignment, this would imply DMA_FROM_DEVICE
(indicating vchiq writing to RAM, CPU reading from RAM), which with the
unaligned address below would indeed cause a bounce. However, that
appears to contradict what Phil said is supposed to happen :/

> [   66.184066] ubuf = 00000000266a70a7, count = 0

Thanks,
Robin
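For context, the direction assignment Robin refers to sits in
create_pagelist(); paraphrased from vchiq_arm.c rather than quoted
verbatim, it is essentially:

	/* Bulk reads (VPU -> ARM) map as DMA_FROM_DEVICE, the one
	 * direction that can trigger a swiotlb bounce of an unaligned
	 * buffer on a non-coherent device. */
	pagelistinfo->dma_dir = (type == PAGELIST_WRITE) ?
					DMA_TO_DEVICE : DMA_FROM_DEVICE;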
diff --git a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
index 9fb8f657cc78..2beeb94f629e 100644
--- a/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
+++ b/drivers/staging/vc04_services/interface/vchiq_arm/vchiq_arm.c
@@ -192,7 +192,7 @@ cleanup_pagelistinfo(struct vchiq_instance *instance, struct vchiq_pagelist_info
 }
 
 static inline bool
-is_adjacent_block(u32 *addrs, u32 addr, unsigned int k)
+is_adjacent_block(u32 *addrs, dma_addr_t addr, unsigned int k)
 {
 	u32 tmp;