Message ID | 20170302085332.GU9606@toto (mailing list archive) |
---|---|
State | New, archived |
Hi Edgar,

On 02/03/17 08:53, Edgar E. Iglesias wrote:
> On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
>> On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
>>> Hi all,
>>>
>>> Edgar reported data corruption on network packets in dom0 when
>>> swiotlb-xen is in use. He also reported that the following patch
>>> "fixes" the problem for him:
>>>
>>> static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
>>>         size_t size, enum dma_data_direction dir)
>>> {
>>> -       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, DMA_MAP);
>>> +       printk("%s: addr=%lx size=%zd\n", __func__, handle, size);
>>> +       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size + 64, dir, DMA_MAP);
>>>
>>> I am thinking that the problem has something to do with cacheline
>>> alignment on the Xen side
>>> (xen/common/grant_table.c:__gnttab_cache_flush).
>>>
>>> If op == GNTTAB_CACHE_INVAL, we call invalidate_dcache_va_range; if
>>> op == GNTTAB_CACHE_CLEAN, we call clean_dcache_va_range instead. The
>>> parameter, v, could be non-cacheline aligned.
>>>
>>> invalidate_dcache_va_range is capable of handling an unaligned
>>> address, while clean_dcache_va_range is not.
>>>
>>> Edgar, does the appended patch fix the problem for you?
>>
>> Thanks Stefano,
>>
>> This does indeed fix the issue for me.
>
> Hi again,
>
> Looking at the code, the problem here is that we may flush one cache
> line less than expected.
>
> This smaller patch fixes it for me too:
>
> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> index c492d6d..fa1b4dd 100644
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -325,7 +325,9 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
>  {
>      const void *end;
>      dsb(sy);           /* So the CPU issues all writes to the range */
> -    for ( end = p + size; p < end; p += cacheline_bytes )
> +
> +    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
> +    for ( ; p < end; p += cacheline_bytes )
>          asm volatile (__clean_dcache_one(0) : : "r" (p));
>      dsb(sy);           /* So we know the flushes happen before continuing */
>      /* ARM callers assume that dcache_* functions cannot fail. */
>
> Anyway, I'm OK with either fix.

I would prefer your version compared to Stefano's.

Cheers,
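To make the off-by-one concrete, here is a minimal standalone sketch (not Xen code; a 64-byte cache line and the example addresses are assumed purely for illustration) counting how many cache lines each loop variant covers when the range does not end on a line boundary:

#include <stdint.h>
#include <stdio.h>

/* Number of iterations of the original loop: stops at p + size. */
static unsigned long lines_old(uintptr_t va, unsigned long size,
                               unsigned long cacheline_bytes)
{
    unsigned long n = 0;
    for ( uintptr_t p = va, end = va + size; p < end; p += cacheline_bytes )
        n++;
    return n;
}

/* Number of iterations with the end rounded up, as in Edgar's patch. */
static unsigned long lines_fixed(uintptr_t va, unsigned long size,
                                 unsigned long cacheline_bytes)
{
    unsigned long n = 0;
    uintptr_t end = (va + size + cacheline_bytes - 1) & ~(cacheline_bytes - 1);
    for ( uintptr_t p = va; p < end; p += cacheline_bytes )
        n++;
    return n;
}

int main(void)
{
    /* A 100-byte buffer starting 32 bytes into a 64-byte line spans the
     * lines at 0x1000, 0x1040 and 0x1080, i.e. 3 lines, but the original
     * loop issues only two cache ops (at 0x1020 and 0x1060), so the line
     * at 0x1080 is never cleaned. */
    printf("old: %lu lines, fixed: %lu lines\n",
           lines_old(0x1020, 100, 64), lines_fixed(0x1020, 100, 64));
    return 0;
}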
On Thu, 2 Mar 2017, Julien Grall wrote:
> On 02/03/17 08:53, Edgar E. Iglesias wrote:
> > On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
> > > On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> > > > Hi all,
> > > >
> > > > Edgar reported data corruption on network packets in dom0 when
> > > > swiotlb-xen is in use. He also reported that the following patch
> > > > "fixes" the problem for him:
> > > >
> > > > static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
> > > >         size_t size, enum dma_data_direction dir)
> > > > {
> > > > -       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, DMA_MAP);
> > > > +       printk("%s: addr=%lx size=%zd\n", __func__, handle, size);
> > > > +       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size + 64, dir, DMA_MAP);
> > > >
> > > > I am thinking that the problem has something to do with cacheline
> > > > alignment on the Xen side
> > > > (xen/common/grant_table.c:__gnttab_cache_flush).
> > > >
> > > > If op == GNTTAB_CACHE_INVAL, we call invalidate_dcache_va_range; if
> > > > op == GNTTAB_CACHE_CLEAN, we call clean_dcache_va_range instead. The
> > > > parameter, v, could be non-cacheline aligned.
> > > >
> > > > invalidate_dcache_va_range is capable of handling an unaligned
> > > > address, while clean_dcache_va_range is not.
> > > >
> > > > Edgar, does the appended patch fix the problem for you?
> > >
> > > Thanks Stefano,
> > >
> > > This does indeed fix the issue for me.

Thanks for reporting and testing!

> > Hi again,
> >
> > Looking at the code, the problem here is that we may flush one cache
> > line less than expected.
> >
> > This smaller patch fixes it for me too:
> >
> > diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> > index c492d6d..fa1b4dd 100644
> > --- a/xen/include/asm-arm/page.h
> > +++ b/xen/include/asm-arm/page.h
> > @@ -325,7 +325,9 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
> >  {
> >      const void *end;
> >      dsb(sy);           /* So the CPU issues all writes to the range */
> > -    for ( end = p + size; p < end; p += cacheline_bytes )
> > +
> > +    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
> > +    for ( ; p < end; p += cacheline_bytes )
> >          asm volatile (__clean_dcache_one(0) : : "r" (p));
> >      dsb(sy);           /* So we know the flushes happen before continuing */
> >      /* ARM callers assume that dcache_* functions cannot fail. */
> >
> > Anyway, I'm OK with either fix.
>
> I would prefer your version compared to Stefano's.

Julien, from looking at the two diffs, this is simpler and nicer, but if
you look at xen/include/asm-arm/page.h, my patch made
clean_dcache_va_range consistent with invalidate_dcache_va_range. For
consistency, I would prefer to deal with the two functions the same way.
Although it is not a spec requirement, I also think that it is a good
idea to issue cache flushes from cacheline aligned addresses, like
invalidate_dcache_va_range does and Linux does, to make it more obvious
what is going on.
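As a rough sketch of the shape Stefano is arguing for (simplified; this is not his actual patch, which is not included in this thread): align the start down as well as rounding the end up, so every cache op is issued on a cache-line-aligned address. Note that, as Julien explains below, the real invalidate_dcache_va_range additionally uses a different instruction for the partial lines; a plain clean does not need that distinction.

static inline int clean_dcache_va_range(const void *p, unsigned long size)
{
    const void *end = p + size;

    dsb(sy);           /* So the CPU issues all writes to the range */

    /* Align the start down and round the end up to cache line boundaries,
     * so every cache op below is issued on an aligned address. */
    p = (void *)((uintptr_t)p & ~((uintptr_t)cacheline_bytes - 1));
    end = (const void *)ROUNDUP((uintptr_t)end, cacheline_bytes);

    for ( ; p < end; p += cacheline_bytes )
        asm volatile (__clean_dcache_one(0) : : "r" (p));

    dsb(sy);           /* So we know the flushes happen before continuing */
    /* ARM callers assume that dcache_* functions cannot fail. */
    return 0;
}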
Hi Stefano,

On 02/03/17 19:12, Stefano Stabellini wrote:
> On Thu, 2 Mar 2017, Julien Grall wrote:
>> On 02/03/17 08:53, Edgar E. Iglesias wrote:
>>> On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
>>>> On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> Julien, from looking at the two diffs, this is simpler and nicer, but if
> you look at xen/include/asm-arm/page.h, my patch made
> clean_dcache_va_range consistent with invalidate_dcache_va_range. For
> consistency, I would prefer to deal with the two functions the same way.
> Although it is not a spec requirement, I also think that it is a good
> idea to issue cache flushes from cacheline aligned addresses, like
> invalidate_dcache_va_range does and Linux does, to make it more obvious
> what is going on.

invalidate_dcache_va_range is split because the cache instruction
differs for the start and end if unaligned. For them you want to use
clean & invalidate rather than invalidate.

If you look at the implementation of other cache helpers in Linux (see
dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only
align start & end.

Also, invalidate_dcache_va_range is using modulo, which I would rather
avoid. The modulo in this case will not be optimized by the compiler
because cacheline_bytes is not a constant.

So I still prefer to keep this function really simple.

BTW, you would also need to fix clean_and_invalidate_dcache_va_range.
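A schematic of the split Julien describes (simplified, not the actual Xen code; the __invalidate_dcache_one and __clean_and_invalidate_dcache_one helper names are assumed here by analogy with the __clean_dcache_one used above). A partial line at the start or end may also hold unrelated dirty data, so it must be cleaned & invalidated rather than just invalidated; only the fully covered middle can be invalidated outright. The modulo operations are the part Julien objects to.

static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
{
    const void *end = p + size;
    size_t off;

    dsb(sy);

    off = (uintptr_t)p % cacheline_bytes;
    if ( off )                       /* partial line at the start */
    {
        p -= off;
        asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));
        p += cacheline_bytes;
    }

    off = (uintptr_t)end % cacheline_bytes;
    if ( off )                       /* partial line at the end */
    {
        end -= off;
        asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (end));
    }

    for ( ; p < end; p += cacheline_bytes )   /* fully covered lines */
        asm volatile (__invalidate_dcache_one(0) : : "r" (p));

    dsb(sy);
    return 0;
}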
On Thu, 2 Mar 2017, Julien Grall wrote:
> Hi Stefano,
>
> On 02/03/17 19:12, Stefano Stabellini wrote:
> > On Thu, 2 Mar 2017, Julien Grall wrote:
> > > On 02/03/17 08:53, Edgar E. Iglesias wrote:
> > > > On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
> > > > > On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> > Julien, from looking at the two diffs, this is simpler and nicer, but if
> > you look at xen/include/asm-arm/page.h, my patch made
> > clean_dcache_va_range consistent with invalidate_dcache_va_range. For
> > consistency, I would prefer to deal with the two functions the same way.
> > Although it is not a spec requirement, I also think that it is a good
> > idea to issue cache flushes from cacheline aligned addresses, like
> > invalidate_dcache_va_range does and Linux does, to make it more obvious
> > what is going on.
>
> invalidate_dcache_va_range is split because the cache instruction
> differs for the start and end if unaligned. For them you want to use
> clean & invalidate rather than invalidate.
>
> If you look at the implementation of other cache helpers in Linux (see
> dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only
> align start & end.

I don't think so, unless I am reading dcache_by_line_op wrong.

> Also, invalidate_dcache_va_range is using modulo, which I would rather
> avoid. The modulo in this case will not be optimized by the compiler
> because cacheline_bytes is not a constant.

That is a good point. What if I replace the modulo op with
p & (cacheline_bytes - 1) in invalidate_dcache_va_range, then add
similar code to clean_dcache_va_range and
clean_and_invalidate_dcache_va_range?

> BTW, you would also need to fix clean_and_invalidate_dcache_va_range.

I'll do that, thanks for the reminder.
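For illustration, the substitution Stefano is proposing relies on cacheline_bytes being a power of two (which cache line sizes are): the masked form is equivalent to the modulo but compiles to a single AND even when the line size is only known at run time. A minimal standalone check (not Xen code):

#include <assert.h>
#include <stdint.h>

static unsigned long off_mod(uintptr_t p, unsigned long cacheline_bytes)
{
    return p % cacheline_bytes;                 /* division-based remainder */
}

static unsigned long off_mask(uintptr_t p, unsigned long cacheline_bytes)
{
    return p & (cacheline_bytes - 1);           /* single AND */
}

int main(void)
{
    /* The two forms agree for any power-of-two line size, e.g. 64. */
    for ( uintptr_t p = 0x1000; p < 0x1100; p++ )
        assert(off_mod(p, 64) == off_mask(p, 64));
    return 0;
}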
On 02/03/2017 22:39, Stefano Stabellini wrote:
> On Thu, 2 Mar 2017, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 02/03/17 19:12, Stefano Stabellini wrote:
>>> On Thu, 2 Mar 2017, Julien Grall wrote:
>>>> On 02/03/17 08:53, Edgar E. Iglesias wrote:
>>>>> On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
>>>>>> On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
>>> Julien, from looking at the two diffs, this is simpler and nicer, but if
>>> you look at xen/include/asm-arm/page.h, my patch made
>>> clean_dcache_va_range consistent with invalidate_dcache_va_range. For
>>> consistency, I would prefer to deal with the two functions the same way.
>>> Although it is not a spec requirement, I also think that it is a good
>>> idea to issue cache flushes from cacheline aligned addresses, like
>>> invalidate_dcache_va_range does and Linux does, to make it more obvious
>>> what is going on.
>>
>> invalidate_dcache_va_range is split because the cache instruction
>> differs for the start and end if unaligned. For them you want to use
>> clean & invalidate rather than invalidate.
>>
>> If you look at the implementation of other cache helpers in Linux (see
>> dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only
>> align start & end.
>
> I don't think so, unless I am reading dcache_by_line_op wrong.

343 .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
344     dcache_line_size \tmp1, \tmp2
345     add     \size, \kaddr, \size
346     sub     \tmp2, \tmp1, #1
347     bic     \kaddr, \kaddr, \tmp2
348 9998:
349     .if     (\op == cvau || \op == cvac)
350 alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
351     dc      \op, \kaddr
352 alternative_else
353     dc      civac, \kaddr
354 alternative_endif
355     .else
356     dc      \op, \kaddr
357     .endif
358     add     \kaddr, \kaddr, \tmp1
359     cmp     \kaddr, \size
360     b.lo    9998b
361     dsb     \domain
362 .endm
363

It has only one cache instruction in the resulting assembly because of
the .if/.else assembly directives.

Cheers,
On Thu, 2 Mar 2017, Julien Grall wrote:
> On 02/03/2017 22:39, Stefano Stabellini wrote:
> > On Thu, 2 Mar 2017, Julien Grall wrote:
> > > Hi Stefano,
> > >
> > > On 02/03/17 19:12, Stefano Stabellini wrote:
> > > > On Thu, 2 Mar 2017, Julien Grall wrote:
> > > > > On 02/03/17 08:53, Edgar E. Iglesias wrote:
> > > > > > On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
> > > > > > > On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> > > > Julien, from looking at the two diffs, this is simpler and nicer, but if
> > > > you look at xen/include/asm-arm/page.h, my patch made
> > > > clean_dcache_va_range consistent with invalidate_dcache_va_range. For
> > > > consistency, I would prefer to deal with the two functions the same way.
> > > > Although it is not a spec requirement, I also think that it is a good
> > > > idea to issue cache flushes from cacheline aligned addresses, like
> > > > invalidate_dcache_va_range does and Linux does, to make it more obvious
> > > > what is going on.
> > >
> > > invalidate_dcache_va_range is split because the cache instruction
> > > differs for the start and end if unaligned. For them you want to use
> > > clean & invalidate rather than invalidate.
> > >
> > > If you look at the implementation of other cache helpers in Linux (see
> > > dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only
> > > align start & end.
> >
> > I don't think so, unless I am reading dcache_by_line_op wrong.
>
> 343 .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
> 344     dcache_line_size \tmp1, \tmp2
> 345     add     \size, \kaddr, \size
> 346     sub     \tmp2, \tmp1, #1
> 347     bic     \kaddr, \kaddr, \tmp2
> 348 9998:
> 349     .if     (\op == cvau || \op == cvac)
> 350 alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
> 351     dc      \op, \kaddr
> 352 alternative_else
> 353     dc      civac, \kaddr
> 354 alternative_endif
> 355     .else
> 356     dc      \op, \kaddr
> 357     .endif
> 358     add     \kaddr, \kaddr, \tmp1
> 359     cmp     \kaddr, \size
> 360     b.lo    9998b
> 361     dsb     \domain
> 362 .endm
> 363
>
> It has only one cache instruction in the resulting assembly because of
> the .if/.else assembly directives.

Yes, but it does not only align start and end; all cache instructions
are called on aligned addresses, right?
Hi Stefano,

On 03/03/17 00:53, Stefano Stabellini wrote:
> On Thu, 2 Mar 2017, Julien Grall wrote:
>> On 02/03/2017 22:39, Stefano Stabellini wrote:
>>> On Thu, 2 Mar 2017, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On 02/03/17 19:12, Stefano Stabellini wrote:
>>>>> On Thu, 2 Mar 2017, Julien Grall wrote:
>>>>>> On 02/03/17 08:53, Edgar E. Iglesias wrote:
>>>>>>> On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
>>>>>>>> On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
>>>>> Julien, from looking at the two diffs, this is simpler and nicer, but if
>>>>> you look at xen/include/asm-arm/page.h, my patch made
>>>>> clean_dcache_va_range consistent with invalidate_dcache_va_range. For
>>>>> consistency, I would prefer to deal with the two functions the same way.
>>>>> Although it is not a spec requirement, I also think that it is a good
>>>>> idea to issue cache flushes from cacheline aligned addresses, like
>>>>> invalidate_dcache_va_range does and Linux does, to make it more obvious
>>>>> what is going on.
>>>>
>>>> invalidate_dcache_va_range is split because the cache instruction
>>>> differs for the start and end if unaligned. For them you want to use
>>>> clean & invalidate rather than invalidate.
>>>>
>>>> If you look at the implementation of other cache helpers in Linux (see
>>>> dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only
>>>> align start & end.
>>>
>>> I don't think so, unless I am reading dcache_by_line_op wrong.
>>
>> 343 .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
>> 344     dcache_line_size \tmp1, \tmp2
>> 345     add     \size, \kaddr, \size
>> 346     sub     \tmp2, \tmp1, #1
>> 347     bic     \kaddr, \kaddr, \tmp2
>> 348 9998:
>> 349     .if     (\op == cvau || \op == cvac)
>> 350 alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
>> 351     dc      \op, \kaddr
>> 352 alternative_else
>> 353     dc      civac, \kaddr
>> 354 alternative_endif
>> 355     .else
>> 356     dc      \op, \kaddr
>> 357     .endif
>> 358     add     \kaddr, \kaddr, \tmp1
>> 359     cmp     \kaddr, \size
>> 360     b.lo    9998b
>> 361     dsb     \domain
>> 362 .endm
>> 363
>>
>> It has only one cache instruction in the resulting assembly because of
>> the .if/.else assembly directives.
>
> Yes, but it does not only align start and end; all cache instructions
> are called on aligned addresses, right?

I don't think so. The instruction "bic \kaddr, \kaddr, \tmp2" will make
sure the start address is aligned to a cache line size.

The C version of the assembly code is exactly what you wrote in the
previous e-mail:

    end = p + size;
    p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);

Cheers,
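Putting Julien's two lines together, a C rendering of the dcache_by_line_op loop for a "clean" op might look like this (the assembly quoted above is authoritative; clean_by_line is an illustrative name, dc cvac is picked as the concrete clean-to-PoC instruction, and this only builds for aarch64). The end is left at p + size, only the start is rounded down, and every dc is therefore issued on an aligned address while the final partial line is still covered:

#include <stdint.h>

static void clean_by_line(const void *p, unsigned long size,
                          unsigned long cacheline_bytes)
{
    uintptr_t kaddr = (uintptr_t)p;
    uintptr_t end = kaddr + size;                   /* add  \size, \kaddr, \size */

    kaddr &= ~(uintptr_t)(cacheline_bytes - 1);     /* bic  \kaddr, \kaddr, \tmp2 */

    for ( ; kaddr < end; kaddr += cacheline_bytes ) /* cmp \kaddr, \size; b.lo */
        asm volatile ("dc cvac, %0" : : "r" (kaddr) : "memory");

    asm volatile ("dsb sy" : : : "memory");         /* dsb  \domain */
}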
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index c492d6d..fa1b4dd 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -325,7 +325,9 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
 {
     const void *end;
     dsb(sy);           /* So the CPU issues all writes to the range */
-    for ( end = p + size; p < end; p += cacheline_bytes )
+
+    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
+    for ( ; p < end; p += cacheline_bytes )
         asm volatile (__clean_dcache_one(0) : : "r" (p));
     dsb(sy);           /* So we know the flushes happen before continuing */
     /* ARM callers assume that dcache_* functions cannot fail. */
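As Julien notes in the thread, clean_and_invalidate_dcache_va_range would need the same one-line treatment. A sketch of what that could look like, assuming a __clean_and_invalidate_dcache_one(0) helper analogous to the __clean_dcache_one used above (that helper name is an assumption here, not taken from this thread):

static inline int clean_and_invalidate_dcache_va_range
    (const void *p, unsigned long size)
{
    const void *end;

    dsb(sy);           /* So the CPU issues all writes to the range */

    /* Round the end up so the last, possibly partial, cache line is covered. */
    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
    for ( ; p < end; p += cacheline_bytes )
        asm volatile (__clean_and_invalidate_dcache_one(0) : : "r" (p));

    dsb(sy);           /* So we know the flushes happen before continuing */
    /* ARM callers assume that dcache_* functions cannot fail. */
    return 0;
}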