
xen/arm and swiotlb-xen: possible data corruption

Message ID 20170302085332.GU9606@toto (mailing list archive)
State New, archived

Commit Message

Edgar E. Iglesias March 2, 2017, 8:53 a.m. UTC
On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
> On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> > Hi all,
> > 
> > Edgar reported a data corruption on network packets in dom0 when the
> > swiotlb-xen is in use. He also reported that the following patch "fixes"
> > the problem for him:
> > 
> >  static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
> >                 size_t size, enum dma_data_direction dir)
> >  {
> > -       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, DMA_MAP);
> > +       printk("%s: addr=%lx size=%zd\n", __func__, handle, size);
> > +       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size + 64, dir, DMA_MAP);
> > 
> > I am thinking that the problem has something to do with cacheline
> > alignment on the Xen side
> > (xen/common/grant_table.c:__gnttab_cache_flush).
> > 
> > If op == GNTTAB_CACHE_INVAL, we call invalidate_dcache_va_range; if op
> > == GNTTAB_CACHE_CLEAN, we call clean_dcache_va_range instead. The
> > parameter, v, could be non-cacheline aligned.
> > 
> > invalidate_dcache_va_range is capable of handling an unaligned address,
> > while clean_dcache_va_range is not.
> > 
> > Edgar, does the appended patch fix the problem for you?
> 
> 
> Thanks Stefano,
> 
> This does indeed fix the issue for me.


Hi again,

Looking at the code, the problem here is that we may flush one cache line
less than expected.
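
For example (hypothetical numbers, not taken from the report): with
cacheline_bytes = 64, p = 0x1030 and size = 0x40, the requested range is
[0x1030, 0x1070) but the current loop only cleans one line:

    /* Walk-through of the current loop with the hypothetical values above */
    for ( end = p + size; p < end; p += cacheline_bytes )  /* end = 0x1070 */
        asm volatile (__clean_dcache_one(0) : : "r" (p));  /* cleans 0x1000-0x103f */
    /* after one iteration p == 0x1070, the loop stops, and the line
       covering 0x1040-0x106f is never cleaned */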

This smaller patch fixes it for me too:


Anyway, I'm OK with either fix.

Cheers,
Edgar



> 
> Cheers,
> Edgar
> 
> 
> > 
> > ---
> > 
> > diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> > index 86de0b6..9cdf2fb 100644
> > --- a/xen/include/asm-arm/page.h
> > +++ b/xen/include/asm-arm/page.h
> > @@ -322,10 +322,30 @@ static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
> >  
> >  static inline int clean_dcache_va_range(const void *p, unsigned long size)
> >  {
> > -    const void *end;
> > +    size_t off;
> > +    const void *end = p + size;
> > +
> >      dsb(sy);           /* So the CPU issues all writes to the range */
> > -    for ( end = p + size; p < end; p += cacheline_bytes )
> > +
> > +    off = (unsigned long)p % cacheline_bytes;
> > +    if ( off )
> > +    {
> > +        p -= off;
> >          asm volatile (__clean_dcache_one(0) : : "r" (p));
> > +        p += cacheline_bytes;
> > +        size -= cacheline_bytes - off;
> > +    }
> > +    off = (unsigned long)end % cacheline_bytes;
> > +    if ( off )
> > +    {
> > +        end -= off;
> > +        size -= off;
> > +        asm volatile (__clean_dcache_one(0) : : "r" (end));
> > +    }
> > +
> > +    for ( ; p < end; p += cacheline_bytes )
> > +        asm volatile (__clean_dcache_one(0) : : "r" (p));
> > +
> >      dsb(sy);           /* So we know the flushes happen before continuing */
> >      /* ARM callers assume that dcache_* functions cannot fail. */
> >      return 0;
> 

Comments

Julien Grall March 2, 2017, 5:56 p.m. UTC | #1
Hi Edgar,

On 02/03/17 08:53, Edgar E. Iglesias wrote:
> On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
>> On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
>>> Hi all,
>>>
>>> Edgar reported a data corruption on network packets in dom0 when the
>>> swiotlb-xen is in use. He also reported that the following patch "fixes"
>>> the problem for him:
>>>
>>>  static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
>>>                 size_t size, enum dma_data_direction dir)
>>>  {
>>> -       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, DMA_MAP);
>>> +       printk("%s: addr=%lx size=%zd\n", __func__, handle, size);
>>> +       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size + 64, dir, DMA_MAP);
>>>
>>> I am thinking that the problem has something to do with cacheline
>>> alignment on the Xen side
>>> (xen/common/grant_table.c:__gnttab_cache_flush).
>>>
>>> If op == GNTTAB_CACHE_INVAL, we call invalidate_dcache_va_range; if op
>>> == GNTTAB_CACHE_CLEAN, we call clean_dcache_va_range instead. The
>>> parameter, v, could be non-cacheline aligned.
>>>
>>> invalidate_dcache_va_range is capable of handling an unaligned address,
>>> while clean_dcache_va_range is not.
>>>
>>> Edgar, does the appended patch fix the problem for you?
>>
>>
>> Thanks Stefano,
>>
>> This does indeed fix the issue for me.
>
>
> Hi again,
>
> Looking at the code, the problem here is that we may flush one cache line
> less than expected.
>
> This smaller patch fixes it for me too:
> diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> index c492d6d..fa1b4dd 100644
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -325,7 +325,9 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
>  {
>      const void *end;
>      dsb(sy);           /* So the CPU issues all writes to the range */
> -    for ( end = p + size; p < end; p += cacheline_bytes )
> +
> +    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
> +    for ( ; p < end; p += cacheline_bytes )
>          asm volatile (__clean_dcache_one(0) : : "r" (p));
>      dsb(sy);           /* So we know the flushes happen before continuing */
>      /* ARM callers assume that dcache_* functions cannot fail. */
>
>
> Anyway, I'm OK with either fix.

I would prefer your version to Stefano's.

Cheers,
Stefano Stabellini March 2, 2017, 7:12 p.m. UTC | #2
On Thu, 2 Mar 2017, Julien Grall wrote:
> On 02/03/17 08:53, Edgar E. Iglesias wrote:
> > On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
> > > On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> > > > Hi all,
> > > > 
> > > > Edgar reported a data corruption on network packets in dom0 when the
> > > > swiotlb-xen is in use. He also reported that the following patch "fixes"
> > > > the problem for him:
> > > > 
> > > >  static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
> > > >                 size_t size, enum dma_data_direction dir)
> > > >  {
> > > > -       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, DMA_MAP);
> > > > +       printk("%s: addr=%lx size=%zd\n", __func__, handle, size);
> > > > +       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size + 64, dir, DMA_MAP);
> > > > 
> > > > I am thinking that the problem has something to do with cacheline
> > > > alignment on the Xen side
> > > > (xen/common/grant_table.c:__gnttab_cache_flush).
> > > > 
> > > > If op == GNTTAB_CACHE_INVAL, we call invalidate_dcache_va_range; if op
> > > > == GNTTAB_CACHE_CLEAN, we call clean_dcache_va_range instead. The
> > > > parameter, v, could be non-cacheline aligned.
> > > > 
> > > > invalidate_dcache_va_range is capable of handling an unaligned address,
> > > > while clean_dcache_va_range is not.
> > > > 
> > > > Edgar, does the appended patch fix the problem for you?
> > > 
> > > 
> > > Thanks Stefano,
> > > 
> > > This does indeed fix the issue for me.

Thanks for reporting and testing!


> > Hi again,
> > 
> > Looking at the code, the problem here is that we may flush one cache line
> > less than expected.
> > 
> > This smaller patch fixes it for me too:
> > diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> > index c492d6d..fa1b4dd 100644
> > --- a/xen/include/asm-arm/page.h
> > +++ b/xen/include/asm-arm/page.h
> > @@ -325,7 +325,9 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
> >  {
> >      const void *end;
> >      dsb(sy);           /* So the CPU issues all writes to the range */
> > -    for ( end = p + size; p < end; p += cacheline_bytes )
> > +
> > +    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
> > +    for ( ; p < end; p += cacheline_bytes )
> >          asm volatile (__clean_dcache_one(0) : : "r" (p));
> >      dsb(sy);           /* So we know the flushes happen before continuing */
> >      /* ARM callers assume that dcache_* functions cannot fail. */
> > 
> > 
> > Anyway, I'm OK with either fix.
> 
> I would prefer your version to Stefano's.

Julien, from looking at the two diffs, this is simpler and nicer, but if
you look at xen/include/asm-arm/page.h, my patch made
clean_dcache_va_range consistent with invalidate_dcache_va_range. For
consistency, I would prefer to deal with the two functions the same way.
Although it is not a spec requirement, I also think that it is a good
idea to issue cache flushes from cacheline-aligned addresses, like
invalidate_dcache_va_range and Linux do, to make it more obvious what is
going on.
Julien Grall March 2, 2017, 7:32 p.m. UTC | #3
Hi Stefano,

On 02/03/17 19:12, Stefano Stabellini wrote:
> On Thu, 2 Mar 2017, Julien Grall wrote:
>> On 02/03/17 08:53, Edgar E. Iglesias wrote:
>>> On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
>>>> On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> Julien, from looking at the two diffs, this is simpler and nicer, but if
> you look at xen/include/asm-arm/page.h, my patch made
> clean_dcache_va_range consistent with invalidate_dcache_va_range. For
> consistency, I would prefer to deal with the two functions the same way.
> Although it is not a spec requirement, I also think that it is a good
> idea to issue cache flushes from cacheline aligned addresses, like
> invalidate_dcache_va_range does and Linux does, to make more obvious
> what is going on.

invalidate_dcache_va_range is split because the cache instruction differs
for the start and end if they are unaligned. For those you want to use
clean & invalidate rather than plain invalidate.

If you look at the implementation of other cache helpers in Linux (see 
dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only 
align start & end.

Also, invalidate_dcache_va_range uses a modulo, which I would rather
avoid. The modulo in this case will not be optimized by the compiler
because cacheline_bytes is not a constant.

So I still prefer to keep this function really simple.

BTW, you would also need to fix clean_and_invalidate_dcache_va_range.
Stefano Stabellini March 2, 2017, 10:39 p.m. UTC | #4
On Thu, 2 Mar 2017, Julien Grall wrote:
> Hi Stefano,
> 
> On 02/03/17 19:12, Stefano Stabellini wrote:
> > On Thu, 2 Mar 2017, Julien Grall wrote:
> > > On 02/03/17 08:53, Edgar E. Iglesias wrote:
> > > > On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
> > > > > On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> > Julien, from looking at the two diffs, this is simpler and nicer, but if
> > you look at xen/include/asm-arm/page.h, my patch made
> > clean_dcache_va_range consistent with invalidate_dcache_va_range. For
> > consistency, I would prefer to deal with the two functions the same way.
> > Although it is not a spec requirement, I also think that it is a good
> > idea to issue cache flushes from cacheline aligned addresses, like
> > invalidate_dcache_va_range does and Linux does, to make more obvious
> > what is going on.
> 
> invalidate_dcache_va_range is split because the cache instruction differs
> for the start and end if they are unaligned. For those you want to use
> clean & invalidate rather than plain invalidate.
> 
> If you look at the implementation of other cache helpers in Linux (see
> dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only align
> start & end.

I don't think so, unless I am reading dcache_by_line_op wrong.


> Also, invalidate_dcache_va_range uses a modulo, which I would rather avoid.
> The modulo in this case will not be optimized by the compiler because
> cacheline_bytes is not a constant.

That is a good point. What if I replace the modulo op with

  p & (cacheline_bytes - 1)

in invalidate_dcache_va_range, then add the similar code to
clean_dcache_va_range and clean_and_invalidate_dcache_va_range?
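
As a rough sketch of that change (assuming cacheline_bytes is a power of
two), the start handling from the earlier patch would become:

    size_t off = (uintptr_t)p & (cacheline_bytes - 1);  /* instead of % */
    if ( off )
    {
        p -= off;                         /* align start down to a line */
        asm volatile (__clean_dcache_one(0) : : "r" (p));
        p += cacheline_bytes;
        size -= cacheline_bytes - off;
    }

and the end handling would use the same mask.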


> BTW, you would also need to fix clean_and_invalidate_dcache_va_range.

I'll do that, thanks for the reminder.
Julien Grall March 2, 2017, 11:19 p.m. UTC | #5
On 02/03/2017 22:39, Stefano Stabellini wrote:
> On Thu, 2 Mar 2017, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 02/03/17 19:12, Stefano Stabellini wrote:
>>> On Thu, 2 Mar 2017, Julien Grall wrote:
>>>> On 02/03/17 08:53, Edgar E. Iglesias wrote:
>>>>> On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
>>>>>> On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
>>> Julien, from looking at the two diffs, this is simpler and nicer, but if
>>> you look at xen/include/asm-arm/page.h, my patch made
>>> clean_dcache_va_range consistent with invalidate_dcache_va_range. For
>>> consistency, I would prefer to deal with the two functions the same way.
>>> Although it is not a spec requirement, I also think that it is a good
>>> idea to issue cache flushes from cacheline aligned addresses, like
>>> invalidate_dcache_va_range does and Linux does, to make more obvious
>>> what is going on.
>>
>> invalidate_dcache_va_range is split because the cache instruction differs
>> for the start and end if they are unaligned. For those you want to use
>> clean & invalidate rather than plain invalidate.
>>
>> If you look at the implementation of other cache helpers in Linux (see
>> dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only align
>> start & end.
>
> I don't think so, unless I am reading dcache_by_line_op wrong.

343         .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
344         dcache_line_size \tmp1, \tmp2
345         add     \size, \kaddr, \size
346         sub     \tmp2, \tmp1, #1
347         bic     \kaddr, \kaddr, \tmp2
348 9998:
349         .if     (\op == cvau || \op == cvac)
350 alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
351         dc      \op, \kaddr
352 alternative_else
353         dc      civac, \kaddr
354 alternative_endif
355         .else
356         dc      \op, \kaddr
357         .endif
358         add     \kaddr, \kaddr, \tmp1
359         cmp     \kaddr, \size
360         b.lo    9998b
361         dsb     \domain
362         .endm
363

It has only one cache instruction in the resulting assembly because it 
has .if/.else assembly directives.

Cheers,
Stefano Stabellini March 3, 2017, 12:53 a.m. UTC | #6
On Thu, 2 Mar 2017, Julien Grall wrote:
> On 02/03/2017 22:39, Stefano Stabellini wrote:
> > On Thu, 2 Mar 2017, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 02/03/17 19:12, Stefano Stabellini wrote:
> > > > On Thu, 2 Mar 2017, Julien Grall wrote:
> > > > > On 02/03/17 08:53, Edgar E. Iglesias wrote:
> > > > > > On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
> > > > > > > On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> > > > Julien, from looking at the two diffs, this is simpler and nicer, but if
> > > > you look at xen/include/asm-arm/page.h, my patch made
> > > > clean_dcache_va_range consistent with invalidate_dcache_va_range. For
> > > > consistency, I would prefer to deal with the two functions the same way.
> > > > Although it is not a spec requirement, I also think that it is a good
> > > > idea to issue cache flushes from cacheline aligned addresses, like
> > > > invalidate_dcache_va_range does and Linux does, to make more obvious
> > > > what is going on.
> > > 
> > > invalidate_dcache_va_range is split because the cache instruction differs
> > > for the start and end if they are unaligned. For those you want to use
> > > clean & invalidate rather than plain invalidate.
> > > 
> > > If you look at the implementation of other cache helpers in Linux (see
> > > dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only
> > > align
> > > start & end.
> > 
> > I don't think so, unless I am reading dcache_by_line_op wrong.
> 
> 343         .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
> 344         dcache_line_size \tmp1, \tmp2
> 345         add     \size, \kaddr, \size
> 346         sub     \tmp2, \tmp1, #1
> 347         bic     \kaddr, \kaddr, \tmp2
> 348 9998:
> 349         .if     (\op == cvau || \op == cvac)
> 350 alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
> 351         dc      \op, \kaddr
> 352 alternative_else
> 353         dc      civac, \kaddr
> 354 alternative_endif
> 355         .else
> 356         dc      \op, \kaddr
> 357         .endif
> 358         add     \kaddr, \kaddr, \tmp1
> 359         cmp     \kaddr, \size
> 360         b.lo    9998b
> 361         dsb     \domain
> 362         .endm
> 363
> 
> It has only one cache instruction in the resulting assembly because it has
> .if/.else assembly directives.

Yes, but it does not only align start and end; as a result, all cache
instructions are issued on aligned addresses, right?
Julien Grall March 3, 2017, 4:20 p.m. UTC | #7
Hi Stefano,

On 03/03/17 00:53, Stefano Stabellini wrote:
> On Thu, 2 Mar 2017, Julien Grall wrote:
>> On 02/03/2017 22:39, Stefano Stabellini wrote:
>>> On Thu, 2 Mar 2017, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On 02/03/17 19:12, Stefano Stabellini wrote:
>>>>> On Thu, 2 Mar 2017, Julien Grall wrote:
>>>>>> On 02/03/17 08:53, Edgar E. Iglesias wrote:
>>>>>>> On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
>>>>>>>> On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
>>>>> Julien, from looking at the two diffs, this is simpler and nicer, but if
>>>>> you look at xen/include/asm-arm/page.h, my patch made
>>>>> clean_dcache_va_range consistent with invalidate_dcache_va_range. For
>>>>> consistency, I would prefer to deal with the two functions the same way.
>>>>> Although it is not a spec requirement, I also think that it is a good
>>>>> idea to issue cache flushes from cacheline aligned addresses, like
>>>>> invalidate_dcache_va_range does and Linux does, to make more obvious
>>>>> what is going on.
>>>>
>>>> invalidate_dcache_va_range is split because the cache instruction differs
>>>> for the start and end if they are unaligned. For those you want to use
>>>> clean & invalidate rather than plain invalidate.
>>>>
>>>> If you look at the implementation of other cache helpers in Linux (see
>>>> dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only
>>>> align
>>>> start & end.
>>>
>>> I don't think so, unless I am reading dcache_by_line_op wrong.
>>
>> 343         .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
>> 344         dcache_line_size \tmp1, \tmp2
>> 345         add     \size, \kaddr, \size
>> 346         sub     \tmp2, \tmp1, #1
>> 347         bic     \kaddr, \kaddr, \tmp2
>> 348 9998:
>> 349         .if     (\op == cvau || \op == cvac)
>> 350 alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
>> 351         dc      \op, \kaddr
>> 352 alternative_else
>> 353         dc      civac, \kaddr
>> 354 alternative_endif
>> 355         .else
>> 356         dc      \op, \kaddr
>> 357         .endif
>> 358         add     \kaddr, \kaddr, \tmp1
>> 359         cmp     \kaddr, \size
>> 360         b.lo    9998b
>> 361         dsb     \domain
>> 362         .endm
>> 363
>>
>> It has only one cache instruction in the resulting assembly because it has
>> .if/.else assembly directives.
>
> Yes, but it does not only align start and end, all cache instructions
> are called on aligned addresses, right?

I don't think so. The instruction "bic     \kaddr, \kaddr, \tmp2" makes
sure the start address is aligned to the cache line size.

The C version of the assembly code is exactly what you wrote in the
previous e-mail:

    end = p + size;
    p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);
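
In other words, something along these lines (a sketch only, assuming
cacheline_bytes is a power of two, with the start aligned down as the bic
instruction does):

    const void *end = p + size;
    p = (void *)((uintptr_t)p & ~((uintptr_t)cacheline_bytes - 1));
    for ( ; p < end; p += cacheline_bytes )
        asm volatile (__clean_dcache_one(0) : : "r" (p));

Because p is rounded down and the loop runs while p < end, the last,
partially covered line is cleaned as well.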

Cheers,

Patch

diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index c492d6d..fa1b4dd 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -325,7 +325,9 @@  static inline int clean_dcache_va_range(const void *p, unsigned long size)
 {
     const void *end;
     dsb(sy);           /* So the CPU issues all writes to the range */
-    for ( end = p + size; p < end; p += cacheline_bytes )
+
+    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
+    for ( ; p < end; p += cacheline_bytes )
         asm volatile (__clean_dcache_one(0) : : "r" (p));
     dsb(sy);           /* So we know the flushes happen before continuing */
     /* ARM callers assume that dcache_* functions cannot fail. */
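
For context, ROUNDUP above is Xen's power-of-two round-up helper; a
typical definition (a sketch, not quoted from the Xen tree) is:

    #define ROUNDUP(x, a) (((x) + (a) - 1) & ~((a) - 1))

With cacheline_bytes a power of two, rounding end up this way ensures the
loop also issues a clean for the final, partially covered cache line,
which is exactly the line the original loop could miss.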