Message ID | 9a1c3232-86e3-7301-23f8-50116abf37d3@virtuozzo.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to | Herbert Xu |
On Fri, Oct 13, 2017 at 04:56:43PM +0300, Andrey Ryabinin wrote:
> On 10/13/2017 07:45 AM, Josh Poimboeuf wrote:
> > On Thu, Oct 12, 2017 at 12:05:04PM -0500, Christopher Lameter wrote:
> >> On Wed, 11 Oct 2017, Josh Poimboeuf wrote:
> >>
> >>> I failed to add the slab maintainers to CC on the last attempt. Trying
> >>> again.
> >>
> >> Hmmm... Yea. SLOB is rarely used and tested. Good illustration of a simple
> >> allocator and the K&R mechanism that was used in the early kernels.
> >>
> >>>> Adding the slub maintainers. Is slob still supposed to work?
> >>
> >> Have not seen anyone using it in a decade or so.
> >>
> >> Does the same config with SLUB and slub_debug on the commandline run
> >> cleanly?
> >>
> >>>> I have no idea how that crypto panic could be related to slob, but
> >>>> at least it goes away when I switch to slub.
> >>
> >> Can you run SLUB with full debug? Specify slub_debug on the commandline or
> >> set CONFIG_SLUB_DEBUG_ON.
> >
> > Oddly enough, with CONFIG_SLUB+slub_debug, I get the same crypto panic I
> > got with CONFIG_SLOB. The trapping instruction is:
> >
> >   vmovdqa 0x140(%rdi),%xmm0
>
> It's an unaligned access. Look at %rdi: vmovdqa requires 16-byte alignment.
> Apparently, something fed kmalloc()'ed data here, but kmalloc() guarantees only
> sizeof(unsigned long) alignment. slub_debug changes SLUB's object layout, so what
> happened to be 16-byte aligned without slub_debug may become 8-byte aligned with
> slub_debug on.
>
> > I'll try to bisect it tomorrow. It at least goes back to v4.10.
>
> Probably no point. I bet this bug was always here (since this code was added).
>
> This could be fixed by a s/vmovdqa/vmovdqu change like below, but maybe the right
> fix would be to align the data properly?
>
> ---
>  arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
> index 8fe6338bcc84..7fd5d9b568c7 100644
> --- a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
> +++ b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
> @@ -155,8 +155,8 @@ LABEL skip_ %I
>  .endr
>
>  # Find min length
> -	vmovdqa _lens+0*16(state), %xmm0
> -	vmovdqa _lens+1*16(state), %xmm1
> +	vmovdqu _lens+0*16(state), %xmm0
> +	vmovdqu _lens+1*16(state), %xmm1
>
>  	vpminud %xmm1, %xmm0, %xmm2		# xmm2 has {D,C,B,A}
>  	vpalignr $8, %xmm2, %xmm3, %xmm3	# xmm3 has {x,x,D,C}
> @@ -176,8 +176,8 @@ LABEL skip_ %I
>  	vpsubd %xmm2, %xmm0, %xmm0
>  	vpsubd %xmm2, %xmm1, %xmm1
>
> -	vmovdqa %xmm0, _lens+0*16(state)
> -	vmovdqa %xmm1, _lens+1*16(state)
> +	vmovdqu %xmm0, _lens+0*16(state)
> +	vmovdqu %xmm1, _lens+1*16(state)
>
>  	# "state" and "args" are the same address, arg1
>  	# len is arg2
> --
> 2.13.6

Makes sense. I can confirm that the above patch fixes the panic.
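A rough user-space sketch of the failure mode Andrey describes (only the 0x140 offset is taken from the trapping instruction above; the buffer and names are made up for illustration): kmalloc() promises just sizeof(unsigned long) = 8-byte alignment on x86-64, so whether state+0x140 lands on a 16-byte boundary is an accident of allocator layout, and that is exactly what slub_debug's extra metadata perturbs.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* vmovdqa demands a 16-byte-aligned memory operand; vmovdqu takes anything. */
static int vmovdqa_would_fault(const void *p)
{
	return ((uintptr_t)p & 0xf) != 0;
}

int main(void)
{
	unsigned char *buf = malloc(1024);	/* stand-in for the kmalloc()'ed state */
	unsigned char *shifted = buf + 8;	/* 8-byte aligned, like slub_debug can produce */

	printf("buf + 0x140:     %s\n",
	       vmovdqa_would_fault(buf + 0x140) ? "would #GP" : "ok");
	printf("shifted + 0x140: %s\n",
	       vmovdqa_would_fault(shifted + 0x140) ? "would #GP" : "ok");
	free(buf);
	return 0;
}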
On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>
> This could be fixed by a s/vmovdqa/vmovdqu change like below, but maybe the right
> fix would be to align the data properly?

I suspect anything that has the SHA extensions should also do
unaligned loads efficiently. The whole "aligned only" model is broken.
It's just doing two loads from the state pointer, there's likely no
point in trying to align it.

So your patch looks fine, but maybe somebody could add the required
alignment to the sha256 context allocation (which I don't know where
it is).

But yeah, that other SLOB panic looks unrelated to this.

              Linus
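For what "align the context allocation" could look like, a hedged sketch (the thread never identifies where the sha256-mb context is allocated, so the struct and helper names here are hypothetical): over-allocate with kmalloc() and round the working pointer up, keeping the raw pointer for kfree().

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/slab.h>

#define SHA256_MB_STATE_ALIGN 16	/* what vmovdqa needs */

struct sha256_mb_ctx_wrap {		/* hypothetical, not a kernel type */
	void *raw;			/* pointer to pass to kfree() */
	void *ctx;			/* 16-byte-aligned view for the asm code */
};

static int sha256_mb_ctx_alloc(struct sha256_mb_ctx_wrap *w, size_t size,
			       gfp_t gfp)
{
	/* Over-allocate so a 16-byte boundary always fits inside. */
	w->raw = kmalloc(size + SHA256_MB_STATE_ALIGN - 1, gfp);
	if (!w->raw)
		return -ENOMEM;
	w->ctx = PTR_ALIGN(w->raw, SHA256_MB_STATE_ALIGN);
	return 0;
}

static void sha256_mb_ctx_free(struct sha256_mb_ctx_wrap *w)
{
	kfree(w->raw);
}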
On Fri, Oct 13, 2017 at 12:09 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin
> <aryabinin@virtuozzo.com> wrote:
>>
>> This could be fixed by a s/vmovdqa/vmovdqu change like below, but maybe the right
>> fix would be to align the data properly?
>
> I suspect anything that has the SHA extensions should also do
> unaligned loads efficiently. The whole "aligned only" model is broken.
> It's just doing two loads from the state pointer, there's likely no
> point in trying to align it.
>
> So your patch looks fine, but maybe somebody could add the required
> alignment to the sha256 context allocation (which I don't know where
> it is).

IIRC if we try the latter, then we'll risk hitting the #*!&@% gcc bug
that mostly prevents 16-byte alignment from working on GCC before 4.8
or so. That way lies debugging disasters.
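Sketching the pattern being warned about (illustrative names, and it assumes a stack-allocated state): trusting __attribute__((aligned(16))) on a local is exactly what old gcc could silently get wrong, whereas realigning a raw buffer by hand never trusts the compiler.

#include <stdint.h>
#include <string.h>

#define STATE_SIZE  128
#define STATE_ALIGN 16

void sha256_consume(void *state);	/* hypothetical user needing 16-byte alignment */

void risky(void)
{
	/* gcc before ~4.8 could fail to realign the stack for this. */
	unsigned char state[STATE_SIZE] __attribute__((aligned(STATE_ALIGN)));

	memset(state, 0, sizeof(state));
	sha256_consume(state);
}

void defensive(void)
{
	/* Over-allocate and round up by hand; no compiler trust needed. */
	unsigned char raw[STATE_SIZE + STATE_ALIGN - 1];
	void *state = (void *)(((uintptr_t)raw + STATE_ALIGN - 1) &
			       ~(uintptr_t)(STATE_ALIGN - 1));

	memset(state, 0, STATE_SIZE);
	sha256_consume(state);
}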
On Fri, Oct 13, 2017 at 3:09 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Fri, Oct 13, 2017 at 6:56 AM, Andrey Ryabinin
> <aryabinin@virtuozzo.com> wrote:
>>
>> This could be fixed by a s/vmovdqa/vmovdqu change like below, but maybe the right
>> fix would be to align the data properly?
>
> I suspect anything that has the SHA extensions should also do
> unaligned loads efficiently. The whole "aligned only" model is broken.
> It's just doing two loads from the state pointer, there's likely no
> point in trying to align it.

+1, good engineering. AVX2 requires 32-byte buffer alignment in some
places. That is trickier than this use case, because
__BIGGEST_ALIGNMENT__ doubled but a lot of code still assumes 16 bytes.

Jeff
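As a small illustration of that point (not from any kernel file; intrinsics rather than the thread's hand-written asm): the aligned/unaligned split recurs at AVX2's 32-byte width, so a buffer that is carefully 16-byte aligned can still fault an aligned 256-bit load.

#include <immintrin.h>
#include <stdint.h>

/* Load 8 lane lengths at once. The unaligned form is safe for any
 * pointer; _mm256_load_si256() would require 32-byte alignment, which
 * neither kmalloc() nor a 16-byte-aligned buffer guarantees. */
__m256i load_lens(const uint32_t *lens)
{
	return _mm256_loadu_si256((const __m256i *)lens);
}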
diff --git a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
index 8fe6338bcc84..7fd5d9b568c7 100644
--- a/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
+++ b/arch/x86/crypto/sha256-mb/sha256_mb_mgr_flush_avx2.S
@@ -155,8 +155,8 @@ LABEL skip_ %I
 .endr

 # Find min length
-	vmovdqa _lens+0*16(state), %xmm0
-	vmovdqa _lens+1*16(state), %xmm1
+	vmovdqu _lens+0*16(state), %xmm0
+	vmovdqu _lens+1*16(state), %xmm1

 	vpminud %xmm1, %xmm0, %xmm2		# xmm2 has {D,C,B,A}
 	vpalignr $8, %xmm2, %xmm3, %xmm3	# xmm3 has {x,x,D,C}
@@ -176,8 +176,8 @@ LABEL skip_ %I
 	vpsubd %xmm2, %xmm0, %xmm0
 	vpsubd %xmm2, %xmm1, %xmm1

-	vmovdqa %xmm0, _lens+0*16(state)
-	vmovdqa %xmm1, _lens+1*16(state)
+	vmovdqu %xmm0, _lens+0*16(state)
+	vmovdqu %xmm1, _lens+1*16(state)

 	# "state" and "args" are the same address, arg1
 	# len is arg2