Message ID | 56A6344C02000078000CAA56@prv-mh.provo.novell.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, 2016-01-25 at 06:42 -0700, Jan Beulich wrote: > > > > On 21.01.16 at 16:14, <dario.faggioli@citrix.com> wrote: > > On Wed, 2016-01-20 at 03:06 -0700, Jan Beulich wrote: > > > But of course another question then is why the XRSTORS faults > > > in the first place. I guess we'll need a debugging patch to dump > > > the full state to understand that. > > > > > If someone can produce and send such patch, I'm sure Harmandeep > > will be > > happy to give it a try on her hardware. > > So here you go. Instead of a debugging one, I hope I have at > once fixed the issue in a suitable way. Whether we'd like to keep > the debugging output we can decide later on. > Great, thanks! > Both patches need to be applied; while the order shouldn't matter, > the alignment one is a prereq to the actual change. > Ok. Harmandeep, can you give these patches a try when you get a chance? (contact me, by email or on IRC, if you need help doing so). Regards, Dario
Hi guys, I tried the patch and I am very happy to inform you all that the patch has solved my problem. Now I am able to boot Xen without disabling XSAVE. I have full log of boot at http://paste2.org/gVW0Z9nm (if any one is interested. also "XXX Hello, this is my first mod :)" is printed by my mod, so ignore that one). Thank you guys for your help. Regards, Harman On Mon, Jan 25, 2016 at 7:12 PM, Jan Beulich <JBeulich@suse.com> wrote: >>>> On 21.01.16 at 16:14, <dario.faggioli@citrix.com> wrote: >> On Wed, 2016-01-20 at 03:06 -0700, Jan Beulich wrote: >>> > > > On 19.01.16 at 20:55, <write.harmandeep@gmail.com> wrote: >>> > >>> > $ 'addr2line -e xen-syms ffff82d0801c1cce' returns >>> > 'xen/xen/arch/x86/xstate.c:387' which again points to >>> > xsave. Also, adding 'xsave=0' makes it boot just fine. >>> >>> Ah, I think I see the issue: We're zeroing the entire save area in >>> the fixup code, which will make XRSTORS fault unconditionally. >>> Shuai, would you please look into possible ways of fixing this? >>> Just setting the compressed flag may not be enough, and falling >>> back to plain XRSTOR doesn't seem to be an option either - I'm in >>> particular worried that the current model of zeroing everything >>> is bogus with the growing number of different components which >>> get loaded here. >>> >>> But of course another question then is why the XRSTORS faults >>> in the first place. I guess we'll need a debugging patch to dump >>> the full state to understand that. >>> >> If someone can produce and send such patch, I'm sure Harmandeep will be >> happy to give it a try on her hardware. > > So here you go. Instead of a debugging one, I hope I have at > once fixed the issue in a suitable way. Whether we'd like to keep > the debugging output we can decide later on. > > Both patches need to be applied; while the order shouldn't matter, > the alignment one is a prereq to the actual change. > > Jan >
The patch as I already said is letting me boot into the Xen, but the system is now resetting stating XSAVE as the cause. I have attached links to two cases where system was reset as the result. I don't think that problem is fully solved yet. http://paste2.org/Ky56Z92g http://paste2.org/3hcbG6L7 Regards, Harman On Tue, Jan 26, 2016 at 6:21 PM, Harmandeep Kaur <write.harmandeep@gmail.com> wrote: > Hi guys, > > I tried the patch and I am very happy to inform you > all that the patch has solved my problem. Now I am > able to boot Xen without disabling XSAVE. I have > full log of boot at http://paste2.org/gVW0Z9nm (if > any one is interested. also "XXX Hello, this is my > first mod :)" is printed by my mod, so ignore that > one). > > Thank you guys for your help. > Regards, > Harman > > On Mon, Jan 25, 2016 at 7:12 PM, Jan Beulich <JBeulich@suse.com> wrote: >>>>> On 21.01.16 at 16:14, <dario.faggioli@citrix.com> wrote: >>> On Wed, 2016-01-20 at 03:06 -0700, Jan Beulich wrote: >>>> > > > On 19.01.16 at 20:55, <write.harmandeep@gmail.com> wrote: >>>> > >>>> > $ 'addr2line -e xen-syms ffff82d0801c1cce' returns >>>> > 'xen/xen/arch/x86/xstate.c:387' which again points to >>>> > xsave. Also, adding 'xsave=0' makes it boot just fine. >>>> >>>> Ah, I think I see the issue: We're zeroing the entire save area in >>>> the fixup code, which will make XRSTORS fault unconditionally. >>>> Shuai, would you please look into possible ways of fixing this? >>>> Just setting the compressed flag may not be enough, and falling >>>> back to plain XRSTOR doesn't seem to be an option either - I'm in >>>> particular worried that the current model of zeroing everything >>>> is bogus with the growing number of different components which >>>> get loaded here. >>>> >>>> But of course another question then is why the XRSTORS faults >>>> in the first place. I guess we'll need a debugging patch to dump >>>> the full state to understand that. >>>> >>> If someone can produce and send such patch, I'm sure Harmandeep will be >>> happy to give it a try on her hardware. >> >> So here you go. Instead of a debugging one, I hope I have at >> once fixed the issue in a suitable way. Whether we'd like to keep >> the debugging output we can decide later on. >> >> Both patches need to be applied; while the order shouldn't matter, >> the alignment one is a prereq to the actual change. >> >> Jan >>
>>> On 26.01.16 at 14:13, <write.harmandeep@gmail.com> wrote: > The patch as I already said is letting me boot > into the Xen, but the system is now resetting > stating XSAVE as the cause. I have attached > links to two cases where system was reset as > the result. I don't think that problem is fully > solved yet. > http://paste2.org/Ky56Z92g > http://paste2.org/3hcbG6L7 I guess this would also get resolved by the 3rd patch just sent. Jan
I tried 3rd patch together with earlier two. I'm afraid the problem is not solved completely. Full log goes here, http://paste2.org/KEAetMHb Regards, Harmandeep On Tue, Jan 26, 2016 at 6:53 PM, Jan Beulich <JBeulich@suse.com> wrote: >>>> On 26.01.16 at 14:13, <write.harmandeep@gmail.com> wrote: >> The patch as I already said is letting me boot >> into the Xen, but the system is now resetting >> stating XSAVE as the cause. I have attached >> links to two cases where system was reset as >> the result. I don't think that problem is fully >> solved yet. >> http://paste2.org/Ky56Z92g >> http://paste2.org/3hcbG6L7 > > I guess this would also get resolved by the 3rd patch just sent. > > Jan >
>>> On 26.01.16 at 18:01, <write.harmandeep@gmail.com> wrote: > I tried 3rd patch together with earlier two. I'm > afraid the problem is not solved completely. > Full log goes here, http://paste2.org/KEAetMHb Well, wait - we can't solve all problems at once. The crash here is in the context of do_domctl(), i.e. makes me assume that the system booted up fine (and the problematic log messages are gone). So the original issue is fixed. Now for this second issue, would you mind first of all telling us what exactly it is you're doing when the crash occurs? Because, as you hopefully understand, such information often helps understanding where and/or why things are going wrong. Jan
Last time, I did absolutely nothing. System was idle and it crashed just after the login. Now, I booted the system again and this time, there is no reset. But, performance of the system is very slow. Browser (Mozilla Firefox) freezes a lot. Also, before applying patches, when I used to disabe xsave it resulted in same kind of performance issues. And the following is still present in the log. (XEN) traps.c:3290: GPF (0000): ffff82d0801c1cea -> ffff82d080252e5c (XEN) d1v1 fault#1: mxcsr=00001f80 (XEN) d1v1 xs=0000000000000003 xc=8000000000000000 (XEN) d1v1 r0=0000000000000000 r1=0000000000000000 (XEN) d1v1 r2=0000000000000000 r3=0000000000000000 (XEN) d1v1 r4=0000000000000000 r5=0000000000000000 (XEN) traps.c:3290: GPF (0000): ffff82d0801c1cea -> ffff82d080252e5c (XEN) d1v1 fault#2: mxcsr=00001f80 (XEN) d1v1 xs=0000000000000000 xc=0000000000000000 (XEN) d1v1 r0=0000000000000000 r1=0000000000000000 (XEN) d1v1 r2=0000000000000000 r3=0000000000000000 (XEN) d1v1 r4=0000000000000000 r5=0000000000000000 Full log here: http://paste2.org/C8WpyKOg Regards, Harmandeep On Tue, Jan 26, 2016 at 10:53 PM, Jan Beulich <JBeulich@suse.com> wrote: >>>> On 26.01.16 at 18:01, <write.harmandeep@gmail.com> wrote: >> I tried 3rd patch together with earlier two. I'm >> afraid the problem is not solved completely. >> Full log goes here, http://paste2.org/KEAetMHb > > Well, wait - we can't solve all problems at once. The crash here > is in the context of do_domctl(), i.e. makes me assume that the > system booted up fine (and the problematic log messages are > gone). So the original issue is fixed. Now for this second issue, > would you mind first of all telling us what exactly it is you're > doing when the crash occurs? Because, as you hopefully > understand, such information often helps understanding where > and/or why things are going wrong. > > Jan >
On Tue, 2016-01-26 at 23:32 +0530, Harmandeep Kaur wrote: > Last time, I did absolutely nothing. System was idle > and it crashed just after the login. Now, I booted the > system again and this time, there is no reset. But, > performance of the system is very slow. Browser > (Mozilla Firefox) freezes a lot. Also, before applying > patches, when I used to disabe xsave it resulted in > same kind of performance issues. > Mmm... are you sure the performance is actually affected by "xsave=0", and/or by Jan's patch? It's hard to check, as without at least one of them the box does not boot, but I don't think the things (e.g., Firefox freezing or not starting) are necessarily related. In particular, you have in your Xen boot parameters list, this item: "dom0_mem=1024M,max:1024M" This means that, in dom0, you will "only" have 1GB of RAM available. And if you just login after boot and start Firefox, dom0 is where Firefox is going to be running... and 1G, that Firefox will have to share with the rest of Linux running as dom0, may be too few RAM for it. And in fact, in your last log, we see this (from dom0, not from Xen!): [ 851.644443] Out of memory: Kill process 1945 (firefox) score 325 or sacrifice child [ 851.644461] Killed process 1945 (firefox) total-vm:1228008kB, anon-rss:305536kB, file-rss:0kB If you want to run a graphical environment on that test box, and browse with Firefox, then you should increase the amount of RAM you allow dom0 to use. When I suggested you to use 1024, I was assuming (given how your work environment is setup) you were not going to do any such thing. > And the following > is still present in the log. > > (XEN) traps.c:3290: GPF (0000): ffff82d0801c1cea -> ffff82d080252e5c > (XEN) d1v1 fault#1: mxcsr=00001f80 > (XEN) d1v1 xs=0000000000000003 xc=8000000000000000 > (XEN) d1v1 r0=0000000000000000 r1=0000000000000000 > (XEN) d1v1 r2=0000000000000000 r3=0000000000000000 > (XEN) d1v1 r4=0000000000000000 r5=0000000000000000 > (XEN) traps.c:3290: GPF (0000): ffff82d0801c1cea -> ffff82d080252e5c > (XEN) d1v1 fault#2: mxcsr=00001f80 > (XEN) d1v1 xs=0000000000000000 xc=0000000000000000 > (XEN) d1v1 r0=0000000000000000 r1=0000000000000000 > (XEN) d1v1 r2=0000000000000000 r3=0000000000000000 > (XEN) d1v1 r4=0000000000000000 r5=0000000000000000 > Mmm... and this is with all Jan's patches applied? So, just to make sure we understand each others, you're saying that, again with all patches applied, and with you not doing anything significantly different between a) and b) below, the system either: a) crashes right after login, like this: http://paste2.org/KEAetMHb b) does not crash (you're even able to try starting Firefox), but Xen produces the following output: http://paste2.org/C8WpyKOg Is this correct? Regards, Dario
On Tue, Jan 26, 2016 at 11:58 PM, Dario Faggioli <dario.faggioli@citrix.com> wrote: > On Tue, 2016-01-26 at 23:32 +0530, Harmandeep Kaur wrote: >> Last time, I did absolutely nothing. System was idle >> and it crashed just after the login. Now, I booted the >> system again and this time, there is no reset. But, >> performance of the system is very slow. Browser >> (Mozilla Firefox) freezes a lot. Also, before applying >> patches, when I used to disabe xsave it resulted in >> same kind of performance issues. >> > Mmm... are you sure the performance is actually affected by "xsave=0", > and/or by Jan's patch? It's hard to check, as without at least one of > them the box does not boot, but I don't think the things (e.g., Firefox > freezing or not starting) are necessarily related. > > In particular, you have in your Xen boot parameters list, this item: > > "dom0_mem=1024M,max:1024M" > > This means that, in dom0, you will "only" have 1GB of RAM available. > And if you just login after boot and start Firefox, dom0 is where > Firefox is going to be running... and 1G, that Firefox will have to > share with the rest of Linux running as dom0, may be too few RAM for > it. And in fact, in your last log, we see this (from dom0, not from > Xen!): > > [ 851.644443] Out of memory: Kill process 1945 (firefox) score 325 or sacrifice child > [ 851.644461] Killed process 1945 (firefox) total-vm:1228008kB, anon-rss:305536kB, file-rss:0kB > > If you want to run a graphical environment on that test box, and browse > with Firefox, then you should increase the amount of RAM you allow dom0 > to use. When I suggested you to use 1024, I was assuming (given how > your work environment is setup) you were not going to do any such > thing. > Actually I didn't notice this performance issue before this xsave bug, maybe we added this line later on. Anyways I can now check this by increasing the memory. >> And the following >> is still present in the log. >> >> (XEN) traps.c:3290: GPF (0000): ffff82d0801c1cea -> ffff82d080252e5c >> (XEN) d1v1 fault#1: mxcsr=00001f80 >> (XEN) d1v1 xs=0000000000000003 xc=8000000000000000 >> (XEN) d1v1 r0=0000000000000000 r1=0000000000000000 >> (XEN) d1v1 r2=0000000000000000 r3=0000000000000000 >> (XEN) d1v1 r4=0000000000000000 r5=0000000000000000 >> (XEN) traps.c:3290: GPF (0000): ffff82d0801c1cea -> ffff82d080252e5c >> (XEN) d1v1 fault#2: mxcsr=00001f80 >> (XEN) d1v1 xs=0000000000000000 xc=0000000000000000 >> (XEN) d1v1 r0=0000000000000000 r1=0000000000000000 >> (XEN) d1v1 r2=0000000000000000 r3=0000000000000000 >> (XEN) d1v1 r4=0000000000000000 r5=0000000000000000 >> > Mmm... and this is with all Jan's patches applied? Yes, all three patches applied. > So, just to make sure we understand each others, you're saying that, > again with all patches applied, and with you not doing anything > significantly different between a) and b) below, the system either: > > a) crashes right after login, like this: http://paste2.org/KEAetMHb > > b) does not crash (you're even able to try starting Firefox), but Xen > produces the following output: http://paste2.org/C8WpyKOg > > Is this correct? Yes. And I tried to boot several times and around 70-80% of times system crashes right after the login as in case 'a'. Regards, Harmandeep > Regards, > Dario > -- > <<This happens because I choose it to happen!>> (Raistlin Majere) > ----------------------------------------------------------------- > Dario Faggioli, Ph.D, http://about.me/dario.faggioli > Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK) >
>>> On 26.01.16 at 19:02, <write.harmandeep@gmail.com> wrote: > Last time, I did absolutely nothing. System was idle > and it crashed just after the login. Now, I booted the > system again and this time, there is no reset. But, > performance of the system is very slow. Browser > (Mozilla Firefox) freezes a lot. Also, before applying > patches, when I used to disabe xsave it resulted in > same kind of performance issues. And the following > is still present in the log. > > (XEN) traps.c:3290: GPF (0000): ffff82d0801c1cea -> ffff82d080252e5c > (XEN) d1v1 fault#1: mxcsr=00001f80 > (XEN) d1v1 xs=0000000000000003 xc=8000000000000000 > (XEN) d1v1 r0=0000000000000000 r1=0000000000000000 > (XEN) d1v1 r2=0000000000000000 r3=0000000000000000 > (XEN) d1v1 r4=0000000000000000 r5=0000000000000000 > (XEN) traps.c:3290: GPF (0000): ffff82d0801c1cea -> ffff82d080252e5c > (XEN) d1v1 fault#2: mxcsr=00001f80 > (XEN) d1v1 xs=0000000000000000 xc=0000000000000000 > (XEN) d1v1 r0=0000000000000000 r1=0000000000000000 > (XEN) d1v1 r2=0000000000000000 r3=0000000000000000 > (XEN) d1v1 r4=0000000000000000 r5=0000000000000000 > > Full log here: http://paste2.org/C8WpyKOg This is not _still_ present, but that's another, different instance, now during guest creation. Please try to be precise in your wording. Namely, in this case, you again fail to mention what kind of guest you're now creating that causes this to re-appear. Jan
--- unstable.orig/xen/arch/x86/xstate.c 2016-01-25 09:35:12.000000000 +0100 +++ unstable/xen/arch/x86/xstate.c 2016-01-25 12:34:48.000000000 +0100 @@ -29,6 +29,8 @@ unsigned int *__read_mostly xstate_sizes static unsigned int __read_mostly xstate_features; static unsigned int __read_mostly xstate_comp_offsets[sizeof(xfeature_mask)*8]; +static uint32_t __read_mostly mxcsr_mask = MXCSR_DEFAULT; + /* Cached xcr0 for fast read */ static DEFINE_PER_CPU(uint64_t, xcr0); @@ -342,6 +344,7 @@ void xrstor(struct vcpu *v, uint64_t mas uint32_t hmask = mask >> 32; uint32_t lmask = mask; struct xsave_struct *ptr = v->arch.xsave_area; + unsigned int faults, prev_faults; /* * AMD CPUs don't save/restore FDP/FIP/FOP unless an exception @@ -361,35 +364,79 @@ void xrstor(struct vcpu *v, uint64_t mas /* * XRSTOR can fault if passed a corrupted data block. We handle this * possibility, which may occur if the block was passed to us by control - * tools or through VCPUOP_initialise, by silently clearing the block. + * tools or through VCPUOP_initialise, by silently adjusting state. */ - switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) ) + for ( prev_faults = faults = 0; ; prev_faults = faults ) { + switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) ) + { #define XRSTOR(pfx) \ alternative_io("1: .byte " pfx "0x0f,0xae,0x2f\n" \ + "3:\n" \ " .section .fixup,\"ax\"\n" \ - "2: mov %[size],%%ecx\n" \ - " xor %[lmask_out],%[lmask_out]\n" \ - " rep stosb\n" \ - " lea %[mem],%[ptr]\n" \ - " mov %[lmask_in],%[lmask_out]\n" \ - " jmp 1b\n" \ + "2: inc%z[faults] %[faults]\n" \ + " jmp 3b\n" \ " .previous\n" \ _ASM_EXTABLE(1b, 2b), \ ".byte " pfx "0x0f,0xc7,0x1f\n", \ X86_FEATURE_XSAVES, \ - ASM_OUTPUT2([ptr] "+&D" (ptr), [lmask_out] "+&a" (lmask)), \ - [mem] "m" (*ptr), [lmask_in] "g" (lmask), \ - [hmask] "d" (hmask), [size] "m" (xsave_cntxt_size) \ - : "ecx") - - default: - XRSTOR("0x48,"); - break; - case 4: case 2: - XRSTOR(""); - break; + ASM_OUTPUT2([mem] "+m" (*ptr), [faults] "+g" (faults)), \ + [lmask] "a" (lmask), [hmask] "d" (hmask), \ + [ptr] "D" (ptr)) + + default: + XRSTOR("0x48,"); + break; + case 4: case 2: + XRSTOR(""); + break; #undef XRSTOR + } + if ( likely(faults == prev_faults) ) + break; +#ifndef NDEBUG + gprintk(XENLOG_WARNING, "fault#%u: mxcsr=%08x\n", + faults, ptr->fpu_sse.mxcsr); + gprintk(XENLOG_WARNING, "xs=%016lx xc=%016lx\n", + ptr->xsave_hdr.xstate_bv, ptr->xsave_hdr.xcomp_bv); + gprintk(XENLOG_WARNING, "r0=%016lx r1=%016lx\n", + ptr->xsave_hdr.reserved[0], ptr->xsave_hdr.reserved[1]); + gprintk(XENLOG_WARNING, "r2=%016lx r3=%016lx\n", + ptr->xsave_hdr.reserved[2], ptr->xsave_hdr.reserved[3]); + gprintk(XENLOG_WARNING, "r4=%016lx r5=%016lx\n", + ptr->xsave_hdr.reserved[4], ptr->xsave_hdr.reserved[5]); +#endif + switch ( faults ) + { + case 1: + if ( (ptr->fpu_sse.mxcsr & ~mxcsr_mask) && + ((mask & XSTATE_SSE) || + ((mask & XSTATE_YMM) && + !(ptr->xsave_hdr.xcomp_bv & XSTATE_COMPACTION_ENABLED))) ) + ptr->fpu_sse.mxcsr &= mxcsr_mask; + ptr->xsave_hdr.xstate_bv &= ~mask; + if ( cpu_has_xsaves || cpu_has_xsavec ) + { + ptr->xsave_hdr.xcomp_bv &= this_cpu(xcr0) | this_cpu(xss); + ptr->xsave_hdr.xstate_bv &= ptr->xsave_hdr.xcomp_bv; + } + else + { + ptr->xsave_hdr.xstate_bv &= this_cpu(xcr0); + ptr->xsave_hdr.xcomp_bv = 0; + } + memset(ptr->xsave_hdr.reserved, 0, sizeof(ptr->xsave_hdr.reserved)); + continue; + case 2: + ptr->fpu_sse.mxcsr = MXCSR_DEFAULT; + ptr->xsave_hdr.xstate_bv = 0; + ptr->xsave_hdr.xcomp_bv = cpu_has_xsaves + ? XSTATE_COMPACTION_ENABLED : 0; + continue; + default: + domain_crash(current->domain); + break; + } } } @@ -414,7 +461,7 @@ int xstate_alloc_save_area(struct vcpu * BUG_ON(xsave_cntxt_size < XSTATE_AREA_MIN_SIZE); /* XSAVE/XRSTOR requires the save area be 64-byte-boundary aligned. */ - BUILD_BUG_ON(__alignof(*save_area) < 64); + BUILD_BUG_ON(__alignof(*v->arch.xsave_area) < 64); save_area = _xzalloc(xsave_cntxt_size, __alignof(*save_area)); if ( save_area == NULL ) return -ENOMEM; @@ -496,6 +543,8 @@ void xstate_init(struct cpuinfo_x86 *c) if ( bsp ) { + static typeof(current->arch.xsave_area->fpu_sse) __initdata ctxt; + xfeature_mask = feature_mask; /* * xsave_cntxt_size is the max size required by enabled features. @@ -504,6 +553,10 @@ void xstate_init(struct cpuinfo_x86 *c) xsave_cntxt_size = _xstate_ctxt_size(feature_mask); printk("%s: using cntxt_size: %#x and states: %#"PRIx64"\n", __func__, xsave_cntxt_size, xfeature_mask); + + asm ( "fxsave %0" : "=m" (ctxt) ); + if ( ctxt.mxcsr_mask ) + mxcsr_mask = ctxt.mxcsr_mask; } else {