Message ID | 570EADD2.8030300@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote: > Didn't help, but a fresh look at the list of 4.5 patches helped. > What the hell was I thinking, I missed write_rdtscp_aux who > obviously uses MSR_TSC_AUX. So you want me to apply that to 4.5 od 4.5.1 and try that? Greetings Marc
On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote: > Didn't help, but a fresh look at the list of 4.5 patches helped. > What the hell was I thinking, I missed write_rdtscp_aux who > obviously uses MSR_TSC_AUX. I applied this patch to 4.5, which didn't go cleanly, I had to do it manually, and there is no change in behavior. Sometimes, the Vm just crashes, but most times the filesystem is remounted ro. [ 84.658968] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27903 [ 84.664877] Aborting journal on device dm-0-8. [ 84.667992] EXT4-fs (dm-0): Remounting filesystem read-only [ 84.670972] EXT4-fs error (device dm-0): ext4_journal_check_start:56: Detected aborted journal [ 84.763331] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27898 [ 84.825412] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27895 [ 84.907959] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27893 [ 84.915187] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27900 [ 84.961062] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27889 [ 84.983700] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27891 [ 98.315538] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #23567: comm aide: deleted inode referenced: 27897 [ 98.323606] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #23567: comm aide: deleted inode referenced: 27904 [ 99.889927] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27892 [ 99.893823] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27901 [ 99.901140] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27890 [ 99.904898] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27896 [ 99.909758] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27899 [ 99.914394] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27894 [ 207.132045] serial8250: too much work for irq4 [ 207.220043] serial8250: too much work for irq4 [ 207.312028] serial8250: too much work for irq4 Greetings Marc
On 14/04/2016 00:29, Marc Haber wrote: > On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote: >> Didn't help, but a fresh look at the list of 4.5 patches helped. >> What the hell was I thinking, I missed write_rdtscp_aux who >> obviously uses MSR_TSC_AUX. > > I applied this patch to 4.5, which didn't go cleanly, I had to do it > manually, and there is no change in behavior. Sometimes, the Vm just > crashes, but most times the filesystem is remounted ro. Ok, then I guess bisection is needed. Please first try commit 45bdbcfdf241. If it fails, then the bug come together with KVM's merge window changes for 4.5-rc1. Please apply the patch I sent here when bisection is past 46896c73c1a4dde527c3a3cc43379deeb41985a1 (which means that probably that should be the commit you try second; the bisection then becomes much easier). Thanks, Paolo > [ 84.658968] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27903 > [ 84.664877] Aborting journal on device dm-0-8. > [ 84.667992] EXT4-fs (dm-0): Remounting filesystem read-only > [ 84.670972] EXT4-fs error (device dm-0): ext4_journal_check_start:56: Detected aborted journal > [ 84.763331] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27898 > [ 84.825412] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27895 > [ 84.907959] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27893 > [ 84.915187] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27900 > [ 84.961062] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27889 > [ 84.983700] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #7669: comm aide: deleted inode referenced: 27891 > [ 98.315538] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #23567: comm aide: deleted inode referenced: 27897 > [ 98.323606] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #23567: comm aide: deleted inode referenced: 27904 > [ 99.889927] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27892 > [ 99.893823] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27901 > [ 99.901140] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27890 > [ 99.904898] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27896 > [ 99.909758] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27899 > [ 99.914394] EXT4-fs error (device dm-0): ext4_lookup:1602: inode #4650: comm aide: deleted inode referenced: 27894 > [ 207.132045] serial8250: too much work for irq4 > [ 207.220043] serial8250: too much work for irq4 > [ 207.312028] serial8250: too much work for irq4 > > > Greetings > Marc > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote: > On 14/04/2016 00:29, Marc Haber wrote: > > On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote: > >> Didn't help, but a fresh look at the list of 4.5 patches helped. > >> What the hell was I thinking, I missed write_rdtscp_aux who > >> obviously uses MSR_TSC_AUX. > > > > I applied this patch to 4.5, which didn't go cleanly, I had to do it > > manually, and there is no change in behavior. Sometimes, the Vm just > > crashes, but most times the filesystem is remounted ro. > > Ok, then I guess bisection is needed. Please first try commit > 45bdbcfdf241. If it fails, then the bug come together with KVM's merge > window changes for 4.5-rc1. Please apply the patch I sent here when > bisection is past 46896c73c1a4dde527c3a3cc43379deeb41985a1 (which means > that probably that should be the commit you try second; the bisection > then becomes much easier). I have never bisected this deeply. Can you please give more advice, with which two commits to start? And how do I find out whether I am "past" a commit? I am als not a git expert, a few command lines would be appreciated. Things have not become any easier this night; 4.5-rc7 ran for more than three hours before it failed :-( Greetings Marc
On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote: > Ok, then I guess bisection is needed. Please first try commit > 45bdbcfdf241. That kernel labels itself as "4.4.0-rc5+", is that correct? Greetings Marc
On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote: > Ok, then I guess bisection is needed. Please first try commit > 45bdbcfdf241. I did git checkout 45bdbcfdf241 and built the resulting kernel 4.4.0-rc5. This one has now been running for ten hours, which is threefold the longest time that a faulty kernel has held before a VM experienced corruption. So I guess, that one is fine. Since 4.5.0-rc1 is bad, I guess I do: git checkout 45bdbcfdf241 git bisect start git bisect good git bisect bad v4.5.0-rc1 right? Greetings Marc
On 14/04/2016 18:47, Marc Haber wrote: >> > Ok, then I guess bisection is needed. Please first try commit >> > 45bdbcfdf241. > I did git checkout 45bdbcfdf241 and built the resulting kernel > 4.4.0-rc5. This one has now been running for ten hours, which is > threefold the longest time that a faulty kernel has held before a VM > experienced corruption. So I guess, that one is fine. Interesting, this means it's not a KVM bug. You can ignore my patch from yesterday (though we'll get it in anyway). > Since 4.5.0-rc1 is bad, I guess I do: > > git checkout 45bdbcfdf241 > git bisect start > git bisect good > git bisect bad v4.5.0-rc1 This is correct but you also want to do git bisect good 4.4.0 git bisect good 4.4.0-rc5 so that bisection basically works through the commits in the merge window. Thanks, Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Apr 14, 2016 at 07:30:43PM +0200, Paolo Bonzini wrote: > On 14/04/2016 18:47, Marc Haber wrote: > >> > Ok, then I guess bisection is needed. Please first try commit > >> > 45bdbcfdf241. > > I did git checkout 45bdbcfdf241 and built the resulting kernel > > 4.4.0-rc5. This one has now been running for ten hours, which is > > threefold the longest time that a faulty kernel has held before a VM > > experienced corruption. So I guess, that one is fine. > > Interesting, this means it's not a KVM bug. You can ignore my patch > from yesterday (though we'll get it in anyway). > > > Since 4.5.0-rc1 is bad, I guess I do: > > > > git checkout 45bdbcfdf241 > > git bisect start > > git bisect good > > git bisect bad v4.5.0-rc1 > > This is correct but you also want to do > > git bisect good 4.4.0 > git bisect good 4.4.0-rc5 > > so that bisection basically works through the commits in the merge window. So I start over from this: [47/544]mh@fan:~/linux/debug/linux$ git checkout 45bdbcfdf241 HEAD is now at 45bdbcf... kvm: x86: Fix vmwrite to SECONDARY_VM_EXEC_CONTROL [48/545]mh@fan:~/linux/debug/linux$ git bisect start [49/546]mh@fan:~/linux/debug/linux$ git bisect good [50/547]mh@fan:~/linux/debug/linux$ git bisect bad v4.5-rc1 Bisecting: 5761 revisions left to test after this (roughly 13 steps) [cbd88cd4c07f9361914ab7fd7e21c9227986fe68] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux [51/548]mh@fan:~/linux/debug/linux$ git bisect good v4.4 Bisecting: 5468 revisions left to test after this (roughly 12 steps) [f9a03ae123c92c1f45cd2ca88d0f6edd787be78c] Merge tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs [52/549]mh@fan:~/linux/debug/linux$ git bisect good v4.4-rc5 Bisecting: 5468 revisions left to test after this (roughly 12 steps) [f9a03ae123c92c1f45cd2ca88d0f6edd787be78c] Merge tag 'for-f2fs-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs [53/550]mh@fan:~/linux/debug/linux$ This is going to take a few days as detecting a "bad" version may take a few hours. Greetings Marc
On Thu, Apr 14, 2016 at 07:22:20AM +0200, Marc Haber wrote: > On Thu, Apr 14, 2016 at 03:16:29AM +0200, Paolo Bonzini wrote: > > On 14/04/2016 00:29, Marc Haber wrote: > > > On Wed, Apr 13, 2016 at 10:36:34PM +0200, Paolo Bonzini wrote: > > >> Didn't help, but a fresh look at the list of 4.5 patches helped. > > >> What the hell was I thinking, I missed write_rdtscp_aux who > > >> obviously uses MSR_TSC_AUX. > > > > > > I applied this patch to 4.5, which didn't go cleanly, I had to do it > > > manually, and there is no change in behavior. Sometimes, the Vm just > > > crashes, but most times the filesystem is remounted ro. > > > > Ok, then I guess bisection is needed. Please first try commit > > 45bdbcfdf241. If it fails, then the bug come together with KVM's merge > > window changes for 4.5-rc1. Please apply the patch I sent here when > > bisection is past 46896c73c1a4dde527c3a3cc43379deeb41985a1 (which means > > that probably that should be the commit you try second; the bisection > > then becomes much easier). > > I have never bisected this deeply. Can you please give more advice, > with which two commits to start? And how do I find out whether I am > "past" a commit? I am als not a git expert, a few command lines would > be appreciated. I have tried bisecting, and finally bisect says that the bad commit is 0e749e54244eec87b2a3cd0a4314e60bc6781115 dax: increase granularity of dax_clear_blocks() operations However, a kernel built after $ git checkout 0e749e54244eec87b2a3cd0a4314e60bc6781115 seems to be fine, at least my VM is running for 15 hours now. I guess I need to start over again with git bisect good 0e749e54244eec87b2a3cd0a4314e60bc6781115 and git bisect bad v4.5. Currently, I cannot explain how this has happened, I must have flagged an actually good kernel as bad from my understanding of git bisect. Can you give advice how to continue here? Greetings Marc
On Thu, Apr 21, 2016 at 10:39:48AM +0200, Marc Haber wrote: > Currently, I cannot explain how this has happened, I must have flagged > an actually good kernel as bad from my understanding of git bisect. > > Can you give advice how to continue here? Yap, sounds like you marked a bisection step incorrectly, which lead into the wrong direction. How reliable is your reproducer? Also, do the bisection as Paolo suggested: * try 45bdbcfdf241. * then do $ git bisect start v4.5-rc1 v4.4 which marks -rc1 as bad and 4.4 as good. While you're doing that bisect, do what Paolo said by applying the diff here https://lkml.kernel.org/r/570EADD2.8030300@redhat.com when the bisection point you're at at each step contains 46896c73c1a4 ("KVM: svm: add support for RDTSCP") You should apply the above hunk by doing $ patch -p1 --dry-run -i /tmp/hunk If it applies fine, you then apply it $ patch -p1 -i /tmp/hunk All clear? If not, do not hesitate to ask. Thanks
On Thu, Apr 21, 2016 at 02:37:11PM +0200, Borislav Petkov wrote: > On Thu, Apr 21, 2016 at 10:39:48AM +0200, Marc Haber wrote: > > Currently, I cannot explain how this has happened, I must have flagged > > an actually good kernel as bad from my understanding of git bisect. > > > > Can you give advice how to continue here? > > Yap, sounds like you marked a bisection step incorrectly, which lead > into the wrong direction. How reliable is your reproducer? Usually, the crash or filesystem corruption happens in the first 15 to 30 minutes. I have had one instance running three hours before corrupting, I have therefore upped the run time to nine hours before saying "this kernel is good". What bothers me is that since I ended up with a "suspect" commit that actually results in a "good" kernel (running for 22 hours now), I must have said "bad" to an actually "good" kernel, which means that I had an unrelated crash or corruption. Is that reasoning correct? > Also, do the bisection as Paolo suggested: > > * try 45bdbcfdf241. That one qualified as "good" six days ago. I'll retry, maybe I just didn't wait long enough. "Trying" means make oldconfig, make deb-pkg in my case right? Does it matter what I answer to the numerous config questions that keep coming up during the oldconfig step? > * then do > > $ git bisect start v4.5-rc1 v4.4 > > which marks -rc1 as bad and 4.4 as good. Would it help to explicitly mark 0e749e54244eec87b2a3cd0a4314e60bc6781115 as good so that the knowledge gained during the last week is not completely lost? > While you're doing that bisect, do what Paolo said by applying the diff > here > > https://lkml.kernel.org/r/570EADD2.8030300@redhat.com > > when the bisection point you're at at each step contains > > 46896c73c1a4 ("KVM: svm: add support for RDTSCP") > > You should apply the above hunk by doing > > $ patch -p1 --dry-run -i /tmp/hunk > > If it applies fine, you then apply it > > $ patch -p1 -i /tmp/hunk > > All clear? So I need to git log | grep 46896c73c1a4 and apply the patch again each time the commit is found? Greetings Marc
On Thu, Apr 21, 2016 at 04:50:05PM +0200, Marc Haber wrote: > What bothers me is that since I ended up with a "suspect" commit that > actually results in a "good" kernel (running for 22 hours now), I must > have said "bad" to an actually "good" kernel, which means that I had > an unrelated crash or corruption. Is that reasoning correct? Hmm, did that "unrelated crash or corruption" have the same symptoms as the original one? > That one qualified as "good" six days ago. I'll retry, maybe I just > didn't wait long enough. So if the trigger time is varying so much, I'd try to double that to make sure I'm fairly certain about each commit I'm testing. Also, this is a single box we're talking about, right? And you're sure it hasn't had any corruption issues so far? I see you have amd64_edac loading, so it must have ECC DIMMs. Have you had any reports in the past of ECC errors in dmesg? Or other MCEs, lockups, etc? Can you grep your logs for stuff like "hardware error", "mce", "edac" etc? Do a case-insensitive search. > "Trying" means make oldconfig, make deb-pkg in my case right? Does it > matter what I answer to the numerous config questions that keep coming > up during the oldconfig step? What I do is: $ git bisect <good|bad> to mark the current commit after having tested it. Then I do $ yes "" | make oldconfig to set the new config options. Then $ make -j7 $ make modules_install install and reboot into the new kernel. Kernel name will possibly change each time so I write down on paper which kernel I'm testing. You can verify when booting it by doing: $ dmesg | head [ 0.000000] Linux version 4.6.0-rc2+ (boris@pd) (gcc version 5.3.1 20160101 (Debian 5.3.1-5) ) #1 SMP PREEMPT Wed Apr 6 20:22:51 CEST 2016 ... that date at the end of the line and number "#1" should be current. Number is also in .version and gets issued when you finish building: Kernel: arch/x86/boot/bzImage is ready (#1) > Would it help to explicitly mark > 0e749e54244eec87b2a3cd0a4314e60bc6781115 as good so that the knowledge > gained during the last week is not completely lost? I'd do the whole thing again, just to be sure. I know, bisection is very time-consuming :-\ And it is particularly annoying if it is done on the box I'm normally using daily. > So I need to git log | grep 46896c73c1a4 and apply the patch again > each time the commit is found? I think you can let git do that for ya: $ git branch --contains 46896c73c1a4 * (HEAD detached at 46896c73c1a4) that lists that the current checked out HEAD contains that commit. If you do $ git checkout 46896c73c1a4~1 then that "(HEAD detached..." line is not in the list of branches containing it. HTH.
On Thu, Apr 21, 2016 at 06:51:06PM +0200, Borislav Petkov wrote: > On Thu, Apr 21, 2016 at 04:50:05PM +0200, Marc Haber wrote: > > What bothers me is that since I ended up with a "suspect" commit that > > actually results in a "good" kernel (running for 22 hours now), I must > > have said "bad" to an actually "good" kernel, which means that I had > > an unrelated crash or corruption. Is that reasoning correct? > > Hmm, did that "unrelated crash or corruption" have the same symptoms as > the original one? Yes, but there are two symptoms. The VM either suffers file system issues (garbage read from files, or an aborted ext4 journal and following ro remount) or it stops dead in its tracks. > > That one qualified as "good" six days ago. I'll retry, maybe I just > > didn't wait long enough. > > So if the trigger time is varying so much, I'd try to double that to > make sure I'm fairly certain about each commit I'm testing. The longest trigger time I have seen was three hours, I tripled that to nine hours, that probably was not enough. > Also, this is a single box we're talking about, right? And you're sure > it hasn't had any corruption issues so far? It is a single box, and it runs perfectly with kernel 4.4. > I see you have amd64_edac loading, so it must have ECC DIMMs. Have you > had any reports in the past of ECC errors in dmesg? Or other MCEs, > lockups, etc? Can you grep your logs for stuff like "hardware error", > "mce", "edac" etc? Do a case-insensitive search. The box reports about one correctable error per week, so I probably have a faulty DIMM, but since the issue only surfaces in VMs while the host system is in perfect working order... And yes, I am pondering to simply replace the box with an Intel CPU. I see "mce: CPU supports 6 MCE banks" once for each reboot, and about 30 "Machine check events logged" since January. How do I see which events were logged? > > "Trying" means make oldconfig, make deb-pkg in my case right? Does it > > matter what I answer to the numerous config questions that keep coming > > up during the oldconfig step? > > What I do is: > > $ git bisect <good|bad> > > to mark the current commit after having tested it. Then I do > > $ yes "" | make oldconfig > > to set the new config options. So you basically select the default for new options. > Then > > $ make -j7 > $ make modules_install install > > and reboot into the new kernel. Kernel name will possibly change each > time so I write down on paper which kernel I'm testing. I go the way of Debian packages since it is easier to handle the crypto file systems when the machine is booting up. And yes, I think about doing a test reinstall on unencrypted disk to find out whether encryption plays a role, but I currently need the machine to urgently to take it out of serice for half a month, and, again, the host system is in perfect working order, it is just VMs that barf. > You can verify when booting it by doing: > > $ dmesg | head > [ 0.000000] Linux version 4.6.0-rc2+ (boris@pd) (gcc version 5.3.1 20160101 (Debian 5.3.1-5) ) #1 SMP PREEMPT Wed Apr 6 20:22:51 CEST 2016 > ... > > that date at the end of the line and number "#1" should be current. I check the date of the package I am installing and the date stamp of the kernels being installed to /boot. I'm reasonably sure I have that under control. > > Would it help to explicitly mark > > 0e749e54244eec87b2a3cd0a4314e60bc6781115 as good so that the knowledge > > gained during the last week is not completely lost? > > I'd do the whole thing again, just to be sure. > > I know, bisection is very time-consuming :-\ And it is particularly > annoying if it is done on the box I'm normally using daily. ... and if testing a "good" kernel means a day. > > So I need to git log | grep 46896c73c1a4 and apply the patch again > > each time the commit is found? > > I think you can let git do that for ya: > > $ git branch --contains 46896c73c1a4 > * (HEAD detached at 46896c73c1a4) > > that lists that the current checked out HEAD contains that commit. If you do > > $ git checkout 46896c73c1a4~1 > > then that "(HEAD detached..." line is not in the list of branches > containing it. And whenever 46896c73c1a4 is present, I need to apply Paolo's patch, right? Greetings Marc
On Thu, Apr 21, 2016 at 10:04:33PM +0200, Marc Haber wrote: > Yes, but there are two symptoms. The VM either suffers file system > issues (garbage read from files, or an aborted ext4 journal and > following ro remount) or it stops dead in its tracks. Stops dead? What does that mean exactly? Box is wedged solid and it doesn't react to any key presses? Because if so, this could really be a DRAM going bad and a correctable error turning into an uncorrectable. How old is the DRAM in that box? Judging by your CPU, it should be a couple of years... > The longest trigger time I have seen was three hours, I tripled that > to nine hours, that probably was not enough. So enlarge even more I guess. > The box reports about one correctable error per week, so I probably > have a faulty DIMM, but since the issue only surfaces in VMs while the > host system is in perfect working order... So it could be that correctable error turns into an uncorrectable one at some point. But then you should be getting an exception... > And yes, I am pondering to simply replace the box with an Intel CPU. Your CPU is fine, from what I've seen so far. > I see "mce: CPU supports 6 MCE banks" once for each reboot, and about > 30 "Machine check events logged" since January. How do I see which > events were logged? Hmm, you have [ 18.149300] MCE: In-kernel MCE decoding enabled. that's CONFIG_EDAC_DECODE_MCE, so you should have some "Hardware Error" lines in dmesg, I'd guess, decoding the errors. > So you basically select the default for new options. Yap. > I go the way of Debian packages since it is easier to handle the > crypto file systems when the machine is booting up. As long as you're testing the correct bisection kernels... > And yes, I think about doing a test reinstall on unencrypted disk to > find out whether encryption plays a role, but I currently need the > machine to urgently to take it out of serice for half a month, and, > again, the host system is in perfect working order, it is just VMs > that barf. Yeah, I can't reproduce it here and I have a very similar box to yours which is otherwise idle, more or less. Another fact which points to potentially DIMM going bad... > I check the date of the package I am installing and the date stamp of > the kernels being installed to /boot. I'm reasonably sure I have that > under control. Good. > ... and if testing a "good" kernel means a day. Yeah, it is annoying. In a perfect world, we all should have two identical boxes so that we use one as a workstation and the second for testing when the first one, the workstation barfs. I should bring that up with my manager next time... :-) > And whenever 46896c73c1a4 is present, I need to apply Paolo's patch, > right? Yap. Thanks.
On Sat, Apr 23, 2016 at 06:04:29PM +0200, Borislav Petkov wrote: > On Thu, Apr 21, 2016 at 10:04:33PM +0200, Marc Haber wrote: > > Yes, but there are two symptoms. The VM either suffers file system > > issues (garbage read from files, or an aborted ext4 journal and > > following ro remount) or it stops dead in its tracks. > > Stops dead? What does that mean exactly? Box is wedged solid and it > doesn't react to any key presses? No ping, no reaction on serial console, no reaction on virtual console, no syslog entries. > Because if so, this could really be a DRAM going bad and a correctable > error turning into an uncorrectable. How old is the DRAM in that box? > Judging by your CPU, it should be a couple of years... Uncorrectable errors would still be identified by the ECC hardware, and the box wouldn't be perfectly fine with an "old" kernel. > > The box reports about one correctable error per week, so I probably > > have a faulty DIMM, but since the issue only surfaces in VMs while the > > host system is in perfect working order... > > So it could be that correctable error turns into an uncorrectable one at > some point. But then you should be getting an exception... Yes, that would be in the logs. > > And yes, I am pondering to simply replace the box with an Intel CPU. > > Your CPU is fine, from what I've seen so far. But we still postulate that the issue does only show on older AMD CPUs. Otherwise, I wouldn't be the only one making this experience. > > I go the way of Debian packages since it is easier to handle the > > crypto file systems when the machine is booting up. > > As long as you're testing the correct bisection kernels... I am reasonably sure about that, yes. > > And yes, I think about doing a test reinstall on unencrypted disk to > > find out whether encryption plays a role, but I currently need the > > machine to urgently to take it out of serice for half a month, and, > > again, the host system is in perfect working order, it is just VMs > > that barf. > > Yeah, I can't reproduce it here and I have a very similar box to yours > which is otherwise idle, more or less. > > Another fact which points to potentially DIMM going bad... Do you want me to memtest for 24 hours? Greetings Marc
* Marc Haber (mh+linux-kernel@zugschlus.de) wrote: > On Sat, Apr 23, 2016 at 06:04:29PM +0200, Borislav Petkov wrote: > > On Thu, Apr 21, 2016 at 10:04:33PM +0200, Marc Haber wrote: > > > Yes, but there are two symptoms. The VM either suffers file system > > > issues (garbage read from files, or an aborted ext4 journal and > > > following ro remount) or it stops dead in its tracks. > > > > Stops dead? What does that mean exactly? Box is wedged solid and it > > doesn't react to any key presses? > > No ping, no reaction on serial console, no reaction on virtual > console, no syslog entries. > > > Because if so, this could really be a DRAM going bad and a correctable > > error turning into an uncorrectable. How old is the DRAM in that box? > > Judging by your CPU, it should be a couple of years... > > Uncorrectable errors would still be identified by the ECC hardware, > and the box wouldn't be perfectly fine with an "old" kernel. Hmm, your problem does sound like bad hardware, but.... If you've got a nice reliable crash, can you try turning transparent huge pages off on the host; echo never > /sys/kernel/mm/transparent_hugepage/enabled Dave > > > The box reports about one correctable error per week, so I probably > > > have a faulty DIMM, but since the issue only surfaces in VMs while the > > > host system is in perfect working order... > > > > So it could be that correctable error turns into an uncorrectable one at > > some point. But then you should be getting an exception... > > Yes, that would be in the logs. > > > > And yes, I am pondering to simply replace the box with an Intel CPU. > > > > Your CPU is fine, from what I've seen so far. > > But we still postulate that the issue does only show on older AMD > CPUs. Otherwise, I wouldn't be the only one making this experience. > > > > I go the way of Debian packages since it is easier to handle the > > > crypto file systems when the machine is booting up. > > > > As long as you're testing the correct bisection kernels... > > I am reasonably sure about that, yes. > > > > And yes, I think about doing a test reinstall on unencrypted disk to > > > find out whether encryption plays a role, but I currently need the > > > machine to urgently to take it out of serice for half a month, and, > > > again, the host system is in perfect working order, it is just VMs > > > that barf. > > > > Yeah, I can't reproduce it here and I have a very similar box to yours > > which is otherwise idle, more or less. > > > > Another fact which points to potentially DIMM going bad... > > Do you want me to memtest for 24 hours? > > Greetings > Marc > > -- > ----------------------------------------------------------------------------- > Marc Haber | "I don't trust Computers. They | Mailadresse im Header > Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402 > Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
On Sat, Apr 23, 2016 at 08:43:41PM +0200, Marc Haber wrote: > Uncorrectable errors would still be identified by the ECC hardware, Not if the hardware decides to syncflood so that we don't even get to run the #MC handler... > and the box wouldn't be perfectly fine with an "old" kernel. Maybe the "old" kernel is not causing all the required ingredients to come together for the uncorrectable error to happen. But yeah, I agree, the fact that 4.4 is fine kinda doesn't fit with the uncorrectable error theory. > Yes, that would be in the logs. Presumably. And see above. > But we still postulate that the issue does only show on older AMD > CPUs. Otherwise, I wouldn't be the only one making this experience. It actually shows only on this one system. At least I'm not aware of any other report of the same issue. My system with a F10h, rev E is just fine. > Do you want me to memtest for 24 hours? Yeah, that memtest crap never triggers any ECCs. But if you're bored, why not...
Hi David, On Sat, Apr 23, 2016 at 07:52:46PM +0100, Dr. David Alan Gilbert wrote: > Hmm, your problem does sound like bad hardware, but.... > If you've got a nice reliable crash, can you try turning transparent huge pages > off on the host; > echo never > /sys/kernel/mm/transparent_hugepage/enabled I must have missed this hint in the middle of the "your hardware is bad" avalance that came over me. I spent two weeks bisecting "good" kernels since during the repeated reconfigurations, transparent huge pages got turned off in kernel configuration. After running each kernel for 24 hours, I eventually ended up with a working 4.5 kernel. The configuration diff was short, showing transparent huge pages, and - finally - upon re-reading the thread I found your hint. I have now the result that 4.5, 4.5.1 and 4.5.4 corrupt KVM guest memory reliably in the first hour of running under disk load, causing the VM to either drop dead in the water, or to read randomness from disk. Rebooting fixes the VM. This happens as soon as transparent huge pages are turned on in the host. Turning off transparent huge pages by echo never > /sys/kernel/mm/transparent_hugepage/enabled fixes the issue even without rebooting the host. Start up the VM again and it works just fine. Is this an issue in (a) transparent huge pages, (b) KVM or (c) qemu? Where should this issue be forwarded? Or do we just accept it and turn transparent huge pages off? Greetings Marc
On Thu, May 12, 2016 at 10:20:09PM +0200, Marc Haber wrote: > Hi David, > > On Sat, Apr 23, 2016 at 07:52:46PM +0100, Dr. David Alan Gilbert wrote: > > Hmm, your problem does sound like bad hardware, but.... > > If you've got a nice reliable crash, can you try turning transparent huge pages > > off on the host; > > echo never > /sys/kernel/mm/transparent_hugepage/enabled > > I must have missed this hint in the middle of the "your hardware is > bad" avalance that came over me. > > I spent two weeks bisecting "good" kernels since during the repeated > reconfigurations, transparent huge pages got turned off in kernel > configuration. After running each kernel for 24 hours, I eventually > ended up with a working 4.5 kernel. The configuration diff was short, > showing transparent huge pages, and - finally - upon re-reading the > thread I found your hint. > > I have now the result that 4.5, 4.5.1 and 4.5.4 corrupt KVM guest > memory reliably in the first hour of running under disk load, causing > the VM to either drop dead in the water, or to read randomness from > disk. Rebooting fixes the VM. This happens as soon as transparent huge > pages are turned on in the host. > > Turning off transparent huge pages by echo never > > /sys/kernel/mm/transparent_hugepage/enabled fixes the issue even > without rebooting the host. Start up the VM again and it works just > fine. > > Is this an issue in (a) transparent huge pages, (b) KVM or (c) qemu? > Where should this issue be forwarded? Or do we just accept it and turn > transparent huge pages off? Could you test this: http://lkml.kernel.org/r/1463070742-18401-1-git-send-email-aarcange@redhat.com ?
On Thu, May 12, 2016 at 11:24:02PM +0300, Kirill A. Shutemov wrote:
> http://lkml.kernel.org/r/1463070742-18401-1-git-send-email-aarcange@redhat.com
Is this in v4.6-rc7?
If so, can I just test v4.6-rc7?
If not so, would it be a valid approach to first check plain v4.6-rc7
and then patched v4.6-rc7?
Greetings
Marc
On Thu, May 12, 2016 at 10:34:57PM +0200, Marc Haber wrote: > On Thu, May 12, 2016 at 11:24:02PM +0300, Kirill A. Shutemov wrote: > > http://lkml.kernel.org/r/1463070742-18401-1-git-send-email-aarcange@redhat.com > > Is this in v4.6-rc7? No. > If so, can I just test v4.6-rc7? > > If not so, would it be a valid approach to first check plain v4.6-rc7 > and then patched v4.6-rc7? It haven't hit -mm tree yet. But I guess it should apply cleanly to v4.5. Or at least without major conflicts. Andrew, it looks like third bug report for the bug. It would be nice to propogate it quicker, if possible.
On Thu, May 12, 2016 at 11:42:16PM +0300, Kirill A. Shutemov wrote: > But I guess it should apply cleanly to v4.5. Or at least without major > conflicts. [11/511]mh@fan:~/linux/debug/linux$ curl 'http://marc.info/?l=linux-rdma&m=146307074800836&w=2' | patch -p1 % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 12529 0 12529 0 0 9844 0 --:--:-- 0:00:01 --:--:-- 9849 patching file include/linux/mm.h Hunk #1 succeeded at 456 with fuzz 1 (offset -44 lines). patching file include/linux/swap.h Hunk #2 FAILED at 513. 1 out of 2 hunks FAILED -- saving rejects to file include/linux/swap.h.rej patching file mm/huge_memory.c Hunk #1 FAILED at 1298. Hunk #2 FAILED at 2079. Hunk #3 succeeded at 3340 (offset 117 lines). 2 out of 3 hunks FAILED -- saving rejects to file mm/huge_memory.c.rej patching file mm/memory.c Hunk #1 FAILED at 2373. Hunk #2 succeeded at 2331 with fuzz 2 (offset -56 lines). Hunk #3 FAILED at 2622. 2 out of 3 hunks FAILED -- saving rejects to file mm/memory.c.rej patching file mm/swapfile.c Hunk #1 FAILED at 922. 1 out of 1 hunk FAILED -- saving rejects to file mm/swapfile.c.rej [12/512]mh@fan:~/linux/debug/linux$ It doesn't, and it doesn't apply to 4.6-rc3 as well: [17/517]mh@fan:~/linux/debug/linux$ git checkout v4.6-rc3 Checking out files: 100% (9945/9945), done. Previous HEAD position was b562e44... Linux 4.5 HEAD is now at bf16200... Linux 4.6-rc3 [18/518]mh@fan:~/linux/debug/linux$ curl 'http://marc.info/?l=linux-rdma&m=146307074800836&w=2' | patch -p1 % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 12529 0 12529 0 0 9692 0 --:--:-- 0:00:01 --:--:-- 9697 patching file include/linux/mm.h patching file include/linux/swap.h Hunk #2 FAILED at 513. 1 out of 2 hunks FAILED -- saving rejects to file include/linux/swap.h.rej patching file mm/huge_memory.c Hunk #1 FAILED at 1298. Hunk #2 FAILED at 2079. Hunk #3 succeeded at 3225 (offset 2 lines). 2 out of 3 hunks FAILED -- saving rejects to file mm/huge_memory.c.rej patching file mm/memory.c Hunk #1 FAILED at 2373. Hunk #2 succeeded at 2354 (offset -33 lines). Hunk #3 FAILED at 2622. 2 out of 3 hunks FAILED -- saving rejects to file mm/memory.c.rej patching file mm/swapfile.c Hunk #1 FAILED at 922. 1 out of 1 hunk FAILED -- saving rejects to file mm/swapfile.c.rej [19/519]mh@fan:~/linux/debug/linux$ How do I apply this? Greetings Marc
* Marc Haber (mh+linux-kernel@zugschlus.de) wrote: > Hi David, > > On Sat, Apr 23, 2016 at 07:52:46PM +0100, Dr. David Alan Gilbert wrote: > > Hmm, your problem does sound like bad hardware, but.... > > If you've got a nice reliable crash, can you try turning transparent huge pages > > off on the host; > > echo never > /sys/kernel/mm/transparent_hugepage/enabled > > I must have missed this hint in the middle of the "your hardware is > bad" avalance that came over me. > > I spent two weeks bisecting "good" kernels since during the repeated > reconfigurations, transparent huge pages got turned off in kernel > configuration. After running each kernel for 24 hours, I eventually > ended up with a working 4.5 kernel. The configuration diff was short, > showing transparent huge pages, and - finally - upon re-reading the > thread I found your hint. OK, good. When I sent that mail I'd hit a THP bug but in a corner of migration and at the time we didn't know why and there was no reason to think it would cause any other symptoms, but since it was also between 4.4 and 4.5 it did seem worth mentioning as a long shot, but it was no more than a long shot. > I have now the result that 4.5, 4.5.1 and 4.5.4 corrupt KVM guest > memory reliably in the first hour of running under disk load, causing > the VM to either drop dead in the water, or to read randomness from > disk. Rebooting fixes the VM. This happens as soon as transparent huge > pages are turned on in the host. > > Turning off transparent huge pages by echo never > > /sys/kernel/mm/transparent_hugepage/enabled fixes the issue even > without rebooting the host. Start up the VM again and it works just > fine. > > Is this an issue in (a) transparent huge pages, (b) KVM or (c) qemu? > Where should this issue be forwarded? Or do we just accept it and turn > transparent huge pages off? Try Andrea's fix for (a). Dave > > Greetings > Marc > > -- > ----------------------------------------------------------------------------- > Marc Haber | "I don't trust Computers. They | Mailadresse im Header > Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402 > Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
On Fri, May 13, 2016 at 09:35:45AM +0100, Dr. David Alan Gilbert wrote: > also between 4.4 and 4.5 it did seem worth mentioning as a long shot, > but it was no more than a long shot. It was however helpful. I'd have bisected kernel configuration instead of using the runtime control first, and seeing your long shot two weeks earlier, it'd have saved myself those two weeks of tedious bisecting. > Try Andrea's fix for (a). In the works. Greetings Marc
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 31346a3f20a5..1481dea15844 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -39,6 +39,7 @@ #include <asm/kvm_para.h> #include <asm/virtext.h> +#include <asm/vgtod.h> #include "trace.h" #define __ex(x) __kvm_handle_fault_on_reboot(x) @@ -1240,9 +1241,6 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu) wrmsrl(MSR_AMD64_TSC_RATIO, tsc_ratio); } } - /* This assumes that the kernel never uses MSR_TSC_AUX */ - if (static_cpu_has(X86_FEATURE_RDTSCP)) - wrmsrl(MSR_TSC_AUX, svm->tsc_aux); } static void svm_vcpu_put(struct kvm_vcpu *vcpu) @@ -3847,6 +3845,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) svm->vmcb->save.cr2 = vcpu->arch.cr2; clgi(); + if (static_cpu_has(X86_FEATURE_RDTSCP)) + wrmsrl(MSR_TSC_AUX, svm->tsc_aux); local_irq_enable(); @@ -3923,6 +3923,8 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu) #endif ); + if (static_cpu_has(X86_FEATURE_RDTSCP)) + wrmsrl(MSR_TSC_AUX, __getcpu()); #ifdef CONFIG_X86_64 wrmsrl(MSR_GS_BASE, svm->host.gs_base); #else