[6/6] ARM: kvm: TMP: Commit the hyp page tables to main memory

Message ID 1384457866-16135-7-git-send-email-santosh.shilimkar@ti.com (mailing list archive)
State New, archived

Commit Message

Santosh Shilimkar Nov. 14, 2013, 7:37 p.m. UTC
This is a temporary hack which I have to use to avoid a weird crash while
starting the guest OS on Keystone. There are random crashes while the
guest OS userspace starts. An additional data point is that it is seen only
with the first guest OS launch. Subsequent guest OS starts are normal.
    
I still don't know why this is needed but it helps to get around the issue
and hence including the patch in the series for the discussion

Cc: Christoffer Dall <christoffer.dall@linaro.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
---
 arch/arm/kvm/mmu.c |    1 +
 1 file changed, 1 insertion(+)

Comments

Santosh Shilimkar Nov. 14, 2013, 10:36 p.m. UTC | #1
On Thursday 14 November 2013 02:37 PM, Santosh Shilimkar wrote:
> This is a temporary hack which I have to use to avoid a weird crash while
> starting the guest OS on Keystone. There are random crashes while the
> guest OS userspace starts. An additional data point is that it is seen only
> with the first guest OS launch. Subsequent guest OS starts are normal.
>     
> I still don't know why this is needed but it helps to get around the issue
> and hence including the patch in the series for the discussion
> 
> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> 
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> ---

Please ignore this patch. I thought it helped, but with more testing I found
that the issue still comes up even with the change below.

>  arch/arm/kvm/mmu.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 657f15e..5f6f460 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -826,6 +826,7 @@ int kvm_mmu_init(void)
>  		goto out;
>  	}
>  
> +	flush_cache_all();
>  	return 0;
>  out:
>  	free_hyp_pgds();
>
Christoffer Dall Nov. 15, 2013, 12:11 a.m. UTC | #2
On Thu, Nov 14, 2013 at 02:37:46PM -0500, Santosh Shilimkar wrote:
> This is a temporary hack which I have to use to avoid a weird crash while
> starting the guest OS on Keystone. There are random crashes while the
> guest OS userspace starts. An additional data point is that it is seen only
> with the first guest OS launch. Subsequent guest OS starts are normal.
>     

what crashes?  The guest?  Where, how?


> I still don't know why this is needed but it helps to get around the issue
> and hence including the patch in the series for the discussion

It may not be needed but is just hiding the issue.  I'm afraid you're
going to have to dig a little more into this.

-Christoffer

> 
> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> 
> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
> ---
>  arch/arm/kvm/mmu.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 657f15e..5f6f460 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -826,6 +826,7 @@ int kvm_mmu_init(void)
>  		goto out;
>  	}
>  
> +	flush_cache_all();
>  	return 0;
>  out:
>  	free_hyp_pgds();
> -- 
> 1.7.9.5
>
Santosh Shilimkar Nov. 15, 2013, 12:15 a.m. UTC | #3
On Thursday 14 November 2013 07:11 PM, Christoffer Dall wrote:
> On Thu, Nov 14, 2013 at 02:37:46PM -0500, Santosh Shilimkar wrote:
>> This is a temporary hack which I have to use to avoid a weird crash while
>> starting the guest OS on Keystone. There are random crashes while the
>> guest OS userspace starts. An additional data point is that it is seen only
>> with the first guest OS launch. Subsequent guest OS starts are normal.
>>     
> 
> what crashes?  The guest?  Where, how?
>
When guest userspace starts. The crashes are random but always after the
guest init process has started.
 
> 
>> I still don't know why this is needed but it helps to get around the issue
>> and hence including the patch in the series for the discussion
> 
> It may not be needed but is just hiding the issue.  I'm afraid you're
> going to have to dig a little more into this.
> 
I replied already on this. With further testing, I found the issue
even with this patch applied. I need to dig a bit more, as you said.

Regards,
Santosh
Christoffer Dall Nov. 15, 2013, 12:27 a.m. UTC | #4
On Thu, Nov 14, 2013 at 07:15:44PM -0500, Santosh Shilimkar wrote:
> On Thursday 14 November 2013 07:11 PM, Christoffer Dall wrote:
> > On Thu, Nov 14, 2013 at 02:37:46PM -0500, Santosh Shilimkar wrote:
> >> This is a temporary hack which I have to use to avoid a weird crash while
> >> starting the guest OS on Keystone. There are random crashes while the
> >> guest OS userspace starts. An additional data point is that it is seen only
> >> with the first guest OS launch. Subsequent guest OS starts are normal.
> >>     
> > 
> > what crashes?  The guest?  Where, how?
> >
> When guest userspace starts. The crashes are random but always after the
> guest init process has started.
>  

So you get a guest kernel crash when guest userspace starts?

Are the crashes completely random or is it always some pointer
dereference that goes wrong, is it init crashing and causing the kernel
to crash (from killed init), or is it always the same kernel thread, or
anything coherent at all?

It could be anything, really.  You could try a really brute force
debugging option of adding a complete cache flush at the end of
user_mem_abort in arch/arm/kvm/mmu.c to see if this is cache related at
all...

Are you running with huge pages enabled?

> > 
> >> I still don't know why this is needed but it helps to get around the issue
> >> and hence including the patch in the series for the discussion
> > 
> > It may not be needed but is just hiding the issue.  I'm afraid you're
> > going to have to dig a little more into this.
> > 
> I replied already on this. With further testing, I found the issue
> even with this patch applied. I need to dig a bit more, as you said.

Yeah, sorry I missed your reply before replying to the patch.

-Christoffer
Santosh Shilimkar Nov. 15, 2013, 12:36 a.m. UTC | #5
On Thursday 14 November 2013 07:27 PM, Christoffer Dall wrote:
> On Thu, Nov 14, 2013 at 07:15:44PM -0500, Santosh Shilimkar wrote:
>> On Thursday 14 November 2013 07:11 PM, Christoffer Dall wrote:
>>> On Thu, Nov 14, 2013 at 02:37:46PM -0500, Santosh Shilimkar wrote:
>>>> This is a temporary hack which I have to use to avoid a weird crash while
>>>> starting the guest OS on Keystone. There are random crashes while the
>>>> guest OS userspace starts. An additional data point is that it is seen only
>>>> with the first guest OS launch. Subsequent guest OS starts are normal.
>>>>     
>>>
>>> what crashes?  The guest?  Where, how?
>>>
>> When guest userspace starts. The crashes are random but always after the
>> guest init process has started.
>>  
> 
> So you get a guest kernel crash when guest userspace starts?
> 
> Are the crashes completely random or is it always some pointer
> dereference that goes wrong, is it init crashing and causing the kernel
> to crash (from killed init), or is it always the same kernel thread, or
> anything coherent at all?
>
Completely random. I have seen almost all of the above possible crashes,
like pointer dereference, init process skipping some steps, console going
for a toss, the login prompt just won't let me log in, etc.

 
> It could be anything, really.  You could try a really brute force
> debugging option of adding a complete cache flush at the end of
> user_mem_abort in arch/arm/kvm/mmu.c to see if this is cache related at
> all...
>
I will try that. I strongly suspect this has to do with bad page tables.
Remember, I see this issue when using memory which starts beyond 4GB.


> Are you running with huge pages enabled?
> 
Nope.

>>>
>>>> I still don't know why this is needed but it helps to get around the issue
>>>> and hence including the patch in the series for the discussion
>>>
>>> It may not be needed but is just hiding the issue.  I'm afraid you're
>>> going to have to dig a little more into this.
>>>
>> I replied already on this. With further testing, I found the issue
>> even with this patch applied. I need to dig a bit more, as you said.
> 
> Yeah, sorry I missed your reply before replying to the patch.
> 
np. Thanks

regards,
Santosh
Christoffer Dall Nov. 15, 2013, 12:42 a.m. UTC | #6
On Thu, Nov 14, 2013 at 07:36:37PM -0500, Santosh Shilimkar wrote:
> On Thursday 14 November 2013 07:27 PM, Christoffer Dall wrote:
> > On Thu, Nov 14, 2013 at 07:15:44PM -0500, Santosh Shilimkar wrote:
> >> On Thursday 14 November 2013 07:11 PM, Christoffer Dall wrote:
> >>> On Thu, Nov 14, 2013 at 02:37:46PM -0500, Santosh Shilimkar wrote:
> >>>> This is a temporary hack which I have to use to avoid a weird crash while
> >>>> starting the guest OS on Keystone. There are random crashes while the
> >>>> guest OS userspace starts. An additional data point is that it is seen only
> >>>> with the first guest OS launch. Subsequent guest OS starts are normal.
> >>>>     
> >>>
> >>> what crashes?  The guest?  Where, how?
> >>>
> >> When guest userspace starts. The crashes are random but always after the
> >> guest init process has started.
> >>  
> > 
> > So you get a guest kernel crash when guest userspace starts?
> > 
> > Are the crashes completely random or is it always some pointer
> > dereference that goes wrong, is it init crashing and causing the kernel
> > to crash (from killed init), or is it always the same kernel thread, or
> > anything coherent at all?
> >
> Completely random. I have seen almost all of the above possible crashes,
> like pointer dereference, init process skipping some steps, console going
> for a toss, the login prompt just won't let me log in, etc.
> 
>  
> > It could be anything, really.  You could try a really brute force
> > debugging option of adding a complete cache flush at the end of
> > user_mem_abort in arch/arm/kvm/mmu.c to see if this is cache related at
> > all...
> >
> I will try that. I strongly suspect this has to do with bad page tables.
> Remember, I see this issue when using memory which starts beyond 4GB.
> 
> 

But once it crashes, if you kill the VM process and start a new one,
then the new one runs flawlessly?  Did you stress test the second VM
(hackbench or something) so we're sure the second one is indeed stable?

What happens if you start a guest, kill it immediately, and then start
another guest?

-Christoffer
Santosh Shilimkar Nov. 15, 2013, 1:19 a.m. UTC | #7
On Thursday 14 November 2013 07:42 PM, Christoffer Dall wrote:
> On Thu, Nov 14, 2013 at 07:36:37PM -0500, Santosh Shilimkar wrote:
>> On Thursday 14 November 2013 07:27 PM, Christoffer Dall wrote:
>>> On Thu, Nov 14, 2013 at 07:15:44PM -0500, Santosh Shilimkar wrote:
>>>> On Thursday 14 November 2013 07:11 PM, Christoffer Dall wrote:
>>>>> On Thu, Nov 14, 2013 at 02:37:46PM -0500, Santosh Shilimkar wrote:
>>>>>> This is a temporary hack which I have to use to avoid a weird crash while
>>>>>> starting the guest OS on Keystone. There are random crashes while the
>>>>>> guest OS userspace starts. An additional data point is that it is seen only
>>>>>> with the first guest OS launch. Subsequent guest OS starts are normal.
>>>>>>     
>>>>>
>>>>> what crashes?  The guest?  Where, how?
>>>>>
>>>> When guest userspace starts. The crashes are random but always after the
>>>> guest init process has started.
>>>>  
>>>
>>> So you get a guest kernel crash when guest userspace starts?
>>>
>>> Are the crashes completely random or is it always some pointer
>>> dereference that goes wrong, is it init crashing and causing the kernel
>>> to crash (from killed init), or is it always the same kernel thread, or
>>> anything coherent at all?
>>>
>> Completely random. I have seen almost all of the above possible crashes,
>> like pointer dereference, init process skipping some steps, console going
>> for a toss, the login prompt just won't let me log in, etc.
>>
>>  
>>> It could be anything, really.  You could try a really brute force
>>> debugging option of adding a complete cache flush at the end of
>>> user_mem_abort in arch/arm/kvm/mmu.c to see if this is cache related at
>>> all...
>>>
>> I will try that. I strongly suspect this has to do with bad page tables.
>> Remember, I see this issue when using memory which starts beyond 4GB.
>>
A full cache flush at the end of user_mem_abort() doesn't help. So it might
not be cache related then.

> 
> But once it crashes, if you kill the VM process and start a new one,
> then the new one runs flawlessly?  Did you stress test the second VM
> (hackbench or something) so we're sure the second one is indeed stable?
> 
> What happens if you start a guest, kill it immediately, and then start
> another guest?
> 
And the observation about subsequent VMs being stable also doesn't hold
true. An additional symptom I saw was a segmentation fault as well as
hitting a kvm load/store trap. This also possibly indicates instruction
corruption.

Regards,
Santosh
Christoffer Dall Nov. 15, 2013, 1:35 a.m. UTC | #8
On Thu, Nov 14, 2013 at 08:19:13PM -0500, Santosh Shilimkar wrote:
> On Thursday 14 November 2013 07:42 PM, Christoffer Dall wrote:
> > On Thu, Nov 14, 2013 at 07:36:37PM -0500, Santosh Shilimkar wrote:
> >> On Thursday 14 November 2013 07:27 PM, Christoffer Dall wrote:
> >>> On Thu, Nov 14, 2013 at 07:15:44PM -0500, Santosh Shilimkar wrote:
> >>>> On Thursday 14 November 2013 07:11 PM, Christoffer Dall wrote:
> >>>>> On Thu, Nov 14, 2013 at 02:37:46PM -0500, Santosh Shilimkar wrote:
> >>>>>> This is a temporary hack which I have to use to avoid a weird crash while
> >>>>>> starting the guest OS on Keystone. There are random crashes while the
> >>>>>> guest OS userspace starts. An additional data point is that it is seen only
> >>>>>> with the first guest OS launch. Subsequent guest OS starts are normal.
> >>>>>>     
> >>>>>
> >>>>> what crashes?  The guest?  Where, how?
> >>>>>
> >>>> When guest userspace starts. The crashes are random but always after the
> >>>> guest init process has started.
> >>>>  
> >>>
> >>> So you get a guest kernel crash when guest userspace starts?
> >>>
> >>> Are the crashes completely random or is it always some pointer
> >>> dereference that goes wrong, is it init crashing and causing the kernel
> >>> to crash (from killed init), or is it always the same kernel thread, or
> >>> anything coherent at all?
> >>>
> >> Completely random. I have seen almost all of the above possible crashes,
> >> like pointer dereference, init process skipping some steps, console going
> >> for a toss, the login prompt just won't let me log in, etc.
> >>
> >>  
> >>> It could be anything, really.  You could try a really brute force
> >>> debugging option of adding a complete cache flush at the end of
> >>> user_mem_abort in arch/arm/kvm/mmu.c to see if this is cache related at
> >>> all...
> >>>
> >> I will try that. I strongly suspect this has to do with bad page tables.
> >> Remember, I see this issue when using memory which starts beyond 4GB.
> >>
> A full cache flush at the end of user_mem_abort() doesn't help. So it might
> not be cache related then.
> 
> > 
> > But once it crashes, if you kill the VM process and start a new one,
> > then the new one runs flawlessly?  Did you stress test the second VM
> > (hackbench or something) so we're sure the second one is indeed stable?
> > 
> > What happens if you start a guest, kill it immediately, and then start
> > another guest?
> > 
> And the observation about subsequent VMs being stable also doesn't hold
> true. An additional symptom I saw was a segmentation fault as well as
> hitting a kvm load/store trap. This also possibly indicates instruction
> corruption.
> 
Cool, so we only know it breaks when the physical address is >4GB.
Awesome.

It may be helpful to cherry-pick this commit:
https://git.linaro.org/gitweb?p=people/cdall/linux-kvm-arm.git;a=commitdiff;h=df6dc9f43f2a37547d4ce034706ef0cfc4235129

Then capture a full trace of the VM when executing until the guest
crashes and look at the trace to see if we're mapping and faulting on
the pages we think we are or if it looks like something is being
truncated.

Feel free to send me one of those logs and I'll be happy to take a look.

-Christoffer
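
For illustration only, the kind of truncation mentioned above can be
reproduced in plain userspace C. This is a sketch under stated assumptions,
not code from arch/arm/kvm/mmu.c: the sketch_* names, the 4 KiB page size,
and the 0x800000000 base address are all hypothetical and were picked only
so that the address does not fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical RAM base above 4GB, chosen only for illustration. */
#define SKETCH_RAM_BASE UINT64_C(0x800000000)

/* Correct: the physical address stays in a 64-bit type while shifting. */
static uint64_t sketch_phys_to_pfn(uint64_t phys)
{
	return phys >> SKETCH_PAGE_SHIFT;
}

/*
 * Buggy: the address passes through a 32-bit variable (the width of
 * unsigned long on 32-bit ARM), so bits 32 and above are silently lost.
 */
static uint64_t sketch_phys_to_pfn_truncated(uint64_t phys)
{
	uint32_t narrow = (uint32_t)phys;

	return narrow >> SKETCH_PAGE_SHIFT;
}

/* Inverse of sketch_phys_to_pfn(); the assert guards against overflow. */
static uint64_t sketch_pfn_to_phys(uint64_t pfn)
{
	assert(pfn <= (UINT64_MAX >> SKETCH_PAGE_SHIFT));
	return pfn << SKETCH_PAGE_SHIFT;
}

/* Nonzero if the address would lose information in a 32-bit variable. */
static int sketch_needs_64bit(uint64_t phys)
{
	return (phys >> 32) != 0;
}
```

With these helpers, the page at SKETCH_RAM_BASE + 0x1000 maps to pfn
0x800001 on the correct path but to pfn 0x1 on the truncating path. A
mismatch of that shape between the faulting address and the mapped page is
one thing a trace could reveal.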

Patch

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 657f15e..5f6f460 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -826,6 +826,7 @@  int kvm_mmu_init(void)
 		goto out;
 	}
 
+	flush_cache_all();
 	return 0;
 out:
 	free_hyp_pgds();