diff mbox

KVM guests sometimes fail to boot because of a kernel stack overflow if KPTI is enabled on a HiSilicon ARM64 platform.

Message ID 5B34F5C0.9090001@hisilicon.com (mailing list archive)
State New, archived
Headers show

Commit Message

Wei Xu June 28, 2018, 2:50 p.m. UTC
Hi Will,

On 2018/6/27 14:28, Will Deacon wrote:
> On Wed, Jun 27, 2018 at 02:22:03PM +0100, Wei Xu wrote:
>> On 2018/6/26 18:47, Will Deacon wrote:
>>> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
>>> replacing:
>>>
>>> 	dc      civac, cur_\()\type\()p
>>>
>>> with:
>>>
>>> 	dc      ivac, cur_\()\type\()p
>>>
>>> please? Only do this for the guest kernel, not the host. KVM will upgrade
>>> the clean to a clean+invalidate, so it's interesting to see if this has
>>> an effect on the behaviour.
>>
>> With only the guest kernel changed, the guest still failed to boot and the log
>> is the same as in the last mail.
>>
>> But if I change it to cvac as below for the guest, it is mostly stable.
>> 	dc      cvac, cur_\()\type\()p
>>
>> I have synced with our SoC guys about this and hope we can find the reason.
>> Do you have any more suggestions?
> 
> Unfortunately, not. It looks like somehow clean+invalidate is behaving
> just as an invalidate, and we're corrupting the page table as a result.
> 
> Hopefully the SoC guys will figure it out.

After replacing the dmb with a dsb in both __idmap_kpti_get_pgtable_ent and
__idmap_kpti_put_pgtable_ent_ng, we tested 20 times and could not reproduce
the issue.
Today we will continue the stress testing and will update the result tomorrow.

The dsb in __idmap_kpti_get_pgtable_ent ensures the dc has completed so the
following ldr gets the latest data.

The dsb in __idmap_kpti_put_pgtable_ent_ng ensures the str is done before the
dc. Although a dmb can guarantee the ordering of the str and the dc at the L2
cache, it cannot guarantee the ordering on the bus.
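As a rough sketch (with x0 standing in for the entry pointer, which the real
macros derive from cur_\()\type\()p), the two sequences we tested with dsb are:

```asm
	// __idmap_kpti_get_pgtable_ent: clean the line, wait for the
	// clean to complete, then load the entry
	dc	cvac, x0
	dsb	sy
	ldr	x1, [x0]

	// __idmap_kpti_put_pgtable_ent_ng: store the updated entry, wait
	// for the store to complete, then clean+invalidate to the PoC
	str	x1, [x0]
	dsb	sy
	dc	civac, x0
```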

What do you think?
Thanks!

----


> 
> Will
> 
> .
>

Comments

Mark Rutland June 28, 2018, 3:34 p.m. UTC | #1
On Thu, Jun 28, 2018 at 03:50:40PM +0100, Wei Xu wrote:
> Hi Will,
> 
> On 2018/6/27 14:28, Will Deacon wrote:
> > On Wed, Jun 27, 2018 at 02:22:03PM +0100, Wei Xu wrote:
> >> On 2018/6/26 18:47, Will Deacon wrote:
> >>> If you look at the __idmap_kpti_put_pgtable_ent_ng asm macro, can you try
> >>> replacing:
> >>>
> >>> 	dc      civac, cur_\()\type\()p
> >>>
> >>> with:
> >>>
> >>> 	dc      ivac, cur_\()\type\()p
> >>>
> >>> please? Only do this for the guest kernel, not the host. KVM will upgrade
> >>> the clean to a clean+invalidate, so it's interesting to see if this has
> >>> an effect on the behaviour.
> >>
> >> With only the guest kernel changed, the guest still failed to boot and the log
> >> is the same as in the last mail.
> >>
> >> But if I change it to cvac as below for the guest, it is mostly stable.
> >> 	dc      cvac, cur_\()\type\()p
> >>
> >> I have synced with our SoC guys about this and hope we can find the reason.
> >> Do you have any more suggestions?
> > 
> > Unfortunately, not. It looks like somehow clean+invalidate is behaving
> > just as an invalidate, and we're corrupting the page table as a result.
> > 
> > Hopefully the SoC guys will figure it out.
> 
> After replacing the dmb with a dsb in both __idmap_kpti_get_pgtable_ent and
> __idmap_kpti_put_pgtable_ent_ng, we tested 20 times and could not reproduce
> the issue.
> Today we will continue the stress testing and will update the result tomorrow.
> 
> The dsb in __idmap_kpti_get_pgtable_ent ensures the dc has completed so the
> following ldr gets the latest data.
> 
> The dsb in __idmap_kpti_put_pgtable_ent_ng ensures the str is done before the
> dc. Although a dmb can guarantee the ordering of the str and the dc at the L2
> cache, it cannot guarantee the ordering on the bus.

The architecture mandates that a DMB must provide this ordering, so that
would be an erratum.

Per ARM DDI 0487C.a, page D3-2069, "Ordering and completion of data and instruction
cache instructions":

  All data cache instructions, other than DC ZVA, that specify an
  address:

  * Can execute in any order relative to loads or stores that access any
    address with the Device memory attribute, or with Normal memory with
    Inner Non-cacheable attribute unless a DMB or DSB is executed
    between the instructions.

Note that we rely on this ordering in head.S when creating the page
tables and setting up the boot mode. We also rely on this for the pmem
API.

Thanks,
Mark.
Mark Rutland June 28, 2018, 4:24 p.m. UTC | #2
On Thu, Jun 28, 2018 at 04:08:24PM +0000, Wangxuefeng (E) wrote:
> Hi Mark,
>      Do you mean that DMB must ensure the completion of prior loads/stores
> or CMOs and make the data visible to all observers (whether device or
> cacheable)? That DMB does not only enforce ordering?

Not quite -- DMB does not guarantee completion.

However, DMB must guarantee that loads/stores and CMOs are ordered on
the bus, all the way to the PoC.
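As a rough illustration of that distinction (hypothetical registers, not the
actual proc.S code):

```asm
	// DMB: the store is ordered before the DC on the path to the PoC,
	// but neither is guaranteed to have completed when execution continues.
	str	x1, [x0]
	dmb	sy
	dc	civac, x0

	// DSB: additionally waits for the store (and any outstanding CMOs)
	// to complete before subsequent instructions execute.
	str	x1, [x0]
	dsb	sy
	dc	civac, x0
```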

Thanks,
Mark.
Marc Zyngier June 29, 2018, 8:47 a.m. UTC | #3
On Thu, 28 Jun 2018 20:18:13 +0100,
Zhangxiquan <zhangxiquan@hisilicon.com> wrote:
> 
> Hi Mark ,
> 
> I'm inclined to agree with you that DMB is enough to order DC and LDST.
> 
> Just want to know: has this code ever passed on any ARM
> implementation, such as A75, A72, ...?

This code has been tested on most ARM implementations, as this is what
we used to develop it. So far, you have the only example we know of
where this sequence doesn't work as expected.

More importantly, this piece of code is written to match the ARMv8
architecture requirements, and we can only expect implementations to
follow the exact same requirements.

Thanks,

	M.
Mark Rutland June 29, 2018, 9:59 a.m. UTC | #4
On Thu, Jun 28, 2018 at 07:24:30PM +0000, Zhangxiquan wrote:
> Do you think this ordering guarantee (between DC and ldst) is applicable to
> cacheable memory only, or does it also apply to device memory?

This also applies for device memory.

As I quoted previously, from ARM DDI 0487C.a page D3-2069:

  All data cache instructions, other than DC ZVA, that specify an
  address:

  * Can execute in any order relative to loads or stores that access any
    address with the Device memory attribute, or with Normal memory with
    Inner Non-cacheable attribute unless a DMB or DSB is executed
    between the instructions.

i.e. a DMB is sufficient to provide ordering between DC and loads/stores
which access device memory.

Thanks,
Mark.

Patch

diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 03646e6..bb767ea 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -209,7 +209,7 @@  ENDPROC(idmap_cpu_replace_ttbr1)

        .macro  __idmap_kpti_get_pgtable_ent, type
        dc      cvac, cur_\()\type\()p          // Ensure any existing dirty
-       dmb     sy                              // lines are written back before
+       dsb     sy                              // lines are written back before
        ldr     \type, [cur_\()\type\()p]       // loading the entry
        tbz     \type, #0, skip_\()\type        // Skip invalid and
        tbnz    \type, #11, skip_\()\type       // non-global entries
@@ -218,8 +218,9 @@  ENDPROC(idmap_cpu_replace_ttbr1)
        .macro __idmap_kpti_put_pgtable_ent_ng, type
        orr     \type, \type, #PTE_NG           // Same bit for blocks and pages
        str     \type, [cur_\()\type\()p]       // Update the entry and ensure
-       dmb     sy                              // that it is visible to all
+       dsb     sy                              // that it is visible to all
        dc      civac, cur_\()\type\()p         // CPUs. 	


Best Regards,
Wei