
KVM: arm/arm64: Revert to old way of checking for device mapping in stage2_flush_ptes().

Message ID 1448975032-7156-1-git-send-email-p.fedin@samsung.com (mailing list archive)
State New, archived

Commit Message

Pavel Fedin Dec. 1, 2015, 1:03 p.m. UTC
This function takes stage-II physical addresses (A.K.A. IPA), on input, not
real physical addresses. This causes kvm_is_device_pfn() to return wrong
values, depending on how much guest and host memory maps match. This
results in completely broken KVM on some boards. The problem has been
caught on Samsung proprietary hardware.

Cc: stable@vger.kernel.org
Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
---
 arch/arm/kvm/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Ard Biesheuvel Dec. 2, 2015, 5:41 p.m. UTC | #1
Hi Pavel,

Thanks for getting to the bottom of this.

On 1 December 2015 at 14:03, Pavel Fedin <p.fedin@samsung.com> wrote:
> This function takes stage-II physical addresses (A.K.A. IPA), on input, not
> real physical addresses. This causes kvm_is_device_pfn() to return wrong
> values, depending on how much guest and host memory maps match. This
> results in completely broken KVM on some boards. The problem has been
> caught on Samsung proprietary hardware.
>
> Cc: stable@vger.kernel.org
> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
>

That commit is not in a release yet, so no need for cc stable

> Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
> ---
>  arch/arm/kvm/mmu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7dace90..51ad98f 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>
>         pte = pte_offset_kernel(pmd, addr);
>         do {
> -               if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
> +               if (!pte_none(*pte) &&
> +                   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)

I think your analysis is correct, but does that not apply to both instances?
And instead of reverting, could we fix this properly instead?

>                         kvm_flush_dcache_pte(*pte);
>         } while (pte++, addr += PAGE_SIZE, addr != end);
>  }
> --
> 2.4.4
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoffer Dall Dec. 2, 2015, 6:50 p.m. UTC | #2
On Tue, Dec 01, 2015 at 04:03:52PM +0300, Pavel Fedin wrote:
> This function takes stage-II physical addresses (A.K.A. IPA), on input, not
> real physical addresses. This causes kvm_is_device_pfn() to return wrong
> values, depending on how much guest and host memory maps match. This
> results in completely broken KVM on some boards. The problem has been
> caught on Samsung proprietary hardware.
> 
> Cc: stable@vger.kernel.org

cc'ing stable doesn't make sense here as the bug was introduced in
v4.4-rc3 and we didn't release v4.4 yet...

> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
> 
> Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
> ---
>  arch/arm/kvm/mmu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 7dace90..51ad98f 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>  
>  	pte = pte_offset_kernel(pmd, addr);
>  	do {
> -		if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
> +		if (!pte_none(*pte) &&
> +		    (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>  			kvm_flush_dcache_pte(*pte);
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
>  }

You are right that there was a bug in the fix, but your fix is not the
right one.

Either we have to apply an actual mask and then compare against the value
(yes, I know, because of the UXN bit we get lucky so far, but that's too
brittle), or we should do a translation of the gfn to a pfn.  Is there
anything preventing us from doing the following?

if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))

-Christoffer
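
One way to read the remark about the mask and the UXN bit is sketched below. This is a toy model, not the kernel's page-table definitions: every `TOY_*` constant and `toy_*` function is hypothetical, and the bit positions are invented. It only assumes that the device attribute value is a bit-subset of the normal-memory attribute value, so that an "are all device bits set" test tells the two apart only because the device protection also carries an execute-never bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PTE layout: a 4-bit memory attribute field plus an XN flag. */
#define TOY_ATTR_SHIFT   2
#define TOY_ATTR_MASK    (UINT64_C(0xf) << TOY_ATTR_SHIFT)
#define TOY_ATTR_DEVICE  (UINT64_C(0x1) << TOY_ATTR_SHIFT)
#define TOY_ATTR_NORMAL  (UINT64_C(0xf) << TOY_ATTR_SHIFT)
#define TOY_XN           (UINT64_C(1) << 54)

/* Hypothetical protection values: only the device one sets XN. */
#define TOY_PAGE_NORMAL  (TOY_ATTR_NORMAL)
#define TOY_PAGE_DEVICE  (TOY_ATTR_DEVICE | TOY_XN)

/* Subset test: true whenever every bit of TOY_PAGE_DEVICE is set in pte. */
static bool toy_is_device_subset(uint64_t pte)
{
	return (pte & TOY_PAGE_DEVICE) == TOY_PAGE_DEVICE;
}

/* Field compare: isolate the attribute field, then compare its value. */
static bool toy_is_device_masked(uint64_t pte)
{
	return (pte & TOY_ATTR_MASK) == TOY_ATTR_DEVICE;
}

/* Exercises both tests on the three interesting PTE values. */
static void toy_mask_self_check(void)
{
	/* Both tests agree on the two canonical values. */
	assert(toy_is_device_subset(TOY_PAGE_DEVICE));
	assert(toy_is_device_masked(TOY_PAGE_DEVICE));
	assert(!toy_is_device_subset(TOY_PAGE_NORMAL));
	assert(!toy_is_device_masked(TOY_PAGE_NORMAL));

	/* A normal-memory PTE that also has XN set fools the subset test. */
	assert(toy_is_device_subset(TOY_PAGE_NORMAL | TOY_XN));
	assert(!toy_is_device_masked(TOY_PAGE_NORMAL | TOY_XN));
}
```

In this model the subset test is correct only as long as no normal-memory entry ever sets the XN bit, which is the kind of luck the message calls brittle. The masked compare does not depend on that.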
Ard Biesheuvel Dec. 2, 2015, 7:04 p.m. UTC | #3
On 2 December 2015 at 19:50, Christoffer Dall
<christoffer.dall@linaro.org> wrote:
> On Tue, Dec 01, 2015 at 04:03:52PM +0300, Pavel Fedin wrote:
>> This function takes stage-II physical addresses (A.K.A. IPA), on input, not
>> real physical addresses. This causes kvm_is_device_pfn() to return wrong
>> values, depending on how much guest and host memory maps match. This
>> results in completely broken KVM on some boards. The problem has been
>> caught on Samsung proprietary hardware.
>>
>> Cc: stable@vger.kernel.org
>
> cc'ing stable doesn't make sense here as the bug was introduced in
> v4.4-rc3 and we didn't release v4.4 yet...
>
>> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
>>
>> Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
>> ---
>>  arch/arm/kvm/mmu.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 7dace90..51ad98f 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>>
>>       pte = pte_offset_kernel(pmd, addr);
>>       do {
>> -             if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> +             if (!pte_none(*pte) &&
>> +                 (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>>                       kvm_flush_dcache_pte(*pte);
>>       } while (pte++, addr += PAGE_SIZE, addr != end);
>>  }
>
> You are right that there was a bug in the fix, but your fix is not the
> right one.
>
> Either we have to apply an actual mask and then compare against the value
> (yes, I know, because of the UXN bit we get lucky so far, but that's too
> brittle), or we should do a translation of the gfn to a pfn.  Is there
> anything preventing us from doing the following?
>
> if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))
>

Yes, that looks better. I got confused by addr being a 'phys_addr_t'
but obviously, the address inside the PTE is the one we need to test
for device-ness, so I think we should replace both instances with this.
Christoffer Dall Dec. 2, 2015, 7:23 p.m. UTC | #4
On Wed, Dec 02, 2015 at 08:04:42PM +0100, Ard Biesheuvel wrote:
> On 2 December 2015 at 19:50, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Tue, Dec 01, 2015 at 04:03:52PM +0300, Pavel Fedin wrote:
> >> This function takes stage-II physical addresses (A.K.A. IPA), on input, not
> >> real physical addresses. This causes kvm_is_device_pfn() to return wrong
> >> values, depending on how much guest and host memory maps match. This
> >> results in completely broken KVM on some boards. The problem has been
> >> caught on Samsung proprietary hardware.
> >>
> >> Cc: stable@vger.kernel.org
> >
> > cc'ing stable doesn't make sense here as the bug was introduced in
> > v4.4-rc3 and we didn't release v4.4 yet...
> >
> >> Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
> >>
> >> Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
> >> ---
> >>  arch/arm/kvm/mmu.c | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 7dace90..51ad98f 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
> >>
> >>       pte = pte_offset_kernel(pmd, addr);
> >>       do {
> >> -             if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
> >> +             if (!pte_none(*pte) &&
> >> +                 (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> >>                       kvm_flush_dcache_pte(*pte);
> >>       } while (pte++, addr += PAGE_SIZE, addr != end);
> >>  }
> >
> > You are right that there was a bug in the fix, but your fix is not the
> > right one.
> >
> > Either we have to apply an actual mask and then compare against the value
> > (yes, I know, because of the UXN bit we get lucky so far, but that's too
> > brittle), or we should do a translation of the gfn to a pfn.  Is there
> > anything preventing us from doing the following?
> >
> > if (!pte_none(*pte) && !kvm_is_device_pfn(pte_pfn(*pte)))
> >
> 
> Yes, that looks better. I got confused by addr being a 'phys_addr_t'

Yeah, that's what I thought when I saw this.  Admittedly we could have a
typedef for the IPA, but oh well...

> but obviously, the address inside the PTE is the one we need to test
> for device-ness, so I think we should replace both instances with this
> 

care to send a patch by any chance?

-Christoffer
Pavel Fedin Dec. 3, 2015, 7:14 a.m. UTC | #5
Hello!

> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index 7dace90..51ad98f 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
> >
> >         pte = pte_offset_kernel(pmd, addr);
> >         do {
> > -               if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
> > +               if (!pte_none(*pte) &&
> > +                   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
> 
> I think your analysis is correct, but does that not apply to both instances?

No, the other one is correct, since it operates on a real PFN (at least it looks that way). I have verified my fix against the original problem (crash on Exynos5410 without generic timer), and it still works fine there.

> And instead of reverting, could we fix this properly instead?

Of course, I'm not against alternative approaches, feel free. I just suggested what I could to fix things quickly. I'm no expert in KVM memory management yet. After all, this is what mailing lists are for.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


Ard Biesheuvel Dec. 3, 2015, 8:09 a.m. UTC | #6
On 3 December 2015 at 08:14, Pavel Fedin <p.fedin@samsung.com> wrote:
>  Hello!
>
>> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> > index 7dace90..51ad98f 100644
>> > --- a/arch/arm/kvm/mmu.c
>> > +++ b/arch/arm/kvm/mmu.c
>> > @@ -310,7 +310,8 @@ static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
>> >
>> >         pte = pte_offset_kernel(pmd, addr);
>> >         do {
>> > -               if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
>> > +               if (!pte_none(*pte) &&
>> > +                   (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
>>
>> I think your analysis is correct, but does that not apply to both instances?
>
>  No no, another one is correct, since it operates on real PFN (at least looks like so). I have verified my fix against the original problem (crash on Exynos5410 without generic timer), and it still works fine there.
>

I don't think so. Regardless of whether you are manipulating HYP
mappings or stage-2 mappings, the physical address is always the
output, not the input of the translation, so addr is always either a
virtual address or an intermediate physical address, whereas
pfn_valid() operates on host physical addresses.
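
The distinction can be illustrated with a toy model. The sketch below is not kernel code: the `toy_*` names, the host RAM range and the example addresses are all invented for illustration, and `toy_is_device_pfn()` is only a simplified stand-in that treats any PFN outside host RAM as a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12

/* Hypothetical host memory map: RAM covers host PFNs [0x80000, 0xC0000). */
static bool toy_host_pfn_is_ram(uint64_t pfn)
{
	return pfn >= 0x80000 && pfn < 0xC0000;
}

/* Simplified stand-in for a device check: anything that is not host RAM. */
static bool toy_is_device_pfn(uint64_t pfn)
{
	return !toy_host_pfn_is_ram(pfn);
}

/* Toy stage-2 entry: the guest IPA it translates and the host PFN it outputs. */
struct toy_s2_entry {
	uint64_t ipa;      /* input of the translation (guest view) */
	uint64_t host_pfn; /* output of the translation (host view) */
};

/* Mistaken check: classifies the input IPA against the host memory map. */
static bool toy_should_flush_by_ipa(const struct toy_s2_entry *e)
{
	return !toy_is_device_pfn(e->ipa >> TOY_PAGE_SHIFT);
}

/* Check on the translation output, i.e. the PFN stored in the entry. */
static bool toy_should_flush_by_output(const struct toy_s2_entry *e)
{
	return !toy_is_device_pfn(e->host_pfn);
}

/* Two entries whose guest and host layouts do not line up. */
static void toy_ipa_self_check(void)
{
	/* Guest RAM at IPA 0x40000000, backed by host RAM. */
	struct toy_s2_entry guest_ram = { 0x40000000ULL, 0x80010 };
	/* Guest MMIO at IPA 0x80000000, backed by a host device page. */
	struct toy_s2_entry guest_mmio = { 0x80000000ULL, 0x10000 };

	/* The IPA-based check gets both cases backwards. */
	assert(!toy_should_flush_by_ipa(&guest_ram));
	assert(toy_should_flush_by_ipa(&guest_mmio));

	/* The output-based check classifies both correctly. */
	assert(toy_should_flush_by_output(&guest_ram));
	assert(!toy_should_flush_by_output(&guest_mmio));
}
```

With these invented layouts the IPA-based check skips the flush for guest RAM and attempts it for a device page, while the check on the entry's output PFN gets both right. Whether the two checks happen to agree on a given board depends on how closely the guest and host memory maps match.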

>> And instead of reverting, could we fix this properly instead?
>
>  Of course, i'm not against alternate approaches, feel free to. I've just suggested what i could, to fix things quickly. I'm indeed no expert in KVM memory management yet. After all, this is what mailing lists are for.
>

OK. I will follow up with a patch, as Christoffer requested. I'd
appreciate it if you could test to see if it also fixes the current
issue, and the original arch timer issue.

Thanks,
Ard.
Pavel Fedin Dec. 3, 2015, 8:14 a.m. UTC | #7
Hello!

> >> I think your analysis is correct, but does that not apply to both instances?
> >
> >  No no, another one is correct, since it operates on real PFN (at least looks like so). I
> have verified my fix against the original problem (crash on Exynos5410 without generic timer),
> and it still works fine there.
> >
> 
> I don't think so. Regardless of whether you are manipulating HYP
> mappings or stage-2 mappings, the physical address is always the
> output, not the input of the translation, so addr is always either a
> virtual address or an intermediate physical address, whereas
> pfn_valid() operates on host physical addresses.

Yes, you are right. I have reviewed this more carefully, and indeed, unmap_range() is also called by unmap_stage2_range(), so the address it receives can be either an IPA or a real PA.

> OK. I will follow up with a patch, as Christoffer requested. I'd
> appreciate it if you could test to see if it also fixes the current
> issue, and the original arch timer issue.

I have just made the same patch and am currently testing it on all my boards. I'll also test it on ARM64, just in case. I expect to finish the testing and send the patch in maybe one or two hours.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia


Ben Hutchings Dec. 4, 2015, 1:58 a.m. UTC | #8
On Wed, 2015-12-02 at 18:41 +0100, Ard Biesheuvel wrote:
> Hi Pavel,
> 
> Thanks for getting to the bottom of this.
> 
> On 1 December 2015 at 14:03, Pavel Fedin <p.fedin@samsung.com> wrote:
> > This function takes stage-II physical addresses (A.K.A. IPA), on input, not
> > real physical addresses. This causes kvm_is_device_pfn() to return wrong
> > values, depending on how much guest and host memory maps match. This
> > results in completely broken KVM on some boards. The problem has been
> > caught on Samsung proprietary hardware.
> > 
> > Cc: stable@vger.kernel.org
> > Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
> > 
> 
> That commit is not in a release yet, so no need for cc stable
[...]

But it is cc'd to stable, so unless it is going to be nacked at review
stage, any subsequent fixes should also be cc'd.

Ben.
Pavel Fedin Dec. 4, 2015, 6:39 a.m. UTC | #9
Hello!

> > > Cc: stable@vger.kernel.org
> > > Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
> > >
> >
> > That commit is not in a release yet, so no need for cc stable
> [...]
> 
> But it is cc'd to stable, so unless it is going to be nacked at review
> stage, any subsequent fixes should also be cc'd.

Sorry for messing things up a bit, but the affected commit is actually in the stable branch (4.4-rc3), so I decided to Cc: stable just in case, because the breakage is quite big IMHO.

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia

Ard Biesheuvel Dec. 4, 2015, 8:37 a.m. UTC | #10
On 4 December 2015 at 02:58, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Wed, 2015-12-02 at 18:41 +0100, Ard Biesheuvel wrote:
>> Hi Pavel,
>>
>> Thanks for getting to the bottom of this.
>>
>> On 1 December 2015 at 14:03, Pavel Fedin <p.fedin@samsung.com> wrote:
>> > This function takes stage-II physical addresses (A.K.A. IPA), on input, not
>> > real physical addresses. This causes kvm_is_device_pfn() to return wrong
>> > values, depending on how much guest and host memory maps match. This
>> > results in completely broken KVM on some boards. The problem has been
>> > caught on Samsung proprietary hardware.
>> >
>> > Cc: stable@vger.kernel.org
>> > Fixes: e6fab5442345 ("ARM/arm64: KVM: test properly for a PTE's uncachedness")
>> >
>>
>> That commit is not in a release yet, so no need for cc stable
> [...]
>
> But it is cc'd to stable, so unless it is going to be nacked at review
> stage, any subsequent fixes should also be cc'd.
>

Ah yes, thanks for pointing that out.

But please, don't cc your proposed patches straight to
stable@vger.kernel.org. I usually leave it up to the maintainer that
merges the patch to add the Cc: line to the commit log.

Patch

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 7dace90..51ad98f 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -310,7 +310,8 @@  static void stage2_flush_ptes(struct kvm *kvm, pmd_t *pmd,
 
 	pte = pte_offset_kernel(pmd, addr);
 	do {
-		if (!pte_none(*pte) && !kvm_is_device_pfn(__phys_to_pfn(addr)))
+		if (!pte_none(*pte) &&
+		    (pte_val(*pte) & PAGE_S2_DEVICE) != PAGE_S2_DEVICE)
 			kvm_flush_dcache_pte(*pte);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 }